A modular Retrieval-Augmented Generation (RAG) repository with swappable pipelines and persistent vector stores (ChromaDB). Supports:
- Basic text RAG (GROQ)
- Multi-modal RAG (text + images via CLIP + GPT‑4.1)
- LangGraph RAG (two-node graph: retrieve → generate)
- rag-ubac: Role-based access control (UBAC) RAG
- Cache-RAG: Intelligent caching of Q&A pairs with LangGraph orchestration
- Agentic-RAG: Tool-using LangGraph ReAct pipeline with resume retriever, web search, and currency conversion
- HyDE RAG: Hypothetical Document Embeddings (HyDE) pipeline that generates a concise hypothetical document answering the query, retrieves relevant documents using the embedding of that hypothetical document, and then produces a grounded final answer from the retrieved context.
- PDF ingestion and chunking
- Persistent vector storage (Chroma `PersistentClient`) with per-type collections: `basic_rag_collection`, `multi_modal_collection`, `langgraph_collection`, `agentic_rag_collection`, `cache_rag_collection`, `rag_ubac_collection`, `hyde_collection`.
- Modular retrievers, prompts, and pipelines
- GROQ LLM for basic RAG; OpenAI GPT‑4.1 for multi‑modal; GROQ for LangGraph
- CLI for vectorizing, querying, inspecting, listing, and deleting collections
- Grounded prompts to reduce hallucinations
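The HyDE flow described in the feature list can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the repo's `hyde_rag_pipeline.py`: `fake_llm` is a hypothetical stub standing in for the real LLM call, and `embed` is a toy bag-of-words vector standing in for a real embedding model.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector (hypothetical stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def fake_llm(prompt: str) -> str:
    """Hypothetical LLM stub that always returns the same hypothetical answer."""
    return "Employees accrue paid vacation leave each month and can carry unused leave over."


def hyde_retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    # 1. Generate a concise hypothetical document that answers the query.
    hypothetical = fake_llm(f"Write a short passage that answers: {query}")
    # 2. Embed the hypothetical document (not the query) and rank the corpus by similarity.
    h_vec = embed(hypothetical)
    return sorted(docs, key=lambda d: cosine(h_vec, embed(d)), reverse=True)[:k]


docs = [
    "Paid vacation leave accrues monthly; unused leave can be carried over.",
    "The cafeteria is open from 8am to 3pm.",
]
query = "How much time off do I get?"
context = hyde_retrieve(query, docs)
# 3. The final step (not shown) sends `context` plus the query to the LLM with a grounded prompt.
print(context)
```

With this toy embedding the raw query shares no tokens with either document, so a direct query lookup would score both at zero; the hypothetical answer supplies the vocabulary ("vacation", "leave") that the matching document actually uses, which is the motivation for HyDE.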
Repository
- Quick start
- From the repo root, run the setup script to create the data folders and a virtual environment (`.venv`), and to install the repo in editable mode:
  ```bash
  ./setup.sh
  source .venv/bin/activate
  ```
- Data layout
- Place your PDFs in:
  - `data/source_data/basic-rag/`
  - `data/source_data/multi-modal/`
  - `data/source_data/langgraph/`
  - `data/source_data/rag-ubac/`
  - `data/source_data/agentic-rag/`
  - `data/source_data/hyde/`
- Environment (`.env`):
  ```bash
  # GROQ (basic, langgraph)
  GROQ_API_KEY=your_groq_key_here
  # SerpAPI (agentic web search tool) and Exchangerate API (other preferred APIs also work)
  SERPAPI_API_KEY=your_serpapi_key_here
  EXCHANGE_RATE_API=your_api_key
  # OpenAI (multi-modal)
  OPENAI_API_KEY=your_openai_key_here
  # Optional: silence the tokenizers warning
  TOKENIZERS_PARALLELISM=false
  ```
- Manual install (alternative to `setup.sh`):
  ```bash
  pip install -r requirements.txt
  ```
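Which keys a run needs depends on the pipeline. As a hedged illustration, the helper below checks `.env` contents against the keys this README lists per RAG type. `REQUIRED_KEYS`, `parse_dotenv`, and `missing_keys` are hypothetical names, not part of the repo; the parser handles only plain `KEY=value` lines; and the real pipelines may need keys not listed here (the README does not say which LLM key agentic-rag uses, for example). In practice the python-dotenv package (`load_dotenv`) is the usual way to load such a file.

```python
# Hypothetical helper, not part of this repo: keys this README lists for each RAG type.
REQUIRED_KEYS = {
    "basic-rag": ["GROQ_API_KEY"],
    "langgraph": ["GROQ_API_KEY"],
    "multi-modal": ["OPENAI_API_KEY"],
    "agentic-rag": ["SERPAPI_API_KEY", "EXCHANGE_RATE_API"],
}


def parse_dotenv(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments (no quoting support)."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        env[key.strip()] = value.strip()
    return env


def missing_keys(rag_type: str, env: dict[str, str]) -> list[str]:
    """Return the listed keys that are absent or empty for this RAG type."""
    return [k for k in REQUIRED_KEYS.get(rag_type, []) if not env.get(k)]


sample = """
# GROQ (basic, langgraph)
GROQ_API_KEY=gsk_example
TOKENIZERS_PARALLELISM=false
"""
env = parse_dotenv(sample)
print(missing_keys("basic-rag", env))    # []
print(missing_keys("multi-modal", env))  # ['OPENAI_API_KEY']
```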
Basic RAG
```bash
python main.py --rag_type basic-rag
python main.py --rag_type basic-rag -v      # (re-)vectorize
python main.py --rag_type basic-rag --info  # collection info
```
Multi‑Modal RAG
```bash
python main.py --rag_type multi-modal
python main.py --rag_type multi-modal -v      # (re-)vectorize
python main.py --rag_type multi-modal --info  # collection info
```
LangGraph RAG
```bash
python main.py --rag_type langgraph
python main.py --rag_type langgraph -v      # (re-)vectorize
python main.py --rag_type langgraph --info  # collection info
```
Agentic-RAG (tool-using agent with three tools: resume retriever, web search, and currency conversion)
```bash
python main.py --rag_type agentic-rag
python main.py --rag_type agentic-rag -v      # (re-)vectorize resume data
python main.py --rag_type agentic-rag --info  # collection info
```
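To make the tool-using loop concrete, here is a minimal ReAct-style sketch in plain Python. It is not the repo's LangGraph pipeline: `scripted_policy` is a hypothetical stand-in for the LLM's reasoning step, and the three tools are hypothetical stubs (a fixed resume snippet, a canned search result, and a made-up exchange rate) in place of the real retriever, SerpAPI, and exchange-rate calls.

```python
from typing import Callable


# Hypothetical tool stubs; the real tools query Chroma, SerpAPI, and an exchange-rate API.
def resume_retriever(query: str) -> str:
    return "Expected salary: 90000 USD"


def web_search(query: str) -> str:
    return "No results (stub)"


def currency_converter(arg: str) -> str:
    # Expects "<amount> <FROM> <TO>"; the rate table is made up for illustration.
    rates = {("USD", "EUR"): 0.5}
    amount, src, dst = arg.split()
    return f"{float(amount) * rates[(src, dst)]:.2f} {dst}"


TOOLS: dict[str, Callable[[str], str]] = {
    "resume_retriever": resume_retriever,
    "web_search": web_search,
    "currency_converter": currency_converter,
}


def scripted_policy(question: str, observations: list[str]) -> tuple[str, str]:
    """Stand-in for the LLM: pick the next action from what has been observed so far."""
    if not observations:
        return ("resume_retriever", question)
    if len(observations) == 1:
        amount = observations[0].split(":")[1].split()[0]
        return ("currency_converter", f"{amount} USD EUR")
    return ("final_answer", f"The expected salary is about {observations[-1]}.")


def react_loop(question: str, max_steps: int = 5) -> str:
    observations: list[str] = []
    for _ in range(max_steps):
        action, arg = scripted_policy(question, observations)  # reason: choose an action
        if action == "final_answer":
            return arg
        observations.append(TOOLS[action](arg))                 # act: record the observation
    return "Stopped: step limit reached."


print(react_loop("What is the candidate's expected salary in EUR?"))
```

The step cap (`max_steps`) is a common safeguard in agent loops so that a policy that never emits a final answer cannot run forever.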
HyDE RAG
```bash
python main.py --rag_type hyde
python main.py --rag_type hyde -v      # (re-)vectorize HyDE data (creates/updates hyde_collection)
python main.py --rag_type hyde --info  # collection info
```
rag-ubac
You will be prompted to enter your role (`executive`, `hr`, or `junior`); answers are restricted to the documents that role is allowed to access. Access is enforced through metadata filters, so re-run vectorization after updating `FILE_ACCESS_METADATA`.
```bash
python main.py --rag_type rag-ubac --vectorize  # (re-)vectorize
python main.py --rag_type rag-ubac
python main.py --rag_type rag-ubac --info       # collection info
```
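The metadata-filter approach, and why re-vectorizing is required, can be sketched as follows. This is a hedged illustration: the shape of `FILE_ACCESS_METADATA` below is hypothetical (the real one in `shared/configs/static.py` may differ), and `tag_chunks` and `filter_for_role` are illustrative names. Roles are copied into each chunk's metadata at vectorization time, so changing the access map has no effect until the chunks are re-stored. In Chroma, the query-time step would typically be a `where` metadata filter passed to `collection.query`.

```python
# Hypothetical access map; the real FILE_ACCESS_METADATA may be shaped differently.
FILE_ACCESS_METADATA = {
    "board_minutes.pdf": ["executive"],
    "salary_bands.pdf": ["executive", "hr"],
    "onboarding_guide.pdf": ["executive", "hr", "junior"],
}
VALID_ROLES = {"executive", "hr", "junior"}


def tag_chunks(chunks):
    """Vectorization time: copy each source file's allowed roles into per-chunk metadata."""
    tagged = []
    for source, text in chunks:
        roles = FILE_ACCESS_METADATA.get(source, [])  # files missing from the map get no access
        meta = {"source": source, **{f"access_{r}": r in roles for r in VALID_ROLES}}
        tagged.append((text, meta))
    return tagged


def filter_for_role(tagged, role):
    """Query time: keep only chunks whose metadata grants access to `role`."""
    if role not in VALID_ROLES:
        raise ValueError(f"unknown role: {role}")
    return [text for text, meta in tagged if meta.get(f"access_{role}")]


chunks = [
    ("board_minutes.pdf", "Q3 acquisition plan"),
    ("salary_bands.pdf", "Band B: 60-80k"),
    ("onboarding_guide.pdf", "Laptop setup steps"),
]
tagged = tag_chunks(chunks)
print(filter_for_role(tagged, "junior"))  # ['Laptop setup steps']
print(filter_for_role(tagged, "hr"))      # ['Band B: 60-80k', 'Laptop setup steps']
```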
Cache-RAG
```bash
python main.py --rag_type cache-rag
python main.py --rag_type cache-rag --vectorize    # (re-)vectorize
python main.py --rag_type cache-rag --clear-cache  # clear cached Q&A pairs
python main.py --rag_type cache-rag --info         # collection info
```
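The README does not say how the cache decides that two questions match. The sketch below assumes a simple policy: an exact match on a normalized question, then a fuzzy match using Python's `difflib`. That fuzzy step is a swapped-in stand-in; a semantic cache would more typically compare embeddings. `QACache`, `answer`, and `fake_rag` are hypothetical names, and `clear()` only mirrors the intent of `--clear-cache`.

```python
import difflib
import re


class QACache:
    """Hypothetical Q&A cache: exact match on a normalized question, then a fuzzy fallback."""

    def __init__(self, threshold: float = 0.9):
        self.threshold = threshold
        self.entries: dict[str, str] = {}

    @staticmethod
    def normalize(question: str) -> str:
        # Lowercase and drop punctuation so trivial variations share one key.
        return " ".join(re.findall(r"[a-z0-9]+", question.lower()))

    def get(self, question: str):
        key = self.normalize(question)
        if key in self.entries:
            return self.entries[key]
        best = difflib.get_close_matches(key, list(self.entries), n=1, cutoff=self.threshold)
        return self.entries[best[0]] if best else None

    def put(self, question: str, answer: str) -> None:
        self.entries[self.normalize(question)] = answer

    def clear(self) -> None:
        self.entries.clear()


def answer(question, cache, rag_answer):
    cached = cache.get(question)
    if cached is not None:
        return cached, True        # cache hit: skip retrieval and generation
    result = rag_answer(question)  # cache miss: run the full RAG pipeline
    cache.put(question, result)
    return result, False


cache = QACache()
calls = []


def fake_rag(q):
    calls.append(q)
    return "Refunds are processed within 5 business days."


print(answer("How long do refunds take?", cache, fake_rag))  # miss
print(answer("how long do refunds take", cache, fake_rag))   # hit after normalization
print(len(calls))                                            # 1
```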
Manage collections (ChromaDB)
```bash
python main.py --rag_type basic-rag --list-collections
python main.py --rag_type basic-rag --delete-collection
```
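The per-type collection names from the feature list and the source-data folders from the data layout both follow one pattern keyed by `--rag_type`. A hedged sketch of that lookup is below; `COLLECTIONS` and `resolve` are hypothetical names, and since the README lists no source-data folder for `cache-rag`, that entry's directory is an assumption.

```python
from pathlib import Path

# Hypothetical mapping reconstructed from this README; the repo's actual lookup may differ.
COLLECTIONS = {
    "basic-rag": "basic_rag_collection",
    "multi-modal": "multi_modal_collection",
    "langgraph": "langgraph_collection",
    "agentic-rag": "agentic_rag_collection",
    "cache-rag": "cache_rag_collection",
    "rag-ubac": "rag_ubac_collection",
    "hyde": "hyde_collection",
}


def resolve(rag_type: str, root: Path = Path("data/source_data")) -> tuple[Path, str]:
    """Return (source-data directory, Chroma collection name) for a --rag_type value."""
    if rag_type not in COLLECTIONS:
        raise ValueError(f"unknown --rag_type: {rag_type!r}")
    return root / rag_type, COLLECTIONS[rag_type]


data_dir, collection_name = resolve("hyde")
print(data_dir, collection_name)
```

If the repo follows the usual Chroma pattern, the collection name would then be passed to `chromadb.PersistentClient(path=...).get_or_create_collection(name=...)`, and the list/delete options would correspond to `client.list_collections()` and `client.delete_collection(name=...)`; this is an inference, not something the README confirms.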
- Interactive session: type `/exit` or `/quit` to finish.
- The data directory is inferred from the RAG type: `data/source_data/{basic-rag | multi-modal | langgraph | rag-ubac | agentic-rag | hyde}`.
- Prompts restrict answers to the retrieved context. If no relevant context is retrieved, the system replies: "I am a helpful assistant for you to assist with the internal knowledge base; No related contents retrieved for the provided query - Try modifying your query for assistance."
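A context-only prompt with a no-context fallback can be sketched as below. This is a minimal illustration under assumptions: the distance threshold, the prompt wording, and the names `PROMPT_TEMPLATE` and `build_prompt_or_fallback` are hypothetical, and `hits` is assumed to be `(chunk_text, distance)` pairs from a vector-store query where lower distance means closer.

```python
# Hypothetical sketch: threshold, template wording, and names are assumptions.
FALLBACK = "No related contents retrieved for the provided query - Try modifying your query for assistance."

PROMPT_TEMPLATE = (
    "Answer the question using ONLY the context below. "
    "If the context does not contain the answer, say you don't know.\n\n"
    "Context:\n{context}\n\nQuestion: {question}"
)


def build_prompt_or_fallback(question, hits, max_distance=0.5):
    """Return (prompt, None) when relevant context exists, else (None, FALLBACK)."""
    relevant = [text for text, dist in hits if dist <= max_distance]
    if not relevant:
        return None, FALLBACK  # short-circuit: never call the LLM without context
    context = "\n---\n".join(relevant)
    return PROMPT_TEMPLATE.format(context=context, question=question), None


prompt, reply = build_prompt_or_fallback(
    "What is the refund window?",
    [("Refunds within 30 days.", 0.2), ("Cafeteria hours.", 0.9)],
)
print(prompt)
```

Returning the fallback before any LLM call is the design choice that matters here: the model never sees an empty context it could fill with invented facts.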
- `projects/retriever/`: `basic_rag_retriever.py`, `multi_modal_retriever.py`, `langgraph_retriever.py`, `rag_ubac_retriever.py`, `cache_rag_retriever.py`, `agentic_rag_retriever.py`
- `projects/pipeline/`: `basic_rag_pipeline.py`, `multi_modal_rag_pipeline.py`, `langgraph_rag_pipeline.py`, `rag_ubac_pipeline.py`, `cache_rag_pipeline.py`, `agentic_rag_pipeline.py`, `hyde_rag_pipeline.py`
- `projects/prompts/`: `prompts.py` (basic), `multi_modal_prompts.py`, `langgraph_prompts.py`, `hyde_prompts.py`
- `shared/utils/`: `pdf_utils.py` (PyMuPDF), `chroma_utils.py` (PersistentClient, collection helpers), `rag_ubac_scripts.py`
- `shared/configs/`: `static.py` (`FILE_ACCESS_METADATA`, `VALID_ROLES`, `RAG_UBAC_TYPE`)
- `shared/tools/`: `web_search_tool.py`, `currency_converter_tool.py`, `agentic_retriever_tool.py`
- Basic RAG: `docs/tutorials/basic-rag-tutorial.md`
- Multi‑Modal RAG: `docs/tutorials/multi-modal-rag.md`
- RAG using LangGraph: `docs/tutorials/langgraph-rag.md`
- RAG-UBAC: `docs/tutorials/rag-ubac-tutorial.md`
- Cache-RAG: `docs/tutorials/cache-rag-tutorial.md`
- Agentic-RAG: `docs/tutorials/agentic-rag-tutorial.md`