A lightweight local PDF question-answering system built around the RAG (Retrieval-Augmented Generation) workflow. The project demonstrates how to turn unstructured PDF documents into a searchable knowledge base, retrieve relevant context with vector search, optionally rerank results, and generate final answers through an external LLM API.
This project is designed as a concise but complete RAG demo suitable for learning, portfolio presentation, and technical review. It focuses on the full retrieval pipeline rather than product-level complexity.
Core workflow:
- Load local PDF files.
- Split long text into smaller chunks.
- Convert chunks into embeddings with a Hugging Face model.
- Store and retrieve vectors with FAISS.
- Optionally rerank retrieved chunks with a cross-encoder.
- Assemble a grounded prompt from retrieved context.
- Call an OpenAI-compatible external LLM API to generate the final answer.
- Local PDF ingestion with
PyPDFLoader - Configurable chunking strategy
- Vector retrieval powered by FAISS
- Optional two-stage retrieval with reranking
- Streamlit chat-style interface
- External LLM integration through OpenAI-compatible API endpoints
- Environment-variable based API configuration
- Simple structure for fast understanding and demonstration
- Python
- Streamlit
- LangChain
- FAISS
- Hugging Face Embeddings
- Sentence Transformers CrossEncoder
- PyPDF
- Requests
PDF files
-> document loading
-> text chunking
-> embedding generation
-> FAISS vector index
-> similarity retrieval
-> optional cross-encoder reranking
-> prompt assembly
-> external LLM answer generationrag-pdf-qa-system/
├── app.py
├── requirements.txt
├── README.md
├── .gitignore
├── pdfs/ # put your local PDF files here
└── faiss_index/ # generated locally after building the vector index- Demonstrates an end-to-end RAG pipeline rather than isolated API calls
- Separates retrieval and generation clearly, which makes the architecture easy to explain
- Uses practical open-source components commonly seen in real prototyping workflows
- Includes optional reranking, which shows awareness of retrieval quality optimization
- Keeps the implementation compact enough for reviewers to inspect quickly
pip install -r requirements.txtPut one or more PDF files into the pdfs/ directory.
If you want the app to generate final answers through an external LLM, set these environment variables:
RAG_API_BASE=https://your-api-base.example/v1
RAG_API_KEY=your_api_key
RAG_MODEL_NAME=your_model_nameIf these values are not configured, the system can still demonstrate retrieval and prompt assembly.
streamlit run app.py- Start the Streamlit app.
- Select a PDF from the sidebar.
- Build or rebuild the vector index.
- Ask questions in the chat input.
- Inspect retrieved context blocks below the chat area.
The sidebar supports adjusting the main retrieval parameters:
Chunk SizeChunk OverlapTop-K- Embedding model name
- Whether reranking is enabled
- Reranker model name
- Fetch-K for candidate recall
These controls make it easy to compare retrieval strategies during demos.
- What is the main topic of this PDF?
- Summarize the core functions described in the document.
- What installation or configuration steps are mentioned?
- What important parameters, models, or hardware details appear in the document?
faiss_index/is generated locally and should not be committed.pdfs/is intentionally excluded from version control to avoid uploading private documents.- API keys are never stored in the repository.
- This repository is intended as a clean public demo version.
- Support multi-document indexing and source filtering
- Add file upload support in the UI
- Add source citation highlighting in answers
- Persist separate indexes for different document sets
- Introduce evaluation metrics for retrieval quality
This project is provided for learning, demonstration, and portfolio use.