RAG Document AI Agent

A retrieval-augmented generation (RAG) question-answering agent that reads Word documents (.docx) and answers questions grounded in their content, with built-in hallucination detection and faithfulness monitoring.

Features

  • 📄 Document Ingestion: Load and process multiple Word documents
  • 🔍 Semantic Search: Find relevant content using vector embeddings
  • 🤖 Flexible LLM Backend: Use OpenAI API (cloud) or local models (llama-cpp)
  • Faithfulness Monitoring: Detect and flag potential hallucinations
  • 📊 Citation Tracking: Answers include source citations
  • 🌐 API Service: REST API for integration with other agents

Quick Start

# 1. Install dependencies
pip install -r requirements.txt

# 2. Run with OpenAI (cloud)
export OPENAI_API_KEY=sk-your-key-here
python main.py --cloud --docs ./your_documents/

# OR run with local LLM (no API key needed)
python main.py --local --docs ./your_documents/

Installation

Prerequisites

  • Python 3.9 or higher
  • 8GB+ RAM (16GB+ recommended for local LLM)
  • NVIDIA GPU with CUDA (optional, for faster local inference)

Step 1: Create Virtual Environment

python -m venv venv

# Linux/Mac
source venv/bin/activate

# Windows
venv\Scripts\activate

Step 2: Install Dependencies

pip install -r requirements.txt

Step 3 (Optional): Enable GPU Acceleration for Local LLM

# For NVIDIA GPUs with CUDA 12.1
pip uninstall llama-cpp-python -y
pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu121

# For CUDA 11.8 (uninstall the existing build first, as above)
pip uninstall llama-cpp-python -y
pip install llama-cpp-python --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu118

Usage

Interactive Mode

Ask questions about your documents in an interactive session:

# Using OpenAI (recommended for best quality)
python main.py --cloud --docs ./documents/

# Using local LLM on CPU
python main.py --local --docs ./documents/

# Using local LLM on GPU (faster)
python main.py --local --gpu-layers -1 --docs ./documents/

Interactive Commands

| Command | Description |
| --- | --- |
| `<question>` | Ask a question about the documents |
| `debug <question>` | Debug retrieval for a question |
| `search <text>` | Search for text in all chunks |
| `browse` | Interactive chunk browser |
| `status` | Show agent status |
| `add <path>` | Add more documents |
| `validate` | Run faithfulness validation tests |
| `quit` | Exit the program |

Example Session

$ python main.py --cloud --docs ./documents/

📁 Found 5 .docx files in: ./documents/
📚 Ingesting 5 documents...
✓ Processed: report.docx (45 chunks)
✓ Processed: manual.docx (120 chunks)
...
✓ Added 342 chunks to vector store

Question: What was the revenue in 2024?

📝 Answer:
According to the annual report, the company revenue in 2024 was $5.2 million [Source 1], 
representing a 15% increase from the previous year [Source 1].

📊 Faithfulness: 0.92
   Citation: 0.85
   NLI: 1.00
   Claims: 0.90
   Confidence: 0.95

⏱️ Time: 1250ms

Server Mode

Run as an API service for other agents to consume:

# Start server
python main.py --mode server --cloud --port 8000

# Or with local LLM
python main.py --mode server --local --gpu-layers -1 --port 8000

The API will be available at http://localhost:8000.


Configuration

Command Line Arguments

| Argument | Description | Default |
| --- | --- | --- |
| `--mode` | Run mode: `interactive` or `server` | `interactive` |
| `--cloud` | Use OpenAI API | - |
| `--local` | Use local LLM (llama-cpp) | - |
| `--docs` | Path to documents (directory, pattern, or file) | - |
| `--gpu-layers` | GPU layers for local LLM (0=CPU, -1=all) | `0` |
| `--model-repo` | HuggingFace repo for local model | See `config.py` |
| `--model-file` | GGUF filename for local model | See `config.py` |
| `--port` | Server port | `8000` |
| `--host` | Server host | `0.0.0.0` |
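
The argument table maps naturally onto `argparse`. The sketch below shows how such a parser could be declared; it is not taken from `main.py`. The function name `build_parser`, the mutually exclusive grouping of `--cloud` and `--local`, and the `None` defaults for entries listed as "-" or "See config.py" are all assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented arguments and defaults."""
    parser = argparse.ArgumentParser(description="RAG Document AI Agent")
    parser.add_argument("--mode", choices=["interactive", "server"],
                        default="interactive", help="Run mode")
    # Assumption: only one backend can be selected per run.
    backend = parser.add_mutually_exclusive_group()
    backend.add_argument("--cloud", action="store_true", help="Use OpenAI API")
    backend.add_argument("--local", action="store_true", help="Use local LLM (llama-cpp)")
    parser.add_argument("--docs", default=None,
                        help="Path to documents (directory, pattern, or file)")
    parser.add_argument("--gpu-layers", type=int, default=0,
                        help="GPU layers for local LLM (0=CPU, -1=all)")
    # Assumption: None stands in for the defaults that live in config.py.
    parser.add_argument("--model-repo", default=None)
    parser.add_argument("--model-file", default=None)
    parser.add_argument("--port", type=int, default=8000)
    parser.add_argument("--host", default="0.0.0.0")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["--local", "--gpu-layers", "-1", "--docs", "./documents/"])
print(args.mode, args.local, args.gpu_layers, args.port)
```

Note that `argparse` accepts `-1` as a value for `--gpu-layers` because no option in the parser looks like a negative number.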

Environment Variables

| Variable | Description |
| --- | --- |
| `OPENAI_API_KEY` | OpenAI API key (required for cloud mode) |
| `OPENAI_MODEL` | OpenAI model name (default: gpt-5.2) |
| `LLM_BACKEND` | Default backend: `openai` or `local` |
| `LOCAL_MODEL_REPO` | HuggingFace repo ID for local model |
| `LOCAL_MODEL_FILE` | GGUF filename |
| `GPU_LAYERS` | Default GPU layers |
| `EMBEDDING_MODEL` | Sentence transformer model for embeddings |
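
As an illustration of how these variables could be read in one place, here is a minimal sketch. It is not the project's `config.py`: the `Settings` dataclass and `load_settings` function are hypothetical, and the fallback of `0` for `GPU_LAYERS` is an assumption borrowed from the `--gpu-layers` command line default.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class Settings:
    """Hypothetical container for the documented environment variables."""
    openai_api_key: Optional[str]
    openai_model: str
    llm_backend: Optional[str]
    local_model_repo: Optional[str]
    local_model_file: Optional[str]
    gpu_layers: int
    embedding_model: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from a mapping (the process environment by default)."""
    return Settings(
        openai_api_key=env.get("OPENAI_API_KEY"),
        # Default taken from the table above.
        openai_model=env.get("OPENAI_MODEL", "gpt-5.2"),
        llm_backend=env.get("LLM_BACKEND"),
        local_model_repo=env.get("LOCAL_MODEL_REPO"),
        local_model_file=env.get("LOCAL_MODEL_FILE"),
        # Assumption: same default as the --gpu-layers flag (0 = CPU only).
        gpu_layers=int(env.get("GPU_LAYERS", "0")),
        embedding_model=env.get("EMBEDDING_MODEL"),
    )


settings = load_settings({"OPENAI_API_KEY": "sk-your-key-here", "GPU_LAYERS": "-1"})
print(settings.openai_model, settings.gpu_layers)
```

Passing the mapping explicitly keeps the function easy to test without touching the real environment.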

API Reference

Endpoints

GET /health

Health check endpoint.

curl http://localhost:8000/health

Response:

{"status": "healthy", "llm": "Cloud (OpenAI: gpt-5.2)"}
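
A calling agent may want to wait for the service to come up before sending queries. The sketch below polls `/health` until it reports `"healthy"`. The helper names `fetch_health` and `wait_until_healthy`, the retry counts, and the timeout are assumptions for illustration; only the endpoint and the `status` field come from the example above.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_health(base_url: str = "http://localhost:8000") -> dict:
    """GET /health and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
        return json.load(resp)


def wait_until_healthy(fetch: Callable[[], dict] = fetch_health,
                       attempts: int = 10, delay_s: float = 1.0) -> bool:
    """Poll until the service reports status "healthy" or attempts run out."""
    for attempt in range(attempts):
        try:
            if fetch().get("status") == "healthy":
                return True
        except OSError:
            # Connection errors (including urllib's URLError) are OSError
            # subclasses; the server may simply still be starting.
            pass
        if attempt < attempts - 1:
            time.sleep(delay_s)
    return False
```

With a server running, `wait_until_healthy()` returns `True` as soon as the health check succeeds. The `fetch` parameter allows a stub to be substituted in tests.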

GET /status

Get agent status and statistics.

curl http://localhost:8000/status

Response:

{
  "status": "ready",
  "llm_backend": "cloud",
  "llm_description": "Cloud (OpenAI: gpt-5.2)",
  "ingested_documents": 5,
  "total_chunks": 342
}

POST /ingest

Ingest documents into the agent.

curl -X POST http://localhost:8000/ingest \
  -H "Content-Type: application/json" \
  -d '{"file_paths": ["/path/to/doc1.docx", "/path/to/doc2.docx"]}'

Response:

{
  "success": true,
  "documents_ingested": 2,
  "details": [
    {"filename": "doc1.docx", "chunks": 45},
    {"filename": "doc2.docx", "chunks": 67}
  ]
}
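
To ingest a whole folder through the API, the request body can be built programmatically. This is a sketch under assumptions: `build_ingest_payload` is a hypothetical helper, and it assumes the server process can read the same absolute paths as the client (as the absolute paths in the example above suggest).

```python
from pathlib import Path


def build_ingest_payload(directory: str) -> dict:
    """Collect .docx files in `directory` into the documented /ingest body."""
    paths = sorted(
        str(p.resolve())
        for p in Path(directory).glob("*.docx")
        # Word creates "~$name.docx" lock files while a document is open; skip them.
        if not p.name.startswith("~$")
    )
    return {"file_paths": paths}


# The payload can then be posted, for example with
# requests.post("http://localhost:8000/ingest", json=payload).
payload = build_ingest_payload(".")
print(len(payload["file_paths"]), "documents found")
```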

POST /query

Query the documents.

curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{"question": "What was the revenue in 2024?", "require_high_confidence": true}'

Response:

{
  "question": "What was the revenue in 2024?",
  "answer": "The company revenue in 2024 was $5.2 million [Source 1]...",
  "sources": [...],
  "faithfulness": {
    "overall_score": 0.92,
    "citation_coverage": 0.85,
    "nli_entailment_score": 1.0,
    "claim_support_ratio": 0.9,
    "confidence_score": 0.95,
    "warnings": []
  },
  "abstained": false,
  "processing_time_ms": 1250.5
}
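
A consumer of `/query` will usually want to decide whether an answer can be used as-is. The following sketch inspects a response shaped like the example above. The function name `assess_answer`, the 0.8 cutoff (borrowed from the "Interpreting Scores" table), the reading of `abstained` as "the agent declined to answer", and the policy of treating any warning as a reason for review are all assumptions.

```python
from typing import List, Tuple


def assess_answer(result: dict, min_score: float = 0.8) -> Tuple[bool, List[str]]:
    """Return (usable, reasons) for a /query response."""
    reasons: List[str] = []
    if result.get("abstained"):
        reasons.append("agent abstained")
    faithfulness = result.get("faithfulness") or {}
    score = faithfulness.get("overall_score")
    if score is None:
        reasons.append("no faithfulness score in response")
    elif score < min_score:
        reasons.append(f"overall_score {score:.2f} is below {min_score:.2f}")
    # Assumed policy: any warning from the agent warrants manual review.
    reasons.extend(faithfulness.get("warnings") or [])
    return (not reasons, reasons)


example = {
    "answer": "The company revenue in 2024 was $5.2 million [Source 1]...",
    "faithfulness": {"overall_score": 0.92, "warnings": []},
    "abstained": False,
}
usable, reasons = assess_answer(example)
print(usable, reasons)
```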

POST /query/simple

Simple query returning just the answer.

curl -X POST "http://localhost:8000/query/simple?question=What%20is%20the%20revenue"

Response:

{
  "answer": "The company revenue was $5.2 million.",
  "confidence": 0.92,
  "is_reliable": true,
  "source_count": 3
}
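
Because `/query/simple` takes the question as a URL query parameter, the text must be percent-encoded. A small sketch using only the standard library follows; `simple_query_url` is a hypothetical helper name.

```python
from urllib.parse import quote, urlencode


def simple_query_url(question: str, base_url: str = "http://localhost:8000") -> str:
    """Build the /query/simple URL with the question percent-encoded."""
    # quote_via=quote encodes spaces as %20, matching the curl example;
    # the urlencode default (quote_plus) would emit "+" instead.
    query = urlencode({"question": question}, quote_via=quote)
    return f"{base_url}/query/simple?{query}"


print(simple_query_url("What is the revenue"))
```

The resulting URL can be passed to `requests.post(url)` in the same way as the curl call above.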

Python Client Example

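import requests

class DocumentQAClient:
    """Thin client for the agent's REST API."""

    def __init__(self, base_url="http://localhost:8000", timeout=60):
        self.base_url = base_url
        self.timeout = timeout  # seconds; local models can be slow to answer

    def query(self, question, require_high_confidence=False):
        response = requests.post(
            f"{self.base_url}/query",
            json={
                "question": question,
                "require_high_confidence": require_high_confidence,
            },
            timeout=self.timeout,
        )
        # Raise on HTTP errors instead of trying to parse an error body
        response.raise_for_status()
        return response.json()

# Usage
client = DocumentQAClient()
result = client.query("What was the revenue in 2024?")

# 0.7 is an example cutoff; pick one that suits your use case
if result["faithfulness"]["overall_score"] >= 0.7:
    print("Answer:", result["answer"])
else:
    print("Low confidence answer - verify manually")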

Faithfulness Monitoring

The agent monitors answer faithfulness using multiple strategies:

| Metric | Description | Weight |
| --- | --- | --- |
| Citation Coverage | Are claims properly cited with [Source N]? | 20% |
| NLI Entailment | Does the source text logically support the answer? | 30% |
| Claim Support | Are individual claims verified against sources? | 30% |
| Confidence | Model's self-assessed confidence | 20% |
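
Read as a plain weighted average, the weights above combine as in the sketch below. This is an assumption about the aggregation, not the code in `faithfulness.py`; the dictionary keys reuse the field names from the `/query` response. For the sample values shown in this README the plain average comes to about 0.93, while the sample output reports 0.92, so the agent's actual aggregation may differ, for example through rounding or additional penalties.

```python
# Weights from the table above, keyed by the /query response field names.
WEIGHTS = {
    "citation_coverage": 0.20,
    "nli_entailment_score": 0.30,
    "claim_support_ratio": 0.30,
    "confidence_score": 0.20,
}


def weighted_faithfulness(metrics: dict) -> float:
    """Plain weighted average of the four metrics (assumed aggregation)."""
    return sum(weight * metrics[name] for name, weight in WEIGHTS.items())


sample = {
    "citation_coverage": 0.85,
    "nli_entailment_score": 1.0,
    "claim_support_ratio": 0.9,
    "confidence_score": 0.95,
}
# 0.2*0.85 + 0.3*1.0 + 0.3*0.9 + 0.2*0.95 = 0.17 + 0.30 + 0.27 + 0.19
print(round(weighted_faithfulness(sample), 2))
```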

Interpreting Scores

| Score | Interpretation |
| --- | --- |
| ≥ 0.8 | ✅ High confidence - answer is reliable |
| 0.6 to < 0.8 | ⚠️ Moderate confidence - review recommended |
| < 0.6 | ❌ Low confidence - likely issues |
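
Client code can apply the same bands when deciding how to present an answer. A minimal sketch, in which the function name `interpret_score` and the returned labels are hypothetical:

```python
def interpret_score(score: float) -> str:
    """Map an overall faithfulness score to the documented confidence band."""
    if score >= 0.8:
        return "high"      # answer is reliable
    if score >= 0.6:
        return "moderate"  # review recommended
    return "low"           # likely issues


for value in (0.92, 0.8, 0.7, 0.59):
    print(value, interpret_score(value))
```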

Warnings

The agent may generate warnings such as:

  • "Low citation coverage - many claims are not cited"
  • "NLI check found potentially unsupported statements"
  • "Some claims could not be verified against sources"
  • "Low model confidence in the answer"

Project Structure

document_qa_agent/
├── main.py                 # Entry point
├── config.py               # Configuration
├── requirements.txt        # Dependencies
├── models/
│   └── schemas.py          # Pydantic data models
├── core/
│   ├── document_processor.py  # Document loading & chunking
│   ├── vector_store.py        # Embeddings & search
│   ├── llm_client.py          # LLM abstraction (OpenAI/local)
│   ├── generator.py           # Answer generation
│   ├── faithfulness.py        # Hallucination detection
│   └── query_rewriter.py      # Query expansion
├── agent/
│   └── qa_agent.py         # Main agent orchestration
├── service/
│   └── api.py              # FastAPI REST service
└── tests/
    └── test_faithfulness.py   # Validation tests

Troubleshooting

"OPENAI_API_KEY not set"

# Set the environment variable
export OPENAI_API_KEY=sk-your-key-here

# Or use local mode instead
python main.py --local --docs ./documents/

"Collection expecting embedding with dimension of X, got Y"

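The vector store was built with a different embedding model, so the stored vectors no longer match the dimension of the current one. Clear the vector store and ingest the documents again: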

rm -rf ./chroma_db

"CUDA out of memory"

Reduce GPU layers:

# Try fewer layers
python main.py --local --gpu-layers 20 --docs ./documents/

# Or use CPU only
python main.py --local --gpu-layers 0 --docs ./documents/

"Cannot connect to Ollama" (if using Ollama)

This project runs local models through llama-cpp-python, not Ollama, so no Ollama server is needed. Use the --local flag:

python main.py --local --docs ./documents/

Local model downloads are slow

The first run downloads the model (about 4-8 GB) and caches it in ./models/. Subsequent runs load it from the cache.

JSON parsing errors with Qwen3

Qwen3 has a "thinking mode" that can interfere with JSON output. Use Qwen2.5 instead:

python main.py --local \
  --model-repo Qwen/Qwen2.5-14B-Instruct-GGUF \
  --model-file qwen2.5-14b-instruct-q4_k_m.gguf \
  --docs ./documents/
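
If switching models is not an option, another possible workaround is to strip the reasoning block before parsing. Qwen3 wraps its reasoning in `<think>...</think>` tags, so the sketch below removes such blocks and then parses the first JSON object it finds. This is not part of the project: `parse_json_reply` is a hypothetical helper, and it assumes the reply contains a single JSON object.

```python
import json
import re

# Non-greedy match so each <think>...</think> block is removed separately.
_THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def parse_json_reply(raw: str) -> dict:
    """Drop <think> blocks, then parse the outermost {...} span as JSON."""
    cleaned = _THINK_BLOCK.sub("", raw).strip()
    start, end = cleaned.find("{"), cleaned.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model reply")
    return json.loads(cleaned[start:end + 1])


reply = '<think>\nThe source says {revenue} rose.\n</think>\n{"supported": true}'
print(parse_json_reply(reply))
```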

Running Tests

# Run faithfulness validation
python -m tests.test_faithfulness

# Or in interactive mode
python main.py --cloud --docs ./documents/
> validate


---

QUICKSTART.md (Optional One-Page Version)

Quick Start Guide

1. Install

pip install -r requirements.txt

2. Run

Option A: Cloud (OpenAI) - Best Quality

export OPENAI_API_KEY=sk-your-key-here
python main.py --cloud --docs ./your_documents/

Option B: Local - No API Key Needed

# CPU
python main.py --local --docs ./your_documents/

# GPU (faster)
python main.py --local --gpu-layers -1 --docs ./your_documents/

3. Ask Questions

Question: What is the main topic of the documents?

📝 Answer:
The documents primarily discuss...

📊 Faithfulness: 0.89

Commands

  • Type a question to ask
  • debug <question> - Debug search
  • search <text> - Find text in documents
  • status - Show stats
  • quit - Exit

API Mode

python main.py --mode server --cloud --port 8000

Then query:

curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{"question": "What is the revenue?"}'

---

File: .env.example

# Copy this to .env and fill in your values

# OpenAI API (for cloud mode)
OPENAI_API_KEY=sk-your-key-here
OPENAI_MODEL=gpt-5.2

# Local LLM settings (for local mode)
LOCAL_MODEL_REPO=Qwen/Qwen2.5-14B-Instruct-GGUF
LOCAL_MODEL_FILE=qwen2.5-14b-instruct-q4_k_m.gguf
GPU_LAYERS=-1

# Embedding model
EMBEDDING_MODEL=nomic-ai/nomic-embed-text-v2-moe
