Knowledge Base

An LLM-powered knowledge engine that builds structured, searchable wikis from codebases. Inspired by Karpathy's LLM Wiki pattern.

What It Does

Point it at a Python project → it produces a rich, interlinked wiki with architectural context, not just API docs.

kb build ./my-project --output docs/wiki/

Before: "GroupService has 13 methods"

After: "GroupService is the business logic layer for chat groups. It handles creation, membership, agent assignment, and DMs. Authorization is owner-based — only the group owner can add/remove members. Uses batch queries to avoid N+1 problems when populating member details."

How It Works

1. SCAN        Python's ast module extracts classes, functions, imports, docstrings
2. GRAPH       Import analysis maps who depends on whom across all files
3. COMPILE     LLM writes rich prose from AST data + raw code (or AST-only fallback)
4. LINK        Import graph creates backlinks between articles
5. INDEX       Concept graph tracks shared entities across articles
6. EXPORT      Markdown files with frontmatter → docs/wiki/ or any directory

The LLM is used only in step 3 — everything else is pure Python. Without an LLM, you still get a functional wiki (just method signatures instead of prose).
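
Step 1 can be sketched with the stdlib ast module alone. A minimal, illustrative version (the real scanner lives in code_compiler.py and extracts more, such as imports and signatures):

```python
import ast

SAMPLE = '''
"""Group service module."""

class GroupService:
    """Business logic for chat groups."""

    def create_group(self, name: str) -> None: ...
    def add_member(self, user_id: str) -> None: ...

def helper():
    pass
'''

def scan(source: str) -> dict:
    """Extract top-level classes, functions, and docstrings from Python source."""
    tree = ast.parse(source)
    classes, functions = {}, []
    for node in ast.iter_child_nodes(tree):
        if isinstance(node, ast.ClassDef):
            methods = [n.name for n in node.body if isinstance(n, ast.FunctionDef)]
            classes[node.name] = {"doc": ast.get_docstring(node), "methods": methods}
        elif isinstance(node, ast.FunctionDef):
            functions.append(node.name)
    return {"doc": ast.get_docstring(tree), "classes": classes, "functions": functions}

info = scan(SAMPLE)
print(info["classes"]["GroupService"]["methods"])  # ['create_group', 'add_member']
```

Because this step is pure AST walking, it works on any parseable Python file with no LLM involved.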

Install

uv pip install -e .

# With LLM compilation (recommended)
uv pip install -e ".[anthropic]"

# With search
uv pip install -e ".[search]"

# Everything
uv pip install -e ".[all]"

Quick Start

# Build a wiki from a Python project
kb build ./src/myproject --scope myproject --output docs/wiki/

# Search
kb search "authentication" --scope myproject

# Show a specific article
kb show group_service --scope myproject

# Check wiki health
kb lint --scope myproject

# Stats
kb stats --scope myproject

# Ingest a URL or file
kb ingest https://docs.example.com/guide
kb ingest ./ARCHITECTURE.md

Use as a Library

from knowledge_base import KnowledgeEngine
from knowledge_base.compiler import AnthropicBackend

# With LLM compilation
engine = KnowledgeEngine(
    scope="myproject",
    backend=AnthropicBackend(model="claude-haiku-4-5-20251001"),
)

# Build from codebase
articles = await engine.build_from_code("./src/myproject")

# Search
results = await engine.search("how does auth work")

# Get context for an AI agent's system prompt
context = await engine.search_context("group membership", limit=3, max_chars=4000)

# Health check
issues = await engine.lint()

Bring Your Own LLM

Implement the CompilerBackend protocol:

from knowledge_base.compiler import CompilerBackend

class MyBackend:
    async def complete(self, prompt: str, system_prompt: str = "") -> str:
        # Call your LLM here
        return await my_llm.generate(prompt, system=system_prompt)

engine = KnowledgeEngine(scope="myproject", backend=MyBackend())
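
Since the protocol is just an async complete method, any object with that shape works — handy for tests. A toy backend (names here are illustrative, not part of the package):

```python
import asyncio

class EchoBackend:
    """Toy backend satisfying the CompilerBackend protocol: echoes its inputs."""

    async def complete(self, prompt: str, system_prompt: str = "") -> str:
        return f"[{system_prompt}] {prompt}"

result = asyncio.run(
    EchoBackend().complete("summarize group_service", system_prompt="wiki writer")
)
print(result)  # [wiki writer] summarize group_service
```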

What Gets Generated

Each article is a markdown file with JSON frontmatter:

---
{
  "title": "GroupService — Group and Channel Business Logic",
  "summary": "Stateless service for group CRUD, membership, agent assignment, and DMs...",
  "concepts": ["GroupService", "workspace scoping", "Beanie ODM", "event-driven"],
  "categories": ["chat domain", "service layer", "CRUD"],
  "backlinks": ["schemas", "group-model", "errors", "message-service"],
  "word_count": 1332,
  "version": 1
}
---

## Purpose
The group_service module encapsulates all group-related business logic...

## Key Classes and Methods
### GroupService
...

## Authorization and Security
...

## Dependencies and Integration
...
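
Assuming the simple ---delimited layout shown above, an article splits into metadata and body with a few lines of stdlib Python (an illustrative sketch, not the package's own loader):

```python
import json

# Hypothetical article text in the frontmatter format shown above.
ARTICLE = '''---
{
  "title": "GroupService",
  "backlinks": ["schemas", "group-model"],
  "version": 1
}
---

## Purpose
The group_service module encapsulates all group-related business logic...
'''

def parse_article(text: str) -> tuple[dict, str]:
    """Split an article into (frontmatter dict, markdown body)."""
    _, raw_meta, body = text.split("---", 2)
    return json.loads(raw_meta), body.strip()

meta, body = parse_article(ARTICLE)
print(meta["title"])  # GroupService
```

Because the frontmatter is JSON rather than YAML, no extra dependency is needed to read it.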

Architecture

knowledge_base/
  __init__.py       KnowledgeEngine — main API
  models.py         RawDoc, WikiArticle, Concept, CodeModule, LintIssue
  store.py          File-based persistence (markdown + JSON)
  compiler.py       CompilerBackend protocol + AnthropicBackend
  code_compiler.py  AST parser, import graph, code-specific prompts
  indexer.py        Concept graph, backlinks, categories
  search.py         BM25 keyword search
  linter.py         LLM-powered health checks
  cli.py            Click CLI (kb command)

Three-Layer Design (Karpathy Pattern)

  Layer           What                                    Where
  Raw Sources     Original code files, URLs, text         ~/.knowledge-base/{scope}/raw/
  Compiled Wiki   LLM-written articles with frontmatter   ~/.knowledge-base/{scope}/wiki/
  Index           Concept graph, categories, metadata     ~/.knowledge-base/{scope}/index.json

Knowledge Graph

Articles are connected through:

  • Import backlinks — if module A imports module B, their articles link
  • Shared concepts — "Beanie ODM" appears in 17 articles, connecting them
  • Categories — "service layer", "data model", "API router" group related articles
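
The import-backlink idea is just an inversion of the import graph: if A imports B, then B's article gains a link back to A. A minimal sketch over a hypothetical graph (module names are made up for illustration):

```python
# Hypothetical import graph: module -> modules it imports.
imports = {
    "group_service": ["schemas", "group_model", "errors"],
    "message_service": ["schemas", "group_service"],
    "schemas": [],
}

def backlinks(graph: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert the import graph: for each module, list the modules that import it."""
    links: dict[str, list[str]] = {m: [] for m in graph}
    for mod, deps in graph.items():
        for dep in deps:
            if dep in links:  # skip modules outside the scanned scope
                links[dep].append(mod)
    return links

print(backlinks(imports)["schemas"])  # ['group_service', 'message_service']
```

Shared concepts and categories connect articles the same way, except the edges come from frontmatter fields rather than import statements.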

Integration with Claude Code

Add to your project's CLAUDE.md:

## Knowledge Base
A codebase wiki lives at `docs/wiki/`. Read relevant articles before modifying modules.

Auto-rebuild on commits via .claude/hooks/:

#!/bin/bash
# .claude/hooks/kb-rebuild.sh
CHANGED=$(git diff --name-only HEAD~1 HEAD | grep "^src/" | head -1)
[ -z "$CHANGED" ] && exit 0
kb build ./src --scope myproject --output docs/wiki/

Integration with PocketPaw

The knowledge-base package integrates with PocketPaw's enterprise cloud module:

  • Agent context injection — KB articles injected into agent system prompt based on user query
  • Wiki pocket template — interactive KB browser rendered via Ripple UI
  • Agent-scoped KB — each agent can have its own knowledge base
  • Workspace-scoped KB — shared knowledge across a team

License

MIT
