Skip to content

chore: add AI agent knowledge bank [DATA-32873]#18

Open
quocnguyendinh wants to merge 3 commits intomasterfrom
chore/add-knowledge-bank
Open

chore: add AI agent knowledge bank [DATA-32873]#18
quocnguyendinh wants to merge 3 commits intomasterfrom
chore/add-knowledge-bank

Conversation

@quocnguyendinh
Copy link
Copy Markdown
Collaborator

@quocnguyendinh quocnguyendinh commented Apr 3, 2026

Summary

  • Add CLAUDE.md at project root (auto-loaded by Claude Code every session) with essential architecture, patterns, and conventions inline
  • Add knowledge/ folder with detailed reference docs covering architecture, design patterns, conventions, database layer, configuration, testing, and CLI

Jira

DATA-32873

Test plan

  • Verify CLAUDE.md is loaded automatically in new Claude Code sessions
  • Verify knowledge/ files are accurate against current codebase

🤖 Generated with Claude Code

quocnguyendinh and others added 2 commits April 3, 2026 17:34
Add CLAUDE.md and knowledge/ folder documenting architecture, design
patterns, conventions, and component layers to enable AI coding agents
to follow existing patterns when working on the codebase.

Refs: DATA-32873

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move knowledge files from knowledge/ to .claude/rules/ so they are
automatically loaded into every Claude Code session. Replace root
CLAUDE.md with .claude/CLAUDE.md.

Refs: DATA-32873

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds an AI-agent-oriented documentation “knowledge bank” under .claude/, intended to be auto-loaded in Claude Code sessions to capture Diffa’s architecture, conventions, and operational patterns in one place.

Changes:

  • Added .claude/CLAUDE.md as the entrypoint and multiple .claude/rules/*.md reference docs.
  • Documented key layers (CLI → managers → services → repositories), configuration resolution, DB access patterns, and testing patterns.
  • Captured project conventions (naming, imports, error-handling, exit codes, dependencies).

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
.claude/CLAUDE.md Claude entrypoint doc referencing auto-loaded rules content.
.claude/rules/architecture.md High-level architecture, layers, and end-to-end data-diff flow.
.claude/rules/cli.md CLI surface area and command/option documentation.
.claude/rules/configuration.md ConfigManager/DBConfig hierarchy and config precedence.
.claude/rules/conventions.md Naming/import/type-hint/logging/error-handling conventions.
.claude/rules/database-layer.md DB connection strategies, ORM/schema notes, query/upsert/locking patterns.
.claude/rules/design-patterns.md Repository/service/manager and other design patterns used in the codebase.
.claude/rules/testing.md Pytest setup, fixtures, mocking patterns, and what is/isn’t tested.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

```python
# src/diffa/db/diffa_check.py
class DiffaCheckService:
def __init__(self, config_manager: ConfigManager):
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the DiffaCheckService example, self.config_manager is referenced but never assigned (and differs from the actual implementation which sets self.config_manager = config_manager). Update the snippet so it matches the real constructor wiring and doesn’t imply an undefined attribute.

Suggested change
def __init__(self, config_manager: ConfigManager):
def __init__(self, config_manager: ConfigManager):
self.config_manager = config_manager

Copilot uses AI. Check for mistakes.
```python
# src/diffa/managers/check_manager.py
class CheckManager:
def __init__(self, config_manager: ConfigManager):
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the CheckManager example, SourceTargetService(self.cm) / DiffaCheckService(self.cm) references self.cm but the snippet doesn’t show self.cm = config_manager (as in the real code). Adjust the snippet to avoid an undefined attribute and reflect the actual initialization order.

Suggested change
def __init__(self, config_manager: ConfigManager):
def __init__(self, config_manager: ConfigManager):
self.cm = config_manager

Copilot uses AI. Check for mistakes.
self.conn = DiffaConnection(self.db_config.get_db_config())

def get_latest_check(self, source_database, ...) -> Optional[dict]: ...
def get_invalid_checks(self, source_database, ...) -> List[DiffaCheckSchema]: ...
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Repository Pattern snippet says get_invalid_checks(...) -> List[DiffaCheckSchema], but the current implementation is a generator (it yields items) and returns serialized dicts via model_dump(). Please align the documented return type/behavior with the code to avoid misleading consumers of this doc.

Suggested change
def get_invalid_checks(self, source_database, ...) -> List[DiffaCheckSchema]: ...
def get_invalid_checks(self, source_database, ...) -> Iterator[dict]: ...

Copilot uses AI. Check for mistakes.
Comment on lines +1 to +5
# Diffa

A CLI tool that compares row counts between a Source database and a Target database (data warehouse), per day, optionally by dimension columns. Results are persisted in a Diffa state database.

All architecture, design patterns, conventions, and component documentation is auto-loaded from `.claude/rules/`.
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR description mentions adding a root-level CLAUDE.md and a knowledge/ folder, but the change adds .claude/CLAUDE.md and .claude/rules/ instead. Either adjust the description or add/move the files so the implementation matches what reviewers should expect.

Copilot uses AI. Check for mistakes.
Comment on lines +54 to +58
```python
def get_last_check_date(self) -> date: ...
def get_invalid_checks(self, ...) -> List[DiffaCheckSchema]: ...
def get_counts(self, last_check_date: date, invalid_check_dates: Iterable[date]) -> Iterable[CountCheck]: ...
```
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the Type Hints examples, SourceTargetService.get_counts(...) is shown as returning a single Iterable[CountCheck], but the actual API is consumed as a 2-tuple (source_counts, target_counts) (see CheckManager.compare_tables). Consider updating the example signature to reflect the tuple return type so readers don’t copy an incorrect contract.

Copilot uses AI. Check for mistakes.
Comment on lines +53 to +69
### DiffaCheck

```
Table: diffa.diffa_checks
Columns: id (UUID PK), source_database, source_schema, source_table,
target_database, target_schema, target_table, check_date,
source_count, target_count, is_valid, diff_count, updated_at
```

### DiffaCheckRun

```
Table: diffa.diffa_check_runs
Columns: run_id (UUID PK), source_database, source_schema, source_table,
target_database, target_schema, target_table, status, updated_at
Unique index: (source_*, target_*) WHERE status = 'RUNNING'
```
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The “Table: diffa.diffa_checks / diffa.diffa_check_runs” column lists omit created_at, but Alembic migrations create both created_at and updated_at columns for these tables (e.g., src/diffa/migrations/versions/2f7730d11b26... and 9710c39cc078...). Either include created_at in the documented schema or explicitly state that created_at exists in the DB but is not mapped in data_models.py ORM classes.

Copilot uses AI. Check for mistakes.
@quocnguyendinh quocnguyendinh changed the title chore: add AI agent knowledge bank chore: add AI agent knowledge bank [DATA-32873] Apr 3, 2026
Each rule file now has a changelog table at the bottom tracking
date, PR, and description of changes for decision traceability.

Refs: DATA-32873

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants