chore: add AI agent knowledge bank [DATA-32873] by quocnguyendinh · Pull Request #18 · Kaligo/diffa

quocnguyendinh · 2026-04-03T10:34:30Z

Summary

Add CLAUDE.md at project root (auto-loaded by Claude Code every session) with essential architecture, patterns, and conventions inline
Add knowledge/ folder with detailed reference docs covering architecture, design patterns, conventions, database layer, configuration, testing, and CLI

Jira

Test plan

Verify CLAUDE.md is loaded automatically in new Claude Code sessions
Verify knowledge/ files are accurate against current codebase

🤖 Generated with Claude Code

Add CLAUDE.md and knowledge/ folder documenting architecture, design patterns, conventions, and component layers to enable AI coding agents to follow existing patterns when working on the codebase. Refs: DATA-32873 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Move knowledge files from knowledge/ to .claude/rules/ so they are automatically loaded into every Claude Code session. Replace root CLAUDE.md with .claude/CLAUDE.md. Refs: DATA-32873 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

This PR adds an AI-agent-oriented documentation “knowledge bank” under .claude/, intended to be auto-loaded in Claude Code sessions to capture Diffa’s architecture, conventions, and operational patterns in one place.

Changes:

Added .claude/CLAUDE.md as the entrypoint and multiple .claude/rules/*.md reference docs.
Documented key layers (CLI → managers → services → repositories), configuration resolution, DB access patterns, and testing patterns.
Captured project conventions (naming, imports, error-handling, exit codes, dependencies).

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.

Show a summary per file

File	Description
`.claude/CLAUDE.md`	Claude entrypoint doc referencing auto-loaded rules content.
`.claude/rules/architecture.md`	High-level architecture, layers, and end-to-end data-diff flow.
`.claude/rules/cli.md`	CLI surface area and command/option documentation.
`.claude/rules/configuration.md`	ConfigManager/DBConfig hierarchy and config precedence.
`.claude/rules/conventions.md`	Naming/import/type-hint/logging/error-handling conventions.
`.claude/rules/database-layer.md`	DB connection strategies, ORM/schema notes, query/upsert/locking patterns.
`.claude/rules/design-patterns.md`	Repository/service/manager and other design patterns used in the codebase.
`.claude/rules/testing.md`	Pytest setup, fixtures, mocking patterns, and what is/isn’t tested.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-03T10:47:06Z

.claude/rules/design-patterns.md

+```python
+# src/diffa/db/diffa_check.py
+class DiffaCheckService:
+    def __init__(self, config_manager: ConfigManager):


In the DiffaCheckService example, self.config_manager is referenced but never assigned (and differs from the actual implementation which sets self.config_manager = config_manager). Update the snippet so it matches the real constructor wiring and doesn’t imply an undefined attribute.

Suggested change

def __init__(self, config_manager: ConfigManager):

def __init__(self, config_manager: ConfigManager):

self.config_manager = config_manager

Copilot · 2026-04-03T10:47:07Z

.claude/rules/design-patterns.md

+```python
+# src/diffa/managers/check_manager.py
+class CheckManager:
+    def __init__(self, config_manager: ConfigManager):


In the CheckManager example, SourceTargetService(self.cm) / DiffaCheckService(self.cm) references self.cm but the snippet doesn’t show self.cm = config_manager (as in the real code). Adjust the snippet to avoid an undefined attribute and reflect the actual initialization order.

Suggested change

def __init__(self, config_manager: ConfigManager):

def __init__(self, config_manager: ConfigManager):

self.cm = config_manager

Copilot · 2026-04-03T10:47:07Z

.claude/rules/design-patterns.md

+        self.conn = DiffaConnection(self.db_config.get_db_config())
+
+    def get_latest_check(self, source_database, ...) -> Optional[dict]: ...
+    def get_invalid_checks(self, source_database, ...) -> List[DiffaCheckSchema]: ...


The Repository Pattern snippet says get_invalid_checks(...) -> List[DiffaCheckSchema], but the current implementation is a generator (it yields items) and returns serialized dicts via model_dump(). Please align the documented return type/behavior with the code to avoid misleading consumers of this doc.

Suggested change

def get_invalid_checks(self, source_database, ...) -> List[DiffaCheckSchema]: ...

def get_invalid_checks(self, source_database, ...) -> Iterator[dict]: ...

Copilot · 2026-04-03T10:47:07Z

.claude/CLAUDE.md

+# Diffa
+
+A CLI tool that compares row counts between a Source database and a Target database (data warehouse), per day, optionally by dimension columns. Results are persisted in a Diffa state database.
+
+All architecture, design patterns, conventions, and component documentation is auto-loaded from `.claude/rules/`.


PR description mentions adding a root-level CLAUDE.md and a knowledge/ folder, but the change adds .claude/CLAUDE.md and .claude/rules/ instead. Either adjust the description or add/move the files so the implementation matches what reviewers should expect.

Copilot · 2026-04-03T10:47:08Z

.claude/rules/conventions.md

+```python
+def get_last_check_date(self) -> date: ...
+def get_invalid_checks(self, ...) -> List[DiffaCheckSchema]: ...
+def get_counts(self, last_check_date: date, invalid_check_dates: Iterable[date]) -> Iterable[CountCheck]: ...
+```


In the Type Hints examples, SourceTargetService.get_counts(...) is shown as returning a single Iterable[CountCheck], but the actual API is consumed as a 2-tuple (source_counts, target_counts) (see CheckManager.compare_tables). Consider updating the example signature to reflect the tuple return type so readers don’t copy an incorrect contract.

Copilot · 2026-04-03T10:47:08Z

.claude/rules/database-layer.md

+### DiffaCheck
+
+```
+Table: diffa.diffa_checks
+Columns: id (UUID PK), source_database, source_schema, source_table,
+         target_database, target_schema, target_table, check_date,
+         source_count, target_count, is_valid, diff_count, updated_at
+```
+
+### DiffaCheckRun
+
+```
+Table: diffa.diffa_check_runs
+Columns: run_id (UUID PK), source_database, source_schema, source_table,
+         target_database, target_schema, target_table, status, updated_at
+Unique index: (source_*, target_*) WHERE status = 'RUNNING'
+```


The “Table: diffa.diffa_checks / diffa.diffa_check_runs” column lists omit created_at, but Alembic migrations create both created_at and updated_at columns for these tables (e.g., src/diffa/migrations/versions/2f7730d11b26... and 9710c39cc078...). Either include created_at in the documented schema or explicitly state that created_at exists in the DB but is not mapped in data_models.py ORM classes.

Each rule file now has a changelog table at the bottom tracking date, PR, and description of changes for decision traceability. Refs: DATA-32873 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

quocnguyendinh and others added 2 commits April 3, 2026 17:34

quocnguyendinh self-assigned this Apr 3, 2026

quocnguyendinh requested a review from Copilot April 3, 2026 10:41

Copilot started reviewing on behalf of quocnguyendinh April 3, 2026 10:42 View session

Copilot AI reviewed Apr 3, 2026

View reviewed changes

quocnguyendinh changed the title ~~chore: add AI agent knowledge bank~~ chore: add AI agent knowledge bank [DATA-32873] Apr 3, 2026

chore: add changelog section to each knowledge bank rule file

abd8c71

Each rule file now has a changelog table at the bottom tracking date, PR, and description of changes for decision traceability. Refs: DATA-32873 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chore: add AI agent knowledge bank [DATA-32873]#18

chore: add AI agent knowledge bank [DATA-32873]#18
quocnguyendinh wants to merge 3 commits intomasterfrom
chore/add-knowledge-bank

quocnguyendinh commented Apr 3, 2026 •

edited by atlassian bot

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 3, 2026

Uh oh!

Copilot AI Apr 3, 2026

Uh oh!

Copilot AI Apr 3, 2026

Uh oh!

Copilot AI Apr 3, 2026

Uh oh!

Copilot AI Apr 3, 2026

Uh oh!

Copilot AI Apr 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	def __init__(self, config_manager: ConfigManager):
	def __init__(self, config_manager: ConfigManager):
	self.config_manager = config_manager

	def __init__(self, config_manager: ConfigManager):
	def __init__(self, config_manager: ConfigManager):
	self.cm = config_manager

	def get_invalid_checks(self, source_database, ...) -> List[DiffaCheckSchema]: ...
	def get_invalid_checks(self, source_database, ...) -> Iterator[dict]: ...

Conversation

quocnguyendinh commented Apr 3, 2026 • edited by atlassian bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Jira

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

quocnguyendinh commented Apr 3, 2026 •

edited by atlassian bot

Loading