Conversation
Completed slices: - S01: S01 - S02: S02 - S03: S03 - S04: S04 - S05: S05 GSD-Milestone: M001 Branch: milestone/M001
- Exclude **/*.test.ts from tsconfig.json to avoid type errors in mocks - Fix Theme interface compatibility with @mariozechner/pi-coding-agent - Fix bun:sqlite import for Node.js compatibility - Update all import paths from .ts to .js for NodeNext resolution - Add index signatures to HitlSession and HitlEvent for SQLite queries
- Create test-helpers/mock-theme.ts with properly typed Theme mock - Update all test files to use shared Theme mock - Fix metrics.test.ts UI context with full ExtensionUIContext interface - Fix bun:sqlite types using inline DatabaseInstance type - Remove **/*.test.ts from tsconfig exclude - All 129 tests pass, tsc --noEmit clean Closes M002
…on time M004 milestone: HITL metrics extension. Features: - Session lifecycle tracking (start, active, closed, interrupted) - HITL event recording with duration, question count, selected options - Time categorization (HITL time, agent time, idle time) - /metrics command with stats, recent sessions, session timeline - Timeline visualization using Unicode block characters - Session resume (2-hour window for interrupted sessions) - Cross-session aggregation via project-scoped storage Storage: - SQLite at ~/.kimchi/metrics/<project-hash>/hitl.db - Tables: hitl_sessions, hitl_events, hitl_permission_events - Activity tracking in session row (hitl_time_ms, agent_time_ms, idle_time_ms) Stats (getSessionStats) use scalar subqueries to avoid cartesian product inflation from JOINs. Timeline (buildTimeline/buildTimelineFromActivities) falls back to activity events when no HITL events exist. Testing: - 172/172 tests pass across 11 test files - test-hitl-simulation.ts: programmatic E2E simulation - smoke.test.ts: full event lifecycle (start→event→close→query)
Kimchi Code Review
Summary📊 Review Score: 86/100 (overall code quality — 0 lowest, 100 highest) 🧪 Tests: yes — Comprehensive test coverage including unit tests for formatters, integration tests for session manager and storage, E2E tests for full lifecycle, and hook tests for extension events. Tests cover edge cases like empty databases, error conditions, and concurrent session handling. 📝 Found 6 issue(s). See inline comments for details. What to expectKimchi will analyze the changes in this pull request and post:
The review typically completes within a few minutes. This comment will be updated once the review is ready. Interact with Kimchi
ConfigurationReviews are configured by your organization admin. Powered by Kimchi — AI-powered code review by CAST AI |
There was a problem hiding this comment.
📊 Review Score: 86/100 (overall code quality — 0 lowest, 100 highest)
⏱️ Estimated effort to review: 4/5 (1 = trivial, 5 = very complex)
🧪 Tests: yes — Comprehensive test coverage including unit tests for formatters, integration tests for session manager and storage, E2E tests for full lifecycle, and hook tests for extension events. Tests cover edge cases like empty databases, error conditions, and concurrent session handling.
📝 Found 6 issue(s). See inline comments for details.
| } | ||
|
|
||
| // Accumulate agent time (exclude HITL tools tracked separately) | ||
| if (toolName !== HITL_TOOL_NAME) { |
There was a problem hiding this comment.
recordUserInput() calls endIdlePeriod() before checking if this.db is non-null. If the database failed to initialize or was closed but isIdle remained true, endIdlePeriod() will crash when accessing this.db! (non-null assertion) on line 179.
💡 Suggestion: Add a null check for this.db before calling idle-related methods, or move the if (this.isIdle) block inside the if (this.db && this.session) guard clause.
| this.startIdlePeriod() | ||
| } | ||
|
|
||
| /** Check if we should start idle tracking */ |
There was a problem hiding this comment.
recordToolStart() calls endIdlePeriod() unconditionally when isIdle is true, but the database might be null if initialization failed. This bypasses the null check on line 168.
💡 Suggestion: Move the idle check inside the database availability block, or ensure endIdlePeriod() validates this.db before use.
| // In practice, idle is detected lazily when next activity arrives | ||
| } | ||
|
|
||
| private startIdlePeriod(): void { |
There was a problem hiding this comment.
startIdlePeriod() uses this.db! non-null assertion without verifying this.db is initialized. If called when the database is unavailable (e.g., after init failure), this will throw a runtime error.
💡 Suggestion: Add an explicit null check: if (!this.db || !this.session) return; at the start of the method.
| const dbPath = options.dbPath ?? resolve(STORAGE_DIR, hash, "hitl.db") | ||
|
|
||
| const { existsSync } = await import("node:fs") | ||
| if (!existsSync(dbPath)) { |
There was a problem hiding this comment.
ℹ️
All database and query errors are caught and silently converted to the user-friendly message "No HITL data recorded yet". This masks actual errors like database corruption, permission denied, or disk full, potentially hiding production issues.
💡 Suggestion: Distinguish between "no data" (empty result) and "error" states. Log the full error stack server-side while showing a generic error message to the user, or include error details in the console warning.
| // Ignore | ||
| } | ||
| }) | ||
|
|
There was a problem hiding this comment.
ℹ️🔧 Maintainability
Misaligned indentation on expect(stats.total_hitl_time_ms).toBe(0) breaks consistent code formatting.
💡 Suggestion: Fix indentation to align with surrounding expect statements.
| manager.recordToolStart(event.toolCallId, event.toolName) | ||
| } catch (error) { | ||
| const message = error instanceof Error ? error.message : String(error) | ||
| console.warn(`[HITL] Error in tool_execution_start handler: ${message}`) |
There was a problem hiding this comment.
ℹ️🏗️ Design
Multiple type assertions (as { input?: ... }, as { result?: ... }) bypass TypeScript type safety. If the event structure from the framework changes, these will fail silently at runtime.
💡 Suggestion: Define proper TypeScript interfaces for the event payloads and validate them at runtime (e.g., using zod or io-ts) instead of casting.
Fixes tsc --noEmit error: TS7006: Parameter 'event' implicitly has an 'any' type The mock's on() and registerCommand() callbacks lacked typed parameters, which tsc enforces in --noEmit mode (used by build:binary).
Captures JSONL events via --print --mode json, parses tool_execution_start/end events, and verifies the metrics DB contains correct hitl_events rows.
The hitl-metrics extension was creating its own random UUID for each session instead of using the actual kimchi session ID. This caused DB rows to have IDs like '8b53017e-...' while session files had '019dd92e-...' — completely unrelated, making it impossible to correlate session files with metrics DB data. The fix threads the real session ID from ctx.sessionManager.sessionId through manager.init() to getOrCreateSession(), which now accepts an optional sessionId override. When provided, getOrCreateSession uses it directly instead of generating a random UUID. Verified: session file IDs now match DB rows exactly.
Co-Authored-By: Kimchi <noreply@kimchi.dev>
Kimchi Summary
What changed
Introduces a new HITL (Human-in-the-Loop) metrics extension that instruments agent sessions to track time spent in agent execution, human interaction, and idle states. Adds the
/metricscommand to display aggregated statistics, recent sessions, and an ASCII timeline of the most recent closed session.Why
Provides observability into HITL interaction patterns and session efficiency by persisting telemetry data locally for analysis.
Key changes
extensions/hitl-metrics/commands/metrics.ts: Implements/metricscommand with three-category time tracking (agent, HITL, idle) and timeline visualization viagetMetricsOutput().extensions/hitl-metrics/formatters.ts: Adds formatting utilities for duration (formatDuration), metrics tables (formatMetrics), session rows (formatSessionRow), and ASCII timeline blocks (formatTimelineSection).extensions/hitl-metrics/index.ts: Registers extension hooks (session_start,tool_execution_start,tool_result,session_shutdown,input) to capture lifecycle events.extensions/hitl-metrics/session-manager.ts: Manages session state and SQLite persistence at~/.kimchi/metrics/<hash>/hitl.db.extensions/hitl-metrics/storage/: Database abstraction (HitlDatabase,projectHash) with tableshitl_sessions,hitl_events, andactivity_events..gitignore: Excludes.kimchi/local config and GSD baseline artifacts.metrics.test.ts,formatters.test.ts,hooks.test.ts), integration tests (e2e.test.ts), and E2E harnesses (e2e-acp.ts,e2e-rpc.py) verifying DB write/read cycles and metrics accuracy.Impact
~/.kimchi/metrics/<project_hash>/hitl.dbon first run; auto-initializes schema.ask_user_questionstool calls, extracting question counts and selected options; skips "cached" results ≤50ms to avoid noise.agent_time_ms,hitl_time_ms, andidle_time_msbuckets based on tool start/result and user input events.