Hitl metrics by zilvinasu · Pull Request #107 · castai/kimchi-dev

zilvinasu · 2026-04-29T09:21:33Z

Kimchi Summary

What changed

Introduces a new HITL (Human-in-the-Loop) metrics extension that instruments agent sessions to track time spent in agent execution, human interaction, and idle states. Adds the /metrics command to display aggregated statistics, recent sessions, and an ASCII timeline of the most recent closed session.

Why

Provides observability into HITL interaction patterns and session efficiency by persisting telemetry data locally for analysis.

Key changes

extensions/hitl-metrics/commands/metrics.ts: Implements /metrics command with three-category time tracking (agent, HITL, idle) and timeline visualization via getMetricsOutput().
extensions/hitl-metrics/formatters.ts: Adds formatting utilities for duration (formatDuration), metrics tables (formatMetrics), session rows (formatSessionRow), and ASCII timeline blocks (formatTimelineSection).
extensions/hitl-metrics/index.ts: Registers extension hooks (session_start, tool_execution_start, tool_result, session_shutdown, input) to capture lifecycle events.
extensions/hitl-metrics/session-manager.ts: Manages session state and SQLite persistence at ~/.kimchi/metrics/<hash>/hitl.db.
extensions/hitl-metrics/storage/: Database abstraction (HitlDatabase, projectHash) with tables hitl_sessions, hitl_events, and activity_events.
.gitignore: Excludes .kimchi/ local config and GSD baseline artifacts.
Test coverage: Adds unit tests (metrics.test.ts, formatters.test.ts, hooks.test.ts), integration tests (e2e.test.ts), and E2E harnesses (e2e-acp.ts, e2e-rpc.py) verifying DB write/read cycles and metrics accuracy.

Impact

Creates local SQLite database ~/.kimchi/metrics/<project_hash>/hitl.db on first run; auto-initializes schema.
Non-breaking: gracefully handles missing project directories (logs warning) and headless mode (prints to console).
Records durations for ask_user_questions tool calls, extracting question counts and selected options; skips "cached" results ≤50ms to avoid noise.
Categorizes session time into agent_time_ms, hitl_time_ms, and idle_time_ms buckets based on tool start/result and user input events.

Completed slices: - S01: S01 - S02: S02 - S03: S03 - S04: S04 - S05: S05 GSD-Milestone: M001 Branch: milestone/M001

- Exclude **/*.test.ts from tsconfig.json to avoid type errors in mocks - Fix Theme interface compatibility with @mariozechner/pi-coding-agent - Fix bun:sqlite import for Node.js compatibility - Update all import paths from .ts to .js for NodeNext resolution - Add index signatures to HitlSession and HitlEvent for SQLite queries

- Create test-helpers/mock-theme.ts with properly typed Theme mock - Update all test files to use shared Theme mock - Fix metrics.test.ts UI context with full ExtensionUIContext interface - Fix bun:sqlite types using inline DatabaseInstance type - Remove **/*.test.ts from tsconfig exclude - All 129 tests pass, tsc --noEmit clean Closes M002

…on time M004 milestone: HITL metrics extension. Features: - Session lifecycle tracking (start, active, closed, interrupted) - HITL event recording with duration, question count, selected options - Time categorization (HITL time, agent time, idle time) - /metrics command with stats, recent sessions, session timeline - Timeline visualization using Unicode block characters - Session resume (2-hour window for interrupted sessions) - Cross-session aggregation via project-scoped storage Storage: - SQLite at ~/.kimchi/metrics/<project-hash>/hitl.db - Tables: hitl_sessions, hitl_events, hitl_permission_events - Activity tracking in session row (hitl_time_ms, agent_time_ms, idle_time_ms) Stats (getSessionStats) use scalar subqueries to avoid cartesian product inflation from JOINs. Timeline (buildTimeline/buildTimelineFromActivities) falls back to activity events when no HITL events exist. Testing: - 172/172 tests pass across 11 test files - test-hitl-simulation.ts: programmatic E2E simulation - smoke.test.ts: full event lifecycle (start→event→close→query)

kimchi-review · 2026-04-29T09:21:40Z

Kimchi Code Review

Property	Value
Commit	`9f0b81e`
Author	@zilvinasu
Files changed	0
Review status	Completed
Comments	6 (3 info, 3 warning)
Duration	57s

Summary

📊 Review Score: 86/100 (overall code quality — 0 lowest, 100 highest)
⏱️ Estimated effort to review: 4/5 (1 = trivial, 5 = very complex)

🧪 Tests: yes — Comprehensive test coverage including unit tests for formatters, integration tests for session manager and storage, E2E tests for full lifecycle, and hook tests for extension events. Tests cover edge cases like empty databases, error conditions, and concurrent session handling.

📝 Found 6 issue(s). See inline comments for details.

What to expect

Kimchi will analyze the changes in this pull request and post:

A summary of the overall changes
Inline comments on specific lines with findings categorized by issue type

The review typically completes within a few minutes. This comment will be updated once the review is ready.

Interact with Kimchi

@kimchi review — re-trigger a full review on the latest commit
@kimchi summary — regenerate the PR summary
@kimchi ignore — skip this PR (no review will be posted)
Reply to any inline comment to ask follow-up questions or request clarification

Configuration

Reviews are configured by your organization admin.
Review instructions, excluded directories, and severity thresholds can be adjusted per repository in the Kimchi dashboard.

Powered by Kimchi — AI-powered code review by CAST AI

kimchi-review

📊 Review Score: 86/100 (overall code quality — 0 lowest, 100 highest)
⏱️ Estimated effort to review: 4/5 (1 = trivial, 5 = very complex)

🧪 Tests: yes — Comprehensive test coverage including unit tests for formatters, integration tests for session manager and storage, E2E tests for full lifecycle, and hook tests for extension events. Tests cover edge cases like empty databases, error conditions, and concurrent session handling.

📝 Found 6 issue(s). See inline comments for details.

kimchi-review · 2026-04-29T09:22:40Z

+		}
+
+		// Accumulate agent time (exclude HITL tools tracked separately)
+		if (toolName !== HITL_TOOL_NAME) {


⚠️🐛 Bug

recordUserInput() calls endIdlePeriod() before checking if this.db is non-null. If the database failed to initialize or was closed but isIdle remained true, endIdlePeriod() will crash when accessing this.db! (non-null assertion) on line 179.

💡 Suggestion: Add a null check for this.db before calling idle-related methods, or move the if (this.isIdle) block inside the if (this.db && this.session) guard clause.

kimchi-review · 2026-04-29T09:22:40Z

+		this.startIdlePeriod()
+	}
+
+	/** Check if we should start idle tracking */


⚠️🐛 Bug

recordToolStart() calls endIdlePeriod() unconditionally when isIdle is true, but the database might be null if initialization failed. This bypasses the null check on line 168.

💡 Suggestion: Move the idle check inside the database availability block, or ensure endIdlePeriod() validates this.db before use.

kimchi-review · 2026-04-29T09:22:40Z

+		// In practice, idle is detected lazily when next activity arrives
+	}
+
+	private startIdlePeriod(): void {


⚠️🐛 Bug

startIdlePeriod() uses this.db! non-null assertion without verifying this.db is initialized. If called when the database is unavailable (e.g., after init failure), this will throw a runtime error.

💡 Suggestion: Add an explicit null check: if (!this.db || !this.session) return; at the start of the method.

kimchi-review · 2026-04-29T09:22:40Z

+	const dbPath = options.dbPath ?? resolve(STORAGE_DIR, hash, "hitl.db")
+
+	const { existsSync } = await import("node:fs")
+	if (!existsSync(dbPath)) {


ℹ️⚠️ Error Handling

All database and query errors are caught and silently converted to the user-friendly message "No HITL data recorded yet". This masks actual errors like database corruption, permission denied, or disk full, potentially hiding production issues.

💡 Suggestion: Distinguish between "no data" (empty result) and "error" states. Log the full error stack server-side while showing a generic error message to the user, or include error details in the console warning.

kimchi-review · 2026-04-29T09:22:40Z

+			// Ignore
+		}
+	})
+


ℹ️🔧 Maintainability

Misaligned indentation on expect(stats.total_hitl_time_ms).toBe(0) breaks consistent code formatting.

💡 Suggestion: Fix indentation to align with surrounding expect statements.

kimchi-review · 2026-04-29T09:22:40Z

+			manager.recordToolStart(event.toolCallId, event.toolName)
+		} catch (error) {
+			const message = error instanceof Error ? error.message : String(error)
+			console.warn(`[HITL] Error in tool_execution_start handler: ${message}`)


ℹ️🏗️ Design

Multiple type assertions (as { input?: ... }, as { result?: ... }) bypass TypeScript type safety. If the event structure from the framework changes, these will fail silently at runtime.

💡 Suggestion: Define proper TypeScript interfaces for the event payloads and validate them at runtime (e.g., using zod or io-ts) instead of casting.

Fixes tsc --noEmit error: TS7006: Parameter 'event' implicitly has an 'any' type The mock's on() and registerCommand() callbacks lacked typed parameters, which tsc enforces in --noEmit mode (used by build:binary).

Captures JSONL events via --print --mode json, parses tool_execution_start/end events, and verifies the metrics DB contains correct hitl_events rows.

The hitl-metrics extension was creating its own random UUID for each session instead of using the actual kimchi session ID. This caused DB rows to have IDs like '8b53017e-...' while session files had '019dd92e-...' — completely unrelated, making it impossible to correlate session files with metrics DB data. The fix threads the real session ID from ctx.sessionManager.sessionId through manager.init() to getOrCreateSession(), which now accepts an optional sessionId override. When provided, getOrCreateSession uses it directly instead of generating a random UUID. Verified: session file IDs now match DB rows exactly.

Co-Authored-By: Kimchi <noreply@kimchi.dev>

zilvinasu and others added 5 commits April 28, 2026 08:15

chore: init gsd

b134d1e

feat: HITL Metrics Tracker

fbd0976

Completed slices: - S01: S01 - S02: S02 - S03: S03 - S04: S04 - S05: S05 GSD-Milestone: M001 Branch: milestone/M001

kimchi-review Bot reviewed Apr 29, 2026

View reviewed changes

Glosoli and others added 5 commits April 29, 2026 12:23

Merge remote-tracking branch 'origin/master' into hitl-metrics

8b8437a

fix: add explicit types to mock callback params in hooks.test.ts

9a98172

Fixes tsc --noEmit error: TS7006: Parameter 'event' implicitly has an 'any' type The mock's on() and registerCommand() callbacks lacked typed parameters, which tsc enforces in --noEmit mode (used by build:binary).

test: add ACP E2E test harness for HITL metrics

dc049c5

Captures JSONL events via --print --mode json, parses tool_execution_start/end events, and verifies the metrics DB contains correct hitl_events rows.

Add .kimchi/ to .gitignore and commit hitl-metrics changes

99940ad

Co-Authored-By: Kimchi <noreply@kimchi.dev>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Hitl metrics#107

Hitl metrics#107
zilvinasu wants to merge 10 commits intomasterfrom
hitl-metrics

zilvinasu commented Apr 29, 2026 •

edited by kimchi-review Bot

Loading

Uh oh!

kimchi-review Bot commented Apr 29, 2026 •

edited

Loading

Uh oh!

kimchi-review Bot left a comment

Uh oh!

kimchi-review Bot Apr 29, 2026

Uh oh!

kimchi-review Bot Apr 29, 2026

Uh oh!

kimchi-review Bot Apr 29, 2026

Uh oh!

kimchi-review Bot Apr 29, 2026

Uh oh!

kimchi-review Bot Apr 29, 2026

Uh oh!

kimchi-review Bot Apr 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

zilvinasu commented Apr 29, 2026 • edited by kimchi-review Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Kimchi Summary

What changed

Why

Key changes

Impact

Uh oh!

kimchi-review Bot commented Apr 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Kimchi Code Review

Summary

Uh oh!

kimchi-review Bot left a comment

Choose a reason for hiding this comment

Uh oh!

kimchi-review Bot Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

kimchi-review Bot Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

kimchi-review Bot Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

kimchi-review Bot Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

kimchi-review Bot Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

kimchi-review Bot Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

zilvinasu commented Apr 29, 2026 •

edited by kimchi-review Bot

Loading

kimchi-review Bot commented Apr 29, 2026 •

edited

Loading