Skip to content

Hitl metrics#107

Draft
zilvinasu wants to merge 10 commits intomasterfrom
hitl-metrics
Draft

Hitl metrics#107
zilvinasu wants to merge 10 commits intomasterfrom
hitl-metrics

Conversation

@zilvinasu
Copy link
Copy Markdown
Contributor

@zilvinasu zilvinasu commented Apr 29, 2026


Kimchi Summary

What changed

Introduces a new HITL (Human-in-the-Loop) metrics extension that instruments agent sessions to track time spent in agent execution, human interaction, and idle states. Adds the /metrics command to display aggregated statistics, recent sessions, and an ASCII timeline of the most recent closed session.

Why

Provides observability into HITL interaction patterns and session efficiency by persisting telemetry data locally for analysis.

Key changes

  • extensions/hitl-metrics/commands/metrics.ts: Implements /metrics command with three-category time tracking (agent, HITL, idle) and timeline visualization via getMetricsOutput().
  • extensions/hitl-metrics/formatters.ts: Adds formatting utilities for duration (formatDuration), metrics tables (formatMetrics), session rows (formatSessionRow), and ASCII timeline blocks (formatTimelineSection).
  • extensions/hitl-metrics/index.ts: Registers extension hooks (session_start, tool_execution_start, tool_result, session_shutdown, input) to capture lifecycle events.
  • extensions/hitl-metrics/session-manager.ts: Manages session state and SQLite persistence at ~/.kimchi/metrics/<hash>/hitl.db.
  • extensions/hitl-metrics/storage/: Database abstraction (HitlDatabase, projectHash) with tables hitl_sessions, hitl_events, and activity_events.
  • .gitignore: Excludes .kimchi/ local config and GSD baseline artifacts.
  • Test coverage: Adds unit tests (metrics.test.ts, formatters.test.ts, hooks.test.ts), integration tests (e2e.test.ts), and E2E harnesses (e2e-acp.ts, e2e-rpc.py) verifying DB write/read cycles and metrics accuracy.

Impact

  • Creates local SQLite database ~/.kimchi/metrics/<project_hash>/hitl.db on first run; auto-initializes schema.
  • Non-breaking: gracefully handles missing project directories (logs warning) and headless mode (prints to console).
  • Records durations for ask_user_questions tool calls, extracting question counts and selected options; skips "cached" results ≤50ms to avoid noise.
  • Categorizes session time into agent_time_ms, hitl_time_ms, and idle_time_ms buckets based on tool start/result and user input events.

zilvinasu and others added 5 commits April 28, 2026 08:15
Completed slices:
- S01: S01
- S02: S02
- S03: S03
- S04: S04
- S05: S05

GSD-Milestone: M001
Branch: milestone/M001
- Exclude **/*.test.ts from tsconfig.json to avoid type errors in mocks
- Fix Theme interface compatibility with @mariozechner/pi-coding-agent
- Fix bun:sqlite import for Node.js compatibility
- Update all import paths from .ts to .js for NodeNext resolution
- Add index signatures to HitlSession and HitlEvent for SQLite queries
- Create test-helpers/mock-theme.ts with properly typed Theme mock
- Update all test files to use shared Theme mock
- Fix metrics.test.ts UI context with full ExtensionUIContext interface
- Fix bun:sqlite types using inline DatabaseInstance type
- Remove **/*.test.ts from tsconfig exclude
- All 129 tests pass, tsc --noEmit clean

Closes M002
…on time

M004 milestone: HITL metrics extension.

Features:
- Session lifecycle tracking (start, active, closed, interrupted)
- HITL event recording with duration, question count, selected options
- Time categorization (HITL time, agent time, idle time)
- /metrics command with stats, recent sessions, session timeline
- Timeline visualization using Unicode block characters
- Session resume (2-hour window for interrupted sessions)
- Cross-session aggregation via project-scoped storage

Storage:
- SQLite at ~/.kimchi/metrics/<project-hash>/hitl.db
- Tables: hitl_sessions, hitl_events, hitl_permission_events
- Activity tracking in session row (hitl_time_ms, agent_time_ms, idle_time_ms)

Stats (getSessionStats) use scalar subqueries to avoid cartesian product
inflation from JOINs. Timeline (buildTimeline/buildTimelineFromActivities)
falls back to activity events when no HITL events exist.

Testing:
- 172/172 tests pass across 11 test files
- test-hitl-simulation.ts: programmatic E2E simulation
- smoke.test.ts: full event lifecycle (start→event→close→query)
@kimchi-review
Copy link
Copy Markdown

kimchi-review Bot commented Apr 29, 2026

Kimchi Code Review

Property Value
Commit 9f0b81e
Author @zilvinasu
Files changed 0
Review status Completed
Comments 6 (3 info, 3 warning)
Duration 57s

Summary

📊 Review Score: 86/100 (overall code quality — 0 lowest, 100 highest)
⏱️ Estimated effort to review: 4/5 (1 = trivial, 5 = very complex)

🧪 Tests: yes — Comprehensive test coverage including unit tests for formatters, integration tests for session manager and storage, E2E tests for full lifecycle, and hook tests for extension events. Tests cover edge cases like empty databases, error conditions, and concurrent session handling.

📝 Found 6 issue(s). See inline comments for details.

What to expect

Kimchi will analyze the changes in this pull request and post:

  • A summary of the overall changes
  • Inline comments on specific lines with findings categorized by issue type

The review typically completes within a few minutes. This comment will be updated once the review is ready.

Interact with Kimchi
  • @kimchi review — re-trigger a full review on the latest commit
  • @kimchi summary — regenerate the PR summary
  • @kimchi ignore — skip this PR (no review will be posted)
  • Reply to any inline comment to ask follow-up questions or request clarification
Configuration

Reviews are configured by your organization admin.
Review instructions, excluded directories, and severity thresholds can be adjusted per repository in the Kimchi dashboard.


Powered by Kimchi — AI-powered code review by CAST AI

Copy link
Copy Markdown

@kimchi-review kimchi-review Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📊 Review Score: 86/100 (overall code quality — 0 lowest, 100 highest)
⏱️ Estimated effort to review: 4/5 (1 = trivial, 5 = very complex)

🧪 Tests: yes — Comprehensive test coverage including unit tests for formatters, integration tests for session manager and storage, E2E tests for full lifecycle, and hook tests for extension events. Tests cover edge cases like empty databases, error conditions, and concurrent session handling.

📝 Found 6 issue(s). See inline comments for details.

}

// Accumulate agent time (exclude HITL tools tracked separately)
if (toolName !== HITL_TOOL_NAME) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️🐛 Bug

recordUserInput() calls endIdlePeriod() before checking if this.db is non-null. If the database failed to initialize or was closed but isIdle remained true, endIdlePeriod() will crash when accessing this.db! (non-null assertion) on line 179.

💡 Suggestion: Add a null check for this.db before calling idle-related methods, or move the if (this.isIdle) block inside the if (this.db && this.session) guard clause.

this.startIdlePeriod()
}

/** Check if we should start idle tracking */
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️🐛 Bug

recordToolStart() calls endIdlePeriod() unconditionally when isIdle is true, but the database might be null if initialization failed. This bypasses the null check on line 168.

💡 Suggestion: Move the idle check inside the database availability block, or ensure endIdlePeriod() validates this.db before use.

// In practice, idle is detected lazily when next activity arrives
}

private startIdlePeriod(): void {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️🐛 Bug

startIdlePeriod() uses this.db! non-null assertion without verifying this.db is initialized. If called when the database is unavailable (e.g., after init failure), this will throw a runtime error.

💡 Suggestion: Add an explicit null check: if (!this.db || !this.session) return; at the start of the method.

const dbPath = options.dbPath ?? resolve(STORAGE_DIR, hash, "hitl.db")

const { existsSync } = await import("node:fs")
if (!existsSync(dbPath)) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ℹ️⚠️ Error Handling

All database and query errors are caught and silently converted to the user-friendly message "No HITL data recorded yet". This masks actual errors like database corruption, permission denied, or disk full, potentially hiding production issues.

💡 Suggestion: Distinguish between "no data" (empty result) and "error" states. Log the full error stack server-side while showing a generic error message to the user, or include error details in the console warning.

// Ignore
}
})

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ℹ️🔧 Maintainability

Misaligned indentation on expect(stats.total_hitl_time_ms).toBe(0) breaks consistent code formatting.

💡 Suggestion: Fix indentation to align with surrounding expect statements.

manager.recordToolStart(event.toolCallId, event.toolName)
} catch (error) {
const message = error instanceof Error ? error.message : String(error)
console.warn(`[HITL] Error in tool_execution_start handler: ${message}`)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ℹ️🏗️ Design

Multiple type assertions (as { input?: ... }, as { result?: ... }) bypass TypeScript type safety. If the event structure from the framework changes, these will fail silently at runtime.

💡 Suggestion: Define proper TypeScript interfaces for the event payloads and validate them at runtime (e.g., using zod or io-ts) instead of casting.

Glosoli and others added 5 commits April 29, 2026 12:23
Fixes tsc --noEmit error:
  TS7006: Parameter 'event' implicitly has an 'any' type

The mock's on() and registerCommand() callbacks lacked typed parameters,
which tsc enforces in --noEmit mode (used by build:binary).
Captures JSONL events via --print --mode json, parses tool_execution_start/end
events, and verifies the metrics DB contains correct hitl_events rows.
The hitl-metrics extension was creating its own random UUID for each
session instead of using the actual kimchi session ID. This caused DB
rows to have IDs like '8b53017e-...' while session files had
'019dd92e-...' — completely unrelated, making it impossible to
correlate session files with metrics DB data.

The fix threads the real session ID from ctx.sessionManager.sessionId
through manager.init() to getOrCreateSession(), which now accepts an
optional sessionId override. When provided, getOrCreateSession uses it
directly instead of generating a random UUID.

Verified: session file IDs now match DB rows exactly.
Co-Authored-By: Kimchi <noreply@kimchi.dev>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants