feat(subagent): implement Anthropic-style soft token budget#129
igor-susic1 wants to merge 13 commits into master
Kimchi Code Review
Summary
📊 Review Score: 92/100 (overall code quality; 0 lowest, 100 highest)
🧪 Tests: yes, comprehensive test coverage added
📝 Found 2 issue(s). See inline comments for details.
📊 Review Score: 88/100 (overall code quality — 0 lowest, 100 highest)
⏱️ Estimated effort to review: 3/5 (1 = trivial, 5 = very complex)
🧪 Tests: yes — Comprehensive test coverage added for resolveBudgetConfig logic (default hard limits, explicit overrides, null handling) and buildSubagentSystemPrompt budget section formatting. Tests validate both happy paths and edge cases like zero/negative inputs.
📝 Found 3 issue(s). See inline comments for details.
@kimchi review
🔄 Starting review on
📊 Review Score: 92/100 (overall code quality — 0 lowest, 100 highest)
⏱️ Estimated effort to review: 3/5 (1 = trivial, 5 = very complex)
🧪 Tests: yes — Comprehensive test coverage added for parseSubagentBudgetFromEnv (validating NaN, Infinity, negative, and empty string handling) and resolveBudgetConfig (including NaN/Infinity propagation and default hard limit calculation at 150%). Integration tests for buildSubagentSystemPrompt verify correct budget section formatting. The budget enforcement logic in spawnSubagent is covered implicitly through the parameter resolution tests.
📝 Found 2 issue(s). See inline comments for details.
71d7e92 to fa28d7f
Replace the instant-kill token budget with a three-tier advisory system:
- 80%: warning injected into model output ("Consider wrapping up")
- 100%: exceeded notice ("Finishing current action, then stopping")
- 150%: hard kill via SIGTERM (or explicit hardTokenBudget)
Changes:
- Add TokenBudgetConfig + BudgetState interfaces
- Add resolveBudgetConfig() helper
- Add hardTokenBudget parameter to SubagentParams
- checkBudget() state machine in spawnSubagent
- Inject budget into subagent system prompt via env vars
- Add SubagentBudgetInfo to buildSubagentSystemPrompt
- formatBudgetSection renders soft/hard limits in prompt
- Update system prompt guidance: omit budgets by default,
warn about attachment token costs, explain soft vs hard
Tests:
- resolveBudgetConfig: 5 cases (null, zero, default 150%, explicit, fallback)
- prompt-transformer: 3 cases (no budget, soft+hard, soft only)
All 96 tests pass. Type-check clean.
Co-Authored-By: Kimchi <noreply@kimchi.dev>
- resolveBudgetConfig: guard against NaN/Infinity for both tokenBudget and hardTokenBudget
- spawnSubagent: use conditional spreading to avoid setting empty env vars
- prompt-enrichment: replace inline Number() parsing with parseSubagentBudgetFromEnv validator
- parseSubagentBudgetFromEnv: new exported helper that validates env var strings with Number.isFinite() and positive checks, returns undefined on any invalid input
- Add 4 NaN/Infinity tests for resolveBudgetConfig
- Add 10 validation tests for parseSubagentBudgetFromEnv (undefined, empty, NaN, non-numeric, negative, valid, valid with invalid hard limit)
Total: 110 tests pass. Type-check clean.
Co-Authored-By: Kimchi <noreply@kimchi.dev>
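A minimal sketch of the env-var validation described in this commit, not the PR's actual code. The return shape (`BudgetFromEnv`), the `parsePositive` helper, and passing the environment in as a plain record are assumptions made for illustration; the sketch also assumes that an invalid hard limit is dropped while a valid soft limit is kept, which is one reading of the "valid with invalid hard limit" test case.

```typescript
// Hypothetical return shape; the real helper may return something different.
interface BudgetFromEnv {
  softBudget: number;
  hardBudget?: number;
}

// Convert one env string into a finite, positive number, or undefined.
function parsePositive(raw: string | undefined): number | undefined {
  if (raw === undefined || raw.trim() === "") return undefined;
  const n = Number(raw); // "abc" -> NaN, "Infinity" -> Infinity
  return Number.isFinite(n) && n > 0 ? n : undefined;
}

// The environment is injected as a record so the function stays pure and testable.
function parseSubagentBudgetFromEnv(
  env: Record<string, string | undefined>,
): BudgetFromEnv | undefined {
  const softBudget = parsePositive(env["KIMCHI_SUBAGENT_SOFT_BUDGET"]);
  if (softBudget === undefined) return undefined; // no usable soft limit: no budget at all
  const hardBudget = parsePositive(env["KIMCHI_SUBAGENT_HARD_BUDGET"]);
  // Assumption: an invalid hard limit is ignored rather than invalidating the soft limit.
  return hardBudget === undefined ? { softBudget } : { softBudget, hardBudget };
}
```

Routing every string through a single `Number.isFinite` check is what rules out the empty-string case, since `Number("")` evaluates to `0` and would otherwise look like a valid number.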
Extract the budget state machine from the spawnSubagent closure into
a pure exported function checkBudgetState(). This makes the runtime
budget logic independently unit-testable.
- checkBudgetState(input, output, config, currentState) → { state, warning, kill }
- Cloned soft/exceeded/hard-kill logic unchanged from the closure
- Export SoftBudgetState, TokenBudgetConfig for test types
- spawnSubagent delegates to checkBudgetState for warnings and kill decision
Tests:
- no budget config → no-op
- 80% trigger → warning
- no double-warning from warning state
- no warning from exceeded state
- 100% trigger → exceeded notice
- 150% trigger → hard kill
- kill from normal state if jumped past limit
- state unchanged between 80%–100%
- state unchanged below 80%
Total: 119 tests pass. Type-check clean.
Co-Authored-By: Kimchi <noreply@kimchi.dev>
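The state machine this commit extracts can be sketched roughly as follows. Only the function name and the `(input, output, config, currentState) → { state, warning, kill }` shape come from the commit message; the field names `softLimit` and `hardLimit`, the state names, and the use of `>=` at each threshold are assumptions, and the warning strings are borrowed from the first commit's description.

```typescript
type SoftBudgetState = "normal" | "warning" | "exceeded";

// Hypothetical field names for the resolved config.
interface TokenBudgetConfig {
  softLimit: number;
  hardLimit: number;
}

interface BudgetCheckResult {
  state: SoftBudgetState;
  warning?: string;
  kill: boolean;
}

function checkBudgetState(
  inputTokens: number,
  outputTokens: number,
  config: TokenBudgetConfig | null,
  currentState: SoftBudgetState,
): BudgetCheckResult {
  // No budget configured: nothing to track.
  if (config === null) return { state: currentState, kill: false };

  const used = inputTokens + outputTokens;

  // The hard ceiling wins from any state, including a jump straight from "normal".
  if (used >= config.hardLimit) return { state: "exceeded", kill: true };

  // 100% of the soft budget: emit the exceeded notice once.
  if (used >= config.softLimit && currentState !== "exceeded") {
    return { state: "exceeded", warning: "Finishing current action, then stopping", kill: false };
  }

  // 80% of the soft budget: emit the warning once, and only from "normal".
  if (used >= 0.8 * config.softLimit && currentState === "normal") {
    return { state: "warning", warning: "Consider wrapping up", kill: false };
  }

  // No threshold crossed since the last check.
  return { state: currentState, kill: false };
}
```

Because the function takes the current state as an argument and returns the next one, the caller owns the state and each test case in the list above reduces to a single call with fixed inputs.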
…tConfig
Guard against the user setting a hardTokenBudget lower than the soft (tokenBudget) limit, which would cause the hard kill to fire before any soft warnings, silently suppressing the 80% warning and exceeded notice.
- resolveBudgetConfig: Math.max(hardLimit, tokenBudget) so hard is always >= soft
- add tests for clamping below-soft case and equal-to-soft case
- all 762 unit tests pass
Co-Authored-By: Kimchi <noreply@kimchi.dev>
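Taken together, the resolution rules from the commits so far (null for missing, zero, negative, or non-finite soft budgets; a 150% default hard limit; clamping the hard limit to at least the soft limit) might look like the sketch below. The field names, the `isUsable` helper, and the `Math.ceil` rounding are assumptions, not details confirmed by the PR.

```typescript
// Hypothetical field names for the resolved config.
interface TokenBudgetConfig {
  softLimit: number;
  hardLimit: number;
}

const DEFAULT_HARD_MULTIPLIER = 1.5; // default hard ceiling is 150% of the soft budget

// True only for finite, strictly positive numbers.
function isUsable(n: number | null | undefined): n is number {
  return typeof n === "number" && Number.isFinite(n) && n > 0;
}

function resolveBudgetConfig(
  tokenBudget?: number | null,
  hardTokenBudget?: number | null,
): TokenBudgetConfig | null {
  // No usable soft budget means budgeting is disabled entirely.
  if (!isUsable(tokenBudget)) return null;

  // An unusable explicit hard limit falls back to the 150% default.
  const requestedHard = isUsable(hardTokenBudget)
    ? hardTokenBudget
    : Math.ceil(tokenBudget * DEFAULT_HARD_MULTIPLIER); // rounding is an assumption

  // Clamp so the hard kill can never fire before the soft notices.
  return { softLimit: tokenBudget, hardLimit: Math.max(requestedHard, tokenBudget) };
}
```

For example, `resolveBudgetConfig(1000, 500)` resolves to a hard limit of 1000 under these assumptions, so the 80% warning and the exceeded notice still have a chance to appear before the kill.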
Removes the nested checkBudget closure inside spawnSubagent and inlines the budget-check logic directly into processLine. This eliminates a TS2304 false positive where the CI TypeScript compiler could not resolve symbols (hideThinkingBlock, filterOutputTags, stripOutputTagWrappers) inside the closure, even though local tsc accepted it. The inlining produces identical runtime behavior.
Co-Authored-By: Kimchi <noreply@kimchi.dev>
The previous documentation falsely claimed the subagent "receives a warning at 80% and a wrap-up notice at 100%, giving it a chance to finish gracefully." This was never true: the warnings are injected into the parent's output stream, not sent back to the subagent. The subagent only knows the budget limits from its static system prompt and has no runtime usage feedback.
Changes:
- orchestrator-system-prompt.ts: Reword tokenBudget section to state that the subagent knows limits from its system prompt only, with no runtime feedback. Warnings are for the parent, not the child.
- prompt-transformer.ts formatBudgetSection: Reword subagent budget section to tell the subagent it won't receive runtime usage updates and should size its work to fit the budget upfront.
- subagent.ts tokenBudget description: Clarify that (1) only uncached input + output tokens count (cache-read excluded), (2) warnings are shown in parent output, (3) subagent knows limits from system prompt only, with no real-time feedback.
- Deleted stray untracked kimchi-session-*.html files from working tree.
All tests pass. Lint + typecheck clean.
Co-Authored-By: Kimchi <noreply@kimchi.dev>
The previous stdin-pipe approach (commit 2477f2e) was broken: opening the subagent's stdin as a pipe caused pi-coding-agent's print-mode startup to block forever inside readPipedStdin, which waits for stdin EOF before proceeding. The parent never closes its write end during the subagent's lifetime, so the subagent never reached its model loop.
This commit replaces that mechanism with an in-subagent turn_end handler that needs no IPC. The subagent reads its own usage data (already flowing through the pi-coding-agent event bus), maintains a local state machine, and on transitions across the soft-budget thresholds (80%, 100%) injects a steering user message into its own conversation via pi.sendMessage. The model sees the warning before its next LLM call.
Parent side (subagent.ts):
- Revert stdio to ["ignore", "pipe", "pipe"], which fixes the startup hang.
- Remove KIMCHI_SUBAGENT_SUPPORTS_BUDGET_FEEDBACK env var (unused now).
- Remove the stdin-write block; keep the existing parent-side warning text in `accumulated` for the human watching kimchi's terminal.
- Hard-cap kill path is unchanged (still the safety net).
Subagent side (budget-feedback.ts, rewritten):
- Listens on pi.on("turn_end") instead of process.stdin readline.
- Pure helpers parseBudgetConfig / nextBudgetState / buildWarningText for unit-testability.
- Edge-triggered: only injects on state transitions (not every turn).
- Cache-read tokens excluded, mirroring the parent's resolveBudgetConfig.
- display: false, so the warning enters the model's context but not the UI.
Docs:
- orchestrator-system-prompt: add latency caveat ("delivered between tool rounds, not mid-tool").
- prompt-transformer formatBudgetSection: same caveat.
- subagent.ts tokenBudget description: same caveat.
Tests:
- 22 unit tests in budget-feedback.test.ts covering pure helpers and the turn_end handler (no-op gates, edge transitions, cache-read exclusion, normal-to-exceeded jump, missing usage).
- All existing tests still pass: 784 total, lint + typecheck clean.
Verified end-to-end: subagent boots cleanly with stdin ignored, turn_end handler fires after each turn, warning/exceeded messages inject correctly when usage crosses thresholds, no re-injection on subsequent turns in the same state.
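The edge-triggered behaviour described for the subagent side can be illustrated with a small sketch. It deliberately does not call the real pi-coding-agent API: `sendMessage` is an injected callback standing in for pi.sendMessage, and the `TurnUsage` shape, the per-turn (rather than cumulative) usage assumption, and the message wording are all hypothetical. Only the helper names nextBudgetState and buildWarningText are taken from the commit message, and their signatures here are guesses.

```typescript
type BudgetState = "normal" | "warning" | "exceeded";

// Hypothetical per-turn usage payload.
interface TurnUsage {
  input: number;
  output: number;
  cacheRead: number;
}

// Pure mapping from cumulative usage to a state (80% and 100% thresholds).
function nextBudgetState(used: number, softBudget: number): BudgetState {
  if (used >= softBudget) return "exceeded";
  if (used >= 0.8 * softBudget) return "warning";
  return "normal";
}

// Pure text builder; the wording is illustrative only.
function buildWarningText(state: BudgetState, used: number, softBudget: number): string {
  const pct = Math.round((used / softBudget) * 100);
  return state === "exceeded"
    ? `Token budget exceeded: ${used}/${softBudget} (${pct}%). Finish the current action, then stop.`
    : `Token budget warning: ${used}/${softBudget} (${pct}%). Consider wrapping up.`;
}

// Returns a handler to be called once per turn_end event.
function createTurnEndHandler(
  softBudget: number,
  sendMessage: (text: string) => void,
): (usage: TurnUsage | undefined) => void {
  let used = 0;
  let state: BudgetState = "normal";
  return (usage) => {
    if (usage === undefined) return; // missing usage: no-op
    used += usage.input + usage.output; // cache-read tokens are excluded
    const next = nextBudgetState(used, softBudget);
    if (next !== state) {
      // Edge-triggered: inject only when the state actually changes.
      state = next;
      sendMessage(buildWarningText(next, used, softBudget));
    }
  };
}
```

Since usage only ever grows, the state can only move forward, so comparing `next` with `state` is enough to guarantee at most one message per threshold, including the direct jump from normal to exceeded.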
b98b0ff to 3bcaf3e
… prompt, and hard concurrency cap
- budget-feedback.ts: always activates in subagent mode; emits actionable
usage reports every turn even when no tokenBudget is set. Reports include
concrete STOP instructions (>50K tokens/turn threshold).
- prompt-transformer.ts: {{BUDGET}} now renders a ## Token Usage Tracking
section with 4 actionable rules when no budget is set. When budget is set,
renders ## Token Budget with 5 discipline rules (pace at 50%, stop at 80%,
hard ceiling non-negotiable, return early to parent for fresh agent).
- subagent-system-prompt.ts: opening paragraph now frames token efficiency
as the PRIMARY CONSTRAINT and explicitly says do NOT over-investigate.
- subagent.ts: adds MAX_CONCURRENT_SUBAGENTS = 50 hard cap enforced at
execute time. Returns an error if cap is reached. Tracks active count
with ++/-- around spawnSubagent() including .finally() cleanup.
- budget-feedback.test.ts: adds tests for usage reporting without budget.
All tests pass (781/781 + 3 skipped).
- Add maxTurns parameter (default 40) to subagent tool. When exceeded,
kills the subagent with 'max_turns_exceeded' reason.
- Add output loop detection: if model says 'summary' 3+ times while
continuing to make tool calls, kills with 'output_loop' reason.
- Add 'max_turns_exceeded' and 'output_loop' to SubagentFailureReason.
- Add tests for resolveBudgetConfig (was missing coverage).
These limits prevent runaway subagents that have finished their task
but keep making tool calls in a loop (observed: 524+ turns, 20M+
tokens consumed after first '{summary}' response).
Co-Authored-By: Kimchi <noreply@kimchi.dev>
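A rough sketch of the two runaway guards described in this commit. The failure-reason strings, the default of 40 turns, and the threshold of three "summary" mentions come from the commit message; the `TurnInfo` shape, the `createRunawayGuard` factory, and the case-insensitive substring match are assumptions about how such detection could be done, not the PR's implementation.

```typescript
type RunawayReason = "max_turns_exceeded" | "output_loop";

// Hypothetical per-turn summary of what the model produced.
interface TurnInfo {
  text: string;
  toolCalls: number;
}

// Returns an observer that yields a kill reason, or null to keep going.
function createRunawayGuard(
  maxTurns: number = 40,
  loopThreshold: number = 3,
): (turn: TurnInfo) => RunawayReason | null {
  let turns = 0;
  let summaryTurnsWithTools = 0;
  return (turn) => {
    turns++;
    if (turns > maxTurns) return "max_turns_exceeded";

    // Count turns where the model claims a summary but still issues tool calls.
    const mentionsSummary = turn.text.toLowerCase().includes("summary");
    if (mentionsSummary && turn.toolCalls > 0) summaryTurnsWithTools++;

    if (summaryTurnsWithTools >= loopThreshold) return "output_loop";
    return null;
  };
}
```

A caller would invoke the observer once per turn and terminate the subagent with the returned reason as soon as it is non-null.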
After review, 20 concurrent subagents is a more conservative default that still allows parallel research/analysis while preventing fork-bombing the host. Can be raised later if needed.
Co-Authored-By: Kimchi <noreply@kimchi.dev>
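The concurrency cap (an active counter incremented and decremented around the spawn, with cleanup that always runs) can be sketched as below. The constant name and the value 20 come from the commit messages; `tryAcquireSlot`, `releaseSlot`, `executeSubagent`, and the `{ error }` return shape are hypothetical, and `try`/`finally` is used here in place of the `.finally()` call the earlier commit mentions.

```typescript
const MAX_CONCURRENT_SUBAGENTS = 20;
let activeSubagents = 0;

// Reserve a slot if one is free; returns false when the cap is reached.
function tryAcquireSlot(): boolean {
  if (activeSubagents >= MAX_CONCURRENT_SUBAGENTS) return false;
  activeSubagents++;
  return true;
}

// Give the slot back; guarded so the counter never goes negative.
function releaseSlot(): void {
  if (activeSubagents > 0) activeSubagents--;
}

// Wrap any spawn function so the counter is decremented on success and on failure.
async function executeSubagent<T>(
  spawn: () => Promise<T>,
): Promise<T | { error: string }> {
  if (!tryAcquireSlot()) {
    return { error: `Subagent concurrency cap reached (${MAX_CONCURRENT_SUBAGENTS})` };
  }
  try {
    return await spawn();
  } finally {
    releaseSlot();
  }
}
```

Checking the cap and incrementing the counter in one synchronous step means that two calls started in the same tick cannot both claim the last slot.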
Kimchi Summary
What changed
Implements a two-tier token budgeting system for subagents with soft advisory limits (warnings at 80%, wrap-up at 100%) and hard kill ceilings (default 150% of soft). Adds real-time budget feedback injected into subagent conversations after each turn, plus safety limits on maximum turns and concurrent subagents.
Why
Prevents runaway token consumption in delegated subagent processes by giving the model visibility into its burn rate before hitting hard limits, while providing hard guardrails for the parent process.
Key changes
- src/extensions/budget-feedback.ts: New extension that accumulates input/output tokens (excluding cache-read) after each turn and injects budget_warning or budget_exceeded messages into the conversation when crossing 80% or 100% of the soft budget
- src/extensions/subagent.ts:
  - resolveBudgetConfig and checkBudgetState for soft/hard budget state machine
  - maxTurns parameter (default 40) and output loop detection (triggers when model reports "summary" 3+ times but continues making tool calls)
  - MAX_CONCURRENT_SUBAGENTS limit (20)
  - KIMCHI_SUBAGENT_SOFT_BUDGET and KIMCHI_SUBAGENT_HARD_BUDGET environment variables
- src/extensions/orchestration/prompt-transformer/: Add SubagentBudgetInfo interface and formatBudgetSection to include token discipline guidelines in subagent system prompts; update orchestrator prompts to explain soft vs. hard budget semantics
- src/cli.ts: Register budgetFeedbackExtension in the CLI extension pipeline
Impact
- tokenBudget parameter is now a soft advisory cap rather than a hard kill threshold; use the new hardTokenBudget parameter for strict enforcement
- hardTokenBudget (hard ceiling) and maxTurns (turn limit) added to the subagent tool schema
- "max_turns_exceeded" or "output_loop" failure reasons