feat(subagent): implement Anthropic-style soft token budget by igor-susic1 · Pull Request #129 · castai/kimchi-dev

igor-susic1 · 2026-05-04T14:14:53Z

Replace the instant-kill token budget with a three-tier advisory system:

80%: warning injected into model output ("Consider wrapping up")
100%: exceeded notice ("Finishing current action, then stopping")
150%: hard kill via SIGTERM (or explicit hardTokenBudget)

Changes:

Add TokenBudgetConfig + BudgetState interfaces
Add resolveBudgetConfig() helper
Add hardTokenBudget parameter to SubagentParams
checkBudget() state machine in spawnSubagent
Inject budget into subagent system prompt via env vars
Add SubagentBudgetInfo to buildSubagentSystemPrompt
formatBudgetSection renders soft/hard limits in prompt
Update system prompt guidance: omit budgets by default, warn about attachment token costs, explain soft vs hard

Tests:

resolveBudgetConfig: 5 cases (null, zero, default 150%, explicit, fallback)
prompt-transformer: 3 cases (no budget, soft+hard, soft only)

All 96 tests pass. Type-check clean.

Kimchi Summary

What changed

Implements a two-tier token budgeting system for subagents with soft advisory limits (warnings at 80%, wrap-up at 100%) and hard kill ceilings (default 150% of soft). Adds real-time budget feedback injected into subagent conversations after each turn, plus safety limits on maximum turns and concurrent subagents.

Why

Prevents runaway token consumption in delegated subagent processes by giving the model visibility into its burn rate before hitting hard limits, while providing hard guardrails for the parent process.

Key changes

src/extensions/budget-feedback.ts: New extension that accumulates input/output tokens (excluding cache-read) after each turn and injects budget_warning or budget_exceeded messages into the conversation when crossing 80% or 100% of the soft budget
src/extensions/subagent.ts:
- Add resolveBudgetConfig and checkBudgetState for soft/hard budget state machine
- Add maxTurns parameter (default 40) and output loop detection (triggers when model reports "summary" 3+ times but continues making tool calls)
- Enforce MAX_CONCURRENT_SUBAGENTS limit (20)
- Pass budget configuration to child processes via KIMCHI_SUBAGENT_SOFT_BUDGET and KIMCHI_SUBAGENT_HARD_BUDGET environment variables
src/extensions/orchestration/prompt-transformer/: Add SubagentBudgetInfo interface and formatBudgetSection to include token discipline guidelines in subagent system prompts; update orchestrator prompts to explain soft vs. hard budget semantics
src/cli.ts: Register budgetFeedbackExtension in the CLI extension pipeline
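The environment-variable handoff described above could look roughly like the following. This is a minimal sketch, not the PR's code: the `TokenBudgetConfig` field names (`softLimit`, `hardLimit`) and the helper `buildChildEnv` are assumptions for illustration. Only the two variable names come from the summary.

```typescript
// Hypothetical shape; the real TokenBudgetConfig in the PR may differ.
interface TokenBudgetConfig {
  softLimit: number;
  hardLimit: number;
}

// Hypothetical helper: builds the child environment, adding the budget
// variables only when a budget is configured.
function buildChildEnv(
  parentEnv: Record<string, string | undefined>,
  budget: TokenBudgetConfig | null,
): Record<string, string | undefined> {
  return {
    ...parentEnv,
    // Conditional spread: with no budget the keys are absent rather than
    // present with empty-string values.
    ...(budget
      ? {
          KIMCHI_SUBAGENT_SOFT_BUDGET: String(budget.softLimit),
          KIMCHI_SUBAGENT_HARD_BUDGET: String(budget.hardLimit),
        }
      : {}),
  };
}
```

Leaving the keys out entirely means the child can treat "variable missing" as "no budget" without a separate empty-string check.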

Impact

Breaking change: The tokenBudget parameter is now a soft advisory cap rather than a hard kill threshold; use the new hardTokenBudget parameter for strict enforcement
New subagent parameters: hardTokenBudget (hard ceiling) and maxTurns (turn limit) added to the subagent tool schema
Behavior change: Subagents now receive injected budget status messages between turn boundaries (not mid-tool) when crossing thresholds; cache-read tokens are excluded from budget calculations
New failure reasons: Subagents may now exit with "max_turns_exceeded" or "output_loop" failure reasons

kimchi-review · 2026-05-04T14:14:57Z

Kimchi Code Review

Property	Value
Commit	`afd8425`
Author	@igor-susic1
Files changed	0
Review status	Completed
Comments	2 (2 info)
Duration	41s

Summary

📊 Review Score: 92/100 (overall code quality — 0 lowest, 100 highest)
⏱️ Estimated effort to review: 3/5 (1 = trivial, 5 = very complex)

🧪 Tests: yes — Comprehensive test coverage added for parseSubagentBudgetFromEnv (validating NaN, Infinity, negative, and empty string handling) and resolveBudgetConfig (including NaN/Infinity propagation and default hard limit calculation at 150%). Integration tests for buildSubagentSystemPrompt verify correct budget section formatting. The budget enforcement logic in spawnSubagent is covered implicitly through the parameter resolution tests.

📝 Found 2 issue(s). See inline comments for details.


kimchi-review

📊 Review Score: 88/100 (overall code quality — 0 lowest, 100 highest)
⏱️ Estimated effort to review: 3/5 (1 = trivial, 5 = very complex)

🧪 Tests: yes — Comprehensive test coverage added for resolveBudgetConfig logic (default hard limits, explicit overrides, null handling) and buildSubagentSystemPrompt budget section formatting. Tests validate both happy paths and edge cases like zero/negative inputs.

📝 Found 3 issue(s). See inline comments for details.

igor-susic1 · 2026-05-04T15:19:28Z

@kimchi review

kimchi-review · 2026-05-04T15:19:32Z

🔄 Starting review on afd8425…
Triggered by @igor-susic1 via the @kimchi review command.

kimchi-review

📊 Review Score: 92/100 (overall code quality — 0 lowest, 100 highest)
⏱️ Estimated effort to review: 3/5 (1 = trivial, 5 = very complex)

🧪 Tests: yes — Comprehensive test coverage added for parseSubagentBudgetFromEnv (validating NaN, Infinity, negative, and empty string handling) and resolveBudgetConfig (including NaN/Infinity propagation and default hard limit calculation at 150%). Integration tests for buildSubagentSystemPrompt verify correct budget section formatting. The budget enforcement logic in spawnSubagent is covered implicitly through the parameter resolution tests.

📝 Found 2 issue(s). See inline comments for details.

Replace the instant-kill token budget with a three-tier advisory system:

- 80%: warning injected into model output ("Consider wrapping up")
- 100%: exceeded notice ("Finishing current action, then stopping")
- 150%: hard kill via SIGTERM (or explicit hardTokenBudget)

Changes:

- Add TokenBudgetConfig + BudgetState interfaces
- Add resolveBudgetConfig() helper
- Add hardTokenBudget parameter to SubagentParams
- checkBudget() state machine in spawnSubagent
- Inject budget into subagent system prompt via env vars
- Add SubagentBudgetInfo to buildSubagentSystemPrompt
- formatBudgetSection renders soft/hard limits in prompt
- Update system prompt guidance: omit budgets by default, warn about attachment token costs, explain soft vs hard

Tests:

- resolveBudgetConfig: 5 cases (null, zero, default 150%, explicit, fallback)
- prompt-transformer: 3 cases (no budget, soft+hard, soft only)

All 96 tests pass. Type-check clean.

Co-Authored-By: Kimchi <noreply@kimchi.dev>

- resolveBudgetConfig: guard against NaN/Infinity for both tokenBudget and hardTokenBudget
- spawnSubagent: use conditional spreading to avoid setting empty env vars
- prompt-enrichment: replace inline Number() parsing with parseSubagentBudgetFromEnv validator
- parseSubagentBudgetFromEnv: new exported helper that validates env var strings with Number.isFinite() and positive checks, returns undefined on any invalid input
- Add 4 NaN/Infinity tests for resolveBudgetConfig
- Add 10 validation tests for parseSubagentBudgetFromEnv (undefined, empty, NaN, non-numeric, negative, valid, valid with invalid hard limit)

Total: 110 tests pass. Type-check clean.

Co-Authored-By: Kimchi <noreply@kimchi.dev>
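As an illustration of the validation this commit describes, here is a minimal sketch of an env-var parser. The signature, the return shape, and the choice to drop an invalid hard limit while keeping a valid soft limit are all assumptions. The commit only states that the helper uses Number.isFinite() and a positivity check and returns undefined on invalid input.

```typescript
// Hypothetical return shape; the PR's SubagentBudgetInfo may differ.
interface SubagentBudgetInfo {
  softLimit: number;
  hardLimit?: number;
}

// Parse one env value; anything not finite and positive becomes undefined.
function parsePositive(raw: string | undefined): number | undefined {
  if (raw === undefined || raw.trim() === "") return undefined;
  const n = Number(raw);
  return Number.isFinite(n) && n > 0 ? n : undefined;
}

// Sketch of parseSubagentBudgetFromEnv. Taking the environment as a
// parameter (instead of reading process.env) is an assumption that keeps
// the function pure and easy to test.
function parseSubagentBudgetFromEnv(
  env: Record<string, string | undefined>,
): SubagentBudgetInfo | undefined {
  const softLimit = parsePositive(env.KIMCHI_SUBAGENT_SOFT_BUDGET);
  if (softLimit === undefined) return undefined;
  const hardLimit = parsePositive(env.KIMCHI_SUBAGENT_HARD_BUDGET);
  // Assumed behavior: an invalid hard limit is dropped, the soft limit is kept.
  return hardLimit === undefined ? { softLimit } : { softLimit, hardLimit };
}
```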

Extract the budget state machine from the spawnSubagent closure into a pure exported function checkBudgetState(). This makes the runtime budget logic independently unit-testable.

- checkBudgetState(input, output, config, currentState) → { state, warning, kill }
- Cloned soft/exceeded/hard-kill logic unchanged from the closure
- Export SoftBudgetState, TokenBudgetConfig for test types
- spawnSubagent delegates to checkBudgetState for warnings and kill decision

Tests:

- no budget config → no-op
- 80% trigger → warning
- no double-warning from warning state
- no warning from exceeded state
- 100% trigger → exceeded notice
- 150% trigger → hard kill
- kill from normal state if jumped past limit
- state unchanged between 80%–100%
- state unchanged below 80%

Total: 119 tests pass. Type-check clean.

Co-Authored-By: Kimchi <noreply@kimchi.dev>
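The state machine described in this commit might be sketched as below. The function name and call signature follow the commit message. The state names, the config field names, and the exact comparison operators are assumptions chosen to satisfy the listed test cases, so this should not be read as the PR's actual implementation.

```typescript
// Assumed state names; the PR exports SoftBudgetState but does not list its values.
type SoftBudgetState = "normal" | "warning" | "exceeded";

// Assumed field names for the resolved budget.
interface TokenBudgetConfig {
  softLimit: number;
  hardLimit: number;
}

interface BudgetCheck {
  state: SoftBudgetState;
  warning: string | null;
  kill: boolean;
}

function checkBudgetState(
  input: number,
  output: number,
  config: TokenBudgetConfig | null,
  currentState: SoftBudgetState,
): BudgetCheck {
  // No budget configured: nothing to enforce.
  if (!config) return { state: currentState, warning: null, kill: false };

  const used = input + output;

  // Hard ceiling is checked first so a jump straight past it still kills.
  if (used >= config.hardLimit) {
    return { state: "exceeded", warning: null, kill: true };
  }
  // 100% of soft budget: notify once, on the transition into "exceeded".
  if (used >= config.softLimit && currentState !== "exceeded") {
    return { state: "exceeded", warning: "Finishing current action, then stopping", kill: false };
  }
  // 80% of soft budget: warn once, only when coming from "normal".
  if (used >= config.softLimit * 0.8 && currentState === "normal") {
    return { state: "warning", warning: "Consider wrapping up", kill: false };
  }
  return { state: currentState, warning: null, kill: false };
}
```

Because the function is pure, the caller owns the state: it passes the previous state in and stores the returned one, which is what makes the transitions testable in isolation.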

…tConfig

Guard against the user setting a hardTokenBudget lower than the soft (tokenBudget) limit, which would cause the hard kill to fire before any soft warnings, silently suppressing the 80% warning and exceeded notice.

- resolveBudgetConfig: Math.max(hardLimit, tokenBudget) so hard is always >= soft
- add tests for clamping below-soft case and equal-to-soft case
- all 762 unit tests pass

Co-Authored-By: Kimchi <noreply@kimchi.dev>
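Putting the default, the NaN/Infinity guard, and the clamp together, a resolver along these lines would behave as the commits describe. It is a sketch under assumptions: the parameter order, the field names, rounding with Math.ceil, and falling back to the 150% default when the explicit hard limit is invalid are illustrative choices, not confirmed details of the PR.

```typescript
// Assumed field names for the resolved budget.
interface TokenBudgetConfig {
  softLimit: number;
  hardLimit: number;
}

function resolveBudgetConfig(
  tokenBudget?: number | null,
  hardTokenBudget?: number | null,
): TokenBudgetConfig | null {
  // No usable soft budget (null, zero, negative, NaN, Infinity): no budget at all.
  if (tokenBudget == null || !Number.isFinite(tokenBudget) || tokenBudget <= 0) {
    return null;
  }
  // Accept an explicit hard limit only if it is a finite positive number.
  const explicitHard =
    hardTokenBudget != null && Number.isFinite(hardTokenBudget) && hardTokenBudget > 0
      ? hardTokenBudget
      : null;
  // Default hard ceiling: 150% of the soft budget (rounding is an assumption).
  const hardLimit = explicitHard ?? Math.ceil(tokenBudget * 1.5);
  // Clamp so the hard limit can never undercut the soft limit.
  return { softLimit: tokenBudget, hardLimit: Math.max(hardLimit, tokenBudget) };
}
```

With the clamp, `resolveBudgetConfig(1000, 500)` yields a hard limit equal to the soft limit instead of a ceiling that would fire before the 80% warning.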

Removes the nested checkBudget closure inside spawnSubagent and inlines the budget-check logic directly into processLine. This eliminates a TS2304 false positive where the CI TypeScript compiler could not resolve symbols (hideThinkingBlock, filterOutputTags, stripOutputTagWrappers) inside the closure, even though local tsc accepted it. The inlining produces identical runtime behavior.

Co-Authored-By: Kimchi <noreply@kimchi.dev>

The previous documentation falsely claimed the subagent "receives a warning at 80% and a wrap-up notice at 100%, giving it a chance to finish gracefully." This was never true: the warnings are injected into the parent's output stream, not sent back to the subagent. The subagent only knows the budget limits from its static system prompt and has no runtime usage feedback.

Changes:

- orchestrator-system-prompt.ts: Reword tokenBudget section to state that the subagent knows limits from its system prompt only, with no runtime feedback. Warnings are for the parent, not the child.
- prompt-transformer.ts formatBudgetSection: Reword subagent budget section to tell the subagent it won't receive runtime usage updates and should size its work to fit the budget upfront.
- subagent.ts tokenBudget description: Clarify that (1) only uncached input + output tokens count (cache-read excluded), (2) warnings are shown in parent output, (3) subagent knows limits from system prompt only, with no real-time feedback.
- Deleted stray untracked kimchi-session-*.html files from working tree.

All tests pass. Lint + typecheck clean.

Co-Authored-By: Kimchi <noreply@kimchi.dev>

The previous stdin-pipe approach (commit 2477f2e) was broken: opening the subagent's stdin as a pipe caused pi-coding-agent's print-mode startup to block forever inside readPipedStdin, which waits for stdin EOF before proceeding. The parent never closes its write end during the subagent's lifetime, so the subagent never reached its model loop.

This commit replaces that mechanism with an in-subagent turn_end handler that needs no IPC. The subagent reads its own usage data (already flowing through the pi-coding-agent event bus), maintains a local state machine, and on transitions across the soft-budget thresholds (80%, 100%) injects a steering user message into its own conversation via pi.sendMessage. The model sees the warning before its next LLM call.

Parent side (subagent.ts):

- Revert stdio to ["ignore", "pipe", "pipe"], which fixes the startup hang.
- Remove KIMCHI_SUBAGENT_SUPPORTS_BUDGET_FEEDBACK env var (unused now).
- Remove the stdin-write block; keep the existing parent-side warning text in `accumulated` for the human watching kimchi's terminal.
- Hard-cap kill path is unchanged (still the safety net).

Subagent side (budget-feedback.ts, rewritten):

- Listens on pi.on("turn_end") instead of process.stdin readline.
- Pure helpers parseBudgetConfig / nextBudgetState / buildWarningText for unit-testability.
- Edge-triggered: only injects on state transitions (not every turn).
- Cache-read tokens excluded, mirroring the parent's resolveBudgetConfig.
- display: false, so the warning enters the model's context but not the UI.

Docs:

- orchestrator-system-prompt: add latency caveat ("delivered between tool rounds, not mid-tool").
- prompt-transformer formatBudgetSection: same caveat.
- subagent.ts tokenBudget description: same caveat.

Tests:

- 22 unit tests in budget-feedback.test.ts covering pure helpers and the turn_end handler (no-op gates, edge transitions, cache-read exclusion, normal-to-exceeded jump, missing usage).
- All existing tests still pass: 784 total, lint + typecheck clean.

Verified end-to-end: subagent boots cleanly with stdin ignored, turn_end handler fires after each turn, warning/exceeded messages inject correctly when usage crosses thresholds, no re-injection on subsequent turns in the same state.
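The edge-triggered behavior described in this commit can be sketched without the pi-coding-agent runtime by passing the injection function in as a callback. Everything below is illustrative: the usage field names (`input`, `output`, `cacheRead`), the `createBudgetTracker` factory, and the message wording are assumptions. In the PR the handler is registered via pi.on("turn_end") and injects via pi.sendMessage, whose exact signatures are not shown on this page.

```typescript
type BudgetState = "normal" | "warning" | "exceeded";

// Assumed per-turn usage shape; the real event payload may differ.
interface TurnUsage {
  input: number;
  output: number;
  cacheRead?: number;
}

// Pure helper: map cumulative usage to a state (thresholds from the PR: 80%, 100%).
function nextBudgetState(used: number, softLimit: number): BudgetState {
  if (used >= softLimit) return "exceeded";
  if (used >= softLimit * 0.8) return "warning";
  return "normal";
}

// Hypothetical factory returning a turn-end handler. `inject` stands in for
// whatever sends a steering message into the subagent's own conversation.
function createBudgetTracker(softLimit: number, inject: (text: string) => void) {
  let used = 0;
  let state: BudgetState = "normal";

  return function onTurnEnd(usage: TurnUsage | undefined): void {
    if (!usage) return; // missing usage data: do nothing
    // Cache-read tokens are deliberately not counted.
    used += usage.input + usage.output;

    const next = nextBudgetState(used, softLimit);
    // Edge-triggered: inject only when the state actually changes.
    if (next === state || next === "normal") return;
    state = next;
    inject(
      next === "warning"
        ? `Budget warning: ${used}/${softLimit} tokens used. Consider wrapping up.`
        : `Budget exceeded: ${used}/${softLimit} tokens used. Finish the current action, then stop.`,
    );
  };
}
```

Since usage only grows, the state can only move forward, so each threshold produces at most one injected message per subagent run.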

… prompt, and hard concurrency cap

- budget-feedback.ts: always activates in subagent mode; emits actionable usage reports every turn even when no tokenBudget is set. Reports include concrete STOP instructions (>50K tokens/turn threshold).
- prompt-transformer.ts: {{BUDGET}} now renders a ## Token Usage Tracking section with 4 actionable rules when no budget is set. When budget is set, renders ## Token Budget with 5 discipline rules (pace at 50%, stop at 80%, hard ceiling non-negotiable, return early to parent for fresh agent).
- subagent-system-prompt.ts: opening paragraph now frames token efficiency as the PRIMARY CONSTRAINT and explicitly says do NOT over-investigate.
- subagent.ts: adds MAX_CONCURRENT_SUBAGENTS = 50 hard cap enforced at execute time. Returns an error if cap is reached. Tracks active count with ++/-- around spawnSubagent() including .finally() cleanup.
- budget-feedback.test.ts: adds tests for usage reporting without budget.

All tests pass (781/781 + 3 skipped).

- Add maxTurns parameter (default 40) to subagent tool. When exceeded, kills the subagent with 'max_turns_exceeded' reason.
- Add output loop detection: if model says 'summary' 3+ times while continuing to make tool calls, kills with 'output_loop' reason.
- Add 'max_turns_exceeded' and 'output_loop' to SubagentFailureReason.
- Add tests for resolveBudgetConfig (was missing coverage).

These limits prevent runaway subagents that have finished their task but keep making tool calls in a loop (observed: 524+ turns, 20M+ tokens consumed after first '{summary}' response).

Co-Authored-By: Kimchi <noreply@kimchi.dev>
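A turn guard combining both limits might look like the sketch below. The two failure reasons and the defaults (40 turns, 3 summaries) come from the commit message. How the PR actually detects a "summary" is not shown here, so the substring match, the `TurnInfo` shape, and the `createTurnGuard` factory are assumptions for illustration only.

```typescript
type LoopFailureReason = "max_turns_exceeded" | "output_loop";

// Assumed per-turn summary of what the model produced.
interface TurnInfo {
  text: string;      // assistant text emitted this turn
  toolCalls: number; // number of tool calls made this turn
}

// Hypothetical guard: returns a failure reason when the subagent should be
// killed, or null when it may continue.
function createTurnGuard(maxTurns: number = 40, summaryLimit: number = 3) {
  let turns = 0;
  let summariesWithToolCalls = 0;

  return function onTurn(turn: TurnInfo): LoopFailureReason | null {
    turns++;
    if (turns > maxTurns) return "max_turns_exceeded";

    // Assumed heuristic: the model announced a summary but kept calling tools.
    if (turn.toolCalls > 0 && turn.text.includes("summary")) {
      summariesWithToolCalls++;
    }
    if (summariesWithToolCalls >= summaryLimit) return "output_loop";
    return null;
  };
}
```

The loop check catches the failure mode described above much earlier than the turn cap would, while the turn cap remains as a backstop for loops that never mention a summary.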

After review, 20 concurrent subagents is a more conservative default that still allows parallel research/analysis while preventing fork-bombing the host. Can be raised later if needed.

Co-Authored-By: Kimchi <noreply@kimchi.dev>
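The counting scheme from the earlier commit (increment before spawning, decrement in a finally step) can be sketched as follows with the lowered cap of 20. The `createSubagentLimiter` factory and the choice to reject with an Error are assumptions. The PR tracks a module-level count around spawnSubagent() and returns an error to the caller, but its exact form is not shown on this page.

```typescript
const MAX_CONCURRENT_SUBAGENTS = 20;

// Hypothetical limiter: wraps any promise-returning spawn function.
function createSubagentLimiter(max: number = MAX_CONCURRENT_SUBAGENTS) {
  let active = 0;

  return {
    activeCount: (): number => active,

    async run<T>(spawn: () => Promise<T>): Promise<T> {
      // Enforced at execute time: refuse instead of queueing.
      if (active >= max) {
        throw new Error(`Subagent limit reached (${max} concurrent)`);
      }
      active++;
      try {
        return await spawn();
      } finally {
        // Runs on success and on failure, so the slot is always released.
        active--;
      }
    },
  };
}
```

Refusing outright rather than queueing keeps the failure visible to the orchestrating model, which can then decide to wait or to do the work itself.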

kimchi-review Bot reviewed May 4, 2026

Comment thread src/extensions/subagent.ts

Comment thread src/extensions/orchestration/prompt-enrichment.ts

Comment thread src/extensions/subagent.ts

kimchi-review Bot reviewed May 4, 2026

Comment thread src/extensions/subagent.ts

Comment thread src/extensions/subagent.ts

igor-susic1 force-pushed the LLM-1502-budgeting-change branch from 71d7e92 to fa28d7f · May 5, 2026 06:20

igor-susic1 and others added 9 commits May 5, 2026 10:40

Resolve rebase

2f81e7f

fix: message for exceeding the limit

3bcaf3e

igor-susic1 force-pushed the LLM-1502-budgeting-change branch from b98b0ff to 3bcaf3e · May 5, 2026 08:47

igor-susic1 and others added 4 commits May 5, 2026 10:57

fix: lint

ef303cd

feat(subagent): lower hard concurrency cap from 50 to 20

d94172c

igor-susic1 wants to merge 13 commits into master from LLM-1502-budgeting-change
