
feat(subagent): implement Anthropic-style soft token budget #129

Open
igor-susic1 wants to merge 13 commits into master from LLM-1502-budgeting-change



igor-susic1 (Contributor) commented May 4, 2026

Replace the instant-kill token budget with a three-tier advisory system:

  • 80%: warning injected into model output ("Consider wrapping up")
  • 100%: exceeded notice ("Finishing current action, then stopping")
  • 150%: hard kill via SIGTERM (or explicit hardTokenBudget)

Changes:

  • Add TokenBudgetConfig + BudgetState interfaces
  • Add resolveBudgetConfig() helper
  • Add hardTokenBudget parameter to SubagentParams
  • checkBudget() state machine in spawnSubagent
  • Inject budget into subagent system prompt via env vars
  • Add SubagentBudgetInfo to buildSubagentSystemPrompt
  • formatBudgetSection renders soft/hard limits in prompt
  • Update system prompt guidance: omit budgets by default, warn about attachment token costs, explain soft vs hard
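A minimal sketch of the prompt rendering described above, covering the three prompt-transformer cases (no budget, soft+hard, soft only). The SubagentBudgetInfo field names (softTokenBudget, hardTokenBudget) and the wording of the rendered lines are assumptions, not the PR's actual formatBudgetSection output.

```typescript
// Assumed shape: the PR names SubagentBudgetInfo, but these field names are illustrative.
interface SubagentBudgetInfo {
  softTokenBudget?: number;
  hardTokenBudget?: number;
}

// Renders the budget section of the subagent system prompt.
// Returns "" when no soft budget is set, so the prompt omits the section entirely.
function formatBudgetSection(info?: SubagentBudgetInfo): string {
  if (!info || info.softTokenBudget === undefined) return "";
  const lines = [
    "## Token Budget",
    `- Soft limit: ${info.softTokenBudget} tokens (advisory: wrap up as you approach it).`,
  ];
  if (info.hardTokenBudget !== undefined) {
    lines.push(`- Hard limit: ${info.hardTokenBudget} tokens (the process is stopped at this point).`);
  }
  return lines.join("\n");
}
```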

Tests:

  • resolveBudgetConfig: 5 cases (null, zero, default 150%, explicit, fallback)
  • prompt-transformer: 3 cases (no budget, soft+hard, soft only)

All 96 tests pass. Type-check clean.


Kimchi Summary

What changed

Implements a two-tier token budgeting system for subagents with soft advisory limits (warnings at 80%, wrap-up at 100%) and hard kill ceilings (default 150% of soft). Adds real-time budget feedback injected into subagent conversations after each turn, plus safety limits on maximum turns and concurrent subagents.

Why

Prevents runaway token consumption in delegated subagent processes by giving the model visibility into its burn rate before hitting hard limits, while providing hard guardrails for the parent process.

Key changes

  • src/extensions/budget-feedback.ts: New extension that accumulates input/output tokens (excluding cache-read) after each turn and injects budget_warning or budget_exceeded messages into the conversation when crossing 80% or 100% of the soft budget
  • src/extensions/subagent.ts:
    • Add resolveBudgetConfig and checkBudgetState for soft/hard budget state machine
    • Add maxTurns parameter (default 40) and output loop detection (triggers when model reports "summary" 3+ times but continues making tool calls)
    • Enforce MAX_CONCURRENT_SUBAGENTS limit (20)
    • Pass budget configuration to child processes via KIMCHI_SUBAGENT_SOFT_BUDGET and KIMCHI_SUBAGENT_HARD_BUDGET environment variables
  • src/extensions/orchestration/prompt-transformer/: Add SubagentBudgetInfo interface and formatBudgetSection to include token discipline guidelines in subagent system prompts; update orchestrator prompts to explain soft vs. hard budget semantics
  • src/cli.ts: Register budgetFeedbackExtension in the CLI extension pipeline

Impact

  • Breaking change: The tokenBudget parameter is now a soft advisory cap rather than a hard kill threshold; use the new hardTokenBudget parameter for strict enforcement
  • New subagent parameters: hardTokenBudget (hard ceiling) and maxTurns (turn limit) added to the subagent tool schema
  • Behavior change: Subagents now receive injected budget status messages between turn boundaries (not mid-tool) when crossing thresholds; cache-read tokens are excluded from budget calculations
  • New failure reasons: Subagents may now exit with "max_turns_exceeded" or "output_loop" failure reasons

kimchi-review Bot commented May 4, 2026

Kimchi Code Review

Commit: afd8425
Author: @igor-susic1
Files changed: 0
Review status: Completed
Comments: 2 (2 info)
Duration: 41s

Summary

📊 Review Score: 92/100 (overall code quality — 0 lowest, 100 highest)
⏱️ Estimated effort to review: 3/5 (1 = trivial, 5 = very complex)

🧪 Tests: yes — Comprehensive test coverage added for parseSubagentBudgetFromEnv (validating NaN, Infinity, negative, and empty string handling) and resolveBudgetConfig (including NaN/Infinity propagation and default hard limit calculation at 150%). Integration tests for buildSubagentSystemPrompt verify correct budget section formatting. The budget enforcement logic in spawnSubagent is covered implicitly through the parameter resolution tests.

📝 Found 2 issue(s). See inline comments for details.


Interact with Kimchi
  • @kimchi review — re-trigger a full review on the latest commit
  • @kimchi summary — regenerate the PR summary
  • @kimchi ignore — skip this PR (no review will be posted)
  • Reply to any inline comment to ask follow-up questions or request clarification
Configuration

Reviews are configured by your organization admin.
Review instructions, excluded directories, and severity thresholds can be adjusted per repository in the Kimchi dashboard.


Powered by Kimchi — AI-powered code review by CAST AI

kimchi-review Bot left a comment

📊 Review Score: 88/100 (overall code quality — 0 lowest, 100 highest)
⏱️ Estimated effort to review: 3/5 (1 = trivial, 5 = very complex)

🧪 Tests: yes — Comprehensive test coverage added for resolveBudgetConfig logic (default hard limits, explicit overrides, null handling) and buildSubagentSystemPrompt budget section formatting. Tests validate both happy paths and edge cases like zero/negative inputs.

📝 Found 3 issue(s). See inline comments for details.

Comment thread src/extensions/subagent.ts
Comment thread src/extensions/orchestration/prompt-enrichment.ts
Comment thread src/extensions/subagent.ts
igor-susic1 (Contributor, Author)

@kimchi review

kimchi-review Bot commented May 4, 2026

🔄 Starting review on afd8425
Triggered by @igor-susic1 via the @kimchi review command.

kimchi-review Bot left a comment

📊 Review Score: 92/100 (overall code quality — 0 lowest, 100 highest)
⏱️ Estimated effort to review: 3/5 (1 = trivial, 5 = very complex)

🧪 Tests: yes — Comprehensive test coverage added for parseSubagentBudgetFromEnv (validating NaN, Infinity, negative, and empty string handling) and resolveBudgetConfig (including NaN/Infinity propagation and default hard limit calculation at 150%). Integration tests for buildSubagentSystemPrompt verify correct budget section formatting. The budget enforcement logic in spawnSubagent is covered implicitly through the parameter resolution tests.

📝 Found 2 issue(s). See inline comments for details.

Comment thread src/extensions/subagent.ts
Comment thread src/extensions/subagent.ts
igor-susic1 force-pushed the LLM-1502-budgeting-change branch from 71d7e92 to fa28d7f on May 5, 2026 06:20
igor-susic1 and others added 9 commits May 5, 2026 10:40
Replace the instant-kill token budget with a three-tier advisory system:

- 80%: warning injected into model output ("Consider wrapping up")
- 100%: exceeded notice ("Finishing current action, then stopping")
- 150%: hard kill via SIGTERM (or explicit hardTokenBudget)

Changes:
- Add TokenBudgetConfig + BudgetState interfaces
- Add resolveBudgetConfig() helper
- Add hardTokenBudget parameter to SubagentParams
- checkBudget() state machine in spawnSubagent
- Inject budget into subagent system prompt via env vars
- Add SubagentBudgetInfo to buildSubagentSystemPrompt
- formatBudgetSection renders soft/hard limits in prompt
- Update system prompt guidance: omit budgets by default,
  warn about attachment token costs, explain soft vs hard

Tests:
- resolveBudgetConfig: 5 cases (null, zero, default 150%, explicit, fallback)
- prompt-transformer: 3 cases (no budget, soft+hard, soft only)

All 96 tests pass. Type-check clean.

Co-Authored-By: Kimchi <noreply@kimchi.dev>
- resolveBudgetConfig: guard against NaN/Infinity for both tokenBudget and hardTokenBudget
- spawnSubagent: use conditional spreading to avoid setting empty env vars
- prompt-enrichment: replace inline Number() parsing with parseSubagentBudgetFromEnv validator
- parseSubagentBudgetFromEnv: new exported helper that validates env var strings with Number.isFinite() and positive checks, returns undefined on any invalid input
- Add 4 NaN/Infinity tests for resolveBudgetConfig
- Add 10 validation tests for parseSubagentBudgetFromEnv (undefined, empty, NaN, non-numeric, negative, valid, valid with invalid hard limit)

Total: 110 tests pass. Type-check clean.

Co-Authored-By: Kimchi <noreply@kimchi.dev>
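A sketch of the env-var validation this commit describes, assuming the variable names KIMCHI_SUBAGENT_SOFT_BUDGET and KIMCHI_SUBAGENT_HARD_BUDGET from the PR summary. The { soft, hard } return shape, and keeping a valid soft budget when only the hard value is invalid, are assumptions; the actual helper may treat that case differently.

```typescript
type Env = Record<string, string | undefined>;

// Assumed return shape for illustration.
interface SubagentBudget {
  soft: number;
  hard?: number;
}

// Accepts only positive, finite numbers; "", "NaN", "Infinity", "abc", "-5" all yield undefined.
function parsePositiveFinite(raw: string | undefined): number | undefined {
  if (raw === undefined || raw.trim() === "") return undefined;
  const n = Number(raw);
  return Number.isFinite(n) && n > 0 ? n : undefined;
}

function parseSubagentBudgetFromEnv(env: Env): SubagentBudget | undefined {
  const soft = parsePositiveFinite(env.KIMCHI_SUBAGENT_SOFT_BUDGET);
  if (soft === undefined) return undefined; // no usable soft budget: treat as "no budget"
  const hard = parsePositiveFinite(env.KIMCHI_SUBAGENT_HARD_BUDGET);
  // An invalid hard value is dropped rather than propagated as NaN.
  return hard === undefined ? { soft } : { soft, hard };
}
```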
Extract the budget state machine from the spawnSubagent closure into
a pure exported function checkBudgetState(). This makes the runtime
budget logic independently unit-testable.

- checkBudgetState(input, output, config, currentState) → { state, warning, kill }
- Soft/exceeded/hard-kill logic carried over unchanged from the closure
- Export SoftBudgetState, TokenBudgetConfig for test types
- spawnSubagent delegates to checkBudgetState for warnings and kill decision

Tests:
- no budget config → no-op
- 80% trigger → warning
- no double-warning from warning state
- no warning from exceeded state
- 100% trigger → exceeded notice
- 150% trigger → hard kill
- kill from normal state if jumped past limit
- state unchanged between 80%–100%
- state unchanged below 80%

Total: 119 tests pass. Type-check clean.

Co-Authored-By: Kimchi <noreply@kimchi.dev>
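The state machine extracted in this commit can be sketched as a pure function like the one below. The { soft, hard } config fields and the notice wording are assumptions; the thresholds (80% warning, 100% exceeded, hard-limit kill) and the { state, warning, kill } result follow the commit message.

```typescript
type SoftBudgetState = "normal" | "warning" | "exceeded";

// Assumed field names: soft = tokenBudget, hard = resolved hard limit.
interface TokenBudgetConfig {
  soft: number;
  hard: number;
}

interface BudgetCheckResult {
  state: SoftBudgetState;
  warning?: string; // set only when a soft threshold is newly crossed
  kill: boolean; // true once usage reaches the hard limit
}

function checkBudgetState(
  input: number,
  output: number,
  config: TokenBudgetConfig | null,
  currentState: SoftBudgetState,
): BudgetCheckResult {
  if (!config) return { state: currentState, kill: false }; // no budget: no-op
  const used = input + output;
  if (used >= config.hard) {
    // Checked first so a jump straight past the hard limit still kills.
    return { state: "exceeded", kill: true };
  }
  if (used >= config.soft && currentState !== "exceeded") {
    return {
      state: "exceeded",
      warning: `Token budget exceeded (${used}/${config.soft}). Finishing current action, then stopping.`,
      kill: false,
    };
  }
  if (used >= 0.8 * config.soft && currentState === "normal") {
    return {
      state: "warning",
      warning: `Token budget at 80%+ (${used}/${config.soft}). Consider wrapping up.`,
      kill: false,
    };
  }
  return { state: currentState, kill: false }; // below 80%, or already notified
}
```

Because the state only moves forward (normal, then warning, then exceeded), calling this after every usage update yields each notice at most once.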
…tConfig

Guard against the user setting a hardTokenBudget lower than the soft
(tokenBudget) limit, which would cause the hard kill to fire before
any soft warnings, silently suppressing the 80% warning and exceeded
notice.

- resolveBudgetConfig: Math.max(hardLimit, tokenBudget) so hard is always >= soft
- add tests for clamping below-soft case and equal-to-soft case
- all 762 unit tests pass

Co-Authored-By: Kimchi <noreply@kimchi.dev>
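A sketch of resolveBudgetConfig combining the guards from this and the earlier commits: an invalid soft budget (null, 0, NaN, Infinity, negative) disables budgeting, an invalid or missing hard budget falls back to 150% of the soft budget, and the result is clamped so hard >= soft. The parameter and field names are assumptions, and no rounding is applied to the 150% default.

```typescript
// Assumed field names for illustration.
interface TokenBudgetConfig {
  soft: number;
  hard: number;
}

const DEFAULT_HARD_MULTIPLIER = 1.5; // "default 150%" from the PR description

function isPositiveFinite(n: number | null | undefined): n is number {
  return typeof n === "number" && Number.isFinite(n) && n > 0;
}

function resolveBudgetConfig(
  tokenBudget?: number | null,
  hardTokenBudget?: number | null,
): TokenBudgetConfig | null {
  if (!isPositiveFinite(tokenBudget)) return null; // null, 0, NaN, Infinity, negative: no budget
  const requestedHard = isPositiveFinite(hardTokenBudget)
    ? hardTokenBudget
    : tokenBudget * DEFAULT_HARD_MULTIPLIER; // fallback for missing or invalid hard limit
  // Clamp so the hard kill can never fire before the soft warnings.
  return { soft: tokenBudget, hard: Math.max(requestedHard, tokenBudget) };
}
```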
Removes the nested checkBudget closure inside spawnSubagent and inlines
the budget-check logic directly into processLine. This eliminates a
TS2304 false positive where the CI TypeScript compiler could not resolve
symbols (hideThinkingBlock, filterOutputTags, stripOutputTagWrappers)
inside the closure, even though local tsc accepted it. The inlining
produces identical runtime behavior.

Co-Authored-By: Kimchi <noreply@kimchi.dev>
The previous documentation falsely claimed the subagent "receives a
warning at 80% and a wrap-up notice at 100%, giving it a chance to
finish gracefully." This was never true — the warnings are injected into
the parent's output stream, not sent back to the subagent. The subagent
only knows the budget limits from its static system prompt and has no
runtime usage feedback.

Changes:

- orchestrator-system-prompt.ts: Reword tokenBudget section to state that
the subagent knows limits from its system prompt only, with no runtime
feedback. Warnings are for the parent, not the child.

- prompt-transformer.ts formatBudgetSection: Reword subagent budget section
to tell the subagent it won't receive runtime usage updates and should
size its work to fit the budget upfront.

- subagent.ts tokenBudget description: Clarify that (1) only uncached
input + output tokens count (cache-read excluded), (2) warnings are
shown in parent output, (3) subagent knows limits from system prompt
only, with no real-time feedback.

- Deleted stray untracked kimchi-session-*.html files from working tree.

All tests pass. Lint + typecheck clean.

Co-Authored-By: Kimchi <noreply@kimchi.dev>
   The previous stdin-pipe approach (commit 2477f2e) was broken: opening the
   subagent's stdin as a pipe caused pi-coding-agent's print-mode startup to
   block forever inside readPipedStdin, which waits for stdin EOF before
   proceeding. The parent never closes its write end during the subagent's
   lifetime, so the subagent never reached its model loop.

   This commit replaces that mechanism with an in-subagent turn_end handler
   that needs no IPC. The subagent reads its own usage data (already flowing
   through the pi-coding-agent event bus), maintains a local state machine,
   and on transitions across the soft-budget thresholds (80%, 100%) injects
   a steering user message into its own conversation via pi.sendMessage.
   The model sees the warning before its next LLM call.

   Parent side (subagent.ts):
   - Revert stdio to ["ignore", "pipe", "pipe"] — fixes the startup hang.
   - Remove KIMCHI_SUBAGENT_SUPPORTS_BUDGET_FEEDBACK env var (unused now).
   - Remove the stdin-write block; keep the existing parent-side warning
     text in `accumulated` for the human watching kimchi's terminal.
   - Hard-cap kill path is unchanged (still the safety net).
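The parent-side spawn options can be sketched as a pure helper. buildSubagentSpawnOptions is hypothetical; the returned object is meant to be passed as the options argument of Node's child_process.spawn. It shows the ["ignore", "pipe", "pipe"] stdio layout and the conditional spreading that leaves budget variables unset rather than empty.

```typescript
type Env = Record<string, string | undefined>;

interface SubagentSpawnOptions {
  // stdin ignored: a piped stdin made print-mode startup block waiting for EOF.
  stdio: ["ignore", "pipe", "pipe"];
  env: Env;
}

// Hypothetical helper: builds options for child_process.spawn(command, args, options).
function buildSubagentSpawnOptions(
  baseEnv: Env,
  budget: { soft?: number; hard?: number },
): SubagentSpawnOptions {
  return {
    stdio: ["ignore", "pipe", "pipe"],
    env: {
      ...baseEnv,
      // Conditional spreading: an unset budget adds no variable at all (never "" or "undefined").
      ...(budget.soft !== undefined ? { KIMCHI_SUBAGENT_SOFT_BUDGET: String(budget.soft) } : {}),
      ...(budget.hard !== undefined ? { KIMCHI_SUBAGENT_HARD_BUDGET: String(budget.hard) } : {}),
    },
  };
}
```

On the hard-cap path, the parent would then stop the child with the ChildProcess kill("SIGTERM") method.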

   Subagent side (budget-feedback.ts, rewritten):
   - Listens on pi.on("turn_end") instead of process.stdin readline.
   - Pure helpers parseBudgetConfig / nextBudgetState / buildWarningText
     for unit-testability.
   - Edge-triggered: only injects on state transitions (not every turn).
   - Cache-read tokens excluded, mirroring the parent's resolveBudgetConfig.
   - display: false — warning enters the model's context but not the UI.

   Docs:
   - orchestrator-system-prompt: add latency caveat ("delivered between
     tool rounds, not mid-tool").
   - prompt-transformer formatBudgetSection: same caveat.
   - subagent.ts tokenBudget description: same caveat.

   Tests:
   - 22 unit tests in budget-feedback.test.ts covering pure helpers and
     the turn_end handler (no-op gates, edge transitions, cache-read
     exclusion, normal-to-exceeded jump, missing usage).
   - All existing tests still pass: 784 total, lint + typecheck clean.

   Verified end-to-end: subagent boots cleanly with stdin ignored,
   turn_end handler fires after each turn, warning/exceeded messages
   inject correctly when usage crosses thresholds, no re-injection on
   subsequent turns in the same state.
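The edge-triggered, in-subagent feedback loop can be sketched independently of the pi-coding-agent API. The handler below takes per-turn usage and a send callback; wiring it to pi.on("turn_end") and pi.sendMessage is left out because their payload and options are not spelled out here, so the TurnUsage shape and the message text are assumptions.

```typescript
type FeedbackState = "normal" | "warning" | "exceeded";

// Assumed per-turn usage shape; the real turn_end payload may differ.
interface TurnUsage {
  input: number;
  output: number;
  cacheRead?: number;
}

// Pure transition: states only move forward, never back to "normal".
function nextBudgetState(used: number, soft: number, current: FeedbackState): FeedbackState {
  if (used >= soft) return "exceeded";
  if (used >= 0.8 * soft && current === "normal") return "warning";
  return current;
}

// Only called on transitions into "warning" or "exceeded".
function buildWarningText(state: FeedbackState, used: number, soft: number): string {
  if (state === "exceeded") {
    return `[budget_exceeded] ${used}/${soft} tokens used. Finish the current action, then stop.`;
  }
  return `[budget_warning] ${used}/${soft} tokens used (80%+). Consider wrapping up.`;
}

// Returns a per-turn callback; `send` stands in for injecting a hidden message.
function createBudgetFeedbackHandler(soft: number, send: (text: string) => void) {
  let used = 0;
  let state: FeedbackState = "normal";
  return (usage: TurnUsage | undefined): void => {
    if (!usage) return; // missing usage: skip this turn
    used += usage.input + usage.output; // cache-read tokens are deliberately excluded
    const next = nextBudgetState(used, soft, state);
    if (next !== state) {
      // Edge-triggered: inject only on a transition, not on every turn.
      state = next;
      send(buildWarningText(next, used, soft));
    }
  };
}
```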
igor-susic1 force-pushed the LLM-1502-budgeting-change branch from b98b0ff to 3bcaf3e on May 5, 2026 08:47
igor-susic1 and others added 4 commits May 5, 2026 10:57
… prompt, and hard concurrency cap

- budget-feedback.ts: always activates in subagent mode; emits actionable
  usage reports every turn even when no tokenBudget is set. Reports include
  concrete STOP instructions (>50K tokens/turn threshold).

- prompt-transformer.ts: {{BUDGET}} now renders a ## Token Usage Tracking
  section with 4 actionable rules when no budget is set. When budget is set,
  renders ## Token Budget with 5 discipline rules (pace at 50%, stop at 80%,
  hard ceiling non-negotiable, return early to parent for fresh agent).

- subagent-system-prompt.ts: opening paragraph now frames token efficiency
  as the PRIMARY CONSTRAINT and explicitly says do NOT over-investigate.

- subagent.ts: adds MAX_CONCURRENT_SUBAGENTS = 50 hard cap enforced at
  execute time. Returns an error if cap is reached. Tracks active count
  with ++/-- around spawnSubagent() including .finally() cleanup.

- budget-feedback.test.ts: adds tests for usage reporting without budget.

All tests pass (781/781 + 3 skipped).
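A sketch of the concurrency cap, using the later value of 20. runWithConcurrencyCap and its { error } result are hypothetical stand-ins for the execute-time check; a try/finally is used here in place of .finally(), which gives the same decrement on success or failure and also covers a synchronous throw from the spawn function.

```typescript
const MAX_CONCURRENT_SUBAGENTS = 20; // introduced as 50, lowered to 20 in a later commit

let activeSubagents = 0;

// Hypothetical wrapper; `spawn` stands in for spawnSubagent().
async function runWithConcurrencyCap<T>(spawn: () => Promise<T>): Promise<T | { error: string }> {
  if (activeSubagents >= MAX_CONCURRENT_SUBAGENTS) {
    // Refuse at execute time instead of queueing.
    return { error: `Subagent limit reached (${MAX_CONCURRENT_SUBAGENTS} running).` };
  }
  activeSubagents++;
  try {
    return await spawn();
  } finally {
    activeSubagents--; // runs on success, failure, or a synchronous throw
  }
}
```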
- Add maxTurns parameter (default 40) to subagent tool. When exceeded,
  kills the subagent with 'max_turns_exceeded' reason.
- Add output loop detection: if model says 'summary' 3+ times while
  continuing to make tool calls, kills with 'output_loop' reason.
- Add 'max_turns_exceeded' and 'output_loop' to SubagentFailureReason.
- Add tests for resolveBudgetConfig (was missing coverage).

These limits prevent runaway subagents that have finished their task
but keep making tool calls in a loop (observed: 524+ turns, 20M+
tokens consumed after first '{summary}' response).

Co-Authored-By: Kimchi <noreply@kimchi.dev>
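The two new limits can be sketched as a per-turn guard. The TurnInfo shape, and detecting the loop with a case-insensitive substring match on "summary" in turns that still make tool calls, are assumptions about how the check is implemented; the default of 40 turns and the threshold of 3 come from the commit message.

```typescript
// Only the two reasons added in this commit; the real SubagentFailureReason has more members.
type TurnGuardFailure = "max_turns_exceeded" | "output_loop";

interface TurnInfo {
  text: string; // assistant text produced in this turn
  toolCalls: number; // tool calls made in this turn
}

const DEFAULT_MAX_TURNS = 40;
const SUMMARY_LOOP_THRESHOLD = 3;

function createTurnGuard(maxTurns: number = DEFAULT_MAX_TURNS) {
  let turns = 0;
  let summariesWithToolCalls = 0;
  // Call once per completed turn; a defined result means "kill with this reason".
  return (turn: TurnInfo): TurnGuardFailure | undefined => {
    turns++;
    if (turns > maxTurns) return "max_turns_exceeded";
    // Naive heuristic: the model says it is summarizing but keeps calling tools.
    if (turn.toolCalls > 0 && turn.text.toLowerCase().includes("summary")) {
      summariesWithToolCalls++;
      if (summariesWithToolCalls >= SUMMARY_LOOP_THRESHOLD) return "output_loop";
    }
    return undefined;
  };
}
```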
After review, 20 concurrent subagents is a more conservative default
that still allows parallel research/analysis while preventing
fork-bombing the host. Can be raised later if needed.

Co-Authored-By: Kimchi <noreply@kimchi.dev>

Labels

None yet

Projects

None yet


1 participant