Cap append-only JSONL logs via a shared dream-driven retention pass #461

Merged
rockfordlhotka merged 3 commits into main from claude/rockbot-agent-log-growth-Trynw on Jun 5, 2026

@rockfordlhotka

Member

The skill-usage, tool-call, feedback, skill-resource-usage, and
wisp-executions JSONL logs were append-only with no rotation, so they
grew without bound (per-session directories accumulated files forever;
the two global files grew line-by-line indefinitely).

Add a shared JsonlLogRetention helper plus an opt-in IPrunableLog
contract. Each file-backed store implements IPrunableLog and delegates
to the helper with its own on-disk layout: per-session directories drop
aged session files and then cap the file count; single-file logs are trimmed
to a trailing line budget, serialized against their writer so a trim cannot
interleave with an append. DreamService resolves all registered IPrunableLog
instances and prunes them once per cycle, ahead of the memory-count early
return, so retention runs on every cycle. New DreamOptions knobs
(LogRetentionEnabled, MaxFileAge, MaxFilesPerDirectory, MaxLinesPerFile)
control the policy; no new background service is introduced.

https://claude.ai/code/session_012sTRRT47bKJwQBSksmFuLm
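The two on-disk layouts and the once-per-cycle pass described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the PR's C# code: `PrunableLog`, `prune_session_directory`, `trim_to_trailing_lines`, `SessionDirectoryLog`, `SingleFileLog`, and `run_retention_pass` are hypothetical names, a non-positive limit is assumed to disable that dimension, and the temp-file-plus-`os.replace` rewrite is this sketch's own choice.

```python
from __future__ import annotations

import os
import threading
import time
from collections import deque
from pathlib import Path
from typing import Iterable, Protocol


class PrunableLog(Protocol):
    """Hypothetical Python stand-in for the IPrunableLog contract."""

    def prune(self, max_age_s: float, max_files: int, max_lines: int) -> None: ...


def prune_session_directory(directory: Path, max_age_s: float, max_files: int) -> list[Path]:
    """Drop session files older than max_age_s, then keep only the newest max_files.

    Assumption: a non-positive limit disables that dimension.
    """
    now = time.time()
    removed: list[Path] = []
    survivors: list[Path] = []
    for path in directory.glob("*.jsonl"):
        if max_age_s > 0 and now - path.stat().st_mtime > max_age_s:
            path.unlink()
            removed.append(path)
        else:
            survivors.append(path)
    if max_files > 0 and len(survivors) > max_files:
        survivors.sort(key=lambda p: p.stat().st_mtime)  # oldest first
        for path in survivors[: len(survivors) - max_files]:
            path.unlink()
            removed.append(path)
    return removed


def trim_to_trailing_lines(path: Path, max_lines: int, writer_lock: threading.Lock) -> bool:
    """Keep only the last max_lines lines while holding the writer's lock."""
    if max_lines <= 0 or not path.exists():
        return False
    with writer_lock:  # appends wait until the rewrite is finished
        total = 0
        tail: deque = deque(maxlen=max_lines)  # retains only the trailing lines
        with path.open("r", encoding="utf-8") as f:
            for line in f:
                total += 1
                tail.append(line)
        if total <= max_lines:
            return False
        tmp = path.with_name(path.name + ".tmp")
        with tmp.open("w", encoding="utf-8") as f:
            f.writelines(tail)
        os.replace(tmp, path)  # swap the trimmed copy into place
        return True


class SessionDirectoryLog:
    """Per-session layout: one {sessionId}.jsonl file per session in a directory."""

    def __init__(self, directory: Path) -> None:
        self.directory = directory

    def prune(self, max_age_s: float, max_files: int, max_lines: int) -> None:
        prune_session_directory(self.directory, max_age_s, max_files)


class SingleFileLog:
    """Global layout: one file that grows line by line."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.lock = threading.Lock()  # the same lock the append path would take

    def prune(self, max_age_s: float, max_files: int, max_lines: int) -> None:
        trim_to_trailing_lines(self.path, max_lines, self.lock)


def run_retention_pass(logs: Iterable[PrunableLog], enabled: bool,
                       max_age_s: float, max_files: int, max_lines: int) -> None:
    """Run once per dream cycle, before any early return, so it runs every cycle."""
    if not enabled:
        return
    for log in logs:
        log.prune(max_age_s, max_files, max_lines)
```

Pruning by age before count means the count cap only removes files when many sessions land inside the age window, which is why it works as a backstop rather than the primary control.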

claude and others added 3 commits June 3, 2026 06:00
The retention knobs added by this PR were never bound to configuration:
WithDreaming() took no callback, so DreamOptions (including the new
LogRetention* values and the pre-existing Dream:CronSchedule) always held their
defaults. Bind the Dream section in Program.cs and surface the four knobs as
agent.logRetention.* Helm values plus Dream__* ConfigMap keys. The ConfigMap uses
`dig` rather than `default` so that an explicit false or 0 (either of which
disables a dimension) is honoured instead of being swallowed as "empty".

Values are sized from observed live traffic: maxFileAge 30d (floored at the
widest dream query window so pruning never starves a pass), maxFilesPerDirectory
1000 (backstop; age pruning is the real control), maxLinesPerFile 10000 (~11 MB
for the wisp log at ~1.1 KB/line, vs the never-trimming 50k code default).
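The sizing above is plain arithmetic, shown here as a small Python sketch. The ~1.1 KB/line figure comes from the description; the 14-day query window is a made-up placeholder, because the widest dream query window is not given, and the description does not say whether the age floor is enforced in code or was only applied when choosing 30d.

```python
from datetime import timedelta


def effective_max_file_age(configured: timedelta, widest_query_window: timedelta) -> timedelta:
    """Floor the age limit so pruning never deletes files a dream pass could still query."""
    return max(configured, widest_query_window)


def capped_log_bytes(max_lines: int, bytes_per_line: int) -> int:
    """Approximate size of a single-file log trimmed to max_lines, given an average line size."""
    return max_lines * bytes_per_line


WISP_BYTES_PER_LINE = 1_100  # ~1.1 KB per wisp-executions line, as observed

print(capped_log_bytes(10_000, WISP_BYTES_PER_LINE) / 1e6)  # 11.0 (MB) at the deployed cap
print(capped_log_bytes(50_000, WISP_BYTES_PER_LINE) / 1e6)  # 55.0 (MB) at the 50k code default
print(effective_max_file_age(timedelta(days=30), timedelta(days=14)).days)  # 30 (placeholder window)
```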

Also close the persistent-session gap: age/count pruning never reaps a
continuously-written {sessionId}.jsonl (blazor-session, cli-session) because it
is never aged out and never the oldest file — on the live cluster the UI
session's tool-call log alone was 2 MB and growing. Add
JsonlLogRetention.TrimSessionFilesAsync, which line-trims each surviving session
file to MaxLinesPerFile while holding the writer's own per-session semaphore so a
trim can never race an append; wire it into the three per-session stores after
their age/count prune. A byte-size gate skips small ephemeral files without reading them.
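A minimal Python sketch of the per-session line trim, assuming `threading.Lock` as a stand-in for the store's per-session semaphore. `SessionLocks`, `append_line`, `trim_session_files`, and the `min_bytes` threshold are hypothetical; the PR does not state the gate's size. The essential property is that the trimmer takes the lock keyed by the same session id the writer uses, so a continuously written `{sessionId}.jsonl` is shortened without ever racing an append.

```python
from __future__ import annotations

import os
import threading
from collections import deque
from pathlib import Path


class SessionLocks:
    """Per-session lock map shared by the writer and the trimmer (hypothetical)."""

    def __init__(self) -> None:
        self._guard = threading.Lock()
        self._locks: dict[str, threading.Lock] = {}

    def for_session(self, session_id: str) -> threading.Lock:
        with self._guard:
            return self._locks.setdefault(session_id, threading.Lock())


def append_line(directory: Path, session_id: str, line: str, locks: SessionLocks) -> None:
    """Writer path: append one JSONL record under the session's lock."""
    with locks.for_session(session_id):
        with (directory / f"{session_id}.jsonl").open("a", encoding="utf-8") as f:
            f.write(line.rstrip("\n") + "\n")


def trim_session_files(directory: Path, max_lines: int, min_bytes: int,
                       locks: SessionLocks) -> list[str]:
    """Line-trim each surviving {sessionId}.jsonl to its last max_lines lines.

    Files smaller than min_bytes are skipped without being opened (the byte-size gate).
    Returns the session ids that were actually trimmed.
    """
    trimmed: list[str] = []
    if max_lines <= 0:
        return trimmed
    for path in sorted(directory.glob("*.jsonl")):
        if path.stat().st_size < min_bytes:
            continue
        session_id = path.stem  # lock key: the same session id the writer uses
        with locks.for_session(session_id):
            total = 0
            tail: deque = deque(maxlen=max_lines)
            with path.open("r", encoding="utf-8") as f:
                for line in f:
                    total += 1
                    tail.append(line)
            if total <= max_lines:
                continue
            tmp = path.with_name(path.name + ".tmp")
            with tmp.open("w", encoding="utf-8") as f:
                f.writelines(tail)
            os.replace(tmp, path)
            trimmed.append(session_id)
    return trimmed
```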

Tests: per-session line-trim (over-budget trimmed, under-budget skipped, correct
lock key, store integration) and config-binding regression (TimeSpan/bool/0
shapes from the ConfigMap). Docs updated (dream-service, agent-host, values).
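The config-binding regression concerns string values arriving through `Dream__*` environment keys. In .NET configuration, a double underscore in an environment variable name maps to the `:` section separator, so `Dream__MaxFileAge` binds to `Dream:MaxFileAge`. The Python sketch below mimics that binding for the TimeSpan, bool, and 0 shapes. The dataclass, its defaults (other than the 50k line cap mentioned above), and the assumption that the keys mirror the DreamOptions property names are all illustrative, and only a subset of .NET TimeSpan text is parsed.

```python
import re
from dataclasses import dataclass
from datetime import timedelta
from typing import Mapping


@dataclass
class DreamLogRetention:
    # Placeholder defaults; only the 50k line cap is mentioned in the PR.
    log_retention_enabled: bool = True
    max_file_age: timedelta = timedelta(days=30)
    max_files_per_directory: int = 1000
    max_lines_per_file: int = 50_000


# Subset of .NET TimeSpan text: "[d.]hh:mm[:ss]", e.g. "30.00:00:00" for 30 days.
_TIMESPAN = re.compile(r"^(?:(\d+)\.)?(\d{1,2}):(\d{2})(?::(\d{2}))?$")


def parse_timespan(text: str) -> timedelta:
    match = _TIMESPAN.match(text.strip())
    if match is None:
        raise ValueError(f"unsupported TimeSpan text: {text!r}")
    days, hours, minutes, seconds = match.groups()
    return timedelta(days=int(days or 0), hours=int(hours),
                     minutes=int(minutes), seconds=int(seconds or 0))


def parse_bool(text: str) -> bool:
    lowered = text.strip().lower()
    if lowered not in ("true", "false"):
        raise ValueError(f"not a boolean: {text!r}")
    return lowered == "true"


def bind_dream_section(env: Mapping[str, str]) -> DreamLogRetention:
    """Bind Dream__* keys onto the options; absent keys keep their defaults.

    Keys are matched case-sensitively here, a simplification.
    """
    prefix = "Dream__"
    section = {key[len(prefix):]: value for key, value in env.items() if key.startswith(prefix)}
    options = DreamLogRetention()
    if "LogRetentionEnabled" in section:
        options.log_retention_enabled = parse_bool(section["LogRetentionEnabled"])
    if "MaxFileAge" in section:
        options.max_file_age = parse_timespan(section["MaxFileAge"])
    if "MaxFilesPerDirectory" in section:
        options.max_files_per_directory = int(section["MaxFilesPerDirectory"])
    if "MaxLinesPerFile" in section:
        options.max_lines_per_file = int(section["MaxLinesPerFile"])
    return options


configmap_env = {"Dream__LogRetentionEnabled": "false", "Dream__MaxFileAge": "30.00:00:00",
                 "Dream__MaxFilesPerDirectory": "0", "Dream__MaxLinesPerFile": "10000"}
print(bind_dream_section(configmap_env))
```

An explicit `"false"` or `"0"` must survive binding as a real value; a regression test of this shape would have caught the original unbound `WithDreaming()` call, since every field would still hold its default.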

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Tag for the log-retention image deployed to the live cluster for testing.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
rockfordlhotka merged commit 28c7e0a into main on Jun 5, 2026
1 check passed
rockfordlhotka deleted the claude/rockbot-agent-log-growth-Trynw branch on June 5, 2026 06:44