Summary
FileWispExecutionLog stores every wisp execution as one line in a single append-only wisp-executions.jsonl. Several hot-path operations call ReadAllAsync, which reads and JSON-deserializes the entire file on every call:
- FindRecentFailureAsync: invoked for every successful wisp (retry detection in SpawnWispsExecutor.LogExecutionAsync).
- QueryRecentAsync: invoked by eager scheduled-task promotion for every successful patrol/* wisp.
- GetCanonicalBodyAsync: definition-body lookup.
The per-wisp logging cost therefore grows linearly with file size, so the cumulative cost is O(n²) as the log accumulates. A burst of wisp activity (or a runaway, as in the 2026-06-05 incident) makes every subsequent wisp progressively more expensive, because each one pays for a full-file read and parse. That burns CPU and amplifies any spike.
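A rough cost model for the O(n²) claim, assuming exactly one full ReadAllAsync per logged wisp, a log that starts empty, and a constant per-line read-and-parse cost c (simplifying assumptions, not measurements):

```latex
\text{cost of logging wisp } k \;\approx\; c\,k,
\qquad
\text{total for } n \text{ wisps} \;\approx\; \sum_{k=1}^{n} c\,k \;=\; \frac{c\,n(n+1)}{2} \;=\; \Theta(n^2)
```

For scale, at n = 400,000 lines the sum is 400,000 × 400,001 / 2 ≈ 8 × 10^10 line parses under this model. Wisps that trigger more than one of the three calls would multiply that by a constant factor.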
Why it matters
This is the last open item from the 2026-06-05 runaway investigation. The trim-loop fix (#462) and the dispatch circuit breaker (#463) stop the cause of unbounded dispatch; the JSONL retention pass (#461) caps the file size. But even at a bounded size, re-parsing the whole file on every successful wisp is wasteful, and it actively worsened the incident (each new wisp scanned a 400k-line file). Retention bounds the blast radius; it does not remove the O(n) per-wisp read.
Evidence
- FileWispExecutionLog.ReadAllAsync → File.ReadAllLinesAsync + per-line JsonSerializer.Deserialize over the whole file.
- Called from FindRecentFailureAsync / QueryRecentAsync / GetCanonicalBodyAsync, all on the success path of SpawnWispsExecutor.
Proposed directions (for discussion)
- In-memory index/cache of recent records (bounded ring or last-N-by-timestamp), refreshed on append, so FindRecentFailureAsync/QueryRecentAsync don't touch disk per call. The retention pass already bounds the on-disk size; the in-memory view can mirror it.
- Tail-read instead of full-read for the "recent" queries — read only the last N KB / N lines, since all three consumers want recent records, not the full history.
- Index by definition hash (e.g. a companion map) so FindRecentFailureAsync/GetCanonicalBodyAsync are O(1)-ish rather than full scans.
- Longer term: move off a single flat JSONL to a small embedded store if query patterns grow.
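To make the in-memory and index-by-hash directions concrete, here is a minimal sketch of a bounded window of recent records with a companion map keyed by definition hash. WispRecord, RecentWispIndex, OnAppend and the method signatures are hypothetical names for illustration; the real record type and the parameters of FindRecentFailureAsync/QueryRecentAsync may differ.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record shape; the real log entry type may carry other fields.
public sealed record WispRecord(string DefinitionHash, bool Succeeded, DateTimeOffset Timestamp);

// Bounded in-memory view of the newest records, updated on every append.
public sealed class RecentWispIndex
{
    private readonly int _capacity;
    private readonly Queue<WispRecord> _recent = new();
    // Latest failure per definition hash, limited to records still in the window.
    private readonly Dictionary<string, WispRecord> _lastFailure = new();
    private readonly object _gate = new();

    public RecentWispIndex(int capacity)
    {
        if (capacity <= 0) throw new ArgumentOutOfRangeException(nameof(capacity));
        _capacity = capacity;
    }

    // Call after a record has been appended to the JSONL file.
    public void OnAppend(WispRecord record)
    {
        lock (_gate)
        {
            _recent.Enqueue(record);
            if (!record.Succeeded) _lastFailure[record.DefinitionHash] = record;

            while (_recent.Count > _capacity)
            {
                var evicted = _recent.Dequeue();
                // Drop the map entry only if it still points at the evicted record;
                // a newer failure for the same hash must survive.
                if (_lastFailure.TryGetValue(evicted.DefinitionHash, out var failure)
                    && ReferenceEquals(failure, evicted))
                {
                    _lastFailure.Remove(evicted.DefinitionHash);
                }
            }
        }
    }

    // O(1) lookup instead of a full-file scan.
    public WispRecord? FindRecentFailure(string definitionHash, DateTimeOffset notBefore)
    {
        lock (_gate)
        {
            return _lastFailure.TryGetValue(definitionHash, out var failure)
                   && failure.Timestamp >= notBefore
                ? failure
                : null;
        }
    }

    // Returns up to `count` of the newest records, oldest first.
    public IReadOnlyList<WispRecord> QueryRecent(int count)
    {
        lock (_gate)
        {
            return _recent.Skip(Math.Max(0, _recent.Count - count)).ToList();
        }
    }
}
```

The window would need to be seeded once at startup (for example from a single read of the retained file), and anything older than the window still requires a disk read, so the capacity should cover the longest look-back the callers actually use.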
Any of these should come with a benchmark/test asserting per-append cost stays flat as the log grows.
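As one way to picture the tail-read direction together with the flat-cost requirement, the sketch below reads only the last maxBytes of the file and reports how many bytes it touched. JsonlTail and ReadTailLines are hypothetical names; the sketch assumes UTF-8 content and that every record ends with a newline.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class JsonlTail
{
    // Returns the complete lines found in the last maxBytes of the file, oldest first.
    // bytesRead reports how much of the file was actually read.
    public static List<string> ReadTailLines(string path, int maxBytes, out int bytesRead)
    {
        if (maxBytes <= 0) throw new ArgumentOutOfRangeException(nameof(maxBytes));

        // FileShare.ReadWrite lets an appender keep writing while we read.
        using var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite);
        long length = fs.Length;
        long start = Math.Max(0L, length - maxBytes);
        fs.Seek(start, SeekOrigin.Begin);

        var buffer = new byte[(int)(length - start)];
        int total = 0;
        while (total < buffer.Length)
        {
            int n = fs.Read(buffer, total, buffer.Length - total);
            if (n == 0) break;
            total += n;
        }
        bytesRead = total;

        int offset = 0;
        if (start > 0)
        {
            // The seek probably landed mid-line, so skip through the first newline.
            // This can discard one complete line if the seek hit a line boundary exactly.
            int firstNewline = Array.IndexOf(buffer, (byte)'\n', 0, total);
            offset = firstNewline < 0 ? total : firstNewline + 1;
        }

        string text = Encoding.UTF8.GetString(buffer, offset, total - offset);
        var lines = new List<string>();

        // Ignore anything after the last newline: it may be a half-written record.
        int lastNewline = text.LastIndexOf('\n');
        if (lastNewline < 0) return lines;

        foreach (var line in text.Substring(0, lastNewline).Split('\n'))
        {
            var trimmed = line.TrimEnd('\r');
            if (trimmed.Length > 0) lines.Add(trimmed);
        }
        return lines;
    }
}
```

A regression test could grow a temporary file by a few orders of magnitude and assert that bytesRead never exceeds maxBytes. Counting bytes rather than timing the call keeps the check deterministic.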
Related