FileWispExecutionLog.ReadAllAsync re-parses the entire JSONL on every successful wisp (O(n²) under load) #465

@rockfordlhotka

Summary

FileWispExecutionLog stores every wisp execution as one line in a single append-only wisp-executions.jsonl. Several hot-path operations call ReadAllAsync, which reads and JSON-deserializes the entire file on every call:

  • FindRecentFailureAsync — invoked for every successful wisp (retry detection in SpawnWispsExecutor.LogExecutionAsync).
  • QueryRecentAsync — invoked by eager scheduled-task promotion for every successful patrol/* wisp.
  • GetCanonicalBodyAsync — definition-body lookup.

So the per-wisp logging cost grows linearly with the file size, and the total cost over n wisps is O(n²) as the log accumulates. A burst of wisp activity (or a runaway, as in the 2026-06-05 incident) makes every subsequent wisp progressively more expensive, because each one pays for a full-file read and parse. That both burns CPU and amplifies any spike.
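To make the quadratic growth concrete, here is a rough estimate. It assumes (as a simplification, not a measurement) that the log starts empty, each successful wisp appends one line, and each wisp triggers exactly one full read. The k-th wisp then parses about k lines, so over n wisps:

$$
\sum_{k=1}^{n} k = \frac{n(n+1)}{2} = O(n^2)
$$

For n = 400,000 (the file size seen in the incident) that is roughly 8 × 10^10 line parses under these assumptions. A patrol/* wisp that also hits QueryRecentAsync would pay more than one read per wisp, so the real figure could be higher.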

Why it matters

This is the last open item from the 2026-06-05 runaway investigation. The trim-loop fix (#462) and the dispatch circuit breaker (#463) stop the cause of unbounded dispatch; the JSONL retention pass (#461) caps the file size. But even at a bounded size, re-parsing the whole file on every successful wisp is wasteful, and it actively worsened the incident (each new wisp scanned a 400k-line file). Retention bounds the blast radius; it does not remove the O(n) per-wisp read.

Evidence

  • FileWispExecutionLog.ReadAllAsync → File.ReadAllLinesAsync + per-line JsonSerializer.Deserialize over the whole file.
  • Called from FindRecentFailureAsync / QueryRecentAsync / GetCanonicalBodyAsync, all on the success path of SpawnWispsExecutor.
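For readers without the source open, the pattern described above looks roughly like the sketch below. This is a reconstruction from the description in this issue, not the actual FileWispExecutionLog code. The record type, its fields, and the class name FullReadSketch are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.Json;
using System.Threading.Tasks;

// Hypothetical record shape; the real record type is not shown in this issue.
public sealed record WispExecutionRecord(string DefinitionHash, string Status, DateTimeOffset Timestamp);

public static class FullReadSketch
{
    // Reads and deserializes every line of the JSONL file.
    // Work per call is O(n) in the number of lines, regardless of what the caller needs.
    public static async Task<List<WispExecutionRecord>> ReadAllAsync(string path)
    {
        var records = new List<WispExecutionRecord>();
        if (!File.Exists(path)) return records;

        string[] lines = await File.ReadAllLinesAsync(path);
        foreach (string line in lines)
        {
            if (string.IsNullOrWhiteSpace(line)) continue;

            var record = JsonSerializer.Deserialize<WispExecutionRecord>(line);
            if (record is not null) records.Add(record);
        }
        return records;
    }
}
```

Any "recent" query built on top of a method like this pays for the whole file even when it only needs the last few records.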

Proposed directions (for discussion)

  • In-memory index/cache of recent records (bounded ring or last-N-by-timestamp), refreshed on append, so FindRecentFailureAsync/QueryRecentAsync don't touch disk per call. The retention pass already bounds the on-disk size; the in-memory view can mirror it.
  • Tail-read instead of full-read for the "recent" queries — read only the last N KB / N lines, since all three consumers want recent records, not the full history.
  • Index by definition hash (e.g. a companion map) so FindRecentFailureAsync/GetCanonicalBodyAsync are O(1)-ish rather than full scans.
  • Longer term: move off a single flat JSONL to a small embedded store if query patterns grow.
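As a starting point for discussion, the first and third directions could be combined along the lines of the sketch below: a bounded queue of the newest records plus a map from definition hash to the latest failure, both updated on append. All names here (WispRecord, RecentWispCache, OnAppended) and the record fields are hypothetical. The real types and the exact retry-detection semantics would need to come from the existing code.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical record shape; field names are assumptions, not taken from the repository.
public sealed record WispRecord(string DefinitionHash, bool Succeeded, DateTimeOffset Timestamp);

// Bounded in-memory view of the newest records, updated on every append.
public sealed class RecentWispCache
{
    private readonly int _capacity;
    private readonly Queue<WispRecord> _recent = new();
    // Latest failure per definition hash, so retry detection is a dictionary probe.
    private readonly Dictionary<string, WispRecord> _lastFailure = new();
    private readonly object _gate = new();

    public RecentWispCache(int capacity)
    {
        if (capacity <= 0) throw new ArgumentOutOfRangeException(nameof(capacity));
        _capacity = capacity;
    }

    // Intended to be called from the same code path that appends the JSONL line.
    public void OnAppended(WispRecord record)
    {
        lock (_gate)
        {
            _recent.Enqueue(record);
            if (!record.Succeeded) _lastFailure[record.DefinitionHash] = record;

            while (_recent.Count > _capacity)
            {
                WispRecord evicted = _recent.Dequeue();
                // Drop the index entry only if it still points at the evicted record.
                if (_lastFailure.TryGetValue(evicted.DefinitionHash, out var indexed)
                    && ReferenceEquals(indexed, evicted))
                {
                    _lastFailure.Remove(evicted.DefinitionHash);
                }
            }
        }
    }

    // Returns the latest cached failure for the hash if it falls inside the window.
    public WispRecord? FindRecentFailure(string definitionHash, TimeSpan window, DateTimeOffset now)
    {
        lock (_gate)
        {
            return _lastFailure.TryGetValue(definitionHash, out var failure)
                   && now - failure.Timestamp <= window
                ? failure
                : null;
        }
    }

    // Returns up to `count` of the newest records, oldest first.
    public IReadOnlyList<WispRecord> QueryRecent(int count)
    {
        lock (_gate)
        {
            WispRecord[] all = _recent.ToArray();
            int take = Math.Max(0, Math.Min(count, all.Length));
            var result = new WispRecord[take];
            Array.Copy(all, all.Length - take, result, 0, take);
            return result;
        }
    }
}
```

On startup the cache would have to be seeded once from the file (a single full read or a tail read). After that, per-wisp lookups stay in memory. The capacity could mirror whatever bound the retention pass enforces.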

Any of these should come with a benchmark/test asserting per-append cost stays flat as the log grows.
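One way such a test might avoid wall-clock flakiness is to count work (for example, lines parsed per append-plus-lookup) at two log sizes and assert that the ratio stays bounded. The sketch below shows only the shape of that assertion. FlatCostCheck, its tolerance parameter, and the two toy cost functions are hypothetical, and wiring a real counter into FileWispExecutionLog is left open.

```csharp
using System;

public static class FlatCostCheck
{
    // costAtSize(n) returns the counted work for one append-plus-lookup when the
    // log already holds n records. How the work is counted is up to the caller.
    public static bool IsFlat(Func<int, long> costAtSize, int smallSize, int largeSize, double tolerance = 2.0)
    {
        long small = Math.Max(1, costAtSize(smallSize));
        long large = costAtSize(largeSize);
        return large <= small * tolerance;
    }
}

// Toy cost models used only to show what passes and what fails.
public static class FlatCostDemo
{
    // Full scan: every existing record is examined, so cost equals log size.
    public static long FullScanCost(int logSize) => logSize;

    // Indexed lookup: one probe regardless of log size.
    public static long IndexedCost(int logSize) => 1;
}
```

With these toy models, IsFlat(FlatCostDemo.FullScanCost, 1_000, 100_000) is false and IsFlat(FlatCostDemo.IndexedCost, 1_000, 100_000) is true. A real test would replace the toy functions with an instrumented run against the actual log implementation.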
