FE-847: DX introspection tier 2 #202
PR Summary: Medium Risk. Reviewed by Cursor Bugbot for commit f100a96.
Pull request overview
This PR advances FE-847 “Tier-2 DX introspection” and the turn-boundary choreography layer by introducing (a) assistant-visible watermark projection + continuity entry taxonomy, (b) a prepareNextTurn reconciler scaffold (worldUpdate + drains + mention staleness hooks), (c) mention-ledger capture at submit-time, (d) a Tier-2 real-boot faux harness and dev-only introspection debug cache mirroring, and (e) plumbing these seams into Pi extension lifecycle boundaries and RPC session methods.
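The mention-ledger piece, (c) above, breaks down into three small steps: extract handles from the submitted text, resolve them once to stable ids at submit time, and later flag mentions whose target has changed. The sketch below is hypothetical. The `@handle` syntax, the `Resolver` callback, the `MentionRecord` shape, and the "changed after the captured LSN" staleness rule are assumptions for illustration, not the API of `src/session/mention-ledger.ts`.

```typescript
// Hypothetical mention-ledger sketch. The @handle syntax, Resolver shape,
// and staleness rule are illustrative assumptions, not Brunch's API.

interface MentionRecord {
  handle: string; // as typed, without the leading "@"
  id: string; // stable id resolved at submit time
  lsn: number; // graph LSN observed when the mention was captured
}

type Resolver = (handle: string) => { id: string; lsn: number } | undefined;

// Unique @handles in first-seen order; "a@b.c" is not a mention because
// the "@" must start the text or follow whitespace.
function extractMentionHandles(text: string): string[] {
  const seen = new Set<string>();
  for (const match of text.matchAll(/(^|\s)@([A-Za-z0-9_-]+)/g)) {
    seen.add(match[2]);
  }
  return [...seen];
}

// Resolve once at submit time so later renames cannot change what the user
// pointed at. Handles that do not resolve are dropped.
function captureMentions(text: string, resolve: Resolver): MentionRecord[] {
  const records: MentionRecord[] = [];
  for (const handle of extractMentionHandles(text)) {
    const hit = resolve(handle);
    if (hit) records.push({ handle, id: hit.id, lsn: hit.lsn });
  }
  return records;
}

// Assumed staleness rule: the target changed after the captured LSN.
function staleMentions(
  ledger: readonly MentionRecord[],
  lastChangedLsn: (id: string) => number | undefined,
): MentionRecord[] {
  return ledger.filter((m) => {
    const changed = lastChangedLsn(m.id);
    return changed !== undefined && changed > m.lsn;
  });
}
```

Resolving at submit time and storing the LSN alongside the id is what makes a later staleness hint cheap: the ledger only needs "when did this id last change" to decide.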
Changes:
- Add assistant-visible watermark + continuity-entry classifier, plus session-boundary origination helper (startAssistantTurn) and reconciler (prepareNextTurn).
- Introduce mention ledger extraction/resolution utilities and start recording mentions on session.submitMessage.
- Expand DX tooling: Tier-2 harness for real boot + faux turn + transcript inspection; introspection debug-cache mirroring of system prompt and select tool results; lifecycle pipeline wiring.
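A rough sketch of the watermark idea, assuming the assistant-visible watermark is the `{specId, lsn}` stamp on the most recent carrier entry and that a worldUpdate is owed only when the graph has advanced past it. `Watermark`, `EntryLike`, `latestAssistantVisibleWatermark`, `isBehind`, and `needsWorldUpdate` are hypothetical names; the real projection and `prepareNextTurn` also handle drains and mention staleness, which are omitted here.

```typescript
// Hypothetical sketch of an assistant-visible watermark check. The entry
// shape and the "latest stamped entry wins" carrier rule are assumptions.

interface Watermark {
  specId: string;
  lsn: number; // graph log sequence number
}

interface EntryLike {
  type: string;
  watermark?: Watermark; // present only on watermark-carrier entries
}

// The newest carrier's stamp is what the assistant has already seen.
function latestAssistantVisibleWatermark(
  entries: readonly EntryLike[],
): Watermark | undefined {
  for (let i = entries.length - 1; i >= 0; i--) {
    const stamp = entries[i].watermark;
    if (stamp) return stamp;
  }
  return undefined;
}

// Cross-spec comparisons are a wiring bug, so fail loud instead of guessing.
function isBehind(seen: Watermark, current: Watermark): boolean {
  if (seen.specId !== current.specId) {
    throw new Error(`cross-spec watermark comparison: ${seen.specId} vs ${current.specId}`);
  }
  return seen.lsn < current.lsn;
}

// A worldUpdate is owed when nothing was seen yet or the graph moved on.
function needsWorldUpdate(entries: readonly EntryLike[], current: Watermark): boolean {
  const seen = latestAssistantVisibleWatermark(entries);
  return seen === undefined || isBehind(seen, current);
}
```

Throwing on a cross-spec comparison, rather than answering "newer" or "older", mirrors the cross-spec failure behavior that the watermark tests are described as locking.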
Reviewed changes
Copilot reviewed 37 out of 37 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/session/start-assistant-turn.ts | New assistant-origination decision helper + context seed insertion logic. |
| src/session/start-assistant-turn.test.ts | Unit coverage for origination and tail-debt classification behavior. |
| src/session/prepare-next-turn.ts | New pre-turn reconciler computing worldUpdate, drains, and mention-staleness hints. |
| src/session/prepare-next-turn.test.ts | Unit coverage for prepareNextTurn, drains, mention staleness, and guard retry loop. |
| src/session/mention-ledger.ts | New mention parsing, submit-time resolution to stable ids, and staleness hint helpers. |
| src/session/mention-ledger.test.ts | Unit tests for handle extraction, resolution, and staleness emission. |
| src/session/README.md | Session-domain docs updated to describe turn-boundary choreography ownership. |
| src/projections/session/continuity-entry-classifier.ts | New shared taxonomy for watermark carriers vs continuity-only vs debt-bearing entries. |
| src/projections/session/assistant-visible-watermark.ts | New projection for assistant-visible {specId, lsn} watermark + safe comparisons. |
| src/projections/session/assistant-visible-watermark.test.ts | Unit tests locking carrier set and cross-spec failure behavior. |
| src/projections/README.md | Projection ledger updated for new watermark and classifier projections. |
| src/rpc/methods/session.ts | Thread origination seeding into session.triggerExchange and append mention ledger on submit. |
| src/dev/tier-2-harness.ts | Tier-2 real-boot harness via runBrunchTui, faux-provider turn, transcript capture, resume fixture helper. |
| src/dev/tier-2-harness.test.ts | Tier-2 harness tests + scaffold describe.skip coverage map for FE-847 invariants. |
| src/dev/README.md | Dev-loop docs updated with Tier-2 real-boot loop and proof ownership ledger. |
| src/dev/index.ts | Re-export Tier-2 harness helpers from dev front door. |
| src/dev/faux-harness.ts | Capture provider contexts for Tier-1 assertions; allow passing resourceLoader/settingsManager for real composed payloads. |
| src/dev/faux-harness.test.ts | New Tier-1 provider-context capture assertions + Brunch-composed payload capture proof. |
| src/app/brunch-tui.ts | Thread dev introspection options + debug-cache location into TUI boot when BRUNCH_DEV is enabled. |
| src/app/brunch-tui.test.ts | Adjust tests for introspection debugCache and new event capture expectations; remove now-moved boot seam test. |
| src/.pi/README.md | Document session-boundary pipeline ordering and graph watermark stamping. |
| src/.pi/extensions/session/lifecycle.ts | Introduce ordered session-boundary pipeline and wire it to session_start/before_agent_start/assistant message start. |
| src/.pi/extensions/session/lifecycle.test.ts | Unit tests for pipeline ordering and event registration. |
| src/.pi/extensions/introspection/README.md | Update docs to include tool_result mirroring + .brunch/debug cache behavior. |
| src/.pi/extensions/introspection/index.ts | Add debug-cache mirroring on before_provider_request and tool_result events. |
| src/.pi/extensions/introspection/debug-cache.ts | New .brunch/debug/ cache writer for system prompt and selected tool text results. |
| src/.pi/extensions/graph/index.ts | Stamp watermark carriers for own mutations + full graph-overview reads. |
| src/.pi/brunch-pi-extensions.ts | Wire prepareNextTurn into the session boundary pipeline and add a before_provider_request continuity guard. |
| src/.pi/tests/prompting.test.ts | Update promptContext shape in tests (readiness grade removal). |
| src/.pi/tests/introspection.test.ts | Add debug-cache mirroring tests and update event registration expectations. |
| src/.pi/tests/graph-tools.test.ts | Assert watermark-carrier entries are appended for mutate_graph and read_graph overview. |
| src/.pi/tests/extension-registry.test.ts | Assert boundary-prep wiring and before_provider_request guard-only behavior. |
| memory/SPEC.md | Lock D76–D78 and related invariants for continuity/origination choreography. |
| memory/PLAN.md | Update plan/frontier definitions and FE-847 slice map for Tier-2 chassis and closures. |
| memory/cards/turn-boundary-reconciliation--continuity-chain.md | New closure card for reconciliation/watermark/mention end-to-end proof and compaction watermark preservation. |
| memory/cards/kick-and-context-seeding--honest-origination.md | New closure card for origination + context seeding end-to-end proof. |
| HANDOFF.md | New volatile handoff capturing FE-847 sequencing and scaffold edge cases. |
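The continuity-entry taxonomy matters most on resume: continuity-only notices (worldUpdate, side-task and reviewer drains) must neither create nor hide assistant debt. A minimal sketch, assuming a simplified entry union and collapsing the three-way carrier / continuity-only / debt-bearing split into a single `isContinuityOnlyNonDebtEntry` check; `EntryKind` and `tailOwesAssistant` are illustrative, not the shipped classifier.

```typescript
// Hypothetical, simplified entry union; the real classifier distinguishes
// watermark carriers, continuity-only entries, and debt-bearing entries.
type EntryKind =
  | { kind: 'user' }
  | { kind: 'assistant' }
  | { kind: 'system' }
  | { kind: 'notice'; notice: 'worldUpdate' | 'sideTaskDrain' | 'reviewerDrain' };

// Reconciler-inserted notices never owe an assistant continuation.
function isContinuityOnlyNonDebtEntry(entry: EntryKind): boolean {
  return entry.kind === 'notice';
}

// Skip trailing continuity-only notices; only a user message left at the
// tail owes the assistant a turn (earns the kick on resume).
function tailOwesAssistant(entries: readonly EntryKind[]): boolean {
  for (let i = entries.length - 1; i >= 0; i--) {
    const entry = entries[i];
    if (isContinuityOnlyNonDebtEntry(entry)) continue;
    return entry.kind === 'user';
  }
  return false;
}
```

Skipping trailing notices, rather than treating them as the tail, means a drain after an assistant reply does not manufacture debt and a drain after an unanswered user message does not mask it.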
```typescript
// … (enclosing function head elided in the review excerpt)
    if (message?.role === 'toolResult') {
      const toolName = typeof message.toolName === 'string' ? message.toolName : '';
      return toolName.startsWith('request_') && responseStatus(message) !== 'answered';
    }
    return false;
  }
  return false;
}

// Flagged in review: request_* results record their outcome by key presence
// (answered/cancelled/unavailable), not a `status` string, so an answered
// request tail would have re-kicked on resume. Addressed in a later commit
// (latestTailOwesAssistant now reads outcome-key presence).
function responseStatus(message: Record<string, unknown>): string | undefined {
  const details = isRecord(message.details)
    ? message.details
    : isRecord(message.data)
      ? message.data
      : undefined;
  return typeof details?.status === 'string' ? details.status : undefined;
}

function messageRecord(entry: TranscriptEntryLike): Record<string, unknown> | undefined {
  return isRecord(entry.message) ? entry.message : undefined;
}
```

```typescript
function prepareNextTurnForGraph(
  graph: BrunchGraphDeps,
  sessionManager: SessionManager,
): PrepareNextTurnResult {
  const snapshot = graph.reads.queryGraph(undefined, { visibility: 'all' });
  return prepareNextTurn({
    specId: graph.specId,
    currentLsn: snapshot.lsn,
    entries: sessionManager.getEntries(),
    changes: graphChangesFromSnapshot(graph.specId, snapshot),
  });
}
```
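The review question on `responseStatus` turns on envelope shape. A later commit in this thread states that request_* results record their outcome by which of `answered`, `cancelled`, or `unavailable` is present, not by a `status` string. A minimal sketch of that key-presence check, assuming the outcome keys sit directly on `details` (or `data`) and that a request result with no terminal key still owes the assistant a turn; `requestOutcome` and `requestResultOwesAssistant` are hypothetical names.

```typescript
// Hypothetical sketch: outcome is read from key presence, per the later
// commit; the exact envelope nesting is an assumption.
const REQUEST_OUTCOME_KEYS = ['answered', 'cancelled', 'unavailable'] as const;
type RequestOutcomeKey = (typeof REQUEST_OUTCOME_KEYS)[number];

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Returns the first terminal outcome key present on the envelope, if any.
function requestOutcome(details: unknown): RequestOutcomeKey | undefined {
  if (!isRecord(details)) return undefined;
  for (const key of REQUEST_OUTCOME_KEYS) {
    if (key in details) return key;
  }
  return undefined;
}

// Only request_* tool results are considered; any terminal outcome means
// the tail is settled and no assistant turn is owed.
function requestResultOwesAssistant(message: Record<string, unknown>): boolean {
  if (message.role !== 'toolResult') return false;
  const toolName = typeof message.toolName === 'string' ? message.toolName : '';
  if (!toolName.startsWith('request_')) return false;
  const details = isRecord(message.details) ? message.details : message.data;
  return requestOutcome(details) === undefined;
}
```

Note that `{ status: 'answered' }` carries no outcome key under this reading, which is exactly the envelope mismatch the review flagged.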
afbf0a6 to 6efa769
f36df77 to 13c0aca
6efa769 to c2ddcdb
13c0aca to 50e5001
c2ddcdb to f8a3245
50e5001 to 8e0d89d
Cursor Bugbot has reviewed your changes and found 3 potential issues.
Reviewed by Cursor Bugbot for commit 8e0d89d.
…pre-scope
Final oracle pre-scope review folded in:
- D78-L/I46-L: resume-debt ignore set now covers reconciler-inserted
side-task & reviewer drains (D15-L), generalized to any notice owing
no assistant continuation
- S0 scaffold: shared continuity-entry classifier stub
(isWatermarkCarrier / isContinuityOnlyNonDebtEntry) so S1/S2 and S4
share one taxonomy; assert worldUpdate/watermark/kick as sets and
{specId,lsn} properties, not payload-order goldens
PLAN: all S0-S5 build on single FE-847 branch; coverage-first scaffold.
HANDOFF: records final oracle pass; edge-case ledger now 8.
Amp-Thread-ID: https://ampcode.com/threads/T-019eb232-6e53-74a2-9f95-fed451e47fa6
Co-authored-by: Amp <amp@ampcode.com>
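The commit asks the scaffold to assert worldUpdate, watermark, and kick outcomes as sets and as `{specId, lsn}` properties rather than payload-order goldens. A small sketch of helpers in that spirit; `sameSet` and `hasWatermark` are hypothetical and not part of the Tier-2 harness.

```typescript
interface Watermark {
  specId: string;
  lsn: number;
}

// Set equality: order is ignored, and so are repeats.
function sameSet<T>(actual: Iterable<T>, expected: Iterable<T>): boolean {
  const a = new Set(actual);
  const e = new Set(expected);
  if (a.size !== e.size) return false;
  for (const item of a) {
    if (!e.has(item)) return false;
  }
  return true;
}

// Check only the {specId, lsn} properties, ignoring the rest of the payload.
function hasWatermark(payload: unknown, expected: Watermark): boolean {
  if (typeof payload !== 'object' || payload === null) return false;
  const record = payload as Record<string, unknown>;
  return record.specId === expected.specId && record.lsn === expected.lsn;
}
```

Because set comparison also collapses repeats, idempotence checks (no duplicated seed, kick, or worldUpdate) still need an explicit count.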
8e0d89d to f100a96
* Sync planning docs after FE-847 restack

Amp-Thread-ID: https://ampcode.com/threads/T-019eb2e2-5c62-7388-8691-f8e04d4b6e50

* fable ln-induct review and re-scope
* Flip I45 continuity guard live
* Thread mention continuity through live submit path
* Preserve watermark carriers across compaction
* Seed and kick new sessions on real boot
* Classify resume origination debt
* Require live elicitation gap readers
* Harden elicitation gap predicates
* Sweep localized review fixes
* Handle absent prompt gaps safely
* Restore PLAN honesty for FE-847 residual closure

The kick-and-context-seeding frontier was marked done while its four I46 resume-origination scaffold rows and two I47 idempotence rows remained it.todo in the Tier-2 suite. Revert it to active with an honest pointer, note the I47 residue on turn-boundary-reconciliation, and file the remediation sequence as memory/REFACTOR.md.

* Make runtime-state append contract honest

No caller consumes the appended entry id, and the extension-API write channel (pi.appendEntry) cannot supply one. Change the session-manager seam and appendBrunchAgentRuntimeSwitch to void, make appendBrunchAgentRuntimeInit return an appended/skipped boolean (the only meaningful sentinel it carried), and delete the hardcoded placeholder id in the commands adapter.

* Extract exhaustive gap-predicate semantics owner

gapPredicateSupport in the union's owning schema module classifies every arm (structural / manual / unsupported) behind a never check; boundary validation and coverage derivation both ride it. Adding a GapPredicate arm without deciding its semantics is now a compile error, and a structural arm without a derivation fails loud at read instead of silently deriving 0. Behavior-preserving.

* Finish scoped offline env contract: set skip-version-check, drop dead dev flag

applyBrunchOfflineDefault now sets PI_SKIP_VERSION_CHECK alongside PI_OFFLINE (the save/restore scaffolding's intent: offline launches emit no version-check noise), never overriding user-provided values. The dev flag on runWithScopedBrunchOfflineDefault was accepted but never read; removed. Env tests assert set-during-run and restore-after for both variables.

* Refresh chrome footer after runtime posture switches

The footer already re-projects strategy/lens from the transcript at render time; nothing requested a render after /brunch:strategy or /brunch:lens, so the footer kept showing launch-time values. Wire a chrome-refresh handle at the composition root: chrome binds its footer render-request into it, and a successful runtime switch calls it (not on rejection or picker cancel).

* Echo projected mode in /brunch:mode no-op message

The already-current branch hardcoded 'elicit' instead of echoing the projected operational mode; behavior-identical today, honest when the mode vocabulary grows.

* Require graph reads on the prompt context; fail loud on empty gap register

Reverts the 'Handle absent prompt gaps safely' patch (bbc4b4e6) and removes the ?? [] fallback it was shielding. graphReads is now a required, documented must-wire member of BrunchPromptContext (a composition root that omits it is a type error), while session/context are documented intended-optional. An empty gap register reaching legality derivation now surfaces through the existing missing-register-kind throw (the contract isCapabilityLegalForGaps already documents) instead of quietly returning empty manifests and axis options: every spec is seeded with floor gaps, so empty means wiring bug, not posture.

* Pin live gap legality through the Tier-2 real-boot oracle

The missing card acceptance from the live-gap-legality fix: a real runBrunchTui boot over a fresh seeded spec derives turn-boundary tool legality from that spec's actual gap coverage; uncovered floor gaps keep capability-gated tools (mutate_graph) locked, a foreign writer covering the grounding floor unlocks them on the next boundary, and elicit mode never advertises bash either way.

* Derive post-switch tool posture from real selected-spec gaps

applyRuntimeSwitch recomputed active tools with a hardcoded empty gap register, silently floor-locking capability-gated tools until the next turn boundary corrected it (the same optional-wiring fault family this remediation targets). The commands seam now requires a gap reader; the composition root derives it from the graph deps (selected-spec reads) or, with no graph in the composition, the explicitly named conservativeUncoveredFloorGaps fail-closed posture.

* Flip the I46 resume-origination scaffold rows live through real boot

Adds bootTier2RuntimeFromFixture (the resume-side real-boot chassis) and replaces the four I46 it.todo rows with live proofs: a user tail earns the kick behind reconciler-inserted continuity notices, including after earlier completed exchanges; request_* leaves stay idle for all three terminal envelopes plus assistant/system leaves; crash-after-notice reboot still kicks unresolved debt without duplicating the seed; and trailing side-task/reviewer drains neither manufacture nor mask debt.

Two product fixes the live rows forced:
- seedAndKickAssistantTurn no longer blanket-suppresses the kick when any past exchange result exists (which silently broke post-exchange resume kicks); origin now derives from projected transcript state (no conversational message entries = new session), with re-kick dedupe falling out of the debt classifier itself.
- latestTailOwesAssistant reads the real request_* result envelope: outcome is answered/cancelled/unavailable key presence (as projections/exchanges actually writes it), not a status string, settling the PR #202 responseStatus question (the bot was right: an answered request tail would have re-kicked on resume).

* Flip the I47 idempotence scaffold rows live through real restart

Adds rebootTier2Runtime (flushes Pi's deferred JSONL, then re-boots the real runtime over the same session file) and replaces the remaining it.todo rows: the dedicated no-redundant-worldUpdate-after-seed proof runs through real boot + provider preflight; boot/resume dedupe is proven across an actual restart (seed, kick, and worldUpdate all non-duplicated, derived purely from transcript projection); and the sets-and-{specId,lsn} suite convention is enforced mechanically by a source scan banning golden matchers in this suite. The Tier-2 scaffold has no skipped or todo rows left.

* Reconcile PLAN and REFACTOR state after FE-847 remediation closure

Both FE-847 frontiers are now honestly done: every I46/I47 Tier-2 scaffold row runs live, with the resume-side and idempotence proofs through real boot/restart. REFACTOR.md remains only as the carrier for the suspended migration-0004 item handed to the stacked branch.

* File typing-collapse refactor plan for the exchanges editor seam

Replaces the completed review-fix remediation plan in REFACTOR.md with the /expert-typescript-typing findings: one canonical editor envelope schema (the probe-side fallback is drift), a projected outcome union, and one grounding-gap fixture builder shared by production and tests. Carries the suspended migration-0004 item forward.

* Extract the canonical request_choices editor-envelope schema

The product editor envelope (schema name brunch.structured_exchange.request_choices.editor) moves from a hand-written interface + parser inside the request_choices tool to a zod schema co-located with the request details schemas. The prefill template now types against the schema input, the response type is inferred, and parsing is schema safeParse. A round-trip test locks prefill -> edited response -> parse -> projection. The new exchanges README documents the two-envelope rationale (editor wire status vs transcript outcome-key presence).

* Extract the request outcome-union owner from the details schemas

RequestOutcomeKey is now projected from the request details union branches (KeysOfUnion minus header/tool_meta), with the exported REQUEST_OUTCOME_KEYS list drift-coupled to the schema in both directions via a satisfies Record marker. All four request projection input types consume it, the editor envelope statuses become an Exclude<RequestOutcomeKey, 'unavailable'> projection, and the session debt classifier derives its terminal-keys check from the projections/exchanges re-export instead of restating literals.

* Converge the RPC proof on the canonical envelope and delete the fallback

The structured-exchange RPC proof now drives the product request_choices editor flow (requestChoicesViaEditor, extracted from the tool and shared by both callers) instead of the divergent probe-only envelope. The shared/editor-fallback.ts module (its envelope, parser, hand-written types, and single-select arm) is deleted along with its index re-exports and helper tests; multi-choice coverage through the one schema replaces the single-select arm.

* Extract the grounding-gap fixture builder into graph/schema

One builder module (src/graph/schema/elicitation-gap-fixtures.ts) now owns the synthetic ElicitationGap shape: presenceGap for single gaps and groundingFloorGaps for the context/thesis/goal/constraint floor with a per-kind coverage knob. The runtime extension's fail-closed conservativeUncoveredFloorGaps rides the builder (keeping its name, export, and doc comment), and the eleven hand-cloned per-test-file gap literals are deleted in favor of importing it. Production owns the shape; tests import it, never the reverse.

* Mark typing-collapse refactor done; suspended migration item remains
* Retire REFACTOR.md; carry the migration handoff note into PLAN

All refactor steps are done; the one suspended item (migration 0004 coherence, owned by the stacked successor branch) moves to PLAN's Active section so the reintegration re-check survives the file.

* ln-sync: reconcile canonical docs after the FE-847 closure arc

SPEC: I45/I46/I47 invariant and verification-design rows flip from planned/coverage-first-scaffold to covered with 2026-06-11 evidence; D35-L reconciled to the shipped, test-locked startup-header behavior (every non-cancel activation headers; resume/open-stay-quiet clause superseded; expand affordance removed until an input path exists); A27-L gains the predicate-hardening evidence (gapPredicateSupport owner, loud field/coverage rejection, presence kind-floor dedup, hydration consistency); new Acknowledged Blind Spots row for live-vs-harness wiring divergence with its mitigations and revisit trigger.
PLAN: 12 done frontier definitions archived to PLAN_HISTORY as dated pointer bullets (835 -> 543 lines); completed Sequencing subsections collapsed into a Recently Completed section; stale active-track reference repaired.
GC: stale memory/cards/tooling--runtime-state-commands.md deleted (pickers/overlays shipped; the card's non-scope claims were drift).
READMEs: src/dev Tier-2 harness ledger gains the resume/reboot chassis entries and the scaffold-fully-live note.

* Graduate two induct lenses into ln-review contract-integrity catalog

Per user approval: the optional-hook live-wiring divergence lens (four findings this arc) and the dark-union-variant lens (the gap-predicate family) join the stabilized lens library with their cues, repairs, and graduation evidence.

---------

Co-authored-by: Amp <amp@ampcode.com>
Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
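The "Extract exhaustive gap-predicate semantics owner" commit relies on a standard TypeScript exhaustiveness pattern: a `switch` over the union's discriminant whose `default` branch passes the narrowed value to a function taking `never`, so a new arm without a case fails to compile. A sketch under stated assumptions: the four `GapPredicate` arms and the coverage rules below are invented for illustration, since the real union is not shown in this PR.

```typescript
// Hypothetical arms: the real GapPredicate union is not shown in this PR.
type GapPredicate =
  | { kind: 'presence'; nodeKind: string }
  | { kind: 'minCount'; nodeKind: string; min: number }
  | { kind: 'manual'; note: string }
  | { kind: 'freeform'; query: string };

type GapPredicateSupport = 'structural' | 'manual' | 'unsupported';

// Reached only if a new arm slips past the switch; the `never` parameter
// turns that into a compile error first.
function assertNever(value: never): never {
  throw new Error(`unhandled gap predicate arm: ${JSON.stringify(value)}`);
}

// Every arm must be classified here before it can be used anywhere.
function gapPredicateSupport(p: GapPredicate): GapPredicateSupport {
  switch (p.kind) {
    case 'presence':
    case 'minCount':
      return 'structural';
    case 'manual':
      return 'manual';
    case 'freeform':
      return 'unsupported';
    default:
      return assertNever(p);
  }
}

// Coverage for structural arms (assumed 1 = covered, 0 = uncovered).
// Non-structural arms fail loud instead of silently deriving 0.
function deriveCoverage(p: GapPredicate, counts: ReadonlyMap<string, number>): number {
  if (gapPredicateSupport(p) !== 'structural') {
    throw new Error(`no structural derivation for ${p.kind}`);
  }
  switch (p.kind) {
    case 'presence':
      return (counts.get(p.nodeKind) ?? 0) > 0 ? 1 : 0;
    case 'minCount':
      return (counts.get(p.nodeKind) ?? 0) >= p.min ? 1 : 0;
    default:
      throw new Error(`structural arm ${p.kind} has no derivation`);
  }
}
```

Routing both validation and derivation through one classifier is what lets "forgot to decide the semantics" surface at compile time and "forgot the derivation" surface at read time.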


Stack Context
This PR starts the FE-847 Tier-2 DX introspection work above the elicitation-gaps stack.
What?