M9 finish (2/2): §14.9 differential / reference-parser tester#35
Merged
Conversation
Add tests/differential.rs: an independent, spec-derived reference segment parser run alongside the production recover_segment classifier, with an exact-match oracle on classification. Catches a recovery-classifier error by construction (two implementations disagreeing) rather than probabilistically — the class of bug the issue #26 sentinel hole was. The reference calls no production parse code (not record::decode, not recover_segment, not the segment.rs helpers): it re-derives the §5.2/§5.3/§8.2 constants and re-implements the length-bound check, the all-zero-header sentinel rule (post #26), the CRC-validation ordering, the bounded tail-vs-corruption forward scan, and the sealed-vs-active distinction from raw bytes. It uses only the shared crc32c primitive (a dependency, not parse logic). Inputs: a deterministic 184-case scenario matrix enumerating every arm (clean runs, torn tails, interior corruption incl. the rec_type→0 vector, reserved types, LSN gaps, physical truncation, within/beyond-scan-bound continuations, len>max, non-1 bases; active + sealed) PLUS the Task-1 regrown corpora as raw segment bodies (1666 inputs). Both parsers must return the identical SegClass variant AND offsets/max_lsn. Green: 184 + 1666 agree, ~6 s. Falsifiability shown: injecting the pre-#26 naive rec_type==0 ⇒ sentinel rule into the reference makes the differential fire (production=Truncated vs reference=Clean) on the torn-last-zero_rectype scenario and real corpus entries — i.e. it would have caught the sentinel bug — then reverted. The only src/ change is a thin #[cfg(feature="fuzzing")] recover_segment_classify accessor in lib.rs (no public API widening, no second production parser). The test is #![cfg(feature="fuzzing")], so default cargo build/test are unaffected (zero release impact). CI: per-PR blocking `differential (§14.9)` job in ci.yml (a divergence reds the PR, never gates H1). Docs: §14.9 implemented, §14.11 per-PR row, §14.12 D4/D5/D10/D11 rows, §14.13 DoD bullet. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Rpbwt9JT56hQvVXiqTS131
guyo13
pushed a commit
that referenced
this pull request
Jul 2, 2026
…nly) The closing M9 truthfulness pass. New §14.12.1 subsection reconciles the traceability matrix against the actual tree, with every covering test grep-verified to exist and each D1–D12 row marked passing (a currently-existing software test) vs contingent (a named physical-layer gate). Verified (all 13 cited symbols confirmed present): - D5 traces to the sentinel-fix tests (rec_type_zeroed_interior_is_fatal_tornmidlog + tail companion) AND the §14.9 differential — not anything stale. - D4/D10/D11 cite the §14.9 differential; those PR #35 edits landed on main and name real differential_* symbols. - loom is absent from the matrix, the DoD, and the code (grep -i loom over src/ tests/ .github/ Cargo.toml finds only the docs "why not" note). - §14.9 differential present and implemented. - Each contingent gate names its unmet condition: fuzz ≥24 CPU-h/target since 2b198e7 (dedicated runner), soak multi-hour run, H1 owner hardware, LazyFS FUSE runner, §14.4d documented negative result. - Structural-vs-substantive caveat flagged: the D11 bounded-scan counter is a drift guard (the loop window IS scan_bound), not an input-bug finder; the substantive D11 proof is F1–F4's crash-free surface + the differential. §14.13's "every row has a passing test" bullet now points at §14.12.1. Docs-only (docs/wal_design_v6.md + CLAUDE.md); no src/, no test, no workflow. After this, M9 is software-complete; the only remaining §14.13 gates are passive runner time (fuzz 24-CPU-h/target, multi-hour soak), accruing off the critical path alongside H1. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Rpbwt9JT56hQvVXiqTS131
guyo13
pushed a commit
that referenced
this pull request
Jul 2, 2026
…nly) The closing M9 truthfulness pass. New §14.12.1 subsection reconciles the traceability matrix against the actual tree, with every covering test grep-verified to exist and each D1–D12 row marked passing (a currently-existing software test) vs contingent (a named physical-layer gate). Verified (all 13 cited symbols confirmed present): - D5 traces to the sentinel-fix tests (rec_type_zeroed_interior_is_fatal_tornmidlog + tail companion) AND the §14.9 differential — not anything stale. - D4/D10/D11 cite the §14.9 differential; those PR #35 edits landed on main and name real differential_* symbols. - loom is absent from the matrix, the DoD, and the code (grep -i loom over src/ tests/ .github/ Cargo.toml finds only the docs "why not" note). - §14.9 differential present and implemented. - Each contingent gate names its unmet condition: fuzz ≥24 CPU-h/target since 2b198e7 (dedicated runner), soak multi-hour run, H1 owner hardware, LazyFS FUSE runner, §14.4d documented negative result. - Structural-vs-substantive caveat flagged: the D11 bounded-scan counter is a drift guard (the loop window IS scan_bound), not an input-bug finder; the substantive D11 proof is F1–F4's crash-free surface + the differential. §14.13's "every row has a passing test" bullet now points at §14.12.1. Docs-only (docs/wal_design_v6.md + CLAUDE.md); no src/, no test, no workflow. After this, M9 is software-complete; the only remaining §14.13 gates are passive runner time (fuzz 24-CPU-h/target, multi-hour soak), accruing off the critical path alongside H1. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Rpbwt9JT56hQvVXiqTS131
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
M9 finish, task 2 of 2 — the §14.9 differential tester
Builds on the Task-1 regrown corpus (#34, merged). This is the technique that catches a recovery-classifier error by construction — two independent implementations disagreeing — rather than probabilistically (a fuzzer happening to hit the trigger). It exists precisely because the issue-#26 sentinel hole was that class of bug: a classifier mis-mapping one byte pattern.
What's here
tests/differential.rs— an independent reference segment parser run alongside the productionrecover_segmentclassifier, with an exact-match oracle on theSegClassvariant and itsoffset/max_lsn.record::decode, notrecover_segment, not thesegment.rsread helpers. It re-derives the §5.2/§5.3 constants and re-implements the length-bound check, the all-zero-header sentinel rule (post Durability contract correctness issue - Sentinel silent truncation #26), the CRC-validation ordering, the bounded tail-vs-corruption forward scan (lsn >= expected, boundmax+28), and the sealed-vs-active distinction from raw bytes. It uses only the sharedcrc32cprimitive (a dependency, not parse logic — re-implementing CRC-32C would test the crate, not the parser).rec_type == 0record with a non-zero CRC asInvalid(→TornMidLoginterior / torn-tail at the end), never a clean sentinel — the correct current spec.Inputs (both parsers see identical bytes)
rec_type→0/ extend-len / reserved-type / pad-flip), interior corruption (incl. the issue-Durability contract correctness issue - Sentinel silent truncation #26rec_type→0vector), LSN gaps, mid-run sentinel, physical truncation, within/beyond-scan-bound continuations,len>max, non-1 bases — each for active and sealed segments. This is the exact-match oracle.fuzz/corpus/{recovery,structure,decode,model}) as raw segment bodies — 1666 inputs of real fuzzer-discovered byte patterns the two parsers must still agree on.Green here: 184 + 1666 agree, ~6 s.
Falsifiability (§14.0.3)
Injecting the pre-#26 naive
rec_type == 0 ⇒ sentinelrule into the reference makes the differential fire —— i.e. the harness would have caught the sentinel bug. Reverted.
Scope note (honest, per the kickoff)
The corpus is consumed as raw segment bodies rather than by re-decoding each fuzz target's
arbitraryenvelope (which would duplicate those generators and be fragile). The differential property "both parsers agree on these bytes" holds regardless of how the bytes were produced; the envelope-specific deep states are covered exhaustively by the scenario matrix. No §14.3-model-oracle→differential wiring (corpus-driven is acceptable for v1 per the kickoff).src/change & release impactThe only
src/change is a thin#[cfg(feature="fuzzing")]recover_segment_classifyaccessor inlib.rs(returns a plainSegClass). No public API widening, no second production parser. The test is#[cfg(feature="fuzzing")], so defaultcargo build/cargo testcompile it to nothing — zero release impact.CI
Per-PR blocking
differential (§14.9)job inci.yml(cargo test --features fuzzing --test differential, ~6 s). A classification divergence is a real recovery-classifier bug and reds the PR — same posture as the fuzz smoke; never gates an H1 dispatch.Verification
cargo fmt --check,cargo clippy --all-targets -D warnings(default and--features fuzzing), defaultcargo test(differential compiles to nothing),cargo test --features fuzzing --test differential, MSRV 1.85cargo check --all-targets --locked,actionlint— all green.Docs
§14.9 marked implemented, §14.11 per-PR row, §14.12 D4/D5/D10/D11 rows now cite the differential, §14.13 DoD bullet, CLAUDE.md.
After this
M9 finish follow-ups remain (separate PRs, per the designer's close-out): (1+2) a deterministic interior-
rec_type→0permanent regression seed + the fuzz-coverage trust-model writeup, then (3) the DoD reconciliation audit. After those, M9 is software-complete (only the passive fuzz-hours / multi-hour-soak runner gates remain).🤖 Generated with Claude Code
Generated by Claude Code