Skip to content

M9 F3 + fix: structure-aware fuzz, and the D3/D5 sentinel hole it found (issue #26)#25

Merged
guyo13 merged 2 commits into
mainfrom
claude/youthful-hamilton-tncsqv
Jun 28, 2026
Merged

M9 F3 + fix: structure-aware fuzz, and the D3/D5 sentinel hole it found (issue #26)#25
guyo13 merged 2 commits into
mainfrom
claude/youthful-hamilton-tncsqv

Conversation

@guyo13

@guyo13 guyo13 commented Jun 28, 2026

Copy link
Copy Markdown
Owner

Two commits: F3 (the fuzzer) + the correctness fix it surfaced

This PR now ships F3 (the structure-aware classifier fuzzer) and the fix for the real bug it found — they ship together because F3's oracle moves in lockstep with the contract. The fix is commit cfa6a96; F3 is 977c732.


fix: — close the D3/D5 sentinel silent-truncation hole (issue #26)

The bug. record::decode returned Decoded::Sentinel on rec_type == 0 alone, before the CRC check. So a single-bit corruption of an interior record's rec_type byte (10) was mistaken for the end-of-records sentinel: recovery returned TailState::Clean at that offset and silently dropped every acked record after it, rewinding durable_lsn. Violates D3 (no loss ≤ a returned durable_lsn) and D5 (mid-log corruption must be fatal). Every other single-byte corruption was already handled correctly; rec_type→0 was the one escape, because the sentinel short-circuit bypassed the CRC that covers rec_type.

The fix (one decision point, src/record.rs::decode). A sentinel is recognized only by a full all-zero 20-byte header (rec_type == 0 && buf[..20].iter().all(|&b| b == 0)), short-circuited on rec_type so the Full-record hot path stays a single byte-compare. A corrupt rec_type==0, crc≠0 record now falls through the existing ladder → CRC fails → Invalid(BadCrc)recovery::classify → interior ⇒ fatal TornMidLog (D5) / tail ⇒ torn-tail truncate (D4). No new code path; D11 preserved (bounded before any payload touch). Genuine sentinels (pre-alloc zero region / §8.2.1 zeroing) are always all-zero ⇒ nothing legitimate lost.

Blast radius — three consumers walked (verified, not patched)

  1. recovery.rs:103 (CleanEnd arm) — the intended behavioral change.
  2. segment.rs:259 — a corrupt header is no longer Sentinel; on the header-only slice it decodes Incomplete/Invalid, so it falls through to the full-record decode → Invalid. Verified, no edit. segment.rs:246 (the unrelated remaining < 20 ⇒ CleanEnd) untouched.
  3. reader.rs:134 — already treats CleanEnd | Invalid identically. Reader semantics unchanged (a corrupt rec_type==0 tail record shifts CleanEndInvalid; both end the live stream per §15.2). Stated, not changed as a side effect.

Tests — every flip explained, no assertion weakened

  • NEW recovery.rs regression tests rec_type_zeroed_interior_is_fatal_tornmidlog (D5) and rec_type_zeroed_at_tail_is_torn_tail_and_zeroed (D4 + clean/zeroed reopen, D10). Built by encoding a valid record then setting buf[off+16]=0 without recomputing the CRC. Both fail before the fix, pass after (demonstrated).
  • record.rs::sentinel_header_detected FLIPPED, corrected to the honest contract: a real record with a zeroed rec_typeInvalid(BadCrc); an arbitrary non-all-zero rec_type==0 header ⇒ Invalid; all-zero ⇒ Sentinel. One deviation from the writeup, called out: issue Durability contract correctness issue - Sentinel silent truncation #26 expected the old 0xFF-filled fixture to be BadCrc, but that buffer's 0xFF length trips LengthTooLarge first — so the fixture now uses a genuine record to actually exercise the CRC-catches-rec_type mechanism (still Invalid, never Sentinel). Assertion not weakened.
  • Mocked-ScanOutcome recovery tests and the prop_decode_arbitrary_is_bounded catch-all arm are unaffected (confirmed).

F3 oracle updated in lockstep

Mutation::ZeroRecType is now invalidates() == true (a CRC-caught corruption, no longer a sentinel); the now-dead sentinel else arm removed. Re-smoked 80 000 runs, exit 0, zero crashes, cov 589 (up from 561 — ZeroRecType now exercises the corruption arms).

Spec

docs/wal_design_v6.md §8.2 step 1, §5.3 table row, §5.4, §4 D5 prose tightened to "all-zero header" + a v6 changelog bullet.

Gates

cargo test (both configs, 21 ok-lines), cargo clippy --all-targets -D warnings (both), cargo fmt --check, MSRV 1.85 (both) green. src/ diff is the fix + tests only.


F3 — structure-aware classifier fuzz (§14.5) — original PR content

Builds a valid dense single segment (harness owns the correct CRCs → hits deep classifier states a blind fuzzer can't reach), applies one localized mutation at a fuzzer-chosen record (flip CRC/body, zero rec_type, extend length, tamper padding, reserved type + re-CRC), drives the real public Wal::open. Sharp oracle: interior corruption ⇒ fatal TornMidLog (D5); last corruption ⇒ truncate (D4); survivors dense + byte-identical (D6/D10); idempotent reopen (D7); scan bounded (D11). Falsifiability shown (forcing the forward scan to report "no continuation" trips D5: interior corruption returned Ok). decode/recovery corpus cargo fuzz cmin'd; per-PR smoke runs F1+F2+F3; N-CPU-hour gate stays OPEN.

🤖 Generated with Claude Code

claude added 2 commits June 28, 2026 12:14
Third M9 slice. F3 is the target with teeth on the recovery classifier: it
builds a VALID dense single segment (the harness owns the correct CRCs via the
`fuzzing` generators, so the deep classifier states a blind byte fuzzer never
reaches are hit every run), applies ONE fuzzer-chosen localized mutation at a
chosen record, then drives the real public `Wal::open`.

- fuzz/fuzz_targets/structure.rs. Mutation menu: flip CRC / flip a CRC-covered
  body byte / zero rec_type (-> sentinel) / extend length / tamper padding /
  reserved rec_type + re-CRC (via the public crc32c). Sharp classifier oracle:
    * invalid INTERIOR record (a valid record still follows) => MUST be fatal
      TornMidLog/Corruption (D5 -- never silent truncation);
    * invalid LAST record => MUST truncate at its offset (D4);
    * rec_type->0 => sentinel / clean end.
  Plus: the surviving suffix is a dense, byte-identical prefix of the built
  records (D6/D10 -- nothing past the cut, no mutated/garbage bytes); an
  idempotent reopen presents a clean tail (D7, durable zeroing); the forward
  scan stays within scan_bound (D11).
- Corpus cargo-fuzz cmin'd (111 entries).
- CI: `structure` added to fuzz.yml matrix; per-PR smoke in ci.yml now runs
  F1+F2+F3 (renamed "fuzz smoke (F1/F2/F3)"); a crash reds the PR.

Falsifiability: forcing forward_scan_finds_valid to always report "no
continuation" (the classic D5 bug -- mid-log corruption silently truncated)
trips `D5: interior corruption returned Ok (silent truncation!)`, then reverted.

No `src/` change (git diff src/ empty). Built + smoke-green (150000 runs, exit
0, zero crashes, cov 561); cargo fmt --check, clippy --all-targets -D warnings,
cargo test, actionlint all green. N-CPU-hour release gate stays OPEN.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Rpbwt9JT56hQvVXiqTS131
#26)

F3 surfaced a real recovery-contract correctness bug. `record::decode` returned
`Decoded::Sentinel` on `rec_type == 0` ALONE, before the CRC check, so a
single-bit corruption of an interior record's `rec_type` byte (1->0) was
mistaken for the end-of-records sentinel and silently dropped every acked record
after it (durable_lsn rewound) -- violating D3 (no loss <= a returned
durable_lsn) and D5 (mid-log corruption must be fatal, never silent).

Fix (src/record.rs::decode): a sentinel is recognized only by a full all-zero
20-byte header (rec_type == 0 && buf[..20].iter().all(|&b| b == 0)),
short-circuited on rec_type so the Full-record hot path stays a single
byte-compare. A corrupt rec_type==0, crc!=0 record now falls through the existing
ladder -> CRC fails -> Invalid(BadCrc) -> recovery::classify -> interior =>
fatal TornMidLog (D5) / tail => torn-tail truncate (D4). No new code path; D11
preserved (bounded before any payload touch). Genuine sentinels (pre-alloc zero
region / 8.2.1 zeroing) are always all-zero, so nothing legitimate is lost.

Tests (every flip explained):
- NEW recovery.rs regression tests rec_type_zeroed_{interior_is_fatal_tornmidlog,
  at_tail_is_torn_tail_and_zeroed}: interior => TornMidLog (D5); tail =>
  TruncatedAt + clean/zeroed reopen (D4/D10). Both FAIL before the fix, pass
  after (demonstrated).
- record.rs sentinel_header_detected FLIPPED and corrected to the honest
  contract (a real record with a zeroed rec_type => Invalid(BadCrc); arbitrary
  non-all-zero rec_type==0 header => Invalid; all-zero => Sentinel). NOTE: the
  issue's writeup expected the old 0xFF-filled fixture to be BadCrc, but its 0xFF
  length trips LengthTooLarge first; the fixture now uses a genuine record so it
  actually exercises the CRC-catches-rec_type mechanism (still Invalid, never
  Sentinel). Assertion not weakened.

Blast radius walked (verified, not patched):
- recovery.rs:103 CleanEnd arm -- the intended behavioral change.
- segment.rs:259 -- a corrupt header is no longer Sentinel; falls through to the
  full-record decode -> Invalid (verified, no edit). segment.rs:246 untouched.
- reader.rs:134 -- already treats CleanEnd|Invalid identically; reader semantics
  UNCHANGED (a corrupt rec_type==0 tail record shifts CleanEnd->Invalid, both end
  the live stream per 15.2). Decision stated, not changed as a side effect.

F3 oracle updated in lockstep (fuzz/fuzz_targets/structure.rs): Mutation::
ZeroRecType is now invalidates()==true (a CRC-caught corruption, no longer a
sentinel); the dead sentinel `else` arm removed. Re-smoked 80000 runs, exit 0,
zero crashes, cov 589 (up from 561 -- ZeroRecType now exercises the corruption
arms).

Spec (docs/wal_design_v6.md): 8.2 step 1, 5.3 table row, 5.4, 4 D5 prose
tightened to "all-zero header" + a v6 changelog bullet.

cargo test (both configs, 21 ok-lines), clippy --all-targets -D warnings (both),
cargo fmt --check, MSRV 1.85 (both) green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Rpbwt9JT56hQvVXiqTS131
@guyo13 guyo13 changed the title M9 F3: structure-aware tail-vs-corruption fuzz (§14.5, D4/D5/D6/D10) M9 F3 + fix: structure-aware fuzz, and the D3/D5 sentinel hole it found (issue #26) Jun 28, 2026
@guyo13 guyo13 merged commit 2b198e7 into main Jun 28, 2026
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants