M9 F3 + fix: structure-aware fuzz, and the D3/D5 sentinel hole it found (issue #26)#25
Merged
Merged
Conversation
Third M9 slice. F3 is the target with teeth on the recovery classifier: it
builds a VALID dense single segment (the harness owns the correct CRCs via the
`fuzzing` generators, so the deep classifier states a blind byte fuzzer never
reaches are hit every run), applies ONE fuzzer-chosen localized mutation at a
chosen record, then drives the real public `Wal::open`.
- fuzz/fuzz_targets/structure.rs. Mutation menu: flip CRC / flip a CRC-covered
body byte / zero rec_type (-> sentinel) / extend length / tamper padding /
reserved rec_type + re-CRC (via the public crc32c). Sharp classifier oracle:
* invalid INTERIOR record (a valid record still follows) => MUST be fatal
TornMidLog/Corruption (D5 -- never silent truncation);
* invalid LAST record => MUST truncate at its offset (D4);
* rec_type->0 => sentinel / clean end.
Plus: the surviving suffix is a dense, byte-identical prefix of the built
records (D6/D10 -- nothing past the cut, no mutated/garbage bytes); an
idempotent reopen presents a clean tail (D7, durable zeroing); the forward
scan stays within scan_bound (D11).
- Corpus cargo-fuzz cmin'd (111 entries).
- CI: `structure` added to fuzz.yml matrix; per-PR smoke in ci.yml now runs
F1+F2+F3 (renamed "fuzz smoke (F1/F2/F3)"); a crash reds the PR.
Falsifiability: forcing forward_scan_finds_valid to always report "no
continuation" (the classic D5 bug -- mid-log corruption silently truncated)
trips `D5: interior corruption returned Ok (silent truncation!)`, then reverted.
No `src/` change (git diff src/ empty). Built + smoke-green (150000 runs, exit
0, zero crashes, cov 561); cargo fmt --check, clippy --all-targets -D warnings,
cargo test, actionlint all green. N-CPU-hour release gate stays OPEN.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Rpbwt9JT56hQvVXiqTS131
#26) F3 surfaced a real recovery-contract correctness bug. `record::decode` returned `Decoded::Sentinel` on `rec_type == 0` ALONE, before the CRC check, so a single-bit corruption of an interior record's `rec_type` byte (1->0) was mistaken for the end-of-records sentinel and silently dropped every acked record after it (durable_lsn rewound) -- violating D3 (no loss <= a returned durable_lsn) and D5 (mid-log corruption must be fatal, never silent). Fix (src/record.rs::decode): a sentinel is recognized only by a full all-zero 20-byte header (rec_type == 0 && buf[..20].iter().all(|&b| b == 0)), short-circuited on rec_type so the Full-record hot path stays a single byte-compare. A corrupt rec_type==0, crc!=0 record now falls through the existing ladder -> CRC fails -> Invalid(BadCrc) -> recovery::classify -> interior => fatal TornMidLog (D5) / tail => torn-tail truncate (D4). No new code path; D11 preserved (bounded before any payload touch). Genuine sentinels (pre-alloc zero region / 8.2.1 zeroing) are always all-zero, so nothing legitimate is lost. Tests (every flip explained): - NEW recovery.rs regression tests rec_type_zeroed_{interior_is_fatal_tornmidlog, at_tail_is_torn_tail_and_zeroed}: interior => TornMidLog (D5); tail => TruncatedAt + clean/zeroed reopen (D4/D10). Both FAIL before the fix, pass after (demonstrated). - record.rs sentinel_header_detected FLIPPED and corrected to the honest contract (a real record with a zeroed rec_type => Invalid(BadCrc); arbitrary non-all-zero rec_type==0 header => Invalid; all-zero => Sentinel). NOTE: the issue's writeup expected the old 0xFF-filled fixture to be BadCrc, but its 0xFF length trips LengthTooLarge first; the fixture now uses a genuine record so it actually exercises the CRC-catches-rec_type mechanism (still Invalid, never Sentinel). Assertion not weakened. Blast radius walked (verified, not patched): - recovery.rs:103 CleanEnd arm -- the intended behavioral change. - segment.rs:259 -- a corrupt header is no longer Sentinel; falls through to the full-record decode -> Invalid (verified, no edit). segment.rs:246 untouched. - reader.rs:134 -- already treats CleanEnd|Invalid identically; reader semantics UNCHANGED (a corrupt rec_type==0 tail record shifts CleanEnd->Invalid, both end the live stream per 15.2). Decision stated, not changed as a side effect. F3 oracle updated in lockstep (fuzz/fuzz_targets/structure.rs): Mutation:: ZeroRecType is now invalidates()==true (a CRC-caught corruption, no longer a sentinel); the dead sentinel `else` arm removed. Re-smoked 80000 runs, exit 0, zero crashes, cov 589 (up from 561 -- ZeroRecType now exercises the corruption arms). Spec (docs/wal_design_v6.md): 8.2 step 1, 5.3 table row, 5.4, 4 D5 prose tightened to "all-zero header" + a v6 changelog bullet. cargo test (both configs, 21 ok-lines), clippy --all-targets -D warnings (both), cargo fmt --check, MSRV 1.85 (both) green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Rpbwt9JT56hQvVXiqTS131
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Two commits: F3 (the fuzzer) + the correctness fix it surfaced
This PR now ships F3 (the structure-aware classifier fuzzer) and the fix for the real bug it found — they ship together because F3's oracle moves in lockstep with the contract. The fix is commit
cfa6a96; F3 is977c732.fix:— close the D3/D5 sentinel silent-truncation hole (issue #26)The bug.
record::decodereturnedDecoded::Sentinelonrec_type == 0alone, before the CRC check. So a single-bit corruption of an interior record'srec_typebyte (1→0) was mistaken for the end-of-records sentinel: recovery returnedTailState::Cleanat that offset and silently dropped every acked record after it, rewindingdurable_lsn. Violates D3 (no loss ≤ a returneddurable_lsn) and D5 (mid-log corruption must be fatal). Every other single-byte corruption was already handled correctly;rec_type→0was the one escape, because the sentinel short-circuit bypassed the CRC that coversrec_type.The fix (one decision point,
src/record.rs::decode). A sentinel is recognized only by a full all-zero 20-byte header (rec_type == 0 && buf[..20].iter().all(|&b| b == 0)), short-circuited onrec_typeso the Full-record hot path stays a single byte-compare. A corruptrec_type==0, crc≠0record now falls through the existing ladder → CRC fails →Invalid(BadCrc)→recovery::classify→ interior ⇒ fatalTornMidLog(D5) / tail ⇒ torn-tail truncate (D4). No new code path; D11 preserved (bounded before any payload touch). Genuine sentinels (pre-alloc zero region / §8.2.1 zeroing) are always all-zero ⇒ nothing legitimate lost.Blast radius — three consumers walked (verified, not patched)
recovery.rs:103(CleanEndarm) — the intended behavioral change.segment.rs:259— a corrupt header is no longerSentinel; on the header-only slice it decodesIncomplete/Invalid, so it falls through to the full-record decode →Invalid. Verified, no edit.segment.rs:246(the unrelatedremaining < 20 ⇒ CleanEnd) untouched.reader.rs:134— already treatsCleanEnd | Invalididentically. Reader semantics unchanged (a corruptrec_type==0tail record shiftsCleanEnd→Invalid; both end the live stream per §15.2). Stated, not changed as a side effect.Tests — every flip explained, no assertion weakened
recovery.rsregression testsrec_type_zeroed_interior_is_fatal_tornmidlog(D5) andrec_type_zeroed_at_tail_is_torn_tail_and_zeroed(D4 + clean/zeroed reopen, D10). Built by encoding a valid record then settingbuf[off+16]=0without recomputing the CRC. Both fail before the fix, pass after (demonstrated).record.rs::sentinel_header_detectedFLIPPED, corrected to the honest contract: a real record with a zeroedrec_type⇒Invalid(BadCrc); an arbitrary non-all-zerorec_type==0header ⇒Invalid; all-zero ⇒Sentinel. One deviation from the writeup, called out: issue Durability contract correctness issue - Sentinel silent truncation #26 expected the old0xFF-filled fixture to beBadCrc, but that buffer's0xFFlength tripsLengthTooLargefirst — so the fixture now uses a genuine record to actually exercise the CRC-catches-rec_typemechanism (stillInvalid, neverSentinel). Assertion not weakened.ScanOutcomerecovery tests and theprop_decode_arbitrary_is_boundedcatch-all arm are unaffected (confirmed).F3 oracle updated in lockstep
Mutation::ZeroRecTypeis nowinvalidates() == true(a CRC-caught corruption, no longer a sentinel); the now-dead sentinelelsearm removed. Re-smoked 80 000 runs, exit 0, zero crashes, cov 589 (up from 561 —ZeroRecTypenow exercises the corruption arms).Spec
docs/wal_design_v6.md§8.2 step 1, §5.3 table row, §5.4, §4 D5 prose tightened to "all-zero header" + a v6 changelog bullet.Gates
cargo test(both configs, 21 ok-lines),cargo clippy --all-targets -D warnings(both),cargo fmt --check, MSRV 1.85 (both) green.src/diff is the fix + tests only.F3 — structure-aware classifier fuzz (§14.5) — original PR content
Builds a valid dense single segment (harness owns the correct CRCs → hits deep classifier states a blind fuzzer can't reach), applies one localized mutation at a fuzzer-chosen record (flip CRC/body, zero
rec_type, extend length, tamper padding, reserved type + re-CRC), drives the real publicWal::open. Sharp oracle: interior corruption ⇒ fatalTornMidLog(D5); last corruption ⇒ truncate (D4); survivors dense + byte-identical (D6/D10); idempotent reopen (D7); scan bounded (D11). Falsifiability shown (forcing the forward scan to report "no continuation" tripsD5: interior corruption returned Ok).decode/recoverycorpuscargo fuzz cmin'd; per-PR smoke runs F1+F2+F3; N-CPU-hour gate stays OPEN.🤖 Generated with Claude Code