Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions CLAUDE.md
Original file line number Diff line number Diff line change
Expand Up @@ -109,6 +109,7 @@ The entire value of this component is **correct behavior under crashes and fault

## Project status (keep this updated)

- **LATEST (2026-06-29): M9 — `!Sync` compile-fail (trybuild) + Send/dir-lock LANDED (§14.6), green here.** New `trybuild` dev-dep + `tests/ui.rs` runner + `tests/ui/wal_not_sync.rs` (+ pinned `tests/ui/wal_not_sync.stderr`): asserts `Wal<NullObserver>: Sync` does **not** compile — the `PhantomData<Cell<()>>` marker (§6.2) makes the handle `!Sync`, so concurrent writers are a compile error (`Cell<()>` cannot be shared between threads safely). The **`Send`** half is asserted positively by a new `wal::tests::handle_is_send`; the runtime **directory lock** was **already** covered by `wal::tests::second_open_is_locked` (second in-process `Wal::open` ⇒ `WalError::Locked` via the exclusive `flock` on `dir/LOCK`) — not duplicated. **trybuild `.stderr` fragility handled honestly:** it's toolchain-sensitive, so it runs **only in the per-PR `test` job (stable)** — the MSRV job is `cargo check`, which builds the runner but does not execute it, so the `.stderr` is compared against exactly one toolchain; regenerate with `TRYBUILD=overwrite cargo test --test ui` on a rustc bump (documented in `Cargo.toml`, `tests/ui.rs`, §14.6). Generated/verified on rustc 1.94.1; the diagnostic wording ("cannot be shared between threads safely") is stable. `cargo test` (22 ok-lines incl. `ui`/`handle_is_send`/`second_open_is_locked`), `clippy --all-targets -D warnings`, `cargo fmt --check`, **MSRV 1.85** (trybuild v1.0.117 builds) green. Docs: §14.6 two bullets + a new §14.13 single-writer DoD row. **Remaining M9:** loom publish-barrier; soak; CI-matrix tidy-up; the F1–F4 N-CPU-hour release-gate observation.
- **LATEST (2026-06-29): M9 — Miri LANDED (§14.6), per-PR `Miri (codec subset)` CI job, green here.** Runs the **FS-free in-memory subset** `record::`/`crc::`/`lsn::`/`config::` under `cargo +nightly miri test --lib` with `MIRIFLAGS=-Zmiri-strict-provenance` — **25 tests, 0 failed, no UB, ~6 s** (sysroot warm). **Honest scope (designer note #2):** the crate has **zero `unsafe`** and no zero-copy casts, so Miri is a **regression guard against future UB**, not a current bug hunt; its guarantee lives in the pure in-memory codec (`decode`/`encode`, CRC, padding/alignment math). The file-backed `recovery`/`segment`/`wal` paths call **foreign syscalls** (`fallocate`/`fsync`/`flock`) Miri can't execute ⇒ **out of Miri scope** (their D11 surface is the ASan fuzz targets' job, §14.5). Two small test-only adjustments (both `src/record.rs`, documented): `proptest` block + its import are `#[cfg(not(miri))]`-gated (proptest is impractical under the interpreter and redundant with the deterministic units), and `roundtrip_spread_and_max`'s 1 MiB case is skipped under `cfg!(miri)` (a 1 MiB CRC in the interpreter is minutes-slow, no extra UB coverage — this cut the subset from 366 s → 6 s). **Finding flagged, not a bug in our code:** `-Zmiri-symbolic-alignment-check` trips on the `crc32c` dependency's word-at-a-time SW path (every CRC call) — but the **default** alignment checker passes, so it's alignment-correct at runtime, not genuine UB; we run strict-provenance only and document why (no `unsafe`/pointer-alignment of our own to verify). Normal `cargo test` unchanged (proptests + 1 MiB still run). `cargo fmt --check`, `clippy --all-targets -D warnings` green. **Remaining M9:** `!Sync` trybuild + dir-lock; loom publish-barrier; soak; CI-matrix tidy-up; the F1–F4 N-CPU-hour release-gate observation.
- **LATEST (2026-06-28): M9 slice 4 — F4 op-script oracle fuzz LANDED, built + smoke-green here; gate stays OPEN. ALL FOUR fuzz targets (F1–F4) now exist.** New `fuzz/fuzz_targets/model.rs` (fourth cargo-fuzz bin). Decodes fuzzer bytes (`arbitrary::Unstructured`) into a boundary-biased small `WalConfig` + a weighted `Vec<Op>` (`Append`/`Commit`/`Checkpoint`/`CrashAndRecover`/`Reopen`; payloads **clamped to `max_record_size`** so the model's `append(..).expect()` can't false-positive), then calls the **same** proptest-free executor the M6 harness uses — `tests/model/mod.rs::run`, included verbatim via `#[path = "../../tests/model/mod.rs"]`, **zero duplication, no change to `tests/model/`**. `run` drives the real `Wal` against an independent in-memory oracle and panics on any breach of **D1/D2/D3/D6/D7/D8**. **Crash model (flag #2, stated in the target header):** `CrashAndRecover` is the **process-crash state machine** (page cache survives — drop the handle without committing, reopen), **not** power loss / torn writes (those are H1 + the LazyFS suite) — a green F4 is never a power-loss durability proof. **Smoke-green: 40 000 runs, exit 0, zero crashes, cov 798**; corpus `cargo fuzz cmin`'d (321 entries — the richest target, full state-machine coverage). **Falsifiability shown**: a seeded recovery loss (`last_lsn = max_lsn − 1` in `recover_all`) trips `D1/D3: recovered durable_lsn 0 < oracle committed watermark 1`, then reverted. **No `src/` change** (`git diff src/` empty). CI: `model` added to `fuzz.yml` matrix + the per-PR smoke in `ci.yml` (now `fuzz smoke (F1/F2/F3/F4)`; F4's smoke is small since it does a real `fdatasync` per `Commit`; a crash reds the M9 PR). `cargo fmt --check`, `clippy --all-targets -D warnings`, `cargo test`, `actionlint` green. **Remaining M9:** Miri; `!Sync` trybuild + dir-lock; loom publish-barrier; soak; CI-matrix tidy-up; the F1–F4 N-CPU-hour release-gate observation on a dedicated runner.
- **LATEST (2026-06-28): CORRECTNESS FIX (gates F3) — closed a D3/D5 sentinel silent-truncation hole that F3 surfaced (issue #26).** `record::decode` returned `Decoded::Sentinel` on `rec_type == 0` **alone**, *before* the CRC check, so a single-bit corruption of an **interior** record's `rec_type` byte (`1`→`0`) was mistaken for the end-of-records sentinel and **silently dropped every acked record after it** (`durable_lsn` rewound) — violating **D3** (no loss ≤ a returned `durable_lsn`) and **D5** (mid-log corruption must be fatal). **Fix (one line in `src/record.rs::decode`):** a sentinel is recognized only by a full **all-zero 20-byte header** (`rec_type == REC_TYPE_SENTINEL && buf[..20].iter().all(|&b| b == 0)`, short-circuited on `rec_type` so the Full-record hot path stays a single byte-compare). A corrupt `rec_type==0, crc≠0` record now falls through the existing ladder → CRC fails → `Invalid(BadCrc)` → `classify` → interior ⇒ fatal `TornMidLog` (D5) / tail ⇒ torn-tail truncate (D4); **no new code path, D11 preserved**. Genuine sentinels (pre-alloc zero region / §8.2.1 zeroing) are always all-zero ⇒ nothing legitimate lost. **Tests:** new `recovery.rs` regression tests at **interior (D5)** and **tail (D4 + idempotent-zeroed reopen, D10)** — both **fail before the fix, pass after** (demonstrated); `record::sentinel_header_detected` fixture corrected to the honest contract (a real record with a zeroed `rec_type` ⇒ `Invalid(BadCrc)`; all-zero ⇒ `Sentinel`) — **note:** the designer's writeup said the old `0xFF`-filled fixture would be `BadCrc`, but that buffer's `0xFF` length trips `LengthTooLarge` first, so the fixture now uses a genuine record to actually exercise the CRC-catches-`rec_type` mechanism (still `Invalid`, never `Sentinel`). **Blast radius walked:** `segment.rs:259` falls through to the full-record decode → `Invalid` (verified, no edit); `reader.rs:134` already treats `CleanEnd|Invalid` identically ⇒ **reader semantics unchanged** (a corrupt `rec_type==0` tail record shifts `CleanEnd`→`Invalid`, both end the live stream per §15.2). **F3 oracle updated in lockstep:** `Mutation::ZeroRecType` is now `invalidates() == true` (a CRC-caught corruption, no longer a sentinel) — F3 re-smoked **80 000 runs, exit 0, zero crashes, cov 589** (up from 561, since ZeroRecType now exercises the corruption arms). **Spec:** §8.2 step 1, §5.3 table row, §5.4, §4 D5 prose tightened to "all-zero header" + v6 changelog bullet. `cargo test` (both configs), `clippy --all-targets -D warnings` (both), `cargo fmt --check`, MSRV 1.85 green. Ships on the F3 branch (PR #25).
Expand Down
113 changes: 113 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

7 changes: 7 additions & 0 deletions Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -67,6 +67,13 @@ criterion = { version = "0.5", default-features = false, features = ["cargo_benc
# so the commit-latency group records per-iteration timings here and emits the
# tails itself.
hdrhistogram = "7"
# §14.6 `!Sync` compile-fail proof (`tests/ui.rs` + `tests/ui/*.rs`): asserts that
# sharing the single-writer `Wal` handle across threads does not compile. The
# pinned `.stderr` is toolchain-sensitive — regenerate with
# `TRYBUILD=overwrite cargo test --test ui` after a rustc bump that changes the
# diagnostic wording (the test only runs in the per-PR `test` job on stable; the
# MSRV job is `cargo check`, which does not execute it).
trybuild = "1"

# §14.7 perf benches. `harness = false` ⇒ criterion provides `main`. NIGHTLY/manual
# tier (§14.11); per-PR CI only compile-checks them (`cargo bench --no-run`).
Expand Down
5 changes: 3 additions & 2 deletions docs/wal_design_v6.md
Original file line number Diff line number Diff line change
Expand Up @@ -574,8 +574,8 @@ Construct the specific resurrection hazard: write records, induce a torn tail su

### 14.6 Concurrency & memory-model
- **Miri** on unit + property suites — UB, uninitialized reads, **alignment** (essential if any `unsafe`/zero-copy header casting is used). *(**M9 — IMPLEMENTED** as a per-PR `Miri (codec subset)` job in `ci.yml` over the **FS-free in-memory subset** (`record::`/`crc::`/`lsn::`/`config::`), `-Zmiri-strict-provenance`, ~6 s. **Honest scope:** the crate has **zero `unsafe`** and no zero-copy pointer casts, so Miri is a **regression guard against future UB**, not a current bug hunt — its guarantee lives in the pure in-memory codec (`decode`/`encode`, CRC, padding/alignment math), the byte-level parser. The file-backed `recovery`/`segment`/`wal` paths call foreign syscalls (`fallocate`/`fsync`/`flock`) Miri cannot execute, so they are **out of Miri scope** — their bounded-parsing/D11 surface is the ASan **fuzz** targets' job (§14.5), not Miri's. `proptest` is `#[cfg(not(miri))]`-gated (impractical under the interpreter; the deterministic unit tests cover the same paths) and the 1 MiB round-trip case is skipped under `cfg!(miri)`. We deliberately do **not** enable `-Zmiri-symbolic-alignment-check`: the `crc32c` dependency's word-at-a-time software path trips its worst-case-alignment assumption while being alignment-correct at runtime (the default checker passes), and this crate has no pointer alignment of its own to verify. **Revisit this scope if `unsafe` is ever introduced** into the crate: Miri then becomes a genuine bug hunt — widen the suite beyond the codec subset, enable `-Zmiri-symbolic-alignment-check` (fixing or justifying any zero-copy alignment it flags), and drop the "regression guard, not a bug hunt" framing here and in §14.13.)*
- **`!Sync` compile-fail test** (`trybuild`): sharing a write handle across threads fails to compile; two concurrent `&mut` borrows rejected.
- **Directory-lock test:** a second `open` of the same dir (same process; and a second process where feasible) fails with `Locked`.
- **`!Sync` compile-fail test** (`trybuild`): sharing a write handle across threads fails to compile; two concurrent `&mut` borrows rejected. *(**M9 — IMPLEMENTED:** `tests/ui/wal_not_sync.rs` (+ pinned `.stderr`) asserts `Wal<NullObserver>: Sync` does **not** compile — the `PhantomData<Cell<()>>` marker (§6.2) makes it `!Sync`, so concurrent writers are a compile error (`Cell<()>` cannot be shared between threads safely). The `Send` half (movable across threads) is asserted positively by `wal::tests::handle_is_send`. The `.stderr` is toolchain-sensitive (regenerate with `TRYBUILD=overwrite cargo test --test ui` on a rustc bump); it runs only in the per-PR `test` job on stable — the MSRV job is `cargo check`, which does not execute it.)*
- **Directory-lock test:** a second `open` of the same dir (same process; and a second process where feasible) fails with `Locked`. *(**M9 — already covered** by `wal::tests::second_open_is_locked`: a second in-process `Wal::open` of the same dir, with the first handle held, returns `WalError::Locked` via the exclusive `flock` on `dir/LOCK` — a second open-file-description's `LOCK_EX|LOCK_NB` gets `EWOULDBLOCK`.)*
- **`loom` harness for the integration barrier (recommended, integrator-facing).** Model the journal-consumer→publish-consumer handoff; prove that under all interleavings the publish side never treats a record as publishable before the corresponding `commit` returned, given Release store / Acquire load on the shared cursor.

### 14.7 Performance & regression (`criterion`)
Expand Down Expand Up @@ -651,6 +651,7 @@ Multi-hour randomized workload with periodic injected crashes+recoveries and che
- §14.8 H1: ≥ M power-pull cycles on target hardware, zero acked-record loss. *(M8: the **harness + runbook are built** — `src/bin/power_pull_{workload,verify}.rs` + `scripts/m8/power-pull.sh`, with the off-box network side channel, send-strictly-after-`commit() Ok` ack-ordering, contiguous-watermark conservative verify, and the H2 vacuous-pass gate as a precondition; the mechanical chain was dry-run green on loopback. **OPEN-pending-owner-run** for the actual ≥50-cycle power-pull on real/cache-configured hardware (no cuttable target in the sandbox). H3 fsync-failure poison: the **§12 state machine RUNS green** via the LD_PRELOAD shim (`scripts/m8/fsync-fault.sh`); the **physical** dm-flakey half now runs **nightly + manual on hosted CI** (`m8-dmflakey.yml`, best-effort + loud skip) instead of owner-only. H4 macOS `F_FULLFSYNC` **Half A** (routing/smoke) now runs on **macOS CI** (`m8-macos.yml`); Half B (`dtruss` trace) stays owner-run (root + SIP). See `docs/m8-runbook.md`.)*
- Zero-allocation assertion (§14.7) passes for append/commit and `Reader::next`. *(M7: PASSES — hardened to also prove no-roll in the measured window and to cover a `max_record_size` payload. The §14.7 benches + regression gate exist; **gate enforcement is OPEN-pending-controlled-runner** — informational on hosted CI per §14.11, a real gate on a pinned-governor runner.)*
- Miri clean on covered suites. *(**M9 — DONE for the covered (FS-free) suites.** Per-PR `Miri (codec subset)` job (`ci.yml`) runs `record::`/`crc::`/`lsn::`/`config::` under `-Zmiri-strict-provenance`, green (25 tests, ~6 s). Scope is honest: zero `unsafe` ⇒ a regression guard, not a bug hunt; the file-backed paths use foreign syscalls Miri can't run (covered by the ASan fuzz targets). Not enabling `-Zmiri-symbolic-alignment-check` (the `crc32c` SW path trips it while alignment-correct at runtime).)*
- Single-writer enforced at compile time + at runtime (§6.2). *(**M9 — DONE.** `!Sync` compile-fail proof `tests/ui/wal_not_sync.rs` (trybuild, pinned `.stderr`) — asserting `Wal: Sync` does not compile; the `Send` half asserted by `wal::tests::handle_is_send`; the runtime exclusive **directory lock** by `wal::tests::second_open_is_locked` (second `open` ⇒ `WalError::Locked`). The trybuild `.stderr` is toolchain-sensitive — per-PR `test` job only (stable), regenerate with `TRYBUILD=overwrite` on a rustc bump.)*

---

Expand Down
10 changes: 10 additions & 0 deletions src/wal.rs
Original file line number Diff line number Diff line change
Expand Up @@ -1119,6 +1119,16 @@ mod tests {
));
}

#[test]
fn handle_is_send() {
// §6.2: the write handle is `Send` (it may be moved to another thread) but
// **not** `Sync` (it cannot be shared). This asserts the `Send` half at
// compile time; the `!Sync` half is the `tests/ui/wal_not_sync.rs`
// trybuild compile-fail proof (§14.6). Together they pin "Send but !Sync".
fn assert_send<T: Send>() {}
assert_send::<Wal<NullObserver>>();
}

#[test]
fn reader_from_skips_earlier_records() {
let dir = tmp();
Expand Down
Loading
Loading