From 4401a996deaf860cef8243a9cd4494ad86cc2f35 Mon Sep 17 00:00:00 2001
From: Claude <noreply@anthropic.com>
Date: Wed, 1 Jul 2026 18:00:11 +0000
Subject: [PATCH] =?UTF-8?q?M9:=20CI-matrix=20tidy-up=20(=C2=A714.11)=20?=
 =?UTF-8?q?=E2=80=94=20reconcile=20spec=20to=20the=20actual=20lanes?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The final M9 slice. Docs-only reconciliation: the 8 workflows already
implement the intended per-PR/nightly/manual matrix, but §14.11 predated the
M9 additions. Rewrite it into a faithful index.

- Per-PR (blocking) row now enumerates the real ci.yml jobs: rustfmt+clippy;
  build+test (§14.1 vectors + §14.2 reduced proptest + §14.7 zero-alloc +
  bench --no-run + the §14.6 !Sync trybuild compile-fail + dir-lock/Send);
  MSRV 1.85; Miri (codec subset); fuzz smoke F1-F4 (BLOCKING — a crash reds
  the PR); §14.4d dir-fsync presence (strace); plus the paths-filtered per-PR
  gates (m8-macos H4 Half A, lazyfs, m8).
- Nightly row names each scheduled workflow with its staggered cron
  (bench 03:17 / fuzz 04:17 / dm-flakey 04:23 / soak 05:17), all contingent,
  and honestly flags that full-iteration §14.2/§14.3 + §14.9 differential are
  not yet automated nightly (covered at PR granularity by reduced proptest +
  the M6 model_oracle suite).
- FS-matrix honesty note (plan Slice 9): the FS matrix matters only for the
  durability/metadata-fault gates (LazyFS/H3/§14.4d); the byte-level
  codec/CRC/recovery-classifier logic the fuzz + Miri lanes exercise is
  filesystem-independent, so those lanes are not multiplied across the matrix
  — over-claiming an "all-FS fuzz matrix" would be dishonest.

No long-run gate marked green: the §14.13 N-CPU-hour fuzz row and the
multi-hour soak stay OPEN-pending a dedicated runner. No src/ or workflow
behavior change (cargo fmt/clippy/test unaffected).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Rpbwt9JT56hQvVXiqTS131
---
 CLAUDE.md             | 1 +
 docs/wal_design_v6.md | 6 +++---
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/CLAUDE.md b/CLAUDE.md
index bf36423..775a762 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -109,6 +109,7 @@ The entire value of this component is **correct behavior under crashes and fault
 
 ## Project status (keep this updated)
 
+- **LATEST (2026-07-01): M9 — CI-matrix tidy-up LANDED (§14.11); this is the LAST M9 build slice. M9 is now feature-complete, with only the standing owner/dedicated-runner *observations* open.** **Docs-only reconciliation** (no `src/`, no workflow behavior change — the 8 workflows already implemented the intended matrix; the spec had simply not caught up). Rewrote §14.11 into a **faithful index** of what actually runs: the **Per-PR (blocking)** row now enumerates the real `ci.yml` jobs — `rustfmt+clippy`, `build+test` (§14.1 vectors + §14.2 reduced proptest + §14.7 zero-alloc + `bench --no-run` + the §14.6 `!Sync` trybuild compile-fail + dir-lock/`Send` tests), `MSRV 1.85`, `Miri (codec subset)`, **`fuzz smoke (F1/F2/F3/F4)` (blocking — a crash reds the PR)**, `§14.4d dir-fsync presence (strace)` — plus the paths-filtered per-PR gates (`m8-macos.yml` H4 Half A, `lazyfs.yml`, `m8.yml`); the **Nightly (scheduled)** row names each workflow with its staggered cron (`bench.yml` 03:17 / `fuzz.yml` 04:17 / `m8-dmflakey.yml` 04:23 / `soak.yml` 05:17, all CONTINGENT + dispatch), and honestly flags that full-iteration §14.2/§14.3 and the §14.9 differential are **not yet** automated as nightly (covered at PR granularity by the reduced proptest + M6 `model_oracle`). Added the **FS-matrix honesty note** (per plan Slice 9): the FS matrix is meaningful only for the durability/metadata-fault gates (LazyFS/H3/§14.4d); the byte-level codec/CRC/recovery-classifier logic the §14.5 fuzz + §14.6 Miri lanes exercise is **filesystem-independent**, so those lanes are **not** multiplied across the matrix — over-claiming an "all-FS fuzz matrix" would be dishonest. **Honest close-out:** no long-run gate is marked green — the §14.13 fuzz row (N-CPU-hours) and the multi-hour soak stay OPEN-pending a dedicated runner; Miri clean, `!Sync`, and zero-alloc rows are DONE. `cargo fmt --check` / `clippy --all-targets -D warnings` / `cargo test` unaffected (docs-only). **M9 remaining: only the F1–F4 N-CPU-hour + multi-hour-soak release-gate observations on owner/dedicated hardware — no more in-session build work.**
 - **LATEST (2026-06-29): M9 — soak / endurance LANDED (§14.10), short run green here; the multi-hour gate stays CONTINGENT.** New `tests/soak.rs` (`#[ignore]`, env-driven `WAL_SOAK_SECONDS` default 3 / `WAL_SOAK_SEED` / optional `WAL_SOAK_EVIDENCE`): drives a **single long-lived `Wal`** through a weighted randomized loop — append (boundary-biased 0/1/8/`max_record_size`/random) / commit (timed into an `hdrhistogram`) / `checkpoint(durable)` / process-crash-recover (drop+reopen) — over a 4 KiB-segment/256 B-record config so rolls/splits/checkpoints fire constantly. After **every** recover it re-checks the §14.3 refinement envelope against an independent in-memory oracle: **D1/D3** (recovered `durable_lsn` ≥ committed watermark), **D8** (oldest ≤ authorized `up_to`+1, monotone), **D2/D6** (dense byte-identical replay `oldest..=durable` via `reader_from(0)`). Four resource monitors with bounded-growth gates: **fd** (`/proc/self/fd`, `peak ≤ baseline+32`), **disk** (a deterministic **per-checkpoint floor** — the soak always checkpoints to `durable_lsn`, so every sealed segment is superseded and exactly the active segment must remain right after `checkpoint(durable)`; a reclaim-N−1-of-N leak reds on cycle 1, not after 16 segments accrue — backstopped by a `peak ≤ 16×segment_size` runaway ceiling), **RSS** (`/proc/self/statm`×pagesize, `≤ baseline+64 MiB`; §8.5 — recovery materializes no payloads), **commit p999 ≤ 2 s**. Deterministic seeded LCG (no RNG dep) ⇒ reproduces from the seed; oracle self-prunes below `max_ckpt`/`oldest` so it never trips its own RSS watch. **Short run green here: `WAL_SOAK_SECONDS=4` ⇒ ~29 k ops / ~6.5 k commits / ~2 k checkpoints / ~2 k recoveries, fd 6→6, disk floor==1 every checkpoint, RSS +0.5 MiB, p999 sub-ms–7 ms.** **Falsifiability shown (honest form, per review)**: injecting a per-cycle leak (`deletable_prefix_len` reclaims N−1 of N) reds the floor on cycle 1 (`found 2 *.wal files`), then reverted — the real leak class, not a sub-working-set threshold. Wrapper `scripts/m9/soak.sh` (release build, scrapes the one-line JSON summary, re-emits a §5 evidence ledger via `scripts/m8/evidence.sh`, loud SHORT-vs-gate framing) + `.github/workflows/soak.yml` (schedule 05:17 + dispatch, **NOT per-PR**, contingent banner, uploads the evidence artifact). **No `src/` change** (public API only). **Honest framing (same stopgap as LazyFS/bench):** a SHORT run proves the driver/monitors/oracle work but is **not** the gate — the §14.13 soak is a **multi-hour** run on a dedicated runner with zero regression/violation. `cargo test --test soak` compiles, `clippy --test soak`, `shellcheck scripts/m9/soak.sh`, `actionlint soak.yml` clean. **Remaining M9:** CI-matrix tidy-up (§14.11); the F1–F4 N-CPU-hour + multi-hour-soak release-gate observation on a dedicated runner.
 - **LATEST (2026-06-29): M9 — loom item REMOVED as not-applicable (PR #30 closed unmerged); spec corrected, not silently deleted.** A loom publish-barrier harness was built (PR #30) but **closed without merging**: the test imported only `loom::sync::*` (no `open_wal` symbol) and hand-built a writer/reader cursor *inside the test*, so it was green independently of the crate — it validated a textbook Release/Acquire fact, not this code. The deeper reason it can't be fixed: the writer is **`!Sync` by construction** (`PhantomData<Cell<()>>`, §6.2, proven statically by the #29 trybuild test), so there is **no intra-WAL shared-memory concurrency for loom to model**, and a static `!Sync` proof is *stronger* than an empirical model check. The only Release/Acquire barrier is the **integrator's** publish cursor (out of scope, §1/§15.7); the WAL's actual obligation there (`on_durable` fires strictly after durability, monotonically, writer thread) is covered by the **§15.8 `DurabilityObserver` contract tests**. **Correction (docs-only):** §14.6 loom bullet → a normative *not-applicable* note (conditioned on the writer staying `!Sync`, so a future `Sync` redesign re-opens it); §14.6 concurrency cross-ref + the tooling list de-loom'd; v6 changelog entry added; `loom` dropped from the dev-tools list. **No `loom` dev-dep / `cfg(loom)` / `loom.yml` reaches `main`** (they lived only on the closed PR). Same discipline as the §14.4d negative result — the characterized "why not" is the deliverable. `cargo fmt --check` / `clippy --all-targets -D warnings` / `cargo test` unaffected (docs-only). **Remaining M9:** soak; CI-matrix tidy-up; the F1–F4 N-CPU-hour release-gate observation.
 - **LATEST (2026-06-29): M9 — `!Sync` compile-fail (trybuild) + Send/dir-lock LANDED (§14.6), green here.** New `trybuild` dev-dep + `tests/ui.rs` runner + `tests/ui/wal_not_sync.rs` (+ pinned `tests/ui/wal_not_sync.stderr`): asserts `Wal<NullObserver>: Sync` does **not** compile — the `PhantomData<Cell<()>>` marker (§6.2) makes the handle `!Sync`, so concurrent writers are a compile error (`Cell<()>` cannot be shared between threads safely). The **`Send`** half is asserted positively by a new `wal::tests::handle_is_send`; the runtime **directory lock** was **already** covered by `wal::tests::second_open_is_locked` (second in-process `Wal::open` ⇒ `WalError::Locked` via the exclusive `flock` on `dir/LOCK`) — not duplicated. **trybuild `.stderr` fragility handled honestly:** it's toolchain-sensitive, so it runs **only in the per-PR `test` job (stable)** — the MSRV job is `cargo check`, which builds the runner but does not execute it, so the `.stderr` is compared against exactly one toolchain; regenerate with `TRYBUILD=overwrite cargo test --test ui` on a rustc bump (documented in `Cargo.toml`, `tests/ui.rs`, §14.6). Generated/verified on rustc 1.94.1; the diagnostic wording ("cannot be shared between threads safely") is stable. `cargo test` (22 ok-lines incl. `ui`/`handle_is_send`/`second_open_is_locked`), `clippy --all-targets -D warnings`, `cargo fmt --check`, **MSRV 1.85** (trybuild v1.0.117 builds) green. Docs: §14.6 two bullets + a new §14.13 single-writer DoD row. **Remaining M9:** loom publish-barrier; soak; CI-matrix tidy-up; the F1–F4 N-CPU-hour release-gate observation.
diff --git a/docs/wal_design_v6.md b/docs/wal_design_v6.md
index 5f06616..404c81f 100644
--- a/docs/wal_design_v6.md
+++ b/docs/wal_design_v6.md
@@ -623,11 +623,11 @@ Multi-hour randomized workload with periodic injected crashes+recoveries and che
 *(**M9 — IMPLEMENTED** as `tests/soak.rs` (`#[ignore]`, env-driven `WAL_SOAK_SECONDS`/`WAL_SOAK_SEED`), driven by `scripts/m9/soak.sh` and run nightly/dispatch in `soak.yml`. It drives a **single long-lived `Wal`** through a weighted randomized loop — append (boundary-biased sizes 0/1/8/`max_record_size`/random), commit (timed into an `hdrhistogram`), `checkpoint(durable)`, and process-crash-recover (drop + reopen) — over a 4 KiB-segment/256 B-record config so rolls/splits/checkpoints fire constantly. After **every** recover it re-checks the §14.3 refinement envelope (D1/D3 durable ≥ committed watermark, D8 oldest ≤ authorized `up_to`+1 and monotone, D2/D6 dense byte-identical replay `oldest..=durable`) against an independent in-memory oracle, so a *correctness* regression under sustained load reds the run too — not just a leak. The four monitors are sampled every 128 ops with bounded-growth gates: **fd** (`/proc/self/fd`, `peak ≤ baseline+32`), **disk** (a deterministic **per-checkpoint floor** — because the soak always checkpoints to exactly `durable_lsn`, every sealed segment is fully superseded, so right after `checkpoint(durable)` exactly the active segment must remain (§9); a checkpoint that reclaims N−1 of N sealed segments reds on the **first** bad cycle, not after 16 accumulate — backstopped by a `peak ≤ 16×segment_size` ceiling for gross runaway between checkpoints), **RSS** (`/proc/self/statm`, `≤ baseline+64 MiB`; recovery materializes no payloads, §8.5), and **commit-latency** (p999 ≤ 2 s ceiling). Deterministic (seeded LCG, no RNG dep) so a failure reproduces from `WAL_SOAK_SEED`. **Honest framing (same stopgap as LazyFS/dm-flakey/bench):** a SHORT run (the per-dispatch default 300 s, and the in-session smoke) proves the driver/monitors/oracle work but is **NOT** the gate — the §14.13 soak gate is a **multi-hour** run with zero regression/violation; a short green run is CONTINGENT, a failure is a real bug. Falsifiability shown (honest form): injecting a per-cycle leak (`deletable_prefix_len` reclaims N−1 of N) reds the **floor** assertion on cycle 1 (`found 2 *.wal files`) — the real leak class, not a sub-working-set threshold; the oracle's refinement checks are the same ones the M6 harness demonstrated falsifiable. No `src/` change — the driver uses only the public API.)*
 
 ### 14.11 CI matrix
-- **Per-PR (fast):** §14.1, §14.2 (reduced), §14.6 compile-fail + Miri subset, §14.7 alloc assertion (enforced, `ci.yml` `cargo test`) + `cargo bench --no-run` (benches must compile, can't bitrot), **§14.8 H4 Half A** *(M8: `m8-macos.yml` on `macos-latest`, paths-filtered to the durable-path sources + the test; per-PR + push-to-main + manual — a macOS-only `F_FULLFSYNC`-routing regression is invisible to the Linux PR CI because the `cfg(macos)` path does not compile there; `dtruss` Half B stays owner-run per #19)*.
-- **Nightly:** full §14.2/§14.3 (high iteration), §14.4 LazyFS suite, §14.5 fuzz (time-boxed), §14.7 benchmarks + gates *(M7: `bench.yml`, schedule + manual; the gate runs **informational** on hosted runners until a controlled/pinned-governor runner makes the §14.7 thresholds enforceable — same stopgap as the LazyFS gate)*, §14.9 differential, **§14.8 H3-physical + §14.4d** *(M8: `m8-dmflakey.yml`, push-to-main + schedule + manual, not per-PR; hosted ubuntu VMs reach `dm-flakey`, so these are real CI gates there — ext4 hard with a source-confirmed block-layer EIO (#16) and a `drop_writes` positive control (#17), xfs/btrfs informational, **best-effort + loud skip** if a runner lacks dm-flakey)*. **H4 Half A** is per-PR not nightly — see Per-PR row addendum below.
+- **Per-PR (fast, blocking).** The always-on `ci.yml` jobs (`push:main` + `pull_request` + dispatch): **`rustfmt + clippy`** (`-D warnings`, all targets); **`build + test`** = §14.1 codec/CRC vectors, §14.2 (reduced proptest) round-trip/property suites, §14.7 **zero-alloc assertion** (enforced) + `cargo bench --no-run` (benches must compile, can't bitrot), and the §14.6 **`!Sync` compile-fail** (`tests/ui.rs` trybuild, stable-only) + **directory-lock / `Send`** runtime tests — all under `cargo test`; **`MSRV (1.85)`** `cargo check --all-targets --locked`; **`Miri (codec subset)`** (§14.6, `record`/`crc`/`lsn`/`config` under `-Zmiri-strict-provenance` — the whole Miri-executable surface, since the file-backed paths use foreign syscalls Miri can't run); **`fuzz smoke (F1/F2/F3/F4)`** (§14.5, short bounded `cargo fuzz run` per target — **BLOCKING: a reproducible crash is a real D1–D11 bug and reds the PR**, distinct from the nightly time-boxed lane and never gating an H1 dispatch); **`§14.4d dir-fsync presence (strace)`** (M8 Tier-1, FS-independent syscall-presence guard). Plus the **paths-filtered** per-PR durability gates that only fire when their sources change: **§14.8 H4 Half A** *(`m8-macos.yml` on `macos-latest` — a macOS-only `F_FULLFSYNC`-routing regression is invisible to Linux PR CI because the `cfg(macos)` path does not compile there; `dtruss` Half B stays owner-run per #19)*, the **§14.4b LazyFS** gate *(`lazyfs.yml`, informational)*, and the **§12 fsync-fault poison** gate *(`m8.yml`)*.
+- **Nightly (scheduled, staggered off the `:00` rush + `workflow_dispatch`).** **§14.7 benchmarks + gates** *(`bench.yml`, 03:17 UTC; **informational** on hosted runners until a controlled/pinned-governor runner makes the §14.7 thresholds enforceable — same stopgap as the LazyFS gate)*; **§14.5 fuzz (time-boxed F1–F4)** *(`fuzz.yml`, 04:17 UTC; CONTINGENT, NOT the N-CPU-hour §14.13 gate — a crash still reds it)*; **§14.8 H3-physical + §14.4d** *(`m8-dmflakey.yml`, 04:23 UTC + push-to-main; hosted ubuntu VMs reach `dm-flakey`, so these are real gates there — ext4 hard with a source-confirmed block-layer EIO (#16) and a `drop_writes` positive control (#17), xfs/btrfs informational, **best-effort + loud skip** if a runner lacks dm-flakey)*; **§14.10 soak** *(`soak.yml`, 05:17 UTC; a short hosted slice is a CONTINGENT smoke, not the multi-hour gate)*. *(Still to be wired as nightly, not yet automated: full-iteration §14.2/§14.3 and the §14.9 differential — currently the reduced per-PR proptest + the M6 `model_oracle` suite cover these at PR granularity.)*
 - **Pre-release / manual:** §14.8 H1 power-pull on target hardware, §14.10 soak *(M9: `soak.yml`, schedule + manual, **contingent** until a multi-hour run on a dedicated runner — a hosted nightly slice is a smoke, not the gate; same stopgap as the LazyFS/bench informational lanes; each run uploads its §5 soak-evidence artifact)*. *(The nightly §14.8 dm-flakey/macOS gates above also accept `workflow_dispatch`; a manual run posts its §5 evidence to the tracking issue as the human sign-off, while the nightly cron stays artifact-only and surfaces regressions as a red build.)*
 - **OS matrix:** Linux (primary — sole platform for the §14.8 hardware-durability gate), macOS (dev/correctness — exercises `F_FULLFSYNC`; unit/property/fuzz only, not §14.8). Windows is out of scope for v1 (§8.3).
-- **FS matrix (Linux):** ext4, xfs, btrfs (CoW), tmpfs (logic only — never durability claims).
+- **FS matrix (Linux):** ext4, xfs, btrfs (CoW), tmpfs (logic only — never durability claims). **Scope, honestly:** the FS matrix is meaningful *only* for the durability/metadata-fault gates whose outcome depends on filesystem journaling and cache semantics — §14.4b LazyFS, §14.8 H3-physical, and the §14.4d negative control (which reproduces behaviorally only on a journal-less FS — see §14.4d). The **byte-level logic** — the record codec, CRC, the recovery classifier/bounded scan, and everything the §14.5 fuzz targets and the §14.6 Miri subset exercise — is **filesystem-independent** (pure in-memory parsing over arbitrary bytes, or plain `pwrite`/`fdatasync` with no FS-specific assumption), so those lanes run once on the runner's default FS and are **not** multiplied across the matrix. Over-claiming an "all-FS fuzz matrix" would be dishonest; the fuzz/codec guarantees hold by construction regardless of FS.
 
 ### 14.12 Traceability matrix (invariant → tests)