Skip to content

feat(stark): single-source constraints — one definition per constraint, verified cross-version [stacked on #763]#764

Merged
MauroToscano merged 55 commits into
mainfrom
feat/single-source-constraints-switch
Jul 3, 2026
Merged

feat(stark): single-source constraints — one definition per constraint, verified cross-version [stacked on #763]#764
MauroToscano merged 55 commits into
mainfrom
feat/single-source-constraints-switch

Conversation

@MauroToscano

@MauroToscano MauroToscano commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

The proof, first

A cli built from before this change — the old boxed-constraint machinery — verifies proofs produced by the prover after this change, and vice versa, for real VM executions:

scripts/cross_verify_vm.sh 2499b2a8 e76807ee
PASS prove-NEW-verify-OLD : sub      PASS prove-OLD-verify-NEW : sub
PASS prove-NEW-verify-OLD : add      PASS prove-OLD-verify-NEW : add
PASS prove-NEW-verify-OLD : arith_8  PASS prove-OLD-verify-NEW : arith_8

Because the verifier recomputes the OOD constraint evaluations from its own constraint definitions against the other side's commitments, this passing in both directions means the constraint system is semantically unchanged: same polynomials, same ordering, same degrees, same transcript. The same exchange passes for the 11 example AIRs (22/22, scripts/cross_verify_examples.sh). Evidence logs live on the frozen branch reference/sscs-cross-verify-evidence (run at three checkpoints: pre-deletion, post-deletion, final tip). Reproduce any of it with the two scripts — they stay in-tree as permanent regression gates.

What this PR is

Every transition constraint is now defined once — one eval body per table — and interpreted three ways: compiled prover evaluation per LDE row, compiled verifier evaluation at the OOD point (the identical code path the recursion guest runs: no hashing, no interpretation in-circuit), and a captured flat IR (air.constraint_program()) for the upcoming GPU constraint interpreter.

Net −6,586 lines. Deleted: the TransitionConstraintEvaluator/adapter/boxed() machinery, every duplicated per-constraint struct and builder function, the old LogUp constraint objects — plus two features with no remaining users (approved): periodic columns and the periodic-exemptions zerofier apparatus (end_exemptions stays; re-adding any of it later costs no constraint rewrites — zerofier metadata is orthogonal to the bodies).

Shape of the result

  • Tables: XxxConstraints: ConstraintSet in each table file — meta() (index-ordered {idx, kind, degree, end_exemptions}) + one eval<B: ConstraintBuilder> body. See prover/src/tables/eq.rs (small), dvrm.rs/ecdas.rs (large), crypto/stark/src/examples/simple_fibonacci.rs (minimal).
  • LogUp: LogUpLayout (data) + emit_logup_constraints/logup_meta (framework) replace the constraint objects; same split_interactions, same degrees, same indices.
  • AirWithBuses<F, E, B, PI, CS> holds the constraint set + layout + meta; AIR::constraint_program() is lazy and OnceLock-cached — the verify path never captures.
  • Zerofiers: 1/(xᴺ−1) with optional end-exemption roots, driven by meta through free functions.
  • Prover frames: the evaluator borrows each LDE row in place (RowFrame — two slice pairs straight into the row-major storage) instead of gather-copying ~150–200 elements per row per table into an owned frame.

Verification (beyond cross-version)

  • Every constraint body was migrated with a 1000-random-row three-way differential against its old implementation (prover folder / verifier folder / capture→interpret) plus meta parity (count, num_base, degrees, zerofier params) — the old code stayed in-tree as the oracle until all gates passed, then was deleted in one commit.
  • Permanent tests retained: folder-vs-captured-IR consistency per table, LogUp layout coverage (1-/2-absorbed, absorbed-only, all 10 packings), emission-completeness and meta-invariant assertions.
  • Interpreter path: every production AIR's captured program (via air.constraint_program()) is interpreted and compared bit-for-bit against the compiled folders on random frames — all 26 AIRs, prover and verifier sides (constraint_program_tests.rs). Exact-once emission is asserted release-safely in every differential harness (the debug-only tracker compiles out of CI's --release test build; a skipped constraint would otherwise become silently unenforced).
  • Suites: cargo test -p stark 169/169; prover release suite green modulo 5 pre-existing failures (missing test_commit_split.elf fixture, fails on main too). fmt/clippy clean; zero unsafe in the new modules.

Design record

thoughts/gpu-constraint-eval/impl-plan-single-source-constraints.md (in-tree) is the implementation plan with the full decision log; survey-constraint-frontends.md documents how Plonky3/OpenVM/SP1/risc0/zisk/airbender solve the same problem and why this design follows the production-validated pattern (compiled folders for CPU/guest, captured IR for GPU).

Performance

Measured with 32-pair interleaved ABBA runs on the bench server (drift-free paired comparison, ethrex-20). The migration initially cost +4.0% prove time; profiling-by-elimination traced all of it to memory behavior, none to field arithmetic:

  • per-row Vec allocations in the LogUp fingerprint fold — fixed (fold-style accumulator);
  • per-row Vec allocations in AddOperand construction (CPU ×4/row, KECCAK ~150/row) — fixed (inline AddTerms storage);
  • the per-row frame gather-copy — fixed (RowFrame in-place borrows, the one change with a measurable win, ~0.8%).

Arithmetic micro-optimizations (subexpression hoists, negation elimination, β⁰ skip) were each tried, measured flat, and reverted — the constraint bodies stay in their plain declarative form, with comments marking the known-and-measured redundancies. Final verdict: −0.70% vs main (CI [−1.15, −0.25]) — accepted; the GPU phase replaces this CPU path, and the interpreter-as-prover alternative measured −9% on the same bench.

Pre-review sweep

Four independent review passes (correctness edge cases, test-coverage gaps, stale docs, general design) ran over the full diff before human review:

  • Correctness: no live defects. The one latent finding — exact-once emission had no release-safe gate — is fixed here (see Verification).
  • Coverage: the interpreter path (the GPU contract) was only tested on synthetic shapes; the all-26-AIR combined-program differential closes that.
  • Docs: ~60 stale comments from the migration (dangling references to deleted structs, wrong counts) rewritten to describe the current code only.
  • Design: dead surface removed (packing_shifts context field, the always-true ConstraintProgram::complete flag).

New module crypto/stark/src/constraint_ir/ providing a flat, topologically
ordered IR for transition constraints, generic over a field tower
<F: IsSubFieldOf<E>, E> (defaulting to Goldilocks and its degree-3
extension):

- ir.rs: Op/Dim/ConstraintProgram<F, E>. Constants live in base_consts/
  ext_consts side tables referenced by index (Op::ConstBase(u32)/
  Op::ConstExt(u32)), keeping Op a plain Copy+Eq+Hash payload with zero
  bounds on the fields.
- builder.rs: IrBuilder<F, E> with (Op, Dim) hash-consing, by-value
  constant dedup via linear scan (FieldElement's canonicalizing
  PartialEq), and the id-0 = base-zero convention.
- interp.rs: forward-pass interpreter; eval_program /
  eval_program_verifier match the AIR compute_transition_prover /
  compute_transition contracts, eval_program_base is the minimal
  single-root entry point. Const ops read the side tables directly.

Not wired into the prover or verifier; no behavior change.
Hand-built IrBuilder programs checked against direct FieldElement
arithmetic: every Op variant and leaf kind, mixed base/ext arithmetic
with auto-embed, constant dedup (base, signed, ext), the id-0 zero
convention, CSE sharing, out-of-order emit() root indexing, the
complete-flag plumbing, and 1000-row randomized differential checks.

The prover and verifier entry points are exercised against
hand-constructed TransitionEvaluationContext values (both variants),
including the base-root promotion on the verifier side and next-row
(offset 1) frame reads.

A reflexive-tower test (E = F over the u32 test field, which has a
different modulus and BaseType than Goldilocks) proves the module is
genuinely field-generic; math's test-utils feature is enabled as a
dev-dependency for it.
New module crypto/stark/src/constraints/builder.rs: one constraint body,
written once against the ConstraintBuilder trait, is interpreted three
ways depending on the implementation it runs over:

- ProverEvalFolder<F, E> (Expr = FieldElement<F>): compiled per-row
  prover evaluation, constructed from the Prover
  TransitionEvaluationContext variant plus output slices; emit_ext
  writes ext_evals at the absolute constraint index.
- VerifierEvalFolder<F, E> (Expr = FieldElement<E>): the same body at
  the OOD point over the all-extension frame; const_base embeds via
  FieldElement::<F>::from(v).to_extension(). Monomorphized into the
  guest binary this is the recursion path — no capture, no hashing,
  no interpretation.
- CaptureBuilder<F, E> (Expr = owned Rc expression tree with eager
  dim + degree): one setup-time run that flattens into the constraint
  IR's IrBuilder (hash-consing there = structural CSE) and returns the
  program plus per-root tree-measured degrees.

Also: ExprOps/ExtExprOps operator-bound aliases (mixed base/ext ops
keep the base operand on the left, matching the field tower),
ConstraintMeta + RootKind plain-data constraint metadata with the
dense/idx-ordered/Base-prefix invariants (num_base_from_meta), the
ConstraintSet per-table trait, and debug-build emit tracking asserting
every constraint index is emitted exactly once.

Tests: a sample ConstraintSet (EqXor-, IsBit- and Add-carry-pair-shaped
bodies plus a LogUp-shaped extension constraint) checked on 1000 random
rows three ways — prover folder vs direct arithmetic, prover folder vs
interpreted capture, verifier folder vs interpreted capture — plus
measured-vs-declared degrees, meta invariants, and the completeness
asserts.

Not wired into any production path; no behavior change.
New module crypto/stark/src/constraints/zerofier.rs: the bodies of the
TransitionConstraintEvaluator default methods (end_exemptions_roots,
end_exemptions_lde_evaluations, zerofier_evaluations_on_extended_domain,
evaluate_zerofier) relocated verbatim to free functions consuming plain
ConstraintMeta — they only ever read the metadata getters. The trait
defaults remain untouched and stay the production path until tables
convert to ConstraintSet.

Tests assert the free functions match the trait defaults bit-for-bit on
a configurable boxed constraint across every branch: the default shape,
end exemptions, period/offset combinations, and periodic exemptions with
and without end exemptions, over both the LDE-domain and OOD entry
points.
Two additions to the builder framework tests, per the review pre-flight:

- num_base alignment guard (release-checked): a counting capture wrapper
  records which emit_* sink the sample body calls per index and asserts
  the base-emit set is exactly the num_base_from_meta prefix, and that
  every captured root's dim matches the interpreter's c < num_base
  routing (the .as_base() panic condition). num_base has independent
  sources of truth in meta, the folders and finish(num_base) — this
  pins them together.

- next-row + multi-alpha differential: a LogUp-accumulator-shaped
  sample set reading aux(1, col) (next-row accumulator) and two
  distinct alpha powers — both used by the real 1-/2-absorbed LogUp
  bodies and covered by neither existing sample — checked three ways
  on 1000 random two-step frames (prover folder vs direct arithmetic
  vs interpreted capture; verifier folder vs interpreted capture).
Non-destructive emit_*/​*_meta function pairs in constraints/{templates,
cpu}.rs, written once against the generic ConstraintBuilder so one body
serves the compiled prover folder, the verifier folder and IR capture:

- templates: emit_is_bit (conditional + unconditional), emit_add_pair
  (the carry pair from ONE body, covering every AddOperand variant via
  shared operand/term helpers)
- cpu: emit_product_zero, emit_arg2_exclusive, emit_mem_flags_bit,
  emit_reg_not_read_is_zero, emit_arg2, emit_rvd_eq_res,
  emit_branch_rvd_pair, emit_branch_cond, emit_next_pc_add_pair
  (the two pc + instruction_length carry pairs share one gated body)

Each *_meta returns the idx-ordered ConstraintMeta (declared degree,
default zerofier shape — none of these constraints override period/
offset/exemptions, matching the structs).

The old boxed constraint structs stay untouched: they are the
differential oracle for the transcription until the table conversion
deletes them.
The transcription gate for the constraints switch: every emit_* body is
checked against the old boxed struct's evaluate on 1000 random rows
(off-trace points, where a weakened or slipped transcription diverges
with overwhelming probability), three ways per constraint:

- ProverEvalFolder output vs old evaluate::<Gl, Gl3>
- VerifierEvalFolder output vs old evaluate::<Gl3, Gl3>
- CaptureBuilder -> flatten -> interpret vs old evaluate::<Gl, Gl3>

plus tree-measured degree == declared meta degree == old degree(), and
*_meta zerofier parameters == the old struct's period/offset/
exemptions_period/periodic_exemptions_offset/end_exemptions.

Covers all eleven kinds; the ADD pair runs three configurations
(conditional dword, unconditional DWordHL + negative-coefficient
linear, multi-column condition with Word/Constant/DWordBL operands) so
every AddOperand variant and both const_signed signs are exercised.
… by design); adopt cross-version verification + random-row differentials
Add EqConstraints / StoreConstraints implementing ConstraintSet, mirroring
the old boxed builders index-for-index. New differential test module
constraint_set_tests_b compares old evaluate_prover/evaluate_verifier and
capture->interpret against the new single body on 1000 random rows, plus
meta parity and measured-vs-declared degree.
…ConstraintSet

Add MemwConstraints (15), MemwAlignedConstraints (8), MemwRegisterConstraints
(3) mirroring the old boxed builders index-for-index; extend
constraint_set_tests_b with 1000-row differential + meta-parity checks.
…intSet

Add BranchConstraints (5, carry-bit next-pc pairs + JALR is-bit) and
CommitConstraints (8, is-bit + first/end=>mu + two ADD carry pairs) mirroring
the old boxed builders index-for-index; extend constraint_set_tests_b with
1000-row differential + meta-parity checks.
…straintSet

Add KeccakConstraints (51: 25 mu-gated ADD carry pairs for state_ptr lanes +
top-lane no-overflow) and KeccakRndConstraints (20 mu-gated is-bit on
Cxz_right), mirroring the old boxed builders index-for-index; extend
constraint_set_tests_b with 1000-row differential + meta-parity checks.
Add Cpu32Constraints (32: is-bit flags, ADD/SUB carry pairs, register-zero
limbs, sign-extension arithmetic, sign-zero/arg2-exclusive/flag=>mu), unrolling
the old multi-kind cpu32_constraints assembly into straight-line emit calls;
extend constraint_set_tests_b with 1000-row differential + meta-parity checks.
…-switch

# Conflicts:
#	prover/src/tests/mod.rs
These two examples were the sole users of sub-row/virtual-column reads
and periodic columns, both abandoned features. The framework machinery
(Op::Periodic, periodic zerofier paths) stays; it is removed together
with the engine machinery in a later phase.
A cargo example target (requires the test-utils feature) that proves and
verifies each example AIR with bincode-serialized proofs, mirroring
bin/cli's VM-proof format. Trace sizes and public inputs mirror the
existing stark tests, so a proof produced by one version of the
constraint system can be checked by another — this binary is the
old-verifier side of scripts/cross_verify_examples.sh.

Public-input structs gain serde derives (with an explicit FieldElement
bound) so the proofs, which embed them, can round-trip.
Shared plumbing for routing an AIR's compute_transition_prover /
compute_transition through a ConstraintSet body via the folders. The
verifier-flavored helper also accepts a Prover context (debug trace
validation calls compute_transition with a prover frame) by running the
prover folder and promoting the Base prefix, mirroring the old boxed
path's evaluate_verifier promotion. The engine switch reuses these.
…y_air to ConstraintSet

Each example gains a single-body ConstraintSet transcribing its old
evaluate_verifier text, and routes compute_transition_prover /
compute_transition through the folders via the run_transition_* helpers;
num_base_transition_constraints comes from num_base_from_meta. Old
TransitionConstraintEvaluator impls stay (zerofier machinery + deletion
in a later phase). Meta preserves indexing, degrees, and end
exemptions (simple_fibonacci 2, quadratic 1, dummy fib 2 / bit 0,
simple_addition 0).
…o ConstraintSet

Same recipe: single-body ConstraintSet transcribed from the old
evaluate_verifier text, compute_transition_prover / compute_transition
routed through the folders, num_base from num_base_from_meta. All three
read the next row (fibonacci_multi_column reads two next rows); end
exemptions copied (1, 1, and 2 per column respectively). The
multi-column set is parameterized by num_columns, one constraint per
column with idx == column, exactly as the old per-column structs.
The RAP/LogUp trio: continuity and single-value stay Base constraints;
the permutation/LogUp constraints read the auxiliary column and the
interaction challenges, so they are Ext constraints after the Base
prefix (num_base = 1 resp. 2 via num_base_from_meta, overriding the old
all-ext default — value-neutral: the engine's F-vs-E accumulation split
changes, the composition polynomial does not). Degrees and end
exemptions copied from the old structs, including fibonacci_rap's
steps=16-hard-coded fib end exemptions and the degree-3 LogUp term.

multi_table_lookup has no example-level constraints (all LogUp,
framework-generated); noted in the file — it converts with AirWithBuses
in the engine-switch phase.
…fier fns

The two period=1 cases now exercise zerofier::end_exemptions_roots
directly with a ConstraintMeta. The nonzero-offset case exists purely to
exercise the period != 1 zerofier shape, which no production constraint
uses; it stays on the old trait path untouched and dies with that
machinery in the final deletion phase (noted inline).
…evidence

scripts/cross_verify_examples.sh builds examples_cli at two refs
(bench_abba-style isolated worktree) and, per example AIR, checks
prove NEW -> verify OLD and prove OLD -> verify NEW. The verifier
recomputes OOD constraint evaluations from its own definitions, so any
constraint (re)ordering, num_base drift, indexing or semantic change in
the migration fails loudly — no proof determinism needed.

cross_verify_examples.log: run with OLD=88adbfa6 (pre-migration
examples-cli) and NEW=f2d34efd (all examples on ConstraintSet):
all 11 examples pass in both directions.
…traints + logup_meta)

Generate the LogUp transition constraints from the interaction config through
the generic ConstraintBuilder, so one body serves the compiled prover folder,
the verifier folder and IR capture. LogUpLayout captures what AirWithBuses::new
computes (committed pairs, absorbed interactions, term/acc column indices);
emit_logup_constraints emits the batched-term and accumulated constraints
(1- and 2-absorbed branches, aux(1,.) next-row reads); logup_meta reproduces
the boxed structs' degree/zerofier answers (all RootKind::Ext, default shape).

The fingerprint/multiplicity/packing capture helpers are ported from the spike
branch's IrBuilder-shaped helpers to the generic B: ConstraintBuilder API
(operator style, base operand LEFT for mixed base x ext ops).

Differential test (logup_single_source_tests) compares the OLD boxed LogUp
structs vs the new emit fns via ProverEvalFolder, VerifierEvalFolder and
capture->interpret on 1000 random two-step frames, for 1-absorbed, 2-absorbed,
absorbed-only, and every Packing variant.
The CPU table's transition constraints are assembled by create_all_cpu_constraints
in prover/src/constraints/cpu.rs (never converted — P1 covered only
prover/src/tables/*.rs). Add CpuConstraints: ConstraintSet built from the
existing emit_*/*_meta fns in that file, in the same order as the old assembly
(39 constraints, all base-field, idx 0..38).

Differential test (constraint_set_tests_b::cpu) compares it against the old
boxed create_all_cpu_constraints assembly: count / num_base / per-idx degree /
zerofier params, plus the 1000-row three-way folder-vs-interpreter differential.
All value-identical; each item is independently revertable if the bench
says it isn't worth its cleverness:

- IsSubFieldOf gains sub_from (ext - base) with a neg(sub) default;
  Goldilocks ext2/ext3 specialize it to touch only component 0.
  sub_subfield routes through it (was 1 sub + 5 negs, now 1 sub).
- ConstraintBuilder gains ext_sub_base with the -(v - e) default (capture
  IR unchanged); the eval folders override via sub_subfield. Used for the
  fingerprint seed z - bus_id and the 1-absorbed accumulated root.
- LogUp sender/receiver signs resolve as sub-vs-add instead of negating
  the receiver term (x - (-t) = x + t): no ext negation per term.
- Evaluator uniform fold seeds with constraint 0's promoted evaluation
  (transition_coefficients[0] is beta^0 = 1 by construction) - skips a
  multiply-by-one per row without branching on the value.
- Shared-subexpression hoists: LOAD sign-extension product (was built 7x
  per row), MUL sign-fills (4x), LT carry_0/carry_1 (3x/2x), BRANCH
  next-pc repack + per-path carry_0 (4x/2x).
@MauroToscano

Copy link
Copy Markdown
Contributor Author

/bench-abba 32

@github-actions

github-actions Bot commented Jul 2, 2026

Copy link
Copy Markdown

ABBA tiebreaker started on the bench server (~30–40 min). The bench server is occupied until it finishes.

@github-actions

github-actions Bot commented Jul 2, 2026

Copy link
Copy Markdown

ABBA tiebreaker — 73d33d4ea5 vs main (32 pairs)

=== ABBA paired result  (improvement: + = PR faster) ===
  pairs: 32   mean A (PR): 17.015s   mean B (base): 16.793s

  [parametric] paired-t   mean -1.33%   sd 1.52%   se 0.27%
               95% CI: [-1.88%, -0.78%]   (t df=31 = 2.042)
  [robust]     median -1.34%   Wilcoxon W+=53 W-=475  p(exact)=2.0e-05  (z=-3.94)

  --- server stability (this run; compare across servers) ---
  run-to-run jitter:    A CV 1.00%   B CV 0.85%        (lower = steadier)
  within-session drift: -0.43% over the run, 1st->2nd half -0.33%
    (jitter -> Tier-1 cached gate floor; drift -> whether the cached baseline can be trusted)

  VERDICT: REAL REGRESSION - PR slower by ~1.33% (t-CI and Wilcoxon agree)

  raw pairs: /tmp/abba_run/pairs.csv

Drift-free interleaved A/B/B/A measurement. + = PR faster. Trust the verdict when paired-t and Wilcoxon agree.

The LOAD/MUL/LT/BRANCH hoists from the micro-op bundle measured flat on
ABBA (-1.33% vs the pre-bundle -1.5%, within noise), so the bodies go
back to their declarative per-emit form -- the constraint bodies double
as the spec, and cleverness there has to earn its keep. Each site keeps
a comment recording that the redundancy is known and was measured to
not matter, so it doesn't get re-optimized.

The engine-side pieces of the bundle (ext_sub_base, sender-sign
sub-vs-add, beta^0 seed) are body-invisible and stay.
Drops ext_sub_base (builder primitive + IsSubFieldOf::sub_from + the
Goldilocks specializations), the sender-sign sub-vs-add rewrite, and the
beta^0-seeded evaluator fold. All measured flat on ABBA like the body
hoists, and each adds trait surface or non-local invariants the
straightforward form doesn't need. The branch's constraint plumbing is
back to the af8a23a state; the only bundle survivors are the
known-redundancy comments in the table bodies.
The LDE buffers have been row-major since the row-major LDE rework, but
the evaluator still gather-copied every main and aux column of every
transition offset into an owned Frame on each LDE point (~150-200
element clones per row per table) before the constraint body ran.

Replace the Prover context's frame with RowFrame: one borrowed
(main, aux) row-slice pair per transition offset, taken straight from
the row-major storage with the same cyclic row arithmetic. The folder
and the IR interpreter read rows[offset][col] directly; the per-thread
preallocated Frame and fill_from_lde are deleted (single-row steps
only - the sole shape since virtual columns were removed; asserted).
Frame stays for the verifier/OOD path and debug validation, which
bridges via Frame::as_row_frame.

Reads the same values from the same memory - proofs are unchanged.
@MauroToscano

Copy link
Copy Markdown
Contributor Author

/bench-abba 32

@github-actions

github-actions Bot commented Jul 2, 2026

Copy link
Copy Markdown

ABBA tiebreaker started on the bench server (~30–40 min). The bench server is occupied until it finishes.

Found by clippy::redundant_clone sweeping for pointless clones; the only
hit in production code (setup-time, cosmetic). The remaining test/example
hits are left as-is.
@MauroToscano MauroToscano marked this pull request as ready for review July 2, 2026 20:21
@github-actions

github-actions Bot commented Jul 2, 2026

Copy link
Copy Markdown

ABBA tiebreaker — 3638d7a1c7 vs main (32 pairs)

=== ABBA paired result  (improvement: + = PR faster) ===
  pairs: 32   mean A (PR): 16.903s   mean B (base): 16.787s

  [parametric] paired-t   mean -0.70%   sd 1.25%   se 0.22%
               95% CI: [-1.15%, -0.25%]   (t df=31 = 2.042)
  [robust]     median -0.84%   Wilcoxon W+=118 W-=410  p(exact)=0.0054  (z=-2.72)

  --- server stability (this run; compare across servers) ---
  run-to-run jitter:    A CV 0.68%   B CV 0.98%        (lower = steadier)
  within-session drift: -0.23% over the run, 1st->2nd half -0.21%
    (jitter -> Tier-1 cached gate floor; drift -> whether the cached baseline can be trusted)

  VERDICT: REAL REGRESSION - PR slower by ~0.70% (t-CI and Wilcoxon agree)

  raw pairs: /tmp/abba_run/pairs.csv

Drift-free interleaved A/B/B/A measurement. + = PR faster. Trust the verdict when paired-t and Wilcoxon agree.

Four-agent review sweep found ~60 stale comments left behind by the
multi-phase migration: doc blocks claiming the deleted per-constraint
structs still exist ('the old structs stay for now'), ~31 dangling
rustdoc links to deleted symbols (Twin of X, matches X::eval), count
headers that disagreed with the code (MEMW says 11 constraints, has 15;
SHIFT lists 26 columns, has 29), and references to removed concepts
(virtual columns, periodic tests, empty_constraints()).

All comments now describe the current code in plain present tense --
constraint-index maps verified against meta()/eval(), bus names and
counts recounted from bus_interactions(). No historical framing:
migration provenance lives in git, not doc comments.
Both are write-only vestiges the design review flagged:

- TransitionEvaluationContext::packing_shifts was constructed and passed
  at five sites and read at none -- the single-source bodies lower the
  packing shift constants through const_base, so the folders never touch
  it. The field, both constructor params, and three per-prove
  PackingShifts::new() constructions go away. The PackingShifts type
  stays: the aux-trace build path genuinely uses it.

- ConstraintProgram::complete / IrBuilder::mark_unsupported encoded a
  fall-back-to-boxed-path protocol for partially captured AIRs; the
  boxed path no longer exists and every AIR captures fully, so the flag
  was always true and read only by tests asserting it's true.

Also adds fail-loud asserts on the capture-time u8/u16 narrowing
(offsets, columns, challenge/alpha indices) in IrBuilder and
CaptureBuilder: capture runs once at setup, and a table wider than the
IR encoding must panic rather than silently truncate into the GPU
program.
…nce checks

The test-gap review found the capture->IR->interpret pipeline (the
future GPU path; no production caller today) was verified only on
shapes production never uses: no real table's combined base+LogUp
program was ever interpreted, multi-committed-pair layouts and five of
the seven Multiplicity variants never flowed through the interpreter.

- constraint_program_tests: every production AIR's captured program
  (via the production constraint_program() entry point) interpreted and
  compared bit-for-bit against the compiled folders on random two-step
  frames, prover and verifier sides -- all 26 AIRs.
- logup_two_committed_pairs: >= 2-pair layout fixture, exercising the
  batched-term loop and the accumulated term-column sum past their
  first iteration.
- RowFrame::from_lde unit tests: per-offset row borrows, the cyclic
  wrap at the domain end, the offsets cap, and as_row_frame equivalence.

The correctness review found the exact-once-emission invariant had no
release-safe gate: EmitTracker is debug-only and CI runs tests
--release, so a double-emit/skip-swap typo would ship a silently
unenforced (always-zero) constraint. Every differential harness now
asserts the emitted index set is exactly 0..n in any build profile, and
the per-table test rejects roots left at the id-0 sentinel.
- ConstraintSet doc: meta() and eval are parallel index walks that must
  agree entry for entry; say so where implementers read it.
- IrBuilder::const_ext/embed: note both are unreachable from the
  single-body capture path and kept for IR completeness / GPU lowering.
- emit_busvalue_fingerprint: state why only the Linear arm routes
  through the zero-skip hook.
- zerofier: the end-exemption walk computed the same pow twice.
The raw cross-version verification logs (examples 22/22, VM 6/6, at the
pre-deletion / post-deletion / final-tip checkpoints) are runtime
artifacts, not source. They stay available on the frozen branch
reference/sscs-cross-verify-evidence; this branch keeps only the
scripts that regenerate them (scripts/cross_verify_*.sh).
MauroToscano added a commit that referenced this pull request Jul 3, 2026
The RowDomain refactor turned emit_base/emit_ext into provided defaults that
forward to emit_base_rows/emit_ext_rows, adding a call hop on the
per-constraint-per-row prover path (vs #764's direct folder emit). A paired
ABBA vs #764 (12 pairs) showed a real ~1.1% regression: paired-t 95% CI
[-1.99%, -0.27%], Wilcoxon W=10 (significant), 10/12 pairs slower.

Marking the forwarding defaults and ProverEvalFolder's emit_*_rows #[inline]
collapses the hop back to a direct slice write. Pure codegen hint — no value
or wire change, so cross-verification stays green.
…#772)

* refactor(stark): derive constraint metadata from the single eval body

Metadata (kind, degree, end-exemptions) is now DERIVED from each table's
`eval` body instead of hand-maintained in a parallel `meta()` method, so the
two can no longer drift out of sync.

- `ConstraintBuilder::emit_base/emit_ext` now take the constraint's `degree`;
  new `emit_*_exempt` variants also take `end_exemptions`. The prover and
  verifier folders ignore both (dead args -> no hot-path cost); only metadata
  derivation reads them.
- `ConstraintSet::meta()` becomes a provided default that runs the body through
  a new no-op `MetaBuilder` (records {idx, kind, degree, end_exemptions}).
  Kind is implied by which sink is called; degree stays hand-declared, now at
  the emit site next to the expression it describes.
- Every per-table `meta()` override, every `*_meta` helper twin, and
  `logup_meta` are deleted; LogUp metadata is derived from
  `emit_logup_constraints` the same way.

Net -217 LoC across all tables + LogUp + the crypto/stark examples.

Verified: workspace + all test targets compile; 169 stark tests and 96 prover
constraint tests pass (including constraint_program_tests::
all_table_programs_match_folders and the per-table constraint counts); the
capture->interpreter differential and declared-vs-measured-degree gates hold;
`make lint` clean on all feature sets.

* refactor(stark): split constraint degree (per-table) from row-domain

Removes per-constraint `degree` and the positional-int emit args, replacing
them with two orthogonal, self-describing concepts:

- Degree: only the per-table MAX is consumed (by composition_poly_degree_bound),
  so it is declared once via `ConstraintSet::max_degree()` (default 2, overridden
  to 3 on the degree-3 tables) instead of on every emit call. `ConstraintMeta`
  drops its `degree` field; the bound now reads
  `max_degree().max(logup_max_degree(layout))`.
- Row-domain: `emit_base_exempt(idx, degree, n, e)` becomes
  `emit_base_rows(idx, RowDomain::except_last(n), e)` — a named type, so the rare
  end-exemption reads in plain language and is no longer welded onto degree
  (three unlabeled ints -> one named argument). Only the crypto/stark example
  AIRs use it; every production table's emit is now `emit_base(idx, expr)`.

The composition-poly bound stays byte-identical: each degree-3 table declares
`max_degree()` == its former per-constraint max, and the framework folds in the
LogUp max via `logup_max_degree`. The capture path asserts each constraint's
measured degree is `<= max_degree()` — which caught keccak_rnd's mu-gated
(degree-3) IS_BIT during migration.

Verified: workspace + all test targets compile; 169 stark tests and 96 prover
constraint tests pass (including constraint_program_tests::
all_table_programs_match_folders and the per-table measured<=max_degree gates);
`make lint` clean on all feature sets.

* perf(stark): inline the emit forwarding onto the per-row hot path

The RowDomain refactor turned emit_base/emit_ext into provided defaults that
forward to emit_base_rows/emit_ext_rows, adding a call hop on the
per-constraint-per-row prover path (vs #764's direct folder emit). A paired
ABBA vs #764 (12 pairs) showed a real ~1.1% regression: paired-t 95% CI
[-1.99%, -0.27%], Wilcoxon W=10 (significant), 10/12 pairs slower.

Marking the forwarding defaults and ProverEvalFolder's emit_*_rows #[inline]
collapses the hop back to a direct slice write. Pure codegen hint — no value
or wire change, so cross-verification stays green.
@MauroToscano MauroToscano enabled auto-merge July 3, 2026 20:05
@MauroToscano MauroToscano disabled auto-merge July 3, 2026 20:05
@MauroToscano MauroToscano added this pull request to the merge queue Jul 3, 2026
Merged via the queue into main with commit 0131c03 Jul 3, 2026
13 checks passed
@MauroToscano MauroToscano deleted the feat/single-source-constraints-switch branch July 3, 2026 20:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants