Revert PR #76: NTT-CRT zero-fill change (no certifiable net win) by mmurshed · Pull Request #77 · mmurshed/biginteger

mmurshed · 2026-05-31T06:31:17Z

Reverts the buffer zero-fill change from #76 (commit 68c7678), restoring assign(n, 0).

Why

A full size sweep (3k–8M limbs, best-of-5, M1 Max) showed no certifiable net speedup; the original 3-size result was within noise.

Root cause: std::vector::resize(n) value-initializes grown elements to 0, so on a cold buffer (fresh process / first call at a size — the benchmark's case) resize already zeroes all n. The follow-up std::fill of the pad tail was then redundant work, making the cold path slower — the source of the ~−2% 8M-limb regression observed after #76 merged.

At warm steady state the change is a small genuine win, but across the full sweep the net is dominated by run-to-run variance and power-of-2 codegen-alignment artifacts (+12% at 200k, −7% at 700k that don't survive controlled re-measurement).

Reverting keeps the code simple and removes the post-merge regression. The real traffic-reduction lever remains fusing the MFA transpose/twiddle passes, not initialization micro-opt.

Correctness

tests/mult_correctness.cpp passes.

🤖 Generated with Claude Code

…fers" This reverts the buffer zero-fill change from PR #76 (commit 68c7678), restoring the original assign(n, 0) initialization of the forward and scratch buffers in NttCrt::Multiply. A full size sweep (3k–8M limbs, best-of-5, M1 Max) showed the change does not produce a certifiable net speedup. The apparent wins/losses in the original 3-size benchmark were within measurement noise: - std::vector::resize(n) value-initializes any grown region to 0, so on a cold buffer (the common case for a fresh process / first call at a size) resize already zeroes all n elements. The follow-up std::fill of the pad tail was then pure redundant work, making the cold path slower, not faster — the source of the 8M-limb (~-2%) regression observed after merge. - At warm steady state the change is a genuine but small win; across the full sweep the net effect is dominated by run-to-run variance and power-of-2-size codegen-alignment artifacts (e.g. +12% at 200k, -7% at 700k that do not survive controlled re-measurement). Reverting to the proven assign(n, 0) keeps the code simple and removes the post-merge regression. The buffers are memory-bandwidth-relevant, but the real traffic-reduction lever is fusing the MFA transpose/twiddle passes, not micro-optimizing initialization. Correctness: tests/mult_correctness.cpp passes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

mmurshed merged commit 95a7c35 into main May 31, 2026
4 of 5 checks passed

mmurshed deleted the revert/ntt-zero-fill branch May 31, 2026 06:34

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Revert PR #76: NTT-CRT zero-fill change (no certifiable net win)#77

Revert PR #76: NTT-CRT zero-fill change (no certifiable net win)#77
mmurshed merged 1 commit into
mainfrom
revert/ntt-zero-fill

mmurshed commented May 31, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

mmurshed commented May 31, 2026

Why

Correctness

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant