
perf: lower Newton balanced band ratio from 2/1 to 4/3 #88

Merged
mmurshed merged 1 commit into main from perf/newton-ratio-band
Jun 12, 2026


@mmurshed

Owner

What

Re-measured the near-balanced division band after #85-#87 cut Newton's constant: the generic BZ/Newton crossover moved from ratio ~2 down to ~1.3. This PR lowers NEWTON_BALANCED from 2/1 → 4/3 (same b ≥ 98304 limbs floor).
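
The threshold change can be sketched as an integer-only comparison. This is a minimal illustration, not the project's dispatch code. Only `NEWTON_BALANCED`, the 4/3 value, and the 98304-limb floor come from this PR. The names `choose_division` and `NEWTON_MIN_DIVISOR_LIMBS`, the reading of "ratio" as dividend limbs over divisor limbs, and the inclusive boundary at exactly 4/3 are assumptions. Only the near-balanced band decision is modeled.

```python
# Hypothetical sketch of the near-balanced dispatch rule described above.
NEWTON_BALANCED = (4, 3)          # ratio threshold as (numerator, denominator); was (2, 1)
NEWTON_MIN_DIVISOR_LIMBS = 98304  # divisor-size floor from the PR; the name is assumed


def choose_division(na: int, nb: int) -> str:
    """Pick "newton" or "bz" for an na-limb dividend and an nb-limb divisor."""
    num, den = NEWTON_BALANCED
    # Cross-multiply so that na/nb >= num/den is tested without floating point.
    if nb >= NEWTON_MIN_DIVISOR_LIMBS and na * den >= nb * num:
        return "newton"
    return "bz"


print(choose_division(240_000, 160_000))  # ratio 1.5
print(choose_division(163_841, 131_073))  # ratio ~1.25
print(choose_division(150_000, 50_000))   # ratio 3, divisor below the floor
```

Under these assumptions the three calls print `newton`, `bz`, and `bz`. Storing the threshold as a numerator/denominator pair keeps 4/3 exact, which a floating-point constant would not.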

Measurements (M1 Max, min of 3, paired)

| shape | before (BZ) | after (Newton) |
|---|---|---|
| nb=160 000 limbs, ratio 1.5 | 244 ms | 165 ms |
| nb=160 000, ratio 1.95 | ~490 ms | ~165 ms |
| nb=100 000, ratio 1.4 | 201 ms | 126 ms |
| nb=131 073 (2^17+1), ratio 1.5 | 10.7 s | 157 ms (68×) |
| nb=131 073, ratio 1.95 | 21.0 s | 161 ms |
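
As a quick check of the arithmetic, the speedups implied by these timings can be recomputed as before/after. The sketch below copies the exact timings from the measurements above and leaves out the row marked "~" because it is approximate. The `speedup` helper and the dictionary layout are illustrative only.

```python
# Before/after timings in milliseconds, copied from the measurements above.
timings_ms = {
    "nb=160000, ratio 1.5": (244.0, 165.0),
    "nb=100000, ratio 1.4": (201.0, 126.0),
    "nb=131073, ratio 1.5": (10_700.0, 157.0),
    "nb=131073, ratio 1.95": (21_000.0, 161.0),
}


def speedup(before_ms: float, after_ms: float) -> float:
    """Return how many times faster the 'after' timing is."""
    return before_ms / after_ms


for shape, (before, after) in timings_ms.items():
    print(f"{shape}: {speedup(before, after):.1f}x")
```

The nb=131073, ratio 1.5 row rounds to the 68× quoted above.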

Power-of-two divisors (BZ's best case) regress ≤ ~25% in the narrow (4/3, ~1.4) sliver — same accepted tradeoff as PR #79.

Known residual (documented)

ratio ∈ (1, 4/3) still routes to BZ: generic sizes are genuinely faster there (BZ wins below the crossover), but 2^k+1-family divisor sizes still hit the transform-doubling blowup (1.1–5.4 s at nb=131073, ratios 1.05–1.25). Follow-up: quotient-sized division (next PR).

Testing

246/246 unit tests pass, plus div_correctness (cross-checks and the invariants q·b + r == a, r < b).
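
The invariant that div_correctness is described as checking can be sketched as follows. This is not the project's test: Python's built-in `divmod` stands in for the library's division, the 64-bit limb width is an assumption, the operand sizes are tiny compared with the measured ones, and `random_limbs` and `check_division` are hypothetical helpers.

```python
import random

LIMB_BITS = 64  # assumed limb width


def random_limbs(n: int, rng: random.Random) -> int:
    """Return a random integer occupying exactly n limbs (top limb nonzero)."""
    top = rng.getrandbits(LIMB_BITS - 1) + 1
    low = rng.getrandbits(LIMB_BITS * (n - 1)) if n > 1 else 0
    return (top << (LIMB_BITS * (n - 1))) | low


def check_division(a: int, b: int, q: int, r: int) -> bool:
    """Check q*b + r == a together with 0 <= r < b."""
    return q * b + r == a and 0 <= r < b


rng = random.Random(88)
for na, nb in [(6, 4), (15, 10), (39, 20)]:  # ratios 1.5, 1.5, 1.95
    a = random_limbs(na, rng)
    b = random_limbs(nb, rng)
    q, r = divmod(a, b)  # stand-in for the division under test
    assert check_division(a, b, q, r)
print("ok")
```

Checking the remainder bound alongside the identity matters: q·b + r == a alone would also accept a quotient that is too small, paired with a remainder of b or more.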

🤖 Generated with Claude Code

The wraparound-Newton PRs (#85-#87) cut Newton's constant ~35-45%, which
moved the generic BZ/Newton crossover in the near-balanced band from
ratio ~2 down to ~1.3 (measured at nb 100k-200k limbs: Newton wins from
1.3-1.35 up; at ratio 1.5 BZ is 1.7-1.9x slower, at 1.95 up to 3.4x;
2^k+1-family divisor sizes are 68x worse on BZ at ratio 1.5).

Lower NEWTON_BALANCED from 2/1 to 4/3. Exact-power-of-two divisors
(BZ's best case) regress <= ~25% in the narrow (4/3, ~1.4) sliver -
same accepted tradeoff as PR #79.

Measured after the change (M1 Max, min of 3):
- nb=160000 limbs ratio 1.5: 244 -> 165 ms
- nb=100000 limbs ratio 1.4: 201 -> 126 ms
- nb=131073 (2^17+1) ratio 1.5: 10.7 s -> 157 ms

Known residual: ratio in (1, 4/3) still routes to BZ. Generic sizes are
genuinely faster there, but 2^k+1-family sizes still blow up (1.1-5.4 s
at nb=131073, ratios 1.05-1.25). The follow-up fix is quotient-sized
division, which scales with the quotient instead of the divisor.

246 unit tests + div_correctness pass. Docs updated (DIVISION.md,
CLAUDE.md).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>