perf: quotient-sized division for short-quotient shapes (38–75× on 2^k+1 pathology)#89

Merged
mmurshed merged 1 commit into main from perf/quotient-sized-division on Jun 12, 2026
@mmurshed

Owner

What

Second half of the "do both" pair (#88 lowered the balanced band to 4/3; this closes the remaining ratio ∈ (1, 4/3) residual — the last known hole in division dispatch).

New algorithms/division/QuotientSizedDivision.h: for short-quotient shapes, the operand tops determine the (Δ+1)-limb quotient to within a few units. Take the top t = Δ+4 limbs of b and the top Δ+t limbs of a; truncating b perturbs q by ≤ ~B^(2−GUARD), which is sub-ulp. Divide the tops, then reconstruct the exact remainder with one Δ×nb back-multiply plus a few fixup steps in either direction (capped at 8, with a fallback to full Newton). Cost scales with the quotient, not the divisor.
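A minimal sketch of the tops-divide idea, written with Python's built-in integers rather than the repo's limb arrays. It assumes 64-bit limbs, Δ = (limbs of a) − (limbs of b), and GUARD = 4 (read off t = Δ+4); the names `limb_count` and `quotient_sized_divmod` are hypothetical, and `divmod` stands in for both the tops divide and the full-Newton fallback. It is not the code in QuotientSizedDivision.h.

```python
import random

LIMB_BITS = 64    # assumed limb width (B = 2**64)
GUARD = 4         # assumed guard: t = delta + GUARD, i.e. t = Δ+4
MAX_FIXUPS = 8    # fixup cap quoted in the description


def limb_count(x):
    """Number of LIMB_BITS-wide limbs needed to hold x (at least 1)."""
    return max(1, -(-x.bit_length() // LIMB_BITS))


def quotient_sized_divmod(a, b):
    """Return (q, r) with a == q*b + r and 0 <= r < b, estimating q from the tops."""
    if a < 0 or b <= 0:
        raise ValueError("expects a >= 0 and b > 0")
    na, nb = limb_count(a), limb_count(b)
    delta = na - nb
    t = delta + GUARD
    if delta < 0 or t >= nb:
        # Not a short-quotient shape: there are no low limbs worth dropping.
        return divmod(a, b)

    # Drop the low nb - t limbs of both operands, keeping t limbs of b
    # and delta + t limbs of a.
    shift = (nb - t) * LIMB_BITS
    a_top = a >> shift
    b_top = b >> shift

    # Quotient-sized divide of the tops (stand-in for the real tops divide).
    q = a_top // b_top

    # One back-multiply against the full divisor gives a candidate remainder.
    r = a - q * b

    # Nudge q until the remainder lands in [0, b), within the fixup cap.
    for _ in range(MAX_FIXUPS):
        if r < 0:
            q -= 1
            r += b
        elif r >= b:
            q += 1
            r -= b
        else:
            return q, r
    if 0 <= r < b:
        return q, r
    return divmod(a, b)   # stand-in for the full-Newton fallback


# Example: a 320-limb dividend over a 300-limb divisor (delta = 20).
rng = random.Random(89)
nb, delta = 300, 20
b = rng.getrandbits(nb * LIMB_BITS) | (1 << (nb * LIMB_BITS - 1))
a = rng.getrandbits((nb + delta) * LIMB_BITS) | (1 << ((nb + delta) * LIMB_BITS - 1))
assert quotient_sized_divmod(a, b) == divmod(a, b)
```

In this sketch the only full-width operation is the back-multiply `q * b`; the divide itself touches roughly 2Δ+4 limbs over Δ+4 limbs, which is the sense in which the cost follows the quotient rather than the divisor.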

Dispatch band: b ≥ NEWTON_BALANCED_B, a ≥ b + 64, ratio < 4/3. The tops divide (ratio ~2) routes directly to Newton at t ≥ 6144, since post-#85–87 Newton beats BZ at ratio 2 from ~6k limbs (40 vs 299 ms at 30k; 0.16 s vs 5.3 s at 65537). Recursion can't re-enter the band because the tops ratio (~2) exceeds 4/3.
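The band test and the no-re-entry argument can be sketched as follows. This assumes a and b in the band conditions are limb counts, that "ratio" means na/nb, and that GUARD = 4; the function names and `TOPS_NEWTON_MIN_T` are hypothetical labels for this illustration, while the threshold values are the ones quoted above.

```python
NEWTON_BALANCED_B = 24576   # lowered from 98304 in this PR
QSIZED_MIN_DELTA = 64       # minimum limb-count gap between a and b
TOPS_NEWTON_MIN_T = 6144    # hypothetical name for the t >= 6144 threshold
GUARD = 4                   # assumed guard: t = delta + GUARD


def in_quotient_sized_band(na, nb):
    """True when the shape (na limbs over nb limbs) falls in the dispatch band."""
    return (
        nb >= NEWTON_BALANCED_B
        and na >= nb + QSIZED_MIN_DELTA
        and 3 * na < 4 * nb          # ratio < 4/3, kept in integer arithmetic
    )


def tops_shape(na, nb):
    """Limb counts (a_top, b_top) of the truncated operands."""
    delta = na - nb
    t = delta + GUARD
    return delta + t, t


def tops_go_to_newton(t):
    """True when the tops divide is routed straight to Newton."""
    return t >= TOPS_NEWTON_MIN_T


# The nb = 131073, ratio 1.25 row: the outer shape is in the band...
na, nb = 163841, 131073
assert in_quotient_sized_band(na, nb)

# ...but its tops have ratio (2*delta + 4) / (delta + 4), which is about 2,
# so the inner divide cannot land back in the band.
top_na, top_nb = tops_shape(na, nb)
assert not in_quotient_sized_band(top_na, top_nb)
assert tops_go_to_newton(top_nb)
```

Under these assumptions the tops ratio drops below 4/3 only when Δ < 2, and the band already requires Δ ≥ 64, so the check holds for every shape the band admits.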

Also lowers NEWTON_BALANCED_B 98304 → 24576 (the generic ratio-4/3 crossover sits at ~24k limbs now).

Results (M1 Max, min of 3)

| shape | before (BZ) | after | speedup |
|---|---|---|---|
| nb=131073 (2^17+1), ratio 1.05 | 1.07 s | 28 ms | 38× |
| nb=131073, ratio 1.10 | 2.16 s | 43 ms | 50× |
| nb=131073, ratio 1.25 | 5.35 s | 71 ms | 75× |
| nb=120000, ratios 1.05/1.10/1.25 | 37/56/116 ms | 30/39/66 ms | up to 1.8× |
| nb=48000, ratio 1.25 | | 16 ms | vs Newton 39 ms, FastDivision 733 ms |
| nb=131072 (BZ pow2 best case), 1.25 | 88 ms | 65 ms | no regression band |

Testing

  • 246/246 unit tests, div_correctness
  • Band correctness: identity q·b + r == a with r < b, plus a full FastDivision cross-check, across ratios 1.01–1.32, sizes 98304–200000, all-max/sparse limb patterns, and 2^k+1 sizes
  • Pow10-divisor suite, ToString round-trips (100k/1M/5M digits), wrap stress harness
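A band-correctness check of the kind listed above might look like the sketch below. It is not the repo's test code: `check_division` and `make_operand` are hypothetical helpers, Python's `divmod` plays the role of the reference (where the PR uses FastDivision), the limb width is assumed to be 64 bits, and the sizes are scaled far down so the loop runs quickly.

```python
import random

LIMB_BITS = 64   # assumed limb width


def check_division(divide, a, b):
    """Assert the division identity, the remainder range, and a reference match."""
    q, r = divide(a, b)
    assert q * b + r == a, "identity q*b + r == a failed"
    assert 0 <= r < b, "remainder out of range"
    assert (q, r) == divmod(a, b), "mismatch against reference division"


def make_operand(n_limbs, pattern, rng):
    """Build an n_limbs-limb integer with the requested limb pattern."""
    bits = n_limbs * LIMB_BITS
    if pattern == "all-max":
        return (1 << bits) - 1              # every limb is 2**64 - 1
    if pattern == "sparse":
        return (1 << (bits - 1)) | 1        # only the top and bottom bits set
    return rng.getrandbits(bits) | (1 << (bits - 1))   # random, top bit set


rng = random.Random(0)
for nb in (257, 513):                        # scaled-down 2^k + 1 sizes
    for pct in (101, 110, 125, 132):         # ratios 1.01 to 1.32
        na = nb * pct // 100
        for pattern in ("random", "all-max", "sparse"):
            a = make_operand(na, pattern, rng)
            b = make_operand(nb, pattern, rng)
            # Swap divmod for the routine under test.
            check_division(divmod, a, b)
```

Checking the remainder range separately matters: a quotient that is off by one still satisfies q·b + r == a with an out-of-range r, so the identity alone would not catch a missed fixup.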

🤖 Generated with Claude Code

…k+1 pathology)

New algorithms/division/QuotientSizedDivision.h: for ratio < 4/3 at
large b, the (delta+1)-limb quotient is determined to +-a few units by
the operand tops. With t = delta+4, divide (a >> B^(nb-t)) by
(b >> B^(nb-t)) — truncating b perturbs q by <= ~B^(2-GUARD), sub-ulp —
then reconstruct the exact remainder with one delta x nb back-multiply
plus +-few fixups (cap 8, fallback to full Newton). Cost scales with
the QUOTIENT, not the divisor.

Dispatch: b >= NEWTON_BALANCED_B, a >= b + QSIZED_MIN_DELTA (64),
ratio < 4/3. The tops divide (ratio ~2) goes to Newton directly at
t >= 6144 — post-#85-87 Newton beats BZ at ratio 2 from ~6k limbs
(13 vs 22 ms at 12k, 40 vs 299 ms at 30k, 0.16 s vs 5.3 s at 65537).
Recursion cannot re-enter the band (tops ratio ~2 > 4/3).

Also lowers NEWTON_BALANCED_B 98304 -> 24576: the generic ratio-4/3
Newton/BZ crossover sits at ~24k limbs after the wraparound PRs.

Measured (M1 Max, min of 3):
- nb=131073 (2^17+1) ratios 1.05/1.10/1.25:
  1.07 s / 2.16 s / 5.35 s -> 28 / 43 / 71 ms (38-75x)
- nb=120000 same ratios: 37/56/116 -> 30/39/66 ms
- nb=48000 ratio 1.25: 16 ms vs Newton 39, FastDivision 733
- nb=131072 (BZ pow2 best case) 1.25: 65 vs 88 ms - no regression band

This closes the last known residual in the division dispatch (ratio in
(1, 4/3) at large b routing to BZ's transform-doubling blowup).

Tests: 246 unit tests, div_correctness, identity + FastDivision
cross-checks across the band (incl. all-max/sparse patterns, 2^k+1
sizes, band edges 98304->24576), Pow10 suite, ToString round-trips.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@mmurshed mmurshed merged commit 2d9c7f8 into main Jun 12, 2026
@mmurshed mmurshed deleted the perf/quotient-sized-division branch June 12, 2026 04:49
mmurshed added a commit that referenced this pull request Jun 12, 2026
bench: record post-#82–#89 benchsuite run; annotate two bad runs
mmurshed added a commit that referenced this pull request Jun 12, 2026
docs: refresh BENCHMARK.md division/parse/tostr tables post PRs #82–#89