45 changes: 45 additions & 0 deletions BENCHMARK.md
@@ -264,6 +264,51 @@ Representative Base2_64 results:
| `12288/8192` | 44.41 | 7.94 | BZ |
| `20480/2048` (`10n/n`) | 19.15 | 19.19 | Newton |

### Near-balanced division — Newton balanced band (2026-05-31)

Profiling the dispatch path on near-balanced (ratio ≈ 2) division at large
divisor sizes surfaced two compounding problems, both fixed in
`perf/newton-balanced-division`:

1. **Newton single-block pathology.** The single-block path (`na ≤ 2n+1`) ran a
`2n+1`-limb chunk through the truncated `(chunk·R) >> 2n` quotient estimate.
The truncation error scales with `chunk/B^(2n)`, which reaches ~`B` once the
chunk exceeds `2n` limbs (the extra limb routinely appears when the Knuth
normalization shift is applied to a dividend of `na ≈ 2n` limbs). The estimate then underestimated `Q` by more
than the 8-step fixup cap and bailed to quadratic `FastDivision`. At
`nb = 50000` this was a ~14× spike (5200 ms vs ~360 ms at neighbouring sizes).
Fix: route `na > 2n` through the blockwise path so every chunk stays `≤ 2n`.

2. **Burnikel-Ziegler non-power-of-2 blowup.** BZ's recursive `2n/n` halving
lands its intermediate NTT multiplies just over power-of-2 length boundaries
for non-power-of-2 divisor sizes, where the transform length doubles. The
constant factor compounds across the recursion depth into a **2–98× slowdown**
versus Newton in the measurements below, worst right above a power of two (`n = 2^k + 1`). Newton pads
once to the working size and stays flat. Fix: a Newton *balanced band*
(`b ≥ NEWTON_BALANCED_B (98304)` and `a ≥ 2b`, where `a` and `b` are the
dividend and divisor limb counts) takes the near-balanced large
case that BZ previously owned. (Exact-power-of-2 divisor sizes, BZ's best
case where it ties Newton, regress ~4 %; they are rare in practice.)
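
The two routing rules and the power-of-2 boundary effect above can be sketched as compile-time predicates. This is a minimal sketch, not the library's dispatch code: the names `newton_uses_single_block`, `in_newton_balanced_band`, and `padded_transform_length` are hypothetical, only the threshold `98304` and the conditions `na > 2n` and `a ≥ 2b` come from the text, and it assumes transform lengths are rounded up to the next power of two.

```cpp
#include <cstddef>

// Threshold quoted in the text (NEWTON_BALANCED_B); the constant name here
// is illustrative only.
constexpr std::size_t kNewtonBalancedB = 98304;

// Hypothetical helper: after the fix, only dividends of at most 2n limbs
// take the single-block path. Anything longer is routed blockwise so that
// every chunk stays <= 2n limbs.
constexpr bool newton_uses_single_block(std::size_t na, std::size_t n) {
    return na <= 2 * n;
}

// Hypothetical helper: the balanced band exactly as stated in the text,
// with a and b the dividend and divisor limb counts.
constexpr bool in_newton_balanced_band(std::size_t a, std::size_t b) {
    return b >= kNewtonBalancedB && a >= 2 * b;
}

// Assumption: a multiply of length len is padded to the next power of two.
// One limb past a power of two therefore doubles the transform length.
constexpr std::size_t padded_transform_length(std::size_t len) {
    std::size_t p = 1;
    while (p < len) {
        p *= 2;
    }
    return p;
}

// The 2n+1 case that used to hit the single-block path now goes blockwise.
static_assert(newton_uses_single_block(200000, 100000), "na == 2n stays single-block");
static_assert(!newton_uses_single_block(200001, 100000), "na == 2n+1 goes blockwise");

// Shapes from the table below fall inside the band; a small divisor does not.
static_assert(in_newton_balanced_band(200000, 100000), "200000/100000 is in the band");
static_assert(!in_newton_balanced_band(100000, 50000), "divisor below threshold");

// 2^18 fits exactly; 2^18 + 1 pads to 2^19.
static_assert(padded_transform_length(262144) == 262144, "exact power of two");
static_assert(padded_transform_length(262145) == 524288, "one past doubles the length");
```

The last two assertions mirror the `524288/262144` and `524290/262145` rows: under the padding assumption, a single extra limb doubles the transform length, and BZ pays that cost again at each recursion level.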

Dispatch wall-clock, BigMath Base2_64, M1 Max, before (`main` `d38f3c8`, → BZ)
vs after (Newton balanced band):

| shape (limbs) | divisor | before ms | after ms | speedup |
|---|---|---:|---:|---:|
| `200000/100000` | non-pow2 | 496.6 | 249.9 | **1.99×** |
| `500000/250000` | non-pow2 | 5 177.7 | 588.3 | **8.80×** |
| `1000000/500000` | non-pow2 | 10 554.7 | 1 485.7 | **7.10×** |
| `2000000/1000000` | non-pow2 | 21 006.9 | 3 747.6 | **5.61×** |
| `524288/262144` | pow2 (BZ best) | 656.4 | 684.0 | 0.96× |
| `524290/262145` | 2¹⁸+1 (BZ worst) | 92 060.5 | 941.9 | **97.7×** |
| `3000000/1000000` | ratio 3 (control) | 4 687.7 | 4 635.7 | 1.01× |

![Near-balanced division dispatch — Newton balanced band vs Burnikel-Ziegler](docs/images/division_balanced_speedup.png)

Harness: `tests/performance/division_balanced_bench.cpp`. Plot regenerated by
`docs/images/make_division_balanced_plot.py`. Cross-checked against BZ
limb-for-limb across `na ∈ {2n-1, 2n, 2n+1, 2n+2, 3n}` × `nb` near the boundary
(120 cases) and the canonical `div_correctness` suite — all match.
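
The boundary grid described above could be enumerated along these lines. This is a sketch rather than the actual harness: `boundary_dividend_sizes` and `boundary_shapes` are hypothetical names, and the list of divisor sizes is left to the caller because the text does not say which `nb` values were used.

```cpp
#include <array>
#include <cstddef>
#include <utility>
#include <vector>

// The five dividend sizes named in the text for a divisor of n limbs:
// 2n-1, 2n, 2n+1, 2n+2, 3n. Requires n >= 1 so that 2n-1 does not wrap.
constexpr std::array<std::size_t, 5> boundary_dividend_sizes(std::size_t n) {
    return {{2 * n - 1, 2 * n, 2 * n + 1, 2 * n + 2, 3 * n}};
}

// Cross product of the caller-supplied divisor sizes with the five dividend
// sizes above, returned as (na, nb) pairs.
inline std::vector<std::pair<std::size_t, std::size_t>>
boundary_shapes(const std::vector<std::size_t>& divisor_sizes) {
    std::vector<std::pair<std::size_t, std::size_t>> shapes;
    shapes.reserve(divisor_sizes.size() * 5);
    for (std::size_t nb : divisor_sizes) {
        for (std::size_t na : boundary_dividend_sizes(nb)) {
            shapes.emplace_back(na, nb);
        }
    }
    return shapes;
}
```

Each `(na, nb)` pair would then be divided by both algorithms and the quotient and remainder compared limb for limb. If the grid is a full cross product, five dividend sizes per divisor size means the 120 cases correspond to 24 divisor sizes.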

---

## Decimal parse (string → BigInteger)
3 changes: 3 additions & 0 deletions CMakeLists.txt
@@ -90,6 +90,9 @@ if(BIGMATH_BUILD_TESTS)
add_executable(mul_xl_bench tests/performance/mul_xl_bench.cpp)
target_link_libraries(mul_xl_bench PRIVATE bigmath::bigmath)

add_executable(division_balanced_bench tests/performance/division_balanced_bench.cpp)
target_link_libraries(division_balanced_bench PRIVATE bigmath::bigmath)

add_executable(mfa_correctness tests/performance/mfa_correctness.cpp)
target_link_libraries(mfa_correctness PRIVATE bigmath::bigmath)

Binary file added docs/images/division_balanced_speedup.png