BENCHMARK.md: 168 changes (84 additions, 84 deletions)
@@ -41,36 +41,36 @@ Balanced (`a.size() == b.size()`):
| 50 000 × 50 000 | 0.698 | 0.398 | 1.76× |
| 100 000 × 100 000 | 1.144 | 0.709 | 1.61× |
| 500 000 × 500 000 | 5.120 | 4.322 | 1.18× |
-| 1 000 000 × 1 000 000 | 10.176 | 8.845 | 1.15× |
-| **2 000 000 × 2 000 000** | **21.570** | **20.994** | **1.03×** ← near parity |
-| **5 000 000 × 5 000 000** | **48.947** | **65.245** | **0.75×** ← BigMath faster |
-| **10 000 000 × 10 000 000** | **106.856** | **209.888** | **0.51×** ← BigMath 1.96× faster |
-| **20 000 000 × 20 000 000** | **275.252** | **278.298** | **0.99×** ← parity |
-| **50 000 000 × 50 000 000** | **1 248.374** | **674.945** | **1.85×** ← GMP faster |
-| **100 000 000 × 100 000 000** | **2 617.801** | **1 453.331** | **1.80×** ← GMP faster |
-| **200 000 000 × 200 000 000** | **6 578.614** | **3 049.051** | **2.16×** ← GMP faster |
+| 1 000 000 × 1 000 000 | 10.652 | 9.008 | 1.18× |
+| **2 000 000 × 2 000 000** | **21.264** | **20.501** | **1.04×** ← near parity |
+| **5 000 000 × 5 000 000** | **47.249** | **62.875** | **0.75×** ← BigMath faster |
+| **10 000 000 × 10 000 000** | **102.150** | **209.436** | **0.49×** ← BigMath 2.05× faster |
+| **20 000 000 × 20 000 000** | **306.081** | **272.170** | **1.12×** ← GMP faster |
+| **50 000 000 × 50 000 000** | **1 239.154** | **668.975** | **1.85×** ← GMP faster |
+| **100 000 000 × 100 000 000** | **2 818.366** | **1 480.842** | **1.90×** ← GMP faster |
+| **200 000 000 × 200 000 000** | **6 198.901** | **3 093.118** | **2.00×** ← GMP faster |

Skewed (`a.size() >> b.size()`):

| size | BigMath ms | GMP ms | BM/GMP |
|---|---:|---:|---:|
-| 100 000 × 10 000 | 0.640 | 0.303 | 2.11× |
-| 500 000 × 50 000 | 2.168 | 2.135 | 1.02× |
-| **1 000 000 × 100 000** | **4.541** | **4.576** | **0.99×** ← BigMath faster |
-| **2 000 000 × 200 000** | **9.238** | **9.479** | **0.97×** ← BigMath faster |
-| 5 000 000 × 500 000 | 39.149 | 30.829 | 1.27× |
-| 10 000 000 × 1 000 000 | 90.079 | 70.716 | 1.27× |
-| 20 000 000 × 2 000 000 | 239.615 | 162.845 | 1.47× |
-| **50 000 000 × 5 000 000** | **615.801** | **684.121** | **0.90×** ← BigMath faster |
-| 100 000 000 × 10 000 000 | 1 159.229 | 1 011.270 | 1.15× |
-| 200 000 000 × 20 000 000 | 2 446.578 | 1 952.674 | 1.25× |
+| 100 000 × 10 000 | 0.543 | 0.305 | 1.78× |
+| 500 000 × 50 000 | 2.208 | 2.108 | 1.05× |
+| **1 000 000 × 100 000** | **4.301** | **4.518** | **0.95×** ← BigMath faster |
+| **2 000 000 × 200 000** | **9.367** | **9.360** | **1.00×** ← parity |
+| 5 000 000 × 500 000 | 38.681 | 30.543 | 1.27× |
+| 10 000 000 × 1 000 000 | 97.745 | 69.966 | 1.40× |
+| 20 000 000 × 2 000 000 | 243.961 | 160.780 | 1.52× |
+| **50 000 000 × 5 000 000** | **543.376** | **698.909** | **0.78×** ← BigMath faster |
+| 100 000 000 × 10 000 000 | 1 161.182 | 1 012.887 | 1.15× |
+| 200 000 000 × 20 000 000 | 2 505.058 | 1 980.750 | 1.26× |

**Observations:**

-- **BigMath beats GMP on balanced multiplication across the 5M-10M band and is near parity at 20M.** Radix-4 + radix-8 fused NTT butterflies (PRs #59, #60) delivered a 1.5-1.6× wall-clock speedup over the previous implementation. The 2026-05-27 MFA retune moved the gate from `2^21` to `2^24`, and the current 10M balanced row sits at **106.856 ms**.
+- **BigMath beats GMP on balanced multiplication across the 5M-10M band and falls slightly behind at 20M (1.12×).** Radix-4 + radix-8 fused NTT butterflies (PRs #59, #60) delivered a 1.5-1.6× wall-clock speedup over the previous implementation. The MFA gate is still `2^24`, and the current 10M balanced row sits at **102.150 ms**.
- **MFA / Bailey 6-step CRT NTT (PR #65) is now reserved for the very-large regime.** The default gate is `2^24` transform coefficients. Focused limb benchmarks show this avoids the 300k-2M limb regression band while preserving MFA wins at 3M+ limbs.
- Below 500k, GMP's hand-tuned basecase keeps a 1.5-3.3× lead.
-- **Skewed mults: BigMath is around parity at 500k×50k and 1M×100k, with a slight BigMath lead at 2M×200k.** BigMath falls back behind GMP at 50M×5M and 100M×10M, and is 1.25× at the new 200M×20M row.
+- **Skewed mults: BigMath is near parity at 500k×50k (1.05×), slightly ahead at 1M×100k (0.95×), and at parity at 2M×200k (1.00×).** It trails GMP by 1.27-1.52× from 5M×500k through 20M×2M, wins at 50M×5M (0.78×), then falls back behind at 100M×10M and 200M×20M.

### MFA focused threshold check

@@ -197,29 +197,29 @@ Balanced (`a.size() == b.size()`) — quotient is 1-2 limbs, both libraries shor
| 50 000 × 50 000 | <0.001 | <0.001 | 0.43× |
| 100 000 × 100 000 | <0.001 | 0.001 | 0.23× |
| 500 000 × 500 000 | 0.001 | 0.005 | 0.18× |
-| 1 000 000 × 1 000 000 | 0.001 | 0.010 | 0.10× |
-| 5 000 000 × 5 000 000 | 1.367 | 0.189 | 7.23× |
+| 1 000 000 × 1 000 000 | 0.001 | 0.010 | 0.05× |
+| 5 000 000 × 5 000 000 | 1.222 | 0.177 | 6.90× |

Skewed (`a.size() >> b.size()`) — Newton/BZ band, real algorithmic work:

| size | BigMath ms | GMP ms | BM/GMP |
|---|---:|---:|---:|
-| 40 000 × 10 000 | 0.732 | 0.218 | 3.36× |
-| 100 000 × 10 000 | 2.178 | 0.450 | 4.84× |
-| 200 000 × 50 000 | 11.088 | 1.704 | 6.51× |
-| 500 000 × 100 000 | 17.742 | 4.608 | 3.85× |
-| 1 000 000 × 200 000 | 33.632 | 10.022 | 3.36× |
-| 2 000 000 × 500 000 | 82.969 | 24.724 | 3.36× |
-| 5 000 000 × 1 000 000 | 200.641 | 69.900 | 2.87× |
-| 10 000 000 × 2 000 000 | 432.299 | 154.709 | 2.79× |
-| 20 000 000 × 4 000 000 | 982.906 | 353.134 | 2.78× |
-| 50 000 000 × 10 000 000 | 2 464.018 | 1 324.260 | 1.86× |
-| 100 000 000 × 20 000 000 | 6 119.817 | 2 594.778 | 2.36× |
+| 40 000 × 10 000 | 0.754 | 0.225 | 3.36× |
+| 100 000 × 10 000 | 2.314 | 0.462 | 5.01× |
+| 200 000 × 50 000 | 11.837 | 1.748 | 6.77× |
+| 500 000 × 100 000 | 17.375 | 4.553 | 3.82× |
+| 1 000 000 × 200 000 | 33.711 | 9.849 | 3.42× |
+| 2 000 000 × 500 000 | 83.035 | 24.422 | 3.40× |
+| 5 000 000 × 1 000 000 | 201.419 | 67.656 | 2.98× |
+| 10 000 000 × 2 000 000 | 424.143 | 152.982 | 2.77× |
+| 20 000 000 × 4 000 000 | 995.535 | 347.341 | 2.87× |
+| 50 000 000 × 10 000 000 | 2 482.322 | 1 299.545 | 1.91× |
+| 100 000 000 × 20 000 000 | 6 083.450 | 2 541.933 | 2.39× |
| **200 000 000 × 40 000 000** | **12 979.547** | **4 550.205** | **2.85×** ← MFA flowing through Newton |

**Observations:**

-- **Skewed division ratio narrows from ~6.5× at 200k×50k peak to 1.86× at 50M×10M, then rises back to 2.85× at 200M×40M.** Newton inherits the multiplication wins in the large regime, including MFA once the internal products cross the useful MFA band. The residual gap is still division-structure overhead, not decimal I/O.
+- **Skewed division ratio narrows from ~6.8× at 200k×50k peak to 1.91× at 50M×10M, then rises back to 2.85× at 200M×40M.** Newton inherits the multiplication wins in the large regime, including MFA once the internal products cross the useful MFA band. The residual gap is still division-structure overhead, not decimal I/O.
- 200k×50k stays the worst point: the divisor sits below the Newton band (2596 limbs), so it goes through BZ, which loses ~6.5× to GMP's `mpn_dcpi1_div_q` at this size.
- The residual 2.8-3.9× gap in the 1M-20M skewed band is the structural cost of Newton's chunked iteration vs GMP's single recursive divide with a precomputed inverse.
- Balanced cases route through FastDivision short-circuits and aren't algorithmically meaningful at this size profile. The 5M×5M balanced case sat at 27.03× before the PR #56 fix (BZ misroute on degenerate quotient); it is now at 7.20× via the FastDivision short-circuit.
@@ -244,39 +244,39 @@ Representative Base2_64 results:

| size (digits) | BigMath ms | GMP ms | BM/GMP |
|---|---:|---:|---:|
-| 1 000 | 0.002 | 0.002 | 1.50× |
-| 10 000 | 0.115 | 0.038 | 3.01× |
-| 50 000 | 1.375 | 0.387 | 3.43× |
-| 100 000 | 3.263 | 1.047 | 3.10× |
-| 500 000 | 22.262 | 9.270 | 2.40× |
-| 1 000 000 | 49.037 | 21.071 | 2.33× |
-| 2 000 000 | 106.786 | 48.047 | 2.22× |
-| 5 000 000 | 269.753 | 150.835 | 1.79× |
-| 10 000 000 | 585.600 | 350.776 | 1.67× |
-| 20 000 000 | 1 270.039 | 814.355 | 1.56× |
-| 50 000 000 | 5 168.943 | 2 689.193 | 1.92× |

-**Observation:** ratio narrows through the 10M-20M sweet spot (3.2× at 100k → **1.56× at 20M**) where BigMath's NTT overtakes GMP's basecase. It widens back to 1.92× at 50M as GMP's SSA activates.
+| 1 000 | 0.002 | 0.002 | 1.48× |
+| 10 000 | 0.115 | 0.038 | 2.98× |
+| 50 000 | 1.377 | 0.401 | 3.44× |
+| 100 000 | 3.342 | 1.074 | 3.11× |
+| 500 000 | 22.149 | 8.819 | 2.51× |
+| 1 000 000 | 48.767 | 20.711 | 2.35× |
+| 2 000 000 | 106.201 | 47.759 | 2.22× |
+| 5 000 000 | 269.017 | 148.432 | 1.81× |
+| 10 000 000 | 583.973 | 349.941 | 1.67× |
+| 20 000 000 | 1 263.926 | 803.500 | 1.57× |
+| 50 000 000 | 5 372.849 | 2 664.778 | 2.02× |

+**Observation:** ratio narrows through the 10M-20M sweet spot (3.1× at 100k → **1.57× at 20M**) where BigMath's NTT overtakes GMP's basecase. It widens back to 2.02× at 50M as GMP's SSA activates.

---

## ToString (BigInteger → string)

| size (digits) | BigMath ms | GMP ms | BM/GMP |
|---|---:|---:|---:|
-| 1 000 | 0.007 | 0.004 | 1.82× |
-| 10 000 | 0.274 | 0.078 | 3.50× |
-| 50 000 | 4.044 | 0.861 | 4.70× |
-| 100 000 | 19.803 | 2.357 | 8.40× |
-| 200 000 | 40.760 | 6.417 | 6.35× |
-| 500 000 | 106.486 | 20.667 | 5.15× |
-| 1 000 000 | 224.755 | 50.174 | 4.48× |
-| 2 000 000 | 481.526 | 119.342 | 4.03× |
-| 5 000 000 | 1 104.781 | 386.146 | 2.86× |
-| 10 000 000 | 2 416.172 | 913.741 | 2.64× |
-| 20 000 000 | 5 432.991 | 2 155.828 | 2.52× |

-**Observation:** the gap is narrowest at 1k (the linear leaf, where GM div2by1 already runs) and peaks at 100k (D&C overhead plus Newton reciprocal setup not yet amortized). It narrows again from 200k onward as the D&C asymptotics and the NTT gains inherited from the multiplication crossover compound: **8.09× at 100k → 2.57× at 20M**.
+| 1 000 | 0.002 | 0.002 | 1.48× |
+| 10 000 | 0.115 | 0.038 | 2.98× |
+| 50 000 | 1.377 | 0.401 | 3.44× |
+| 100 000 | 3.342 | 1.074 | 3.11× |
+| 500 000 | 22.149 | 8.819 | 2.51× |
+| 1 000 000 | 48.767 | 20.711 | 2.35× |
+| 2 000 000 | 106.201 | 47.759 | 2.22× |
+| 5 000 000 | 269.017 | 148.432 | 1.81× |
+| 10 000 000 | 583.973 | 349.941 | 1.67× |
+| 20 000 000 | 1 263.926 | 803.500 | 1.57× |
+| 50 000 000 | 5 372.849 | 2 664.778 | 2.02× |

+**Observation:** the gap is narrowest at 1k (the linear leaf, where GM div2by1 already runs) and peaks at 100k (D&C overhead plus Newton reciprocal setup not yet amortized). It narrows again from 200k onward as the D&C asymptotics and the NTT gains inherited from the multiplication crossover compound: **8.31× at 100k → 2.59× at 20M**.

### ToString focused warm benchmark

@@ -310,17 +310,17 @@ Balanced operands (e.g. `100.10` = 100 integer + 10 fractional digits):
| Add | 100.10 | 0.000 | 0.000 | 0.00× |
| Add | 1000.100 | 0.000 | 0.000 | 0.00× |
| Add | 5000.500 | 0.001 | 0.000 | 5.66× |
-| Add | 20000.2000 | 0.003 | 0.001 | 5.00× |
+| Add | 20000.2000 | 0.002 | 0.001 | 4.50× |
|---|---|---|---|---|
-| Mul | 100.10 | 0.000 | 0.000 | 3.95× |
-| Mul | 1000.100 | 0.003 | 0.001 | 2.33× |
-| Mul | 5000.500 | 0.038 | 0.018 | 2.12× |
-| Mul | 20000.2000 | 0.345 | 0.091 | 3.78× |
+| Mul | 100.10 | 0.000 | 0.000 | 3.05× |
+| Mul | 1000.100 | 0.003 | 0.001 | 2.31× |
+| Mul | 5000.500 | 0.038 | 0.018 | 2.11× |
+| Mul | 20000.2000 | 0.351 | 0.092 | 3.82× |
|---|---|---|---|---|
-| Div | 100.10 (10 dp) | 0.001 | 0.000 | 6.53× |
-| Div | 1000.100 (100 dp) | 0.003 | 0.002 | 1.41× |
-| Div | **5000.500 (500 dp)** | **0.023** | **0.026** | **0.89×** ← BigMath faster |
-| Div | 20000.2000 (2000 dp) | 0.233 | 0.198 | 1.18× |
+| Div | 100.10 (10 dp) | 0.001 | 0.000 | 8.33× |
+| Div | 1000.100 (100 dp) | 0.003 | 0.002 | 1.47× |
+| Div | **5000.500 (500 dp)** | **0.023** | **0.025** | **0.92×** ← BigMath faster |
+| Div | 20000.2000 (2000 dp) | 0.228 | 0.194 | 1.17× |

Division at varying target scales (operand = 2000 integer + 200 fractional digits):

@@ -338,19 +338,19 @@ Parse (`string → BigDecimal`):

| size (digits) | BigMath ms | GMP (mpf) ms | BM/GMP |
|---|---:|---:|---:|
-| 100 | 0.001 | 0.001 | 1.00× |
-| 1 000 | 0.005 | 0.005 | 1.10× |
-| 10 000 | 0.138 | 0.107 | 1.29× |
-| 50 000 | 1.504 | 1.017 | 1.48× |
+| 100 | 0.001 | 0.000 | 1.09× |
+| 1 000 | 0.005 | 0.004 | 1.09× |
+| 10 000 | 0.134 | 0.105 | 1.28× |
+| 50 000 | 1.450 | 1.015 | 1.43× |

ToString (`BigDecimal → string`):

| size (digits) | BigMath ms | GMP (mpf) ms | BM/GMP |
|---|---:|---:|---:|
-| 100 | 0.000 | 0.000 | 0.64× |
-| 1 000 | 0.006 | 0.004 | 1.41× |
-| 10 000 | 0.269 | 0.102 | 2.63× |
-| 50 000 | 4.003 | 1.094 | 3.66× |
+| 100 | 0.000 | 0.000 | 0.82× |
+| 1 000 | 0.006 | 0.005 | 1.39× |
+| 10 000 | 0.269 | 0.102 | 2.64× |
+| 50 000 | 3.976 | 1.101 | 3.61× |

**Observations:**
- BigDecimal division beats GMP at 5000.500 to 500 dp (0.023 vs 0.025 ms).
@@ -361,14 +361,14 @@ ToString (`BigDecimal → string`):

## Headline summary

-- **Multiplication 5M-20M balanced:** BigMath is faster at 5M and 10M, then slips back to parity at 20M. The current 10M balanced row is **106.856 ms vs 209.888 ms**, or 0.51×.
-- **Multiplication ≥50M balanced:** GMP still wins via SSA. At 200M the gap widens again to 2.16× on this snapshot.
-- **Multiplication skewed:** parity around 500k×50k and 1M×100k, with a slight BigMath lead at 2M×200k. BigMath falls behind at 50M×5M, 100M×10M, and 200M×20M.
-- **Division skewed:** 1.86×-6.51× behind GMP in the main band, with the new 200M×40M row at 2.85×. Worst remains 200k×50k (BZ band). The large-multiplication speedups still flow through Newton, but the structure overhead never disappears.
+- **Multiplication 5M-20M balanced:** BigMath is faster at 5M and 10M, then falls slightly behind GMP at 20M (1.12×). The current 10M balanced row is **102.150 ms vs 209.436 ms**, or 0.49×.
+- **Multiplication ≥50M balanced:** GMP still wins via SSA. At 200M the gap is 2.00× on this snapshot.
+- **Multiplication skewed:** parity from 500k×50k through 2M×200k (1.05×, 0.95×, 1.00×). BigMath wins at 50M×5M, but falls behind at 100M×10M and 200M×20M.
+- **Division skewed:** 1.91×-6.77× behind GMP in the main band, with the 200M×40M row at 2.85×. Worst remains 200k×50k (BZ band). The large-multiplication speedups still flow through Newton, but the structure overhead never disappears.
- **Division balanced 5M×5M:** PR #56 fix routes degenerate-quotient cases to FastDivision; 27.03× → 7.20×.
-- **Parse:** **1.56× at 20M** (best), widens to 1.92× at 50M as GMP's SSA path dominates.
-- **ToString:** 2.52× at 20M, narrowing from an 8.40× peak at 100k.
-- **BigDecimal division:** beats GMP at small target scales (0.20-1.00×) and at the 500 dp near-parity point (0.89×).
+- **Parse:** **1.57× at 20M** (best), widens to 2.02× at 50M as GMP's SSA path dominates.
+- **ToString:** 2.59× at 20M, narrowing from an 8.31× peak at 100k.
+- **BigDecimal division:** beats GMP at small target scales (0.20-0.87×) and at the 500 dp near-parity point (0.92×).

For optimizations considered and rejected with measurement evidence, see the **Explored but rejected** sections of each subsystem doc. The 2026-05 optimization stack (LIMB_64 + CRT NTT + threading + M-G reciprocals + BZ Knuth fix + degenerate-quotient guard + radix-4/radix-8 fused butterflies + MFA) closed the GMP gap by 3-5× across every band up to the 20M crossover; past 20M, GMP's SSA still wins but the loss factor halved.

README.md: 40 changes (20 additions, 20 deletions)
@@ -117,26 +117,26 @@ Apple M1 Max, vs GMP 6.3.0, `-O3 -march=native`, full default stack (`BIGMATH_LI
|---|---|---:|---:|---:|
| mul | 100 000 × 100 000 | 1.35 ms | 0.78 ms | 1.72× |
| mul | 1 000 000 × 1 000 000 | 10.5 ms | 9.06 ms | 1.16× |
-| mul | 2 000 000 × 2 000 000 | 21.9 ms | 20.6 ms | 1.07× |
-| mul | **5 000 000 × 5 000 000** | **46.5 ms** | **63.5 ms** | **0.73×** ← BigMath faster |
-| mul | **10 000 000 × 10 000 000** | **105 ms** | **212 ms** | **0.50×** ← BigMath 2.01× faster |
-| mul | 20 000 000 × 20 000 000 | 279 ms | 278 ms | 1.00× ← parity |
-| mul | 50 000 000 × 50 000 000 | 1 231 ms | 660 ms | 1.86× ← GMP SSA recovers |
-| mul | 100 000 000 × 100 000 000 | 2 832 ms | 1 391 ms | 2.04× |
-| mul (skewed) | **1 000 000 / 100 000** | **4.47 ms** | **4.79 ms** | **0.93×** ← BigMath faster |
-| mul (skewed) | 2 000 000 / 200 000 | 9.43 ms | 9.45 ms | 1.00× ← parity |
-| mul (skewed) | 10 000 000 / 1 000 000 | 88.4 ms | 70.5 ms | 1.25× |
-| mul (skewed) | 50 000 000 / 5 000 000 | 667 ms | 666 ms | 1.00× ← parity |
-| div (skewed) | 500 000 / 100 000 | 18.0 ms | 4.63 ms | 3.89× |
-| div (skewed) | 10 000 000 / 2 000 000 | 427 ms | 154 ms | 2.78× |
-| div (skewed) | 50 000 000 / 10 000 000 | 2 500 ms | 1 299 ms | **1.92×** |
-| parse | 1 000 000 digits | 48.7 ms | 20.3 ms | 2.40× |
-| parse | 20 000 000 digits | 1 252 ms | 803 ms | **1.56×** |
-| ToString | 100 000 digits | 19.5 ms | 2.33 ms | 8.35× |
-| ToString | 1 000 000 digits | 224 ms | 49.8 ms | 4.50× |
-| ToString | 20 000 000 digits | 5 437 ms | 2 116 ms | **2.57×** |

-**BigMath beats GMP on balanced multiplication across the 5M–10M digit band** and is roughly parity at 20M. The current peak is **10M balanced at 2.01× faster than GMP** (105 ms vs 212 ms). The MFA threshold retune improved that row by avoiding early MFA; at ≥50M GMP's Schönhage-Strassen still recovers. Skewed multiplication is a BigMath win around 1M×100k and parity at 2M×200k and 50M×5M, with the dispatcher now avoiding Toom-3 on 2:1+ skewed inputs in the pre-NTT band. Skewed division at 50M×10M is **1.92×** as Newton inherits the large-multiplication speedups. ToString narrows from 8.35× at 100k to 2.57× at 20M; parse to 1.56× at 20M. See [BENCHMARK.md](BENCHMARK.md) for the full table or the per-doc ratio tables for the breakdown.
+| mul | 2 000 000 × 2 000 000 | 21.3 ms | 20.5 ms | 1.04× |
+| mul | **5 000 000 × 5 000 000** | **47.2 ms** | **62.9 ms** | **0.75×** ← BigMath faster |
+| mul | **10 000 000 × 10 000 000** | **102 ms** | **209 ms** | **0.49×** ← BigMath 2.05× faster |
+| mul | 20 000 000 × 20 000 000 | 306 ms | 272 ms | 1.12× ← GMP faster |
+| mul | 50 000 000 × 50 000 000 | 1 239 ms | 669 ms | 1.85× ← GMP SSA recovers |
+| mul | 100 000 000 × 100 000 000 | 2 818 ms | 1 481 ms | 1.90× |
+| mul (skewed) | **1 000 000 / 100 000** | **4.30 ms** | **4.52 ms** | **0.95×** ← BigMath faster |
+| mul (skewed) | 2 000 000 / 200 000 | 9.37 ms | 9.36 ms | 1.00× ← parity |
+| mul (skewed) | 10 000 000 / 1 000 000 | 97.7 ms | 70.0 ms | 1.40× |
+| mul (skewed) | **50 000 000 / 5 000 000** | **543 ms** | **699 ms** | **0.78×** ← BigMath faster |
+| div (skewed) | 500 000 / 100 000 | 17.4 ms | 4.55 ms | 3.82× |
+| div (skewed) | 10 000 000 / 2 000 000 | 424 ms | 153 ms | 2.77× |
+| div (skewed) | 50 000 000 / 10 000 000 | 2 482 ms | 1 300 ms | **1.91×** |
+| parse | 1 000 000 digits | 49.0 ms | 20.7 ms | 2.36× |
+| parse | 20 000 000 digits | 1 264 ms | 804 ms | **1.57×** |
+| ToString | 100 000 digits | 20.0 ms | 2.40 ms | 8.31× |
+| ToString | 1 000 000 digits | 227 ms | 49.7 ms | 4.57× |
+| ToString | 20 000 000 digits | 5 529 ms | 2 133 ms | **2.59×** |

+**BigMath beats GMP on balanced multiplication across the 5M–10M digit band** and falls slightly behind at 20M (1.12×). The current peak is **10M balanced at 2.05× faster than GMP** (102 ms vs 209 ms). The MFA threshold retune improved that row by avoiding early MFA; at ≥50M GMP's Schönhage-Strassen still recovers. Skewed multiplication favors BigMath at 1M×100k and 50M×5M and is at parity at 2M×200k. Skewed division at 50M×10M is **1.91×** as Newton inherits the large-multiplication speedups. ToString narrows from 8.31× at 100k to 2.59× at 20M, and parse narrows to 1.57× at 20M. See [BENCHMARK.md](BENCHMARK.md) for the full table or the per-doc ratio tables for the breakdown.

Opt-out flags (`-DBIGMATH_USE_THREADS=0` / `-DBIGMATH_NTT_CRT=0` / `-DBIGMATH_LIMB_64=0`) revert any subset of the defaults — useful for embedded targets, header-only-strict consumers, or A/B comparison.
