Merged
4 changes: 2 additions & 2 deletions CLAUDE.md
@@ -43,14 +43,14 @@ CI (`.github/workflows/qa.yaml`) runs `nanoclaw-task qa-agent` on PRs against `o
**Division dispatch** (`algorithms/Division.h`, thresholds defined in `src/algorithms/Division.cpp`):
- `NewtonDivision` (Newton-Raphson reciprocal, O(M(n)); handles arbitrary `na/nb` via blockwise mode — top chunk in [n+1, 2n], slide down by n, thread the remainder) when any skew band holds:
- `b ≥ 4096` at ratio ≥ 3 (`NEWTON_SKEW` 3/1), or
- - `b ≥ 98304` at ratio ≥ 2 (`NEWTON_BALANCED` 2/1 — the near-balanced band, PR #79), or
+ - `b ≥ 98304` at ratio ≥ 4/3 (`NEWTON_BALANCED` 4/3 — the near-balanced band, PR #79, lowered from 2/1 2026-06-11), or
- `b ≥ 2048` at ratio ≥ 8 (`NEWTON_HIGH_SKEW` 8/1).
- else `BurnikelZieglerDivision` for power-of-two base when `b > 512` and the BZ band fits (near-balanced `b ≥ 1024, b+32 ≤ a ≤ 3b`, or big-and-skewed `a > 2048 && a > 3b`).
- otherwise multi-limb → `FastDivision` (Knuth Algorithm D variant)
- single-limb divisor → `ClassicDivision`
- `KnuthDivision` and `ReciprocalDivision` are alternates used by correctness tests for cross-checking.

- The balanced band exists because BZ's recursive 2n/n halving lands intermediate NTT multiplies just over power-of-2 transform-length boundaries for non-power-of-2 divisor sizes, blowing up 5–60× vs Newton (worst at `n = 2^k+1`); Newton pads once and stays flat. **Known residual:** ratio ∈ (1, 2) at large `b` still routes to BZ and hits the same blowup (~2.7× slower than Newton would be at ratio 1.5) — the balanced band's `a ≥ 2b` lower bound doesn't cover it yet.
+ The balanced band exists because BZ's recursive 2n/n halving lands intermediate NTT multiplies just over power-of-2 transform-length boundaries for non-power-of-2 divisor sizes, blowing up 5–60× vs Newton (worst at `n = 2^k+1`); Newton pads once and stays flat. **Known residual:** ratio ∈ (1, 4/3) at large `b` still routes to BZ; generic sizes are fine there (BZ wins below the ~1.3 crossover) but `2^k+1`-family divisor sizes still blow up — the proper fix for that range is quotient-sized division.

When adding a new algorithm, slot the implementation under `algorithms/<op>/<Name>.h`, then update the dispatch in `algorithms/<Op>.h` — the thresholds there are the only place size cutoffs live.
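
The dispatch order listed in this hunk can be summarized as a small selection function. This is a minimal sketch, not the repository's code: the enum, the function name `choose_division`, and the placement of the single-limb check first are assumptions made for illustration. The thresholds are the post-change values quoted above (4/3 for the balanced band), and `a` and `b` are limb counts with `a > b`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical names: DivisionAlgo and choose_division() are illustrative only.
enum class DivisionAlgo { Classic, Fast, BurnikelZiegler, Newton };

// a, b: limb counts of dividend and divisor (a > b >= 1 is assumed).
// pow2_base: whether the limb base is a power of two (required for BZ).
inline DivisionAlgo choose_division(std::size_t a, std::size_t b, bool pow2_base = true)
{
    assert(b >= 1 && a > b);

    // Assumption: the single-limb divisor case is checked before anything else.
    if (b == 1) return DivisionAlgo::Classic;

    // Ratios are compared by cross-multiplication so that 4/3 stays exact.
    const bool newton =
        (b >= 4096  && a >= 3 * b) ||      // NEWTON_SKEW 3/1
        (b >= 98304 && 3 * a >= 4 * b) ||  // NEWTON_BALANCED 4/3
        (b >= 2048  && a >= 8 * b);        // NEWTON_HIGH_SKEW 8/1
    if (newton) return DivisionAlgo::Newton;

    // BZ band: near-balanced, or big and skewed.
    const bool bz_near_balanced = b >= 1024 && b + 32 <= a && a <= 3 * b;
    const bool bz_big_skewed    = a > 2048 && a > 3 * b;
    if (pow2_base && b > 512 && (bz_near_balanced || bz_big_skewed))
        return DivisionAlgo::BurnikelZiegler;

    // Everything else with a multi-limb divisor.
    return DivisionAlgo::Fast;
}
```

With these thresholds, a 300000-by-200000-limb division (ratio 1.5) selects Newton, while 250000-by-200000 (ratio 1.25) still falls through to BZ, which is the known residual described above.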

4 changes: 2 additions & 2 deletions docs/DIVISION.md
@@ -62,7 +62,7 @@ flowchart TD
C -- no --> D{Compare&#40;a, b&#41;}
D -- a == b --> Eq[return &#123;1, 0&#125;]
D -- a &lt; b --> Less[return &#123;0, a&#125;]
- D -- a &gt; b --> E{Newton band?<br/>b ≥ 4096 and a ≥ 3b<br/>OR b ≥ 98304 and a ≥ 2b<br/>OR b ≥ 2048 and a ≥ 8b}
+ D -- a &gt; b --> E{Newton band?<br/>b ≥ 4096 and a ≥ 3b<br/>OR b ≥ 98304 and 3a ≥ 4b<br/>OR b ≥ 2048 and a ≥ 8b}
E -- yes --> N[NewtonDivision]
E -- no --> F{Power-of-two base<br/>AND b.size &gt; 512<br/>AND BZ shape fits?}
F -- yes --> BZ[BurnikelZieglerDivision]
@@ -105,7 +105,7 @@ The current dispatch logic, paraphrased:

The ordering matters: Newton wins on **large skewed** problems because the per-divisor reciprocal setup amortizes over multiple chunks. BZ wins on **mid-size near-balanced** problems where its 2n/n recursion structure beats both FastDivision and Newton's setup cost.

- **Near-balanced band (PR #79).** Above `NEWTON_BALANCED_B` (98304 limbs), ratio-≥2 division goes to Newton instead of BZ. BZ's recursive 2n/n halving lands its intermediate NTT multiplies just over power-of-2 transform-length boundaries for non-power-of-2 divisor sizes — the FFT length doubles and the constant factor compounds across recursion depth into a **5–60× slowdown vs Newton**, worst at `n = 2^k + 1` (measured ~90 s for a 262145-limb divisor vs Newton's ~0.9 s). Newton pads once to the working size and stays flat. Exact-power-of-2 divisor sizes are BZ's best case (it ties Newton); they regress ~4 % under this band but are rare in practice. **Known residual:** ratio ∈ (1, 2) at large `b` still routes to BZ and hits the same blowup (~2.7× slower than Newton at ratio 1.5); the band's `a ≥ 2b` lower bound does not cover it. FastDivision is the default workhorse for everything else.
+ **Near-balanced band (PR #79, lowered to 4/3 on 2026-06-11).** Above `NEWTON_BALANCED_B` (98304 limbs), ratio-≥-4/3 division goes to Newton instead of BZ. BZ's recursive 2n/n halving lands its intermediate NTT multiplies just over power-of-2 transform-length boundaries for non-power-of-2 divisor sizes — the FFT length doubles and the constant factor compounds across recursion depth into a **5–60× slowdown vs Newton**, worst at `n = 2^k + 1` (measured ~90 s for a 262145-limb divisor vs Newton's ~0.9 s). Newton pads once to the working size and stays flat. Exact-power-of-2 divisor sizes are BZ's best case (it ties Newton); they regress ~4 % under this band but are rare in practice. **Band lower edge re-measured 2026-06-11** after the wraparound-Newton PRs (#85–#87) cut Newton's constant: the generic BZ/Newton crossover moved from ~2 down to ~1.3 (nb 100k–200k limbs: Newton wins from ratio 1.3–1.35 up; at 1.5 BZ is 1.7–1.9× slower, at 1.95 up to 3.4×). Band lowered `2/1 → 4/3`. Exact-power-of-2 divisors (BZ's best case) regress ≤ ~25% in the narrow (4/3, 1.4) sliver — same accepted tradeoff as PR #79. **Known residual:** ratio ∈ (1, 4/3) still routes to BZ; fine for generic sizes (BZ genuinely wins below the crossover) but `2^k+1`-family divisor sizes still hit the transform-doubling blowup (measured 1.1–5.4 s vs Newton's 0.16 s at nb = 131073, ratios 1.05–1.25). The right fix there is quotient-sized division (divide the top ~2Δ limbs by the divisor's top Δ limbs when the quotient is short), which scales with the quotient instead of the divisor. FastDivision is the default workhorse for everything else.

`KnuthDivision` and `ReciprocalDivision` exist as alternate implementations used by correctness tests for cross-checking. They are not in the production dispatch path.
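
The transform-length boundary effect described in the near-balanced band paragraph can be illustrated with a toy model. The sketch below assumes that an n-by-n limb product occupies 2n limbs and that the transform is padded to the next power of two at or above that size. The real NTT in this codebase may pad differently, and `padded_transform_length` is a hypothetical helper, not a function from the repository.

```cpp
#include <cassert>
#include <cstddef>

// Toy model (an assumption, not the library's padding rule): the transform
// length is the smallest power of two that holds a 2n-limb product.
inline std::size_t padded_transform_length(std::size_t n_limbs)
{
    assert(n_limbs >= 1);
    const std::size_t product_limbs = 2 * n_limbs;
    std::size_t length = 1;
    while (length < product_limbs) {
        length <<= 1;  // double until the product fits
    }
    return length;
}
```

Under this model a 262144-limb operand (exactly 2^18) needs a transform of length 524288, while a 262145-limb operand (2^18 + 1) needs 1048576: one extra limb doubles the transform. BZ repeats multiplies of this shape at every recursion level, which is consistent with the compounding slowdown reported for `n = 2^k + 1`.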

6 changes: 3 additions & 3 deletions include/biginteger/algorithms/Division.h
@@ -5,7 +5,7 @@
* 1. NewtonDivision (blockwise handles arbitrary ratio via reciprocal cache),
* when any of these skew bands hold:
* - b ≥ NEWTON_MEDIUM_B AND a ≥ NEWTON_SKEW (3/1) · b
- * - b ≥ NEWTON_BALANCED_B AND a ≥ NEWTON_BALANCED (2/1) · b
+ * - b ≥ NEWTON_BALANCED_B AND a ≥ NEWTON_BALANCED (4/3) · b
* - b ≥ NEWTON_HIGH_SKEW_B AND a ≥ NEWTON_HIGH_SKEW (8/1) · b
* The balanced (ratio ≥ 2) band starts higher (96k limbs) because BZ wins
* near-balanced below that; above it BZ degrades erratically (measured
@@ -56,11 +56,11 @@ namespace BigMath
#endif

#ifndef BIGMATH_NEWTON_BALANCED_NUMERATOR
- #define BIGMATH_NEWTON_BALANCED_NUMERATOR 2
+ #define BIGMATH_NEWTON_BALANCED_NUMERATOR 4
#endif

#ifndef BIGMATH_NEWTON_BALANCED_DENOMINATOR
- #define BIGMATH_NEWTON_BALANCED_DENOMINATOR 1
+ #define BIGMATH_NEWTON_BALANCED_DENOMINATOR 3
#endif

#ifndef BIGMATH_NEWTON_HIGH_SKEW_B
2 changes: 1 addition & 1 deletion src/algorithms/Division.cpp
@@ -47,7 +47,7 @@ namespace BigMath
bool newton_medium_skew =
b.size() >= NEWTON_MEDIUM_B &&
NEWTON_SKEW_DENOMINATOR * a.size() >= NEWTON_SKEW_NUMERATOR * b.size();
- // Near-balanced (ratio ≥ 2) band: only above NEWTON_BALANCED_B, where BZ's
+ // Near-balanced (ratio ≥ 4/3) band: only above NEWTON_BALANCED_B, where BZ's
// near-balanced path degrades erratically (measured 2×–4.5× slower than
// Newton at b ≥ 100k limbs); below it BZ wins, so leave it alone.
bool newton_balanced =
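
The numerator/denominator split in `Division.h` matters once the band ratio is no longer an integer. The sketch below shows why the cross-multiplied form used in `Division.cpp` (`DEN * a.size() >= NUM * b.size()`) is needed for 4/3. The `DEMO_*` macros and both function names are hypothetical stand-ins that mirror the `#ifndef` override pattern; they are not the project's identifiers.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical macros mirroring the overridable #ifndef pattern in Division.h;
// they can be redefined on the compiler command line (for example -DDEMO_BALANCED_NUMERATOR=2).
#ifndef DEMO_BALANCED_NUMERATOR
#define DEMO_BALANCED_NUMERATOR 4
#endif

#ifndef DEMO_BALANCED_DENOMINATOR
#define DEMO_BALANCED_DENOMINATOR 3
#endif

// True when a/b >= NUM/DEN, evaluated as DEN*a >= NUM*b so no precision is lost.
// Assumes limb counts small enough that the products do not overflow std::size_t.
inline bool meets_ratio(std::size_t a, std::size_t b)
{
    return DEMO_BALANCED_DENOMINATOR * a >= DEMO_BALANCED_NUMERATOR * b;
}

// Shown for contrast only: 4/3 truncates to 1 in integer arithmetic,
// so this degenerates to a >= b and admits every ratio above 1.
inline bool meets_ratio_truncating(std::size_t a, std::size_t b)
{
    return a >= (DEMO_BALANCED_NUMERATOR / DEMO_BALANCED_DENOMINATOR) * b;
}
```

With the default 4/3 values, `meets_ratio(400, 300)` holds and `meets_ratio(399, 300)` does not, whereas the truncating variant would accept a 301-by-300 division that belongs below the band.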