Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 3 additions & 2 deletions CLAUDE.md
Original file line number Diff line number Diff line change
Expand Up @@ -43,14 +43,15 @@ CI (`.github/workflows/qa.yaml`) runs `nanoclaw-task qa-agent` on PRs against `o
**Division dispatch** (`algorithms/Division.h`, thresholds defined in `src/algorithms/Division.cpp`):
- `NewtonDivision` (Newton-Raphson reciprocal, O(M(n)); handles arbitrary `na/nb` via blockwise mode — top chunk in [n+1, 2n], slide down by n, thread the remainder) when any skew band holds:
- `b ≥ 4096` at ratio ≥ 3 (`NEWTON_SKEW` 3/1), or
- `b ≥ 98304` at ratio ≥ 4/3 (`NEWTON_BALANCED` 4/3 — the near-balanced band, PR #79, lowered from 2/1 2026-06-11), or
- `b ≥ 24576` at ratio ≥ 4/3 (`NEWTON_BALANCED` near-balanced band; ratio lowered from 2/1 and floor from 98304 on 2026-06-11), or
- `b ≥ 2048` at ratio ≥ 8 (`NEWTON_HIGH_SKEW` 8/1).
- `QuotientSizedDivision` when `b ≥ 24576`, `a ≥ b + 64`, and ratio < 4/3: divides the operand TOPS (`t = Δ+4` limbs of b, `Δ+t` of a) for the (Δ+1)-limb quotient, then one Δ×nb back-multiply for the remainder — cost scales with the quotient, not the divisor.
- else `BurnikelZieglerDivision` for power-of-two base when `b > 512` and the BZ band fits (near-balanced `b ≥ 1024, b+32 ≤ a ≤ 3b`, or big-and-skewed `a > 2048 && a > 3b`).
- otherwise multi-limb → `FastDivision` (Knuth Algorithm D variant)
- single-limb divisor → `ClassicDivision`
- `KnuthDivision` and `ReciprocalDivision` are alternates used by correctness tests for cross-checking.

The balanced band exists because BZ's recursive 2n/n halving lands intermediate NTT multiplies just over power-of-2 transform-length boundaries for non-power-of-2 divisor sizes, blowing up 5–60× vs Newton (worst at `n = 2^k+1`); Newton pads once and stays flat. **Known residual:** ratio ∈ (1, 4/3) at large `b` still routes to BZ; generic sizes are fine there (BZ wins below the ~1.3 crossover) but `2^k+1`-family divisor sizes still blow up — the proper fix for that range is quotient-sized division.
The balanced band exists because BZ's recursive 2n/n halving lands intermediate NTT multiplies just over power-of-2 transform-length boundaries for non-power-of-2 divisor sizes, blowing up 5–60× vs Newton (worst at `n = 2^k+1`); Newton pads once and stays flat. The former ratio ∈ (1, 4/3) residual is closed by `QuotientSizedDivision` (2026-06-11): the `2^k+1`-family blowup shapes went from seconds to tens of ms, and it beats BZ even at BZ's exact-power-of-2 best case.

When adding a new algorithm, slot the implementation under `algorithms/<op>/<Name>.h`, then update the dispatch in `algorithms/<Op>.h` — the thresholds there are the only place size cutoffs live.

Expand Down
6 changes: 4 additions & 2 deletions docs/DIVISION.md
Original file line number Diff line number Diff line change
Expand Up @@ -62,7 +62,7 @@ flowchart TD
C -- no --> D{Compare&#40;a, b&#41;}
D -- a == b --> Eq[return &#123;1, 0&#125;]
D -- a &lt; b --> Less[return &#123;0, a&#125;]
D -- a &gt; b --> E{Newton band?<br/>b ≥ 4096 and a ≥ 3b<br/>OR b ≥ 98304 and 3a ≥ 4b<br/>OR b ≥ 2048 and a ≥ 8b}
D -- a &gt; b --> E{Newton band?<br/>b ≥ 4096 and a ≥ 3b<br/>OR b ≥ 24576 and 3a ≥ 4b<br/>OR b ≥ 2048 and a ≥ 8b}
E -- yes --> N[NewtonDivision]
E -- no --> F{Power-of-two base<br/>AND b.size &gt; 512<br/>AND BZ shape fits?}
F -- yes --> BZ[BurnikelZieglerDivision]
Expand Down Expand Up @@ -105,7 +105,9 @@ The current dispatch logic, paraphrased:

The ordering matters: Newton wins on **large skewed** problems because the per-divisor reciprocal setup amortizes over multiple chunks. BZ wins on **mid-size near-balanced** problems where its 2n/n recursion structure beats both FastDivision and Newton's setup cost.

**Near-balanced band (PR #79, lowered to 4/3 on 2026-06-11).** Above `NEWTON_BALANCED_B` (98304 limbs), ratio-≥-4/3 division goes to Newton instead of BZ. BZ's recursive 2n/n halving lands its intermediate NTT multiplies just over power-of-2 transform-length boundaries for non-power-of-2 divisor sizes — the FFT length doubles and the constant factor compounds across recursion depth into a **5–60× slowdown vs Newton**, worst at `n = 2^k + 1` (measured ~90 s for a 262145-limb divisor vs Newton's ~0.9 s). Newton pads once to the working size and stays flat. Exact-power-of-2 divisor sizes are BZ's best case (it ties Newton); they regress ~4 % under this band but are rare in practice. **Band lower edge re-measured 2026-06-11** after the wraparound-Newton PRs (#85–#87) cut Newton's constant: the generic BZ/Newton crossover moved from ~2 down to ~1.3 (nb 100k–200k limbs: Newton wins from ratio 1.3–1.35 up; at 1.5 BZ is 1.7–1.9× slower, at 1.95 up to 3.4×). Band lowered `2/1 → 4/3`. Exact-power-of-2 divisors (BZ's best case) regress ≤ ~25% in the narrow (4/3, 1.4) sliver — same accepted tradeoff as PR #79. **Known residual:** ratio ∈ (1, 4/3) still routes to BZ; fine for generic sizes (BZ genuinely wins below the crossover) but `2^k+1`-family divisor sizes still hit the transform-doubling blowup (measured 1.1–5.4 s vs Newton's 0.16 s at nb = 131073, ratios 1.05–1.25). The right fix there is quotient-sized division (divide the top ~2Δ limbs by the divisor's top Δ limbs when the quotient is short), which scales with the quotient instead of the divisor. FastDivision is the default workhorse for everything else.
**Near-balanced band (PR #79; ratio lowered to 4/3 and floor to 24576 limbs on 2026-06-11).** Above `NEWTON_BALANCED_B` (24576 limbs), ratio-≥-4/3 division goes to Newton instead of BZ. BZ's recursive 2n/n halving lands its intermediate NTT multiplies just over power-of-2 transform-length boundaries for non-power-of-2 divisor sizes — the FFT length doubles and the constant factor compounds across recursion depth into a **5–60× slowdown vs Newton**, worst at `n = 2^k + 1` (measured ~90 s for a 262145-limb divisor vs Newton's ~0.9 s). Newton pads once to the working size and stays flat. Exact-power-of-2 divisor sizes are BZ's best case (it ties Newton); they regress ~4 % under this band but are rare in practice. **Band re-measured 2026-06-11** after the wraparound-Newton PRs (#85–#87) cut Newton's constant: the generic BZ/Newton crossover moved from ratio ~2 down to ~1.3 at large b, and the size floor from ~100k down to ~24k limbs (ratio-1.4 crossover ≈ 24k limbs; ratio-2 crossover ≈ 6k). Band lowered `2/1 → 4/3` and `NEWTON_BALANCED_B` `98304 → 24576`. The former ratio ∈ (1, 4/3) residual is closed by **QuotientSizedDivision** (see below). FastDivision is the default workhorse for everything else.

**Short-quotient band (`QuotientSizedDivision`, 2026-06-11).** For `b ≥ NEWTON_BALANCED_B`, `a ≥ b + 64`, ratio < 4/3: with Δ = a.size − b.size and t = Δ+4, the (Δ+1)-limb quotient is determined to ±a few units by the operand TOPS — `q_est = floor((a >> B^(nb−t)) / (b >> B^(nb−t)))` (truncating b perturbs q by ≤ ~B^(2−GUARD), sub-ulp). The tops divide (ratio ~2, size ~2Δ/Δ) goes to Newton directly at t ≥ 6144 (measured: Newton beats BZ at ratio 2 from ~6k limbs — 13 vs 22 ms at 12k, 40 vs 299 ms at 30k, 0.16 s vs 5.3 s at 65537); the exact remainder then costs one Δ×nb back-multiply plus ±few fixups (cap 8, fallback Newton). Cost scales with the **quotient**, not the divisor. Measured (nb=131073, the 2^k+1 pathology): ratios 1.05/1.10/1.25 went from 1.07 s / 2.16 s / 5.35 s (BZ) to **28 / 43 / 71 ms** (38–75×). Generic nb=120000: 37/56/116 ms → 30/39/66 ms. It also beats BZ at BZ's exact-power-of-2 best case (131072 @1.25: 65 vs 88 ms) — no regression band.

`KnuthDivision` and `ReciprocalDivision` exist as alternate implementations used by correctness tests for cross-checking. They are not in the production dispatch path.

Expand Down
24 changes: 17 additions & 7 deletions include/biginteger/algorithms/Division.h
Original file line number Diff line number Diff line change
Expand Up @@ -7,14 +7,18 @@
* - b ≥ NEWTON_MEDIUM_B AND a ≥ NEWTON_SKEW (3/1) · b
* - b ≥ NEWTON_BALANCED_B AND a ≥ NEWTON_BALANCED (4/3) · b
* - b ≥ NEWTON_HIGH_SKEW_B AND a ≥ NEWTON_HIGH_SKEW (8/1) · b
* The balanced (ratio ≥ 2) band starts higher (96k limbs) because BZ wins
* near-balanced below that; above it BZ degrades erratically (measured
* 2×–4.5× slower at b ≥ 100k) while Newton stays smooth.
* 2. Power-of-two base AND b > BZ_DIVISOR_THRESHOLD AND BZ band fits
* The balanced (ratio ≥ 4/3) band starts at 24k limbs — the generic
* Newton/BZ crossover measured after the wraparound-Newton PRs (#85-#87);
* below it BZ wins near-balanced, above it BZ degrades erratically.
* 2. QuotientSizedDivision when b ≥ NEWTON_BALANCED_B, a ≥ b + 64, and
* ratio < 4/3 — short-quotient shapes where cost should scale with the
* quotient, not the divisor (and where BZ blows up 7-128× on
* 2^k+1-family divisor sizes).
* 3. Power-of-two base AND b > BZ_DIVISOR_THRESHOLD AND BZ band fits
* → BurnikelZieglerDivision (balanced 2n/n recursion)
* 3. else
* 4. else
* → FastDivision (Knuth Algorithm D, hybrid-64-bit basecase)
* 4. single-limb divisor inside the above → ClassicDivision
* 5. single-limb divisor inside the above → ClassicDivision
*
* Thresholds tunable at compile time via -DBIGMATH_*=N.
*
Expand All @@ -32,6 +36,7 @@
#include "division/ClassicDivision.h"
#include "division/FastDivision.h"
#include "division/NewtonDivision.h"
#include "division/QuotientSizedDivision.h"

namespace BigMath
{
Expand All @@ -52,7 +57,7 @@ namespace BigMath
#endif

#ifndef BIGMATH_NEWTON_BALANCED_B
#define BIGMATH_NEWTON_BALANCED_B 98304
#define BIGMATH_NEWTON_BALANCED_B 24576
#endif

#ifndef BIGMATH_NEWTON_BALANCED_NUMERATOR
Expand All @@ -63,6 +68,10 @@ namespace BigMath
#define BIGMATH_NEWTON_BALANCED_DENOMINATOR 3
#endif

#ifndef BIGMATH_QSIZED_MIN_DELTA
#define BIGMATH_QSIZED_MIN_DELTA 64
#endif

#ifndef BIGMATH_NEWTON_HIGH_SKEW_B
#define BIGMATH_NEWTON_HIGH_SKEW_B 2048
#endif
Expand All @@ -82,6 +91,7 @@ namespace BigMath
extern const SizeT NEWTON_BALANCED_B;
extern const SizeT NEWTON_BALANCED_NUMERATOR;
extern const SizeT NEWTON_BALANCED_DENOMINATOR;
extern const SizeT QSIZED_MIN_DELTA;
extern const SizeT NEWTON_HIGH_SKEW_B;
extern const SizeT NEWTON_HIGH_SKEW_NUMERATOR;
extern const SizeT NEWTON_HIGH_SKEW_DENOMINATOR;
Expand Down
102 changes: 102 additions & 0 deletions include/biginteger/algorithms/division/QuotientSizedDivision.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,102 @@
#ifndef QUOTIENT_SIZED_DIVISION
#define QUOTIENT_SIZED_DIVISION

#include <utility>
#include <vector>
using namespace std;

#include "../../common/Comparator.h"
#include "../../common/Util.h"
#include "../Addition.h"
#include "../Multiplication.h"
#include "../Subtraction.h"
#include "NewtonDivision.h"

namespace BigMath
{
// Top-level dispatcher (defined in src/algorithms/Division.cpp). The tops
// sub-division below routes through it so the recursion picks the right
// strategy for its own size; its ratio is ~2, so it can never re-enter the
// quotient-sized band (ratio < 4/3).
pair<vector<DataT>, vector<DataT>> DivideAndRemainder(
vector<DataT> const &a,
vector<DataT> const &b,
BaseT base,
bool computeRemainder);

// Short-quotient division. When Δ = a.size() − b.size() is small relative
// to the divisor, the quotient has only Δ+1 limbs and is determined to ±a
// few units by the operands' TOPS: with t = Δ + GUARD,
// q_est = floor( (a >> B^(nb−t)) / (b >> B^(nb−t)) )
// truncating b low limbs perturbs a/b by a relative ~B^(1−t), i.e. the
// (Δ+1)-limb quotient by ≤ ~B^(Δ+2−t) = B^(2−GUARD) — sub-ulp. The exact
// remainder then comes from ONE Δ×nb back-multiply plus ±few fixups.
//
// Cost: T(2Δ/Δ tops divide) + M(Δ, nb) + O(nb) — scales with the QUOTIENT,
// not the divisor. This is the right shape for ratio ∈ (1, 4/3) at large b,
// where full Newton pays an nb-sized reciprocal for a tiny quotient and
// BZ's recursion blows up 7–128× on 2^k+1-family divisor sizes.
class QuotientSizedDivision
{
public:
static constexpr SizeT GUARD = 4;

static pair<vector<DataT>, vector<DataT>> DivideAndRemainder(
vector<DataT> const &a,
vector<DataT> const &b,
BaseT base,
bool computeRemainder = true)
{
SizeT na = (SizeT)a.size();
SizeT nb = (SizeT)b.size();
// Caller contract (dispatch): a > b, and delta + GUARD < nb.
SizeT delta = na - nb;
SizeT t = delta + GUARD;
SizeT s = nb - t;

vector<DataT> a_top(a.begin() + s, a.end());
vector<DataT> b_top(b.begin() + s, b.end());

// Tops divide is ratio ~2. Post-#85-87 Newton beats BZ there from
// ~6k limbs (measured: 12k limbs 13 vs 22 ms, 30k 40 vs 299 ms,
// 65537 161 ms vs 5.3 s); below that the dispatcher's BZ/Fast pick
// is near-tied.
vector<DataT> q = (t >= 6144)
? NewtonDivision::DivideAndRemainder(a_top, b_top, base, false).first
: BigMath::DivideAndRemainder(a_top, b_top, base, false).first;

// Reconstruct: r = a − q·b, fixing q's ±few-unit estimate error.
vector<DataT> qb = Multiply(q, b, base);

static const vector<DataT> one{1};
const int FIXUP_LIMIT = 8;

int iters = 0;
while (Compare(qb, a) > 0)
{
if (++iters > FIXUP_LIMIT)
return NewtonDivision::DivideAndRemainder(a, b, base, computeRemainder);
q = Subtract(q, one, base);
qb = Subtract(qb, b, base);
}
vector<DataT> r = Subtract(a, qb, base);

iters = 0;
while (Compare(r, b) >= 0)
{
if (++iters > FIXUP_LIMIT)
return NewtonDivision::DivideAndRemainder(a, b, base, computeRemainder);
r = Subtract(r, b, base);
q = Add(q, one, base);
}

TrimZerosToOne(q);
if (!computeRemainder)
return {q, vector<DataT>()};
TrimZerosToOne(r);
return {q, r};
}
};
}

#endif
14 changes: 14 additions & 0 deletions src/algorithms/Division.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@
*/

#include "biginteger/algorithms/Division.h"
#include "biginteger/algorithms/division/QuotientSizedDivision.h"

#include <stdexcept>

Expand All @@ -17,6 +18,7 @@ namespace BigMath
const SizeT NEWTON_BALANCED_B = BIGMATH_NEWTON_BALANCED_B;
const SizeT NEWTON_BALANCED_NUMERATOR = BIGMATH_NEWTON_BALANCED_NUMERATOR;
const SizeT NEWTON_BALANCED_DENOMINATOR = BIGMATH_NEWTON_BALANCED_DENOMINATOR;
const SizeT QSIZED_MIN_DELTA = BIGMATH_QSIZED_MIN_DELTA;
const SizeT NEWTON_HIGH_SKEW_B = BIGMATH_NEWTON_HIGH_SKEW_B;
const SizeT NEWTON_HIGH_SKEW_NUMERATOR = BIGMATH_NEWTON_HIGH_SKEW_NUMERATOR;
const SizeT NEWTON_HIGH_SKEW_DENOMINATOR = BIGMATH_NEWTON_HIGH_SKEW_DENOMINATOR;
Expand Down Expand Up @@ -60,6 +62,18 @@ namespace BigMath
if (newton_eligible)
return NewtonDivision::DivideAndRemainder(a, b, base, computeRemainder);

// Short-quotient band: large divisors below the Newton balanced band
// (ratio < 4/3) with a quotient big enough that FastDivision's O(n*delta)
// loses. BZ's near-balanced path blows up 7-128x on 2^k+1-family divisor
// sizes here; quotient-sized division scales with the quotient instead.
bool qsized_eligible =
(base == Base2_32 || base == Base2_64) &&
b.size() >= NEWTON_BALANCED_B &&
a.size() >= b.size() + QSIZED_MIN_DELTA &&
NEWTON_BALANCED_DENOMINATOR * a.size() < NEWTON_BALANCED_NUMERATOR * b.size();
if (qsized_eligible)
return QuotientSizedDivision::DivideAndRemainder(a, b, base, computeRemainder);

// BZ for large near-balanced divisors and for big-and-skewed cases.
// The +32-limb quotient-bulk guard in the near-balanced clause excludes
// degenerate cases where a ≈ b and the quotient is 0-2 limbs — BZ would
Expand Down
Loading