Merged
7 changes: 4 additions & 3 deletions Cargo.toml
@@ -1,12 +1,13 @@
[workspace]
resolver = "2"
members = [
# The bench that matters — ports cashubtc/nutshell PR #999 (multiplicative
# blinding, blst + MPI peripheral). See README.md / RESULTS.md.
# The bench that matters — implements NUT-00's BLS12-381 (v3) protocol
# (cashubtc/nuts#371): multiplicative blinding, mandatory point validation,
# Fiat-Shamir batch weights, blst + MPI peripheral. See README.md / RESULTS.md.
"esp32c3-bench-blst",
# Original-ESP32 (Xtensa) MPI hardware microbench.
"esp32-bench-mpi",
# SUPERSEDED: an early mock with *additive* blinding (not PR #999) on the
# SUPERSEDED: an early mock with *additive* blinding (not NUT-00 v3) on the
# pure-Rust zkcrypto backend. Kept only for the historical pure-Rust-vs-blst
# comparison; see legacy/README.md and legacy/crypto/src/lib.rs.
"legacy/crypto",
68 changes: 40 additions & 28 deletions README.md
@@ -1,42 +1,54 @@
# bls-bench

Benchmarks the BLS12-381 cryptography that Cashu would need if it migrates
BDHKE from secp256k1 to BLS12-381 — tracking [cashubtc/nutshell PR #999][pr]
(`feat(crypto): migrate BDHKE to BLS12-381 (v3 keysets)`) — running on an
**ESP32-C3**, including a path that offloads the field arithmetic to the chip's
RSA/MPI peripheral.
BDHKE from secp256k1 to BLS12-381 — implementing [NUT-00's BLS12-381 (v3)
protocol][pr] (cashubtc/nuts PR #371, keysets with version byte `02`) — running
on an **ESP32-C3**, including a path that offloads the field arithmetic to the
chip's RSA/MPI peripheral.

[pr]: https://github.com/cashubtc/nutshell/pull/999
[pr]: https://github.com/cashubtc/nuts/pull/371

## The bench that matters: `esp32c3-bench-blst/`

This is the one to look at. It ports PR #999's BLS scheme faithfully (see
`esp32c3-bench-blst/src/main.rs` and the comparison notes below):
This is the one to look at. It implements NUT-00's BLS12-381 (v3) protocol
faithfully (see `esp32c3-bench-blst/src/main.rs`), and a startup gate checks it
against the spec's test vectors (`tests/00-tests.md`) byte-for-byte:

- **Multiplicative blinding** — `B' = r·Y`, `C' = a·B'`, `C = r⁻¹·C' = a·Y`.
- **Multiplicative blinding** — `B_ = r·Y`, `C_ = a·B_`, `C = r⁻¹·C_ = a·Y`.
No point additions in the BDHKE steps, no `− r·K` unblind.
- **Mint pubkey `K2 = a·G2` on G2 only** — 96-byte keyset keys; no G1 mint
key. (Additive blinding would need the key on *both* G1 and G2 = 144 bytes —
that's why the PR went multiplicative.)
- **Mint pubkey `K = a·G2` on G2 only** — 96-byte keyset keys; no G1 mint key.
(Additive blinding would need the key on *both* G1 and G2 = 144 bytes —
that's why the spec is multiplicative.)
- **Hash-to-G1 via RFC 9380 SSWU**, DST `CASHU_BLS12_381_G1_XMD:SHA-256_SSWU_RO_`.
- **Wallet verify** `e(C, G2) == e(Y, K2)`; **batch verify** collapses N proofs
into `1 + U` Miller loops (U = unique keysets) via random linear combinations.
- **DLEQ removed** — pairings make it redundant.
- **Mandatory point validation** (NUT-00 §Point Validation, flagged CRITICAL):
every received `B_`/`C_`/`C`/`K` is decompressed from canonical bytes and
rejected unless on-curve, non-identity, and in the prime-order subgroup
(`uncompress` + `in_g1`/`in_g2`). The mint validates `B_` before signing; the
wallet validates `K`, `C_`, and `C`.
- **Wallet verify** `e(C, G2) == e(Y, K)`; **batch verify** collapses N proofs
into `1 + U` Miller loops (U = unique keysets), with the random-linear-
combination weights derived deterministically via a Fiat-Shamir SHA-256
transcript + per-proof rejection sampling in `Fr*` (`BLS_BATCH_DST`).
- **No DLEQ for v3** — NUT-12 scopes DLEQ to secp256k1; the pairing check
replaces it.
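
The blinding algebra in the first bullet can be checked with a minimal sketch. This is not code from the repository: it assumes a toy group (integers modulo the prime `2^61 - 1`, with "scalar times point" modelled as modular multiplication) in place of BLS12-381 G1, and the function names `blind`, `sign`, and `unblind` are hypothetical. It is insecure by construction and only illustrates why `r⁻¹·C_` equals `a·Y` with no point addition or subtraction.

```rust
/// Toy prime modulus (2^61 - 1). ASSUMPTION: stands in for the group order;
/// this is NOT the BLS12-381 scalar field and offers no security.
const Q: u64 = (1u64 << 61) - 1;

/// Multiply two residues modulo Q without overflow.
fn mul_mod(a: u64, b: u64) -> u64 {
    ((a as u128 * b as u128) % Q as u128) as u64
}

/// Square-and-multiply exponentiation modulo Q.
fn pow_mod(mut base: u64, mut exp: u64) -> u64 {
    let mut acc = 1u64;
    base %= Q;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = mul_mod(acc, base);
        }
        base = mul_mod(base, base);
        exp >>= 1;
    }
    acc
}

/// Inverse modulo the prime Q via Fermat's little theorem.
fn inv_mod(x: u64) -> u64 {
    assert!(x % Q != 0, "zero has no inverse");
    pow_mod(x, Q - 2)
}

/// Wallet: B_ = r·Y (hypothetical helper name).
fn blind(y: u64, r: u64) -> u64 {
    mul_mod(r, y)
}

/// Mint: C_ = a·B_ (hypothetical helper name).
fn sign(a: u64, b_: u64) -> u64 {
    mul_mod(a, b_)
}

/// Wallet: C = r⁻¹·C_ (hypothetical helper name).
fn unblind(c_: u64, r: u64) -> u64 {
    mul_mod(inv_mod(r), c_)
}

fn main() {
    // Arbitrary toy values for the mint key a, blinding factor r, and "point" Y.
    let (a, r, y) = (0x1234_5678_9abc, 0x0fed_cba9_8765, 42);
    let c = unblind(sign(a, blind(y, r)), r);
    // The blinding factor cancels: r⁻¹·(a·(r·Y)) = a·Y.
    assert_eq!(c, mul_mod(a, y));
    println!("unblinded value equals a*Y: {}", c == mul_mod(a, y));
}
```

In the real protocol the three steps are scalar multiplications in G1, and the cancellation relies on the same commutativity of scalars that the toy arithmetic shows.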

**Backend:** `blst` 0.3.16, vendored + patched (`esp32c3-bench-blst/vendor/blst/`,
wired via `[patch.crates-io]`). On RV32IMC blst has no asm path, so it falls back
to portable C — and the patch routes blst's Montgomery multiply *and* squaring
(`mul_mont_n`, `mul_mont_nonred_n`, `sqr_mont_382x`) through the C3's RSA/MPI
peripheral via `mpi_mul_mont_n` in `esp32c3-bench-blst/src/mpi.rs`. The bench
prints a correctness gate (`keyed_verification` / `pairing_verification` must
round-trip) and a bit-exact MPI-vs-software `mul_mont` diagnostic.
prints a spec-conformance gate (the NUT-00 test vectors — `Y/K/B_/C_/C`, the
batch challenge, and the rejection-sampled weights — must match byte-for-byte)
and a bit-exact MPI-vs-software `mul_mont` diagnostic.

**Headline numbers** (ESP32-C3 rev v0.4 @ 160 MHz; full table in
[`RESULTS.md`](RESULTS.md)): portable-C blst does `bdhke_full_round` in 421 ms
and `pairing_verification` in 1.21 s; with the MPI peripheral those drop to
**71.5 ms** and **278 ms** — ~4-5× across the board. A 10-proof token verifies
in **~0.8-1.1 s** on the bare chip — under today's secp256k1+DLEQ wallet
(~1.5 s), no coprocessor.
[`RESULTS.md`](RESULTS.md)): portable-C blst does `bdhke_full_round` in 459 ms
and `pairing_verification` in 1.31 s; with the MPI peripheral those drop to
**104 ms** and **304 ms** — ~4.5× across the board. A typical 10-proof token
(all one keyset) batch-verifies in **~0.9 s** on the bare chip, a realistic
3–4-keyset mix in **~1.1-1.2 s** — at parity with today's secp256k1+DLEQ wallet
(~1.5 s for 10 proofs), no coprocessor. All figures now include the spec's
mandatory point validation on every received point.
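
The gap between the one-keyset and mixed-keyset timings follows from the `1 + U` Miller-loop count. The sketch below only counts loops; it performs no pairings. It assumes that a standalone check of `e(C, G2) == e(Y, K)` costs two Miller loops per proof, and the function names and keyset ID strings are hypothetical, not taken from the bench.

```rust
use std::collections::HashSet;

/// Miller loops when every proof is verified on its own.
/// ASSUMPTION: each check e(C, G2) == e(Y, K) evaluates two Miller loops.
fn individual_miller_loops(n_proofs: usize) -> usize {
    2 * n_proofs
}

/// Miller loops for a batch: one for the combined C against G2, plus one per
/// unique keyset key K (the `1 + U` figure from the README).
fn batch_miller_loops(keyset_ids: &[&str]) -> usize {
    if keyset_ids.is_empty() {
        return 0; // nothing to verify
    }
    let unique: HashSet<&str> = keyset_ids.iter().copied().collect();
    1 + unique.len()
}

fn main() {
    // Ten proofs from a single keyset (hypothetical ID).
    let one_keyset = ["ks_a"; 10];
    // Ten proofs spread over four keysets (hypothetical IDs).
    let mixed = [
        "ks_a", "ks_a", "ks_a", "ks_b", "ks_b", "ks_b", "ks_c", "ks_c", "ks_d", "ks_d",
    ];

    assert_eq!(individual_miller_loops(10), 20);
    assert_eq!(batch_miller_loops(&one_keyset), 2);
    assert_eq!(batch_miller_loops(&mixed), 5);

    println!("individual: {} loops", individual_miller_loops(10));
    println!("batch, 1 keyset: {} loops", batch_miller_loops(&one_keyset));
    println!("batch, 4 keysets: {} loops", batch_miller_loops(&mixed));
}
```

The count covers Miller loops only. The scalar multiplications for the random linear combination, hash-to-G1, and point validation still scale with the number of proofs, which is why batch time does not shrink in proportion to the loop count.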

## `esp32-bench-mpi/`

@@ -49,26 +61,26 @@ with `cargo +esp run --release` from that crate.
## ⚠️ Superseded: `legacy/`

The `legacy/crypto/`, `legacy/esp32c3-bench/`, `legacy/host-bench/` crates
predate PR #999 as a reference. They mock an **additive**-blinding BDHKE
(`B' = Y + r·G`, `C = C' − r·K`) — which is **not** what PR #999 does, and would
force the mint key onto both G1 (for the `− r·K` unblind) and G2 (for the
predate the spec. They mock an **additive**-blinding BDHKE
(`B' = Y + r·G`, `C = C' − r·K`) — which is **not** what NUT-00 (v3) does, and
would force the mint key onto both G1 (for the `− r·K` unblind) and G2 (for the
pairing check) = 144-byte keyset keys. They also still hash with a placeholder
DST and use the pure-Rust `bls12_381` (zkcrypto) backend rather than `blst`.
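
The 96-byte versus 144-byte key sizes come straight from the compressed point encodings. The sketch below assumes the standard compressed sizes for BLS12-381 (48 bytes for G1, 96 bytes for G2). The `Blinding` enum and `keyset_key_bytes` function are hypothetical names used for illustration, not items from either crate.

```rust
/// Compressed encoding sizes for BLS12-381 points.
const G1_COMPRESSED_BYTES: usize = 48;
const G2_COMPRESSED_BYTES: usize = 96;

/// Hypothetical label for the two blinding schemes discussed in the README.
enum Blinding {
    /// NUT-00 v3: mint key on G2 only.
    Multiplicative,
    /// Legacy mock: mint key needed on G1 (unblind) and G2 (pairing check).
    Additive,
}

/// Bytes published per keyset key under each scheme.
fn keyset_key_bytes(scheme: Blinding) -> usize {
    match scheme {
        Blinding::Multiplicative => G2_COMPRESSED_BYTES,
        Blinding::Additive => G1_COMPRESSED_BYTES + G2_COMPRESSED_BYTES,
    }
}

fn main() {
    assert_eq!(keyset_key_bytes(Blinding::Multiplicative), 96);
    assert_eq!(keyset_key_bytes(Blinding::Additive), 144);
    println!(
        "multiplicative: {} bytes, additive: {} bytes",
        keyset_key_bytes(Blinding::Multiplicative),
        keyset_key_bytes(Blinding::Additive)
    );
}
```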

They live under [`legacy/`](legacy/) and are **kept only for the historical
pure-Rust-vs-`blst` per-primitive comparison** (the ~9-40×-per-op gap is
interesting; see `RESULTS.md`). Do not use them for protocol-accurate numbers —
use `esp32c3-bench-blst/` for anything that needs to match PR #999.
use `esp32c3-bench-blst/` for anything that needs to match the NUT-00 spec.

## Layout

```
bls-bench/
├── esp32c3-bench-blst/ ← the bench that matters: blst + MPI, matches PR #999
├── esp32c3-bench-blst/ ← the bench that matters: blst + MPI, matches NUT-00 v3
│ ├── src/mpi.rs MPI peripheral driver + mpi_mul_mont_n
│ └── vendor/blst/ vendored+patched blst 0.3.16
├── esp32-bench-mpi/ ← original-ESP32 (Xtensa) MPI hardware microbench
└── legacy/ ← ⚠️ SUPERSEDED — additive-blinding mock, not PR #999
└── legacy/ ← ⚠️ SUPERSEDED — additive-blinding mock, not NUT-00 v3
├── crypto/ additive-blinding BDHKE mock (zkcrypto backend)
├── esp32c3-bench/ runs the mock on ESP32-C3
└── host-bench/ criterion host baseline for the mock
@@ -82,7 +94,7 @@ serial. (`esp32-bench-mpi` targets the original ESP32, Xtensa LX6 @ 240 MHz.)
## Running

```bash
# The PR-#999-accurate bench (board connected, espflash installed)
# The NUT-00-v3-accurate bench (board connected, espflash installed)
cd esp32c3-bench-blst && cargo run --release

# Original-ESP32 MPI hardware microbench (also needs `espup install`)