AstekGroup/pulse-cerberus

pulse-cerberus

A Rust/PyO3 accelerator for cerberus — an iso-functional, drop-in Validator whose validate() hot path runs in native code, with transparent fallback to cerberus for anything outside its fast path. It's a separate package that depends on cerberus and delegates back to it, so it never diverges on input cerberus accepts — by construction.

pip install pulse-cerberus
# was:  from cerberus import Validator
from pulse_cerberus import Validator

schema = {
    "id":   {"type": "integer", "required": True, "min": 1},
    "name": {"type": "string", "required": True, "minlength": 1, "maxlength": 64},
    "role": {"type": "string", "allowed": ["admin", "user", "guest"]},
    "tags": {"type": "list", "schema": {"type": "string"}},
    "addr": {"type": "dict", "schema": {"city": {"type": "string"}}},
}

v = Validator(schema)          # compiled once
v.validate({"id": 1, "name": "alice", "role": "admin"})   # → True, ~200× faster
v.errors                        # same .errors tree as cerberus, byte-for-byte

The only change is the import. Validator, .validate(), .validated(), .errors, .document, SchemaError, DocumentError, registries — the cerberus API works unchanged.

Performance

cerberus's Validator.validate is interpreted Python: per call it re-dispatches every rule, spawns child-validators for nested schemas, and re-expands the schema. pulse-cerberus compiles the schema once into a Rust rule-AST and then validates flatly.
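
To make the compile-once idea concrete, here is a toy sketch in plain Python. It is not pulse-cerberus's Rust rule-AST: `compile_schema`, `validate`, and the three rules handled are hypothetical simplifications. The schema is flattened into a list of checks up front, so each call only runs a loop.

```python
from typing import Callable

Check = Callable[[dict], bool]

def compile_schema(schema: dict) -> list[Check]:
    """Flatten a tiny rule subset (required, min, maxlength) into checks."""
    checks: list[Check] = []
    for field, rules in schema.items():
        if rules.get("required"):
            checks.append(lambda doc, f=field: f in doc)
        if "min" in rules:
            checks.append(
                lambda doc, f=field, lo=rules["min"]: f not in doc or doc[f] >= lo
            )
        if "maxlength" in rules:
            checks.append(
                lambda doc, f=field, n=rules["maxlength"]: f not in doc or len(doc[f]) <= n
            )
    return checks

def validate(checks: list[Check], doc: dict) -> bool:
    """Run the precompiled checks; no schema inspection happens here."""
    return all(check(doc) for check in checks)

# Compile once, then reuse the flat check list for every document.
checks = compile_schema({"id": {"required": True, "min": 1}, "name": {"maxlength": 64}})
print(validate(checks, {"id": 1, "name": "alice"}))  # True
print(validate(checks, {"id": 0}))                   # False
```

All schema inspection happens in `compile_schema`; the per-call work is a single pass over closures.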

Drift-immune A/B (median per call), pre-built validator, realistic schema (8 fields incl. a nested dict and a list), CPython 3.11, Apple Silicon:

|  | median / validate() | speedup |
| --- | --- | --- |
| cerberus.Validator.validate | ~245 µs |  |
| pulse_cerberus.Validator.validate | ~1.2 µs | ~×200 |

The striking part: cerberus pays a large fixed per-call cost (rule dispatch + child-validator spawning + schema re-expansion) on every validate(), even for a small document. That interpreted machinery is exactly what a native validator removes. Reproduce it:
import statistics, time, cerberus, pulse_cerberus

schema = {"id": {"type": "integer", "required": True, "min": 1},
          "name": {"type": "string", "required": True, "minlength": 1, "maxlength": 64},
          "role": {"type": "string", "allowed": ["admin", "user", "guest"]},
          "tags": {"type": "list", "schema": {"type": "string"}},
          "addr": {"type": "dict", "schema": {"city": {"type": "string"}}}}
doc = {"id": 1, "name": "alice", "role": "admin", "tags": ["x"], "addr": {"city": "Paris"}}

ref  = cerberus.Validator(schema)        # build once (validators are meant to be reused)
cand = pulse_cerberus.Validator(schema)

def bench(v, reps, rounds=15):
    out = []
    for _ in range(rounds):
        t = time.perf_counter()
        for _ in range(reps): v.validate(doc)
        out.append((time.perf_counter() - t) / reps)
    return statistics.median(out)

r = bench(ref, 2000); c = bench(cand, 20000)
print(f"cerberus {r*1e6:.1f} us  ->  pulse {c*1e6:.2f} us   (x{r/c:.0f})")
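
The script above times one validator completely before starting the other. The README does not say how the published figures were made drift-immune. One common approach, shown here purely as an assumption, is to alternate A and B rounds so that slow machine drift (thermal throttling, background load) lands on both samples. `ab_median` and the two stand-in workloads are hypothetical names.

```python
import statistics
import time

def ab_median(fn_a, fn_b, reps=1000, rounds=15):
    """Alternate A and B rounds and return the median per-call time of each."""
    a_times, b_times = [], []
    for _ in range(rounds):
        for fn, out in ((fn_a, a_times), (fn_b, b_times)):
            start = time.perf_counter()
            for _ in range(reps):
                fn()
            out.append((time.perf_counter() - start) / reps)
    return statistics.median(a_times), statistics.median(b_times)

# Stand-in workloads so the sketch runs without either package installed.
# With the validators built above, one would instead pass
# lambda: ref.validate(doc) and lambda: cand.validate(doc).
def slow():
    return sum(range(200))

def fast():
    return sum(range(20))

a, b = ab_median(slow, fast)
print(f"A {a * 1e6:.2f} us   B {b * 1e6:.2f} us")
```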

Why Rust wins here (and why this is not a false friend)

Profiling cerberus's validate() shows that the hot path is interpreted Python (__validate_definitions / __get_rule_handler per rule×field, child-validator spawning, schema re-expansion), with 0 % of the time spent in the C re engine. The ~38 % that shows up as "C" is isinstance / abc.__instancecheck__ / dict.get: dispatch glue that simply evaporates in typed Rust. Because the bottleneck is interpreted Python (not a C-bound kernel), a native rewrite wins by orders of magnitude.
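
A profile of this kind can be collected with the standard-library cProfile and pstats modules. The sketch below is a minimal harness, not the author's actual profiling setup: `profile_calls` and `workload` are hypothetical names, and the workload is a stand-in so the snippet runs without cerberus installed.

```python
import cProfile
import io
import pstats

def profile_calls(fn, reps=2000, top=10):
    """Profile `reps` calls of fn and return the top entries by own time."""
    profiler = cProfile.Profile()
    profiler.enable()
    for _ in range(reps):
        fn()
    profiler.disable()
    buf = io.StringIO()
    pstats.Stats(profiler, stream=buf).sort_stats("tottime").print_stats(top)
    return buf.getvalue()

# Stand-in workload made of the same kind of glue calls (isinstance, dict.get).
# To profile cerberus itself, one would pass lambda: v.validate(doc) for a
# prebuilt cerberus.Validator `v`.
def workload():
    return isinstance({"id": 1}.get("id"), int)

report = profile_calls(workload)
print(report)
```

In the report, built-in calls show up as entries such as `{built-in method builtins.isinstance}`, which is how the share attributed to "C" can be separated from pure-Python frames.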

What's covered natively (and what falls back)

pulse-cerberus validates natively when the schema uses only:

  • types integer / float / number / boolean / string / dict / list (with cerberus's exact bool semantics: integer/float accept True, number excludes it);
  • rules required, allowed, min, max, minlength, maxlength, empty, nullable, and nested schema (dict + list-of), with allow_unknown and require_all.
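
The bool semantics mentioned in the first bullet are easy to get wrong because `bool` is a subclass of `int` in Python. The following sketch restates them as plain `isinstance` checks; `matches_type` is a hypothetical helper that follows the behavior described above, not code taken from either library.

```python
def matches_type(type_name: str, value) -> bool:
    """Type check for the four types where bool handling matters."""
    is_bool = isinstance(value, bool)
    if type_name == "boolean":
        return is_bool
    if type_name == "integer":
        # bool subclasses int, so True and False pass.
        return isinstance(value, int)
    if type_name == "float":
        # Per the list above, float accepts True as well (ints are accepted).
        return isinstance(value, (float, int))
    if type_name == "number":
        # number takes ints and floats but explicitly excludes bool.
        return isinstance(value, (int, float)) and not is_bool
    raise ValueError(f"type not covered by this sketch: {type_name}")

print(matches_type("integer", True))  # True
print(matches_type("float", True))    # True
print(matches_type("number", True))   # False
print(matches_type("number", 3))      # True
print(matches_type("boolean", 1))     # False
```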

Everything else is transparently delegated to cerberus: normalization (coerce/default/rename/purge_unknown/readonly), logic (*of, dependencies, excludes), check_with, keysrules/valuesrules, items, contains, regex, allow_unknown as a rules-set, registries, custom Validator subclasses, non-dict documents, and exotic value types. A SchemaError is raised at construction exactly as cerberus would raise it. When in doubt, it falls back; it never guesses.
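
One way to picture the native-or-fallback decision is a conservative walk over the schema that answers "native" only when every rule and type is on the supported list. This is a sketch under assumptions, not the package's real eligibility check: `schema_is_native` and `rules_are_native` are hypothetical names. Treating allow_unknown and require_all as per-field keys is also an assumption.

```python
NATIVE_TYPES = {"integer", "float", "number", "boolean", "string", "dict", "list"}
NATIVE_RULES = {
    "type", "required", "allowed", "min", "max", "minlength", "maxlength",
    "empty", "nullable", "schema", "allow_unknown", "require_all",
}

def rules_are_native(rules) -> bool:
    """True only if one field's rule set stays inside the listed subset."""
    if not isinstance(rules, dict) or not set(rules) <= NATIVE_RULES:
        return False
    type_name = rules.get("type")
    if type_name is not None:
        # A list of types or an unknown type name is treated as fallback.
        if not isinstance(type_name, str) or type_name not in NATIVE_TYPES:
            return False
    if isinstance(rules.get("allow_unknown"), dict):
        return False  # allow_unknown given as a rules-set falls back
    sub = rules.get("schema")
    if sub is None:
        return True
    if type_name == "dict":
        return schema_is_native(sub)   # nested field -> rules mapping
    if type_name == "list":
        return rules_are_native(sub)   # one rule set applied to every item
    return False  # nested schema without a dict/list type: fall back

def schema_is_native(schema) -> bool:
    """True if every field of the schema is eligible for the fast path."""
    return isinstance(schema, dict) and all(
        rules_are_native(rules) for rules in schema.values()
    )

print(schema_is_native({"tags": {"type": "list", "schema": {"type": "string"}}}))  # True
print(schema_is_native({"email": {"type": "string", "regex": ".+@.+"}}))           # False
```

The design point is the default: any unrecognized shape returns False, which mirrors the "when in doubt, fall back" rule.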

The error messages are rendered through Python's str() (cerberus formats its own messages the same way), so the .errors tree — structure, messages, and per-field alphabetical-by-rule order — is identical to cerberus's.

Iso-functionality

Iso-functionality is proven by a typed differential oracle that compares (validate(), .errors, .document) against stock cerberus, on a curated corpus of the iso-critical cases plus adversarial fuzzing of random (schema, doc) pairs. The corpus includes the bool subtleties, error ordering, nested/list-of error trees, the empty/min interaction, and exception parity (SchemaError/DocumentError). The pure-Python fallback path is verified to be iso too (PULSE_FORCE_FALLBACK=1).
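
The shape of such a differential oracle can be sketched in a few lines. This is an illustration of the technique, not the project's test suite: `differential_check` and the three stand-in validators are hypothetical, and the stand-ins implement only a `min` rule so the snippet runs without either package installed.

```python
import random

def differential_check(reference, candidate, cases):
    """Return every case where the two implementations disagree."""
    mismatches = []
    for schema, doc in cases:
        expected = reference(schema, doc)
        actual = candidate(schema, doc)
        if expected != actual:
            mismatches.append((schema, doc, expected, actual))
    return mismatches

# Each stand-in returns the compared triple (ok, errors, document). With the
# real libraries, a wrapper would build Validator(schema), call validate(doc),
# and return (result, v.errors, v.document).
def reference(schema, doc):
    errors = {}
    for field, rules in schema.items():
        if field in doc and doc[field] < rules["min"]:
            errors[field] = [f"min value is {rules['min']}"]
    return (not errors, errors, doc)

def candidate(schema, doc):
    errors = {
        f: [f"min value is {r['min']}"]
        for f, r in schema.items()
        if f in doc and not doc[f] >= r["min"]
    }
    return (len(errors) == 0, errors, doc)

def buggy(schema, doc):
    # Deliberate off-by-one: also rejects values equal to min.
    errors = {
        f: [f"min value is {r['min']}"]
        for f, r in schema.items()
        if f in doc and doc[f] <= r["min"]
    }
    return (len(errors) == 0, errors, doc)

rng = random.Random(0)  # seeded so the fuzz corpus is reproducible
cases = [({"n": {"min": rng.randint(0, 5)}}, {"n": rng.randint(0, 5)}) for _ in range(200)]
cases.append(({"n": {"min": 3}}, {"n": 3}))  # curated boundary case

print(len(differential_check(reference, candidate, cases)))  # 0
print(len(differential_check(reference, buggy, cases)) > 0)  # True
```

Comparing the full triple, rather than only the boolean result, is what catches divergences in error messages or in the returned document.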

Wheels

abi3 wheels (Python ≥ 3.11) for Linux (x86_64/aarch64, manylinux + musllinux), macOS (Apple Silicon), and Windows; sdist elsewhere (builds the Rust core via maturin).

License

ISC (same as cerberus).
