deflect is a small package for activation-space parameter-efficient adaptation of transformer
backbones. It mirrors the parts of Hugging Face PEFT that are
useful for this setting, while keeping model discovery explicit for DINO/timm-style backbones and
custom detection models.
The registered method is TRANSPORT, an original implementation of the DEFLECT method
(Thoreau, Marsocci & Derksen, 2025; arXiv:2503.09493), written
from the paper's equations; see the method attribution below.
deflect is licensed under Apache-2.0; see LICENSE.
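
```bash
uv pip install -e .                  # core (torch only)
uv pip install -e ".[safetensors]"   # add safetensors weight format
uv pip install -e ".[dev]"           # add tests + linters
```

```python
import timm  # optional: only needed for this example backbone
from deflect import get_deflect_model, TransportConfig

# base_model must expose ViT-style blocks at .blocks or .backbone.blocks;
# a random-init timm ViT-S/16 is used here so the example runs offline.
base_model = timm.create_model("vit_small_patch16_224", pretrained=False)

model = get_deflect_model(
    base_model,
    TransportConfig(target_blocks=[0, 1, 2], hidden_dim=8),
)
model.print_trainable_parameters()
```

TRANSPORT is dual-stream. The wrapper does not invent a full model forward pass; caller code is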
responsible for preprocessing the high-resolution embedding (T_map, HR-to-LR norm matching, HR CLS
prefix) and passing (lr_embedding, hr_embedding) through the adapted blocks. See the module docstring
at src/deflect/tuners/transport/model.py for the canonical caller pattern.
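The threading of the two streams can be sketched without any framework. This is a minimal sketch under stated assumptions: each adapted block is modelled as a callable `block(lr, hr) -> (lr, hr)`, embeddings are plain nested lists, and `forward_dual_stream`, `preprocess_hr` and `toy_block` are hypothetical names. The real block signature and the HR preprocessing (T_map, norm matching, CLS prefix) are defined by the module docstring, not by this sketch.

```python
from typing import Callable, List, Sequence, Tuple

Embedding = List[List[float]]  # tokens x channels, plain lists for illustration
Pair = Tuple[Embedding, Embedding]


def forward_dual_stream(
    blocks: Sequence[Callable[[Embedding, Embedding], Pair]],
    lr_embedding: Embedding,
    hr_embedding: Embedding,
    preprocess_hr: Callable[[Embedding], Embedding] = lambda hr: hr,
) -> Pair:
    """Thread (lr, hr) through the adapted blocks (hypothetical block signature)."""
    # Caller-side HR preprocessing happens once, before the blocks; in deflect
    # this is where T_map, HR-to-LR norm matching and the HR CLS prefix belong.
    hr = preprocess_hr(hr_embedding)
    lr = lr_embedding
    for block in blocks:
        lr, hr = block(lr, hr)  # each block consumes and returns both streams
    return lr, hr


def toy_block(lr: Embedding, hr: Embedding) -> Pair:
    # Hypothetical stand-in for an adapted block: nudges each LR token by
    # half of the matching HR token and passes the HR stream through unchanged.
    new_lr = [[a + 0.5 * b for a, b in zip(lt, ht)] for lt, ht in zip(lr, hr)]
    return new_lr, hr


lr_out, hr_out = forward_dual_stream([toy_block, toy_block], [[1.0, 1.0]], [[2.0, 0.0]])
# lr_out == [[3.0, 1.0]], hr_out == [[2.0, 0.0]]
```

The point of the shape is that the wrapper never owns this loop: whoever calls the blocks decides how the HR stream is built and what happens to both outputs afterwards.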
PEFT is built around weight-space adaptation (LoRA et al.) and assumes
HF Transformers conventions. DEFLECT modifies activations and targets DINO
backbones plus custom detection models, neither of which fits cleanly into
PEFT's HF-centric machinery (`AutoModelFor*`, Hub `from_pretrained`, regex
`target_modules` over standardised layer names).
Mirroring PEFT's shape keeps familiar entry points (`get_*_model`, `*Config`,
`*Model`, `print_trainable_parameters`, registry, and save/load layout) while
leaving model discovery explicit for non-HF backbones.
The package is intentionally narrow: it wraps compatible ViT blocks, manages named injected adapters, saves and loads adapter-scoped weights, and leaves the caller in control of the model-specific forward path.
- PEFT-like surface without PEFT assumptions. Users get familiar config/wrapper/save-load
  patterns without requiring every backbone to be an `AutoModelFor*`.
- Small registry surface. `register_deflect_method`, `get_deflect_model`, and task wrappers follow
  PEFT's shape, but the only registered method today is `TRANSPORT`.
- Activation-space adaptation. `TRANSPORT` wraps transformer blocks and operates on the
  low-resolution/high-resolution embedding streams used by DEFLECT. The base LR attention/projection
  weights stay frozen while the HR path and small auxiliary modules carry the trainable adaptation.
- Adapter coexistence. The file layout deliberately avoids PEFT's filenames, so a `deflect` adapter
  and a `peft` adapter can live in the same directory.
- Narrow extension points. The package keeps the tuner, tuner layer, registry, and serialization
  boundaries explicit instead of building a broad auto-discovery layer.
The main entry points mirror PEFT naming where useful:

- `get_deflect_model(model, config)` wraps a model and injects the configured tuner.
- `inject_adapter_in_model(config, model)` mutates a model without the wrapper.
- `DeflectModel.save_pretrained(...)` writes adapter-scoped config and weights.
- `DeflectModel.from_pretrained(...)` reloads an adapter into a compatible base model.
- `register_deflect_method(...)` is the extension point for future tuners.
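For readers new to PEFT's registry shape, the way a name-to-(config, tuner) registry and a `get_*_model` dispatcher fit together can be sketched in a few lines. This is illustrative only: `ToyConfig`, `ToyTuner`, `register_method` and `get_model` are hypothetical and do not reproduce the actual signatures of `register_deflect_method` or `get_deflect_model`.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple, Type

# Hypothetical registry: method name -> (config class, tuner class).
_REGISTRY: Dict[str, Tuple[Type, Type]] = {}


def register_method(name: str, config_cls: Type, tuner_cls: Type) -> None:
    """Register a tuner under a unique method name (illustrative only)."""
    if name in _REGISTRY:
        raise ValueError(f"method {name!r} is already registered")
    _REGISTRY[name] = (config_cls, tuner_cls)


@dataclass
class ToyConfig:
    method: str = "TOY"
    hidden_dim: int = 8


class ToyTuner:
    def __init__(self, model: Any, config: ToyConfig) -> None:
        self.model = model
        self.config = config


def get_model(model: Any, config: Any) -> Any:
    """Look up the tuner class from the config's method name and wrap the model."""
    try:
        config_cls, tuner_cls = _REGISTRY[config.method]
    except KeyError:
        raise ValueError(f"unknown method {config.method!r}") from None
    if not isinstance(config, config_cls):
        raise TypeError(f"expected {config_cls.__name__}, got {type(config).__name__}")
    return tuner_cls(model, config)


register_method("TOY", ToyConfig, ToyTuner)
wrapped = get_model(object(), ToyConfig(hidden_dim=4))
```

Keeping dispatch keyed on the config's method name is what lets a future tuner plug in through registration alone, without touching the wrapper entry point.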
The wrapper freezes the base model by default. `modules_to_save` can mark selected base submodules as
trainable; those submodules are then included in adapter checkpoints.
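The default freezing rule can be sketched over parameter names alone: everything is frozen except adapter parameters and anything under a `modules_to_save` prefix. This sketch assumes adapter parameters are recognisable by a name component (the `"adapter"` marker is a hypothetical convention, not deflect's actual naming) and works on a plain `{name: numel}` dict so it runs without torch; the summary string only mimics the spirit of `print_trainable_parameters()`.

```python
from typing import Dict, Iterable, Set


def trainable_names(
    param_numels: Dict[str, int],
    adapter_marker: str = "adapter",  # hypothetical naming convention
    modules_to_save: Iterable[str] = (),
) -> Set[str]:
    """Return the parameter names left trainable after freezing the base model."""
    prefixes = tuple(f"{m}." for m in modules_to_save)
    return {
        name
        for name in param_numels
        # Adapter params stay trainable; so does anything inside a saved submodule.
        if adapter_marker in name.split(".") or name.startswith(prefixes)
    }


def summarize(param_numels: Dict[str, int], trainable: Set[str]) -> str:
    """Format a one-line trainable/total parameter count."""
    n_train = sum(param_numels[n] for n in trainable)
    n_total = sum(param_numels.values())
    return (
        f"trainable params: {n_train} || all params: {n_total} "
        f"|| trainable%: {100 * n_train / n_total:.2f}"
    )


params = {
    "blocks.0.attn.qkv.weight": 1000,   # frozen base weight
    "blocks.0.adapter.fc1.weight": 40,  # adapter weight (hypothetical name)
    "head.weight": 60,                  # base module listed in modules_to_save
}
trainable = trainable_names(params, modules_to_save=["head"])
print(summarize(params, trainable))
```

Matching on a `"head."` prefix rather than `"head"` keeps unrelated modules such as `header` from being unfrozen by accident.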
TRANSPORT expects a ViT-style block layout, either at model.blocks or model.backbone.blocks.
The blocks need the usual timm/DINO-style pieces such as norm1, attn.qkv, attention projection,
MLP, and embedding dimensions. The tests use synthetic ViT/detector fixtures to lock down the library
contract; real checkpoints still need to match that layout or be shimmed before wrapping.
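Before wrapping a real checkpoint, a duck-typed pre-flight check can report which expected pieces are missing. This is a hypothetical helper, not part of deflect: the attribute paths (`norm1`, `attn.qkv`, `attn.proj`, `mlp`, and a model-level `embed_dim`) follow timm's ViT naming and are assumptions about what TRANSPORT reads. It only uses `getattr`, so it works on plain objects as well as on a `torch.nn.Module`.

```python
from types import SimpleNamespace
from typing import Any, List, Optional, Sequence

# Assumed timm-style attribute paths inside each block.
REQUIRED_BLOCK_PATHS = ("norm1", "attn.qkv", "attn.proj", "mlp")


def _resolve(obj: Any, dotted: str) -> Optional[Any]:
    """Follow a dotted attribute path, returning None if any hop is missing."""
    for part in dotted.split("."):
        obj = getattr(obj, part, None)
        if obj is None:
            return None
    return obj


def find_blocks(model: Any) -> Optional[Sequence[Any]]:
    """Return model.blocks, else model.backbone.blocks, else None."""
    blocks = _resolve(model, "blocks")
    return blocks if blocks is not None else _resolve(model, "backbone.blocks")


def layout_problems(model: Any) -> List[str]:
    """List problems found; an empty list means the layout looks compatible."""
    blocks = find_blocks(model)
    if blocks is None:
        return ["no blocks at model.blocks or model.backbone.blocks"]
    problems = []
    for i, block in enumerate(blocks):
        problems += [f"blocks[{i}] missing {p}" for p in REQUIRED_BLOCK_PATHS
                     if _resolve(block, p) is None]
    # Embedding dimension is assumed to live on the model or its backbone, as in timm.
    if _resolve(model, "embed_dim") is None and _resolve(model, "backbone.embed_dim") is None:
        problems.append("no embed_dim on model or model.backbone")
    return problems


ok_block = SimpleNamespace(norm1=1, attn=SimpleNamespace(qkv=1, proj=1), mlp=1)
vit = SimpleNamespace(blocks=[ok_block], embed_dim=384)
detector = SimpleNamespace(
    backbone=SimpleNamespace(blocks=[SimpleNamespace(norm1=1, mlp=1)], embed_dim=384)
)
print(layout_problems(vit))       # []
print(layout_problems(detector))  # ['blocks[0] missing attn.qkv', 'blocks[0] missing attn.proj']
```

A check like this only confirms names exist; it says nothing about positional embeddings or preprocessing, which still need real integration testing.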
```bash
pytest tests/  # 193 tests
```

Most of the suite uses synthetic ViT and detector fixtures that match the required block layout, so it
stays fast and offline. It validates registry behavior, adapter lifecycle, dual-stream block math,
save/load, safetensors fallback, and task wrapper dispatch. tests/test_e2e.py adds an end-to-end pass
on a real timm vit_small (random-init, offline; falls back to a constructed ViT if timm is absent)
that checks the DEFLECT equations against an independent oracle plus the full train/save/load lifecycle.
None of it claims real-checkpoint DINO integration coverage.
The synthetic tests lock down the library contract. Real DINO/timm checkpoints still need integration validation around positional embeddings, caller-side dual-stream forwarding, and any model-specific preprocessing.
| In scope | Out of scope |
|---|---|
| `TRANSPORT` tuner (the DEFLECT method) | HF Trainer (use Lightning) |
| DINOv2 / timm-style ViT block layouts | HF Hub `push_to_hub` / hub-id `from_pretrained` |
| Custom detectors exposing `model.backbone.blocks` | `AutoPeftModel*` family |
| Multi-adapter, save/load, unload hooks | Bitsandbytes / quantisation |
| `BaseTuner` / `BaseTunerLayer` extension points | Diffusers, mixed models |
The PEFT-shaped lifecycle exists for familiarity, but TRANSPORT is an activation-space tuner:
merge_adapter() is a no-op rather than true weight fusion, and overlapping adapters on the same
blocks should be treated as unsupported integration work.
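Why `merge_adapter()` has nothing to fuse can be stated in generic notation (the symbols below are illustrative, not the DEFLECT paper's equations). A LoRA-style update is a linear map applied to the same input as the frozen weight, so it folds into that weight; TRANSPORT's output is, schematically, a function of both the LR and HR streams passed through nonlinear attention, so there is no single weight matrix to fold it into.

```latex
% Weight-space (LoRA): the update acts on the same input x, so it merges into W.
y = W x + B A x = (W + B A)\,x
% Activation-space (schematic): the output also depends on the HR stream x_{HR},
% which is an input rather than a parameter, so in general no fixed W'
% satisfies y = W' x_{LR} for every x_{HR}.
y = f_\theta\!\left(x_{\mathrm{LR}},\, x_{\mathrm{HR}}\right)
```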
The on-disk layout differs from PEFT's so that a `deflect` adapter and a `peft` adapter can coexist in the same directory:

| PEFT | deflect |
|---|---|
| `adapter_config.json` | `deflect_config.json` |
| `adapter_model.safetensors` | `deflect_model.safetensors` |
| `adapter_model.bin` | `deflect_model.bin` |
Adapter weights are saved per adapter subdirectory. Safetensors is used when available; otherwise the
package falls back to PyTorch .bin files.
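The format choice can be sketched as a small path-selection function. This is a hypothetical helper, not deflect's serialization code: it assumes both config and weights for adapter `name` go to `<save_dir>/<name>/`, reuses the deflect filenames listed above, and detects safetensors with `importlib.util.find_spec`, which reports whether a package is importable without importing it.

```python
import importlib.util
import os
from typing import Dict, Optional

CONFIG_NAME = "deflect_config.json"
SAFETENSORS_NAME = "deflect_model.safetensors"
BIN_NAME = "deflect_model.bin"


def safetensors_available() -> bool:
    """True if the optional safetensors package can be imported."""
    return importlib.util.find_spec("safetensors") is not None


def adapter_paths(
    save_dir: str, adapter_name: str, use_safetensors: Optional[bool] = None
) -> Dict[str, str]:
    """Return config and weight paths for one adapter (assumed subdirectory layout)."""
    if use_safetensors is None:
        use_safetensors = safetensors_available()
    adapter_dir = os.path.join(save_dir, adapter_name)
    weights = SAFETENSORS_NAME if use_safetensors else BIN_NAME  # fall back to .bin
    return {
        "config": os.path.join(adapter_dir, CONFIG_NAME),
        "weights": os.path.join(adapter_dir, weights),
    }


paths = adapter_paths("out", "hr_adapter", use_safetensors=False)
```

Passing `use_safetensors` explicitly makes the choice deterministic, which is convenient when testing the fallback path on a machine that has safetensors installed.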
The DEFLECT method — untangled dual-stream cross-attention, the per-token displacement (deflection) renormalisation, and the surrounding adapter design — was introduced by:
Romain Thoreau, Valerio Marsocci & Dawa Derksen. Parameter-Efficient Adaptation of Geospatial Foundation Models through Embedding Deflection. arXiv:2503.09493, 2025. arxiv.org/abs/2503.09493
The TRANSPORT tuner here (UntangledAttention, AdapterBlock, the T_map MLP,
and hr_cls_token) is an original implementation written from the paper's
equations (the untangled attention of Eqs. 9-10 and the deflection constraint
of Eq. 12). The paper is cited as the source of the method, not as a code
ancestor: this code is an independent expression and is not derived from any
third-party implementation of the method.
Cite the paper if you use this method:
```bibtex
@misc{thoreau2025deflect,
  title         = {Parameter-Efficient Adaptation of Geospatial Foundation Models through Embedding Deflection},
  author        = {Thoreau, Romain and Marsocci, Valerio and Derksen, Dawa},
  year          = {2025},
  eprint        = {2503.09493},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV},
  url           = {https://arxiv.org/abs/2503.09493},
}
```

The package's API shape (config / wrapper / registry / save-load layout) mirrors Hugging Face PEFT (Apache-2.0).
deflect is licensed under the Apache License, Version 2.0; see
LICENSE and NOTICE.
The TRANSPORT tuner is an original implementation of the DEFLECT method written
from the paper (arXiv:2503.09493); the method is attributed to its authors above.
The PEFT-shaped API scaffolding follows Hugging Face PEFT, which is also
Apache-2.0.