boschet/deflect

deflect

deflect is a small package for activation-space parameter-efficient adaptation of transformer backbones. It mirrors the parts of Hugging Face PEFT that are useful for this setting, while keeping model discovery explicit for DINO/timm-style backbones and custom detection models.

The registered method is TRANSPORT, an original implementation of the DEFLECT method (Thoreau, Marsocci & Derksen, 2025; arXiv:2503.09493), written from the paper's equations. See method attribution below.

deflect is licensed under Apache-2.0; see LICENSE.

Install

uv pip install -e .                  # core (torch only)
uv pip install -e ".[safetensors]"   # add safetensors weight format
uv pip install -e ".[dev]"           # add tests + linters

Usage

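from deflect import get_deflect_model, TransportConfig

# base_model: a ViT-style backbone or detector that exposes its blocks at
# model.blocks or model.backbone.blocks (see Backbone contract).
model = get_deflect_model(
    base_model,
    TransportConfig(target_blocks=[0, 1, 2], hidden_dim=8),  # adapt blocks 0, 1, 2
)
model.print_trainable_parameters()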

TRANSPORT is dual-stream. The wrapper does not invent a full model forward pass; caller code is responsible for preprocessing the high-resolution embedding (T_map, HR-to-LR norm matching, HR CLS prefix) and passing (lr_embedding, hr_embedding) through the adapted blocks. See the module docstring at src/deflect/tuners/transport/model.py for the canonical caller pattern.

Design

PEFT is built around weight-space adaptation (LoRA et al.) and assumes HF Transformers conventions. DEFLECT modifies activations and targets DINO backbones plus custom detection models, neither of which fits cleanly into PEFT's HF-centric machinery (AutoModelFor*, Hub from_pretrained, regex target_modules over standardised layer names).

Mirroring PEFT's shape keeps familiar entry points — get_*_model, *Config, *Model, print_trainable_parameters, registry, and save/load layout — while leaving model discovery explicit for non-HF backbones.

The package is intentionally narrow: it wraps compatible ViT blocks, manages named injected adapters, saves and loads adapter-scoped weights, and leaves the caller in control of the model-specific forward path.

Features

  • PEFT-like surface without PEFT assumptions. Users get familiar config/wrapper/save-load patterns without requiring every backbone to be an AutoModelFor*.
  • Small registry surface. register_deflect_method, get_deflect_model, and task wrappers follow PEFT's shape, but the only registered method today is TRANSPORT.
  • Activation-space adaptation. TRANSPORT wraps transformer blocks and operates on the low-resolution/high-resolution embedding streams used by DEFLECT. The base LR attention/projection weights stay frozen while the HR path and small auxiliary modules carry the trainable adaptation.
  • Adapter coexistence. The file layout deliberately avoids PEFT's filenames, so a deflect adapter and a peft adapter can live in the same directory.
  • Narrow extension points. The package keeps the tuner, tuner layer, registry, and serialization boundaries explicit instead of building a broad auto-discovery layer.
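The registry idea mentioned above can be illustrated with a minimal, generic sketch. This is not deflect's code: `register_method`, `build_tuner`, `ToyConfig`, and `ToyTuner` are hypothetical names, and the real signature of `register_deflect_method` is not shown in this README. The sketch only shows the general pattern of mapping a config class to a tuner class.

```python
from dataclasses import dataclass

# Hypothetical registry: method name -> (config class, tuner class).
_METHODS = {}


def register_method(name, config_cls, tuner_cls):
    """Register a tuner under a unique name (hypothetical helper)."""
    if name in _METHODS:
        raise ValueError(f"method {name!r} is already registered")
    _METHODS[name] = (config_cls, tuner_cls)


def build_tuner(model, config):
    """Pick the tuner whose registered config class matches the config."""
    for config_cls, tuner_cls in _METHODS.values():
        if type(config) is config_cls:
            return tuner_cls(model, config)
    raise KeyError(f"no method registered for {type(config).__name__}")


@dataclass
class ToyConfig:
    hidden_dim: int = 8


class ToyTuner:
    def __init__(self, model, config):
        self.model = model
        self.config = config


register_method("toy", ToyConfig, ToyTuner)
tuner = build_tuner(object(), ToyConfig(hidden_dim=4))
print(type(tuner).__name__, tuner.config.hidden_dim)
```

Dispatching on the config type keeps the entry point to a single call, which is the shape `get_deflect_model(model, config)` presents to the caller.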

API Surface

The main entry points mirror PEFT naming where useful:

  • get_deflect_model(model, config) wraps a model and injects the configured tuner
  • inject_adapter_in_model(config, model) mutates a model without the wrapper
  • DeflectModel.save_pretrained(...) writes adapter-scoped config and weights
  • DeflectModel.from_pretrained(...) reloads an adapter into a compatible base model
  • register_deflect_method(...) is the extension point for future tuners

The wrapper freezes the base model by default. modules_to_save can mark selected base submodules as trainable and included in adapter checkpoints.

Backbone contract

TRANSPORT expects a ViT-style block layout, either at model.blocks or model.backbone.blocks. The blocks need the usual timm/DINO-style pieces such as norm1, attn.qkv, attention projection, MLP, and embedding dimensions. The tests use synthetic ViT/detector fixtures to lock down the library contract; real checkpoints still need to match that layout or be shimmed before wrapping.
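The layout requirement above can be checked before wrapping. The following is a minimal duck-typed sketch, not part of deflect: `find_blocks` and `missing_block_attrs` are hypothetical helpers, and the attribute names `attn.proj` and `mlp` assume timm's naming for the attention projection and MLP, which the text does not spell out.

```python
from types import SimpleNamespace

# norm1 and attn.qkv come from the contract described above;
# attn.proj and mlp follow timm's naming and are an assumption.
REQUIRED_ATTRS = ("norm1", "attn.qkv", "attn.proj", "mlp")


def _resolve(obj, dotted):
    """Follow a dotted attribute path; return None if any hop is missing."""
    for name in dotted.split("."):
        obj = getattr(obj, name, None)
        if obj is None:
            return None
    return obj


def find_blocks(model):
    """Return the block sequence at model.blocks or model.backbone.blocks."""
    for path in ("blocks", "backbone.blocks"):
        blocks = _resolve(model, path)
        if blocks is not None:
            return blocks
    raise AttributeError("expected model.blocks or model.backbone.blocks")


def missing_block_attrs(block):
    """List the required attribute paths that a block does not provide."""
    return [path for path in REQUIRED_ATTRS if _resolve(block, path) is None]


# Stand-in objects; a real check would pass a torch model instead.
block = SimpleNamespace(
    norm1=object(),
    attn=SimpleNamespace(qkv=object(), proj=object()),
    mlp=object(),
)
detector = SimpleNamespace(backbone=SimpleNamespace(blocks=[block]))
report = [missing_block_attrs(b) for b in find_blocks(detector)]
print(report)
```

A non-empty entry in `report` names the pieces a checkpoint would need shimmed before it can be wrapped.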

Tests

pytest tests/    # 193 tests

Most of the suite uses synthetic ViT and detector fixtures that match the required block layout, so it stays fast and offline. These tests validate registry behavior, adapter lifecycle, dual-stream block math, save/load, the safetensors fallback, and task wrapper dispatch. tests/test_e2e.py adds an end-to-end pass on a real timm vit_small (randomly initialised, offline; it falls back to a constructed ViT if timm is absent). That pass checks the DEFLECT equations against an independent oracle and exercises the full train/save/load lifecycle. None of this amounts to integration coverage on real DINO checkpoints.

Real DINO/timm checkpoints still need integration validation around positional embeddings, caller-side dual-stream forwarding, and any model-specific preprocessing.

Scope

| In scope | Out of scope |
| --- | --- |
| TRANSPORT tuner (the DEFLECT method) | HF Trainer (use Lightning) |
| DINOv2 / timm-style ViT block layouts | HF Hub push_to_hub / hub-id from_pretrained |
| Custom detectors exposing model.backbone.blocks | AutoPeftModel* family |
| Multi-adapter, save/load, unload hooks | Bitsandbytes / quantisation |
| BaseTuner / BaseTunerLayer extension points | Diffusers, mixed models |

The PEFT-shaped lifecycle exists for familiarity, but TRANSPORT is an activation-space tuner: merge_adapter() is a no-op rather than true weight fusion, and attaching more than one adapter to the same blocks is unsupported and would need additional integration work.

Filename layout

The filenames differ from PEFT's so that a deflect adapter and a peft adapter can coexist in the same directory:

| PEFT | deflect |
| --- | --- |
| adapter_config.json | deflect_config.json |
| adapter_model.safetensors | deflect_model.safetensors |
| adapter_model.bin | deflect_model.bin |

Adapter weights are saved per adapter subdirectory. Safetensors is used when available; otherwise the package falls back to PyTorch .bin files.
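As a rough illustration of the layout described above, the sketch below computes where an adapter's files would land. It is an assumption-laden sketch rather than deflect's implementation: `adapter_paths` is a hypothetical helper, the adapter name `default` is illustrative, and the exact subdirectory structure (`<save_dir>/<adapter_name>/`) is inferred from "per adapter subdirectory".

```python
from importlib.util import find_spec
from pathlib import Path

# Filenames taken from the table above.
CONFIG_NAME = "deflect_config.json"
SAFETENSORS_NAME = "deflect_model.safetensors"
BIN_NAME = "deflect_model.bin"


def adapter_paths(save_dir, adapter_name, have_safetensors=None):
    """Return (config_path, weights_path) for one adapter (hypothetical helper).

    If have_safetensors is None, detect whether the safetensors package
    is importable and fall back to the .bin filename when it is not.
    """
    if have_safetensors is None:
        have_safetensors = find_spec("safetensors") is not None
    adapter_dir = Path(save_dir) / adapter_name  # assumed layout
    weights_name = SAFETENSORS_NAME if have_safetensors else BIN_NAME
    return adapter_dir / CONFIG_NAME, adapter_dir / weights_name


config_path, weights_path = adapter_paths("ckpt", "default", have_safetensors=False)
print(config_path.as_posix(), weights_path.as_posix())
```

Because neither filename collides with `adapter_config.json` or `adapter_model.*`, a PEFT adapter written to the same directory would not overwrite these files.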

Method attribution

The DEFLECT method — untangled dual-stream cross-attention, the per-token displacement (deflection) renormalisation, and the surrounding adapter design — was introduced by:

Romain Thoreau, Valerio Marsocci & Dawa Derksen. Parameter-Efficient Adaptation of Geospatial Foundation Models through Embedding Deflection. arXiv:2503.09493, 2025. arxiv.org/abs/2503.09493

The TRANSPORT tuner here (UntangledAttention, AdapterBlock, the T_map MLP, and hr_cls_token) is an original implementation written from the paper's equations (the untangled attention of Eqs. 9-10 and the deflection constraint of Eq. 12). The paper is cited as the source of the method, not as a code ancestor: this code is an independent expression and is not derived from any third-party implementation of the method.

Cite the paper if you use this method:

@misc{thoreau2025deflect,
  title         = {Parameter-Efficient Adaptation of Geospatial Foundation Models through Embedding Deflection},
  author        = {Thoreau, Romain and Marsocci, Valerio and Derksen, Dawa},
  year          = {2025},
  eprint        = {2503.09493},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV},
  url           = {https://arxiv.org/abs/2503.09493},
}

The package's API shape (config / wrapper / registry / save-load layout) mirrors Hugging Face PEFT (Apache-2.0).

License

deflect is licensed under the Apache License, Version 2.0; see LICENSE and NOTICE.

The TRANSPORT tuner is an original implementation of the DEFLECT method written from the paper (arXiv:2503.09493); the method is attributed to its authors above. The PEFT-shaped API scaffolding follows Hugging Face PEFT, which is also Apache-2.0.
