A PyTorch language-model training framework for fast research iteration: pretraining, SFT, preference learning, online RL, and distillation. The core — Registry, Config, Engine, UpdateRule, Trainer, EventBus, PrepGraph — is small enough to read end-to-end; research-grade extras (PEFT, alternative architectures, sweeps, distributed) are opt-in.
Design goals: registry-first, failure-first, plugin-clean, lab-friendly, audit-ready.
Status: testing phase. Distributed training (DDP/FSDP/TP/PP) is implemented and unit-tested via CPU multiprocess spawn, but it has not been validated on multi-node GPU clusters; use at your own risk in production. SP and EP are registered but not yet wired into the train runtime, and EP is still a skeleton. The test suite is ~33K lines / 1900+ tests, with adversarial regression tests verified by mutation testing.
git clone <this-repo> lighttrain && cd lighttrain
pip install -e .
pip install -e ".[peft]" # optional: LoRA / IA³ / AdaLoRA
pip install -e ".[peft,quant]" # optional: + bitsandbytes 4-bit (Linux+CUDA)

lighttrain init my_project # scaffold a commented, runnable recipe
cd my_project
lighttrain dry-run -c cfg.yaml # resolve & print the config (no training)
lighttrain train -c cfg.yaml ++trainer.max_steps=50 # 50-step smoke run

The generated cfg.yaml runs once you add a corpus.txt (one example per line)
and is heavily commented as a living tutorial: uncomment the optional blocks
(models:, parallel:, prep_graph:, PEFT…) to grow it. → Getting started
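The `++trainer.max_steps=50` argument above is a dotted config override. As a rough illustration of the idea only, the sketch below applies such an override to a plain nested dict. It assumes `++` means "set the key, creating it if absent" (as in Hydra-style override grammars); `apply_override` is a hypothetical helper and not part of lighttrain, which resolves configs through OmegaConf and Pydantic.

```python
import json
from typing import Any, Dict


def apply_override(cfg: Dict[str, Any], override: str) -> Dict[str, Any]:
    """Apply one 'a.b.c=value' override to a nested dict, in place.

    Hypothetical helper for illustration; leading '+' characters are
    dropped and treated as "set, creating intermediate keys if needed".
    """
    key, sep, raw = override.lstrip("+").partition("=")
    if not sep or not key:
        raise ValueError(f"expected key=value, got {override!r}")

    # Parse numbers/booleans/null as JSON; fall back to a plain string.
    try:
        value: Any = json.loads(raw)
    except json.JSONDecodeError:
        value = raw

    # Walk (and create) the parent mappings, then set the leaf.
    *parents, leaf = key.split(".")
    node = cfg
    for part in parents:
        node = node.setdefault(part, {})
    node[leaf] = value
    return cfg


cfg = {"trainer": {"name": "pretrain", "max_steps": 1000}}
apply_override(cfg, "++trainer.max_steps=50")
print(cfg["trainer"])
```

Running `lighttrain dry-run` with the same override is the reliable way to see what the real resolver produces.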
- Registry: short-name → class resolution over a fixed category set.
- Config: OmegaConf + Pydantic v2; the model is a config group (model_profiles: + model: <name>).
- Engine + UpdateRule: the engine owns the accelerator and delegates the per-step math (forward/backward/clip/step) to a swappable UpdateRule, so you can change the training math without touching the loop. The flat Trainer composes public primitives (run_train_loop, apply_update, forward_with_activations).
- EventBus: 46 lifecycle events; isolated per-callback exceptions; results aggregate to a Signal (STOP_TRAINING > RETRY_STEP > SKIP_STEP > CONTINUE).
- PrepGraph: content-addressed DAG of data-prep nodes; cached by a fingerprint over config + code + schema + upstream.
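To make the EventBus bullet concrete, here is a minimal sketch of priority-based signal aggregation with per-callback exception isolation. The `Signal` member names and their ordering come from the list above; the `emit` function, its signature, and the integer values are assumptions made for illustration and may not match lighttrain's actual EventBus API.

```python
from enum import IntEnum
from typing import Callable, List, Optional


class Signal(IntEnum):
    # Higher value wins when several callbacks answer the same event.
    CONTINUE = 0
    SKIP_STEP = 1
    RETRY_STEP = 2
    STOP_TRAINING = 3


Callback = Callable[[str], Optional[Signal]]


def emit(callbacks: List[Callback], event: str,
         errors: Optional[List[Exception]] = None) -> Signal:
    """Call every callback and return the highest-priority Signal.

    Hypothetical sketch: a failing callback is recorded and skipped,
    so it cannot prevent the remaining callbacks from running.
    """
    result = Signal.CONTINUE
    for callback in callbacks:
        try:
            signal = callback(event)
        except Exception as exc:  # isolate this callback's failure
            if errors is not None:
                errors.append(exc)
            continue
        if signal is not None:
            result = max(result, Signal(signal))
    return result


def skip(event: str) -> Signal:
    return Signal.SKIP_STEP


def broken(event: str) -> Signal:
    raise RuntimeError("callback bug")


def retry(event: str) -> Signal:
    return Signal.RETRY_STEP


seen_errors: List[Exception] = []
print(emit([skip, broken, retry], "on_step_end", seen_errors).name)
```

In this sketch a callback that returns nothing counts as CONTINUE, which keeps passive observers (loggers, metrics) from having to return a value.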
lighttrain train -c recipes/pretrain_causal.yaml
lighttrain fork --from runs/<...>/checkpoints/step_500 -c recipes/finetune.yaml
lighttrain resume --run runs/<...>

The new algorithm is usually the loss:, not a new trainer. Online RL:
trainer: { name: ppo, rollout_steps: 32, rollout_backend: hf_generate }
loss: { name: ppo_surrogate, clip_eps: 0.2 }
judge: { name: verifier, verify_pattern: "\\d+" } # → reward_fn

Multi-model (a frozen teacher + a trainable student) is a named model set; a
custom trainer reads self.models["teacher"]. A runnable end-to-end template:
examples/online_distill.py (lighttrain train -c recipes/online_distill_demo.yaml).
→ Training paradigms
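For readers unfamiliar with what a `clip_eps: 0.2` setting controls, the sketch below computes the standard PPO clipped surrogate objective in plain Python. It assumes the textbook formulation (negative mean of min(r·A, clip(r, 1−ε, 1+ε)·A), with r the probability ratio); whether lighttrain's `ppo_surrogate` loss adds further terms is not stated here, and `ppo_clipped_surrogate` is a hypothetical name.

```python
import math
from typing import Sequence


def ppo_clipped_surrogate(logp_new: Sequence[float],
                          logp_old: Sequence[float],
                          advantages: Sequence[float],
                          clip_eps: float = 0.2) -> float:
    """Return the PPO clipped surrogate loss (to be minimized).

    Textbook form, written for clarity rather than speed; a real
    implementation would operate on tensors.
    """
    if not (len(logp_new) == len(logp_old) == len(advantages)):
        raise ValueError("inputs must have the same length")
    if not advantages:
        raise ValueError("inputs must be non-empty")

    total = 0.0
    for new, old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(new - old)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: take the smaller of the two objectives.
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)


# Unchanged policy (ratio == 1): the loss is just -mean(advantage).
print(ppo_clipped_surrogate([-1.0, -2.0], [-1.0, -2.0], [0.5, 1.5]))
```

The clipping only limits the gain from moving the ratio in the direction the advantage favors; a large ratio on a negative-advantage sample is still fully penalized.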
Each run writes a self-contained capsule under runs/<exp>/<ts>-<slug>-<hash>/: a config
snapshot plus the resolved config, env.json, logs/metrics.jsonl, and checkpoints/
(manifest.json is written last and acts as the completeness marker). → Getting started
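The "manifest written last" convention can be sketched as follows: payload files are written first, and the manifest is moved into place atomically at the end, so a reader can treat its presence as proof the checkpoint is whole. The manifest schema (a `files` map of name to byte size) and the helper names `write_checkpoint` and `is_complete` are assumptions for illustration, not lighttrain's actual format.

```python
import json
import os
from pathlib import Path
from typing import Dict


def write_checkpoint(ckpt_dir: Path, payload: Dict[str, bytes]) -> None:
    """Write payload files, then publish manifest.json as the last step."""
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    sizes = {}
    for name, data in payload.items():
        (ckpt_dir / name).write_bytes(data)
        sizes[name] = len(data)

    # Write to a temp name, then rename: os.replace is atomic on the
    # same filesystem, so readers never see a half-written manifest.
    tmp = ckpt_dir / "manifest.json.tmp"
    tmp.write_text(json.dumps({"files": sizes}, indent=2))
    os.replace(tmp, ckpt_dir / "manifest.json")


def is_complete(ckpt_dir: Path) -> bool:
    """A checkpoint counts as complete only if its manifest exists and
    every listed file is present with the recorded size."""
    manifest = ckpt_dir / "manifest.json"
    if not manifest.is_file():
        return False
    sizes = json.loads(manifest.read_text())["files"]
    return all(
        (ckpt_dir / name).is_file()
        and (ckpt_dir / name).stat().st_size == size
        for name, size in sizes.items()
    )
```

A crash at any point before the final rename leaves a directory without manifest.json, which a resume or fork step can then skip as incomplete.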
| Kind | Names |
|---|---|
| Models | tiny_lm, hf_causal, tiny_rwkv, tiny_mamba, jepa, + PEFT lora/ia3/adalora |
| Trainers | pretrain, preference, reward_model, ppo, grpo |
| Losses | cross_entropy, dpo/ipo/simpo/orpo/kto, ppo_surrogate, grpo, kl_topk, … |
| Optimizers | adamw, lion |
| Schedulers | constant, linear, warmup_cosine, wsd |
| Data | datasets, collators, samplers, byte tokenizer, PrepGraph nodes |
| Diagnostics | invariants, nan_hunter, frozen_step, loss_attribution, doctor |
All of these are concrete @register implementations and ship in
lighttrain.builtin_plugins (core keeps only protocols + framework); they're
resolved by short name regardless of where the code lives. Full tables:
Registry & protocols.
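The short-name resolution described above follows a common decorator-registry pattern, sketched below. This is a minimal illustration under assumptions: the category names, the `register(category, name)` signature, and the `resolve` helper are hypothetical, and lighttrain's real `@register` decorator may take different arguments.

```python
from typing import Callable, Dict, Tuple

# Hypothetical fixed category set; the real one is listed in the docs.
CATEGORIES = ("model", "trainer", "loss", "optimizer")

_REGISTRY: Dict[Tuple[str, str], type] = {}


def register(category: str, name: str) -> Callable[[type], type]:
    """Class decorator that records cls under (category, name)."""
    if category not in CATEGORIES:
        raise KeyError(f"unknown category {category!r}")

    def decorator(cls: type) -> type:
        key = (category, name)
        if key in _REGISTRY:
            raise ValueError(f"{category}/{name} is already registered")
        _REGISTRY[key] = cls
        return cls

    return decorator


def resolve(category: str, name: str) -> type:
    """Look up a class by short name, independent of its module."""
    try:
        return _REGISTRY[(category, name)]
    except KeyError:
        known = sorted(n for c, n in _REGISTRY if c == category)
        raise KeyError(
            f"no {category} named {name!r}; known: {known}"
        ) from None


@register("loss", "cross_entropy")
class CrossEntropyLoss:
    """Stand-in class; a real plugin would implement the loss protocol."""


print(resolve("loss", "cross_entropy").__name__)
```

Failing loudly on duplicate names and unknown categories keeps a config such as `loss: { name: ... }` from silently binding to the wrong class.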
Everything lives under docs/ (English + 中文, split by topic):
- Getting started · CLI · Configuration
- Architecture · Training · Data & PrepGraph
- Distributed · Diagnostics
- Alternative architectures · Extending · Recipes · Troubleshooting
- Reference: Registry & protocols
MIT. Built with the assistance of Claude Code; architecture, test design, and quality gates are human-directed.