Neural machine translation with reinforcement learning for style-aware translation. The model preserves the style of the source text (law, literature, news, science) in the target-language translation.
- Style-Aware Translation: Preserves document style across languages
- Multi-Component Rewards: Format validation + Semantic quality (COMET) + Style consistency (BERT)
- GRPO Training: Group Relative Policy Optimization for RL fine-tuning
- Hydra Configuration: Flexible, composable configuration management
- Modular Architecture: Clean separation with dependency injection
```bash
# Install dependencies
conda env create -f environment.yml
conda activate style_translator

# Train style detector (BERT-based classifier)
cd style_detector
python train.py

# Train translation model with RL
cd ../rl
python scripts/train_rl.py
```

```text
StyleTranslator/
├── style_detector/          # Style classification (BERT)
│   ├── corpus/              # Corpus generation
│   ├── dataset/             # Dataset loaders
│   ├── model/               # StyleDetector model
│   ├── config.yaml          # Training config
│   └── train.py             # Training script
│
├── rl/                      # RL training module
│   ├── configs/             # Hydra configurations
│   │   ├── env/             # Environment settings
│   │   ├── reward/          # Reward presets
│   │   └── model/           # Model configs
│   ├── src/
│   │   ├── rewards/         # Reward system
│   │   ├── trainer/         # GRPO trainer
│   │   └── utils/           # Utilities
│   ├── scripts/             # Entry points
│   └── data/                # Training data
```
BERT-based classifier for detecting text style (law, literature, news, science).

```bash
cd style_detector
python train.py  # Train on your corpus
```

Config: `style_detector/config.yaml`
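The README does not spell out how the detector's output is turned into a style score. A minimal sketch of one plausible style-consistency score, assuming the classifier emits one logit per style in the order listed above: classify both the source and the translation, then take the probability the translation receives for the source's predicted style. The names `STYLES`, `softmax`, and `style_consistency_reward` are hypothetical, not the project's API.

```python
import math

# Hypothetical label order, assumed to match the detector's output head.
STYLES = ["law", "literature", "news", "science"]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def style_consistency_reward(source_logits, target_logits):
    """Hypothetical style reward in [0, 1].

    Finds the style the classifier predicts for the source text and returns
    the probability it assigns to that same style for the translation.
    """
    source_probs = softmax(source_logits)
    target_probs = softmax(target_logits)
    source_style = max(range(len(STYLES)), key=lambda i: source_probs[i])
    return target_probs[source_style]


# Made-up logits: both texts look most like "law", so the reward is high.
source_logits = [3.0, 0.1, 0.2, 0.1]
target_logits = [2.5, 0.3, 0.1, 0.2]
print(style_consistency_reward(source_logits, target_logits))  # about 0.77
```

Using a probability rather than a hard label match gives a smoother reward signal, which tends to suit RL fine-tuning better than a 0/1 score.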
GRPO-based reinforcement learning for style-aware translation.

```bash
cd rl

# Default training
python scripts/train_rl.py

# Server environment + style-weighted rewards
python scripts/train_rl.py env=server reward=style_weighted

# Override parameters
python scripts/train_rl.py training.num_epochs=5 device=cuda
```

Configs: `rl/configs/`
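Overrides such as `training.num_epochs=5` use Hydra's dotted-key syntax: each segment walks one level into the composed config. `env=server` and `reward=style_weighted` are a different kind of override: they select a config group option (presumably `rl/configs/env/server.yaml` and `rl/configs/reward/style_weighted.yaml`) rather than set a single value. The stdlib-only sketch below mimics only value overrides on a nested dict; the base config and `apply_overrides` are hypothetical, and this is not how Hydra itself is implemented.

```python
import ast
import copy


def apply_overrides(config, overrides):
    """Return a copy of `config` with 'dotted.key=value' overrides applied.

    Simplified illustration only. Like Hydra's default behavior, it refuses
    keys that do not already exist, but it ignores config-group selection
    (env=server), '+'/'~' prefixes, sweeps, and Hydra's own value grammar.
    """
    result = copy.deepcopy(config)
    for override in overrides:
        key, sep, raw = override.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {override!r}")
        try:
            value = ast.literal_eval(raw)  # "5" -> 5, "1e-5" -> 1e-05
        except (ValueError, SyntaxError):
            value = raw  # bare words such as "cuda" stay strings
        *parents, leaf = key.split(".")
        node = result
        for part in parents:
            node = node[part]  # KeyError if an intermediate key is missing
        if leaf not in node:
            raise KeyError(f"unknown config key: {key}")
        node[leaf] = value
    return result


# Hypothetical base config; only `device` and `training.num_epochs` appear in the README.
base = {"device": "cpu", "training": {"num_epochs": 3, "learning_rate": 1e-5}}
cfg = apply_overrides(base, ["training.num_epochs=5", "device=cuda"])
print(cfg)  # {'device': 'cuda', 'training': {'num_epochs': 5, 'learning_rate': 1e-05}}
```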
| Component | Weight | Description |
|---|---|---|
| Format | 1.0 | XML tag validation (`<think>`, `<translate>`) |
| Semantic | 6.0 | COMET translation quality |
| Style | 4.0 | BERT style consistency (source ↔ target) |
Customize the weights in `rl/configs/reward/*.yaml`.
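A minimal sketch of how the three components might combine, assuming the total is a plain weighted sum using the default weights from the table, that a well-formed completion is `<think>...</think>` followed by `<translate>...</translate>`, and that the COMET and style scores are computed elsewhere on the extracted translation and already scaled to roughly [0, 1]. `format_reward`, `extract_translation`, and `total_reward` are hypothetical names, not the project's reward API.

```python
import re

# Default weights from the reward table above.
WEIGHTS = {"format": 1.0, "semantic": 6.0, "style": 4.0}

# Assumed output layout: <think>...</think> followed by <translate>...</translate>.
FORMAT_RE = re.compile(
    r"^\s*<think>(.*?)</think>\s*<translate>(.*?)</translate>\s*$", re.DOTALL
)


def format_reward(completion):
    """1.0 if the completion matches the assumed tag layout, else 0.0."""
    return 1.0 if FORMAT_RE.match(completion) else 0.0


def extract_translation(completion):
    """Text inside <translate>...</translate>, or None if the layout is malformed."""
    match = FORMAT_RE.match(completion)
    return match.group(2).strip() if match else None


def total_reward(completion, semantic_score, style_score, weights=WEIGHTS):
    """Weighted sum of the format, semantic (e.g. COMET), and style components."""
    return (
        weights["format"] * format_reward(completion)
        + weights["semantic"] * semantic_score
        + weights["style"] * style_score
    )


good = "<think>Formal legal register.</think><translate>Le contrat est résilié.</translate>"
print(extract_translation(good))      # Le contrat est résilié.
print(total_reward(good, 0.8, 0.9))   # 1.0 + 6.0*0.8 + 4.0*0.9, about 9.4
```

Whether a malformed completion should still earn semantic and style credit is a design choice; this sketch keeps the components additive, so a missing tag costs exactly the format weight.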
- Python 3.8+
- PyTorch 2.0+
- Transformers
- TRL (Transformer Reinforcement Learning)
- COMET
- Hydra
- PyTorch Lightning
See `environment.yml` for the full list of dependencies.
```text
Text → BERT → Style Classifier → [law, literature, news, science]

Source Text → LLM → Translation
                        ↓
        Format + Semantic + Style Rewards
                        ↓
               GRPO Optimization
```
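The GRPO step rests on one idea: for each source sentence, several translations are sampled and scored with the combined reward, and each reward is normalized against its own group's mean and standard deviation, which removes the need for a learned value model. A minimal stdlib sketch of that advantage computation; the function name and `eps` value are illustrative, and implementations differ in details such as sample versus population standard deviation.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-4):
    """GRPO-style advantages for one prompt's group of sampled completions.

    Each completion's reward is compared with the group mean and scaled by the
    group standard deviation, so no critic network is needed.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical combined rewards for four sampled translations of one source sentence.
advantages = group_relative_advantages([9.4, 8.4, 5.0, 9.0])
print([round(a, 2) for a in advantages])
```

In GRPO these advantages then weight a clipped, PPO-style policy-gradient objective, usually together with a KL penalty toward a reference model. If every sample in a group gets the same reward, all advantages are zero and that group contributes no gradient.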
MIT License - see LICENSE file for details.