# retrain

RLVR training framework for LLMs.

retrain is a TOML-first RLVR (Reinforcement Learning with Verifiable Rewards) trainer for LLMs.
If you are new, start with install -> explore commands -> run a tiny config.
## Install

Requires Python 3.11+.

```shell
# CLI + docs exploration
uv tool install retrain

# Local GPU training (adds torch)
uv tool install "retrain[local]"

# Remote Tinker backend
uv tool install "retrain[tinker]"
```
If you are developing this repo directly:

```shell
pip install -e ".[dev]"
```
## Explore the CLI

Use these first to understand what exists before you train:

```shell
retrain --help
retrain man
retrain man --topic quickstart
retrain man --list-topics
retrain backends
retrain doctor
```
Useful inspection commands while iterating:

```shell
retrain explain retrain.toml   # dry-run: what this config would do
retrain status logs            # summarize runs/campaigns under logs/
retrain plugins                # list built-ins + discovered plugins
retrain init-plugin --kind transform --name my_transform --with-test
retrain man --json --topic quickstart
retrain man --path             # editable bundled manual source
```
## Tiny TOML Demo

Create `mini.toml`:

```toml
[model]
model = "Qwen/Qwen3-4B-Instruct-2507"

[algorithm]
advantage_mode = "grpo"
transform_mode = "none"

[training]
max_steps = 20
batch_size = 2
group_size = 8
max_tokens = 1024
lr = 4e-5

[backend]
backend = "local"
adapter_path = "adapters/mini"

[logging]
log_dir = "logs/mini"
```
Run it:

```shell
retrain mini.toml
```
Override fields from CLI without editing TOML:

```shell
retrain mini.toml --seed 42 --max-steps 40 --wandb-project my-project
```
## Quick Start from Template

```shell
retrain init --template quickstart
retrain retrain.toml
```

Other templates:

```shell
retrain init --list
retrain init --template experiment
retrain init --template campaign
retrain init --interactive
```
## Why retrain
- Composable advantage pipeline: GRPO/MaxRL + GTPO/HICRA/SEPA
- Pluggable backends and inference engines
- Pluggable rewards (match, math, judge, custom)
- Campaign sweeps from one TOML
- LoRA-Squeeze rank analysis/compression
- Checkpoint resume and run status tooling
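The composable advantage pipeline starts from a group-relative baseline. As a point of reference, this is the standard GRPO advantage computation that `advantage_mode = "grpo"` names, sketched with the stdlib (an illustration of the published formulation, not retrain's internal code):

```python
from statistics import mean, stdev

def grpo_advantages(group_rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each reward against its own group's mean and std."""
    mu = mean(group_rewards)
    sigma = stdev(group_rewards) if len(group_rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in group_rewards]

# 8 completions sampled for one prompt, binary verifiable rewards:
advs = grpo_advantages([1, 0, 0, 1, 1, 0, 0, 0])
```

Correct completions get positive advantages, incorrect ones negative, and the advantages in each group sum to zero, so the update pushes probability mass from failed toward successful completions of the same prompt.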
## Common Config Patterns

Use verifiers environments from TOML:

```toml
[environment]
provider = "verifiers"
id = "primeintellect/gsm8k"
args = { split = "train" }
auto_install = true
max_turns = 8
```
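The "verifiable" part of RLVR means rewards come from programmatic checks (string match, math checking, judges) rather than a learned reward model. A minimal match-style reward for GSM8K-type answers could look like this (a hypothetical reward function for illustration, not retrain's built-in `match` reward):

```python
import re

def exact_match_reward(completion: str, answer: str) -> float:
    """1.0 if the last number in the completion equals the reference answer."""
    nums = re.findall(r"-?\d+(?:\.\d+)?", completion.replace(",", ""))
    return 1.0 if nums and nums[-1] == answer else 0.0

exact_match_reward("... so the total is 42.", "42")  # 1.0
```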
Use custom advantage + transform plugins from TOML:

```toml
[algorithm]
advantage_mode = "my_advantages.hipa_like_advantages"
transform_mode = "my_transforms.make_transform_spec"
```
Use a full algorithm plugin (overrides the composable advantage + transform path):

```toml
[algorithm]
algorithm_mode = "my_algorithms.my_algorithm"
```
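The dotted names above resolve to callables in your own Python modules. The exact signature retrain expects is not documented in this README (use `retrain init-plugin` to generate a correct, tested skeleton); purely to show the shape, a hypothetical `my_advantages.hipa_like_advantages` that assigns rank-based advantages within a group might be:

```python
# my_advantages.py -- hypothetical plugin module; the real signature retrain
# expects may differ (see `retrain init-plugin` for a generated skeleton).
def hipa_like_advantages(group_rewards: list[float]) -> list[float]:
    """Map within-group reward ranks linearly onto [-1, 1]."""
    n = len(group_rewards)
    if n == 1:
        return [0.0]
    order = sorted(range(n), key=lambda i: group_rewards[i])
    advs = [0.0] * n
    for rank, i in enumerate(order):
        advs[i] = 2.0 * rank / (n - 1) - 1.0
    return advs

print(hipa_like_advantages([0.0, 1.0, 0.5]))  # [-1.0, 1.0, 0.0]
```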
## Documentation

Full docs: retrain.readthedocs.io
- Getting Started
- Configuration Reference
- Advantage Functions
- SEPA Scheduling
- Campaigns
- LoRA-Squeeze
- Reward Functions
- Inference Engines
Contributor note: run `retrain man --check` in CI to detect stale auto-generated manual blocks, and `retrain man --sync` locally to refresh them.
## Project details
### File details: retrain-0.2.1.tar.gz

File metadata:
- Download URL: retrain-0.2.1.tar.gz
- Upload date:
- Size: 508.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.16
File hashes:

| Algorithm | Hash digest |
|---|---|
| SHA256 | `033ea57f8edebd52a452acb53ba3e3b3a588c40e74c3dcf9884a7d92f9dc21a2` |
| MD5 | `db02f5f46d1f903d74e18eb252522400` |
| BLAKE2b-256 | `349f6bc492bb525d82aa95a36f285322d2f656c9f9d243143e48b345ed0b88fd` |
### File details: retrain-0.2.1-py3-none-any.whl

File metadata:
- Download URL: retrain-0.2.1-py3-none-any.whl
- Upload date:
- Size: 134.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.16
File hashes:

| Algorithm | Hash digest |
|---|---|
| SHA256 | `941027719849063bf9e389de9eac15b8a863b5e3212065d224c553aea6fbba6a` |
| MD5 | `4a60d8655117b623fa9d7b43b806c2c4` |
| BLAKE2b-256 | `e2101b3de04ee418576db0f530e139f15c049e36bf8093676a331dfe043232d2` |
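To check a downloaded file against the published digests, hashing it with the standard library suffices (file names as listed above):

```python
import hashlib

def sha256_of(path: str) -> str:
    """Stream the file in chunks so large distributions aren't loaded whole."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()

EXPECTED = "033ea57f8edebd52a452acb53ba3e3b3a588c40e74c3dcf9884a7d92f9dc21a2"
# assert sha256_of("retrain-0.2.1.tar.gz") == EXPECTED
```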