# invokerl

Hackable and performant RL post-training for LLMs.
## Install

```bash
pip install invokerl
```
## Quick start on a single GPU

```python
import invokerl as rl

MODEL = "Qwen/Qwen3-0.6B"

generator = rl.VLLMGenerator(MODEL, gpu_memory_utilization=0.3, max_model_len=2048)
policy = rl.Policy(MODEL)
ref_policy = rl.Policy(MODEL).freeze()  # frozen reference model for the KL penalty

trainer = rl.Trainer(
    config=rl.TrainerConfig(
        model_name_or_path=MODEL, total_steps=200, lr=5e-6,
        batch_size=1, group_size=4, accumulation_steps=4,
    ),
    algorithm=rl.algorithms.GRPO(clip_eps=0.2, beta=0.04),
    generator=generator, policy=policy, ref_policy=ref_policy,
    reward_fn=rl.rewards.ExactMatch(),
    dataset=rl.datasets.GSM8K("train"),
    eval_dataset=rl.datasets.GSM8K("test"),
)
trainer.train()
```
Full runnable: `examples/train_grpo_gsm8k.py`
## Multi-GPU

The same `trainer.train()` call works; just pass different objects:

```python
# Disaggregated (generation on cuda:0, training on cuda:1)
pipeline = rl.DisaggPipeline(...)
trainer.train(pipeline=pipeline)
```

```python
# FSDP (launch with torchrun)
policy = rl.Policy(MODEL).fsdp()  # auto-inits torch.distributed
trainer.train()  # FSDP is auto-detected from the policy
```

Full runnable: `examples/train_disagg.py`, `examples/train_fsdp.py`
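The FSDP path is launched with `torchrun`; a hypothetical invocation (the process count and any extra flags depend on your machine, and are illustrative, not part of the library) might look like:

```shell
# Launch the FSDP example across 4 local GPUs (adjust --nproc_per_node to
# match your node). torchrun sets up the process group that .fsdp() joins.
torchrun --nproc_per_node=4 examples/train_fsdp.py
```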
## Profiling is first-class

```python
with rl.profile() as p:
    trainer.step()

p.summary()                   # wall / CPU / CUDA / unaccounted + per-phase
p.export_trace("trace.json")  # open at ui.perfetto.dev
```

It also works with `nsys`; the NVTX markers are emitted unconditionally, so no extra flag is needed:

```bash
nsys profile --trace=cuda,nvtx python examples/train_grpo_gsm8k.py
```

Full runnable: `examples/profile_step.py`
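The wall-vs-CPU split that a profiler summary reports can be illustrated with a toy phase timer. This is a sketch of the general idea, not invokerl's implementation (no CUDA or per-phase accounting here):

```python
import time
from contextlib import contextmanager

# Toy timer illustrating the wall / CPU split: wall time includes waiting
# (I/O, GPU kernels, sleeps), while process CPU time only grows when the
# Python process is actually computing. The gap is the "unaccounted" part.
@contextmanager
def toy_profile(results):
    wall0, cpu0 = time.perf_counter(), time.process_time()
    try:
        yield
    finally:
        results["wall_s"] = time.perf_counter() - wall0
        results["cpu_s"] = time.process_time() - cpu0

stats = {}
with toy_profile(stats):
    time.sleep(0.05)                    # wait: wall grows, CPU barely does
    sum(i * i for i in range(100_000))  # compute: both grow

# stats["wall_s"] exceeds stats["cpu_s"] by roughly the sleep duration.
```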
## Writing a new algorithm

Every algorithm implements two methods:

```python
from torch import Tensor

from invokerl import BaseAlgorithm, RolloutBatch

class MyAlgorithm(BaseAlgorithm):
    def compute_advantages(self, batch: RolloutBatch) -> Tensor:
        """Turn rewards into per-token learning signals. The credit
        assignment hook — override for group normalization, GAE,
        token-level shaping, PRM scores, etc."""
        ...

    def compute_loss(self, new_log_probs, batch, advantages):
        """The policy objective. Returns (loss, metrics)."""
        ...
```

Pass it to the `Trainer`:

```python
trainer = rl.Trainer(..., algorithm=MyAlgorithm(...))
```
Five algorithms ship as references: GRPO, DPO, PPO, SimPO, and DAPO.
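To make the `compute_advantages` hook concrete, here is a sketch of GRPO-style group normalization in plain Python (no invokerl or torch imports, so the math stands on its own; the real hook would operate on tensors):

```python
# Group-normalized advantages, the GRPO credit-assignment rule: each prompt
# yields `group_size` completions, and a completion's advantage is its reward
# standardized against the other completions of the same prompt.

def group_normalized_advantages(rewards, group_size, eps=1e-6):
    """rewards: flat list of length B = num_prompts * group_size, ordered so
    that consecutive `group_size` entries share a prompt."""
    advantages = []
    for start in range(0, len(rewards), group_size):
        group = rewards[start:start + group_size]
        mean = sum(group) / len(group)
        var = sum((r - mean) ** 2 for r in group) / len(group)
        advantages.extend((r - mean) / (var ** 0.5 + eps) for r in group)
    return advantages

# Two prompts, group_size=2. Within the first pair the better completion gets
# a positive advantage and the worse one a negative advantage; the second pair
# is all-zero reward, so its advantages are zero.
advs = group_normalized_advantages([1.0, 0.0, 0.0, 0.0], group_size=2)
```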
## RolloutBatch

The data contract between the trainer and your algorithm:

| Field | Shape | Description |
|---|---|---|
| `token_ids` | `[B, T]` | Prompt + completion token IDs |
| `response_mask` | `[B, T]` | True for generated tokens |
| `rewards` | `[B]` | Per-sequence scalar rewards |
| `token_rewards` | `[B, T]` | Optional per-token rewards |
| `old_log_probs` | `[B, T]` | Log-probs from the policy at generation time |
| `ref_log_probs` | `[B, T]` | Log-probs from the frozen reference model |
| `group_ids` | `[B]` | Which prompt each completion belongs to |
| `group_size` | `int` | Completions per prompt |
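A hypothetical stand-in for this contract (not invokerl's actual class, which holds tensors rather than nested lists) makes the field relationships explicit:

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical sketch of the RolloutBatch contract. Field names come from the
# table above; shapes are expressed as nested-list lengths for illustration.
@dataclass
class RolloutBatchSketch:
    token_ids: List[List[int]]        # [B, T] prompt + completion IDs
    response_mask: List[List[bool]]   # [B, T] True for generated tokens
    rewards: List[float]              # [B] per-sequence scalar rewards
    group_ids: List[int]              # [B] prompt index per completion
    group_size: int                   # completions per prompt
    token_rewards: Optional[List[List[float]]] = None  # [B, T], optional

    def validate(self):
        B = len(self.token_ids)
        assert len(self.response_mask) == B
        assert len(self.rewards) == B and len(self.group_ids) == B
        assert B % self.group_size == 0, "B must be a multiple of group_size"
        # This sketch assumes completions for one prompt are contiguous, so
        # the group id is the completion's position divided by group_size.
        assert all(g == i // self.group_size
                   for i, g in enumerate(self.group_ids))
        return self

# B=2, T=3: one prompt with two completions, rewarded 1.0 and 0.0.
batch = RolloutBatchSketch(
    token_ids=[[1, 2, 3], [1, 2, 4]],
    response_mask=[[False, True, True], [False, True, True]],
    rewards=[1.0, 0.0],
    group_ids=[0, 0],
    group_size=2,
).validate()
```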
## Project structure

```
invokerl/
├── __init__.py      # public API (rl.Trainer, rl.Policy, rl.algorithms.GRPO, ...)
├── trainer.py       # Trainer: train() dispatches to standard/disagg/FSDP paths
├── policy.py        # PolicyModel + .fsdp() for distributed training
├── generator.py     # VLLMGenerator
├── pipeline.py      # DisaggPipeline (optional, for 2-GPU async)
├── distributed.py   # FSDP init helpers
├── profiling.py     # rl.profile() context manager
├── algorithms/      # base + GRPO, DPO, PPO, SimPO, DAPO
├── data/            # base + GSM8K
└── rewards/         # base + rule-based exact match
examples/
├── train_grpo_gsm8k.py  # single GPU
├── train_disagg.py      # 2 GPUs, async
├── train_fsdp.py        # FSDP multi-GPU
├── profile_step.py      # profiling
└── sweep_grpo_lr.py     # hyperparameter sweep
```
## License

MIT