A stable-core reinforcement learning package with semver-governed APIs, experimental extensions, and benchmark workflows.
A modular reinforcement learning library for research and production.
80+ algorithms · unified API · CLI-first workflow · Core / Contrib / Zoo
Quick Start · Installation · Algorithms · Architecture · Docs · Contributing · Citation
✨ Why AxiomRL
- On-policy, off-policy, value-based, offline, model-based, world models, goal-conditioned: all in one package.
- Every algorithm shares the same TrainConfig → train / eval / resume contract.
- axiomrl train, eval, resume, zoo, tune: declarative YAML, no boilerplate.
- Stable Core, experimental Contrib, curated Zoo: pick the surface that matches your risk tolerance.
- Deterministic seeding, checkpoint resume, multi-seed sweeps, semver-governed core.
- TensorBoard out of the box, structured run artifacts, hyperparameter studies via Optuna.
🚀 Quick Start
pip install axiomrl
Python
from axiomrl import PPO, TrainConfig
algo = PPO(TrainConfig(
algo="ppo",
env_id="CartPole-v1",
seed=42,
total_timesteps=100_000,
output_dir="runs/ppo-cartpole",
))
result = algo.learn()
CLI
# environment check
axiomrl doctor
# train · eval · resume
axiomrl train --config configs/ppo/cartpole.yaml
axiomrl eval --checkpoint runs/<run>/checkpoints/best.pt
axiomrl resume --checkpoint runs/<run>/checkpoints/step_<n>.pt
# multi-seed sweep
axiomrl train --config configs/ppo/cartpole.yaml --seeds 1,2,3
📦 Installation
pip install axiomrl
Optional extras
pip install "axiomrl[atari]" # Atari / ALE support
pip install "axiomrl[offline]" # Minari offline datasets
pip install "axiomrl[tuning]" # Optuna hyperparameter tuning
pip install "axiomrl[experimental]" # Experimental namespace
pip install -e ".[dev]" # Development install (from source)
Requirements: Python 3.10+ · PyTorch 2.0+ · Gymnasium · NumPy · PyYAML · TensorBoard
🧠 Stable Core API
AxiomRL keeps a small semver-governed surface for application engineers while still exposing a broader research playground.
- Import stable algorithms and `TrainConfig` from `axiomrl.core` for production-facing workflows.
- Use `axiomrl.experimental` when you want access to faster-moving research APIs.
- Keep using `axiomrl.contrib` for extensions that sit outside the stable core contract.
- Legacy root-level advanced imports remain available for now, but they are deprecated so downstream users can migrate before removal.
from axiomrl.core import PPO, TrainConfig
from axiomrl.experimental import DrQ
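One common way to keep a deprecated root-level import working while warning about it is a module-level `__getattr__` (PEP 562). The sketch below illustrates that general pattern only; it is not AxiomRL's actual implementation, and `legacy_pkg`, `_DEPRECATED`, and the placeholder `DrQ` class are hypothetical names made up for the example.

```python
import types
import warnings


# Hypothetical stand-in for a class that has moved to an experimental namespace.
class DrQ:
    pass


# Deprecated root-level names mapped to (new import location, object).
_DEPRECATED = {"DrQ": ("legacy_pkg.experimental", DrQ)}


def _module_getattr(name):
    """Module-level __getattr__ (PEP 562): warn, then return the moved object."""
    if name in _DEPRECATED:
        new_home, obj = _DEPRECATED[name]
        warnings.warn(
            f"Importing {name} from the package root is deprecated; "
            f"import it from {new_home} instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return obj
    raise AttributeError(f"module 'legacy_pkg' has no attribute {name!r}")


# Build a throwaway module so the sketch runs without any package on disk.
legacy_pkg = types.ModuleType("legacy_pkg")
legacy_pkg.__getattr__ = _module_getattr

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    cls = legacy_pkg.DrQ  # still resolves, but records a DeprecationWarning
```

Downstream code can surface such warnings early by running its test suite with `python -W error::DeprecationWarning`.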
🏗 Architecture
| Layer | Purpose | Examples |
|---|---|---|
| Core | Stable train / eval / resume for mainstream algorithms | PPO, DQN, SAC, TD3, IQL, CQL, BC |
| Contrib | Experimental extensions with additional dependencies | RecurrentPPO (LSTM), GAIL |
| Zoo | Named presets, benchmark manifests, launch recipes | Atari DQN/PPO, MuJoCo SAC presets |
from axiomrl import PPO, DQN, SAC, IQL, HER, TrainConfig # Core
from axiomrl.contrib import RecurrentPPO # Contrib
📚 Algorithms
AxiomRL implements 80+ algorithms across six categories.
🟦 On-Policy — PPO · A2C · TRPO · PPG · GAIL · IMPALA · APPO · RecurrentPPO · ARS · OpenAI ES
| Algorithm | Type | Action Space |
|---|---|---|
| PPO | Policy gradient | Box, Discrete |
| A2C | Policy gradient | Box, Discrete |
| TRPO | Policy gradient | Box, Discrete |
| PPG | Policy gradient | Discrete |
| GAIL | Imitation + PG | Box, Discrete |
| IMPALA | Distributed AC | Discrete |
| APPO | Distributed AC | Discrete |
| RecurrentPPO | Recurrent PG (contrib) | Box, Discrete |
| ARS | Evolutionary | Box |
| OpenAI ES | Evolutionary | Box |
🟪 Off-Policy (Continuous) — SAC · TD3 · DDPG · CrossQ · REDQ · TQC · D4PG · NAF · DrQ · CURL · DrQ-v2
| Algorithm | Type | Action Space |
|---|---|---|
| SAC | Actor-Critic | Box |
| TD3 | Actor-Critic | Box |
| DDPG | Actor-Critic | Box |
| CrossQ | Low-tuning AC | Box |
| REDQ | Ensemble AC | Box |
| TQC | Quantile AC | Box |
| D4PG | Distributed AC | Box |
| NAF | Q-learning | Box |
| Discrete SAC | Actor-Critic | Discrete |
| DrQ | Pixel-based (SAC) | Box |
| CURL | Pixel-based (contrastive) | Box |
| DrQ-v2 | Pixel-based (TD3) | Box |
🟧 Value-Based (Discrete) — DQN family · C51 · QR-DQN · IQN · FQF · Rainbow · R2D2 · Agent57 · SPR …
| Algorithm | Type | Notes |
|---|---|---|
| DQN | Value-based | Classic |
| Double DQN | Value-based | Reduced overestimation |
| Dueling DQN | Value-based | Advantage decomposition |
| Noisy DQN | Exploration | Noisy networks |
| Prioritized DQN | Sampling | Prioritized replay |
| N-Step DQN | Multi-step | N-step returns |
| C51 | Distributional | Categorical distribution |
| QR-DQN | Distributional | Quantile regression |
| IQN | Distributional | Implicit quantiles |
| FQF | Distributional | Fully parameterized quantile function |
| Rainbow DQN | Combined | Multi-enhancement combo |
| DRQN | Recurrent | LSTM Q-network |
| R2D2 | Recurrent | Replay with recurrence |
| Agent57 | Recurrent | Adaptive exploration |
| SPR | Self-predictive | Representation learning |
| and more… | | Boltzmann · Mellowmax · Munchausen · CQL-DQN · Hysteretic … |
🟨 Offline RL — BC · IQL · CQL · BCQ · BEAR · TD3+BC · AWR · AWAC · CRR · Cal-QL · EDAC · RLPD · XQL · ReBRAC · MARWIL
| Algorithm | Type | Notes |
|---|---|---|
| BC | Imitation | Behavioral cloning |
| IQL | Value-based | Implicit Q-learning (in-sample) |
| CQL | Conservative | Conservative Q-learning |
| Cal-QL | Conservative | Calibrated CQL |
| BCQ | Constrained | Batch-constrained Q-learning |
| BEAR | Constrained | Support-matching AC |
| TD3+BC | Constrained | TD3 with BC penalty |
| CRR | Constrained | Critic-regularized regression |
| ReBRAC | Constrained | Behavior-regularized AC |
| AWR | Weighted | Advantage-weighted regression |
| AWAC | Weighted AC | Advantage-weighted AC |
| MARWIL | Weighted | RLlib-style weighted imitation |
| XQL | Value-based | Extreme value regression |
| EDAC | Ensemble | Ensemble-diversified AC |
| RLPD | Online with offline data | Prior data with SAC |
🟩 Model-Based & World Models — PETS · MOPO · MBPO · Decision Transformer · Dreamer · DreamerV3 · Diamond · MuZero · EfficientZero …
| Algorithm | Type | Notes |
|---|---|---|
| PETS | MPC | Ensemble dynamics + CEM |
| MOPO | Model-based offline | Pessimistic reward |
| MBPO | Model-based online | Dyna-style rollouts |
| Decision Transformer | Sequence model | Offline sequence decision |
| Dreamer | World model | Latent imagination |
| DreamerV3 | World model | Symlog + discrete latent |
| Diamond | World model | Diffusion world model |
| MuZero | Planning | Learned model + tree search |
| Gumbel MuZero | Planning | Policy improvement with Gumbel |
| EfficientZero | Planning | Sample-efficient MuZero |
| ScaleZero | Planning | Scalable MuZero |
🟫 Goal-Conditioned & Special — HER · PODreamer · EADream · MoW · JOWA · HorizonImagination
| Algorithm | Type | Notes |
|---|---|---|
| HER | Goal-conditioned | Hindsight experience replay |
| PODreamer | World model | Partially observable Dreamer |
| EADream | World model | Energy-aware Dreamer |
| MoW | World model | Mixture of World models |
| JOWA | World model | Joint World-Action model |
| HorizonImagination | World model | Horizon-aware imagination |
🔁 Training Workflow
graph LR
A[YAML Config] --> B[TrainConfig]
B --> C[Algorithm]
C --> D{Training Loop}
D --> E[Collector]
D --> F[Buffer]
D --> G[Evaluator]
D --> H[TensorBoard]
G -->|Early Stop?| I{Continue?}
I -->|Yes| D
I -->|No| J[Checkpoint]
J --> K[axiomrl eval]
J --> L[axiomrl resume]
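The control flow in the diagram can be sketched in plain Python. This is a toy illustration under stated assumptions, not the library's trainer: `run_training`, its parameters, and the placeholder metric are all hypothetical, and the "collector" simply appends dummy transitions.

```python
from collections import deque


def run_training(total_steps, eval_every, target_return):
    """Toy loop mirroring the diagram: collect -> buffer -> evaluate -> checkpoint."""
    buffer = deque(maxlen=1000)  # stand-in for a replay / rollout buffer
    history = []                 # stand-in for scalars logged to TensorBoard
    step = 0
    while step < total_steps:
        # Collector: gather one (dummy) transition and store it.
        step += 1
        buffer.append((step, 0.0))
        # Evaluator: runs periodically and decides whether to continue.
        if step % eval_every == 0:
            mean_return = float(step)  # placeholder metric that grows with training
            history.append((step, mean_return))
            if mean_return >= target_return:  # early stop
                break
    # Checkpoint: the state a later eval or resume step would load.
    checkpoint = {"step": step, "buffer_size": len(buffer)}
    return checkpoint, history


checkpoint, history = run_training(total_steps=100, eval_every=10, target_return=30.0)
```

With these arguments the loop stops at the third evaluation, so the checkpoint records step 30 rather than the full budget of 100.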
⚙ Configuration
AxiomRL uses declarative YAML for full reproducibility:
algo: ppo
env_id: CartPole-v1
seed: 42
total_timesteps: 100000
output_dir: runs/ppo-cartpole
eval_episodes: 10
algo_kwargs:
learning_rate: 0.0003
n_steps: 2048
batch_size: 64
n_epochs: 10
gamma: 0.99
gae_lambda: 0.95
clip_range: 0.2
ent_coef: 0.01
Use `axiomrl config --config <path>` to preview the resolved config before training.
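To make the idea of a "resolved" config concrete, here is a minimal sketch of how user-supplied keys might be merged over defaults. It assumes the YAML has already been parsed into a dict; `SketchConfig`, `resolve`, and `PPO_DEFAULTS` are hypothetical names, and the real `TrainConfig` fields and defaults may differ.

```python
from dataclasses import dataclass, field

# Hypothetical defaults for illustration; not the library's actual values.
PPO_DEFAULTS = {"learning_rate": 3e-4, "gamma": 0.99, "clip_range": 0.2}


@dataclass
class SketchConfig:
    algo: str
    env_id: str
    seed: int = 0
    total_timesteps: int = 100_000
    algo_kwargs: dict = field(default_factory=dict)


def resolve(raw: dict, defaults: dict) -> SketchConfig:
    """Merge user algo_kwargs over defaults and reject unknown top-level keys."""
    known = set(SketchConfig.__dataclass_fields__)
    unknown = set(raw) - known
    if unknown:
        raise ValueError(f"unknown config keys: {sorted(unknown)}")
    merged = {**defaults, **raw.get("algo_kwargs", {})}
    return SketchConfig(**{**raw, "algo_kwargs": merged})


# A parsed config that overrides only clip_range.
raw = {
    "algo": "ppo",
    "env_id": "CartPole-v1",
    "seed": 42,
    "algo_kwargs": {"clip_range": 0.1},
}
resolved = resolve(raw, PPO_DEFAULTS)
```

Rejecting unknown keys up front is one way to catch typos such as a misspelled hyperparameter name before a long run starts.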
🎯 Zoo · Presets & Benchmarks
The Zoo layer provides curated presets and benchmark recipes:
axiomrl zoo --format commands # list presets
axiomrl train --config zoo/atari/dqn_breakout.yaml # run preset
axiomrl zoo --format report --runs-dir runs # report
axiomrl zoo --format leaderboard --runs-dir runs --group-by preset # leaderboard
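As a rough illustration of what grouping a leaderboard by preset involves, the sketch below aggregates per-seed results into a mean and standard deviation per group. The record layout, the `leaderboard` function, and the numbers are hypothetical; the actual report reads from run artifacts whose schema is not shown here.

```python
from collections import defaultdict
from statistics import mean, stdev

# Made-up per-run records, used only to exercise the aggregation.
runs = [
    {"preset": "ppo_cartpole", "seed": 1, "mean_return": 480.0},
    {"preset": "ppo_cartpole", "seed": 2, "mean_return": 500.0},
    {"preset": "ppo_cartpole", "seed": 3, "mean_return": 490.0},
    {"preset": "dqn_cartpole", "seed": 1, "mean_return": 300.0},
    {"preset": "dqn_cartpole", "seed": 2, "mean_return": 340.0},
]


def leaderboard(records, group_by="preset"):
    """Group runs by a key, then rank groups by their mean return across seeds."""
    groups = defaultdict(list)
    for record in records:
        groups[record[group_by]].append(record["mean_return"])
    rows = [
        {
            group_by: key,
            "n_seeds": len(values),
            "mean": mean(values),
            # Sample standard deviation needs at least two seeds.
            "std": stdev(values) if len(values) > 1 else 0.0,
        }
        for key, values in groups.items()
    ]
    return sorted(rows, key=lambda row: row["mean"], reverse=True)


board = leaderboard(runs)
```

Reporting the spread across seeds alongside the mean makes it easier to tell a real improvement from seed noise.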
🔬 Hyperparameter Tuning
axiomrl tune --config studies/ppo_cartpole_tune.yaml # launch study
axiomrl tune --resume-study runs/studies/ppo_cartpole_tune # resume
axiomrl tune-report --study-dir runs/studies/ppo_cartpole_tune # report
💡 Programmatic Examples
Offline RL (IQL)
from axiomrl import IQL, TrainConfig
from axiomrl.data import export_random_transition_dataset
export_random_transition_dataset("Pendulum-v1", "data/pendulum.npz", num_steps=5000, seed=7)
algo = IQL(TrainConfig(
algo="iql", env_id="Pendulum-v1", seed=7, total_timesteps=20_000,
algo_kwargs={"dataset_kind": "npz", "dataset_path": "data/pendulum.npz"},
))
result = algo.learn()
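For readers new to IQL, its value function is fit with an asymmetric (expectile) squared loss, which is what lets it avoid querying out-of-dataset actions. The following is a minimal pure-Python sketch of that loss from the IQL paper, not code taken from AxiomRL; the function name and the default `tau=0.7` are illustrative assumptions.

```python
def expectile_loss(diffs, tau=0.7):
    """Mean of |tau - 1(u < 0)| * u^2 over differences u = Q(s, a) - V(s)."""
    total = 0.0
    for u in diffs:
        # Positive errors are weighted by tau, negative ones by (1 - tau).
        weight = tau if u > 0 else 1.0 - tau
        total += weight * u * u
    return total / len(diffs)


# With tau > 0.5, under-estimating V (u > 0) costs more than over-estimating it.
under = expectile_loss([1.0], tau=0.7)
over = expectile_loss([-1.0], tau=0.7)
```

Setting `tau=0.5` recovers an ordinary (halved) mean squared error, while values closer to 1 push the value estimate toward the upper range of in-dataset Q-values.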
Goal-Conditioned (HER)
from axiomrl import HER, TrainConfig
algo = HER(TrainConfig(
algo="her", env_id="RL-PointGoal1D-v0", seed=7, total_timesteps=20_000,
algo_kwargs={"buffer_capacity": 50_000, "her_ratio": 0.8, "goal_selection_strategy": "future"},
))
result = algo.learn()
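The `her_ratio` and `goal_selection_strategy: "future"` settings above correspond to hindsight relabeling. The sketch below shows the general idea on a toy 1-D episode; it is an assumption-laden illustration rather than the library's buffer code, and `relabel_future`, the dict layout, and the sparse reward rule are hypothetical.

```python
import random


def relabel_future(episode, her_ratio, rng):
    """Relabel a fraction of transitions with goals achieved later in the episode.

    Each transition is a dict with float 'achieved_goal' and 'desired_goal' entries.
    """
    relabeled = []
    for t, transition in enumerate(episode):
        new = dict(transition)
        if rng.random() < her_ratio:
            # "future" strategy: sample an achieved goal from step t onward.
            future_t = rng.randrange(t, len(episode))
            new["desired_goal"] = episode[future_t]["achieved_goal"]
        # Sparse reward: 0 when the goal is reached, -1 otherwise.
        reached = abs(new["achieved_goal"] - new["desired_goal"]) < 1e-6
        new["reward"] = 0.0 if reached else -1.0
        relabeled.append(new)
    return relabeled


# A failed episode: the agent never reaches the original goal at 10.0.
episode = [{"achieved_goal": float(i), "desired_goal": 10.0} for i in range(5)]
relabeled = relabel_future(episode, her_ratio=0.8, rng=random.Random(7))
```

Because relabeled goals come from states the agent actually visited, some transitions receive a non-negative reward even when the original goal was never reached.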
Contrib (Experimental)
from axiomrl.contrib import RecurrentPPO
from axiomrl import TrainConfig
algo = RecurrentPPO(TrainConfig(
algo="recurrent_ppo", env_id="BreakoutNoFrameskip-v4", seed=42, total_timesteps=1_000_000,
))
result = algo.learn()
📁 Project Structure
axiomrl/
├── configs/ # Algorithm config YAML files
├── docs/ # Documentation site (MkDocs)
├── examples/ # Reference training scripts
├── src/axiomrl/ # Main package
│ ├── algorithms/ # Algorithm implementations (70+ files)
│ ├── api/ # Public API wrappers
│ ├── contrib/ # Experimental extensions
│ ├── data/ # Buffers, datasets, samplers
│ ├── envs/ # Env factory, wrappers
│ ├── experiment/ # Config, checkpointing, logging
│ ├── models/ # MLP, CNN, LSTM networks
│ ├── policies/ # Policy abstractions
│ └── runtime/ # Trainer, evaluator, collector
├── tests/ # Test suite (100+ tests)
├── zoo/ # Benchmark presets
└── pyproject.toml
📖 Documentation
| Topic | Link |
|---|---|
| Configuration schema | docs/config-schema.md |
| Scheduling (LR / entropy / epsilon / clip) | docs/configuration/scheduling.md |
| Offline RL | docs/guide/offline-rl.md |
| Pixel observations | docs/guide/pixel-observations.md |
| Zoo benchmarks | docs/guide/zoo-benchmarks.md |
| Compatibility & semver policy | docs/compatibility.md |
| Run artifacts | docs/run-artifacts.md |
| FAQ | docs/faq.md |
🤝 Contributing
Contributions are welcome — see CONTRIBUTING.md for the full guide.
pip install -e ".[dev]" # install dev dependencies
pre-commit install # set up git hooks
make lint # linting (ruff)
make typecheck # type checking (mypy)
make test-fast # unit tests
make test-integration # integration tests
🗺 Roadmap
- Core algorithms (PPO, DQN, SAC, TD3, …)
- Offline RL stack (IQL, CQL, BC, BCQ, …)
- Value-based expansion (Rainbow, C51, IQN, …)
- Pixel-based control (DrQ, CURL, DrQ-v2)
- Goal-conditioned (HER)
- Model-based (PETS, MOPO, MBPO)
- World Models (Dreamer, MuZero families)
- Wider offline data mixing
- Training budget controls
- Benchmark validation
- Distributed training
- Multi-agent RL
- Pre-trained model hub
🌐 Design Influences
| Library | Influence |
|---|---|
| Stable-Baselines3 | Stable algorithm core, common API, SB3-Contrib split |
| RL Baselines3 Zoo | Zoo layer, preset-based experiments |
| CleanRL | Readable single-file references, reproducibility focus |
| Tianshou | Modular runtime, collector / trainer / buffer design |
🔒 Semantic Versioning
AxiomRL follows Semantic Versioning for the stable core API. Breaking changes to axiomrl.core, the stable root exports, and TrainConfig contracts land only in major releases. See docs/compatibility.md for the full policy.
Deprecated APIs first ship with warnings and stay available for at least one minor release before removal.
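Under this policy, a downstream project can decide how cautiously to treat an upgrade from the version numbers alone. The helper below is a hypothetical sketch (it is not part of AxiomRL and ignores pre-release and build tags) that classifies an upgrade by which semver component changed.

```python
def parse_version(text):
    """Split 'MAJOR.MINOR.PATCH' into a tuple of ints (no pre-release handling)."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def upgrade_kind(installed, candidate):
    """Classify an upgrade as 'major', 'minor', 'patch', or 'none'."""
    old, new = parse_version(installed), parse_version(candidate)
    if new <= old:
        return "none"  # same version or a downgrade
    if new[0] != old[0]:
        return "major"  # may contain breaking changes to the stable core
    if new[1] != old[1]:
        return "minor"  # may add features or new deprecation warnings
    return "patch"


kind = upgrade_kind("1.0.3", "2.0.0")
```

A pin such as `axiomrl>=1.0,<2.0` expresses the same idea declaratively: accept minor and patch releases, and opt in to major ones explicitly.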
📝 Citation
If you use AxiomRL in your research, please cite:
@software{axiomrl,
title = {AxiomRL: A Modular Reinforcement Learning Library},
author = {skygazer42},
url = {https://github.com/skygazer42/axiomrl},
year = {2026},
}
📄 License
MIT — Copyright © 2026 skygazer42
Built with PyTorch · Gymnasium · TensorBoard
If you find AxiomRL useful, please consider giving it a ⭐ on GitHub.