A stable-core reinforcement learning package with semver-governed APIs, experimental extensions, and benchmark workflows.

AxiomRL

A modular reinforcement learning library for research and production.
80+ algorithms · unified API · CLI-first workflow · Core / Contrib / Zoo

Quick Start · Installation · Algorithms · Architecture · Docs · Contributing · Citation


✨ Why AxiomRL

  • 80+ Algorithms: on-policy, off-policy, value-based, offline, model-based, world-model, and goal-conditioned methods in one package.
  • Unified API: every algorithm shares the same TrainConfig → train / eval / resume contract.
  • CLI-first: axiomrl train, eval, resume, zoo, and tune are driven by declarative YAML, with no boilerplate.
  • Modular Layers: stable Core, experimental Contrib, and curated Zoo, so you can pick the surface that matches your risk tolerance.
  • Reproducible: deterministic seeding, checkpoint resume, multi-seed sweeps, and a semver-governed core.
  • Observable: TensorBoard out of the box, structured run artifacts, and hyperparameter studies via Optuna.

🚀 Quick Start

pip install axiomrl

Python

from axiomrl import PPO, TrainConfig

algo = PPO(TrainConfig(
    algo="ppo",
    env_id="CartPole-v1",
    seed=42,
    total_timesteps=100_000,
    output_dir="runs/ppo-cartpole",
))
result = algo.learn()

CLI

# environment check
axiomrl doctor

# train · eval · resume
axiomrl train  --config configs/ppo/cartpole.yaml
axiomrl eval   --checkpoint runs/<run>/checkpoints/best.pt
axiomrl resume --checkpoint runs/<run>/checkpoints/step_<n>.pt

# multi-seed sweep
axiomrl train --config configs/ppo/cartpole.yaml --seeds 1,2,3
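
As a rough illustration of why a multi-seed sweep is reproducible, the sketch below parses a `--seeds`-style string and gives each seed its own random stream. `parse_seeds` and `rollout_noise` are hypothetical helpers written for this example and are not part of AxiomRL; the only point is that a fixed seed reproduces the same stream while different seeds produce different ones.

```python
import random


def parse_seeds(spec: str) -> list[int]:
    """Parse a comma-separated seed list such as "1,2,3" (hypothetical helper)."""
    return [int(part) for part in spec.split(",") if part.strip()]


def rollout_noise(seed: int, n: int = 3) -> list[float]:
    """Draw n pseudo-random numbers from a generator owned by this seed."""
    rng = random.Random(seed)  # isolated stream; global random state is untouched
    return [rng.random() for _ in range(n)]


seeds = parse_seeds("1,2,3")
streams = {seed: rollout_noise(seed) for seed in seeds}

# Re-running with the same seed reproduces the stream exactly.
assert rollout_noise(1) == streams[1]
# Different seeds yield different streams.
assert streams[1] != streams[2]
```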

📦 Installation

pip install axiomrl

Optional extras
pip install "axiomrl[atari]"         # Atari / ALE support
pip install "axiomrl[offline]"       # Minari offline datasets
pip install "axiomrl[tuning]"        # Optuna hyperparameter tuning
pip install "axiomrl[experimental]"  # Experimental namespace
pip install -e ".[dev]"              # Development install (from source)

Requirements: Python 3.10+ · PyTorch 2.0+ · Gymnasium · NumPy · PyYAML · TensorBoard

🧠 Stable Core API

AxiomRL keeps a small semver-governed surface for application engineers while still exposing a broader research playground.

  • Import stable algorithms and TrainConfig from axiomrl.core for production-facing workflows.
  • Use axiomrl.experimental when you want access to faster-moving research APIs.
  • Keep using axiomrl.contrib for extensions that sit outside the stable core contract.
  • Legacy root-level advanced imports remain available for now, but they are deprecated so downstream users can migrate before removal.
from axiomrl.core import PPO, TrainConfig
from axiomrl.experimental import DrQ
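
One common way to keep a legacy import path alive while steering users elsewhere is a module-level `__getattr__` hook (PEP 562) that forwards the name and emits a `DeprecationWarning`. The sketch below uses stand-in modules built with `types.ModuleType`; the names `sketch_experimental` and `sketch_legacy` are assumptions for illustration, and nothing here claims to be how AxiomRL implements its own deprecations.

```python
import types
import warnings

# Stand-in for a faster-moving namespace (hypothetical, not the real package).
experimental = types.ModuleType("sketch_experimental")
experimental.DrQ = type("DrQ", (), {})

# Stand-in for the legacy root namespace that forwards moved names.
legacy = types.ModuleType("sketch_legacy")
_MOVED = {"DrQ": experimental}


def _legacy_getattr(name: str):
    """PEP 562 hook: called only when normal attribute lookup fails."""
    target = _MOVED.get(name)
    if target is None:
        raise AttributeError(f"module 'sketch_legacy' has no attribute {name!r}")
    warnings.warn(
        f"sketch_legacy.{name} is deprecated; import it from {target.__name__}",
        DeprecationWarning,
        stacklevel=2,
    )
    return getattr(target, name)


legacy.__getattr__ = _legacy_getattr

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    cls = legacy.DrQ  # the old path still resolves, but it warns

assert cls is experimental.DrQ
assert caught[0].category is DeprecationWarning
```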

🏗 Architecture

AxiomRL three-layer architecture

| Layer | Purpose | Examples |
| --- | --- | --- |
| Core | Stable train / eval / resume for mainstream algorithms | PPO, DQN, SAC, TD3, IQL, CQL, BC |
| Contrib | Experimental extensions with additional dependencies | RecurrentPPO (LSTM), GAIL |
| Zoo | Named presets, benchmark manifests, launch recipes | Atari DQN/PPO, MuJoCo SAC presets |

from axiomrl import PPO, DQN, SAC, IQL, HER, TrainConfig   # Core
from axiomrl.contrib import RecurrentPPO                    # Contrib

📚 Algorithms

AxiomRL implements 80+ algorithms across six categories.

🟦 On-Policy — PPO · A2C · TRPO · PPG · GAIL · IMPALA · APPO · RecurrentPPO · ARS · OpenAI ES
| Algorithm | Type | Action Space |
| --- | --- | --- |
| PPO | Policy gradient | Box, Discrete |
| A2C | Policy gradient | Box, Discrete |
| TRPO | Policy gradient | Box, Discrete |
| PPG | Policy gradient | Discrete |
| GAIL | Imitation + PG | Box, Discrete |
| IMPALA | Distributed AC | Discrete |
| APPO | Distributed AC | Discrete |
| RecurrentPPO | Recurrent PG (contrib) | Box, Discrete |
| ARS | Evolutionary | Box |
| OpenAI ES | Evolutionary | Box |
🟪 Off-Policy (Continuous) — SAC · TD3 · DDPG · CrossQ · REDQ · TQC · D4PG · NAF · DrQ · CURL · DrQ-v2
| Algorithm | Type | Action Space |
| --- | --- | --- |
| SAC | Actor-Critic | Box |
| TD3 | Actor-Critic | Box |
| DDPG | Actor-Critic | Box |
| CrossQ | Low-tuning AC | Box |
| REDQ | Ensemble AC | Box |
| TQC | Quantile AC | Box |
| D4PG | Distributed AC | Box |
| NAF | Q-learning | Box |
| Discrete SAC | Actor-Critic | Discrete |
| DrQ | Pixel-based (SAC) | Box |
| CURL | Pixel-based (contrastive) | Box |
| DrQ-v2 | Pixel-based (TD3) | Box |
🟧 Value-Based (Discrete) — DQN family · C51 · QR-DQN · IQN · FQF · Rainbow · R2D2 · Agent57 · SPR …
| Algorithm | Type | Notes |
| --- | --- | --- |
| DQN | Value-based | Classic |
| Double DQN | Value-based | Reduced overestimation |
| Dueling DQN | Value-based | Advantage decomposition |
| Noisy DQN | Exploration | Noisy networks |
| Prioritized DQN | Sampling | Prioritized replay |
| N-Step DQN | Multi-step | N-step returns |
| C51 | Distributional | Categorical distribution |
| QR-DQN | Distributional | Quantile regression |
| IQN | Distributional | Implicit quantiles |
| FQF | Distributional | Fully parameterized quantile function |
| Rainbow DQN | Combined | Multi-enhancement combo |
| DRQN | Recurrent | LSTM Q-network |
| R2D2 | Recurrent | Replay with recurrence |
| Agent57 | Recurrent | Adaptive exploration |
| SPR | Self-predictive | Representation learning |
| and more… | | Boltzmann · Mellowmax · Munchausen · CQL-DQN · Hysteretic … |
🟨 Offline RL — BC · IQL · CQL · BCQ · BEAR · TD3+BC · AWR · AWAC · CRR · Cal-QL · EDAC · RLPD · XQL · ReBRAC · MARWIL
| Algorithm | Type | Notes |
| --- | --- | --- |
| BC | Imitation | Behavioral cloning |
| IQL | Value-based | Implicit Q-learning (in-sample) |
| CQL | Conservative | Conservative Q-learning |
| Cal-QL | Conservative | Calibrated CQL |
| BCQ | Constrained | Batch-constrained Q-learning |
| BEAR | Constrained | Support-matching AC |
| TD3+BC | Constrained | TD3 with BC penalty |
| CRR | Constrained | Critic-regularized regression |
| ReBRAC | Constrained | Behavior-regularized AC |
| AWR | Weighted | Advantage-weighted regression |
| AWAC | Weighted AC | Advantage-weighted AC |
| MARWIL | Weighted | RLlib-style weighted imitation |
| XQL | Value-based | Extreme value regression |
| EDAC | Ensemble | Ensemble-diversified AC |
| RLPD | Online with offline data | Prior data with SAC |
🟩 Model-Based & World Models — PETS · MOPO · MBPO · Decision Transformer · Dreamer · DreamerV3 · Diamond · MuZero · EfficientZero …
| Algorithm | Type | Notes |
| --- | --- | --- |
| PETS | MPC | Ensemble dynamics + CEM |
| MOPO | Model-based offline | Pessimistic reward |
| MBPO | Model-based online | Dyna-style rollouts |
| Decision Transformer | Sequence model | Offline sequence decision |
| Dreamer | World model | Latent imagination |
| DreamerV3 | World model | Symlog + discrete latent |
| Diamond | World model | Diffusion world model |
| MuZero | Planning | Learned model + tree search |
| Gumbel MuZero | Planning | Policy improvement with Gumbel |
| EfficientZero | Planning | Sample-efficient MuZero |
| ScaleZero | Planning | Scalable MuZero |
🟫 Goal-Conditioned & Special — HER · PODreamer · EADream · MoW · JOWA · HorizonImagination
| Algorithm | Type | Notes |
| --- | --- | --- |
| HER | Goal-conditioned | Hindsight experience replay |
| PODreamer | World model | Partially observable Dreamer |
| EADream | World model | Energy-aware Dreamer |
| MoW | World model | Mixture of World models |
| JOWA | World model | Joint World-Action model |
| HorizonImagination | World model | Horizon-aware imagination |

🔁 Training Workflow

graph LR
    A[YAML Config] --> B[TrainConfig]
    B --> C[Algorithm]
    C --> D{Training Loop}
    D --> E[Collector]
    D --> F[Buffer]
    D --> G[Evaluator]
    D --> H[TensorBoard]
    G -->|Early Stop?| I{Continue?}
    I -->|Yes| D
    I -->|No| J[Checkpoint]
    J --> K[axiomrl eval]
    J --> L[axiomrl resume]
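
The control flow in the diagram can be mimicked with a toy loop. This is a minimal sketch under stated assumptions: `run_training`, its arguments, and the placeholder score are all hypothetical, and the loop only mirrors the collect, buffer, evaluate, then continue-or-checkpoint shape rather than any real AxiomRL internals.

```python
from collections import deque


def run_training(total_steps: int, eval_every: int, target_score: float) -> dict:
    """Toy control flow: collect -> buffer -> evaluate -> continue or checkpoint."""
    buffer = deque(maxlen=1_000)  # stand-in for the Buffer node
    step, score = 0, 0.0
    while step < total_steps:
        buffer.append(("obs", "action", 1.0))  # stand-in for the Collector node
        step += 1
        if step % eval_every == 0:  # stand-in for the Evaluator node
            score = step / total_steps  # placeholder metric, not a real return
            if score >= target_score:  # "Early Stop?" branch
                break
    # Stand-in for the Checkpoint node: the state eval or resume would start from.
    return {"step": step, "score": score, "buffer_size": len(buffer)}


checkpoint = run_training(total_steps=100, eval_every=10, target_score=0.5)
print(checkpoint)  # {'step': 50, 'score': 0.5, 'buffer_size': 50}
```

With an unreachable target the loop simply runs to `total_steps` before writing the final state, which corresponds to the "No" edge being taken only once the budget is spent.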

⚙ Configuration

AxiomRL uses declarative YAML for full reproducibility:

algo: ppo
env_id: CartPole-v1
seed: 42
total_timesteps: 100000
output_dir: runs/ppo-cartpole
eval_episodes: 10
algo_kwargs:
  learning_rate: 0.0003
  n_steps: 2048
  batch_size: 64
  n_epochs: 10
  gamma: 0.99
  gae_lambda: 0.95
  clip_range: 0.2
  ent_coef: 0.01

Use axiomrl config --config <path> to preview the resolved config before training.
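
To get a feel for what the PPO values in the YAML above imply, the sketch below estimates the number of rollouts and gradient steps. It assumes a single environment, that one rollout collects `n_steps` transitions per environment, and that each epoch sweeps the rollout once in minibatches of `batch_size`; those semantics are common in PPO implementations but are an assumption here, not a statement about AxiomRL's trainer. `ppo_update_budget` is a hypothetical helper.

```python
import math

# Values copied from the YAML config shown in this section.
config = {
    "total_timesteps": 100_000,
    "algo_kwargs": {"n_steps": 2048, "batch_size": 64, "n_epochs": 10},
}


def ppo_update_budget(cfg: dict, n_envs: int = 1) -> dict:
    """Estimate rollout and gradient-step counts for a PPO-style config."""
    kwargs = cfg["algo_kwargs"]
    rollout_size = kwargs["n_steps"] * n_envs
    minibatches = math.ceil(rollout_size / kwargs["batch_size"])
    # Round up: the last rollout may overshoot total_timesteps slightly.
    rollouts = math.ceil(cfg["total_timesteps"] / rollout_size)
    return {
        "rollouts": rollouts,
        "minibatches_per_epoch": minibatches,
        "gradient_steps_per_rollout": minibatches * kwargs["n_epochs"],
        "total_gradient_steps": rollouts * minibatches * kwargs["n_epochs"],
    }


budget = ppo_update_budget(config)
print(budget)
```

Under these assumptions the config works out to 49 rollouts, 32 minibatches per epoch, 320 gradient steps per rollout, and 15,680 gradient steps in total.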

🎯 Zoo · Presets & Benchmarks

The Zoo layer provides curated presets and benchmark recipes:

axiomrl zoo --format commands                                        # list presets
axiomrl train --config zoo/atari/dqn_breakout.yaml                   # run preset
axiomrl zoo --format report --runs-dir runs                          # report
axiomrl zoo --format leaderboard --runs-dir runs --group-by preset   # leaderboard

🔬 Hyperparameter Tuning

axiomrl tune --config studies/ppo_cartpole_tune.yaml                 # launch study
axiomrl tune --resume-study runs/studies/ppo_cartpole_tune           # resume
axiomrl tune-report --study-dir runs/studies/ppo_cartpole_tune       # report

💡 Programmatic Examples

Offline RL (IQL)
from axiomrl import IQL, TrainConfig
from axiomrl.data import export_random_transition_dataset

export_random_transition_dataset("Pendulum-v1", "data/pendulum.npz", num_steps=5000, seed=7)

algo = IQL(TrainConfig(
    algo="iql", env_id="Pendulum-v1", seed=7, total_timesteps=20_000,
    algo_kwargs={"dataset_kind": "npz", "dataset_path": "data/pendulum.npz"},
))
result = algo.learn()
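
For background, IQL (Implicit Q-Learning) fits its value function with expectile regression, an asymmetric squared loss on the difference between Q(s, a) and V(s). The sketch below shows that loss in plain Python; the default `tau=0.7` is an illustrative assumption, and this is not taken from AxiomRL's implementation.

```python
def expectile_loss(diff: float, tau: float = 0.7) -> float:
    """Asymmetric squared loss |tau - 1(diff < 0)| * diff**2."""
    weight = tau if diff >= 0 else 1.0 - tau
    return weight * diff * diff


# diff = Q(s, a) - V(s): positive errors are weighted by tau, negative ones by 1 - tau.
under = expectile_loss(1.0)   # V underestimates Q
over = expectile_loss(-1.0)   # V overestimates Q

# With tau > 0.5, underestimation is penalized more, pushing V toward an upper expectile.
assert under > over
```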
Goal-Conditioned (HER)
from axiomrl import HER, TrainConfig

algo = HER(TrainConfig(
    algo="her", env_id="RL-PointGoal1D-v0", seed=7, total_timesteps=20_000,
    algo_kwargs={"buffer_capacity": 50_000, "her_ratio": 0.8, "goal_selection_strategy": "future"},
))
result = algo.learn()
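
The `her_ratio` and `"future"` settings above correspond to the standard hindsight relabeling idea: a fraction of sampled transitions have their goal replaced by a goal actually achieved later in the same episode. The sketch below is a minimal version of that idea; the transition layout (`achieved`, `goal`, `reward` keys), the sparse 0 / -1 reward, and the `relabel_future` helper are assumptions for illustration, not AxiomRL's buffer format.

```python
import random


def relabel_future(episode: list[dict], her_ratio: float, rng: random.Random) -> list[dict]:
    """Relabel a fraction of transitions with a goal achieved later in the episode."""
    relabeled = []
    for t, transition in enumerate(episode):
        new = dict(transition)  # copy so the original episode is left intact
        if rng.random() < her_ratio:
            future = rng.randint(t, len(episode) - 1)  # index at or after t
            new["goal"] = episode[future]["achieved"]
            # Sparse reward (assumed convention): 0 on success, -1 otherwise.
            new["reward"] = 0.0 if new["achieved"] == new["goal"] else -1.0
        relabeled.append(new)
    return relabeled


# Toy 1-D episode: the agent reaches 1, 2, 3 while the original goal was 10.
episode = [{"achieved": x, "goal": 10, "reward": -1.0} for x in (1, 2, 3)]
out = relabel_future(episode, her_ratio=0.8, rng=random.Random(7))
```

Even though the original goal was never reached, relabeled transitions can carry a success reward, which is what lets a sparse-reward task produce a learning signal.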
Contrib (Experimental)
from axiomrl.contrib import RecurrentPPO
from axiomrl import TrainConfig

algo = RecurrentPPO(TrainConfig(
    algo="recurrent_ppo", env_id="BreakoutNoFrameskip-v4", seed=42, total_timesteps=1_000_000,
))
result = algo.learn()

📁 Project Structure

axiomrl/
├── configs/                # Algorithm config YAML files
├── docs/                   # Documentation site (MkDocs)
├── examples/               # Reference training scripts
├── src/axiomrl/            # Main package
│   ├── algorithms/         #   Algorithm implementations (70+ files)
│   ├── api/                #   Public API wrappers
│   ├── contrib/            #   Experimental extensions
│   ├── data/               #   Buffers, datasets, samplers
│   ├── envs/               #   Env factory, wrappers
│   ├── experiment/         #   Config, checkpointing, logging
│   ├── models/             #   MLP, CNN, LSTM networks
│   ├── policies/           #   Policy abstractions
│   └── runtime/            #   Trainer, evaluator, collector
├── tests/                  # Test suite (100+ tests)
├── zoo/                    # Benchmark presets
└── pyproject.toml

📖 Documentation

| Topic | Link |
| --- | --- |
| Configuration schema | docs/config-schema.md |
| Scheduling (LR / entropy / epsilon / clip) | docs/configuration/scheduling.md |
| Offline RL | docs/guide/offline-rl.md |
| Pixel observations | docs/guide/pixel-observations.md |
| Zoo benchmarks | docs/guide/zoo-benchmarks.md |
| Compatibility & semver policy | docs/compatibility.md |
| Run artifacts | docs/run-artifacts.md |
| FAQ | docs/faq.md |

🤝 Contributing

Contributions are welcome — see CONTRIBUTING.md for the full guide.

pip install -e ".[dev]"        # install dev dependencies
pre-commit install             # set up git hooks
make lint                      # linting (ruff)
make typecheck                 # type checking (mypy)
make test-fast                 # unit tests
make test-integration          # integration tests

🗺 Roadmap

  • Core algorithms (PPO, DQN, SAC, TD3, …)
  • Offline RL stack (IQL, CQL, BC, BCQ, …)
  • Value-based expansion (Rainbow, C51, IQN, …)
  • Pixel-based control (DrQ, CURL, DrQ-v2)
  • Goal-conditioned (HER)
  • Model-based (PETS, MOPO, MBPO)
  • World Models (Dreamer, MuZero families)
  • Wider offline data mixing
  • Training budget controls
  • Benchmark validation
  • Distributed training
  • Multi-agent RL
  • Pre-trained model hub

🌐 Design Influences

| Library | Influence |
| --- | --- |
| Stable-Baselines3 | Stable algorithm core, common API, SB3-Contrib split |
| RL Baselines3 Zoo | Zoo layer, preset-based experiments |
| CleanRL | Readable single-file references, reproducibility focus |
| Tianshou | Modular runtime, collector / trainer / buffer design |

🔒 Semantic Versioning

AxiomRL follows Semantic Versioning for the stable core API. Breaking changes to axiomrl.core, the stable root exports, and TrainConfig contracts land only in major releases. See docs/compatibility.md for the full policy.

Deprecated APIs first ship with warnings and stay available for at least one minor release before removal.
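
In practice this policy means a downstream project only needs to review breaking changes when the major version moves. The check below is a minimal sketch of that rule; `parse_version` and `may_break_stable_api` are hypothetical helpers, and pre-release or build tags are deliberately not handled.

```python
def parse_version(text: str) -> tuple[int, int, int]:
    """Split "MAJOR.MINOR.PATCH" into integers (pre-release tags not handled)."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def may_break_stable_api(installed: str, candidate: str) -> bool:
    """Under semver, only a major-version bump may break the stable surface."""
    return parse_version(candidate)[0] > parse_version(installed)[0]


assert not may_break_stable_api("1.0.3", "1.4.0")  # minor bump: expected to be compatible
assert may_break_stable_api("1.0.3", "2.0.0")      # major bump: review before upgrading
```

In a requirements file the same intent is usually expressed as a range constraint such as `axiomrl>=1.0.3,<2`.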

📝 Citation

If you use AxiomRL in your research, please cite:

@software{axiomrl,
  title  = {AxiomRL: A Modular Reinforcement Learning Library},
  author = {skygazer42},
  url    = {https://github.com/skygazer42/axiomrl},
  year   = {2026},
}

📄 License

MIT — Copyright © 2026 skygazer42


Built with PyTorch · Gymnasium · TensorBoard
If you find AxiomRL useful, please consider giving it a ⭐ on GitHub.

Download files

Source distribution: axiomrl-1.0.3.tar.gz (3.2 MB)
Built distribution: axiomrl-1.0.3-py3-none-any.whl (617.9 kB, Python 3)

File details

Details for the file axiomrl-1.0.3.tar.gz.

File metadata

  • Download URL: axiomrl-1.0.3.tar.gz
  • Size: 3.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for axiomrl-1.0.3.tar.gz
Algorithm Hash digest
SHA256 24aaa1ca5d83498ddef43658b55ab090009a54d7ab439051579bfcb5b76466a7
MD5 01dbe4b514045718fdc219dd67c277ca
BLAKE2b-256 65dfbdbd37b7eb23b47c79a74997c87876370a0f29be141d9a467fd3b2dcc78a

Provenance

The following attestation bundles were made for axiomrl-1.0.3.tar.gz:

Publisher: publish.yml on skygazer42/axiomrl

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file axiomrl-1.0.3-py3-none-any.whl.

File metadata

  • Download URL: axiomrl-1.0.3-py3-none-any.whl
  • Size: 617.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for axiomrl-1.0.3-py3-none-any.whl
Algorithm Hash digest
SHA256 5582ccf9be5498c8dede330172da78ae5815c73ba807365ee7f075486cb5726a
MD5 7e8102204fbdfc5df0395b7a2b595e3e
BLAKE2b-256 908a73f4e275e4d4a320a81cb3bd643022cde7ba050afcf4042d3d64e1bd7a8f

Provenance

The following attestation bundles were made for axiomrl-1.0.3-py3-none-any.whl:

Publisher: publish.yml on skygazer42/axiomrl

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
