axiomrl

A stable-core reinforcement learning package with semver-governed APIs, experimental extensions, and benchmark workflows.

AxiomRL

A modular reinforcement learning library for research and production.
80+ algorithms · unified API · CLI-first workflow · Core / Contrib / Zoo

✨ Why AxiomRL

Feature	Description
80+ Algorithms	On-policy, off-policy, value-based, offline, model-based, world-model, and goal-conditioned methods, all in one package.
Unified API	Every algorithm shares the same TrainConfig → train / eval / resume contract.
CLI-first	axiomrl train, eval, resume, zoo, and tune, driven by declarative YAML with no boilerplate.
Modular Layers	Stable Core, experimental Contrib, and curated Zoo: pick the surface that matches your risk tolerance.
Reproducible	Deterministic seeding, checkpoint resume, multi-seed sweeps, semver-governed core.
Observable	TensorBoard out of the box, structured run artifacts, hyperparameter studies via Optuna.

🚀 Quick Start

pip install axiomrl

Python

from axiomrl import PPO, TrainConfig

algo = PPO(TrainConfig(
    algo="ppo",
    env_id="CartPole-v1",
    seed=42,
    total_timesteps=100_000,
    output_dir="runs/ppo-cartpole",
))
result = algo.learn()

CLI

# environment check
axiomrl doctor

# train · eval · resume
axiomrl train  --config configs/ppo/cartpole.yaml
axiomrl eval   --checkpoint runs/<run>/checkpoints/best.pt
axiomrl resume --checkpoint runs/<run>/checkpoints/step_<n>.pt

# multi-seed sweep
axiomrl train --config configs/ppo/cartpole.yaml --seeds 1,2,3
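The `--seeds 1,2,3` flag above implies one training run per seed. As a rough sketch of how such a sweep could be expanded, assuming a comma-separated seed string and one output directory per seed; `expand_seeds` and the `seed_<n>` directory layout are hypothetical and not taken from AxiomRL:

```python
from pathlib import PurePosixPath


def expand_seeds(seeds_arg: str, base_output_dir: str) -> list:
    """Turn a comma-separated seed string into one run spec per seed."""
    run_specs = []
    for token in seeds_arg.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate trailing commas such as "1,2,"
        seed = int(token)
        # Hypothetical layout: each seed writes to its own subdirectory.
        output_dir = PurePosixPath(base_output_dir) / f"seed_{seed}"
        run_specs.append({"seed": seed, "output_dir": str(output_dir)})
    return run_specs


specs = expand_seeds("1,2,3", "runs/ppo-cartpole")
for spec in specs:
    print(spec)
```

Keeping each seed in its own directory means checkpoints and logs from different seeds never overwrite each other, which is what makes later aggregation straightforward.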

📦 Installation

pip install axiomrl

Optional extras

pip install "axiomrl[atari]"         # Atari / ALE support
pip install "axiomrl[offline]"       # Minari offline datasets
pip install "axiomrl[tuning]"        # Optuna hyperparameter tuning
pip install "axiomrl[experimental]"  # Experimental namespace
pip install -e ".[dev]"              # Development install (from source)

Requirements: Python 3.10+ · PyTorch 2.0+ · Gymnasium · NumPy · PyYAML · TensorBoard

🧠 Stable Core API

AxiomRL keeps a small semver-governed surface for application engineers while still exposing a broader research playground.

Import stable algorithms and TrainConfig from axiomrl.core for production-facing workflows.
Use axiomrl.experimental when you want access to faster-moving research APIs.
Keep using axiomrl.contrib for extensions that sit outside the stable core contract.
Legacy root-level advanced imports remain available for now, but they are deprecated so downstream users can migrate before removal.

from axiomrl.core import PPO, TrainConfig
from axiomrl.experimental import DrQ
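One common way to keep a legacy import path alive while steering users elsewhere is a module-level `__getattr__` (PEP 562) that emits a `DeprecationWarning`. The sketch below illustrates that pattern on a toy module. It is not AxiomRL's actual implementation, and `toy_root`, `make_root_module`, and the stand-in `DrQ` class are all hypothetical.

```python
import types
import warnings


class DrQ:
    """Stand-in class used only for this illustration."""


def make_root_module() -> types.ModuleType:
    """Build a toy root module that forwards legacy names with a warning."""
    root = types.ModuleType("toy_root")
    legacy_names = {"DrQ": DrQ}  # hypothetical mapping of names that moved

    def __getattr__(name: str):
        # PEP 562: called only when normal attribute lookup on the module fails.
        if name in legacy_names:
            warnings.warn(
                f"toy_root.{name} is deprecated; import it from toy_root.experimental",
                DeprecationWarning,
                stacklevel=2,
            )
            return legacy_names[name]
        raise AttributeError(f"module 'toy_root' has no attribute {name!r}")

    root.__getattr__ = __getattr__
    return root


root = make_root_module()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy_cls = root.DrQ  # still resolves, but records a DeprecationWarning

print(legacy_cls is DrQ, caught[0].category.__name__)
```

Downstream projects can surface such warnings early by running their test suite with `python -W error::DeprecationWarning`.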

🏗 Architecture

AxiomRL three-layer architecture

Layer	Purpose	Examples
Core	Stable train / eval / resume for mainstream algorithms	PPO, DQN, SAC, TD3, IQL, CQL, BC
Contrib	Experimental extensions with additional dependencies	RecurrentPPO (LSTM), GAIL
Zoo	Named presets, benchmark manifests, launch recipes	Atari DQN/PPO, MuJoCo SAC presets

from axiomrl import PPO, DQN, SAC, IQL, HER, TrainConfig   # Core
from axiomrl.contrib import RecurrentPPO                    # Contrib

📚 Algorithms

AxiomRL implements 80+ algorithms across six categories.

🟦 On-Policy — PPO · A2C · TRPO · PPG · GAIL · IMPALA · APPO · RecurrentPPO · ARS · OpenAI ES

Algorithm	Type	Action Space
PPO	Policy gradient	Box, Discrete
A2C	Policy gradient	Box, Discrete
TRPO	Policy gradient	Box, Discrete
PPG	Policy gradient	Discrete
GAIL	Imitation + PG	Box, Discrete
IMPALA	Distributed AC	Discrete
APPO	Distributed AC	Discrete
RecurrentPPO	Recurrent PG (contrib)	Box, Discrete
ARS	Evolutionary	Box
OpenAI ES	Evolutionary	Box

🟪 Off-Policy (Continuous) — SAC · TD3 · DDPG · CrossQ · REDQ · TQC · D4PG · NAF · DrQ · CURL · DrQ-v2

Algorithm	Type	Action Space
SAC	Actor-Critic	Box
TD3	Actor-Critic	Box
DDPG	Actor-Critic	Box
CrossQ	Low-tuning AC	Box
REDQ	Ensemble AC	Box
TQC	Quantile AC	Box
D4PG	Distributed AC	Box
NAF	Q-learning	Box
Discrete SAC	Actor-Critic	Discrete
DrQ	Pixel-based (SAC)	Box
CURL	Pixel-based (contrastive)	Box
DrQ-v2	Pixel-based (TD3)	Box

🟧 Value-Based (Discrete) — DQN family · C51 · QR-DQN · IQN · FQF · Rainbow · R2D2 · Agent57 · SPR …

Algorithm	Type	Notes
DQN	Value-based	Classic
Double DQN	Value-based	Reduced overestimation
Dueling DQN	Value-based	Advantage decomposition
Noisy DQN	Exploration	Noisy networks
Prioritized DQN	Sampling	Prioritized replay
N-Step DQN	Multi-step	N-step returns
C51	Distributional	Categorical distribution
QR-DQN	Distributional	Quantile regression
IQN	Distributional	Implicit quantiles
FQF	Distributional	Fully parameterized quantile function
Rainbow DQN	Combined	Multi-enhancement combo
DRQN	Recurrent	LSTM Q-network
R2D2	Recurrent	Replay with recurrence
Agent57	Recurrent	Adaptive exploration
SPR	Self-predictive	Representation learning
and more…		Boltzmann · Mellowmax · Munchausen · CQL-DQN · Hysteretic …

🟨 Offline RL — BC · IQL · CQL · BCQ · BEAR · TD3+BC · AWR · AWAC · CRR · Cal-QL · EDAC · RLPD · XQL · ReBRAC · MARWIL

Algorithm	Type	Notes
BC	Imitation	Behavioral cloning
IQL	Value-based	Implicit Q-learning (in-sample)
CQL	Conservative	Conservative Q-learning
Cal-QL	Conservative	Calibrated CQL
BCQ	Constrained	Batch-constrained Q-learning
BEAR	Constrained	Support-matching AC
TD3+BC	Constrained	TD3 with BC penalty
CRR	Constrained	Critic-regularized regression
ReBRAC	Constrained	Behavior-regularized AC
AWR	Weighted	Advantage-weighted regression
AWAC	Weighted AC	Advantage-weighted AC
MARWIL	Weighted	RLlib-style weighted imitation
XQL	Value-based	Extreme value regression
EDAC	Ensemble	Ensemble-diversified AC
RLPD	Offline-to-online	Online SAC with prior offline data

🟩 Model-Based & World Models — PETS · MOPO · MBPO · Decision Transformer · Dreamer · DreamerV3 · Diamond · MuZero · EfficientZero …

Algorithm	Type	Notes
PETS	MPC	Ensemble dynamics + CEM
MOPO	Model-based offline	Pessimistic reward
MBPO	Model-based online	Dyna-style rollouts
Decision Transformer	Sequence model	Offline sequence decision
Dreamer	World model	Latent imagination
DreamerV3	World model	Symlog + discrete latent
Diamond	World model	Diffusion world model
MuZero	Planning	Learned model + tree search
Gumbel MuZero	Planning	Policy improvement with Gumbel
EfficientZero	Planning	Sample-efficient MuZero
ScaleZero	Planning	Scalable MuZero

🟫 Goal-Conditioned & Special — HER · PODreamer · EADream · MoW · JOWA · HorizonImagination

Algorithm	Type	Notes
HER	Goal-conditioned	Hindsight experience replay
PODreamer	World model	Partially observable Dreamer
EADream	World model	Energy-aware Dreamer
MoW	World model	Mixture of World models
JOWA	World model	Joint World-Action model
HorizonImagination	World model	Horizon-aware imagination

🔁 Training Workflow

graph LR
    A[YAML Config] --> B[TrainConfig]
    B --> C[Algorithm]
    C --> D{Training Loop}
    D --> E[Collector]
    D --> F[Buffer]
    D --> G[Evaluator]
    D --> H[TensorBoard]
    G -->|Early Stop?| I{Continue?}
    I -->|Yes| D
    I -->|No| J[Checkpoint]
    J --> K[axiomrl eval]
    J --> L[axiomrl resume]
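The control flow in the diagram can be read as a loop that collects data, stores it, evaluates, and then either continues or stops and writes a checkpoint. The toy sketch below mirrors only that shape. Every name in it (`run_training`, `target_score`, the linear score) is an assumption for illustration, and nothing here reflects AxiomRL's internal trainer.

```python
from collections import deque


def run_training(total_steps: int, eval_every: int, target_score: float) -> dict:
    """Toy loop: collect, store, evaluate, then stop early or continue."""
    buffer = deque(maxlen=1_000)  # Buffer: bounded transition store
    step = 0
    score = 0.0
    while step < total_steps:
        # Collector: gather up to eval_every placeholder transitions.
        for _ in range(min(eval_every, total_steps - step)):
            buffer.append({"step": step})
            step += 1
        # Evaluator: a made-up score that grows linearly with progress.
        score = 100.0 * step / total_steps
        if score >= target_score:
            break  # early stop: leave the loop before the budget is spent
    # Checkpoint: the state a later eval or resume would start from.
    return {"step": step, "score": score, "buffer_size": len(buffer)}


checkpoint = run_training(total_steps=10_000, eval_every=500, target_score=40.0)
print(checkpoint)
```

In this sketch the checkpoint is just a dictionary; a real one would also carry model and optimizer state so that a resumed run continues from the same step.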

⚙ Configuration

AxiomRL uses declarative YAML for full reproducibility:

algo: ppo
env_id: CartPole-v1
seed: 42
total_timesteps: 100000
output_dir: runs/ppo-cartpole
eval_episodes: 10
algo_kwargs:
  learning_rate: 0.0003
  n_steps: 2048
  batch_size: 64
  n_epochs: 10
  gamma: 0.99
  gae_lambda: 0.95
  clip_range: 0.2
  ent_coef: 0.01

Use axiomrl config --config <path> to preview the resolved config before training.
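To make "resolved config" concrete, here is a minimal sketch of how a parsed YAML mapping might be merged with overrides and validated before training. `TrainConfigSketch` and `resolve_config` are hypothetical names that mirror the fields shown above; the real `TrainConfig` may differ in fields, defaults, and validation.

```python
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class TrainConfigSketch:
    """Hypothetical mirror of the YAML fields shown above."""

    algo: str
    env_id: str
    seed: int = 0
    total_timesteps: int = 100_000
    output_dir: str = "runs/default"
    eval_episodes: int = 10
    algo_kwargs: dict = field(default_factory=dict)


def resolve_config(raw: dict, overrides: Optional[dict] = None) -> TrainConfigSketch:
    """Merge overrides into the raw mapping and validate the result."""
    overrides = overrides or {}
    merged = {**raw, **overrides}
    # Merge nested algo_kwargs key by key instead of replacing the whole mapping.
    merged["algo_kwargs"] = {
        **raw.get("algo_kwargs", {}),
        **overrides.get("algo_kwargs", {}),
    }
    config = TrainConfigSketch(**merged)  # unknown keys raise TypeError here
    if config.total_timesteps <= 0:
        raise ValueError("total_timesteps must be positive")
    return config


# The mapping a YAML loader would produce for a trimmed version of the file above.
raw = {
    "algo": "ppo",
    "env_id": "CartPole-v1",
    "seed": 42,
    "total_timesteps": 100000,
    "output_dir": "runs/ppo-cartpole",
    "eval_episodes": 10,
    "algo_kwargs": {"learning_rate": 0.0003, "clip_range": 0.2},
}
resolved = resolve_config(raw, {"seed": 7, "algo_kwargs": {"clip_range": 0.1}})
print(asdict(resolved))
```

Rejecting unknown keys at construction time catches typos such as `total_timestep` before any compute is spent.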

🎯 Zoo · Presets & Benchmarks

The Zoo layer provides curated presets and benchmark recipes:

axiomrl zoo --format commands                                        # list presets
axiomrl train --config zoo/atari/dqn_breakout.yaml                   # run preset
axiomrl zoo --format report --runs-dir runs                          # report
axiomrl zoo --format leaderboard --runs-dir runs --group-by preset   # leaderboard
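A leaderboard grouped by preset boils down to aggregating per-run results. The sketch below shows one plausible aggregation (mean return and run count per preset, sorted best first). The record fields, the preset names, and the numbers are made up for illustration and do not describe AxiomRL's run artifact format.

```python
from collections import defaultdict
from statistics import mean


def build_leaderboard(runs: list) -> list:
    """Group runs by preset and rank presets by mean return, best first."""
    returns_by_preset = defaultdict(list)
    for run in runs:
        returns_by_preset[run["preset"]].append(run["mean_return"])
    rows = [
        (preset, mean(returns), len(returns))
        for preset, returns in returns_by_preset.items()
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Placeholder records with invented values, one per (preset, seed) run.
runs = [
    {"preset": "preset_a", "seed": 1, "mean_return": 30.0},
    {"preset": "preset_a", "seed": 2, "mean_return": 34.0},
    {"preset": "preset_b", "seed": 1, "mean_return": 40.0},
]
leaderboard = build_leaderboard(runs)
for preset, avg_return, n_runs in leaderboard:
    print(f"{preset}\t{avg_return:.1f}\t{n_runs}")
```

Reporting the run count next to the mean makes it visible when a preset's rank rests on a single seed.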

🔬 Hyperparameter Tuning

axiomrl tune --config studies/ppo_cartpole_tune.yaml                 # launch study
axiomrl tune --resume-study runs/studies/ppo_cartpole_tune           # resume
axiomrl tune-report --study-dir runs/studies/ppo_cartpole_tune       # report
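AxiomRL delegates studies to Optuna. The sketch below does not use Optuna: it swaps in plain random search from the standard library, purely to illustrate what a study does (sample parameters, score each trial, keep the best). The search ranges and `toy_objective` are assumptions; in a real study the objective would be a training run's evaluation return.

```python
import math
import random


def toy_objective(learning_rate: float, clip_range: float) -> float:
    """Made-up score that peaks at learning_rate=3e-4 and clip_range=0.2."""
    lr_term = (math.log10(learning_rate) - math.log10(3e-4)) ** 2
    clip_term = (clip_range - 0.2) ** 2
    return -(lr_term + clip_term)


def random_search(n_trials: int, seed: int) -> dict:
    """Sample n_trials parameter sets and return the best-scoring trial."""
    rng = random.Random(seed)
    trials = []
    for number in range(n_trials):
        params = {
            # Log-uniform sample between 1e-5 and 1e-2.
            "learning_rate": 10 ** rng.uniform(-5.0, -2.0),
            "clip_range": rng.uniform(0.1, 0.3),
        }
        value = toy_objective(**params)
        trials.append({"number": number, "params": params, "value": value})
    return max(trials, key=lambda trial: trial["value"])


best = random_search(n_trials=50, seed=0)
print(best["number"], best["params"], round(best["value"], 4))
```

Sampling the learning rate on a log scale spreads trials evenly across orders of magnitude, which usually matters more than fine resolution within one decade.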

💡 Programmatic Examples

Offline RL (IQL)

from axiomrl import IQL, TrainConfig
from axiomrl.data import export_random_transition_dataset

# Generate a small offline dataset: 5,000 random transitions on Pendulum-v1, saved as .npz.
export_random_transition_dataset("Pendulum-v1", "data/pendulum.npz", num_steps=5000, seed=7)

algo = IQL(TrainConfig(
    algo="iql", env_id="Pendulum-v1", seed=7, total_timesteps=20_000,
    algo_kwargs={"dataset_kind": "npz", "dataset_path": "data/pendulum.npz"},
))
result = algo.learn()

Goal-Conditioned (HER)

from axiomrl import HER, TrainConfig

algo = HER(TrainConfig(
    algo="her", env_id="RL-PointGoal1D-v0", seed=7, total_timesteps=20_000,
    algo_kwargs={"buffer_capacity": 50_000, "her_ratio": 0.8, "goal_selection_strategy": "future"},
))
result = algo.learn()
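The `her_ratio` and `goal_selection_strategy="future"` options above correspond to the standard hindsight relabeling idea: some stored transitions have their goal replaced by a goal that was actually achieved later in the same episode, and the reward is recomputed. A minimal sketch on a 1D toy episode follows. It assumes `her_ratio` is the probability of relabeling a transition and uses a sparse 0 / -1 reward with a made-up tolerance; `relabel_future` is a hypothetical helper, not the library's buffer code.

```python
import random


def relabel_future(episode: list, her_ratio: float, rng: random.Random) -> list:
    """Return a copy of the episode with some goals swapped for future achieved goals."""
    relabeled = []
    last = len(episode) - 1
    for t, transition in enumerate(episode):
        new_transition = dict(transition)
        if rng.random() < her_ratio:
            # "future" strategy: take an achieved goal from this step or a later one.
            future_t = rng.randint(t, last)
            new_transition["goal"] = episode[future_t]["achieved_goal"]
        # Sparse reward: 0 when the achieved goal is within tolerance of the goal, else -1.
        distance = abs(new_transition["achieved_goal"] - new_transition["goal"])
        new_transition["reward"] = 0.0 if distance < 0.05 else -1.0
        relabeled.append(new_transition)
    return relabeled


# A failed episode: the agent moves right but never reaches the goal at 1.0.
positions = [0.1, 0.2, 0.3, 0.4, 0.5]
episode = [{"achieved_goal": p, "goal": 1.0, "reward": -1.0} for p in positions]

hindsight = relabel_future(episode, her_ratio=1.0, rng=random.Random(7))
for transition in hindsight:
    print(transition)
```

With relabeling, an episode that earned no reward under the original goal still yields transitions with reward 0, which gives the learner a usable signal in sparse-reward tasks.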

Contrib (Experimental)

from axiomrl.contrib import RecurrentPPO
from axiomrl import TrainConfig

# Atari environments need the optional extra: pip install "axiomrl[atari]"
algo = RecurrentPPO(TrainConfig(
    algo="recurrent_ppo", env_id="BreakoutNoFrameskip-v4", seed=42, total_timesteps=1_000_000,
))
result = algo.learn()

📁 Project Structure

axiomrl/
├── configs/                # Algorithm config YAML files
├── docs/                   # Documentation site (MkDocs)
├── examples/               # Reference training scripts
├── src/axiomrl/            # Main package
│   ├── algorithms/         #   Algorithm implementations (70+ files)
│   ├── api/                #   Public API wrappers
│   ├── contrib/            #   Experimental extensions
│   ├── data/               #   Buffers, datasets, samplers
│   ├── envs/               #   Env factory, wrappers
│   ├── experiment/         #   Config, checkpointing, logging
│   ├── models/             #   MLP, CNN, LSTM networks
│   ├── policies/           #   Policy abstractions
│   └── runtime/            #   Trainer, evaluator, collector
├── tests/                  # Test suite (100+ tests)
├── zoo/                    # Benchmark presets
└── pyproject.toml

📖 Documentation

Topic	Link
Configuration schema	`docs/config-schema.md`
Scheduling (LR / entropy / epsilon / clip)	`docs/configuration/scheduling.md`
Offline RL	`docs/guide/offline-rl.md`
Pixel observations	`docs/guide/pixel-observations.md`
Zoo benchmarks	`docs/guide/zoo-benchmarks.md`
Compatibility & semver policy	`docs/compatibility.md`
Run artifacts	`docs/run-artifacts.md`
FAQ	`docs/faq.md`

🤝 Contributing

Contributions are welcome — see CONTRIBUTING.md for the full guide.

pip install -e ".[dev]"        # install dev dependencies
pre-commit install             # set up git hooks
make lint                      # linting (ruff)
make typecheck                 # type checking (mypy)
make test-fast                 # unit tests
make test-integration          # integration tests

🗺 Roadmap

Core algorithms (PPO, DQN, SAC, TD3, …)
Offline RL stack (IQL, CQL, BC, BCQ, …)
Value-based expansion (Rainbow, C51, IQN, …)
Pixel-based control (DrQ, CURL, DrQ-v2)
Goal-conditioned (HER)
Model-based (PETS, MOPO, MBPO)
World Models (Dreamer, MuZero families)
Wider offline data mixing
Training budget controls
Benchmark validation
Distributed training
Multi-agent RL
Pre-trained model hub

🌐 Design Influences

Library	Influence
Stable-Baselines3	Stable algorithm core, common API, SB3-Contrib split
RL Baselines3 Zoo	Zoo layer, preset-based experiments
CleanRL	Readable single-file references, reproducibility focus
Tianshou	Modular runtime, collector / trainer / buffer design

🔒 Semantic Versioning

AxiomRL follows Semantic Versioning for the stable core API. Breaking changes to axiomrl.core, the stable root exports, and TrainConfig contracts land only in major releases. See docs/compatibility.md for the full policy.

Deprecated APIs first ship with warnings and stay available for at least one minor release before removal.
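Under the policy above, an upgrade that stays within the same major version should not break code that relies only on the stable core. The helper below sketches that check for plain `MAJOR.MINOR.PATCH` strings. `is_safe_upgrade` is a hypothetical name, and the sketch ignores pre-release and build suffixes.

```python
def parse_version(text: str) -> tuple:
    """Parse 'MAJOR.MINOR.PATCH' into a tuple of three integers."""
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def is_safe_upgrade(installed: str, candidate: str) -> bool:
    """True when the candidate keeps the same major version and is not older."""
    old = parse_version(installed)
    new = parse_version(candidate)
    return new[0] == old[0] and new >= old


for candidate in ("1.0.4", "1.4.0", "2.0.0", "1.0.2"):
    print(candidate, is_safe_upgrade("1.0.3", candidate))
```

The same rule is what a dependency pin such as `axiomrl>=1.0.3,<2` expresses in a requirements file.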

📝 Citation

If you use AxiomRL in your research, please cite:

@software{axiomrl,
  title  = {AxiomRL: A Modular Reinforcement Learning Library},
  author = {skygazer42},
  url    = {https://github.com/skygazer42/axiomrl},
  year   = {2026},
}

📄 License

Built with PyTorch · Gymnasium · TensorBoard
If you find AxiomRL useful, please consider giving it a ⭐ on GitHub.

This version

1.0.3

May 1, 2026

Download files

Source Distribution

axiomrl-1.0.3.tar.gz (3.2 MB)

Uploaded May 1, 2026 Source

Built Distribution

axiomrl-1.0.3-py3-none-any.whl (617.9 kB)

Uploaded May 1, 2026 Python 3

File details

Details for the file axiomrl-1.0.3.tar.gz.

File metadata

Download URL: axiomrl-1.0.3.tar.gz
Upload date: May 1, 2026
Size: 3.2 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for axiomrl-1.0.3.tar.gz
Algorithm	Hash digest
SHA256	`24aaa1ca5d83498ddef43658b55ab090009a54d7ab439051579bfcb5b76466a7`
MD5	`01dbe4b514045718fdc219dd67c277ca`
BLAKE2b-256	`65dfbdbd37b7eb23b47c79a74997c87876370a0f29be141d9a467fd3b2dcc78a`
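A downloaded archive can be checked against the SHA256 digest listed above before installing it. The sketch below uses only the standard library. `sha256_of_file` and `matches_published_hash` are hypothetical helper names, and the demonstration hashes a throwaway temporary file because the real archive is not bundled here.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected_hex: str) -> bool:
    """Compare the file's SHA256 with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Demonstration on a temporary file with known contents.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
demo_path = tmp.name
demo_ok = matches_published_hash(demo_path, hashlib.sha256(b"hello").hexdigest())
demo_bad = matches_published_hash(demo_path, "0" * 64)
os.remove(demo_path)
print(demo_ok, demo_bad)
```

For the real file, pass the path of the downloaded `axiomrl-1.0.3.tar.gz` and the SHA256 value from the table above.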

Provenance

The following attestation bundles were made for axiomrl-1.0.3.tar.gz:

Publisher: publish.yml on skygazer42/axiomrl

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: axiomrl-1.0.3.tar.gz
- Subject digest: 24aaa1ca5d83498ddef43658b55ab090009a54d7ab439051579bfcb5b76466a7
- Sigstore transparency entry: 1417115447
- Sigstore integration time: May 1, 2026
Source repository:
- Permalink: skygazer42/axiomrl@c07ff4de15c119ac62c74cdbb246507c03fa326d
- Branch / Tag: refs/tags/v1.0.3
- Owner: https://github.com/skygazer42
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@c07ff4de15c119ac62c74cdbb246507c03fa326d
- Trigger Event: push

File details

Details for the file axiomrl-1.0.3-py3-none-any.whl.

File metadata

Download URL: axiomrl-1.0.3-py3-none-any.whl
Upload date: May 1, 2026
Size: 617.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for axiomrl-1.0.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5582ccf9be5498c8dede330172da78ae5815c73ba807365ee7f075486cb5726a`
MD5	`7e8102204fbdfc5df0395b7a2b595e3e`
BLAKE2b-256	`908a73f4e275e4d4a320a81cb3bd643022cde7ba050afcf4042d3d64e1bd7a8f`

Provenance

The following attestation bundles were made for axiomrl-1.0.3-py3-none-any.whl:

Publisher: publish.yml on skygazer42/axiomrl

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: axiomrl-1.0.3-py3-none-any.whl
- Subject digest: 5582ccf9be5498c8dede330172da78ae5815c73ba807365ee7f075486cb5726a
- Sigstore transparency entry: 1417115462
- Sigstore integration time: May 1, 2026
Source repository:
- Permalink: skygazer42/axiomrl@c07ff4de15c119ac62c74cdbb246507c03fa326d
- Branch / Tag: refs/tags/v1.0.3
- Owner: https://github.com/skygazer42
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@c07ff4de15c119ac62c74cdbb246507c03fa326d
- Trigger Event: push
