agilerl

AgileRL is a deep reinforcement learning library focused on improving RL development through RLOps.

These details have been verified by PyPI

Owner

AgileRL

Maintainers

jaimesabalbermudez micdoh mikepratt1999 nicku-a


Project description

Reinforcement learning streamlined.
Easier and faster reinforcement learning with RLOps. Visit our website. View documentation.
Join the Discord Server for questions, help and collaboration.

🚀 Train super-fast for free on Arena, the RLOps platform from AgileRL 🚀

AgileRL is a Deep Reinforcement Learning library focused on improving development by introducing RLOps - MLOps for reinforcement learning.

This library is initially focused on reducing the time taken for training models and hyperparameter optimization (HPO) by pioneering evolutionary HPO techniques for reinforcement learning.
Evolutionary HPO has been shown to drastically reduce overall training times by automatically converging on optimal hyperparameters, without requiring numerous training runs.
We are constantly adding more algorithms and features. AgileRL already includes state-of-the-art evolvable on-policy, off-policy, offline, multi-agent and contextual multi-armed bandit reinforcement learning algorithms with distributed training.

AgileRL offers 10x faster hyperparameter optimization than popular RL training frameworks combined with Optuna.

Benchmarks
Get Started
Training
Arena
Tutorials
Algorithms
Citing AgileRL

Benchmarks

Reinforcement learning algorithms and libraries are usually benchmarked once the optimal hyperparameters for training are known, but it often takes hundreds or thousands of experiments to discover these. This is unrealistic and does not reflect the true, total time taken for training. What if we could remove the need to conduct all these prior experiments?

In the charts below, a single AgileRL run, which automatically tunes hyperparameters, is benchmarked against Optuna's multiple training runs traditionally required for hyperparameter optimization, demonstrating the real time savings possible. Global steps is the sum of every step taken by any agent in the environment, including across an entire population.
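The "global steps" metric can be tallied for a single population-based run and for a conventional sweep of independent runs. The sketch below does this. The `Run` record and the `global_steps` helper are hypothetical illustrations, not AgileRL APIs. The numbers in the usage lines are made up to show the arithmetic and are not benchmark results.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One training run (hypothetical record for illustration)."""
    agents: int           # agents trained in this run: population size, or 1
    steps_per_agent: int  # environment steps taken by each agent


def global_steps(runs):
    """Sum of every environment step taken by any agent, across all runs."""
    return sum(run.agents * run.steps_per_agent for run in runs)


# Hypothetical numbers: one evolutionary run with a population of 4 agents ...
evolutionary = [Run(agents=4, steps_per_agent=250_000)]
# ... versus a sweep of 10 independent single-agent trials.
sweep = [Run(agents=1, steps_per_agent=1_000_000) for _ in range(10)]

print(global_steps(evolutionary))  # 1000000
print(global_steps(sweep))         # 10000000
```

Counting this way charges the population-based run for every agent's experience. That is what makes the comparison with a multi-run sweep fair.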

AgileRL offers an order of magnitude speed up in hyperparameter optimization vs popular reinforcement learning training frameworks combined with Optuna. Remove the need for multiple training runs and save yourself hours.

AgileRL also supports multi-agent reinforcement learning using the PettingZoo-style parallel API. The charts below highlight the performance of our MADDPG and MATD3 algorithms with evolutionary hyperparameter optimization (HPO). They are benchmarked against EPyMARL's MADDPG algorithm with grid-search HPO on the simple speaker listener and simple spread environments.

Get Started

To see the full AgileRL documentation, including tutorials, visit our documentation site. To ask questions and get help, collaborate, or discuss anything related to reinforcement learning, join the AgileRL Discord Server.

Install as a package with pip:

pip install agilerl

Or install in development mode:

git clone https://github.com/AgileRL/AgileRL.git && cd AgileRL
pip install -e .

AgileRL ships optional dependency groups that you can install as needed:

Installation	Description
`agilerl[box2d]`	Box2D physics engine for Gymnasium environments
`agilerl[arena]`	Arena SDK & CLI. Validate custom environments, and train & deploy agents on managed cloud infrastructure.
`agilerl[llm]`	LLM reinforcement fine-tuning.
`agilerl[all]`	Installs all optional dependencies, covering every AgileRL feature.

In development mode, quote the extras so that shells such as zsh do not interpret the square brackets:

pip install -e ".[arena]"

To install the nightly version of AgileRL with the latest features, use:

pip install git+https://github.com/AgileRL/AgileRL.git@nightly
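After installing from PyPI, in development mode, or from the nightly branch, you can confirm which distribution version Python actually sees. This sketch uses only the standard library's `importlib.metadata`. The `installed_version` helper is a hypothetical name of our own. The exact version string that a nightly or editable install reports depends on the project's build configuration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Distribution names as used with pip on this page.
for dist in ("agilerl", "agilerl-arena"):
    found = installed_version(dist)
    print(f"{dist}: {found if found is not None else 'not installed'}")
```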

Training Locally

AgileRL provides the tools to train RL algorithms in a variety of ways, focusing on flexibility and modularity as a stepping stone for efficiently training arbitrarily large populations of agents in a distributed manner on Arena.

Training a Single Agent without Evolutionary HPO

The simplest way to train an RL agent with AgileRL is through the LocalTrainer. Here is an example of training a DQN agent on the LunarLander-v3 environment:

from agilerl.training.trainer import LocalTrainer

trainer = LocalTrainer(algorithm="DQN", environment="LunarLander-v3")
population, fitnesses = trainer.train()

With no other arguments provided, LocalTrainer defaults to 1,000,000 steps with a single agent and the algorithm's default hyperparameters — no evolutionary HPO is applied.

Training a Population with Evolutionary HPO

To unlock AgileRL's evolutionary hyperparameter optimization, train a population of agents whose hyperparameters will evolve and mutate towards their optimal values:

from agilerl import LocalTrainer
from agilerl.models import TrainingSpec

trainer = LocalTrainer(
    algorithm="DQN",
    environment="LunarLander-v3",
    training=TrainingSpec(pop_size=4), # Train four agents simultaneously
    hpo=True, # Enable evolutionary HPO using default settings
)
population, fitnesses = trainer.train()

This trains a population of four DQN agents that share experiences but learn individually. Every 10,000 steps (default value for evo_steps in TrainingSpec), tournament selection identifies the best performers and mutations are applied to explore the hyperparameter space. See Evolutionary Hyperparameter Optimization for details on how evolutionary HPO works in AgileRL.
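The selection-and-mutation cycle described above can be illustrated with a self-contained toy. This is a minimal sketch, not AgileRL's implementation, and it rests on several assumptions of our own:

- Each "agent" is just a dict holding a learning rate.
- The `fitness` function is a hypothetical stand-in for evaluating an agent after `evo_steps`.
- The elite is kept unmutated.
- The learning-rate mutation multiplies by a log-normal factor and clamps to the `lr` bounds used in the example manifest. This rule is assumed, not AgileRL's.

```python
import math
import random
from copy import deepcopy


def fitness(hp):
    """Hypothetical toy objective standing in for an evaluation score: peaks at lr = 1e-3."""
    return -(math.log10(hp["lr"]) + 3.0) ** 2


def tournament_selection(pop, fits, tournament_size, elitism, rng):
    """Build a new population of the same size from tournament winners."""
    n = len(pop)
    new_pop = []
    if elitism:
        # Copy the single best member straight into the next generation.
        elite = max(range(n), key=fits.__getitem__)
        new_pop.append(deepcopy(pop[elite]))
    while len(new_pop) < n:
        # Pick `tournament_size` distinct members at random; the fittest wins.
        contenders = rng.sample(range(n), tournament_size)
        winner = max(contenders, key=fits.__getitem__)
        new_pop.append(deepcopy(pop[winner]))
    return new_pop


def mutate_lr(hp, rng, lr_min, lr_max, log_sd):
    """Scale lr by a random log-normal factor and clamp it to [lr_min, lr_max]."""
    child = dict(hp)
    child["lr"] = min(lr_max, max(lr_min, hp["lr"] * math.exp(rng.gauss(0.0, log_sd))))
    return child


def evolve(pop, generations, rng, tournament_size=2, elitism=True,
           p_mutate=0.8, lr_min=6.25e-5, lr_max=0.01, log_sd=0.3):
    """Run selection + mutation; return final population and best fitness per generation."""
    best_per_gen = []
    for _ in range(generations):
        fits = [fitness(hp) for hp in pop]
        best_per_gen.append(max(fits))
        pop = tournament_selection(pop, fits, tournament_size, elitism, rng)
        first = 1 if elitism else 0  # index 0 holds the elite; leave it untouched
        pop = pop[:first] + [
            mutate_lr(hp, rng, lr_min, lr_max, log_sd) if rng.random() < p_mutate else hp
            for hp in pop[first:]
        ]
    return pop, best_per_gen


rng = random.Random(42)
population = [{"lr": rng.uniform(6.25e-5, 0.01)} for _ in range(4)]
final_pop, best_per_gen = evolve(population, generations=20, rng=rng)
print(f"best fitness: {best_per_gen[0]:.3f} -> {best_per_gen[-1]:.3f}")
```

With elitism enabled, the best score can never decrease from one generation to the next, because the elite is carried over unchanged. Tournament selection biases the remaining slots toward fitter members, while mutation keeps exploring the hyperparameter space.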

Or via a YAML manifest:

DQN-LunarLander-v3 manifest (configs/training/dqn/dqn.yaml)

---
algorithm:
    name: DQN
    batch_size: 128
    lr: 6.3e-4
    learn_step: 4
    gamma: 0.99
    tau: 0.001
    double: false
    cudagraphs: false

environment:
    name: LunarLander-v3
    num_envs: 16

mutation:
    probabilities:
        no_mut: 0.4
        arch_mut: 0.2
        new_layer: 0.2
        params_mut: 0.2
        act_mut: 0.2
        rl_hp_mut: 0.2
    rl_hp_selection:
        lr:
            min: 0.0000625
            max: 0.01
        batch_size:
            min: 8
            max: 512
        learn_step:
            min: 1
            max: 10
    mutation_sd: 0.1
    rand_seed: 42

network:
    latent_dim: 128
    arch: mlp
    encoder_config:
        hidden_size:
            - 128
    head_config:
        hidden_size:
            - 128

replay_buffer:
    max_size: 100_000

tournament_selection:
    tournament_size: 2
    elitism: true

training:
    max_steps: 1_000_000
    target_score: 200.0
    pop_size: 4
    evo_steps: 10_000
    eval_steps:
    eval_loop: 1
    learning_delay: 0
    eps_start: 1.0
    eps_end: 0.1
    eps_decay: 0.99

Python

from agilerl import LocalTrainer

trainer = LocalTrainer.from_manifest("configs/training/dqn/dqn.yaml")
population, fitnesses = trainer.train()

CLI

python -m agilerl.train configs/training/dqn/dqn.yaml
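The `eps_start`, `eps_end` and `eps_decay` fields in the manifest describe an epsilon-greedy exploration schedule. The sketch below assumes multiplicative decay, `epsilon = max(eps_end, epsilon * eps_decay)`, applied once per update. How often AgileRL applies that update during training is not stated here, so treat the update counts as illustrative. The helper names are hypothetical.

```python
import math


def epsilon_schedule(eps_start, eps_end, eps_decay, n_updates):
    """Epsilon values after 0, 1, ..., n_updates multiplicative decay steps."""
    eps = eps_start
    values = [eps]
    for _ in range(n_updates):
        eps = max(eps_end, eps * eps_decay)  # decay, but never below eps_end
        values.append(eps)
    return values


def updates_until_floor(eps_start, eps_end, eps_decay):
    """Closed form: smallest n with eps_start * eps_decay**n <= eps_end."""
    return math.ceil(math.log(eps_end / eps_start) / math.log(eps_decay))


# Values from the example manifest: eps_start=1.0, eps_end=0.1, eps_decay=0.99.
print(updates_until_floor(1.0, 0.1, 0.99))        # 230
print(epsilon_schedule(1.0, 0.1, 0.99, 300)[-1])  # 0.1
```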

Every aspect of the training pipeline is customisable — from modifying hyperparameters and mutation strategies in our off-the-shelf tools, to implementing your own evolvable algorithms, network architectures, and training loops.
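The `mutation.probabilities` weights in the example manifest do not sum to one, so it helps to see how such weights can be turned into a choice of mutation. This minimal sketch makes two assumptions, both our reading rather than confirmed behaviour:

- The per-type weights are normalised before sampling.
- `new_layer` is not a competing option. Instead, it is the chance that an architecture mutation adds a layer rather than making some other architectural change.

The helper names and the `arch_mut/...` labels are hypothetical.

```python
import random
from collections import Counter

# Weights copied from the example manifest's `mutation.probabilities` block.
WEIGHTS = {"no_mut": 0.4, "arch_mut": 0.2, "params_mut": 0.2, "act_mut": 0.2, "rl_hp_mut": 0.2}
NEW_LAYER_PROB = 0.2  # assumed: P(add layer | architecture mutation)


def normalise(weights):
    """Rescale non-negative weights so they sum to 1."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def sample_mutation(rng, weights, new_layer_prob):
    """Draw one mutation type; split architecture mutations into new-layer vs other."""
    kind = rng.choices(list(weights), weights=list(weights.values()), k=1)[0]
    if kind == "arch_mut":
        kind = "arch_mut/new_layer" if rng.random() < new_layer_prob else "arch_mut/other"
    return kind


probs = normalise(WEIGHTS)
print({name: round(p, 3) for name, p in probs.items()})
# {'no_mut': 0.333, 'arch_mut': 0.167, 'params_mut': 0.167, 'act_mut': 0.167, 'rl_hp_mut': 0.167}

rng = random.Random(0)
counts = Counter(sample_mutation(rng, probs, NEW_LAYER_PROB) for _ in range(60_000))
```

Python's `random.choices` would accept the raw weights directly, since it treats them as relative weights. Normalising explicitly just makes the effective probabilities visible: here `no_mut` ends up at one third rather than 0.4.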

Custom Training Pipelines

For full control over training, you can build each component individually:

Custom RL pipeline example

import torch

from agilerl.algorithms import DQN
from agilerl.utils.utils import make_vect_envs
from agilerl.components.replay_buffer import ReplayBuffer
from agilerl.hpo.tournament import TournamentSelection
from agilerl.hpo.mutation import Mutations
from agilerl.training.train_off_policy import train_off_policy

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Initialize environment
env = make_vect_envs(env_name="LunarLander-v3", num_envs=16)

# Network configuration
net_config = {
    "latent_dim": 64,
    "encoder_config": {"hidden_size": [64]},
    "head_config": {"hidden_size": [64]}
}

# Algorithm hyperparameters
init_hp = {
    "double": True,
    "batch_size": 256,
    "lr": 1e-3,
    "gamma": 0.99,
    "learn_step": 1,
    "tau": 1e-3
}

# Create a population of DQN agents
population_size = 6
agent_pop = DQN.population(
    size=population_size,
    observation_space=env.single_observation_space,
    action_space=env.single_action_space,
    net_config=net_config,
    device=device,
    **init_hp
)

# Replay buffer
memory = ReplayBuffer(max_size=10_000, device=device)

# Evolutionary HPO
tournament = TournamentSelection(
    tournament_size=2,
    elitism=True,
    population_size=population_size
)
mutations = Mutations(
    no_mutation=0.4,
    architecture=0.2,
    new_layer_prob=0.2,
    parameters=0.2,
    activation=0.0,
    rl_hp=0.2,
    mutation_sd=0.1,
    rand_seed=42,
    device=device,
)

trained_pop, pop_fitnesses = train_off_policy(
    env=env,
    env_name="LunarLander-v3",
    algo="DQN",
    pop=agent_pop,
    memory=memory,
    max_steps=1_000_000,
    evo_steps=10_000,
    target=200.0,
    tournament=tournament,
    mutation=mutations,
)

This approach gives you the flexibility to swap in your own Gymnasium or PettingZoo environments, custom evolvable networks, or entirely custom training loops while still leveraging AgileRL's evolutionary HPO.

Training on Arena

Arena is the RLOps platform from AgileRL. We provide tools to create and validate custom reinforcement learning environments on the platform and train RL agents on managed cloud infrastructure specifically tailored to RL workloads.

AgileRL ships a Python SDK and a CLI for interacting with the platform through the agilerl-arena package. It is a separate PyPI distribution that contributes the agilerl.arena namespace. Install it directly, or via the AgileRL extra:

pip install agilerl-arena
# or
pip install "agilerl[arena]"

Python

Use the ArenaClient to interact with Arena programmatically from scripts or notebooks:

from agilerl.arena import ArenaClient

client = ArenaClient()
client.login()

# Register and validate a custom environment
client.validate_environment(source="path/to/my_env.py")

# Train on validated custom environment
client.submit_experiment(
    manifest="path/to/manifest.yaml",
    project="my-project",
)

Arena CLI

The same operations are available from the command line:

# Authenticate with Arena
arena login

# Upload and validate
arena env validate --source path/to/my_env.py

# Train on validated custom environment
arena experiments submit path/to/manifest.yaml --project my-project

For the full CLI and Python SDK reference—including authentication, environment validation, experiments, and deployment—see the Arena Client documentation.

Tutorials

We are constantly updating our tutorials to showcase the latest features of AgileRL and how users can leverage our evolutionary HPO to achieve 10x faster hyperparameter optimization. Please see the available tutorials below.

Tutorial Type	Description	Tutorials
Single-agent tasks	Guides for training both on- and off-policy agents to beat a variety of Gymnasium environments.	PPO - Acrobot, TD3 - Lunar Lander, Rainbow DQN - CartPole, Recurrent PPO - Masked Pendulum
Multi-agent tasks	Guides for training agents in PettingZoo environments, such as DQN learning to play Connect Four with curriculum learning and self-play, and agents solving multi-agent tasks in MPE environments.	DQN - Connect Four, MADDPG - Space Invaders, MATD3 - Speaker Listener
Hierarchical curriculum learning	Shows how to teach agents skills and combine them to achieve an end goal.	PPO - Lunar Lander
Contextual multi-armed bandits	Learn to make the correct decision in environments that have only one timestep.	NeuralUCB - Iris Dataset, NeuralTS - PenDigits
Custom Modules & Networks	Learn how to create custom evolvable modules and networks for RL algorithms.	Dueling Distributional Q Network, EvolvableSimBa
Training on Arena	Upload and validate custom environments, submit training jobs on managed cloud infrastructure, and deploy trained agents for inference.	PPO - Acrobot Custom Environment
LLM Fine-tuning	Learn how to fine-tune an LLM using AgileRL.	GRPO

Evolvable Algorithms (more coming soon!)

Single-agent

RL	Algorithm
On-Policy	Proximal Policy Optimization (PPO)
Off-Policy	Deep Q-Learning (DQN), Rainbow DQN, Deep Deterministic Policy Gradient (DDPG), Twin Delayed Deep Deterministic Policy Gradient (TD3)
Offline	Conservative Q-Learning (CQL), Implicit Language Q-Learning (ILQL)

Multi-agent

RL	Algorithm
Multi-agent	Multi-Agent Deep Deterministic Policy Gradient (MADDPG), Multi-Agent Twin-Delayed Deep Deterministic Policy Gradient (MATD3), Independent Proximal Policy Optimization (IPPO)

Contextual multi-armed bandit

RL	Algorithm
Bandits	Neural Contextual Bandits with UCB-based Exploration (NeuralUCB), Neural Contextual Bandits with Thompson Sampling (NeuralTS)

LLM Fine-tuning

RL	Algorithm
On-Policy	Group Relative Policy Optimization (GRPO), Clipped Importance Sampling Policy Optimization (CISPO), Group Sequence Policy Optimization (GSPO), LLM Proximal Policy Optimization (LLM PPO), LLM REINFORCE
Off-Policy	Direct Preference Optimization (DPO)

Citing AgileRL

If you use AgileRL in your work, please cite the repository:

@software{Ustaran-Anderegg_AgileRL,
author = {Ustaran-Anderegg, Nicholas and Pratt, Michael and Sabal-Bermudez, Jaime},
license = {Apache-2.0},
title = {{AgileRL}},
url = {https://github.com/AgileRL/AgileRL}
}


Release history

This version

2.8.0.dev0 pre-release

Jun 22, 2026

2.7.1

Jun 23, 2026

2.7.0

May 15, 2026

2.7.0.dev3 pre-release

May 13, 2026

2.7.0.dev2 pre-release

May 6, 2026

2.7.0.dev1 pre-release

Apr 30, 2026

2.7.0.dev0 pre-release

Apr 20, 2026

2.6.1

Mar 19, 2026

2.6.0

Mar 18, 2026

2.5.1.dev0 pre-release

Mar 17, 2026

2.5.0

Mar 3, 2026

2.5.0.dev5 pre-release

Feb 27, 2026

2.5.0.dev4 pre-release

Feb 24, 2026

2.5.0.dev3 pre-release

Feb 23, 2026

2.5.0.dev2 pre-release

Feb 23, 2026

2.5.0.dev1 pre-release

Feb 18, 2026

2.5.0.dev0 pre-release

Feb 16, 2026

2.4.3

Feb 12, 2026

2.4.3.dev1 pre-release

Feb 10, 2026

2.4.3.dev0 pre-release

Feb 6, 2026

2.4.2

Feb 6, 2026

2.4.2.dev1 pre-release

Feb 3, 2026

2.4.2.dev0 pre-release

Feb 3, 2026

2.4.1

Jan 15, 2026

2.4.1.dev3 pre-release

Dec 18, 2025

2.4.1.dev2 pre-release

Dec 18, 2025

2.4.1.dev1 pre-release

Nov 20, 2025

2.4.1.dev0 pre-release

Nov 3, 2025

2.4.0

Nov 10, 2025

2.4.0.dev0 pre-release

Oct 30, 2025

2.3.5

Oct 16, 2025

2.3.5.dev1 pre-release

Oct 13, 2025

2.3.5.dev0 pre-release

Oct 9, 2025

2.3.4

Sep 5, 2025

2.3.4.dev2 pre-release

Aug 26, 2025

2.3.4.dev1 pre-release

Aug 26, 2025

2.3.4.dev0 pre-release

Aug 12, 2025

2.3.3

Jul 29, 2025

2.3.3.dev1 pre-release

Jul 28, 2025

2.3.3.dev0 pre-release

Jul 24, 2025

2.3.2

Jul 24, 2025

2.3.2.dev0 pre-release

Jul 24, 2025

2.3.1

Jul 21, 2025

2.3.0

Jul 10, 2025

2.2.8

May 12, 2025

2.2.7

May 2, 2025

2.2.6

May 1, 2025

2.2.5

May 1, 2025

2.2.4

Apr 30, 2025

2.2.3

Apr 24, 2025

2.2.2

Apr 16, 2025

2.2.1

Apr 11, 2025

2.2.0

Apr 9, 2025

2.1.4

Apr 3, 2025

2.1.3

Apr 2, 2025

2.1.2

Mar 25, 2025

2.1.1

Mar 19, 2025

2.1.0

Mar 13, 2025

2.0.6

Mar 7, 2025

2.0.5

Mar 5, 2025

2.0.4

Mar 4, 2025

2.0.3

Feb 24, 2025

2.0.2

Feb 17, 2025

2.0.1

Feb 13, 2025

2.0.0

Feb 6, 2025

1.0.30

Feb 4, 2025

1.0.29

Feb 4, 2025

1.0.28

Feb 4, 2025

1.0.27

Feb 4, 2025

1.0.26

Feb 3, 2025

1.0.25

Jan 29, 2025

1.0.24

Jan 13, 2025

1.0.23

Jan 2, 2025

1.0.21

Dec 6, 2024

1.0.20

Dec 2, 2024

1.0.19

Nov 29, 2024

1.0.18

Nov 22, 2024

1.0.17

Nov 7, 2024

1.0.16

Oct 25, 2024

1.0.15

Oct 22, 2024

1.0.14

Oct 11, 2024

1.0.13

Oct 9, 2024

1.0.12

Oct 8, 2024

1.0.11

Aug 30, 2024

1.0.10

Aug 20, 2024

1.0.9

Aug 16, 2024

1.0.8

Aug 15, 2024

1.0.7

Aug 15, 2024

1.0.6

Aug 14, 2024

1.0.5

Aug 13, 2024

1.0.4

Jul 30, 2024

1.0.3

Jul 18, 2024

1.0.2

Jul 11, 2024

1.0.1

Jul 3, 2024

1.0.0

Jun 21, 2024

0.1.35

Jun 17, 2024

0.1.34

Jun 7, 2024

0.1.33

Jun 7, 2024

0.1.32

Jun 5, 2024

0.1.31

Jun 3, 2024

0.1.30

May 30, 2024

0.1.29

May 30, 2024

0.1.28

May 29, 2024

0.1.27

May 24, 2024

0.1.26

May 16, 2024

0.1.25

May 10, 2024

0.1.24

Mar 27, 2024

0.1.23

Mar 27, 2024

0.1.22

Mar 18, 2024

0.1.21

Feb 23, 2024

0.1.20

Feb 9, 2024

0.1.19

Dec 11, 2023

0.1.18

Nov 16, 2023

0.1.17

Nov 15, 2023

0.1.16

Nov 14, 2023

0.1.15

Nov 14, 2023

0.1.14

Nov 13, 2023

0.1.13

Oct 27, 2023

0.1.12

Oct 13, 2023

0.1.11

Sep 8, 2023

0.1.10

Sep 8, 2023

0.1.9

Sep 7, 2023

0.1.8

Aug 31, 2023

0.1.7

Jul 7, 2023

0.1.6

May 24, 2023

0.1.5

May 5, 2023

0.1.4

Apr 4, 2023

0.1.3

Mar 16, 2023

0.1.2

Mar 9, 2023

0.1.1

Mar 7, 2023

0.1.0

Mar 7, 2023

Download files

Download the file for your platform.

Source Distribution

agilerl-2.8.0.dev0.tar.gz (503.8 kB)

Uploaded Jun 22, 2026 Source

Built Distribution


agilerl-2.8.0.dev0-py3-none-any.whl (601.2 kB)

Uploaded Jun 22, 2026 Python 3

File details

Details for the file agilerl-2.8.0.dev0.tar.gz.

File metadata

Download URL: agilerl-2.8.0.dev0.tar.gz
Upload date: Jun 22, 2026
Size: 503.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.9.8

File hashes

Hashes for agilerl-2.8.0.dev0.tar.gz
Algorithm	Hash digest
SHA256	`aba523aefdc5ab725b05e5ac73bc56bd3e783f107c6fabab810a029944b6c01f`
MD5	`06a858d308878ab5e81bd6f6df0aa1fd`
BLAKE2b-256	`559285c964dc7ae522f5059b680775de5295bc74920899ae165b07d058e79546`


File details

Details for the file agilerl-2.8.0.dev0-py3-none-any.whl.

File metadata

Download URL: agilerl-2.8.0.dev0-py3-none-any.whl
Upload date: Jun 22, 2026
Size: 601.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.9.8

File hashes

Hashes for agilerl-2.8.0.dev0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7059a578337908cd63956d9d8a634241170349350f1ed83ac1b585f8853b437d`
MD5	`20064894aa36e0c08e5b6335281c8a00`
BLAKE2b-256	`8a0a8133187f060663d4192880ec60110cc5dc210f93f88439ef49de3084e2a4`
