AgileRL is a deep reinforcement learning library focused on improving RL development through RLOps.

These details have not been verified by PyPI

Project description

AgileRL

Reinforcement learning streamlined.
Easier and faster reinforcement learning with RLOps. Visit our website. View documentation.
Join the Discord Server for questions, help and collaboration.

✨ AgileRL 2.0 is here! Check out the latest powerful updates✨

🚀 Train super-fast for free on Arena, the RLOps platform from AgileRL 🚀

AgileRL is a Deep Reinforcement Learning library focused on improving development by introducing RLOps - MLOps for reinforcement learning.

This library is initially focused on reducing the time taken for training models and hyperparameter optimization (HPO) by pioneering evolutionary HPO techniques for reinforcement learning.
Evolutionary HPO has been shown to drastically reduce overall training times by automatically converging on optimal hyperparameters, without requiring numerous training runs.
We are constantly adding more algorithms and features. AgileRL already includes state-of-the-art evolvable on-policy, off-policy, offline, multi-agent and contextual multi-armed bandit reinforcement learning algorithms with distributed training.
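
To make the idea of evolutionary HPO more concrete, here is a minimal, library-independent sketch. It is not AgileRL's implementation: `toy_fitness` and `evolve` are hypothetical names, the "agents" are bare learning rates, and the fitness function is an invented stand-in for an evaluation score. It only illustrates the select-then-mutate cycle that lets a single run search the hyperparameter space.

```python
import math
import random


def toy_fitness(lr):
    """Stand-in for an evaluation score; highest when lr equals 1e-3 (an assumption)."""
    return -abs(math.log10(lr) + 3.0)


def evolve(population, generations, rng, tournament_size=2, mutation_sd=0.3):
    """Evolve a list of learning rates and return the final population."""
    for _ in range(generations):
        scored = sorted(population, key=toy_fitness, reverse=True)
        next_pop = [scored[0]]  # elitism: the best member survives unmutated
        while len(next_pop) < len(population):
            # Tournament: sample contenders at random, keep the fittest.
            contenders = rng.sample(population, k=tournament_size)
            winner = max(contenders, key=toy_fitness)
            # Mutate in log-space so the learning rate stays positive.
            next_pop.append(winner * 10 ** rng.gauss(0.0, mutation_sd))
        population = next_pop
    return population


rng = random.Random(1)
start = [10 ** rng.uniform(-6, -1) for _ in range(6)]
final = evolve(start, generations=30, rng=rng)
best = max(final, key=toy_fitness)
print(f"best learning rate after evolution: {best:.2e}")
```

Because the elite member is carried over unchanged, the best fitness in the population can never decrease from one generation to the next. A real run would also train each agent between evolution steps, which this toy omits.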

AgileRL offers 10x faster hyperparameter optimization than SOTA.

Get Started
Benchmarks
Tutorials
Algorithms implemented
Train an agent
Citing AgileRL

Get Started

To see the full AgileRL documentation, including tutorials, visit our documentation site. To ask questions and get help, collaborate, or discuss anything related to reinforcement learning, join the AgileRL Discord Server.

Install as a package with pip:

pip install agilerl

Or install in development mode:

git clone https://github.com/AgileRL/AgileRL.git && cd AgileRL
pip install -e .
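
After either install route, a quick check that the package is visible to the interpreter can save debugging time later. The sketch below uses only the standard library's `importlib.metadata`; the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "agilerl") -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version()
print(f"agilerl version: {found}" if found else "agilerl is not installed")
```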

Benchmarks

Reinforcement learning algorithms and libraries are usually benchmarked once the optimal hyperparameters for training are known, but it often takes hundreds or thousands of experiments to discover these. This is unrealistic and does not reflect the true, total time taken for training. What if we could remove the need to conduct all these prior experiments?

In the charts below, a single AgileRL run, which automatically tunes hyperparameters, is benchmarked against Optuna's multiple training runs traditionally required for hyperparameter optimization, demonstrating the real time savings possible. Global steps is the sum of every step taken by any agent in the environment, including across an entire population.

AgileRL offers an order of magnitude speed up in hyperparameter optimization vs popular reinforcement learning training frameworks combined with Optuna. Remove the need for multiple training runs and save yourself hours.
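
The definition of global steps above can be sketched as simple arithmetic. The numbers are made up for illustration, and the sketch assumes that one step of a vectorized environment counts as one step per sub-environment, which the text does not state explicitly.

```python
def global_steps(steps_per_agent):
    """Sum the environment steps taken by every agent in the population."""
    return sum(steps_per_agent)


# Hypothetical run: 6 agents, each stepping 16 vectorized environments 625 times.
population_size, num_envs, iterations = 6, 16, 625
steps_per_agent = [num_envs * iterations for _ in range(population_size)]
total = global_steps(steps_per_agent)
print(total)  # 6 * 16 * 625 = 60000
```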

AgileRL also supports multi-agent reinforcement learning through the PettingZoo-style parallel API. The charts below show the performance of our MADDPG and MATD3 algorithms with evolutionary hyperparameter optimization (HPO), benchmarked against epymarl's MADDPG algorithm with grid-search HPO, on the simple speaker listener and simple spread environments.

Tutorials

We are constantly updating our tutorials to showcase the latest features of AgileRL and how users can leverage our evolutionary HPO to achieve 10x faster hyperparameter optimization. Please see the available tutorials below.

Tutorial Type	Description	Tutorials
Single-agent tasks	Guides for training both on-policy and off-policy agents to beat a variety of Gymnasium environments.	PPO - Acrobot, TD3 - Lunar Lander, Rainbow DQN - CartPole
Multi-agent tasks	Guides for PettingZoo environments, including training DQN to play Connect Four with curriculum learning and self-play, and multi-agent tasks in MPE environments.	DQN - Connect Four, MADDPG - Space Invaders, MATD3 - Speaker Listener
Hierarchical curriculum learning	Shows how to teach agents Skills and combine them to achieve an end goal.	PPO - Lunar Lander
Contextual multi-armed bandits	Learn to make the correct decision in environments that only have one timestep.	NeuralUCB - Iris Dataset, NeuralTS - PenDigits
Custom Modules & Networks	Learn how to create custom evolvable modules and networks for RL algorithms.	Dueling Distributional Q Network, EvolvableSimBa
LLM Finetuning	Learn how to finetune an LLM using AgileRL.	GRPO

Evolvable algorithms (more coming soon!)

Single-agent algorithms

RL	Algorithm
On-Policy	Proximal Policy Optimization (PPO)
Off-Policy	Deep Q Learning (DQN), Rainbow DQN, Deep Deterministic Policy Gradient (DDPG), Twin Delayed Deep Deterministic Policy Gradient (TD3)
Offline	Conservative Q-Learning (CQL), Implicit Language Q-Learning (ILQL)

Multi-agent algorithms

RL	Algorithm
Multi-agent	Multi-Agent Deep Deterministic Policy Gradient (MADDPG), Multi-Agent Twin-Delayed Deep Deterministic Policy Gradient (MATD3), Independent Proximal Policy Optimization (IPPO)

Contextual multi-armed bandit algorithms

RL	Algorithm
Bandits	Neural Contextual Bandits with UCB-based Exploration (NeuralUCB), Neural Contextual Bandits with Thompson Sampling (NeuralTS)

LLM Reasoning Algorithms

RL	Algorithm
On-Policy	Group Relative Policy Optimization (GRPO)

Train an Agent to Beat a Gym Environment

Before training starts, several meta-hyperparameters and settings must be defined: INIT_HP holds the general parameters, MUTATION_PARAMS holds the evolutionary mutation probabilities, and NET_CONFIG defines the network architecture. For example:

Basic Hyperparameters

INIT_HP = {
    'ENV_NAME': 'LunarLander-v3',   # Gym environment name
    'ALGO': 'DQN',                  # Algorithm
    'DOUBLE': True,                 # Use double Q-learning
    'CHANNELS_LAST': False,         # Swap image channels dimension from last to first [H, W, C] -> [C, H, W]
    'BATCH_SIZE': 256,              # Batch size
    'LR': 1e-3,                     # Learning rate
    'MAX_STEPS': 1_000_000,         # Max no. steps
    'TARGET_SCORE': 200.,           # Early training stop at avg score of last 100 episodes
    'GAMMA': 0.99,                  # Discount factor
    'MEMORY_SIZE': 10000,           # Max memory buffer size
    'LEARN_STEP': 1,                # Learning frequency
    'TAU': 1e-3,                    # For soft update of target parameters
    'TOURN_SIZE': 2,                # Tournament size
    'ELITISM': True,                # Elitism in tournament selection
    'POP_SIZE': 6,                  # Population size
    'EVO_STEPS': 10_000,            # Evolution frequency
    'EVAL_STEPS': None,             # Evaluation steps
    'EVAL_LOOP': 1,                 # Evaluation episodes
    'LEARNING_DELAY': 1000,         # Steps before starting learning
    'WANDB': True,                  # Log with Weights and Biases
}
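
The TARGET_SCORE entry above describes early stopping on the average score of the last 100 episodes. A minimal sketch of such a check follows. `TargetScoreMonitor` is a hypothetical helper, not part of AgileRL, and waiting for a full window before stopping is an assumption of this sketch.

```python
from collections import deque


class TargetScoreMonitor:
    """Tracks recent episode scores and signals when their mean reaches a target."""

    def __init__(self, target_score: float, window: int = 100):
        self.target_score = target_score
        self.scores = deque(maxlen=window)  # keeps only the last `window` scores

    def update(self, episode_score: float) -> bool:
        """Record a score; return True once a full window averages at or above target."""
        self.scores.append(episode_score)
        window_full = len(self.scores) == self.scores.maxlen
        return window_full and sum(self.scores) / len(self.scores) >= self.target_score


monitor = TargetScoreMonitor(target_score=200.0)
for episode, score in enumerate([150.0] * 50 + [250.0] * 100, start=1):
    if monitor.update(score):
        print(f"target reached after {episode} episodes")
        break
```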

Mutation Hyperparameters

MUTATION_PARAMS = {
    # Relative probabilities
    'NO_MUT': 0.4,                              # No mutation
    'ARCH_MUT': 0.2,                            # Architecture mutation
    'NEW_LAYER': 0.2,                           # New layer mutation
    'PARAMS_MUT': 0.2,                          # Network parameters mutation
    'ACT_MUT': 0,                               # Activation layer mutation
    'RL_HP_MUT': 0.2,                           # Learning HP mutation
    'MUT_SD': 0.1,                              # Mutation strength
    'RAND_SEED': 1,                             # Random seed
}
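
The comment "Relative probabilities" suggests the mutation entries act as weights rather than absolute probabilities. The sketch below shows one way such weights can be normalised and sampled with the standard library. It is an illustration only: it covers the five mutation-type entries, leaves NEW_LAYER, MUT_SD and RAND_SEED aside because their exact handling is not described here, and the helper names are hypothetical.

```python
import random

MUTATION_WEIGHTS = {
    "NO_MUT": 0.4,
    "ARCH_MUT": 0.2,
    "PARAMS_MUT": 0.2,
    "ACT_MUT": 0,
    "RL_HP_MUT": 0.2,
}


def normalise(weights):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return {name: w / total for name, w in weights.items()}


def sample_mutation(weights, rng):
    """Draw one mutation type with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


rng = random.Random(1)  # seeded for reproducibility, in the spirit of RAND_SEED
probs = normalise(MUTATION_WEIGHTS)
draws = [sample_mutation(MUTATION_WEIGHTS, rng) for _ in range(1000)]
print(probs)
```

A weight of 0, as for ACT_MUT here, means that mutation type is never drawn.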

Basic Network Configuration

NET_CONFIG = {
    'latent_dim': 16,
    'encoder_config': {
        'hidden_size': [32],    # Observation encoder configuration
    },
    'head_config': {
        'hidden_size': [32],    # Network head configuration
    },
}

Creating a Population of Agents

First, use agilerl.utils.utils.create_population to create a list of agents: the population that will evolve and mutate towards the optimal hyperparameters.

Population Creation Example

import torch
from agilerl.utils.utils import (
    make_vect_envs,
    create_population,
    observation_space_channels_to_first
)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

num_envs = 16
env = make_vect_envs(env_name=INIT_HP['ENV_NAME'], num_envs=num_envs)

observation_space = env.single_observation_space
action_space = env.single_action_space
if INIT_HP['CHANNELS_LAST']:
    observation_space = observation_space_channels_to_first(observation_space)

agent_pop = create_population(
    algo=INIT_HP['ALGO'],                 # Algorithm
    observation_space=observation_space,  # Observation space
    action_space=action_space,            # Action space
    net_config=NET_CONFIG,                # Network configuration
    INIT_HP=INIT_HP,                      # Initial hyperparameters
    population_size=INIT_HP['POP_SIZE'],  # Population size
    num_envs=num_envs,                    # Number of vectorized environments
    device=device
)
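
The CHANNELS_LAST branch above converts an image observation layout from [H, W, C] to [C, H, W]. The pure-Python sketch below shows what that reordering means for a shape and for the data itself. These helpers are hypothetical and are not the library's observation_space_channels_to_first, which operates on an observation space object.

```python
def shape_channels_to_first(shape):
    """Move the trailing channel dimension to the front: (H, W, C) -> (C, H, W)."""
    if len(shape) != 3:
        raise ValueError("expected a 3-dimensional image shape")
    height, width, channels = shape
    return (channels, height, width)


def image_channels_to_first(image):
    """Rearrange a nested-list image indexed [h][w][c] into one indexed [c][h][w]."""
    height, width, channels = len(image), len(image[0]), len(image[0][0])
    return [
        [[image[h][w][c] for w in range(width)] for h in range(height)]
        for c in range(channels)
    ]


print(shape_channels_to_first((84, 84, 3)))  # (3, 84, 84)
```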

Initializing Evolutionary HPO

Next, create the tournament, mutations and experience replay buffer objects that allow agents to share memory and efficiently perform evolutionary HPO.

Mutations and Tournament Selection Example

from agilerl.components.replay_buffer import ReplayBuffer
from agilerl.hpo.tournament import TournamentSelection
from agilerl.hpo.mutation import Mutations

memory = ReplayBuffer(
    max_size=INIT_HP['MEMORY_SIZE'],   # Max replay buffer size
    device=device,
)

tournament = TournamentSelection(
    tournament_size=INIT_HP['TOURN_SIZE'], # Tournament selection size
    elitism=INIT_HP['ELITISM'],            # Elitism in tournament selection
    population_size=INIT_HP['POP_SIZE'],   # Population size
    eval_loop=INIT_HP['EVAL_LOOP'],        # Evaluate using last N fitness scores
)

mutations = Mutations(
    no_mutation=MUTATION_PARAMS['NO_MUT'],                # No mutation
    architecture=MUTATION_PARAMS['ARCH_MUT'],             # Architecture mutation
    new_layer_prob=MUTATION_PARAMS['NEW_LAYER'],          # New layer mutation
    parameters=MUTATION_PARAMS['PARAMS_MUT'],             # Network parameters mutation
    activation=MUTATION_PARAMS['ACT_MUT'],                # Activation layer mutation
    rl_hp=MUTATION_PARAMS['RL_HP_MUT'],                   # Learning HP mutation
    mutation_sd=MUTATION_PARAMS['MUT_SD'],                # Mutation strength
    rand_seed=MUTATION_PARAMS['RAND_SEED'],               # Random seed
    device=device,
)
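
For readers unfamiliar with tournament selection, the sketch below shows the textbook technique with optional elitism, using parameter names that echo tournament_size, elitism and population_size above. It is not the TournamentSelection class itself: `tournament_select` is a hypothetical function, and sampling contenders without replacement is a choice made for this sketch.

```python
import random


def tournament_select(fitnesses, tournament_size, population_size, elitism, rng):
    """Return the indices of the individuals chosen for the next generation."""
    if not fitnesses:
        raise ValueError("population is empty")
    indices = range(len(fitnesses))
    selected = []
    if elitism:
        # Carry the fittest individual over unchanged.
        selected.append(max(indices, key=fitnesses.__getitem__))
    while len(selected) < population_size:
        # Sample contenders at random; the fittest contender wins.
        k = min(tournament_size, len(fitnesses))
        contenders = rng.sample(indices, k=k)
        selected.append(max(contenders, key=fitnesses.__getitem__))
    return selected


rng = random.Random(1)
fitnesses = [12.0, 185.5, 47.3, 201.9, 90.0, 150.2]
chosen = tournament_select(
    fitnesses, tournament_size=2, population_size=6, elitism=True, rng=rng
)
print(chosen)
```

With a tournament size of 2 and distinct fitness values, the weakest individual can never win a tournament, while larger tournaments increase the selection pressure towards the fittest members.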

Train A Population of Agents

The simplest way to implement the training loop is to use our train_off_policy() function. It requires each agent to have get_action() and learn() methods.

from agilerl.training.train_off_policy import train_off_policy

trained_pop, pop_fitnesses = train_off_policy(
    env=env,                                   # Gym-style environment
    env_name=INIT_HP['ENV_NAME'],              # Environment name
    algo=INIT_HP['ALGO'],                      # Algorithm
    pop=agent_pop,                             # Population of agents
    memory=memory,                             # Replay buffer
    swap_channels=INIT_HP['CHANNELS_LAST'],    # Swap image channel from last to first
    max_steps=INIT_HP["MAX_STEPS"],            # Max number of training steps
    evo_steps=INIT_HP['EVO_STEPS'],            # Evolution frequency
    eval_steps=INIT_HP["EVAL_STEPS"],          # Number of steps in evaluation episode
    eval_loop=INIT_HP["EVAL_LOOP"],            # Number of evaluation episodes
    learning_delay=INIT_HP['LEARNING_DELAY'],  # Steps before starting learning
    target=INIT_HP['TARGET_SCORE'],            # Target score for early stopping
    tournament=tournament,                     # Tournament selection object
    mutation=mutations,                        # Mutations object
    wb=INIT_HP['WANDB'],                       # Weights and Biases tracking
)
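
The requirement that each agent has get_action() and learn() can be checked up front with a structural type. The sketch below is an assumption-laden illustration: `OffPolicyAgentLike`, `ConstantAgent` and `check_population` are hypothetical names, the method signatures are left open because they are not specified here, and passing this check does not guarantee full compatibility with train_off_policy().

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class OffPolicyAgentLike(Protocol):
    """Hypothetical structural type: anything with get_action() and learn()."""

    def get_action(self, *args: Any, **kwargs: Any) -> Any: ...

    def learn(self, *args: Any, **kwargs: Any) -> Any: ...


class ConstantAgent:
    """Toy agent that always returns the same action and learns nothing."""

    def __init__(self, action: int = 0):
        self.action = action

    def get_action(self, observation):
        return self.action

    def learn(self, experiences):
        return 0.0  # placeholder loss value


def check_population(pop):
    """Raise early if any member lacks one of the two required methods."""
    for i, agent in enumerate(pop):
        if not isinstance(agent, OffPolicyAgentLike):
            raise TypeError(f"agent {i} is missing get_action() or learn()")


check_population([ConstantAgent(), ConstantAgent(action=1)])
```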

Citing AgileRL

If you use AgileRL in your work, please cite the repository:

@software{Ustaran-Anderegg_AgileRL,
author = {Ustaran-Anderegg, Nicholas and Pratt, Michael and Sabal-Bermudez, Jaime},
license = {Apache-2.0},
title = {{AgileRL}},
url = {https://github.com/AgileRL/AgileRL}
}

Release history

Version	Release date
2.3.0 (this version)	Jul 10, 2025
2.2.8	May 12, 2025
2.2.7	May 2, 2025
2.2.6	May 1, 2025
2.2.5	May 1, 2025
2.2.4	Apr 30, 2025
2.2.3	Apr 24, 2025
2.2.2	Apr 16, 2025
2.2.1	Apr 11, 2025
2.2.0	Apr 9, 2025
2.1.4	Apr 3, 2025
2.1.3	Apr 2, 2025
2.1.2	Mar 25, 2025
2.1.1	Mar 19, 2025
2.1.0	Mar 13, 2025
2.0.6	Mar 7, 2025
2.0.5	Mar 5, 2025
2.0.4	Mar 4, 2025
2.0.3	Feb 24, 2025
2.0.2	Feb 17, 2025
2.0.1	Feb 13, 2025
2.0.0	Feb 6, 2025
1.0.30	Feb 4, 2025
1.0.29	Feb 4, 2025
1.0.28	Feb 4, 2025
1.0.27	Feb 4, 2025
1.0.26	Feb 3, 2025
1.0.25	Jan 29, 2025
1.0.24	Jan 13, 2025
1.0.23	Jan 2, 2025
1.0.21	Dec 6, 2024
1.0.20	Dec 2, 2024
1.0.19	Nov 29, 2024
1.0.18	Nov 22, 2024
1.0.17	Nov 7, 2024
1.0.16	Oct 25, 2024
1.0.15	Oct 22, 2024
1.0.14	Oct 11, 2024
1.0.13	Oct 9, 2024
1.0.12	Oct 8, 2024
1.0.11	Aug 30, 2024
1.0.10	Aug 20, 2024
1.0.9	Aug 16, 2024
1.0.8	Aug 15, 2024
1.0.7	Aug 15, 2024
1.0.6	Aug 14, 2024
1.0.5	Aug 13, 2024
1.0.4	Jul 30, 2024
1.0.3	Jul 18, 2024
1.0.2	Jul 11, 2024
1.0.1	Jul 3, 2024
1.0.0	Jun 21, 2024
0.1.35	Jun 17, 2024
0.1.34	Jun 7, 2024
0.1.33	Jun 7, 2024
0.1.32	Jun 5, 2024
0.1.31	Jun 3, 2024
0.1.30	May 30, 2024
0.1.29	May 30, 2024
0.1.28	May 29, 2024
0.1.27	May 24, 2024
0.1.26	May 16, 2024
0.1.25	May 10, 2024
0.1.24	Mar 27, 2024
0.1.23	Mar 27, 2024
0.1.22	Mar 18, 2024
0.1.21	Feb 23, 2024
0.1.20	Feb 9, 2024
0.1.19	Dec 11, 2023
0.1.18	Nov 16, 2023
0.1.17	Nov 15, 2023
0.1.16	Nov 14, 2023
0.1.15	Nov 14, 2023
0.1.14	Nov 13, 2023
0.1.13	Oct 27, 2023
0.1.12	Oct 13, 2023
0.1.11	Sep 8, 2023
0.1.10	Sep 8, 2023
0.1.9	Sep 7, 2023
0.1.8	Aug 31, 2023
0.1.7	Jul 7, 2023
0.1.6	May 24, 2023
0.1.5	May 5, 2023
0.1.4	Apr 4, 2023
0.1.3	Mar 16, 2023
0.1.2	Mar 9, 2023
0.1.1	Mar 7, 2023
0.1.0	Mar 7, 2023

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

agilerl-2.3.0.tar.gz (280.0 kB)

Uploaded Jul 10, 2025 Source

Built Distribution

agilerl-2.3.0-py3-none-any.whl (347.8 kB)

Uploaded Jul 10, 2025 Python 3

File details

Details for the file agilerl-2.3.0.tar.gz.

File metadata

Download URL: agilerl-2.3.0.tar.gz
Upload date: Jul 10, 2025
Size: 280.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.1.1 CPython/3.11.12 Darwin/24.5.0

File hashes

Hashes for agilerl-2.3.0.tar.gz
Algorithm	Hash digest
SHA256	`a0ea1c5b8cf01869bce73354616087d557ba5220aaf38a1a0d74b14d8ef2b255`
MD5	`07746d22dbb7c6e853768c7ed902b8af`
BLAKE2b-256	`020c5ab4b2d60b497d5aff720070b29ced577af0a97d9e9acfd9c87665ed55c4`

File details

Details for the file agilerl-2.3.0-py3-none-any.whl.

File metadata

Download URL: agilerl-2.3.0-py3-none-any.whl
Upload date: Jul 10, 2025
Size: 347.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.1.1 CPython/3.11.12 Darwin/24.5.0

File hashes

Hashes for agilerl-2.3.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d92406a245b0c20a7155cbebe9aa004e3d46095da5d85ef10159a82b53620d27`
MD5	`471900d7a0893ece2c130142c2e380ab`
BLAKE2b-256	`049947a256841d7747f3e001c7790878e6cc0a53d813ed05db6da22232f6bcd9`

