High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features

CleanRL (Clean Implementation of RL Algorithms)

CleanRL is a Deep Reinforcement Learning library that provides high-quality single-file implementations with research-friendly features. The implementations are clean and simple, yet they scale to thousands of experiments using AWS Batch. The highlight features of CleanRL are:

  • 📜 Single-file implementation
    • Every detail about an algorithm variant is put into a single standalone file.
    • For example, our ppo_atari.py only has 340 lines of code but contains all implementation details on how PPO works with Atari games, so it is a great reference implementation to read for folks who do not wish to read an entire modular library.
  • 📊 Benchmarked Implementation (7+ algorithms and 34+ games at https://benchmark.cleanrl.dev)
  • 📈 Tensorboard Logging
  • 🪛 Local Reproducibility via Seeding
  • 🎮 Gameplay Video Capturing
  • 🧫 Experiment Management with Weights and Biases
  • 💸 Cloud Integration with docker and AWS
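
As an illustration of the "Local Reproducibility via Seeding" item above, here is a minimal sketch of what seeding a training script could look like. The helper names `seed_everything` and `sample_actions` are hypothetical and are not taken from CleanRL. NumPy and PyTorch are seeded only if they happen to be installed.

```python
import random


def seed_everything(seed: int) -> None:
    """Hypothetical helper: seed the RNGs a training script typically uses."""
    random.seed(seed)
    try:
        import numpy as np  # optional dependency in this sketch

        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch  # optional dependency in this sketch

        torch.manual_seed(seed)
    except ImportError:
        pass


def sample_actions(seed: int, n: int = 5, num_actions: int = 4) -> list:
    """Draw n pseudo-random discrete actions after seeding."""
    seed_everything(seed)
    return [random.randrange(num_actions) for _ in range(n)]


# Two runs with the same seed produce the same action sequence.
first = sample_actions(seed=1)
second = sample_actions(seed=1)
print(first == second)
```

Seeding of this kind makes runs repeatable on one machine. Results can still differ across hardware or library versions.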

You can read more about CleanRL in our JMLR paper and documentation.

CleanRL only contains implementations of online deep reinforcement learning algorithms. If you are looking for offline algorithms, please check out tinkoff-ai/CORL, which shares a similar design philosophy with CleanRL.

ℹ️ Support for Gymnasium: Farama-Foundation/Gymnasium is the next generation of openai/gym that will continue to be maintained and introduce new features. Please see their announcement for further detail. We are migrating to gymnasium and the progress can be tracked in vwxyzjn/cleanrl#277.

⚠️ NOTE: CleanRL is not a modular library and therefore it is not meant to be imported. At the cost of duplicate code, we make all implementation details of a DRL algorithm variant easy to understand, so CleanRL comes with its own pros and cons. You should consider using CleanRL if you want to 1) understand all implementation details of an algorithm's variant or 2) prototype advanced features that other modular DRL libraries do not support (CleanRL has minimal lines of code, so it gives you a great debugging experience and you don't have to do a lot of subclassing, as is sometimes the case in modular DRL libraries).

Get started

Prerequisites:

To run experiments locally, give the following a try:

git clone https://github.com/vwxyzjn/cleanrl.git && cd cleanrl
poetry install

# alternatively, you could use `poetry shell` and do
# `python cleanrl/ppo.py`
poetry run python cleanrl/ppo.py \
    --seed 1 \
    --env-id CartPole-v0 \
    --total-timesteps 50000

# open another terminal and enter `cd cleanrl/cleanrl`
tensorboard --logdir runs

To use experiment tracking with wandb, run

wandb login # only required for the first time
poetry run python cleanrl/ppo.py \
    --seed 1 \
    --env-id CartPole-v0 \
    --total-timesteps 50000 \
    --track \
    --wandb-project-name cleanrltest
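
The flags in the commands above could be parsed by a single-file script along the following lines. This is a minimal sketch using the standard-library `argparse` module, not CleanRL's actual parser. The default values and help strings are placeholder assumptions. Only the flag names come from the commands shown above.

```python
import argparse


def parse_args(argv=None):
    """Parse the command-line flags used in the commands above (illustrative only)."""
    parser = argparse.ArgumentParser(description="Toy stand-in for a single-file training script")
    # All defaults below are placeholder assumptions for this sketch.
    parser.add_argument("--seed", type=int, default=1, help="seed of the experiment")
    parser.add_argument("--env-id", type=str, default="CartPole-v1", help="id of the environment")
    parser.add_argument("--total-timesteps", type=int, default=50000, help="number of environment steps")
    parser.add_argument("--track", action="store_true", help="enable experiment tracking")
    parser.add_argument("--wandb-project-name", type=str, default="my-project", help="tracking project name")
    return parser.parse_args(argv)


# Mirrors the tracked run above; argparse maps `--env-id` to `args.env_id`.
args = parse_args(
    ["--seed", "1", "--env-id", "CartPole-v0", "--total-timesteps", "50000",
     "--track", "--wandb-project-name", "cleanrltest"]
)
print(args.seed, args.env_id, args.total_timesteps, args.track, args.wandb_project_name)
```

Because `--track` is a `store_true` flag, tracking stays off unless the flag is passed explicitly.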

To run training scripts in other games:

poetry shell

# classic control
python cleanrl/dqn.py --env-id CartPole-v1
python cleanrl/ppo.py --env-id CartPole-v1
python cleanrl/c51.py --env-id CartPole-v1

# atari
poetry install --with atari
python cleanrl/dqn_atari.py --env-id BreakoutNoFrameskip-v4
python cleanrl/c51_atari.py --env-id BreakoutNoFrameskip-v4
python cleanrl/ppo_atari.py --env-id BreakoutNoFrameskip-v4

# NEW: 3-4x side-effects free speed up with envpool's atari (only available to linux)
poetry install --with envpool
python cleanrl/ppo_atari_envpool.py --env-id BreakoutNoFrameskip-v4
# Learn Pong-v5 in ~5-10 mins
# Side effects such as lower sample efficiency might occur
poetry run python cleanrl/ppo_atari_envpool.py --clip-coef=0.2 --num-envs=16 --num-minibatches=8 --num-steps=128 --update-epochs=3

# pybullet
poetry install --with pybullet
python cleanrl/td3_continuous_action.py --env-id MinitaurBulletDuckEnv-v0
python cleanrl/ddpg_continuous_action.py --env-id MinitaurBulletDuckEnv-v0
python cleanrl/sac_continuous_action.py --env-id MinitaurBulletDuckEnv-v0

# procgen
poetry install --with procgen
python cleanrl/ppo_procgen.py --env-id starpilot
python cleanrl/ppg_procgen.py --env-id starpilot

# ppo + lstm
python cleanrl/ppo_atari_lstm.py --env-id BreakoutNoFrameskip-v4
python cleanrl/ppo_memory_env_lstm.py
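
Since each script is launched as a standalone command, sweeping over environments and seeds can be scripted. The sketch below only builds and prints the command lines. The helper `build_commands` is hypothetical and not part of CleanRL. It uses only flags that appear in the commands above.

```python
import shlex


def build_commands(script, env_ids, seeds, total_timesteps=None):
    """Hypothetical helper: build one `poetry run python ...` command per (env, seed) pair."""
    commands = []
    for env_id in env_ids:
        for seed in seeds:
            cmd = ["poetry", "run", "python", script, "--env-id", env_id, "--seed", str(seed)]
            if total_timesteps is not None:
                cmd += ["--total-timesteps", str(total_timesteps)]
            commands.append(cmd)
    return commands


commands = build_commands("cleanrl/ppo.py", ["CartPole-v1"], seeds=[1, 2, 3], total_timesteps=50000)
for cmd in commands:
    # shlex.join quotes arguments so the line can be pasted into a shell.
    print(shlex.join(cmd))
```

To actually launch the runs, each list could be passed to `subprocess.run(cmd, check=True)` from the repository root.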

You may also use a prebuilt development environment hosted in Gitpod.

Algorithms Implemented

Algorithm and variants implemented:

  • Proximal Policy Optimization (PPO)
    • ppo.py, docs
    • ppo_atari.py, docs
    • ppo_continuous_action.py, docs
    • ppo_atari_lstm.py, docs
    • ppo_atari_envpool.py, docs
    • ppo_atari_envpool_xla_jax.py, docs
    • ppo_atari_envpool_xla_jax_scan.py, docs
    • ppo_procgen.py, docs
    • ppo_atari_multigpu.py, docs
    • ppo_pettingzoo_ma_atari.py, docs
    • ppo_continuous_action_isaacgym.py, docs
  • Deep Q-Learning (DQN)
    • dqn.py, docs
    • dqn_atari.py, docs
    • dqn_jax.py, docs
    • dqn_atari_jax.py, docs
  • Categorical DQN (C51)
    • c51.py, docs
    • c51_atari.py, docs
    • c51_jax.py, docs
    • c51_atari_jax.py, docs
  • Soft Actor-Critic (SAC)
    • sac_continuous_action.py, docs
  • Deep Deterministic Policy Gradient (DDPG)
    • ddpg_continuous_action.py, docs
    • ddpg_continuous_action_jax.py, docs
  • Twin Delayed Deep Deterministic Policy Gradient (TD3)
    • td3_continuous_action.py, docs
    • td3_continuous_action_jax.py, docs
  • Phasic Policy Gradient (PPG)
    • ppg_procgen.py, docs
  • Random Network Distillation (RND)
    • ppo_rnd_envpool.py, docs

Open RL Benchmark

To make our experimental data transparent, CleanRL participates in a related project called Open RL Benchmark, which contains tracked experiments from popular DRL libraries such as ours, Stable-baselines3, openai/baselines, jaxrl, and others.

Check out https://benchmark.cleanrl.dev/ for a collection of Weights and Biases reports showcasing tracked DRL experiments. The reports are interactive, and researchers can easily query information such as GPU utilization and videos of an agent's gameplay that are normally hard to acquire in other RL benchmarks. In the future, Open RL Benchmark will likely provide a dataset API for researchers to easily access the data (see repo).

Support and get involved

We have a Discord Community for support. Feel free to ask questions. GitHub Issues and PRs are also welcome. Our past video recordings are available on YouTube.

Citing CleanRL

If you use CleanRL in your work, please cite our technical paper:

@article{huang2022cleanrl,
  author  = {Shengyi Huang and Rousslan Fernand Julien Dossa and Chang Ye and Jeff Braga and Dipam Chakraborty and Kinal Mehta and João G.M. Araújo},
  title   = {CleanRL: High-quality Single-file Implementations of Deep Reinforcement Learning Algorithms},
  journal = {Journal of Machine Learning Research},
  year    = {2022},
  volume  = {23},
  number  = {274},
  pages   = {1--18},
  url     = {http://jmlr.org/papers/v23/21-1342.html}
}
