The RL-Toolkit: a toolkit for developing and comparing reinforcement learning agents in various environments (OpenAI Gym or PyBullet).
RL Toolkit
Papers
- Soft Actor-Critic
- Generalized State-Dependent Exploration
- Reverb: A framework for experience replay
- Controlling Overestimation Bias with Truncated Mixture of Continuous Distributional Quantile Critics
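The gSDE paper above replaces independent per-step Gaussian noise with noise that is a function of the state and is resampled only every few steps. Below is a minimal NumPy sketch of that idea; the dimensions and log-std value are illustrative assumptions, not the toolkit's actual policy.

```python
import numpy as np

# Illustrative dimensions; the real ones depend on the environment/policy.
feature_dim, action_dim = 8, 2
log_std = np.full((feature_dim, action_dim), -0.5)  # learned in practice

rng = np.random.default_rng(0)

def sample_noise_matrix():
    # gSDE: sample the noise matrix once and reuse it for several steps
    # (e.g. a whole episode) instead of drawing fresh noise every step.
    return rng.standard_normal((feature_dim, action_dim)) * np.exp(log_std)

theta_eps = sample_noise_matrix()

def noisy_action(mu, features):
    # The perturbation is a deterministic function of the state features,
    # so exploration varies smoothly along a trajectory.
    return mu + features @ theta_eps

print(noisy_action(np.zeros(action_dim), rng.standard_normal(feature_dim)))
```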
Setting up the container
# Preview
docker pull markub3327/rl-toolkit:latest
# Stable
docker pull markub3327/rl-toolkit:2.0.2
Run
# Run learner's container
docker run -p 8000:8000 -it --rm markub3327/rl-toolkit
# Run tester's or agent's container
docker run -it --rm markub3327/rl-toolkit
# Learner container
python3 -m rl_toolkit -e [ENV_NAME] learner --db_server [IP_ADDRESS/HOSTNAME] -s [PATH_TO_MODEL] [--wandb] [-h]
# Agent container
python3 -m rl_toolkit -e [ENV_NAME] agent --db_server [IP_ADDRESS/HOSTNAME] [--wandb] [-h]
# Tester container
python3 -m rl_toolkit -e [ENV_NAME] tester --model_path [PATH_TO_MODEL] [--render] [--wandb] [-h]
Tested environments
Environment | Observation space | Observation bounds | Action space | Action bounds |
---|---|---|---|---|
BipedalWalkerHardcore-v3 | (24, ) | [-inf, inf] | (4, ) | [-1.0, 1.0] |
Walker2DBulletEnv-v0 | (22, ) | [-inf, inf] | (6, ) | [-1.0, 1.0] |
AntBulletEnv-v0 | (28, ) | [-inf, inf] | (8, ) | [-1.0, 1.0] |
HalfCheetahBulletEnv-v0 | (26, ) | [-inf, inf] | (6, ) | [-1.0, 1.0] |
HopperBulletEnv-v0 | (15, ) | [-inf, inf] | (3, ) | [-1.0, 1.0] |
HumanoidBulletEnv-v0 | (44, ) | [-inf, inf] | (17, ) | [-1.0, 1.0] |
MinitaurBulletEnv-v0 | (28, ) | [-167.72488, 167.72488] | (8, ) | [-1.0, 1.0] |
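A quick way to reproduce the spaces in the table above (assuming `gym`, `box2d-py`, and `pybullet` are installed; importing `pybullet_envs` registers the `*BulletEnv-v0` environments with Gym):

```python
import gym
import pybullet_envs  # noqa: F401 -- side-effect import registers the Bullet envs

for env_id in ["BipedalWalkerHardcore-v3", "AntBulletEnv-v0"]:
    env = gym.make(env_id)
    print(env_id, env.observation_space.shape, env.action_space)
    env.close()
```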
Results
Summary
Score
Environment | SAC + gSDE | SAC + gSDE + Huber loss | TQC + gSDE | TQC + gSDE + Reverb |
---|---|---|---|---|
BipedalWalkerHardcore-v3 | 13 ± 18(2) | - | 228 ± 18(2) | - |
Walker2DBulletEnv-v0 | 2270 ± 28(1) | 2732 ± 96 | 2535 ± 94(2) | - |
AntBulletEnv-v0 | 3106 ± 61(1) | 3460 ± 119 | 3700 ± 37(2) | - |
HalfCheetahBulletEnv-v0 | 2945 ± 95(1) | 3003 ± 226 | 3041 ± 157(2) | - |
HopperBulletEnv-v0 | 2515 ± 50(1) | 2555 ± 405 | 2401 ± 62(2) | - |
HumanoidBulletEnv-v0 | - | - | - | - |
MinitaurBulletEnv-v0 | - | - | - | - |
Model
Frameworks: TensorFlow, Reverb, OpenAI Gym, PyBullet, WandB, OpenCV
Changes
v3.2.4 (July 7, 2021)
Features 🔊
- Reverb
- setup.py (package is available on PyPI)
- split into agent, learner and tester roles
- Use custom model and layer for defining Actor-Critic
- MultiCritic - concatenating multiple critic networks into one network
- Truncated Quantile Critics
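One way to read the MultiCritic and Truncated Quantile Critics items: several quantile critics are merged into a single model, and the largest pooled quantiles are dropped when forming the target to curb overestimation. The Keras sketch below only illustrates that idea; layer sizes, names, and the number of dropped quantiles are assumptions, not the toolkit's actual architecture.

```python
import tensorflow as tf

n_critics, n_quantiles, n_drop = 2, 25, 2

def make_critic(obs_dim, act_dim):
    obs = tf.keras.Input(shape=(obs_dim,))
    act = tf.keras.Input(shape=(act_dim,))
    x = tf.keras.layers.Concatenate()([obs, act])
    x = tf.keras.layers.Dense(256, activation="relu")(x)
    q = tf.keras.layers.Dense(n_quantiles)(x)  # quantile estimates
    return tf.keras.Model([obs, act], q)

def multi_critic(obs_dim, act_dim):
    # "MultiCritic": concatenate the critics' quantile outputs into one model.
    obs = tf.keras.Input(shape=(obs_dim,))
    act = tf.keras.Input(shape=(act_dim,))
    outs = [make_critic(obs_dim, act_dim)([obs, act]) for _ in range(n_critics)]
    q = tf.keras.layers.Concatenate(axis=-1)(outs)  # (batch, n_critics * n_quantiles)
    return tf.keras.Model([obs, act], q)

def truncated_target(quantiles):
    # TQC: sort the pooled quantiles and drop the largest ones
    # to control overestimation bias (the "truncated mixture").
    sorted_q = tf.sort(quantiles, axis=-1)
    return sorted_q[:, : n_critics * (n_quantiles - n_drop)]
```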
v2.0.2 (May 23, 2021)
Bug fixes 🛠️
- update Dockerfile
- update README.md
- format code with Black & Flake8
v2.0.1 (April 27, 2021)
Bug fixes 🛠️
- fix Critic model
v2.0.0 (April 22, 2021)
Features 🔊
- Add Huber loss
- In test mode, render to a video file
- Normalize observations with the min-max method
- Remove TD3 algorithm
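For reference, the first and third v2.0.0 items above can be expressed in a few lines; the bounds used here are placeholders, not the toolkit's defaults.

```python
import numpy as np
import tensorflow as tf

# Huber loss (less sensitive to outliers than plain MSE on TD errors).
huber = tf.keras.losses.Huber()
print(float(huber([0.0, 1.0], [1.0, 3.0])))

def normalize(obs, low, high):
    # Min-max scaling of an observation into [-1, 1].
    return 2.0 * (obs - low) / (high - low) - 1.0

print(normalize(np.array([0.5]), low=-1.0, high=1.0))
```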