RL-Toolkit: A toolkit for developing and comparing reinforcement learning agents in various games (OpenAI Gym or PyBullet).
RL toolkit
Papers
Setting up the container
# Preview
docker pull markub3327/rl-toolkit:latest
# Stable
docker pull markub3327/rl-toolkit:2.0.2
Run
# Training container (learner)
docker run -it --rm markub3327/rl-toolkit python3 training.py [-h] -env ENV_NAME -s PATH_TO_MODEL_FOLDER [--wandb]
# Simulation container (agent)
docker run -it --rm markub3327/rl-toolkit python3 testing.py [-h] -env ENV_NAME -f PATH_TO_MODEL_FOLDER [--wandb]
Tested environments
Environment | Observation space | Observation bounds | Action space | Action bounds |
---|---|---|---|---|
BipedalWalkerHardcore-v3 | (24, ) | [-inf , inf] | (4, ) | [-1.0 , 1.0] |
Walker2DBulletEnv-v0 | (22, ) | [-inf , inf] | (6, ) | [-1.0 , 1.0] |
AntBulletEnv-v0 | (28, ) | [-inf , inf] | (8, ) | [-1.0 , 1.0] |
HalfCheetahBulletEnv-v0 | (26, ) | [-inf , inf] | (6, ) | [-1.0 , 1.0] |
HopperBulletEnv-v0 | (15, ) | [-inf , inf] | (3, ) | [-1.0 , 1.0] |
HumanoidBulletEnv-v0 | (44, ) | [-inf , inf] | (17, ) | [-1.0 , 1.0] |
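The dimensions above determine the input and output sizes of the actor and critic networks. A small lookup helper (hypothetical, not part of the toolkit) can encode the table for quick reference without importing Gym or PyBullet:

```python
# (observation_dim, action_dim) per environment, taken from the table above.
# Action bounds are [-1.0, 1.0] for every tested environment.
TESTED_ENVS = {
    "BipedalWalkerHardcore-v3": (24, 4),
    "Walker2DBulletEnv-v0": (22, 6),
    "AntBulletEnv-v0": (28, 8),
    "HalfCheetahBulletEnv-v0": (26, 6),
    "HopperBulletEnv-v0": (15, 3),
    "HumanoidBulletEnv-v0": (44, 17),
}

def env_dims(env_name: str) -> tuple:
    """Return (observation_dim, action_dim) for a tested environment."""
    return TESTED_ENVS[env_name]

print(env_dims("AntBulletEnv-v0"))  # (28, 8)
```

With Gym and the PyBullet envs installed, the same numbers are available from `env.observation_space.shape` and `env.action_space.shape` after `gym.make(...)`.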
Results
Summary
Return from game
Environment | gSDE | gSDE + Huber loss |
---|---|---|
BipedalWalkerHardcore-v3 (2) | 13 ± 18 | - |
Walker2DBulletEnv-v0 (1) | 2270 ± 28 | 2732 ± 96 |
AntBulletEnv-v0 (1) | 3106 ± 61 | 3460 ± 119 |
HalfCheetahBulletEnv-v0 (1) | 2945 ± 95 | 3003 ± 226 |
HopperBulletEnv-v0 (1) | 2515 ± 50 | 2555 ± 405 |
HumanoidBulletEnv-v0 | - | ** ± ** |
Frameworks: TensorFlow, Reverb, OpenAI Gym, PyBullet, WandB, OpenCV
Languages: Python, Shell
Author: Martin Kubovčík
v3.0.7 (June 1, 2021)
Features 🔊
- Reverb
- Updated kernel_initializer for the last layers
- Removed clipping of the mean
- Added setup.py (the package is available on PyPI)
- Split the research process into agent, learner, and tester roles
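The agent/learner split moves experience through a shared replay buffer (Reverb in this toolkit): agents push transitions, the learner samples minibatches. A minimal single-process sketch of that producer/consumer pattern, using a plain deque as a toy stand-in for a Reverb table (not the toolkit's actual code), might look like:

```python
import random
from collections import deque

class ReplayBuffer:
    """Toy stand-in for a Reverb table: agents insert, the learner samples."""

    def __init__(self, max_size: int = 10_000):
        self._storage = deque(maxlen=max_size)

    def insert(self, transition: tuple) -> None:
        # transition = (state, action, reward, next_state, done)
        self._storage.append(transition)

    def sample(self, batch_size: int) -> list:
        return random.sample(self._storage, batch_size)

    def __len__(self) -> int:
        return len(self._storage)

# Agent role: collect experience into the buffer.
buffer = ReplayBuffer()
for step in range(100):
    buffer.insert((step, 0.0, 1.0, step + 1, False))

# Learner role: draw minibatches for gradient updates.
batch = buffer.sample(32)
print(len(batch))  # 32
```

Reverb additionally runs the buffer as a separate server process, so agents and the learner can live in different containers, as in the `docker run` commands above.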
v2.0.2 (May 23, 2021)
Bug fixes 🛠️
- Updated Dockerfile
- Updated README.md
- Formatted code with Black and linted with Flake8
v2.0.1 (April 27, 2021)
Bug fixes 🛠️
- Fixed the Critic model
v2.0.0 (April 22, 2021)
Features 🔊
- Added Huber loss
- In test mode, render episodes to a video file
- Normalized observations with the min-max method
- Removed the TD3 algorithm
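Two of the v2.0.0 features are easy to state numerically. A short NumPy sketch of both (illustrative only, with assumed bounds; not the toolkit's exact implementation):

```python
import numpy as np

def huber_loss(error: np.ndarray, delta: float = 1.0) -> np.ndarray:
    """Quadratic for |error| <= delta, linear beyond it.

    Compared to squared error, the linear tail makes training less
    sensitive to outlier TD errors.
    """
    abs_err = np.abs(error)
    quadratic = 0.5 * error ** 2
    linear = delta * (abs_err - 0.5 * delta)
    return np.where(abs_err <= delta, quadratic, linear)

def min_max_normalize(obs: np.ndarray, low: np.ndarray, high: np.ndarray) -> np.ndarray:
    """Scale observations into [-1, 1] given per-dimension bounds."""
    return 2.0 * (obs - low) / (high - low) - 1.0

errors = np.array([0.5, 2.0])
print(huber_loss(errors))  # [0.125 1.5]
```

For 0.5 the loss is quadratic (0.5 · 0.5² = 0.125); for 2.0 it switches to the linear branch (1.0 · (2.0 − 0.5) = 1.5).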
Project details
Release history
Download files
Download the file for your platform.
Source Distribution
rl-toolkit-3.1.1.tar.gz (14.3 kB)
Built Distribution
rl_toolkit-3.1.1-py3-none-any.whl (16.4 kB)