RL-Toolkit
A toolkit for developing and comparing reinforcement learning agents in various environments (OpenAI Gym or PyBullet).
Papers
Setting up container

```shell
# Preview
docker pull markub3327/rl-toolkit:latest

# Stable
docker pull markub3327/rl-toolkit:2.0.2
```
Run

```shell
# Training container (learner)
docker run -it --rm markub3327/rl-toolkit python3 training.py [-h] -env ENV_NAME -s PATH_TO_MODEL_FOLDER [--wandb]

# Simulation container (agent)
docker run -it --rm markub3327/rl-toolkit python3 testing.py [-h] -env ENV_NAME -f PATH_TO_MODEL_FOLDER [--wandb]
```
Tested environments
Environment | Observation space | Observation bounds | Action space | Action bounds
---|---|---|---|---
BipedalWalkerHardcore-v3 | (24,) | [-inf, inf] | (4,) | [-1.0, 1.0]
Walker2DBulletEnv-v0 | (22,) | [-inf, inf] | (6,) | [-1.0, 1.0]
AntBulletEnv-v0 | (28,) | [-inf, inf] | (8,) | [-1.0, 1.0]
HalfCheetahBulletEnv-v0 | (26,) | [-inf, inf] | (6,) | [-1.0, 1.0]
HopperBulletEnv-v0 | (15,) | [-inf, inf] | (3,) | [-1.0, 1.0]
HumanoidBulletEnv-v0 | (44,) | [-inf, inf] | (17,) | [-1.0, 1.0]
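Every tested environment above has actions bounded to [-1.0, 1.0]. A common way to satisfy such bounds is to squash the policy's raw (unbounded) output through tanh and rescale it; this is a generic sketch, not necessarily the exact mechanism this toolkit uses, and `squash_action` is a hypothetical helper name:

```python
import numpy as np

def squash_action(raw_action, low=-1.0, high=1.0):
    """Map an unbounded policy output into the bounded action range:
    tanh squashes into (-1, 1), then the result is rescaled to [low, high]."""
    squashed = np.tanh(raw_action)
    return low + 0.5 * (squashed + 1.0) * (high - low)

# Example: a 4-dimensional action, matching BipedalWalkerHardcore-v3's action space
raw = np.array([-3.0, 0.0, 0.5, 10.0])
action = squash_action(raw)  # every component lies in [-1.0, 1.0]
```

With the default bounds this reduces to plain `np.tanh`, but the rescaling generalizes to any box-bounded action space.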
Results
Summary
Score
Environment | gSDE | gSDE + Huber loss
---|---|---
BipedalWalkerHardcore-v3 (2) | 13 ± 18 | -
Walker2DBulletEnv-v0 (1) | 2270 ± 28 | 2732 ± 96
AntBulletEnv-v0 (1) | 3106 ± 61 | 3460 ± 119
HalfCheetahBulletEnv-v0 (1) | 2945 ± 95 | 3003 ± 226
HopperBulletEnv-v0 (1) | 2515 ± 50 | 2555 ± 405
HumanoidBulletEnv-v0 | - | ** ± **
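The "gSDE + Huber loss" column corresponds to training the critic with the Huber loss introduced in v2.0.0. As a sketch, here is the standard elementwise Huber loss (the toolkit's exact delta and implementation may differ):

```python
import numpy as np

def huber_loss(td_errors, delta=1.0):
    """Quadratic for |error| <= delta, linear beyond it -- less sensitive
    to outlier TD errors than a plain squared loss."""
    abs_err = np.abs(td_errors)
    quadratic = np.minimum(abs_err, delta)
    linear = abs_err - quadratic
    return 0.5 * quadratic**2 + delta * linear

errors = np.array([0.5, 2.0])
huber_loss(errors)  # -> [0.125, 1.5]
```

The linear tail is what damps the influence of large temporal-difference errors on the critic's gradients.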
Model architecture diagrams: Actor, Critic.
Frameworks: TensorFlow, Reverb, OpenAI Gym, PyBullet, WandB, OpenCV
Changes
v3.2.1 (June 6, 2021)
Features 🔊
- Reverb (including multi-node learning)
- setup.py (package is available on PyPI)
- Split the research process into agent, learner, tester, and random roles
v2.0.2 (May 23, 2021)
Bug fixes 🛠️
- Updated Dockerfile
- Updated README.md
- Formatted code with Black and linted with Flake8
v2.0.1 (April 27, 2021)
Bug fixes 🛠️
- Fixed the Critic model
v2.0.0 (April 22, 2021)
Features 🔊
- Added Huber loss
- In test mode, render episodes to a video file
- Normalize observations with the min-max method
- Removed the TD3 algorithm
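Min-max normalization rescales each observation component into [0, 1] using its known (or tracked) bounds. A minimal sketch, assuming fixed per-component `low`/`high` vectors; since the tested environments report infinite observation bounds, an actual implementation would likely track running minima and maxima instead:

```python
import numpy as np

def min_max_normalize(obs, low, high):
    """Scale each observation component from [low, high] into [0, 1]."""
    return (obs - low) / (high - low)

low = np.array([-1.0, 0.0])
high = np.array([1.0, 10.0])
min_max_normalize(np.array([0.0, 5.0]), low, high)  # -> [0.5, 0.5]
```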