
RL-Toolkit: A toolkit for developing and comparing reinforcement learning agents in various environments (OpenAI Gym or PyBullet).


RL Toolkit


Papers

  • Soft Actor-Critic (SAC)
  • Generalized State-Dependent Exploration (gSDE)
  • Truncated Quantile Critics (TQC)
  • Reverb

Setting up the container

# Preview
docker pull markub3327/rl-toolkit:latest

# Stable
docker pull markub3327/rl-toolkit:2.0.2

Run

# Run learner's container
docker run -p 8000:8000 -it --rm markub3327/rl-toolkit

# Run tester's or agent's container
docker run -it --rm markub3327/rl-toolkit


# Learner container
python3 -m rl_toolkit -e [ENV_NAME] learner --db_server [IP_ADDRESS/HOSTNAME] -s [PATH_TO_MODEL] [--wandb] [-h]

# Agent container
python3 -m rl_toolkit -e [ENV_NAME] agent --db_server [IP_ADDRESS/HOSTNAME] [--wandb] [-h]

# Tester container
python3 -m rl_toolkit -e [ENV_NAME] tester --model_path [PATH_TO_MODEL] [--render] [--wandb] [-h]
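For example, to train on Walker2DBulletEnv-v0 with the learner and agent on the same machine (the save path here is a placeholder, not a required layout):

# Learner (the container publishes port 8000 for the replay server)
python3 -m rl_toolkit -e Walker2DBulletEnv-v0 learner --db_server localhost -s ./save/model --wandb

# Agent, pointed at the learner's host
python3 -m rl_toolkit -e Walker2DBulletEnv-v0 agent --db_server localhost --wandb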

Tested environments

Environment               Observation space  Observation bounds       Action space  Action bounds
BipedalWalkerHardcore-v3  (24,)              [-inf, inf]              (4,)          [-1.0, 1.0]
Walker2DBulletEnv-v0      (22,)              [-inf, inf]              (6,)          [-1.0, 1.0]
AntBulletEnv-v0           (28,)              [-inf, inf]              (8,)          [-1.0, 1.0]
HalfCheetahBulletEnv-v0   (26,)              [-inf, inf]              (6,)          [-1.0, 1.0]
HopperBulletEnv-v0        (15,)              [-inf, inf]              (3,)          [-1.0, 1.0]
HumanoidBulletEnv-v0      (44,)              [-inf, inf]              (17,)         [-1.0, 1.0]
MinitaurBulletEnv-v0      (28,)              [-167.72488, 167.72488]  (8,)          [-1.0, 1.0]
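You can verify these shapes and bounds yourself; a minimal sketch, assuming gym and pybullet are installed (importing pybullet_envs registers the *BulletEnv environments):

import gym
import pybullet_envs  # noqa: F401 (registers the PyBullet envs on import)

env = gym.make("AntBulletEnv-v0")
print(env.observation_space.shape)                        # (28,)
print(env.action_space.shape)                             # (8,)
print(env.action_space.low[0], env.action_space.high[0])  # -1.0 1.0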

Results

Summary

(figure: summary of training results)

Score

Environment               SAC + gSDE     SAC + gSDE      TQC + gSDE      TQC + gSDE
                                         + Huber loss                    + Reverb
BipedalWalkerHardcore-v3  13 ± 18 (2)    -               228 ± 18 (2)    -
Walker2DBulletEnv-v0      2270 ± 28 (1)  2732 ± 96       2535 ± 94 (2)   -
AntBulletEnv-v0           3106 ± 61 (1)  3460 ± 119      3700 ± 37 (2)   -
HalfCheetahBulletEnv-v0   2945 ± 95 (1)  3003 ± 226      3041 ± 157 (2)  -
HopperBulletEnv-v0        2515 ± 50 (1)  2555 ± 405      2401 ± 62 (2)   -
HumanoidBulletEnv-v0      -              -               -               -
MinitaurBulletEnv-v0      -              -               -               -

Model

(figure: Actor-Critic model architecture)


Frameworks: TensorFlow, Reverb, OpenAI Gym, PyBullet, WandB, OpenCV
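The changelog below mentions a MultiCritic layer that concatenates multiple critic networks into one. A minimal sketch of that idea (assumed layer sizes and names, not the toolkit's actual classes), in the TQC style where each critic outputs quantiles of the return distribution:

import tensorflow as tf
from tensorflow.keras import layers

class MultiCritic(tf.keras.layers.Layer):
    # Wraps N independent critic sub-networks and stacks their outputs
    # into a single (batch, n_critics, n_quantiles) tensor.
    def __init__(self, n_critics=2, n_quantiles=25, **kwargs):
        super().__init__(**kwargs)
        self.critics = [
            tf.keras.Sequential([
                layers.Dense(256, activation="relu"),
                layers.Dense(256, activation="relu"),
                layers.Dense(n_quantiles),
            ])
            for _ in range(n_critics)
        ]

    def call(self, inputs):
        state, action = inputs
        x = tf.concat([state, action], axis=-1)
        return tf.stack([critic(x) for critic in self.critics], axis=1)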

Changes

v3.2.4 (July 7, 2021)

Features 🔊

  • Reverb (see the server sketch after this list)
  • setup.py (package is available on PyPI)
  • Split into agent, learner, and tester roles
  • Custom model and layer classes for defining the Actor-Critic
  • MultiCritic: concatenates multiple critic networks into one network
  • Truncated Quantile Critics (TQC)
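The learner's container publishes port 8000 for the replay server. A minimal sketch of a Reverb server such a learner might run (the table name, capacity, and rate limiter are assumptions, not the toolkit's actual configuration):

import reverb

server = reverb.Server(
    tables=[
        reverb.Table(
            name="experience",
            sampler=reverb.selectors.Uniform(),
            remover=reverb.selectors.Fifo(),
            max_size=1_000_000,
            rate_limiter=reverb.rate_limiters.MinSize(1000),
        )
    ],
    port=8000,
)
server.wait()  # serve until interrupted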

v2.0.2 (May 23, 2021)

Bug fixes 🛠️

  • Update Dockerfile
  • Update README.md
  • Format code with Black & Flake8

v2.0.1 (April 27, 2021)

Bug fixes 🛠️

  • Fix Critic model

v2.0.0 (April 22, 2021)

Features 🔊

  • Add Huber loss (sketched below)
  • In test mode, render to a video file
  • Normalize observations with the min-max method (sketched below)
  • Remove TD3 algorithm
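Assumed forms of the two numeric additions above (a sketch, not the toolkit's exact code):

import tensorflow as tf

huber = tf.keras.losses.Huber()  # robust alternative to MSE for the critic loss

def normalize_obs(obs, low, high):
    # Min-max scaling of an observation into [-1, 1], given the
    # environment's observation_space bounds (finite bounds only).
    return 2.0 * (obs - low) / (high - low) - 1.0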



Download files


Source Distribution

rl-toolkit-3.2.5.tar.gz (15.8 kB)

Built Distribution

rl_toolkit-3.2.5-py3-none-any.whl (18.9 kB, Python 3)
