RL-Toolkit: A Research Framework for Robotics
RL Toolkit
Papers
- Soft Actor-Critic
- Generalized State-Dependent Exploration
- Reverb: A framework for experience replay
- Controlling Overestimation Bias with Truncated Mixture of Continuous Distributional Quantile Critics
- Acme: A Research Framework for Distributed Reinforcement Learning
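The truncation idea from the TQC paper can be sketched in a few lines: pool the quantile estimates from all critics, sort them, and drop the largest atoms to curb overestimation bias. The function and variable names below are illustrative, not part of rl-toolkit's API.

```python
def tqc_target(quantiles_per_critic, drop_per_critic):
    """Truncated mixture of quantiles, as in TQC (minimal sketch).

    quantiles_per_critic: list of per-critic quantile estimates.
    drop_per_critic: number of top quantiles to drop per critic.
    """
    # Pool all critics' quantile estimates and sort them ascending.
    pooled = sorted(q for critic in quantiles_per_critic for q in critic)
    # Drop the drop_per_critic * N largest atoms to control overestimation.
    keep = len(pooled) - drop_per_critic * len(quantiles_per_critic)
    truncated = pooled[:keep]
    # The target value is built from the mean of the kept atoms.
    return sum(truncated) / len(truncated)

# Two critics with 3 quantiles each, dropping the top atom per critic:
# pooled = [1, 2, 3, 4, 5, 6] -> kept = [1, 2, 3, 4] -> mean = 2.5
print(tqc_target([[1.0, 4.0, 5.0], [2.0, 3.0, 6.0]], drop_per_critic=1))
```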
Installation with PyPI
On PC AMD64 with Ubuntu/Debian
- Install dependencies

```shell
apt update -y
apt install swig -y
```
- Install RL-Toolkit

```shell
pip3 install rl-toolkit[all]
```
- Run (for Server)

```shell
python3 -m rl_toolkit -c ./rl_toolkit/config.yaml -e MinitaurBulletEnv-v0 server
```

- Run (for Agent)

```shell
python3 -m rl_toolkit -c ./rl_toolkit/config.yaml -e MinitaurBulletEnv-v0 agent --db_server localhost
```

- Run (for Learner)

```shell
python3 -m rl_toolkit -c ./rl_toolkit/config.yaml -e MinitaurBulletEnv-v0 learner --db_server 192.168.1.2
```

- Run (for Tester)

```shell
python3 -m rl_toolkit -c ./rl_toolkit/config.yaml -e MinitaurBulletEnv-v0 tester -f save/model/actor.h5
```
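All of the commands above point at a `config.yaml`. A hypothetical sketch of such a file is shown below; the key names and values are illustrative assumptions, not the actual rl-toolkit schema — consult `rl_toolkit/config.yaml` in the repository for the real keys.

```yaml
# Hypothetical config.yaml sketch -- key names are illustrative
# assumptions; see rl_toolkit/config.yaml in the repository for
# the real schema.
Learner:
  batch_size: 256
  gamma: 0.99        # discount factor
  tau: 0.005         # target network smoothing coefficient
Agent:
  env_steps: 64      # environment steps collected per update
```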
On NVIDIA Jetson
- Install dependencies

  Install TensorFlow for JetPack by following the instructions here.

```shell
apt update -y
apt install swig -y
pip3 install 'tensorflow-probability==0.14.1'
```
- Install Reverb

  Download Bazel 3.7.2 for arm64 from GitHub here.

```shell
mv ~/Downloads/bazel-3.7.2-linux-arm64 ~/bin/bazel
chmod +x ~/bin/bazel
export PATH=$PATH:~/bin
```

  Clone the Reverb version that corresponds to the TF version installed on the NVIDIA Jetson!

```shell
git clone https://github.com/deepmind/reverb
cd reverb/
git checkout r0.5.0  # for TF 2.6.0
```
Make the following changes in Reverb before building!

In `.bazelrc`

```diff
- build:manylinux2010 --crosstool_top=//third_party/toolchains/preconfig/ubuntu16.04/gcc7_manylinux2010:toolchain
+ # build:manylinux2010 --crosstool_top=//third_party/toolchains/preconfig/ubuntu16.04/gcc7_manylinux2010:toolchain
- build --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0"
+ build --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=1"
- build --copt=-mavx --copt=-DEIGEN_MAX_ALIGN_BYTES=64
+ build --copt=-DEIGEN_MAX_ALIGN_BYTES=64
```
In `WORKSPACE`

```diff
- PROTOC_SHA256 = "15e395b648a1a6dda8fd66868824a396e9d3e89bc2c8648e3b9ab9801bea5d55"
+ # PROTOC_SHA256 = "15e395b648a1a6dda8fd66868824a396e9d3e89bc2c8648e3b9ab9801bea5d55"
+ PROTOC_SHA256 = "7877fee5793c3aafd704e290230de9348d24e8612036f1d784c8863bc790082e"
```
In `oss_build.sh`

```diff
- if [ "$python_version" = "3.7" ]; then
+ if [ "$python_version" = "3.6" ]; then
+     export PYTHON_BIN_PATH=/usr/bin/python3.6 && export PYTHON_LIB_PATH=/usr/local/lib/python3.6/dist-packages
+     ABI=cp36
+ elif [ "$python_version" = "3.7" ]; then
- bazel test -c opt --copt=-mavx --config=manylinux2010 --test_output=errors //reverb/cc/...
+ bazel test -c opt --copt="-march=armv8-a+crypto" --test_output=errors //reverb/cc/...
  # Builds Reverb and creates the wheel package.
- bazel build -c opt --copt=-mavx $EXTRA_OPT --config=manylinux2010 reverb/pip_package:build_pip_package
+ bazel build -c opt --copt="-march=armv8-a+crypto" $EXTRA_OPT reverb/pip_package:build_pip_package
  ./bazel-bin/reverb/pip_package/build_pip_package --dst $OUTPUT_DIR $PIP_PKG_EXTRA_ARGS
```
In `reverb/cc/platform/default/repo.bzl`

```diff
  urls = [
-     "https://github.com/protocolbuffers/protobuf/releases/download/v%s/protoc-%s-linux-x86_64.zip" % (version, version),
+     "https://github.com/protocolbuffers/protobuf/releases/download/v%s/protoc-%s-linux-aarch_64.zip" % (version, version),
  ]
```
In `reverb/pip_package/build_pip_package.sh`

```diff
- "${PYTHON_BIN_PATH}" setup.py bdist_wheel ${PKG_NAME_FLAG} ${RELEASE_FLAG} ${TF_VERSION_FLAG} --plat manylinux2010_x86_64 > /dev/null
+ "${PYTHON_BIN_PATH}" setup.py bdist_wheel ${PKG_NAME_FLAG} ${RELEASE_FLAG} ${TF_VERSION_FLAG} > /dev/null
```
Build and install

```shell
bash oss_build.sh --clean true --tf_dep_override "tensorflow=2.6.0" --release --python "3.6"
bash ./bazel-bin/reverb/pip_package/build_pip_package --dst /tmp/reverb/dist/ --release
pip3 install /tmp/reverb/dist/dm_reverb-*
```
Cleaning

```shell
cd ../
rm -R reverb/
```
- Install RL-Toolkit

```shell
pip3 install rl-toolkit
```
- Run (for Server)

```shell
python3 -m rl_toolkit -c ./rl_toolkit/config.yaml -e MinitaurBulletEnv-v0 server
```

- Run (for Agent)

```shell
python3 -m rl_toolkit -c ./rl_toolkit/config.yaml -e MinitaurBulletEnv-v0 agent --db_server localhost
```

- Run (for Learner)

```shell
python3 -m rl_toolkit -c ./rl_toolkit/config.yaml -e MinitaurBulletEnv-v0 learner --db_server 192.168.1.2
```

- Run (for Tester)

```shell
python3 -m rl_toolkit -c ./rl_toolkit/config.yaml -e MinitaurBulletEnv-v0 tester -f save/model/actor.h5
```
Environments
Environment | Observation space | Observation bounds | Action space | Action bounds |
---|---|---|---|---|
BipedalWalkerHardcore-v3 | (24, ) | [-inf, inf] | (4, ) | [-1.0, 1.0] |
Walker2DBulletEnv-v0 | (22, ) | [-inf, inf] | (6, ) | [-1.0, 1.0] |
AntBulletEnv-v0 | (28, ) | [-inf, inf] | (8, ) | [-1.0, 1.0] |
HalfCheetahBulletEnv-v0 | (26, ) | [-inf, inf] | (6, ) | [-1.0, 1.0] |
HopperBulletEnv-v0 | (15, ) | [-inf, inf] | (3, ) | [-1.0, 1.0] |
HumanoidBulletEnv-v0 | (44, ) | [-inf, inf] | (17, ) | [-1.0, 1.0] |
MinitaurBulletEnv-v0 | (28, ) | [-167.72488, 167.72488] | (8, ) | [-1.0, 1.0] |
Results
Environment | SAC + gSDE | SAC + gSDE + Huber loss | SAC + TQC + gSDE | SAC + TQC + gSDE + LogCosh + Reverb |
---|---|---|---|---|
BipedalWalkerHardcore-v3 | 13 ± 18(2) | - | 228 ± 18(2) | - |
Walker2DBulletEnv-v0 | 2270 ± 28(1) | 2732 ± 96 | 2535 ± 94(2) | - |
AntBulletEnv-v0 | 3106 ± 61(1) | 3460 ± 119 | 3700 ± 37(2) | - |
HalfCheetahBulletEnv-v0 | 2945 ± 95(1) | 3003 ± 226 | 3041 ± 157(2) | - |
HopperBulletEnv-v0 | 2515 ± 50(1) | 2555 ± 405 | 2401 ± 62(2) | - |
HumanoidBulletEnv-v0 | - | - | - | - |
MinitaurBulletEnv-v0 | - | - | - | - |
Releases
- SAC + gSDE + Huber loss is stored here, branch r2.0
- SAC + TQC + gSDE + LogCosh + Reverb is stored here, branch r4.0
Frameworks: Tensorflow, Reverb, OpenAI Gym, PyBullet, WanDB, OpenCV
Changes
v4.0.0 (February 5, 2022)
Features 🔊
- Render environments to WanDB
- Grouping of runs in WanDB
- SampleToInsertRatio rate limiter
- Global Gradient Clipping to avoid exploding gradients
- Softplus for numerical stability
- YAML configuration file
- LogCosh instead of Huber loss
- Critic network with Add layer applied on state & action branches
- Custom uniform initializer
- XLA (Accelerated Linear Algebra) compiler
- Optimized Replay Buffer (https://github.com/deepmind/reverb/issues/90)
- split into Agent, Learner, Tester and Server
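The LogCosh loss that replaced Huber loss in this release behaves like squared error near zero and like absolute error for large residuals, while staying smooth everywhere. A minimal, numerically stable sketch in plain Python (function names are illustrative, not rl-toolkit's API):

```python
import math

def log_cosh(error):
    # log(cosh(x)) computed stably: for large |x|, cosh(x) overflows,
    # so use the identity log(cosh(x)) = |x| + log(1 + exp(-2|x|)) - log(2).
    a = abs(error)
    return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)

def huber(error, delta=1.0):
    # Huber loss for comparison: quadratic for |x| <= delta, linear outside.
    a = abs(error)
    return 0.5 * a * a if a <= delta else delta * (a - 0.5 * delta)

# Both are ~ x^2 / 2 near zero; log-cosh remains twice differentiable,
# while Huber switches to its linear branch at |x| = delta.
```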
Bug fixes 🛠️
- Fixed creation of the save path for models
- Fixed model's `summary()`
v3.2.4 (July 7, 2021)
Features 🔊
- Reverb
- `setup.py` (package is available on PyPI)
- split into Agent, Learner and Tester
- Use custom model and layer for defining Actor-Critic
- MultiCritic - concatenating multiple critic networks into one network
- Truncated Quantile Critics
v2.0.2 (May 23, 2021)
Features 🔊
- update Dockerfile
- update `README.md`
- formatted code with Black & Flake8
v2.0.1 (April 27, 2021)
Bug fixes 🛠️
- fixed Critic model
v2.0.0 (April 22, 2021)
Features 🔊
- Add Huber loss
- In test mode, rendering to a video file
- Normalized observations with the min-max method
- Remove TD3 algorithm
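The min-max observation normalization introduced in this release can be sketched as follows; the target range of [-1, 1] and the function name are assumptions for illustration, not rl-toolkit's actual implementation.

```python
def min_max_normalize(obs, low, high):
    # Rescale each observation component from [low_i, high_i] into [-1, 1].
    return [2.0 * (o - lo) / (hi - lo) - 1.0
            for o, lo, hi in zip(obs, low, high)]

# Example with the MinitaurBulletEnv-v0 bounds from the table above:
bound = 167.72488
print(min_max_normalize([0.0, bound, -bound],
                        [-bound] * 3, [bound] * 3))
# the midpoint maps to 0, the bounds map to +1 and -1
```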
Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution: `rl-toolkit-4.0.0.tar.gz` (19.7 kB)

Built Distribution: `rl_toolkit-4.0.0-py3-none-any.whl` (23.2 kB)
Hashes for rl_toolkit-4.0.0-py3-none-any.whl
Algorithm | Hash digest |
---|---|
SHA256 | 9c2a296fe6199cb2caa7c2551daeb5269551a3a271f7797c0ca950c367804402 |
MD5 | 51e72b8c13cf77c6ddfba9d96d6f1d73 |
BLAKE2b-256 | 8987285ac25c90316d0d0e9d8c6d0ce0f39ce8742426ad348d9fed36a6b56270 |