RL-Toolkit: A Research Framework for Robotics
Papers
- Soft Actor-Critic
- Generalized State-Dependent Exploration
- Reverb: A framework for experience replay
- Controlling Overestimation Bias with Truncated Mixture of Continuous Distributional Quantile Critics
- Acme: A Research Framework for Distributed Reinforcement Learning
Installation with PyPI
On PC AMD64 with Ubuntu/Debian
- Install dependencies
apt update -y
apt install swig -y
- Install RL-Toolkit
pip3 install rl-toolkit[all]
- Run (for Server)
python3 -m rl_toolkit -c ./rl_toolkit/config.yaml -e MinitaurBulletEnv-v0 server
- Run (for Agent)
python3 -m rl_toolkit -c ./rl_toolkit/config.yaml -e MinitaurBulletEnv-v0 agent --db_server localhost
- Run (for Learner)
python3 -m rl_toolkit -c ./rl_toolkit/config.yaml -e MinitaurBulletEnv-v0 learner --db_server 192.168.1.2
- Run (for Tester)
python3 -m rl_toolkit -c ./rl_toolkit/config.yaml -e MinitaurBulletEnv-v0 tester -f save/model/actor.h5
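All four roles share the same base flags (config file and environment ID) and differ only in the role name and one role-specific option. As a convenience, here is a minimal Python sketch that assembles and starts these commands; `build_command` and `launch` are hypothetical helpers, not part of RL-Toolkit, and they use only the flags shown in the commands above.

```python
import shlex
import subprocess
from typing import List, Optional

# Flags shared by every role, copied from the documented commands.
BASE_ARGS = ["python3", "-m", "rl_toolkit", "-c", "./rl_toolkit/config.yaml"]
ROLES = {"server", "agent", "learner", "tester"}


def build_command(
    role: str,
    env_id: str = "MinitaurBulletEnv-v0",
    db_server: Optional[str] = None,
    model_path: Optional[str] = None,
) -> List[str]:
    """Return the argv list for one RL-Toolkit role (hypothetical helper)."""
    if role not in ROLES:
        raise ValueError(f"unknown role: {role!r}")
    cmd = BASE_ARGS + ["-e", env_id, role]
    # Agents and learners are pointed at the server's address via --db_server.
    if role in {"agent", "learner"} and db_server is not None:
        cmd += ["--db_server", db_server]
    # The tester loads a saved actor model passed with -f.
    if role == "tester" and model_path is not None:
        cmd += ["-f", model_path]
    return cmd


def launch(cmd: List[str]) -> subprocess.Popen:
    """Start one role in the background and return its process handle."""
    print("Starting:", shlex.join(cmd))
    return subprocess.Popen(cmd)
```

For example, `launch(build_command("agent", db_server="localhost"))` starts an agent that talks to a server on the same machine, while the learner command above points at a server reachable at 192.168.1.2.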
On NVIDIA Jetson
- Install dependencies
TensorFlow for JetPack: follow the installation instructions here.
apt update -y
apt install swig -y
pip3 install 'tensorflow-probability==0.14.1'
- Install Reverb
Download Bazel 3.7.2 for arm64 from GitHub here, then put it on your PATH:
mkdir -p ~/bin
mv ~/Downloads/bazel-3.7.2-linux-arm64 ~/bin/bazel
chmod +x ~/bin/bazel
export PATH=$PATH:~/bin
Clone the Reverb release that corresponds to the TensorFlow version installed on the NVIDIA Jetson!
git clone https://github.com/deepmind/reverb
cd reverb/
git checkout r0.5.0 # for TF 2.6.0
Make the following changes in Reverb before building!
In .bazelrc
- build:manylinux2010 --crosstool_top=//third_party/toolchains/preconfig/ubuntu16.04/gcc7_manylinux2010:toolchain
+ # build:manylinux2010 --crosstool_top=//third_party/toolchains/preconfig/ubuntu16.04/gcc7_manylinux2010:toolchain
- build --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0"
+ build --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=1"
- build --copt=-mavx --copt=-DEIGEN_MAX_ALIGN_BYTES=64
+ build --copt=-DEIGEN_MAX_ALIGN_BYTES=64
In WORKSPACE
- PROTOC_SHA256 = "15e395b648a1a6dda8fd66868824a396e9d3e89bc2c8648e3b9ab9801bea5d55"
+ # PROTOC_SHA256 = "15e395b648a1a6dda8fd66868824a396e9d3e89bc2c8648e3b9ab9801bea5d55"
+ PROTOC_SHA256 = "7877fee5793c3aafd704e290230de9348d24e8612036f1d784c8863bc790082e"
In oss_build.sh
- if [ "$python_version" = "3.7" ]; then
+ if [ "$python_version" = "3.6" ]; then
+ export PYTHON_BIN_PATH=/usr/bin/python3.6 && export PYTHON_LIB_PATH=/usr/local/lib/python3.6/dist-packages
+ ABI=cp36
+ elif [ "$python_version" = "3.7" ]; then

- bazel test -c opt --copt=-mavx --config=manylinux2010 --test_output=errors //reverb/cc/...
+ bazel test -c opt --copt="-march=armv8-a+crypto" --test_output=errors //reverb/cc/...

# Builds Reverb and creates the wheel package.
- bazel build -c opt --copt=-mavx $EXTRA_OPT --config=manylinux2010 reverb/pip_package:build_pip_package
+ bazel build -c opt --copt="-march=armv8-a+crypto" $EXTRA_OPT reverb/pip_package:build_pip_package
./bazel-bin/reverb/pip_package/build_pip_package --dst $OUTPUT_DIR $PIP_PKG_EXTRA_ARGS
In reverb/cc/platform/default/repo.bzl
urls = [
- "https://github.com/protocolbuffers/protobuf/releases/download/v%s/protoc-%s-linux-x86_64.zip" % (version, version),
+ "https://github.com/protocolbuffers/protobuf/releases/download/v%s/protoc-%s-linux-aarch_64.zip" % (version, version),
]
In reverb/pip_package/build_pip_package.sh
- "${PYTHON_BIN_PATH}" setup.py bdist_wheel ${PKG_NAME_FLAG} ${RELEASE_FLAG} ${TF_VERSION_FLAG} --plat manylinux2010_x86_64 > /dev/null
+ "${PYTHON_BIN_PATH}" setup.py bdist_wheel ${PKG_NAME_FLAG} ${RELEASE_FLAG} ${TF_VERSION_FLAG} > /dev/null
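The single-line substitutions above can also be applied with a short script instead of by hand. This is a sketch under stated assumptions: it runs from the root of the Reverb checkout (r0.5.0), each target line appears verbatim in its file apart from indentation, and the multi-line oss_build.sh change is still edited manually. `PATCHES`, `patch_lines` and `apply_patches` are hypothetical names.

```python
from pathlib import Path
from typing import Dict, Tuple

# Hypothetical patch table: old line (compared without surrounding whitespace)
# -> replacement text. A replacement may span several lines; the old line's
# indentation is kept. Values are copied from the Jetson instructions.
PATCHES: Dict[str, Dict[str, str]] = {
    ".bazelrc": {
        "build:manylinux2010 --crosstool_top=//third_party/toolchains/preconfig/ubuntu16.04/gcc7_manylinux2010:toolchain":
            "# build:manylinux2010 --crosstool_top=//third_party/toolchains/preconfig/ubuntu16.04/gcc7_manylinux2010:toolchain",
        'build --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0"':
            'build --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=1"',
        "build --copt=-mavx --copt=-DEIGEN_MAX_ALIGN_BYTES=64":
            "build --copt=-DEIGEN_MAX_ALIGN_BYTES=64",
    },
    "WORKSPACE": {
        'PROTOC_SHA256 = "15e395b648a1a6dda8fd66868824a396e9d3e89bc2c8648e3b9ab9801bea5d55"':
            '# PROTOC_SHA256 = "15e395b648a1a6dda8fd66868824a396e9d3e89bc2c8648e3b9ab9801bea5d55"\n'
            'PROTOC_SHA256 = "7877fee5793c3aafd704e290230de9348d24e8612036f1d784c8863bc790082e"',
    },
    "reverb/cc/platform/default/repo.bzl": {
        '"https://github.com/protocolbuffers/protobuf/releases/download/v%s/protoc-%s-linux-x86_64.zip" % (version, version),':
            '"https://github.com/protocolbuffers/protobuf/releases/download/v%s/protoc-%s-linux-aarch_64.zip" % (version, version),',
    },
    "reverb/pip_package/build_pip_package.sh": {
        '"${PYTHON_BIN_PATH}" setup.py bdist_wheel ${PKG_NAME_FLAG} ${RELEASE_FLAG} ${TF_VERSION_FLAG} --plat manylinux2010_x86_64 > /dev/null':
            '"${PYTHON_BIN_PATH}" setup.py bdist_wheel ${PKG_NAME_FLAG} ${RELEASE_FLAG} ${TF_VERSION_FLAG} > /dev/null',
    },
}


def patch_lines(text: str, substitutions: Dict[str, str]) -> Tuple[str, int]:
    """Replace whole lines matching a key; return the new text and match count."""
    out = []
    count = 0
    for line in text.splitlines():
        old = line.strip()
        if old in substitutions:
            indent = line[: len(line) - len(line.lstrip())]
            out.extend(indent + new for new in substitutions[old].splitlines())
            count += 1
        else:
            out.append(line)
    return "\n".join(out) + "\n", count


def apply_patches(repo_root: str = ".") -> None:
    """Patch the Reverb checkout in place and report how many lines matched."""
    for rel_path, subs in PATCHES.items():
        path = Path(repo_root) / rel_path
        new_text, count = patch_lines(path.read_text(), subs)
        path.write_text(new_text)
        print(f"{rel_path}: {count}/{len(subs)} lines patched")
```

Run `apply_patches(".")` from inside `reverb/` and confirm that every file reports all lines patched (for example with `git diff`) before building; a line that does not match is left unchanged rather than guessed. Already-patched lines no longer match, so running the script twice is harmless.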
Build and install
bash oss_build.sh --clean true --tf_dep_override "tensorflow=2.6.0" --release --python "3.6"
bash ./bazel-bin/reverb/pip_package/build_pip_package --dst /tmp/reverb/dist/ --release
pip3 install /tmp/reverb/dist/dm_reverb-*
Cleaning
cd ../
rm -R reverb/
- Install RL-Toolkit
pip3 install rl-toolkit
- Run (for Server)
python3 -m rl_toolkit -c ./rl_toolkit/config.yaml -e MinitaurBulletEnv-v0 server
- Run (for Agent)
python3 -m rl_toolkit -c ./rl_toolkit/config.yaml -e MinitaurBulletEnv-v0 agent --db_server localhost
- Run (for Learner)
python3 -m rl_toolkit -c ./rl_toolkit/config.yaml -e MinitaurBulletEnv-v0 learner --db_server 192.168.1.2
- Run (for Tester)
python3 -m rl_toolkit -c ./rl_toolkit/config.yaml -e MinitaurBulletEnv-v0 tester -f save/model/actor.h5
Environments
| Environment | Observation space | Observation bounds | Action space | Action bounds |
|---|---|---|---|---|
| BipedalWalkerHardcore-v3 | (24, ) | [-inf, inf] | (4, ) | [-1.0, 1.0] |
| Walker2DBulletEnv-v0 | (22, ) | [-inf, inf] | (6, ) | [-1.0, 1.0] |
| AntBulletEnv-v0 | (28, ) | [-inf, inf] | (8, ) | [-1.0, 1.0] |
| HalfCheetahBulletEnv-v0 | (26, ) | [-inf, inf] | (6, ) | [-1.0, 1.0] |
| HopperBulletEnv-v0 | (15, ) | [-inf, inf] | (3, ) | [-1.0, 1.0] |
| HumanoidBulletEnv-v0 | (44, ) | [-inf, inf] | (17, ) | [-1.0, 1.0] |
| MinitaurBulletEnv-v0 | (28, ) | [-167.72488, 167.72488] | (8, ) | [-1.0, 1.0] |
Results
| Environment | SAC + gSDE | SAC + gSDE + Huber loss | SAC + TQC + gSDE | SAC + TQC + gSDE + LogCosh + Reverb |
|---|---|---|---|---|
| BipedalWalkerHardcore-v3 | 13 ± 18(2) | - | 228 ± 18(2) | - |
| Walker2DBulletEnv-v0 | 2270 ± 28(1) | 2732 ± 96 | 2535 ± 94(2) | - |
| AntBulletEnv-v0 | 3106 ± 61(1) | 3460 ± 119 | 3700 ± 37(2) | - |
| HalfCheetahBulletEnv-v0 | 2945 ± 95(1) | 3003 ± 226 | 3041 ± 157(2) | - |
| HopperBulletEnv-v0 | 2515 ± 50(1) | 2555 ± 405 | 2401 ± 62(2) | - |
| HumanoidBulletEnv-v0 | - | - | - | - |
| MinitaurBulletEnv-v0 | - | - | - | - |
Releases
- SAC + gSDE + Huber loss is stored here, branch r2.0
- SAC + TQC + gSDE + LogCosh + Reverb is stored here, branch r4.0
Frameworks: TensorFlow, Reverb, OpenAI Gym, PyBullet, Weights & Biases (W&B), OpenCV
Changes
v4.1.1 (September 2, 2022)
- Update default config.yaml
v4.1.0 (February 9, 2022)
Features 🔊
- .fit()
- AgentCallback
v4.0.0 (February 5, 2022)
Features 🔊
- Render environments to W&B
- Grouping of runs in W&B
- SampleToInsertRatio rate limiter
- Global Gradient Clipping to avoid exploding gradients
- Softplus for numerical stability
- YAML configuration file
- LogCosh instead of Huber loss
- Critic network with Add layer applied on state & action branches
- Custom uniform initializer
- XLA (Accelerated Linear Algebra) compiler
- Optimized Replay Buffer (https://github.com/deepmind/reverb/issues/90)
- Split into Agent, Learner, Tester and Server
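The SampleToInsertRatio rate limiter keeps the number of samples drawn by the learner roughly proportional to the number of items inserted by the agents, blocking whichever side runs ahead. The plain-Python class below is a simplified model of that idea, not Reverb's implementation: `RatioLimiter` is a hypothetical name, and the model ignores deletions, timeouts and real blocking. The allowed band [offset - error_buffer, offset + error_buffer], with offset = samples_per_insert * min_size_to_sample, is an assumption meant to mirror how Reverb derives its limits from a scalar `error_buffer`.

```python
class RatioLimiter:
    """Toy sample-to-insert ratio limiter (simplified, not Reverb's code).

    Tracks diff = inserts * samples_per_insert - samples and allows an
    operation only if diff stays inside [min_diff, max_diff] afterwards.
    """

    def __init__(self, samples_per_insert: float, min_size_to_sample: int,
                 error_buffer: float) -> None:
        self.samples_per_insert = samples_per_insert
        self.min_size_to_sample = min_size_to_sample
        # Centre the allowed band on the diff reached when sampling unlocks.
        offset = samples_per_insert * min_size_to_sample
        self.min_diff = offset - error_buffer
        self.max_diff = offset + error_buffer
        self.inserts = 0
        self.samples = 0

    def _diff(self, inserts: int, samples: int) -> float:
        return inserts * self.samples_per_insert - samples

    def can_insert(self) -> bool:
        # Inserting raises diff: refuse if the learner has fallen too far behind.
        return self._diff(self.inserts + 1, self.samples) <= self.max_diff

    def can_sample(self) -> bool:
        # Sampling lowers diff: also wait until enough items have been inserted.
        return (self.inserts >= self.min_size_to_sample
                and self._diff(self.inserts, self.samples + 1) >= self.min_diff)

    def insert(self) -> bool:
        if not self.can_insert():
            return False
        self.inserts += 1
        return True

    def sample(self) -> bool:
        if not self.can_sample():
            return False
        self.samples += 1
        return True
```

In Reverb itself the limiter is created with `reverb.rate_limiters.SampleToInsertRatio(samples_per_insert=..., min_size_to_sample=..., error_buffer=...)` and passed to a `reverb.Table`; the values RL-Toolkit uses are not given in this document.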
Bug fixes 🛠️
- Fixed creation of the save path for models
- Fixed model's summary()
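Three of the v4.0.0 changes are small numerical recipes: LogCosh in place of the Huber loss, softplus written in a numerically stable form, and clipping gradients by their global norm. The plain-Python sketch below shows only the standard formulas; how RL-Toolkit wires them into its TensorFlow models is not documented here (TensorFlow itself provides `tf.keras.losses.LogCosh`, `tf.math.softplus` and `tf.clip_by_global_norm` for these purposes).

```python
import math
from typing import List, Sequence, Tuple


def log_cosh(x: float) -> float:
    """log(cosh(x)) without overflow: |x| + log1p(exp(-2|x|)) - log(2)."""
    a = abs(x)
    return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)


def huber(x: float, delta: float = 1.0) -> float:
    """Huber loss: quadratic for |x| <= delta, linear beyond."""
    a = abs(x)
    return 0.5 * a * a if a <= delta else delta * (a - 0.5 * delta)


def softplus(x: float) -> float:
    """log(1 + exp(x)) computed as max(x, 0) + log1p(exp(-|x|))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def clip_by_global_norm(
    grads: Sequence[Sequence[float]], clip_norm: float
) -> Tuple[List[List[float]], float]:
    """Scale all gradients by one factor so their joint L2 norm is <= clip_norm."""
    global_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    scale = min(1.0, clip_norm / global_norm) if global_norm > 0 else 1.0
    return [[g * scale for g in grad] for grad in grads], global_norm
```

For small errors log_cosh(x) is close to x²/2 and for large ones close to |x| - log 2, so it behaves like a Huber loss with delta = 1 (whose linear branch is |x| - 0.5) while staying smooth everywhere. Clipping by the global norm rescales every gradient by the same factor, so the update direction is preserved.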
v3.2.4 (July 7, 2021)
Features 🔊
- Reverb
- setup.py (package is available on PyPI)
- Split into Agent, Learner and Tester
- Use custom model and layer for defining Actor-Critic
- MultiCritic - concatenating multiple critic networks into one network
- Truncated Quantile Critics
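Truncated Quantile Critics (TQC) reduce overestimation by pooling the quantile atoms predicted by all critics (presumably the outputs of the concatenated MultiCritic), sorting them and discarding the largest ones before forming the soft Bellman target. A minimal plain-Python sketch of that target computation, following the TQC paper rather than RL-Toolkit's code; `truncated_target_atoms` and `tqc_targets` are hypothetical names.

```python
from typing import List, Sequence


def truncated_target_atoms(
    critic_atoms: Sequence[Sequence[float]], drop_per_critic: int
) -> List[float]:
    """Pool the atoms of all critics, sort them and drop the largest ones.

    With N critics and d atoms dropped per critic, the N * d largest
    pooled atoms are removed.
    """
    n_critics = len(critic_atoms)
    pooled = sorted(a for atoms in critic_atoms for a in atoms)
    keep = len(pooled) - n_critics * drop_per_critic
    if keep <= 0:
        raise ValueError("drop_per_critic would remove every atom")
    return pooled[:keep]


def tqc_targets(
    reward: float,
    done: bool,
    gamma: float,
    kept_atoms: Sequence[float],
    alpha: float,
    next_log_prob: float,
) -> List[float]:
    """Soft Bellman target for every kept atom (entropy term as in SAC)."""
    discount = 0.0 if done else gamma
    return [reward + discount * (z - alpha * next_log_prob) for z in kept_atoms]
```

Each critic's quantiles are then regressed towards all kept targets; the number of atoms dropped per critic controls how strongly the target is pushed down.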
v2.0.2 (May 23, 2021)
Features 🔊
- Update Dockerfile
- Update README.md
- Formatted and linted code with Black & Flake8
v2.0.1 (April 27, 2021)
Bug fixes 🛠️
- Fixed Critic model
v2.0.0 (April 22, 2021)
Features 🔊
- Add Huber loss
- In test mode, render to a video file
- Normalize observations with the min-max method
- Remove TD3 algorithm
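Min-max normalization rescales each observation component from its bounds [low, high] to a fixed interval. A minimal sketch, assuming a target interval of [-1, 1] (the interval RL-Toolkit actually uses is not stated here); `min_max_normalize` is a hypothetical helper. In the environments table only MinitaurBulletEnv-v0 reports finite observation bounds (±167.72488); for the [-inf, inf] environments the formula is undefined, so the sketch rejects infinite bounds.

```python
import math
from typing import List, Sequence


def min_max_normalize(
    obs: Sequence[float], low: Sequence[float], high: Sequence[float]
) -> List[float]:
    """Map each component linearly from [low, high] to [-1, 1]."""
    out = []
    for x, lo, hi in zip(obs, low, high):
        if not (math.isfinite(lo) and math.isfinite(hi)) or hi <= lo:
            raise ValueError("min-max scaling needs finite bounds with high > low")
        out.append(2.0 * (x - lo) / (hi - lo) - 1.0)
    return out
```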
File details
Details for the file rl-toolkit-4.1.1.tar.gz.
File metadata
- Download URL: rl-toolkit-4.1.1.tar.gz
- Upload date:
- Size: 20.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.9.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d9cf5ce00718a64b729e46e806cc92d0eee8a2433029cde3ca1834b3cc879659 |
| MD5 | e3722afcc7414a9d60d182a3e2324a02 |
| BLAKE2b-256 | 8496c3ee90511a7deed55e1fe7d8b9cc0e81a107a649984662bfe05f6d3d7ca0 |
File details
Details for the file rl_toolkit-4.1.1-py3-none-any.whl.
File metadata
- Download URL: rl_toolkit-4.1.1-py3-none-any.whl
- Upload date:
- Size: 23.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.9.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 69d72edd01a2695744a79d2df1eb61d3e1c5da79dabf3a92bac4b23bb8240a79 |
| MD5 | 57a0bea6a4377dd89ed7c8e936064c01 |
| BLAKE2b-256 | f9e06a09ccf4a6f72c64be95f10986fd9a1610eaa5240a7ee6aae6f31e368a35 |
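Before installing a downloaded file, its digests can be compared with the published ones. A minimal sketch using Python's standard `hashlib`; `EXPECTED_SHA256`, `file_digests` and `verify` are hypothetical helper names, and BLAKE2b-256 is computed as BLAKE2b with a 32-byte digest.

```python
import hashlib
from typing import Dict

# SHA256 digests published for the 4.1.1 distribution files.
EXPECTED_SHA256 = {
    "rl-toolkit-4.1.1.tar.gz": "d9cf5ce00718a64b729e46e806cc92d0eee8a2433029cde3ca1834b3cc879659",
    "rl_toolkit-4.1.1-py3-none-any.whl": "69d72edd01a2695744a79d2df1eb61d3e1c5da79dabf3a92bac4b23bb8240a79",
}


def file_digests(path: str, chunk_size: int = 1 << 16) -> Dict[str, str]:
    """Compute SHA256, MD5 and BLAKE2b-256 of a file in a single pass."""
    hashers = {
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),
    }
    with open(path, "rb") as f:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}


def verify(path: str, filename: str) -> bool:
    """True if the file at `path` matches the published SHA256 for `filename`."""
    return file_digests(path)["SHA256"] == EXPECTED_SHA256[filename]
```

pip can enforce the same check itself in hash-checking mode, e.g. with a requirements line `rl-toolkit==4.1.1 --hash=sha256:<digest from the table>` (hash-checking mode then requires hashes for every dependency as well).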