RL-Toolkit: A Research Framework for Robotics
RL Toolkit
Papers
- Playing Flappy Bird Based on Motion Recognition Using a Transformer Model and LIDAR Sensor
- Soft Actor-Critic
- Generalized State-Dependent Exploration
- Reverb: A framework for experience replay
- Controlling Overestimation Bias with Truncated Mixture of Continuous Distributional Quantile Critics
- Acme: A Research Framework for Distributed Reinforcement Learning
- Dueling Network Architectures for Deep Reinforcement Learning
- Attention Is All You Need
- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Installation with PyPI
On PC AMD64 with Ubuntu/Debian
- Install dependencies
      apt update -y
      apt install swig -y
- Install RL-Toolkit
      pip3 install rl-toolkit[all]
- Run (for Server)
      rl_toolkit -c ./config/sac.yaml -a sac -e BipedalWalkerHardcore-v3 server
- Run (for Agent)
      rl_toolkit -c ./config/sac.yaml -a sac -e BipedalWalkerHardcore-v3 agent
- Run (for Learner)
      rl_toolkit -c ./config/sac.yaml -a sac -e BipedalWalkerHardcore-v3 learner --db_server 192.168.1.2
- Run (for Tester)
      rl_toolkit -c ./config/sac.yaml -a sac -e BipedalWalkerHardcore-v3 tester -f save/model/actor.h5
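The four run commands above differ only in the trailing role and its options. As a minimal sketch, the shared arguments can be kept in one place and the command line printed for each role. The `rl_cmd` helper and the `CONFIG`, `ALGO`, and `ENV_NAME` variables are hypothetical names introduced here for illustration; they are not part of RL-Toolkit.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of RL-Toolkit): prints the rl_toolkit
# command line for one role so it can be inspected before running.
CONFIG="./config/sac.yaml"
ALGO="sac"
ENV_NAME="BipedalWalkerHardcore-v3"

rl_cmd() {
    # $1 = role (server | agent | learner | tester)
    # remaining arguments are appended unchanged as role options
    local role="$1"
    shift
    local extra=""
    if [ "$#" -gt 0 ]; then
        extra=" $*"
    fi
    echo "rl_toolkit -c ${CONFIG} -a ${ALGO} -e ${ENV_NAME} ${role}${extra}"
}
```

For example, `rl_cmd learner --db_server 192.168.1.2` prints the learner command shown above. Once the printed line looks right, it could be executed with `eval "$(rl_cmd server)"`.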
On NVIDIA Jetson
- Install dependencies
  Install TensorFlow for JetPack by following NVIDIA's installation instructions, then install SWIG:
      sudo apt install swig -y
- Install Reverb
  Download Bazel 3.7.2 for arm64 and put it on your PATH:
      mkdir ~/bin
      mv ~/Downloads/bazel-3.7.2-linux-arm64 ~/bin/bazel
      chmod +x ~/bin/bazel
      export PATH=$PATH:~/bin
  Clone the Reverb version that corresponds to the TensorFlow version installed on the NVIDIA Jetson:
      git clone https://github.com/deepmind/reverb
      cd reverb/
      git checkout r0.9.0
  Make the following changes in Reverb before building.
  In .bazelrc:
      - build:manylinux2010 --crosstool_top=//third_party/toolchains/preconfig/ubuntu16.04/gcc7_manylinux2010:toolchain
      + # build:manylinux2010 --crosstool_top=//third_party/toolchains/preconfig/ubuntu16.04/gcc7_manylinux2010:toolchain
      - build --copt=-mavx --copt=-DEIGEN_MAX_ALIGN_BYTES=64
      + build --copt=-DEIGEN_MAX_ALIGN_BYTES=64
  In WORKSPACE:
      - PROTOC_SHA256 = "15e395b648a1a6dda8fd66868824a396e9d3e89bc2c8648e3b9ab9801bea5d55"
      + # PROTOC_SHA256 = "15e395b648a1a6dda8fd66868824a396e9d3e89bc2c8648e3b9ab9801bea5d55"
      + PROTOC_SHA256 = "7877fee5793c3aafd704e290230de9348d24e8612036f1d784c8863bc790082e"
  In oss_build.sh:
      - bazel test -c opt --copt=-mavx --config=manylinux2010 --test_output=errors //reverb/cc/...
      + bazel test -c opt --copt="-march=armv8-a+crypto" --test_output=errors //reverb/cc/...
        # Builds Reverb and creates the wheel package.
      - bazel build -c opt --copt=-mavx $EXTRA_OPT --config=manylinux2010 reverb/pip_package:build_pip_package
      + bazel build -c opt --copt="-march=armv8-a+crypto" $EXTRA_OPT reverb/pip_package:build_pip_package
  In reverb/cc/platform/default/repo.bzl:
        urls = [
      -     "https://github.com/protocolbuffers/protobuf/releases/download/v%s/protoc-%s-linux-x86_64.zip" % (version, version),
      +     "https://github.com/protocolbuffers/protobuf/releases/download/v%s/protoc-%s-linux-aarch_64.zip" % (version, version),
        ]
  In reverb/pip_package/build_pip_package.sh:
      - "${PYTHON_BIN_PATH}" setup.py bdist_wheel ${PKG_NAME_FLAG} ${RELEASE_FLAG} ${TF_VERSION_FLAG} --plat manylinux2010_x86_64 > /dev/null
      + "${PYTHON_BIN_PATH}" setup.py bdist_wheel ${PKG_NAME_FLAG} ${RELEASE_FLAG} ${TF_VERSION_FLAG} > /dev/null
  Build and install:
      bash oss_build.sh --clean true --tf_dep_override "tensorflow~=2.9.1" --release --python "3.8"
      bash ./bazel-bin/reverb/pip_package/build_pip_package --dst /tmp/reverb/dist/ --release
      pip3 install /tmp/reverb/dist/dm_reverb-*
  Clean up:
      cd ../
      rm -R reverb/
- Install RL-Toolkit
      pip3 install rl-toolkit
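The manual `.bazelrc` edit in the Reverb build steps above could also be scripted. The following is a sketch only: `patch_bazelrc` is a hypothetical helper, and it assumes the two lines appear in `.bazelrc` exactly as shown in the diff. AVX is an x86 instruction set extension, which is why the `-mavx` flag is dropped for the Jetson's ARM CPU.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Reverb or RL-Toolkit): applies the two
# .bazelrc changes listed in the build steps to the file given as $1.
patch_bazelrc() {
    local file="$1"
    local tmp="${file}.tmp"
    # 1st expression: comment out the x86 manylinux2010 crosstool line.
    # 2nd expression: drop -mavx while keeping the Eigen alignment flag.
    sed \
        -e 's|^build:manylinux2010 --crosstool_top=|# &|' \
        -e 's|^build --copt=-mavx --copt=-DEIGEN_MAX_ALIGN_BYTES=64$|build --copt=-DEIGEN_MAX_ALIGN_BYTES=64|' \
        "$file" > "$tmp" && mv "$tmp" "$file"
}
```

It would be run as `patch_bazelrc .bazelrc` from inside the Reverb checkout, before the build step. Lines that do not match the expected text are left untouched, so it is worth comparing the result against the diff before building.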
Environments
| Environment | Observation space | Observation bounds | Action space | Action bounds | Reward bounds |
|---|---|---|---|---|---|
| BipedalWalkerHardcore-v3 | (24, ) | [-inf, inf] | (4, ) | [-1.0, 1.0] | [-1.0, 1.0] |
| FlappyBird-v0 | (16, 180) | [0, dmax] | (2, ) | {DO NOTHING, FLAP} | [-1.0, 1.0] |
Results
| Environment | SAC + gSDE | SAC + gSDE + Huber loss | SAC + TQC + gSDE | Q-Learning | RL-Toolkit |
|---|---|---|---|---|---|
| BipedalWalkerHardcore-v3 | 13 ± 18 (1) | 239 ± 118 | 228 ± 18 (1) | - | 205 ± 134 |
| FlappyBird-v0 | - | - | - | 209.298 (2) | 13 156 |
Releases
- SAC + gSDE + Huber loss is stored in branch r2.0
- SAC + TQC + gSDE + LogCosh + Reverb is stored in branch r4.0
- DQN + SAC agents, branch r4.0
Frameworks: TensorFlow, DeepMind Reverb, Gymnasium, DeepMind Control Suite, WandB, OpenCV