RL-Toolkit: A Research Framework for Robotics
RL Toolkit
Papers
- Soft Actor-Critic
- Generalized State-Dependent Exploration
- Reverb: A framework for experience replay
- Controlling Overestimation Bias with Truncated Mixture of Continuous Distributional Quantile Critics
- Acme: A Research Framework for Distributed Reinforcement Learning
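The truncation idea from the TQC paper can be sketched in a few lines: pool the quantile estimates from all critics, sort them, and drop the largest atoms to curb overestimation bias. The function and variable names below are illustrative, not part of rl-toolkit's API.

```python
def tqc_target(quantiles_per_critic, drop_per_critic):
    """Truncated mixture of quantiles, as in TQC (minimal sketch).

    quantiles_per_critic: list of per-critic quantile estimates.
    drop_per_critic: number of top quantiles to drop per critic.
    """
    # Pool all critics' quantile estimates and sort them ascending.
    pooled = sorted(q for critic in quantiles_per_critic for q in critic)
    # Drop the drop_per_critic * N largest atoms to control overestimation.
    keep = len(pooled) - drop_per_critic * len(quantiles_per_critic)
    truncated = pooled[:keep]
    # The target value is built from the mean of the kept atoms.
    return sum(truncated) / len(truncated)

# Two critics with 3 quantiles each, dropping the top atom per critic:
# pooled = [1, 2, 3, 4, 5, 6] -> kept = [1, 2, 3, 4] -> mean = 2.5
print(tqc_target([[1.0, 4.0, 5.0], [2.0, 3.0, 6.0]], drop_per_critic=1))
```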
Installation with PyPI
On PC AMD64 with Ubuntu/Debian
- Install dependencies

```shell
apt update -y
apt install swig -y
```
- Install RL-Toolkit

```shell
pip3 install rl-toolkit[all]
```
- Run (for Server)

```shell
python3 -m rl_toolkit -c ./rl_toolkit/config.yaml -e MinitaurBulletEnv-v0 server
```

- Run (for Agent)

```shell
python3 -m rl_toolkit -c ./rl_toolkit/config.yaml -e MinitaurBulletEnv-v0 agent --db_server localhost
```

- Run (for Learner)

```shell
python3 -m rl_toolkit -c ./rl_toolkit/config.yaml -e MinitaurBulletEnv-v0 learner --db_server 192.168.1.2
```

- Run (for Tester)

```shell
python3 -m rl_toolkit -c ./rl_toolkit/config.yaml -e MinitaurBulletEnv-v0 tester -f save/model/actor.h5
```
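All of the commands above point at a `config.yaml`. A hypothetical sketch of such a file is shown below; the key names and values are illustrative assumptions, not the actual rl-toolkit schema — consult `rl_toolkit/config.yaml` in the repository for the real keys.

```yaml
# Hypothetical config.yaml sketch -- key names are illustrative
# assumptions; see rl_toolkit/config.yaml in the repository for
# the real schema.
Learner:
  batch_size: 256
  gamma: 0.99        # discount factor
  tau: 0.005         # target network smoothing coefficient
Agent:
  env_steps: 64      # environment steps collected per update
```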
On NVIDIA Jetson
- Install dependencies

  Install TensorFlow for JetPack by following the instructions here.

```shell
apt update -y
apt install swig -y
pip3 install 'tensorflow-probability==0.14.1'
```
- Install Reverb

  Download Bazel 3.7.2 for arm64 from GitHub here.

```shell
mv ~/Downloads/bazel-3.7.2-linux-arm64 ~/bin/bazel
chmod +x ~/bin/bazel
export PATH=$PATH:~/bin
```

  Clone the Reverb version that corresponds to the TF version installed on the NVIDIA Jetson!

```shell
git clone https://github.com/deepmind/reverb
cd reverb/
git checkout r0.5.0  # for TF 2.6.0
```
Make the following changes in Reverb before building!

In `.bazelrc`

```diff
- build:manylinux2010 --crosstool_top=//third_party/toolchains/preconfig/ubuntu16.04/gcc7_manylinux2010:toolchain
+ # build:manylinux2010 --crosstool_top=//third_party/toolchains/preconfig/ubuntu16.04/gcc7_manylinux2010:toolchain
- build --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0"
+ build --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=1"
- build --copt=-mavx --copt=-DEIGEN_MAX_ALIGN_BYTES=64
+ build --copt=-DEIGEN_MAX_ALIGN_BYTES=64
```
In `WORKSPACE`

```diff
- PROTOC_SHA256 = "15e395b648a1a6dda8fd66868824a396e9d3e89bc2c8648e3b9ab9801bea5d55"
+ # PROTOC_SHA256 = "15e395b648a1a6dda8fd66868824a396e9d3e89bc2c8648e3b9ab9801bea5d55"
+ PROTOC_SHA256 = "7877fee5793c3aafd704e290230de9348d24e8612036f1d784c8863bc790082e"
```
In `oss_build.sh`

```diff
- if [ "$python_version" = "3.7" ]; then
+ if [ "$python_version" = "3.6" ]; then
+     export PYTHON_BIN_PATH=/usr/bin/python3.6 && export PYTHON_LIB_PATH=/usr/local/lib/python3.6/dist-packages
+     ABI=cp36
+ elif [ "$python_version" = "3.7" ]; then
- bazel test -c opt --copt=-mavx --config=manylinux2010 --test_output=errors //reverb/cc/...
+ bazel test -c opt --copt="-march=armv8-a+crypto" --test_output=errors //reverb/cc/...
  # Builds Reverb and creates the wheel package.
- bazel build -c opt --copt=-mavx $EXTRA_OPT --config=manylinux2010 reverb/pip_package:build_pip_package
+ bazel build -c opt --copt="-march=armv8-a+crypto" $EXTRA_OPT reverb/pip_package:build_pip_package
  ./bazel-bin/reverb/pip_package/build_pip_package --dst $OUTPUT_DIR $PIP_PKG_EXTRA_ARGS
```
In `reverb/cc/platform/default/repo.bzl`

```diff
  urls = [
-     "https://github.com/protocolbuffers/protobuf/releases/download/v%s/protoc-%s-linux-x86_64.zip" % (version, version),
+     "https://github.com/protocolbuffers/protobuf/releases/download/v%s/protoc-%s-linux-aarch_64.zip" % (version, version),
  ]
```
In `reverb/pip_package/build_pip_package.sh`

```diff
- "${PYTHON_BIN_PATH}" setup.py bdist_wheel ${PKG_NAME_FLAG} ${RELEASE_FLAG} ${TF_VERSION_FLAG} --plat manylinux2010_x86_64 > /dev/null
+ "${PYTHON_BIN_PATH}" setup.py bdist_wheel ${PKG_NAME_FLAG} ${RELEASE_FLAG} ${TF_VERSION_FLAG} > /dev/null
```
Build and install

```shell
bash oss_build.sh --clean true --tf_dep_override "tensorflow=2.6.0" --release --python "3.6"
bash ./bazel-bin/reverb/pip_package/build_pip_package --dst /tmp/reverb/dist/ --release
pip3 install /tmp/reverb/dist/dm_reverb-*
```
Cleaning

```shell
cd ../
rm -R reverb/
```
- Install RL-Toolkit

```shell
pip3 install rl-toolkit
```
- Run (for Server)

```shell
python3 -m rl_toolkit -c ./rl_toolkit/config.yaml -e MinitaurBulletEnv-v0 server
```

- Run (for Agent)

```shell
python3 -m rl_toolkit -c ./rl_toolkit/config.yaml -e MinitaurBulletEnv-v0 agent --db_server localhost
```

- Run (for Learner)

```shell
python3 -m rl_toolkit -c ./rl_toolkit/config.yaml -e MinitaurBulletEnv-v0 learner --db_server 192.168.1.2
```

- Run (for Tester)

```shell
python3 -m rl_toolkit -c ./rl_toolkit/config.yaml -e MinitaurBulletEnv-v0 tester -f save/model/actor.h5
```
Environments
Environment | Observation space | Observation bounds | Action space | Action bounds |
---|---|---|---|---|
BipedalWalkerHardcore-v3 | (24, ) | [-inf, inf] | (4, ) | [-1.0, 1.0] |
Walker2DBulletEnv-v0 | (22, ) | [-inf, inf] | (6, ) | [-1.0, 1.0] |
AntBulletEnv-v0 | (28, ) | [-inf, inf] | (8, ) | [-1.0, 1.0] |
HalfCheetahBulletEnv-v0 | (26, ) | [-inf, inf] | (6, ) | [-1.0, 1.0] |
HopperBulletEnv-v0 | (15, ) | [-inf, inf] | (3, ) | [-1.0, 1.0] |
HumanoidBulletEnv-v0 | (44, ) | [-inf, inf] | (17, ) | [-1.0, 1.0] |
MinitaurBulletEnv-v0 | (28, ) | [-167.72488, 167.72488] | (8, ) | [-1.0, 1.0] |
Results
Environment | SAC + gSDE | SAC + gSDE + Huber loss | SAC + TQC + gSDE | SAC + TQC + gSDE + LogCosh + Reverb |
---|---|---|---|---|
BipedalWalkerHardcore-v3 | 13 ± 18(2) | - | 228 ± 18(2) | - |
Walker2DBulletEnv-v0 | 2270 ± 28(1) | 2732 ± 96 | 2535 ± 94(2) | - |
AntBulletEnv-v0 | 3106 ± 61(1) | 3460 ± 119 | 3700 ± 37(2) | - |
HalfCheetahBulletEnv-v0 | 2945 ± 95(1) | 3003 ± 226 | 3041 ± 157(2) | - |
HopperBulletEnv-v0 | 2515 ± 50(1) | 2555 ± 405 | 2401 ± 62(2) | - |
HumanoidBulletEnv-v0 | - | - | - | - |
MinitaurBulletEnv-v0 | - | - | - | - |
Releases
- SAC + gSDE + Huber loss is stored here, branch r2.0
- SAC + TQC + gSDE + LogCosh + Reverb is stored here, branch r4.0
Frameworks: Tensorflow, Reverb, OpenAI Gym, PyBullet, WanDB, OpenCV
Changes
v4.0.0 (February 5, 2022)
Features 🔊
- Render environments to WanDB
- Grouping of runs in WanDB
- SampleToInsertRatio rate limiter
- Global Gradient Clipping to avoid exploding gradients
- Softplus for numerical stability
- YAML configuration file
- LogCosh instead of Huber loss
- Critic network with Add layer applied on state & action branches
- Custom uniform initializer
- XLA (Accelerated Linear Algebra) compiler
- Optimized Replay Buffer (https://github.com/deepmind/reverb/issues/90)
- split into Agent, Learner, Tester and Server
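The LogCosh loss that replaced Huber loss in this release behaves like squared error near zero and like absolute error for large residuals, while staying smooth everywhere. A minimal, numerically stable sketch in plain Python (function names are illustrative, not rl-toolkit's API):

```python
import math

def log_cosh(error):
    # log(cosh(x)) computed stably: for large |x|, cosh(x) overflows,
    # so use the identity log(cosh(x)) = |x| + log(1 + exp(-2|x|)) - log(2).
    a = abs(error)
    return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)

def huber(error, delta=1.0):
    # Huber loss for comparison: quadratic for |x| <= delta, linear outside.
    a = abs(error)
    return 0.5 * a * a if a <= delta else delta * (a - 0.5 * delta)

# Both are ~ x^2 / 2 near zero; log-cosh remains twice differentiable,
# while Huber switches to its linear branch at |x| = delta.
```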
Bug fixes 🛠️
- Fixed creation of the save path for models
- Fixed model's `summary()`
v3.2.4 (July 7, 2021)
Features 🔊
- Reverb
- `setup.py` (package is available on PyPI)
- split into Agent, Learner and Tester
- Use custom model and layer for defining Actor-Critic
- MultiCritic - concatenating multiple critic networks into one network
- Truncated Quantile Critics
v2.0.2 (May 23, 2021)
Features 🔊
- update Dockerfile
- update `README.md`
- formatted code with Black & Flake8
v2.0.1 (April 27, 2021)
Bug fixes 🛠️
- fixed Critic model
v2.0.0 (April 22, 2021)
Features 🔊
- Add Huber loss
- In test mode, rendering to a video file
- Normalized observations with the min-max method
- Remove TD3 algorithm
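The min-max observation normalization introduced in this release can be sketched as follows; the target range of [-1, 1] and the function name are assumptions for illustration, not rl-toolkit's actual implementation.

```python
def min_max_normalize(obs, low, high):
    # Rescale each observation component from [low_i, high_i] into [-1, 1].
    return [2.0 * (o - lo) / (hi - lo) - 1.0
            for o, lo, hi in zip(obs, low, high)]

# Example with the MinitaurBulletEnv-v0 bounds from the table above:
bound = 167.72488
print(min_max_normalize([0.0, bound, -bound],
                        [-bound] * 3, [bound] * 3))
# the midpoint maps to 0, the bounds map to +1 and -1
```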
Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution: `rl-toolkit-4.0.0.tar.gz` (19.7 kB)

Built Distribution: `rl_toolkit-4.0.0-py3-none-any.whl` (23.2 kB)
Hashes for rl_toolkit-4.0.0-py3-none-any.whl
Algorithm | Hash digest |
---|---|
SHA256 | 9c2a296fe6199cb2caa7c2551daeb5269551a3a271f7797c0ca950c367804402 |
MD5 | 51e72b8c13cf77c6ddfba9d96d6f1d73 |
BLAKE2b-256 | 8987285ac25c90316d0d0e9d8c6d0ce0f39ce8742426ad348d9fed36a6b56270 |