RL-Toolkit: A Research Framework for Robotics

RL Toolkit

Papers

Installation with PyPI

On PC AMD64 with Ubuntu/Debian

  1. Install dependencies
    apt update -y
    apt install swig -y
    
  2. Install RL-Toolkit
    pip3 install 'rl-toolkit[all]'
    
  3. Run (for Server)
    python3 -m rl_toolkit -c ./rl_toolkit/config.yaml -e MinitaurBulletEnv-v0 server
    
    Run (for Agent)
    python3 -m rl_toolkit -c ./rl_toolkit/config.yaml -e MinitaurBulletEnv-v0 agent --db_server localhost
    
    Run (for Learner)
    python3 -m rl_toolkit -c ./rl_toolkit/config.yaml -e MinitaurBulletEnv-v0 learner --db_server 192.168.1.2
    
    Run (for Tester)
    python3 -m rl_toolkit -c ./rl_toolkit/config.yaml -e MinitaurBulletEnv-v0 tester -f save/model/actor.h5
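
    The four commands above differ only in the role name and one role-specific option. As a minimal sketch (the helper name `build_command` and its keyword arguments are hypothetical and not part of RL-Toolkit), a launcher script could assemble them like this, using only the flags shown above:

```python
import shlex

# Roles taken from the commands above.
ROLES = ("server", "agent", "learner", "tester")


def build_command(role, env="MinitaurBulletEnv-v0",
                  config="./rl_toolkit/config.yaml",
                  db_server=None, model_file=None):
    """Return the argv list for one rl_toolkit role (hypothetical helper)."""
    if role not in ROLES:
        raise ValueError(f"unknown role: {role}")
    argv = ["python3", "-m", "rl_toolkit", "-c", config, "-e", env, role]
    # --db_server appears above for agent and learner, -f for tester.
    if db_server is not None:
        argv += ["--db_server", db_server]
    if model_file is not None:
        argv += ["-f", model_file]
    return argv


if __name__ == "__main__":
    print(shlex.join(build_command("server")))
    print(shlex.join(build_command("agent", db_server="localhost")))
    print(shlex.join(build_command("learner", db_server="192.168.1.2")))
    print(shlex.join(build_command("tester", model_file="save/model/actor.h5")))
```

    The returned list can be passed to `subprocess.run` directly, which avoids shell quoting issues when the config path contains spaces.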
    

On NVIDIA Jetson

  1. Install dependencies
    Install TensorFlow for JetPack by following the instructions here.

    apt update -y
    apt install swig -y
    
    pip3 install 'tensorflow-probability==0.14.1'
    
  2. Install Reverb
    Download Bazel 3.7.2 for arm64 from GitHub here.

    mv ~/Downloads/bazel-3.7.2-linux-arm64 ~/bin/bazel
    chmod +x ~/bin/bazel
    export PATH=$PATH:~/bin
    

    Clone the Reverb version that corresponds to the TF version installed on the NVIDIA Jetson!

    git clone https://github.com/deepmind/reverb
    cd reverb/
    git checkout r0.5.0   # for TF 2.6.0
    

    Make the following changes in Reverb before building!
    In .bazelrc

    - build:manylinux2010 --crosstool_top=//third_party/toolchains/preconfig/ubuntu16.04/gcc7_manylinux2010:toolchain
    + # build:manylinux2010 --crosstool_top=//third_party/toolchains/preconfig/ubuntu16.04/gcc7_manylinux2010:toolchain
    
    - build --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0"
    + build --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=1"
    
    - build --copt=-mavx --copt=-DEIGEN_MAX_ALIGN_BYTES=64
    + build --copt=-DEIGEN_MAX_ALIGN_BYTES=64
    

    In WORKSPACE

    - PROTOC_SHA256 = "15e395b648a1a6dda8fd66868824a396e9d3e89bc2c8648e3b9ab9801bea5d55"
    + # PROTOC_SHA256 = "15e395b648a1a6dda8fd66868824a396e9d3e89bc2c8648e3b9ab9801bea5d55"
    + PROTOC_SHA256 = "7877fee5793c3aafd704e290230de9348d24e8612036f1d784c8863bc790082e"
    

    In oss_build.sh

    -  if [ "$python_version" = "3.7" ]; then
    +  if [ "$python_version" = "3.6" ]; then
    +    export PYTHON_BIN_PATH=/usr/bin/python3.6 && export PYTHON_LIB_PATH=/usr/local/lib/python3.6/dist-packages
    +    ABI=cp36
    +  elif [ "$python_version" = "3.7" ]; then
    
    -  bazel test -c opt --copt=-mavx --config=manylinux2010 --test_output=errors //reverb/cc/...
    +  bazel test -c opt --copt="-march=armv8-a+crypto" --test_output=errors //reverb/cc/...
    
    # Builds Reverb and creates the wheel package.
    -  bazel build -c opt --copt=-mavx $EXTRA_OPT --config=manylinux2010 reverb/pip_package:build_pip_package
    +  bazel build -c opt --copt="-march=armv8-a+crypto" $EXTRA_OPT reverb/pip_package:build_pip_package
    ./bazel-bin/reverb/pip_package/build_pip_package --dst $OUTPUT_DIR $PIP_PKG_EXTRA_ARGS
    

    In reverb/cc/platform/default/repo.bzl

    urls = [
       -        "https://github.com/protocolbuffers/protobuf/releases/download/v%s/protoc-%s-linux-x86_64.zip" % (version, version),
       +        "https://github.com/protocolbuffers/protobuf/releases/download/v%s/protoc-%s-linux-aarch_64.zip" % (version, version),
    ]
    

    In reverb/pip_package/build_pip_package.sh

    -  "${PYTHON_BIN_PATH}" setup.py bdist_wheel ${PKG_NAME_FLAG} ${RELEASE_FLAG} ${TF_VERSION_FLAG} --plat manylinux2010_x86_64 > /dev/null
    +  "${PYTHON_BIN_PATH}" setup.py bdist_wheel ${PKG_NAME_FLAG} ${RELEASE_FLAG} ${TF_VERSION_FLAG}  > /dev/null
    

    Build and install

    bash oss_build.sh --clean true --tf_dep_override "tensorflow=2.6.0" --release --python "3.6"
    bash ./bazel-bin/reverb/pip_package/build_pip_package --dst /tmp/reverb/dist/ --release
    pip3 install /tmp/reverb/dist/dm_reverb-*
    

    Cleaning

    cd ../
    rm -R reverb/
    
  3. Install RL-Toolkit

    pip3 install rl-toolkit
    
  4. Run (for Server)

    python3 -m rl_toolkit -c ./rl_toolkit/config.yaml -e MinitaurBulletEnv-v0 server
    

    Run (for Agent)

    python3 -m rl_toolkit -c ./rl_toolkit/config.yaml -e MinitaurBulletEnv-v0 agent --db_server localhost
    

    Run (for Learner)

    python3 -m rl_toolkit -c ./rl_toolkit/config.yaml -e MinitaurBulletEnv-v0 learner --db_server 192.168.1.2
    

    Run (for Tester)

    python3 -m rl_toolkit -c ./rl_toolkit/config.yaml -e MinitaurBulletEnv-v0 tester -f save/model/actor.h5
    

Environments

Environment               Observation space  Observation bounds       Action space  Action bounds
BipedalWalkerHardcore-v3  (24, )             [-inf, inf]              (4, )         [-1.0, 1.0]
Walker2DBulletEnv-v0      (22, )             [-inf, inf]              (6, )         [-1.0, 1.0]
AntBulletEnv-v0           (28, )             [-inf, inf]              (8, )         [-1.0, 1.0]
HalfCheetahBulletEnv-v0   (26, )             [-inf, inf]              (6, )         [-1.0, 1.0]
HopperBulletEnv-v0        (15, )             [-inf, inf]              (3, )         [-1.0, 1.0]
HumanoidBulletEnv-v0      (44, )             [-inf, inf]              (17, )        [-1.0, 1.0]
MinitaurBulletEnv-v0      (28, )             [-167.72488, 167.72488]  (8, )         [-1.0, 1.0]

Results

Environment               SAC + gSDE    SAC + gSDE + Huber loss  SAC + TQC + gSDE  SAC + TQC + gSDE + LogCosh + Reverb
BipedalWalkerHardcore-v3  13 ± 18(2)    -                        228 ± 18(2)       -
Walker2DBulletEnv-v0      2270 ± 28(1)  2732 ± 96                2535 ± 94(2)      -
AntBulletEnv-v0           3106 ± 61(1)  3460 ± 119               3700 ± 37(2)      -
HalfCheetahBulletEnv-v0   2945 ± 95(1)  3003 ± 226               3041 ± 157(2)     -
HopperBulletEnv-v0        2515 ± 50(1)  2555 ± 405               2401 ± 62(2)      -
HumanoidBulletEnv-v0      -             -                        -                 -
MinitaurBulletEnv-v0      -             -                        -                 -

Releases

  • SAC + gSDE + Huber loss is stored here, branch r2.0
  • SAC + TQC + gSDE + LogCosh + Reverb is stored here, branch r4.0

Frameworks: TensorFlow, Reverb, OpenAI Gym, PyBullet, WandB, OpenCV

Changes

v4.1.1 (September 2, 2022)

  • update default config.yaml

v4.1.0 (February 9, 2022)

Features 🔊

  • .fit()
  • AgentCallback

v4.0.0 (February 5, 2022)

Features 🔊

  • Render environments to WandB
  • Grouping of runs in WandB
  • SampleToInsertRatio rate limiter
  • Global Gradient Clipping to avoid exploding gradients
  • Softplus for numerical stability
  • YAML configuration file
  • LogCosh instead of Huber loss
  • Critic network with Add layer applied on state & action branches
  • Custom uniform initializer
  • XLA (Accelerated Linear Algebra) compiler
  • Optimized Replay Buffer (https://github.com/deepmind/reverb/issues/90)
  • split into Agent, Learner, Tester and Server
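
The switch from Huber loss to LogCosh listed above can be illustrated with a minimal, dependency-free sketch. It is not taken from the RL-Toolkit source: the function names are hypothetical, and the softplus-based rewrite of log(cosh(e)) is just one common way to avoid overflow for large errors.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def log_cosh(error):
    """log(cosh(error)) written as |e| + softplus(-2|e|) - log(2).

    The identity avoids evaluating cosh directly, which overflows for large |e|.
    """
    a = abs(error)
    return a + softplus(-2.0 * a) - math.log(2.0)


def huber(error, delta=1.0):
    """Huber loss: quadratic for |e| <= delta, linear beyond it."""
    a = abs(error)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)


if __name__ == "__main__":
    for e in (0.0, 0.5, 2.0, 1000.0):
        print(e, huber(e), log_cosh(e))
```

Both losses are roughly quadratic near zero and linear for large errors. LogCosh is smooth everywhere, while Huber switches between the two regimes at `delta`.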

Bug fixes 🛠️

  • Fixed creation of the save path for models
  • Fixed model's summary()

v3.2.4 (July 7, 2021)

Features 🔊

  • Reverb
  • setup.py (package is available on PyPI)
  • split into Agent, Learner and Tester
  • Use custom model and layer for defining Actor-Critic
  • MultiCritic - concatenating multiple critic networks into one network
  • Truncated Quantile Critics
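
The truncation step behind Truncated Quantile Critics can be sketched in a few lines. This is an illustration of the general technique, not RL-Toolkit's implementation: the function name and the `drop_per_critic` parameter are hypothetical, and plain Python lists stand in for tensors. Each critic predicts a set of return quantiles; the quantiles of all critics are pooled and sorted, and the largest ones are dropped to reduce overestimation.

```python
def truncate_pooled_quantiles(critic_quantiles, drop_per_critic):
    """Pool the quantiles of all critics, sort them, and drop the largest ones.

    critic_quantiles: list of lists, one list of quantile estimates per critic.
    drop_per_critic:  number of top quantiles to drop for each critic.
    """
    pooled = sorted(q for critic in critic_quantiles for q in critic)
    n_drop = drop_per_critic * len(critic_quantiles)
    if n_drop >= len(pooled):
        raise ValueError("cannot drop all quantiles")
    return pooled[: len(pooled) - n_drop]


if __name__ == "__main__":
    # Two critics with three quantiles each; the outlier 10.0 is truncated away.
    kept = truncate_pooled_quantiles([[1.0, 2.0, 3.0], [1.5, 2.5, 10.0]], 1)
    print(kept)                   # [1.0, 1.5, 2.0, 2.5]
    print(sum(kept) / len(kept))  # 1.75
```

In the full algorithm the kept quantiles are computed from target critics at the next state and then combined with the reward, discount and entropy term to form the learning target. Only the pooling and truncation are shown here.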

v2.0.2 (May 23, 2021)

Features 🔊

  • update Dockerfile
  • update README.md
  • formatted code by Black & Flake8

v2.0.1 (April 27, 2021)

Bug fixes 🛠️

  • fixed Critic model

v2.0.0 (April 22, 2021)

Features 🔊

  • Add Huber loss
  • Render to a video file in test mode
  • Normalize observations with the min-max method
  • Remove TD3 algorithm
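
Min-max normalization of observations, mentioned above, can be sketched as follows. This is a textbook version rather than the project's code: the function name is hypothetical, the target range [0, 1] is an assumption, and the bounds are the MinitaurBulletEnv-v0 observation bounds from the Environments table. Environments with unbounded observations ([-inf, inf]) cannot be scaled this way, so the sketch rejects non-finite bounds.

```python
import math


def min_max_normalize(obs, low, high):
    """Scale each observation component from [low, high] to [0, 1]."""
    if not (math.isfinite(low) and math.isfinite(high)):
        raise ValueError("min-max scaling needs finite bounds")
    if high <= low:
        raise ValueError("high must be greater than low")
    span = high - low
    return [(x - low) / span for x in obs]


if __name__ == "__main__":
    # Observation bounds of MinitaurBulletEnv-v0 from the Environments table.
    LOW, HIGH = -167.72488, 167.72488
    print(min_max_normalize([LOW, 0.0, HIGH], LOW, HIGH))
```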
