Skip to main content

Implementations of reinforcement learning algorithms

Project description

rl-algo-impls

Implementations of reinforcement learning algorithms.

Prerequisites: Weights & Biases (WandB)

Training and benchmarking assumes you have a Weights & Biases project to upload runs to. By default training goes to a rl-algo-impls project while benchmarks go to rl-algo-impls-benchmarks. During training and benchmarking runs, videos of the best models and the model weights are uploaded to WandB.

Before doing anything below, you'll need to create a wandb account and run wandb login.

Setup and Usage

Lambda Labs instance for benchmarking

Benchmark runs are uploaded to WandB, which can be made into reports (for example). So far I've found Lambda Labs A10 instances to be a good balance of performance (14 hours to train PPO in 14 environments [5 basic gym, 4 PyBullet, CarRacing-v0, and 4 Atari] across 3 seeds) vs cost ($0.60/hr).

git clone https://github.com/sgoodfriend/rl-algo-impls.git
cd rl-algo-impls
# git checkout BRANCH_NAME if running on non-main branch
bash ./scripts/setup.sh
wandb login
bash ./scripts/benchmark.sh [-a {"ppo"}] [-e ENVS] [-j {6}] [-p {rl-algo-impls-benchmarks}] [-s {"1 2 3"}]

Benchmarking runs are by default upload to a rl-algo-impls-benchmarks project. Runs upload videos of the running best model and the weights of the best and last model. Benchmarking runs are tagged with a shorted commit hash (i.e., benchmark_5598ebc) and hostname (i.e., host_192-9-145-26)

Publishing models to Huggingface

Publishing benchmarks to Huggingface requires logging into Huggingface with a write-capable API token:

git config --global credential.helper store
huggingface-cli login
# For example: python benchmark_publish.py --wandb-tags host_192-9-147-166 benchmark_1d4094f --wandb-report-url https://api.wandb.ai/links/sgoodfriend/099h4lvj
# --virtual-display likely must be specified if running on a remote machine.
python benchmark_publish.py --wandb-tags HOST_TAG COMMIT_TAG --wandb-report-url WANDB_REPORT_URL [--virtual-display]

Hyperparameter tuning with Optuna

Hyperparameter tuning can be done with the tuning/tuning.sh script, which runs multiple processes of optimize.py. Start by doing all the setup meant for training before running tuning/tuning.sh:

# Setup similar to training above
wandb login
bash scripts/tuning.sh -a ALGO -e ENV -j N_JOBS -s NUM_SEEDS

Google Colab Pro+

3 notebooks in the colab directory are setup to be used with Google Colab:

  • colab_benchmark.ipynb: Even with a Google Colab Pro+ subscription you'd need to only run parts of the benchmark. The file recommends 4 splits (basic+pybullet, carcarcing, atari1, atari2) because it would otherwise exceed the 24-hour session limit. This mostly comes from being unable to get pool_size above 1 because of WandB errors.
  • colab_train.ipynb: Train models while being able to specify the env, seeds, and algo. By default training runs are uploaded to the rl-algo-impls project.
  • colab_enjoy.ipynb: Download models from WandB and evaluate them. Training is likely to be more interesting given videos are uploaded.

macOS

Installation

My local development has been on an M1 Mac. These instructions might not be complete, but these are the approximate setup and usage I've been using:

  1. Install libraries with homebrew
brew install swig
brew install --cask xquartz
  1. Download and install Miniconda for arm64
curl -O https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-arm64.sh
sh Miniconda3-latest-MacOSX-arm64.sh
  1. Create a conda environment from this repo's environment.yml
conda env create -f environment.yml -n rl_algo_impls
conda activate rl_algo_impls
  1. Install other dependencies with poetry
poetry install

Usage

Training, benchmarking, and watching the agents playing the environments can be done locally:

python train.py [-h] [--algo {ppo}] [--env ENV [ENV ...]] [--seed [SEED ...]] [--wandb-project-name WANDB_PROJECT_NAME] [--wandb-tags [WANDB_TAGS ...]] [--pool-size POOL_SIZE] [-virtual-display]

train.py by default uploads to the rl-algo-impls WandB project. Training creates videos of the running best model, which will cause popups. Creating the first video requires a display, so you shouldn't shutoff the display until the video of the initial model is created (1-5 minutes depending on environment). The --virtual-display flag should allow headless mode, but that hasn't been reliable on macOS.

python enjoy.py [-h] [--algo {ppo}] [--env ENV] [--seed SEED] [--render RENDER] [--best BEST] [--n_episodes N_EPISODES] [--deterministic-eval DETERMINISTIC_EVAL] [--no-print-returns]
# OR
python enjoy.py [--wandb-run-path WANDB_RUN_PATH]

The first enjoy.py where you specify algo, env, and seed loads a model you locally trained with those parameters and renders the agent playing the environment.

The second enjoy.py downloads the model and hyperparameters from a WandB run. An example run path is sgoodfriend/rl-algo-impls-benchmarks/09gea50g

Hyperparameters

These are specified in yaml files in the hyperparams directory by game (atari is a special case for all Atari games).

procgen Setup

procgen envs use gym3, which don't expose a straightforward way to set seed to allow for repeatable runs.

openai/procgen doesn't support Apple Silicon, but patch instructions exist. The changes to the repo are for now in a fork since the openai/procgen project is in maintenance mode:

brew install wget cmake glow qt5
git clone https://github.com/sgoodfriend/procgen.git
cd procgen
pip install -e .
python -c "from procgen import ProcgenGym3Env; ProcgenGym3Env(num=1, env_name='coinrun')"
python -m procgen.interactive

amd64 Linux machines (e.g., Lambda Labs and Google Colab) should install procgen with python -m pip install '.[procgen]'

gym-microrts Setup

python -m pip install -e '.[microrts]'

Requires Java SDK to also be installed.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

rl_algo_impls-0.0.13.tar.gz (108.3 kB view hashes)

Uploaded Source

Built Distribution

rl_algo_impls-0.0.13-py3-none-any.whl (142.3 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page