Algorithm and utilities for deep reinforcement learning
Rainy
Reinforcement learning utilities and algorithm implementations using PyTorch.
API documentation
COMING SOON
Supported python version
Python >= 3.6.1
Run examples
Though this project is still WIP, all examples are verified to work.
First, install pipenv. E.g., you can install it via
pip3 install pipenv --user
Then, clone this repository and create a virtual environment in it.
git clone https://github.com/kngwyu/Rainy.git
cd Rainy
pipenv --site-packages --three install
Now you are ready to start!
pipenv run python examples/acktr_cart_pole.py train
After training, you can run learned agents.
Please replace (log-directory) in the command below with your actual log directory.
Its name should look like acktr_cart_pole-190219-134651-35a14601.
pipenv run python examples/acktr_cart_pole.py eval (log-directory) --render
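If you have many log directories, it can help to split their names into parts. The sketch below is only an illustration: it assumes the name follows the pattern (script name)-(YYMMDD)-(HHMMSS)-(suffix), which is inferred from the single example above and is not a documented format. parse_log_dir_name and LogDirName are hypothetical names, not part of Rainy.

```python
from datetime import datetime
from typing import NamedTuple


class LogDirName(NamedTuple):
    experiment: str
    started_at: datetime
    suffix: str


def parse_log_dir_name(name: str) -> LogDirName:
    """Split a log directory name, assuming name-YYMMDD-HHMMSS-suffix."""
    # Split from the right so that experiment names containing '-' stay intact.
    experiment, date, time, suffix = name.rsplit("-", 3)
    # Assumption: the two numeric fields are a date and a time of day.
    started_at = datetime.strptime(date + time, "%y%m%d%H%M%S")
    return LogDirName(experiment, started_at, suffix)


parsed = parse_log_dir_name("acktr_cart_pole-190219-134651-35a14601")
print(parsed.experiment)
print(parsed.started_at.isoformat())
print(parsed.suffix)
```

Sorting directories by started_at would then give you the most recent run, provided the assumed pattern holds for your logs.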
You can also plot training results from your log directory. This command opens an IPython shell for inspecting your log file.
pipenv run python -m rainy.ipython
Then you can plot training rewards via
log = open_log('log-directory')
log.plot_reward(12 * 20, max_steps=int(4e5), title='ACKTR cart pole')
Run examples with MPI or NCCL
Distributed training is supported via horovod.
I recommend installing it in your site-packages (i.e., not in the virtualenv).
E.g., if you want to use it with NCCL, you can install it via
sudo env HOROVOD_GPU_ALLREDUCE=NCCL pip3 install --no-cache-dir horovod
Once it is installed, you can run the training script using the horovodrun command.
E.g., if you want to use two hosts (localhost and anotherhost) and run ppo_atari.py, use
horovodrun -np 2 -H localhost:1,anotherhost:1 pipenv run python examples/ppo_atari.py train
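In the command above, -H lists each host with its number of process slots, and -np is the total number of processes. As a small sketch, the command line can be assembled from a host table so that the two always agree. horovodrun_command is a hypothetical helper written for this example, not part of Rainy or horovod.

```python
import shlex
from typing import Dict, List


def horovodrun_command(hosts: Dict[str, int], script_args: List[str]) -> List[str]:
    """Build a horovodrun argument list from a {host: slots} mapping."""
    # -np is the total number of processes, i.e. the sum of all slots.
    total = sum(hosts.values())
    # -H expects a comma-separated list of host:slots pairs.
    host_spec = ",".join(f"{host}:{slots}" for host, slots in hosts.items())
    return ["horovodrun", "-np", str(total), "-H", host_spec, *script_args]


cmd = horovodrun_command(
    {"localhost": 1, "anotherhost": 1},
    ["pipenv", "run", "python", "examples/ppo_atari.py", "train"],
)
# Print a shell-ready version of the command.
print(" ".join(shlex.quote(part) for part in cmd))
```

The resulting list can be passed to subprocess.run as-is if you prefer launching from a script.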
Override configuration from CLI
Currently, Rainy provides an easy-to-use CLI via click. You can view its usage with, say,
pipenv run python examples/a2c_cart_pole.py --help
This CLI has a simple data-driven interface. I.e., once you fill in a config object, all commands (train, eval, retrain, etc.) work. So you can start experiments easily without copying and pasting, say, argument parser code.
However, it has the limitation that you cannot add new options.
So Rainy-CLI provides an option named override, which executes the given string as Python code
with the config object bound to the name config.
Example usage:
pipenv run python examples/a2c_cart_pole.py --override='config.grad_clip=0.5; config.nsteps=10' train
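To illustrate the idea, here is a minimal sketch of how such an override can be applied with Python's built-in exec. This is an assumption about the mechanism, not Rainy's actual implementation: apply_override is a hypothetical helper, and SimpleNamespace stands in for the real config object.

```python
from types import SimpleNamespace


def apply_override(config, override: str) -> None:
    """Execute `override` as Python code with `config` bound to the name config."""
    # The dict is used as the global namespace of the executed code,
    # so the string can refer to the object simply as `config`.
    exec(override, {"config": config})


# Hypothetical stand-in for a filled config object.
config = SimpleNamespace(grad_clip=1.0, nsteps=5)
apply_override(config, "config.grad_clip=0.5; config.nsteps=10")
print(config.grad_clip, config.nsteps)
```

Because the string is executed as code, an override should only ever come from a source you trust.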
If this feature still doesn't satisfy your requirements, you can
override subcommands with ctx.invoke.
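As a rough sketch of what this can look like, the example below builds a tiny click group and adds a subcommand that adjusts the config and then reuses an existing subcommand through ctx.invoke. The group, the train and train-long commands, and the SimpleNamespace config are all hypothetical stand-ins for this example. They do not reflect how Rainy's CLI is actually structured.

```python
from types import SimpleNamespace

import click
from click.testing import CliRunner


@click.group()
@click.pass_context
def cli(ctx: click.Context) -> None:
    # Hypothetical stand-in for a filled config object, shared via ctx.obj.
    ctx.obj = SimpleNamespace(nsteps=5)


@cli.command()
@click.pass_context
def train(ctx: click.Context) -> None:
    click.echo(f"train with nsteps={ctx.obj.nsteps}")


@cli.command(name="train-long")
@click.option("--nsteps", type=int, default=20)
@click.pass_context
def train_long(ctx: click.Context, nsteps: int) -> None:
    # Add a new option, update the shared config, then reuse `train`.
    ctx.obj.nsteps = nsteps
    ctx.invoke(train)


# Run the new subcommand in-process using click's test runner.
result = CliRunner().invoke(cli, ["train-long", "--nsteps", "30"])
print(result.output)
```

The point of ctx.invoke here is that the new command does not duplicate the body of train. It only changes what train sees in the config.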
Implementation Status
| Algorithm | Multi Worker(Sync) | Recurrent | Discrete Action | Continuous Action | MPI |
|---|---|---|---|---|---|
| DQN/Double DQN | :x: | :x: | :heavy_check_mark: | :x: | :x: |
| DDPG | :x: | :x: | :x: | :heavy_check_mark: | :x: |
| TD3 | :x: | :x: | :x: | :heavy_check_mark: | :x: |
| PPO | :heavy_check_mark: | :heavy_check_mark:(1) | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
| A2C | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :x: |
| ACKTR | :heavy_check_mark: | :x:(2) | :heavy_check_mark: | :heavy_check_mark: | :x: |
| AOC | :heavy_check_mark: | :x: | :heavy_check_mark: | :heavy_check_mark: | :x: |
(1): It's very unstable
(2): Needs https://openreview.net/forum?id=HyMTkQZAb implemented
Sub packages
- intrinsic-rewards
- Contains an implementation of RND (Random Network Distillation)
References
DQN (Deep Q Network)
DDQN (Double DQN)
DDPG (Deep Deterministic Policy Gradient)
TD3 (Twin Delayed Deep Deterministic Policy Gradient)
A2C (Advantage Actor Critic)
- http://proceedings.mlr.press/v48/mniha16.pdf , https://arxiv.org/abs/1602.01783 (A3C, original version)
- https://blog.openai.com/baselines-acktr-a2c/ (A2C, synchronized version)
ACKTR (Actor Critic using Kronecker-Factored Trust Region)
PPO (Proximal Policy Optimization)
AOC (Advantage Option Critic)
- https://arxiv.org/abs/1609.05140 (DQN-like option critic)
- https://arxiv.org/abs/1709.04571 (A3C-like option critic called A2OC)
Implementations I referenced
I referenced mainly openai baselines, but all these packages were useful.
Thanks!
https://github.com/openai/baselines
https://github.com/ikostrikov/pytorch-a2c-ppo-acktr
https://github.com/ShangtongZhang/DeepRL
https://github.com/chainer/chainerrl
https://github.com/Thrandis/EKFAC-pytorch (for ACKTR)
https://github.com/jeanharb/a2oc_delib (for AOC)
https://github.com/sfujim/TD3 (for DDPG and TD3)
License
This project is licensed under the Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0).
File details
Details for the file rainy-0.4.0.tar.gz.
File metadata
- Download URL: rainy-0.4.0.tar.gz
- Upload date:
- Size: 48.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.28.1 CPython/3.7.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2ed888e568dad96712211f0648bd8966923f7527731fc420c766e2f92bcbaec8 |
| MD5 | 002b4987582c77d9da5d921f063dbe99 |
| BLAKE2b-256 | 5322c19999575577687fca74000a92b68cf80b66ca7f309db75430a274949cdc |
File details
Details for the file rainy-0.4.0-py3-none-any.whl.
File metadata
- Download URL: rainy-0.4.0-py3-none-any.whl
- Upload date:
- Size: 81.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.28.1 CPython/3.7.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 90fe69b284463ac23a8ae611fe1a5ad80c1404daab9ba090d5cdac0465ded962 |
| MD5 | 90af222cd09864ff18d07bef54ba6467 |
| BLAKE2b-256 | 655628ca6c930556931be8cbcc99bce576e967dab0e4aa8cc1d3b94052538170 |