Deep reinforcement learning framework for fast prototyping based on PyTorch
Welcome to actorch, a deep reinforcement learning framework for fast prototyping based on PyTorch. The following algorithms are included:
- REINFORCE
- Advantage Actor-Critic (A2C)
- Actor-Critic Kronecker-Factored Trust Region (ACKTR)
- Proximal Policy Optimization (PPO)
💡 Key features
- Support for custom observation/action spaces
- Support for custom multimodal input multimodal output (recurrent) models
- Support for custom policy/value distributions
- Support for custom preprocessing/postprocessing pipelines
- Support for custom exploration strategies
- Support for normalizing flows
- Batched environments (both for training and evaluation)
- Batched trajectory replay
- Batched and distributional value estimation (e.g. distributional V-trace)
- Data parallel and distributed data parallel multi-GPU training and evaluation
- Automatic mixed precision training
- Integration with Ray Tune for experiment execution and hyperparameter tuning at any scale
- Effortless experiment definition through Python-based configuration files
- Built-in visualizer to monitor experiment progress
- Highly modular object-oriented design
- Detailed API documentation
🛠️️ Installation
For Windows, make sure the latest Visual C++ runtime is installed.
Using Pip
First of all, install Python. Open a terminal and run:
pip install actorch[visualizer]
If you don't need the visualizer (e.g. you are installing actorch on a headless server), run:
pip install actorch
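To confirm that the installation succeeded, a quick check along these lines may help. It is a minimal sketch that uses only the standard library and assumes Python 3.8 or later; the helper name installed_version is hypothetical and not part of actorch.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints the installed actorch version, or None if the package was not found
    print(installed_version("actorch"))
```

If the script prints None, the package is not visible to the interpreter that ran it, which usually means a different environment is active.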
Using Conda
Clone or download and extract the repository, navigate to <path-to-repository>/bin and run the installation script (install.sh for Linux/macOS, install.bat for Windows).
actorch (including the visualizer) and its dependencies (pinned to a specific version) will be installed in a Conda virtual environment named actorch-env.
NOTE: you can directly use actorch-env and the actorch package in the local project directory for development (see For development).
Using Docker (Linux/macOS only)
First of all, install Docker and NVIDIA Container Runtime.
Clone or download and extract the repository, navigate to <path-to-repository>, open a terminal and run:
docker build -t <desired-image-name> . # Build image
docker run -it --runtime=nvidia <desired-image-name> # Run container from image
actorch (including the visualizer) and its dependencies (pinned to a specific version) will be installed in the specified Docker image.
NOTE: you can directly use the actorch package in the local project directory inside a Docker container run from the specified Docker image for development (see For development).
From source
First of all, install Python.
Clone or download and extract the repository, navigate to <path-to-repository>, open a terminal and run:
pip install .[visualizer]
If you don't need the visualizer (e.g. you are installing actorch on a headless server), run:
pip install .
For development
First of all, install Python and Git.
Clone or download and extract the repository, navigate to <path-to-repository>, open a terminal and run:
pip install -e .[all]
pre-commit install -f
This will install the package in editable mode (any change to the package in the local project directory is automatically reflected in the environment-wide package installed in the site-packages directory of your environment) along with its development, test and optional dependencies.
Additionally, it installs a git commit hook. Each time you commit, unit tests, static type checkers, code formatters and linters are run automatically. Run pre-commit run --all-files to check that the hook was successfully installed. For more details, see pre-commit's documentation.
▶️ Quickstart
In this example we will solve the OpenAI Gym environment CartPole-v1 using REINFORCE.
Copy the following configuration into a file named REINFORCE_CartPole-v1.py (with the same indentation):
import gym
from torch.optim import Adam

from actorch import *


experiment_params = ExperimentParams(
    run_or_experiment=REINFORCE,
    stop={"training_iteration": 30},
    resources_per_trial={"cpu": 1, "gpu": 0},
    checkpoint_freq=10,
    checkpoint_at_end=True,
    log_to_file=True,
    export_formats=["checkpoint", "model"],
    config=REINFORCE.Config(
        train_env_builder=lambda **config: ParallelBatchedEnv(
            lambda **config: gym.make("CartPole-v1", **config),
            config,
            num_workers=2,
        ),
        train_num_episodes_per_iteration=10,
        eval_interval_iterations=10,
        eval_env_config={"render_mode": None},
        eval_num_episodes_per_iteration=10,
        policy_network_model_builder=FCNet,
        policy_network_model_config={
            "torso_fc_configs": [
                {"out_features": 64, "bias": True}
            ],
        },
        policy_network_optimizer_builder=Adam,
        policy_network_optimizer_config={"lr": 1e-1},
        discount=0.99,
        entropy_coeff=0.001,
        max_grad_l2_norm=0.5,
        seed=0,
        enable_amp=False,
        enable_reproducibility=True,
        log_sys_usage=True,
        suppress_warnings=False,
    ),
)
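For readers unfamiliar with the discount parameter above, the following sketch shows how a discount factor is typically applied in the textbook REINFORCE formulation to turn per-step rewards into discounted returns. It is plain Python for illustration only: the function name discounted_returns is hypothetical, and actorch's internal implementation may differ.

```python
def discounted_returns(rewards, discount=0.99):
    """Compute the discounted reward-to-go for each step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so that each return reuses the one computed after it
    for t in reversed(range(len(rewards))):
        running = rewards[t] + discount * running
        returns[t] = running
    return returns


# With a discount of 0.5, three unit rewards give returns of 1.75, 1.5 and 1.0
example = discounted_returns([1.0, 1.0, 1.0], discount=0.5)
```

A discount close to 1 (such as 0.99) makes the agent value rewards far in the future almost as much as immediate ones, which suits CartPole-v1, where every surviving step yields a reward of 1.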
Open a terminal in the directory where you saved the configuration file and run:
pip install gym[classic_control] # Install dependencies for CartPole-v1
actorch run REINFORCE_CartPole-v1.py # Run experiment
Wait for a few minutes until the training has finished. The mean cumulative reward over the last 100 episodes should exceed 475, which means that the environment has been solved. You can now visualize the experiment progress stored in the generated TensorBoard files using Plotly:
cd experiments/REINFORCE_CartPole-v1/<auto-generated-experiment-name>
actorch visualize plotly tensorboard
You can find the generated plots in plots.
Congratulations, you have run your first experiment!
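The solved criterion mentioned above (mean cumulative reward over the last 100 episodes exceeding 475) can be checked by hand with a few lines of Python. This is a minimal sketch, assuming you have already collected the per-episode cumulative rewards in a list; the function name is_solved is hypothetical and not part of actorch.

```python
def is_solved(episode_returns, threshold=475.0, window=100):
    """Check whether the mean return over the last `window` episodes exceeds `threshold`."""
    if len(episode_returns) < window:
        return False  # Not enough episodes to fill the window yet
    recent = episode_returns[-window:]
    return sum(recent) / window > threshold
```

For example, is_solved([500.0] * 100) returns True, while a history of fewer than 100 episodes returns False regardless of the rewards.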
NOTE: if you installed actorch in a virtual environment, you first need to activate it (conda activate actorch-env if you installed actorch using Conda).
HINT: since a configuration file is a regular Python script, you can use all the features of the language (e.g. inheritance).
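As one illustration of the hint above, configuration values can be computed by ordinary Python code instead of being written out by hand. The sketch below builds a dictionary shaped like the policy_network_model_config entry from the quickstart. The helper name make_torso_fc_configs is hypothetical, and passing more than one hidden size assumes that each list entry describes one fully connected layer, which the quickstart does not confirm.

```python
def make_torso_fc_configs(hidden_sizes, bias=True):
    """Build a list shaped like the `torso_fc_configs` entry in the quickstart."""
    return [{"out_features": size, "bias": bias} for size in hidden_sizes]


# One hidden layer of 64 units, matching the quickstart configuration
policy_network_model_config = {"torso_fc_configs": make_torso_fc_configs([64])}
```

Keeping such helpers in the configuration file makes it easy to run several variants of an experiment that differ only in a few values.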
📧 Contact