Reinforcement Learning Framework
Project description
adept is a reinforcement learning framework designed to accelerate research by providing:
- a modular interface for using custom networks, agents, and environments
- baseline reinforcement learning models and algorithms for PyTorch
- multi-GPU support
- access to various environments
- built-in tensorboard logging, model saving, reloading, evaluation, and rendering
- proven hyperparameter defaults
This code is early access; expect rough edges. Interfaces are subject to change. We're happy to accept feedback and contributions.
Read More
Documentation
- Architecture Overview
- ModularNetwork Overview
- Resume training
- Evaluate a model
- Render environment
Examples
- Custom Network (stub | example)
- Custom SubModule (stub | example)
- Custom Agent (stub | example)
- Custom Environment (stub | example)
Installation
Dependencies:
- gym
- PyTorch 1.x
- Python 3.5+
- We recommend CUDA 10, PyTorch 1.0, and Python 3.6
From source:
- Follow instructions for PyTorch
- (Optional) Follow instructions for StarCraft 2
git clone https://github.com/heronsystems/adeptRL
cd adeptRL
# Remove mpi, sc2, profiler if you don't plan on using these features:
pip install .[mpi,sc2,profiler]
From docker:
Quickstart
Train an Agent
Logs go to /tmp/adept_logs/ by default. The log directory contains the tensorboard file, saved models, and other metadata.
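If you accumulate several runs, a small standard-library helper can locate the newest run folder to point tensorboard at. This is a hypothetical helper (`latest_run` is not part of adept); it only assumes each run is written as a subdirectory of the log root:

```python
from pathlib import Path

def latest_run(log_root="/tmp/adept_logs"):
    """Return the most recently modified run directory under log_root, or None."""
    root = Path(log_root)
    if not root.is_dir():
        return None
    runs = [p for p in root.iterdir() if p.is_dir()]
    return max(runs, key=lambda p: p.stat().st_mtime, default=None)
```

For example, `tensorboard --logdir "$(python -c 'print(latest_run())')"` would open the latest run.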
# Local Mode (A2C)
# We recommend 4GB+ GPU memory, 8GB+ RAM, 4+ Cores
python -m adept.app local --env BeamRiderNoFrameskip-v4
# Distributed Mode (A2C, requires NCCL)
# We recommend 2+ GPUs, 8GB+ GPU memory, 32GB+ RAM, 4+ Cores
python -m adept.app distrib --env BeamRiderNoFrameskip-v4
# IMPALA (requires mpi4py and is resource intensive)
# We recommend 2+ GPUs, 8GB+ GPU memory, 32GB+ RAM, 4+ Cores
python -m adept.app impala --agent ActorCriticVtrace --env BeamRiderNoFrameskip-v4
# StarCraft 2 (IMPALA not supported yet)
# Warning: much more resource intensive than Atari
python -m adept.app local --env CollectMineralShards
# To see a full list of options:
python -m adept.app -h
python -m adept.app help <command>
Use your own Agent, Environment, Network, or SubModule
"""
my_script.py
Train an agent on a single GPU.
"""
from adept.scripts.local import parse_args, main
from adept.networks import NetworkModule, NetworkRegistry, SubModule1D
from adept.agents import AgentModule, AgentRegistry
from adept.environments import EnvModule, EnvRegistry
class MyAgent(AgentModule):
    pass  # Implement

class MyEnv(EnvModule):
    pass  # Implement

class MyNet(NetworkModule):
    pass  # Implement

class MySubModule1D(SubModule1D):
    pass  # Implement

if __name__ == '__main__':
    agent_registry = AgentRegistry()
    agent_registry.register_agent(MyAgent)

    env_registry = EnvRegistry()
    env_registry.register_env(MyEnv, ['env-id-1', 'env-id-2'])

    network_registry = NetworkRegistry()
    network_registry.register_custom_net(MyNet)
    network_registry.register_submodule(MySubModule1D)

    main(
        parse_args(),
        agent_registry=agent_registry,
        env_registry=env_registry,
        net_registry=network_registry
    )
- Call your script like this:
python my_script.py --agent MyAgent --env env-id-1 --custom-network MyNet
- You can see all the args here or how to implement the stubs in the examples section above.
Features
Scripts
Local (Single-node, Single-GPU)
- Best place to start if you're trying to understand the code.
Distributed (Multi-node, Multi-GPU)
- Uses NCCL backend to all-reduce gradients across GPUs without a parameter server or host process.
- Supports NVLINK and InfiniBand to reduce communication overhead.
- InfiniBand untested since we do not have a setup to test on.
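The all-reduce semantics can be illustrated with a toy sketch. This is plain Python for illustration only, not adept's NCCL-backed implementation: every worker contributes its local gradients and every worker receives the same element-wise average, so all model replicas step identically without a parameter server or host process.

```python
def all_reduce_mean(worker_grads):
    """Average per-parameter gradients across workers, as an all-reduce would.

    worker_grads: one list of floats per worker (one float per parameter).
    Returns the post-reduction view: every worker holds the same mean.
    """
    n = len(worker_grads)
    summed = [sum(g) for g in zip(*worker_grads)]  # element-wise sum across workers
    mean = [s / n for s in summed]
    return [list(mean) for _ in range(n)]          # each worker gets a copy
```

For example, `all_reduce_mean([[1.0, 2.0], [3.0, 4.0]])` leaves both workers holding `[2.0, 3.0]`. In the real implementation this reduction happens on-GPU via `torch.distributed.all_reduce` with the NCCL backend.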
Importance Weighted Actor Learner Architectures, IMPALA (Single Node, Multi-GPU)
- Our implementation uses GPU workers rather than CPU workers for forward passes.
- On Atari we achieve ~4k SPS = ~16k FPS with two GPUs and an 8-core CPU.
- "Note that the shallow IMPALA experiment completes training over 200 million frames in less than one hour."
- IMPALA official experiments use 48 cores.
- Ours: ~2,000 frames/second per CPU core. DeepMind: ~1,157 frames/second per CPU core.
- Does not yet support multiple nodes or direct GPU memory transfers.
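The throughput figures above assume the standard Atari frameskip of 4 (one agent step repeats an action for four environment frames); the per-core comparison then falls out of simple arithmetic:

```python
FRAMESKIP = 4  # standard Atari action repeat: 1 agent step = 4 env frames

def fps(steps_per_sec, frameskip=FRAMESKIP):
    """Convert agent steps/second (SPS) to environment frames/second (FPS)."""
    return steps_per_sec * frameskip

assert fps(4000) == 16000           # the ~4k SPS = ~16k FPS figure above
adept_per_core = fps(4000) / 8      # ~16k FPS on an 8-core CPU -> 2000.0
deepmind_per_core = 200e6 / 3600 / 48  # 200M frames in <1 hour on 48 cores -> ~1157
```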
Agents
Networks
- Modular Network Interface: supports arbitrary input and output shapes up to 4D via a SubModule API.
- Stateful networks (e.g., LSTMs)
- Batch normalization (paper)
Environments
- OpenAI Gym
- StarCraft 2 (unstable)
Performance
- ~ 3,000 Steps/second = 12,000 FPS (Atari)
- Local Mode
- 64 environments
- GeForce 2080 Ti
- Ryzen 2700x 8-core
- Used to win a Doom competition (Ben Bell / Marv2in)
- Trained for 50M Steps / 200M Frames
- Up to 30 no-ops at start of each episode
- Evaluated on different seeds than trained on
- Architecture: Four Convs (F=32) followed by an LSTM (F=512)
- Reproduce with
python -m adept.app local --logdir ~/local64_benchmark --eval -y --nb-step 50e6 --env <env-id>
Acknowledgements
We borrow pieces of OpenAI's gym and baselines code. We indicate where this is done.
Project details
Download files
Source Distribution
Built Distribution
File details
Details for the file adeptRL-0.2.0.tar.gz.
File metadata
- Download URL: adeptRL-0.2.0.tar.gz
- Upload date:
- Size: 73.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/45.1.0 requests-toolbelt/0.9.1 tqdm/4.42.0 CPython/3.7.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 859502c63595e7acc2126d663638883875d22bd24a5f0dbce0600f94da677aaf
MD5 | 7c56a40231743739d2103fcf06783270
BLAKE2b-256 | ead0d33e65f92fdb05be4be393722c972a38597000f6fb12a72cd75afbd92147
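To verify a downloaded file against the published digests, the standard-library hashlib is enough; `sha256_of` is a hypothetical helper name, not part of adept:

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 and return its hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

Compare `sha256_of("adeptRL-0.2.0.tar.gz")` against the SHA256 digest in the table above before installing.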
File details
Details for the file adeptRL-0.2.0-py3-none-any.whl.
File metadata
- Download URL: adeptRL-0.2.0-py3-none-any.whl
- Upload date:
- Size: 154.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/45.1.0 requests-toolbelt/0.9.1 tqdm/4.42.0 CPython/3.7.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | f2ad6228c71a783bfb040099981530a246983627502d79410946509435bfb3bd
MD5 | 4490838656cb0650ac9fc4d2e49e8421
BLAKE2b-256 | 35706659eef7a0d695ffab0511a5601f40423f38a0c27eedf4955e9025b0fa3f