
Reinforcement Learning Framework

Project description


adept is a reinforcement learning framework designed to accelerate research by providing:

  • a modular interface for using custom networks, agents, and environments
  • baseline reinforcement learning models and algorithms for PyTorch
  • multi-GPU support
  • access to various environments
  • built-in tensorboard logging, model saving, reloading, evaluation, and rendering
  • proven hyperparameter defaults

This code is early access; expect rough edges, and interfaces are subject to change. We're happy to accept feedback and contributions.



Requirements

  • gym
  • PyTorch 1.x
  • Python 3.5+
  • Recommended: CUDA 10, PyTorch 1.0, Python 3.6

From source:

git clone
cd adeptRL
# Remove mpi, sc2, profiler if you don't plan on using these features:
pip install .[mpi,sc2,profiler]

From docker:


Train an Agent

Logs go to /tmp/adept_logs/ by default. The log directory contains the tensorboard file, saved models, and other metadata.

# Local Mode (A2C)
# We recommend 4GB+ GPU memory, 8GB+ RAM, 4+ Cores
python -m adept.scripts.local --env BeamRiderNoFrameskip-v4

# Distributed Mode (A2C, requires NCCL)
# We recommend 2+ GPUs, 8GB+ GPU memory, 32GB+ RAM, 4+ Cores
python -m adept.scripts.distrib --env BeamRiderNoFrameskip-v4

# IMPALA (requires mpi4py and is resource intensive)
# We recommend 2+ GPUs, 8GB+ GPU memory, 32GB+ RAM, 4+ Cores
python -m adept.scripts.impala --agent ActorCriticVtrace --env BeamRiderNoFrameskip-v4

# StarCraft 2 (IMPALA not supported yet)
# Warning: much more resource intensive than Atari
python -m adept.scripts.local --env CollectMineralShards

# To see a full list of options for any script:
python -m adept.scripts.local -h

Use your own Agent, Environment, Network, or SubModule


Train an agent on a single GPU.
from adept.scripts.local import parse_args, main
from adept.networks import NetworkModule, NetworkRegistry, SubModule1D
from adept.agents import AgentModule, AgentRegistry
from adept.environments import EnvModule, EnvRegistry

class MyAgent(AgentModule):
    pass  # Implement

class MyEnv(EnvModule):
    pass  # Implement

class MyNet(NetworkModule):
    pass  # Implement

class MySubModule1D(SubModule1D):
    pass  # Implement

if __name__ == '__main__':
    agent_registry = AgentRegistry()
    agent_registry.register_agent(MyAgent)

    env_registry = EnvRegistry()
    env_registry.register_env(MyEnv, ['env-id-1', 'env-id-2'])

    network_registry = NetworkRegistry()
    network_registry.register_custom_net(MyNet)
    network_registry.register_submodule(MySubModule1D)

    main(parse_args(), agent_registry, env_registry, network_registry)

  • Call your script like this (substituting your own filename): python my_script.py --agent MyAgent --env env-id-1 --custom-network MyNet
  • Pass -h for a full list of args; the examples section above shows how to implement the stubs.
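The registries above map string ids to classes so the CLI can resolve flags like --env env-id-1 at runtime. As a conceptual sketch only (this mirrors the idea, not adept's actual EnvRegistry code; all names here are illustrative):

```python
# Minimal registry pattern: string ids -> classes, looked up at runtime.
class SimpleRegistry:
    def __init__(self):
        self._by_id = {}

    def register(self, cls, ids):
        # One class can serve several environment ids.
        for env_id in ids:
            self._by_id[env_id] = cls

    def lookup(self, env_id):
        return self._by_id[env_id]


class MyEnv:
    pass


registry = SimpleRegistry()
registry.register(MyEnv, ['env-id-1', 'env-id-2'])
assert registry.lookup('env-id-1') is MyEnv
```

This is why custom components only need to be registered once at startup; the framework can then instantiate them from command-line arguments.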



Local (Single-node, Single-GPU)

  • Best place to start if you're trying to understand the code.

Distributed (Multi-node, Multi-GPU)

  • Uses NCCL backend to all-reduce gradients across GPUs without a parameter server or host process.
  • Supports NVLink and InfiniBand to reduce communication overhead.
  • InfiniBand is untested, since we do not have a setup to test on.
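Conceptually, an all-reduce leaves every worker holding the same averaged gradient, so each worker can apply an identical update locally with no parameter server. A minimal sketch of that invariant in plain Python (standing in for NCCL; the worker gradients are made-up illustration data):

```python
def all_reduce_mean(per_worker_grads):
    """Average gradients elementwise across workers, returning one copy
    per worker -- what NCCL all-reduce plus divide-by-world-size achieves."""
    world_size = len(per_worker_grads)
    averaged = [sum(vals) / world_size for vals in zip(*per_worker_grads)]
    # After the all-reduce, every worker holds the identical averaged gradient.
    return [list(averaged) for _ in range(world_size)]


grads = [[1.0, 2.0], [3.0, 4.0]]  # gradients from 2 hypothetical workers
reduced = all_reduce_mean(grads)
assert reduced[0] == reduced[1] == [2.0, 3.0]
```

Because every worker computes the same update, the model replicas stay in sync without any host process coordinating them.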

Importance Weighted Actor Learner Architectures, IMPALA (Single Node, Multi-GPU)

  • Our implementation uses GPU workers rather than CPU workers for forward passes.
  • On Atari we achieve ~4k SPS = ~16k FPS with two GPUs and an 8-core CPU.
  • "Note that the shallow IMPALA experiment completes training over 200 million frames in less than one hour."
  • IMPALA official experiments use 48 cores.
  • Ours: 2000 frame / (second * # CPU core) DeepMind: 1157 frame / (second * # CPU core)
  • Does not yet support multiple nodes or direct GPU memory transfers.
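The ActorCriticVtrace agent corrects for the lag between actor and learner policies with v-trace targets (Espeholt et al., 2018). As a hedged illustration, not adept's actual implementation, the backward recursion can be sketched in plain Python (function and variable names are my own):

```python
def vtrace(values, bootstrap, rewards, ratios, gamma=0.99,
           rho_bar=1.0, c_bar=1.0):
    """Compute v-trace value targets for one trajectory.

    values:    V(x_t) for t = 0..T-1 under the learner policy
    bootstrap: V(x_T)
    rewards:   r_t for t = 0..T-1
    ratios:    importance weights pi(a_t|x_t) / mu(a_t|x_t)
    """
    T = len(rewards)
    rhos = [min(rho_bar, r) for r in ratios]  # truncated IS weights
    cs = [min(c_bar, r) for r in ratios]      # truncated trace weights
    vs = [0.0] * T
    acc = 0.0  # holds vs_{t+1} - V(x_{t+1})
    # Backward recursion:
    # vs_t = V(x_t) + delta_t + gamma * c_t * (vs_{t+1} - V(x_{t+1}))
    for t in reversed(range(T)):
        next_v = values[t + 1] if t + 1 < T else bootstrap
        delta = rhos[t] * (rewards[t] + gamma * next_v - values[t])
        acc = delta + gamma * cs[t] * acc
        vs[t] = values[t] + acc
    return vs
```

On-policy (all ratios 1) with untruncated weights, this reduces to the ordinary n-step return, e.g. vtrace([0.0, 0.0, 0.0], 0.0, [1.0, 1.0, 1.0], [1.0, 1.0, 1.0], gamma=0.5) yields the discounted returns [1.75, 1.5, 1.0].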



  • Modular Network Interface: supports arbitrary input and output shapes up to 4D via a SubModule API.
  • Stateful networks (e.g., LSTMs)
  • Batch normalization


  • OpenAI Gym
  • StarCraft 2 (unstable)


  • ~3,000 steps/second ≈ 12,000 FPS (Atari)
    • Local Mode
    • 64 environments
    • GeForce 2080 Ti
    • Ryzen 2700x 8-core
  • Architecture matches the one used to win a Doom competition (Ben Bell / Marv2in)
  • Trained for 50M Steps / 200M Frames
  • Up to 30 no-ops at start of each episode
  • Evaluated on different seeds than trained on
  • Architecture: Four Convs (F=32) followed by an LSTM (F=512)
  • Reproduce with python -m adept.scripts.local --logdir ~/local64_benchmark --eval -y --nb-step 50e6 --env <env-id>
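The steps-to-frames conversions above follow from the standard Atari frameskip of 4 (one agent step spans four emulator frames). A quick arithmetic check of the quoted numbers:

```python
# Sanity-check the benchmark figures: steps x frameskip = frames.
frameskip = 4

steps_per_sec = 3_000
assert steps_per_sec * frameskip == 12_000  # ~12k FPS, as quoted

total_steps = 50_000_000                    # 50M steps
assert total_steps * frameskip == 200_000_000  # = 200M frames

# Implied wall-clock time for the benchmark run at local-mode throughput:
hours = total_steps / steps_per_sec / 3600
assert 4 < hours < 5                        # roughly 4.6 hours
```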


We borrow pieces of OpenAI's gym and baselines code. We indicate where this is done.

Project details

Download files


Source Distribution

adeptRL-0.2.0.tar.gz (73.1 kB)

Built Distribution

adeptRL-0.2.0-py3-none-any.whl (154.0 kB)
