
Reinforcement Learning Framework


adept is a reinforcement learning framework designed to accelerate research by providing:

  • a modular interface for using custom networks, agents, and environments
  • baseline reinforcement learning models and algorithms for PyTorch
  • multi-GPU support
  • access to various environments
  • built-in TensorBoard logging, model saving, reloading, evaluation, and rendering
  • proven hyperparameter defaults

This code is early access; expect rough edges. Interfaces are subject to change. We're happy to accept feedback and contributions.

Read More

  • Documentation
  • Examples

Installation

Dependencies:

  • gym
  • PyTorch 1.x
  • Python 3.5+
  • We recommend CUDA 10, PyTorch 1.0, Python 3.6

From source:

git clone https://github.com/heronsystems/adeptRL
cd adeptRL
# Remove mpi, sc2, profiler if you don't plan on using these features:
# (quoting the extras keeps shells like zsh from globbing the brackets)
pip install ".[mpi,sc2,profiler]"
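
After installing, a quick sanity check (a minimal sketch; it only assumes the adept package imports cleanly and that PyTorch is installed):

# Verify the install and CUDA availability.
import adept                      # should import without error
import torch

print(torch.__version__)          # expect a 1.x release
print(torch.cuda.is_available())  # True if a GPU build of PyTorch is usable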

From Docker: see the Docker instructions in the adeptRL repository.

Quickstart

Train an Agent

Logs go to /tmp/adept_logs/ by default. The log directory contains the TensorBoard file, saved models, and other metadata.

# Local Mode (A2C)
# We recommend 4GB+ GPU memory, 8GB+ RAM, 4+ Cores
python -m adept.app local --env BeamRiderNoFrameskip-v4

# Distributed Mode (A2C, requires NCCL)
# We recommend 2+ GPUs, 8GB+ GPU memory, 32GB+ RAM, 4+ Cores
python -m adept.app distrib --env BeamRiderNoFrameskip-v4

# IMPALA (requires mpi4py and is resource intensive)
# We recommend 2+ GPUs, 8GB+ GPU memory, 32GB+ RAM, 4+ Cores
python -m adept.app impala --agent ActorCriticVtrace --env BeamRiderNoFrameskip-v4

# StarCraft 2 (IMPALA not supported yet)
# Warning: much more resource intensive than Atari
python -m adept.app local --env CollectMineralShards

# To see a full list of options:
python -m adept.app -h
python -m adept.app help <command>
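
If you prefer launching TensorBoard from Python rather than a shell, here is a minimal sketch using TensorBoard's program API (it assumes the default /tmp/adept_logs location noted above):

# Point TensorBoard at adept's default log directory.
from tensorboard import program

tb = program.TensorBoard()
tb.configure(argv=[None, "--logdir", "/tmp/adept_logs"])
print("TensorBoard listening on", tb.launch())  # prints the local URL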

Use your own Agent, Environment, Network, or SubModule

"""
my_script.py

Train an agent on a single GPU.
"""
from adept.scripts.local import parse_args, main
from adept.networks import NetworkModule, NetworkRegistry, SubModule1D
from adept.agents import AgentModule, AgentRegistry
from adept.environments import EnvModule, EnvRegistry


class MyAgent(AgentModule):
    pass  # Implement


class MyEnv(EnvModule):
    pass  # Implement


class MyNet(NetworkModule):
    pass  # Implement


class MySubModule1D(SubModule1D):
    pass  # Implement


if __name__ == '__main__':
    agent_registry = AgentRegistry()
    agent_registry.register_agent(MyAgent)

    env_registry = EnvRegistry()
    env_registry.register_env(MyEnv, ['env-id-1', 'env-id-2'])

    network_registry = NetworkRegistry()
    network_registry.register_custom_net(MyNet)
    network_registry.register_submodule(MySubModule1D)

    main(
        parse_args(),
        agent_registry=agent_registry,
        env_registry=env_registry,
        net_registry=network_registry
    )
  • Call your script like this: python my_script.py --agent MyAgent --env env-id-1 --custom-network MyNet
  • See the documentation for the full list of args, and the examples section above for how to implement the stubs.

Features

Scripts

Local (Single-node, Single-GPU)

  • The best place to start if you're trying to understand the code.

Distributed (Multi-node, Multi-GPU)

  • Uses NCCL backend to all-reduce gradients across GPUs without a parameter server or host process.
  • Supports NVLink and InfiniBand to reduce communication overhead.
  • InfiniBand untested since we do not have a setup to test on.

Importance Weighted Actor-Learner Architecture, IMPALA (Single-node, Multi-GPU)

  • Our implementation uses GPU workers rather than CPU workers for forward passes.
  • On Atari we achieve ~4k steps/second (~16k FPS with the standard 4-frame skip) using two GPUs and an 8-core CPU.
  • The IMPALA paper notes that "the shallow IMPALA experiment completes training over 200 million frames in less than one hour"; its official experiments use 48 CPU cores.
  • Normalized per core, that works out to roughly 2,000 frames/(second · CPU core) for us versus roughly 1,157 for DeepMind (see the sketch after this list).
  • Does not yet support multiple nodes or direct GPU memory transfers.
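
A quick check of those per-core figures (a sketch; it assumes the standard Atari 4-frame skip and takes the paper's "200 million frames in under an hour" at face value):

# Back-of-envelope throughput comparison.
adept_fps = 4_000 * 4                  # ~4k steps/s * 4-frame skip = ~16k FPS
adept_per_core = adept_fps / 8         # 8-core CPU -> ~2,000 frames/(s * core)

deepmind_fps = 200e6 / 3600            # 200M frames in <1 hour -> ~55.6k FPS
deepmind_per_core = deepmind_fps / 48  # 48 cores -> ~1,157 frames/(s * core)

print(round(adept_per_core), round(deepmind_per_core))  # 2000 1157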

Agents

  • Advantage Actor Critic (A2C): used by the local and distrib modes above
  • ActorCriticVtrace: the V-trace actor-critic used by IMPALA

Networks

  • Modular Network Interface: supports arbitrary input and output shapes up to 4D via a SubModule API.
  • Stateful networks (i.e., LSTMs); a minimal sketch follows this list
  • Batch normalization (Ioffe & Szegedy, 2015)
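
To illustrate what "stateful" means here, a minimal sketch in plain PyTorch (not adept's SubModule API; the names below are illustrative): the module owns an LSTMCell, and the caller threads the recurrent state through time.

import torch
import torch.nn as nn

class StatefulNet(nn.Module):
    """Toy recurrent trunk: features -> LSTM -> action logits."""

    def __init__(self, in_dim=64, hidden=512, n_actions=6):
        super().__init__()
        self.body = nn.Linear(in_dim, hidden)
        self.lstm = nn.LSTMCell(hidden, hidden)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, obs, state):
        h, c = self.lstm(torch.relu(self.body(obs)), state)
        return self.head(h), (h, c)   # hand the new state back to the caller

net = StatefulNet()
state = (torch.zeros(1, 512), torch.zeros(1, 512))  # reset at episode start
for t in range(8):                                  # one rollout of 8 steps
    logits, state = net(torch.randn(1, 64), state)  # carry state forward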

Environments

  • OpenAI Gym
  • StarCraft 2 (unstable)

Performance

  • ~3,000 steps/second ≈ 12,000 FPS on Atari (counting the standard 4-frame skip)
    • Local Mode
    • 64 environments
    • GeForce 2080 Ti
    • Ryzen 2700x 8-core
  • The same architecture was used to win a Doom competition (Ben Bell / Marv2in)
  • Trained for 50M Steps / 200M Frames
  • Up to 30 no-ops at start of each episode
  • Evaluated on different seeds than trained on
  • Architecture: four convolutional layers (32 filters each) followed by an LSTM (512 units)
  • Reproduce with python -m adept.app local --logdir ~/local64_benchmark --eval -y --nb-step 50e6 --env <env-id>

Acknowledgements

We borrow pieces of OpenAI's gym and baselines code. We indicate where this is done.
