
Distributed RL implementations with Ray and PyTorch.


Nappo: A PyTorch Library for distributed Reinforcement Learning

Nappo is a PyTorch-based library for RL that focuses on distributed implementations, yet is flexible enough to allow for method experimentation.

Installation


    conda create -y -n nappo
    conda activate nappo
    conda install pytorch torchvision cudatoolkit=10.0 -c pytorch

    pip install nappo
    pip install git+git://github.com/openai/baselines.git
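
A quick way to verify the installation (this snippet is not part of the Nappo docs; it only checks that the main dependencies import and that PyTorch can see the GPU installed above):

# Optional sanity check (not from the Nappo docs): verify that the main
# dependencies import correctly and that PyTorch detects CUDA.
import torch
import ray
import nappo

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("ray:", ray.__version__)
print("nappo imported from:", nappo.__file__)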

Documentation

NAPPO documentation can be found here.

Minimal code example

import ray
from nappo import Learner
from nappo.core.algos import PPO
from nappo.core.envs import VecEnv
from nappo.core.storages import OnPolicyGAEBuffer
from nappo.core.actors import OnPolicyActorCritic, get_model
from nappo.distributed_schemes.scheme_dadacs import Workers
from nappo.envs import make_pybullet_train_env

# 0. init ray
ray.init(address="auto")
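
Note that ray.init(address="auto") attaches to an already running Ray cluster. For a quick single-machine test, starting a local Ray instance is enough:

# Single-machine alternative: launch a local Ray instance instead of
# attaching to an existing cluster.
ray.init()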

The first part of any Nappo training script consists of defining the core components, the lower-level modules. All core components have a create_factory method, which returns a function that can later be used to create independent instances in different workers, if required by the training scheme.
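
Conceptually, a factory is just a callable that builds a fresh, independent component instance each time it is invoked, which is what lets the same factory be shipped to several workers. A rough sketch of the pattern (illustrative only, not Nappo code):

# Illustrative sketch of the factory pattern; not actual Nappo code.
class DummyComponent:
    def __init__(self, hidden_size, device):
        self.hidden_size = hidden_size
        self.device = device

def dummy_create_factory(hidden_size=256):
    # Returns a function that builds a new, independent instance on each call.
    def create(device="cpu"):
        return DummyComponent(hidden_size=hidden_size, device=device)
    return create

component_factory = dummy_create_factory(hidden_size=128)
worker_component = component_factory(device="cpu")  # e.g. called inside a worker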

We can start with the environment. By default, Nappo supports PyBullet, Atari and MuJoCo environments, but it is easy to extend it to any other environment; a detailed explanation of how to do so can be found here.

# 1. Define Train Vector of Envs
train_envs_factory, action_space, obs_space = VecEnv.create_factory(
    vec_env_size=1, log_dir="/tmp/train_example", env_fn=make_pybullet_train_env,
    env_kwargs={"env_id": "HalfCheetahBulletEnv-v0"})

We can continue by defining an on-policy or off-policy set of Actor (or ActorCritic), Algo and Storage core components.

# 2. Define RL Actor
actor_factory = OnPolicyActorCritic.create_factory(
    obs_space, action_space, feature_extractor_network=get_model("MLP"))

# 3. Define RL training algorithm
algo_factory = PPO.create_factory(
    lr=1e-4, num_epochs=4, clip_param=0.2, entropy_coef=0.01,
    value_loss_coef=.5, max_grad_norm=.5, num_mini_batch=4,
    use_clipped_value_loss=True, gamma=0.99)

# 4. Define rollouts storage
storage_factory = OnPolicyGAEBuffer.create_factory(size=1000, gae_lambda=0.95)

One of the main ideas behind Nappo is to allow single components to be replaced for experimentation without needing to change anything else. Since in RL not all components are compatible with each other (e.g. an on-policy actor with an off-policy algorithm), some libraries advocate for higher-level implementations, where a single function call with many parameters handles component creation. This approach might be more suitable for generating benchmarks and for using out-of-the-box solutions in industry, but less so for researchers trying to improve the state of the art by switching and changing components. Furthermore, to a certain extent some components can be reused across different component sets. If the components within the defined set do not match, an error is raised at training execution.
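
For instance, switching from the on-policy set above to the off-policy components listed further below would look roughly as follows (a sketch: the module paths mirror the on-policy imports, but the exact constructor arguments of SAC and ReplayBuffer are assumptions; check the documentation for the real signatures):

from nappo.core.algos import SAC
from nappo.core.storages import ReplayBuffer
from nappo.core.actors import OffPolicyActorCritic

# Sketch of an off-policy component set; constructor arguments are illustrative.
actor_factory = OffPolicyActorCritic.create_factory(
    obs_space, action_space, feature_extractor_network=get_model("MLP"))
algo_factory = SAC.create_factory(lr=1e-4, gamma=0.99)
storage_factory = ReplayBuffer.create_factory(size=100000)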

We encourage users to create their own core components to extend current functionality, following the base.py templates associated with each of them. The neural networks used as function approximators in the actor components can also be modified by the user. A more detailed explanation of how to do it can be found here.
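
As a rough illustration (the exact interface Nappo expects from a feature extractor is an assumption here; follow the corresponding base.py template for the real one), a custom network is a regular torch.nn.Module that maps observations to a feature vector:

import torch.nn as nn

# Hypothetical custom feature extractor; the constructor signature and any
# attributes Nappo expects (e.g. an output size) are assumptions.
class CustomMLP(nn.Module):
    def __init__(self, input_size, hidden_size=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_size, hidden_size), nn.Tanh(),
            nn.Linear(hidden_size, hidden_size), nn.Tanh())

    def forward(self, obs):
        return self.net(obs)

Such a module could then be passed in place of get_model("MLP") in step 2, provided it matches the expected interface.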

Next, we instantiate the Workers of the training scheme of our choice. Worker components are designed to work with any combination of core components.

# 5. Define workers
workers = Workers(
    algo_factory=algo_factory,
    actor_factory=actor_factory,
    storage_factory=storage_factory,
    train_envs_factory=train_envs_factory,
    num_col_workers=2, num_grad_workers=6)

Finally, we create a Learner class instance and define the training loop.

# 6. Define learner
learner = Learner(workers, target_steps=1000000, log_dir="/tmp/train_example")

# 7. Define train loop
iterations = 0
while not learner.done():
    learner.step()
    if iterations % 1 == 0:
        learner.print_info()
    if iterations % 100 == 0:
        save_name = learner.save_model()
    iterations += 1

Available core components and distributed training schemes

  • Core components

    • envs: VecEnv
    • algos:
      • On-policy: PPO
      • Off-policy: SAC
    • actors:
      • On-policy: OnPolicyActorCritic
      • Off-policy: OffPolicyActorCritic
    • storages:
      • On-policy: OnPolicyBuffer, OnPolicyGAEBuffer, OnPolicyVTraceBuffer
      • Off-policy: ReplayBuffer, HindsightExperienceReplayBuffer
  • Distributed schemes

    • 3cs
    • 3ds
    • 2dacs
    • 2daca
    • da2cs
    • dadacs
    • dadaca

A more detailed explanation of the distributed scheme naming can be found here.
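
Based on the scheme_dadacs import used in the example above, selecting a different scheme presumably amounts to importing Workers from the corresponding module (the module name below is an assumption that simply mirrors the scheme list):

# Assumption: each scheme exposes its own Workers class under
# nappo.distributed_schemes, mirroring the scheme_dadacs import used above.
from nappo.distributed_schemes.scheme_da2cs import Workers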

Current limitations

Citing Nappo

@misc{nappo2020rl,
  author = {Bou, Albert},
  title = {Nappo: A PyTorch Library for distributed Reinforcement Learning},
  year = {2020},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/nappo/nappo}},
}
