
# Deep RL PyTorch [![https://www.singularity-hub.org/static/img/hosted-singularity--hub-%23e32929.svg](https://www.singularity-hub.org/static/img/hosted-singularity--hub-%23e32929.svg)](https://singularity-hub.org/collections/2581)

This repo contains implementations of popular Deep RL algorithms. Furthermore, it provides a unified interface for training and evaluation, with unified model saving and visualization. It can serve as a good starting point when implementing a new RL algorithm in PyTorch.

## Getting started

If you want to base your algorithm on this repository, start by installing it as a package:

`pip install git+https://github.com/jkulhanek/deep-rl-pytorch.git`

If you want to run the attached experiments yourself, feel free to clone this repository:

`git clone https://github.com/jkulhanek/deep-rl-pytorch.git`

All dependencies are prepared in a Docker container. If you have nvidia-docker enabled, you can use this image. To pull and start it, run:

`docker run --runtime=nvidia --net=host -it kulhanek/deep-rl-pytorch:latest bash`

From there, you can either clone your own repository containing your experiments or clone this one.

## Concepts

All algorithms are implemented as base classes, and in your experiment you subclass one of them. The `deep_rl.core.AbstractTrainer` class is the base of all trainers; every algorithm inherits from it. Each trainer can be wrapped in several wrappers (classes extending `deep_rl.core.AbstractWrapper`), which take care of saving, logging, terminating the experiment, and so on. All experiments should be registered using the `@deep_rl.register_trainer` decorator, which then wraps the trainer with the default wrappers; this behavior can be controlled by passing arguments to the decorator. Any registered trainer (experiment) can be run by calling `deep_rl.make_trainer(<name>).run()`.
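Putting those pieces together, a registered experiment might look like the sketch below. This is a hedged illustration only: the registration name, decorator usage, and overridable hooks are assumptions, not the package's verbatim API.

```python
import deep_rl
from deep_rl.a2c import A2CTrainer

@deep_rl.register_trainer  # wraps the trainer with the default wrappers (saving, logging, ...)
class MyExperiment(A2CTrainer):
    # Override hooks from the base trainer here (e.g. environment or model
    # construction); the exact hook names are defined by the base classes.
    pass

if __name__ == '__main__':
    # Look up the registered trainer by name and run the wrapped experiment.
    # The key 'myexperiment' is an assumption about how registration names work.
    deep_rl.make_trainer('myexperiment').run()
```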

## Implemented algorithms

### A2C

A2C is a synchronous, deterministic variant of Asynchronous Advantage Actor Critic (A3C) [2] which, according to OpenAI [1], gives equal performance. It is, however, more efficient in its GPU utilization.
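The synchronous update itself is compact. Below is a minimal sketch of the A2C loss in plain PyTorch; the function name, argument shapes, and coefficient defaults are illustrative assumptions, not code from this package.

```python
import torch
import torch.nn.functional as F

def a2c_loss(logits, values, actions, returns, value_coef=0.5, entropy_coef=0.01):
    """One synchronous A2C update on a batch collected from all workers.

    logits:  (N, num_actions) policy logits
    values:  (N,) critic estimates V(s)
    actions: (N,) sampled action indices (int64)
    returns: (N,) n-step bootstrapped returns
    """
    advantages = returns - values.detach()              # A(s, a) = R - V(s)
    log_probs = F.log_softmax(logits, dim=-1)
    chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)

    policy_loss = -(chosen * advantages).mean()         # policy-gradient term
    value_loss = F.mse_loss(values, returns)            # critic regression
    entropy = -(log_probs.exp() * log_probs).sum(-1).mean()  # exploration bonus

    return policy_loss + value_coef * value_loss - entropy_coef * entropy
```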

Start your experiment by subclassing `deep_rl.a2c.A2CTrainer`. Several models are included in `deep_rl.a2c.model`. You may want to use at least some of the helper modules contained in this package when designing your own experiment.
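For orientation, an actor-critic model for A2C typically has a shared body with separate policy and value heads. The class below is an illustrative sketch of that shape, not one of the models actually shipped in `deep_rl.a2c.model`.

```python
import torch.nn as nn

class TinyActorCritic(nn.Module):
    """Minimal actor-critic network: shared body, policy head, value head."""
    def __init__(self, obs_dim, num_actions, hidden=128):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.policy = nn.Linear(hidden, num_actions)  # action logits
        self.value = nn.Linear(hidden, 1)             # state value V(s)

    def forward(self, obs):
        h = self.body(obs)
        return self.policy(h), self.value(h).squeeze(-1)
```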

In most of the models, weights are initialized according to [3].
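[3] refers to orthogonal initialization. A minimal sketch of applying it in PyTorch follows; the helper name and the single global gain are assumptions, and the package's models may choose gains per layer.

```python
import torch.nn as nn

def init_orthogonal(module, gain=1.0):
    """Orthogonal weight initialization in the spirit of Saxe et al. [3]."""
    if isinstance(module, (nn.Linear, nn.Conv2d)):
        nn.init.orthogonal_(module.weight, gain=gain)
        if module.bias is not None:
            nn.init.zeros_(module.bias)

# Usage: model.apply(init_orthogonal)
```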

### Asynchronous Advantage Actor Critic (A3C) [2]

This implementation uses multiprocessing. It comes with two optimizers: RMSprop and Adam.
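The multiprocessing pattern behind A3C is Hogwild-style parameter sharing [2]: each worker keeps a local copy of a model whose master parameters live in shared memory. The sketch below shows that pattern in plain PyTorch with a dummy loss standing in for the A3C objective; it is not this package's actual trainer code.

```python
import torch
import torch.nn as nn
import torch.multiprocessing as mp

class Net(nn.Module):
    """Stand-in network; the real trainer uses the package's models."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)

def worker(rank, shared_model):
    # Each worker optimizes the *shared* parameters using gradients
    # computed on its own local copy of the model.
    optimizer = torch.optim.RMSprop(shared_model.parameters(), lr=1e-3)
    local = Net()
    for _ in range(10):
        local.load_state_dict(shared_model.state_dict())  # sync from shared
        loss = local(torch.randn(1, 4)).pow(2).sum()      # dummy loss, not the A3C objective
        optimizer.zero_grad()
        loss.backward()
        # Move the locally computed gradients onto the shared parameters.
        for shared_p, local_p in zip(shared_model.parameters(), local.parameters()):
            shared_p.grad = local_p.grad
        optimizer.step()

if __name__ == '__main__':
    shared = Net()
    shared.share_memory()  # place parameters in shared memory for all workers
    processes = [mp.Process(target=worker, args=(rank, shared)) for rank in range(4)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
```

In practice the optimizer statistics can also be placed in shared memory; per-worker optimizers merely keep the sketch short.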

### Actor Critic using Kronecker-Factored Trust Region (ACKTR) [1]

ACKTR is an improvement of A2C described in [1]; it replaces the plain gradient step with a Kronecker-factored approximation of the natural gradient.

## Experiments

> Coming soon

## Requirements

These packages must be installed before using the framework for your own algorithm:

- OpenAI baselines (can be installed with `pip install git+https://github.com/openai/baselines.git`)
- PyTorch
- Visdom (`pip install visdom`)
- Gym (`pip install gym`)
- Matplotlib

These packages must be installed prior to running the attached experiments:

- DeepMind Lab
- Gym[atari]

## Sources

This repository builds on the work of several other authors, to whom we would like to express our thanks:

- https://github.com/openai/baselines/tree/master/baselines
- https://github.com/ikostrikov/pytorch-a2c-ppo-acktr/tree/master/a2c_ppo_acktr
- https://github.com/miyosuda/unreal
- https://github.com/openai/gym

## References

[1] Wu, Y., Mansimov, E., Grosse, R.B., Liao, S. and Ba, J., 2017. Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation. In Advances in Neural Information Processing Systems (pp. 5279-5288).

[2] Mnih, V., Badia, A.P., Mirza, M., Graves, A., Lillicrap, T., Harley, T., Silver, D. and Kavukcuoglu, K., 2016. Asynchronous methods for deep reinforcement learning. In International Conference on Machine Learning (pp. 1928-1937).

[3] Saxe, A.M., McClelland, J.L. and Ganguli, S., 2013. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks. arXiv preprint arXiv:1312.6120.
