Deep Reinforcement Learning in PyTorch
This repository contains standard model-free deep RL algorithms in PyTorch, with model-based algorithms coming soon. It may also contain some research ideas I am currently working on.
What is it?
pytorch-rl implements some state-of-the-art deep reinforcement learning algorithms in PyTorch, especially those concerned with continuous action spaces. You can train your algorithm efficiently on either CPU or GPU. Furthermore, pytorch-rl works with OpenAI Gym out of the box, so evaluating and playing around with different algorithms is easy. Of course, you can extend pytorch-rl according to your own needs. TL;DR: pytorch-rl makes it really easy to run state-of-the-art deep reinforcement learning algorithms.
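Since the library is built on PyTorch, switching between CPU and GPU comes down to PyTorch's device abstraction. A minimal sketch of the idea (the network and sizes here are illustrative, not pytorch-rl's actual API):

```python
import torch
import torch.nn as nn

# Pick the GPU when available, otherwise fall back to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# A small actor network for a continuous action space (sizes are arbitrary).
actor = nn.Sequential(
    nn.Linear(8, 64),
    nn.ReLU(),
    nn.Linear(64, 2),
    nn.Tanh(),  # squash actions into [-1, 1]
).to(device)

obs = torch.randn(1, 8, device=device)  # a fake observation batch
action = actor(obs)
print(action.shape)  # torch.Size([1, 2])
```

The same code runs unchanged on CPU or GPU; only the `device` binding differs.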
Dependencies
- PyTorch
- Gym (OpenAI)
- mujoco-py (for the physics simulation and the robotics environments in Gym)
- PyBullet (Coming Soon)
- MPI (requires a PyTorch installation built with the MPI backend)
- tensorboardX (https://github.com/lanpa/tensorboardX)
RL algorithms
- DQN (with Double Q learning)
- DDPG
- DDPG with HER (For the OpenAI Fetch Environments)
- Hierarchical Reinforcement Learning
- Prioritized Experience Replay + DDPG
- DDPG with Prioritized Hindsight Experience Replay (Research)
- Neural Map with A3C (Coming Soon)
- Rainbow DQN (Coming Soon)
- PPO
- HER with self attention for goal substitution (Research)
- A3C (Coming Soon)
- ACER (Coming Soon)
- DARLA
- TDM
- World Models
- Soft Actor-Critic
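For reference, the Double Q-learning variant of DQN listed above decouples action selection from action evaluation: the online network picks the greedy next action, the target network scores it. A minimal, self-contained sketch of the target computation (illustrative, not the repository's actual code):

```python
import numpy as np

def double_q_target(q_online, q_target, rewards, dones, gamma=0.99):
    """Double Q-learning target. q_online and q_target are
    (batch, n_actions) arrays of Q-values for the *next* states."""
    next_actions = q_online.argmax(axis=1)                           # selection
    next_values = q_target[np.arange(len(q_target)), next_actions]  # evaluation
    return rewards + gamma * (1.0 - dones) * next_values

q_online = np.array([[1.0, 2.0], [0.5, 0.1]])
q_target = np.array([[0.3, 0.7], [0.9, 0.2]])
y = double_q_target(q_online, q_target,
                    rewards=np.array([1.0, 0.0]),
                    dones=np.array([0.0, 1.0]))
# y[0] == 1.0 + 0.99 * 0.7; y[1] == 0.0 because the episode is done
```

Using the target network only for evaluation counteracts the overestimation bias of plain Q-learning's max operator.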
Environments
- Breakout
- Pong (Coming Soon)
- Hand Manipulation Robotic Task
- Fetch-Reach Robotic Task
- Hand-Reach Robotic Task
- Block Manipulation Robotic Task
- Montezuma's Revenge (Current Research)
- Pitfall
- Gravitar
- CarRacing
- OpenSim Prosthetics Nips Challenge (https://www.crowdai.org/challenges/nips-2018-ai-for-prosthetics-challenge)
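Several of the robotic tasks above are goal-conditioned, which is what Hindsight Experience Replay exploits: failed episodes are relabeled as if the goal had been a state that was actually reached. A minimal sketch of the "final" relabeling strategy with a sparse reward (illustrative only, not the repository's implementation):

```python
import numpy as np

def her_relabel_final(episode):
    """Relabel an episode with the 'final' HER strategy. `episode` is a
    list of (obs, action, achieved_goal, desired_goal) tuples."""
    new_goal = episode[-1][2]  # the goal the agent actually reached
    relabeled = []
    for obs, action, achieved, _ in episode:
        # Sparse reward: 0 if the achieved goal matches the new goal, else -1.
        reward = 0.0 if np.allclose(achieved, new_goal) else -1.0
        relabeled.append((obs, action, new_goal, reward))
    return relabeled

episode = [((0.0,), (1,), np.array([0.1]), np.array([1.0])),
           ((0.1,), (1,), np.array([0.3]), np.array([1.0]))]
out = her_relabel_final(episode)
print(out[-1][3])  # 0.0 -- the final transition always succeeds under its own goal
```

The relabeled transitions give the off-policy learner (e.g. DDPG) a reward signal even when the original desired goal was never reached.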
Environment Modelling (For exploration and domain adaptation)
Multiple GAN training tricks were used because of the instability in training the generators and discriminators. Please refer to https://github.com/soumith/ganhacks for more information.
Even with these tricks, it was hard to train a GAN to convergence. However, after adding Spectral Normalization (https://arxiv.org/abs/1802.05957), the InfoGAN was trained to convergence.
For image-to-image translation tasks with GANs, and for VAEs in general, training with skip connections helps considerably.
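Spectral normalization is available in PyTorch as a built-in wrapper, `torch.nn.utils.spectral_norm`. A small sketch of how it would be applied to a discriminator layer:

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

# Wrap a layer so its weight is divided by an estimate of its largest
# singular value (one power-iteration step per forward pass).
layer = spectral_norm(nn.Linear(16, 16))

x = torch.randn(4, 16)
for _ in range(20):  # let the power-iteration estimate converge
    _ = layer(x)

# The effective weight now has spectral norm close to 1, which keeps the
# discriminator (approximately) 1-Lipschitz and stabilizes GAN training.
sigma = torch.linalg.svdvals(layer.weight.detach())[0]
print(round(sigma.item(), 2))
```

In a real discriminator every linear/convolutional layer would be wrapped this way.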
- beta-VAE
- InfoGAN
- CVAE-GAN
- Flow based generative models (Research)
- SAGAN
- Sequential Attend, Infer, Repeat
- Curiosity driven exploration
- Parameter Space Noise for Exploration
- Noisy Network
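As a rough illustration of the parameter-space-noise idea from the list above: instead of adding noise to actions, a perturbed copy of the policy acts for a while, so exploration is consistent across a whole episode. This sketch uses a fixed noise scale and a hypothetical helper name (the actual method adapts the scale so the induced action-space perturbation stays near a target):

```python
import copy
import torch
import torch.nn as nn

def perturb_policy(policy, stddev=0.1):
    """Return a copy of the policy with Gaussian noise added to every
    parameter (illustrative helper; stddev is fixed here, adaptive in
    the real algorithm)."""
    noisy = copy.deepcopy(policy)
    with torch.no_grad():
        for p in noisy.parameters():
            p.add_(torch.randn_like(p) * stddev)
    return noisy

policy = nn.Linear(4, 2)
noisy = perturb_policy(policy, stddev=0.1)
obs = torch.randn(1, 4)
print(torch.allclose(policy(obs), noisy(obs)))  # False -- actions differ
```

The unperturbed `policy` is still the one that gets trained; the noisy copy is only used to collect exploratory experience.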
References
- Playing Atari with Deep Reinforcement Learning, Mnih et al., 2013
- Human-level control through deep reinforcement learning, Mnih et al., 2015
- Deep Reinforcement Learning with Double Q-learning, van Hasselt et al., 2015
- Continuous control with deep reinforcement learning, Lillicrap et al., 2015
- CVAE-GAN: Fine-Grained Image Generation through Asymmetric Training, Bao et al., 2017
- beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework, Higgins et al., 2017
- Hindsight Experience Replay, Andrychowicz et al., 2017
- InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets, Chen et al., 2016
- World Models, Ha et al., 2018
- Spectral Normalization for Generative Adversarial Networks, Miyato et al., 2018
- Self-Attention Generative Adversarial Networks, Zhang et al., 2018
- Curiosity-driven Exploration by Self-supervised Prediction, Pathak et al., 2017
- Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, Haarnoja et al., 2018
- Parameter Space Noise for Exploration, Plappert et al., 2018
- Noisy Network for Exploration, Fortunato et al., 2018
- Proximal Policy Optimization Algorithms, Schulman et al., 2017