
Project description

Build Status pypi fast_rl version github_master version

Note: Passing tests will not be a useful indicator of stability until version 1.0+.

Fast Reinforcement Learning

This repo is not affiliated with Jeremy Howard or his course, which can be found here. We will, however, be using components from the fastai library for building and training our reinforcement learning (RL) agents.

As a note, several RL frameworks already exist and are worth a look.

There are also frameworks in PyTorch, most notably Facebook's Horizon.

Fastai for computer vision and tabular learning has been amazing. One would wish that this would be the same for RL. The purpose of this repo is to have a framework that is as easy as possible to start, but also designed for testing new agents.

Table of Contents

  1. Installation
  2. Alpha TODO
  3. Code
  4. Versioning
  5. Contributing
  6. Style

Installation

Very soon we would like to add some form of scripting to install the more complicated dependencies. For now, installation has two steps:

1.a FastAI: Install fastai, or if you are using Anaconda (which is a good idea), you can run:
conda install -c pytorch -c fastai fastai

1.b Optional / Extra Envs: All OpenAI gyms:
pip install gym[all]

Mazes:
git clone https://github.com/MattChanTK/gym-maze.git
cd gym-maze
python setup.py install

2 Actual Repo:
git clone https://github.com/josiahls/fast-reinforcement-learning.git
cd fast-reinforcement-learning
python setup.py install

Alpha TODO

At the moment these are the things we urgently need, followed by the nice-to-haves that will make this repo genuinely valuable. They are listed roughly in the order we plan to execute them.

At present, we are in the Alpha stage: agents are not yet fully tested / debugged. The final step (before 1.0.0) will be an evaluation of the DQN and DDPG agent implementations, verifying that each performs as well as it can on known environments. Prior to 1.0.0, new changes might break previous code versions, and models are not guaranteed to be working at their best. Post 1.0.0 will be more formal feature development with CI, unit testing, etc.

Critical Testable code:

import torch

from fast_rl.agents.dqn import *
from fast_rl.agents.dqn_models import *
from fast_rl.core.agent_core import ExperienceReplay, GreedyEpsilon
from fast_rl.core.data_block import MDPDataBunch
from fast_rl.core.metrics import *

# Wrap the CartPole-v1 gym environment in a fastai-style DataBunch.
data = MDPDataBunch.from_env('CartPole-v1', render='rgb_array', bs=32, add_valid=False)
# Fixed-target DQN model optimized with RMSprop.
model = create_dqn_model(data, FixedTargetDQNModule, opt=torch.optim.RMSprop, lr=0.00025)
memory = ExperienceReplay(memory_size=1000, reduce_ram=True)
# Epsilon-greedy exploration, annealed from 1.0 down to 0.1.
exploration_method = GreedyEpsilon(epsilon_start=1, epsilon_end=0.1, decay=0.001)
learner = dqn_learner(data=data, model=model, memory=memory, exploration_method=exploration_method)
learner.fit(10)
  • 0.7.0 Full test suite using multi-processing. Connect to CI.
  • 0.8.0 Comprehensive model eval debug/verify. Each model should succeed in at least a few known environments. Also, massive refactoring will be needed.
  • 0.9.0 Notebook demonstrations of basic model usage.
  • Working on 1.0.0: Base version is complete, with working model visualizations proving performance / expected failure. At this point, all models should have guaranteed environments they succeed in.
  • 1.1.0 More Traditional RL models
    • Add PPO
    • Add TRPO
    • Add D4PG
    • Add A3C
  • 1.2.0 HRL models. Might change version to 2.0 depending on SMDP issues
    • Add SMDP
    • Add Goal-oriented MDPs. Will require a new "Step"
    • Add FeUdal Network
    • Add HAC
    • Add MAXQ
    • Add HIRO
    • Add h-DQN
    • Add Modulated Policy Hierarchies
    • Add Meta Learning Shared Hierarchies
    • Add STRategic Attentive Writer (STRAW)
    • Add H-DRLN
    • Add Abstract Markov Decision Process (AMDP)
  • 1.3.0 HRL Options models. May already be implemented in a previous model
    • Options augmentation to DQN based models
    • Options augmentation to actor critic models
    • Options augmentation to async actor critic models
  • 1.4.0 HRL Skills
    • Skills augmentation to DQN based models
    • Skills augmentation to actor critic models
    • Skills augmentation to async actor critic models
  • 1.5.0
  • 1.6.0
  • 1.7.0
  • 1.8.0
  • 1.9.0
  • 2.0.0 Add PyBullet Fetch Environments
    • 2.0.0 Not part of this repo, however the envs need to subclass the OpenAI gym.GoalEnv
    • 2.0.0 Add HER

Code

One of the key takeaways is fastai's use of callbacks. Not only do callbacks allow for logging, but adding a callback to a generic fit function can change its behavior drastically. Our goal is to have a library that is as easy as possible to run on a server or on one's own computer. We are also interested in this being easy to extend.
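To make the callback idea concrete, here is a minimal illustrative sketch (not fastai's or this repo's actual implementation): the fit loop itself stays generic, and the callbacks passed to it decide how it behaves.

class Callback:
    # Base hook, called at the end of every epoch; return True to stop fitting early.
    def on_epoch_end(self, epoch, loss):
        return False

class EarlyStop(Callback):
    # Changes fit()'s behavior (early termination) without modifying fit() itself.
    def __init__(self, threshold):
        self.threshold = threshold
    def on_epoch_end(self, epoch, loss):
        return loss < self.threshold

def fit(epochs, train_one_epoch, callbacks=()):
    # Generic fit loop: its behavior depends entirely on which callbacks are passed in.
    for epoch in range(epochs):
        loss = train_one_epoch()
        if any(cb.on_epoch_end(epoch, loss) for cb in callbacks):
            break

fit(10, train_one_epoch=lambda: 0.01, callbacks=[EarlyStop(threshold=0.05)])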

We have a few assumptions that we believe the code / supporting algorithms should adhere to:

  • Environments should be pickle-able and serializable. They should be able to shut down and start up multiple times during runtime.
  • Agents should not need more information than images or state values from an environment per step. This means that environments should not be expected to output contact points, sub-goals, or STRIPS-style logical outputs.

Rationale:

  • Shutdown / Startup: Some environments (pybullet) have issues shutting down and starting up different environments. Luckily, we have a fork of pybullet, so these modifications can be enforced.
  • Pickling: Being able to encapsulate an environment as a .pkl can be important for saving it and all the information it generated.
  • Serializable: If we want to do parallel processing, environments need to be serializable to transport them between those processes (see the sketch below this list).
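As a minimal sketch of what the pickling / serialization assumptions mean in practice (assuming a gym environment that actually supports pickling; not all do, hence the pybullet fork mentioned above):

import pickle
import gym

env = gym.make('CartPole-v1')
env.reset()
env.step(env.action_space.sample())

# The environment, and everything it has generated so far, must survive a round trip.
with open('env.pkl', 'wb') as f:
    pickle.dump(env, f)
with open('env.pkl', 'rb') as f:
    restored = pickle.load(f)

# The restored copy should be usable as if nothing happened.
restored.step(restored.action_space.sample())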

Some extra assumptions:

  • Environments can either be goal-less or have a single goal, which OpenAI defines as Env and GoalEnv respectively (see the sketch below).

These assumptions are necessary for us to implement other envs from other repos. We do not want to be tied to just OpenAI gyms.
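
For reference, a minimal sketch of the Env vs. GoalEnv distinction, written against gym versions that still ship gym.GoalEnv; the toy environment below is made up purely for illustration and is not part of this repo:

import numpy as np
import gym
from gym import spaces

class ReachGoalEnv(gym.GoalEnv):
    # Goal-oriented env: observations are dicts carrying an explicit desired goal.
    def __init__(self):
        self.observation_space = spaces.Dict({
            'observation':   spaces.Box(-1.0, 1.0, shape=(2,), dtype=np.float32),
            'achieved_goal': spaces.Box(-1.0, 1.0, shape=(2,), dtype=np.float32),
            'desired_goal':  spaces.Box(-1.0, 1.0, shape=(2,), dtype=np.float32),
        })
        self.action_space = spaces.Box(-0.1, 0.1, shape=(2,), dtype=np.float32)
        self.pos = np.zeros(2, dtype=np.float32)
        self.goal = np.array([0.5, 0.5], dtype=np.float32)

    def reset(self):
        self.pos = np.zeros(2, dtype=np.float32)
        return self._obs()

    def step(self, action):
        self.pos = np.clip(self.pos + action, -1.0, 1.0)
        obs = self._obs()
        reward = self.compute_reward(obs['achieved_goal'], obs['desired_goal'], {})
        done = reward == 0.0
        return obs, reward, done, {}

    def compute_reward(self, achieved_goal, desired_goal, info):
        # Sparse reward: 0 when the goal is reached, -1 otherwise.
        return 0.0 if np.linalg.norm(achieved_goal - desired_goal) < 0.05 else -1.0

    def _obs(self):
        return {'observation': self.pos.copy(),
                'achieved_goal': self.pos.copy(),
                'desired_goal': self.goal.copy()}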

Versioning

At present the repo is in its alpha stages. We plan to move this from alpha to a pseudo-beta / working version. Regardless of version, we will follow Python-style versioning.

Alpha Versions: #.#.#, e.g. 0.1.0. Alpha will never go above 0.99.99; at that point it becomes full version 1.0.0. A key point is that during alpha, coding will be quick and dirty with no promise of proper deprecation.

Beta / Full Versions: These will be greater than 1.0.0. We follow the Python method of versioning: [Breaking Changes].[Backward Compatible Features].[Bug Fixes]. These will be feature additions such as new functions, tools, models, and env support. Proper deprecation will also be used.

Pip update frequency: We have a pip repository, however we do not plan to update it very frequently at the moment. The current plan is to update pip during Beta / Full Version updates, and possibly every 0.5.0 versions.

Contributing

Follow the templates we have on GitHub. Make a branch either from master or from the most recent version branch. We recommend squashing commits / keeping pointless ones to a minimum.

Style

Since fastai uses a different style from traditional PEP-8, we will be following its Style and Abbreviations guides. We will also use RL-specific abbreviations:

Concept     Abbr.   Combination Examples
RL State    st
Action      acn
Bounds      bb      Same as Bounding Box
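
A hypothetical snippet (names are not from this repo) just to show how these abbreviations combine in practice:

import numpy as np

def pick_acn(st, q_values):
    # `st` = state index, `acn` = action index, per the abbreviations above.
    acn = int(q_values[st].argmax())
    return acn

acn = pick_acn(st=0, q_values=np.zeros((4, 2)))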

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

fast_rl-0.9.91.tar.gz (44.7 kB)

Uploaded Source

Built Distribution

fast_rl-0.9.91-py3-none-any.whl (57.7 kB)

Uploaded Python 3

File details

Details for the file fast_rl-0.9.91.tar.gz.

File metadata

  • Download URL: fast_rl-0.9.91.tar.gz
  • Upload date:
  • Size: 44.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/40.6.2 requests-toolbelt/0.9.1 tqdm/4.42.0 CPython/3.6.10

File hashes

Hashes for fast_rl-0.9.91.tar.gz
Algorithm Hash digest
SHA256 50ffd645ffe6d432a42f94546a5b7eccd1b36104aea898bc645f972b66e8ec07
MD5 6359065c6df101fbe573db143d4f9554
BLAKE2b-256 f2be81c29059f5f107779819aceee292c892b358cb40a5cc294a6dc47c97fb92

See more details on using hashes here.

File details

Details for the file fast_rl-0.9.91-py3-none-any.whl.

File metadata

  • Download URL: fast_rl-0.9.91-py3-none-any.whl
  • Upload date:
  • Size: 57.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/40.6.2 requests-toolbelt/0.9.1 tqdm/4.42.0 CPython/3.6.10

File hashes

Hashes for fast_rl-0.9.91-py3-none-any.whl
Algorithm Hash digest
SHA256 0ee457c8df31c8476da51fe10307b17799fdb2d2298efac2346937c18a006c4d
MD5 a9c076265f1adbd22c242a5d0a4f18d3
BLAKE2b-256 c0aac1bd665cdd36369c0c4fc99d24640fe1f717a8ea259a6d4d95aa4fb3ec22

See more details on using hashes here.
