Fastai has been amazing for computer vision and tabular learning; one would wish the same were true for RL. The purpose of this repo is to provide a framework that is as easy as possible to get started with, while also being designed for testing new agents.

Fast_rl

This repo is not affiliated with Jeremy Howard or his course, which can be found here. We will be using components from the Fastai library to build and train our reinforcement learning (RL) agents.

Our goal is for fast_rl to make benchmarking easier, inference more efficient, and environment compatibility as decoupled as possible. This being version 1.0, we still have a lot of work to do to make RL training itself faster and more efficient. The goals for this repo can be seen in the RoadMap.

A simple example:

from fast_rl.agents.dqn import create_dqn_model, dqn_learner
from fast_rl.agents.dqn_models import *
from fast_rl.core.agent_core import ExperienceReplay,  GreedyEpsilon
from fast_rl.core.data_block import MDPDataBunch
from fast_rl.core.metrics import RewardMetric, EpsilonMetric

memory = ExperienceReplay(memory_size=1000000, reduce_ram=True)  # replay buffer
explore = GreedyEpsilon(epsilon_start=1, epsilon_end=0.1, decay=0.001)  # epsilon-greedy exploration
data = MDPDataBunch.from_env('CartPole-v1', render='human', bs=64, add_valid=False)  # gym env as a DataBunch
model = create_dqn_model(data=data, base_arch=FixedTargetDQNModule, lr=0.001, layers=[32, 32])
learn = dqn_learner(data, model, memory=memory, exploration_method=explore, copy_over_frequency=300,
                    callback_fns=[RewardMetric, EpsilonMetric])  # log reward and epsilon during training
learn.fit(450)
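The GreedyEpsilon arguments above define an exploration schedule that anneals epsilon from epsilon_start down toward epsilon_end. The exact formula fast_rl uses is internal to the library; as a hedged illustration only, a common exponential-decay schedule with these parameters can be sketched as:

```python
import math

def epsilon_at(step, epsilon_start=1.0, epsilon_end=0.1, decay=0.001):
    """Exponentially anneal epsilon from epsilon_start toward epsilon_end.

    This is an assumed schedule for illustration, not fast_rl's
    implementation. With decay=0.001, about 90% of the gap to
    epsilon_end is closed after roughly 2300 steps.
    """
    return epsilon_end + (epsilon_start - epsilon_end) * math.exp(-decay * step)

# The agent takes a random action with probability epsilon_at(step),
# otherwise it takes the greedy action from the Q-network.
```

As epsilon decays, the agent shifts from exploring the environment to exploiting what its Q-network has learned.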

More complex examples might involve running an RL agent multiple times, generating episode snapshots as gifs, grouping reward plots, and finally showing the best and worst runs in a single graph.

from fastai.basic_data import DatasetType
from fast_rl.agents.dqn import create_dqn_model, dqn_learner
from fast_rl.agents.dqn_models import *
from fast_rl.core.agent_core import ExperienceReplay, GreedyEpsilon
from fast_rl.core.data_block import MDPDataBunch
from fast_rl.core.metrics import RewardMetric, EpsilonMetric
from fast_rl.core.train import GroupAgentInterpretation, AgentInterpretation

group_interp = GroupAgentInterpretation()
for i in range(5):
	memory = ExperienceReplay(memory_size=1000000, reduce_ram=True)
	explore = GreedyEpsilon(epsilon_start=1, epsilon_end=0.1, decay=0.001)
	data = MDPDataBunch.from_env('CartPole-v1', render='human', bs=64, add_valid=False)
	model = create_dqn_model(data=data, base_arch=FixedTargetDQNModule, lr=0.001, layers=[32,32])
	learn = dqn_learner(data, model, memory=memory, exploration_method=explore, copy_over_frequency=300,
						callback_fns=[RewardMetric, EpsilonMetric])
	learn.fit(450)

	interp = AgentInterpretation(learn, ds_type=DatasetType.Train)
	interp.plot_rewards(cumulative=True, per_episode=True, group_name='cartpole_experience_example')
	group_interp.add_interpretation(interp)
	group_interp.to_pickle(f'{learn.model.name.lower()}/', f'{learn.model.name.lower()}')
	for g in interp.generate_gif(): g.write(f'{learn.model.name.lower()}')
group_interp.plot_reward_bounds(per_episode=True, smooth_groups=10)
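plot_reward_bounds presumably aggregates the per-episode rewards of the five runs into a best/worst envelope around the mean. As an illustration only (the function name, windowing, and exact semantics of smooth_groups are assumptions, not fast_rl's implementation), the bounds computation might look like:

```python
def reward_bounds(runs, smooth=10):
    """Return (lower, mean, upper) per-episode reward bounds across runs.

    runs:   list of per-run episode-reward lists
    smooth: trailing moving-average window, loosely mirroring the
            smooth_groups argument in the example above (assumed semantics).
    """
    n_episodes = min(len(r) for r in runs)  # truncate to the shortest run

    def smoothed(rewards):
        out = []
        for i in range(n_episodes):
            window = rewards[max(0, i - smooth + 1): i + 1]
            out.append(sum(window) / len(window))
        return out

    per_run = [smoothed(r) for r in runs]
    columns = list(zip(*per_run))  # one tuple of run-values per episode
    lower = [min(c) for c in columns]
    mean = [sum(c) / len(c) for c in columns]
    upper = [max(c) for c in columns]
    return lower, mean, upper
```

Plotting lower and upper as a shaded band around the mean gives the single best/worst graph described above.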

More examples can be found in docs_src, and the actual code used for generating gifs can be found in the tests, in either test_dqn.py or test_ddpg.py.

As a note, the README also contains a rundown of existing RL frameworks, including several implemented in PyTorch.

Installation

fastai (semi-optional)
Install Fastai, or if you are using Anaconda (which is a good idea), run:
conda install -c pytorch -c fastai fastai

fast_rl
Fastai will be installed if it does not exist. If it does exist, the versioning should be repaired by setup.py.
pip install fast_rl

Installation (Optional)

All OpenAI gyms:
pip install gym[all]

Mazes:
git clone https://github.com/MattChanTK/gym-maze.git
cd gym-maze
python setup.py install

Installation Dev (Optional)

git clone https://github.com/josiahls/fast-reinforcement-learning.git
cd fast-reinforcement-learning
python setup.py install

Installation Issues

Many issues will likely fall under fastai installation issues.

Any other issues are likely environment related. It is important to note that Python 3.7 is not being tested due to an issue with Pyglet and gym not working together. This issue will not stop you from training models; however, it might impact using OpenAI environments.

RoadMap

  • Working on 1.0.0: the base version is complete, with working model visualizations demonstrating performance / expected failure. At this point, all models should have environments in which they are guaranteed to succeed.
  • 1.1.0 More Traditional RL models
    • Add PPO
    • Add TRPO
    • Add D4PG
    • Add A2C
    • Add A3C
  • 1.2.0 HRL models (the version may change to 2.0 depending on SMDP issues)
    • Add SMDP
    • Add goal-oriented MDPs. This will require a new "Step"
    • Add FeUdal Network
    • Add storage-based DataBunch memory management. This can prevent RAM from being filled by episode image frames that serve no purpose for the agent and are kept only for logging.
  • 1.3.0
    • Add HAC
    • Add MAXQ
    • Add HIRO
  • 1.4.0
    • Add h-DQN
    • Add Modulated Policy Hierarchies
    • Add Meta Learning Shared Hierarchies
  • 1.5.0
    • Add STRategic Attentive Writer (STRAW)
    • Add H-DRLN
    • Add Abstract Markov Decision Process (AMDP)
    • Add conda integration so that installation can be truly one step.
  • 1.6.0 HRL Options models (these may already be implemented in a previous version)
    • Options augmentation to DQN based models
    • Options augmentation to actor critic models
    • Options augmentation to async actor critic models
  • 1.8.0 HRL Skills
    • Skills augmentation to DQN based models
    • Skills augmentation to actor critic models
    • Skills augmentation to async actor critic models
  • 1.9.0
  • 2.0.0 Add PyBullet Fetch Environments
    • Not part of this repo; however, the envs need to subclass the OpenAI gym.GoalEnv
    • Add HER
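The storage-based DataBunch memory item under 1.2.0 can be sketched roughly: rather than holding every rendered frame in RAM, episode frames are written to disk and loaded back only when needed for logging or gif generation. Everything below, class and method names included, is a hypothetical illustration, not fast_rl API:

```python
import os
import pickle
import tempfile

class DiskFrameStore:
    """Hypothetical sketch: keep episode image frames on disk, not in RAM."""

    def __init__(self, directory=None):
        self.directory = directory or tempfile.mkdtemp(prefix='frames_')
        self.paths = []

    def add(self, frame):
        """Persist a frame to disk and remember only its path."""
        path = os.path.join(self.directory, f'frame_{len(self.paths)}.pkl')
        with open(path, 'wb') as f:
            pickle.dump(frame, f)
        self.paths.append(path)

    def get(self, index):
        """Load a single frame back, e.g. when writing a gif."""
        with open(self.paths[index], 'rb') as f:
            return pickle.load(f)

    def __len__(self):
        return len(self.paths)
```

The agent never reads these frames during training, so keeping only file paths in memory bounds RAM usage regardless of episode length.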

Contribution

Following fastai's guidelines would be desirable: Guidelines

We hope that model additions can be made smoothly. All models will depend only on core.layers.py. As time goes on, the model architecture will improve overall (we are, and will continue to be, figuring things out).

Style

Since fastai uses a different style from traditional PEP-8, we will be following its Style and Abbreviations guides. We will also use RL-specific abbreviations.

Concept    Abbr.   Combination Examples
RL State   st
Action     acn
Bounds     bb      Same as Bounding Box

Download files

Download the file for your platform.

Source Distribution

fast_rl-0.9.92.tar.gz (43.0 kB view details)

Uploaded Source

Built Distribution

fast_rl-0.9.92-py3-none-any.whl (57.0 kB view details)

Uploaded Python 3

File details

Details for the file fast_rl-0.9.92.tar.gz.

File metadata

  • Download URL: fast_rl-0.9.92.tar.gz
  • Upload date:
  • Size: 43.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/40.6.2 requests-toolbelt/0.9.1 tqdm/4.42.0 CPython/3.6.10

File hashes

Hashes for fast_rl-0.9.92.tar.gz
Algorithm Hash digest
SHA256 ab2323cf2a317d184939725016f4b831e16a108b0bd761f57ee30f223b07ebfa
MD5 ae1492902f9ae4f6f44730f61a920e0d
BLAKE2b-256 958bd6da5ea3413d3101c3010ec51a080d797fd935e5c09e9b94be90487c9836

See more details on using hashes here.

File details

Details for the file fast_rl-0.9.92-py3-none-any.whl.

File metadata

  • Download URL: fast_rl-0.9.92-py3-none-any.whl
  • Upload date:
  • Size: 57.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/40.6.2 requests-toolbelt/0.9.1 tqdm/4.42.0 CPython/3.6.10

File hashes

Hashes for fast_rl-0.9.92-py3-none-any.whl
Algorithm Hash digest
SHA256 6697e231858e99df4566f544a41a6320ffde4b28c984c87632888924998842db
MD5 c51752d597a836d471c521f07dfc1718
BLAKE2b-256 8a90b2d4f43ffb597510782cf74b8a4abcc01a26f5991305046a531c2e7a1edb

See more details on using hashes here.
