
|Build Status|

yarlp
-----

**Yet Another Reinforcement Learning Package**

Implementations of `CEM </yarlp/agent/cem_agent.py>`__,
`REINFORCE </yarlp/agent/pg_agents.py>`__,
`TRPO </yarlp/agent/trpo_agent.py>`__,
`DDQN </yarlp/agent/ddqn_agent.py>`__, and
`A2C </yarlp/agent/a2c_agent.py>`__ with reproducible benchmarks.
Experiments are templated using ``jsonschema`` and are compared to
published results. This is meant to be a starting point for working
implementations of classic RL algorithms. Unfortunately, even
implementations from OpenAI baselines are `not always
reproducible <https://github.com/openai/baselines/issues/176>`__.

A working Dockerfile with ``yarlp`` installed can be run with:

- ``docker build -t "yarlpd" .``
- ``docker run -it yarlpd bash``

To run a benchmark, start from the experiment script's help output:

``python yarlp/experiment/experiment.py --help``

To run an agent manually, look in ``examples`` or start from this
snippet:

.. code:: python

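    from yarlp.agent.trpo_agent import TRPOAgent
    from yarlp.utils.env_utils import NormalizedGymEnv

    # Create the environment through yarlp's NormalizedGymEnv wrapper.
    env = NormalizedGymEnv('MountainCarContinuous-v0')
    # Fix the seed, then train for 1M time-steps.
    agent = TRPOAgent(env, seed=123)
    agent.train(max_timesteps=1000000)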

Benchmarks
----------

We benchmark against published results and OpenAI
`baselines <https://github.com/openai/baselines>`__ where available
using
`yarlp/experiment/experiment.py </yarlp/experiment/experiment.py>`__.
Benchmark scripts for OpenAI ``baselines`` were written ad hoc, such as
`this
one <https://github.com/btaba/baselines/blob/master/baselines/trpo_mpi/run_trpo_experiment.py>`__.

Atari10M
~~~~~~~~

+---------------+--------------+-------------------+
| |BeamRider|   | |Breakout|   | |Pong|            |
+---------------+--------------+-------------------+
| |QBert|       | |Seaquest|   | |SpaceInvaders|   |
+---------------+--------------+-------------------+

DDQN with dueling networks and prioritized replay
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``python yarlp/experiment/experiment.py run_atari10m_ddqn_benchmark``

I trained on 6 Atari environments for 10M time-steps (**40M frames**),
using 1 random seed, since I only have 1 GPU and limited time on this
Earth. I used DDQN with dueling networks, but no prioritized replay
(although it's implemented). I compare the mean raw score over the
final 100 episodes for yarlp (with exploration of 0.01) with results
from `Hasselt et al, 2015 <https://arxiv.org/pdf/1509.06461.pdf>`__ and
`Wang et al, 2016 <https://arxiv.org/pdf/1511.06581.pdf>`__, which train
for **200M frames** and evaluate on 100 episodes (exploration of 0.05).
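The time-step and frame counts above differ by the frame-skip factor. Below is a minimal sketch of that arithmetic and of a trailing 100-episode mean. The frame skip of 4 is an assumption, inferred from 10M time-steps being reported as 40M frames. The function names are illustrative and are not part of ``yarlp``.

```python
from collections import deque

FRAME_SKIP = 4  # assumption: each agent time-step covers 4 emulator frames


def steps_to_frames(timesteps, frame_skip=FRAME_SKIP):
    """Convert agent time-steps to emulator frames."""
    return timesteps * frame_skip


def final_mean_score(episode_returns, window=100):
    """Mean raw return over the last `window` episodes."""
    # A bounded deque keeps only the most recent `window` returns.
    recent = deque(episode_returns, maxlen=window)
    if not recent:
        raise ValueError("no episodes recorded")
    return sum(recent) / len(recent)


frames = steps_to_frames(10000000)
```

With this convention, a 200M-frame run from the cited papers corresponds to 50M time-steps, five times the budget used here.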

I don't compare to OpenAI baselines because the OpenAI DDQN
implementation is **not** currently able to reproduce published results
as of 2018-01-20. See `this GitHub
issue <https://github.com/openai/baselines/issues/176>`__, although I
found `these benchmark
plots <https://github.com/openai/baselines-results/blob/master/dqn_results.ipynb>`__
to be pretty helpful.

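+---------------+-----------------------+--------------------------------+-----------------------------+
| env           | yarlp DUEL 40M Frames | Hasselt et al DDQN 200M Frames | Wang et al DUEL 200M Frames |
+===============+=======================+================================+=============================+
| BeamRider     | 8705                  | 7654                           | 12164                       |
+---------------+-----------------------+--------------------------------+-----------------------------+
| Breakout      | 423.5                 | 375                            | 345                         |
+---------------+-----------------------+--------------------------------+-----------------------------+
| Pong          | 20.73                 | 21                             | 21                          |
+---------------+-----------------------+--------------------------------+-----------------------------+
| QBert         | 5410.75               | 14875                          | 19220.3                     |
+---------------+-----------------------+--------------------------------+-----------------------------+
| Seaquest      | 5300.5                | 7995                           | 50245.2                     |
+---------------+-----------------------+--------------------------------+-----------------------------+
| SpaceInvaders | 1978.2                | 3154.6                         | 6427.3                      |
+---------------+-----------------------+--------------------------------+-----------------------------+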

+---------------------------+-------------------------------+----------------------+-----------------------+
| |BeamRiderNoFrameskip-v4| | |BreakoutNoFrameskip-v4|      | |PongNoFrameskip-v4| | |QbertNoFrameskip-v4| |
+---------------------------+-------------------------------+----------------------+-----------------------+
| |SeaquestNoFrameskip-v4|  | |SpaceInvadersNoFrameskip-v4| |                      |                       |
+---------------------------+-------------------------------+----------------------+-----------------------+

A2C
^^^

``python yarlp/experiment/experiment.py run_atari10m_a2c_benchmark``

A2C on 10M time-steps (**40M frames**) with 1 random seed. Results
compared to learning curves from `Mnih et al,
2016 <https://arxiv.org/pdf/1602.01783.pdf>`__ extracted at 10M
time-steps from Figure 3. You are invited to run for multiple seeds and
the full 200M frames for a better comparison.

+-----------------+-----------------+---------------------------------+
| env             | yarlp A2C 40M   | Mnih et al A3C 40M 16-threads   |
+=================+=================+=================================+
| BeamRider       | 3150            | ~3000                           |
+-----------------+-----------------+---------------------------------+
| Breakout        | 418             | ~150                            |
+-----------------+-----------------+---------------------------------+
| Pong            | 20              | ~20                             |
+-----------------+-----------------+---------------------------------+
| QBert           | 3644            | ~1000                           |
+-----------------+-----------------+---------------------------------+
| SpaceInvaders   | 805             | ~600                            |
+-----------------+-----------------+---------------------------------+

+---------------------------+-------------------------------+----------------------+-----------------------+
| |BeamRiderNoFrameskip-v4| | |BreakoutNoFrameskip-v4|      | |PongNoFrameskip-v4| | |QbertNoFrameskip-v4| |
+---------------------------+-------------------------------+----------------------+-----------------------+
| |SeaquestNoFrameskip-v4|  | |SpaceInvadersNoFrameskip-v4| |                      |                       |
+---------------------------+-------------------------------+----------------------+-----------------------+

Here are some `more
plots <https://github.com/openai/baselines-results/blob/master/acktr_ppo_acer_a2c_atari.ipynb>`__
from OpenAI to compare against.

Mujoco1M
~~~~~~~~

TRPO
^^^^

``python yarlp/experiment/experiment.py run_mujoco1m_benchmark``

We average over 5 random seeds instead of 3 for both ``baselines`` and
``yarlp``, and report 95% confidence intervals. More seeds probably
wouldn't hurt here.
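One common way to get a 95% confidence interval from a handful of seeds is a Student-t interval on the per-seed final returns. The sketch below assumes that method; the text above does not say how the benchmark intervals were actually computed. The sample returns are made-up numbers for illustration, not benchmark results.

```python
import math
import statistics

# Two-sided 95% Student-t critical values, keyed by degrees of freedom.
T_CRIT_95 = {2: 4.303, 3: 3.182, 4: 2.776, 9: 2.262}


def mean_and_ci95(values):
    """Return (mean, half_width) of a 95% t-interval across seeds."""
    n = len(values)
    df = n - 1
    if df not in T_CRIT_95:
        raise ValueError("no tabulated t value for n=%d" % n)
    mean = statistics.mean(values)
    # Standard error of the mean, using the sample standard deviation.
    sem = statistics.stdev(values) / math.sqrt(n)
    return mean, T_CRIT_95[df] * sem


# Hypothetical final returns from 5 seeds (illustrative only).
returns = [100.0, 110.0, 90.0, 105.0, 95.0]
mean, half_width = mean_and_ci95(returns)
```

With only 5 seeds the t critical value (2.776) is noticeably larger than the normal-approximation value of 1.96, which is one reason more seeds would tighten the intervals.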

+-------------------------------+--------------------+-------------------------+----------------+
| |Hopper-v1|                   | |HalfCheetah-v1|   | |Reacher-v1|            | |Swimmer-v1|   |
+-------------------------------+--------------------+-------------------------+----------------+
| |InvertedDoublePendulum-v1|   | |Walker2d-v1|      | |InvertedPendulum-v1|   |                |
+-------------------------------+--------------------+-------------------------+----------------+

CLI scripts
-----------

CLI convenience scripts will be installed with the package:

- Run a benchmark:

  - ``python yarlp/experiment/experiment.py --help``

- Plot ``yarlp`` compared to OpenAI ``baselines`` benchmarks:

  - ``compare_benchmark <yarlp-experiment-dir> <baseline-experiment-dir>``

- Experiments:

  - Experiments can be defined using JSON and are validated with
    ``jsonschema``. See `here </experiment_configs>`__ for sample
    experiment configs. If multiple parameters are specified, a grid
    search is run over them in parallel.
  - Example:
    ``run_yarlp_experiment --spec-file experiment_configs/trpo_experiment_mult_params.json``

- Experiment plots:

  - ``make_plots <experiment-dir>``
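The grid search mentioned under Experiments can be pictured as a Cartesian product over every parameter that lists more than one value. The sketch below shows that expansion only. The key names and the convention that a JSON list means "several values to try" are assumptions for illustration; the real ``yarlp`` spec format is defined by the configs in ``experiment_configs``.

```python
import itertools
import json


def expand_grid(spec):
    """Yield one flat config per combination of list-valued parameters."""
    keys = sorted(spec)
    # Wrap scalars so every parameter contributes a list of candidates.
    choices = [spec[k] if isinstance(spec[k], list) else [spec[k]]
               for k in keys]
    for combo in itertools.product(*choices):
        yield dict(zip(keys, combo))


# Hypothetical parameter block; the key names are illustrative only.
spec = json.loads(
    '{"seed": [1, 2], "learning_rate": [0.01, 0.001], "env": "Hopper-v1"}'
)
configs = list(expand_grid(spec))
```

Each resulting config is independent of the others, which is what makes it straightforward to run them in parallel.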

.. |Build Status| image:: https://travis-ci.org/btaba/yarlp.svg?branch=master
   :target: https://travis-ci.org/btaba/yarlp
.. |BeamRider| image:: /assets/atari10m/ddqn/beamrider.gif
.. |Breakout| image:: /assets/atari10m/ddqn/breakout.gif
.. |Pong| image:: /assets/atari10m/ddqn/pong.gif
.. |QBert| image:: /assets/atari10m/ddqn/qbert.gif
.. |Seaquest| image:: /assets/atari10m/ddqn/seaquest.gif
.. |SpaceInvaders| image:: /assets/atari10m/ddqn/spaceinvaders.gif
.. |BeamRiderNoFrameskip-v4| image:: /assets/atari10m/ddqn/BeamRiderNoFrameskip-v4.png
.. |BreakoutNoFrameskip-v4| image:: /assets/atari10m/ddqn/BreakoutNoFrameskip-v4.png
.. |PongNoFrameskip-v4| image:: /assets/atari10m/ddqn/PongNoFrameskip-v4.png
.. |QbertNoFrameskip-v4| image:: /assets/atari10m/ddqn/QbertNoFrameskip-v4.png
.. |SeaquestNoFrameskip-v4| image:: /assets/atari10m/ddqn/SeaquestNoFrameskip-v4.png
.. |SpaceInvadersNoFrameskip-v4| image:: /assets/atari10m/ddqn/SpaceInvadersNoFrameskip-v4.png
.. |BeamRiderNoFrameskip-v4| image:: /assets/atari10m/a2c/BeamRiderNoFrameskip-v4.png
.. |BreakoutNoFrameskip-v4| image:: /assets/atari10m/a2c/BreakoutNoFrameskip-v4.png
.. |PongNoFrameskip-v4| image:: /assets/atari10m/a2c/PongNoFrameskip-v4.png
.. |QbertNoFrameskip-v4| image:: /assets/atari10m/a2c/QbertNoFrameskip-v4.png
.. |SeaquestNoFrameskip-v4| image:: /assets/atari10m/a2c/SeaquestNoFrameskip-v4.png
.. |SpaceInvadersNoFrameskip-v4| image:: /assets/atari10m/a2c/SpaceInvadersNoFrameskip-v4.png
.. |Hopper-v1| image:: /assets/mujoco1m/trpo/Hopper-v1.png
.. |HalfCheetah-v1| image:: /assets/mujoco1m/trpo/HalfCheetah-v1.png
.. |Reacher-v1| image:: /assets/mujoco1m/trpo/Reacher-v1.png
.. |Swimmer-v1| image:: /assets/mujoco1m/trpo/Swimmer-v1.png
.. |InvertedDoublePendulum-v1| image:: /assets/mujoco1m/trpo/InvertedDoublePendulum-v1.png
.. |Walker2d-v1| image:: /assets/mujoco1m/trpo/Walker2d-v1.png
.. |InvertedPendulum-v1| image:: /assets/mujoco1m/trpo/InvertedPendulum-v1.png


Release file: ``yarlp-0.1.0-py3.5.egg`` (541.9 kB), Egg, Python 3.5, uploaded Apr 1, 2018.
