Implementation of modern reward and imitation learning algorithms.

Imitation Learning Baseline Implementations

This project aims to provide clean implementations of imitation and reward learning algorithms. We currently implement the algorithms below; the 'Discrete' and 'Continuous' columns indicate whether an algorithm supports discrete or continuous action/state spaces, respectively.

| Algorithm (+ link to paper) | API Docs | Discrete | Continuous |
| --- | --- | --- | --- |
| Behavioral Cloning | algorithms.bc | ✓ | ✓ |
| DAgger | algorithms.dagger | ✓ | ✓ |
| Density-Based Reward Modeling | algorithms.density | ✓ | ✓ |
| Maximum Causal Entropy Inverse Reinforcement Learning | algorithms.mce_irl | ✓ | ✗ |
| Adversarial Inverse Reinforcement Learning | algorithms.airl | ✓ | ✓ |
| Generative Adversarial Imitation Learning | algorithms.gail | ✓ | ✓ |
| Deep RL from Human Preferences | algorithms.preference_comparisons | ✓ | ✓ |
| Soft Q Imitation Learning | algorithms.sqil | ✓ | ✓ |

You can find the documentation here.

Installation

Prerequisites

  • Python 3.8+
  • (Optional) OpenGL (to render Gym environments)
  • (Optional) FFmpeg (to encode videos of renders)
  • (Optional) MuJoCo (follow instructions to install mujoco_py v1.5 here)

Installing PyPI release

Installing the PyPI release is the standard way to use imitation, and the recommended way for most users.

pip install imitation

Install from source

If you like, you can install imitation from source, either to contribute to the project or to access the latest features before a stable release. Clone the GitHub repository and run the installer directly. First run:

git clone https://github.com/HumanCompatibleAI/imitation && cd imitation

Then, for development mode, run:

pip install -e ".[dev]"

This will run setup.py in development mode and install the additional dependencies required for development. For regular use, instead run:

pip install .

Additional extras are available depending on your needs: tests for running the test suite, docs for building the documentation, parallel for parallelizing training, and atari for the Atari environments. The dev extra automatically installs the tests, docs, and atari dependencies, and tests in turn installs the atari dependencies.
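Extras combine with the usual pip bracket syntax; the particular combination below is purely illustrative, so substitute whichever extras you need:

pip install "imitation[tests,parallel]"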

For macOS users, some additional packages are required to run experiments (see ./experiments/README.md for details). First, install Homebrew if it is not already available (see Homebrew). Then run:

brew install coreutils gnu-getopt parallel

CLI Quickstart

We provide several CLI scripts as a front-end to the algorithms implemented in imitation. These use Sacred for configuration and reproducibility.

From examples/quickstart.sh:

# Train PPO agent on pendulum and collect expert demonstrations. TensorBoard logs saved in quickstart/rl/
python -m imitation.scripts.train_rl with pendulum environment.fast policy_evaluation.fast rl.fast fast logging.log_dir=quickstart/rl/

# Train GAIL from demonstrations. TensorBoard logs saved in output/ (default log directory).
python -m imitation.scripts.train_adversarial gail with pendulum environment.fast demonstrations.fast policy_evaluation.fast rl.fast fast demonstrations.path=quickstart/rl/rollouts/final.npz demonstrations.source=local

# Train AIRL from demonstrations. TensorBoard logs saved in output/ (default log directory).
python -m imitation.scripts.train_adversarial airl with pendulum environment.fast demonstrations.fast policy_evaluation.fast rl.fast fast demonstrations.path=quickstart/rl/rollouts/final.npz demonstrations.source=local

Tips:

  • Remove the "fast" options from the commands above to allow training to run to completion.
  • python -m imitation.scripts.train_rl print_config will list Sacred script options. These configuration options are documented in each script's docstrings.

For more information on how to configure Sacred CLI options, see the Sacred docs.

Python Interface Quickstart

See examples/quickstart.py for an example script that loads CartPole-v1 demonstrations and trains BC, GAIL, and AIRL models on that data.
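For a flavor of the API, here is a minimal BC-only sketch along the same lines. It is loosely based on the documented imitation ≥ 1.0 interfaces (gymnasium environments, vectorized via make_vec_env); the environment choice, training budgets, and wrapper setup are illustrative assumptions, so treat examples/quickstart.py as the authoritative version.

import numpy as np
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

from imitation.algorithms import bc
from imitation.data import rollout
from imitation.data.wrappers import RolloutInfoWrapper
from imitation.util.util import make_vec_env

rng = np.random.default_rng(0)
# RolloutInfoWrapper records complete episodes so they can serve as demonstrations.
env = make_vec_env(
    "CartPole-v1",
    rng=rng,
    post_wrappers=[lambda e, _: RolloutInfoWrapper(e)],
)

# Train a quick-and-dirty "expert" with PPO; a real expert would train far longer.
expert = PPO("MlpPolicy", env, verbose=0)
expert.learn(10_000)

# Roll out the expert to collect demonstration trajectories, then flatten them
# into individual transitions for supervised learning.
rollouts = rollout.rollout(
    expert,
    env,
    rollout.make_sample_until(min_episodes=50),
    rng=rng,
)
transitions = rollout.flatten_trajectories(rollouts)

# Behavioral cloning: supervised learning on the expert's state-action pairs.
bc_trainer = bc.BC(
    observation_space=env.observation_space,
    action_space=env.action_space,
    demonstrations=transitions,
    rng=rng,
)
bc_trainer.train(n_epochs=5)

mean_reward, _ = evaluate_policy(bc_trainer.policy, env, n_eval_episodes=10)
print(f"Mean reward after BC: {mean_reward:.1f}")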

Density reward baseline

We also implement a density-based reward baseline. You can find an example notebook here.
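In outline, the baseline is driven like the other algorithms; the sketch below follows the documented DensityAlgorithm interface, but the constructor arguments, the random-policy demonstrations, and the timestep budget are assumptions rather than a tested recipe, so prefer the example notebook for real use.

import numpy as np
from stable_baselines3 import PPO

from imitation.algorithms import density
from imitation.data import rollout
from imitation.data.wrappers import RolloutInfoWrapper
from imitation.util.util import make_vec_env

rng = np.random.default_rng(0)
env = make_vec_env(
    "CartPole-v1",
    rng=rng,
    post_wrappers=[lambda e, _: RolloutInfoWrapper(e)],
)

# Stand-in demonstrations from a random policy (policy=None); real use would
# roll out a trained expert instead.
demos = rollout.rollout(
    None,
    env,
    rollout.make_sample_until(min_episodes=20),
    rng=rng,
)

density_trainer = density.DensityAlgorithm(
    demonstrations=demos,
    venv=env,
    rl_algo=PPO("MlpPolicy", env),
    rng=rng,
)
density_trainer.train()  # fit a density model to the demonstrations
density_trainer.train_policy(50_000)  # optimize the RL learner against the learned reward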

Citations (BibTeX)

@misc{gleave2022imitation,
  author = {Gleave, Adam and Taufeeque, Mohammad and Rocamonde, Juan and Jenner, Erik and Wang, Steven H. and Toyer, Sam and Ernestus, Maximilian and Belrose, Nora and Emmons, Scott and Russell, Stuart},
  title = {imitation: Clean Imitation Learning Implementations},
  year = {2022},
  howpublished = {arXiv:2211.11972v1 [cs.LG]},
  archivePrefix = {arXiv},
  eprint = {2211.11972},
  primaryClass = {cs.LG},
  url = {https://arxiv.org/abs/2211.11972},
}

Contributing

See Contributing to imitation for more information.

