Implementation of modern reward and imitation learning algorithms.
Imitation Learning Baseline Implementations
This project aims to provide clean implementations of imitation and reward learning algorithms. Currently, we have implementations of the algorithms below. 'Discrete' and 'Continuous' indicate whether the algorithm supports discrete or continuous action/state spaces, respectively.
| Algorithm (+ link to paper) | API Docs | Discrete | Continuous |
|---|---|---|---|
| Behavioral Cloning | algorithms.bc | ✅ | ✅ |
| DAgger | algorithms.dagger | ✅ | ✅ |
| Density-Based Reward Modeling | algorithms.density | ✅ | ✅ |
| Maximum Causal Entropy Inverse Reinforcement Learning | algorithms.mce_irl | ✅ | ❌ |
| Adversarial Inverse Reinforcement Learning | algorithms.airl | ✅ | ✅ |
| Generative Adversarial Imitation Learning | algorithms.gail | ✅ | ✅ |
| Deep RL from Human Preferences | algorithms.preference_comparisons | ✅ | ✅ |
| Soft Q Imitation Learning | algorithms.sqil | ✅ | ❌ |
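Behavioral cloning, the first entry in the table, reduces imitation to supervised learning on the expert's (state, action) pairs. The snippet below is a minimal, self-contained sketch of that idea for a small discrete problem, assuming demonstrations are plain lists of `(state, action)` tuples. `fit_tabular_bc` is a hypothetical helper for illustration only; it is not the `algorithms.bc` API, which fits a neural-network policy by default.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, Hashable, List, Tuple


def fit_tabular_bc(demos: List[Tuple[Hashable, Hashable]]) -> Dict[Hashable, Hashable]:
    """Greedy tabular BC policy: map each seen state to the expert's most frequent action.

    The empirical action frequencies per state are the maximum-likelihood estimate
    of a categorical policy; taking their mode gives a deterministic policy.
    """
    counts: DefaultDict[Hashable, Counter] = defaultdict(Counter)
    for state, action in demos:
        counts[state][action] += 1
    # most_common(1) returns [(action, count)]; ties go to the action seen first.
    return {state: c.most_common(1)[0][0] for state, c in counts.items()}


demos = [(0, "right"), (0, "right"), (0, "left"), (1, "left"), (2, "right")]
print(fit_tabular_bc(demos))  # {0: 'right', 1: 'left', 2: 'right'}
```

States the expert never visited get no action at all, which is one motivation for DAgger: it queries the expert on the states the learner itself reaches and adds those labels to the training data.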
You can find the documentation here.
You can read the latest benchmark results here.
Installation
Prerequisites
- Python 3.8+
- (Optional) OpenGL (to render Gymnasium environments)
- (Optional) FFmpeg (to encode videos of renders)
Note: imitation is only compatible with the newer Gymnasium environment API and does not support the older gym API.
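The practical difference between the two APIs lies in `reset` and `step`: in Gymnasium, `reset` returns `(observation, info)` and `step` returns a five-tuple `(observation, reward, terminated, truncated, info)`, whereas the older gym API returned only the observation from `reset` and a four-tuple with a single `done` flag from `step`. The toy environment below is a dependency-free sketch that follows the Gymnasium conventions; `CountdownEnv` is hypothetical and deliberately does not subclass `gymnasium.Env`.

```python
from typing import Any, Dict, Optional, Tuple


class CountdownEnv:
    """Hypothetical toy environment mimicking Gymnasium's reset/step signatures.

    The state is a counter starting at `start`; action 1 decrements it and action 0
    leaves it unchanged. Reaching zero terminates the episode; otherwise the episode
    is truncated after `max_steps` steps.
    """

    def __init__(self, start: int = 3, max_steps: int = 10) -> None:
        self.start = start
        self.max_steps = max_steps
        self.state = start
        self.t = 0

    def reset(
        self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None
    ) -> Tuple[int, Dict[str, Any]]:
        # `seed` is accepted for API compatibility; this toy env is deterministic.
        self.state, self.t = self.start, 0
        return self.state, {}

    def step(self, action: int) -> Tuple[int, float, bool, bool, Dict[str, Any]]:
        self.state -= int(action == 1)
        self.t += 1
        terminated = self.state == 0  # task-defined end of the episode
        truncated = not terminated and self.t >= self.max_steps  # time limit hit
        reward = 1.0 if terminated else 0.0
        return self.state, reward, terminated, truncated, {}


env = CountdownEnv(start=3)
obs, info = env.reset(seed=0)
done, steps = False, 0
while not done:
    obs, reward, terminated, truncated, info = env.step(1)
    done = terminated or truncated  # the old gym `done` flag merged both cases
    steps += 1
print(steps, obs, reward, terminated, truncated)  # 3 0 1.0 True False
```

Keeping `terminated` and `truncated` separate matters for learning algorithms: a time-limit cutoff is not a true end state, so value estimates should still bootstrap from the next observation in that case.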
Installing PyPI release
Installing the PyPI release is the standard way to use imitation, and the recommended way for most users.
pip install imitation
Install from source
If you like, you can install imitation from source to contribute to the project or to access the latest features before a stable release. You can do this by cloning the GitHub repository and running the installer directly. First run:
git clone https://github.com/HumanCompatibleAI/imitation && cd imitation
For development mode, then run:
pip install -e ".[dev]"
This will run setup.py in development mode and install the additional dependencies required for development. For regular use, instead run:
pip install .
Additional extras are available depending on your needs: tests for running the test suite, docs for building the documentation, parallel for parallelizing training, and atari for including Atari environments. The dev extra already installs the tests, docs, and atari dependencies, and tests in turn installs the atari dependencies.
macOS users need some additional packages to run experiments (see ./experiments/README.md for details). First, install Homebrew if it is not already available (see Homebrew). Then run:
brew install coreutils gnu-getopt parallel
CLI Quickstart
We provide several CLI scripts as a front-end to the algorithms implemented in imitation. These use Sacred for configuration and replicability.
# Train PPO agent on pendulum and collect expert demonstrations. Tensorboard logs saved in quickstart/rl/
python -m imitation.scripts.train_rl with pendulum environment.fast policy_evaluation.fast rl.fast fast logging.log_dir=quickstart/rl/
# Train GAIL from demonstrations. Tensorboard logs saved in output/ (default log directory).
python -m imitation.scripts.train_adversarial gail with pendulum environment.fast demonstrations.fast policy_evaluation.fast rl.fast fast demonstrations.path=quickstart/rl/rollouts/final.npz demonstrations.source=local
# Train AIRL from demonstrations. Tensorboard logs saved in output/ (default log directory).
python -m imitation.scripts.train_adversarial airl with pendulum environment.fast demonstrations.fast policy_evaluation.fast rl.fast fast demonstrations.path=quickstart/rl/rollouts/final.npz demonstrations.source=local
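Both adversarial commands train a discriminator D that estimates the probability that a transition came from the expert rather than from the current policy, and turn its output into a reward for the RL learner. As a hedged illustration, the snippet computes two commonly used forms from the discriminator's logit (D = sigmoid(logit)); for AIRL, the logit is typically structured as f(s, a, s') - log π(a|s). The exact convention imitation uses (for example, any reward normalization or shaping) is not stated here and may differ; the function names are hypothetical.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x)), avoiding overflow for large |x|."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def gail_reward(logit: float) -> float:
    """-log(1 - D): grows as the discriminator judges the sample more expert-like."""
    # log(1 - sigmoid(x)) == log(sigmoid(-x))
    return -log_sigmoid(-logit)


def airl_reward(logit: float) -> float:
    """log D - log(1 - D), which simplifies algebraically to the logit itself."""
    return log_sigmoid(logit) - log_sigmoid(-logit)


for logit in (-2.0, 0.0, 2.0):
    print(logit, round(gail_reward(logit), 4), round(airl_reward(logit), 4))
```

The GAIL form is always positive, while the AIRL form can be negative; this difference in sign can affect how each reward interacts with episode termination.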
Tips:
- Remove the "fast" options from the commands above to let training run to completion.
- python -m imitation.scripts.train_rl print_config will list the Sacred script options. These configuration options are documented in each script's docstrings.
For more information on how to configure Sacred CLI options, see the Sacred docs.
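In the commands above, bare words after with (such as pendulum or fast) select named configurations, while key=value pairs such as logging.log_dir=quickstart/rl/ override individual, possibly nested, entries. The function below is a hypothetical, dependency-free sketch of how such dotted overrides can be applied to a nested configuration dictionary. It is not Sacred's implementation and ignores features such as config scopes and Sacred's own value-parsing rules.

```python
import ast
from typing import Any, Dict


def apply_override(config: Dict[str, Any], assignment: str) -> None:
    """Apply one 'a.b.c=value' assignment to a nested dict in place (hypothetical helper)."""
    key, _, raw_value = assignment.partition("=")
    try:
        # Parse Python literals: numbers, booleans, quoted strings, lists, ...
        value: Any = ast.literal_eval(raw_value)
    except (ValueError, SyntaxError, TypeError):
        # Anything else (e.g. an unquoted path) is kept as the raw string.
        value = raw_value
    *parents, leaf = key.split(".")
    node = config
    for name in parents:
        node = node.setdefault(name, {})  # create intermediate sections as needed
    node[leaf] = value


config: Dict[str, Any] = {"logging": {"log_dir": "output/"}}
for arg in ["logging.log_dir=quickstart/rl/", "demonstrations.source=local", "seed=0"]:
    apply_override(config, arg)
print(config)
# {'logging': {'log_dir': 'quickstart/rl/'}, 'demonstrations': {'source': 'local'}, 'seed': 0}
```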
Python Interface Quickstart
See examples/quickstart.py for an example script that loads CartPole-v1 demonstrations and trains BC, GAIL, and AIRL models on that data.
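Algorithms such as BC consume demonstrations as a flat batch of transitions rather than whole trajectories. The sketch below shows that flattening step in plain Python, assuming each trajectory stores one more observation than actions (the final observation has no action). Trajectory, Transition and flatten are simplified hypothetical stand-ins, not imitation's own data types.

```python
from dataclasses import dataclass
from typing import Any, List, Sequence


@dataclass(frozen=True)
class Trajectory:
    obs: Sequence[Any]  # length len(acts) + 1
    acts: Sequence[Any]
    terminal: bool  # True if the episode ended naturally, False if it was cut off


@dataclass(frozen=True)
class Transition:
    obs: Any
    act: Any
    next_obs: Any
    done: bool


def flatten(trajectories: List[Trajectory]) -> List[Transition]:
    """Split trajectories into (obs, act, next_obs, done) transitions."""
    out: List[Transition] = []
    for traj in trajectories:
        if len(traj.obs) != len(traj.acts) + 1:
            raise ValueError("expected exactly one more observation than actions")
        n = len(traj.acts)
        for i in range(n):
            # Only the last step of a naturally terminated episode is marked done.
            done = traj.terminal and i == n - 1
            out.append(Transition(traj.obs[i], traj.acts[i], traj.obs[i + 1], done))
    return out


demo = Trajectory(obs=[0, 1, 2], acts=["a", "b"], terminal=True)
for t in flatten([demo]):
    print(t)
# Transition(obs=0, act='a', next_obs=1, done=False)
# Transition(obs=1, act='b', next_obs=2, done=True)
```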
Density reward baseline
We also implement a density-based reward baseline. You can find an example notebook here.
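The idea behind a density-based reward is to fit a density model to the expert's states (or state-action pairs) and use the log-density as the reward, so behaviour that resembles the demonstrations scores higher. Below is a minimal one-dimensional sketch using a Gaussian kernel density estimate. The helper name, the default bandwidth, and the choice of KDE are assumptions for illustration; algorithms.density may use different estimators and input features.

```python
import math
from typing import Callable, Sequence


def kde_log_reward(expert_states: Sequence[float], bandwidth: float = 0.5) -> Callable[[float], float]:
    """Return r(s) = log p_hat(s), where p_hat is a Gaussian KDE fit to expert states."""
    if not expert_states or bandwidth <= 0:
        raise ValueError("need at least one expert state and a positive bandwidth")
    n = len(expert_states)
    # p_hat(s) = 1 / (n * h * sqrt(2*pi)) * sum_i exp(-0.5 * ((s - x_i) / h) ** 2)
    log_norm = math.log(n * bandwidth * math.sqrt(2 * math.pi))

    def reward(s: float) -> float:
        # Log-sum-exp over kernel exponents keeps this stable far from the data.
        exps = [-0.5 * ((s - x) / bandwidth) ** 2 for x in expert_states]
        m = max(exps)
        return m + math.log(sum(math.exp(v - m) for v in exps)) - log_norm

    return reward


reward = kde_log_reward([0.0, 0.1, -0.2, 0.05])
print(reward(0.0) > reward(3.0))  # True: states near the demonstrations earn more
```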
Citations (BibTeX)
@misc{gleave2022imitation,
author = {Gleave, Adam and Taufeeque, Mohammad and Rocamonde, Juan and Jenner, Erik and Wang, Steven H. and Toyer, Sam and Ernestus, Maximilian and Belrose, Nora and Emmons, Scott and Russell, Stuart},
title = {imitation: Clean Imitation Learning Implementations},
year = {2022},
howPublished = {arXiv:2211.11972v1 [cs.LG]},
archivePrefix = {arXiv},
eprint = {2211.11972},
primaryClass = {cs.LG},
url = {https://arxiv.org/abs/2211.11972},
}
Contributing
See Contributing to imitation for more information.
Download files
File details
Details for the file imitation-1.0.1.tar.gz.
File metadata
- Download URL: imitation-1.0.1.tar.gz
- Upload date:
- Size: 1.1 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.12.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7b6341e41bf4c8a572b05dccd819d7a70df6244318559fe83e41c9444f00d29c |
| MD5 | 7659bd045ced58720cd37829edfc043c |
| BLAKE2b-256 | 175ee31945c03461a8f9e3c285c552a27f6d5bfe9e10354bf8bb08c00c697574 |
File details
Details for the file imitation-1.0.1-py3-none-any.whl.
File metadata
- Download URL: imitation-1.0.1-py3-none-any.whl
- Upload date:
- Size: 216.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.12.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b2024f479c0871000a9abf07c09669143332cd66b67b20c13683ff9b80e075b3 |
| MD5 | e48a875f972f8367cb4f947c036673cd |
| BLAKE2b-256 | 57ab6a08a515b9d7fe4d6317fbf13723784c69fa30623bbcb91d14b9fdd5a5c8 |