Reinforcement Learning: An Introduction

Introduction

This is an implementation of concepts and algorithms described in "Reinforcement Learning: An Introduction" (Sutton and Barto, 2018, 2nd edition). It is a work in progress, implemented with the following objectives in mind.

Complete conceptual and algorithmic coverage: Implement all concepts and algorithms described in the text, plus some that go beyond it.
Minimal dependencies: All computation specific to the text is implemented here.
Complete test coverage: All implementations are paired with unit tests.
General-purpose design: The text provides concise pseudocode that is not difficult to implement for the examples covered; however, such implementations do not necessarily lead to reusable, extensible code. The approach taken here is intended to apply well beyond the examples in the text.

Please see the project website for a nicer version of this page.

Quick Start

A Binder notebook provides single-click access to a graphical interface for RLAI.

Note that Binder notebooks are hosted for free by sponsors who donate computational infrastructure. Limitations are placed on each notebook, so don't expect the Binder interface to support heavy workloads. See the following section for alternatives.

Installation and Use

RLAI requires swig and ffmpeg to be installed on the system. These can be installed using your OS package manager (e.g., Homebrew on macOS or apt on Ubuntu). If ffmpeg is installed with Homebrew on macOS, you might need to add an environment variable pointing to it as follows:

echo 'export IMAGEIO_FFMPEG_EXE="/opt/homebrew/bin/ffmpeg"' >> ~/.bash_profile

To use the HEADLESS=True environment variable locally on macOS, install XQuartz (which provides Xvfb) using Homebrew and link Xvfb into /usr/local/bin:

brew install --cask xquartz
sudo ln -s /opt/X11/bin/Xvfb /usr/local/bin/Xvfb
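
Before running RLAI, it can help to confirm that the system tools named above are actually discoverable. The following is a minimal sketch, not part of RLAI: the function names are hypothetical, and it assumes that an IMAGEIO_FFMPEG_EXE value pointing at an existing file should take precedence over whatever ffmpeg is found on the PATH.

```python
import os
import shutil

# Tool names come from the installation notes above.
REQUIRED_TOOLS = ("swig", "ffmpeg")
OPTIONAL_TOOLS = ("Xvfb",)  # Only relevant when HEADLESS=True is used.


def check_system_tools(path=None):
    """Map each tool name to its resolved executable path, or None if not found."""
    found = {
        tool: shutil.which(tool, path=path)
        for tool in REQUIRED_TOOLS + OPTIONAL_TOOLS
    }
    # Assumption: an override that points at an existing file wins for ffmpeg.
    override = os.environ.get("IMAGEIO_FFMPEG_EXE")
    if override and os.path.isfile(override):
        found["ffmpeg"] = override
    return found


def missing_required(found):
    """List the required tools that could not be located."""
    return [tool for tool in REQUIRED_TOOLS if found.get(tool) is None]


if __name__ == "__main__":
    status = check_system_tools()
    for tool, location in status.items():
        print(f"{tool}: {location or 'not found'}")
    if missing_required(status):
        print("Missing required tools:", ", ".join(missing_required(status)))
```

Running the script prints one line per tool, which makes it easy to see whether the Homebrew paths above were picked up by the current shell.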

The RLAI code is distributed via PyPI. There are several ways to use the package.

JupyterLab notebook: Most of the RLAI functionality is exposed via the companion JupyterLab notebook. See the JupyterLab guide for more information.
Package dependency: See the example repository for how a project can be structured to consume the RLAI package functionality within source code.
Command-line interface: Using RLAI from the command-line interface (CLI) is demonstrated in the case studies below and is also explored in the CLI guide.
See here for how to use RLAI on a Raspberry Pi system.
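
When consuming RLAI as a package dependency, a quick way to confirm that it is installed in the active environment is to query the package metadata. This sketch uses only the standard library and assumes the distribution name is `rlai`; the helper name is hypothetical.

```python
from importlib import metadata


def installed_version(distribution="rlai"):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("rlai is not installed in this environment")
    else:
        print(f"rlai {version}")
```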

Development

Looking for a place to dig in? Below are a few ideas organized by area of interest.

Explore new Gym environments: Gym provides a wide range of interesting environments, and experimenting with them can be as simple as modifying an existing training command (e.g., the one for inverted pendulum) and replacing the --gym-id with something else. Other changes might be needed depending on the environment, but Gym is particularly convenient.
Incorporate new statistical learning methods: The RLAI SKLearnSGD module demonstrates how to use methods in scikit-learn (in this case stochastic gradient descent regression) to approximate state-action value functions. This is just one approach, and it would be interesting to compare time, memory, and reward performance with a nonparametric approach like KNN regression.
Feel free to ask questions, submit issues, and submit pull requests.
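
As a starting point for the KNN idea mentioned above, the sketch below shows what a nonparametric state-action value estimator could look like in plain Python. It is an illustration only: the class and method names are hypothetical and do not follow the RLAI SKLearnSGD interface, and `features` is assumed to be a numeric feature vector for a state-action pair.

```python
import math
from typing import List, Sequence, Tuple


class KnnValueEstimator:
    """Estimate a state-action value as the mean target of the k nearest samples."""

    def __init__(self, k: int = 3) -> None:
        if k < 1:
            raise ValueError("k must be at least 1")
        self.k = k
        self.samples: List[Tuple[Tuple[float, ...], float]] = []

    def update(self, features: Sequence[float], target: float) -> None:
        # "Training" only stores the sample, so memory grows with experience.
        self.samples.append((tuple(features), float(target)))

    def predict(self, features: Sequence[float]) -> float:
        if not self.samples:
            return 0.0  # Arbitrary default before any experience is stored.
        point = tuple(features)
        # Sort stored samples by Euclidean distance and keep the k closest.
        nearest = sorted(self.samples, key=lambda s: math.dist(s[0], point))[: self.k]
        return sum(target for _, target in nearest) / len(nearest)


if __name__ == "__main__":
    estimator = KnnValueEstimator(k=2)
    estimator.update([0.0, 0.0], 1.0)
    estimator.update([0.1, 0.0], 3.0)
    estimator.update([5.0, 5.0], 10.0)
    # Averages the targets of the two samples closest to the query point.
    print(estimator.predict([0.05, 0.0]))
```

Unlike stochastic gradient descent regression, this estimator has no parameters to fit, but every prediction scans all stored samples, which is exactly the time and memory trade-off worth measuring.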

Features

Diagnostic and interpretation tools: Diagnostic and interpretation tools become critical as the environment and agent increase in complexity (e.g., from tabular methods in small, discrete-space gridworlds to value function approximation methods in large, continuous-space control problems). Such tools can be found here.

Case Studies

The gridworld and other simple environments (e.g., gambler's problem) are used throughout the package to develop, implement, and test algorithmic concepts. Sutton and Barto do a nice job of explaining how reinforcement learning works for these environments. Below is a list of environments that are not covered in as much detail (e.g., the mountain car) or are not covered at all (e.g., Robocode). They are more difficult to train agents for and are instructive for understanding how agents are parameterized and rewarded.

Gymnasium

Gymnasium is a collection of environments that range from traditional control to advanced robotics. Case studies have been developed for the following environments, which are ordered roughly by increasing complexity:

Inverted Pendulum
Acrobot
Mountain Car
Mountain Car with Continuous Control
Lunar Lander with Continuous Control
MuJoCo Swimming Worm with Continuous Control
- A follow-up using process-level parallelization for faster, better results.
- See the MuJoCo section below for tips on installing MuJoCo.

MuJoCo

RLAI works with MuJoCo either via Gymnasium described above or directly via the MuJoCo-provided Python bindings. On macOS, see here for how to fix OpenGL errors.

Robocode

Robocode is a simulation-based robotic combat programming game with a dynamically rich environment, multi-agent teaming, and a large user community. Read more here.

Figures from the Textbook

A list of figures can be found here. Most of these are reproductions of those shown in the Sutton and Barto text; however, even the reproductions typically provide detail not shown in the text.

Links to Code

See here.

Bumping, Tagging, and Releasing Versions with Poetry

We follow semantic versioning and Python Packaging specifications when bumping and releasing.

Prerelease

Prereleases are useful for testing changes prior to an official release. These releases include alpha (a), beta (b), and release candidate (rc) versions, which are successively more mature phases on the path to an official release.

Bump the minor prerelease (e.g., 0.2.0 → 0.3.0a0):

OLD_VERSION=$(poetry version --short)
poetry version preminor
VERSION=$(poetry version --short)
git commit -a -m "Bump minor prerelease: ${OLD_VERSION} → ${VERSION}"
git push

Bump the prerelease number within the current prerelease phase (e.g., 0.1.0a0 → 0.1.0a1):

OLD_VERSION=$(poetry version --short)
poetry version prerelease
VERSION=$(poetry version --short)
git commit -a -m "Bump prerelease number: ${OLD_VERSION} → ${VERSION}"
git push

Bump the prerelease phase (e.g., 0.1.0a1 → 0.1.0b0):

OLD_VERSION=$(poetry version --short)
poetry version prerelease --next-phase
VERSION=$(poetry version --short)
git commit -a -m "Bump prerelease phase: ${OLD_VERSION} → ${VERSION}"
git push

The prerelease phases progress as alpha (a), beta (b), and release candidate (rc), each time resetting to a prerelease number of 0. After rc, the prerelease suffix (e.g., rc3) is stripped, leaving the major.minor.patch release version.
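
The progression described above can be sketched in a few lines of Python. This is an illustration of the rules as stated here, not Poetry's implementation; the function names and the simple `major.minor.patch` plus suffix pattern are assumptions.

```python
import re

PHASES = ("a", "b", "rc")  # alpha, beta, release candidate
_PRERELEASE = re.compile(r"^(\d+\.\d+\.\d+)(a|b|rc)(\d+)$")


def _parse(version: str):
    match = _PRERELEASE.match(version)
    if match is None:
        raise ValueError(f"not a prerelease version: {version!r}")
    release, phase, number = match.groups()
    return release, phase, int(number)


def next_number(version: str) -> str:
    """Increment the prerelease number within the current phase."""
    release, phase, number = _parse(version)
    return f"{release}{phase}{number + 1}"


def next_phase(version: str) -> str:
    """Move to the next phase, resetting the prerelease number to 0."""
    release, phase, _ = _parse(version)
    index = PHASES.index(phase)
    if index == len(PHASES) - 1:
        return release  # After rc, the suffix is stripped.
    return f"{release}{PHASES[index + 1]}0"


if __name__ == "__main__":
    version = "0.1.0a0"
    version = next_number(version)  # 0.1.0a1
    while True:
        print(version)
        if _PRERELEASE.match(version) is None:
            break  # Reached the major.minor.patch release version.
        version = next_phase(version)
```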

Patch

A patch release fixes one or more issues in a previous release.

Bump the patch version (e.g., 0.1.0 → 0.1.1):

OLD_VERSION=$(poetry version --short)
poetry version patch
VERSION=$(poetry version --short)
git commit -a -m "Bump patch: ${OLD_VERSION} → ${VERSION}"
git push

Minor

A minor release adds functionality in a backward-compatible fashion.

Bump the minor version (e.g., 0.1.0b1 → 0.1.0):

OLD_VERSION=$(poetry version --short)
poetry version minor
VERSION=$(poetry version --short)
git commit -a -m "Bump minor: ${OLD_VERSION} → ${VERSION}"
git push

Major

A major release makes changes that are not backward compatible.

Bump the major version (e.g., 0.1.0a0 → 1.0.0):

OLD_VERSION=$(poetry version --short)
poetry version major
VERSION=$(poetry version --short)
git commit -a -m "Bump major: ${OLD_VERSION} → ${VERSION}"
git push
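
For stable versions, the patch, minor, and major bumps follow the usual semantic versioning arithmetic, which the sketch below illustrates. The `bump` helper is hypothetical and handles only plain `major.minor.patch` strings; Poetry applies additional rules when the current version is a prerelease, and this sketch does not attempt to reproduce them.

```python
def bump(version: str, part: str) -> str:
    """Bump a stable major.minor.patch version; lower-order parts reset to 0."""
    # int() raises ValueError for prerelease input such as "0.1.0b1".
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


if __name__ == "__main__":
    for part in ("patch", "minor", "major"):
        print(part, bump("0.1.0", part))
```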

Tagging

Tagging the current version enables the publication of a new release to PyPI via a GitHub workflow. Tag the current version (e.g., v2.0.0):

VERSION=$(poetry version --short)
git tag -a -m "version ${VERSION}" "v${VERSION}"
git push --follow-tags

Then create a new release from the tag. Doing this will trigger the publication workflow to run, which builds a new release and uploads it to PyPI.

Release history

1.6.0 (this version): Jun 14, 2026
1.4.0: Aug 13, 2024
1.3.0: Jun 9, 2024
1.0.0: May 26, 2022
0.23.0: Dec 16, 2021
0.22.0: Dec 15, 2021
0.21.0: Sep 27, 2021
0.21.0.dev0 (pre-release): Sep 26, 2021
0.20.0: Sep 5, 2021
0.19.0: Aug 15, 2021
0.19.0.dev0 (pre-release): Aug 15, 2021
0.17.0: Feb 17, 2021
0.16.0: Feb 15, 2021
0.15.0: Jan 24, 2021
0.14.0: Dec 27, 2020
0.13.0: Dec 10, 2020
0.12.0: Dec 7, 2020
0.11.0: Nov 30, 2020
0.10.0: Nov 27, 2020
0.10.0.dev0 (pre-release): Nov 27, 2020
0.8.0: Nov 6, 2020
0.7.0: Nov 4, 2020
0.6.0: Nov 2, 2020
0.5.0: Oct 30, 2020

Download files

Source Distribution

rlai-1.6.0.tar.gz (603.7 kB)

Uploaded Jun 14, 2026 Source

Built Distribution

rlai-1.6.0-py3-none-any.whl (635.5 kB)

Uploaded Jun 14, 2026 Python 3

File details

Details for the file rlai-1.6.0.tar.gz.

File metadata

Download URL: rlai-1.6.0.tar.gz
Upload date: Jun 14, 2026
Size: 603.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.4.1 CPython/3.11.15 Linux/6.17.0-1018-azure

File hashes

Hashes for rlai-1.6.0.tar.gz
Algorithm	Hash digest
SHA256	`2fe8abd3b0aa22065e2fc3b8862d940ecd8d2d1742ecdab331a65f4b22e63c33`
MD5	`416595dc72f797fb2a147b506ba317a2`
BLAKE2b-256	`e96aa16ca8d03ae9590b92de05ae890c4a1c610629f9cc74078e98d555a45203`

File details

Details for the file rlai-1.6.0-py3-none-any.whl.

File metadata

Download URL: rlai-1.6.0-py3-none-any.whl
Upload date: Jun 14, 2026
Size: 635.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.4.1 CPython/3.11.15 Linux/6.17.0-1018-azure

File hashes

Hashes for rlai-1.6.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`72bd1459c97274ef6d924b5a0d41abe9b81fb2778876d9deedae7593119cce47`
MD5	`50312f8b2ff6e9165707989973f1c7c5`
BLAKE2b-256	`2094e84a7d73700d68abe0b16b6dff0ed9393f17776480364dd19bb7dff26826`
