Reinforcement learning in pure JAX.

Dopamax

Dopamax is a library containing pure JAX implementations of common reinforcement learning algorithms. Everything is implemented in JAX, including the environments. This allows for extremely fast training and evaluation of agents, because the entire loop of environment simulation, agent interaction, and policy updates can be compiled as a single XLA program and executed on CPUs, GPUs, or TPUs. More specifically, the implementations in Dopamax follow the Anakin Podracer architecture -- see this paper for more details.
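To make the "single XLA program" idea concrete, here is a minimal sketch of an Anakin-style loop. It is not Dopamax code: `env_step`, `policy`, `rollout_return`, and `train_step` are hypothetical toy stand-ins (a 1-D environment, a one-weight linear policy, and plain gradient descent), chosen only to show how simulation, acting, and the update can all sit under one `jax.jit`.

```python
import jax
import jax.numpy as jnp

# Hypothetical toy environment (not a Dopamax API): the state drifts with the
# action, and the reward is highest when the state stays near zero.
def env_step(state, action):
    next_state = state + 0.1 * action
    reward = -jnp.square(next_state)
    return next_state, reward

# Hypothetical one-parameter linear policy.
def policy(params, state):
    return params * state

def rollout_return(params, init_state, num_steps=16):
    # lax.scan unrolls the agent/environment loop inside the compiled program.
    def step(state, _):
        action = policy(params, state)
        next_state, reward = env_step(state, action)
        return next_state, reward

    _, rewards = jax.lax.scan(step, init_state, None, length=num_steps)
    return rewards.sum()

@jax.jit
def train_step(params, init_states):
    # vmap runs a batch of environments in parallel; differentiating through
    # the rollout stands in for a real RL update rule.
    def loss_fn(p):
        returns = jax.vmap(rollout_return, in_axes=(None, 0))(p, init_states)
        return -returns.mean()

    loss, grad = jax.value_and_grad(loss_fn)(params)
    return params - 0.01 * grad, loss

params = jnp.array(0.5)
init_states = jax.random.normal(jax.random.PRNGKey(0), (8,))

losses = []
for _ in range(5):
    params, loss = train_step(params, init_states)
    losses.append(float(loss))
```

Because the environment is itself a JAX function, nothing in `train_step` has to return to Python between simulation and the parameter update, which is the property the paragraph above describes.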

Note that this repository is not actively maintained and is subject to breaking changes at any time.

Supported Algorithms

Installation

Dopamax can be installed with:

pip install dopamax

This will install the dopamax Python package, as well as a command-line interface (CLI) for training and evaluation. Note that only the CPU version of JAX is installed by default. If you would like to use a GPU or TPU, you will need to install the appropriate version of JAX. See the JAX installation instructions.
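After installing a different JAX build, a quick way to confirm which backend JAX picked up is to query it directly. This sketch uses only standard JAX calls and does not depend on Dopamax:

```python
import jax

# jax.devices() lists the devices of the default backend, and
# jax.default_backend() returns that backend's platform name as a string.
devices = jax.devices()
print(f"backend: {jax.default_backend()}, devices: {devices}")
```

If only CPU devices are listed on a machine with an accelerator, the CPU-only build of JAX is most likely still the one installed.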

Usage

After installation, the Dopamax CLI can be used to train and evaluate agents:

dopamax --help

Dopamax uses Weights and Biases (W&B) for logging and artifact management. Before using the CLI for training and evaluation, you must first make sure you have a W&B account (it's free) and have authenticated with wandb login.

Training

Agents can be trained using the dopamax train command, to which you must provide a configuration file. The configuration file is a YAML file that specifies the agent, environment, and training hyperparameters. You can find examples in the configs directory. For example, to train a PPO agent on the CartPole environment, you would run:

dopamax train --config examples/ppo-cartpole/config.yaml

Note that all of the example config files have a random seed specified, so you will get the same result every time you run the command. The seeds provided in the examples are known to result in a successful run (with the given hyperparameters). To get different results on each run, you can remove the seed from the config file.
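The reproducibility described here follows from JAX's explicit random number handling: the same seed always produces the same random stream. The sketch below illustrates that property in isolation. `sample_initial_weights` is a hypothetical helper, and how Dopamax consumes the seed internally is an assumption not shown in this document.

```python
import jax

# Hypothetical helper: draw some "initial weights" from a seeded PRNG key.
def sample_initial_weights(seed):
    key = jax.random.PRNGKey(seed)
    return jax.random.normal(key, (4,))

same_a = sample_initial_weights(0)
same_b = sample_initial_weights(0)   # same seed, identical values
different = sample_initial_weights(1)  # different seed, different values

print(bool((same_a == same_b).all()))
print(bool((same_a == different).all()))
```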

Evaluation

Once you have trained some agents, you can evaluate them using the dopamax evaluate command. This will allow you to specify a W&B agent artifact that you'd like to evaluate (these artifacts are produced by the training runs and contain the agent hyperparameters and weights from the end of training). For example, to evaluate a PPO agent trained on CartPole, you might use a command like:

dopamax evaluate --agent_artifact CartPole-PPO-agent:v0 --num_episodes 100

where --num_episodes 100 signals that you would like to roll out the agent's policy for 100 episodes. The minimum, mean, and maximum episode reward will be logged back to W&B. If you would additionally like to render the episodes and have them logged back to W&B, you can provide the --render flag. Note, however, that this will usually slow down evaluation significantly, since environment rendering is not a pure JAX function and requires callbacks to the host. You should usually only use the --render flag with a small number of episodes.
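As an illustration of the summary that gets reported, the following sketch computes the minimum, mean, and maximum over a set of per-episode rewards. The reward values and the dictionary keys are made up for the example and are not Dopamax's actual output or W&B metric names.

```python
import jax.numpy as jnp

# Hypothetical total rewards from five evaluation episodes.
episode_rewards = jnp.array([200.0, 187.0, 200.0, 154.0, 200.0])

# Reduce the per-episode rewards to the three reported statistics.
summary = {
    "min": float(episode_rewards.min()),
    "mean": float(episode_rewards.mean()),
    "max": float(episode_rewards.max()),
}
print(summary)
```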

See Also

Some of the JAX-native packages that Dopamax relies on:
