Reinforcement learning in pure JAX.

Dopamax

Dopamax is a library containing pure JAX implementations of common reinforcement learning algorithms. Everything is implemented in JAX, including the environments. This allows for extremely fast training and evaluation of agents, because the entire loop of environment simulation, agent interaction, and policy updates can be compiled as a single XLA program and executed on CPUs, GPUs, or TPUs. More specifically, the implementations in Dopamax follow the Anakin Podracer architecture; see this paper for more details.
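To make the "single XLA program" idea concrete, here is a minimal sketch in plain JAX. It is not Dopamax code: the toy environment, the one-parameter policy, and every name below are hypothetical, and the update rule is plain gradient ascent through differentiable dynamics, used as a stand-in for a real algorithm such as PPO. What it illustrates is that the environment step, the rollout, and the parameter update all live inside one jitted function.

```python
import jax
import jax.numpy as jnp

NUM_STEPS = 32        # steps per rollout (arbitrary choice)
LEARNING_RATE = 0.01  # step size for the toy update (arbitrary choice)


def env_step(state, action):
    """Hypothetical 1-D environment: the action nudges a point along a line."""
    new_state = state + 0.1 * action
    reward = -new_state ** 2  # reward is highest at the origin
    return new_state, reward


def policy(params, state):
    """Hypothetical deterministic policy with a single scalar parameter."""
    return jnp.tanh(params * state)


def episode_return(params, init_state):
    """Roll out one episode with lax.scan and sum the rewards."""
    def body(state, _):
        new_state, reward = env_step(state, policy(params, state))
        return new_state, reward

    _, rewards = jax.lax.scan(body, init_state, None, length=NUM_STEPS)
    return rewards.sum()


def mean_return(params, init_states):
    """Vectorize the rollout over a batch of environments with vmap."""
    returns = jax.vmap(episode_return, in_axes=(None, 0))(params, init_states)
    return returns.mean()


@jax.jit
def train_step(params, init_states):
    """Simulation, interaction, and update compiled together by XLA."""
    value, grad = jax.value_and_grad(mean_return)(params, init_states)
    return params + LEARNING_RATE * grad, value


init_states = jax.random.uniform(
    jax.random.PRNGKey(0), (64,), minval=-1.0, maxval=1.0
)
params = jnp.array(0.0)
values = []
for _ in range(20):
    params, value = train_step(params, init_states)
    values.append(float(value))
```

Because nothing inside `train_step` returns to Python, the same compiled program runs on whichever backend JAX is using, with no host round trips between environment steps.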

Supported Algorithms

Installation

Dopamax can be installed with:

pip install git+https://github.com/rystrauss/dopamax.git

This will install the dopamax Python package, as well as a command-line interface (CLI) for training and evaluation.

Usage

After installation, the Dopamax CLI can be used to train and evaluate agents:

dopamax --help

Dopamax uses Weights and Biases (W&B) for logging and artifact management. Before using the CLI for training and evaluation, you must first make sure you have a W&B account (it's free) and have authenticated with wandb login.

Training

Agents can be trained using the dopamax train command, to which you must provide a configuration file. The configuration file is a YAML file that specifies the agent, environment, and training hyperparameters. You can find examples in the configs directory. For example, to train a PPO agent on the CartPole environment, you would run:

dopamax train --config examples/ppo-cartpole/config.yaml

Note that all of the example config files have a random seed specified, so you will get the same result every time you run the command. The seeds provided in the examples are known to result in a successful run (with the given hyperparameters). To get different results on each run, you can remove the seed from the config file.
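The reason a fixed seed reproduces a run is that JAX random numbers are a pure function of an explicit key. The sketch below shows that general JAX behavior; how Dopamax consumes the seed internally is not described here, and `init_weights` is a hypothetical helper used only for illustration.

```python
import jax
import jax.numpy as jnp


def init_weights(seed):
    """Hypothetical helper: draw four weights from a key built from `seed`."""
    key = jax.random.PRNGKey(seed)
    return jax.random.normal(key, (4,))


same_a = init_weights(0)
same_b = init_weights(0)     # same seed, so the same values
different = init_weights(1)  # different seed, so different values

print(bool(jnp.array_equal(same_a, same_b)))
print(bool(jnp.array_equal(same_a, different)))
```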

Evaluation

Once you have trained some agents, you can evaluate them using the dopamax evaluate command. This will allow you to specify a W&B agent artifact that you'd like to evaluate (these artifacts are produced by the training runs and contain the agent hyperparameters and weights from the end of training). For example, to evaluate a PPO agent trained on CartPole, you might use a command like:

dopamax evaluate --agent_artifact CartPole-PPO-agent:v0 --num_episodes 100

where --num_episodes 100 signals that you would like to roll out the agent's policy for 100 episodes. The minimum, mean, and maximum episode reward will be logged back to W&B. If you would additionally like to render the episodes and have them logged back to W&B, you can provide the --render flag. Note, however, that this will usually slow down evaluation significantly, since environment rendering is not a pure JAX function and requires callbacks to the host. You should usually only use the --render flag with a small number of episodes.
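As a rough illustration of the reported statistics, the sketch below computes the minimum, mean, and maximum episode reward from per-step rewards. The padded `(episodes, steps)` layout, the mask, and the function name are assumptions made for this example, not Dopamax's actual data format.

```python
import jax
import jax.numpy as jnp


@jax.jit
def episode_stats(step_rewards, step_mask):
    """Sum the valid steps of each episode, then summarize across episodes."""
    episode_rewards = jnp.sum(step_rewards * step_mask, axis=1)
    return {
        "min": episode_rewards.min(),
        "mean": episode_rewards.mean(),
        "max": episode_rewards.max(),
    }


# Three hypothetical episodes of lengths 3, 5, and 2, padded to 5 steps,
# with a reward of 1 per step (as in CartPole).
lengths = jnp.array([3, 5, 2])
step_mask = (jnp.arange(5)[None, :] < lengths[:, None]).astype(jnp.float32)
step_rewards = jnp.ones((3, 5))

stats = episode_stats(step_rewards, step_mask)
```

Since this summary is itself a pure JAX function, it can stay on the accelerator with the rollout; only rendering forces a trip back to the host.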
