
A modular, meta-learning-ready RL library.

Project description



Mighty

Welcome to Mighty, hopefully your future one-stop shop for everything contextual RL (cRL). Mighty is still in its early stages, with support for standard Gymnasium environments, DACBench, and CARL. The interface is controlled through Hydra, and we provide DQN, PPO, and SAC algorithms. We log training and regular evaluations to file and, optionally, to wandb. If you have any questions or feedback, please tell us, ideally via GitHub issues! If you want to get started immediately, use our template repository.

Mighty features:

  • Modular structure for easy (Meta-)RL tinkering
  • PPO, SAC and DQN as base algorithms
  • Environment integrations via Gymnasium, Pufferlib, CARL & DACBench
  • Implementations of some important baselines: RND, PLR, Cosine LR Schedule and more!

Installation

We recommend using uv to install and run Mighty in a virtual environment. The code has been tested with Python 3.11 on Unix systems.

First create a clean python environment:

uv venv --python=3.11
source .venv/bin/activate

Then install Mighty:

make install

Optionally, you can install the dev requirements directly:

make install-dev

Alternatively, you can install Mighty from PyPI:

pip install mighty-rl

Run a Mighty Agent

In order to run a Mighty agent, use the run_mighty.py script and provide any training options as keyword overrides. To see all configuration options, call:

python mighty/run_mighty.py --help

An example for running the PPO agent on the Pendulum gym environment looks like this:

python mighty/run_mighty.py 'algorithm=ppo' 'environment=gymnasium/pendulum'

Train your Agent on a CARL Environment

Mighty is designed with contextual RL in mind and therefore fully compatible with CARL. Before you start training, however, please follow the installation instructions in the CARL repo.

Then use the same command as before, but provide the CARL environment, in this example CARLCartPoleEnv, and information about the context distribution as keywords:

python mighty/run_mighty.py 'algorithm=ppo' 'env=CARLCartPole' '+env_kwargs.num_contexts=10' '+env_kwargs.context_feature_args.gravity=[normal, 9.8, 1.0, -100.0, 100.0]' 'env_wrappers=[mighty.mighty_utils.wrappers.FlattenVecObs]' 'algorithm_kwargs.rollout_buffer_kwargs.buffer_size=2048'

For more complex configurations like this, we recommend making an environment configuration file. Check out our CARL Ant file to see how this simplifies the process of working with configurable environments.
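As a rough illustration of what such an environment configuration file could look like, here is a hypothetical Hydra YAML sketch that mirrors the command-line overrides above. The file name, location, and default values are assumptions for illustration, not taken from the Mighty repository; consult the CARL Ant file mentioned above for the real layout.

```yaml
# Hypothetical file: configs/environment/carl_cartpole.yaml
# Mirrors the CLI overrides from the command above.
env: CARLCartPole
env_kwargs:
  num_contexts: 10
  context_feature_args:
    gravity: [normal, 9.8, 1.0, -100.0, 100.0]
env_wrappers:
  - mighty.mighty_utils.wrappers.FlattenVecObs
```

With a file like this in Hydra's environment config group, the long command above would reduce to something like `python mighty/run_mighty.py 'algorithm=ppo' 'environment=carl_cartpole'`.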

Learning a Configuration Policy via DAC

In order to use Mighty with DACBench, you need to install DACBench first. We recommend following the instructions in the DACBench repo.

Afterwards, configure the benchmark you want to run. Since most DACBench benchmarks have Dict action and observation spaces, some fairly complex, you might need to wrap them in order to translate the observations and actions into an easy-to-handle format. We have a version of the FunctionApproximationBenchmark configured for you, so you can get started like this:

python mighty/run_mighty.py 'algorithm=ppo' 'environment=dacbench/function_approximation'

The matching configuration file shows you how to set the search spaces and benchmark type. Refer to DACBench itself to learn how to configure other elements like observation spaces or instance sets.
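To make the wrapping idea above concrete, here is a generic, self-contained sketch of what "translating" a Dict observation into a flat vector means. This is illustrative only and is not Mighty's or DACBench's actual wrapper code (Mighty's own wrappers, such as FlattenVecObs, additionally handle the vectorized case); the function name and example observation are made up.

```python
def flatten_obs(obs):
    """Recursively flatten a (possibly nested) dict of numbers/lists into floats."""
    flat = []
    if isinstance(obs, dict):
        for key in sorted(obs):  # sort keys for a deterministic layout
            flat.extend(flatten_obs(obs[key]))
    elif isinstance(obs, (list, tuple)):
        for item in obs:
            flat.extend(flatten_obs(item))
    else:
        flat.append(float(obs))
    return flat

# Example: a hypothetical DACBench-style Dict observation
obs = {"remaining_budget": 42, "state": [0.1, -0.3], "history": {"last_action": 2}}
print(flatten_obs(obs))  # [2.0, 42.0, 0.1, -0.3]
```

A flat vector like this is what a standard MLP policy expects, which is why the CARL example above also applies a flattening wrapper.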

Optimize Hyperparameters

You can optimize the hyperparameters of your algorithm with the Hypersweeper package, e.g. using SMAC3. Mighty is directly compatible with Hypersweeper, which gives you smart, distributed HPO. There are also other HPO options; check out our examples for more information.

Build Your Own Mighty Project

If you want to implement your own method in Mighty, we recommend using the Mighty template repository as a base. It contains a runscript, the most relevant config files, and basic plotting scripts. Our domain randomization example shows how you can get started right away. Since Mighty offers many places to hook in your idea, here's a rough guide to which Mighty class you should look at:

stateDiagram
  direction TB
  classDef Neutral stroke-width:1px,stroke-dasharray:none,stroke:#000000,fill:#FFFFFF,color:#000000;
  classDef Peach stroke-width:1px,stroke-dasharray:none,stroke:#FBB35A,fill:#FFEFDB,color:#8F632D;
  classDef Aqua stroke-width:1px,stroke-dasharray:none,stroke:#46EDC8,fill:#DEFFF8,color:#378E7A;
  classDef Sky stroke-width:1px,stroke-dasharray:none,stroke:#374D7C,fill:#E2EBFF,color:#374D7C;
  classDef Pine stroke-width:1px,stroke-dasharray:none,stroke:#254336,fill:#8faea5,color:#FFFFFF;
  classDef Rose stroke-width:1px,stroke-dasharray:none,stroke:#FF5978,fill:#FFDFE5,color:#8E2236;
  classDef Ash stroke-width:1px,stroke-dasharray:none,stroke:#999999,fill:#EEEEEE,color:#000000;
  classDef Seven fill:#E1BEE7,color:#D50000,stroke:#AA00FF;
  Still --> root_end:Yes
  Still --> Moving:No
  Moving --> Crash:Yes
  Moving --> s2:No, only current transitions, env and network
  s2 --> s6:Action Sampling
  s2 --> s10:Policy Update
  s2 --> s8:Training Batch Sampling
  s2 --> Crash:More than one/not listed
  s2 --> s12:Direct algorithm change
  s12 --> s13:Yes
  s12 --> s14:No
  Still:Modify training settings and then repeated runs?
  root_end:Runner
  Moving:Access to update infos (gradients, batches, etc.)?
  Crash:Meta Component
  s2:Which interaction point with the algorithm?
  s6:Exploration Policy
  s10:Update
  s8:Buffer
  s12:Change only the model architecture?
  s13:Network and/or Model
  s14:Agent
  class root_end Peach
  class Crash Aqua
  class s6 Sky
  class s8 Pine
  class s10 Rose
  class s13 Ash
  class s14 Seven
  class Still Neutral
  class Moving Neutral
  class s2 Neutral
  class s12 Neutral
  style root_end color:none
  style s8 color:#FFFFFF

Pre-Implemented Methods

Mighty is meant to be a platform to build upon and not a large collection of methods in itself. We have a few relevant methods pre-implemented, however, and this collection will likely grow over time:

  • Agents: SAC, PPO, DQN
  • Updates: SAC, PPO, Q-learning, double Q-learning, clipped double Q-learning
  • Buffers: Rollout Buffer, Replay Buffer, Prioritized Replay Buffer
  • Exploration Policies: e-greedy (with and without decay), ez-greedy, standard stochastic
  • Models (with MLP, CNN or ResNet backbone): SAC, PPO, DQN (with soft and hard reset options)
  • Meta Components: RND, NovelD, SPaCE, PLR
  • Runners: online RL runner, ES runner
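As an illustration of the kind of exploration policy listed above, here is a textbook sketch of ε-greedy action selection with exponential decay. This is a generic example, not Mighty's implementation; the class and parameter names are made up.

```python
import random

class EpsilonGreedy:
    """Textbook epsilon-greedy exploration with an exponentially decaying epsilon."""

    def __init__(self, epsilon=1.0, epsilon_min=0.05, decay=0.995):
        self.epsilon = epsilon
        self.epsilon_min = epsilon_min
        self.decay = decay

    def select_action(self, q_values):
        """With probability epsilon pick a random action, else the greedy one."""
        if random.random() < self.epsilon:
            action = random.randrange(len(q_values))
        else:
            action = max(range(len(q_values)), key=lambda a: q_values[a])
        # Decay epsilon towards its floor after every decision.
        self.epsilon = max(self.epsilon_min, self.epsilon * self.decay)
        return action

policy = EpsilonGreedy(epsilon=0.0)  # epsilon=0 -> always greedy
print(policy.select_action([0.1, 0.9, 0.3]))  # 1
```

Mighty's modular structure means a component like this plugs into the agent without touching the update or buffer logic, which is what the diagram above is meant to help you navigate.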

Cite Us

If you use Mighty in your work, please cite us:

@misc{mohaneimer24,
  author = {A. Mohan and T. Eimer and C. Benjamins and M. Lindauer and A. Biedenkapp},
  title  = {Mighty},
  year   = {2024},
  url    = {https://github.com/automl/mighty}
}



Download files

Download the file for your platform.

Source Distribution

mighty_rl-1.0.0.tar.gz (67.4 kB)

Uploaded Source

Built Distribution


mighty_rl-1.0.0-py3-none-any.whl (83.7 kB)

Uploaded Python 3

File details

Details for the file mighty_rl-1.0.0.tar.gz.

File metadata

  • Download URL: mighty_rl-1.0.0.tar.gz
  • Upload date:
  • Size: 67.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.6.14

File hashes

Hashes for mighty_rl-1.0.0.tar.gz
Algorithm Hash digest
SHA256 da4410b200385845c67e816a46afcbd1ead0f450d5c96b049e0f701b814465c6
MD5 c1c91003861f13a43316612cde34d4e4
BLAKE2b-256 27eb8c1c13c03918feab1ce4e9f76969861c9b68ba77fb79780aae85fc636651


File details

Details for the file mighty_rl-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: mighty_rl-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 83.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.6.14

File hashes

Hashes for mighty_rl-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 2eba6147be6d537a2706c018df8c3d1b1ac40cec3f8b85c2d2794760686f2df0
MD5 ba0b087f87bfdb30d2dc19aa33da46e6
BLAKE2b-256 2f683a5006b84ff5a1b8f7ef07811b59f61ecb9f58a2932b143f33863c1e7abb

