A next-generation MARL evaluation framework for comprehensive robustness testing.

Gauntlet MARL Benchmark

Gauntlet is a framework for testing the robustness of multi-agent reinforcement learning (MARL) policies. It provides a single interface for evaluating agents against diverse adversarial strategies, environmental variations, and temporal challenges.

Features

Comprehensive Adversarial Testing: Evaluate policies against neural adversaries, adaptive opponents, and strategic challengers
Environmental Robustness: Test across multiple environment configurations and domain shifts
Temporal Evaluation: Assess continual learning capabilities and catastrophic forgetting
Rich Visualization: Generate detailed plots and reports for analysis
Extensible Architecture: Easy to add new environments, challengers, and evaluation metrics

Installation

Install the core library from PyPI:

pip install gauntlet-benchmark

For additional features, you can install optional dependencies:

# For Nash equilibrium metrics
# (the quotes keep shells such as zsh from expanding the brackets)
pip install "gauntlet-benchmark[nash]"

# For OpenSpiel environments
pip install "gauntlet-benchmark[openspiel]"

# For PettingZoo environments
pip install "gauntlet-benchmark[pettingzoo]"

# For development tools
pip install "gauntlet-benchmark[dev]"

Development Installation

If you're installing from source or in development mode:

# Install in development mode
pip install -e .

Note: The PyPI distribution is named gauntlet-benchmark, but the package you import is gauntlet. After installation, write from gauntlet import ..., not from gauntlet_benchmark import ...
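The difference between the distribution name and the import name can be checked with the standard library alone. The sketch below is not part of Gauntlet: describe_install is a hypothetical helper that reports whether a distribution is installed and whether a top-level module of a given name can be found.

```python
from importlib import metadata, util


def describe_install(dist_name: str, import_name: str) -> dict:
    """Report a distribution's installed version and whether a module is importable."""
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None  # the distribution is not installed in this environment
    # For a top-level name, find_spec returns None when no such module exists.
    importable = util.find_spec(import_name) is not None
    return {
        "distribution": dist_name,
        "version": version,
        "module": import_name,
        "importable": importable,
    }


if __name__ == "__main__":
    # pip installs "gauntlet-benchmark"; code imports "gauntlet".
    print(describe_install("gauntlet-benchmark", "gauntlet"))
```

If the version is reported but importable is False, the install is likely broken or in a different environment than the interpreter you are running.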

Quickstart

Here's a simple example of how to evaluate a basic policy:

import torch
import torch.nn as nn
from gauntlet import EnhancedGauntletBenchmark, EvaluationConfig

# 1. Define your policy
class SimpleRPSPolicy(nn.Module):
    def __init__(self):
        super().__init__()
        self.network = nn.Sequential(
            nn.Linear(6, 32),
            nn.ReLU(),
            nn.Linear(32, 3)
        )
    def forward(self, x):
        return self.network(x)

# 2. Configure the benchmark
config = EvaluationConfig(
    num_episodes=100,
    parallel_workers=4,
    save_visualizations=True
)
gauntlet = EnhancedGauntletBenchmark(config)

# 3. Instantiate your policy
my_policy = SimpleRPSPolicy()
my_policy.to(torch.device(config.device))

# 4. Run the evaluation!
# By default, Gauntlet has an RPS environment registered.
print("Running evaluation against the built-in challenger suite...")
metrics = gauntlet.evaluate_policy(my_policy, "SimpleRPSPolicy")

# 5. Generate a report
report = gauntlet.generate_report("my_policy_report.json")
print(f"Evaluation complete! Robustness Score: {metrics.robustness_score:.3f}")
print("Report and visualizations saved to current directory.")

Advanced Usage

Custom Challenger Agents

Create your own adversarial agents by subclassing ChallengerAgent:

from gauntlet import ChallengerAgent
import torch.nn as nn

class MyCustomChallenger(ChallengerAgent):
    def __init__(self, strategy="aggressive"):
        super().__init__(name=f"CustomChallenger-{strategy}")
        self.strategy = strategy
        # Initialize your custom model here. This placeholder mirrors the
        # Quickstart policy's shapes (6 inputs, 3 actions); replace as needed.
        self.model = nn.Sequential(nn.Linear(6, 32), nn.ReLU(), nn.Linear(32, 3))

    def act(self, observation, legal_actions=None):
        # Implement your adversarial strategy
        return self.model(observation)
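As an illustration of what an adversarial strategy might look like, here is a minimal, library-independent sketch of a rock-paper-scissors opponent that counters the move it has seen most often. It deliberately does not subclass ChallengerAgent so that it runs without Gauntlet installed; the action encoding (0, 1, 2), the observe hook, and the class name are all assumptions made for this example, and only the act(observation, legal_actions=None) signature is taken from the snippet above.

```python
import random
from collections import Counter

# Assumed action encoding for rock-paper-scissors (not taken from Gauntlet).
ROCK, PAPER, SCISSORS = 0, 1, 2
BEATS = {ROCK: PAPER, PAPER: SCISSORS, SCISSORS: ROCK}  # BEATS[x] is the move that beats x


class FrequencyChallenger:
    """Plays the move that beats the opponent's most frequent past move."""

    def __init__(self, seed=None):
        self.name = "FrequencyChallenger"
        self.history = Counter()
        self.rng = random.Random(seed)

    def observe(self, opponent_action):
        # Hypothetical hook: record what the opponent played in the last round.
        self.history[opponent_action] += 1

    def act(self, observation=None, legal_actions=None):
        legal = list(legal_actions) if legal_actions is not None else [ROCK, PAPER, SCISSORS]
        if not self.history:
            return self.rng.choice(legal)  # no data yet: play uniformly at random
        most_common, _ = self.history.most_common(1)[0]
        counter_move = BEATS[most_common]
        # Fall back to a random legal move if the counter move is not allowed.
        return counter_move if counter_move in legal else self.rng.choice(legal)
```

To adapt this to Gauntlet, the same logic would go inside the act method of a ChallengerAgent subclass, with the opponent's last move read from whatever the observation actually contains.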

Environment Integration

Add new environments by implementing the Environment interface:

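from gauntlet import Environment

class MyCustomEnvironment(Environment):
    def __init__(self, config):
        super().__init__(config)
        # Initialize your environment

    def reset(self):
        # Reset environment state and return the first observation
        initial_observation = None  # placeholder: build this from your environment's state
        return initial_observation

    def step(self, actions):
        # Execute the joint actions and return the resulting transition
        observation, rewards, done, info = None, None, True, {}  # placeholders
        return observation, rewards, done, info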

Documentation

For detailed documentation, API reference, and advanced examples, visit the GitHub repository at https://github.com/tanvishdesai/gauntlet-benchmark.

Contributing

We welcome contributions! Please see our contributing guidelines for details on how to get involved.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you use this library in your research, please cite:

@software{gauntlet_benchmark,
  title={Gauntlet Benchmark},
  author={Tanvish Desai},
  year={2024},
  url={https://github.com/tanvishdesai/gauntlet-benchmark}
}

Release history

0.1.3 (this version): Aug 23, 2025
0.1.2: Aug 23, 2025
0.1.1: Aug 23, 2025
0.1.0: Aug 23, 2025

Download files

Source Distribution

gauntlet_benchmark-0.1.3.tar.gz (45.4 kB view details)

Uploaded Aug 23, 2025 Source

Built Distribution

gauntlet_benchmark-0.1.3-py3-none-any.whl (45.4 kB view details)

Uploaded Aug 23, 2025 Python 3

File details

Details for the file gauntlet_benchmark-0.1.3.tar.gz.

File metadata

Download URL: gauntlet_benchmark-0.1.3.tar.gz
Upload date: Aug 23, 2025
Size: 45.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.6

File hashes

Hashes for gauntlet_benchmark-0.1.3.tar.gz
Algorithm	Hash digest
SHA256	`ae3d0c24c0f420c3d72c2e1acd61baa674cbb5bf65b458dc92a75ecc41b4d985`
MD5	`3b90542a8c90d5bb8d54ba68debe60fa`
BLAKE2b-256	`2687996c2b5c782f870af0482c4fca673893e3ea1ba34ed3c6a87b44d2746f21`

File details

Details for the file gauntlet_benchmark-0.1.3-py3-none-any.whl.

File metadata

Download URL: gauntlet_benchmark-0.1.3-py3-none-any.whl
Upload date: Aug 23, 2025
Size: 45.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.6

File hashes

Hashes for gauntlet_benchmark-0.1.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`2e26afd6bf5174bd35450d540d95c8d24725c24a56dd68b49f993d6f875fa7a6`
MD5	`4e842b27c21eb01ab73a6b842c583fc5`
BLAKE2b-256	`459c550ae71674fee7a42f4ce11db3b65721cd2560a52d9037600c2819f2f07e`
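The SHA256 digests listed above can be used to check a downloaded file before installing it. The following is a minimal sketch using only hashlib from the standard library; sha256_of and matches_published_digest are hypothetical helper names, and the local file path assumes the wheel was saved to the current directory.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Compare a file's SHA256 with a published digest, ignoring case and whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Digest copied from the listing above; the local path is an assumption.
    wheel = "gauntlet_benchmark-0.1.3-py3-none-any.whl"
    expected = "2e26afd6bf5174bd35450d540d95c8d24725c24a56dd68b49f993d6f875fa7a6"
    try:
        print("OK" if matches_published_digest(wheel, expected) else "MISMATCH")
    except FileNotFoundError:
        print(f"{wheel} not found; download it first")
```

Reading in chunks keeps memory use constant regardless of file size, which matters little for a 45 kB wheel but makes the helper reusable for larger artifacts.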
