Gauntlet MARL Benchmark

Gauntlet is an evaluation framework for testing the robustness of multi-agent reinforcement learning (MARL) policies. It provides a unified way to evaluate agents against diverse adversarial strategies, environmental variations, and temporal challenges.

Features

  • Comprehensive Adversarial Testing: Evaluate policies against neural adversaries, adaptive opponents, and strategic challengers
  • Environmental Robustness: Test across multiple environment configurations and domain shifts
  • Temporal Evaluation: Assess continual learning capabilities and catastrophic forgetting
  • Rich Visualization: Generate detailed plots and reports for analysis
  • Extensible Architecture: Easy to add new environments, challengers, and evaluation metrics

Installation

Install the core library from PyPI:

pip install gauntlet-benchmark

For additional features, install the optional dependencies. The quotes keep shells such as zsh from interpreting the square brackets:

# For Nash Equilibrium metrics
pip install "gauntlet-benchmark[nash]"

# For OpenSpiel environments
pip install "gauntlet-benchmark[openspiel]"

# For PettingZoo environments
pip install "gauntlet-benchmark[pettingzoo]"

# For development tools
pip install "gauntlet-benchmark[dev]"

Development Installation

If you're installing from source or working on the library itself, run the following from the repository root:

# Install in development mode
pip install -e .

Note: After installation, import the package as from gauntlet import ..., not from gauntlet_benchmark import ...
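
Because the distribution name (gauntlet-benchmark) and the import name (gauntlet) differ, it can help to check which name is importable in the active environment. The following is a minimal sketch using only the standard library; the helper name is_importable is illustrative and not part of Gauntlet.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "gauntlet" is the import name; "gauntlet_benchmark" is expected to be absent.
    for name in ("gauntlet", "gauntlet_benchmark"):
        status = "found" if is_importable(name) else "not found"
        print(f"{name}: {status}")
```

If gauntlet is reported as not found, the package is most likely installed in a different Python environment than the one running the check.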

Quickstart

Here's a simple example of how to evaluate a basic policy:

import torch
import torch.nn as nn
from gauntlet import EnhancedGauntletBenchmark, EvaluationConfig

# 1. Define your policy
class SimpleRPSPolicy(nn.Module):
    def __init__(self):
        super().__init__()
        self.network = nn.Sequential(
            nn.Linear(6, 32),
            nn.ReLU(),
            nn.Linear(32, 3)
        )
    def forward(self, x):
        return self.network(x)

# 2. Configure the benchmark
config = EvaluationConfig(
    num_episodes=100,
    parallel_workers=4,
    save_visualizations=True
)
gauntlet = EnhancedGauntletBenchmark(config)

# 3. Instantiate your policy
my_policy = SimpleRPSPolicy()
my_policy.to(torch.device(config.device))

# 4. Run the evaluation!
# By default, Gauntlet has a rock-paper-scissors (RPS) environment registered.
print("Running evaluation against the built-in challenger suite...")
metrics = gauntlet.evaluate_policy(my_policy, "SimpleRPSPolicy")

# 5. Generate a report
report = gauntlet.generate_report("my_policy_report.json")
print(f"Evaluation complete! Robustness Score: {metrics.robustness_score:.3f}")
print("Report and visualizations saved to current directory.")
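
The report written in step 5 has a .json file name, so it can presumably be inspected with the standard library. The sketch below assumes only that the file is valid JSON with an object at the top level. The report's actual schema is not described here, so the helper summarize_report (a hypothetical name, not part of Gauntlet) simply lists the top-level keys and the type of each value.

```python
import json
from pathlib import Path


def summarize_report(path):
    """Map each top-level key of a JSON report to the type name of its value."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    return {key: type(value).__name__ for key, value in data.items()}


if __name__ == "__main__":
    report_path = Path("my_policy_report.json")  # file name used in the quickstart
    if report_path.exists():
        for key, kind in summarize_report(report_path).items():
            print(f"{key}: {kind}")
```

Listing keys first is a cautious way to explore a report whose fields are not yet known, before writing code that depends on specific entries.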

Advanced Usage

Custom Challenger Agents

Create your own adversarial agents by subclassing ChallengerAgent:

from gauntlet import ChallengerAgent
import torch.nn as nn

class MyCustomChallenger(ChallengerAgent):
    def __init__(self, strategy="aggressive"):
        super().__init__(name=f"CustomChallenger-{strategy}")
        self.strategy = strategy
        # Initialize your custom model here. The layer sizes are placeholders
        # matching the RPS quickstart (6 inputs, 3 actions).
        self.model = nn.Linear(6, 3)

    def act(self, observation, legal_actions=None):
        # Implement your adversarial strategy
        return self.model(observation)
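
To make the idea of an adversarial strategy concrete, here is a minimal, library-free sketch of one possible rock-paper-scissors strategy: counting the opponent's past actions and playing whatever beats the most frequent one. It is a standalone class, not a ChallengerAgent subclass, so it runs without Gauntlet installed. The action encoding (0 = rock, 1 = paper, 2 = scissors), the observe method, and the class name are assumptions made for illustration; only the act(observation, legal_actions) signature mirrors the example above.

```python
import random
from collections import Counter

# Action encoding assumed for this sketch: 0 = rock, 1 = paper, 2 = scissors.
BEATS = {0: 1, 1: 2, 2: 0}  # BEATS[a] is the action that defeats a


class FrequencyCounterStrategy:
    """Plays the counter to the opponent's most frequent past action."""

    def __init__(self, seed=None):
        self.history = Counter()
        self.rng = random.Random(seed)

    def observe(self, opponent_action):
        """Record the opponent's most recent action."""
        self.history[opponent_action] += 1

    def act(self, observation=None, legal_actions=None):
        actions = list(legal_actions) if legal_actions is not None else [0, 1, 2]
        if not self.history:
            # No data yet: choose uniformly at random.
            return self.rng.choice(actions)
        most_common_action, _ = self.history.most_common(1)[0]
        counter = BEATS[most_common_action]
        # Fall back to a random legal action if the counter is not allowed.
        return counter if counter in actions else self.rng.choice(actions)
```

Inside a real challenger, logic like this would live in the subclass's act method, with the opponent history derived from whatever the observation actually contains.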

Environment Integration

Add new environments by implementing the Environment interface:

from gauntlet import Environment

class MyCustomEnvironment(Environment):
    def __init__(self, config):
        super().__init__(config)
        # Initialize your environment

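    def reset(self):
        # Reset environment state and return the first observation
        initial_observation = None  # placeholder: build your real observation here
        return initial_observation

    def step(self, actions):
        # Execute actions and return next state
        # Placeholders: replace with your environment's transition logic
        observation, rewards, done, info = None, None, True, {}
        return observation, rewards, done, info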
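
As a concrete illustration of the reset/step contract, the sketch below implements a tiny two-player rock-paper-scissors environment in plain Python. It does not subclass Environment, so it runs without Gauntlet installed. The class name, the max_rounds parameter, the action encoding (0 = rock, 1 = paper, 2 = scissors), and the choice of the previous joint action as the observation are all assumptions for this example, not the built-in environment's actual design.

```python
# Action encoding assumed for this sketch: 0 = rock, 1 = paper, 2 = scissors.
class TinyRPSEnvironment:
    """Two-player repeated rock-paper-scissors with reset() and step()."""

    def __init__(self, max_rounds=10):
        self.max_rounds = max_rounds
        self.round = 0
        self.last_actions = (None, None)

    def reset(self):
        """Clear the round counter and return the initial observation."""
        self.round = 0
        self.last_actions = (None, None)
        return self.last_actions

    def step(self, actions):
        """Play one round; return (observation, rewards, done, info)."""
        a, b = actions
        # Under the 0/1/2 encoding, (a - b) % 3 == 1 means a beats b.
        outcome = (a - b) % 3
        if outcome == 0:
            rewards = (0, 0)
        elif outcome == 1:
            rewards = (1, -1)
        else:
            rewards = (-1, 1)
        self.round += 1
        self.last_actions = (a, b)
        done = self.round >= self.max_rounds
        return self.last_actions, rewards, done, {"round": self.round}
```

A short episode loop would call reset() once, then call step() with one action per player until done is True.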

Documentation

For detailed documentation, API reference, and advanced examples, visit our GitHub repository: https://github.com/tanvishdesai/gauntlet-benchmark

Contributing

We welcome contributions! Please see our contributing guidelines for details on how to get involved.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you use this library in your research, please cite:

@software{gauntlet_benchmark,
  title={Gauntlet Benchmark},
  author={Tanvish Desai},
  year={2024},
  url={https://github.com/tanvishdesai/gauntlet-benchmark}
}
