A next-generation MARL evaluation framework for comprehensive robustness testing.
Gauntlet MARL Benchmark
Gauntlet is an evaluation framework for robustness testing of multi-agent reinforcement learning (MARL) policies. The library provides a unified way to evaluate agents against diverse adversarial strategies, environmental variations, and temporal challenges.
Features
- Comprehensive Adversarial Testing: Evaluate policies against neural adversaries, adaptive opponents, and strategic challengers
- Environmental Robustness: Test across multiple environment configurations and domain shifts
- Temporal Evaluation: Assess continual learning capabilities and catastrophic forgetting
- Rich Visualization: Generate detailed plots and reports for analysis
- Extensible Architecture: Easy to add new environments, challengers, and evaluation metrics
Installation
Install the core library from PyPI:
pip install gauntlet-benchmark
For additional features, you can install optional dependencies:
# Quote the package spec so shells such as zsh do not expand the square brackets
# For Nash Equilibrium metrics
pip install "gauntlet-benchmark[nash]"
# For OpenSpiel environments
pip install "gauntlet-benchmark[openspiel]"
# For PettingZoo environments
pip install "gauntlet-benchmark[pettingzoo]"
# For development tools
pip install "gauntlet-benchmark[dev]"
Development Installation
If you're installing from source or in development mode:
# Install in development mode
pip install -e .
Note: After installation, import the package as from gauntlet import ..., not from gauntlet_benchmark import ...
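Because the distribution name (gauntlet-benchmark) differs from the import name (gauntlet), it can help to confirm what is actually importable after installing. The following is a minimal sketch using only the standard library. It assumes the OpenSpiel and PettingZoo extras expose the upstream import names pyspiel and pettingzoo; the helper names is_importable and installation_summary are hypothetical and not part of Gauntlet.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        return False


# "gauntlet" is the import name given in the note above. The backend import
# names below are assumptions based on the upstream projects.
CANDIDATE_MODULES = {
    "gauntlet": "core library (pip install gauntlet-benchmark)",
    "pyspiel": "OpenSpiel backend (assumed to come with the openspiel extra)",
    "pettingzoo": "PettingZoo backend (assumed to come with the pettingzoo extra)",
}


def installation_summary() -> dict:
    """Map each candidate module name to whether it is importable."""
    return {name: is_importable(name) for name in CANDIDATE_MODULES}


if __name__ == "__main__":
    for name, found in installation_summary().items():
        status = "found" if found else "missing"
        print(f"{name:<12} {status:<8} {CANDIDATE_MODULES[name]}")
```

A module reported as missing only means it is not on the current Python path, for example because a different virtual environment is active.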
Quickstart
Here's a simple example of how to evaluate a basic policy:
import torch
import torch.nn as nn
from gauntlet import EnhancedGauntletBenchmark, EvaluationConfig
# 1. Define your policy
class SimpleRPSPolicy(nn.Module):
    def __init__(self):
        super().__init__()
        self.network = nn.Sequential(
            nn.Linear(6, 32),
            nn.ReLU(),
            nn.Linear(32, 3)
        )

    def forward(self, x):
        return self.network(x)
# 2. Configure the benchmark
config = EvaluationConfig(
num_episodes=100,
parallel_workers=4,
save_visualizations=True
)
gauntlet = EnhancedGauntletBenchmark(config)
# 3. Instantiate your policy
my_policy = SimpleRPSPolicy()
my_policy.to(torch.device(config.device))
# 4. Run the evaluation!
# By default, Gauntlet has a rock-paper-scissors (RPS) environment registered.
print("Running evaluation against the built-in challenger suite...")
metrics = gauntlet.evaluate_policy(my_policy, "SimpleRPSPolicy")
# 5. Generate a report
report = gauntlet.generate_report("my_policy_report.json")
print(f"Evaluation complete! Robustness Score: {metrics.robustness_score:.3f}")
print("Report and visualizations saved to current directory.")
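The report written in step 5 is a JSON file, so it can be inspected with the standard library. The sketch below lists the top-level entries of a report without assuming anything about its schema, since the report layout is not described here. The helper name summarize_report is hypothetical and not part of Gauntlet.

```python
import json
from pathlib import Path


def summarize_report(path):
    """Load a JSON report and return {top-level key: type name of its value}."""
    with Path(path).open("r", encoding="utf-8") as handle:
        data = json.load(handle)
    if not isinstance(data, dict):
        # The root is not an object, so only its own type can be reported.
        return {"<root>": type(data).__name__}
    return {key: type(value).__name__ for key, value in data.items()}


if __name__ == "__main__":
    report_path = Path("my_policy_report.json")  # file name used in the Quickstart
    if report_path.exists():
        for key, kind in summarize_report(report_path).items():
            print(f"{key}: {kind}")
    else:
        print(f"{report_path} not found; run the Quickstart first.")
```

Listing keys and value types first is a cautious way to explore the file before writing code that depends on specific fields.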
Advanced Usage
Custom Challenger Agents
Create your own adversarial agents by subclassing ChallengerAgent:
from gauntlet import ChallengerAgent
import torch.nn as nn
class MyCustomChallenger(ChallengerAgent):
    def __init__(self, strategy="aggressive"):
        super().__init__(name=f"CustomChallenger-{strategy}")
        self.strategy = strategy
        # Initialize your custom model here.
        # nn.Identity() is only a placeholder so that self.model exists.
        self.model = nn.Identity()

    def act(self, observation, legal_actions=None):
        # Implement your adversarial strategy
        return self.model(observation)
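As an illustration of what an adversarial strategy inside act might look like, here is a standalone sketch of a frequency-counting opponent for rock-paper-scissors. It deliberately does not subclass ChallengerAgent so that it runs without Gauntlet installed; it only mirrors the act(observation, legal_actions=None) signature shown above. The action encoding (0 = rock, 1 = paper, 2 = scissors) and the observation format (the opponent's previous action, or None on the first move) are assumptions made for this example, not Gauntlet's documented conventions.

```python
import random
from collections import Counter

# Assumed action encoding for this sketch only.
ROCK, PAPER, SCISSORS = 0, 1, 2
# BEATS[x] is the action that defeats x.
BEATS = {ROCK: PAPER, PAPER: SCISSORS, SCISSORS: ROCK}


class FrequencyCounterStrategy:
    """Plays the counter to the opponent's most frequent action so far."""

    def __init__(self, seed=None):
        self.name = "FrequencyCounter"
        self.history = Counter()
        self.rng = random.Random(seed)

    def act(self, observation, legal_actions=None):
        # observation: the opponent's previous action, or None on the first move.
        legal = list(legal_actions) if legal_actions is not None else [ROCK, PAPER, SCISSORS]
        if observation is not None:
            self.history[observation] += 1
        if not self.history:
            # Nothing observed yet, so fall back to a random legal action.
            return self.rng.choice(legal)
        most_common_action = self.history.most_common(1)[0][0]
        counter_action = BEATS[most_common_action]
        return counter_action if counter_action in legal else self.rng.choice(legal)


if __name__ == "__main__":
    challenger = FrequencyCounterStrategy(seed=0)
    for seen in [None, ROCK, ROCK, SCISSORS]:
        print(seen, "->", challenger.act(seen))
```

To use the same logic with Gauntlet, one would presumably move the body of act into a ChallengerAgent subclass and adapt the observation handling to whatever the chosen environment actually emits.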
Environment Integration
Add new environments by implementing the Environment interface:
from gauntlet import Environment
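class MyCustomEnvironment(Environment):
    def __init__(self, config):
        super().__init__(config)
        # Initialize your environment

    def reset(self):
        # Reset environment state and return the first observation
        initial_observation = None  # placeholder: build your initial observation
        return initial_observation

    def step(self, actions):
        # Execute actions and return the next state
        observation, rewards, done, info = None, None, False, {}  # placeholders
        return observation, rewards, done, info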
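To make the reset/step contract above concrete, here is a standalone sketch of a two-player repeated rock-paper-scissors environment. It does not subclass Environment, so it runs without Gauntlet installed, and everything beyond the reset() and step(actions) signatures is an assumption for illustration: the class name TinyRPSEnvironment, the max_rounds config key, the action encoding (0 = rock, 1 = paper, 2 = scissors), and the observation being the pair of previous actions.

```python
class TinyRPSEnvironment:
    """Repeated two-player rock-paper-scissors with a fixed number of rounds."""

    def __init__(self, config=None):
        config = config or {}
        self.max_rounds = config.get("max_rounds", 10)  # hypothetical config key
        self.round = 0
        self.last_actions = None

    def reset(self):
        # Start a new episode; there are no previous actions to observe yet.
        self.round = 0
        self.last_actions = None
        return self.last_actions

    def step(self, actions):
        # actions: (action_of_player_0, action_of_player_1), each in {0, 1, 2}.
        a, b = actions
        diff = (a - b) % 3
        if diff == 0:
            rewards = (0, 0)    # draw
        elif diff == 1:
            rewards = (1, -1)   # player 0 wins (paper>rock, scissors>paper, rock>scissors)
        else:
            rewards = (-1, 1)   # player 1 wins
        self.round += 1
        self.last_actions = (a, b)
        done = self.round >= self.max_rounds
        info = {"round": self.round}
        return self.last_actions, rewards, done, info


if __name__ == "__main__":
    env = TinyRPSEnvironment({"max_rounds": 3})
    env.reset()
    done = False
    while not done:
        observation, rewards, done, info = env.step((1, 0))  # paper vs rock
        print(observation, rewards, done, info)
```

The modular-arithmetic payoff rule keeps the zero-sum structure explicit: the two rewards always sum to zero, and a draw yields zero for both players.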
Documentation
For detailed documentation, the API reference, and advanced examples, visit the GitHub repository: https://github.com/tanvishdesai/gauntlet-benchmark
Contributing
We welcome contributions! Please see our contributing guidelines for details on how to get involved.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Citation
If you use this library in your research, please cite:
@software{gauntlet_benchmark,
title={Gauntlet Benchmark},
author={Tanvish Desai},
year={2024},
url={https://github.com/tanvishdesai/gauntlet-benchmark}
}
Download files
File details
Details for the file gauntlet_benchmark-0.1.3.tar.gz.
File metadata
- Download URL: gauntlet_benchmark-0.1.3.tar.gz
- Upload date:
- Size: 45.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ae3d0c24c0f420c3d72c2e1acd61baa674cbb5bf65b458dc92a75ecc41b4d985 |
| MD5 | 3b90542a8c90d5bb8d54ba68debe60fa |
| BLAKE2b-256 | 2687996c2b5c782f870af0482c4fca673893e3ea1ba34ed3c6a87b44d2746f21 |
File details
Details for the file gauntlet_benchmark-0.1.3-py3-none-any.whl.
File metadata
- Download URL: gauntlet_benchmark-0.1.3-py3-none-any.whl
- Upload date:
- Size: 45.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2e26afd6bf5174bd35450d540d95c8d24725c24a56dd68b49f993d6f875fa7a6 |
| MD5 | 4e842b27c21eb01ab73a6b842c583fc5 |
| BLAKE2b-256 | 459c550ae71674fee7a42f4ce11db3b65721cd2560a52d9037600c2819f2f07e |