AI Robustness Evaluation System
Project description
AI Robustness Evaluation System (ARES)
Stop wondering if your AI is secure. Know for certain.
ARES automates LLM red-teaming so you can test your models against real attacks before deployment. Plug in your attacks, evaluators, and guardrails. Test across models. Get unified reports.
Install ARES, run this quickstart example, and view results in chat format:
ares evaluate example_configs/quickstart.yaml -l
ares show-chat -f results/keyword_evaluation.json --open
┌───────────────────────────────────────────────────────────────────────────┐
│ ARES Evaluation Flow │
└───────────────────────────────────────────────────────────────────────────┘
📋 Define Goals 🎯 Select Strategy 📊 Evaluate Results
│ │ │
▼ ▼ ▼
┌──────────────┐ ┌───────────────┐ ┌─────────────────┐
│ What to test │ ───────> │ How to attack │ ───────> │ How to measure │
└──────────────┘ └───────────────┘ └─────────────────┘
• PII leakage • Prompt injection • Keyword match
• Data exfiltration • Crescendo • LLM judges
• Harmful content • GCG, TAP, etc. • Custom evals
• Custom goals • Your attack • Guardrails
What is ARES? An orchestration framework that lets you plug in your own attacks, evaluators, and guardrails to test LLMs - whether you're benchmarking a new attack method for research or testing your model's security before deployment.
Why ARES?
- 🔬 For Researchers: Benchmark your novel attack against 20+ existing methods with one config
- 🛡️ For Security Teams: Test against OWASP top-10 vulnerabilities before production
- 🔌 For Developers: Integrate your custom attacks, detectors, guardrails, or evaluation methods
Three core components you can customize:
- Goals: What to test (PII leakage, prompt injection, jailbreaks, or your custom goals)
- Strategy: How to attack (built-in methods or your novel attack technique)
- Evaluation: How to measure (keyword matching, LLM judges, or your custom evaluator)
🗺️ Navigation & Quick Start
Choose your learning path based on your experience level:
| Experience Level | I want to... | Start Here |
|---|---|---|
| 🟢 Beginner | Try it visually (no coding) | GUI Interface |
| 🟢 Beginner | Run my first security test | Quickstart |
| 🟢 Beginner | See real-world examples | Real-World Examples |
| 🟡 Intermediate | Test with multiple attack methods | Using Built-in Plugins |
| 🟡 Intermediate | Test OWASP vulnerabilities | OWASP Security Testing |
| 🔴 Advanced | Create custom attacks/evaluators | ADVANCED.md |
| 🔴 Advanced | Fine-tune configuration | ADVANCED.md |
Quick Decision Tree:
- 👉 New to red-teaming? Start with GUI or Quickstart
- 👉 Security professional? Jump to OWASP Testing
- 👉 Researcher? Check Using Plugins then ADVANCED.md
- 👉 Just exploring? Browse Real-World Examples
Full Documentation: ibm.github.io/ares
🏗️ Architecture
The ARES programming model provides a flexible framework for orchestrating robustness evaluations:
Key Components:
- Plugin Catalog: Extensible collection of target connectors, attack goals, strategies, and evaluations
- Configuration-Driven: Define your evaluation pipeline through YAML configuration
- Programmatic API: Full control through Python API (
redteamer.target(),redteamer.goal(),redteamer.strategy(),redteamer.evaluate())
🖥️ GUI (Optional)
🟢 Complexity: Beginner | No coding required
Not a command-line person? No problem. Test AI security with drag-and-drop simplicity - perfect for security teams who want quick results without writing code.
Quick Start
-
Clone the repository:
git clone https://github.com/IBM/ares.git cd ares
-
Install ARES:
pip install .
-
Launch the GUI:
python gui.py -
You'll see this interface:
GUI Features
The interface has 5 tabs on the left:
- 📝 Configuration: Upload and edit your test configuration
- 📊 Data: Upload test prompts and view configured datasets
- 🔌 Plugins: Browse and install available attack/evaluation plugins
- 🎯 Red Team: Launch your configured security tests
- 📈 Reports: View detailed results and vulnerability reports
Example Workflow
1. Upload Configuration
2. Install Required Plugins
3. Run Tests & View Results
4. Visualize Attack Conversations (Optional)
ARES can visualize attacks as chat-style conversations with evaluation scores, making it easier to assess multi-turn attacks and understand how jailbreaks evolve.
Just click Show Chat View from Reports tab.
💡 Pro Tip: The GUI is great for exploration, but the CLI gives you more control and is better for automation. Once you're comfortable, try the CLI Installation below.
⚡ Quick Installation
🟢 Complexity: Beginner
Prerequisites
You'll need Python 3.11+ and either:
- pip (standard Python package manager)
- uv (recommended - 10-100x faster):
curl -LsSf https://astral.sh/uv/install.sh | sh
One-Line Install
Set up a virtual environment first so your install stays clean and isolated:
# prepare a virtual environment
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# install ares
curl https://raw.githubusercontent.com/IBM/ares/refs/heads/main/install.sh | bash
This will create the example_configs/ and assets/ directories in your current directory with the files you need. Then run the quickstart and open the chat-style results:
ares evaluate example_configs/quickstart.yaml -l
ares show-chat -f results/keyword_evaluation.json --open
Or try the minimal example:
ares evaluate example_configs/minimal.yaml -l
⚠️ Important: Using a virtual environment is highly recommended.
💡 Note: See Understanding ARES_HOME for details on path resolution.
📦 Note: More examples and assets can be loaded from the ARES repository.
Development Installation
For interactive development and customization:
Using pip:
# 1. Clone the repository
git clone https://github.com/IBM/ares.git
cd ares
# 2. Create and activate virtual environment
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# 3. Install ARES with dev dependencies
pip install ".[dev]"
# 4. Run examples
ares evaluate example_configs/quickstart.yaml -l
ares show-chat -f results/keyword_evaluation.json --open
Using uv (faster):
# 1. Clone the repository
git clone https://github.com/IBM/ares.git
cd ares
# 2. Sync dependencies with dev extras (creates venv automatically)
uv sync --extra dev
# 3. Activate virtual environment
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# 4. Run examples
ares evaluate example_configs/quickstart.yaml -l
ares show-chat -f results/keyword_evaluation.json --open
What's next? Run your first test.
:rocket: Quickstart
🟢 Complexity: Beginner | Your first security test
Let's catch a vulnerability before your users do. This quickstart tests a model against harmful behavior prompts - one of the most common security assessments.
Option 1: Use the Pre-Built Config (Fastest)
ares evaluate example_configs/quickstart.yaml -l -n 10
Flags explained: -l limits number of goals to run (default 5), -n 10 specifies exactly 10 goals to test
This uses our ready-to-go configuration that shows you all the components explicitly. View the config to see how it's structured.
Option 2: Create Your Own Config (Learn by Doing)
Create a file called my-first-test.yaml:
# my-first-test.yaml
target:
huggingface:
model_config:
pretrained_model_name_or_path: Qwen/Qwen2-0.5B-Instruct
tokenizer_config:
pretrained_model_name_or_path: Qwen/Qwen2-0.5B-Instruct
red-teaming:
prompts: assets/safety_behaviors_text_subset.csv # Test harmful behavior prompts
Then run the test:
ares evaluate my-first-test.yaml -l -n 10
Flags explained: -l limits number of goals to run (default 5), -n 10 specifies exactly 10 goals to test
Understanding the Results
What just happened?
- ✅ ARES loaded a small HuggingFace model (Qwen2-0.5B-Instruct)
- ✅ Sent 5 test prompts designed to elicit harmful behaviors
- ✅ Evaluated responses using keyword matching (checks for refusal patterns)
- ✅ Generated a detailed report showing results
Your saved results in the results folder will have:
- A high level summary report with relevant statistics.
- Which prompts the model responded to
- Which prompts were refused
- Response patterns and safety behaviors
- Detailed conversation logs (for multi turn attacks)
💡 Pro Tip: The quickstart uses defaults for simplicity. Check
example_configs/quickstart.yamlto see the full explicit configuration with all components (strategy, evaluation, goals) clearly defined.
Next Steps
- 📊 View the report: Open the generated HTML file in your browser
- 📝 See full config: Check
example_configs/quickstart.yamlto understand all components - 📓 Interactive learning: Try the Jupyter notebook
- 📁 More examples: Explore
example_configs/directory - 🎯 Test your model: Replace the default model with your own
🎯 What's Next? You've run your first test. Now see how real teams can use ARES to catch vulnerabilities before deployment → Real-World Examples
🌍 Real-World Examples
🟢 Complexity: Beginner | See ARES in action
Learn from real security testing scenarios. These examples show how teams can use ARES to catch vulnerabilities before deployment.
Example 1: Pre-Deployment Security Audit
Scenario: Test if your customer service chatbot leaks PII using multiple attack vectors.
What you test: Direct requests, crescendo and encoding attacks
What you learn: Which attacks extract PII, types of information leaked, target robustness
📋 See full configuration & results
Example 2: Testing Guardrail Effectiveness
Scenario: Measure how well Granite Guardian protects your model against various attacks.
What you test: Human Jailbreaks, encoding and crescendo attacks
What you learn: Which attacks the guardrail blocks, bypass techniques, effectiveness rates
📋 See full configuration & results
Example 3: Research Benchmarking
Scenario: Compare your novel attack against established methods for publication.
What you test: Your attack vs. 4 baselines with multiple evaluators
What you learn: Success rate comparisons, statistical significance, reproducible results
📋 See full configuration & results
📓 Try these interactive examples:
- Red Teaming with ARES - Complete walkthrough
- Granite Guardian Testing - Guardrail effectiveness
- Multi-Agent Coalition Attacks - Advanced attack scenarios
🎯 What's Next? You've seen examples. Now discover how to combine multiple attack methods to find vulnerabilities others miss → Using Built-in Plugins
💡 What You Can Do
🟡 Complexity: Intermediate | Understanding ARES capabilities
Now that you've seen ARES in action, here's everything you can do with it.
For Researchers
- 🔬 Benchmark novel attacks: Plug in your attack method and compare against 20+ existing techniques
- 📊 Multi-model testing: Test across local models and cloud APIs with one config
- 📈 Unified metrics: Get comparative analysis with standardized evaluation
- 📝 Reproducible research: Share configs for reproducible experiments
For Security Teams
- 🛡️ OWASP compliance: Test against OWASP top-10 LLM vulnerabilities
- 🔍 Pre-deployment testing: Catch vulnerabilities before production
- 📋 Audit reports: Generate detailed security assessment reports
- 🎯 Custom test scenarios: Define organization-specific security tests
For Developers
- 🔌 Guardrail integration: Add your custom safety filters and test effectiveness
- 🎯 Custom evaluators: Use your own detection methods (keywords, ML models, LLM judges)
- 🔄 CI/CD integration: Automate security testing in your pipeline
- 📊 Performance tracking: Monitor security improvements over time
Built-in Capabilities
- ✅ Single & multi-turn attacks: One-shot prompts and conversational strategies
- ✅ 19 ready-to-use plugins: Garak, PyRIT, AutoDAN, CyberSecEval, and more
- ✅ Interactive dashboard: Explore results visually
- ✅ One YAML config: Orchestrate everything from a single file
🎯 What's Next? Ready to test with multiple attack methods simultaneously? → Using Built-in Plugins
🔌 Using Built-in Plugins
🟡 Complexity: Intermediate | Testing with multiple attack methods
One config. 15+ attack methods. Find the weakest link. This section shows you how to combine multiple plugins for comprehensive security testing.
Understanding Plugin Types
Before diving into examples, here's what each plugin type does:
- 🎯 Goals: Define what to test (e.g., "extract PII", "generate harmful content")
- ⚔️ Strategies: Attack methods (e.g., jailbreaks, encoding, multi-turn conversations)
- 📊 Evaluators: How to measure success (e.g., keyword matching, LLM judges)
- 🔌 Connectors: How to connect to models (HuggingFace, OpenAI, WatsonX, etc.)
- 🛡️ Guardrails: Safety filters to test (input/output filters)
Plugin Installation
There are two ways to install plugins. The names of the available plugins are all under: ares/plugins.
- First we can use the ares cli with the name of the plugin, for this example we will use
ares-human-jailbreak:
ares install-plugin ares-human-jailbreak
- Or, for manual installation, we can navigate to the folder with the plugin, in this example
ares-litellm
cd plugins/ares-litellm
and then run
pip install .
to install the relevant plugin.
Example 1: Single Attack Method
requires ares-human-jailbreak plugin
install via: ares install-plugin ares-human-jailbreak
Start simple - test one attack method against your model:
- Use known jailbreak prompts
- Check responses for harmful content patterns
- Get clear pass/fail results
Example 2: Multiple Attack Methods
Compare strategies - test multiple attacks simultaneously:
- 3 different attack methods (crescendo, human jailbreaks, encoding)
- 2 evaluation methods (keyword matching, LLM judge)
- One unified report showing which attacks work best
🎯 Which Plugin Should I Use?
Choose based on your testing goal:
| Your Goal | Recommended Plugins | Why |
|---|---|---|
| Test jailbreak resistance | human_jailbreak, crescendo |
Known effective jailbreaks + multi-turn attacks |
| Test data leakage | direct_requests + inject_base64 + keyword |
Direct extraction attempts with and without encoding + pattern detection |
| Test encoding bypasses | encoding (base64, ROT13, etc.) |
Common obfuscation techniques |
| Benchmark novel attack | Create custom plugin | Compare against baselines |
| Test guardrail effectiveness | Any strategy + your guardrail | See what gets through |
📦 Available Built-in Plugins
🔽 Click to see all 19 public plugins
Core Strategies (Built-in):
direct_requests- Simple harmful promptsmulti_turn- Multi-turn conversation attacks (implement your, but make it compatible to ARES pipeline)
Plugin Attack Strategies:
ares-echo-chamber- Multi-turn attackares-gcg- Greedy Coordinate Gradient attacksares-tap- Tree of Attacks with Pruningares-human-jailbreak- Known jailbreak prompts from researchares-autodan- Automated jailbreak generationares-garak- Garak vulnerability scanner integrationares-pyrit- PyRIT attack framework integrationares-dynamic-llm- LLM-generated adaptive attacks
Core Evaluators (Built-in):
keyword- Pattern matching for harmful contentllm_eval- LLM-as-judge scoringhuggingface_eval- HuggingFace model-based evaluation
Plugin Evaluators:
ares-cyberseceval- Security-specific evaluations & goalsares-intrinsics- Intrinsic evaluation
Core Connectors (Built-in):
huggingface- Local HuggingFace modelswatsonx- IBM WatsonX modelsrestful- Generic REST API connector
Plugin Connectors:
ares-litellm- Universal LLM proxy (OpenAI, Anthropic, etc.)ares-granite-io- IBM Granite models via Ollamaares-vllm-connector- vLLM inference serverares-watsonx-orchestrate- WatsonX Orchestrate agentsares-lora-adapter-connector- LoRA adapter supportares-mcp-connector- Model Context Protocol connectorares-icarus-connector- Icarus platform integration
Goal Plugins:
ares-cyberseceval- CyberSecEval security test goalsares-deepteam- Deep team-based goals generation
🎯 What's Next? Test against industry-standard vulnerabilities that matter to stakeholders → OWASP Security Testing
🛡️ OWASP Security Testing
🟡 Complexity: Intermediate | Industry-standard vulnerability testing
Is your AI vulnerable to the top 10 security risks? ARES maps directly to the OWASP Top 10 for LLM Applications, making it easy to test for industry-recognized vulnerabilities.
Why OWASP Matters
The OWASP Top 10 represents the most critical security risks for LLM applications, identified by security experts worldwide. Testing against these vulnerabilities helps you:
- ✅ Meet security compliance requirements
- ✅ Identify critical risks before deployment
- ✅ Communicate security posture to stakeholders
- ✅ Prioritize security improvements
⚠️ Real Impact: Companies have found critical vulnerabilities (PII leakage, prompt injection) in production systems using OWASP testing. Don't wait for users to find them first.
Quick OWASP Test
Test your model against a specific OWASP vulnerability. Each intent must be tested separately:
# owasp-llm-01-test.yaml
target:
huggingface:
model_config:
pretrained_model_name_or_path: Qwen/Qwen2-0.5B-Instruct
tokenizer_config:
pretrained_model_name_or_path: Qwen/Qwen2-0.5B-Instruct
red-teaming:
intent: owasp-llm-01:2025 # Prompt Injection
prompts: assets/safety_behaviors_text_subset.csv
To test multiple OWASP categories: Run separate tests for each intent (owasp-llm-01:2025, owasp-llm-02:2025, etc.)
🎯 Top 3 Critical Vulnerabilities to Test First
Start with these high-impact vulnerabilities:
-
LLM01: Prompt Injection - Can attackers override your system instructions?
- Intent:
owasp-llm-01:2025 - Example Notebook
- Intent:
-
LLM02: Sensitive Information Disclosure - Does your model leak secrets?
- Intent:
owasp-llm-02:2025 - Contact us for examples
- Intent:
-
LLM09: Misinformation - Can attackers make your model hallucinate?
- Intent:
owasp-llm-09:2025 - Example Notebook
- Intent:
OWASP Mapping Table
📜 Complete OWASP to ARES Mapping (Click to expand)
| Code | Title | What It Tests | ARES Intent | Status | Example |
|---|---|---|---|---|---|
| LLM01 | Prompt Injection | Can prompts override intended behavior or security policies? | owasp-llm-01:2025 |
✅ Supported | Notebook |
| LLM02 | Sensitive Information Disclosure | Does the system leak secrets (API keys, PII) through responses? | owasp-llm-02:2025 |
✅ Supported | Contact us |
| LLM03 | Supply Chain | Are dependencies and model artifacts validated for integrity? | owasp-llm-03:2025 |
⚠️ Not supported | - |
| LLM04 | Data and Model Poisoning | Can external inputs corrupt training data or retrieval (RAG)? | owasp-llm-04:2025 |
✅ Supported | WIP |
| LLM05 | Improper Output Handling | Are outputs unsafe (injected prompts, broken deps, malformed code)? | owasp-llm-05:2025 |
✅ Supported | WIP |
| LLM06 | Excessive Agency | Can the agent use tools beyond intended scope or be hijacked? | owasp-llm-06:2025 |
✅ Supported | WIP |
| LLM07 | System Prompt Leakage | Are system-level instructions or sensitive context exposed? | owasp-llm-07:2025 |
✅ Supported | WIP |
| LLM08 | Vector and Embedding Weaknesses | Is sensitive data leaked via embeddings or retrieval vectors? | owasp-llm-08:2025 |
⚠️ See LLM02 | - |
| LLM09 | Misinformation | Is the model resilient against hallucinations or malicious content? | owasp-llm-09:2025 |
✅ Supported | Notebook |
| LLM10 | Unbounded Consumption | Does the agent prevent resource exhaustion (DoS attacks)? | owasp-llm-10:2025 |
✅ Supported | WIP |
🎯 What's Next? Ready to extend ARES with your own tools? Explore advanced customization → ADVANCED.md
🔧 Advanced Topics
Ready to extend ARES? Check out our Advanced Guide for:
- 🔌 Creating Custom Plugins - Build your own attack strategies, evaluators, and connectors
- ⚙️ Advanced Configuration - Fine-tune ARES behavior and model settings
- 📚 Plugin Development Resources - Templates, examples, and guides
Quick links:
- Plugin Template - Copy-paste starting point
- Plugin Examples - Real-world configurations
- Full Documentation - Detailed guides
🤝 Community & Support
Get Help
- 📖 Documentation - Comprehensive guides
- 💬 GitHub Discussions - Ask questions
- 🐛 Issue Tracker - Report bugs
- 📧 Email - Direct support
Contribute
We welcome contributions! Here's how to get started:
- Report Issues: Found a bug? Open an issue
- Share Plugins: Created a useful plugin? Submit a PR
- Improve Docs: Help us make documentation better
- Share Examples: Add your use cases to inspire others
Stay Updated
- ⭐ Star the repo to stay notified
- 📣 Follow releases for new features
- 🎓 Check out new example notebooks
Feedback Welcome
📣 Try ARES and share your feedback! We're constantly improving based on user input.
📚 Additional Resources
Example Configurations
The example_configs/ directory contains ready-to-use configurations:
- Basic Examples:
minimal.yaml,strategies.yaml,evaluators.yaml - OWASP Tests:
owasp/directory - Plugin Examples:
plugins/directory with 15+ plugin configs - Custom Scenarios:
custom/directory with advanced use cases
Jupyter Notebooks
Interactive tutorials in the notebooks/ directory:
- Red Teaming with ARES - Complete walkthrough
- OWASP Testing - Vulnerability-specific guides
- Plugin Development - Create your own plugins
- Multi-Agent Attacks - Advanced scenarios
Research Papers
ARES is built on cutting-edge research:
- Crescendo Attack - Multi-turn jailbreaking
- GCG Attack - Gradient-based adversarial suffixes
- TAP Attack - Tree of attacks with pruning
IBM ❤️ Open Source AI
ARES has been brought to you by IBM Research. We believe in open, transparent, and secure AI development.
License: Apache 2.0
Citation:
@software{ares2025,
title={ARES: AI Robustness Evaluation System},
author={Liubov Nedoshivina and
Kieran Fraser and
Mark Purcell and
Ambrish Rawat and
Giulio Zizzo and
Muhammad Zaid Hameed and
Stefano Braghin and
Anisa Halimi and
Cristian Morasso and
Ibrahim Malik and
Naoise Holohan and
Giandomenico Cornacchia},
organization={IBM Research},
year={2025},
url={https://github.com/IBM/ares}
}
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file ares_redteamer-0.2.1.tar.gz.
File metadata
- Download URL: ares_redteamer-0.2.1.tar.gz
- Upload date:
- Size: 7.1 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.11.15 {"installer":{"name":"uv","version":"0.11.15","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
7888ee84dbf36379b024686f84d9d66a1d75c158cd5077eb947f2f341f6d9e75
|
|
| MD5 |
1235106dfdf0ff2558fb3f8d15e3be4d
|
|
| BLAKE2b-256 |
0f8a8f4f291cdff95210e125539fd72fce86426da173db177ab29f269de250ab
|
File details
Details for the file ares_redteamer-0.2.1-py3-none-any.whl.
File metadata
- Download URL: ares_redteamer-0.2.1-py3-none-any.whl
- Upload date:
- Size: 125.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.11.15 {"installer":{"name":"uv","version":"0.11.15","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
53f9420bc49ac9fc1d0568cc19283f91374e249cf601dfa697bb96112d107b05
|
|
| MD5 |
29ef037a2e297f11020c1f741c107d03
|
|
| BLAKE2b-256 |
8266f444950d0e82afc22d7b6f272946c98de8d1299ce190df9193d434965c38
|