AI vs AI adversarial security testing platform. Red team agents attack, blue team agents defend. Fully automated.

These details have not been verified by PyPI

Project links

Project description

RedTeam Arena

AI vs AI adversarial security testing for your codebase.

Red team agents attack. Blue team agents defend. Fully automated.

Why?

Proactive, not reactive — AI agents that think like attackers probe your code 24/7, instead of waiting for the next CVE.
Attack + defense in one run — every vulnerability found gets an immediate mitigation proposal, so you ship fixes, not just findings.
Enterprise-grade auditing — go beyond simple exploits with specialized compliance agents for SOC 2, ISO 27001, HIPAA, FedRAMP, and more.
Zero setup — point it at a directory, pick a scenario, get a report. No config files, no infrastructure, no learning curve.
Multi-provider — works with Claude, OpenAI, Gemini, and local Ollama models out of the box.
45+ built-in scenarios — OWASP Top 10, AI-specific attacks, APT attack chains, and high-end enterprise compliance frameworks.

Quick Start

# Install
pip install redteam-arena

# Set your API key (Claude by default)
export ANTHROPIC_API_KEY=sk-ant-...

# Run a battle
redteam-arena battle ./my-project --scenario sql-injection

What you'll see

  REDTEAM ARENA v0.0.4
  Scenario: sql-injection | Target: ./my-project
  ==================================================

  Round 1/5
  ----------------------------------------

  RED AGENT (Attacker):
  ...streaming analysis...

  BLUE AGENT (Defender):
  ...streaming mitigations...

  Round 1: 3 finding(s), 3 mitigation(s)

  ==================================================
  Battle Report Summary
  ==================================================

  Rounds: 5  |  Vulnerabilities: 8
   Critical: 2   |  High: 3  |  Medium: 2  |  Low: 1
  Mitigations proposed: 7/8 (88%)

  Full report: ./reports/battle-abc123.md
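The summary line is plain arithmetic over the findings. The severity counts add up to the vulnerability total, and mitigation coverage is mitigated / total rounded to a whole percent (7/8 = 87.5%, shown as 88%). The sketch below reproduces the sample figures. `summarize` is a hypothetical helper for illustration, not part of RedTeam Arena.

```python
from collections import Counter

SEVERITIES = ("critical", "high", "medium", "low")

def summarize(severities: list[str], mitigated: int) -> str:
    """Build a one-line summary in the spirit of the sample report (hypothetical helper)."""
    counts = Counter(severities)
    total = len(severities)
    # Share of findings that received a mitigation proposal; ".0%" rounds to a whole percent.
    coverage = mitigated / total if total else 0.0
    by_severity = " | ".join(f"{s.capitalize()}: {counts[s]}" for s in SEVERITIES)
    return (f"Vulnerabilities: {total} ({by_severity}); "
            f"mitigations proposed: {mitigated}/{total} ({coverage:.0%})")

sample = ["critical"] * 2 + ["high"] * 3 + ["medium"] * 2 + ["low"]
print(summarize(sample, mitigated=7))
# Vulnerabilities: 8 (Critical: 2 | High: 3 | Medium: 2 | Low: 1); mitigations proposed: 7/8 (88%)
```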

CLI Reference

`redteam-arena battle <directory>`

Run a security battle against a target codebase.

Option	Description	Default
`-s, --scenario <name>`	Scenario to run (required)	—
`-r, --rounds <n>`	Number of battle rounds	`5`
`-p, --provider <name>`	LLM provider: `claude`, `openai`, `gemini`, `ollama`	auto-detect
`-m, --model <name>`	Specific model to use	provider default
`-f, --format <fmt>`	Report format: `markdown`, `json`, `sarif`, `html`, `compliance`	`markdown`
`-o, --output <path>`	Output file path	auto-generated
`--agent-mode <mode>`	Agent focus: `attacker` (default) or `auditor`	`attacker`
`--mock-llm`	Use a mock LLM for testing/demo (fast & free)	`false`
`--diff`	Only scan changed files (git diff)	`false`
`--auto-fix`	Generate fix suggestions as a branch	`false`
`--fail-on <sev>`	Exit non-zero if severity found: `critical`, `high`, `medium`, `low`	—
`--analyze`	Run advanced cross-cutting analysis	`false`
`--pr-comment`	Post results as a GitHub PR comment	`false`
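`--fail-on` is what turns findings into a CI pass/fail signal. A minimal sketch of that gating logic follows, assuming severities are ordered `low < medium < high < critical` and that a threshold also trips on anything more severe than itself. The function name and the exit code 1 are illustrative assumptions, not RedTeam Arena's actual implementation.

```python
from typing import Optional

# Assumed ordering, least to most severe.
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}

def exit_code_for(findings: list[str], fail_on: Optional[str]) -> int:
    """Return 1 if any finding is at or above the --fail-on threshold, else 0."""
    if fail_on is None:
        return 0  # no threshold given: findings never fail the build
    threshold = SEVERITY_RANK[fail_on]
    return 1 if any(SEVERITY_RANK[s] >= threshold for s in findings) else 0

found = ["medium", "high", "low"]
print(exit_code_for(found, "critical"))  # 0: nothing critical was found
print(exit_code_for(found, "high"))      # 1: the high finding trips the gate
```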

# 3 rounds of XSS testing
redteam-arena battle ./webapp --scenario xss --rounds 3

# Full audit (runs all scenarios sequentially)
redteam-arena battle ./api --scenario full-audit

# CI mode: fail the build on critical findings
redteam-arena battle ./src --scenario sql-injection --fail-on critical

# Scan only changed files
redteam-arena battle ./src --scenario secrets-exposure --diff

# Use OpenAI instead of Claude
redteam-arena battle ./src --scenario xss --provider openai --model gpt-4o

# Run a HIPAA/HITECH healthcare compliance audit
redteam-arena battle ./medical-app --scenario hipaa-hitech-readiness --agent-mode auditor --format compliance

# SARIF output for GitHub Code Scanning
redteam-arena battle ./src --scenario full-audit --format sarif -o results.sarif.json

`redteam-arena list`

List all available scenarios.

redteam-arena list
redteam-arena list --tag owasp
redteam-arena list --tag ai-safety

`redteam-arena watch <directory>`

Watch a directory for file changes and re-scan automatically.

redteam-arena watch ./src --scenario xss
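One simple way to build a watch loop like this is to poll file modification times and re-run the scan when they change. The sketch below shows that polling approach with the standard library only; it is an assumption for illustration, not how `redteam-arena watch` is actually implemented (production watchers often use OS change-notification APIs instead of polling).

```python
import time
from pathlib import Path
from typing import Callable

def snapshot(root: str) -> dict[str, float]:
    """Map every file under root to its last-modified time."""
    return {str(p): p.stat().st_mtime for p in Path(root).rglob("*") if p.is_file()}

def changed_files(before: dict[str, float], after: dict[str, float]) -> set[str]:
    """Paths that were added, modified, or deleted between two snapshots."""
    added_or_modified = {path for path, mtime in after.items() if before.get(path) != mtime}
    deleted = set(before) - set(after)
    return added_or_modified | deleted

def watch(root: str, on_change: Callable[[set[str]], None], interval: float = 1.0) -> None:
    """Poll forever, calling on_change (for example, a re-scan) whenever files change."""
    previous = snapshot(root)
    while True:
        time.sleep(interval)
        current = snapshot(root)
        changes = changed_files(previous, current)
        if changes:
            on_change(changes)
        previous = current
```

Passing the set of changed paths to the callback leaves room for a `--diff`-style rescan that only looks at what actually changed.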

`redteam-arena benchmark`

Run detection accuracy benchmarks to measure your provider's effectiveness.

redteam-arena benchmark --suite owasp-web-basic
redteam-arena benchmark --list-suites
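Detection accuracy against a labeled suite is commonly reported as precision (how many reported issues are real) and recall (how many real issues were reported). The sketch below shows that arithmetic, assuming each planted vulnerability is identified by a `(file, line)` pair. RedTeam Arena's actual suite format and scoring are not documented here, and the file names are made up.

```python
def score(expected: set[tuple[str, int]], reported: set[tuple[str, int]]) -> dict[str, float]:
    """Precision, recall and F1 for reported vulnerability locations vs. a labeled answer key."""
    true_positives = len(expected & reported)
    precision = true_positives / len(reported) if reported else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

# Hypothetical answer key: four planted bugs, identified by (file, line).
expected = {("db.py", 42), ("views.py", 17), ("auth.py", 88), ("api.py", 5)}
# The agent found three of them plus two false alarms.
reported = {("db.py", 42), ("views.py", 17), ("auth.py", 88), ("utils.py", 3), ("api.py", 60)}
print({k: round(v, 3) for k, v in score(expected, reported).items()})
# {'precision': 0.6, 'recall': 0.75, 'f1': 0.667}
```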

`redteam-arena history`

View past battle results, trends, and regression analysis.

redteam-arena history
redteam-arena history --trends
redteam-arena history --regression --target ./src
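Regression analysis comes down to comparing two runs against the same target. A minimal sketch, assuming a finding can be keyed by `(scenario, file, title)`; the `Finding` type and the sample data are hypothetical, since the format of RedTeam Arena's history store is not documented here.

```python
from typing import NamedTuple

class Finding(NamedTuple):
    scenario: str
    file: str
    title: str

def compare_runs(previous: set[Finding], current: set[Finding]) -> dict[str, set[Finding]]:
    """Classify findings from two runs against the same target."""
    return {
        "new": current - previous,         # regressions: not present in the earlier run
        "fixed": previous - current,       # resolved since the earlier run
        "persisting": current & previous,  # still open in both runs
    }

last_week = {Finding("xss", "views.py", "Unescaped template variable"),
             Finding("sql-injection", "db.py", "String-built query")}
today = {Finding("sql-injection", "db.py", "String-built query"),
         Finding("secrets-exposure", "settings.py", "Hardcoded API key")}
result = compare_runs(last_week, today)
print(sorted(f.title for f in result["new"]))  # ['Hardcoded API key']
```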

`redteam-arena dashboard`

Generate a rich HTML dashboard of battle history.

redteam-arena dashboard --open-browser

`redteam-arena serve`

Start the REST + WebSocket API server.

pip install "redteam-arena[server]"
redteam-arena serve --port 3000

Built-in Scenarios

Web Security (OWASP Top 10)

Scenario	Description
`sql-injection`	Find SQL injection vectors in database queries
`xss`	Detect cross-site scripting vulnerabilities
`auth-bypass`	Find authentication and authorization flaws
`secrets-exposure`	Detect hardcoded secrets and leaked credentials
`path-traversal`	Find directory traversal and path injection issues
`ssrf`	Server-side request forgery vulnerabilities
`injection`	General injection flaws (command, LDAP, XML)
`broken-access-control`	Missing or broken authorization checks
`crypto-failures`	Weak cryptography and insecure data handling
`security-misconfiguration`	Misconfigured servers, defaults, and permissions
`insecure-deserialization`	Unsafe object deserialization
`vulnerable-dependencies`	Known-vulnerable third-party packages
`sensitive-disclosure`	Excessive error messages and data leakage

AI & Agent Safety

Scenario	Description
`prompt-injection`	Detect prompt injection vulnerabilities in LLM apps
`data-poisoning`	Find training and RAG data poisoning risks
`agent-goal-hijack`	Test agentic systems for goal hijacking
`excessive-agency`	Identify over-privileged AI agents
`system-prompt-leakage`	Find system prompt extraction vectors
`memory-poisoning`	Detect persistent memory corruption attacks
`tool-misuse`	Identify unsafe tool usage in agent chains
`rogue-agents`	Test for unauthorized agent behavior
`llm-misinformation`	Detect hallucination-based security risks
`insecure-inter-agent-comms`	Unsecured agent-to-agent communication
`agentic-supply-chain`	Supply chain risks in agent pipelines
`llm-supply-chain`	Compromised models and unsafe fine-tuning
`human-agent-trust`	Failures in human-AI trust boundaries
`identity-privilege-abuse`	Identity spoofing and privilege escalation in agents
`improper-output-handling`	Unsafe handling of LLM-generated output
`unexpected-code-execution`	Unintended code execution via LLM
`vector-embedding-weakness`	Vulnerabilities in embedding-based retrieval
`cascading-failures`	Failure propagation in multi-agent systems

Enterprise & Compliance

Scenario	Description
`apt-advanced-persistent-threat`	Simulates an ALPHV/BlackCat-style attack chain (living off the land, MFA bypass, lateral movement)
`infrastructure-as-code`	Audits Terraform, Kubernetes, and Docker for cloud misconfigurations
`iso-42001-ai-compliance`	Audit for ISO/IEC 42001 Artificial Intelligence Management System
`soc2-security-privacy`	Audit for SOC 2 Trust Services Criteria (Security, Privacy, Logging)
`fedramp-readiness`	Audit for FedRAMP / NIST 800-53 technical control readiness
`hipaa-hitech-readiness`	US Healthcare compliance audit (ePHI security and audit logs)
`hitrust-csf-compliance`	High-end healthcare framework audit (DLP, device trust)
`epcs-dea-compliance`	DEA regulations for Electronic Prescriptions for Controlled Substances
`pci-dss-readiness`	Payment Card Industry (PCI DSS v4.0) readiness audit
`iso-27001-infosec`	Audit for ISO/IEC 27001 Information Security Management System
`cmmc-dod-readiness`	Audit for CMMC Level 2 (DoD) security requirements
`gdpr-ccpa-privacy`	Audit for global privacy laws (GDPR/CCPA/ISO 27701)

How It Works

Read: RedTeam Arena reads the source files in your target directory.
Attack: The Red Agent analyzes the code for vulnerabilities based on the chosen scenario, producing structured findings with severity, file location, and attack vectors.
Defend: The Blue Agent reviews each finding and proposes concrete mitigations with code fixes and confidence levels.
Report: A report is generated (Markdown, JSON, SARIF, HTML, or compliance format) with all findings, mitigations, and a severity summary.

Each battle runs multiple rounds. In each round, the Red Agent digs deeper based on previous findings, and the Blue Agent refines its defenses.
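The round structure can be pictured as a loop in which each agent sees what came before. The sketch below uses stand-in agents that return canned strings instead of calling an LLM; `Round`, `StubRedAgent`, `StubBlueAgent`, `run_battle` and the way history is passed between rounds are all hypothetical, chosen for illustration rather than taken from RedTeam Arena's internals.

```python
from dataclasses import dataclass

@dataclass
class Round:
    number: int
    findings: list[str]
    mitigations: list[str]

class StubRedAgent:
    """Stand-in attacker: reports one not-yet-seen issue per round."""
    CANDIDATES = ["sqli in query()", "sqli in search()", "sqli in report()"]

    def attack(self, code: str, history: list[Round]) -> list[str]:
        # A real Red Agent would prompt an LLM with the code plus earlier findings.
        seen = {f for r in history for f in r.findings}
        return [c for c in self.CANDIDATES if c not in seen][:1]

class StubBlueAgent:
    """Stand-in defender: proposes one mitigation per finding."""
    def defend(self, findings: list[str]) -> list[str]:
        return [f"parameterize query for: {f}" for f in findings]

def run_battle(code: str, red: StubRedAgent, blue: StubBlueAgent, rounds: int) -> list[Round]:
    history: list[Round] = []
    for n in range(1, rounds + 1):
        findings = red.attack(code, history)  # Red builds on previous rounds
        mitigations = blue.defend(findings)   # Blue responds to the new findings
        history.append(Round(n, findings, mitigations))
    return history

for r in run_battle("", StubRedAgent(), StubBlueAgent(), rounds=4):
    print(f"Round {r.number}: {len(r.findings)} finding(s), {len(r.mitigations)} mitigation(s)")
```

With these stubs, rounds 1 to 3 each surface one new issue and round 4 finds nothing new, which mirrors how later rounds are meant to dig past what earlier rounds already reported.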

Programmatic API

RedTeam Arena's core is fully importable for use in your own tools:

import asyncio
from redteam_arena import (
    BattleEngine,
    BattleEngineOptions,
    BattleConfig,
    RedAgent,
    BlueAgent,
    ClaudeAdapter,
    load_scenario,
    generate_report,
)

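async def main():
    # LLM provider shared by both agents (Claude is the default provider)
    provider = ClaudeAdapter()

    # load_scenario returns a result object: check .ok before using .value
    scenario_result = await load_scenario("sql-injection")
    if not scenario_result.ok:
        raise RuntimeError("Scenario 'sql-injection' not found")

    config = BattleConfig(
        target_dir="./my-project",
        scenario=scenario_result.value,
        rounds=3,
    )

    # The engine runs Red (attack) and Blue (defend) turns for each round
    engine = BattleEngine(BattleEngineOptions(
        red_agent=RedAgent(provider),
        blue_agent=BlueAgent(provider),
        config=config,
    ))

    battle = await engine.run()
    report = generate_report(battle)
    print(report)

if __name__ == "__main__":
    asyncio.run(main())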

Configuration File

Optionally, add a `.redteamarena.yml` file to your project root:

# .redteamarena.yml
provider: claude         # claude | openai | gemini | ollama
model: claude-sonnet-4-20250514
rounds: 5
format: markdown         # markdown | json | sarif | html
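When the config file and CLI flags set the same option, a common convention is that explicit flags win, then the file, then built-in defaults. The sketch below assumes that precedence (this README does not state it) and uses plain dicts in place of a parsed YAML file; the defaults come from the CLI reference table, and `resolve_settings` is a hypothetical name.

```python
from typing import Any, Optional

# Defaults taken from the CLI reference table; None means "let the tool decide".
DEFAULTS: dict[str, Any] = {"provider": None, "model": None, "rounds": 5, "format": "markdown"}

def resolve_settings(file_cfg: dict[str, Any], cli_args: dict[str, Optional[Any]]) -> dict[str, Any]:
    """Merge settings with assumed precedence: CLI flags > config file > defaults."""
    settings = dict(DEFAULTS)
    # Keep only recognized keys from the (already parsed) YAML file.
    settings.update({k: v for k, v in file_cfg.items() if k in DEFAULTS})
    # Flags left unset on the command line arrive as None and do not override.
    settings.update({k: v for k, v in cli_args.items() if v is not None})
    return settings

file_cfg = {"provider": "claude", "model": "claude-sonnet-4-20250514", "rounds": 5, "format": "markdown"}
cli_args = {"provider": None, "rounds": 3, "format": "sarif"}
print(resolve_settings(file_cfg, cli_args))
# {'provider': 'claude', 'model': 'claude-sonnet-4-20250514', 'rounds': 3, 'format': 'sarif'}
```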

CI/CD Integration

GitHub Actions

- name: RedTeam Arena Security Scan
  run: |
    pip install redteam-arena
    redteam-arena battle ./src --scenario full-audit --fail-on high --format sarif -o security.sarif.json
  env:
    ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}

- name: Upload SARIF results
  if: always()  # still upload findings when --fail-on has failed the previous step
  uses: github/codeql-action/upload-sarif@v3
  with:
    sarif_file: security.sarif.json
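Because the output is standard SARIF, it can also be inspected outside GitHub. The sketch below tallies results by level using only fields defined by SARIF 2.1.0 (`runs[].results[].level`), treating a missing level as `warning`, which is the spec's default when no rule configuration overrides it. Which levels and rule IDs RedTeam Arena actually emits is not documented here, and `count_sarif_levels` is a hypothetical helper.

```python
import json
from collections import Counter
from pathlib import Path

def count_sarif_levels(path: str) -> Counter:
    """Tally SARIF results by level across all runs in the log file."""
    log = json.loads(Path(path).read_text(encoding="utf-8"))
    levels: Counter = Counter()
    for run in log.get("runs", []):
        for result in run.get("results", []):
            # A missing level defaults to "warning" (absent rule-level overrides).
            levels[result.get("level", "warning")] += 1
    return levels
```

For example, `count_sarif_levels("security.sarif.json")` after the scan step returns a `Counter` keyed by `error`, `warning`, `note` or `none`.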

Requirements

Python >= 3.10
API key for your chosen provider:
- Claude (default): ANTHROPIC_API_KEY, available from the Anthropic Console
- OpenAI: OPENAI_API_KEY
- Gemini: GEMINI_API_KEY
- Ollama: no key needed (runs locally)
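The `--provider` option defaults to auto-detect. One plausible reading, sketched below, is to pick the first provider whose API key is set and otherwise fall back to local Ollama, which needs no key. The priority order (Claude first, matching its role as the default) and the fallback are assumptions, not documented behavior; the environment variable names are the ones listed above.

```python
import os
from typing import Mapping, Optional

# Assumed priority order; variable names as listed under Requirements.
PROVIDER_KEYS = [
    ("claude", "ANTHROPIC_API_KEY"),
    ("openai", "OPENAI_API_KEY"),
    ("gemini", "GEMINI_API_KEY"),
]

def detect_provider(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the first provider with a non-empty API key, else 'ollama'."""
    env = os.environ if env is None else env
    for provider, var in PROVIDER_KEYS:
        if env.get(var):
            return provider
    return "ollama"  # no key required; assumes a local Ollama server is running
```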

Contributing

See CONTRIBUTING.md for development setup, project structure, and how to add new scenarios.

Security

To report a security vulnerability in RedTeam Arena itself, see SECURITY.md.

License

MIT — Muhammad Dilawar Shafiq (Dilawar Gopang)

If RedTeam Arena is useful to you, please star it — it helps others find it.

Project details

Release history

- 0.0.4 (this version): Feb 28, 2026
- 0.0.3: Feb 28, 2026
- 0.0.2: Feb 28, 2026
- 0.0.1: Feb 27, 2026

Download files

Source Distribution

redteam_arena-0.0.4.tar.gz (118.4 kB)

Uploaded Feb 28, 2026 Source

Built Distribution

redteam_arena-0.0.4-py3-none-any.whl (157.4 kB)

Uploaded Feb 28, 2026 Python 3

File details

Details for the file redteam_arena-0.0.4.tar.gz.

File metadata

Download URL: redteam_arena-0.0.4.tar.gz
Upload date: Feb 28, 2026
Size: 118.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for redteam_arena-0.0.4.tar.gz
Algorithm	Hash digest
SHA256	`5b04418e7fd83367f1f82744f73bb45947f44cf06aef5a1b9bd73a293726d3ec`
MD5	`bcd3ad57b88a439845243e69dad628f1`
BLAKE2b-256	`b7250e64470a74ac3663f02b42ed377ce2b4e31a75b02af3b79cb16193e3cc5d`

File details

Details for the file redteam_arena-0.0.4-py3-none-any.whl.

File metadata

Download URL: redteam_arena-0.0.4-py3-none-any.whl
Upload date: Feb 28, 2026
Size: 157.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for redteam_arena-0.0.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`058c33f40c80bfb23d13018f83e77d0cfb3bd5d2788028f439d9a6e9fc1c78de`
MD5	`02d9e56c9e9581410737f3929430a468`
BLAKE2b-256	`401e42c79dcd9a13a44fed002a77b64653d94033cc831fa2f6d0401b5fc20438`
