Automated Red Teaming Suite for AI Agents - 'Burp Suite for AI'
Project description
Veritas
Automated Red Teaming Suite for AI Agents
"Burp Suite for AI Agents"
Features • Installation • Quick Start • Documentation • Contributing
What is Veritas?
Veritas is the first open-source automated red-teaming suite for agentic AI systems. It stress-tests safety, memory integrity, and tool-use reliability of any agent — LLM-based or symbolic — inside a controlled sandbox.
It answers the question every AI lab worries about: "Is this agent safe to deploy?"
| Tool | Domain |
|---|---|
| Burp Suite | Web App Security |
| Jest | Software Testing |
| Snyk | Dependency Vulnerabilities |
| Veritas | AI Agent Failures |
Features
Attack Modules (10)
- Jailbreak - Bypass safety guidelines (DAN, roleplay, etc.)
- Prompt Injection - Override system instructions
- Tool Abuse - Misuse available tools (shell, HTTP, files)
- Memory Poisoning - Corrupt agent context/memory
- Goal Hijacking - Redirect agent objectives
- Context Override - Overwrite system prompts
- Data Exfiltration - Extract sensitive information
- Denial of Service - Resource exhaustion attacks
- Privilege Escalation - Gain unauthorized access
- Multi-Turn Manipulation - Gradual boundary erosion
Secure Sandbox
- Docker-isolated execution environment
- Network disabled (prevent data exfiltration)
- Memory limits (prevent resource bombs)
- Timeout enforcement
- Full execution logging
Defense Engine
- Policy Engine: Symbolic rules for tool safety
- Veritas-Nano Classifier: Fast attack detection
- Contract Rules: Block dangerous commands, file ops, network calls
Professional Reports
- PDF vulnerability assessment
- JSON machine-readable output
- Risk scoring (Critical/High/Medium/Low)
- Actionable remediation recommendations
Installation
Quick Install
pip install veritas-redteam
Full Install (with all features)
pip install veritas-redteam[full]
Development Install
git clone https://github.com/ARYAN2302/veritas.git
cd veritas
pip install -e ".[dev,full]"
Requirements
- Python 3.10+
- Docker (for sandbox features)
Quick Start
Web Dashboard
# Launch interactive dashboard
streamlit run src/dashboard/app.py
The dashboard provides:
- Real-time prompt analysis
- Attack type probability distribution
- Token attribution heatmap
- Defense recommendations
CLI Usage
# Run full security scan
veritas scan
# Run specific attacks only
veritas scan --attacks jailbreak injection tool_abuse
# Export PDF report
veritas scan --output report.pdf
# Quick 1-page report
python -m src.reporter.quick_report --prompt "Your prompt" -o report.pdf
# CI mode (exit code based on risk level)
veritas scan --ci --fail-on critical
Python SDK
from veritas import Auditor
# Quick scan
auditor = Auditor(your_agent)
result = auditor.scan()
print(result.summary())
# Custom attack selection
result = auditor.scan(attacks=["jailbreak", "prompt_injection"])
# Export report
result.export_pdf("security_report.pdf")
Scan Your Own Agent
from src.core.target import AgentTarget
class MyAgent(AgentTarget):
def __init__(self):
self.name = "My Custom Agent"
def invoke(self, prompt: str) -> str:
# Your agent logic here
return your_llm.generate(prompt)
# Run Veritas against your agent
from veritas import scan
results = scan(MyAgent())
Architecture
┌─────────────────────────┐
│ VERITAS UI │
│ Dashboard + PDF Engine │
└───────────┬─────────────┘
│
┌──────────────────────────────────┼──────────────────────────────────┐
│ │ │
│ VERITAS BACKEND (Core Engine) │
│ │
│ ┌─────────────────────────────┬─────────────────────────────────┐ │
│ │ Attack Engine (Red Team) │ Agent Sandbox │ │
│ │─────────────────────────────│─────────────────────────────────│ │
│ │ • Jailbreak generator │ • Docker isolated runtime │ │
│ │ • Tool abuse generator │ • Tool-call interceptor │ │
│ │ • Memory poisoning injector │ • Memory write logger │ │
│ │ • Goal hijack prompts │ • HTTP/File system monitors │ │
│ └─────────────────────────────┴─────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────┐│
│ │ Defense Engine (Veritas-Nano Model + Symbolic Rules) ││
│ │ • Heuristic classifier for attack detection ││
│ │ • Symbolic "contract rules" for tool safety ││
│ │ • Policy engine for blocking/re-routing agent plans ││
│ └─────────────────────────────────────────────────────────────────┘│
└─────────────────────────────────────────────────────────────────────┘
Risk Scoring
| Level | Score | Action |
|---|---|---|
| Critical | 75-100 | Block deployment |
| High | 50-74 | Requires mitigation |
| Medium | 25-49 | Review recommended |
| Low | 1-24 | Acceptable risk |
| Safe | 0 | All tests passed |
CI/CD Integration
GitHub Actions
name: AI Safety Scan
on: [push, pull_request]
jobs:
security:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: '3.11'
- name: Install Veritas
run: pip install veritas-redteam
- name: Run Security Scan
run: veritas scan --ci --fail-on critical
env:
GROQ_API_KEY: ${{ secrets.GROQ_API_KEY }}
- name: Upload Report
uses: actions/upload-artifact@v4
with:
name: security-report
path: veritas_report.json
Extending Veritas
Custom Attacks
from src.core.attack import BaseAttack, AttackResult
class MyCustomAttack(BaseAttack):
def __init__(self):
super().__init__("My Attack", "Description here")
self.severity = "high"
self.payloads = ["payload1", "payload2"]
def run(self, target) -> AttackResult:
for payload in self.payloads:
response = target.invoke(payload)
if self.is_vulnerable(response):
return AttackResult(success=True, ...)
return AttackResult(success=False, ...)
Custom Policy Rules
from src.defense.policy import PolicyRule, Action
no_crypto_mining = PolicyRule(
name="no_crypto_mining",
description="Blocks cryptocurrency mining attempts",
check=lambda ctx: "xmrig" in str(ctx).lower() or "monero" in str(ctx).lower(),
action=Action.BLOCK,
severity="critical"
)
Project Structure
veritas/
├── pyproject.toml # Package configuration
├── README.md # This file
├── veritas.py # Main entry point
├── src/
│ ├── attacks/ # 10 attack modules
│ │ ├── jailbreak.py
│ │ ├── injection.py
│ │ ├── tool_abuse.py
│ │ └── ...
│ ├── defense/ # Defense engine
│ │ ├── policy.py # Policy engine
│ │ ├── classifier.py # Veritas-Nano
│ │ └── rules.py # Default rules
│ ├── sandbox/ # Docker isolation
│ ├── core/ # Target + Scoring
│ ├── cli.py # CLI interface
│ └── reporter.py # PDF generation
├── tests/ # Test suite
├── benchmarks/ # Standard benchmarks
└── examples/ # Usage examples
Roadmap
- 10 attack modules with 280+ payloads
- Docker sandbox
- Policy engine
- PDF/JSON reports
- CLI tool
- Veritas-Nano ML classifier (90.7% detection on Gandalf benchmark)
- Universal adapters (OpenAI, Anthropic, Groq, Ollama)
- YAML configuration system
- Real-world benchmark validation
- Streamlit dashboard
- VS Code extension
Real-World Benchmarks
Veritas has been tested against industry-standard datasets:
Veritas-Nano Classifier
| Benchmark | Metric | Score |
|---|---|---|
| deepset/prompt-injections | F1 Score | 71.9% |
| Precision | 71.4% | |
| Recall | 72.4% | |
| Lakera/gandalf_ignore | Detection Rate | 90.7% (705/777) |
| JailbreakBench/JBB-Behaviors | F1 Score | 48.8%* |
| Curated Benign Set | False Positive Rate | 0% (0/70) |
*JailbreakBench tests harmful content requests, not prompt injection — different task domain.
End-to-End Defense (vs Llama-3.1-8B)
| Attack Category | Defense Rate | Notes |
|---|---|---|
| Jailbreak | 85% (17/20) | Strong |
| Prompt Injection | 75% (15/20) | Good |
| Memory Poisoning | 75% (15/20) | Good |
| Goal Hijacking | 85% (17/20) | Strong |
| Context Override | 83% (15/18) | Strong |
| Privilege Escalation | 60% (12/20) | Moderate |
| DoS | 55% (11/20) | Moderate |
| Data Exfiltration | 50% (10/20) | Needs work |
| Tool Abuse | 15% (3/20) | Weak |
| OVERALL | 64.6% (115/178) |
Honest Assessment: Veritas detects 90.7% of ignore-style jailbreaks (Gandalf benchmark) with 0% false positives on normal traffic. Tool abuse defense needs improvement — this is expected as tool calls depend heavily on application-specific policy rules.
Usage
from src.classifier import VeritasNanoInference, guardrail
# Load classifier
clf = VeritasNanoInference("models/veritas-nano")
# Classify text
result = clf.classify("Ignore all previous instructions...")
print(f"Is Attack: {result.is_attack}, Score: {result.score:.2%}")
# Use as guardrail decorator
@guardrail(clf, block_on_attack=True)
def process_input(text):
return agent.invoke(text)
Train Your Own
# Generate dataset from payloads
python -m src.classifier.dataset
# Train model
python -m src.classifier.train
# Evaluate
python -m src.classifier.cli evaluate -d data/classifier -m models/veritas-nano
Contributing
Contributions welcome! Please read our Contributing Guide.
# Setup dev environment
git clone https://github.com/ARYAN2302/veritas.git
cd veritas
pip install -e ".[dev]"
# Run tests
pytest
# Lint
ruff check .
License
MIT License - see LICENSE for details.
Acknowledgments
Built for the AI safety community. Inspired by:
- Garak - LLM vulnerability scanner
- PyRIT - Microsoft's red teaming toolkit
- Burp Suite - Web security testing
Built with ❤️ for safer AI agents
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file veritas_redteam-1.0.0.tar.gz.
File metadata
- Download URL: veritas_redteam-1.0.0.tar.gz
- Upload date:
- Size: 78.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.6
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
403423c96924c7230ce086ce939adf7213d3d49e381150cef10ee9416ea9ccd2
|
|
| MD5 |
f76246cd6f6952a452d6cbc11766d7c0
|
|
| BLAKE2b-256 |
eb6709edd0efb999f529a12a8025d1e355590fdc8583f1ce7d53f53ef28ac20e
|
File details
Details for the file veritas_redteam-1.0.0-py3-none-any.whl.
File metadata
- Download URL: veritas_redteam-1.0.0-py3-none-any.whl
- Upload date:
- Size: 86.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.6
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
2d3329420c16c17d5411786431d9630e62b8e7c958f810d018b99187251919ce
|
|
| MD5 |
d77bc236759a88946a4bf336f76955cd
|
|
| BLAKE2b-256 |
caff3589ba7d3d3fadef4ef62dbf84afc418a1e43942210b0fa9678032843eb2
|