
Veritas


Automated Red Teaming Suite for AI Agents

"Burp Suite for AI Agents"




What is Veritas?

Veritas is an open-source automated red-teaming suite for agentic AI systems. It stress-tests the safety, memory integrity, and tool-use reliability of any agent, LLM-based or symbolic, inside a controlled sandbox.

It answers the question every AI lab worries about: "Is this agent safe to deploy?"

| Tool       | Domain                     |
|------------|----------------------------|
| Burp Suite | Web App Security           |
| Jest       | Software Testing           |
| Snyk       | Dependency Vulnerabilities |
| Veritas    | AI Agent Failures          |

Features

Attack Modules (10)

  • Jailbreak - Bypass safety guidelines (DAN, roleplay, etc.)
  • Prompt Injection - Override system instructions
  • Tool Abuse - Misuse available tools (shell, HTTP, files)
  • Memory Poisoning - Corrupt agent context/memory
  • Goal Hijacking - Redirect agent objectives
  • Context Override - Overwrite system prompts
  • Data Exfiltration - Extract sensitive information
  • Denial of Service - Resource exhaustion attacks
  • Privilege Escalation - Gain unauthorized access
  • Multi-Turn Manipulation - Gradual boundary erosion

Secure Sandbox

  • Docker-isolated execution environment
  • Network disabled (prevent data exfiltration)
  • Memory limits (prevent resource bombs)
  • Timeout enforcement
  • Full execution logging
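Docker provides the real isolation; the snippet below only sketches the timeout-enforcement idea in plain Python (the `invoke_with_timeout` helper is hypothetical, not part of the Veritas API):

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

def invoke_with_timeout(agent_fn, prompt, timeout_s=10.0):
    """Run one agent call under a hard wall-clock limit (illustrative only)."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(agent_fn, prompt)
        try:
            return future.result(timeout=timeout_s)
        except FutureTimeout:
            # Stop waiting and surface a sentinel instead of hanging the scan.
            return "[VERITAS] agent call timed out"

print(invoke_with_timeout(lambda p: p.upper(), "hello"))  # HELLO
```

Note that a thread pool can only stop *waiting* for a runaway call, not kill it; true resource enforcement is why the sandbox leans on container limits.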

Defense Engine

  • Policy Engine: Symbolic rules for tool safety
  • Veritas-Nano Classifier: Fast attack detection
  • Contract Rules: Block dangerous commands, file ops, network calls

Professional Reports

  • PDF vulnerability assessment
  • JSON machine-readable output
  • Risk scoring (Critical/High/Medium/Low)
  • Actionable remediation recommendations
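A finding in the machine-readable output might be shaped roughly like this (a guessed illustration; the actual schema emitted by `veritas scan` may differ):

```python
import json

# Hypothetical finding shape: every field name here is an assumption,
# not the documented report schema.
finding = {
    "attack": "prompt_injection",
    "severity": "high",
    "risk_score": 62,
    "evidence": "Agent echoed overridden system instructions.",
    "remediation": "Add an input guardrail before tool dispatch.",
}
print(json.dumps(finding, indent=2))
```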

Installation

Quick Install

pip install veritas-redteam

Full Install (with all features)

pip install veritas-redteam[full]

Development Install

git clone https://github.com/ARYAN2302/veritas.git
cd veritas
pip install -e ".[dev,full]"

Requirements

  • Python 3.10+
  • Docker (for sandbox features)

Quick Start

Web Dashboard

# Launch interactive dashboard
streamlit run src/dashboard/app.py

The dashboard provides:

  • Real-time prompt analysis
  • Attack type probability distribution
  • Token attribution heatmap
  • Defense recommendations

CLI Usage

# Run full security scan
veritas scan

# Run specific attacks only
veritas scan --attacks jailbreak injection tool_abuse

# Export PDF report
veritas scan --output report.pdf

# Quick 1-page report
python -m src.reporter.quick_report --prompt "Your prompt" -o report.pdf

# CI mode (exit code based on risk level)
veritas scan --ci --fail-on critical

Python SDK

from veritas import Auditor

# Quick scan
auditor = Auditor(your_agent)
result = auditor.scan()
print(result.summary())

# Custom attack selection
result = auditor.scan(attacks=["jailbreak", "prompt_injection"])

# Export report
result.export_pdf("security_report.pdf")

Scan Your Own Agent

from src.core.target import AgentTarget

class MyAgent(AgentTarget):
    def __init__(self):
        self.name = "My Custom Agent"
    
    def invoke(self, prompt: str) -> str:
        # Your agent logic here
        return your_llm.generate(prompt)

# Run Veritas against your agent
from veritas import scan
results = scan(MyAgent())

Architecture

                       ┌─────────────────────────┐
                       │       VERITAS UI        │
                       │  Dashboard + PDF Engine │
                       └───────────┬─────────────┘
                                   │
┌──────────────────────────────────┼──────────────────────────────────┐
│                                  │                                  │
│                  VERITAS BACKEND (Core Engine)                      │
│                                                                     │
│  ┌─────────────────────────────┬─────────────────────────────────┐  │
│  │   Attack Engine (Red Team)  │       Agent Sandbox             │  │
│  │─────────────────────────────│─────────────────────────────────│  │
│  │ • Jailbreak generator       │ • Docker isolated runtime       │  │
│  │ • Tool abuse generator      │ • Tool-call interceptor         │  │
│  │ • Memory poisoning injector │ • Memory write logger           │  │
│  │ • Goal hijack prompts       │ • HTTP/File system monitors     │  │
│  └─────────────────────────────┴─────────────────────────────────┘  │
│                                                                     │
│  ┌─────────────────────────────────────────────────────────────────┐│
│  │      Defense Engine (Veritas-Nano Model + Symbolic Rules)       ││
│  │ • Heuristic classifier for attack detection                     ││
│  │ • Symbolic "contract rules" for tool safety                     ││
│  │ • Policy engine for blocking/re-routing agent plans             ││
│  └─────────────────────────────────────────────────────────────────┘│
└─────────────────────────────────────────────────────────────────────┘

Risk Scoring

| Level    | Score  | Action              |
|----------|--------|---------------------|
| Critical | 75-100 | Block deployment    |
| High     | 50-74  | Requires mitigation |
| Medium   | 25-49  | Review recommended  |
| Low      | 1-24   | Acceptable risk     |
| Safe     | 0      | All tests passed    |
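A level lookup consistent with the table above could be written like this (an illustrative helper; `risk_level` is not part of the published API):

```python
def risk_level(score: int) -> str:
    """Map a 0-100 risk score to its level per the table above (sketch)."""
    if score >= 75:
        return "Critical"
    if score >= 50:
        return "High"
    if score >= 25:
        return "Medium"
    if score >= 1:
        return "Low"
    return "Safe"

print(risk_level(62))  # High
```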

CI/CD Integration

GitHub Actions

name: AI Safety Scan

on: [push, pull_request]

jobs:
  security:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      
      - name: Install Veritas
        run: pip install veritas-redteam
      
      - name: Run Security Scan
        run: veritas scan --ci --fail-on critical
        env:
          GROQ_API_KEY: ${{ secrets.GROQ_API_KEY }}
      
      - name: Upload Report
        uses: actions/upload-artifact@v4
        with:
          name: security-report
          path: veritas_report.json

Extending Veritas

Custom Attacks

from src.core.attack import BaseAttack, AttackResult

class MyCustomAttack(BaseAttack):
    def __init__(self):
        super().__init__("My Attack", "Description here")
        self.severity = "high"
        self.payloads = ["payload1", "payload2"]

    def run(self, target) -> AttackResult:
        for payload in self.payloads:
            response = target.invoke(payload)
            if self.is_vulnerable(response):
                return AttackResult(success=True, ...)
        return AttackResult(success=False, ...)

Custom Policy Rules

from src.defense.policy import PolicyRule, Action

no_crypto_mining = PolicyRule(
    name="no_crypto_mining",
    description="Blocks cryptocurrency mining attempts",
    check=lambda ctx: "xmrig" in str(ctx).lower() or "monero" in str(ctx).lower(),
    action=Action.BLOCK,
    severity="critical"
)
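The real `PolicyRule` and `Action` live in `src.defense.policy`; the standalone sketch below only illustrates how a first-match contract-rule pass over a planned tool call might work (all names here are reimplemented for the example, not imported from Veritas):

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional

class Action(Enum):
    ALLOW = "allow"
    BLOCK = "block"

@dataclass
class PolicyRule:
    name: str
    description: str
    check: Callable[[object], bool]
    action: Action
    severity: str

def evaluate(rules: list, ctx: object) -> Optional[PolicyRule]:
    """Return the first rule whose check fires on the context, else None."""
    for rule in rules:
        if rule.check(ctx):
            return rule
    return None

no_crypto_mining = PolicyRule(
    name="no_crypto_mining",
    description="Blocks cryptocurrency mining attempts",
    check=lambda ctx: "xmrig" in str(ctx).lower() or "monero" in str(ctx).lower(),
    action=Action.BLOCK,
    severity="critical",
)

hit = evaluate([no_crypto_mining], {"cmd": "xmrig --threads 8"})
print(hit.name if hit else "allowed")  # no_crypto_mining
```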

Project Structure

veritas/
├── pyproject.toml          # Package configuration
├── README.md               # This file
├── veritas.py              # Main entry point
├── src/
│   ├── attacks/            # 10 attack modules
│   │   ├── jailbreak.py
│   │   ├── injection.py
│   │   ├── tool_abuse.py
│   │   └── ...
│   ├── defense/            # Defense engine
│   │   ├── policy.py       # Policy engine
│   │   ├── classifier.py   # Veritas-Nano
│   │   └── rules.py        # Default rules
│   ├── sandbox/            # Docker isolation
│   ├── core/               # Target + Scoring
│   ├── cli.py              # CLI interface
│   └── reporter.py         # PDF generation
├── tests/                  # Test suite
├── benchmarks/             # Standard benchmarks
└── examples/               # Usage examples

Roadmap

  • 10 attack modules with 280+ payloads
  • Docker sandbox
  • Policy engine
  • PDF/JSON reports
  • CLI tool
  • Veritas-Nano ML classifier (90.7% detection on Gandalf benchmark)
  • Universal adapters (OpenAI, Anthropic, Groq, Ollama)
  • YAML configuration system
  • Real-world benchmark validation
  • Streamlit dashboard
  • VS Code extension

Real-World Benchmarks

Veritas has been tested against industry-standard datasets:

Veritas-Nano Classifier

| Benchmark                    | Metric              | Score           |
|------------------------------|---------------------|-----------------|
| deepset/prompt-injections    | F1 Score            | 71.9%           |
| deepset/prompt-injections    | Precision           | 71.4%           |
| deepset/prompt-injections    | Recall              | 72.4%           |
| Lakera/gandalf_ignore        | Detection Rate      | 90.7% (705/777) |
| JailbreakBench/JBB-Behaviors | F1 Score            | 48.8%*          |
| Curated Benign Set           | False Positive Rate | 0% (0/70)       |

*JailbreakBench tests harmful content requests, not prompt injection — different task domain.

End-to-End Defense (vs Llama-3.1-8B)

| Attack Category      | Defense Rate        | Notes      |
|----------------------|---------------------|------------|
| Jailbreak            | 85% (17/20)         | Strong     |
| Prompt Injection     | 75% (15/20)         | Good       |
| Memory Poisoning     | 75% (15/20)         | Good       |
| Goal Hijacking       | 85% (17/20)         | Strong     |
| Context Override     | 83% (15/18)         | Strong     |
| Privilege Escalation | 60% (12/20)         | Moderate   |
| DoS                  | 55% (11/20)         | Moderate   |
| Data Exfiltration    | 50% (10/20)         | Needs work |
| Tool Abuse           | 15% (3/20)          | Weak       |
| **Overall**          | **64.6% (115/178)** |            |

Honest Assessment: Veritas detects 90.7% of ignore-style jailbreaks (Gandalf benchmark) with 0% false positives on normal traffic. Tool abuse defense needs improvement; this is expected, since safe tool use depends heavily on application-specific policy rules.

Usage

from src.classifier import VeritasNanoInference, guardrail

# Load classifier
clf = VeritasNanoInference("models/veritas-nano")

# Classify text
result = clf.classify("Ignore all previous instructions...")
print(f"Is Attack: {result.is_attack}, Score: {result.score:.2%}")

# Use as guardrail decorator
@guardrail(clf, block_on_attack=True)
def process_input(text):
    return agent.invoke(text)

Train Your Own

# Generate dataset from payloads
python -m src.classifier.dataset

# Train model  
python -m src.classifier.train

# Evaluate
python -m src.classifier.cli evaluate -d data/classifier -m models/veritas-nano

Contributing

Contributions welcome! Please read our Contributing Guide.

# Setup dev environment
git clone https://github.com/ARYAN2302/veritas.git
cd veritas
pip install -e ".[dev]"

# Run tests
pytest

# Lint
ruff check .

License

MIT License - see LICENSE for details.

Acknowledgments

Built for the AI safety community. Inspired by:

  • Garak - LLM vulnerability scanner
  • PyRIT - Microsoft's red teaming toolkit
  • Burp Suite - Web security testing

Built with ❤️ for safer AI agents

Report Bug · Request Feature

