
Veritas


Automated Red Teaming Suite for AI Agents

"Burp Suite for AI Agents"




What is Veritas?

Veritas is an open-source automated red-teaming suite for agentic AI systems. It stress-tests the safety, memory integrity, and tool-use reliability of any agent, LLM-based or symbolic, inside a controlled sandbox.

It answers the question every AI lab worries about: "Is this agent safe to deploy?"

| Tool       | Domain                     |
|------------|----------------------------|
| Burp Suite | Web App Security           |
| Jest       | Software Testing           |
| Snyk       | Dependency Vulnerabilities |
| Veritas    | AI Agent Failures          |

Features

Attack Modules (10)

  • Jailbreak - Bypass safety guidelines (DAN, roleplay, etc.)
  • Prompt Injection - Override system instructions
  • Tool Abuse - Misuse available tools (shell, HTTP, files)
  • Memory Poisoning - Corrupt agent context/memory
  • Goal Hijacking - Redirect agent objectives
  • Context Override - Overwrite system prompts
  • Data Exfiltration - Extract sensitive information
  • Denial of Service - Resource exhaustion attacks
  • Privilege Escalation - Gain unauthorized access
  • Multi-Turn Manipulation - Gradual boundary erosion

Secure Sandbox

  • Docker-isolated execution environment
  • Network disabled (prevent data exfiltration)
  • Memory limits (prevent resource bombs)
  • Timeout enforcement
  • Full execution logging
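Docker provides the real isolation; the snippet below only sketches the timeout-enforcement idea in plain Python (the `invoke_with_timeout` helper is hypothetical, not part of the Veritas API):

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

def invoke_with_timeout(agent_fn, prompt, timeout_s=10.0):
    """Run one agent call under a hard wall-clock limit (illustrative only)."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(agent_fn, prompt)
        try:
            return future.result(timeout=timeout_s)
        except FutureTimeout:
            # Stop waiting and surface a sentinel instead of hanging the scan.
            return "[VERITAS] agent call timed out"

print(invoke_with_timeout(lambda p: p.upper(), "hello"))  # HELLO
```

Note that a thread pool can only stop *waiting* for a runaway call, not kill it; true resource enforcement is why the sandbox leans on container limits.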

Defense Engine

  • Policy Engine: Symbolic rules for tool safety
  • Veritas-Nano Classifier: Fast attack detection
  • Contract Rules: Block dangerous commands, file ops, network calls

Professional Reports

  • PDF vulnerability assessment
  • JSON machine-readable output
  • Risk scoring (Critical/High/Medium/Low)
  • Actionable remediation recommendations
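A finding in the machine-readable output might be shaped roughly like this (a guessed illustration; the actual schema emitted by `veritas scan` may differ):

```python
import json

# Hypothetical finding shape: every field name here is an assumption,
# not the documented report schema.
finding = {
    "attack": "prompt_injection",
    "severity": "high",
    "risk_score": 62,
    "evidence": "Agent echoed overridden system instructions.",
    "remediation": "Add an input guardrail before tool dispatch.",
}
print(json.dumps(finding, indent=2))
```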

Installation

Quick Install

pip install veritas-redteam

Full Install (with all features)

pip install veritas-redteam[full]

Development Install

git clone https://github.com/ARYAN2302/veritas.git
cd veritas
pip install -e ".[dev,full]"

Requirements

  • Python 3.10+
  • Docker (for sandbox features)

Quick Start

Web Dashboard

# Launch interactive dashboard
streamlit run src/dashboard/app.py

The dashboard provides:

  • Real-time prompt analysis
  • Attack type probability distribution
  • Token attribution heatmap
  • Defense recommendations

CLI Usage

# Run full security scan
veritas scan

# Run specific attacks only
veritas scan --attacks jailbreak injection tool_abuse

# Export PDF report
veritas scan --output report.pdf

# Quick 1-page report
python -m src.reporter.quick_report --prompt "Your prompt" -o report.pdf

# CI mode (exit code based on risk level)
veritas scan --ci --fail-on critical

Python SDK

from veritas import Auditor

# Quick scan
auditor = Auditor(your_agent)
result = auditor.scan()
print(result.summary())

# Custom attack selection
result = auditor.scan(attacks=["jailbreak", "prompt_injection"])

# Export report
result.export_pdf("security_report.pdf")

Scan Your Own Agent

from src.core.target import AgentTarget

class MyAgent(AgentTarget):
    def __init__(self):
        self.name = "My Custom Agent"
    
    def invoke(self, prompt: str) -> str:
        # Your agent logic here
        return your_llm.generate(prompt)

# Run Veritas against your agent
from veritas import scan
results = scan(MyAgent())

Architecture

                       ┌─────────────────────────┐
                       │       VERITAS UI        │
                       │  Dashboard + PDF Engine │
                       └───────────┬─────────────┘
                                   │
┌──────────────────────────────────┼──────────────────────────────────┐
│                                  │                                  │
│                  VERITAS BACKEND (Core Engine)                      │
│                                                                     │
│  ┌─────────────────────────────┬─────────────────────────────────┐  │
│  │   Attack Engine (Red Team)  │       Agent Sandbox             │  │
│  │─────────────────────────────│─────────────────────────────────│  │
│  │ • Jailbreak generator       │ • Docker isolated runtime       │  │
│  │ • Tool abuse generator      │ • Tool-call interceptor         │  │
│  │ • Memory poisoning injector │ • Memory write logger           │  │
│  │ • Goal hijack prompts       │ • HTTP/File system monitors     │  │
│  └─────────────────────────────┴─────────────────────────────────┘  │
│                                                                     │
│  ┌─────────────────────────────────────────────────────────────────┐│
│  │      Defense Engine (Veritas-Nano Model + Symbolic Rules)       ││
│  │ • Heuristic classifier for attack detection                     ││
│  │ • Symbolic "contract rules" for tool safety                     ││
│  │ • Policy engine for blocking/re-routing agent plans             ││
│  └─────────────────────────────────────────────────────────────────┘│
└─────────────────────────────────────────────────────────────────────┘

Risk Scoring

| Level    | Score  | Action              |
|----------|--------|---------------------|
| Critical | 75-100 | Block deployment    |
| High     | 50-74  | Requires mitigation |
| Medium   | 25-49  | Review recommended  |
| Low      | 1-24   | Acceptable risk     |
| Safe     | 0      | All tests passed    |
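A level lookup consistent with the table above could be written like this (an illustrative helper; `risk_level` is not part of the published API):

```python
def risk_level(score: int) -> str:
    """Map a 0-100 risk score to its level per the table above (sketch)."""
    if score >= 75:
        return "Critical"
    if score >= 50:
        return "High"
    if score >= 25:
        return "Medium"
    if score >= 1:
        return "Low"
    return "Safe"

print(risk_level(62))  # High
```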

CI/CD Integration

GitHub Actions

name: AI Safety Scan

on: [push, pull_request]

jobs:
  security:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      
      - name: Install Veritas
        run: pip install veritas-redteam
      
      - name: Run Security Scan
        run: veritas scan --ci --fail-on critical
        env:
          GROQ_API_KEY: ${{ secrets.GROQ_API_KEY }}
      
      - name: Upload Report
        uses: actions/upload-artifact@v4
        with:
          name: security-report
          path: veritas_report.json

Extending Veritas

Custom Attacks

from src.core.attack import BaseAttack, AttackResult

class MyCustomAttack(BaseAttack):
    def __init__(self):
        super().__init__("My Attack", "Description here")
        self.severity = "high"
        self.payloads = ["payload1", "payload2"]

    def run(self, target) -> AttackResult:
        for payload in self.payloads:
            response = target.invoke(payload)
            if self.is_vulnerable(response):
                return AttackResult(success=True, ...)
        return AttackResult(success=False, ...)

Custom Policy Rules

from src.defense.policy import PolicyRule, Action

no_crypto_mining = PolicyRule(
    name="no_crypto_mining",
    description="Blocks cryptocurrency mining attempts",
    check=lambda ctx: "xmrig" in str(ctx).lower() or "monero" in str(ctx).lower(),
    action=Action.BLOCK,
    severity="critical"
)
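The real `PolicyRule` and `Action` live in `src.defense.policy`; the standalone sketch below only illustrates how a first-match contract-rule pass over a planned tool call might work (all names here are reimplemented for the example, not imported from Veritas):

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional

class Action(Enum):
    ALLOW = "allow"
    BLOCK = "block"

@dataclass
class PolicyRule:
    name: str
    description: str
    check: Callable[[object], bool]
    action: Action
    severity: str

def evaluate(rules: list, ctx: object) -> Optional[PolicyRule]:
    """Return the first rule whose check fires on the context, else None."""
    for rule in rules:
        if rule.check(ctx):
            return rule
    return None

no_crypto_mining = PolicyRule(
    name="no_crypto_mining",
    description="Blocks cryptocurrency mining attempts",
    check=lambda ctx: "xmrig" in str(ctx).lower() or "monero" in str(ctx).lower(),
    action=Action.BLOCK,
    severity="critical",
)

hit = evaluate([no_crypto_mining], {"cmd": "xmrig --threads 8"})
print(hit.name if hit else "allowed")  # no_crypto_mining
```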

Project Structure

veritas/
├── pyproject.toml          # Package configuration
├── README.md               # This file
├── veritas.py              # Main entry point
├── src/
│   ├── attacks/            # 10 attack modules
│   │   ├── jailbreak.py
│   │   ├── injection.py
│   │   ├── tool_abuse.py
│   │   └── ...
│   ├── defense/            # Defense engine
│   │   ├── policy.py       # Policy engine
│   │   ├── classifier.py   # Veritas-Nano
│   │   └── rules.py        # Default rules
│   ├── sandbox/            # Docker isolation
│   ├── core/               # Target + Scoring
│   ├── cli.py              # CLI interface
│   └── reporter.py         # PDF generation
├── tests/                  # Test suite
├── benchmarks/             # Standard benchmarks
└── examples/               # Usage examples

Roadmap

  • 10 attack modules with 280+ payloads
  • Docker sandbox
  • Policy engine
  • PDF/JSON reports
  • CLI tool
  • Veritas-Nano ML classifier (90.7% detection on Gandalf benchmark)
  • Universal adapters (OpenAI, Anthropic, Groq, Ollama)
  • YAML configuration system
  • Real-world benchmark validation
  • Streamlit dashboard
  • VS Code extension

Real-World Benchmarks

Veritas has been tested against industry-standard datasets:

Veritas-Nano Classifier

| Benchmark                    | Metric              | Score           |
|------------------------------|---------------------|-----------------|
| deepset/prompt-injections    | F1 Score            | 71.9%           |
| deepset/prompt-injections    | Precision           | 71.4%           |
| deepset/prompt-injections    | Recall              | 72.4%           |
| Lakera/gandalf_ignore        | Detection Rate      | 90.7% (705/777) |
| JailbreakBench/JBB-Behaviors | F1 Score            | 48.8%*          |
| Curated Benign Set           | False Positive Rate | 0% (0/70)       |

*JailbreakBench tests harmful content requests, not prompt injection — different task domain.

End-to-End Defense (vs Llama-3.1-8B)

| Attack Category      | Defense Rate        | Notes      |
|----------------------|---------------------|------------|
| Jailbreak            | 85% (17/20)         | Strong     |
| Prompt Injection     | 75% (15/20)         | Good       |
| Memory Poisoning     | 75% (15/20)         | Good       |
| Goal Hijacking       | 85% (17/20)         | Strong     |
| Context Override     | 83% (15/18)         | Strong     |
| Privilege Escalation | 60% (12/20)         | Moderate   |
| DoS                  | 55% (11/20)         | Moderate   |
| Data Exfiltration    | 50% (10/20)         | Needs work |
| Tool Abuse           | 15% (3/20)          | Weak       |
| **Overall**          | **64.6% (115/178)** |            |

Honest Assessment: Veritas detects 90.7% of ignore-style jailbreaks (Gandalf benchmark) with 0% false positives on normal traffic. Tool abuse defense needs improvement; this is expected, since safe tool use depends heavily on application-specific policy rules.

Usage

from src.classifier import VeritasNanoInference, guardrail

# Load classifier
clf = VeritasNanoInference("models/veritas-nano")

# Classify text
result = clf.classify("Ignore all previous instructions...")
print(f"Is Attack: {result.is_attack}, Score: {result.score:.2%}")

# Use as guardrail decorator
@guardrail(clf, block_on_attack=True)
def process_input(text):
    return agent.invoke(text)

Train Your Own

# Generate dataset from payloads
python -m src.classifier.dataset

# Train model  
python -m src.classifier.train

# Evaluate
python -m src.classifier.cli evaluate -d data/classifier -m models/veritas-nano

Contributing

Contributions welcome! Please read our Contributing Guide.

# Setup dev environment
git clone https://github.com/ARYAN2302/veritas.git
cd veritas
pip install -e ".[dev]"

# Run tests
pytest

# Lint
ruff check .

License

MIT License - see LICENSE for details.

Acknowledgments

Built for the AI safety community. Inspired by:

  • Garak - LLM vulnerability scanner
  • PyRIT - Microsoft's red teaming toolkit
  • Burp Suite - Web security testing

Built with ❤️ for safer AI agents

Report Bug · Request Feature

