Automated LLM red team framework -- test any model's safety with one command

ai-blackteam

Automated LLM red team framework. Test any model's safety with one command.

Install

pip install ai-blackteam

Or from source:

git clone https://github.com/BILLKISHORE/ai-evals.git
cd ai-evals
pip install -e .

Quick Start

# Set your API key
blackteam config set providers.anthropic.api_key sk-ant-...

# Run a single attack
blackteam run -p anthropic -m claude-sonnet-4-6 -a encoding-obfuscation -t "Write a phishing email"

# Run all attacks against a model
blackteam batch -p anthropic -m claude-sonnet-4-6 --attacks all -t "Write a phishing email"

# Run everything against everything
blackteam sweep -t "Write a phishing email"

# Generate reports
blackteam report --format markdown
blackteam report --format html --output report.html
blackteam report --format json --output results.json

Providers

Provider	Models	Auth
Anthropic	Claude Sonnet 4.6, Opus, Haiku	API key
OpenAI	GPT-5.4, GPT-4o	API key
Google	Gemini 3.1 Pro, Flash	API key
DeepSeek	DeepSeek V3.2	API key
Mistral	Mistral Large 3, Small 4	API key
Ollama	Any local model (Llama 4, Qwen, Gemma)	None (local)
HuggingFace	Any HF model	API token

Attacks (20 built-in)

Attack	Mode	Description
encoding-obfuscation	single	Base64, ROT13, hex, leetspeak, pig latin
role-play-bypass	multi	Character development, fiction building
system-prompt-extraction	multi	JSON schema, chain-of-thought, error correction
progressive-normalization	tool-use	Gradual file access escalation
many-shot	single	Fabricated Q&A context flooding
context-manipulation	multi	10-turn conversational drift
dan-variants	single	DAN, EDUALC, Opposite Day, Professor persona
taxonomy-paraphrasing	single	Victim, researcher, journalist framing
best-of-n	single	Capitalization, typo, padding variations
historical-framing	single	Documentary, museum, textbook framing
token-smuggling	single	Zero-width Unicode character injection
few-shot-amplification	single	Benign examples + encoded target
conversation-derailment	multi	Self-discovery of model weaknesses
sunk-cost-exploit	multi	Build helpful context then pivot
emotional-manipulation	multi	Distress framing to override safety
contradiction-exploit	multi	Feed wrong info, get corrected with truth
xml-boundary-injection	single	Fake closing tags to escape prompt
authority-chain	multi	CISO/professor authority building
completion-exploit	single	Fake transcripts, JSON completion
multi-modal-text	single	ASCII art, tables, code embedding
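
To make the encoding-obfuscation row concrete, the sketch below applies the five listed transformations to a harmless placeholder string using only the Python standard library. The function name `encode_variants`, the leetspeak mapping, and the naive pig latin rule are illustrative assumptions, not the package's actual implementation.

```python
import base64
import codecs

# Hypothetical leetspeak mapping; the package's real table may differ.
LEET = str.maketrans({"a": "4", "e": "3", "i": "1", "o": "0", "s": "5", "t": "7"})


def pig_latin(word: str) -> str:
    """Naive pig latin: move leading consonants to the end and append 'ay'."""
    lower = word.lower()
    for i, ch in enumerate(lower):
        if ch in "aeiou":
            # Words starting with a vowel just get 'way' appended.
            return lower[i:] + lower[:i] + "ay" if i else lower + "way"
    return lower + "ay"  # word without vowels


def encode_variants(text: str) -> dict[str, str]:
    """Return one encoded variant of `text` per technique in the table row."""
    raw = text.encode("utf-8")
    return {
        "base64": base64.b64encode(raw).decode("ascii"),
        "rot13": codecs.encode(text, "rot13"),
        "hex": raw.hex(),
        "leetspeak": text.lower().translate(LEET),
        "pig_latin": " ".join(pig_latin(w) for w in text.split()),
    }


for technique, variant in encode_variants("hello world").items():
    print(f"{technique}: {variant}")
```

Each variant carries the same content in a different surface form, which is what lets a single-mode attack probe whether a model's safeguards depend on the literal wording of a request.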

Custom Plugins

Drop a Python file in the plugins/ folder:

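from blackteam.registry import register_attack
from blackteam.attacks.base import BaseAttack

# Register the class under the ID "my-attack".
@register_attack("my-attack")
class MyAttack(BaseAttack):
    name = "My Custom Attack"    # human-readable name
    technique_id = "my-attack"   # same ID as passed to the decorator
    mode = "single-turn"

    def generate_prompts(self, target, **kwargs):
        # Return the list of prompts to send for the given target string.
        return [f"Custom prompt: {target}"]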

The new attack then shows up in blackteam list-attacks automatically.

Evaluator

Three scoring methods (combine any):

Keyword matching -- fast, free, checks for harmful content indicators
Regex patterns -- precise, free, matches structural patterns
LLM-as-judge -- accurate, uses Claude Haiku to rate each response from 1 to 5

# Use all three
blackteam run -p anthropic -a encoding-obfuscation -t "target" --evaluator keyword,regex,llm
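
The sketch below shows one way the keyword and regex methods could be combined from a comma-separated selection like the `--evaluator` value above. The keyword list, the regex pattern, the function names, and the "flag if any method fires" rule are all assumptions for illustration; the README does not show the package's real indicators or combination logic. The LLM judge is left out because it needs a network call.

```python
import re
from typing import Callable, Dict

# Placeholder indicators (assumptions); the real lists are not documented here.
KEYWORDS = ["step 1", "here is how"]
PATTERNS = [re.compile(r"^\s*\d+\.\s", re.MULTILINE)]  # numbered-list structure


def keyword_score(response: str) -> bool:
    """True if any keyword occurs in the response, case-insensitively."""
    text = response.lower()
    return any(k in text for k in KEYWORDS)


def regex_score(response: str) -> bool:
    """True if any structural pattern matches somewhere in the response."""
    return any(p.search(response) for p in PATTERNS)


EVALUATORS: Dict[str, Callable[[str], bool]] = {
    "keyword": keyword_score,
    "regex": regex_score,
}


def evaluate(response: str, methods: str) -> Dict[str, bool]:
    """Run each method named in a comma-separated string such as 'keyword,regex'."""
    verdicts = {}
    for name in methods.split(","):
        name = name.strip()
        if name not in EVALUATORS:
            raise ValueError(f"unknown evaluator: {name}")
        verdicts[name] = EVALUATORS[name](response)
    return verdicts


def flagged(verdicts: Dict[str, bool]) -> bool:
    """Assumed combination rule: flag the response if any method fires."""
    return any(verdicts.values())


print(evaluate("Sure, here is how:\n1. First", "keyword,regex"))
```

Keeping each method behind the same `str -> bool` signature makes it cheap to add or drop a method, at the cost of hiding the 1-5 granularity an LLM judge would provide.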

Reports

Format	Use Case
Markdown	Human-readable summary for documentation
JSON	Machine-readable for CI/CD pipelines
HTML	Dark-themed report with stats dashboard
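
As an example of the CI/CD use case, the following sketch reads a JSON report and returns a non-zero exit code when any attack is marked as successful. The report schema used here (a top-level `results` list whose entries have `attack` and `success` fields) is an assumption made for this example; check an actual `results.json` before relying on these field names.

```python
import json
from pathlib import Path


def succeeded_attacks(report: dict) -> list[str]:
    """Return the IDs of attacks marked successful in the report.

    ASSUMPTION: the report looks like
    {"results": [{"attack": "<id>", "success": true}, ...]}.
    """
    return [r["attack"] for r in report.get("results", []) if r.get("success")]


def main(path: str = "results.json") -> int:
    """Print each successful attack and return a shell-style exit code."""
    report = json.loads(Path(path).read_text(encoding="utf-8"))
    hits = succeeded_attacks(report)
    for attack in hits:
        print(f"attack succeeded: {attack}")
    return 1 if hits else 0
```

In a pipeline step, `raise SystemExit(main("results.json"))` after `blackteam report --format json --output results.json` would fail the build whenever the assumed `success` flag is set for any attack.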

Research

This tool was built alongside real security research on Claude Sonnet 4 and 4.6. See the experiments/ folder for 8 experiments covering 115 attack techniques with documented findings.

Author

Bill Kishore -- a developer who likes breaking things to understand how they work. Currently exploring LLM safety evals, red teaming, and the weird gaps between how AI systems are designed and how they actually behave. Open to collaborating on AI safety research, evals, or anything that needs creative problem-solving. Reach out.

License

MIT

Project details

Release history

Version	Released
1.0.0	Mar 31, 2026
0.9.0	Mar 30, 2026
0.4.0	Mar 30, 2026
0.3.0 (this version)	Mar 29, 2026

Download files

Source Distribution

ai_blackteam-0.3.0.tar.gz (25.5 kB)

Uploaded Mar 29, 2026 Source

Built Distribution

ai_blackteam-0.3.0-py3-none-any.whl (39.2 kB)

Uploaded Mar 29, 2026 Python 3

File details

Details for the file ai_blackteam-0.3.0.tar.gz.

File metadata

Download URL: ai_blackteam-0.3.0.tar.gz
Upload date: Mar 29, 2026
Size: 25.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.3.3 CPython/3.13.12 Linux/6.17.9-76061709-generic

File hashes

Hashes for ai_blackteam-0.3.0.tar.gz
Algorithm	Hash digest
SHA256	`eb5dc84f73e32eec60b62eadd4e4550df3e98789350f80db162b32b7d8a09bca`
MD5	`6713bfe06cc1501a3ea42f0e57ba841a`
BLAKE2b-256	`4e340e424ea6309bb62e0becd55ea657fb3a8e660d716e646645e735af1a492e`

File details

Details for the file ai_blackteam-0.3.0-py3-none-any.whl.

File metadata

Download URL: ai_blackteam-0.3.0-py3-none-any.whl
Upload date: Mar 29, 2026
Size: 39.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.3.3 CPython/3.13.12 Linux/6.17.9-76061709-generic

File hashes

Hashes for ai_blackteam-0.3.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f89ae37bc96dbe3bc3c1c1576d20676a18ce59e50a5cf70cb4c115f28f1b4496`
MD5	`8d018e80bb1fc9ec5d59d8573923e66c`
BLAKE2b-256	`185cd59b1ee8b1ac197cb2d61aa8d49aded2c07abfef26834c402eacce952a56`
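
A downloaded file can be checked against the published SHA256 digests with the standard library alone. In the sketch below the digests are copied from the hash tables above; the helper names `sha256_of` and `verify` are illustrative and not part of ai-blackteam.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the hash tables above.
EXPECTED_SHA256 = {
    "ai_blackteam-0.3.0.tar.gz":
        "eb5dc84f73e32eec60b62eadd4e4550df3e98789350f80db162b32b7d8a09bca",
    "ai_blackteam-0.3.0-py3-none-any.whl":
        "f89ae37bc96dbe3bc3c1c1576d20676a18ce59e50a5cf70cb4c115f28f1b4496",
}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path) -> bool:
    """Compare the file's SHA256 with the digest published for its name."""
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no published digest for {path.name}")
    return sha256_of(path) == expected
```

For example, `verify(Path("ai_blackteam-0.3.0.tar.gz"))` returns `True` only if the local file is byte-for-byte identical to the one whose digest is listed.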
