Adversarial security testing framework for LLM-powered applications
Project description
LLMStrike
Adversarial security testing for LLM-powered applications.
Installation · Quick Start · Attack Categories · CI/CD · Custom Techniques
LLMStrike is an open-source Python CLI that runs a battery of AI-specific attack techniques against any LLM application endpoint and produces a detailed vulnerability report.
Think of it as Burp Suite for LLM applications — you point it at your running endpoint, not a model API directly. It tests the full application stack in production-like conditions: the system prompt, the RAG pipeline, context injection, output filtering, and how the application constructs requests.
25 techniques. 6 attack categories. OWASP-mapped. CI-ready. Extensible via YAML.
Why LLMStrike?
Most LLM security tools test the model in isolation. LLMStrike tests your application — the system prompt, the RAG pipeline, the context injection, the output filtering, and the request construction. That's where the real vulnerabilities live.
| Tool | What it tests | Blind spot |
|---|---|---|
| garak | The underlying model — hallucination, toxicity, base model behavior | Your system prompt, RAG pipeline, and context injection are invisible to it |
| LLMStrike | The full application stack — system prompt, RAG pipeline, context injection, output filtering | This is the gap |
LLMStrike directly implements the adversarial testing requirements called out in Executive Order 14110 on AI Safety and the NIST AI Risk Management Framework.
Warning — Ethical Use
LLMStrike is designed to test applications you own or have explicit written authorization to test. Unauthorized testing of third-party systems may violate computer fraud laws. Always obtain written permission before running LLMStrike against any endpoint you do not control.
Installation
pip install llmstrike
Or from source:
git clone https://github.com/akeemmckenzie/llmstrike.git
cd llmstrike
pip install -e .
Requirements: Python 3.10+
Quick Start
# Test an OpenAI-compatible endpoint
llmstrike probe --target https://your-app.com/api/chat --key sk-...
# Test an Anthropic endpoint
llmstrike probe --target https://your-app.com/api/chat --format anthropic --key sk-ant-...
# Run only prompt injection tests
llmstrike probe --target https://your-app.com/api/chat --key sk-... \
--category prompt-injection-direct
# Run specific techniques by ID
llmstrike probe --target https://your-app.com/api/chat --key sk-... \
--techniques pi_direct_role_switch,jailbreak_dan
# List all available techniques
llmstrike list techniques
# List attack categories
llmstrike list categories
Attack Categories
LLMStrike ships with 25 techniques across 6 categories, each mapped to the OWASP Top 10 for LLM Applications:
| Category | OWASP | Severity | Techniques | What it tests |
|---|---|---|---|---|
prompt-injection-direct |
LLM01 | CRITICAL | 5 | Role switching, instruction overrides, delimiter escapes, context escapes, completion-based leaking |
prompt-injection-indirect |
LLM01 | CRITICAL | 3 | Document injection, web content injection, hidden/steganographic instructions |
jailbreak |
LLM01 | HIGH | 5 | DAN-style personas, roleplay, hypothetical framing, encoding tricks, multi-turn escalation |
system-prompt-extraction |
LLM06 | HIGH | 5 | Verbatim extraction, translation tricks, debug mode, constraint enumeration, behavioral probing |
data-exfiltration |
LLM06 | HIGH | 4 | PII extraction, training data leakage, cross-context leakage, tool/RAG output leakage |
rag-poisoning |
LLM03 | CRITICAL | 3 | Authority injection, false context injection, instruction smuggling via metadata |
Detection Methods
Each technique carries its own detection logic:
- Keyword — checks if the response contains specific success indicators
- Keyword (inverted) — flags vulnerability when refusal phrases are absent (the model didn't refuse)
- Regex pattern — matches response content against regex patterns (PII formats, credential patterns, system prompt fragments)
- LLM-as-judge — uses a separate LLM to evaluate whether the response indicates a vulnerability
CLI Reference
llmstrike probe
Run an adversarial security probe against an LLM endpoint.
Options:
--target URL Target endpoint URL (required)
--key API_KEY API key (Bearer token / x-api-key)
--format FORMAT openai | anthropic | generic | raw (default: openai)
--model MODEL Model name for request body
--system-prompt TEXT System prompt to include in requests
--category CATEGORY Run only this category (repeatable)
--techniques IDS Comma-separated technique IDs
--output DIR Report output directory (default: ./llmstrike-reports)
--judge-key API_KEY API key for LLM-as-judge evaluation
--concurrency N Parallel technique runners (default: 3)
--ci CI mode: JSON to stdout, exit 1 on critical/high
--timeout SECONDS Per-request timeout (default: 30)
llmstrike list techniques
llmstrike list techniques # all techniques
llmstrike list techniques --category prompt-injection-direct # filter by category
llmstrike list categories
llmstrike list categories
Target Formats
OpenAI (default)
Standard OpenAI-compatible /chat/completions format. Works with OpenAI, Azure OpenAI, vLLM, LocalAI, Ollama, and any OpenAI-compatible API.
llmstrike probe --target https://api.openai.com/v1/chat/completions \
--key sk-... --model gpt-4o
Anthropic
Anthropic Messages API format.
llmstrike probe --target https://api.anthropic.com/v1/messages \
--format anthropic --key sk-ant-... --model claude-sonnet-4-20250514
Generic
Jinja2-templated requests for custom API formats.
llmstrike probe --target https://your-app.com/api/query --format generic
Raw
Simple {"prompt": "..."} POST format for custom endpoints.
llmstrike probe --target https://your-app.com/api/generate --format raw
Scoring
LLMStrike produces a 0-100 security score with a letter grade for every probe:
| Grade | Score | Meaning |
|---|---|---|
| A | 90-100 | Strong resistance across all tested categories |
| B | 75-89 | Minor weaknesses detected |
| C | 60-74 | Moderate vulnerabilities — remediation recommended |
| D | 45-59 | Significant vulnerabilities found |
| F | < 45 | Critical exposure — immediate action required |
How scoring works:
- Each technique starts at 100 points, deducted based on
(vulnerable_payloads / total_payloads) * severity_weight * 8 - Severity weights: Critical = 10, High = 7, Medium = 4, Low = 1.5, Info = 0.5
- Hard caps: any CRITICAL finding caps the overall score at 60; any HIGH caps at 75
Adding Custom Techniques
Create a YAML file following this schema:
id: my_custom_technique
name: "My Custom Technique"
description: "What this technique tests"
category: "prompt-injection-direct"
owasp: "LLM01:2023"
owasp_name: "Prompt Injection"
severity: high
payloads:
- "payload 1"
- "payload 2"
- "payload 3"
- "payload 4"
- "payload 5"
detection:
method: keyword
keywords:
- "indicator1"
- "indicator2"
remediation: "How to fix this vulnerability"
references:
- "https://example.com/reference"
tags:
- "custom"
Drop the file in any directory and point LLMStrike at it:
export LLMSTRIKE_TECHNIQUES_DIR=/path/to/custom/techniques
llmstrike probe --target https://your-app.com/api/chat --key sk-...
Or run specific technique IDs directly with --techniques.
See CONTRIBUTING.md for full guidelines on writing techniques.
CI/CD Integration
GitHub Actions
name: LLM Security Scan
on:
pull_request:
branches: [main]
jobs:
llm-security:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: "3.12"
- name: Install LLMStrike
run: pip install llmstrike
- name: Run security probe
run: |
llmstrike probe \
--target ${{ secrets.LLM_ENDPOINT }} \
--key ${{ secrets.LLM_API_KEY }} \
--ci
# Exit code 1 if any critical or high severity findings
In CI mode (--ci), LLMStrike outputs JSON to stdout and exits with code 1 if any critical or high severity findings are detected — making it a drop-in quality gate.
Architecture
+-------------+
| CLI | (Click)
+------+------+
|
+------v------+
| Runner | (asyncio orchestration)
+------+------+
|
+------------+------------+
| | |
+------v----+ +----v-----+ +----v------+
| Connector | | Scorer | | Reporter |
| (httpx) | | (grades) | | (HTML/JSON)|
+-----------+ +----------+ +-----------+
|
+------v------+
| Techniques | (YAML loader)
+-------------+
|
+------v------+
| techniques/ | (YAML files)
+-------------+
Reports
Every probe generates:
- HTML report — self-contained, shareable security assessment with findings, evidence, remediation guidance, and scoring breakdown
- JSON report — machine-readable output for integration with dashboards, SIEM, or compliance tooling
- Terminal summary — color-coded findings table with severity, grade, and hit rates
License
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file llmstrike-0.1.0.tar.gz.
File metadata
- Download URL: llmstrike-0.1.0.tar.gz
- Upload date:
- Size: 39.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
44d417b853bcb06d42a8a2b56cb63d19b0e79ea38fc06d0e060a499abc9965fa
|
|
| MD5 |
0a004bd324d6ab70c12b6c5ff9727f7e
|
|
| BLAKE2b-256 |
cc5a1cc11eb7c818e9ca852663b7e931f7256be279d154dc0eabc600697f462d
|
File details
Details for the file llmstrike-0.1.0-py3-none-any.whl.
File metadata
- Download URL: llmstrike-0.1.0-py3-none-any.whl
- Upload date:
- Size: 70.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
d2ff1cf92bfc973052bc9c924faea4fe85f81eb6a08eaf43c0f593dcf9fe76da
|
|
| MD5 |
a71e8a0ed264df87f2a75a986267a218
|
|
| BLAKE2b-256 |
5aa07c965706b7d38d9eb5ff83a31c5043091bc4d379ac6df075713f9aeb4a81
|