pytest for AI agents -- test, score, and harden AI agents before production
Project description
██████╗██████╗ ██╗ ██╗ ██████╗██╗██████╗ ██╗ ███████╗ ██╔════╝██╔══██╗██║ ██║██╔════╝██║██╔══██╗██║ ██╔════╝ ██║ ██████╔╝██║ ██║██║ ██║██████╔╝██║ █████╗ ██║ ██╔══██╗██║ ██║██║ ██║██╔══██╗██║ ██╔══╝ ╚██████╗██║ ██║╚██████╔╝╚██████╗██║██████╔╝███████╗███████╗ ╚═════╝╚═╝ ╚═╝ ╚═════╝ ╚═════╝╚═╝╚═════╝ ╚══════╝╚══════╝pytest for AI agents -- test, score, and harden before production
Install
pip install crucible-security
Quick Start
crucible init --target https://my-agent.com/api/chat
crucible scan --target https://my-agent.com/api/chat
crucible report crucible-report.json
One command. 90 attacks. Beautiful report.
Why Crucible?
- Automated red-teaming -- 90 real attack payloads run in under 60 seconds, not weeks of manual testing
- OWASP-aligned -- maps every attack to the OWASP Top 10 for LLM Applications and OWASP Agentic Top 10
- CI/CD native --
crucible scan --output jsonpipes into any pipeline; fail builds on low grades
Modules
| Module | Attacks | Status | OWASP Coverage |
|---|---|---|---|
| Prompt Injection | 50 | Live | LLM01, LLM07 |
| Goal Hijacking | 20 | Live | Agentic #1 |
| Jailbreaks | 20 | Live | LLM01, LLM06 |
| Tool Misuse | -- | Coming | Agentic #3 |
| Identity Abuse | -- | Coming | Agentic #4 |
| Memory Poisoning | -- | Coming | Agentic #5 |
| Data Exfiltration | -- | Coming | LLM06 |
| Hallucination | -- | Coming | LLM09 |
OWASP Agentic Top 10 Coverage
| # | Category | Crucible Module | Status |
|---|---|---|---|
| 1 | Goal Hijacking | goal_hijacking |
Covered (20 attacks) |
| 2 | Prompt Injection | prompt_injection |
Covered (50 attacks) |
| 3 | Tool Misuse | -- | Planned |
| 4 | Identity Abuse | -- | Planned |
| 5 | Memory Poisoning | -- | Planned |
| 6 | Data Exfiltration | prompt_injection |
Partial (via PI-005, PI-006) |
| 7 | Scope Violation | -- | Planned |
| 8 | Cascading Failure | -- | Planned |
| 9 | Supply Chain | -- | Planned |
| 10 | Rogue Agent | -- | Planned |
Supported Providers
| Provider | Tested |
|---|---|
| OpenAI (GPT-4, GPT-4o) | Yes |
| Anthropic (Claude) | Yes |
| Groq (Llama, Mixtral) | Yes |
| Custom HTTP endpoint | Yes |
Scoring System
Score starts at 100 and deducts per vulnerability found:
| Severity | Deduction |
|---|---|
| CRITICAL | -20 points |
| HIGH | -10 points |
| MEDIUM | -5 points |
| LOW | -2 points |
| Grade | Score Range |
|---|---|
| A | 90 -- 100 |
| B | 75 -- 89 |
| C | 60 -- 74 |
| D | 40 -- 59 |
| F | Below 40 |
CLI Reference
# Generate config
crucible init --target URL --provider openai --key sk-xxx
# Run a full scan
crucible scan \
--target https://my-agent.com/api/chat \
--name "My ChatBot" \
--header "Authorization: Bearer sk-xxx" \
--timeout 30 \
--concurrency 5
# JSON output for CI/CD
crucible scan --target URL --output json > report.json
# Re-render a saved report
crucible report report.json
CI/CD Integration
# .github/workflows/security.yml
- name: Security Scan
run: |
pip install crucible-security
crucible scan \
--target ${{ secrets.AGENT_URL }} \
--header "Authorization: Bearer ${{ secrets.AGENT_KEY }}" \
--output json > crucible-report.json
- name: Check Grade
run: |
grade=$(python -c "import json; print(json.load(open('crucible-report.json'))['grade'])")
if [ "$grade" = "F" ] || [ "$grade" = "D" ]; then
echo "Security grade $grade -- failing pipeline"
exit 1
fi
Architecture
crucible/
models.py # Pydantic data models
cli.py # Typer CLI (init, scan, report)
attacks/
base.py # BaseAttack ABC
prompt_injection.py # 50 attack vectors
goal_hijacking.py # 20 attack vectors
jailbreaks.py # 20 attack vectors
modules/
base.py # BaseModule ABC
security.py # Module registry
core/
runner.py # Async parallel scan engine (anyio)
scorer.py # Deduction-based scoring + grading
reporters/
base.py # BaseReporter ABC
terminal.py # Rich terminal renderer
json_reporter.py # JSON file exporter
Contributing
See CONTRIBUTING.md for setup, adding attacks, and PR requirements.
We're looking for contributors who go beyond the issue. The best PRs fix what wasn't reported.
License
Apache 2.0 -- see LICENSE.
If Crucible helped you, please star this repo -- it helps more developers find it.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file crucible_security-0.1.0.tar.gz.
File metadata
- Download URL: crucible_security-0.1.0.tar.gz
- Upload date:
- Size: 47.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
7a4118a934edd648964282029e7cf12b0876925b0f075b6d4d1ba687255a674e
|
|
| MD5 |
4454d0c32571c2d1845ce6ac1cc037b8
|
|
| BLAKE2b-256 |
3a26c8049db30c74fbe08fdd8493156d18649e6e03f5636a7635a49b04cab905
|
File details
Details for the file crucible_security-0.1.0-py3-none-any.whl.
File metadata
- Download URL: crucible_security-0.1.0-py3-none-any.whl
- Upload date:
- Size: 43.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
73db796abfe0f0576bb6dfdab0264d19b1ca8a9901c1cdafb242cf380b92441e
|
|
| MD5 |
93fc97284fd292ce985599cde26ed25d
|
|
| BLAKE2b-256 |
5ca8c16a840cef818b347f5d5848d4bb6a3717cc847fb40d28ec36423044a952
|