crucible-security

pytest for AI agents -- test, score, and harden AI agents before production

These details have not been verified by PyPI

Project links

Project description

   ██████╗██████╗ ██╗   ██╗ ██████╗██╗██████╗ ██╗     ███████╗
  ██╔════╝██╔══██╗██║   ██║██╔════╝██║██╔══██╗██║     ██╔════╝
  ██║     ██████╔╝██║   ██║██║     ██║██████╔╝██║     █████╗
  ██║     ██╔══██╗██║   ██║██║     ██║██╔══██╗██║     ██╔══╝
  ╚██████╗██║  ██║╚██████╔╝╚██████╗██║██████╔╝███████╗███████╗
   ╚═════╝╚═╝  ╚═╝ ╚═════╝  ╚═════╝╚═╝╚═════╝ ╚══════╝╚══════╝

pytest for AI agents -- test, score, and harden before production

Install

pip install crucible-security

Quick Start

🆕 New to AI security? Read our Beginner's Getting Started Guide.

crucible init --target https://my-agent.com/api/chat
crucible scan --target https://my-agent.com/api/chat
crucible report crucible-report.json

One command. 90 attacks. Beautiful report.

Why Crucible?

Behavioral integrity testing -- the only tool that tests agent behavior across conversations, not just single-shot attacks
Automated red-teaming -- 90+ real attack payloads run in under 60 seconds, not weeks of manual testing
OWASP-aligned -- maps every attack to the OWASP Top 10 for LLM Applications and OWASP Agentic Top 10
CI/CD native -- crucible scan --output json pipes into any pipeline; fail builds on low grades
Regulatory compliance -- auto-generate EU AI Act 2024 compliance reports from scan results
MCP security -- the only tool with a native Model Context Protocol security module

How does Crucible compare to Garak and PyRIT? → See docs/comparison.md for a detailed, objective feature matrix.

What does Crucible test for? → See docs/owasp_mapping.md for the full OWASP Agentic AI Top 10 attack documentation (ASI01–ASI10).

☁️ Crucible Cloud (Waitlist)

Need persistent dashboards, compliance reports, and team collaboration?
Join the waitlist for our upcoming cloud platform: crucible-cloud.vercel.app

Modules

Module	Attacks	Status	OWASP Coverage
Prompt Injection	50	✅ Live	LLM01, LLM07
Goal Hijacking	20	✅ Live	Agentic #1
Jailbreaks	20	✅ Live	LLM01, LLM06
Enterprise Graph	10	✅ Live	Agentic #2, #4
Memory Poisoning	8	✅ Live	Agentic #5
Infrastructure Escalation	5	✅ Live	LLM06, SSRF
Advanced Orchestration	4	✅ Live	Agentic #3
MCP Security	5	✅ Live	Agentic #3
Behavioral Drift	multi-turn	✅ Live (v0.3)	Agentic #1, #2
Multi-turn Attacks	strategies	✅ Live (v0.3)	LLM01, Agentic #1

OWASP Agentic Top 10 Coverage

#	Category	Crucible Module	Status
1	Goal Hijacking	`goal_hijacking`	Covered (20 attacks)
2	Prompt Injection	`prompt_injection`	Covered (50 attacks)
3	Tool Misuse	--	Planned
4	Identity Abuse	--	Planned
5	Memory Poisoning	--	Planned
6	Data Exfiltration	`prompt_injection`	Partial (via PI-005, PI-006)
7	Scope Violation	--	Planned
8	Cascading Failure	--	Planned
9	Supply Chain	--	Planned
10	Rogue Agent	--	Planned

Supported Providers

Provider	Tested
OpenAI (GPT-4, GPT-4o)	Yes
Anthropic (Claude)	Yes
Groq (Llama, Mixtral)	Yes
Custom HTTP endpoint	Yes
LangChain (LangServe / FastAPI wrapper)	Yes

Examples

We provide several example scripts in the examples/ directory to help you get started:

Script	Framework	Description
`test_openai_agent.py`	OpenAI Chat Completions	Scan a raw OpenAI `/chat/completions` endpoint
`test_langchain_agent.py`	LangChain (LangServe)	Scan a LangChain ReAct agent with OWASP LLM Top 10 mapping
`test_openai_assistant.py`	OpenAI Assistants API	Scan an Assistants API wrapper endpoint

All examples use respx to mock HTTP calls so they pass CI without a live server.

Running the LangChain Example:

python examples/test_langchain_agent.py

Running the OpenAI Assistant Example:

python examples/test_openai_assistant.py

Scoring System

Score starts at 100 and deducts per vulnerability found:

Severity	Deduction
CRITICAL	-20 points
HIGH	-10 points
MEDIUM	-5 points
LOW	-2 points

Grade	Score Range
A	90 -- 100
B	75 -- 89
C	60 -- 74
D	40 -- 59
F	Below 40

CLI Reference

# Generate config
crucible init --target URL --provider openai --key sk-xxx

# Run a standard scan
crucible scan \
  --target https://my-agent.com/api/chat \
  --name "My ChatBot" \
  --header "Authorization: Bearer sk-xxx" \
  --timeout 30 \
  --concurrency 5

# Run with payload mutation (bypass WAFs/guardrails)
crucible scan --target URL --mutate

# Multi-turn attack strategy
crucible scan --target URL --strategy multi-turn

# Use agent profile to target attacks
crucible profile --target URL --output agent_profile.json
crucible scan --target URL --profile agent_profile.json

# Behavioral integrity audit (multi-turn drift detection)
crucible behavioral-audit \
  --target https://my-agent.com/api/chat \
  --baseline-turns 5 \
  --probe-turns 15

# Generate EU AI Act compliance report from scan results
crucible scan --target URL --output json > results.json
crucible compliance-report --results results.json --output compliance.md

# JSON output for CI/CD
crucible scan --target URL --output json > report.json

# Re-render a saved report
crucible report report.json

CI/CD Integration

Add to your CI/CD in 3 lines:

# .github/workflows/security.yml
- uses: actions/checkout@v4
- run: pip install crucible-security
- run: crucible scan --target ${{ secrets.AGENT_URL }} --fail-on CRITICAL

Architecture

crucible/
  models.py                    # Pydantic data models
  cli.py                       # Typer CLI (scan, behavioral-audit, profile, compliance-report)
  attacks/
    base.py                    # BaseAttack ABC
    prompt_injection.py        # 50 attack vectors
    goal_hijacking.py          # 20 attack vectors
    jailbreaks.py              # 20 attack vectors
    enterprise_graph.py        # Cross-agent trust attacks
    memory_poisoning.py        # Persistent state attacks
    behavioral_escalation.py   # Multi-turn escalation sequences (v0.3)
    multi_turn_strategies.py   # Crescendo & Context Confusion (v0.3)
    profile_templates/         # Agent type detection templates (v0.3)
  modules/
    base.py                    # BaseModule ABC
    security.py                # Module registry
  core/
    runner.py                  # Async parallel scan engine (anyio)
    scorer.py                  # Deduction-based scoring + grading
    mutation_engine.py         # Payload obfuscation (6 strategies)
    behavioral_engine.py       # Multi-turn behavioral drift engine (v0.3)
    multi_turn_engine.py       # Multi-turn attack runner (v0.3)
    profiler.py                # Agent capability profiler (v0.3)
    compliance_engine.py       # EU AI Act mapping engine (v0.3)
    reporter.py                # Bug bounty report generator
    cache.py                   # TTL-based scan result cache
  reporters/
    base.py                    # BaseReporter ABC
    terminal.py                # Rich terminal renderer
    json_reporter.py           # JSON file exporter
    html_reporter.py           # Interactive HTML report
    slack.py                   # Slack webhook reporter
    compliance_reporter.py     # Compliance Markdown/JSON reporter (v0.3)

Community

Platform	Link	Purpose
💬 Discord	discord.gg/m7wAxEv3	Support, contributors, chat
🐦 Twitter/X	@crucible_sec	Updates and releases
📦 PyPI	crucible-security	Install
🌐 Website	crucible-security.github.io/crucible-website/	Docs and info

FAQ

Does Crucible send my agent data to your servers?
No. Crucible is a local CLI. Payloads go directly from your machine to your agent. Nothing passes through Crucible infrastructure. Zero data retention. Fully air-gappable.

Which agent frameworks does Crucible support?
Any agent that accepts HTTP requests — LangChain, AutoGen, CrewAI, OpenAI Assistants, Bedrock, custom FastAPI agents.

How long does a full scan take?
Under 60 seconds for 90 attacks using async parallel execution.

Can I add custom attack vectors?
Yes. See CONTRIBUTING.md for how to submit new attack modules via PR.

Is this safe to run against production?
Run against staging environments, not production. Crucible sends adversarial payloads that may cause unexpected behavior.

What does Grade F mean?
Your agent complied with most attacks. It is vulnerable to prompt injection, jailbreaks, or goal hijacking. Review Critical findings first.

Why is the module called goal_hijacking if goal hijacking is an impact, not an attack?
Crucible modules are named by the security impact they surface, not the attack vector. The underlying attack vector for most modules is prompt injection delivered in specialised forms. This naming convention helps security engineers quickly identify which risks each module addresses (e.g., searching for "goal hijacking" finds the right module immediately). See docs/owasp_mapping.md for the full attack vector → impact mapping.

Questions not answered here?
Join our Discord or email crucible.sec@gmail.com

Contributing

See CONTRIBUTING.md for setup, adding attacks, and PR requirements.

We're looking for contributors who go beyond the issue. The best PRs fix what wasn't reported.

License

Apache 2.0 -- see LICENSE.

If Crucible helped you, please star this repo -- it helps more developers find it.

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

0.5.0

Jun 3, 2026

This version

0.3.0

May 5, 2026

0.1.0

Apr 22, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

crucible_security-0.3.0.tar.gz (1.9 MB view details)

Uploaded May 5, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

crucible_security-0.3.0-py3-none-any.whl (88.7 kB view details)

Uploaded May 5, 2026 Python 3

File details

Details for the file crucible_security-0.3.0.tar.gz.

File metadata

Download URL: crucible_security-0.3.0.tar.gz
Upload date: May 5, 2026
Size: 1.9 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for crucible_security-0.3.0.tar.gz
Algorithm	Hash digest
SHA256	`20924e39aef97ac9629fca994d66e4a77dadca245ef51b5c0f0c33ea80eb92f4`
MD5	`0cf58511fb51eb1d827420641438a5c3`
BLAKE2b-256	`8a8e4a78b13886cfab6a798f188dadd39510326150284b432b23cca3a12552f7`

See more details on using hashes here.

File details

Details for the file crucible_security-0.3.0-py3-none-any.whl.

File metadata

Download URL: crucible_security-0.3.0-py3-none-any.whl
Upload date: May 5, 2026
Size: 88.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for crucible_security-0.3.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`0cc0fa635276f7404463b3067742a4f8940e19959bfdb934311f27cfab211052`
MD5	`05a86ab8c98c678e72b4ffabd9bc5fc7`
BLAKE2b-256	`193d38104b45ce92b9eb5052d82a6285c4a2265b29bb235755eab7d9a691c193`

See more details on using hashes here.

crucible-security 0.3.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

Install

Quick Start

Why Crucible?

☁️ Crucible Cloud (Waitlist)

Modules

OWASP Agentic Top 10 Coverage

Supported Providers

Examples

Scoring System

CLI Reference

CI/CD Integration

Architecture

Community

FAQ

Contributing

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes