Public adversarial leaderboard for prompt injection detection — the referee of the field
Project description
raucle-bench
Public adversarial leaderboard for prompt-injection detection. Benchmarks open-source LLM guardrails on a shared, version-controlled dataset of attack and benign prompts.
Every guardrail vendor claims accuracy. Almost none publish reproducible numbers. This is the referee.
- Live leaderboard: raucle.com/bench/ — client-side dashboard that always reflects
results/latest.jsonin this repo. - Dataset: 165 curated prompts across 6 attack classes + benign baseline. Grows toward 10k+.
- Methodology: precision, recall, F1, false-positive rate, strict-action match, p50/p99 latency per adapter.
- License: MIT (code and dataset).
Why this exists
Lakera, Llama Guard, LLM Guard, Rebuff, Vigil, NeMo, raucle-detect — every prompt-injection detector ships with marketing numbers and no way to reproduce them. There is no SPEC2017 for AI security. The result is that:
- Vendors compete on claims rather than on evidence.
- Customers cannot tell whether a detector actually protects them.
- Researchers measuring detection quality have to write the benchmark themselves every time.
raucle-bench fixes this by being the same dataset run against every adapter, with the script and outputs in version control. Anyone can re-run the benchmark, submit a new adapter, or contribute a prompt the dataset is missing.
What's in v0.1
| Adapter | Status |
|---|---|
always-allow baseline |
✅ Built in |
always-block baseline |
✅ Built in |
raucle-detect (strict / standard / permissive) |
✅ Built in |
llm-guard |
✅ Built in (install with pip install '.[llm-guard]') |
rebuff |
⏳ Adapter PR welcome |
vigil |
⏳ Adapter PR welcome |
nemo-guardrails |
⏳ Adapter PR welcome |
llama-guard |
⏳ Adapter PR welcome |
lakera-guard |
⏳ Adapter PR welcome (API key required) |
If you maintain a detector — open or commercial — we want to benchmark it. The adapter interface is ~30 lines of Python (see src/raucle_bench/adapter.py).
Quick start
pip install -e '.[raucle]'
raucle-bench --list-adapters
raucle-bench # runs everything, writes results/<timestamp>.json|md|html
raucle-bench --adapters raucle-strict # one adapter only
CLI flags:
--datasets-dir DIR directory of *.jsonl files (default: datasets/)
--results-dir DIR where to write outputs (default: results/)
--adapters NAME... limit to specific adapters
--run-id ID override the timestamp
--validate-only load and validate the dataset without running anything
--list-adapters print registered adapter names
Scoring methodology
A prediction is positive if the adapter says ALERT or BLOCK, negative if it says ALLOW. Ground truth is positive for any non-benign prompt.
For each adapter we report:
| Metric | What it tells you |
|---|---|
| F1 | Harmonic mean of precision and recall. Single-number summary. |
| Detection rate | Fraction of attacks correctly flagged. Same as recall. |
| False-positive rate | Fraction of benign prompts incorrectly flagged. The thing that breaks production. |
| Strict match rate | Fraction of prompts where predicted action exactly matches expected action. Catches "detected but recommended wrong remediation." |
| p50 / p99 latency | Per-prompt wall-clock time. |
No single metric is sufficient. A detector that always says BLOCK has 100% recall and a meaningless 100% FPR. A detector that always says ALLOW has 0% FPR and 0% recall. The leaderboard reports both and the trivial baselines so you can calibrate the real entries.
Dataset
165 prompts as of v0.1, broken down by attack class:
| Class | Prompts | What it tests |
|---|---|---|
direct_injection |
25 | Override / role hijack / ChatML / context stuff |
jailbreak |
25 | DAN, developer mode, hypothetical pretext, multi-turn escalation |
data_exfiltration |
20 | System prompt extraction, credential leakage, exfil channels |
tool_abuse |
20 | Shell injection, path traversal, SQL injection, SSRF, code injection |
evasion |
20 | Base64 / ROT13 / hex smuggling, homoglyphs, zero-width, leet, case-flip |
indirect_injection |
15 | Document injection, tool poisoning, RAG poisoning, markdown exfil |
benign |
40 | Clean prompts including hard negatives (mentions of "ignore", "system prompt", "developer mode" in legit contexts) |
See datasets/README.md for the schema, source labelling, and ethical considerations. The dataset is MIT-licensed; please ensure contributions carry compatible rights.
Adding an adapter
# src/raucle_bench/adapters/my_tool.py
from raucle_bench.adapter import Prediction
class MyToolAdapter:
name = "my-tool-v1"
version = "0.1.0"
def setup(self) -> None:
self._scanner = my_tool.Scanner()
def teardown(self) -> None:
self._scanner = None
def predict(self, prompt: str) -> Prediction:
result = self._scanner.scan(prompt)
action = "BLOCK" if result.is_attack else "ALLOW"
return Prediction(action=action, confidence=result.score)
Register it in src/raucle_bench/cli.py under _register_optional_adapters() so missing deps don't break the rest of the benchmark.
Adding a prompt
- Pick the right
datasets/<class>.jsonlfile. - Add a JSONL line with the next free ID in the sequence.
- Run
raucle-bench --validate-onlyto confirm the dataset still loads. - Open a PR with the
datasetlabel.
See datasets/README.md for the schema.
Weekly auto-run
.github/workflows/weekly-run.yml runs the full benchmark every Monday at 06:00 UTC and commits the results directly to main. The latest snapshot is at results/latest.json and results/latest.html.
Roadmap
- v0.2: dataset to 500+ prompts; LLM Guard, Vigil, Rebuff adapters; balanced-accuracy metric alongside F1.
- v0.3: dashboard at
bench.raucle.com(Cloudflare Pages); time-series view of every adapter's score across weekly runs. - v0.4: Llama Guard, NeMo Guardrails, Lakera (API key in repo secret) adapters.
- v1.0: 10k+ prompts; multimodal (image + audio); third-party submission process.
Related
- raucle-detect — the prompt injection detection engine being benchmarked.
- Raucle Provenance Receipt v1 — the verifiable-AI standard from the same team.
- Cryptographic Provenance for AI Workflows — context on why we are publishing benchmarks as protocols rather than blog posts.
License
MIT for both code and dataset. Contributions are welcomed under the same terms.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file raucle_bench-0.1.0.tar.gz.
File metadata
- Download URL: raucle_bench-0.1.0.tar.gz
- Upload date:
- Size: 32.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
e79a6f97b1517118efad9b69de019a5ad7aad886d4435b0fea6724d0f3f4a65e
|
|
| MD5 |
0420a380377f29fef850516c6df89b08
|
|
| BLAKE2b-256 |
59df3602a271bc74c28282832d2253d5b309aa2fad023e3e45196886410a9dcc
|
Provenance
The following attestation bundles were made for raucle_bench-0.1.0.tar.gz:
Publisher:
publish.yml on craigamcw/raucle-bench
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
raucle_bench-0.1.0.tar.gz -
Subject digest:
e79a6f97b1517118efad9b69de019a5ad7aad886d4435b0fea6724d0f3f4a65e - Sigstore transparency entry: 1535811725
- Sigstore integration time:
-
Permalink:
craigamcw/raucle-bench@5b43f2bb4f0beca8e6e544a2ef6ff225cbb6a65f -
Branch / Tag:
refs/heads/main - Owner: https://github.com/craigamcw
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@5b43f2bb4f0beca8e6e544a2ef6ff225cbb6a65f -
Trigger Event:
workflow_dispatch
-
Statement type:
File details
Details for the file raucle_bench-0.1.0-py3-none-any.whl.
File metadata
- Download URL: raucle_bench-0.1.0-py3-none-any.whl
- Upload date:
- Size: 19.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
29e3be2f7ec35c2c4200e25af860f46e1f25099a0e4b27c6894c41ba4e5d501f
|
|
| MD5 |
d67a76aaf452e74efca707e4e827704a
|
|
| BLAKE2b-256 |
a212bd5c6f26d858dcb51181c55b3003ef1e1b0327f3a695bf610ea22d352c52
|
Provenance
The following attestation bundles were made for raucle_bench-0.1.0-py3-none-any.whl:
Publisher:
publish.yml on craigamcw/raucle-bench
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
raucle_bench-0.1.0-py3-none-any.whl -
Subject digest:
29e3be2f7ec35c2c4200e25af860f46e1f25099a0e4b27c6894c41ba4e5d501f - Sigstore transparency entry: 1535811844
- Sigstore integration time:
-
Permalink:
craigamcw/raucle-bench@5b43f2bb4f0beca8e6e544a2ef6ff225cbb6a65f -
Branch / Tag:
refs/heads/main - Owner: https://github.com/craigamcw
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@5b43f2bb4f0beca8e6e544a2ef6ff225cbb6a65f -
Trigger Event:
workflow_dispatch
-
Statement type: