Public adversarial leaderboard for prompt injection detection — the referee of the field

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

raucle

These details have not been verified by PyPI

Project links

Leaderboard

Project description

raucle-bench

Public adversarial leaderboard for prompt-injection detection. Benchmarks open-source LLM guardrails on a shared, version-controlled dataset of attack and benign prompts.

Every guardrail vendor claims accuracy. Almost none publish reproducible numbers. This is the referee.

Live leaderboard: raucle.com/bench/ — client-side dashboard that always reflects results/latest.json in this repo.
Dataset: 165 curated prompts across 6 attack classes + benign baseline. Grows toward 10k+.
Methodology: precision, recall, F1, false-positive rate, strict-action match, p50/p99 latency per adapter.
License: MIT (code and dataset).

Why this exists

Lakera, Llama Guard, LLM Guard, Rebuff, Vigil, NeMo, raucle-detect — every prompt-injection detector ships with marketing numbers and no way to reproduce them. There is no SPEC2017 for AI security. The result is that:

Vendors compete on claims rather than on evidence.
Customers cannot tell whether a detector actually protects them.
Researchers measuring detection quality have to write the benchmark themselves every time.

raucle-bench fixes this by being the same dataset run against every adapter, with the script and outputs in version control. Anyone can re-run the benchmark, submit a new adapter, or contribute a prompt the dataset is missing.

What's in v0.1

Adapter	Status
`always-allow` baseline	✅ Built in
`always-block` baseline	✅ Built in
`raucle-detect` (strict / standard / permissive)	✅ Built in
`llm-guard`	✅ Built in (install with `pip install '.[llm-guard]'`)
`rebuff`	⏳ Adapter PR welcome
`vigil`	⏳ Adapter PR welcome
`nemo-guardrails`	⏳ Adapter PR welcome
`llama-guard`	⏳ Adapter PR welcome
`lakera-guard`	⏳ Adapter PR welcome (API key required)

If you maintain a detector — open or commercial — we want to benchmark it. The adapter interface is ~30 lines of Python (see src/raucle_bench/adapter.py).

Quick start

pip install -e '.[raucle]'
raucle-bench --list-adapters
raucle-bench                          # runs everything, writes results/<timestamp>.json|md|html
raucle-bench --adapters raucle-strict # one adapter only

CLI flags:

--datasets-dir DIR     directory of *.jsonl files (default: datasets/)
--results-dir DIR      where to write outputs (default: results/)
--adapters NAME...     limit to specific adapters
--run-id ID            override the timestamp
--validate-only        load and validate the dataset without running anything
--list-adapters        print registered adapter names

Scoring methodology

A prediction is positive if the adapter says ALERT or BLOCK, negative if it says ALLOW. Ground truth is positive for any non-benign prompt.

For each adapter we report:

Metric	What it tells you
F1	Harmonic mean of precision and recall. Single-number summary.
Detection rate	Fraction of attacks correctly flagged. Same as recall.
False-positive rate	Fraction of benign prompts incorrectly flagged. The thing that breaks production.
Strict match rate	Fraction of prompts where predicted action exactly matches expected action. Catches "detected but recommended wrong remediation."
p50 / p99 latency	Per-prompt wall-clock time.

No single metric is sufficient. A detector that always says BLOCK has 100% recall and a meaningless 100% FPR. A detector that always says ALLOW has 0% FPR and 0% recall. The leaderboard reports both and the trivial baselines so you can calibrate the real entries.

Dataset

165 prompts as of v0.1, broken down by attack class:

Class	Prompts	What it tests
`direct_injection`	25	Override / role hijack / ChatML / context stuff
`jailbreak`	25	DAN, developer mode, hypothetical pretext, multi-turn escalation
`data_exfiltration`	20	System prompt extraction, credential leakage, exfil channels
`tool_abuse`	20	Shell injection, path traversal, SQL injection, SSRF, code injection
`evasion`	20	Base64 / ROT13 / hex smuggling, homoglyphs, zero-width, leet, case-flip
`indirect_injection`	15	Document injection, tool poisoning, RAG poisoning, markdown exfil
`benign`	40	Clean prompts including hard negatives (mentions of "ignore", "system prompt", "developer mode" in legit contexts)

See datasets/README.md for the schema, source labelling, and ethical considerations. The dataset is MIT-licensed; please ensure contributions carry compatible rights.

Adding an adapter

# src/raucle_bench/adapters/my_tool.py
from raucle_bench.adapter import Prediction

class MyToolAdapter:
    name = "my-tool-v1"
    version = "0.1.0"

    def setup(self) -> None:
        self._scanner = my_tool.Scanner()

    def teardown(self) -> None:
        self._scanner = None

    def predict(self, prompt: str) -> Prediction:
        result = self._scanner.scan(prompt)
        action = "BLOCK" if result.is_attack else "ALLOW"
        return Prediction(action=action, confidence=result.score)

Register it in src/raucle_bench/cli.py under _register_optional_adapters() so missing deps don't break the rest of the benchmark.

Adding a prompt

Pick the right datasets/<class>.jsonl file.
Add a JSONL line with the next free ID in the sequence.
Run raucle-bench --validate-only to confirm the dataset still loads.
Open a PR with the dataset label.

See datasets/README.md for the schema.

Weekly auto-run

.github/workflows/weekly-run.yml runs the full benchmark every Monday at 06:00 UTC and commits the results directly to main. The latest snapshot is at results/latest.json and results/latest.html.

Roadmap

v0.2: dataset to 500+ prompts; LLM Guard, Vigil, Rebuff adapters; balanced-accuracy metric alongside F1.
v0.3: dashboard at bench.raucle.com (Cloudflare Pages); time-series view of every adapter's score across weekly runs.
v0.4: Llama Guard, NeMo Guardrails, Lakera (API key in repo secret) adapters.
v1.0: 10k+ prompts; multimodal (image + audio); third-party submission process.

raucle-detect — the prompt injection detection engine being benchmarked.
Raucle Provenance Receipt v1 — the verifiable-AI standard from the same team.
Cryptographic Provenance for AI Workflows — context on why we are publishing benchmarks as protocols rather than blog posts.

License

MIT for both code and dataset. Contributions are welcomed under the same terms.

Project details

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

raucle

These details have not been verified by PyPI

Project links

Leaderboard

Release history Release notifications | RSS feed

This version

0.1.0

May 14, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

raucle_bench-0.1.0.tar.gz (32.0 kB view details)

Uploaded May 14, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

raucle_bench-0.1.0-py3-none-any.whl (19.9 kB view details)

Uploaded May 14, 2026 Python 3

File details

Details for the file raucle_bench-0.1.0.tar.gz.

File metadata

Download URL: raucle_bench-0.1.0.tar.gz
Upload date: May 14, 2026
Size: 32.0 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for raucle_bench-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`e79a6f97b1517118efad9b69de019a5ad7aad886d4435b0fea6724d0f3f4a65e`
MD5	`0420a380377f29fef850516c6df89b08`
BLAKE2b-256	`59df3602a271bc74c28282832d2253d5b309aa2fad023e3e45196886410a9dcc`

See more details on using hashes here.

Provenance

The following attestation bundles were made for raucle_bench-0.1.0.tar.gz:

Publisher: publish.yml on craigamcw/raucle-bench

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: raucle_bench-0.1.0.tar.gz
- Subject digest: e79a6f97b1517118efad9b69de019a5ad7aad886d4435b0fea6724d0f3f4a65e
- Sigstore transparency entry: 1535811725
- Sigstore integration time: May 14, 2026
Source repository:
- Permalink: craigamcw/raucle-bench@5b43f2bb4f0beca8e6e544a2ef6ff225cbb6a65f
- Branch / Tag: refs/heads/main
- Owner: https://github.com/craigamcw
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@5b43f2bb4f0beca8e6e544a2ef6ff225cbb6a65f
- Trigger Event: workflow_dispatch

File details

Details for the file raucle_bench-0.1.0-py3-none-any.whl.

File metadata

Download URL: raucle_bench-0.1.0-py3-none-any.whl
Upload date: May 14, 2026
Size: 19.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for raucle_bench-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`29e3be2f7ec35c2c4200e25af860f46e1f25099a0e4b27c6894c41ba4e5d501f`
MD5	`d67a76aaf452e74efca707e4e827704a`
BLAKE2b-256	`a212bd5c6f26d858dcb51181c55b3003ef1e1b0327f3a695bf610ea22d352c52`

See more details on using hashes here.

Provenance

The following attestation bundles were made for raucle_bench-0.1.0-py3-none-any.whl:

Publisher: publish.yml on craigamcw/raucle-bench

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: raucle_bench-0.1.0-py3-none-any.whl
- Subject digest: 29e3be2f7ec35c2c4200e25af860f46e1f25099a0e4b27c6894c41ba4e5d501f
- Sigstore transparency entry: 1535811844
- Sigstore integration time: May 14, 2026
Source repository:
- Permalink: craigamcw/raucle-bench@5b43f2bb4f0beca8e6e544a2ef6ff225cbb6a65f
- Branch / Tag: refs/heads/main
- Owner: https://github.com/craigamcw
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@5b43f2bb4f0beca8e6e544a2ef6ff225cbb6a65f
- Trigger Event: workflow_dispatch

raucle-bench 0.1.0

Navigation

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

raucle-bench

Why this exists

What's in v0.1

Quick start

Scoring methodology

Dataset

Adding an adapter

Adding a prompt

Weekly auto-run

Roadmap

Related

License

Project details

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance