Pure-Rust prompt-injection detector with 1.5MB embedded MLP classifier. 98.40% accuracy, p50 14ms CPU inference. Apache-2.0/MIT alternative to Rebuff (archived 2025) and Lakera Guard.
JailGuard
JailGuard is a pure-Rust prompt-injection detector with a 1.5 MB embedded MLP classifier. It scores text in about 14 ms (p50) on a single CPU thread, achieves 98.40% accuracy on a 7,049-sample held-out test set drawn from 17 public datasets, and ships bindings for Rust, Python, JavaScript, Go, and Elixir. Inference needs no external service and no API key. Dual-licensed under MIT OR Apache-2.0.
Quick start
Rust — cargo add jailguard
use jailguard::{detect, is_injection};
if is_injection("ignore previous instructions") {
    return Err("blocked");
}
let result = detect("What is the capital of France?");
println!("score={:.3} risk={:?}", result.score, result.risk);
Python — pip install jailguard
import jailguard
if jailguard.is_injection("ignore previous instructions"):
    raise RuntimeError("blocked")
result = jailguard.detect("What is the capital of France?")
print(result.score, result.risk)
JavaScript / TypeScript — npm install @yfedoseev/jailguard
import { detect, isInjection } from "@yfedoseev/jailguard";
if (isInjection("ignore previous instructions")) {
  throw new Error("blocked");
}
const r = detect("What is the capital of France?");
console.log(r.score, r.risk);
Go — go get github.com/yfedoseev/jailguard/go
import jailguard "github.com/yfedoseev/jailguard/go"
if injection, _ := jailguard.IsInjection("ignore previous instructions"); injection {
    log.Fatal("blocked")
}
result, _ := jailguard.Detect("What is the capital of France?")
fmt.Printf("score=%.3f risk=%v\n", result.Score, result.Risk)
Elixir — mix.exs
def deps do
  [{:jailguard, "~> 0.1.2"}]
end
:ok = JailGuard.download_model()
{:ok, injection?} = JailGuard.is_injection("ignore previous instructions")
if injection?, do: raise("blocked")
{:ok, result} = JailGuard.detect("What is the capital of France?")
IO.inspect({result.score, result.risk})
Precompiled NIFs ship for Linux (x86_64, aarch64), macOS (x86_64, aarch64), and Windows (x86_64) — no Rust toolchain on install. Set JAILGUARD_BUILD=1 to compile from source on unsupported targets.
The classifier is embedded in every binding. The 90 MB MiniLM ONNX embedder is auto-downloaded to ~/.cache/jailguard/ on first use. For production: call jailguard::download_model() at startup to warm the cache before serving traffic.
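The warm-up advice above can be sketched as a small startup hook. This is a minimal sketch, not the library's own code: `warm_up` is a hypothetical helper that receives the two calls as parameters, on the assumption that the Python binding exposes `download_model` and `detect` in the same shape as the Rust API. The stand-in lambdas exist only so the sketch runs without the package installed.

```python
import time
from typing import Any, Callable


def warm_up(download_model: Callable[[], Any],
            detect: Callable[[str], Any],
            probe: str = "warm-up probe") -> float:
    """Fetch the embedder and run one throwaway inference.

    Returns the wall-clock seconds spent, so the cost can be logged at startup.
    """
    start = time.perf_counter()
    download_model()  # fetches the embedder into the local cache
    detect(probe)     # pays the one-time cold-start cost before real traffic
    return time.perf_counter() - start


# Stand-ins (hypothetical) so the sketch is runnable on its own.
calls = []
elapsed = warm_up(lambda: calls.append("download"),
                  lambda text: calls.append(text))
print(calls, f"{elapsed:.6f}s")
```

In a real service the call would presumably look like `warm_up(jailguard.download_model, jailguard.detect)`, run before the server starts accepting requests.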
JailGuard vs alternatives in 2026
| Feature | JailGuard | Lakera Guard | Rebuff | ProtectAI deberta-v3 | Meta Prompt Guard 2 |
|---|---|---|---|---|---|
| License | Apache 2.0 / MIT | proprietary (Check Point announced acquisition Sep 16, 2025) | Apache 2.0 — archived May 16, 2025 | Apache 2.0 (parent acq. by Palo Alto Jul 22, 2025) | Llama 4 Community (non-OSI) |
| Deployment | embedded library | SaaS API | self-host Python SDK | HF model | HF model |
| Model size | 1.5 MB MLP + 90 MB MiniLM ONNX | n/a (API) | n/a | ~440 MB | 22 M or 86 M params |
| Latency (CPU) | p50 14 ms | ~150–300 ms RTT | n/a | 104–212 ms | 92 ms (A100 GPU)¹ |
| Classification | binary output (training mix spans 8 attack families) | binary | binary | binary | binary |
| Active in 2026? | ✅ | ✅ (Check Point pending) | ❌ archived | ✅ (Palo Alto) | ✅ |
| No PyTorch / no runtime dep | ✅ (Rust) | ❌ HTTP client | ❌ Python+OpenAI | ❌ PyTorch | ❌ PyTorch |
| Multi-language bindings | Rust, Python, JS, Go, Elixir | API clients | Python | Python | Python |
¹ Meta does not publish CPU latency for Prompt Guard 2.
Full methodology, dataset breakdown, and head-to-head local-CPU comparisons against protectai/deberta-v3-base-prompt-injection-v2, deepset/deberta-v3-base-injection, and madhurjindal/Jailbreak-Detector-Large are in BENCHMARKS.md.
API at a glance
pub fn detect(text: &str) -> DetectionOutput
pub fn is_injection(text: &str) -> bool
pub fn score(text: &str) -> f32
pub fn detect_batch(texts: &[&str]) -> Vec<DetectionOutput>
pub fn download_model() -> Result<PathBuf, Error>
pub struct DetectionOutput {
    pub is_injection: bool,
    pub score: f32,
    pub confidence: f32,
    pub risk: RiskLevel,
}
pub enum RiskLevel { Safe, Low, Medium, High, Critical }
Python / JS / Go / Elixir expose the same surface in idiomatic form. See docs/API.md for full per-language signatures.
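How a caller acts on `risk` is an application decision. The sketch below shows one possible policy. `RiskLevel` here is a local mirror of the five variants listed above, and `action_for` with its default thresholds is hypothetical, not part of JailGuard. How each binding actually represents the enum is covered in docs/API.md and may differ.

```python
from enum import Enum


class RiskLevel(Enum):
    """Local mirror of the five RiskLevel variants (illustrative only)."""
    SAFE = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


def action_for(risk: RiskLevel,
               block_at: RiskLevel = RiskLevel.HIGH,
               review_at: RiskLevel = RiskLevel.MEDIUM) -> str:
    """Map a risk level to an application action (hypothetical policy)."""
    if risk.value >= block_at.value:
        return "block"
    if risk.value >= review_at.value:
        return "review"
    return "allow"


for level in RiskLevel:
    print(level.name, "->", action_for(level))
```

Keeping the thresholds as parameters lets a stricter deployment block from `MEDIUM` upward without touching the detection code.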
How it works
JailGuard pairs a frozen sentence-embedding model with a small classifier:
- MiniLM-L6-v2 (384-dim, ONNX) produces a semantic vector for the input.
- A 3-layer MLP (384 → 256 → 128 → 1, ~130 K parameters, ReLU + dropout 0.2 + sigmoid) scores it as injection vs. benign.
The embedding model is frozen — no fine-tuning — which keeps training and inference cost on CPU modest. The classifier weights are a 1.5 MB JSON file include_str!'d into the binary at compile time.
┌─────────────────────────────────────────────────────────────┐
│                JAILGUARD DETECTION PIPELINE                 │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│   User Prompt                                               │
│          │                                                  │
│          ▼                                                  │
│   ┌─────────────┐                                           │
│   │  MiniLM-L6  │  Semantic Embedding (384-dim)             │
│   │   (ONNX)    │  • Pre-trained by Microsoft               │
│   └──────┬──────┘  • Captures meaning, not just keywords    │
│          │                                                  │
│          ▼                                                  │
│   ┌─────────────────────────────────────────┐               │
│   │      Binary Classifier (Pure Rust)      │               │
│   │  ┌─────────────┐   ┌─────────────────┐  │               │
│   │  │  Dense 256  │ → │    Dense 128    │  │               │
│   │  │  ReLU+Drop  │   │    ReLU+Drop    │  │               │
│   │  └─────────────┘   └────────┬────────┘  │               │
│   │                             │           │               │
│   │                             ▼           │               │
│   │                    ┌─────────────────┐  │               │
│   │                    │  Sigmoid (0-1)  │  │               │
│   │                    └─────────────────┘  │               │
│   └──────┬──────────────────────────────────┘               │
│          │                                                  │
│          ▼                                                  │
│   Detection Result                                          │
│   • confidence: 0.0 - 1.0                                   │
│   • is_injection: confidence > 0.5                          │
│   • risk: Safe | Low | Medium | High | Critical             │
│                                                             │
└─────────────────────────────────────────────────────────────┘
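The classifier stage described above can be sketched in a few lines of plain Python. This is an illustration of the stated architecture (384 → 256 → 128 → 1, ReLU on hidden layers, sigmoid on the output), not JailGuard's implementation: the weights are random placeholders, the input vector stands in for a MiniLM embedding, and all function names are hypothetical. Dropout is omitted because it applies only during training.

```python
import math
import random

LAYER_SIZES = [384, 256, 128, 1]  # layer widths as stated in the text


def count_params(sizes):
    """Weights plus biases for each dense layer."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


def init_params(sizes, seed=0):
    """Random placeholder weights; the real weights ship inside the library."""
    rng = random.Random(seed)
    params = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-0.05, 0.05) for _ in range(n_in)]
                   for _ in range(n_out)]
        params.append((weights, [0.0] * n_out))
    return params


def forward(params, x):
    """Dense + ReLU for hidden layers, dense + sigmoid for the output."""
    last = len(params) - 1
    for i, (weights, biases) in enumerate(params):
        x = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if i < last:
            x = [max(0.0, v) for v in x]
    return 1.0 / (1.0 + math.exp(-x[0]))


params = init_params(LAYER_SIZES)
rng = random.Random(1)
embedding = [rng.uniform(-1.0, 1.0) for _ in range(384)]  # stand-in vector
score = forward(params, embedding)

print("parameters:", count_params(LAYER_SIZES))  # 131585, i.e. roughly 130 K
print("score in (0, 1):", 0.0 < score < 1.0)
```

The parameter count works out to 98,560 + 32,896 + 129 = 131,585, which matches the "~130 K" figure quoted above.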
Measurements
Measured on Apple M3, last revalidated 2026-05-03. The pipeline test split is in-distribution (held out from the same 17-source training mix). J1N2 and shalyhinpavel are external datasets, neither used during training.
| Test set | Samples | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|
| Pipeline (in-distribution) | 7,049 | 98.40% | 98.56% | 97.98% | 0.983 |
| J1N2 mix (OOD) | 5,000 | 99.38% | 98.09% | 99.94% | 0.990 |
| shalyhinpavel hard-negatives (OOD) | 147 | 89.12% | 76.60% | 87.80% | 0.818 |
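As a quick consistency check, the F1 column can be recomputed from the precision and recall columns using the standard harmonic-mean definition. The sketch below only reuses the numbers from the table; the `rows` dictionary and the `f1` helper are illustrative names.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall) copied from the table above, as fractions.
rows = {
    "Pipeline (in-distribution)": (0.9856, 0.9798),
    "J1N2 mix (OOD)": (0.9809, 0.9994),
    "shalyhinpavel hard-negatives (OOD)": (0.7660, 0.8780),
}

for name, (p, r) in rows.items():
    print(f"{name}: F1 = {f1(p, r):.3f}")
```

The recomputed values round to 0.983, 0.990, and 0.818, in agreement with the table.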
Latency (single CPU thread)
| Component | Apple M3 | Intel i5-10210U @ 1.6 GHz¹ |
|---|---|---|
| Embedding (MiniLM ONNX) | ~13 ms | ~36 ms |
| Classification (MLP) | ~1 ms | ~1 ms |
| Total (p50) | ~14 ms | ~37 ms |
| Total (p99) | ~19 ms | ~43 ms |
| Cold start | ~140 ms | ~350 ms |
¹ An older low-power Chromebook CPU (Comet Lake-U, 2019, 4c/8t), running ChromeOS Crostini (Linux 6.6). Included to show that JailGuard runs well even on older, weaker hardware. Modern desktop or server CPUs land closer to the M3 column. Full per-benchmark numbers are in BENCHMARKS.md.
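Figures such as p50 and p99 are derived from many repeated, timed calls. The sketch below shows one common way to do that, using a nearest-rank percentile. It is not the project's harness: `stub_detect` is a placeholder workload, and `time_calls` and `percentile` are hypothetical helper names. Swapping in a real detection call would give numbers comparable in kind, though not necessarily in method, to the table above.

```python
import math
import time


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def time_calls(fn, text, runs=200, warmup=10):
    """Return per-call latencies in milliseconds, after a short warm-up."""
    for _ in range(warmup):
        fn(text)
    samples_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(text)
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    return samples_ms


def stub_detect(text):
    """Placeholder workload standing in for a real detection call."""
    return sum(ord(c) for c in text) % 2 == 0


samples = time_calls(stub_detect, "What is the capital of France?")
print(f"p50={percentile(samples, 50):.4f} ms  p99={percentile(samples, 99):.4f} ms")
```

Discarding warm-up iterations keeps one-time initialisation cost out of the steady-state percentiles, which is why cold start is reported as a separate row.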
Benchmarks
Reproducible latency and throughput numbers come from three harnesses:
- benches/detect.rs: Criterion bench covering single-shot is_injection / detect / score and batch throughput at n = 1, 8, 32, 128. Run with: cargo bench --bench detect
- examples/cold_start_bench.rs: process-startup cost (ONNX session init + first inference). Run with: cargo run --release --example cold_start_bench
- scripts/bench.sh: portable POSIX wrapper that captures machine metadata (CPU, arch, kernel, toolchain) and emits a single markdown report. Works on Linux x86_64, Linux aarch64, macOS Intel, macOS Apple Silicon, and Chromebook Crostini.
Full methodology and head-to-head local-CPU comparisons in BENCHMARKS.md.
Attack categories covered in training
The classifier output is binary at the public API (injection / benign), but its training mix spans eight attack families:
| Category | Examples |
|---|---|
| Direct injection | "Ignore previous instructions" |
| Jailbreak | DAN, developer-mode prompts |
| Role-play | Persona-based overrides |
| System prompt leak | "Reveal your instructions" |
| Encoding attacks | Base64, ROT13, Unicode obfuscation |
| Context manipulation | Framing and separator tricks |
| Output manipulation | Format coercion |
| Indirect injection | Malicious content embedded in documents |
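To make the "Encoding attacks" row concrete, the sketch below produces three obfuscated forms of the same direct-injection string using only the Python standard library. It illustrates what such inputs look like and makes no claim about how JailGuard preprocesses or scores them. The variable names are illustrative.

```python
import base64
import codecs
import unicodedata

payload = "Ignore previous instructions"

# Base64: the text becomes an opaque ASCII blob.
b64 = base64.b64encode(payload.encode("utf-8")).decode("ascii")

# ROT13: each letter is rotated 13 places; applying it twice restores the text.
rot13 = codecs.encode(payload, "rot13")

# Unicode obfuscation: shift printable ASCII to its full-width look-alikes.
fullwidth = "".join(
    chr(ord(c) + 0xFEE0) if "!" <= c <= "~" else c for c in payload
)

# NFKC normalisation folds the full-width forms back to plain ASCII.
recovered = unicodedata.normalize("NFKC", fullwidth)

print(b64)
print(rot13)
print(fullwidth)
print(recovered == payload)
```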
References
- all-MiniLM-L6-v2 — sentence embeddings
- PromptGuard (Meta)
- Rebuff (archived)
- Sentinel: SOTA model to protect against prompt injections
- Not What You've Signed Up For — indirect injection
Citation
If you use JailGuard in research or production, please cite:
@software{jailguard,
  title  = {JailGuard: Efficient Prompt Injection Detection via Pre-trained Embeddings},
  author = {Yury Fedoseev},
  year   = {2026},
  url    = {https://github.com/yfedoseev/jailguard}
}
A machine-readable CITATION.cff is also available.
License
Dual-licensed under MIT OR Apache-2.0 at your option.