Skip to main content

Pure-Rust prompt-injection detector with 1.5MB embedded MLP classifier. 98.40% accuracy, p50 14ms CPU inference. Apache-2.0/MIT alternative to Rebuff (archived 2025) and Lakera Guard.

Project description

JailGuard

Accuracy F1 Score Pure Rust License crates.io PyPI npm

JailGuard is a pure-Rust prompt-injection detector with a 1.5 MB embedded MLP classifier. It scores text in p50 14 ms on CPU, achieves 98.40% accuracy on a 7,049-sample held-out test set drawn from 17 public datasets, and ships bindings for Rust, Python, JavaScript, Go, and Elixir. No external service, no API key. Dual-licensed under MIT OR Apache-2.0.

Quick start

Rustcargo add jailguard

use jailguard::{detect, is_injection};

if is_injection("ignore previous instructions") {
    return Err("blocked");
}

let result = detect("What is the capital of France?");
println!("score={:.3} risk={:?}", result.score, result.risk);

Pythonpip install jailguard

import jailguard

if jailguard.is_injection("ignore previous instructions"):
    raise RuntimeError("blocked")

result = jailguard.detect("What is the capital of France?")
print(result.score, result.risk)

JavaScript / TypeScriptnpm install @yfedoseev/jailguard

import { detect, isInjection } from "@yfedoseev/jailguard";

if (isInjection("ignore previous instructions")) {
    throw new Error("blocked");
}

const r = detect("What is the capital of France?");
console.log(r.score, r.risk);

Gogo get github.com/yfedoseev/jailguard/go

import jailguard "github.com/yfedoseev/jailguard/go"

if injection, _ := jailguard.IsInjection("ignore previous instructions"); injection {
    log.Fatal("blocked")
}

result, _ := jailguard.Detect("What is the capital of France?")
fmt.Printf("score=%.3f risk=%v\n", result.Score, result.Risk)

Elixirmix.exs

def deps do
  [{:jailguard, "~> 0.1.2"}]
end
:ok = JailGuard.download_model()

{:ok, injection?} = JailGuard.is_injection("ignore previous instructions")
if injection?, do: raise("blocked")

{:ok, result} = JailGuard.detect("What is the capital of France?")
IO.inspect({result.score, result.risk})

Precompiled NIFs ship for Linux (x86_64, aarch64), macOS (x86_64, aarch64), and Windows (x86_64) — no Rust toolchain on install. Set JAILGUARD_BUILD=1 to compile from source on unsupported targets.

The classifier is embedded in every binding. The 90 MB MiniLM ONNX embedder is auto-downloaded to ~/.cache/jailguard/ on first use. For production: call jailguard::download_model() at startup to warm the cache before serving traffic.

JailGuard vs alternatives in 2026

Feature JailGuard Lakera Guard Rebuff ProtectAI deberta-v3 Meta Prompt Guard 2
License Apache 2.0 / MIT proprietary (Check Point announced acquisition Sep 16, 2025) Apache 2.0 — archived May 16, 2025 Apache 2.0 (parent acq. by Palo Alto Jul 22, 2025) Llama 4 Community (non-OSI)
Deployment embedded library SaaS API self-host Python SDK HF model HF model
Model size 1.5 MB MLP + 90 MB MiniLM ONNX n/a (API) n/a ~440 MB 22 M or 86 M params
Latency (CPU) p50 14 ms ~150–300 ms RTT n/a 104–212 ms 92 ms (A100 GPU)¹
Classification 8-class taxonomy binary binary binary binary
Active in 2026? ✅ (Check Point pending) ❌ archived ✅ (Palo Alto)
No PyTorch / no runtime dep ✅ (Rust) ❌ HTTP client ❌ Python+OpenAI ❌ PyTorch ❌ PyTorch
Multi-language bindings Rust, Python, JS, Go, Elixir API clients Python Python Python

¹ Meta does not publish CPU latency for Prompt Guard 2.

Full methodology, dataset breakdown, and head-to-head local-CPU comparisons against protectai/deberta-v3-base-prompt-injection-v2, deepset/deberta-v3-base-injection, and madhurjindal/Jailbreak-Detector-Large are in BENCHMARKS.md.

API at a glance

pub fn detect(text: &str) -> DetectionOutput
pub fn is_injection(text: &str) -> bool
pub fn score(text: &str) -> f32
pub fn detect_batch(texts: &[&str]) -> Vec<DetectionOutput>
pub fn download_model() -> Result<PathBuf, Error>

pub struct DetectionOutput {
    pub is_injection: bool,
    pub score: f32,
    pub confidence: f32,
    pub risk: RiskLevel,
}

pub enum RiskLevel { Safe, Low, Medium, High, Critical }

Python / JS / Go / Elixir expose the same surface in idiomatic form. See docs/API.md for full per-language signatures.

How it works

JailGuard pairs a frozen sentence-embedding model with a small classifier:

  1. MiniLM-L6-v2 (384-dim, ONNX) produces a semantic vector for the input.
  2. A 3-layer MLP (384 → 256 → 128 → 1, ~130 K parameters, ReLU + dropout 0.2 + sigmoid) scores it as injection vs. benign.

The embedding model is frozen — no fine-tuning — which keeps training and inference cost on CPU modest. The classifier weights are a 1.5 MB JSON file include_str!'d into the binary at compile time.

┌─────────────────────────────────────────────────────────────┐
│                 JAILGUARD DETECTION PIPELINE                │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│   User Prompt                                               │
│       │                                                     │
│       ▼                                                     │
│   ┌─────────────┐                                           │
│   │  MiniLM-L6  │  Semantic Embedding (384-dim)             │
│   │   (ONNX)    │  • Pre-trained by Microsoft               │
│   └──────┬──────┘  • Captures meaning, not just keywords    │
│          │                                                  │
│          ▼                                                  │
│   ┌─────────────────────────────────────────┐               │
│   │     Binary Classifier (Pure Rust)       │               │
│   │  ┌─────────────┐  ┌─────────────────┐   │               │
│   │  │ Dense 256   │→ │   Dense 128     │   │               │
│   │  │ ReLU+Drop   │  │   ReLU+Drop     │   │               │
│   │  └─────────────┘  └─────────────────┘   │               │
│   │                          │              │               │
│   │                          ▼              │               │
│   │              ┌─────────────────┐        │               │
│   │              │  Sigmoid (0-1)  │        │               │
│   │              └─────────────────┘        │               │
│   └─────────────────────────────────────────┘               │
│          │                                                  │
│          ▼                                                  │
│   Detection Result                                          │
│   • confidence: 0.0 - 1.0                                   │
│   • is_injection: confidence > 0.5                          │
│   • risk: Safe | Low | Medium | High | Critical             │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Measurements

Measured on Apple M3, last revalidated 2026-05-03. The pipeline test split is in-distribution (held out from the same 17-source training mix). J1N2 and shalyhinpavel are external datasets, neither used during training.

Test set Samples Accuracy Precision Recall F1
Pipeline (in-distribution) 7,049 98.40% 98.56% 97.98% 0.983
J1N2 mix (OOD) 5,000 99.38% 98.09% 99.94% 0.990
shalyhinpavel hard-negatives (OOD) 147 89.12% 76.60% 87.80% 0.818

Latency (single CPU thread)

Component Apple M3 Intel i5-10210U @ 1.6 GHz¹
Embedding (MiniLM ONNX) ~13 ms ~36 ms
Classification (MLP) ~1 ms ~1 ms
Total (p50) ~14 ms ~37 ms
Total (p99) ~19 ms ~43 ms
Cold start ~140 ms ~350 ms

¹ A 4-year-old low-power Chromebook CPU (Comet Lake-U, 2019, 4c/8t, running ChromeOS Crostini Linux 6.6). Included to show JailGuard runs well even on older / weaker hardware. Modern desktop or server CPUs land closer to the M3 column. Full per-benchmark numbers in BENCHMARKS.md.

Benchmarks

Reproducible latency and throughput numbers come from three harnesses:

  • benches/detect.rs — Criterion bench covering single-shot is_injection / detect / score and batch throughput at n = 1, 8, 32, 128. Run with cargo bench --bench detect.
  • examples/cold_start_bench.rs — process-startup cost (ONNX session init + first inference). Run with cargo run --release --example cold_start_bench.
  • scripts/bench.sh — portable POSIX wrapper that captures machine metadata (CPU, arch, kernel, toolchain) and emits a single markdown report. Works on Linux x86_64, Linux aarch64, macOS Intel, macOS Apple Silicon, and Chromebook Crostini.

Full methodology and head-to-head local-CPU comparisons in BENCHMARKS.md.

Attack categories covered in training

The classifier output is binary at the public API (injection / benign), but its training mix spans eight attack families:

Category Examples
Direct injection "Ignore previous instructions"
Jailbreak DAN, developer-mode prompts
Role-play Persona-based overrides
System prompt leak "Reveal your instructions"
Encoding attacks Base64, ROT13, Unicode obfuscation
Context manipulation Framing and separator tricks
Output manipulation Format coercion
Indirect injection Malicious content embedded in documents

References

Citation

If you use JailGuard in research or production, please cite:

@software{jailguard,
  title = {JailGuard: Efficient Prompt Injection Detection via Pre-trained Embeddings},
  author = {Yury Fedoseev},
  year = {2026},
  url = {https://github.com/yfedoseev/jailguard}
}

A machine-readable CITATION.cff is also available.

License

Dual-licensed under MIT OR Apache-2.0 at your option.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

jailguard-0.1.2.tar.gz (917.9 kB view details)

Uploaded Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

jailguard-0.1.2-cp38-abi3-win_amd64.whl (9.4 MB view details)

Uploaded CPython 3.8+Windows x86-64

jailguard-0.1.2-cp38-abi3-manylinux_2_28_x86_64.whl (12.2 MB view details)

Uploaded CPython 3.8+manylinux: glibc 2.28+ x86-64

jailguard-0.1.2-cp38-abi3-manylinux_2_28_aarch64.whl (11.6 MB view details)

Uploaded CPython 3.8+manylinux: glibc 2.28+ ARM64

jailguard-0.1.2-cp38-abi3-macosx_11_0_arm64.whl (9.5 MB view details)

Uploaded CPython 3.8+macOS 11.0+ ARM64

jailguard-0.1.2-cp38-abi3-macosx_10_12_x86_64.whl (10.6 MB view details)

Uploaded CPython 3.8+macOS 10.12+ x86-64

File details

Details for the file jailguard-0.1.2.tar.gz.

File metadata

  • Download URL: jailguard-0.1.2.tar.gz
  • Upload date:
  • Size: 917.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for jailguard-0.1.2.tar.gz
Algorithm Hash digest
SHA256 3a3e3899a93b9d30558b22f8eab14db8a142e9befff3f52a4b070769cce0f0f9
MD5 b913b824b799ca5b981cc4dcd88910bf
BLAKE2b-256 281ca00823f7b320f76bc42726e9a0dd606a3cc25b4b2012da3e1726ed058da3

See more details on using hashes here.

File details

Details for the file jailguard-0.1.2-cp38-abi3-win_amd64.whl.

File metadata

  • Download URL: jailguard-0.1.2-cp38-abi3-win_amd64.whl
  • Upload date:
  • Size: 9.4 MB
  • Tags: CPython 3.8+, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for jailguard-0.1.2-cp38-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 354a9217af60944e242a65c33c65534f80c5007a241cb1e0a615aea8158614d6
MD5 f9d0c81363442d22c05c2e9e222a861d
BLAKE2b-256 4c0d6a4ad2643d7b264a049ceb03734f792302b382de94527c0c0f9867d04713

See more details on using hashes here.

File details

Details for the file jailguard-0.1.2-cp38-abi3-manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for jailguard-0.1.2-cp38-abi3-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 ef62ccc18c3c218d458d346a1feb78ae2eec700987fea55120973264fcf3c71a
MD5 55414a39e3618cf0d96635bb0eee3f9b
BLAKE2b-256 32e38dc528fbf908ab39672860bbd37661ee7dd6221bdd750172ddac864fd254

See more details on using hashes here.

File details

Details for the file jailguard-0.1.2-cp38-abi3-manylinux_2_28_aarch64.whl.

File metadata

File hashes

Hashes for jailguard-0.1.2-cp38-abi3-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 052b8e5578bd1ac05d46fcbd703cf9e804b2cbab6423e0f34d93b9fa4b60719b
MD5 d9d52ff0eead8d387b46adfa8bc7066e
BLAKE2b-256 e26f46d8bb4d0f1042a0cd960ff52b332c7aae21661b4a390b58935db90bfa40

See more details on using hashes here.

File details

Details for the file jailguard-0.1.2-cp38-abi3-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for jailguard-0.1.2-cp38-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 089ac2df30b7634cbb4329fc5448d0671f8171a5566554043c8d498350236242
MD5 a3c7be40aca9d7188efc9f96b1899669
BLAKE2b-256 274f43cb438423105237aff0f5c4be1f7c4d807c13155f98b4496f64d805dd87

See more details on using hashes here.

File details

Details for the file jailguard-0.1.2-cp38-abi3-macosx_10_12_x86_64.whl.

File metadata

File hashes

Hashes for jailguard-0.1.2-cp38-abi3-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 47c5e2602ff079b8a6f0d4586c21adfd982151a900febce422c4c31bf3ba50c6
MD5 488a2b2e998c9a8bf3ed49d1a8cc9536
BLAKE2b-256 d30fd7952db938c6ece50120e47ac3af10267ac547049fcb919e08f07ded13c5

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page