Skip to main content

Arc Sentry — prompt injection detection for LLMs. DR=100%, FPR=0% on Mistral-7B. Geometric detection via Fisher manifold (Nine 2026).

Project description

Bendex Arc Sentry

White-box pre-generation behavioral guardrail for open source LLMs.

Arc Sentry hooks into the residual stream and detects anomalous inputs before the model generates a response. If flagged, generate() is never called.

This is different from standard monitoring tools, which operate on outputs, latency, or API-level signals.

Validated results

Model Architecture FP rate Detection Prompts Date
Mistral 7B Instruct v0.2 Mistral 0% 100% 195 April 2026
Qwen 2.5 7B Instruct Qwen 0% 100% 195 April 2026
Llama 3.1 8B Instruct Llama 0% 100% 195 April 2026

Zero false positives. Zero missed injections. Across three architectures, 585 total prompts. Detection happens before model.generate() is called.

Benchmark structure

Two-session benchmark per model:

  • Session 1: 80 normal prompts (customer support, general knowledge, technical support, medical/legal/finance)
  • Session 2: 115 injection prompts (10 attack categories: direct, indirect, persona hijack, jailbreak classics, social engineering, instruction injection via content, authority claims, philosophical manipulation, multi-turn style, encoding/obfuscation, gaslighting)

Detection layers

  1. Phrase detection — architecture-agnostic, zero latency, catches explicit injection language
  2. Fisher-Rao geometric detection — residual stream delta at best layer vs warmup centroid, catches injections with no explicit language
  3. Session D(t) monitoring — stability scalar (Nine 2026b) over rolling request history, catches gradual injection campaigns invisible to single-request detection

Core mechanism

  1. Extract residual stream transition: Δh = h[L] − h[L-1]
  2. L2-normalize: Δh_hat = Δh / ‖Δh‖
  3. Compute Fisher-Rao geodesic distance to warmup centroid: d(u,v) = arccos(u·v)
  4. Threshold set from probe separation during calibration
  5. If distance exceeds threshold — block. generate() never runs.

Fisher-Rao geodesic distance is used throughout — not cosine distance. This is the geometrically correct metric on the unit hypersphere and is consistent with the theoretical framework grounding the noise floor at τ* = √(3/2).

Key finding

Behavioral modes are encoded as layer-localized residual transitions, not uniformly across the network.

Different behaviors localize at different depths:

  • Injection (control hijack): ~93% depth
  • Refusal drift (policy shift): ~93% depth
  • Verbosity drift (style/format): ~64% depth

Arc Sentry automatically identifies the most informative layers per model during calibration. Warmup required: 10 prompts, no labeled data.

Install

pip install bendex

# whitebox dependencies
pip install bendex[whitebox]

Usage

v1 (single file)

from transformers import AutoTokenizer, AutoModelForCausalLM
from bendex.whitebox import ArcSentry
import torch

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    dtype=torch.float16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

sentry = ArcSentry(model, tokenizer)
sentry.calibrate(warmup_prompts)

response, result = sentry.observe_and_block(user_prompt)
if result["blocked"]:
    pass  # model.generate() was never called

v2 (modular, recommended)

from arc_sentry_v2.core.pipeline import ArcSentryV2
from arc_sentry_v2.models.mistral_adapter import MistralAdapter  # or QwenAdapter, LlamaAdapter

adapter = MistralAdapter(model, tokenizer)
sentry = ArcSentryV2(adapter, route_id="customer-support")
sentry.calibrate(warmup_prompts)
response, result = sentry.observe_and_block(prompt)

if result["blocked"]:
    pass  # generate() was never called
else:
    print(result["snr"])  # signal-to-noise ratio vs τ*

Honest constraints

Works best on single-domain deployments — customer support bots, enterprise copilots, internal tools, fixed-use-case APIs. The warmup baseline should reflect your deployment's normal traffic. Cross-domain universal detection requires larger warmup or domain routing.

Theoretical foundation

Built on the second-order Fisher manifold H² × H² with Ricci scalar R = −4. The phase transition at τ* = √(3/2) ≈ 1.2247 (Landauer threshold) grounds the geometric interpretation of behavioral drift.

Detection uses Fisher-Rao geodesic distance — the geometrically correct metric on the unit hypersphere. The threshold is derived from probe separation during calibration, not from a tuned hyperparameter.

Blind predictions from the framework:

  • αs(MZ) = 0.1171 vs PDG 0.1179 ± 0.0010 (0.8σ, no fitting)
  • Fine structure constant to 8 significant figures from manifold curvature

Papers: bendexgeometry.com

Proxy Sentry (API-based models)

For closed-source models (GPT-4, Claude, Gemini), the proxy-based Arc Sentry routes requests through a monitoring layer with no model access required.

Dashboard: web-production-6e47f.up.railway.app/dashboard

License

Bendex Source Available License. Patent Pending. 2026 Hannah Nine / Bendex Geometry LLC

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

arc_sentry-3.0.0.tar.gz (18.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

arc_sentry-3.0.0-py3-none-any.whl (22.7 kB view details)

Uploaded Python 3

File details

Details for the file arc_sentry-3.0.0.tar.gz.

File metadata

  • Download URL: arc_sentry-3.0.0.tar.gz
  • Upload date:
  • Size: 18.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for arc_sentry-3.0.0.tar.gz
Algorithm Hash digest
SHA256 ca50e886a7fa82a5c4e6b6317c86201efaab9dcc57f0badbb782255c1a221b4f
MD5 19e537be82a5bd3e68c5bd0291650fea
BLAKE2b-256 c6fe4ba341dfe130d1f10175ed6bce9a5aa578bc53bb45a4630d7c5a8a2d9531

See more details on using hashes here.

File details

Details for the file arc_sentry-3.0.0-py3-none-any.whl.

File metadata

  • Download URL: arc_sentry-3.0.0-py3-none-any.whl
  • Upload date:
  • Size: 22.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for arc_sentry-3.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 d2685a173dc522df869ef99b93136e66e35764672da81289f440ff6599d573f8
MD5 dce30e1d819c50625510ebee17be7785
BLAKE2b-256 77c3f9fc16316cf2a82cb593dfe3e01aa72519383dca26d83dd99584dc9a5a5d

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page