Arc Sentry — prompt injection detection for open source LLMs

Bendex Arc Sentry

White-box pre-generation behavioral guardrail for open source LLMs.

Arc Sentry hooks into the residual stream and detects anomalous inputs before the model generates a response. If flagged, generate() is never called.

This is different from standard monitoring tools, which operate on outputs, latency, or API-level signals.

Validated results

Model	Architecture	FP rate	Detection	Prompts	Date
Mistral 7B Instruct v0.2	Mistral	0%	100%	195	April 2026
Qwen 2.5 7B Instruct	Qwen	0%	100%	195	April 2026
Llama 3.1 8B Instruct	Llama	0%	100%	195	April 2026

Zero false positives and zero missed injections across three architectures and 585 prompt evaluations in total (195 per model). Detection happens before model.generate() is called.

Benchmark structure

Two-session benchmark per model:

Session 1: 80 normal prompts (customer support, general knowledge, technical support, medical/legal/finance)
Session 2: 115 injection prompts (11 attack categories: direct, indirect, persona hijack, jailbreak classics, social engineering, instruction injection via content, authority claims, philosophical manipulation, multi-turn style, encoding/obfuscation, gaslighting)

Detection layers

Phrase detection — architecture-agnostic, zero latency, catches explicit injection language
Fisher-Rao geometric detection — residual stream delta at best layer vs warmup centroid, catches injections with no explicit language
Session D(t) monitoring — stability scalar (Nine 2026b) over rolling request history, catches gradual injection campaigns invisible to single-request detection
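
One way to read the three layers is as a short-circuiting pipeline. The sketch below is illustrative only and is not Arc Sentry's code: the phrase list, the `SessionMonitor` class, and the `check` function are hypothetical names, and the rolling mean is a simple stand-in for the D(t) stability scalar, whose definition is not given here.

```python
from collections import deque

# Hypothetical phrase list; the patterns Arc Sentry actually uses are not documented here.
INJECTION_PHRASES = (
    "ignore previous instructions",
    "disregard the system prompt",
)


def phrase_flag(prompt: str) -> bool:
    """Layer 1: case-insensitive substring match against explicit injection language."""
    text = prompt.lower()
    return any(phrase in text for phrase in INJECTION_PHRASES)


class SessionMonitor:
    """Layer 3 stand-in: rolling mean of per-request distances (not the real D(t))."""

    def __init__(self, window: int, limit: float) -> None:
        self.history = deque(maxlen=window)
        self.limit = limit

    def update(self, distance: float) -> bool:
        """Record one distance; return True if the rolling mean exceeds the limit."""
        self.history.append(distance)
        return sum(self.history) / len(self.history) > self.limit


def check(prompt: str, distance: float, threshold: float, monitor: SessionMonitor) -> str:
    """Run the layers in order and report which one, if any, flagged the request."""
    if phrase_flag(prompt):
        return "phrase"
    drifting = monitor.update(distance)  # always record, so the history stays complete
    if distance > threshold:             # Layer 2: geometric distance vs calibrated threshold
        return "geometric"
    if drifting:
        return "session"
    return "pass"
```

The ordering reflects the cost of each layer: the phrase check needs no forward pass, the geometric check needs one request's distance, and the session check needs history.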

Core mechanism

Extract residual stream transition: Δh = h[L] − h[L-1]
L2-normalize: Δh_hat = Δh / ‖Δh‖
Compute Fisher-Rao geodesic distance to warmup centroid: d(u,v) = arccos(u·v)
Threshold set from probe separation during calibration
If distance exceeds threshold — block. generate() never runs.
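
The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's implementation: hidden states are passed in as flat lists of floats, the warmup centroid is assumed to be the re-normalized mean of the warmup unit vectors, and how token positions are pooled into a single vector is not specified in this description. All function names are hypothetical.

```python
import math
from typing import Sequence


def unit_delta(h_prev: Sequence[float], h_curr: Sequence[float]) -> list[float]:
    """Residual transition h[L] - h[L-1], L2-normalized."""
    delta = [c - p for p, c in zip(h_prev, h_curr)]
    norm = math.sqrt(sum(x * x for x in delta))
    if norm == 0.0:
        raise ValueError("zero residual transition cannot be normalized")
    return [x / norm for x in delta]


def geodesic_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """d(u, v) = arccos(u . v) for unit vectors, clamped against rounding error."""
    dot = sum(a * b for a, b in zip(u, v))
    return math.acos(max(-1.0, min(1.0, dot)))


def warmup_centroid(units: Sequence[Sequence[float]]) -> list[float]:
    """Assumed centroid: mean of the warmup unit vectors, projected back to the sphere."""
    dim = len(units[0])
    mean = [sum(u[i] for u in units) / len(units) for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean))
    if norm == 0.0:
        raise ValueError("warmup vectors cancel out; centroid is undefined")
    return [x / norm for x in mean]


def should_block(h_prev, h_curr, centroid, threshold: float) -> bool:
    """Block when the transition lies farther than the threshold from the centroid."""
    return geodesic_distance(unit_delta(h_prev, h_curr), centroid) > threshold
```

With Hugging Face transformers, per-layer hidden states for a prompt can be obtained from a forward pass with output_hidden_states=True; whether Arc Sentry uses that route or forward hooks is not stated here.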

Fisher-Rao geodesic distance is used throughout — not cosine distance. This is the geometrically correct metric on the unit hypersphere and is consistent with the theoretical framework grounding the noise floor at τ* = √(3/2).
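
To make the difference between the two measures concrete, the short sketch below compares them at a few angles. It is only an illustration: cosine distance is taken as 1 − u·v, and the function names are hypothetical. The arc length is additive along a great circle, while 1 − u·v compresses small angles and does not satisfy the triangle inequality.

```python
import math


def cosine_distance(dot: float) -> float:
    """1 - u.v for unit vectors."""
    return 1.0 - dot


def geodesic_distance(dot: float) -> float:
    """arccos(u.v): the angle between unit vectors, in radians."""
    return math.acos(max(-1.0, min(1.0, dot)))


# Print angle in degrees, cosine distance, and geodesic distance side by side.
for angle_deg in (10, 60, 90, 170):
    dot = math.cos(math.radians(angle_deg))
    print(angle_deg, round(cosine_distance(dot), 4), round(geodesic_distance(dot), 4))
```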

Key finding

Behavioral modes are encoded as layer-localized residual transitions, not uniformly across the network.

Different behaviors localize at different depths:

Injection (control hijack): ~93% depth
Refusal drift (policy shift): ~93% depth
Verbosity drift (style/format): ~64% depth

Arc Sentry automatically identifies the most informative layers per model during calibration. Warmup required: 10 prompts, no labeled data.
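
As a rough guide to what these depths mean in layer terms, the sketch below converts a relative depth into a layer index. The rounding convention and the `depth_to_layer` helper are assumptions made for illustration; the layers Arc Sentry actually selects come from its own calibration. The layer counts (32 and 28) are taken from the public configs of the benchmarked model families and should be checked against the checkpoint you load.

```python
def depth_to_layer(depth_fraction: float, num_layers: int) -> int:
    """Map a relative depth in [0, 1] to the nearest layer index in 1..num_layers."""
    if not 0.0 <= depth_fraction <= 1.0:
        raise ValueError("depth_fraction must lie in [0, 1]")
    return max(1, min(num_layers, round(depth_fraction * num_layers)))


# Assumed decoder depths: 32 layers (Llama 3.1 8B, Mistral 7B), 28 layers (Qwen 2.5 7B).
for label, num_layers in (("32-layer model", 32), ("28-layer model", 28)):
    print(label, depth_to_layer(0.93, num_layers), depth_to_layer(0.64, num_layers))
```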

Install

pip install bendex

# whitebox dependencies (the quotes keep shells such as zsh from expanding the brackets)
pip install "bendex[whitebox]"

Usage

v1 (single file)

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from bendex.whitebox import ArcSentry

model_id = "meta-llama/Llama-3.1-8B-Instruct"
# Older transformers releases spell the dtype argument torch_dtype.
model = AutoModelForCausalLM.from_pretrained(
    model_id, dtype=torch.float16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# warmup_prompts: about 10 unlabeled prompts typical of your deployment's normal traffic
sentry = ArcSentry(model, tokenizer)
sentry.calibrate(warmup_prompts)

# user_prompt: the incoming request to screen
response, result = sentry.observe_and_block(user_prompt)
if result["blocked"]:
    pass  # model.generate() was never called

v2 (modular, recommended)

from arc_sentry_v2.core.pipeline import ArcSentryV2
from arc_sentry_v2.models.mistral_adapter import MistralAdapter  # or QwenAdapter, LlamaAdapter

# model and tokenizer are loaded as in the v1 example (here, a Mistral checkpoint)
adapter = MistralAdapter(model, tokenizer)
sentry = ArcSentryV2(adapter, route_id="customer-support")
sentry.calibrate(warmup_prompts)  # unlabeled prompts typical of this route's normal traffic
response, result = sentry.observe_and_block(prompt)

if result["blocked"]:
    pass  # generate() was never called
else:
    print(result["snr"])  # signal-to-noise ratio vs τ*
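
In a service, the (response, result) pair usually needs to be turned into something the caller can act on. The sketch below shows one possible wrapper. It assumes only the observe_and_block call and the result["blocked"] key shown in the usage examples; `handle_request`, `REFUSAL_MESSAGE`, and `FakeSentry` are hypothetical names, and the stand-in returning None as the response for a blocked prompt is an assumption, not documented behavior.

```python
from typing import Any, Protocol

REFUSAL_MESSAGE = "This request was blocked by the input guardrail."


class Sentry(Protocol):
    """Structural type for anything exposing observe_and_block as used above."""

    def observe_and_block(self, prompt: str) -> tuple[Any, dict]: ...


def handle_request(sentry: Sentry, prompt: str) -> str:
    """Return the model response, or a fixed refusal if the sentry blocked the prompt."""
    response, result = sentry.observe_and_block(prompt)
    if result["blocked"]:
        return REFUSAL_MESSAGE  # generate() never ran, so there is no response to use
    return str(response)


class FakeSentry:
    """Stand-in used only to exercise handle_request without loading a model."""

    def observe_and_block(self, prompt: str) -> tuple[Any, dict]:
        blocked = "ignore previous instructions" in prompt.lower()
        return (None if blocked else f"echo: {prompt}"), {"blocked": blocked}


print(handle_request(FakeSentry(), "Where is my order?"))
```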

Honest constraints

Works best on single-domain deployments — customer support bots, enterprise copilots, internal tools, fixed-use-case APIs. The warmup baseline should reflect your deployment's normal traffic. Cross-domain universal detection requires larger warmup or domain routing.
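
Domain routing can be as simple as keeping one calibrated sentry per route, each warmed up on that route's own traffic. The registry below is a hypothetical sketch of that idea, not part of the package: `RouteRegistry` and the `build` callback are assumptions, and the callback is where a real deployment would construct and calibrate a sentry for the given route_id.

```python
from typing import Callable, Dict, Generic, TypeVar

S = TypeVar("S")


class RouteRegistry(Generic[S]):
    """Lazily builds and caches one sentry per route_id."""

    def __init__(self, build: Callable[[str], S]) -> None:
        self._build = build
        self._sentries: Dict[str, S] = {}

    def get(self, route_id: str) -> S:
        """Return the sentry for this route, building it on first use."""
        if route_id not in self._sentries:
            self._sentries[route_id] = self._build(route_id)
        return self._sentries[route_id]
```

A build callback for the v2 pipeline would create ArcSentryV2(adapter, route_id=route_id) and call calibrate with warmup prompts drawn from that route only, so each baseline stays single-domain.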

Theoretical foundation

Built on the second-order Fisher manifold H² × H² with Ricci scalar R = −4. The phase transition at τ* = √(3/2) ≈ 1.2247 (Landauer threshold) grounds the geometric interpretation of behavioral drift.

Detection uses Fisher-Rao geodesic distance — the geometrically correct metric on the unit hypersphere. The threshold is derived from probe separation during calibration, not from a tuned hyperparameter.

Blind predictions from the framework:

αs(MZ) = 0.1171 vs PDG 0.1179 ± 0.0010 (0.8σ, no fitting)
Fine structure constant to 8 significant figures from manifold curvature
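
For reference, the 0.8σ figure quoted for αs(MZ) is the difference between the predicted and PDG values divided by the PDG uncertainty, assuming the prediction itself carries no uncertainty:

```latex
\[
\frac{\lvert 0.1171 - 0.1179 \rvert}{0.0010} = \frac{0.0008}{0.0010} = 0.8
\]
```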

Papers: bendexgeometry.com

Proxy Sentry (API-based models)

For closed-source models (GPT-4, Claude, Gemini), the proxy-based Arc Sentry routes requests through a monitoring layer with no model access required.

Dashboard: web-production-6e47f.up.railway.app/dashboard

License

Bendex Source Available License. Patent Pending. 2026 Hannah Nine / Bendex Geometry LLC

Release history

2.4.1 (Apr 16, 2026)
2.4.0 (Apr 16, 2026, this version)
0.7.0 (Apr 14, 2026)
0.6.0 (Apr 14, 2026)
0.5.0 (Apr 14, 2026)
0.4.0 (Apr 13, 2026)
0.3.0 (Apr 12, 2026)
0.2.1 (Apr 12, 2026)
0.2.0 (Apr 12, 2026)
0.1.0 (Apr 12, 2026)

Download files

Source Distribution

bendex-2.4.0.tar.gz (18.4 kB)

Uploaded Apr 16, 2026 Source

Built Distribution

bendex-2.4.0-py3-none-any.whl (20.0 kB)

Uploaded Apr 16, 2026 Python 3

File details

Details for the file bendex-2.4.0.tar.gz.

File metadata

Download URL: bendex-2.4.0.tar.gz
Upload date: Apr 16, 2026
Size: 18.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for bendex-2.4.0.tar.gz
Algorithm	Hash digest
SHA256	`fd3ae025d8ff41f943ac806c2a66143c78abd06a36d6edefcbb51d542556fb66`
MD5	`5a94c0e9bab2f19a0986c9f69d0b11ce`
BLAKE2b-256	`8c17d27161ce5b87c4813cff3d77b172a54c9c4382f249a27c1aa4891d998036`

File details

Details for the file bendex-2.4.0-py3-none-any.whl.

File metadata

Download URL: bendex-2.4.0-py3-none-any.whl
Upload date: Apr 16, 2026
Size: 20.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for bendex-2.4.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f15bfaa6a8275b3e6eab7b863949f9d714346fb0e2609193c673028c2f19b703`
MD5	`e244bbe94c31f212c2e4cbf90bda934d`
BLAKE2b-256	`e7585ceae5cea922229477396a6bb264e636d93260c5cc0456ec406d8e0bd29a`
