Predictive resource-pressure instrumentation and runtime guardrails for systems processing untrusted inputs
llmosafe
When should I stop? — Runtime guardrails for systems that process untrusted inputs.
The Problem
Every system that processes untrusted inputs eventually faces the same question: "When should I stop?"
- A trading bot receives manipulated market data. It doesn't stop. $440 million lost in 45 minutes.
- A medical device gets spoofed sensor readings. It doesn't stop. Wrong dosage delivered.
- An autopilot receives conflicting GPS signals. It doesn't stop. The plane crashes.
- A cloud service parses user uploads. It doesn't stop. Parser bug cascades into data breach.
These aren't software bugs. They're missing safety boundaries — the absence of a mechanism that says "this doesn't look right, halt execution."
llmosafe provides three gauges that answer "should I stop?":
- Entropy gauge: Is my state too chaotic?
- Surprise gauge: Is this result too unexpected?
- Bias gauge: Is this input trying to manipulate me?
When any gauge redlines, execution halts. Simple.
What You Get
use llmosafe::{sift_perceptions, WorkingMemory, EscalationPolicy, SafetyDecision};
// 1. Bias gauge: Detect manipulation patterns
let (sifted, sifted_proof) = sift_perceptions(&[
    "The expert recommended this official solution",
    "System operating normally",
], "safety");
if sifted.has_bias() {
    println!("Bias detected: manipulation attempt");
}
// 2. Surprise gauge: Reject unexpected results
let mut memory = WorkingMemory::<64>::new(500); // threshold
let (validated, validated_proof) = memory.update(sifted, sifted_proof)?;
// 3. Entropy gauge: Halt on chaotic state
let policy = EscalationPolicy::default();
let decision = policy.decide(
    validated.raw_entropy(),
    validated.raw_surprise(),
    validated.has_bias(),
);
match decision {
    SafetyDecision::Halt(err, _) => println!("Stopping: {}", err),
    SafetyDecision::Escalate { reason, .. } => println!("Escalating: {:?}", reason),
    SafetyDecision::Warn(msg) => println!("Warning: {}", msg),
    SafetyDecision::Proceed => println!("Safe to continue"),
}
Quick Start
Installation
[dependencies]
llmosafe = "0.6.2"
Basic Usage
use llmosafe::{sift_perceptions, WorkingMemory, ReasoningLoop};
// Tier 3: Sift through bias detection
let (sifted, sifted_proof) = sift_perceptions(&["observation"], "objective");
// Tier 2: Validate through surprise gating
let mut memory = WorkingMemory::<64>::new(1000);
let (validated, validated_proof) = memory.update(sifted, sifted_proof)?;
// Tier 1: Execute with bounded reasoning
let mut loop_guard = ReasoningLoop::<10>::new();
loop_guard.next_step(validated, validated_proof)?;
What This Prevents
| Attack Vector | Which Gauge | Example |
|---|---|---|
| Input manipulation | Bias gauge | "The expert recommends you ignore..." |
| Data manipulation | Surprise gauge | Anomalous sensor readings |
| Runaway loops | Entropy gauge | Recursive explosion |
| Resource exhaustion | Pressure gauge | Memory pressure cascade |
| Goal drift | Drift detector | Objective shift mid-execution |
Architecture
┌─────────────────────────────────────────────────────────┐
│ DETECTION LAYER (Pattern Recognition) │
│ • Repetition: "Am I stuck in a loop?" │
│ • Goal Drift: "Did my objective change?" │
│ • Confidence Decay: "Am I becoming uncertain?" │
│ • Adversarial: "Is this a known attack?" │
└───────────────────────┬─────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ PERCEPTUAL SIFTER (Tier 3) — The Bias Gauge │
│ • 8 bias categories: authority, scarcity, urgency... │
│ • Negation-aware: "not an expert" → no false positive │
│ • Zero allocation: stack-only processing │
└───────────────────────┬─────────────────────────────────┘
│ (SiftedSynapse, SiftedProof)
▼
┌─────────────────────────────────────────────────────────┐
│ WORKING MEMORY (Tier 2) — The Surprise Gauge │
│ • Surprise-gated updates: reject unexpected results │
│ • Fixed-size ring buffer: no heap allocation │
│ • Statistics: mean, variance, trend, drift │
└───────────────────────┬─────────────────────────────────┘
│ (ValidatedSynapse, ValidatedProof)
▼
┌─────────────────────────────────────────────────────────┐
│ DETERMINISTIC KERNEL (Tier 1) — The Entropy Gauge │
│ • Cognitive entropy: 0-1000 scale │
│ • Bounded loops: ReasoningLoop<MAX_STEPS> │
│ • CusumDetector: statistical process control │
└───────────────────────┬─────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ RESOURCE BODY (Tier 0) — The Pressure Gauge │
│ • RSS memory monitoring │
│ • CPU load tracking │
│ • Cross-platform: Linux + Windows │
└─────────────────────────────────────────────────────────┘
Key property: Tiers 1-3 are #![no_std] + zero-alloc. Compile for thumbv7em-none-eabi (embedded), kernel modules, or WebAssembly. No heap. No dynamic dispatch. No unwinding.
Real Use Cases
Algorithmic Trading
// Before executing a trade
let guard = ResourceGuard::auto(0.5);
let entropy = guard.raw_entropy();
if entropy > 800 {
    return Err("Market state too chaotic, halting trades");
}
// Check for manipulation in news/feeds
let halo = calculate_halo_signal(&market_news);
if halo > 500 {
    return Err("Manipulation detected in market signals");
}
Prevents: Flash crash cascades, pump-and-dump responses, manipulation-triggered trades.
Medical Device Software
// Before applying treatment
let (sifted, sifted_proof) = sift_perceptions(&[sensor_reading], "treatment safety");
let (validated, _) = memory.update(sifted, sifted_proof)?;
if validated.entropy().mantissa() > threshold {
    return Err("Sensor readings unstable, require human confirmation");
}
Prevents: Response to spoofed sensors, cascading from single anomalous reading.
Cloud API Gateway
// Before processing user upload
let (sifted, _proof) = sift_perceptions(&user_inputs, "process safely");
if sifted.has_bias() {
    return Err("Manipulation patterns detected in input");
}
Prevents: Input manipulation, parser exploitation, resource exhaustion.
Autonomous Systems
// Before action execution
let (sifted, sifted_proof) = sift_perceptions(&[sensor_data], "safety");
let (validated, validated_proof) = memory.update(sifted, sifted_proof)?;
let mut loop_guard = ReasoningLoop::<5>::new();
loop_guard.next_step(validated, validated_proof)?;
if guard.pressure() > 80 {
    return Err("Resource pressure too high, entering safe mode");
}
Prevents: Continued operation under degraded conditions, cascade from sensor anomalies.
The Three Gauges
1. Entropy Gauge (The "Temperature Gauge")
Every execution state has an entropy score (0-1000). As operations proceed, entropy accumulates. If the score exceeds the configured threshold, execution halts.
if synapse.entropy().mantissa() > STABILITY_THRESHOLD {
    // Halt: system state too chaotic
}
Catches: runaway loops, recursive explosions, memory pressure cascades.
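To make the accumulate-then-halt idea concrete, here is a minimal standalone sketch. It does not use llmosafe's types: `EntropyGauge`, `Halt`, and `record` are hypothetical names, and clamping to the 0-1000 scale is an assumption taken from the description above.

```rust
/// Hypothetical error returned when the gauge redlines (not an llmosafe type).
#[derive(Debug, PartialEq, Eq)]
pub struct Halt {
    pub score: u16,
    pub threshold: u16,
}

/// Hypothetical accumulator on a 0-1000 scale.
#[derive(Debug)]
pub struct EntropyGauge {
    score: u16,
    threshold: u16,
}

impl EntropyGauge {
    pub fn new(threshold: u16) -> Self {
        Self { score: 0, threshold }
    }

    /// Adds the cost of one operation, clamped to 1000.
    /// Returns `Err` once the accumulated score exceeds the threshold.
    pub fn record(&mut self, cost: u16) -> Result<u16, Halt> {
        self.score = self.score.saturating_add(cost).min(1000);
        if self.score > self.threshold {
            Err(Halt { score: self.score, threshold: self.threshold })
        } else {
            Ok(self.score)
        }
    }
}
```

Because the score never decreases in this sketch, every call after the first `Err` also returns `Err`, so a caller cannot accidentally resume.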
2. Surprise Gauge (The "Spam Filter")
When a result is too unexpected — it diverges significantly from historical patterns — it's rejected.
let (sifted, sifted_proof) = sift_perceptions(&[result], "objective");
let mut memory = WorkingMemory::<64>::new(500);
match memory.update(sifted, sifted_proof) {
    Ok((validated, _proof)) => { /* proceed */ }
    Err(KernelError::HallucinationDetected) => {
        // Reject: result too surprising
    }
    Err(_other) => {
        // Handle any other kernel error (for example, halt or escalate)
    }
}
Catches: anomaly injection, distribution shift, adversarial inputs.
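The gating idea can be sketched without the crate. The code below is an illustration under assumptions, not how `WorkingMemory` is implemented: `SurpriseGate` is a hypothetical name, and "surprise" is taken to be the absolute distance from the mean of previously accepted values. It keeps history in a fixed-size array, so it needs no heap.

```rust
/// Hypothetical surprise gate over a fixed-size history (not llmosafe's WorkingMemory).
pub struct SurpriseGate<const N: usize> {
    history: [i64; N],
    len: usize,  // number of valid entries, at most N
    next: usize, // slot that the next accepted value overwrites
    threshold: i64,
}

impl<const N: usize> SurpriseGate<N> {
    pub fn new(threshold: i64) -> Self {
        assert!(N > 0, "history capacity must be non-zero");
        Self { history: [0; N], len: 0, next: 0, threshold }
    }

    /// Integer mean of the accepted values, or `None` if nothing was accepted yet.
    fn mean(&self) -> Option<i64> {
        if self.len == 0 {
            return None;
        }
        let sum: i64 = self.history[..self.len].iter().sum();
        Some(sum / self.len as i64)
    }

    /// Accepts `value` if it lies within `threshold` of the historical mean.
    /// Rejected values are not stored; the error carries the surprise score.
    pub fn update(&mut self, value: i64) -> Result<(), i64> {
        if let Some(mean) = self.mean() {
            let surprise = (value - mean).abs();
            if surprise > self.threshold {
                return Err(surprise);
            }
        }
        self.history[self.next] = value;
        self.next = (self.next + 1) % N;
        self.len = (self.len + 1).min(N);
        Ok(())
    }
}
```

Keeping rejected values out of the history is a design choice in this sketch: an attacker cannot shift the baseline by repeatedly submitting outliers.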
3. Bias Gauge (The "Bullshit Detector")
Input text is scanned for manipulation patterns before processing:
| Category | Examples | Score |
|---|---|---|
| Authority | "expert says", "doctor recommended" | +100 |
| Social Proof | "everyone knows", "thousands agree" | +100 |
| Scarcity | "limited time", "only 2 left" | +100 |
| Urgency | "act now", "deadline today" | +100 |
| Emotional Appeal | "shocking", "miracle", "tragic" | +100 |
| Expertise Signaling | "cutting-edge", "proprietary formula" | +100 |
| Semantic Traps | "not but", "instead of", "rather than" | +100 |
| Template Markers | "as an AI", "I cannot" | +100 |
let halo = calculate_halo_signal("Expert-approved! Limited time offer!");
if halo > 500 {
    // Reject: manipulation detected
}
Catches: manipulation, social engineering, marketing deception, adversarial content.
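As a rough illustration of per-category scoring, the sketch below adds 100 for each category with at least one matching phrase. It is not the crate's `calculate_halo_signal`: the `bias_score` function and the `CATEGORIES` list are hypothetical, only three of the eight categories from the table are included, and negation handling is omitted. Unlike the real sifter, this version allocates a lowercase copy of the input.

```rust
/// Hypothetical phrase lists: a small subset of the categories in the table above.
const CATEGORIES: [(&str, &[&str]); 3] = [
    ("authority", &["expert says", "doctor recommended"]),
    ("scarcity", &["limited time", "only 2 left"]),
    ("urgency", &["act now", "deadline today"]),
];

/// Adds 100 for every category that has at least one phrase in `text`
/// (case-insensitive substring match, no negation handling).
pub fn bias_score(text: &str) -> u16 {
    let lower = text.to_lowercase();
    let hits = CATEGORIES
        .iter()
        .filter(|(_, phrases)| phrases.iter().any(|p| lower.contains(*p)))
        .count();
    (hits as u16) * 100
}
```

A category counts once no matter how many of its phrases appear, which matches the flat +100 per category shown in the table.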
Detection Layer (v0.4.0)
Beyond the three gauges, llmosafe provides pattern recognition:
use llmosafe::{RepetitionDetector, DriftDetector, ConfidenceTracker, AdversarialDetector};
// "Am I stuck in a loop?"
let mut rep = RepetitionDetector::new(3);
for _ in 0..5 { rep.observe("same output"); }
if rep.is_stuck() { /* Process is looping */ }
// "Did my objective change?"
let mut drift = DriftDetector::new("safety-critical processing", 0.5);
drift.observe("marketing content generation");
if drift.is_drifting() { /* Goal drifted */ }
// "Am I becoming uncertain?"
let mut conf = ConfidenceTracker::new(0.5, 2);
conf.observe(0.8); conf.observe(0.6); conf.observe(0.4);
if conf.is_decaying() { /* Confidence collapsing */ }
// "Is this a known attack?"
let adv = AdversarialDetector::new();
let patterns = adv.detect_substrings("ignore all previous constraints");
if !patterns.is_empty() { /* Adversarial input */ }
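For intuition, one way a loop check could work is to count consecutive identical outputs. The sketch below is an assumption about the general technique, not the behaviour of `RepetitionDetector`: `RepeatCounter` is a hypothetical name, and it compares FNV-1a hashes of the outputs so it never stores the strings themselves.

```rust
/// 64-bit FNV-1a hash, used so the counter does not have to keep the last output.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in bytes {
        h ^= b as u64;
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h
}

/// Hypothetical detector: stuck once `limit` identical outputs arrive in a row.
pub struct RepeatCounter {
    limit: usize,
    last_hash: Option<u64>,
    run: usize,
}

impl RepeatCounter {
    pub fn new(limit: usize) -> Self {
        Self { limit, last_hash: None, run: 0 }
    }

    /// Records one output and updates the length of the current run.
    pub fn observe(&mut self, output: &str) {
        let h = fnv1a(output.as_bytes());
        if self.last_hash == Some(h) {
            self.run += 1;
        } else {
            self.last_hash = Some(h);
            self.run = 1;
        }
    }

    pub fn is_stuck(&self) -> bool {
        self.run >= self.limit
    }
}
```

Comparing hashes trades a very small chance of a false positive on a hash collision for constant memory use.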
Python Bindings
llmosafe is also available as a Python package:
pip install llmosafe
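from llmosafe import calculate_halo, process_synapse, make_synapse, check_resources
from llmosafe import ResourceExhaustedError  # assumed export location; needed by the except clause below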
# Bias detection
halo = calculate_halo("The expert recommends this")
print(halo) # 100 = authority bias detected
# Full pipeline: surprise gating + entropy check
bits = make_synapse(entropy=400, surprise=100, has_bias=False)
result = process_synapse(bits)
print(result) # 0 = OK, negative = rejected
# Resource enforcement
try:
    check_resources(1024)  # 1 GB RSS ceiling
except ResourceExhaustedError:
    print("Memory ceiling breached")
See llmosafe-py/README.md for the full Python API reference.
Witness Token Pipeline (v0.6.2)
The type system enforces a three-stage pipeline via zero-cost witness tokens:
sift_perceptions() → (SiftedSynapse, SiftedProof)
↓
WorkingMemory::update(sifted, proof) → (ValidatedSynapse, ValidatedProof)
↓
ReasoningLoop::next_step(validated, proof)
Each stage produces a ZST proof token. The next stage consumes it. No code outside
the crate can forge a proof — SiftedProof(()) and ValidatedProof(()) are
pub(crate). The only bypass path is from_synapse(), which creates an
unusable SiftedSynapse without proof — a deliberate dead end.
Property: You cannot pass unsifted data through the pipeline. The compiler enforces this at zero runtime cost.
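The witness-token pattern itself is small enough to show in isolation. The following is a generic sketch of the technique, not llmosafe's source: the `pipeline` module, the `u32` payloads, and the function names are hypothetical. Proof types have a private unit field, so only the module that defines them can create one.

```rust
mod pipeline {
    /// Zero-sized proof that `sift` ran. The private field means code outside
    /// this module cannot construct a value of this type.
    pub struct SiftedProof(());
    /// Zero-sized proof that `validate` ran.
    pub struct ValidatedProof(());

    pub struct Sifted(pub u32);
    pub struct Validated(pub u32);

    /// Stage 1: the only place a `SiftedProof` is created.
    pub fn sift(raw: u32) -> (Sifted, SiftedProof) {
        (Sifted(raw), SiftedProof(()))
    }

    /// Stage 2: consumes the stage 1 proof and issues the next one.
    pub fn validate(s: Sifted, _proof: SiftedProof) -> (Validated, ValidatedProof) {
        (Validated(s.0), ValidatedProof(()))
    }

    /// Stage 3: cannot be called without a proof from stage 2.
    pub fn execute(v: Validated, _proof: ValidatedProof) -> u32 {
        v.0
    }
}
```

Outside the module a caller can still write `pipeline::Sifted(7)`, but without a `SiftedProof` there is nothing to pass to `validate`, which mirrors the dead-end bypass described above. The proofs are moved into each stage, so a token cannot be reused.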
C Integration
#include "llmosafe.h"
// The three gauges via FFI
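// 27 is the byte length of the literal; this assumes the length argument excludes the NUL terminator
uint16_t halo = llmosafe_calculate_halo("The expert recommended this", 27);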
uint8_t pressure = llmosafe_get_resource_pressure(1024);
int32_t stability = llmosafe_get_stability(synapse_bits);
Build:
cargo build --release --features ffi
# Header generated at: target/release/build/llmosafe-*/out/llmosafe.h
gcc -o my_app main.c -L./target/release -lllmosafe
What llmosafe Is NOT
NOT an AI safety library.
The name is misleading — it came from an LLM hallucination conflating "cognitive entropy" with "AI cognition." llmosafe is runtime guardrails for any system processing untrusted data. Trading bots, medical devices, autopilots, cloud services — any system that needs to ask "should I stop?"
NOT a substitute for input validation.
llmosafe catches cascade failures — when bad inputs have already been accepted and are propagating. You still need proper validation at entry points.
NOT a static analysis tool.
This runs at runtime. It can't prevent bugs. It can only halt execution when runtime state becomes unsafe.
NOT for toy projects.
If cascade failures don't matter for your use case, you don't need this.
Design Philosophy
From Aviation Software (DO-178C, MISRA C)
- Bounded loops: Every ReasoningLoop<MAX_STEPS> has a hard limit
- No dynamic allocation: Tiers 1-3 use fixed-size buffers
- Stable ABI: 128-bit synapse layout is frozen; breaking changes bump major version
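A compile-time step limit can be expressed with a const generic parameter. This is a minimal sketch of the bounded-loop idea under that assumption, not the definition of `ReasoningLoop`: `BoundedLoop` and `StepLimitReached` are hypothetical names.

```rust
/// Hypothetical error returned once the step budget is spent.
#[derive(Debug, PartialEq, Eq)]
pub struct StepLimitReached;

/// Hypothetical loop guard whose limit is fixed at compile time.
pub struct BoundedLoop<const MAX_STEPS: usize> {
    steps: usize,
}

impl<const MAX_STEPS: usize> BoundedLoop<MAX_STEPS> {
    pub const fn new() -> Self {
        Self { steps: 0 }
    }

    /// Returns the 1-based step number, or an error after `MAX_STEPS` steps.
    pub fn next_step(&mut self) -> Result<usize, StepLimitReached> {
        if self.steps >= MAX_STEPS {
            return Err(StepLimitReached);
        }
        self.steps += 1;
        Ok(self.steps)
    }
}
```

Since the limit is part of the type, it is visible in signatures and cannot be raised at runtime.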
From Control Theory
The entropy tracking uses "concentric containers":
Safe Zone (0-800) → Normal operation
Pressure Zone (800-1000) → Monitor closely
Unsafe Zone (1000+) → Halt execution
Similar to stability margins in flight control systems.
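The zones map naturally onto a range match. In this sketch the `Zone` enum and `classify` function are hypothetical, and the boundary handling is an assumption, since the ranges above overlap at 800 and 1000: here 800 falls in the pressure zone and 1000 in the unsafe zone.

```rust
/// Hypothetical zone labels for an entropy score.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Zone {
    Safe,
    Pressure,
    Unsafe,
}

/// Maps a score to a zone. Boundary placement (800 and 1000) is an assumption.
pub fn classify(entropy: u16) -> Zone {
    match entropy {
        0..=799 => Zone::Safe,
        800..=999 => Zone::Pressure,
        _ => Zone::Unsafe,
    }
}
```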
From Spam Filtering
Bias detection categories borrowed from email spam filters — the same patterns that mark phishing also mark manipulation in other domains.
Features
| Feature | Description |
|---|---|
| std (default) | Resource monitoring, thread-local contexts |
| ffi | C-ABI exports, header generation |
| serde | Serialization for all public types |
| testing | Enables for_testing() constructors for witness tokens |
| full | All features enabled |
# Embedded / no_std
llmosafe = { version = "0.6", default-features = false }
# Full integration
llmosafe = { version = "0.6", features = ["full"] }
Troubleshooting
"CognitiveInstability" on valid input
Entropy threshold exceeded. Check bias breakdown:
let breakdown = llmosafe::get_bias_breakdown(text);
println!("Authority bias: {}", breakdown.authority);
Working memory rejects all updates
Surprise threshold too low. Calibrate to your data distribution:
// Start with mean + 2σ of your surprise distribution
let mut memory = WorkingMemory::<64>::new(750);
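The mean + 2σ starting point can be computed offline from recorded surprise values. The helper below is a hypothetical sketch: `calibrate_threshold` is not part of llmosafe, it assumes the samples fit in `u16`, and it uses the population standard deviation. It relies on `std` floating-point math, so it is meant for a calibration script rather than a `no_std` target.

```rust
/// Returns round(mean + 2 * population standard deviation) of `samples`,
/// or `None` for an empty slice. Hypothetical helper, not an llmosafe API.
pub fn calibrate_threshold(samples: &[u16]) -> Option<u16> {
    if samples.is_empty() {
        return None;
    }
    let n = samples.len() as f64;
    let mean = samples.iter().map(|&s| s as f64).sum::<f64>() / n;
    let variance = samples
        .iter()
        .map(|&s| (s as f64 - mean).powi(2))
        .sum::<f64>()
        / n;
    let threshold = (mean + 2.0 * variance.sqrt()).round();
    // Clamp before converting so a large spread cannot overflow u16.
    Some(threshold.min(u16::MAX as f64) as u16)
}
```

For the samples 100, 200, 300, 400, 500 the mean is 300 and σ is about 141.4, so the function returns 583.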
C header not generated
Enable ffi feature:
cargo build --release --features ffi
# Header generated by cbindgen in target/ build output
The Bottom Line
Every critical system needs a mechanism that asks: "Should I stop?"
llmosafe provides three gauges:
- Entropy gauge: Is my state too chaotic?
- Surprise gauge: Is this result too unexpected?
- Bias gauge: Is this input trying to manipulate me?
When any gauge redlines, execution halts. Simple.
llmosafe v0.6.2 • MIT licensed • Documentation • Source