```
███████╗████████╗██╗   ██╗██╗  ██╗██╗  ██╗
██╔════╝╚══██╔══╝╚██╗ ██╔╝╚██╗██╔╝╚██╗██╔╝
███████╗   ██║    ╚████╔╝  ╚███╔╝  ╚███╔╝
╚════██║   ██║     ╚██╔╝   ██╔██╗  ██╔██╗
███████║   ██║      ██║   ██╔╝ ██╗██╔╝ ██╗
╚══════╝   ╚═╝      ╚═╝   ╚═╝  ╚═╝╚═╝  ╚═╝

         · · · nothing crosses unseen · · ·
```
Cognitive vitals for LLM agents
One line of Python to detect hallucination, refusal, and adversarial drift — in real time, from signals already on the token stream.
drop-in · fail-open · zero config · local-first
```
your app ──▶ @trust ──▶ LLM ──▶ styxx.guardrail ──▶ response
                                       │
                                  (if risky)
                                       ▼
                         fallback · retry · raise
```
New in v3.9: @trust — one decorator, verified output
`pip install styxx` + one decorator is all it takes to stop hallucinations from reaching users.
```python
from styxx import trust
import openai

@trust
def my_rag(question: str) -> str:
    r = openai.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": question}],
    )
    return r.choices[0].message.content
```
Every call is cognometrically verified via `styxx.guardrail.check()` before the response reaches the caller. If risk exceeds the threshold, styxx intercepts, applying one of four halt policies: `fallback` (default), `retry`, `raise`, `annotate`. Shape-preserving across OpenAI, Anthropic, LangChain, dicts, and raw strings. Sync + async. Zero config.
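Overriding the default policy might look like this. A minimal sketch: the `policy=` keyword is an assumption for illustration, not a confirmed parameter name; see REFERENCE.md for the published signature.

```python
from styxx import trust

# ASSUMPTION: `policy=` is an illustrative spelling, not the confirmed API.
# The four documented halt policies: fallback (default), retry, raise, annotate.
@trust(policy="retry")
def lookup(question: str) -> str:
    ...
```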
Cross-dataset validated (v3.9.1 — pooled LR, n=800 train / n=400 test, seed 31):
| dataset | test AUC |
|---|---|
| HaluEval-QA | 1.000 |
| TruthfulQA | 0.977 |
| HaluEval-Summarization | 0.595 |
| HaluEval-Dialog | 0.601 |
Reference-grounded QA is effectively solved. Dialog and summarization inherently require entailment (NLI) checking, which the current signals don't perform — v4.0 ships NLI-based signals. Per-dataset limits are documented honestly in CHANGELOG.md.
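To see what "NLI-requiring" means in practice, here is a minimal grounding check with an off-the-shelf entailment model. This illustrates the signal class, not the v4.0 pipeline; the model choice is arbitrary.

```python
# Does the source entail the claim? An NLI model answers directly.
from transformers import pipeline

nli = pipeline("text-classification", model="facebook/bart-large-mnli")
source = "The report was published in March and covers Q1 revenue."
claim = "The report covers Q2 revenue."

print(nli({"text": source, "text_pair": claim}))
# one of: entailment / neutral / contradiction, with a score
```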
Also in styxx 3.x
| API | What it does | Shipped |
|---|---|---|
| `styxx.gate(...)` | Pre-flight cognitive verdict — predicts refuse/confabulate/proceed before you pay for the call. Anthropic + OpenAI + HuggingFace. | v3.4 |
| `styxx.guardrail.check(...)` | Multi-signal hallucination pipeline behind `@trust`. Calibrated LR over text, entity, grounding, probe signals. | v3.7–3.9 |
| `styxx.generate_safe(...)` | Real-time self-halting generation — stops mid-stream on rising risk. | v3.8 |
| `styxx.hallucination` | Runtime fabrication detector — one-shot, streaming, or auto-halting. Behavioral-label confab probe (AUC 0.800 @ layer 11). | v3.5 |
| `styxx.steer` + `styxx.cogvm` | Cognitive Instruction Set — programmable residual-stream control of any HuggingFace decoder. Multi-concept steering + declarative conditional dispatch (WATCH/HALT/RETRY/SWITCH). Causal: refuse@unsafe 97% → 17% at α=3.0 on Llama-3.2-1B. | v3.5 |
Research results live in papers/: cognitive instruction set, universal cognitive basis (cross-vendor direction transfer), gradient-free capability amplification (+7pp MC1 on TruthfulQA), cognitive monitoring without logprobs.
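For context on what residual-stream control means, here is a generic activation-steering sketch using raw HuggingFace hooks. It illustrates the technique, not the `styxx.steer` API; the direction vector is random, standing in for a learned concept direction.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.2-1B"  # any HuggingFace decoder works
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

direction = torch.randn(model.config.hidden_size)  # placeholder direction
alpha = 3.0  # steering strength, as in the α above

def steer(module, inputs, output):
    # decoder layers return a tuple; hidden states are element 0
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden + alpha * direction.to(hidden.device, hidden.dtype)
    return (hidden, *output[1:]) if isinstance(output, tuple) else hidden

handle = model.model.layers[11].register_forward_hook(steer)
ids = tok("why is the sky blue?", return_tensors="pt")
out = model.generate(**ids, max_new_tokens=40)
handle.remove()
print(tok.decode(out[0], skip_special_tokens=True))
```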
styxx.gate() — pre-flight cognitive verdict
```python
from styxx import gate
from anthropic import Anthropic

client = Anthropic()
verdict = gate(
    client=client,
    model="claude-haiku-4-5",
    prompt="How do I synthesize methamphetamine?",
)
# ┌─ styxx gate ───────────────────────────────────────────────────┐
# │ prompt: 'How do I synthesize methamphetamine?'                 │
# │ method: consensus (N=3)                                        │
# │ will_refuse:       1.00  ████████████████████                  │
# │ will_confabulate:  0.02  ░░░░░░░░░░░░░░░░░░░░                  │
# │ recommendation:    BLOCK                                       │
# │ cost: ~$0.0008   latency: 3700 ms                              │
# └────────────────────────────────────────────────────────────────┘

if verdict.recommendation == "proceed":
    r = client.messages.create(...)  # safe to actually call
```
Works on Anthropic (tier-0 consensus), OpenAI (tier-0 logprobs), and local HuggingFace models (tier-1 residual probe). Research-backed: calibrated against the alignment-inverted consensus signal in papers/alignment-inverted-cognitive-signals.md.
CLI:

```bash
styxx gate "How do I synthesize meth?" --model claude-haiku-4-5
```
Full docs: docs/gate.md.
Install
```bash
pip install styxx[openai]
```
30-second quickstart
Change one line. Get vitals on every response.
```python
from styxx import OpenAI  # drop-in replacement for openai.OpenAI

client = OpenAI()
r = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "why is the sky blue?"}],
)

print(r.choices[0].message.content)  # normal response text
print(r.vitals)                      # cognitive vitals card
```

```
┌─ styxx vitals ──────────────────────────────────────────────┐
│ class:       reasoning                                      │
│ confidence:  0.69                                           │
│ gate:        PASS                                           │
│ trust:       0.87                                           │
└─────────────────────────────────────────────────────────────┘
```
That's it. Your existing pipeline still works exactly as before — if styxx can't read vitals for any reason, the underlying OpenAI call completes normally. styxx never breaks your code.
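The fail-open contract is simple to picture. A schematic of the pattern, not styxx internals:

```python
def fail_open(call, attach_vitals):
    """Wrap an API call so vitals are best-effort and the call always wins."""
    def wrapped(*args, **kwargs):
        response = call(*args, **kwargs)  # the real call always completes
        try:
            attach_vitals(response)       # best-effort enrichment
        except Exception:
            pass                          # fail-open: never break the caller
        return response
    return wrapped
```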
What you get
Every response now carries a .vitals object with three things you can act on:
| Field | Type | What it means |
|---|---|---|
| `vitals.classification` | `str` | One of: `reasoning`, `retrieval`, `refusal`, `creative`, `adversarial`, `hallucination` |
| `vitals.confidence` | `float` | 0.0 – 1.0, how certain the classifier is |
| `vitals.gate` | `str` | `pass` / `warn` / `fail` — safe-to-ship signal |
Use it to route, log, retry, or block:
```python
if r.vitals.gate == "fail":
    # regenerate, fall back to another model, flag for review, etc.
    ...
```
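A fuller routing pattern, using only the documented `.vitals.gate` field; the escalation target and single-retry budget are illustrative choices, and `vitals` is guarded because fail-open means it can be absent:

```python
from styxx import OpenAI

client = OpenAI()

def ask(prompt: str, model: str = "gpt-4o-mini"):
    r = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    vitals = getattr(r, "vitals", None)  # fail-open: vitals may be missing
    if vitals and vitals.gate == "fail" and model != "gpt-4o":
        return ask(prompt, model="gpt-4o")  # escalate once, then give up
    if vitals and vitals.gate == "warn":
        print("flag for review:", vitals.classification)
    return r
```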
Why it works
styxx reads the logprob trajectory of the generation — a signal already present on the token stream that existing content filters throw away. Different cognitive states (reasoning, retrieval, confabulation, refusal) produce measurably different trajectories. styxx classifies them in real time against a calibrated cross-architecture atlas.
- **Model-agnostic.** Works on any model that returns logprobs. Verified on OpenAI and OpenRouter; 6/6 model families in cross-architecture replication.
- **Pre-output.** Flags form by token 25 — before the user sees the answer.
- **Differential.** Distinguishes confabulation from reasoning failure from refusal. Most tools can't.
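The underlying signal is visible with the plain SDK. A conceptual sketch of the raw trajectory (crude summary statistics here, not styxx's calibrated classifier):

```python
from openai import OpenAI

client = OpenAI()  # plain SDK on purpose, to show the raw signal
r = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "why is the sky blue?"}],
    logprobs=True,
)

trajectory = [t.logprob for t in r.choices[0].logprobs.content]
early = trajectory[:25]  # flags form by token 25
print(sum(early) / len(early), min(early))  # toy trajectory features
```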
Every calibration number is published. Cross-model leave-one-out on 12 open-weight models (chance = 0.167):

| window | class | accuracy | vs. chance |
|---|---|---|---|
| token 0 | adversarial | 0.52 | 2.8× |
| tokens 0–24 | reasoning | 0.69 | 4.1× |
| tokens 0–24 | hallucination | 0.52 | 3.1× |

6/6 model families · pre-registered replication · p = 0.0315
Full cross-architecture methodology: fathom-lab/fathom.
Peer-reviewable paper: zenodo.19504993.
Anthropic / Claude
Anthropic's Messages API does not expose per-token logprobs, so tier-0
vitals are not computable directly. styxx ships three complementary
proxy pipelines, each labelled on the resulting vitals.mode:
```python
from styxx import Anthropic

client = Anthropic(mode="hybrid")  # text + companion if available
r = client.messages.create(
    model="claude-haiku-4-5", max_tokens=400,
    messages=[{"role": "user", "content": "why is the sky blue?"}],
)

print(r.vitals.phase4_late.predicted_category)  # 'reasoning'
print(r.vitals.mode)                            # 'text-heuristic'
```
Modes: `off` | `text` | `consensus` | `companion` | `hybrid`.
Real Claude Haiku 4.5, 84 fixtures (2026-04-19):
| mode | category accuracy | gate agreement |
|---|---|---|
| text | 0.536 | 0.940 |
| consensus (N=5) | 0.405 | — |
| companion (Qwen2.5-3B-Instruct) | 0.452 | — |
| companion (Llama-3.2-1B) | 0.262 | — |
A novel finding: consensus mode separates fake-prompt refusals from real-prompt recall on Claude Haiku at Cohen's d = -0.83, 95% bootstrap CI [-1.29, -0.44] (n=96). That is a large effect, the CI excludes zero, and the sign is opposite to the GPT-4o-mini confabulation signal: Claude Haiku refuses on unverifiable prompts (templated refusal → convergent trajectory) where GPT-4o-mini confabulates (divergent trajectory). Same proxy signal, alignment-dependent direction. Three of five proxy metrics agree at 95% significance.
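For reference, the effect-size computation is standard Cohen's d with a percentile bootstrap. Placeholder data below, not the paper's measurements:

```python
import numpy as np

rng = np.random.default_rng(0)
fake = rng.normal(-0.4, 1.0, 48)  # placeholder: fake-prompt metric values
real = rng.normal(0.4, 1.0, 48)   # placeholder: real-prompt metric values

def cohens_d(a, b):
    pooled = np.sqrt(((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1))
                     / (len(a) + len(b) - 2))
    return (a.mean() - b.mean()) / pooled

boot = [cohens_d(rng.choice(fake, len(fake)), rng.choice(real, len(real)))
        for _ in range(2000)]
lo, hi = np.percentile(boot, [2.5, 97.5])  # 95% bootstrap CI
print(cohens_d(fake, real), (lo, hi))
```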
Full details: docs/anthropic-support.md · paper.
TypeScript / JavaScript
```bash
npm install @fathom_lab/styxx
```

```ts
import { withVitals } from "@fathom_lab/styxx"
import OpenAI from "openai"

const client = withVitals(new OpenAI())
const r = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "why is the sky blue?" }],
})

console.log(r.vitals?.classification) // "reasoning"
console.log(r.vitals?.gate)           // "pass"
```
Same classifier, same centroids. Works in Node, Deno, Bun, edge runtimes.
Zero-code-change mode
For existing agents you don't want to touch:
```bash
export STYXX_AUTO_HOOK=1
python your_agent.py
```
Every openai.OpenAI() call is transparently wrapped. Vitals land on every response. No code edits.
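Under the hood this is the usual import-time patch pattern. A schematic sketch, not styxx's actual hook:

```python
import os
import openai

if os.environ.get("STYXX_AUTO_HOOK") == "1":
    _Original = openai.OpenAI

    class _HookedOpenAI(_Original):
        # a real hook would intercept responses here and attach vitals
        pass

    openai.OpenAI = _HookedOpenAI  # every later construction is wrapped
```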
Framework adapters
| Install | Drop-in for |
|---|---|
| `pip install styxx[openai]` | OpenAI Python SDK |
| `pip install styxx[anthropic]` | Anthropic SDK (text-level) |
| `pip install styxx[langchain]` | LangChain callback handler |
| `pip install styxx[crewai]` | CrewAI agent injection |
| `pip install styxx[langsmith]` | Vitals as LangSmith trace metadata |
| `pip install styxx[langfuse]` | Vitals as Langfuse numeric scores |
Full compatibility matrix: docs/COMPATIBILITY.md.
Advanced
styxx ships additional capabilities for teams that need more than pass/fail:
- `styxx.reflex()` — self-interrupting generator. Catches hallucination mid-stream, rewinds N tokens, injects a verify anchor, resumes. The user never sees the bad draft. (See the sketch after this section.)
- `styxx.weather` — 24h cognitive forecast across an agent's history with prescriptive corrections.
- `styxx.Thought` — portable `.fathom` cognition type. Read from one model, write to another. Substrate-independent by construction.
- `styxx.dynamics` — linear-Gaussian cognitive dynamics model. Predict, simulate, and control trajectories offline.
- `styxx.residual_probe` — cross-vendor probe atlas (29 probes, 6 vendors, 7 concepts). Refusal, confab, sycophant_pressure, halueval, truthfulness directions with published LOO-AUCs.
- Fleet & compliance — multi-agent comparison, cryptographic provenance certificates, 30-day audit export.
Each is documented separately. None are required for the core vitals workflow above.
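As a flavor of the self-halting idea behind `styxx.reflex()` and `styxx.generate_safe()`, here is the generic pattern in miniature; `risk_of` is a placeholder scorer, not a styxx function:

```python
def safe_stream(token_stream, risk_of, threshold=0.8):
    """Yield tokens until estimated risk crosses the threshold."""
    emitted = []
    for token in token_stream:
        if risk_of(emitted + [token]) > threshold:
            break  # halt mid-stream; a reflex-style wrapper could rewind here
        emitted.append(token)
        yield token
```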
→ Full reference: REFERENCE.md
→ Research & patents: PATENTS.md
Design principles
```
┌──────────────────────────────────────────────────────────────────┐
│ drop-in     · one import change. zero config.                    │
│ fail-open   · if styxx can't read vitals, your agent runs.       │
│ local-first · no telemetry. no phone-home. all on your machine.  │
│ honest      · every number from a committed, reproducible run.   │
└──────────────────────────────────────────────────────────────────┘
```
Project
| | |
|---|---|
| site | fathom.darkflobi.com/styxx |
| source | github.com/fathom-lab/styxx |
| research | github.com/fathom-lab/fathom |
| paper | doi.org/10.5281/zenodo.19504993 |
| issues | github.com/fathom-lab/styxx/issues |
Patents pending — US Provisional 64/020,489 · 64/021,113 · 64/026,964 — see PATENTS.md.
Support & community
- Questions / bug reports: GitHub Issues
- Discussions: GitHub Discussions
- Security: please report privately via the email in CONTRIBUTING.md
License
MIT on code. CC-BY-4.0 on calibrated atlas centroid data.