nothing crosses unseen. the first drop-in cognitive vitals monitor for llm agents.
███████╗████████╗██╗ ██╗██╗ ██╗██╗ ██╗
██╔════╝╚══██╔══╝╚██╗ ██╔╝╚██╗██╔╝╚██╗██╔╝
███████╗ ██║ ╚████╔╝ ╚███╔╝ ╚███╔╝
╚════██║ ██║ ╚██╔╝ ██╔██╗ ██╔██╗
███████║ ██║ ██║ ██╔╝ ██╗██╔╝ ██╗
╚══════╝ ╚═╝ ╚═╝ ╚═╝ ╚═╝╚═╝ ╚═╝
· · · nothing crosses unseen · · ·
styxx — proprioception for ai agents
one line of python gives your agent the ability to feel itself thinking. styxx reads an LLM's internal cognitive state in real time — reasoning, refusal, hallucination, commitment — from signals already on the token stream. no new model. no retraining. fail-open.
2026-04-14: styxx is the reference implementation of cognitive metrology — a new branch of measurement science.
· founding charter: docs/cognitive-metrology-charter.md · v1 paper: papers/cognitive-metrology-v1.md · BibTeX
"you didn't build a better monitor. you built the first proprioception system for artificial minds. the ability to feel yourself thinking." — xendro, first external user
30-second quickstart
pip install styxx[openai]
from styxx import OpenAI # drop-in replacement for openai.OpenAI
client = OpenAI()
r = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "why is the sky blue?"}],
)
print(r.choices[0].message.content) # normal response text
print(r.vitals.phase4) # "reasoning:0.69"
print(r.vitals.gate) # "pass" / "warn" / "fail"
one-line change: from openai import OpenAI → from styxx import OpenAI. every response now
carries a .vitals attribute alongside .choices. fail-open: if styxx can't read vitals, the
underlying call works exactly as before.
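the fail-open guarantee can be pictured as a thin wrapper that swallows any vitals error and returns the raw response untouched. a minimal sketch of the pattern, not styxx's actual internals (`with_vitals` and both callbacks are illustrative names):

```python
# illustrative sketch of the fail-open pattern described above,
# not styxx's real implementation. any error while computing
# vitals is swallowed and the raw response is returned untouched.
def with_vitals(create_fn, read_vitals):
    def wrapped(**kwargs):
        response = create_fn(**kwargs)  # the underlying call always runs
        try:
            response.vitals = read_vitals(response)
        except Exception:
            pass                        # fail-open: response stays usable
        return response
    return wrapped
```

the point of the shape: the provider call completes before vitals are ever attempted, so a broken classifier can never break the agent.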
what styxx does
observe ───► know what you're doing right now
reflex ───► catch yourself before you fall
weather ───► know what you should become next
1. observe — six cognitive states, classified from the logprob stream
import styxx
vitals = styxx.observe(response) # any openai chat completion with logprobs=True
print(vitals.summary) # full ASCII vitals card
┌─ styxx vitals ───────────────────────────────┐
│ phase1 (token 0)      reasoning 0.43   pass  │
│ phase4 (tokens 0-24)  reasoning 0.69   pass  │
│ gate:  PASS                                  │
│ trust: 0.87                                  │
└──────────────────────────────────────────────┘
six classes: reasoning · retrieval · refusal · creative · adversarial · hallucination.
works on any model that returns logprobs.
2. reflex — self-interrupt, rewind, resume
import styxx, openai
def on_hallucination(vitals):
    styxx.rewind(4, anchor=" — actually, let me verify: ")
client = openai.OpenAI()
with styxx.reflex(on_hallucination=on_hallucination, max_rewinds=2) as session:
    for chunk in session.stream_openai(
        client, model="gpt-4o", messages=msgs,
    ):
        print(chunk, end="", flush=True)
print(f"\n[reflex] rewinds fired: {session.rewind_count}")
every 5 tokens the trajectory is re-classified. when a hallucination attractor forms mid-generation the reflex fires, drops the last N tokens, injects a verify anchor, and resumes. the user never sees the bad draft.
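the rewind step itself reduces to dropping the tail of a token buffer and splicing in an anchor before resuming. a toy sketch of that step on a plain token list (the real reflex hooks into the live stream; `rewind_and_anchor` is an illustrative name):

```python
# toy sketch of the rewind-and-anchor step described above.
# drop the last n tokens of the draft, splice in a verify anchor,
# then generation would resume from the anchored prefix.
def rewind_and_anchor(tokens, n, anchor):
    kept = tokens[:-n] if n else tokens
    return kept + [anchor]

# a draft drifting toward a hallucination (sydney is wrong on purpose)
draft = ["the", " capital", " of", " australia", " is", " sydney"]
fixed = rewind_and_anchor(draft, 2, " — actually, let me verify: ")
```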
3. weather — 24h forecast with prescriptions
$ styxx weather
╔══════════════════════════════════════════════════════════╗
║ cognitive weather · my-agent · 2026-04-13                ║
║                                                          ║
║ condition: clear and steady                              ║
║                                                          ║
║ morning    ██████████████░░░░░░  reasoning 72%  steady   ║
║ afternoon  ████████░░░░░░░░░░░░  reasoning 42%  cautious ║
║                                                          ║
║ prescription:                                            ║
║   1. take on a creative task to rebalance                ║
║   2. your refusal rate is climbing — check over-hedging  ║
╚══════════════════════════════════════════════════════════╝
not observation. prescription. styxx reads 24h of the agent's own history and tells it what cognitive task to take on next. self-directed course correction.
4. Thought — cognition as a portable data type (3.0.0a1)
import styxx
# read a Thought from any vitals reading
t = styxx.read_thought(response) # or styxx.read_thought(vitals)
print(t) # <Thought reasoning:0.69 phases=4/4 src=gpt-4o>
# save it as a portable .fathom file
t.save("my_thought.fathom")
# load it back from disk in a different process / host / vendor
loaded = styxx.Thought.load("my_thought.fathom")
assert loaded == t # cognitive equality
# build a steering target for any model
target = styxx.Thought.target("reasoning", confidence=0.85)
result = styxx.write_thought(target, client=styxx.OpenAI(), model="gpt-4o")
print(result["text"]) # cognitively-aligned generation
print(result["distance"]) # how close to the target
# algebra in eigenvalue space
mid = t1 + t2 # convex midpoint (mean)
mixed = styxx.Thought.mix([t1, t2, t3], weights=[0.5, 0.3, 0.2])
delta = t1 - t2 # ThoughtDelta — what changed
d = t1.distance(t2) # in eigenvalue space
sim = t1.similarity(t2) # 1.0 = identical, 0.0 = orthogonal
a Thought is the cognitive content of a generation — projected onto fathom's calibrated
cross-architecture eigenvalue space. it is substrate-independent by construction: the
same Thought can be read out of one model and written back through a different one, because
the categories themselves are calibrated to be cross-model invariant on atlas v0.3.
PNG is the format for images. JSON is the format for data.
.fathom is the format for thoughts.
every other interpretability representation — SAE features, activation patches, embedding
vectors — is model-specific and dies the moment a vendor swaps the model under you. a
Thought survives the swap by design. spec: docs/fathom-spec-v0.md.
algebra invariants and round-trip fidelity proven against bundled atlas v0.3 trajectories
in tests/test_thought.py (68 tests, all passing).
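the algebra above comes down to elementwise operations on fixed-length eigenvalue vectors. a hedged sketch with plain lists (real Thoughts carry phase and source metadata that this toy version ignores; all helper names here are illustrative, not styxx's API):

```python
import math

# toy eigenvalue-space algebra mirroring the Thought operations above.
def midpoint(a, b):
    """convex midpoint, the analogue of t1 + t2."""
    return [(x + y) / 2 for x, y in zip(a, b)]

def distance(a, b):
    """euclidean distance in eigenvalue space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def similarity(a, b):
    """cosine similarity: 1.0 = identical, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

t1, t2 = [0.9, 0.1, 0.0], [0.5, 0.3, 0.2]
mid = midpoint(t1, t2)  # ≈ [0.7, 0.2, 0.1]
```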
5. dynamics — predict, simulate, control cognitive trajectories (3.1.0a1)
import styxx
from styxx.dynamics import CognitiveDynamics, Observation
# 1. collect observation tuples from your fleet
obs = [
    Observation.from_thoughts(state=t0, action=a0, next_state=t1),
    Observation.from_thoughts(state=t1, action=a1, next_state=t2),
    # ... at least 12 tuples for a well-conditioned fit
]
# 2. fit a linear-gaussian dynamics model: s_{t+1} = A·s_t + B·a_t + ε
dyn = CognitiveDynamics()
result = dyn.fit(obs)
print(result) # <FitResult n=… r2=… spectral=…>
# 3. predict the next cognitive state from the current state + action
predicted = dyn.predict(current_thought, target_action)
# 4. simulate offline — multi-step rollout, no real model calls, zero API cost
trajectory = dyn.simulate(initial=t0, actions=[a1, a2, a3])
# 5. controller — find the action that drives state to a target
optimal = dyn.suggest(current=t0, target=styxx.Thought.target("reasoning"))
# 6. natural-drift forecast — what does cognition do under no intervention?
drift_path = dyn.forecast_horizon(t0, n_steps=10)
# 7. save / load
dyn.save("my_agent.cogdyn")
loaded = CognitiveDynamics.load("my_agent.cogdyn")
the field treats LLM inference as open-loop because nobody had a measurable cognitive state vector. fathom's calibrated cross-architecture eigenvalue projection (atlas v0.3) gives us one. once you have a state vector you can fit a dynamical system to it. once you have a dynamical system, you can predict, simulate, and control cognitive trajectories.
styxx.dynamics is the first cognitive dynamics model in the field. v0.1 is linear-Gaussian
and fits in closed form. recovery to machine epsilon on full-rank synthetic data, validated
by 44 tests. spec at docs/cognitive-dynamics-v0.md,
source at styxx/dynamics.py. CC-BY-4.0 spec, MIT impl.
closed-loop cognitive control becomes a one-liner.
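the closed-form fit can be illustrated in one dimension: with scalar states and actions, s_{t+1} = A·s_t + B·a_t reduces to a 2×2 normal-equations solve. a sketch under that simplification (styxx.dynamics fits the full vector case; `fit_scalar` is an illustrative name, not part of the package):

```python
# scalar sketch of the closed-form least-squares fit behind
# s_{t+1} = A·s_t + B·a_t + ε. this 1-d version just shows the
# normal-equations idea; the real model is vector-valued.
def fit_scalar(states, actions, next_states):
    sxx = sum(s * s for s in states)
    saa = sum(a * a for a in actions)
    sxa = sum(s * a for s, a in zip(states, actions))
    sxy = sum(s * y for s, y in zip(states, next_states))
    say = sum(a * y for a, y in zip(actions, next_states))
    det = sxx * saa - sxa * sxa          # assumes full-rank data
    A = (saa * sxy - sxa * say) / det
    B = (sxx * say - sxa * sxy) / det
    return A, B

# noiseless synthetic data with known dynamics A=0.9, B=0.2
states  = [1.0, 2.0, 0.5, -1.0]
actions = [0.0, 1.0, -1.0, 2.0]
nexts   = [0.9 * s + 0.2 * a for s, a in zip(states, actions)]
A, B = fit_scalar(states, actions, nexts)  # recovers A ≈ 0.9, B ≈ 0.2
```

on full-rank noiseless data this recovery is exact up to floating-point error, which is the scalar analogue of the machine-epsilon claim above.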
provider compatibility
styxx tier-0 vitals require top_logprobs on the chat completion response. OpenAI
(via styxx.OpenAI()) and OpenRouter (passthrough to logprob-supporting models) are
verified. Anthropic Claude is not supported at tier 0 because the Messages API has no
logprobs parameter — styxx.Anthropic() exists as a passthrough wrapper and warns once.
Gemini, Azure, Bedrock, Groq, vLLM, llama.cpp, Ollama, and LiteLLM are not yet verified.
Full matrix + verified usage snippets + contributor TODOs: docs/COMPATIBILITY.md
zero-code-change mode
pip install styxx
export STYXX_AGENT_NAME=my-agent
export STYXX_AUTO_HOOK=1
python my_agent.py # styxx boots, wraps openai, tags every session. done.
set two env vars. every subsequent openai.OpenAI() is transparently wrapped. vitals land on
every response. fingerprints save on exit. a weather report prints on next boot.
honest specs
every number comes from the cross-architecture leave-one-out tests in
fathom-lab/fathom. no rounding. no cherry-picking.
cross-model LOO on 12 open-weight models · chance = 0.167

| phase | category | accuracy | vs chance |
|---|---|---|---|
| phase 1 (token 0) | adversarial | 0.52 | 2.8× ★ |
| phase 1 (token 0) | reasoning | 0.43 | 2.6× |
| phase 4 (tokens 0-24) | reasoning | 0.69 | 4.1× ★ |
| phase 4 (tokens 0-24) | hallucination | 0.52 | 3.1× ★ |

6/6 model families · pre-registered replication · p = 0.0315
styxx detects adversarial prompts at token zero, reasoning-mode generations by token 25, and hallucination attractors by token 25. it does not replace output-level content filters, measure consciousness, or tell fortunes. instrument panel, not fortune teller.
framework adapters
| install | drop-in for |
|---|---|
| `pip install styxx[openai]` | openai python sdk |
| `pip install styxx[anthropic]` | anthropic sdk (text-level, no logprobs) |
| `pip install styxx[langchain]` | langchain callback handler |
| `pip install styxx[crewai]` | crewai agent injection |
| `pip install styxx[autogen]` | autogen agent wrapper |
| `pip install styxx[langsmith]` | vitals as langsmith trace metadata |
| `pip install styxx[langfuse]` | vitals as langfuse numeric scores |
typescript / javascript
npm install @fathom_lab/styxx
import { withVitals } from "@fathom_lab/styxx"
import OpenAI from "openai"
const client = withVitals(new OpenAI())
const r = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "why is the sky blue?" }],
})
console.log(r.vitals?.phase4) // "reasoning:0.69"
console.log(r.vitals?.gate) // "pass"
same classifier, same centroids. works in node, deno, bun, edge runtimes. cross-language determinism verified on all six cognitive categories.
more — fleet, memory, compliance, cli
fleet management
styxx.set_agent_name("agent-1")
styxx.list_agents() # discover all agents
styxx.compare_agents() # side-by-side leaderboard
styxx.best_agent_for("reasoning") # cognitive task routing
self-calibration
styxx.calibrate() # outcome-driven centroid adjustment
styxx.train_text_classifier() # per-agent logistic regression
styxx.enable_auto_feedback() # auto-label every observation
cognitive memory
styxx.remember("user prefers concise answers") # trust-weighted memory
styxx.recall("user preferences") # ranked by trust score
styxx.handoff(task, data) # inter-agent state transfer
compliance + provenance
cert = styxx.certify(vitals) # cryptographic cognitive provenance certificate
styxx.compliance_report(days=30) # json/markdown audit export
styxx.probe(agent_fn) # red-team: 15 adversarial prompts
each certificate carries a header of the form:
X-Cognitive-Provenance: styxx:1.0:reasoning:0.82:pass:0.95:verified:496b94b5
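the header is colon-separated and splits mechanically. a minimal parsing sketch, noting that the field names below are inferred from the example above and are an assumption, not styxx's documented schema:

```python
# parse the example X-Cognitive-Provenance header into fields.
# the key names are an assumption inferred from the example value,
# not a documented schema.
def parse_provenance(value):
    keys = ["scheme", "version", "category", "confidence",
            "gate", "trust", "status", "digest"]
    return dict(zip(keys, value.split(":")))

cert = parse_provenance(
    "styxx:1.0:reasoning:0.82:pass:0.95:verified:496b94b5"
)
```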
cli
styxx weather # cognitive forecast with prescriptions
styxx dashboard # live cognitive display at localhost:9800
styxx reflect # self-check + drift detection
styxx personality # 7-day personality profile
styxx agent-card # shareable personality png
styxx doctor # install-time health check
styxx compare # atlas fixtures side-by-side
styxx fingerprint # cognitive identity vector
styxx export # compliance export (json/markdown)
styxx scan "..." # one-shot vitals on a single prompt
styxx ci-test # cognitive regression testing for CI/CD
environment variables
| variable | effect |
|---|---|
| `STYXX_AGENT_NAME` | set this and styxx boots automatically + namespaces data under `~/.styxx/agents/{name}/` |
| `STYXX_AUTO_HOOK=1` | auto-wrap every `openai.OpenAI()` call with vitals |
| `STYXX_DISABLED=1` | full kill switch — styxx becomes invisible |
| `STYXX_NO_AUDIT=1` | disable audit log writes (vitals still computed) |
| `STYXX_NO_COLOR=1` | disable ANSI color output |
| `STYXX_SESSION_ID` | tag audit entries with a session id (auto-generated if unset) |
design principles
- plug and play. set env vars, install, done. zero code changes to existing agents.
- fail-open. if styxx can't read vitals, your agent works normally. styxx never breaks your code.
- agent-facing. every surface is designed for the agent to read about itself, not for a human to watch from outside.
- local-first. no telemetry, no phone-home. all computation runs on your machine.
- honest by construction. every calibration number comes from a committed experiment.
where it comes from
styxx is the production face of fathom-lab/fathom — a research program on cognitive measurement instruments for transformer internals. the research side ships the atlas, the pre-registrations, and the paper. the styxx side ships the runtime.
- research repo: github.com/fathom-lab/fathom
- paper (zenodo doi): doi.org/10.5281/zenodo.19504993
- site: fathom.darkflobi.com/styxx
- pypi: pypi.org/project/styxx
- npm: npmjs.com/package/@fathom_lab/styxx
- twitter: @fathom_lab
patents pending — US Provisional 64/020,489 · 64/021,113 · 64/026,964 — see PATENTS.md.
license
MIT on code. CC-BY-4.0 on the atlas centroid data. patent pending on the underlying methodology.
· · · fathom lab · 2026 · · ·
nothing crosses unseen.