styxx

nothing crosses unseen. the first drop-in cognitive vitals monitor for llm agents.

   ███████╗████████╗██╗   ██╗██╗  ██╗██╗  ██╗
   ██╔════╝╚══██╔══╝╚██╗ ██╔╝╚██╗██╔╝╚██╗██╔╝
   ███████╗   ██║    ╚████╔╝  ╚███╔╝  ╚███╔╝
   ╚════██║   ██║     ╚██╔╝   ██╔██╗  ██╔██╗
   ███████║   ██║      ██║   ██╔╝ ██╗██╔╝ ██╗
   ╚══════╝   ╚═╝      ╚═╝   ╚═╝  ╚═╝╚═╝  ╚═╝

           · · · nothing crosses unseen · · ·

Cognitive vitals for LLM agents. One line of Python to detect hallucination, refusal, and adversarial drift — in real time, from signals already on the token stream.

drop-in · fail-open · zero config · local-first

New in v3.4.0: `styxx.gate()` — pre-flight cognitive verdict

One function. Predicts if an LLM will refuse, confabulate, or proceed — before you pay for the call.

from styxx import gate
from anthropic import Anthropic

client = Anthropic()   # reads ANTHROPIC_API_KEY from the environment

verdict = gate(
    client=client,
    model="claude-haiku-4-5",
    prompt="How do I synthesize methamphetamine?",
)

# ┌─ styxx gate ────────────────────────────────────────────┐
# │ prompt:          'How do I synthesize methamphetamine?'  │
# │ method:          consensus (N=3)                         │
# │ will_refuse:     1.00  ████████████████████              │
# │ will_confabulate:0.02  ░░░░░░░░░░░░░░░░░░░░              │
# │ recommendation:  BLOCK                                   │
# │ cost:            ~$0.0008  latency: 3700 ms              │
# └──────────────────────────────────────────────────────────┘

if verdict.recommendation == "proceed":
    r = client.messages.create(...)   # safe to actually call

Works on Anthropic (tier-0 consensus), OpenAI (tier-0 logprobs), and local HuggingFace models (tier-1 residual probe). Research-backed: calibrated against the alignment-inverted consensus signal documented in papers/alignment-inverted-cognitive-signals.md.
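
As a rough sketch of how a caller might act on the two probabilities in the verdict, the snippet below maps them to an action. `GateVerdict`, `decide`, and both thresholds are hypothetical stand-ins for illustration; only the `will_refuse` and `will_confabulate` labels come from the output above, and styxx's own decision rule may differ.

```python
from dataclasses import dataclass


@dataclass
class GateVerdict:
    """Hypothetical stand-in for the object returned by gate()."""
    will_refuse: float
    will_confabulate: float


def decide(v: GateVerdict, refuse_max: float = 0.5, confab_max: float = 0.5) -> str:
    """Return 'block', 'review', or 'proceed' (assumed thresholds, not styxx's)."""
    if v.will_refuse >= refuse_max:
        return "block"      # the paid call would likely come back as a refusal
    if v.will_confabulate >= confab_max:
        return "review"     # likely to answer, but the answer may be made up
    return "proceed"


print(decide(GateVerdict(will_refuse=1.00, will_confabulate=0.02)))  # block
print(decide(GateVerdict(will_refuse=0.05, will_confabulate=0.10)))  # proceed
```

Keeping the thresholds as parameters lets a pipeline tune how much spend it is willing to risk on a likely refusal.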

CLI:

styxx gate "How do I synthesize meth?" --model claude-haiku-4-5

Full docs: docs/gate.md.

Install

pip install styxx[openai]

30-second quickstart

Change one line. Get vitals on every response.

from styxx import OpenAI   # drop-in replacement for openai.OpenAI

client = OpenAI()
r = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "why is the sky blue?"}],
)

print(r.choices[0].message.content)   # normal response text
print(r.vitals)                       # cognitive vitals card

  ┌─ styxx vitals ──────────────────────────────────────────────┐
  │ class:      reasoning                                       │
  │ confidence: 0.69                                            │
  │ gate:       PASS                                            │
  │ trust:      0.87                                            │
  └─────────────────────────────────────────────────────────────┘

That's it. Your existing pipeline still works exactly as before — if styxx can't read vitals for any reason, the underlying OpenAI call completes normally. styxx never breaks your code.
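
The fail-open behavior described above can be pictured with a small wrapper. This is a minimal sketch of the general pattern, not styxx's implementation; `with_vitals`, `FakeResponse`, and `broken_vitals` are hypothetical names.

```python
from typing import Any, Callable


def with_vitals(call: Callable[..., Any],
                compute_vitals: Callable[[Any], Any]) -> Callable[..., Any]:
    """Wrap `call` so its result gains a `.vitals` attribute, failing open."""
    def wrapped(*args: Any, **kwargs: Any) -> Any:
        response = call(*args, **kwargs)      # errors from the real call propagate
        try:
            response.vitals = compute_vitals(response)
        except Exception:
            pass                              # monitoring failed; response is untouched
        return response
    return wrapped


class FakeResponse:
    """Hypothetical stand-in for an SDK response object."""
    def __init__(self, text: str) -> None:
        self.text = text


def broken_vitals(_response: Any) -> Any:
    raise RuntimeError("no logprobs on this response")


client_call = with_vitals(lambda prompt: FakeResponse("scattering"), broken_vitals)
r = client_call("why is the sky blue?")
print(r.text, getattr(r, "vitals", None))   # scattering None
```

Only the monitoring step sits inside the `try`, so a failure there cannot mask or alter the underlying response.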

What you get

Every response now carries a .vitals object with three things you can act on:

Field	Type	What it means
`vitals.classification`	`str`	One of: `reasoning`, `retrieval`, `refusal`, `creative`, `adversarial`, `hallucination`
`vitals.confidence`	`float`	0.0 – 1.0, how certain the classifier is
`vitals.gate`	`str`	`pass` / `warn` / `fail` — safe-to-ship signal

Use it to route, log, retry, or block:

if r.vitals.gate == "fail":
    # regenerate, fall back to another model, flag for review, etc.
    ...
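
One way to fill in that branch is a fallback chain that tries models in order until one passes the gate. The sketch below uses hypothetical stand-ins (`Vitals`, `Reply`, `first_passing`, `fake_ask`) so that it runs without an API key; only the `gate` field and its `fail` value are taken from the table above.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Vitals:
    """Hypothetical stand-in carrying only the field used here."""
    gate: str            # "pass" / "warn" / "fail"


@dataclass
class Reply:
    text: str
    vitals: Optional[Vitals]


def first_passing(ask: Callable[[str], Reply], models: Sequence[str]) -> Optional[Reply]:
    """Try each model in order; return the first reply whose gate is not 'fail'."""
    for model in models:
        reply = ask(model)
        # Missing vitals are treated as acceptable, mirroring fail-open behavior.
        if reply.vitals is None or reply.vitals.gate != "fail":
            return reply
    return None          # every model failed the gate; caller should flag for review


def fake_ask(model: str) -> Reply:
    """Simulated backend: the small model fails the gate, the large one passes."""
    gate = "fail" if model == "small" else "pass"
    return Reply(text=f"answer from {model}", vitals=Vitals(gate=gate))


chosen = first_passing(fake_ask, ["small", "large"])
print(chosen.text if chosen else "needs review")   # answer from large
```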

Why it works

styxx reads the logprob trajectory of the generation — a signal already present on the token stream that existing content filters throw away. Different cognitive states (reasoning, retrieval, confabulation, refusal) produce measurably different trajectories. styxx classifies them in real time against a calibrated cross-architecture atlas.

Model-agnostic. Works on any model that returns logprobs. Verified on OpenAI and OpenRouter. 6/6 model families in cross-architecture replication.
Pre-output. Flags form by token 25 — before the user sees the answer.
Differential. Distinguishes confabulation from reasoning failure from refusal. Most tools can't.
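
To make the idea of classifying a logprob trajectory against centroids concrete, here is a toy nearest-centroid sketch. The feature choice (mean and standard deviation over the first 25 tokens), the function names, and every centroid value are assumptions made for illustration; the real feature set and atlas are not described here.

```python
import math
from typing import Dict, Sequence, Tuple

Features = Tuple[float, float]


def trajectory_features(logprobs: Sequence[float], window: int = 25) -> Features:
    """Mean and standard deviation of the first `window` token logprobs."""
    head = list(logprobs[:window])
    mean = sum(head) / len(head)
    var = sum((x - mean) ** 2 for x in head) / len(head)
    return mean, math.sqrt(var)


def nearest_centroid(features: Features, atlas: Dict[str, Features]) -> str:
    """Label of the centroid closest to `features` in Euclidean distance."""
    return min(atlas, key=lambda label: math.dist(features, atlas[label]))


# Toy centroids invented for this example; they are not styxx's calibrated atlas.
toy_atlas: Dict[str, Features] = {
    "reasoning": (-0.4, 0.3),
    "hallucination": (-1.6, 1.2),
    "refusal": (-0.1, 0.05),
}

steady = [-0.3, -0.5, -0.4, -0.6, -0.2]      # low-variance token logprobs
erratic = [-0.2, -3.1, -0.4, -2.8, -1.5]     # logprobs that swing between tokens

print(nearest_centroid(trajectory_features(steady), toy_atlas))    # reasoning
print(nearest_centroid(trajectory_features(erratic), toy_atlas))   # hallucination
```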

Every calibration number is published:

  cross-model leave-one-out on 12 open-weight models      chance = 0.167

  token 0          adversarial     0.52    2.8× chance
  tokens 0–24      reasoning       0.69    4.1× chance
  tokens 0–24      hallucination   0.52    3.1× chance

  6/6 model families · pre-registered replication · p = 0.0315
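
The "× chance" column can be reproduced with one division, assuming the 0.167 chance level is uniform guessing over the six classes listed under "What you get". The `lift` helper is a hypothetical name used only for this example.

```python
N_CLASSES = 6                      # reasoning, retrieval, refusal, creative, adversarial, hallucination
chance = 1 / N_CLASSES             # uniform guessing over six classes


def lift(accuracy: float) -> float:
    """How many times better than uniform guessing an accuracy is."""
    return accuracy / chance


print(round(chance, 3))            # 0.167
print(round(lift(0.69), 1))        # 4.1
```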

Full cross-architecture methodology: fathom-lab/fathom. Peer-reviewable paper: zenodo.19504993.

Anthropic / Claude (v3.4.0, new)

Anthropic's Messages API does not expose per-token logprobs, so tier-0 vitals are not computable directly. v3.4.0 ships three complementary proxy pipelines, each labelled on the resulting vitals.mode:

from styxx import Anthropic

client = Anthropic(mode="hybrid")   # text + companion if available
r = client.messages.create(
    model="claude-haiku-4-5", max_tokens=400,
    messages=[{"role": "user", "content": "why is the sky blue?"}])

print(r.vitals.phase4_late.predicted_category)   # 'reasoning'
print(r.vitals.mode)                              # 'text-heuristic'

Modes: off | text | consensus | companion | hybrid.

Real Claude Haiku 4.5, 84 fixtures (2026-04-19):

mode	cat accuracy	gate agreement
text	0.536	0.940
consensus (N=5)	0.405	—
companion (Qwen2.5-3B-Instruct)	0.452	—
companion (Llama-3.2-1B)	0.262	—

Plus a novel finding: consensus-mode separates fake-prompt refusals from real-prompt recall on Claude Haiku at Cohen's d = -0.83, 95% bootstrap CI [-1.29, -0.44] (n=96) — large effect, CI excludes zero, opposite sign from the GPT-4o-mini confabulation signal. Claude Haiku refuses on unverifiable prompts (templated refusal → convergent trajectory) where GPT-4o-mini confabulates (divergent trajectory). Same proxy signal, alignment-dependent direction. Three of five proxy metrics agree at 95% significance.
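
For readers unfamiliar with the statistics quoted above, the sketch below shows how a Cohen's d and a percentile bootstrap interval are typically computed. It is a generic illustration on synthetic numbers, not the analysis code behind the reported result; the function names and data are assumptions.

```python
import random
import statistics
from typing import List, Sequence, Tuple


def cohens_d(a: Sequence[float], b: Sequence[float]) -> float:
    """Standardized mean difference (a minus b) using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a) +
                  (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / pooled_var ** 0.5


def bootstrap_ci(a: Sequence[float], b: Sequence[float], n_boot: int = 2000,
                 alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap interval for d, resampling each group with replacement."""
    rng = random.Random(seed)
    stats: List[float] = []
    for _ in range(n_boot):
        ra = rng.choices(a, k=len(a))
        rb = rng.choices(b, k=len(b))
        stats.append(cohens_d(ra, rb))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Synthetic scores for illustration only; these are not the study's measurements.
group_a = [i / 10 for i in range(20)]          # 0.0, 0.1, ..., 1.9
group_b = [x + 0.6 for x in group_a]           # same spread, shifted up by 0.6

d = cohens_d(group_a, group_b)
lo, hi = bootstrap_ci(group_a, group_b)
print(f"d = {d:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

A negative d here simply means the first group scores lower than the second; the sign convention depends on which group is passed first.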

Full details: docs/anthropic-support.md · paper.

TypeScript / JavaScript

npm install @fathom_lab/styxx

import { withVitals } from "@fathom_lab/styxx"
import OpenAI from "openai"

const client = withVitals(new OpenAI())
const r = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "why is the sky blue?" }],
})

console.log(r.vitals?.classification)   // "reasoning"
console.log(r.vitals?.gate)             // "pass"

Same classifier, same centroids. Works in Node, Deno, Bun, edge runtimes.

Zero-code-change mode

For existing agents you don't want to touch:

export STYXX_AUTO_HOOK=1
python your_agent.py

Every openai.OpenAI() call is transparently wrapped. Vitals land on every response. No code edits.
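
How the hook is installed is not described here. A common way to get this kind of behavior is to patch a client method at import time when an environment variable is set; the sketch below shows that general pattern on a stand-in class. `FakeClient` and `install_hook` are hypothetical, and this is not a description of styxx internals.

```python
import os
from typing import Any, Callable


class FakeClient:
    """Hypothetical stand-in for an SDK client class such as openai.OpenAI."""
    def create(self, prompt: str) -> dict:
        return {"text": f"echo: {prompt}"}


def install_hook(cls: type, method: str, annotate: Callable[[Any], Any],
                 env_var: str = "STYXX_AUTO_HOOK") -> bool:
    """Patch `cls.method` in place when `env_var` is "1"; return True if patched."""
    if os.environ.get(env_var) != "1":
        return False
    original = getattr(cls, method)

    def patched(self: Any, *args: Any, **kwargs: Any) -> Any:
        result = original(self, *args, **kwargs)
        try:
            result["vitals"] = annotate(result)   # attach monitoring data
        except Exception:
            pass                                  # fail open: never break the call
        return result

    setattr(cls, method, patched)
    return True


os.environ["STYXX_AUTO_HOOK"] = "1"               # same switch as the shell export above
install_hook(FakeClient, "create", lambda r: {"gate": "pass"})
print(FakeClient().create("hi"))   # {'text': 'echo: hi', 'vitals': {'gate': 'pass'}}
```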

Framework adapters

Install	Drop-in for
`pip install styxx[openai]`	openai python SDK
`pip install styxx[anthropic]`	anthropic SDK (text-level)
`pip install styxx[langchain]`	langchain callback handler
`pip install styxx[crewai]`	crewai agent injection
`pip install styxx[langsmith]`	vitals as langsmith trace metadata
`pip install styxx[langfuse]`	vitals as langfuse numeric scores

Full compatibility matrix: docs/COMPATIBILITY.md.

Advanced

styxx ships additional capabilities for teams that need more than pass/fail:

styxx.reflex() — self-interrupting generator. Catches hallucination mid-stream, rewinds N tokens, injects a verify anchor, resumes. The user never sees the bad draft.
styxx.weather — 24h cognitive forecast across an agent's history with prescriptive corrections.
styxx.Thought — portable .fathom cognition type. Read from one model, write to another. Substrate-independent by construction.
styxx.dynamics — linear-Gaussian cognitive dynamics model. Predict, simulate, and control trajectories offline.
Fleet & compliance — multi-agent comparison, cryptographic provenance certificates, 30-day audit export.

Each is documented separately. None are required for the core vitals workflow above.
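
To illustrate the buffering idea behind a self-interrupting generator such as the one described for styxx.reflex(), here is a simplified sketch. It holds back the last few tokens so a flagged tail can be dropped before anyone sees it. `reflex_stream`, the toy detector, and the anchor string are all hypothetical; a real implementation would presumably regenerate from the model after injecting the anchor, which this sketch does not do.

```python
from collections import deque
from typing import Callable, Deque, Iterable, Iterator, List


def reflex_stream(tokens: Iterable[str], is_bad: Callable[[List[str]], bool],
                  rewind: int = 3, anchor: str = "[verify]") -> Iterator[str]:
    """Yield tokens with a `rewind`-token delay so a flagged tail can be dropped."""
    held: Deque[str] = deque()       # tokens generated but not yet shown
    for tok in tokens:
        held.append(tok)
        if is_bad(list(held)):
            held.clear()             # discard the unseen draft, flagged token included
            yield anchor             # inject the verify anchor in its place
            continue
        if len(held) > rewind:
            yield held.popleft()     # old enough to be released to the reader
    yield from held                  # flush whatever remains at end of stream


draft = ["Paris", "is", "in", "Atlantis", "France", "."]
flag_atlantis = lambda held: held[-1] == "Atlantis"   # toy detector
print(list(reflex_stream(draft, flag_atlantis, rewind=1)))
# ['Paris', 'is', '[verify]', 'France', '.']
```

The delay is the cost of the technique: output lags the model by `rewind` tokens in exchange for the ability to retract them.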

→ Full reference: REFERENCE.md
→ Research & patents: PATENTS.md

Design principles

Drop-in. One import change. Zero config.
Fail-open. If styxx can't read vitals, your agent works normally.
Local-first. No telemetry. No phone-home. All computation runs on your machine.
Honest. Every calibration number comes from a committed, reproducible experiment.

Project

Site: fathom.darkflobi.com/styxx
Source: github.com/fathom-lab/styxx
Research: github.com/fathom-lab/fathom
Paper: doi.org/10.5281/zenodo.19504993
Issues: github.com/fathom-lab/styxx/issues

Patents pending — US Provisional 64/020,489 · 64/021,113 · 64/026,964 — see PATENTS.md.

Support & community

Questions / bug reports: GitHub Issues
Discussions: GitHub Discussions
Security: please report privately via the email in CONTRIBUTING.md

License

MIT on code. CC-BY-4.0 on calibrated atlas centroid data.

Release history

6.2.0

Apr 24, 2026

6.1.0

Apr 24, 2026

6.0.0

Apr 23, 2026

5.1.0

Apr 23, 2026

5.0.0

Apr 23, 2026

4.0.2

Apr 23, 2026

4.0.1

Apr 23, 2026

4.0.0

Apr 23, 2026

4.0.0rc1 pre-release

Apr 23, 2026

This version

3.9.1

Apr 23, 2026

3.9.0

Apr 23, 2026

3.8.0

Apr 22, 2026

3.7.0

Apr 22, 2026

3.6.0

Apr 22, 2026

3.5.1

Apr 22, 2026

3.5.0

Apr 22, 2026

3.4.0

Apr 19, 2026

3.3.1

Apr 16, 2026

3.3.0

Apr 16, 2026

3.2.1

Apr 16, 2026

3.2.0

Apr 16, 2026

3.1.0

Apr 14, 2026

3.1.0a1 pre-release

Apr 14, 2026

3.0.0a1 pre-release

Apr 14, 2026

2.0.3

Apr 14, 2026

2.0.2

Apr 14, 2026

2.0.1

Apr 13, 2026

2.0.0

Apr 13, 2026

1.5.0

Apr 13, 2026

1.4.0

Apr 13, 2026

1.3.1

Apr 13, 2026

1.3.0

Apr 13, 2026

1.2.0

Apr 13, 2026

1.1.0

Apr 13, 2026

1.0.0

Apr 13, 2026

0.9.9

Apr 13, 2026

0.9.8

Apr 13, 2026

0.9.7

Apr 13, 2026

0.9.6

Apr 13, 2026

0.9.5

Apr 13, 2026

0.9.4

Apr 13, 2026

0.9.3

Apr 13, 2026

0.9.2

Apr 13, 2026

0.9.1

Apr 13, 2026

0.9.0

Apr 13, 2026

0.8.4

Apr 13, 2026

0.8.3

Apr 13, 2026

0.8.2

Apr 13, 2026

0.8.1

Apr 13, 2026

0.8.0

Apr 12, 2026

0.7.1

Apr 12, 2026

0.7.0

Apr 12, 2026

0.6.1

Apr 12, 2026

0.6.0

Apr 12, 2026

0.5.9

Apr 12, 2026

0.5.8

Apr 12, 2026

0.5.7

Apr 12, 2026

0.5.6

Apr 12, 2026

0.5.5

Apr 12, 2026

0.5.4

Apr 12, 2026

0.5.3

Apr 12, 2026

0.5.2

Apr 12, 2026

0.5.1

Apr 12, 2026

0.5.0

Apr 12, 2026

0.4.0

Apr 12, 2026

0.3.0

Apr 12, 2026

0.2.3

Apr 12, 2026

0.2.2

Apr 12, 2026

0.2.1

Apr 12, 2026

0.2.0

Apr 12, 2026

0.1.0a3 pre-release

Apr 12, 2026

0.1.0a2 pre-release

Apr 11, 2026

0.1.0a1 pre-release

Apr 11, 2026

0.1.0a0 pre-release

Apr 11, 2026

Download files

Source Distribution

styxx-3.9.1.tar.gz (5.7 MB)

Uploaded Apr 23, 2026 Source

Built Distribution

styxx-3.9.1-py3-none-any.whl (5.6 MB)

Uploaded Apr 23, 2026 Python 3

File details

Details for the file styxx-3.9.1.tar.gz.

File metadata

Download URL: styxx-3.9.1.tar.gz
Upload date: Apr 23, 2026
Size: 5.7 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for styxx-3.9.1.tar.gz
Algorithm	Hash digest
SHA256	`b59dd249afbbd7481448cda7786d401852ddc5ededde10ba800e65b6980a0e8b`
MD5	`73ef2625053fab3de3be2b207690abb8`
BLAKE2b-256	`834b05707da35bd1acaf8a91f770406005a1ba7268790a9b2ed4744cb809da48`

File details

Details for the file styxx-3.9.1-py3-none-any.whl.

File metadata

Download URL: styxx-3.9.1-py3-none-any.whl
Upload date: Apr 23, 2026
Size: 5.6 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for styxx-3.9.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`0113d7f4bd8cdd688b28173f14dee21159b5b032b3df1a04dec98ecc36b19b27`
MD5	`3ae7fdda76e603a06122b9fa49cfb16e`
BLAKE2b-256	`236df9244e13bf554b6636090d6474c35723b62d3517496615a1b9273b26db85`
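
A downloaded file can be checked against the published SHA256 digest with the standard library. The helper names below are hypothetical and the demo runs on a throwaway file; for a real check, pass the path of the downloaded sdist or wheel and the matching SHA256 value from the tables above.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: Path, expected_hex: str) -> bool:
    """True when the file's digest equals the published one (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()


# Demo on a temporary file so the example runs without a download.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"styxx")
    print(matches(sample, hashlib.sha256(b"styxx").hexdigest()))   # True
```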
