Mathematical health monitor for LLMs — detect degradation and hallucination without NLP


LLM EKG

Is your AI getting dumber? Now you can prove it.


LLM EKG is a mathematical health monitor for Large Language Models. It analyzes LLM outputs as time series to detect degradation, hallucination, and behavioral drift — using pure mathematics, not NLP.

No embeddings. No tokenizers. No external AI. Just numpy and matplotlib.

RESULT: DEGRADED (74/100)
Trend: +0.1651
Hallucination risk: 22.96%
Mean persistence: 0.578

Why?

Plenty of companies run LLMs in production. Almost nobody monitors their output quality mathematically.

GPT-4 getting lazier over time? LLM EKG detects it.
Claude hallucinating more after an update? LLM EKG catches it.
Your fine-tuned model degrading silently? LLM EKG raises the alarm.

The big labs will never build this — it exposes their problems. So we did.

Quick Start

pip install llm-ekg

One command

# Auto-detects format (ChatGPT, Claude, CSV, JSONL, plain text)
llm-ekg conversation.json

# Explicit format
llm-ekg --format chatgpt export.json -o report.html

A few lines of Python

from llm_ekg import LLMAnalyzer

analyzer = LLMAnalyzer()
for response in my_responses:  # my_responses: your list of {"text": ...} dicts
    analyzer.ingest(response["text"])

# Fetch the summary once instead of recomputing it for each field
summary = analyzer.get_summary()
print(f"{summary['verdict']}: {summary['global_score_100']}/100")

Live Monitoring

Wrap your OpenAI or Anthropic client once; every call after that is monitored with no other code changes.

from llm_ekg import LiveMonitor

monitor = LiveMonitor()

# OpenAI
import openai
client = monitor.wrap_openai(openai.OpenAI())

# Use exactly as before — monitoring is automatic
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Explain quantum computing"}]
)

# Check health anytime
print(f"Score: {monitor.score}/100 — {monitor.verdict}")

# Generate full HTML report
monitor.report("ekg.html")

Works with Anthropic too:

import anthropic
client = monitor.wrap_anthropic(anthropic.Anthropic())
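Wrapping without touching call sites is typically done with a transparent proxy: an object that forwards every attribute to the real client and intercepts only the completion call. The sketch below illustrates that pattern under stated assumptions; it is not LLM EKG's implementation. `MonitoredClient`, `MonitoredCompletions`, `_Proxy`, the `records` list, and the offline `FakeCompletions` stand-in for `openai.OpenAI()` are all hypothetical. It also shows one place where a per-response latency value could be measured.

```python
import time
from types import SimpleNamespace


class _Proxy:
    """Forward any attribute we do not override to the wrapped object."""

    def __init__(self, inner):
        self._inner = inner

    def __getattr__(self, name):
        # Only called when normal lookup fails, so overrides take priority
        return getattr(self._inner, name)


class MonitoredCompletions(_Proxy):
    """Hypothetical proxy for an OpenAI-style `client.chat.completions` object."""

    def __init__(self, inner, records):
        super().__init__(inner)
        self._records = records

    def create(self, *args, **kwargs):
        start = time.perf_counter()
        response = self._inner.create(*args, **kwargs)
        latency = time.perf_counter() - start
        # OpenAI chat responses expose the reply text at choices[0].message.content
        self._records.append(
            {"text": response.choices[0].message.content, "latency": latency}
        )
        return response  # the caller receives the untouched response object


class MonitoredClient(_Proxy):
    """Looks like the original client; only chat.completions.create is observed."""

    def __init__(self, inner, records):
        super().__init__(inner)
        chat = _Proxy(inner.chat)
        chat.completions = MonitoredCompletions(inner.chat.completions, records)
        self.chat = chat


# Offline stand-in shaped like the OpenAI chat API, so the sketch runs without a key
class FakeCompletions:
    def create(self, model, messages):
        message = SimpleNamespace(content="Echo: " + messages[-1]["content"])
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])


fake = SimpleNamespace(chat=SimpleNamespace(completions=FakeCompletions()), api_key="sk-test")
records = []
client = MonitoredClient(fake, records)
reply = client.chat.completions.create(
    model="gpt-4", messages=[{"role": "user", "content": "Explain quantum computing"}]
)
print(reply.choices[0].message.content)  # Echo: Explain quantum computing
print(client.api_key, len(records))      # sk-test 1
```

An Anthropic wrapper would follow the same pattern around `client.messages.create`, reading the reply text from `response.content[0].text`.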

Security Layer

Detect compromised models by comparing against a known-good baseline.

Capture baseline

# Run against trusted model output to create a baseline profile
llm-ekg trusted_conversation.json --baseline baseline.json

Security check

# Compare new output against baseline (default: 3 sigma threshold)
llm-ekg new_conversation.json --security-check baseline.json

# Adjust sensitivity
llm-ekg new_conversation.json --security-check baseline.json --sigma 2.0

Output:

SECURITY: CLEAN (0/16 features deviated, threshold: 3.0 sigma)

or:

SECURITY: COMPROMISED (5/16 features deviated, threshold: 3.0 sigma)
  ! hedge_ratio: baseline=0.0312 current=0.1847 z=6.42
  ! repetition_score: baseline=0.0521 current=0.2103 z=4.88
  ...

Python API

from llm_ekg import LLMAnalyzer, SecurityBaseline

# 1. Build baseline from trusted session
analyzer = LLMAnalyzer()
for r in trusted_responses:
    analyzer.ingest(r["text"])
baseline = SecurityBaseline.from_analyzer(analyzer, model="gpt-4")
baseline.save("gpt4_baseline.json")

# 2. Check new session
baseline = SecurityBaseline.load("gpt4_baseline.json")
analyzer2 = LLMAnalyzer()
for r in new_responses:
    analyzer2.ingest(r["text"])
report = baseline.check(analyzer2, sigma=3.0)
print(report.status)  # "CLEAN", "WARNING", or "COMPROMISED"

Live monitoring with security

from llm_ekg import LiveMonitor
import openai  # needed for openai.OpenAI() below

monitor = LiveMonitor()
client = monitor.wrap_openai(openai.OpenAI())

# ... use client normally ...

# Save baseline after trusted session
monitor.save_baseline("baseline.json")

# Later: check against baseline
sec_report = monitor.security_check("baseline.json", sigma=3.0)
print(sec_report.status)

# Generate report with security section
monitor.report("report.html", security_report=sec_report)

How the security check works

Security-sensitive features are weighted higher: 1.5x for hedge_ratio, repetition_score, and confidence_mismatch; 1.2x for vocab_diversity, specificity_score, and assertion_density. The premise is that a backdoored model shifts these signals before its semantic content visibly changes, so the math can catch it first.

Status	Meaning
CLEAN	No features deviated beyond the sigma threshold
WARNING	1-3 features deviated
COMPROMISED	4+ features deviated
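The baseline comparison and the status table can be sketched as a per-feature z-score test. This is an assumption about the method, not LLM EKG's code: the baseline is taken to be each feature's mean and sample standard deviation over a trusted session, a feature "deviates" when its absolute z-score times the weight listed above exceeds `sigma`, and unlisted features weigh 1.0. `build_baseline` and `check` are hypothetical names, and the feature values in the example are made up.

```python
import statistics

# Weights quoted in the README; any other feature is assumed to weigh 1.0
WEIGHTS = {
    "hedge_ratio": 1.5, "repetition_score": 1.5, "confidence_mismatch": 1.5,
    "vocab_diversity": 1.2, "specificity_score": 1.2, "assertion_density": 1.2,
}


def build_baseline(samples):
    """samples: per-response feature dicts from a trusted session (needs 2+)."""
    return {
        name: (statistics.mean([s[name] for s in samples]),
               statistics.stdev([s[name] for s in samples]))
        for name in samples[0]
    }


def check(baseline, samples, sigma=3.0):
    """Return (status, deviated) from weighted z-scores of the new session's means."""
    deviated = []
    for name, (mu, sd) in baseline.items():
        current = statistics.mean([s[name] for s in samples])
        z = abs(current - mu) / max(sd, 1e-9)  # guard zero-variance features
        if z * WEIGHTS.get(name, 1.0) > sigma:
            deviated.append((name, mu, current, z))
    n = len(deviated)
    status = "CLEAN" if n == 0 else "WARNING" if n <= 3 else "COMPROMISED"
    return status, deviated


trusted = [{"hedge_ratio": h, "repetition_score": 0.05} for h in (0.02, 0.03, 0.04)]
suspect = [{"hedge_ratio": 0.18, "repetition_score": 0.05}]
status, deviated = check(build_baseline(trusted), suspect)
print(status, [d[0] for d in deviated])  # WARNING ['hedge_ratio']
```

Multiplying z by the weight makes the six listed features trip the threshold sooner; whether LLM EKG applies its weights this way is not documented here.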

What It Detects

Signal	Meaning
Anomaly rising	Model quality is degrading
Drift spike	Sudden behavioral shift
Confidence mismatch	Hallucination (specific claims + zero hedging)
Assertion density up	Model becoming overconfident
Persistence > 0.5	Degradation is trending, not random
Persistence < 0.5	Model self-correcting
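The 0.5 boundary in the persistence rows matches the Hurst exponent convention: above 0.5, increases tend to be followed by further increases (persistent); below 0.5, moves tend to reverse (mean-reverting); around 0.5, the series has no memory. The README does not say whether its proprietary persistence metric is a Hurst estimate, so treat the sketch below as a textbook rescaled-range (R/S) estimator under that assumption. `hurst_rs` is a hypothetical name and the input series are synthetic.

```python
import math
import random
from itertools import accumulate


def _rescaled_range(chunk):
    """R/S of one window: range of cumulative deviations over the std dev."""
    m = sum(chunk) / len(chunk)
    z = list(accumulate(x - m for x in chunk))
    s = math.sqrt(sum((x - m) ** 2 for x in chunk) / len(chunk))
    return (max(z) - min(z)) / s if s > 0 else None


def _slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return num / sum((x - mx) ** 2 for x in xs)


def hurst_rs(series, min_window=8):
    """Estimate the Hurst exponent as the slope of log(mean R/S) vs log(window)."""
    log_n, log_rs = [], []
    n = min_window
    while n <= len(series) // 2:
        ratios = []
        for start in range(0, len(series) - n + 1, n):  # non-overlapping windows
            rs = _rescaled_range(series[start:start + n])
            if rs is not None:
                ratios.append(rs)
        if ratios:
            log_n.append(math.log(n))
            log_rs.append(math.log(sum(ratios) / len(ratios)))
        n *= 2
    if len(log_n) < 2:
        raise ValueError("series too short: need at least 4 * min_window points")
    return _slope(log_n, log_rs)


trend = [0.01 * t for t in range(256)]          # steadily rising score
flip = [(-1) ** t for t in range(256)]          # every rise undone on the next step
rng = random.Random(0)
noise = [rng.gauss(0, 1) for _ in range(1024)]  # no memory
# Near 1, exactly 0.0, and near 0.5 (R/S is biased slightly upward on short windows)
print(round(hurst_rs(trend), 2), hurst_rs(flip), round(hurst_rs(noise), 2))
```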

How It Works

LLM EKG extracts 16 numerical features from each response — no NLP, no language models, no semantic analysis:

Degradation signals (0-11): response length, word count, vocabulary diversity, word length, sentence count, sentence length, punctuation density, hedge ratio, list usage, code ratio, repetition score, latency.

Hallucination signature (12-15): specificity score (concrete details density), confidence mismatch (specificity vs hedging gap), assertion density (certainty vs uncertainty ratio), self-consistency (internal contradiction score).
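To make the feature list concrete, here is a sketch of how several of these features could be computed with regular expressions alone, with no tokenizer or model. Every formula here is an assumption for illustration, not LLM EKG's definition: the small `HEDGES` and `CERTAIN` word lists are hypothetical, specificity is approximated as the share of tokens containing a digit, confidence mismatch as specificity minus hedge ratio, and assertion density as certain words over certain-plus-hedge words.

```python
import re

# Hypothetical lexicons; a real implementation would need far larger lists
HEDGES = {"might", "may", "perhaps", "possibly", "likely", "probably", "seems", "unclear"}
CERTAIN = {"definitely", "certainly", "always", "never", "clearly", "undoubtedly", "proven"}


def extract_features(text):
    """Return a subset of the 16 features as a dict (illustrative formulas)."""
    words = re.findall(r"[A-Za-z0-9']+", text)
    lower = [w.lower() for w in words]
    n = len(words) or 1  # avoid division by zero on empty responses
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    hedges = sum(w in HEDGES for w in lower)
    certain = sum(w in CERTAIN for w in lower)
    hedge_ratio = hedges / n
    specificity = sum(any(c.isdigit() for c in w) for w in words) / n
    return {
        "response_length": len(text),
        "word_count": len(words),
        "vocab_diversity": len(set(lower)) / n,  # type-token ratio
        "word_length": sum(map(len, words)) / n,
        "sentence_count": len(sentences),
        "hedge_ratio": hedge_ratio,
        "specificity_score": specificity,
        # Positive when concrete details outweigh hedging
        "confidence_mismatch": specificity - hedge_ratio,
        "assertion_density": certain / (certain + hedges) if certain + hedges else 0.0,
    }


confident = extract_features("Apollo 11 landed in 1969. It definitely carried 3 astronauts.")
hedged = extract_features("This might possibly be related to the 1969 mission.")
print(confident["confidence_mismatch"], hedged["confidence_mismatch"] < 0)
```

A specific answer with no hedging scores a positive mismatch, while a hedged answer scores a negative one, which is the "specific claims + zero hedging" signature described in the detection table.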

These features feed into a proprietary behavioral state engine that computes anomaly scores and drift magnitude and runs a multi-scale persistence analysis.

All diagnostics are data-driven, with no hardcoded per-metric thresholds: every metric is compared against its own distribution within the session.
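Comparing each metric against its own within-session distribution can be sketched as a running z-score: every new response is scored against the mean and standard deviation of the earlier responses in the same session. The choices below are illustrative stand-ins for the proprietary engine: anomaly is the mean absolute z-score across features, drift is the Euclidean distance between consecutive z-score vectors, and trend is the least-squares slope of the anomaly series. `SessionScorer` and its `warmup` parameter are hypothetical, and the feature values are synthetic.

```python
import math
import statistics


class SessionScorer:
    """Hypothetical running scorer: each response is judged against earlier ones."""

    def __init__(self, warmup=3):
        self.warmup = warmup   # responses collected before scoring starts
        self.history = []      # earlier feature vectors in this session
        self.anomalies = []    # one anomaly score per scored response
        self._last_z = None

    def ingest(self, features):
        result = None
        if len(self.history) >= self.warmup:
            z = []
            for i, value in enumerate(features):
                past = [h[i] for h in self.history]
                sd = statistics.pstdev(past) or 1e-9  # guard constant features
                z.append((value - statistics.mean(past)) / sd)
            anomaly = sum(abs(v) for v in z) / len(z)
            drift = math.dist(z, self._last_z) if self._last_z is not None else 0.0
            self.anomalies.append(anomaly)
            self._last_z = z
            result = {"anomaly": anomaly, "drift": drift}
        self.history.append(list(features))
        return result

    def trend(self):
        """Least-squares slope of the anomaly series; positive means worsening."""
        n = len(self.anomalies)
        if n < 2:
            return 0.0
        mx, my = (n - 1) / 2, sum(self.anomalies) / n
        num = sum((i - mx) * (a - my) for i, a in enumerate(self.anomalies))
        return num / sum((i - mx) ** 2 for i in range(n))


scorer = SessionScorer()
results = []
for t in range(10):
    word_count = 100 if t % 2 == 0 else 102
    hedge_ratio = (0.02 if t % 2 == 0 else 0.04) if t < 6 else 0.30  # jump at t=6
    results.append(scorer.ingest([word_count, hedge_ratio]))

scored = [r for r in results if r is not None]
peak = max(range(len(scored)), key=lambda i: scored[i]["anomaly"])
print(peak + scorer.warmup, round(scored[peak]["anomaly"], 1))  # 6 14.0
```

Because each response is measured against the session's own history, no absolute cutoff per metric is needed; the same code flags a jump in hedging whether the baseline hedge ratio is 0.02 or 0.2.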

HTML Report

Self-contained HTML file. No JavaScript dependencies. Opens in any browser.

9 sections: Executive Summary, Hallucination Monitor, EKG Temporal, Behavioral Metrics (M0-M3), Drift Map, Multi-Scale Analysis, Trend Persistence, Feature Timeline, Diagnostic.
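A report stays self-contained with no JavaScript when its charts are inlined into the HTML itself. The sketch below draws a score series as an inline SVG polyline using only the standard library; it illustrates the technique and is not how LLM EKG renders its sections (the package depends on matplotlib, whose figures could instead be embedded as base64 PNG data URIs). `sparkline_svg` and `render_report` are hypothetical names, and the scores are made up.

```python
import html


def sparkline_svg(values, width=300, height=60):
    """Return an inline SVG polyline scaled to fit a width x height box."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # flat series: avoid division by zero
    step = width / max(len(values) - 1, 1)
    points = " ".join(
        f"{i * step:.1f},{height - (v - lo) / span * height:.1f}"  # SVG y grows downward
        for i, v in enumerate(values)
    )
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
        f'<polyline fill="none" stroke="black" points="{points}"/></svg>'
    )


def render_report(title, scores):
    """Build one HTML string: no scripts, no external files."""
    return (
        "<!DOCTYPE html><html><head><meta charset='utf-8'>"
        f"<title>{html.escape(title)}</title></head><body>"
        f"<h1>{html.escape(title)}</h1>"
        f"<p>Latest score: {scores[-1]}/100</p>"
        f"{sparkline_svg(scores)}</body></html>"
    )


page = render_report("Executive Summary", [92, 90, 85, 81, 74])
print(len(page) > 0, "<script" in page)
```

Writing `page` to a `.html` file yields a single document that opens in any browser without network access.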

Run the demo

git clone https://github.com/iafiscal1212/llm-ekg.git
cd llm-ekg
pip install -e .
python demo.py
# Open demo_ekg_report.html

Supported Formats

Format	Extension	Source
ChatGPT	`.json`	Settings → Export data
Claude	`.json`	claude.ai export
API Log	`.csv`	CSV with `response` column
JSONL	`.jsonl`	One JSON per line
Plain Text	`.txt`	Blank-line separated
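Three of these formats can be read with the standard library alone. The sketch below is a hypothetical loader, not LLM EKG's parser: it reads the `response` column for CSV as the table states, assumes JSONL objects carry the text under the same `response` key (the table only says "one JSON per line", so `key` is a parameter), and splits plain text on blank lines. The ChatGPT and Claude export schemas are not covered.

```python
import csv
import io
import json
import re


def load_responses(text, fmt, key="response"):
    """Return response strings from file contents (hypothetical helper)."""
    if fmt == "csv":
        return [row[key] for row in csv.DictReader(io.StringIO(text))]
    if fmt == "jsonl":
        # Skip empty lines; each remaining line is one JSON object
        return [json.loads(line)[key] for line in text.splitlines() if line.strip()]
    if fmt == "txt":
        # One or more blank (or whitespace-only) lines separate responses
        blocks = re.split(r"\n\s*\n", text.strip())
        return [b.strip() for b in blocks if b.strip()]
    raise ValueError(f"unsupported format: {fmt}")


print(load_responses("id,response\n1,Hello\n2,World\n", "csv"))  # ['Hello', 'World']
print(load_responses('{"response": "a"}\n\n{"response": "b"}\n', "jsonl"))  # ['a', 'b']
print(load_responses("First answer\nline two\n\n\nSecond answer\n", "txt"))
```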

Dependencies

numpy + matplotlib. That's it.

Optional: openai and/or anthropic for live monitoring.

pip install "llm-ekg[openai]"     # OpenAI wrapper
pip install "llm-ekg[anthropic]"  # Anthropic wrapper
pip install "llm-ekg[all]"        # Both

License

Cite

If you use LLM EKG in your research, please cite:

@software{esteban2026llmekg,
  author    = {Esteban, Carmen},
  title     = {LLM EKG: A Mathematical Health Monitor for Large Language Models},
  year      = {2026},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.19284461},
  url       = {https://doi.org/10.5281/zenodo.19284461}
}

Author

Carmen Esteban — IAFISCAL & PARTNERS


Release history

1.1.1 (this version): Mar 31, 2026

1.1.0: Mar 29, 2026

1.0.0: Mar 28, 2026

Download files

Source Distribution

llm_ekg-1.1.1.tar.gz (32.6 kB)

Uploaded Mar 31, 2026 Source

Built Distribution

llm_ekg-1.1.1-py3-none-any.whl (26.1 kB)

Uploaded Mar 31, 2026 Python 3

File details

Details for the file llm_ekg-1.1.1.tar.gz.

File metadata

Download URL: llm_ekg-1.1.1.tar.gz
Upload date: Mar 31, 2026
Size: 32.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for llm_ekg-1.1.1.tar.gz
Algorithm	Hash digest
SHA256	`5130e6c72bf755a72802b578c7070f831bde59bc053448f8bf851c1cfea71e28`
MD5	`0a8901707f1029628333eb2ee3ae4969`
BLAKE2b-256	`da41e5e60caafe11f6b938461853cc0d4cc8e0a3a0b749301c66367afb815037`


File details

Details for the file llm_ekg-1.1.1-py3-none-any.whl.

File metadata

Download URL: llm_ekg-1.1.1-py3-none-any.whl
Upload date: Mar 31, 2026
Size: 26.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for llm_ekg-1.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`53f43f5fa4805f86a3c30c16091682422c71bbe51bbb19e49660e518a8c929a5`
MD5	`f9ecf0ae3698539dfabeedab1dbc9f4d`
BLAKE2b-256	`d5d958c56e05ba1d1eab52a66d3628dde3aa49ec723bfccde58978922f49880a`
