Mathematical health monitor for LLMs — detect degradation and hallucination without NLP
LLM EKG
Is your AI getting dumber? Now you can prove it.
LLM EKG is a mathematical health monitor for Large Language Models. It analyzes LLM outputs as time series to detect degradation, hallucination, and behavioral drift — using pure mathematics, not NLP.
No embeddings. No tokenizers. No external AI. Just numpy.
RESULT: DEGRADED (74/100)
Trend: +0.1651
Hallucination risk: 22.96%
Mean persistence: 0.578
Why?
Many companies run LLMs in production, but few monitor output quality mathematically.
- GPT-4 getting lazier over time? LLM EKG detects it.
- Claude hallucinating more after an update? LLM EKG catches it.
- Your fine-tuned model degrading silently? LLM EKG raises the alarm.
The big labs will never build this — it exposes their problems. So we did.
Quick Start
pip install llm-ekg
One command
# Auto-detects format (ChatGPT, Claude, CSV, JSONL, plain text)
llm-ekg conversation.json
# Explicit format
llm-ekg --format chatgpt export.json -o report.html
A few lines of Python
from llm_ekg import LLMAnalyzer
analyzer = LLMAnalyzer()
for response in my_responses:
    result = analyzer.ingest(response["text"])
print(f"{analyzer.get_summary()['verdict']} — {analyzer.get_summary()['global_score_100']}/100")
Live Monitoring
Wrap your OpenAI or Anthropic client once; existing calls need no further changes.
from llm_ekg import LiveMonitor
monitor = LiveMonitor()
# OpenAI
import openai
client = monitor.wrap_openai(openai.OpenAI())
# Use exactly as before — monitoring is automatic
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Explain quantum computing"}]
)
# Check health anytime
print(f"Score: {monitor.score}/100 — {monitor.verdict}")
# Generate full HTML report
monitor.report("ekg.html")
Works with Anthropic too:
import anthropic
client = monitor.wrap_anthropic(anthropic.Anthropic())
Security Layer
Detect compromised models by comparing against a known-good baseline.
Capture baseline
# Run against trusted model output to create baseline profile
llm-ekg trusted_conversation.json --baseline baseline.json
Security check
# Compare new output against baseline (default: 3 sigma threshold)
llm-ekg new_conversation.json --security-check baseline.json
# Adjust sensitivity
llm-ekg new_conversation.json --security-check baseline.json --sigma 2.0
Output:
SECURITY: CLEAN (0/16 features deviated, threshold: 3.0 sigma)
or:
SECURITY: COMPROMISED (5/16 features deviated, threshold: 3.0 sigma)
! hedge_ratio: baseline=0.0312 current=0.1847 z=6.42
! repetition_score: baseline=0.0521 current=0.2103 z=4.88
...
Python API
from llm_ekg import LLMAnalyzer, SecurityBaseline
# 1. Build baseline from trusted session
analyzer = LLMAnalyzer()
for r in trusted_responses:
    analyzer.ingest(r["text"])
baseline = SecurityBaseline.from_analyzer(analyzer, model="gpt-4")
baseline.save("gpt4_baseline.json")
# 2. Check new session
baseline = SecurityBaseline.load("gpt4_baseline.json")
analyzer2 = LLMAnalyzer()
for r in new_responses:
    analyzer2.ingest(r["text"])
report = baseline.check(analyzer2, sigma=3.0)
print(report.status) # "CLEAN", "WARNING", or "COMPROMISED"
Live monitoring with security
import openai
from llm_ekg import LiveMonitor
monitor = LiveMonitor()
client = monitor.wrap_openai(openai.OpenAI())
# ... use client normally ...
# Save baseline after trusted session
monitor.save_baseline("baseline.json")
# Later: check against baseline
sec_report = monitor.security_check("baseline.json", sigma=3.0)
print(sec_report.status)
# Generate report with security section
monitor.report("report.html", security_report=sec_report)
How it works
Security-sensitive features are weighted higher (1.5x for hedge_ratio, repetition_score, confidence_mismatch; 1.2x for vocab_diversity, specificity_score, assertion_density). A backdoored model changes these signals before the semantic content shifts — the math catches it first.
| Status | Meaning |
|---|---|
| CLEAN | All features within baseline |
| WARNING | 1-3 features deviated |
| COMPROMISED | 4+ features with significant drift |
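The sigma-threshold comparison described above can be sketched roughly as follows. This is an illustration only, not the package's implementation: the function name `check_against_baseline`, the baseline layout (a mean and standard deviation per feature), and the choice to apply each weight by multiplying the z-score are all assumptions. The status mapping follows the table above.

```python
import numpy as np

# Weights quoted in the text; every other feature is assumed to weigh 1.0.
WEIGHTS = {
    "hedge_ratio": 1.5, "repetition_score": 1.5, "confidence_mismatch": 1.5,
    "vocab_diversity": 1.2, "specificity_score": 1.2, "assertion_density": 1.2,
}

def check_against_baseline(baseline, current, sigma=3.0):
    """Hypothetical helper.

    baseline: {feature: (mean, std)} from a trusted session.
    current:  {feature: list of values} from the session under test.
    """
    deviated = {}
    for name, (mean, std) in baseline.items():
        if name not in current or std <= 0:
            continue  # skip features that cannot be scored
        z = abs(np.mean(current[name]) - mean) / std
        weighted = z * WEIGHTS.get(name, 1.0)  # assumption: weight scales z
        if weighted > sigma:
            deviated[name] = round(float(weighted), 2)
    n = len(deviated)
    status = "CLEAN" if n == 0 else "WARNING" if n <= 3 else "COMPROMISED"
    return status, deviated
```

Lowering `sigma` makes the check more sensitive, which mirrors the `--sigma 2.0` option shown earlier.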
What It Detects
| Signal | Meaning |
|---|---|
| Anomaly rising | Model quality is degrading |
| Drift spike | Sudden behavioral shift |
| Confidence mismatch | Hallucination (specific claims + zero hedging) |
| Assertion density up | Model becoming overconfident |
| Persistence > 0.5 | Degradation is trending, not random |
| Persistence < 0.5 | Model self-correcting |
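The text does not say how persistence is computed, and the engine is described as proprietary. One well-known statistic with the same 0.5 dividing line is the Hurst exponent, where values above 0.5 indicate a trending series and values below 0.5 indicate a mean-reverting one. The sketch below estimates it with classic rescaled-range (R/S) analysis; treating "persistence" as a Hurst-style exponent is an assumption, and `hurst_rs` is a hypothetical name.

```python
import numpy as np

def hurst_rs(series, min_window=8):
    """Estimate a Hurst-style exponent with rescaled-range (R/S) analysis."""
    x = np.asarray(series, dtype=float)
    n = len(x)
    if n < 4 * min_window:
        raise ValueError("series too short: need at least 4 * min_window points")
    sizes, rs_means = [], []
    size = min_window
    while size <= n // 2:
        rs = []
        # Split the series into non-overlapping chunks of the current size.
        for start in range(0, n - size + 1, size):
            chunk = x[start:start + size]
            dev = np.cumsum(chunk - chunk.mean())  # cumulative deviation from the mean
            r = dev.max() - dev.min()              # range of the cumulative deviation
            s = chunk.std()                        # scale of the chunk
            if s > 0:
                rs.append(r / s)
        if rs:
            sizes.append(size)
            rs_means.append(np.mean(rs))
        size *= 2
    if len(sizes) < 2:
        raise ValueError("not enough usable window sizes to fit a slope")
    # The exponent is the slope of log(R/S) against log(window size).
    slope, _ = np.polyfit(np.log(sizes), np.log(rs_means), 1)
    return float(slope)
```

On short series this estimator is biased upward, so values close to 0.5 are best read as inconclusive.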
How It Works
LLM EKG extracts 16 numerical features from each response — no NLP, no language models, no semantic analysis:
Degradation signals (0-11): response length, word count, vocabulary diversity, word length, sentence count, sentence length, punctuation density, hedge ratio, list usage, code ratio, repetition score, latency.
Hallucination signature (12-15): specificity score (concrete details density), confidence mismatch (specificity vs hedging gap), assertion density (certainty vs uncertainty ratio), self-consistency (internal contradiction score).
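To make the idea of surface-level features concrete, here is a minimal sketch that computes a handful of the degradation signals listed above using only counting and regular expressions. The exact definitions used by LLM EKG are not given in this document, so the formulas, the `HEDGES` word list, and the function name `extract_features` are assumptions for illustration.

```python
import re
import numpy as np

# Hypothetical hedge-word list; the package's real list is not documented here.
HEDGES = {"maybe", "perhaps", "possibly", "might", "could", "likely", "probably"}

def extract_features(text):
    """Compute a few surface features of one response (illustrative definitions)."""
    words = re.findall(r"[A-Za-z0-9']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = max(len(words), 1)  # avoid division by zero on empty input
    return {
        "response_length": len(text),
        "word_count": len(words),
        "vocab_diversity": len(set(words)) / n_words,
        "mean_word_length": float(np.mean([len(w) for w in words])) if words else 0.0,
        "sentence_count": len(sentences),
        "mean_sentence_length": len(words) / max(len(sentences), 1),
        "punctuation_density": sum(c in ".,;:!?" for c in text) / max(len(text), 1),
        "hedge_ratio": sum(w in HEDGES for w in words) / n_words,
    }
```

Calling `extract_features` on each response in order yields one numeric vector per response, which is the kind of time series the rest of the analysis operates on.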
These features feed into a proprietary behavioral state engine that computes anomaly scores, drift magnitude, and multi-scale persistence analysis.
All diagnostics are data-driven — zero hardcoded thresholds. Every metric is compared against its own distribution within the session.
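One simple way to compare a metric against its own distribution within a session is an expanding-window z-score, where each value is scored against everything observed before it. The sketch below shows that idea; it is an assumed illustration rather than the engine's actual method, and `session_zscores` is a hypothetical name.

```python
import numpy as np

def session_zscores(values):
    """Score each value against the distribution of the values seen before it."""
    x = np.asarray(values, dtype=float)
    z = np.zeros(len(x))
    for i in range(2, len(x)):       # need at least two prior points for a spread
        history = x[:i]
        std = history.std()
        if std > 0:                  # a constant history gives no scale; leave z at 0
            z[i] = (x[i] - history.mean()) / std
    return z
```

Because the reference distribution comes from the session itself, the same code applies to any feature regardless of its units or typical range.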
HTML Report
Self-contained HTML file. No JavaScript dependencies. Opens in any browser.
9 sections: Executive Summary, Hallucination Monitor, EKG Temporal, Behavioral Metrics (M0-M3), Drift Map, Multi-Scale Analysis, Trend Persistence, Feature Timeline, Diagnostic.
Run the demo
git clone https://github.com/iafiscal1212/llm-ekg.git
cd llm-ekg
pip install -e .
python demo.py
# Open demo_ekg_report.html
Supported Formats
| Format | Extension | Source |
|---|---|---|
| ChatGPT | .json | Settings → Export data |
| Claude | .json | claude.ai export |
| API Log | .csv | CSV with response column |
| JSONL | .jsonl | One JSON per line |
| Plain Text | .txt | Blank-line separated |
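If you need to prepare input by hand, the simpler formats in the table can be read with the standard library. The sketch below is not the package's loader: the function names are hypothetical, the CSV column name `response` is taken from the table, and the JSONL key `text` is an assumption.

```python
import csv
import io
import json
import re

def load_plain_text(raw):
    """Split blank-line separated text into one response per block."""
    blocks = re.split(r"\n\s*\n", raw.replace("\r\n", "\n"))
    return [b.strip() for b in blocks if b.strip()]

def load_csv(raw, column="response"):
    """Read one response per row from the given CSV column."""
    reader = csv.DictReader(io.StringIO(raw))
    return [row[column] for row in reader if row.get(column)]

def load_jsonl(raw, key="text"):
    """Read one JSON object per line; the key name is an assumption."""
    records = (json.loads(line) for line in raw.splitlines() if line.strip())
    return [rec[key] for rec in records if key in rec]
```

Each function returns a plain list of strings, which can then be passed one at a time to `analyzer.ingest(...)` as in the Quick Start example.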
Dependencies
numpy + matplotlib. That's it.
Optional: openai and/or anthropic for live monitoring.
pip install "llm-ekg[openai]" # OpenAI wrapper
pip install "llm-ekg[anthropic]" # Anthropic wrapper
pip install "llm-ekg[all]" # Both (quotes keep shells such as zsh from expanding the brackets)
License
Business Source License 1.1, the license introduced by MariaDB (for MaxScale) and previously used by Sentry.
- Always free: personal use, internal monitoring, research, education, open source
- Commercial license required: if you sell LLM monitoring as a service
- Converts to Apache 2.0: March 28, 2030
Contact: carmen@iafiscal.es
Cite
If you use LLM EKG in your research, please cite:
@software{esteban2026llmekg,
author = {Esteban, Carmen},
title = {LLM EKG: A Mathematical Health Monitor for Large Language Models},
year = {2026},
publisher = {Zenodo},
doi = {10.5281/zenodo.19284461},
url = {https://doi.org/10.5281/zenodo.19284461}
}
Author
Carmen Esteban — IAFISCAL & PARTNERS