Skip to main content

Mathematical health monitor for LLMs — detect degradation and hallucination without NLP

Project description

LLM EKG

Is your AI getting dumber? Now you can prove it.

DOI PyPI Python

Quick StartLive MonitoringReportHow It WorksPaper


LLM EKG is a mathematical health monitor for Large Language Models. It analyzes LLM outputs as time series to detect degradation, hallucination, and behavioral drift — using pure mathematics, not NLP.

No embeddings. No tokenizers. No external AI. Just numpy.

RESULT: DEGRADED (74/100)
Trend: +0.1651
Hallucination risk: 22.96%
Mean persistence: 0.578

Why?

Every company runs LLMs in production. Nobody monitors their output quality mathematically.

  • GPT-4 getting lazier over time? LLM EKG detects it.
  • Claude hallucinating more after an update? LLM EKG catches it.
  • Your fine-tuned model degrading silently? LLM EKG raises the alarm.

The big labs will never build this — it exposes their problems. So we did.

Quick Start

pip install llm-ekg

One command

# Auto-detects format (ChatGPT, Claude, CSV, JSONL, plain text)
llm-ekg conversation.json

# Explicit format
llm-ekg --format chatgpt export.json -o report.html

Three lines of Python

from llm_ekg import LLMAnalyzer

analyzer = LLMAnalyzer()
for response in my_responses:
    result = analyzer.ingest(response["text"])

print(f"{analyzer.get_summary()['verdict']}{analyzer.get_summary()['global_score_100']}/100")

Live Monitoring

Wrap your OpenAI or Anthropic client. Zero code changes.

from llm_ekg import LiveMonitor

monitor = LiveMonitor()

# OpenAI
import openai
client = monitor.wrap_openai(openai.OpenAI())

# Use exactly as before — monitoring is automatic
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Explain quantum computing"}]
)

# Check health anytime
print(f"Score: {monitor.score}/100 — {monitor.verdict}")

# Generate full HTML report
monitor.report("ekg.html")

Works with Anthropic too:

import anthropic
client = monitor.wrap_anthropic(anthropic.Anthropic())

Security Layer

Detect compromised models by comparing against a known-good baseline.

Capture baseline

# Run against trusted model output to create baseline profile
llm-ekg trusted_conversation.json --baseline baseline.json

Security check

# Compare new output against baseline (default: 3 sigma threshold)
llm-ekg new_conversation.json --security-check baseline.json

# Adjust sensitivity
llm-ekg new_conversation.json --security-check baseline.json --sigma 2.0

Output:

SECURITY: CLEAN (0/16 features deviated, threshold: 3.0 sigma)

or:

SECURITY: COMPROMISED (5/16 features deviated, threshold: 3.0 sigma)
  ! hedge_ratio: baseline=0.0312 current=0.1847 z=6.42
  ! repetition_score: baseline=0.0521 current=0.2103 z=4.88
  ...

Python API

from llm_ekg import LLMAnalyzer, SecurityBaseline

# 1. Build baseline from trusted session
analyzer = LLMAnalyzer()
for r in trusted_responses:
    analyzer.ingest(r["text"])
baseline = SecurityBaseline.from_analyzer(analyzer, model="gpt-4")
baseline.save("gpt4_baseline.json")

# 2. Check new session
baseline = SecurityBaseline.load("gpt4_baseline.json")
analyzer2 = LLMAnalyzer()
for r in new_responses:
    analyzer2.ingest(r["text"])
report = baseline.check(analyzer2, sigma=3.0)
print(report.status)  # "CLEAN", "WARNING", or "COMPROMISED"

Live monitoring with security

from llm_ekg import LiveMonitor

monitor = LiveMonitor()
client = monitor.wrap_openai(openai.OpenAI())

# ... use client normally ...

# Save baseline after trusted session
monitor.save_baseline("baseline.json")

# Later: check against baseline
sec_report = monitor.security_check("baseline.json", sigma=3.0)
print(sec_report.status)

# Generate report with security section
monitor.report("report.html", security_report=sec_report)

How it works

Security-sensitive features are weighted higher (1.5x for hedge_ratio, repetition_score, confidence_mismatch; 1.2x for vocab_diversity, specificity_score, assertion_density). A backdoored model changes these signals before the semantic content shifts — the math catches it first.

Status Meaning
CLEAN All features within baseline
WARNING 1-3 features deviated
COMPROMISED 4+ features with significant drift

What It Detects

Signal Meaning
Anomaly rising Model quality is degrading
Drift spike Sudden behavioral shift
Confidence mismatch Hallucination (specific claims + zero hedging)
Assertion density up Model becoming overconfident
Persistence > 0.5 Degradation is trending, not random
Persistence < 0.5 Model self-correcting

How It Works

LLM EKG extracts 16 numerical features from each response — no NLP, no language models, no semantic analysis:

Degradation signals (0-11): response length, word count, vocabulary diversity, word length, sentence count, sentence length, punctuation density, hedge ratio, list usage, code ratio, repetition score, latency.

Hallucination signature (12-15): specificity score (concrete details density), confidence mismatch (specificity vs hedging gap), assertion density (certainty vs uncertainty ratio), self-consistency (internal contradiction score).

These features feed into a proprietary behavioral state engine that computes anomaly scores, drift magnitude, and multi-scale persistence analysis.

All diagnostics are data-driven — zero hardcoded thresholds. Every metric is compared against its own distribution within the session.

HTML Report

Self-contained HTML file. No JavaScript dependencies. Opens in any browser.

9 sections: Executive Summary, Hallucination Monitor, EKG Temporal, Behavioral Metrics (M0-M3), Drift Map, Multi-Scale Analysis, Trend Persistence, Feature Timeline, Diagnostic.

Run the demo

git clone https://github.com/iafiscal1212/llm-ekg.git
cd llm-ekg
pip install -e .
python demo.py
# Open demo_ekg_report.html

Supported Formats

Format Extension Source
ChatGPT .json Settings → Export data
Claude .json claude.ai export
API Log .csv CSV with response column
JSONL .jsonl One JSON per line
Plain Text .txt Blank-line separated

Dependencies

numpy + matplotlib. That's it.

Optional: openai and/or anthropic for live monitoring.

pip install llm-ekg[openai]     # OpenAI wrapper
pip install llm-ekg[anthropic]  # Anthropic wrapper
pip install llm-ekg[all]        # Both

License

Copyright (c) 2024-2026 Carmen Esteban. All rights reserved. No part of this software may be copied, modified, distributed or used without express written permission.

Cite

If you use LLM EKG in your research, please cite:

@software{esteban2026llmekg,
  author    = {Esteban, Carmen},
  title     = {LLM EKG: A Mathematical Health Monitor for Large Language Models},
  year      = {2026},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.19284461},
  url       = {https://doi.org/10.5281/zenodo.19284461}
}

Author

Carmen EstebanIAFISCAL & PARTNERS

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

llm_ekg-1.1.1.tar.gz (32.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

llm_ekg-1.1.1-py3-none-any.whl (26.1 kB view details)

Uploaded Python 3

File details

Details for the file llm_ekg-1.1.1.tar.gz.

File metadata

  • Download URL: llm_ekg-1.1.1.tar.gz
  • Upload date:
  • Size: 32.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for llm_ekg-1.1.1.tar.gz
Algorithm Hash digest
SHA256 5130e6c72bf755a72802b578c7070f831bde59bc053448f8bf851c1cfea71e28
MD5 0a8901707f1029628333eb2ee3ae4969
BLAKE2b-256 da41e5e60caafe11f6b938461853cc0d4cc8e0a3a0b749301c66367afb815037

See more details on using hashes here.

File details

Details for the file llm_ekg-1.1.1-py3-none-any.whl.

File metadata

  • Download URL: llm_ekg-1.1.1-py3-none-any.whl
  • Upload date:
  • Size: 26.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for llm_ekg-1.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 53f43f5fa4805f86a3c30c16091682422c71bbe51bbb19e49660e518a8c929a5
MD5 f9ecf0ae3698539dfabeedab1dbc9f4d
BLAKE2b-256 d5d958c56e05ba1d1eab52a66d3628dde3aa49ec723bfccde58978922f49880a

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page