Federated early-warning probe for silent LLM degradation.

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

Kapibara

These details have not been verified by PyPI

Project description

SEISMOGRAPH

Created by Tatiana Radchenko

pip install seismograph-probe   # the probe SDK — Python 3.11+

▶ Live dashboard: seismograph-weather.onrender.com/dashboard — real drift-weather for 4 production models, refreshed live. (Free host; first load may take ~30s if the instance is asleep.)

Your LLM didn't get worse. It changed — and nobody told you.

Teams build on LLM APIs they don't control. Providers update models silently — same name, same endpoint, different behavior — and prompts that worked yesterday break today, with no announcement and no alert. SEISMOGRAPH is the smoke detector for that risk: a privacy-preserving early-warning network that continuously probes models and tells you the moment one drifts from its baseline — before it costs you a customer.

In a reproducible simulated backtest, SEISMOGRAPH would have flagged the Anthropic Claude Sonnet 4 silent degradation on 2025-08-10 -- 38 days before the official Sep 17 postmortem and 19 days before the escalation became visible to users. The alert falls inside the 0.8% misrouting window, before any user-visible symptoms appeared.

SEISMOGRAPH Model Weather dashboard — live drift status for four production LLMs

Live "model weather" — open the public dashboard (no login).

Documentation: Whitepaper (PDF) · Roadmap · Security & threat model · Architecture · dev.to: Your LLM didn't get worse · DOI 10.5281/zenodo.21045517

Want this watched for you? I run Drift Defense on top of this engine — a free Drift Exposure Scan maps where a silent model change would hit your stack first.

Technical overview

SEISMOGRAPH detects semantic drift in third-party LLM APIs — behavioral change that emits no latency or uptime signal, so conventional monitoring misses it. A fixed, content-addressed canary suite runs against any OpenAI-compatible endpoint at temperature 0. Each response is reduced to privacy-preserving features — SHA-256 hashes plus ε=2.0 Laplace-DP-noised aggregates; raw prompts and outputs never leave the probe perimeter. Every batch is Ed25519-signed, and alerts are gated behind cross-observer quorum, so no single noisy probe can raise a false alarm. Built with a Python probe SDK, a FastAPI ingestion gateway, change-point detection (CUSUM + Bayesian online change-point), and full CI (ruff + pytest + CodeQL, 134 tests). Apache-2.0. In backtest, it flagged a major provider's drift 38 days before the public postmortem.

The problem

Every AI team eventually hits this at 2am:

json_parse_errors up 12%.  latency: normal.  uptime: 100%.
My prompt didn't change.  My code didn't change.
Is it me, or did the model silently change underneath me?

Provider APIs do not broadcast behavioral changes. Endpoints that return 200 can still produce subtly different outputs -- degraded JSON fidelity, shifted response length distributions, changed reasoning patterns. Standard monitoring (latency, error rate, uptime) is blind to semantic drift.

SEISMOGRAPH answers that question, not by trusting a single observer, but by correlating canary probe signals across independent organisations, so that no single bad actor -- or noisy probe -- can trigger a false alarm.

The proof: Phase 0 backtest

Anthropic published a postmortem on 2025-09-17 describing three infrastructure bugs that silently degraded output quality (no intentional model change). The first and longest-lived: a context-window routing error introduced 2025-08-05 that misrouted a fraction of Claude Sonnet 4 requests. It began as 0.8% misrouting (Phase 1) and escalated to ~16% on 2025-08-29 (Phase 2). This backtest models that first bug.

SEISMOGRAPH (simulated, SEED=42, reproducible) would have alerted on 2025-08-10:

CUSUM S- trace -- json_success_rate (anthropic/claude-sonnet-4@global)
  Baseline: mu0=0.9903, sigma0=0.00437, h=5.0, k=0.5

  Date        Phase          rate    S-      note
  -------------------------------------------------------
  2025-08-05  Phase1(0.8%)   0.9855  0.598   [bug introduced]
  2025-08-06  Phase1(0.8%)   0.9857  1.142
  2025-08-07  Phase1(0.8%)   0.9786  3.309
  2025-08-08  Phase1(0.8%)   0.9877  3.396
  2025-08-09  Phase1(0.8%)   0.9816  4.889
  2025-08-10  Phase1(0.8%)   0.9777  7.278   <<< FIRST ALERT
  ...
  2025-08-29  Phase2(16%)    --      --      [escalation visible to users]
  2025-09-17  --             --      --      [official postmortem published]

  Lead over escalation:  19 days
  Lead over postmortem:  38 days

Reproduce: python scripts/anthropic_backtest.py
Full report: notebooks/anthropic_backtest_report.md
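The trace above can be checked by hand from the published baseline and the six tabulated rates. The sketch below is not scripts/anthropic_backtest.py; it only re-applies the downward Page-CUSUM update to the rounded values in the table, so intermediate sums can differ from the printed S- column in the second decimal place. The names TRACE and replay_trace are illustrative.

```python
from datetime import date

MU0, SIGMA0 = 0.9903, 0.00437   # baseline from the trace header
H, K = 5.0, 0.5                 # decision threshold and slack

# (date, json_success_rate) pairs copied from the table above
TRACE = [
    (date(2025, 8, 5), 0.9855),
    (date(2025, 8, 6), 0.9857),
    (date(2025, 8, 7), 0.9786),
    (date(2025, 8, 8), 0.9877),
    (date(2025, 8, 9), 0.9816),
    (date(2025, 8, 10), 0.9777),
]


def replay_trace(trace):
    """Return the first date on which the downward CUSUM sum exceeds H."""
    s_minus = 0.0
    for day, rate in trace:
        z = (rate - MU0) / SIGMA0            # standardised observation
        s_minus = max(0.0, s_minus - z - K)  # accumulates downward shifts
        if s_minus > H:
            return day
    return None


first_alert = replay_trace(TRACE)
lead_escalation = (date(2025, 8, 29) - first_alert).days
lead_postmortem = (date(2025, 9, 17) - first_alert).days
print(first_alert, lead_escalation, lead_postmortem)  # 2025-08-10 19 38
```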

How it works

Privacy-first probe SDK

The probe runs inside your infrastructure. It executes a frozen canary suite (<=200 prompts, temperature 0) against your LLM API endpoint. Raw prompts and model outputs never leave your perimeter.
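One way a frozen suite can be made content-addressed is to hash a canonical encoding of its prompts, so that any edit yields a new version identifier and therefore a new baseline. The sketch below assumes a JSON canonicalisation; the function name suite_version_hash and the example prompts are hypothetical, and the real CANARY_SUITE_V1 may be encoded differently.

```python
import hashlib
import json


def suite_version_hash(prompts):
    """Hash a canonical JSON encoding of the prompts, order included."""
    canonical = json.dumps(prompts, ensure_ascii=False, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical prompts; the actual canary prompts are not reproduced here.
suite_v1 = ['Return the JSON object {"ok": true}.', "List three primes."]
suite_edited = ['Return the JSON object {"ok": true}.', "List four primes."]

same = suite_version_hash(suite_v1) == suite_version_hash(list(suite_v1))
changed = suite_version_hash(suite_v1) != suite_version_hash(suite_edited)
print(same, changed)  # True True
```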

What gets transmitted:

SHA-256 hash of each response (not the response itself)
DP-noised distributional features: avg_output_length (Laplace, scale=4096), json_success_rate (Laplace, scale=0.5), result_count
Canary suite version hash (content-addressed, immutable baselines)
Probe public key (Ed25519, pseudonymous -- no org identity disclosed)

Epsilon budget: 2.0 per flush via the Laplace mechanism. Sequential composition tracking is a Phase 2 design item (REQ-PRIV-010), so the budget is currently accounted per flush rather than cumulatively across flushes.
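The transmitted payload described above can be sketched with the standard library alone. This is an illustration, not the code in probe/privacy.py: the function names, the dictionary layout, and the choice to measure length in characters are assumptions, and result_count is passed through un-noised only because no scale is stated for it. The Laplace scales are the ones listed above.

```python
import hashlib
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def summarise(responses, json_ok_flags, rng):
    """Reduce raw responses to hashes plus noised aggregates."""
    hashes = [hashlib.sha256(r.encode("utf-8")).hexdigest() for r in responses]
    avg_len = sum(len(r) for r in responses) / len(responses)
    json_rate = sum(json_ok_flags) / len(json_ok_flags)
    return {
        "response_hashes": hashes,  # digests only, never the text itself
        "avg_output_length": avg_len + laplace_noise(4096, rng),
        "json_success_rate": json_rate + laplace_noise(0.5, rng),
        "result_count": len(responses),
    }


batch = summarise(['{"a": 1}', "not json"], [True, False], random.Random(0))
print(sorted(batch))
```

Note that a noised rate can fall outside [0, 1]; whether to clip it before transmission is a design choice this sketch leaves open.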

Page-CUSUM change-point detection

The gateway ingests probe batches and feeds each DP-noised metric into a Page-CUSUM detector, one per (model_tuple, metric_name) pair:

z(n)  = (x(n) - mu0) / sigma0          # standardised observation
S+(n) = max(0, S+(n-1) + z(n) - k)    # upward shifts
S-(n) = max(0, S-(n-1) - z(n) - k)    # downward shifts
Alert when S+ or S- > h

Parameters: h=5.0, k=0.5, baseline_samples=30. The first 30 observations form the baseline window, from which mu0 and sigma0 are estimated; drift detection activates only after that window is full. Sigma0 is clamped to a minimum of 1e-9 to prevent division by zero on constant-value streams.

CUSUM state is shared per (model_tuple, metric_name) across all client IDs -- contributing organisations build a shared baseline, which is what makes cross-org comparison possible.
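A minimal sketch of a detector with these parameters follows. It is not engine/detector.py: the class name PageCusum, the use of the population standard deviation for sigma0, and the dictionary keyed by (model_tuple, metric_name) are assumptions made for illustration.

```python
import statistics
from collections import defaultdict


class PageCusum:
    """Two-sided Page-CUSUM with a warm-up window for mu0 and sigma0."""

    def __init__(self, h=5.0, k=0.5, baseline_samples=30):
        self.h, self.k, self.baseline_samples = h, k, baseline_samples
        self.warmup = []
        self.mu0 = self.sigma0 = None
        self.s_plus = self.s_minus = 0.0

    def update(self, x):
        """Feed one observation; return True when either sum exceeds h."""
        if self.mu0 is None:
            self.warmup.append(x)
            if len(self.warmup) == self.baseline_samples:
                self.mu0 = statistics.fmean(self.warmup)
                # Clamp sigma so constant streams do not divide by zero.
                self.sigma0 = max(statistics.pstdev(self.warmup), 1e-9)
            return False
        z = (x - self.mu0) / self.sigma0
        self.s_plus = max(0.0, self.s_plus + z - self.k)
        self.s_minus = max(0.0, self.s_minus - z - self.k)
        return self.s_plus > self.h or self.s_minus > self.h


# One detector per (model_tuple, metric_name), shared across client IDs.
detectors = defaultdict(PageCusum)
key = ("anthropic/claude-sonnet-4@global", "json_success_rate")

baseline = [0.99, 0.991, 0.989] * 10   # 30 stable observations
alerts = [detectors[key].update(x) for x in baseline + [0.97, 0.97]]
print(alerts[-2:])  # [True, True]
```

Because the state is keyed by model and metric rather than by client, observations from every contributing organisation land in the same baseline and the same running sums.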

Quorum Agreement Scorer

A single-organisation CUSUM alert is never promoted to a public drift alert. This filters probe bugs, network hiccups, and Sybil attacks.

QUORUM_MIN = 2  # minimum distinct org_ids required for a public alert

# Engine logic (engine/correlation.py):
scorer.ingest(ChangePointResult(change_detected=True, contributing_orgs=[client_id]))
org_count = scorer.promote_to_public_alert(model_tuple)
if org_count is not None:          # >= QUORUM_MIN orgs agree
    repo.save_public_alert(...)    # written to public_drift_alerts table
    scorer.clear(model_tuple)

The GET /v1/weather endpoint queries only PublicDriftAlert. Local single-org alerts are private fleet data, never surfaced publicly.
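The quorum gate can be illustrated in isolation. The class below is a simplified stand-in, not the AgreementScorer in engine/correlation.py: its name, method signatures, and set-based bookkeeping are assumptions. It shows the property the text relies on, namely that repeated alerts from one organisation never count more than once.

```python
QUORUM_MIN = 2  # minimum distinct organisations, as in the excerpt above


class QuorumScorer:
    """Tracks which organisations have flagged each model tuple."""

    def __init__(self, quorum_min=QUORUM_MIN):
        self.quorum_min = quorum_min
        self.flagged = {}  # model_tuple -> set of org ids

    def ingest(self, model_tuple, org_id):
        # A set collapses repeated alerts from the same organisation.
        self.flagged.setdefault(model_tuple, set()).add(org_id)

    def promote(self, model_tuple):
        """Return the org count if quorum is met, otherwise None."""
        count = len(self.flagged.get(model_tuple, ()))
        return count if count >= self.quorum_min else None

    def clear(self, model_tuple):
        self.flagged.pop(model_tuple, None)


scorer = QuorumScorer()
model = "anthropic/claude-sonnet-4@global"
scorer.ingest(model, "org-a")
scorer.ingest(model, "org-a")      # same org again: still one vote
single_org = scorer.promote(model)
scorer.ingest(model, "org-b")
two_orgs = scorer.promote(model)
print(single_org, two_orgs)        # None 2
```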

Storage schema

local_drift_alerts   -- private per-org CUSUM events (client_id, cusum_score)
public_drift_alerts  -- quorum-verified events (contributing_org_count)
telemetry_signals    -- raw ingested batches (DP-noised metrics only)
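The split between private and public alerts can be sketched with an in-memory SQLite database. Only the table names and the columns client_id, cusum_score, and contributing_org_count come from the summary above; the model_tuple column, the primary keys, and the weather helper are assumptions, and the project's actual SQLAlchemy models may differ.

```python
import sqlite3

SCHEMA = """
CREATE TABLE local_drift_alerts (
    id INTEGER PRIMARY KEY,
    client_id TEXT NOT NULL,
    model_tuple TEXT NOT NULL,      -- assumed column
    cusum_score REAL NOT NULL
);
CREATE TABLE public_drift_alerts (
    id INTEGER PRIMARY KEY,
    model_tuple TEXT NOT NULL,      -- assumed column
    contributing_org_count INTEGER NOT NULL
);
"""


def weather(conn, model_tuple):
    """Report status from public alerts only; local alerts are ignored."""
    row = conn.execute(
        "SELECT COUNT(*) FROM public_drift_alerts WHERE model_tuple = ?",
        (model_tuple,),
    ).fetchone()
    return "DRIFTING" if row[0] else "STABLE"


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# A single organisation's alert stays private.
conn.execute(
    "INSERT INTO local_drift_alerts (client_id, model_tuple, cusum_score) "
    "VALUES (?, ?, ?)",
    ("org-a", "model-x", 7.3),
)
before = weather(conn, "model-x")

# A quorum-verified alert is what changes the public status.
conn.execute(
    "INSERT INTO public_drift_alerts (model_tuple, contributing_org_count) "
    "VALUES (?, ?)",
    ("model-x", 2),
)
after = weather(conn, "model-x")
print(before, after)  # STABLE DRIFTING
```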

Quickstart

Use just the probe (publish signals to a gateway):

pip install seismograph-probe   # Python 3.11+

Run the full stack from source (gateway + dashboard + tests):

Requirements: Python 3.11+, pip

git clone https://github.com/Tania-coder/SEISMOGRAPH.git
cd SEISMOGRAPH
pip install -e ".[dev]"

Terminal 1 -- start the gateway:

uvicorn gateway.main:app --host 0.0.0.0 --port 8000 --reload

Browser -- model weather dashboard:

http://localhost:8000/dashboard

The dashboard polls GET /v1/weather every 60 seconds and shows STABLE / DRIFTING per model tuple, with the last alert timestamp and recent JSON success rate.

Terminal 2 -- run the federated quorum demo:

python scripts/demo_simulation.py

Watch two independent organisations (Client A: startup, Client B: enterprise) discover a silent model update in real time. Phase 1 shows a stable baseline. Phase 2 shows Client A detecting drift while the public dashboard stays STABLE (quorum not met -- the privacy gate holds). In Phase 3, Client B confirms the same degradation, quorum is reached, and the dashboard flips to DRIFTING.

  [sunny] -> STABLE  | json_rate=0.951 | last_alert=none
  ...
  [storm] -> DRIFTING | json_rate=0.312 | last_alert=2026-06-12T...

Repository structure

probe/
  sdk.py          -- ProbeSDK: span lifecycle, DP-noised flush, OTel attrs
  canary.py       -- CANARY_SUITE_V1 (3 prompts, content-addressed)
  privacy.py      -- Aggregator + Laplace DP noise + metric key whitelist

engine/
  detector.py     -- CUSUMDetector (Page-CUSUM, shared per model_tuple)
  correlation.py  -- AgreementScorer (QUORUM_MIN=2, cross-org quorum gate)
  models.py       -- SQLAlchemy 2.0 ORM: LocalDriftAlert, PublicDriftAlert
  repository.py   -- SignalRepository: save/query with naive-UTC timestamps

gateway/
  main.py         -- FastAPI app: POST /v1/signals, GET /v1/weather, GET /
  schema.py       -- Pydantic v2 schemas (extra=forbid, frozen=True)
  auth.py         -- Ed25519 stub (Phase 2: REQ-PRIV-002)

dashboard/static/
  index.html      -- dark-mode UI, CSS Grid weather cards
  app.js          -- vanilla JS, 60s polling, XSS-safe DOM construction

scripts/
  demo_simulation.py      -- federated quorum demo (two ProbeSDK clients)
  anthropic_backtest.py   -- Phase 0 reproducible backtest (SEED=42)

tests/
  test_gateway.py   -- 23 tests: ingestion, CUSUM, quorum, weather, dashboard
  test_storage.py   -- storage layer: save/query LocalDriftAlert + signals
  test_sdk.py       -- probe SDK: span lifecycle, flush, DP noise, dry_run
  conftest.py       -- autouse in-memory SQLite DB fixture

Test suite

134 passed, 0 failed
ruff: 0 violations across all Python files
CodeQL (security-extended): 0 open alerts

Key adversarial tests:

test_single_org_noise_blocked (T10): one org fires CUSUM -- weather stays STABLE
test_quorum_reached_triggers_dashboard (T11): two orgs fire CUSUM -- weather DRIFTING

Phase roadmap

Phase                 Status          Milestone
-------------------------------------------------------
0 -- Validation       COMPLETE        38-day backtest lead time validated
1 -- Solo MVP         COMPLETE        FastAPI + SQLite + dashboard + quorum live
2 -- Network growth   CORE COMPLETE   Ed25519 signing, ClickHouse layer, DP noise + quorum gating live
3 -- Enterprise       In progress     Multi-tenant + audit + webhooks shipped; SOC 2, in-VPC probe, SLAs planned

Architecture document

SEISMOGRAPH_Architecture.md -- 333 lines covering data flow, DP noise spec, CUSUM calibration rationale, quorum algorithm, OTel integration plan, security model (Ed25519 pseudonymous federation, Sybil resistance design), and open decisions with phase assignments.

Citation

If you use SEISMOGRAPH in your work, please cite the archived release:

@software{radchenko_seismograph_2026,
  author    = {Radchenko, Tatiana},
  title     = {{SEISMOGRAPH: A Federated, Privacy-Preserving Early-Warning
               Network for Silent LLM/Agent API Drift}},
  year      = {2026},
  publisher = {Zenodo},
  version   = {v1.0.1},
  doi       = {10.5281/zenodo.21045517},
  url       = {https://doi.org/10.5281/zenodo.21045517}
}

GitHub also renders a "Cite this repository" button from CITATION.cff.

Privacy by construction

The probe SDK is designed so that violating the privacy boundary requires actively removing a safeguard. The Aggregator class in probe/privacy.py:

Hashes every response with SHA-256 before storing it
Applies Laplace DP noise to every outgoing metric
Enforces an ALLOWED_METRIC_KEYS whitelist -- unknown keys are dropped
Never stores raw prompt text or raw model output

The gateway enforces a matching ALLOWED_METRIC_KEYS frozenset on inbound batches (422 on unknown keys). There is no code path in the system that stores or forwards raw prompt or output content.
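The two enforcement points behave differently: the probe drops unknown keys, while the gateway rejects the batch. The sketch below shows that asymmetry under stated assumptions. The contents of ALLOWED_METRIC_KEYS are guessed from the metric names listed earlier, the function names are hypothetical, and a plain ValueError stands in for the gateway's 422 response.

```python
# Assumed contents, based on the metric names listed under "What gets transmitted".
ALLOWED_METRIC_KEYS = frozenset(
    {"avg_output_length", "json_success_rate", "result_count"}
)


def probe_filter(metrics):
    """Probe side: silently drop any key outside the whitelist."""
    return {k: v for k, v in metrics.items() if k in ALLOWED_METRIC_KEYS}


def gateway_validate(metrics):
    """Gateway side: reject the whole batch if any unknown key is present."""
    unknown = set(metrics) - ALLOWED_METRIC_KEYS
    if unknown:
        # Stand-in for an HTTP 422 response.
        raise ValueError(f"unknown metric keys: {sorted(unknown)}")
    return metrics


outgoing = {"json_success_rate": 0.98, "raw_output": "secret text"}
cleaned = probe_filter(outgoing)
print(cleaned)  # {'json_success_rate': 0.98}

try:
    gateway_validate(outgoing)
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # True
```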

License

Apache 2.0

Project details

Release history

1.1.0 (this version) -- Jul 18, 2026
1.0.0 -- Jun 16, 2026

Download files

Source Distribution

seismograph_probe-1.1.0.tar.gz (39.5 kB)

Uploaded Jul 18, 2026 Source

Built Distribution

seismograph_probe-1.1.0-py3-none-any.whl (44.9 kB)

Uploaded Jul 18, 2026 Python 3

File details

Details for the file seismograph_probe-1.1.0.tar.gz.

File metadata

Download URL: seismograph_probe-1.1.0.tar.gz
Upload date: Jul 18, 2026
Size: 39.5 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for seismograph_probe-1.1.0.tar.gz
Algorithm	Hash digest
SHA256	`02886c5882183da1b12ebfb3296ecc9d45ee69d3df379ce6fcf2575cb8972c96`
MD5	`ec62af4885c6fc5a71166d68961849f6`
BLAKE2b-256	`d3454c00d95097a23d6d6f492835235a1101faa4dc634ff9bfd17a420e0b2114`
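A downloaded archive can be checked against the SHA256 digest above with the standard library. This is a generic sketch rather than a project tool; the function names and the local file path passed to verify are up to the reader.

```python
import hashlib

# SHA256 digest published above for seismograph_probe-1.1.0.tar.gz
EXPECTED_SHA256 = "02886c5882183da1b12ebfb3296ecc9d45ee69d3df379ce6fcf2575cb8972c96"


def sha256_of(path, chunk_size=1 << 20):
    """Stream the file in chunks so large archives need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """Return True when the file's SHA256 matches the expected digest."""
    return sha256_of(path) == expected
```

Calling verify("seismograph_probe-1.1.0.tar.gz") on the downloaded file should return True if the archive is intact.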

Provenance

The following attestation bundles were made for seismograph_probe-1.1.0.tar.gz:

Publisher: release.yml on Tania-coder/SEISMOGRAPH

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: seismograph_probe-1.1.0.tar.gz
- Subject digest: 02886c5882183da1b12ebfb3296ecc9d45ee69d3df379ce6fcf2575cb8972c96
- Sigstore transparency entry: 2194992851
- Sigstore integration time: Jul 18, 2026
Source repository:
- Permalink: Tania-coder/SEISMOGRAPH@df4b900d82f7158f0867eca1f55381568dab4c4f
- Branch / Tag: refs/tags/v1.1.0
- Owner: https://github.com/Tania-coder
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@df4b900d82f7158f0867eca1f55381568dab4c4f
- Trigger Event: release

File details

Details for the file seismograph_probe-1.1.0-py3-none-any.whl.

File metadata

Download URL: seismograph_probe-1.1.0-py3-none-any.whl
Upload date: Jul 18, 2026
Size: 44.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for seismograph_probe-1.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`eea1a52993ca2353bb78fe1eaa4da01e965cf7743a48fc52dd3851e681133b30`
MD5	`d64a803bbc5e8abf868d9ccbe0a8b689`
BLAKE2b-256	`795ba6a520251db7ab0b84f4b7fb900c7e9b32d51ea67add0a5fd86fda137356`

Provenance

The following attestation bundles were made for seismograph_probe-1.1.0-py3-none-any.whl:

Publisher: release.yml on Tania-coder/SEISMOGRAPH

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: seismograph_probe-1.1.0-py3-none-any.whl
- Subject digest: eea1a52993ca2353bb78fe1eaa4da01e965cf7743a48fc52dd3851e681133b30
- Sigstore transparency entry: 2194992854
- Sigstore integration time: Jul 18, 2026
Source repository:
- Permalink: Tania-coder/SEISMOGRAPH@df4b900d82f7158f0867eca1f55381568dab4c4f
- Branch / Tag: refs/tags/v1.1.0
- Owner: https://github.com/Tania-coder
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@df4b900d82f7158f0867eca1f55381568dab4c4f
- Trigger Event: release
