Online Lyapunov-drift monitor for ML retraining loops: alert when the loop trends unstable, before eval metrics show it.

Maintainers

sophie.nguyenthuthuy

Project description

lyapmon

Online Lyapunov-drift monitoring for ML retraining loops.

Your retraining DAG is a closed-loop dynamical system: the model shapes the data that trains the next model (data → train → deploy → data). Closed loops can go unstable — exposure bias, label feedback, recursive training — and when they do, the holdout eval is the last place it shows up.

lyapmon watches the loop the way control engineering watches a plant. Each cycle it builds a small state vector x_k from observables the pipeline already has, evaluates a Lyapunov candidate V(x_k), and runs an online test on the drift

ΔV_k = V(x_{k+1}) − V(x_k)

While the expected drift is negative, the loop is contracting toward its commissioned-good state and may run autonomously. The first sustained positive drift fires an alert and, wired as an Airflow gate, blocks the auto-deploy edge and pulls a human back in. That is bounded delegation packaged as an observability tool: the loop earns its autonomy cycle by cycle, and loses it the moment the stability evidence disappears.

ingest ──▶ train ──▶ evaluate ──▶ lyapunov_gate ──▶ deploy
                                       │
                                       ▼  E[ΔV] > 0, sustained
                                  ✗ fail task: block deploy, page a human

Why drift on V, not a threshold on eval loss?

Eval loss on a fixed holdout grows quadratically in the model's bias — it stays inside its noise band long after the loop has gone divergent. Distribution observables (training-batch PSI, prediction shift, parameter movement) grow linearly, and a trend test on increments fires before a level test on a lagging metric. The bundled simulation measures exactly this lead time against the rule it replaces (eval mean + 3σ, 2 consecutive):

$ lyapmon simulate --feedback-gain 0.65
...
lyapmon UNSTABLE at cycle 34
naive eval-loss alarm (mean+3σ, 2 consecutive) at cycle 42
lead time: 8 cycles

Mean lead over a 10-seed sweep is ~4 cycles with zero false alarms on stable and near-critical loops (asserted in tests/test_sim.py). With delayed outcome labels — the usual production reality — the lead widens (--label-delay 5 → 12 cycles), because the state vector is built from label-free observables that stay current while the eval waits for labels.
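The linear-versus-quadratic argument can be checked with a few lines of arithmetic. The sketch below is not the bundled simulation: it assumes a bias that grows geometrically, an observable exactly proportional to the bias, an eval-loss excess equal to the bias squared, and the same noise band on both. All constants are hypothetical, and it compares plain 3σ level crossings only, so it isolates the scaling effect rather than the trend test.

```python
def first_crossing(signal, threshold):
    """Index of the first value strictly above threshold, or None if it never crosses."""
    for k, value in enumerate(signal):
        if value > threshold:
            return k
    return None


# Hypothetical numbers, chosen only for illustration.
B0 = 0.01         # initial bias
GROWTH = 1.1      # per-cycle growth factor of the bias (unstable loop)
NOISE_STD = 0.02  # assumed noise band, identical for both signals
CYCLES = 60

bias = [B0 * GROWTH ** k for k in range(CYCLES)]
linear_observable = bias                       # assumed proportional to the bias
quadratic_loss_excess = [b * b for b in bias]  # assumed proportional to bias squared

linear_alarm = first_crossing(linear_observable, 3 * NOISE_STD)
quadratic_alarm = first_crossing(quadratic_loss_excess, 3 * NOISE_STD)

if __name__ == "__main__":
    print("linear observable leaves the noise band at cycle", linear_alarm)
    print("quadratic loss excess leaves the noise band at cycle", quadratic_alarm)
```

Under these assumptions the quadratic signal stays inside the band for many more cycles, because squaring a small bias makes it smaller still.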

[Figure: demo plot]

Install

pip install lyapmon              # core: numpy only
pip install 'lyapmon[mlflow]'    # + MLflow logging/backfill
pip install 'lyapmon[prometheus]' # + Pushgateway export
pip install 'lyapmon[plot]'      # + simulation plots

Quickstart

from lyapmon import LyapunovMonitor, JSONLStore, WebhookAlerter, psi, mean_shift, weight_delta_norm

monitor = LyapunovMonitor(
    features=["eval_auc", "psi_train", "pred_shift", "weight_delta"],
    warmup=10,                                  # cycles assumed healthy; fits V
    store=JSONLStore("/shared/lyapmon/history.jsonl"),
    alerters=[WebhookAlerter("https://hooks.slack.com/services/...")],
)

verdict = monitor.observe(
    {
        "eval_auc": auc,
        "psi_train": psi(reference_features, batch_features),
        "pred_shift": mean_shift(reference_preds, current_preds),
        "weight_delta": weight_delta_norm(prev_weights, new_weights),
    },
    cycle_id=run_id,
)

if verdict.unstable:
    block_deploy()   # verdict.top_contributors says which observable moved

The monitor is stateless across processes — everything (baseline, detector state, previous V) checkpoints into the store, so a fresh instance per DAG run behaves identically to a long-lived one (this is tested).
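The checkpoint-everything pattern can be illustrated with a toy detector. This is a generic sketch, not lyapmon's store or on-disk format: the class name CheckpointedEwma and its JSONL record layout are hypothetical. It shows why a fresh instance per run matches a long-lived one when every piece of state is read back from the file before each update.

```python
import json
import os


class CheckpointedEwma:
    """Toy EWMA tracker that keeps no state in memory between calls (hypothetical example)."""

    def __init__(self, path, lam=0.3):
        self.path = path
        self.lam = lam

    def _last_state(self):
        # The newest line of the JSONL file is the full detector state.
        if os.path.exists(self.path):
            with open(self.path) as fh:
                lines = [line for line in fh if line.strip()]
            if lines:
                return json.loads(lines[-1])
        return {"ewma": 0.0, "cycles": 0}

    def observe(self, value):
        state = self._last_state()
        new_state = {
            "ewma": (1.0 - self.lam) * state["ewma"] + self.lam * value,
            "cycles": state["cycles"] + 1,
        }
        # Append-only write: the history doubles as an audit trail.
        with open(self.path, "a") as fh:
            fh.write(json.dumps(new_state) + "\n")
        return new_state["ewma"]
```

Calling observe on one long-lived object, or constructing a new object for every cycle against the same file, yields the same sequence of estimates, which is the property a per-DAG-run process relies on.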

Airflow gate

from lyapmon.integrations.airflow import lyapunov_gate_callable
from airflow.operators.python import PythonOperator

gate = PythonOperator(
    task_id="lyapunov_gate",
    python_callable=lyapunov_gate_callable,
    op_kwargs=dict(
        features=["eval_auc", "psi_train", "pred_shift", "weight_delta"],
        history_path="/shared/lyapmon/history.jsonl",
        xcom_task_id="evaluate",        # evaluate task pushes the metrics dict
    ),
)
ingest >> train >> evaluate >> gate >> deploy

On sustained positive drift the gate raises LoopUnstableError: the deploy never runs, the DAG run is red, your existing on-call alerting takes it from there. After remediation, monitor.rebaseline() (or delete the checkpoint) re-commissions the loop with a fresh warmup.

MLflow

from lyapmon.integrations.mlflow import log_verdict, states_from_experiment

log_verdict(verdict)                 # lyapmon.V / .delta_V / .drift next to your run metrics

# Backfill a monitor over an existing retraining history:
for run_id, metrics in states_from_experiment("churn-retrain", FEATURES):
    monitor.observe(metrics, cycle_id=run_id)

Prometheus / Grafana

from lyapmon.integrations.prometheus import write_textfile
write_textfile(verdict, "/var/lib/node_exporter/lyapmon.prom", {"pipeline": "churn"})

Alert on lyapmon_status >= 3; graph lyapmon_drift against lyapmon_drift_threshold for the money chart.

Shell / BashOperator

lyapmon check --history /shared/history.jsonl \
  --features eval_auc,psi_train --metrics '{"eval_auc":0.91,"psi_train":0.04}' \
  --fail-on-unstable
lyapmon report --history /shared/history.jsonl

How it works

State vector. You name the observables; helpers (psi, ks_distance, mean_shift, rate_shift, weight_delta_norm) compute the standard ones from raw arrays. Everything is sample-only — no oracle access to truth.
Lyapunov candidate. Default is a diagonal Mahalanobis distance to a baseline fitted on the warmup window: V(x) = Σᵢ ((xᵢ − x*ᵢ)/σᵢ)² — positive definite around the commissioned-good state, unitless across mixed-scale features. A full quadratic form (QuadraticV) or any callable (CallableV, e.g. a learned/certified candidate) drops in unchanged.
Drift test. The conditional drift E[ΔV|x] is estimated by an EWMA of the increments; the alert threshold is calibrated from warmup noise (z · σ_ΔV · √(λ/(2−λ))) and must be breached on consecutive cycles before it counts. A one-sided Page-Hinkley accumulator runs alongside to catch slow drift that hides under the EWMA threshold. Either detector ⇒ UNSTABLE.
Verdict. STABLE / WARNING / UNSTABLE plus the numbers and the top contributors to V (which observable is pushing the loop out).
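The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not lyapmon's implementation: the function names, the default values (lam, z, patience, delta, h) and the choice of zero as the Page-Hinkley reference mean are all hypothetical. The candidate V and the EWMA threshold follow the formulas given in the text.

```python
import math
from statistics import mean, stdev


def fit_baseline(warmup_rows):
    """Per-feature centre x* and scale sigma from the warmup window (one row per cycle)."""
    columns = list(zip(*warmup_rows))
    return [mean(c) for c in columns], [stdev(c) for c in columns]


def lyapunov_v(x, center, scale):
    """Diagonal Mahalanobis candidate: V(x) = sum_i ((x_i - x*_i) / sigma_i)^2."""
    return sum(((xi - ci) / si) ** 2 for xi, ci, si in zip(x, center, scale))


def ewma_alarm(increments, sigma_dv, lam=0.3, z=3.0, patience=2):
    """Index at which the EWMA of delta-V has exceeded the threshold
    `patience` cycles in a row, or None if that never happens."""
    threshold = z * sigma_dv * math.sqrt(lam / (2.0 - lam))
    ewma, streak = 0.0, 0
    for k, dv in enumerate(increments):
        ewma = (1.0 - lam) * ewma + lam * dv
        streak = streak + 1 if ewma > threshold else 0
        if streak >= patience:
            return k
    return None


def page_hinkley_alarm(increments, sigma_dv, delta=0.5, h=4.0):
    """One-sided Page-Hinkley test on increments scaled by sigma_dv.
    Assumption: the reference mean is fixed at zero (the drift expected of a
    stable loop) instead of the running sample mean."""
    cum, cum_min = 0.0, 0.0
    for k, dv in enumerate(increments):
        cum += dv / sigma_dv - delta   # tolerance delta absorbs small positive noise
        cum_min = min(cum_min, cum)
        if cum - cum_min > h:          # rise above the running minimum
            return k
    return None


def first_alarm(increments, sigma_dv):
    """Either detector firing counts; return the earliest index or None."""
    hits = [k for k in (ewma_alarm(increments, sigma_dv),
                        page_hinkley_alarm(increments, sigma_dv)) if k is not None]
    return min(hits) if hits else None
```

In this sketch sigma_dv would be the standard deviation of ΔV over the warmup window. A sudden jump in the increments trips the EWMA test within a couple of cycles, while a small constant positive drift that settles below the EWMA threshold is still caught once the Page-Hinkley sum has accumulated enough.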

The theory anchor is the Foster–Lyapunov drift criterion: negative expected one-step drift of a positive-definite V outside a small set implies stochastic stability. lyapmon monitors the empirical contrapositive — when the drift estimate turns and stays positive, the contraction evidence is gone, so the autonomy should be too. It is an early-warning instrument, not a certificate; for the certificate-side story (CEGIS-learned, dReal-verified candidates) see the companion project lyacert.
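A scalar example makes the "outside a small set" clause concrete. Assume, purely for illustration, a loop state that follows x_next = a·x + w with zero-mean noise w of standard deviation s, and take V(x) = x². Then E[ΔV | x] = (a² − 1)·x² + s², which the sketch below evaluates; the function names are hypothetical.

```python
import math


def expected_drift(x, a, noise_std):
    """E[V(x_next) - V(x) | x] for x_next = a*x + w and V(x) = x^2,
    where w is zero-mean noise with standard deviation noise_std."""
    return (a * a - 1.0) * x * x + noise_std ** 2


def small_set_radius(a, noise_std):
    """|x| beyond which the expected drift is negative.
    Returns None when |a| >= 1, because then the drift is positive everywhere."""
    if abs(a) >= 1.0:
        return None
    return noise_std / math.sqrt(1.0 - a * a)


if __name__ == "__main__":
    for a in (0.9, 1.05):
        print("a =", a,
              "radius =", small_set_radius(a, 1.0),
              "drift at x=3:", expected_drift(3.0, a, 1.0))
```

For |a| < 1 the drift is negative everywhere outside a noise-sized neighbourhood of the origin, which is the contracting case. For |a| > 1 no such neighbourhood exists and the expected drift is positive at every state, which is the signature the monitor looks for empirically.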

Demo

lyapmon simulate --feedback-gain 0.3            # below critical gain: stable forever
lyapmon simulate --feedback-gain 0.65           # slow-burn divergence, alarm + lead time
lyapmon simulate --feedback-gain 0.65 --plot demo.png

The simulated loop retrains on data partially generated under its own influence (exposure bias with amplification κ); the closed-loop pole is 1 − lr + lr·g·κ, so instability is a knob, not an anecdote — critical gain g* = 1/κ exactly. See demo/DEMO.md for the full Airflow + MLflow conference demo and talk track.
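The pole formula can be explored directly. The sketch below takes the closed-loop pole 1 − lr + lr·g·κ from the text and assumes, as a simplification, a noiseless scalar recursion bias_next = pole · bias. The values lr = 0.1 and κ = 2.0 are hypothetical and are not claimed to be the simulator's defaults.

```python
def closed_loop_pole(lr, gain, kappa):
    """Pole of the closed loop as stated in the text: 1 - lr + lr * g * kappa."""
    return 1.0 - lr + lr * gain * kappa


def critical_gain(kappa):
    """Gain at which the pole equals 1: solve -lr + lr * g * kappa = 0 for g."""
    return 1.0 / kappa


def simulate_bias(lr, gain, kappa, b0=1.0, cycles=50):
    """Noise-free scalar recursion bias_next = pole * bias (illustrative assumption)."""
    pole = closed_loop_pole(lr, gain, kappa)
    bias = [b0]
    for _ in range(cycles):
        bias.append(pole * bias[-1])
    return bias


if __name__ == "__main__":
    LR, KAPPA = 0.1, 2.0  # hypothetical values
    print("critical gain:", critical_gain(KAPPA))
    for g in (0.3, 0.65):
        print("gain", g, "pole", closed_loop_pole(LR, g, KAPPA),
              "bias after 50 cycles", simulate_bias(LR, g, KAPPA)[-1])
```

Note that lr cancels when solving for the critical gain, so the learning rate changes how fast the loop converges or diverges but not which side of the boundary it is on.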

Development

uv venv .venv && uv pip install -e '.[dev,plot]'
.venv/bin/pytest
.venv/bin/ruff check src tests

See CONTRIBUTING.md for what makes a good PR (new state helpers, new orchestrator gates, detector invariants).

Citing

If you use lyapmon in your work, please cite it (CITATION.cff):

@software{lyapmon,
  author  = {Nguyen, Thuy},
  title   = {lyapmon: online Lyapunov-drift monitoring for ML retraining loops},
  url     = {https://github.com/sophie-nguyenthuthuy/lyapmon},
  version = {0.1.0},
  year    = {2026},
  license = {Apache-2.0},
}

License

Apache-2.0.

Project details

Release history

This version

0.1.0

Jun 10, 2026

Download files

Source Distribution

lyapmon-0.1.0.tar.gz (137.3 kB)

Uploaded Jun 10, 2026 Source

Built Distribution

lyapmon-0.1.0-py3-none-any.whl (27.7 kB)

Uploaded Jun 10, 2026 Python 3

File details

Details for the file lyapmon-0.1.0.tar.gz.

File metadata

Download URL: lyapmon-0.1.0.tar.gz
Upload date: Jun 10, 2026
Size: 137.3 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for lyapmon-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`72bb9c1795f36b4912efde32d0b32aced25ec03ae82b7ba9b5121e7686388cba`
MD5	`4fca95d5b55582c5a2e83de7e890b80c`
BLAKE2b-256	`b6214ac0dc4cb0bcc7e1594b2cd2fd11049c163c2dca8f4a45dffdea2e327a9f`

Provenance

The following attestation bundles were made for lyapmon-0.1.0.tar.gz:

Publisher: release.yml on sophie-nguyenthuthuy/lyapmon

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: lyapmon-0.1.0.tar.gz
- Subject digest: 72bb9c1795f36b4912efde32d0b32aced25ec03ae82b7ba9b5121e7686388cba
- Sigstore transparency entry: 1779372381
- Sigstore integration time: Jun 10, 2026
Source repository:
- Permalink: sophie-nguyenthuthuy/lyapmon@e30726acde6ee5638a0ab1bda0ac5523cd53b44f
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/sophie-nguyenthuthuy
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@e30726acde6ee5638a0ab1bda0ac5523cd53b44f
- Trigger Event: push

File details

Details for the file lyapmon-0.1.0-py3-none-any.whl.

File metadata

Download URL: lyapmon-0.1.0-py3-none-any.whl
Upload date: Jun 10, 2026
Size: 27.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for lyapmon-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`e6aa54c0a818f1546e51e2bb6e05bb9711c616c28a8a99d861b0f73423ab8112`
MD5	`418d15487537cb87eb2d5152c627a41e`
BLAKE2b-256	`d06c2190b7c22422d1002c55edbf64b8da3aad8eba462478e7e64f1ab09caa0a`

Provenance

The following attestation bundles were made for lyapmon-0.1.0-py3-none-any.whl:

Publisher: release.yml on sophie-nguyenthuthuy/lyapmon

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: lyapmon-0.1.0-py3-none-any.whl
- Subject digest: e6aa54c0a818f1546e51e2bb6e05bb9711c616c28a8a99d861b0f73423ab8112
- Sigstore transparency entry: 1779372525
- Sigstore integration time: Jun 10, 2026
Source repository:
- Permalink: sophie-nguyenthuthuy/lyapmon@e30726acde6ee5638a0ab1bda0ac5523cd53b44f
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/sophie-nguyenthuthuy
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@e30726acde6ee5638a0ab1bda0ac5523cd53b44f
- Trigger Event: push
