Embedding Drift Monitor (EDM)

Quantify drift between two embedding spaces over the same corpus.
Quantify how much an embedding space changes when the underlying model is updated, swapped, or retrained. Given the same corpus embedded under two models (or two versions of the same model), EDM computes a six-metric battery and produces an opinionated report identifying where the space shifted and how severely.
Installation
pip install embedding-drift-monitor
Dev install from source:
git clone https://github.com/Datasculptures/embedding-drift-monitor
cd embedding-drift-monitor
pip install -e .[dev]
Requires Python 3.11+ and a C compiler for the HDBSCAN dependency. On Windows, install Microsoft C++ Build Tools first.
Quickstart
edm compare reference.npy candidate.npy
That's it. EDM loads both matrices, runs all six metrics, classifies severity, and prints a report. The matrices must have the same number of rows (one row per corpus item) and can have different column counts.
For a fast single-metric check:
edm quick reference.npy candidate.npy
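The quickstart inputs are plain NumPy arrays saved to disk. A minimal sketch that generates a synthetic pair for experimentation (random data; the filenames simply match the commands above):

```python
import numpy as np

# Two toy embeddings of the same 100-item corpus. Row counts must
# match (one row per item); column counts may differ, e.g. a 384-d
# model versus a 768-d model.
rng = np.random.default_rng(seed=0)
reference = rng.normal(size=(100, 384)).astype(np.float32)
candidate = rng.normal(size=(100, 768)).astype(np.float32)

np.save("reference.npy", reference)
np.save("candidate.npy", candidate)
# Then: edm compare reference.npy candidate.npy
```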
CLI Reference
edm compare
Full drift analysis. Runs all six metrics and produces a report.
edm compare EMBEDDINGS_A EMBEDDINGS_B [OPTIONS]
| Option | Default | Description |
|---|---|---|
| -k, --k-values | 5,10,25,50 | Comma-separated k values for neighbourhood analysis |
| -f, --format | text | Output format: text, json, or markdown |
| -o, --output | stdout | Write output to this file |
| -q, --quiet | off | Suppress progress messages on stderr |
| --force | off | Overwrite existing output file |
| -l, --labels | — | Labels file (one label per line, corpus row order) |
| -m, --metadata | — | Metadata file (.csv or .json) with per-item attributes |
| --include-per-point | off | Embed per-point arrays in JSON output (required for identify-regions) |
| --exclude-nan | off | Drop rows containing NaN or Inf before analysis |
| --sample-size | 5000 | Maximum pairs sampled for distance/geometry metrics |
| --seed | 0 | RNG seed for deterministic sampling |
| --config | — | TOML file with custom severity thresholds |
| --no-distance | off | Skip the KS distance-distribution metric |
| --no-geometry | off | Skip the Mantel global-geometry metric |
| --no-clusters | off | Skip the HDBSCAN cluster-stability metric |
| --no-hubs | off | Skip the hubness N_k shift metric |
| --min-cluster-size | 5 | Minimum cluster size for HDBSCAN |
Examples:
# Text report to stdout
edm compare ref.npy cand.npy
# JSON output with per-point arrays, saved to file
edm compare ref.npy cand.npy --format json --include-per-point -o results.json
# Markdown report with region breakdown
edm compare ref.npy cand.npy --labels labels.txt --format markdown -o report.md
# Skip slow metrics on a large corpus
edm compare ref.npy cand.npy --no-clusters --no-geometry -k 5,10
# Exclude rows with NaN/Inf before analysis
edm compare ref.npy cand.npy --exclude-nan
edm quick
Fast single-metric check: Jaccard stability at k=10 only. No distance, geometry, cluster, or hubness metrics.
edm quick EMBEDDINGS_A EMBEDDINGS_B
No options. Prints one line:
Neighbourhood stability (k=10): 0.7234 [MODERATE DRIFT]
edm report
Convert a saved JSON results file to text or markdown.
edm report JSON_RESULTS [OPTIONS]
| Option | Default | Description |
|---|---|---|
| -f, --format | markdown | text or markdown |
| -o, --output | stdout | Write to this file |
| --force | off | Overwrite existing output file |
edm report results.json --format markdown -o report.md
edm identify-regions
Apply a labels file to saved JSON results (which must have been produced with --include-per-point) to compute a per-region drift breakdown.
edm identify-regions JSON_RESULTS -l LABELS_FILE [OPTIONS]
| Option | Default | Description |
|---|---|---|
| -l, --labels | required | Labels file (one label per line) |
| -f, --format | text | text, json, or markdown |
| -o, --output | stdout | Write to this file |
| --force | off | Overwrite existing output file |
edm identify-regions results.json -l labels.txt --format markdown
Input Formats
Embedding matrices
Accepted formats: .npy, .npz, .csv, .tsv
- .npy: NumPy binary format. Fastest. No pickle (loaded with allow_pickle=False).
- .npz: NumPy compressed archive with exactly one array.
- .csv / .tsv: Numeric data, optional header row. 10 MB cap.
Both matrices must have the same number of rows. Column counts may differ (different model dimensions are valid). File size cap: 50 MB per embedding file.
Labels file
Plain text, one label per line, UTF-8 encoding. Blank lines are not allowed. The label count must match the number of embedding rows exactly. Leading/trailing whitespace is stripped. Both LF and CRLF line endings are accepted.
electronics
electronics
books
clothing
books
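The rules above (UTF-8, no blank lines, count must match the row count) can be sketched as a small helper. This is a hypothetical function for illustration, not part of EDM's API:

```python
def validate_labels(path, n_rows):
    """Read a labels file (one label per line, UTF-8) and check it
    lines up with the embedding rows: no blank lines, exact count
    match. Leading/trailing whitespace (including CRLF) is stripped."""
    with open(path, encoding="utf-8") as f:
        labels = [line.strip() for line in f]
    if any(label == "" for label in labels):
        raise ValueError("blank lines are not allowed in a labels file")
    if len(labels) != n_rows:
        raise ValueError(f"{len(labels)} labels for {n_rows} embedding rows")
    return labels
```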
Metadata file
CSV or JSON with one record per corpus item.
CSV: First row is a header. Subsequent rows are per-item values.
JSON: Top-level array of objects, one per item.
Numeric columns are stored as float64; columns that cannot be parsed as floats are stored as string arrays.
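The column-typing rule can be sketched like this (an illustrative loader under the documented behaviour, not EDM's actual implementation):

```python
import csv

import numpy as np

def load_metadata_csv(path):
    """Parse a metadata CSV: first row is a header; each column becomes
    float64 if every value parses as a float, otherwise a string array."""
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
    header, data = rows[0], rows[1:]
    columns = {}
    for i, name in enumerate(header):
        values = [row[i] for row in data]
        try:
            columns[name] = np.array(values, dtype=np.float64)
        except ValueError:
            # Non-numeric column: fall back to a string array.
            columns[name] = np.array(values, dtype=str)
    return columns
```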
Metric Battery
| # | Metric | What it measures |
|---|---|---|
| 1 | Neighbourhood stability (Jaccard) | What fraction of each point's k nearest neighbours are the same in both spaces? A score of 1.0 means the local structure is perfectly preserved. A score near 0 means most neighbours changed. |
| 2 | Neighbourhood rank correlation (Spearman) | Among the common neighbours, do the distance rankings agree? Measures whether relative proximity is preserved, not just set membership. |
| 3 | Distance distribution shift (KS statistic) | Do pairwise distances follow the same statistical distribution in both spaces? A high KS statistic means the global scale has shifted. |
| 4 | Global geometry (Mantel correlation) | Is the pairwise distance matrix similar in both spaces? Measures overall geometric correspondence. A high Mantel r means items that were far apart remain far apart. |
| 5 | Cluster membership stability (HDBSCAN + ARI) | Do the natural clusters found in one space correspond to those in the other? Adjusted Rand Index — 1.0 is a perfect match, 0.0 is random. |
| 6 | Hubness shift (N_k Spearman) | Do the same items dominate as hub nodes (appearing frequently as a nearest neighbour)? A high correlation means hub structure is preserved. |
All six metrics run by default. Skip expensive ones with --no-clusters, --no-geometry, etc.
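Metric 1 (Jaccard neighbourhood stability) can be illustrated with a brute-force sketch; EDM's own neighbour search and sampling may differ:

```python
import numpy as np

def jaccard_stability(emb_a, emb_b, k=10):
    """Mean Jaccard overlap of each point's k nearest neighbours
    across two embeddings of the same corpus."""
    def knn_sets(emb):
        # Full pairwise Euclidean distance matrix; exclude self-matches.
        d = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=-1)
        np.fill_diagonal(d, np.inf)
        return [set(row) for row in np.argsort(d, axis=1)[:, :k]]
    per_point = [
        len(a & b) / len(a | b)
        for a, b in zip(knn_sets(emb_a), knn_sets(emb_b))
    ]
    return float(np.mean(per_point))
```

An unchanged space scores 1.0; the lower the score, the more each point's local neighbourhood has been reshuffled.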
Severity Interpretation
Each metric is classified as LOW, MODERATE, HIGH, or CRITICAL based on configurable thresholds. The overall severity is the worst individual metric.
| Severity | Meaning | Recommended action |
|---|---|---|
| LOW | Minimal drift | Safe to deploy |
| MODERATE | Notable drift | Review before deploying |
| HIGH | Significant drift | Full review required |
| CRITICAL | Severe drift | Do not deploy without investigation |
Default thresholds (Jaccard stability, higher is better):
| Threshold | Severity |
|---|---|
| >= 0.80 | LOW |
| >= 0.50 | MODERATE |
| >= 0.20 | HIGH |
| < 0.20 | CRITICAL |
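The banding above can be sketched as a small classifier using the default Jaccard thresholds; the flipped comparison for lower-is-better metrics (such as the KS statistic) is an assumption about how the thresholds generalize, not EDM's actual code:

```python
def classify(score, low=0.80, moderate=0.50, high=0.20, higher_is_better=True):
    """Map a metric score to a severity band. Defaults are the
    documented Jaccard thresholds (higher is better)."""
    if not higher_is_better:
        # Negate both score and thresholds so the same comparisons apply.
        score = -score
        low, moderate, high = -low, -moderate, -high
    if score >= low:
        return "LOW"
    if score >= moderate:
        return "MODERATE"
    if score >= high:
        return "HIGH"
    return "CRITICAL"
```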
Configuration
Override thresholds via a TOML file named .edm.toml:
[thresholds.jaccard]
low = 0.85
moderate = 0.60
high = 0.30
higher_is_better = true
[thresholds.ks_statistic]
low = 0.05
moderate = 0.15
high = 0.30
higher_is_better = false
Pass it to any compare run:
edm compare ref.npy cand.npy --config .edm.toml
Output Formats
Text (default)
Human-readable. Sections separated by dividers. Suitable for terminal output and log files.
JSON
Machine-readable. Contains all metrics, per-metric severities, overall severity, and the recommendation. Use --include-per-point to add per-point score arrays (required for identify-regions).
JSON schema version: 4.0. Top-level keys: version, reference, candidate, corpus, overall_severity, metric_severities, metrics, regions, recommendation, warnings.
NaN values are serialized as null.
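A sketch of consuming those schema keys downstream; the payload here is a hand-written stand-in, not real EDM output:

```python
import json

# Minimal stand-in payload using the documented top-level keys.
payload = """{
  "version": "4.0",
  "overall_severity": "MODERATE",
  "metric_severities": {"jaccard": "MODERATE", "ks_statistic": "LOW"},
  "metrics": {"jaccard": 0.72, "ks_statistic": null},
  "recommendation": "Review before deploying"
}"""

results = json.loads(payload)
# NaN is serialized as null, which json.loads returns as None, so
# skipped or undefined metrics can be filtered out explicitly.
skipped = [name for name, value in results["metrics"].items() if value is None]
```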
Markdown
Full structured report suitable for GitHub, Notion, or documentation systems. Includes a metadata table, one section per metric, a drift regions table (if labels were provided), and a recommendation.
Performance Notes
| Corpus size | Dimensions | Approx. time (all metrics) |
|---|---|---|
| 1K items | 768d | < 5s |
| 10K items | 768d | < 30s |
| 100K items | 768d | < 5 min (with default --sample-size) |
HDBSCAN cluster analysis is O(n^2) in memory. For corpora larger than ~23K items, EDM will warn that it may require > 4 GB RAM. Use --no-clusters to skip it.
Increasing --sample-size beyond 10,000 yields diminishing returns in statistical accuracy at a significant runtime cost. The default of 5,000 is appropriate for most corpora.
Portfolio Context
EDM is the third tool in the datasculptures embedding space trilogy:
- LLE — What structures exist in this embedding space? (Exploration)
- RQB — When I reduce this space, how much structure is preserved? (Evaluation)
- EDM — When the underlying model changes, how much structure shifts? (Monitoring)
License
MIT License — Copyright 2026 Sean Patrick Morris / datasculptures