Embedding Drift Monitor (EDM)

Quantify drift between two embedding spaces over the same corpus.
Quantify how much an embedding space changes when the underlying model is updated, swapped, or retrained. Given the same corpus embedded under two models (or two versions of the same model), EDM computes a six-metric battery and produces an opinionated report identifying where the space shifted and how severely.


Installation

pip install embedding-drift-monitor

Dev install from source:

git clone https://github.com/Datasculptures/embedding-drift-monitor
cd embedding-drift-monitor
pip install -e ".[dev]"

Requires Python 3.11+ and a C compiler for the HDBSCAN dependency. On Windows, install Microsoft C++ Build Tools first.


Quickstart

edm compare reference.npy candidate.npy

That's it. EDM loads both matrices, runs all six metrics, classifies severity, and prints a report. The matrices must have the same number of rows (one row per corpus item) and can have different column counts.

For a fast single-metric check:

edm quick reference.npy candidate.npy
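Both commands expect two aligned matrices already saved to disk. A minimal sketch of producing suitable inputs with NumPy — the shapes, values, and file names here are placeholders, not EDM requirements beyond the matching row count:

```python
import numpy as np

rng = np.random.default_rng(0)

# One row per corpus item; the two models may use different dimensions.
reference = rng.normal(size=(1000, 768)).astype(np.float32)  # placeholder "old model" output
candidate = rng.normal(size=(1000, 384)).astype(np.float32)  # placeholder "new model" output

# Row counts must match; column counts may differ.
np.save("reference.npy", reference)
np.save("candidate.npy", candidate)
```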

CLI Reference

edm compare

Full drift analysis. Runs all six metrics and produces a report.

edm compare EMBEDDINGS_A EMBEDDINGS_B [OPTIONS]
Option Default Description
-k, --k-values 5,10,25,50 Comma-separated k values for neighbourhood analysis
-f, --format text Output format: text, json, or markdown
-o, --output stdout Write output to this file
-q, --quiet off Suppress progress messages on stderr
--force off Overwrite existing output file
-l, --labels (none) Labels file (one label per line, corpus row order)
-m, --metadata (none) Metadata file (.csv or .json) with per-item attributes
--include-per-point off Embed per-point arrays in JSON output (required for identify-regions)
--exclude-nan off Drop rows containing NaN or Inf before analysis
--sample-size 5000 Max pairs sampled for distance/geometry metrics
--seed 0 RNG seed for deterministic sampling
--config (none) TOML file with custom severity thresholds
--no-distance off Skip the KS distance-distribution metric
--no-geometry off Skip the Mantel global-geometry metric
--no-clusters off Skip the HDBSCAN cluster-stability metric
--no-hubs off Skip the hubness N_k shift metric
--min-cluster-size 5 Minimum cluster size for HDBSCAN

Examples:

# Text report to stdout
edm compare ref.npy cand.npy

# JSON output with per-point arrays, saved to file
edm compare ref.npy cand.npy --format json --include-per-point -o results.json

# Markdown report with region breakdown
edm compare ref.npy cand.npy --labels labels.txt --format markdown -o report.md

# Skip slow metrics on a large corpus
edm compare ref.npy cand.npy --no-clusters --no-geometry -k 5,10

# Exclude rows with NaN/Inf before analysis
edm compare ref.npy cand.npy --exclude-nan

edm quick

Fast single-metric check: Jaccard stability at k=10 only. No distance, geometry, cluster, or hubness metrics.

edm quick EMBEDDINGS_A EMBEDDINGS_B

No options. Prints one line:

Neighbourhood stability (k=10): 0.7234 [MODERATE DRIFT]

edm report

Convert a saved JSON results file to text or markdown.

edm report JSON_RESULTS [OPTIONS]
Option Default Description
-f, --format markdown Output format: text or markdown
-o, --output stdout Write to this file
--force off Overwrite existing output file
edm report results.json --format markdown -o report.md

edm identify-regions

Apply a labels file to saved JSON results (which must have been produced with --include-per-point) to compute a per-region drift breakdown.

edm identify-regions JSON_RESULTS -l LABELS_FILE [OPTIONS]
Option Default Description
-l, --labels required Labels file — one label per line
-f, --format text Output format: text, json, or markdown
-o, --output stdout Write to this file
--force off Overwrite existing output file
edm identify-regions results.json -l labels.txt --format markdown

Input Formats

Embedding matrices

Accepted formats: .npy, .npz, .csv, .tsv

  • .npy: NumPy binary format. Fastest. No pickle — allow_pickle=False.
  • .npz: NumPy compressed archive with exactly one array.
  • .csv / .tsv: Numeric data, optional header row. 10 MB cap.

Both matrices must have the same number of rows. Column counts may differ (different model dimensions are valid). File size cap: 50 MB per embedding file.
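These rules can be checked before invoking the CLI. A hypothetical helper (check_pair is illustrative, not part of the EDM API) that mirrors them for .npy inputs:

```python
import numpy as np

def check_pair(path_a: str, path_b: str) -> tuple[int, int, int]:
    # Validate two embedding files per the rules above: same row count
    # required, column counts free to differ. Loads without pickle,
    # matching the documented .npy handling.
    a = np.load(path_a, allow_pickle=False)
    b = np.load(path_b, allow_pickle=False)
    if a.ndim != 2 or b.ndim != 2:
        raise ValueError("expected 2-D matrices")
    if a.shape[0] != b.shape[0]:
        raise ValueError(f"row mismatch: {a.shape[0]} vs {b.shape[0]}")
    return a.shape[0], a.shape[1], b.shape[1]
```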

Labels file

Plain text, one label per line, UTF-8 encoding. Blank lines are not allowed. The label count must match the number of embedding rows exactly. Leading/trailing whitespace is stripped. Both LF and CRLF line endings are accepted.

electronics
electronics
books
clothing
books
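A reader following these rules is a few lines of Python. The function name load_labels is hypothetical, shown only to make the constraints concrete:

```python
def load_labels(path: str, n_rows: int) -> list[str]:
    # UTF-8, strip surrounding whitespace (which also handles CRLF),
    # reject blank lines, and require an exact count match.
    with open(path, encoding="utf-8") as f:
        labels = [line.strip() for line in f]
    if any(not lab for lab in labels):
        raise ValueError("blank lines are not allowed")
    if len(labels) != n_rows:
        raise ValueError(f"{len(labels)} labels for {n_rows} embedding rows")
    return labels
```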

Metadata file

CSV or JSON with one record per corpus item.

CSV: First row is a header. Subsequent rows are per-item values.

JSON: Top-level array of objects, one per item.

Numeric columns are stored as float64; columns that cannot be parsed as floats are stored as string arrays.
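The column-typing rule can be sketched as follows; type_columns is a hypothetical illustration of the described behaviour, not EDM's actual loader:

```python
import numpy as np

def type_columns(rows: list[dict]) -> dict:
    # A column becomes float64 only if every value parses as a float;
    # otherwise the whole column is kept as a string array.
    cols = {key: [str(r[key]) for r in rows] for key in rows[0]}
    out = {}
    for name, values in cols.items():
        try:
            out[name] = np.array([float(v) for v in values], dtype=np.float64)
        except ValueError:
            out[name] = np.array(values, dtype=str)
    return out
```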


Metric Battery

  1. Neighbourhood stability (Jaccard): what fraction of each point's k nearest neighbours are the same in both spaces? A score of 1.0 means the local structure is perfectly preserved; a score near 0 means most neighbours changed.
  2. Neighbourhood rank correlation (Spearman): among the common neighbours, do the distance rankings agree? Measures whether relative proximity is preserved, not just set membership.
  3. Distance distribution shift (KS statistic): do pairwise distances follow the same statistical distribution in both spaces? A high KS statistic means the global scale has shifted.
  4. Global geometry (Mantel correlation): is the pairwise distance matrix similar in both spaces? Measures overall geometric correspondence. A high Mantel r means items that were far apart remain far apart.
  5. Cluster membership stability (HDBSCAN + ARI): do the natural clusters found in one space correspond to those in the other? Adjusted Rand Index: 1.0 is a perfect match, 0.0 is random.
  6. Hubness shift (N_k Spearman): do the same items dominate as hub nodes (appearing frequently as a nearest neighbour)? A high correlation means hub structure is preserved.

All six metrics run by default. Skip expensive ones with --no-clusters, --no-geometry, etc.
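The first metric follows directly from its definition. A toy brute-force sketch (Euclidean distances, self excluded) that mirrors the description above — EDM's internal implementation may differ:

```python
import numpy as np

def jaccard_stability(a: np.ndarray, b: np.ndarray, k: int = 10) -> float:
    # Mean Jaccard overlap of each point's k nearest neighbours in the
    # two spaces. 1.0 = local structure perfectly preserved.
    def knn(x):
        d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
        np.fill_diagonal(d, np.inf)          # a point is not its own neighbour
        return np.argsort(d, axis=1)[:, :k]
    na, nb = knn(a), knn(b)
    scores = []
    for i in range(len(a)):
        sa, sb = set(na[i]), set(nb[i])
        scores.append(len(sa & sb) / len(sa | sb))
    return float(np.mean(scores))
```

The O(n^2) distance matrix keeps this sketch to small corpora; it is for understanding the metric, not for production use.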


Severity Interpretation

Each metric is classified as LOW, MODERATE, HIGH, or CRITICAL based on configurable thresholds. The overall severity is the worst individual metric.

Severity Meaning Recommended action
LOW Minimal drift Safe to deploy
MODERATE Notable drift Review before deploying
HIGH Significant drift Full review required
CRITICAL Severe drift Do not deploy without investigation

Default thresholds (Jaccard stability, higher is better):

Threshold Severity
>= 0.80 LOW
>= 0.50 MODERATE
>= 0.20 HIGH
< 0.20 CRITICAL
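The table above is a simple threshold cascade. A sketch with the default Jaccard values, plus the inverted comparison used for lower-is-better metrics such as the KS statistic (the function name classify is illustrative, not EDM's API):

```python
def classify(score: float, low: float = 0.80, moderate: float = 0.50,
             high: float = 0.20, higher_is_better: bool = True) -> str:
    # Default thresholds match the Jaccard table above.
    if higher_is_better:
        if score >= low:
            return "LOW"
        if score >= moderate:
            return "MODERATE"
        if score >= high:
            return "HIGH"
        return "CRITICAL"
    # Lower-is-better metrics (e.g. KS statistic) invert the comparison.
    if score <= low:
        return "LOW"
    if score <= moderate:
        return "MODERATE"
    if score <= high:
        return "HIGH"
    return "CRITICAL"
```

Applied to the quickstart output, a Jaccard stability of 0.7234 falls between 0.50 and 0.80, hence [MODERATE DRIFT].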

Configuration

Override thresholds via a TOML file named .edm.toml:

[thresholds.jaccard]
low = 0.85
moderate = 0.60
high = 0.30
higher_is_better = true

[thresholds.ks_statistic]
low = 0.05
moderate = 0.15
high = 0.30
higher_is_better = false

Pass it to any compare run:

edm compare ref.npy cand.npy --config .edm.toml

Output Formats

Text (default)

Human-readable. Sections separated by dividers. Suitable for terminal output and log files.

JSON

Machine-readable. Contains all metrics, per-metric severities, overall severity, and the recommendation. Use --include-per-point to add per-point score arrays (required for identify-regions).

JSON schema version: 4.0. Top-level keys: version, reference, candidate, corpus, overall_severity, metric_severities, metrics, regions, recommendation, warnings.

NaN values are serialized as null.
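Note that Python's json module emits the non-standard token NaN by default, so code producing or post-processing compatible JSON needs to substitute None first. One hedged way to reproduce the documented behaviour (nan_to_null is a hypothetical helper):

```python
import json
import math

def nan_to_null(obj):
    # Recursively replace non-finite floats (NaN, Inf) with None so that
    # json.dumps emits null, matching the serialization rule above.
    if isinstance(obj, float) and not math.isfinite(obj):
        return None
    if isinstance(obj, list):
        return [nan_to_null(v) for v in obj]
    if isinstance(obj, dict):
        return {k: nan_to_null(v) for k, v in obj.items()}
    return obj

payload = {"scores": [0.9, float("nan"), 0.4]}
print(json.dumps(nan_to_null(payload)))   # {"scores": [0.9, null, 0.4]}
```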

Markdown

Full structured report suitable for GitHub, Notion, or documentation systems. Includes a metadata table, one section per metric, a drift regions table (if labels were provided), and a recommendation.


Performance Notes

Corpus size Dimensions Approx. time (all metrics)
1K items 768d < 5s
10K items 768d < 30s
100K items 768d < 5 min (with default --sample-size)

HDBSCAN cluster analysis is O(n^2) in memory. For corpora larger than ~23K items, EDM will warn that it may require > 4 GB RAM. Use --no-clusters to skip it.

Increasing --sample-size beyond 10,000 yields diminishing returns in statistical accuracy at significant computational cost. The default of 5,000 is appropriate for most corpora.
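Seeded sampling is what makes --seed reproducible: the same seed yields the same sampled pairs on every run. A hypothetical sketch of the idea (sample_pairs is illustrative, not EDM's internal routine):

```python
import numpy as np

def sample_pairs(n_items: int, sample_size: int = 5000, seed: int = 0) -> np.ndarray:
    # Draw index pairs with a seeded generator so repeated runs with the
    # same seed sample exactly the same pairs.
    rng = np.random.default_rng(seed)
    i = rng.integers(0, n_items, size=sample_size)
    j = rng.integers(0, n_items, size=sample_size)
    keep = i != j                      # drop self-pairs
    return np.stack([i[keep], j[keep]], axis=1)
```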


Portfolio Context

EDM is the third tool in the datasculptures embedding space trilogy:

  • LLE — What structures exist in this embedding space? (Exploration)
  • RQB — When I reduce this space, how much structure is preserved? (Evaluation)
  • EDM — When the underlying model changes, how much structure shifts? (Monitoring)

datasculptures.com


License

MIT License — Copyright 2026 Sean Patrick Morris / datasculptures
