Perturbation Matching Hypothesis (PMH): estimate Sigma_task, matched PMH training, falsification controls — PyTorch, sklearn, HF.

matching-pmh

Train on site A. Deploy on site B. Same labels.

Deploy QA gate: for models that fit the training data but break at deploy (same task, same labels, different site, camera, or corpus). PMH estimates how representations should move at deploy, trains with a shift-matched penalty, and returns a SHIP or DO NOT SHIP verdict only after the matched arm beats the wrong-direction and generic controls on the deploy holdout.

This repository ships the Perturbation Matching Hypothesis (PMH) — a geometric theory of how training losses should respond to label-preserving deployment change. The paper (main.pdf) argues that domain shift, sensor noise, augmentation stress, compositional drift, temporal drift, style, and classical anisotropic penalties are one statistical problem: estimate the deployment nuisance covariance $\Sigma_{\text{task}}$, then train so the encoder Jacobian is matched to that geometry. CORAL, adversarial training, augmentation, metric learning, and alignment constraints become different estimators of the same object, not unrelated “robustness tricks.”

matching-pmh is the library plus thirteen worked demos that implement the paper’s five-step recipe on real stacks (PyTorch, sklearn, Hugging Face). You are not picking a regularizer off a menu; you are identifying $\Sigma_{\text{task}}$ for your deploy shift and applying the matched PMH loss from Eq. (4) in the paper.

The idea in plain language

What moves at deploy without changing the label?
Examples: new camera or hospital (vision), new microphone (speech), new writing style (LLM), known lighting aug (depth), PGD-like perturbations (security). All of these are instances of a single random displacement $n$ with covariance $\Sigma_{\text{task}} = \mathrm{Cov}(n)$.
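
As a minimal sketch of that definition, assuming you can encode the same inputs under both conditions (paired features), $\Sigma_{\text{task}}$ can be approximated by the empirical covariance of the displacement. The arrays and the injected nuisance direction below are synthetic and hypothetical; this is plain NumPy, not the library's estimator.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical paired features: the same 500 inputs encoded at site A and site B.
d = 4
feats_a = rng.normal(size=(500, d))

# Synthetic nuisance: displacement concentrated along the first coordinate.
true_direction = np.array([1.0, 0.0, 0.0, 0.0])
n = rng.normal(scale=2.0, size=(500, 1)) * true_direction
feats_b = feats_a + n

# Displacement samples and their empirical covariance (rows are samples).
displacement = feats_b - feats_a
sigma_task_hat = np.cov(displacement, rowvar=False)

# eigh returns eigenvalues in ascending order, so the last column is the top direction.
eigvals, eigvecs = np.linalg.eigh(sigma_task_hat)
top = eigvecs[:, -1]
alignment = abs(float(top @ true_direction))
print("eigenvalues:", np.round(eigvals, 3))
print("alignment with injected direction:", round(alignment, 4))
```

When pairs are not available, the displacement has to be estimated indirectly, which is what the seven estimator families described later are for.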

What training should do.
Add a PMH term that penalizes encoder Jacobian energy along a matrix $\Sigma'$ whose column space covers the nuisance range. When $\Sigma'$ is matched to $\Sigma_{\text{task}}$, deployment drift in representations can be driven down; when $\Sigma'$ is isotropic or wrong, the theory predicts specific failure modes — and the library runs those arms as controls, not optional extras.
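
A toy illustration of the three arms, assuming a linear encoder $h = Wx$ so that the Jacobian is simply $W$ and the energy along $\Sigma'$ is $\mathrm{tr}(W \Sigma' W^\top)$. For zero-mean nuisance this equals the expected squared drift $\mathbb{E}\lVert Wn \rVert^2$ when $\Sigma' = \Sigma_{\text{task}}$. The function name and the matrices are hypothetical; this is not the paper's Eq. (4) or the library's `PMHLoss`.

```python
import numpy as np

def jacobian_energy(W, sigma):
    """tr(W sigma W^T): Jacobian energy of the linear encoder h = W x along sigma."""
    return float(np.trace(W @ sigma @ W.T))

d = 3
u = np.array([1.0, 0.0, 0.0])   # nuisance direction at deploy
v = np.array([0.0, 1.0, 0.0])   # an unrelated ("wrong") direction

sigma_task = np.outer(u, u)     # rank-1 deployment nuisance covariance
sigma_wrong = np.outer(v, v)    # wrong-direction control
sigma_iso = np.eye(d) / d       # isotropic control with the same trace

# One encoder still reads the nuisance coordinate; the other ignores it.
W_sensitive = np.array([[1.0, 0.0, 1.0]])
W_invariant = np.array([[0.0, 0.0, 1.0]])

for name, W in [("sensitive", W_sensitive), ("invariant", W_invariant)]:
    print(
        name,
        "matched:", jacobian_energy(W, sigma_task),
        "wrong:", jacobian_energy(W, sigma_wrong),
        "isotropic:", round(jacobian_energy(W, sigma_iso), 4),
    )
```

In this toy case the matched penalty separates the two encoders (1.0 versus 0.0), the wrong-direction penalty is 0.0 for both and so cannot see the drift, and the isotropic penalty still charges the invariant encoder for the coordinate it needs.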

What makes this a theory, not a hack.
The paper proves range coverage is necessary for quadratic Jacobian penalties, gives matched sufficiency in the linear model, extends to deep global minima under stated assumptions, and supplies falsification lemmas (wrong subspace, signal-aligned penalty) tested before you trust a deploy gain. See main.pdf §2–5 and block findings.

The five-step recipe (product spine)

Same steps in every notebook (§1–§8) and in pmh.recipe:

Step	Question you answer	Library entry points
0 — Scope	Same label semantics on train (A) and deploy (B)?	`check_applicability()`
1 — Identify	Which nuisance family fits? (seven types, D1–D7)	`suggest_nuisance()` · task table below
2 — Estimate	$\hat{\Sigma}_{\text{task}}$ from your data	`PMHTrainer.estimate()` · `PMHMatcher.fit()` · `estimate_style_sigma()`
3 — Apply	Matched PMH on hook $h$ (train) or projection (frozen features)	`PMHTrainer.fit` · `robust_fit` · `PMHLoss`
4 — Protocol	Keep PMH at 5–30% of task loss (hard cap)	`PMHConfig.golden_path()` · LOSS_SCALING
5 — Evidence	Matched beats wrong-direction and isotropic on deploy holdout	`evaluate_robust_fit` · `evaluate_baseline_vs_pmh`

Scope → identify nuisance family → estimate $\Sigma_{\text{task}}$ → matched PMH train → falsify on deploy holdout.
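
Step 4's band can be illustrated with a small helper. This is a hypothetical sketch, not the logic in `PMHConfig.golden_path()` or docs/LOSS_SCALING.md: it assumes scalar loss values and clamps the weighted PMH term into 5–30% of the task loss. The function name, the lower-bound clamp, and the defaults are assumptions.

```python
def capped_pmh_weight(task_loss, pmh_loss, weight, lo=0.05, hi=0.30):
    """Return a weight such that weight * pmh_loss lies within [lo, hi] * task_loss.

    Hypothetical helper: losses are plain floats measured on the current batch.
    """
    if task_loss <= 0.0 or pmh_loss <= 0.0:
        return weight  # nothing sensible to rescale against
    ratio = weight * pmh_loss / task_loss
    if ratio > hi:
        return hi * task_loss / pmh_loss   # hard cap at the top of the band
    if ratio < lo:
        return lo * task_loss / pmh_loss   # lift to the bottom of the band
    return weight


# PMH term would be 5x the task loss: the weight is cut so that it is 30%.
too_large = capped_pmh_weight(task_loss=2.0, pmh_loss=10.0, weight=1.0)
# Already at 10% of the task loss: the weight is left unchanged.
in_band = capped_pmh_weight(task_loss=2.0, pmh_loss=1.0, weight=0.2)
# Only 0.5% of the task loss: the weight is raised so that it is 5%.
too_small = capped_pmh_weight(task_loss=2.0, pmh_loss=1.0, weight=0.01)
print(too_large, in_band, too_small)
```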

Details: Quickstart · Will PMH help? · API

What this repo promises

We provide	We do not claim
A closed, falsifiable training recipe once $\Sigma_{\text{task}}$ is identified	Universality on every leaderboard
13 pre-registered blocks (T1–T7) as copy-paste playbooks	That matched PMH always beats CORAL, DANN, or PGD-AT
Built-in matched / wrong / isotropic arms (Lemma C, Cor. E in the paper)	PMH on label-changing shifts (e.g. spurious correlation)
Theory-aligned estimators D1–D7 + geometry probes (`tdi`, …)	One demo preset replaces your domain data or reproduces every paper table row without tuning

Honest boundaries (from the paper): Colored MNIST / Waterbirds-style label-correlated nuisance is out of scope; Office-31 is a documented case where estimator eigengap can fail (Lemma D1), not a silent bug.

Pre-registered evidence: 12/13 paper blocks pass their criteria in main.pdf (see findings.html); Office-31 is the predicted D1 failure when the cross-domain subspace is ill-conditioned — run Step 5 before shipping.
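
One way to see what an ill-conditioned subspace looks like is to inspect the gap between the eigenvalues you keep and the ones you discard. The sketch below is a generic diagnostic under assumptions: the `eigengap` function, its relative-gap definition, and the example spectra are hypothetical and are not taken from Lemma D1 or from the library's geometry probes.

```python
import numpy as np

def eigengap(sigma, rank):
    """Relative gap between the rank-th and (rank+1)-th eigenvalues of a PSD matrix."""
    vals = np.sort(np.linalg.eigvalsh(sigma))[::-1]   # descending order
    return float((vals[rank - 1] - vals[rank]) / vals[rank - 1])

# A clear rank-2 nuisance subspace: the third eigenvalue drops sharply.
well_separated = np.diag([5.0, 4.0, 0.1, 0.05])
# A nearly flat spectrum: any rank-2 subspace is close to arbitrary.
ill_conditioned = np.diag([1.0, 0.98, 0.97, 0.95])

gap_good = eigengap(well_separated, rank=2)
gap_bad = eigengap(ill_conditioned, rank=2)
print(round(gap_good, 4), round(gap_bad, 4))
```

A small gap means the estimated subspace can rotate substantially under resampling, so a matched arm built on it may not beat the controls in Step 5.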

Paper numbers vs this library

Block accuracies, mIoU gains, and other reported figures in the README, task pages, and main.pdf tables are paper results — full benchmarks, datasets, and schedules described in the PDF.

The matching-pmh package on PyPI (imported as `pmh`) is for general use on your stack: same estimators and five-step recipe, but different demo loaders, defaults, and integration paths. It will not replicate those paper numbers out of the box. Expect to iterate on hook choice, rank, PMHConfig / loss scale (LOSS_SCALING), the amount of target data, and Step 5 on your deploy holdout before you treat a run as “correct.” Notebooks under notebooks/tasks/ teach the workflow on built-in demos.

Short theory spine (no PDF required to start): docs/PRINCIPLE.md.
Synthesized block outcomes (HTML): docs/findings.html — regenerate with python scripts/build_findings_html.py.

Choose your depth

You want…	Open
Plain-language principle + five steps	docs/PRINCIPLE.md
“Will this help my deploy shift?”	docs/WHEN_PMH_HELPS.md
Copy-paste task for your nuisance	docs/tasks/index.md → notebook §8
Sklearn / frozen embeddings (T1)	t01-classical · `compare_arms_sklearn`
PyTorch site/camera (T4)	t04a-vision-domain · `PMHTrainer` (class-aligned D4)
Per-layer domain Gram (T4B)	t04b-multilayer-vision · `PMHTrainer(train_mode="feature_diff")`
Full proofs + block numbers	`main.pdf` · findings.html
Matched / wrong / isotropic benchmark	`run_benchmark_protocol` · `compare_arms`

Find your deployment story (T1 through T7)

Tasks are examples of the same principle — pick the closest deploy change, open the page + notebook, Run All on demo data, then plug in your pipeline in §8. Order follows the paper blocks (T1 first).

Task	What changes at deploy (labels fixed)	Real situations like yours	How $\hat{\Sigma}_{\text{task}}$ is built	`nuisance=`	Start
T1	Embedding cloud shifts between sites	Office-31; two labs’ tabular features; frozen ResNet vectors	Cross-domain subspace on features (D1)	`subspace`	T1
T2A	Undirected input corruption (no fixed direction)	ImageNet-C; sensor noise; blur/JPEG	Isotropic $\sigma^2 I$ (D2)	`isotropic`	T2A
T2B	Scanner / site appearance on X-ray	Hospital drift on CheXpert-style data	Isotropic $\sigma$ (D2)	`isotropic`	T2B
T3A	Camera / lighting; same keypoint semantics	Studio→wild pose; broadcast→fan video	Augmentation-induced deltas (D3)	`augmentation`	T3A
T3B	Photometry; depth meaning unchanged	Lighting on depth; synthetic→real RGB-D	Augmentation deltas (D3)	`augmentation`	T3B
T4A	New visual domain; same classes	Photo→sketch; warehouse A→B; country shift	Source−target feature Gram (D4)	`domain_shift`	T4A
T4B	Sim→real texture + layout; same seg map	GTA5→Cityscapes; synthetic seg→real	Domain Gram per layer (D4; paper multiscale)	`domain_shift`	T4B
T5A	3D atom coordinates move; property fixed	QM9 conformers; pose grids	Compositional blocks (D5)	`compositional`	T5A
T5B	Token groups change; code label fixed	Renames; comment stripping	Nuisance indices on tokens (D5)	`compositional`	T5B
T6A	Channel / room / codec; same transcript	New mic; Libri conditions	Temporal / content-residual (D6)	`temporal`	T6A
T6B	Sensor drift over time	HAR placement; IMU aging	Temporal residual (D6)	`temporal`	T6B
T7A	Surface form; facts unchanged	Bulleted vs prose; tone shift in LLMs	Style pairs → Gram (D7)	`style`	T7A
T7B	Adversarial directions at deploy	PGD stress; spoof patches	PGD delta subspace (D7)	`style` / PGD doc	T7B
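
For the augmentation rows (T3A, T3B), the "augmentation-induced deltas" idea can be sketched as follows: encode each input with and without the augmentation, collect the feature differences, and take their span. The encoder, the augmentation, and all names below are stand-ins chosen for illustration; this is plain NumPy, not the library's D3 estimator.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in encoder (hypothetical): a fixed linear map from 8 inputs to 5 features.
PROJ = rng.normal(size=(8, 5))

def encode(x):
    return x @ PROJ

def brightness_aug(x, rng):
    """Stand-in augmentation: add one random global offset per sample."""
    return x + rng.normal(scale=0.5, size=(x.shape[0], 1))

x = rng.normal(size=(200, 8))

# Feature moves caused by the augmentation alone (labels are untouched).
deltas = encode(brightness_aug(x, rng)) - encode(x)
deltas = deltas - deltas.mean(axis=0)

# The numerical rank of the deltas gives the dimension of the nuisance span.
_, s, vt = np.linalg.svd(deltas, full_matrices=False)
rank = int((s > 1e-8 * s[0]).sum())
basis = vt[:rank]                             # orthonormal rows spanning the moves
sigma_hat = deltas.T @ deltas / len(deltas)   # covariance estimate in feature space

# A global offset moves features along ones @ PROJ, so the span is one-dimensional.
expected_dir = np.ones(8) @ PROJ
expected_dir = expected_dir / np.linalg.norm(expected_dir)
alignment = abs(float(basis[0] @ expected_dir))
print("rank:", rank, "alignment:", round(alignment, 6))
```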

Full index: 13 tasks · notebooks

```shell
pmh-train route --list
```

Seven nuisance types (one object, seven estimators)

Type	$\Sigma_{\text{task}}$ is…	Data you typically need
D1 subspace	Low-rank cross-domain difference	Labeled source + target features
D2 isotropic	Spherical noise level	Train distribution (+ noise level if known)
D3 augmentation	Span of aug-induced feature moves	Train + known augmentations
D4 domain	Gram of class-aligned source−target diffs (labels optional)	Train + deploy batches (labeled pairs preferred)
D5 compositional	Covariance on named coordinates	Train + which dims are nuisance
D6 temporal	Drift along time / sequence	Trajectories, sensor series
D7 style	Style / attack direction covariance	Same-content pairs or PGD deltas

If two rows sound similar, start with T1 (frozen vectors) or T4A (end-to-end vision). Your benchmark name does not matter — the nuisance law does.
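
To make the D4 row concrete, here is a minimal sketch of a Gram matrix built from class-aligned source−target differences, assuming labeled features on both sides. The function name and the synthetic data are hypothetical, and the library's own estimator may differ in centering, weighting, and rank truncation.

```python
import numpy as np

def class_aligned_gram(src_feats, src_labels, tgt_feats, tgt_labels):
    """Gram matrix of per-class (source mean - target mean) feature differences."""
    classes = np.intersect1d(src_labels, tgt_labels)
    diffs = np.stack([
        src_feats[src_labels == c].mean(axis=0) - tgt_feats[tgt_labels == c].mean(axis=0)
        for c in classes
    ])
    return diffs.T @ diffs / len(classes)

rng = np.random.default_rng(1)
labels = np.repeat(np.arange(3), 100)

# Class signal lives in dims 0..2; the deploy site adds a constant offset in dim 3.
src = rng.normal(size=(300, 4)) + 3.0 * np.eye(4)[labels]
site_offset = np.array([0.0, 0.0, 0.0, 2.0])
tgt = src + site_offset

sigma_hat = class_aligned_gram(src, labels, tgt, labels)
print(np.round(sigma_hat, 3))
```

Because the differences are taken within each class, the class signal in dims 0..2 cancels and only the site offset survives in the estimate.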

Adapt any similar pipeline

1. Match deploy change to a row above (not the paper ID).
2. Open that task’s notebook; sections 1–8 always follow the five-step recipe.
3. Replace demo loaders with your data; keep the same nuisance= and estimate call.
4. Run Step 5 on deploy holdout; ship only if matched beats wrong-direction and generic isotropic (see WHEN_PMH_HELPS).

The demos in scripts/demos/ and notebooks/tasks/ exist to show the same ordering the theory predicts (matched → isotropic → wrong on geometry and drift metrics), not to define thirteen separate products.

Start here

Practitioners: docs/START.md covers the golden path: one function call and one ship verdict, with no paper reading required and the shift type detected automatically.

```shell
pip install matching-pmh torch
pip install "matching-pmh[sklearn]"   # frozen-feature path
pmh-train try --quick                  # ~1 min: train + deploy report + SHIP / DO NOT SHIP
```

```python
from pmh import try_pmh
from pmh.pytorch_eval import pytorch_demo_loaders

bundle = pytorch_demo_loaders(n=400, seed=0)
report = try_pmh(
    bundle.model, bundle.train_loader, bundle.val_loader,
    source_batches=bundle.source_batches, target_batches=bundle.target_batches,
    hook=bundle.encoder, head=bundle.head, epochs=5,
)
print(report.deploy_summary())
print(report.ship_verdict())  # nuisance= is chosen automatically; you do not pick D1–D7 first
```

```shell
pmh-train doctor
pmh-train evaluate --demo --stack pytorch
pmh-train try --stack multilayer --quick   # T4B RGB CNN feature-diff demo
```

Path	Notebook	When
T1 classical / frozen features	t01-classical.ipynb · Colab	sklearn, embeddings
T4A vision domain	t04a-vision-domain.ipynb · Colab	PyTorch site/camera
T4B multilayer vision	t04b-multilayer-vision.ipynb · Colab	Per-layer feature-diff PMH

Read the theory: main.pdf · Block summary: findings.html

Documentation map

Doc	Role
`main.pdf`	Full theory, theorems, thirteen blocks
docs/START.md	Golden path — `try_pmh`, auto shift type, ship verdict
docs/MIGRATE.md	CORAL, sklearn, HF, augmentation
docs/LOSS_SCALING.md	PMH vs task loss (5–30%, enforced cap)
docs/GLOSSARY.md	Plain language ↔ code
docs/PRINCIPLE.md	Short PMH spine ($\Sigma_{\text{task}}$, five steps, library vs paper)
docs/index.md	Site hub
docs/cookbook/	Lightning + HF integration sketches
QUICKSTART.md	Install + commands
tasks/index.md	All tasks T1–T7 + deploy table
WHEN_PMH_HELPS.md	Fit, misfit, controls
api/index.md	`PMHTrainer`, presets, evaluate

Links

PyPI · Documentation site · Contributing

Release history

Version	Release date
2.0.0 (this version)	May 21, 2026
1.5.3	May 20, 2026
1.5.1	May 20, 2026
1.5.0	May 20, 2026
1.4.1	May 19, 2026
1.3.0	May 19, 2026
1.2.0	May 19, 2026
0.8.0	May 19, 2026
0.7.2	May 19, 2026
0.7.1	May 19, 2026
0.7.0	May 19, 2026
0.6.0	May 19, 2026

Download files

Distribution	File	Size	Uploaded	Tags
Source	matching_pmh-2.0.0.tar.gz	3.7 MB	May 21, 2026	Source
Built (wheel)	matching_pmh-2.0.0-py3-none-any.whl	163.2 kB	May 21, 2026	Python 3

File details

Details for the file matching_pmh-2.0.0.tar.gz.

File metadata

Download URL: matching_pmh-2.0.0.tar.gz
Upload date: May 21, 2026
Size: 3.7 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for matching_pmh-2.0.0.tar.gz
Algorithm	Hash digest
SHA256	`c42b237ca7e82b3d8b0ed19c0047d95c1bf74b9605765812a59559ec50021b76`
MD5	`3ba5f31566910b50a3488d4f77c33d51`
BLAKE2b-256	`256221ad2040134adbaedc193e61357b2fc2e91fb413e3a23f46b66fd41d838d`


File details

Details for the file matching_pmh-2.0.0-py3-none-any.whl.

File metadata

Download URL: matching_pmh-2.0.0-py3-none-any.whl
Upload date: May 21, 2026
Size: 163.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for matching_pmh-2.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5a16bf0da5fe6fa4781afcd9e8cc6cdebde8e7b189cc48d23ccf29b7886e006a`
MD5	`07709388f055fd800f9dff7faebc95c6`
BLAKE2b-256	`5c3ac8be7c631c9673769bf604d0388e1a9d9b70ef3849541cea269e0201fb74`

