Perturbation Matching Hypothesis (PMH): estimate Sigma_task, matched PMH training, falsification controls — PyTorch, sklearn, HF.
Project description
matching-pmh
Train on site A. Deploy on site B. Same labels.
Deploy QA gate: When your model fits the training data but breaks on deploy (same task, same labels, different site, camera, or corpus), PMH estimates how representations should move at deploy, trains a shift-matched penalty, and returns a SHIP / DO NOT SHIP verdict only after the matched arm beats the wrong-direction and generic controls on a deploy holdout. To begin, see the Start here section.
This repository ships the Perturbation Matching Hypothesis (PMH) — a geometric theory of how training losses should respond to label-preserving deployment change. The paper (main.pdf) argues that domain shift, sensor noise, augmentation stress, compositional drift, temporal drift, style, and classical anisotropic penalties are one statistical problem: estimate the deployment nuisance covariance $\Sigma_{\text{task}}$, then train so the encoder Jacobian is matched to that geometry. CORAL, adversarial training, augmentation, metric learning, and alignment constraints become different estimators of the same object, not unrelated “robustness tricks.”
matching-pmh is the library + thirteen worked demos that implement the paper’s five-step recipe on real stacks (PyTorch, sklearn, Hugging Face). You are not picking a regularizer off a menu; you are identifying $\Sigma_{\text{task}}$ for your deploy shift and applying the matched PMH loss from Eq. (4) in the paper.
The idea in plain language
What moves at deploy without changing the label?
Examples: new camera or hospital (vision), new microphone (speech), new writing style (LLM), known lighting aug (depth), PGD-like perturbations (security). All of these are instances of a single random displacement $n$ with covariance $\Sigma_{\text{task}} = \mathrm{Cov}(n)$.
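To make $\Sigma_{\text{task}} = \mathrm{Cov}(n)$ concrete, here is a minimal sketch that computes the sample covariance of displacements between matched, label-preserving pairs. The function name `displacement_covariance` and the toy data are assumptions for illustration only; this is not the library's estimator.

```python
from typing import List, Sequence


def displacement_covariance(
    clean: Sequence[Sequence[float]],
    shifted: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Unbiased sample covariance of n_i = shifted_i - clean_i (hypothetical helper)."""
    m, d = len(clean), len(clean[0])
    if m < 2 or len(shifted) != m:
        raise ValueError("need at least two matched pairs")
    # Displacement of each pair: same label, different deploy condition.
    deltas = [[s - c for s, c in zip(srow, crow)] for srow, crow in zip(shifted, clean)]
    mean = [sum(row[j] for row in deltas) / m for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in deltas]
    # Divide by m - 1 for the unbiased estimate.
    return [
        [sum(r[a] * r[b] for r in centered) / (m - 1) for b in range(d)]
        for a in range(d)
    ]


# Toy pairs: only the first coordinate moves, so all variance sits on axis 0.
clean = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0]]
shifted = [[1.0, 1.0], [0.0, 1.0], [3.0, 1.0], [2.0, 1.0]]
sigma_hat = displacement_covariance(clean, shifted)
print(sigma_hat)
```

In this toy case the estimate is rank one: the second coordinate never moves, so its row and column are zero.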
What training should do.
Add a PMH term that penalizes encoder Jacobian energy along a matrix $\Sigma'$ whose column space covers the nuisance range. When $\Sigma'$ is matched to $\Sigma_{\text{task}}$, deployment drift in representations can be driven down; when $\Sigma'$ is isotropic or wrong, the theory predicts specific failure modes — and the library runs those arms as controls, not optional extras.
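The role of the penalty direction can be seen in the simplest setting. For a linear encoder $h = Wx$ the Jacobian is $W$, and the expected squared feature movement under a zero-mean displacement with covariance $\Sigma'$ is $\mathrm{tr}(W \Sigma' W^\top)$. The sketch below evaluates that quantity for matched, wrong-direction, and isotropic choices of $\Sigma'$. The helper `jacobian_energy` and the 2-D toy matrices are assumptions for illustration; this is not the library's `PMHLoss`.

```python
def jacobian_energy(W, sigma):
    """tr(W Sigma W^T) for a linear encoder h = W x, whose Jacobian is W."""
    total = 0.0
    for row in W:
        for a, wa in enumerate(row):
            for b, wb in enumerate(row):
                total += wa * sigma[a][b] * wb
    return total


# Toy geometry (assumed): the deploy nuisance lives on axis 0 only.
sigma_task = [[1.0, 0.0], [0.0, 0.0]]   # matched penalty
sigma_wrong = [[0.0, 0.0], [0.0, 1.0]]  # penalty aimed at axis 1
sigma_iso = [[0.5, 0.0], [0.0, 0.5]]    # isotropic penalty

W_sensitive = [[1.0, 0.0]]  # encoder that reads the nuisance axis
W_invariant = [[0.0, 1.0]]  # encoder that ignores the nuisance axis

matched_on_sensitive = jacobian_energy(W_sensitive, sigma_task)
wrong_on_sensitive = jacobian_energy(W_sensitive, sigma_wrong)
iso_on_sensitive = jacobian_energy(W_sensitive, sigma_iso)
iso_on_invariant = jacobian_energy(W_invariant, sigma_iso)
print(matched_on_sensitive, wrong_on_sensitive, iso_on_sensitive, iso_on_invariant)
```

The matched penalty charges the nuisance-sensitive encoder, while the wrong-direction penalty assigns it zero cost because its column space does not cover the nuisance range. The isotropic penalty charges both encoders equally, so it cannot tell the invariant encoder from the sensitive one.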
What makes this a theory, not a hack.
The paper proves range coverage is necessary for quadratic Jacobian penalties, gives matched sufficiency in the linear model, extends to deep global minima under stated assumptions, and supplies falsification lemmas (wrong subspace, signal-aligned penalty) tested before you trust a deploy gain. See main.pdf §2–5 and block findings.
The five-step recipe (product spine)
Same steps in every notebook (§1–§8) and in pmh.recipe:
| Step | Question you answer | Library entry points |
|---|---|---|
| 0 — Scope | Same label semantics on train (A) and deploy (B)? | check_applicability() |
| 1 — Identify | Which nuisance family fits? (seven types, D1–D7) | suggest_nuisance() · task table below |
| 2 — Estimate | $\hat{\Sigma}_{\text{task}}$ from your data | PMHTrainer.estimate() · PMHMatcher.fit() · estimate_style_sigma() |
| 3 — Apply | Matched PMH on hook $h$ (train) or projection (frozen features) | PMHTrainer.fit · robust_fit · PMHLoss |
| 4 — Protocol | Keep PMH at 5–30% of task loss (hard cap) | PMHConfig.golden_path() · LOSS_SCALING |
| 5 — Evidence | Matched beats wrong-direction and isotropic on deploy holdout | evaluate_robust_fit · evaluate_baseline_vs_pmh |
Scope → identify nuisance family → estimate $\Sigma_{\text{task}}$ → matched PMH train → falsify on deploy holdout.
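Step 4 can be sketched as a simple rescaling rule. The helper below is hypothetical (`capped_pmh_weight` is not a library function) and assumes the 5–30% band is applied to the ratio of the weighted PMH term to the task loss; lifting a too-small term up to the 5% floor is also an assumption, since the table only states a hard cap. See docs/LOSS_SCALING.md for the library's actual behavior.

```python
def capped_pmh_weight(task_loss, pmh_raw, weight, lo=0.05, hi=0.30):
    """Return a weight so that weight * pmh_raw lies within [lo, hi] * task_loss.

    Hypothetical helper: illustrates the 5-30% protocol, not the library's code.
    """
    if task_loss <= 0.0 or pmh_raw <= 0.0:
        return weight  # nothing meaningful to rescale against
    ratio = weight * pmh_raw / task_loss
    clamped = min(max(ratio, lo), hi)
    return clamped * task_loss / pmh_raw


# The raw PMH term is 5x the task loss, so the weight is cut to hit the 30% cap.
w_capped = capped_pmh_weight(task_loss=2.0, pmh_raw=10.0, weight=1.0)
# Already inside the band (10% of task loss), so the weight is left alone.
w_kept = capped_pmh_weight(task_loss=2.0, pmh_raw=1.0, weight=0.2)
print(w_capped, w_kept)
```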
Details: Quickstart · Will PMH help? · API
What this repo promises
| We provide | We do not claim |
|---|---|
| A closed, falsifiable training recipe once $\Sigma_{\text{task}}$ is identified | Universality on every leaderboard |
| 13 pre-registered blocks (T1–T7) as copy-paste playbooks | That matched PMH always beats CORAL, DANN, or PGD-AT |
| Built-in matched / wrong / isotropic arms (Lemma C, Cor. E in the paper) | PMH on label-changing shifts (e.g. spurious correlation) |
| Theory-aligned estimators D1–D7 + geometry probes (tdi, …) | One demo preset replaces your domain data or reproduces every paper table row without tuning |
Honest boundaries (from the paper): Colored MNIST / Waterbirds-style label-correlated nuisance is out of scope; Office-31 is a documented case where estimator eigengap can fail (Lemma D1), not a silent bug.
Pre-registered evidence: 12/13 paper blocks pass their criteria in main.pdf (see findings.html); Office-31 is the predicted D1 failure when the cross-domain subspace is ill-conditioned — run Step 5 before shipping.
Paper numbers vs this library
Block accuracies, mIoU gains, and other reported figures in the README, task pages, and main.pdf tables are paper results — full benchmarks, datasets, and schedules described in the PDF.
The pmh library on PyPI is for general use on your stack: same estimators and five-step recipe, but different demo loaders, defaults, and integration paths. It will not automatically replicate those paper numbers out of the box. Expect iteration on your side — hook choice, rank, PMHConfig / loss scale (LOSS_SCALING), more target data, and Step 5 on your deploy holdout — before you treat a run as “correct.” Notebooks under notebooks/tasks/ teach the workflow on built-in demos.
Short theory spine (no PDF required to start): docs/PRINCIPLE.md.
Synthesized block outcomes (HTML): docs/findings.html — regenerate with python scripts/build_findings_html.py.
Choose your depth
| You want… | Open |
|---|---|
| Plain-language principle + five steps | docs/PRINCIPLE.md |
| “Will this help my deploy shift?” | docs/WHEN_PMH_HELPS.md |
| Copy-paste task for your nuisance | docs/tasks/index.md → notebook §8 |
| Sklearn / frozen embeddings (T1) | t01-classical · compare_arms_sklearn |
| PyTorch site/camera (T4) | t04a-vision-domain · PMHTrainer (class-aligned D4) |
| Per-layer domain Gram (T4B) | t04b-multilayer-vision · PMHTrainer(train_mode="feature_diff") |
| Full proofs + block numbers | main.pdf · findings.html |
| Matched / wrong / isotropic benchmark | run_benchmark_protocol · compare_arms |
Find your deployment story (T1 through T7)
Tasks are examples of the same principle — pick the closest deploy change, open the page + notebook, Run All on demo data, then plug in your pipeline in §8. Order follows the paper blocks (T1 first).
| Task | What changes at deploy (labels fixed) | Real situations like yours | How $\hat{\Sigma}_{\text{task}}$ is built | nuisance= | Start |
|---|---|---|---|---|---|
| T1 | Embedding cloud shifts between sites | Office-31; two labs’ tabular features; frozen ResNet vectors | Cross-domain subspace on features (D1) | subspace | T1 |
| T2A | Undirected input corruption (no fixed direction) | ImageNet-C; sensor noise; blur/JPEG | Isotropic $\sigma^2 I$ (D2) | isotropic | T2A |
| T2B | Scanner / site appearance on X-ray | Hospital drift on CheXpert-style data | Isotropic $\sigma$ (D2) | isotropic | T2B |
| T3A | Camera / lighting; same keypoint semantics | Studio→wild pose; broadcast→fan video | Augmentation-induced deltas (D3) | augmentation | T3A |
| T3B | Photometry; depth meaning unchanged | Lighting on depth; synthetic→real RGB-D | Augmentation deltas (D3) | augmentation | T3B |
| T4A | New visual domain; same classes | Photo→sketch; warehouse A→B; country shift | Source−target feature Gram (D4) | domain_shift | T4A |
| T4B | Sim→real texture + layout; same seg map | GTA5→Cityscapes; synthetic seg→real | Domain Gram per layer (D4; paper multiscale) | domain_shift | T4B |
| T5A | 3D atom coordinates move; property fixed | QM9 conformers; pose grids | Compositional blocks (D5) | compositional | T5A |
| T5B | Token groups change; code label fixed | Renames; comment stripping | Nuisance indices on tokens (D5) | compositional | T5B |
| T6A | Channel / room / codec; same transcript | New mic; Libri conditions | Temporal / content-residual (D6) | temporal | T6A |
| T6B | Sensor drift over time | HAR placement; IMU aging | Temporal residual (D6) | temporal | T6B |
| T7A | Surface form; facts unchanged | Bulleted vs prose; tone shift in LLMs | Style pairs → Gram (D7) | style | T7A |
| T7B | Adversarial directions at deploy | PGD stress; spoof patches | PGD delta subspace (D7) | style / PGD doc | T7B |
Full index: 13 tasks · notebooks
pmh-train route --list
Seven nuisance types (one object, seven estimators)
| Type | $\Sigma_{\text{task}}$ is… | Data you typically need |
|---|---|---|
| D1 subspace | Low-rank cross-domain difference | Labeled source + target features |
| D2 isotropic | Spherical noise level | Train distribution (+ noise level if known) |
| D3 augmentation | Span of aug-induced feature moves | Train + known augmentations |
| D4 domain | Gram of class-aligned source−target diffs (labels optional) | Train + deploy batches (labeled pairs preferred) |
| D5 compositional | Covariance on named coordinates | Train + which dims are nuisance |
| D6 temporal | Drift along time / sequence | Trajectories, sensor series |
| D7 style | Style / attack direction covariance | Same-content pairs or PGD deltas |
If two rows sound similar, start with T1 (frozen vectors) or T4A (end-to-end vision). Your benchmark name does not matter — the nuisance law does.
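As one concrete reading of the D4 row, the sketch below builds a Gram matrix from class-aligned source-to-target mean differences. The function names and the choice to average outer products of per-class mean differences are assumptions for illustration; the library's D4 estimator may differ in centering, weighting, and rank truncation.

```python
from collections import defaultdict


def class_means(features, labels):
    """Per-class mean feature vector."""
    sums, counts = {}, defaultdict(int)
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def class_aligned_gram(src_x, src_y, tgt_x, tgt_y):
    """Average outer product of (target mean - source mean) over shared classes."""
    mu_s, mu_t = class_means(src_x, src_y), class_means(tgt_x, tgt_y)
    shared = sorted(set(mu_s) & set(mu_t))
    if not shared:
        raise ValueError("source and target share no classes")
    d = len(mu_s[shared[0]])
    gram = [[0.0] * d for _ in range(d)]
    for y in shared:
        diff = [t - s for t, s in zip(mu_t[y], mu_s[y])]
        for a in range(d):
            for b in range(d):
                gram[a][b] += diff[a] * diff[b] / len(shared)
    return gram


# Toy data: every class moves by +1 along axis 0 between source and target.
src_x = [[0.0, 0.0], [0.0, 2.0], [4.0, 0.0], [4.0, 2.0]]
src_y = [0, 0, 1, 1]
tgt_x = [[1.0, 1.0], [5.0, 1.0]]
tgt_y = [0, 1]
gram = class_aligned_gram(src_x, src_y, tgt_x, tgt_y)
print(gram)
```

Aligning by class before differencing keeps between-class signal out of the estimate: in the toy data the classes are four units apart on axis 0, yet the Gram only records the one-unit domain shift.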
Adapt any similar pipeline
- Match deploy change to a row above (not the paper ID).
- Open that task’s notebook — sections 1–8 always follow the five-step recipe.
- Replace demo loaders with your data; keep the same nuisance= and estimate call.
- Run Step 5 on deploy holdout; ship only if matched beats wrong-direction and generic isotropic (see WHEN_PMH_HELPS).
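The Step 5 rule in the last item can be written as a small gate. This is a hypothetical standalone sketch (`ArmResult` and `gate_verdict` are illustrative names, not the library's `report.ship_verdict()`), and the accuracies are placeholder numbers, not results.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ArmResult:
    name: str
    deploy_accuracy: float  # measured on the deploy holdout


def gate_verdict(matched: ArmResult, controls: Iterable[ArmResult], margin: float = 0.0) -> str:
    """SHIP only if the matched arm beats every control by more than `margin`."""
    beaten = all(matched.deploy_accuracy > c.deploy_accuracy + margin for c in controls)
    return "SHIP" if beaten else "DO NOT SHIP"


# Placeholder numbers for illustration only.
matched = ArmResult("matched", 0.81)
controls = [ArmResult("wrong_direction", 0.74), ArmResult("isotropic", 0.77)]
print(gate_verdict(matched, controls))               # matched beats both controls
print(gate_verdict(matched, controls, margin=0.05))  # fails the isotropic margin
```

A nonzero `margin` is one way to require that the gap exceed run-to-run noise before shipping; choosing it is left to your own holdout variance.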
The demos in scripts/demos/ and notebooks/tasks/ exist to show the same ordering the theory predicts (matched → isotropic → wrong on geometry and drift metrics), not to define thirteen separate products.
Start here
Practitioners: docs/START.md — one function, one ship verdict (no paper, auto shift type).
pip install matching-pmh torch
pip install "matching-pmh[sklearn]" # frozen-feature path
pmh-train try --quick # ~1 min: train + deploy report + SHIP / DO NOT SHIP
from pmh import try_pmh
from pmh.pytorch_eval import pytorch_demo_loaders
bundle = pytorch_demo_loaders(n=400, seed=0)
report = try_pmh(
    bundle.model, bundle.train_loader, bundle.val_loader,
    source_batches=bundle.source_batches, target_batches=bundle.target_batches,
    hook=bundle.encoder, head=bundle.head, epochs=5,
)
print(report.deploy_summary())
print(report.ship_verdict()) # auto nuisance= — you do not pick D1–D7 first
pmh-train doctor
pmh-train evaluate --demo --stack pytorch
pmh-train try --stack multilayer --quick # T4B RGB CNN feature-diff demo
| Path | Notebook | When |
|---|---|---|
| T1 classical / frozen features | t01-classical.ipynb · Colab | sklearn, embeddings |
| T4A vision domain | t04a-vision-domain.ipynb · Colab | PyTorch site/camera |
| T4B multilayer vision | t04b-multilayer-vision.ipynb · Colab | Per-layer feature-diff PMH |
Read the theory: main.pdf · Block summary: findings.html
Documentation map
| Doc | Role |
|---|---|
| main.pdf | Full theory, theorems, thirteen blocks |
| docs/START.md | Golden path — try_pmh, auto shift type, ship verdict |
| docs/MIGRATE.md | CORAL, sklearn, HF, augmentation |
| docs/LOSS_SCALING.md | PMH vs task loss (5–30%, enforced cap) |
| docs/GLOSSARY.md | Plain language ↔ code |
| docs/PRINCIPLE.md | Short PMH spine ($\Sigma_{\text{task}}$, five steps, library vs paper) |
| docs/index.md | Site hub |
| docs/cookbook/ | Lightning + HF integration sketches |
| QUICKSTART.md | Install + commands |
| tasks/index.md | All tasks T1–T7 + deploy table |
| WHEN_PMH_HELPS.md | Fit, misfit, controls |
| api/index.md | PMHTrainer, presets, evaluate |
File details
Details for the file matching_pmh-2.0.0.tar.gz.
File metadata
- Download URL: matching_pmh-2.0.0.tar.gz
- Upload date:
- Size: 3.7 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c42b237ca7e82b3d8b0ed19c0047d95c1bf74b9605765812a59559ec50021b76 |
| MD5 | 3ba5f31566910b50a3488d4f77c33d51 |
| BLAKE2b-256 | 256221ad2040134adbaedc193e61357b2fc2e91fb413e3a23f46b66fd41d838d |
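If you download the source distribution manually, the SHA256 digest listed above can be checked with the standard library. This is a minimal sketch: the local path is an assumption about where you saved the file, and the helper names are illustrative.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed for matching_pmh-2.0.0.tar.gz.
EXPECTED_SHA256 = "c42b237ca7e82b3d8b0ed19c0047d95c1bf74b9605765812a59559ec50021b76"


def sha256_of(path, chunk_size=1 << 20):
    """Stream the file in chunks so large archives are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """True when the file's SHA256 matches the expected hex digest."""
    return sha256_of(path) == expected.lower()


archive = Path("matching_pmh-2.0.0.tar.gz")  # assumed local download location
if archive.exists():
    print("OK" if verify(archive) else "MISMATCH")
```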
File details
Details for the file matching_pmh-2.0.0-py3-none-any.whl.
File metadata
- Download URL: matching_pmh-2.0.0-py3-none-any.whl
- Upload date:
- Size: 163.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5a16bf0da5fe6fa4781afcd9e8cc6cdebde8e7b189cc48d23ccf29b7886e006a |
| MD5 | 07709388f055fd800f9dff7faebc95c6 |
| BLAKE2b-256 | 5c3ac8be7c631c9673769bf604d0388e1a9d9b70ef3849541cea269e0201fb74 |