
Perturbation Matching Hypothesis (PMH): estimate Sigma_task, matched PMH training, falsification controls — PyTorch, sklearn, HF.


matching-pmh

Train on site A. Deploy on site B. Same labels.

Deploy QA gate: when your model fits the training data but breaks at deploy (same task, same labels, different site, camera, or corpus), PMH estimates how representations are expected to move at deploy, trains a shift-matched penalty, and returns SHIP or DO NOT SHIP. It issues SHIP only after the matched arm beats the wrong-direction and generic controls on a deploy holdout. Start here →

This repository ships the Perturbation Matching Hypothesis (PMH) — a geometric theory of how training losses should respond to label-preserving deployment change. The paper (main.pdf) argues that domain shift, sensor noise, augmentation stress, compositional drift, temporal drift, style, and classical anisotropic penalties are one statistical problem: estimate the deployment nuisance covariance $\Sigma_{\text{task}}$, then train so the encoder Jacobian is matched to that geometry. CORAL, adversarial training, augmentation, metric learning, and alignment constraints become different estimators of the same object, not unrelated “robustness tricks.”

matching-pmh is the library + thirteen worked demos that implement the paper’s five-step recipe on real stacks (PyTorch, sklearn, Hugging Face). You are not picking a regularizer off a menu; you are identifying $\Sigma_{\text{task}}$ for your deploy shift and applying the matched PMH loss from Eq. (4) in the paper.


The idea in plain language

What moves at deploy without changing the label?
Examples: new camera or hospital (vision), new microphone (speech), new writing style (LLM), known lighting aug (depth), PGD-like perturbations (security). All of these are instances of a single random displacement $n$ with covariance $\Sigma_{\text{task}} = \mathrm{Cov}(n)$.
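As a minimal sketch of what $\Sigma_{\text{task}} = \mathrm{Cov}(n)$ means, assume you have paired views of the same items before and after the deploy change. Then the sample covariance of the displacements is the plug-in estimate. Paired views are a strong assumption: some estimator types in the table further down (D3 augmentation deltas, D7 same-content pairs) work from pairs, while others do not. The function name and the toy shift are illustrative and are not the library API.

```python
import numpy as np

def estimate_sigma_task(x_before: np.ndarray, x_after: np.ndarray) -> np.ndarray:
    """Plug-in estimate of Cov(n) from paired, label-preserving displacements.

    Row i of x_before and x_after is the same item (same label) before and
    after the deploy change, so n_i = x_after[i] - x_before[i].
    """
    n = x_after - x_before
    n = n - n.mean(axis=0, keepdims=True)   # covariance, so center the displacements
    return n.T @ n / (n.shape[0] - 1)       # unbiased sample covariance

rng = np.random.default_rng(0)
x_a = rng.normal(size=(2000, 3))            # site A features
shift = np.zeros((2000, 3))
shift[:, 0] = 0.5 * rng.normal(size=2000)   # hypothetical nuisance: std 0.5 on coordinate 0
x_b = x_a + shift                           # same items seen at site B

sigma_hat = estimate_sigma_task(x_a, x_b)
print(np.round(sigma_hat, 3))               # about 0.25 at [0, 0], exactly 0 elsewhere
```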

What training should do.
Add a PMH term that penalizes encoder Jacobian energy along a matrix $\Sigma'$ whose column space covers the nuisance range. When $\Sigma'$ is matched to $\Sigma_{\text{task}}$, deployment drift in representations can be driven down; when $\Sigma'$ is isotropic or wrong, the theory predicts specific failure modes — and the library runs those arms as controls, not optional extras.
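The following linear-model sketch assumes the penalty has the form $\mathrm{tr}(J \Sigma' J^\top)$, that is, Jacobian energy along $\Sigma'$. Eq. (4) from the paper is not reproduced here. For a linear encoder $h(x) = Wx$ the Jacobian is $W$, and for a zero-mean displacement $n$ with covariance $\Sigma_{\text{task}}$ the expected representation drift is $\mathbb{E}\|Wn\|^2 = \mathrm{tr}(W \Sigma_{\text{task}} W^\top)$. A matched $\Sigma' = \Sigma_{\text{task}}$ therefore makes the penalty equal to the expected drift. The encoder, the nuisance, and jacobian_penalty are illustrative names.

```python
import numpy as np

def jacobian_penalty(J: np.ndarray, sigma_prime: np.ndarray) -> float:
    """Quadratic Jacobian penalty tr(J Σ' J^T): Jacobian energy along Σ'."""
    return float(np.trace(J @ sigma_prime @ J.T))

rng = np.random.default_rng(0)
d_in, d_out = 4, 2
W = rng.normal(size=(d_out, d_in))  # linear encoder h(x) = W x, so its Jacobian is W

# Hypothetical nuisance: zero-mean, variance 0.5 along input coordinate 0 only.
sigma_task = np.diag([0.5, 0.0, 0.0, 0.0])
n = np.zeros((50_000, d_in))
n[:, 0] = np.sqrt(0.5) * rng.normal(size=50_000)

# Monte Carlo deploy drift E||W n||^2 in representation space.
drift_mc = float(np.mean(np.sum((n @ W.T) ** 2, axis=1)))

matched = jacobian_penalty(W, sigma_task)
# Isotropic arm with the same total variance (trace 0.5) spread over all inputs.
isotropic = jacobian_penalty(W, (0.5 / d_in) * np.eye(d_in))
print(f"drift≈{drift_mc:.3f}  matched={matched:.3f}  isotropic={isotropic:.3f}")
```

The matched value agrees with the Monte Carlo drift up to sampling error. The isotropic value instead depends on how sensitive W is to every input direction, including directions the nuisance never moves, so it also taxes signal directions.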

What makes this a theory, not a hack.
The paper proves that range coverage is necessary for quadratic Jacobian penalties and shows that a matched penalty is sufficient in the linear model. It extends the result to deep global minima under stated assumptions. It also supplies falsification lemmas (wrong subspace, signal-aligned penalty) that are tested before you trust a deploy gain. See main.pdf §2–5 and the block findings.
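The toy example below shows why range coverage matters. It illustrates the intuition only and is not the paper's proof. If the column space of $\Sigma'$ misses the nuisance direction, the penalty can be exactly zero while the encoder still moves under the deploy shift, so minimizing the penalty cannot remove that drift. covers_range is a hypothetical helper that checks $\mathrm{range}(\Sigma_{\text{task}}) \subseteq \mathrm{range}(\Sigma')$ with an orthogonal projector.

```python
import numpy as np

def covers_range(sigma_prime: np.ndarray, sigma_task: np.ndarray, tol: float = 1e-8) -> bool:
    """Check range(sigma_task) ⊆ range(sigma_prime) via an orthogonal projector."""
    U, s, _ = np.linalg.svd(sigma_prime)
    basis = U[:, s > tol * max(s.max(), 1.0)]   # orthonormal basis of range(sigma_prime)
    P = basis @ basis.T                          # projector onto that range
    residual = sigma_task - P @ sigma_task       # part of the nuisance range left uncovered
    return bool(np.linalg.norm(residual) <= tol * max(np.linalg.norm(sigma_task), 1.0))

W = np.array([[1.0, 0.0, 0.0]])           # encoder reads input coordinate 0 only
sigma_task = np.diag([0.5, 0.0, 0.0])     # nuisance moves coordinate 0
sigma_wrong = np.diag([0.0, 1.0, 0.0])    # penalty placed on coordinate 1 instead

penalty_wrong = float(np.trace(W @ sigma_wrong @ W.T))  # 0.0: nothing to minimize
drift = float(np.trace(W @ sigma_task @ W.T))            # 0.5: drift untouched
print(penalty_wrong, drift, covers_range(sigma_wrong, sigma_task))  # prints: 0.0 0.5 False
```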


The five-step recipe (product spine)

Same steps in every notebook (§1–§8) and in pmh.recipe:

| Step | Question you answer | Library entry points |
|---|---|---|
| 0. Scope | Same label semantics on train (A) and deploy (B)? | check_applicability() |
| 1. Identify | Which nuisance family fits? (seven types, D1–D7) | suggest_nuisance() · task table below |
| 2. Estimate | $\hat{\Sigma}_{\text{task}}$ from your data | PMHTrainer.estimate() · PMHMatcher.fit() · estimate_style_sigma() |
| 3. Apply | Matched PMH on hook $h$ (train) or projection (frozen features) | PMHTrainer.fit · robust_fit · PMHLoss |
| 4. Protocol | Keep PMH at 5–30% of task loss (hard cap) | PMHConfig.golden_path() · LOSS_SCALING |
| 5. Evidence | Matched beats wrong-direction and isotropic on deploy holdout | evaluate_robust_fit · evaluate_baseline_vs_pmh |
Scope → identify nuisance family → estimate $\Sigma_{\text{task}}$ → matched PMH train → falsify on deploy holdout.
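Step 4 caps the PMH term at 30% of the task loss, with 5% as the lower end of the target band. The sketch below shows one way to enforce this per batch. It assumes the cap is applied by rescaling the PMH weight, which the library's LOSS_SCALING and PMHConfig may handle differently. cap_pmh_weight and ScaledPMH are hypothetical names. The upper bound is enforced, and the lower bound is only reported.

```python
from dataclasses import dataclass

@dataclass
class ScaledPMH:
    weight: float      # weight actually applied to the raw PMH term
    ratio: float       # weight * raw_pmh / task_loss after capping
    below_band: bool   # True if PMH contributes less than the lower target

def cap_pmh_weight(task_loss: float, raw_pmh: float, weight: float,
                   lo: float = 0.05, hi: float = 0.30) -> ScaledPMH:
    """Shrink the PMH weight so weight * raw_pmh never exceeds hi * task_loss."""
    if task_loss <= 0.0 or raw_pmh <= 0.0:
        # Ratio undefined; keep the weight and flag it for inspection.
        return ScaledPMH(weight, 0.0, True)
    ratio = weight * raw_pmh / task_loss
    if ratio > hi:
        weight = hi * task_loss / raw_pmh   # hard cap: PMH term lands exactly at hi
        ratio = hi
    return ScaledPMH(weight, ratio, ratio < lo)

capped = cap_pmh_weight(task_loss=2.0, raw_pmh=10.0, weight=0.1)  # 50% -> capped to 30%
small = cap_pmh_weight(task_loss=2.0, raw_pmh=1.0, weight=0.05)   # 2.5% -> flagged
print(capped, small)
```

In a training loop you would pass detached scalar loss values, so the weight acts as a constant and is not differentiated through.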

Details: Quickstart · Will PMH help? · API


What this repo promises

| We provide | We do not claim |
|---|---|
| A closed, falsifiable training recipe once $\Sigma_{\text{task}}$ is identified | Universality on every leaderboard |
| 13 pre-registered blocks (T1–T7) as copy-paste playbooks | That matched PMH always beats CORAL, DANN, or PGD-AT |
| Built-in matched / wrong / isotropic arms (Lemma C, Cor. E in the paper) | PMH on label-changing shifts (e.g. spurious correlation) |
| Theory-aligned estimators D1–D7 + geometry probes (tdi, …) | That one demo preset replaces your domain data or reproduces every paper table row without tuning |

Honest boundaries (from the paper): Colored MNIST / Waterbirds-style label-correlated nuisance is out of scope; Office-31 is a documented case where estimator eigengap can fail (Lemma D1), not a silent bug.

Pre-registered evidence: 12/13 paper blocks pass their criteria in main.pdf (see findings.html); Office-31 is the predicted D1 failure when the cross-domain subspace is ill-conditioned — run Step 5 before shipping.
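You can screen for the eigengap failure described for Office-31 before training. The sketch below assumes the D1 estimate is a symmetric PSD matrix whose top-k eigenvectors define the nuisance subspace. A relative gap $(\lambda_k - \lambda_{k+1}) / \lambda_1$ near zero means the rank-k subspace is poorly determined by the data. relative_eigengap is a hypothetical helper. Any threshold you act on is your own choice, not a value from the paper.

```python
import numpy as np

def relative_eigengap(sigma_hat: np.ndarray, k: int) -> float:
    """(λ_k - λ_{k+1}) / λ_1 for a symmetric PSD estimate, eigenvalues sorted descending."""
    evals = np.sort(np.linalg.eigvalsh(sigma_hat))[::-1]
    if evals[0] <= 0.0:
        return 0.0  # empty estimate: no subspace to trust
    return float((evals[k - 1] - evals[k]) / evals[0])

d = 8
# Well-conditioned: two strong nuisance directions plus weak noise.
clear = np.diag([9.0, 4.0] + [0.0] * (d - 2)) + 0.01 * np.eye(d)
# Ill-conditioned: nearly flat spectrum, no preferred rank-2 subspace.
flat = np.diag(np.linspace(1.05, 1.0, d))

gap_clear = relative_eigengap(clear, k=2)
gap_flat = relative_eigengap(flat, k=2)
print(f"clear: {gap_clear:.3f}  flat: {gap_flat:.4f}")
```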

Paper numbers vs this library

Block accuracies, mIoU gains, and other figures reported in the README, task pages, and main.pdf tables are paper results. They were obtained with the full benchmarks, datasets, and schedules described in the PDF.

The pmh library (matching-pmh on PyPI) is for general use on your stack. It has the same estimators and five-step recipe, but different demo loaders, defaults, and integration paths, and it will not replicate the paper numbers out of the box. Expect to iterate on your side before you treat a run as “correct”: hook choice, rank, loss scale (PMHConfig, LOSS_SCALING), more target data, and Step 5 on your deploy holdout. Notebooks under notebooks/tasks/ teach the workflow on built-in demos.

Short theory spine (no PDF required to start): docs/PRINCIPLE.md.
Synthesized block outcomes (HTML): docs/findings.html — regenerate with python scripts/build_findings_html.py.


Choose your depth

| You want… | Open |
|---|---|
| Plain-language principle + five steps | docs/PRINCIPLE.md |
| “Will this help my deploy shift?” | docs/WHEN_PMH_HELPS.md |
| Copy-paste task for your nuisance | docs/tasks/index.md → notebook §8 |
| Sklearn / frozen embeddings (T1) | t01-classical · compare_arms_sklearn |
| PyTorch site/camera (T4A) | t04a-vision-domain · PMHTrainer (class-aligned D4) |
| Per-layer domain Gram (T4B) | t04b-multilayer-vision · PMHTrainer(train_mode="feature_diff") |
| Full proofs + block numbers | main.pdf · findings.html |
| Matched / wrong / isotropic benchmark | run_benchmark_protocol · compare_arms |

Find your deployment story (T1 through T7)

Tasks are examples of the same principle — pick the closest deploy change, open the page + notebook, Run All on demo data, then plug in your pipeline in §8. Order follows the paper blocks (T1 first).

| Task | What changes at deploy (labels fixed) | Real situations like yours | How $\hat{\Sigma}_{\text{task}}$ is built | nuisance= | Start |
|---|---|---|---|---|---|
| T1 | Embedding cloud shifts between sites | Office-31; two labs’ tabular features; frozen ResNet vectors | Cross-domain subspace on features (D1) | subspace | T1 |
| T2A | Undirected input corruption (no fixed direction) | ImageNet-C; sensor noise; blur/JPEG | Isotropic $\sigma^2 I$ (D2) | isotropic | T2A |
| T2B | Scanner / site appearance on X-ray | Hospital drift on CheXpert-style data | Isotropic $\sigma$ (D2) | isotropic | T2B |
| T3A | Camera / lighting; same keypoint semantics | Studio→wild pose; broadcast→fan video | Augmentation-induced deltas (D3) | augmentation | T3A |
| T3B | Photometry; depth meaning unchanged | Lighting on depth; synthetic→real RGB-D | Augmentation deltas (D3) | augmentation | T3B |
| T4A | New visual domain; same classes | Photo→sketch; warehouse A→B; country shift | Source−target feature Gram (D4) | domain_shift | T4A |
| T4B | Sim→real texture + layout; same seg map | GTA5→Cityscapes; synthetic seg→real | Domain Gram per layer (D4; paper multiscale) | domain_shift | T4B |
| T5A | 3D atom coordinates move; property fixed | QM9 conformers; pose grids | Compositional blocks (D5) | compositional | T5A |
| T5B | Token groups change; code label fixed | Renames; comment stripping | Nuisance indices on tokens (D5) | compositional | T5B |
| T6A | Channel / room / codec; same transcript | New mic; Libri conditions | Temporal / content-residual (D6) | temporal | T6A |
| T6B | Sensor drift over time | HAR placement; IMU aging | Temporal residual (D6) | temporal | T6B |
| T7A | Surface form; facts unchanged | Bulleted vs prose; tone shift in LLMs | Style pairs → Gram (D7) | style | T7A |
| T7B | Adversarial directions at deploy | PGD stress; spoof patches | PGD delta subspace (D7) | style / PGD doc | T7B |

Full index: 13 tasks · notebooks

pmh-train route --list

Seven nuisance types (one object, seven estimators)

| Type | $\Sigma_{\text{task}}$ is… | Data you typically need |
|---|---|---|
| D1 subspace | Low-rank cross-domain difference | Labeled source + target features |
| D2 isotropic | Spherical noise level | Train distribution (+ noise level if known) |
| D3 augmentation | Span of aug-induced feature moves | Train + known augmentations |
| D4 domain | Gram of class-aligned source−target diffs (labels optional) | Train + deploy batches (labeled pairs preferred) |
| D5 compositional | Covariance on named coordinates | Train + which dims are nuisance |
| D6 temporal | Drift along time / sequence | Trajectories, sensor series |
| D7 style | Style / attack direction covariance | Same-content pairs or PGD deltas |
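Several rows of this table reduce to a few lines of linear algebra on features. The NumPy sketch below covers three of them under stated assumptions. D2 is $\sigma^2 I$ with a known $\sigma$. D4 is the Gram of per-class source−target mean differences over classes seen in both domains. D5 is the sample covariance restricted to user-named nuisance coordinates. These are illustrative readings of the table, not the library's estimators (PMHTrainer.estimate, PMHMatcher.fit), whose details may differ.

```python
import numpy as np

def d2_isotropic(d: int, sigma: float) -> np.ndarray:
    """D2 sketch: spherical nuisance, Sigma = sigma^2 * I."""
    return sigma ** 2 * np.eye(d)

def d4_class_aligned_gram(src, src_y, tgt, tgt_y) -> np.ndarray:
    """D4 sketch: Gram of per-class (source mean - target mean) differences.

    Only classes present in both domains are used, and means are compared
    class by class, so different class proportions do not leak into the estimate.
    """
    classes = np.intersect1d(src_y, tgt_y)
    diffs = np.stack([src[src_y == c].mean(axis=0) - tgt[tgt_y == c].mean(axis=0)
                      for c in classes])
    return diffs.T @ diffs / len(classes)

def d5_compositional(x, nuisance_dims) -> np.ndarray:
    """D5 sketch: covariance on the named nuisance coordinates, zero elsewhere."""
    d = x.shape[1]
    idx = np.asarray(nuisance_dims)
    sigma = np.zeros((d, d))
    sigma[np.ix_(idx, idx)] = np.atleast_2d(np.cov(x[:, idx], rowvar=False))
    return sigma

rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0, 0.0], [3.0, 0.0, 0.0], [0.0, 3.0, 0.0]])  # 3 classes
src_y = rng.integers(0, 3, size=1500)
tgt_y = rng.integers(0, 3, size=1500)
src = centers[src_y] + 0.5 * rng.normal(size=(1500, 3))
# Hypothetical site shift: +1 along coordinate 2, identical for every class.
tgt = centers[tgt_y] + np.array([0.0, 0.0, 1.0]) + 0.5 * rng.normal(size=(1500, 3))

sigma_d2 = d2_isotropic(3, 0.1)
sigma_d4 = d4_class_aligned_gram(src, src_y, tgt, tgt_y)
sigma_d5 = d5_compositional(rng.normal(size=(1000, 4)), nuisance_dims=[1, 3])
print(np.round(sigma_d4, 2))
```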

If two rows sound similar, start with T1 (frozen vectors) or T4A (end-to-end vision). Your benchmark name does not matter — the nuisance law does.


Adapt any similar pipeline

  1. Match your deploy change to a row in the task table above (not to a paper ID).
  2. Open that task’s notebook; sections 1–8 always follow the five-step recipe.
  3. Replace the demo loaders with your data; keep the same nuisance= and estimate call.
  4. Run Step 5 on the deploy holdout; ship only if matched beats both wrong-direction and generic isotropic (see WHEN_PMH_HELPS).
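Step 5 comes down to comparing arms on the deploy holdout. The sketch below shows that decision rule, assuming one scalar deploy metric per arm and the arm names matched, wrong, and isotropic. These are hypothetical keys, and step5_verdict is not the library's ship_verdict. Any extra arms you pass, such as a no-PMH baseline, must also be beaten.

```python
from typing import Mapping

REQUIRED_ARMS = ("matched", "wrong", "isotropic")

def step5_verdict(deploy_scores: Mapping[str, float], margin: float = 0.0,
                  higher_is_better: bool = True) -> str:
    """Return "SHIP" only if the matched arm beats every other arm by more than margin."""
    missing = [arm for arm in REQUIRED_ARMS if arm not in deploy_scores]
    if missing:
        raise ValueError(f"missing deploy-holdout arms: {missing}")
    sign = 1.0 if higher_is_better else -1.0
    matched = deploy_scores["matched"]
    controls = [v for arm, v in deploy_scores.items() if arm != "matched"]
    if all(sign * (matched - v) > margin for v in controls):
        return "SHIP"
    return "DO NOT SHIP"

print(step5_verdict({"matched": 0.81, "wrong": 0.74, "isotropic": 0.79}))  # SHIP
print(step5_verdict({"matched": 0.78, "wrong": 0.74, "isotropic": 0.79}))  # DO NOT SHIP
# Drift-style metric where lower is better.
print(step5_verdict({"matched": 0.10, "wrong": 0.20, "isotropic": 0.15},
                    higher_is_better=False))                                # SHIP
```

A positive margin makes the gate stricter, which can help when holdout scores are noisy.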

The demos in scripts/demos/ and notebooks/tasks/ exist to show the ordering the theory predicts (matched > isotropic > wrong on geometry and drift metrics), not to define thirteen separate products.


Start here

Practitioners: docs/START.md — one function, one ship verdict (no paper, auto shift type).

```bash
pip install matching-pmh torch
pip install "matching-pmh[sklearn]"   # frozen-feature path
pmh-train try --quick                  # ~1 min: train + deploy report + SHIP / DO NOT SHIP
```

```python
from pmh import try_pmh
from pmh.pytorch_eval import pytorch_demo_loaders

# Demo bundle: model, train/val loaders, source/target batches, encoder hook, and head.
bundle = pytorch_demo_loaders(n=400, seed=0)
report = try_pmh(
    bundle.model, bundle.train_loader, bundle.val_loader,
    source_batches=bundle.source_batches, target_batches=bundle.target_batches,
    hook=bundle.encoder, head=bundle.head, epochs=5,
)
print(report.deploy_summary())
print(report.ship_verdict())  # auto nuisance=; you do not pick D1–D7 first
```

```bash
pmh-train doctor
pmh-train evaluate --demo --stack pytorch
pmh-train try --stack multilayer --quick   # T4B RGB CNN feature-diff demo
```
| Path | Notebook | When |
|---|---|---|
| T1 classical / frozen features | t01-classical.ipynb · Colab | sklearn, embeddings |
| T4A vision domain | t04a-vision-domain.ipynb · Colab | PyTorch site/camera |
| T4B multilayer vision | t04b-multilayer-vision.ipynb · Colab | Per-layer feature-diff PMH |

Read the theory: main.pdf · Block summary: findings.html


Documentation map

| Doc | Role |
|---|---|
| main.pdf | Full theory, theorems, thirteen blocks |
| docs/START.md | Golden path: try_pmh, auto shift type, ship verdict |
| docs/MIGRATE.md | CORAL, sklearn, HF, augmentation |
| docs/LOSS_SCALING.md | PMH vs task loss (5–30%, enforced cap) |
| docs/GLOSSARY.md | Plain language ↔ code |
| docs/PRINCIPLE.md | Short PMH spine ($\Sigma_{\text{task}}$, five steps, library vs paper) |
| docs/index.md | Site hub |
| docs/cookbook/ | Lightning + HF integration sketches |
| QUICKSTART.md | Install + commands |
| tasks/index.md | All tasks T1–T7 + deploy table |
| WHEN_PMH_HELPS.md | Fit, misfit, controls |
| api/index.md | PMHTrainer, presets, evaluate |

Links

PyPI · Documentation site · Contributing



Download files


Source Distribution

matching_pmh-2.0.0.tar.gz (3.7 MB)

Uploaded Source

Built Distribution


matching_pmh-2.0.0-py3-none-any.whl (163.2 kB)

Uploaded Python 3

File details

Details for the file matching_pmh-2.0.0.tar.gz.

File metadata

  • Download URL: matching_pmh-2.0.0.tar.gz
  • Upload date:
  • Size: 3.7 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for matching_pmh-2.0.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | c42b237ca7e82b3d8b0ed19c0047d95c1bf74b9605765812a59559ec50021b76 |
| MD5 | 3ba5f31566910b50a3488d4f77c33d51 |
| BLAKE2b-256 | 256221ad2040134adbaedc193e61357b2fc2e91fb413e3a23f46b66fd41d838d |


File details

Details for the file matching_pmh-2.0.0-py3-none-any.whl.

File metadata

  • Download URL: matching_pmh-2.0.0-py3-none-any.whl
  • Upload date:
  • Size: 163.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for matching_pmh-2.0.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5a16bf0da5fe6fa4781afcd9e8cc6cdebde8e7b189cc48d23ccf29b7886e006a |
| MD5 | 07709388f055fd800f9dff7faebc95c6 |
| BLAKE2b-256 | 5c3ac8be7c631c9673769bf604d0388e1a9d9b70ef3849541cea269e0201fb74 |
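To check a downloaded file against the published digests, here is a short standard-library sketch using hashlib. It streams the file and compares its SHA256 to the published value. PUBLISHED_SHA256 simply copies the two SHA256 rows listed on this page, and the helper names are illustrative.

```python
import hashlib
from pathlib import Path

# SHA256 digests as published on this page for the 2.0.0 release files.
PUBLISHED_SHA256 = {
    "matching_pmh-2.0.0.tar.gz": "c42b237ca7e82b3d8b0ed19c0047d95c1bf74b9605765812a59559ec50021b76",
    "matching_pmh-2.0.0-py3-none-any.whl": "5a16bf0da5fe6fa4781afcd9e8cc6cdebde8e7b189cc48d23ccf29b7886e006a",
}

def sha256_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so large archives are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published_sha256(path) -> bool:
    """True only if the file name is a known release file and its digest matches."""
    expected = PUBLISHED_SHA256.get(Path(path).name)
    return expected is not None and sha256_of_file(path) == expected
```

pip can enforce the same check at install time in hash-checking mode. Use a requirements line such as matching-pmh==2.0.0 with --hash=sha256:… options. That mode requires every dependency to be pinned with hashes as well.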

