Matching Principle for ML: estimate deployment nuisance geometry (Sigma_task, D1-D7) and train any encoder with matched PMH on your representations
matching-pmh
Deployment geometry in. Matched robustness out.
Estimate Sigma_task (D1–D7) · train any encoder with matched PMH · falsify with controls
matching-pmh implements the Matching Principle: make your encoder robust along the directions that actually shift between train and deploy—not every input direction that happens to correlate with labels.
| Step | What you do | What the library does |
|---|---|---|
| 1. Nuisance | Name what can change at deployment without changing the label (site, lighting, sensor, format, …). | Registry + suggest_nuisance / nuisance="auto" to pick an estimator family. |
| 2. Geometry | Collect source/target (or augmentation) batches that expose that variation. | Estimate Σ_task — covariance of label-preserving deployment nuisance — via D1–D7 (shift, augment, sequence, style Gram, …). |
| 3. Training | Keep your task loss; point a hook at representations h = φ_θ(x). | Add matched PMH: shrink Jacobian sensitivity along Σ_task (matched penalty), not isotropic VAT/CORAL-on-weights alone. |
Your stack, your hook. ResNet / ViT (timm, torchvision), GNN mean-pool, Whisper-style encoders, causal LMs with LoRA, or frozen features + PMHMatcher / sklearn. Two phases (estimate once → train every step), one tensor h, no framework lock-in.
Theory: definitions, lemmas, and loss forms in docs/THEORY.md (LaTeX-friendly).
Not a paper reproduction kit — adapt your own pipeline.
Start here: Getting started → Choose your setup → Gallery
30-second start
pip install matching-pmh
python examples/01_domain_shift_d4.py
pmh-train list-methods
from pmh import PMHMatcher, PMHTrainer, PMHConfig
# NumPy / sklearn frozen features
matcher = PMHMatcher(nuisance="domain_shift", rank=32).fit(x_source, x_target)
# PyTorch — estimate + train in one call
trainer = PMHTrainer(model, hook="backbone", nuisance="auto", pmh_config=PMHConfig.balanced())
trainer.fit(train_loader, source_batches=src_loader, target_batches=tgt_loader, epochs=20)
Getting started · Choose setup · Benchmarks & TDI · Troubleshooting · 18 walkthroughs
Problem, object, repair, unification
| Concept | Summary |
|---|---|
| Problem | ERM uses every input direction that predicts labels, including nuisances that are harmful at deployment (lighting, site, sensor noise, formatting, renameable identifiers, …). |
| Object | Sigma_task = covariance of the label-preserving deployment nuisance n (under law Q_n). |
| Repair | Matched PMH shrinks encoder sensitivity along Sigma_task rather than uniformly in all directions, as isotropic PMH or generic VAT does. |
| Unification | CORAL, domain Grams, augmentation stacks, metric-learning directions, adversarial subspaces, and style Grams are different estimators of the same Sigma_task (Lemmas D1–D7). |
Matched loss (schematic): L = L_task + lambda * Tr(J_phi^T J_phi Sigma') with range(Sigma') covering range(Sigma_task). Details: THEORY.md.
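To make the schematic trace term concrete, here is a minimal NumPy sketch for the special case of a linear encoder h = W x, where the Jacobian J_phi is simply W. It is an illustration under stated assumptions, not the library's PMHLoss: the names `matched_penalty`, `W`, `U`, and `W_insensitive` are hypothetical, and the low-rank covariance is synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank = 8, 4, 2

# Hypothetical linear encoder h = W x, so the Jacobian J_phi equals W everywhere.
W = rng.normal(size=(d_out, d_in))

# Hypothetical low-rank nuisance covariance Sigma' = U U^T (rank 2).
U = rng.normal(size=(d_in, rank))
sigma = U @ U.T


def matched_penalty(jac, cov):
    """Schematic matched term Tr(J^T J Sigma')."""
    return float(np.trace(jac.T @ jac @ cov))


penalty = matched_penalty(W, sigma)

# The same quantity is the expected squared output displacement E||J n||^2
# for n ~ N(0, Sigma'), which for Sigma' = U U^T is ||J U||_F^2.
assert np.isclose(penalty, np.linalg.norm(W @ U) ** 2)

# Projecting the nuisance directions out of the encoder input drives the
# penalty to zero and leaves directions orthogonal to range(Sigma') untouched.
P = U @ np.linalg.pinv(U)              # orthogonal projector onto range(Sigma')
W_insensitive = W @ (np.eye(d_in) - P)
print(penalty, matched_penalty(W_insensitive, sigma))
```

The second printed value is zero up to floating-point error: the penalty only constrains the encoder along range(Sigma'), which is the sense in which the term is matched rather than isotropic.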
How it fits your codebase
Phase A (once) Phase B (every step)
---------------- --------------------
source/target data -> x, y from your loader
| |
encoder (eval) -> encoder (train) -> h
| |
estimate D1-D7 -> L_task(h, y) + PMHLoss(h, Sigma_hat)
|
artifact.pt
| You keep | Library adds |
|---|---|
| Model, optimizer, task loss | SigmaTaskConfig, estimate_from_config |
| Data loaders | collect_features (optional) |
| Training loop / Trainer | PMHLoss.capped_total or PMHTrainer |
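The two-phase split can be sketched without any framework. The example below is a hedged toy version in NumPy: Phase A builds a rank-1 stand-in for Sigma_hat from the mean shift between source and target batches (an assumption made for brevity; the library's D1–D7 estimators may differ and may work in feature space rather than input space), and Phase B runs gradient descent on a task loss plus the trace penalty with Sigma_hat held fixed. All names here (`train`, `sensitivity`, `sigma_hat`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_in, d_out = 256, 6, 3

# Phase A (once): estimate a nuisance covariance from source/target batches.
# Assumption: a rank-1 stand-in built from the mean shift between domains.
shift_dir = np.zeros(d_in)
shift_dir[0] = 1.0
x_source = rng.normal(size=(n, d_in))
x_target = rng.normal(size=(n, d_in)) + 2.0 * shift_dir
delta = x_target.mean(axis=0) - x_source.mean(axis=0)
sigma_hat = np.outer(delta, delta)

# Phase B (every step): task loss plus matched penalty, Sigma_hat fixed.
w_true = rng.normal(size=(d_out, d_in))
y_source = x_source @ w_true.T


def train(lam, steps=500, lr=0.01):
    """Gradient descent on MSE + lam * Tr(W Sigma_hat W^T) for a linear encoder."""
    w = np.zeros((d_out, d_in))
    for _ in range(steps):
        resid = x_source @ w.T - y_source           # shape (n, d_out)
        grad_task = 2.0 * resid.T @ x_source / n    # gradient of the mean squared error
        grad_pmh = 2.0 * lam * w @ sigma_hat        # gradient of the trace penalty
        w -= lr * (grad_task + grad_pmh)
    return w


def sensitivity(w):
    """Tr(W Sigma_hat W^T): encoder sensitivity along the estimated nuisance."""
    return float(np.trace(w @ sigma_hat @ w.T))


w_erm = train(lam=0.0)   # task loss only
w_pmh = train(lam=1.0)   # task loss + matched penalty
print(sensitivity(w_erm), sensitivity(w_pmh))
```

In this toy setup the penalized run ends with lower sensitivity along the shift direction than the task-only run, at the cost of some fit on that coordinate. In a real pipeline the model, optimizer, and task loss stay yours, and only the extra penalty term is added.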
Walkthroughs (18 templates)
| # | Guide | Run |
|---|---|---|
| 1 | PyTorch + D4 | examples/01_domain_shift_d4.py |
| 2 | ResNet + D4 | examples/12_resnet_hook_d4.py |
| 3 | Office-31 + sklearn | examples/06_office31_sklearn.py |
| 4 | Multi-layer CNN | examples/07_vision_multilayer.py |
| 5 | Compositional D5 | examples/13_compositional_train_d5.py |
| 6 | LLM style D7 | examples/08_hf_style_d7.py |
| 7 | HF Trainer + DPO | examples/11_dpo_lora_style_pmh.py |
| 8 | Falsification controls | examples/04_falsification_controls.py |
| 9 | CLI JSON jobs | pmh-train estimate --config ... |
| 10 | Lightning | examples/09_lightning_module.py |
| 11 | Temporal D6 | API in guide |
| 12 | ViT / CLS + D4 | examples/14_vit_cls_d4.py |
| 13 | Speech encoder + D4 | examples/15_speech_encoder_d4.py |
| 14 | QM9 / molecules D5 | examples/16_qm9_molecule_d5.py |
| 15 | Code / tokens D5 | examples/17_code_tokens_d5.py |
| 16 | Augmentations D3 | examples/18_augmentation_d3.py |
| 17 | Compare arms on your pipeline | examples/20_compare_training_arms.py |
| 18 | PMHTrainer quickstart | examples/01_domain_shift_d4.py |
Estimators at a glance (D1–D7)
| Deployment story | Method | SigmaTaskConfig |
|---|---|---|
| Different site / camera / corpus; P(y given x) stable | D4 | SigmaTaskConfig.for_domain(rank=32) |
| Low-rank shift; labels on both domains | D1 | SigmaTaskConfig.for_subspace(rank=32) |
| Unstructured sensor / acquisition noise | D2 | SigmaTaskConfig.for_isotropic(dim, noise_level) |
| Known augmentation modes (color, blur, crop, …) | D3 | SigmaTaskConfig.for_augmentation() + aug_deltas |
| Nuisance on specific coordinates (atoms, tokens) | D5 | SigmaTaskConfig.for_compositional(indices) |
| Drift along time within a sequence | D6 | SigmaTaskConfig.for_temporal() |
| LLM style / format; semantics fixed | D7 | SigmaTaskConfig.for_alignment(rank=32) |
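As a rough illustration of the augmentation row (D3), one natural way to turn augmentation deltas into a covariance estimate is to take their second-moment matrix and keep the top eigen-directions. The sketch below assumes that reading; `covariance_from_deltas` is a hypothetical helper, not the library's estimator, and the deltas are synthetic.

```python
import numpy as np


def covariance_from_deltas(deltas, rank=None):
    """Second-moment matrix of deltas, optionally truncated to the top `rank` eigen-directions."""
    deltas = np.asarray(deltas, dtype=float)
    cov = deltas.T @ deltas / deltas.shape[0]
    if rank is None:
        return cov
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:rank]        # indices of the largest eigenvalues
    return (eigvecs[:, top] * eigvals[top]) @ eigvecs[:, top].T


rng = np.random.default_rng(0)
dim = 5

# Hypothetical augmentation deltas: strong variation on coordinates 0 and 1,
# plus a small amount of unstructured noise everywhere.
modes = np.zeros((2, dim))
modes[0, 0] = 3.0
modes[1, 1] = 2.0
coeffs = rng.normal(size=(1000, 2))
aug_deltas = coeffs @ modes + 0.01 * rng.normal(size=(1000, dim))

sigma_aug = covariance_from_deltas(aug_deltas, rank=2)
print(np.round(np.diag(sigma_aug), 3))
```

The rank truncation discards the small unstructured component, so the resulting matrix concentrates on the two augmentation modes. The isotropic row (D2) corresponds to the opposite extreme, a covariance proportional to the identity.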
pmh-train list-methods
Hybrid nuisances: estimate separate Sigma matrices and add separate PMHLoss terms.
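For the schematic trace form of the penalty, adding separate terms is equivalent to penalizing against one weighted sum of the covariances, because the trace is linear. The NumPy check below illustrates this with two synthetic covariances; the names `sigma_shift`, `sigma_style`, and `penalty` are hypothetical and do not refer to the library's API.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 6, 3
jac = rng.normal(size=(d_out, d_in))   # hypothetical encoder Jacobian at one input


def random_psd(rank):
    """Random positive semidefinite matrix of the given rank."""
    a = rng.normal(size=(d_in, rank))
    return a @ a.T


# Two hypothetical nuisance covariances with their own weights.
sigma_shift, sigma_style = random_psd(2), random_psd(1)
lam_shift, lam_style = 0.5, 2.0


def penalty(j, cov):
    """Schematic matched term Tr(J^T J Sigma)."""
    return float(np.trace(j.T @ j @ cov))


separate = lam_shift * penalty(jac, sigma_shift) + lam_style * penalty(jac, sigma_style)
combined = penalty(jac, lam_shift * sigma_shift + lam_style * sigma_style)
print(separate, combined)
```

Keeping the terms separate still has practical value: each nuisance gets its own weight and can be logged on its own. If a term is capped or normalized per nuisance, the equivalence with a single summed covariance need not hold.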
Install
pip install matching-pmh
| Extra | Use case |
|---|---|
| pip install "matching-pmh[vision]" | ResNet / ViT examples |
| pip install "matching-pmh[hf]" | D7 style Gram (Transformers) |
| pip install "matching-pmh[hf-lora]" | LoRA + DPO example |
| pip install "matching-pmh[sklearn,vision]" | Office-31 pipeline |
| pip install "matching-pmh[lightning]" | Lightning callback |
| pip install "matching-pmh[all]" | Development + docs |
From source:
git clone https://github.com/vishalstark512/matching-pmh.git
cd matching-pmh && pip install -e ".[dev]" && pytest -q
Documentation
| Document | Purpose |
|---|---|
| GETTING_STARTED.md | Main adoption guide (start here) |
| CHOOSE_YOUR_SETUP.md | Pick API by stack and data |
| TROUBLESHOOTING.md | Errors, preflight, hook dim |
| gallery/ | Copy-paste: vision / tabular / NLP |
| hooks.md | ResNet, timm, HF hooks |
| ADAPT_YOUR_PIPELINE.md | Integration checklist |
| walkthroughs/ | 18 stack-specific tutorials |
| THEORY.md | Mathematics |
Citation
Cite the Grand Unification / Matching Principle manuscript. See CITATION.cff in the repository.
@software{matching_pmh,
title = {matching-pmh: Matched PMH training from estimated deployment nuisance geometry},
author = {Rajput, Vishal},
year = {2026},
url = {https://github.com/vishalstark512/matching-pmh}
}
Contributing
See CONTRIBUTING.md.
License
MIT — see LICENSE.