Architecture-agnostic matching principle: estimate Sigma_task (D1-D7) and train any encoder with matched PMH penalties
matching-pmh
Deployment geometry in. Matched robustness out.
Estimate Sigma_task (D1–D7) · train any encoder with matched PMH · falsify with controls
matching-pmh is a research-grade PyTorch library for the Matching Principle:
- Name what changes at deployment without changing the label.
- Estimate that nuisance geometry Sigma_task (covariance of label-preserving deployment variation).
- Add a matched Jacobian penalty on your representations h = phi_theta(x).
Works with your stack: ResNet, ViT, GNN, Whisper-style encoders, causal LMs with LoRA, or frozen features + sklearn. Full math (LaTeX): docs/THEORY.md.
Design goal: two phases, one hook tensor h, and no framework lock-in. It is not a paper reproduction kit.
New users start here: Getting started (adoption guide) → Choose your setup → Gallery templates
30-second start
```shell
pip install matching-pmh
python examples/01_domain_shift_d4.py
pmh-train list-methods
```

```python
from pmh import PMHMatcher, PMHTrainer, PMHConfig

# NumPy / sklearn frozen features
matcher = PMHMatcher(nuisance="domain_shift", rank=32).fit(x_source, x_target)

# PyTorch: estimate + train in one call
trainer = PMHTrainer(model, hook="backbone", nuisance="auto", pmh_config=PMHConfig.balanced())
trainer.fit(train_loader, source_batches=src_loader, target_batches=tgt_loader, epochs=20)
```
Getting started · Choose setup · Troubleshooting · 18 walkthroughs
Problem, object, repair, unification
| Aspect | Summary |
|---|---|
| Problem | ERM uses every input direction that predicts labels, including nuisances that are harmful at deployment (lighting, site, sensor noise, formatting, renameable identifiers, …). |
| Object | Sigma_task = covariance of the label-preserving deployment nuisance n (under law Q_n). |
| Repair | Matched PMH shrinks encoder sensitivity along Sigma_task rather than uniformly in every direction, as isotropic PMH or generic VAT would. |
| Unification | CORAL, domain Grams, augmentation stacks, metric-learning directions, adversarial subspaces, and style Grams are different estimators of the same Sigma_task (Lemmas D1–D7). |
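To make the "Object" row concrete, here is a minimal NumPy sketch of one way to turn label-preserving displacements into a low-rank covariance. It is an illustration only: the function name `estimate_sigma`, the paired-difference input format, and the synthetic data are assumptions made for this example, and it is not a description of how any of the D1–D7 estimators is implemented.

```python
import numpy as np

def estimate_sigma(deltas: np.ndarray, rank: int) -> np.ndarray:
    """Rank-truncated covariance of nuisance displacements.

    deltas: (n_samples, dim) array of differences between two views of the
            same labelled example, taken in whatever space Sigma_task is
            defined (hypothetical input format, not the library's API).
    """
    centered = deltas - deltas.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / max(len(deltas) - 1, 1)
    # Symmetric eigendecomposition; keep the `rank` largest eigenpairs.
    eigvals, eigvecs = np.linalg.eigh(cov)
    top = np.argsort(eigvals)[::-1][:rank]
    vals = np.clip(eigvals[top], 0.0, None)
    return (eigvecs[:, top] * vals) @ eigvecs[:, top].T

rng = np.random.default_rng(0)
dim = 8
# Synthetic nuisance that moves only the first two coordinates.
deltas = np.zeros((500, dim))
deltas[:, :2] = rng.normal(scale=[3.0, 1.5], size=(500, 2))

sigma_hat = estimate_sigma(deltas, rank=2)
print(np.round(np.diag(sigma_hat), 2))
```

The diagonal of `sigma_hat` should be large only on the two coordinates the synthetic nuisance touches, which is the sense in which the estimate encodes "what changes without changing the label".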
Matched loss (schematic): L = L_task + lambda * Tr(J_phi^T J_phi Sigma') with range(Sigma') covering range(Sigma_task). Details: THEORY.md.
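The trace term in the schematic loss can be evaluated without forming the Jacobian explicitly. Writing Sigma' = sum_i s_i u_i u_i^T gives Tr(J^T J Sigma') = sum_i s_i ||J u_i||^2, so only Jacobian-vector products along the nuisance directions are needed. The sketch below checks this on a toy encoder h = tanh(W x); the encoder, the finite-difference approximation, and the choice to place Sigma' in the encoder's input space are all assumptions for illustration, not the library's PMHLoss.

```python
import numpy as np

def encoder(x: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Toy encoder h = tanh(W x), standing in for phi_theta."""
    return np.tanh(W @ x)

def matched_penalty(x, W, sigma, eps=1e-4):
    """Approximate Tr(J^T J Sigma) at x with directional finite differences."""
    s, U = np.linalg.eigh(sigma)
    total = 0.0
    for s_i, u_i in zip(s, U.T):
        if s_i <= 1e-12:  # skip directions outside range(Sigma)
            continue
        # Central difference approximates the Jacobian-vector product J u_i.
        jvp = (encoder(x + eps * u_i, W) - encoder(x - eps * u_i, W)) / (2 * eps)
        total += s_i * float(jvp @ jvp)
    return total

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 6)) / np.sqrt(6)
x = rng.normal(size=6)
sigma = np.diag([2.0, 0.5, 0.0, 0.0, 0.0, 0.0])  # nuisance in two input directions

# Closed form for this toy encoder: J = diag(1 - h^2) W.
h = encoder(x, W)
J = (1.0 - h**2)[:, None] * W
exact = float(np.trace(J.T @ J @ sigma))
approx = matched_penalty(x, W, sigma)
print(exact, approx)
```

Because the loop runs once per nonzero eigenvalue, the cost scales with the rank of Sigma' rather than with the input dimension, which is one reason a low-rank estimate is convenient.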
How it fits your codebase
```text
Phase A (once)              Phase B (every step)
----------------            --------------------
source/target data    ->    x, y from your loader
        |                           |
encoder (eval)        ->    encoder (train) -> h
        |                           |
estimate D1-D7        ->    L_task(h, y) + PMHLoss(h, Sigma_hat)
        |
artifact.pt
```
| You keep | Library adds |
|---|---|
| Model, optimizer, task loss | SigmaTaskConfig, estimate_from_config |
| Data loaders | collect_features (optional) |
| Training loop / Trainer | PMHLoss.capped_total or PMHTrainer |
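The division of labour above can be sketched end to end on a toy problem. The example below uses a linear scalar "encoder" w, for which J = w^T and the matched term reduces to w^T Sigma w. Phase A is represented by a hand-declared Sigma_hat and Phase B by a plain gradient-descent loop. Everything here (data, `train`, the step size, lambda) is hypothetical and written in NumPy, so it shows the shape of the workflow and not the behaviour of PMHLoss or PMHTrainer.

```python
import numpy as np

rng = np.random.default_rng(0)
n, dim = 400, 3

# Training data: coordinate 0 carries the signal, coordinate 1 is a nuisance
# that happens to track the label in training, coordinate 2 is pure noise.
s = rng.normal(size=n)
X = np.column_stack([
    s + 0.5 * rng.normal(size=n),
    s + 0.1 * rng.normal(size=n),
    rng.normal(size=n),
])
y = s

# Phase A (once): nuisance geometry. Declared by hand here; in practice it
# would be estimated from source/target data and stored as an artifact.
sigma_hat = np.diag([0.0, 1.0, 0.0])

def train(lam: float, steps: int = 2000, lr: float = 0.05) -> np.ndarray:
    """Phase B: gradient descent on mean squared error + lam * w^T Sigma w."""
    w = np.zeros(dim)
    for _ in range(steps):
        residual = X @ w - y
        grad_task = 2.0 * X.T @ residual / n
        grad_pmh = 2.0 * lam * sigma_hat @ w  # gradient of the matched term
        w -= lr * (grad_task + grad_pmh)
    return w

w_erm = train(lam=0.0)
w_matched = train(lam=5.0)
print("ERM weights:    ", np.round(w_erm, 3))
print("matched weights:", np.round(w_matched, 3))
```

With lam = 0 the fit leans on the nuisance coordinate because it is the cleaner predictor in training; with the matched term the weight on that coordinate shrinks and the signal coordinate takes over, while the unrelated noise coordinate is left unpenalised.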
Walkthroughs (18 templates)
| # | Guide | Run |
|---|---|---|
| 1 | PyTorch + D4 | examples/01_domain_shift_d4.py |
| 2 | ResNet + D4 | examples/12_resnet_hook_d4.py |
| 3 | Office-31 + sklearn | examples/06_office31_sklearn.py |
| 4 | Multi-layer CNN | examples/07_vision_multilayer.py |
| 5 | Compositional D5 | examples/13_compositional_train_d5.py |
| 6 | LLM style D7 | examples/08_hf_style_d7.py |
| 7 | HF Trainer + DPO | examples/11_dpo_lora_style_pmh.py |
| 8 | Falsification controls | examples/04_falsification_controls.py |
| 9 | CLI JSON jobs | pmh-train estimate --config ... |
| 10 | Lightning | examples/09_lightning_module.py |
| 11 | Temporal D6 | API in guide |
| 12 | ViT / CLS + D4 | examples/14_vit_cls_d4.py |
| 13 | Speech encoder + D4 | examples/15_speech_encoder_d4.py |
| 14 | QM9 / molecules D5 | examples/16_qm9_molecule_d5.py |
| 15 | Code / tokens D5 | examples/17_code_tokens_d5.py |
| 16 | Augmentations D3 | examples/18_augmentation_d3.py |
| 17 | Compare arms on your pipeline | examples/20_compare_training_arms.py |
| 18 | PMHTrainer quickstart | examples/01_domain_shift_d4.py |
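Walkthrough 8 concerns falsification controls. One plausible pair of controls, assumed here for illustration and not taken from the example script, keeps the penalty budget fixed while destroying the match: a covariance with the same eigenvalues as Sigma_task but random directions, and an isotropic covariance with the same trace. If matched PMH helps for the stated reason, training against either control should help less. The helper names below are hypothetical.

```python
import numpy as np

def random_subspace_control(sigma: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Covariance with the same eigenvalues as sigma but random eigenvectors."""
    eigvals = np.linalg.eigvalsh(sigma)
    # QR of a Gaussian matrix yields a random orthonormal basis.
    q, _ = np.linalg.qr(rng.normal(size=sigma.shape))
    return (q * eigvals) @ q.T

def isotropic_control(sigma: np.ndarray) -> np.ndarray:
    """Isotropic covariance with the same trace as sigma."""
    dim = sigma.shape[0]
    return np.eye(dim) * np.trace(sigma) / dim

rng = np.random.default_rng(0)
sigma_task = np.diag([4.0, 1.0, 0.0, 0.0, 0.0])
sigma_random = random_subspace_control(sigma_task, rng)
sigma_iso = isotropic_control(sigma_task)

for name, m in [("task", sigma_task), ("random", sigma_random), ("iso", sigma_iso)]:
    print(name, "trace =", round(float(np.trace(m)), 6))
```

All three matrices carry the same total variance, so any difference in downstream robustness can be attributed to direction rather than to penalty strength.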
Estimators at a glance (D1–D7)
| Deployment story | Method | SigmaTaskConfig |
|---|---|---|
| Different site / camera / corpus; P(y given x) stable | D4 | SigmaTaskConfig.for_domain(rank=32) |
| Low-rank shift; labels on both domains | D1 | SigmaTaskConfig.for_subspace(rank=32) |
| Unstructured sensor / acquisition noise | D2 | SigmaTaskConfig.for_isotropic(dim, noise_level) |
| Known augmentation modes (color, blur, crop, …) | D3 | SigmaTaskConfig.for_augmentation() + aug_deltas |
| Nuisance on specific coordinates (atoms, tokens) | D5 | SigmaTaskConfig.for_compositional(indices) |
| Drift along time within a sequence | D6 | SigmaTaskConfig.for_temporal() |
| LLM style / format; semantics fixed | D7 | SigmaTaskConfig.for_alignment(rank=32) |
pmh-train list-methods
Hybrid nuisances: estimate separate Sigma matrices and add separate PMHLoss terms.
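Keeping the terms separate is mathematically equivalent to penalising a weighted sum of the covariances, since the trace is linear in Sigma. The separate form simply makes each weight individually tunable. A small NumPy check with an explicit Jacobian follows; `pmh_term`, the two covariances, and the weights are hypothetical stand-ins rather than library calls.

```python
import numpy as np

def pmh_term(J: np.ndarray, sigma: np.ndarray) -> float:
    """Tr(J^T J Sigma) for an explicit Jacobian J of shape (hidden, input)."""
    return float(np.trace(J.T @ J @ sigma))

rng = np.random.default_rng(0)
J = rng.normal(size=(3, 5))

sigma_domain = np.diag([1.0, 1.0, 0.0, 0.0, 0.0])  # e.g. site or camera shift
sigma_noise = 0.1 * np.eye(5)                       # e.g. unstructured sensor noise
lam_domain, lam_noise = 1.0, 0.25

separate = lam_domain * pmh_term(J, sigma_domain) + lam_noise * pmh_term(J, sigma_noise)
combined = pmh_term(J, lam_domain * sigma_domain + lam_noise * sigma_noise)
print(separate, combined)
```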
Install
pip install matching-pmh
| Extra | Use case |
|---|---|
| pip install "matching-pmh[vision]" | ResNet / ViT examples |
| pip install "matching-pmh[hf]" | D7 style Gram (Transformers) |
| pip install "matching-pmh[hf-lora]" | LoRA + DPO example |
| pip install "matching-pmh[sklearn,vision]" | Office-31 pipeline |
| pip install "matching-pmh[lightning]" | Lightning callback |
| pip install "matching-pmh[all]" | Development + docs |
From source:
```shell
git clone https://github.com/vishalstark512/matching-pmh.git
cd matching-pmh && pip install -e ".[dev]" && pytest -q
```
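After either install route, a quick way to confirm which version the current environment sees is to query the distribution metadata from the standard library. The helper name `installed_version` is made up for this sketch; `importlib.metadata` itself ships with Python 3.8 and later.

```python
from importlib import metadata

def installed_version(dist_name: str):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None

version = installed_version("matching-pmh")
print("matching-pmh:", version if version else "not installed")
```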
Documentation
| Document | Purpose |
|---|---|
| GETTING_STARTED.md | Main adoption guide (start here) |
| CHOOSE_YOUR_SETUP.md | Pick API by stack and data |
| TROUBLESHOOTING.md | Errors, preflight, hook dim |
| gallery/ | Copy-paste: vision / tabular / NLP |
| hooks.md | ResNet, timm, HF hooks |
| ADAPT_YOUR_PIPELINE.md | Integration checklist |
| walkthroughs/ | 18 stack-specific tutorials |
| THEORY.md | Mathematics |
Citation
Cite the Grand Unification / Matching Principle manuscript. See CITATION.cff in the repository.
```bibtex
@software{matching_pmh,
  title = {matching-pmh: Matched PMH training from estimated deployment nuisance geometry},
  author = {Rajput, Vishal},
  year = {2026},
  url = {https://github.com/vishalstark512/matching-pmh}
}
```
Contributing
See CONTRIBUTING.md.
License
MIT — see LICENSE.
File details
Details for the file matching_pmh-1.2.0.tar.gz.
File metadata
- Download URL: matching_pmh-1.2.0.tar.gz
- Upload date:
- Size: 116.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5a1cc79b2db0b9ab89d7d17d56618c4f37616e62c437298c169ed6d047fcbece |
| MD5 | 4d3f6d6a0ea6ef41785dab63b7741a0e |
| BLAKE2b-256 | 023a86e656c5e8229aa61a5f87d23dc71b70ae028e0d076815625f77095c9664 |
File details
Details for the file matching_pmh-1.2.0-py3-none-any.whl.
File metadata
- Download URL: matching_pmh-1.2.0-py3-none-any.whl
- Upload date:
- Size: 70.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d4d77c0870552d18c82291240be4d8d2a1972c3e6e60b85038776059863ac8f0 |
| MD5 | e05d165c1cc2ba4f8ca4f0c735b6b675 |
| BLAKE2b-256 | 5555c9f46a939b272b3e786611f5f285d89db658aaac1ef08baf633b8dd4aaaf |
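A downloaded archive can be checked against the SHA256 digests listed in the file details using only the standard library. The sketch below streams the file in chunks so it also works for large files. The function names are illustrative, and the local file path you pass in is whatever location you downloaded to.

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA256 and return the hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, call `matches(Path("matching_pmh-1.2.0.tar.gz"), expected)` with `expected` set to the SHA256 digest given for the source distribution. A result of `False` means the file differs from the published one and should not be installed.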