Matching Principle for ML: estimate deployment nuisance geometry (Sigma_task, D1-D7) and train any encoder with matched PMH on your representations

matching-pmh

Deployment geometry in. Matched robustness out.

Estimate Sigma_task (D1–D7) · train any encoder with matched PMH · falsify with controls

PyPI · GitHub · Walkthroughs · Theory · Quickstart


matching-pmh implements the Matching Principle: make your encoder robust along the directions that actually shift between training and deployment, rather than along every input direction that happens to correlate with labels.

| Step | What you do | What the library does |
|---|---|---|
| 1. Nuisance | Name what can change at deployment without changing the label (site, lighting, sensor, format, …). | Registry + suggest_nuisance / nuisance="auto" to pick an estimator family. |
| 2. Geometry | Collect source/target (or augmentation) batches that expose that variation. | Estimate Σ_task (the covariance of label-preserving deployment nuisance) via D1–D7 (shift, augment, sequence, style Gram, …). |
| 3. Training | Keep your task loss; point a hook at representations h = φ_θ(x). | Add matched PMH: shrink Jacobian sensitivity along Σ_task (matched penalty), not isotropic VAT/CORAL-on-weights alone. |

Your stack, your hook. ResNet / ViT (timm, torchvision), GNN mean-pool, Whisper-style encoders, causal LMs with LoRA, or frozen features + PMHMatcher / sklearn. Two phases (estimate once → train every step), one tensor h, no framework lock-in.
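
As a rough illustration of what "one tensor h" can mean in PyTorch, the sketch below captures an intermediate representation with a standard forward hook. It does not use matching-pmh: the toy model and the module names backbone and head are hypothetical stand-ins for your own network, and the library's own hook mechanism may differ.

```python
import torch
import torch.nn as nn

# Toy model standing in for your own network; the submodule names
# "backbone" and "head" are illustrative only.
model = nn.Sequential()
model.add_module("backbone", nn.Sequential(nn.Linear(8, 16), nn.ReLU()))
model.add_module("head", nn.Linear(16, 3))

captured = {}

def save_representation(module, inputs, output):
    # Store the output without detaching it, so that a penalty computed
    # on h can still backpropagate into the backbone.
    captured["h"] = output

handle = model.backbone.register_forward_hook(save_representation)

x = torch.randn(4, 8)
logits = model(x)      # the hook fires during this forward pass
h = captured["h"]      # representation tensor, one row per example
handle.remove()        # detach the hook when it is no longer needed
```

Keeping h attached to the autograd graph is the design choice that matters here: a regularizer on h only influences the encoder if gradients can flow back through it.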

Theory: definitions, lemmas, and loss forms in docs/THEORY.md (LaTeX-friendly).

Not a paper reproduction kit: adapt it to your own pipeline.
Start here: Getting started · Choose your setup · Gallery


30-second start

pip install matching-pmh
python examples/01_domain_shift_d4.py
pmh-train list-methods

from pmh import PMHMatcher, PMHTrainer, PMHConfig

# NumPy / sklearn frozen features
matcher = PMHMatcher(nuisance="domain_shift", rank=32).fit(x_source, x_target)

# PyTorch — estimate + train in one call
trainer = PMHTrainer(model, hook="backbone", nuisance="auto", pmh_config=PMHConfig.balanced())
trainer.fit(train_loader, source_batches=src_loader, target_batches=tgt_loader, epochs=20)

Getting started · Choose setup · Benchmarks & TDI · Troubleshooting · 18 walkthroughs


Problem, object, repair, unification

Problem: ERM uses every input direction that predicts labels, including nuisances that are harmful at deployment (lighting, site, sensor noise, formatting, renameable identifiers, …).
Object: Sigma_task = covariance of label-preserving deployment nuisance n (under law Q_n).
Repair: Matched PMH shrinks encoder sensitivity along Sigma_task, not uniformly (isotropic PMH / generic VAT).
Unification: CORAL, domain Grams, augmentation stacks, metric-learning directions, adversarial subspaces, and style Grams are different estimators of the same Sigma_task (Lemmas D1–D7).

Matched loss (schematic): L = L_task + lambda * Tr(J_phi^T J_phi Sigma') with range(Sigma') covering range(Sigma_task). Details: THEORY.md.
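
To make the trace term concrete, the following NumPy sketch checks numerically that Tr(J^T J Sigma) equals the summed squared response of the encoder to each nuisance factor when Sigma = A A^T, and that Sigma = I reduces it to the plain squared Frobenius norm of J (the isotropic case). The matrices J and A are random placeholders, not outputs of the library.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h, rank = 6, 4, 2

J = rng.normal(size=(d_h, d_in))     # placeholder Jacobian of phi at one input
A = rng.normal(size=(d_in, rank))    # placeholder nuisance factors
Sigma = A @ A.T                      # low-rank nuisance covariance

# Matched penalty term from the schematic loss.
matched = np.trace(J.T @ J @ Sigma)

# Same quantity written as squared responses to each nuisance factor:
# Tr(J^T J A A^T) = sum_k ||J a_k||^2.
per_direction = sum(np.sum((J @ A[:, k]) ** 2) for k in range(rank))

# With Sigma = I the penalty treats all input directions alike.
isotropic = np.trace(J.T @ J)
```

Read this way, the matched term only charges the encoder for moving along the columns of A; directions outside range(Sigma) are left unpenalized.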


How it fits your codebase

 Phase A (once)              Phase B (every step)
 ----------------              --------------------
 source/target data    ->      x, y from your loader
       |                            |
 encoder (eval)        ->      encoder (train) -> h
       |                            |
 estimate D1-D7        ->      L_task(h, y) + PMHLoss(h, Sigma_hat)
       |
 artifact.pt

| You keep | Library adds |
|---|---|
| Model, optimizer, task loss | SigmaTaskConfig, estimate_from_config |
| Data loaders | collect_features (optional) |
| Training loop / Trainer | PMHLoss.capped_total or PMHTrainer |
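
The two phases can be sketched end to end on a toy problem. This is a minimal NumPy illustration under stated assumptions, not the library's implementation: the encoder is a linear map w, Sigma_hat is written by hand rather than estimated, and the penalty w^T Sigma_hat w is the linear special case of Tr(J^T J Sigma). All names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Toy source data: coordinate 0 carries a noisy signal, coordinate 1 is a
# nuisance that happens to correlate with the label at training time.
z = rng.normal(size=n)
nuisance = 0.8 * z + 0.6 * rng.normal(size=n)
X = np.column_stack([z + 0.5 * rng.normal(size=n), nuisance])
y = z

# Phase A (once): a hand-written Sigma_hat saying "coordinate 1 may shift".
Sigma_hat = np.diag([0.0, 1.0])

def train(lam, steps=500):
    """Gradient descent on 0.5 * MSE + 0.5 * lam * w^T Sigma_hat w."""
    hessian = X.T @ X / n + lam * Sigma_hat
    lr = 1.0 / np.linalg.eigvalsh(hessian).max()   # step size that cannot diverge
    w = np.zeros(2)
    for _ in range(steps):                         # Phase B (every step)
        grad_task = X.T @ (X @ w - y) / n
        grad_pmh = lam * Sigma_hat @ w
        w -= lr * (grad_task + grad_pmh)
    return w

w_erm = train(lam=0.0)        # task loss only
w_matched = train(lam=5.0)    # task loss + matched penalty

# Sensitivity along the nuisance geometry for each solution.
sens_erm = w_erm @ Sigma_hat @ w_erm
sens_matched = w_matched @ Sigma_hat @ w_matched
```

Comparing sens_erm with sens_matched shows the intended effect: the penalized model leans less on the coordinate flagged by Sigma_hat while the task gradient is left unchanged.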

Walkthroughs (18 templates)

| # | Guide | Run |
|---|---|---|
| 1 | PyTorch + D4 | examples/01_domain_shift_d4.py |
| 2 | ResNet + D4 | examples/12_resnet_hook_d4.py |
| 3 | Office-31 + sklearn | examples/06_office31_sklearn.py |
| 4 | Multi-layer CNN | examples/07_vision_multilayer.py |
| 5 | Compositional D5 | examples/13_compositional_train_d5.py |
| 6 | LLM style D7 | examples/08_hf_style_d7.py |
| 7 | HF Trainer + DPO | examples/11_dpo_lora_style_pmh.py |
| 8 | Falsification controls | examples/04_falsification_controls.py |
| 9 | CLI JSON jobs | pmh-train estimate --config ... |
| 10 | Lightning | examples/09_lightning_module.py |
| 11 | Temporal D6 | API in guide |
| 12 | ViT / CLS + D4 | examples/14_vit_cls_d4.py |
| 13 | Speech encoder + D4 | examples/15_speech_encoder_d4.py |
| 14 | QM9 / molecules D5 | examples/16_qm9_molecule_d5.py |
| 15 | Code / tokens D5 | examples/17_code_tokens_d5.py |
| 16 | Augmentations D3 | examples/18_augmentation_d3.py |
| 17 | Compare arms on your pipeline | examples/20_compare_training_arms.py |
| 18 | PMHTrainer quickstart | examples/01_domain_shift_d4.py |

Estimators at a glance (D1–D7)

| Deployment story | Method | SigmaTaskConfig |
|---|---|---|
| Different site / camera / corpus; P(y given x) stable | D4 | SigmaTaskConfig.for_domain(rank=32) |
| Low-rank shift; labels on both domains | D1 | SigmaTaskConfig.for_subspace(rank=32) |
| Unstructured sensor / acquisition noise | D2 | SigmaTaskConfig.for_isotropic(dim, noise_level) |
| Known augmentation modes (color, blur, crop, …) | D3 | SigmaTaskConfig.for_augmentation() + aug_deltas |
| Nuisance on specific coordinates (atoms, tokens) | D5 | SigmaTaskConfig.for_compositional(indices) |
| Drift along time within a sequence | D6 | SigmaTaskConfig.for_temporal() |
| LLM style / format; semantics fixed | D7 | SigmaTaskConfig.for_alignment(rank=32) |

pmh-train list-methods
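
For intuition about what an estimator in this family might compute, here is a hedged NumPy sketch in the spirit of the augmentation row: it forms a second-moment matrix from displacement vectors (augmented minus clean) and keeps the top eigen-directions. The function sigma_from_deltas and the toy augmentation are hypothetical and are not the library's D3 implementation; the deltas are assumed to be roughly zero-mean, so no centering is applied.

```python
import numpy as np

def sigma_from_deltas(deltas, rank):
    """Low-rank second-moment estimate from rows of nuisance displacements."""
    deltas = np.asarray(deltas, dtype=float)
    sigma_full = deltas.T @ deltas / deltas.shape[0]
    eigvals, eigvecs = np.linalg.eigh(sigma_full)     # eigenvalues ascending
    top = np.argsort(eigvals)[::-1][:rank]            # indices of the largest
    U = eigvecs[:, top]
    lam = np.clip(eigvals[top], 0.0, None)            # guard against tiny negatives
    return (U * lam) @ U.T, U, lam

rng = np.random.default_rng(0)
x_clean = rng.normal(size=(200, 5))

# Hypothetical augmentation: perturbs only coordinates 0 and 1.
shift = np.zeros((200, 5))
shift[:, :2] = rng.normal(scale=[2.0, 1.0], size=(200, 2))
x_aug = x_clean + shift

Sigma_hat, U, lam = sigma_from_deltas(x_aug - x_clean, rank=2)
```

Because the toy augmentation never touches coordinates 2 to 4, the estimated Sigma_hat has no mass there, which is the property a matched penalty relies on.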

Hybrid nuisances: estimate separate Sigma matrices and add separate PMHLoss terms.
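
A small NumPy sketch of the hybrid case, assuming each term has the schematic form lambda * Tr(J^T J Sigma). The helper matched_penalty, the two Sigma matrices, and the weights are illustrative placeholders rather than library objects.

```python
import numpy as np

def matched_penalty(J, Sigma):
    """Tr(J^T J Sigma): squared sensitivity of the encoder along Sigma."""
    return float(np.trace(J.T @ J @ Sigma))

rng = np.random.default_rng(1)
J = rng.normal(size=(3, 4))                     # toy encoder Jacobian

Sigma_site = np.diag([1.0, 0.0, 0.0, 0.0])      # e.g. one site-shift direction
Sigma_noise = 0.01 * np.eye(4)                  # e.g. isotropic sensor noise

lam_site, lam_noise = 1.0, 0.5

# Two separate penalty terms, one per nuisance.
separate = (lam_site * matched_penalty(J, Sigma_site)
            + lam_noise * matched_penalty(J, Sigma_noise))

# By linearity of the trace, this equals one penalty on the weighted sum.
combined = matched_penalty(J, lam_site * Sigma_site + lam_noise * Sigma_noise)
```

The two forms agree numerically, so keeping the terms separate is mainly a practical choice: each weight can be tuned and logged on its own.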


Install

pip install matching-pmh

| Extra | Use case |
|---|---|
| pip install "matching-pmh[vision]" | ResNet / ViT examples |
| pip install "matching-pmh[hf]" | D7 style Gram (Transformers) |
| pip install "matching-pmh[hf-lora]" | LoRA + DPO example |
| pip install "matching-pmh[sklearn,vision]" | Office-31 pipeline |
| pip install "matching-pmh[lightning]" | Lightning callback |
| pip install "matching-pmh[all]" | Development + docs |

From source:

git clone https://github.com/vishalstark512/matching-pmh.git
cd matching-pmh && pip install -e ".[dev]" && pytest -q

Documentation

| Document | Purpose |
|---|---|
| GETTING_STARTED.md | Main adoption guide (start here) |
| CHOOSE_YOUR_SETUP.md | Pick API by stack and data |
| TROUBLESHOOTING.md | Errors, preflight, hook dim |
| gallery/ | Copy-paste: vision / tabular / NLP |
| hooks.md | ResNet, timm, HF hooks |
| ADAPT_YOUR_PIPELINE.md | Integration checklist |
| walkthroughs/ | 18 stack-specific tutorials |
| THEORY.md | Mathematics |

Citation

Cite the Grand Unification / Matching Principle manuscript. See CITATION.cff in the repository.

@software{matching_pmh,
  title  = {matching-pmh: Matched PMH training from estimated deployment nuisance geometry},
  author = {Rajput, Vishal},
  year   = {2026},
  url    = {https://github.com/vishalstark512/matching-pmh}
}

Contributing

See CONTRIBUTING.md.

License

MIT — see LICENSE.
