
Matching Principle for ML: estimate deployment nuisance geometry (Sigma_task, D1-D7) and train any encoder with matched PMH on your representations


matching-pmh

Deployment geometry in. Matched robustness out.

Estimate Sigma_task (D1–D7) · train any encoder with matched PMH · falsify with controls


PyPI · GitHub · Walkthroughs · Theory · Quickstart


matching-pmh implements the Matching Principle: make your encoder robust along the directions that actually shift between train and deploy—not every input direction that happens to correlate with labels.

| Step | What you do | What the library does |
| --- | --- | --- |
| 1. Nuisance | Name what can change at deployment without changing the label (site, lighting, sensor, format, …). | Registry + suggest_nuisance / nuisance="auto" to pick an estimator family. |
| 2. Geometry | Collect source/target (or augmentation) batches that expose that variation. | Estimate Σ_task (covariance of label-preserving deployment nuisance) via D1–D7 (shift, augment, sequence, style Gram, …). |
| 3. Training | Keep your task loss; point a hook at representations h = φ_θ(x). | Add matched PMH: shrink Jacobian sensitivity along Σ_task (matched penalty), not isotropic VAT/CORAL-on-weights alone. |
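
To make the geometry step more tangible, here is a minimal NumPy sketch of one way a nuisance covariance could be estimated from paired batches. It assumes you have representations of the same inputs with and without a label-preserving perturbation, and it takes the covariance of their differences with an optional rank truncation. The helper `estimate_sigma_from_deltas` and the synthetic data are hypothetical and only loosely in the spirit of the augmentation-delta route; the library's D1–D7 estimators may work differently.

```python
import numpy as np


def estimate_sigma_from_deltas(h_clean, h_shifted, rank=None):
    """Covariance of paired representation differences, optionally rank-truncated.

    Hypothetical helper for illustration; not part of matching-pmh.
    """
    deltas = np.asarray(h_shifted, dtype=float) - np.asarray(h_clean, dtype=float)
    deltas = deltas - deltas.mean(axis=0, keepdims=True)  # center the nuisance
    sigma = deltas.T @ deltas / max(len(deltas) - 1, 1)
    if rank is not None:
        # Keep only the top-`rank` eigen-directions of the symmetric estimate.
        eigvals, eigvecs = np.linalg.eigh(sigma)
        top = np.argsort(eigvals)[::-1][:rank]
        sigma = (eigvecs[:, top] * eigvals[top]) @ eigvecs[:, top].T
    return sigma


rng = np.random.default_rng(0)
dim, n = 8, 2000
h_clean = rng.normal(size=(n, dim))
# Synthetic nuisance that only moves coordinate 0 (think of a "lighting" axis).
nuisance = np.zeros((n, dim))
nuisance[:, 0] = rng.normal(scale=2.0, size=n)
h_shifted = h_clean + nuisance

sigma_hat = estimate_sigma_from_deltas(h_clean, h_shifted, rank=1)
top_direction = np.linalg.eigh(sigma_hat)[1][:, -1]  # eigenvector of largest eigenvalue
print(np.round(np.abs(top_direction), 3))
```

In this toy setup the estimated covariance is concentrated on the single coordinate the nuisance moves, which is the direction a matched penalty would then target.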

Your stack, your hook. ResNet / ViT (timm, torchvision), GNN mean-pool, Whisper-style encoders, causal LMs with LoRA, or frozen features + PMHMatcher / sklearn. Two phases (estimate once → train every step), one tensor h, no framework lock-in.

Theory: definitions, lemmas, and loss forms in docs/THEORY.md (LaTeX-friendly).

Not a paper reproduction kit — adapt your own pipeline.
Start here: Getting started · Choose your setup · Gallery


30-second start

pip install matching-pmh
python examples/01_domain_shift_d4.py
pmh-train list-methods

from pmh import PMHMatcher, PMHTrainer, PMHConfig

# NumPy / sklearn frozen features
matcher = PMHMatcher(nuisance="domain_shift", rank=32).fit(x_source, x_target)

# PyTorch — estimate + train in one call
trainer = PMHTrainer(model, hook="backbone", nuisance="auto", pmh_config=PMHConfig.balanced())
trainer.fit(train_loader, source_batches=src_loader, target_batches=tgt_loader, epochs=20)

Getting started · Choose setup · Benchmarks & TDI · Troubleshooting · 18 walkthroughs


Problem, object, repair, unification

Problem: ERM uses every input direction that predicts labels, including nuisances harmful at deployment (lighting, site, sensor noise, formatting, renameable identifiers, …).

Object: Sigma_task = covariance of label-preserving deployment nuisance n (under law Q_n).

Repair: Matched PMH shrinks encoder sensitivity along Sigma_task, not uniformly (isotropic PMH / generic VAT).

Unification: CORAL, domain Grams, augmentation stacks, metric-learning directions, adversarial subspaces, and style Grams are different estimators of the same Sigma_task (Lemma D1–D7).
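
A first-order expansion shows why sensitivity along Sigma_task is the natural quantity to shrink. The derivation below is a sketch under two assumptions not stated in this README: the nuisance n is zero-mean with covariance Sigma_task, and it is small enough for the encoder to be approximately linear around x. THEORY.md remains the authoritative statement.

```latex
\begin{aligned}
\phi_\theta(x + n) &\approx \phi_\theta(x) + J_\phi(x)\, n, \\
\mathbb{E}_{n \sim Q_n}\bigl\|\phi_\theta(x + n) - \phi_\theta(x)\bigr\|^2
  &\approx \mathbb{E}_{n}\bigl[n^\top J_\phi^\top J_\phi\, n\bigr]
   = \operatorname{Tr}\bigl(J_\phi^\top J_\phi\, \Sigma_{\text{task}}\bigr).
\end{aligned}
```

Replacing Sigma_task with a multiple of the identity gives a term proportional to the squared Frobenius norm of the Jacobian, which penalizes label-relevant directions as much as nuisance ones.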

Matched loss (schematic): L = L_task + lambda * Tr(J_phi^T J_phi Sigma') with range(Sigma') covering range(Sigma_task). Details: THEORY.md.
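
A toy comparison may help separate the matched penalty from an isotropic one. The sketch below evaluates the trace term for two hand-written linear encoders, one that reads a nuisance coordinate and one that reads a signal coordinate. `sensitivity_penalty`, the matrices, and the encoder names are illustrative assumptions and are not part of the matching-pmh API.

```python
import numpy as np


def sensitivity_penalty(jacobian, sigma):
    """Tr(J^T J Sigma) for a fixed Jacobian J and covariance Sigma."""
    return float(np.trace(jacobian.T @ jacobian @ sigma))


dim = 4
# Nuisance lives on coordinate 0 only; coordinates 1..3 carry the signal.
sigma_task = np.diag([1.0, 0.0, 0.0, 0.0])
# Isotropic baseline with the same trace but no preferred direction.
sigma_iso = np.eye(dim) * (np.trace(sigma_task) / dim)

# Two linear encoders h = W x, so the Jacobian is W itself.
w_uses_nuisance = np.array([[1.0, 0.0, 0.0, 0.0]])
w_uses_signal = np.array([[0.0, 1.0, 0.0, 0.0]])

results = {}
for name, w in [("nuisance-reading", w_uses_nuisance),
                ("signal-reading", w_uses_signal)]:
    results[name] = (sensitivity_penalty(w, sigma_task),
                     sensitivity_penalty(w, sigma_iso))
    print(name, "matched:", results[name][0], "isotropic:", results[name][1])
```

The matched term charges only the encoder that reads the nuisance coordinate, while the isotropic term charges both encoders equally and so cannot tell them apart.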


How it fits your codebase

 Phase A (once)              Phase B (every step)
 ----------------              --------------------
 source/target data    ->      x, y from your loader
       |                            |
 encoder (eval)        ->      encoder (train) -> h
       |                            |
 estimate D1-D7        ->      L_task(h, y) + PMHLoss(h, Sigma_hat)
       |
 artifact.pt

| You keep | Library adds |
| --- | --- |
| Model, optimizer, task loss | SigmaTaskConfig, estimate_from_config |
| Data loaders | collect_features (optional) |
| Training loop / Trainer | PMHLoss.capped_total or PMHTrainer |
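
The two-phase flow can be sketched end to end on a toy linear model, where encoder and head collapse into one weight vector w and the matched penalty reduces to lam * w^T Sigma w. This is an illustration in plain NumPy under those assumptions; `estimate_sigma`, `train_linear`, and the synthetic data are hypothetical stand-ins and do not use or mirror the library's own classes.

```python
import numpy as np


def estimate_sigma(deltas):
    """Phase A (hypothetical stand-in): covariance of observed nuisance deltas."""
    centered = deltas - deltas.mean(axis=0, keepdims=True)
    return centered.T @ centered / (len(centered) - 1)


def train_linear(x, y, sigma=None, lam=0.0, lr=0.05, steps=2000):
    """Phase B: gradient descent on mean squared error + lam * w^T Sigma w."""
    w = np.zeros(x.shape[1])
    for _ in range(steps):
        grad = 2.0 * x.T @ (x @ w - y) / len(y)    # task-loss gradient
        if sigma is not None:
            grad = grad + 2.0 * lam * (sigma @ w)  # matched-penalty gradient
        w = w - lr * grad
    return w


rng = np.random.default_rng(0)
n = 1000
y = rng.normal(size=n)
# Coordinate 0 is a noisy but stable signal; coordinate 1 tracks the label
# in training and is perturbed at deployment.
x_train = np.column_stack([y + 0.5 * rng.normal(size=n),
                           y + 0.5 * rng.normal(size=n)])

# Phase A: estimate the nuisance covariance once.
deltas = np.column_stack([np.zeros(n), rng.normal(size=n)])
sigma_hat = estimate_sigma(deltas)

# Phase B: same task loss, without and with the matched penalty.
w_erm = train_linear(x_train, y)
w_pmh = train_linear(x_train, y, sigma=sigma_hat, lam=10.0)

# Deployment: the nuisance coordinate is perturbed, the label is unchanged.
x_deploy = x_train + np.column_stack([np.zeros(n), rng.normal(size=n)])
mse_erm = float(np.mean((x_deploy @ w_erm - y) ** 2))
mse_pmh = float(np.mean((x_deploy @ w_pmh - y) ** 2))
print("ERM weights:", np.round(w_erm, 3), "deploy MSE:", round(mse_erm, 3))
print("PMH weights:", np.round(w_pmh, 3), "deploy MSE:", round(mse_pmh, 3))
```

The covariance is computed once and reused at every step, which matches the "estimate once, train every step" split in the diagram. In this toy case the penalized weights move away from the nuisance coordinate and the error under the simulated shift is lower.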

Walkthroughs (18 templates)

| # | Guide | Run |
| --- | --- | --- |
| 1 | PyTorch + D4 | examples/01_domain_shift_d4.py |
| 2 | ResNet + D4 | examples/12_resnet_hook_d4.py |
| 3 | Office-31 + sklearn | examples/06_office31_sklearn.py |
| 4 | Multi-layer CNN | examples/07_vision_multilayer.py |
| 5 | Compositional D5 | examples/13_compositional_train_d5.py |
| 6 | LLM style D7 | examples/08_hf_style_d7.py |
| 7 | HF Trainer + DPO | examples/11_dpo_lora_style_pmh.py |
| 8 | Falsification controls | examples/04_falsification_controls.py |
| 9 | CLI JSON jobs | pmh-train estimate --config ... |
| 10 | Lightning | examples/09_lightning_module.py |
| 11 | Temporal D6 | API in guide |
| 12 | ViT / CLS + D4 | examples/14_vit_cls_d4.py |
| 13 | Speech encoder + D4 | examples/15_speech_encoder_d4.py |
| 14 | QM9 / molecules D5 | examples/16_qm9_molecule_d5.py |
| 15 | Code / tokens D5 | examples/17_code_tokens_d5.py |
| 16 | Augmentations D3 | examples/18_augmentation_d3.py |
| 17 | Compare arms on your pipeline | examples/20_compare_training_arms.py |
| 18 | PMHTrainer quickstart | examples/01_domain_shift_d4.py |

Estimators at a glance (D1–D7)

| Deployment story | Method | SigmaTaskConfig |
| --- | --- | --- |
| Different site / camera / corpus; P(y given x) stable | D4 | SigmaTaskConfig.for_domain(rank=32) |
| Low-rank shift; labels on both domains | D1 | SigmaTaskConfig.for_subspace(rank=32) |
| Unstructured sensor / acquisition noise | D2 | SigmaTaskConfig.for_isotropic(dim, noise_level) |
| Known augmentation modes (color, blur, crop, …) | D3 | SigmaTaskConfig.for_augmentation() + aug_deltas |
| Nuisance on specific coordinates (atoms, tokens) | D5 | SigmaTaskConfig.for_compositional(indices) |
| Drift along time within a sequence | D6 | SigmaTaskConfig.for_temporal() |
| LLM style / format; semantics fixed | D7 | SigmaTaskConfig.for_alignment(rank=32) |

pmh-train list-methods

Hybrid nuisances: estimate separate Sigma matrices and add separate PMHLoss terms.
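
As a small illustration of the hybrid case, the sketch below sums two separately weighted trace penalties and checks that the result equals one penalty on the weighted sum of the covariances, which follows from linearity of the trace. The names `matched_penalty`, `sigma_style`, and `sigma_sensor`, along with the weights, are assumptions made for this example; the library's PMHLoss may combine terms differently (for instance with caps).

```python
import numpy as np


def matched_penalty(jacobian, sigma):
    """Tr(J^T J Sigma) for a fixed Jacobian J and covariance Sigma."""
    return float(np.trace(jacobian.T @ jacobian @ sigma))


rng = np.random.default_rng(0)
jac = rng.normal(size=(3, 5))                      # stand-in encoder Jacobian

sigma_style = np.diag([1.0, 0.0, 0.0, 0.0, 0.0])   # low-rank "style" nuisance
sigma_sensor = 0.1 * np.eye(5)                     # isotropic "sensor" noise
lam_style, lam_sensor = 1.0, 0.5

# Separate terms, each with its own weight.
separate = (lam_style * matched_penalty(jac, sigma_style)
            + lam_sensor * matched_penalty(jac, sigma_sensor))

# One term on the weighted sum of covariances.
combined = matched_penalty(jac, lam_style * sigma_style + lam_sensor * sigma_sensor)

print(np.isclose(separate, combined))
```

Keeping the terms separate costs nothing mathematically, and it lets each nuisance be weighted and monitored on its own during training.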


Install

pip install matching-pmh

| Extra | Use case |
| --- | --- |
| pip install "matching-pmh[vision]" | ResNet / ViT examples |
| pip install "matching-pmh[hf]" | D7 style Gram (Transformers) |
| pip install "matching-pmh[hf-lora]" | LoRA + DPO example |
| pip install "matching-pmh[sklearn,vision]" | Office-31 pipeline |
| pip install "matching-pmh[lightning]" | Lightning callback |
| pip install "matching-pmh[all]" | Development + docs |

From source:

git clone https://github.com/vishalstark512/matching-pmh.git
cd matching-pmh && pip install -e ".[dev]" && pytest -q

Documentation

| Document | Purpose |
| --- | --- |
| GETTING_STARTED.md | Main adoption guide (start here) |
| CHOOSE_YOUR_SETUP.md | Pick API by stack and data |
| TROUBLESHOOTING.md | Errors, preflight, hook dim |
| gallery/ | Copy-paste: vision / tabular / NLP |
| hooks.md | ResNet, timm, HF hooks |
| ADAPT_YOUR_PIPELINE.md | Integration checklist |
| walkthroughs/ | 18 stack-specific tutorials |
| THEORY.md | Mathematics |

Citation

Cite the Grand Unification / Matching Principle manuscript. See CITATION.cff in the repository.

@software{matching_pmh,
  title  = {matching-pmh: Matched PMH training from estimated deployment nuisance geometry},
  author = {Rajput, Vishal},
  year   = {2026},
  url    = {https://github.com/vishalstark512/matching-pmh}
}

Contributing

See CONTRIBUTING.md.

License

MIT — see LICENSE.
