
Architecture-agnostic matching principle: estimate Sigma_task (D1-D7) and train any encoder with matched PMH penalties


matching-pmh

Deployment geometry in. Matched robustness out.

Estimate Sigma_task (D1–D7) · train any encoder with matched PMH · falsify with controls


PyPI · GitHub · Walkthroughs · Theory · Quickstart


matching-pmh is a research-grade PyTorch library for the Matching Principle:

  1. Name what changes at deployment without changing the label.
  2. Estimate that nuisance geometry Sigma_task (covariance of label-preserving deployment variation).
  3. Add a matched Jacobian penalty on your representations h = phi_theta(x).
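Step 2 can be illustrated with a minimal NumPy sketch. It is independent of the matching-pmh API and assumes you can form paired displacements (deployed input minus training input, same label). The helper name `estimate_sigma` and the toy data are hypothetical, and the low-rank SVD estimate is one simple choice, not necessarily what the library's D1–D7 estimators do.

```python
import numpy as np

def estimate_sigma(deltas, rank):
    """Low-rank covariance estimate from nuisance displacements (one per row)."""
    deltas = np.asarray(deltas, dtype=float)
    centered = deltas - deltas.mean(axis=0, keepdims=True)
    # SVD of the centered displacements gives the principal nuisance directions.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    var = (s[:rank] ** 2) / max(len(deltas) - 1, 1)  # variance per direction
    basis = vt[:rank]                                # shape (rank, dim)
    return basis.T @ (var[:, None] * basis)          # shape (dim, dim)

rng = np.random.default_rng(0)
# Hypothetical paired data: each row is x_deployed - x_train for the same label,
# with all variation confined to one direction of a 4-dimensional input space.
direction = np.array([1.0, 1.0, 0.0, 0.0]) / np.sqrt(2.0)
deltas = rng.normal(size=(500, 1)) * direction
sigma_hat = estimate_sigma(deltas, rank=1)
```

In this toy case `sigma_hat` is a rank-1 matrix aligned with `direction`, so a matched penalty built from it would leave the other input directions unconstrained.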

Works with your stack: ResNet, ViT, GNN, Whisper-style encoders, causal LMs with LoRA, or frozen features + sklearn. Full math (LaTeX): docs/THEORY.md.

Design goal: two phases, one hook tensor h, no framework lock-in—not a paper reproduction kit.
New users start here: Getting started (adoption guide) · Choose your setup · Gallery templates


30-second start

pip install matching-pmh
python examples/01_domain_shift_d4.py
pmh-train list-methods

from pmh import PMHMatcher, PMHTrainer, PMHConfig

# NumPy / sklearn frozen features
matcher = PMHMatcher(nuisance="domain_shift", rank=32).fit(x_source, x_target)

# PyTorch — estimate + train in one call
trainer = PMHTrainer(model, hook="backbone", nuisance="auto", pmh_config=PMHConfig.balanced())
trainer.fit(train_loader, source_batches=src_loader, target_batches=tgt_loader, epochs=20)

Getting started · Choose setup · Troubleshooting · 18 walkthroughs


Problem, object, repair, unification

Problem: ERM uses every input direction that predicts labels, including nuisances that are harmful at deployment (lighting, site, sensor noise, formatting, renameable identifiers, …).
Object: Sigma_task, the covariance of the label-preserving deployment nuisance n (under law Q_n).
Repair: Matched PMH shrinks encoder sensitivity along Sigma_task rather than uniformly in all directions, as isotropic PMH or generic VAT do.
Unification: CORAL, domain Grams, augmentation stacks, metric-learning directions, adversarial subspaces, and style Grams are different estimators of the same Sigma_task (Lemmas D1–D7).

Matched loss (schematic): L = L_task + lambda * Tr(J_phi^T J_phi Sigma') with range(Sigma') covering range(Sigma_task). Details: THEORY.md.
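To make the schematic penalty concrete, here is a small NumPy sketch for a linear encoder h = W x, whose Jacobian is simply W. It does not use the matching-pmh API. The function name `jacobian_penalty` and the toy matrices are illustrative assumptions.

```python
import numpy as np

def jacobian_penalty(J, sigma):
    """Tr(J^T J Sigma): expected squared change in h for a nuisance with covariance Sigma."""
    return float(np.trace(J.T @ J @ sigma))

# Toy linear encoder h = W x with 3 inputs and 2 outputs; its Jacobian is W.
W = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 2.0]])

# Hypothetical nuisance geometry: deployment variation only along input axis 2.
sigma_task = np.diag([0.0, 0.0, 1.0])
sigma_iso = np.eye(3)  # isotropic alternative: every input direction is penalized

matched = jacobian_penalty(W, sigma_task)   # only column 2 of W contributes: 8.0
isotropic = jacobian_penalty(W, sigma_iso)  # squared Frobenius norm of W: 10.0

# An encoder that ignores axis 2 pays no matched penalty at all,
# while the isotropic penalty still charges it for the useful axes.
W_robust = W.copy()
W_robust[:, 2] = 0.0
matched_robust = jacobian_penalty(W_robust, sigma_task)   # 0.0
isotropic_robust = jacobian_penalty(W_robust, sigma_iso)  # 2.0
```

The contrast between `matched_robust` and `isotropic_robust` is the point of matching: the penalty vanishes once sensitivity along the nuisance directions is gone, instead of continuing to shrink directions that carry label information.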


How it fits your codebase

 Phase A (once)              Phase B (every step)
 ----------------              --------------------
 source/target data    ->      x, y from your loader
       |                            |
 encoder (eval)        ->      encoder (train) -> h
       |                            |
 estimate D1-D7        ->      L_task(h, y) + PMHLoss(h, Sigma_hat)
       |
 artifact.pt

You keep                       Library adds
--------                       ------------
Model, optimizer, task loss    SigmaTaskConfig, estimate_from_config
Data loaders                   collect_features (optional)
Training loop / Trainer        PMHLoss.capped_total or PMHTrainer
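
The two-phase pattern above can be sketched on a toy linear model in plain NumPy. This is not the library's PMHLoss or PMHTrainer. The data, the hand-written `sigma_hat`, and the `train` helper are all hypothetical, chosen only to show where each phase sits in a training loop.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400
y = rng.normal(size=n)
# Both input coordinates predict y in training; coordinate 1 is assumed to be
# the one that varies at deployment without changing the label.
x = np.stack([y + 0.5 * rng.normal(size=n),
              y + 0.5 * rng.normal(size=n)], axis=1)

# Phase A (once): fix the nuisance geometry. It is written down by hand here;
# in practice it would come from an estimator run on source/target data.
sigma_hat = np.diag([0.0, 1.0])

def train(lam, steps=2000, lr=0.05):
    """Phase B: gradient descent on MSE + lam * w^T Sigma w for h = x @ w."""
    w = np.zeros(2)
    for _ in range(steps):
        grad_task = 2.0 * x.T @ (x @ w - y) / n   # gradient of the task loss
        grad_pmh = 2.0 * lam * sigma_hat @ w      # gradient of the matched penalty
        w -= lr * (grad_task + grad_pmh)
    return w

w_erm = train(lam=0.0)      # plain ERM leans on both coordinates
w_matched = train(lam=1.0)  # matched penalty shifts weight off coordinate 1
```

For a scalar linear output the Jacobian is the weight vector itself, so the penalty reduces to w^T Sigma w. The matched run puts less weight on the nuisance coordinate and more on the stable one.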

Walkthroughs (18 templates)

#   Guide                           Run
--  -----                           ---
1   PyTorch + D4                    examples/01_domain_shift_d4.py
2   ResNet + D4                     examples/12_resnet_hook_d4.py
3   Office-31 + sklearn             examples/06_office31_sklearn.py
4   Multi-layer CNN                 examples/07_vision_multilayer.py
5   Compositional D5                examples/13_compositional_train_d5.py
6   LLM style D7                    examples/08_hf_style_d7.py
7   HF Trainer + DPO                examples/11_dpo_lora_style_pmh.py
8   Falsification controls          examples/04_falsification_controls.py
9   CLI JSON jobs                   pmh-train estimate --config ...
10  Lightning                       examples/09_lightning_module.py
11  Temporal D6                     API in guide
12  ViT / CLS + D4                  examples/14_vit_cls_d4.py
13  Speech encoder + D4             examples/15_speech_encoder_d4.py
14  QM9 / molecules D5              examples/16_qm9_molecule_d5.py
15  Code / tokens D5                examples/17_code_tokens_d5.py
16  Augmentations D3                examples/18_augmentation_d3.py
17  Compare arms on your pipeline   examples/20_compare_training_arms.py
18  PMHTrainer quickstart           examples/01_domain_shift_d4.py

Estimators at a glance (D1–D7)

Deployment story                                         Method    SigmaTaskConfig
----------------                                         ------    ---------------
Different site / camera / corpus; P(y given x) stable    D4        SigmaTaskConfig.for_domain(rank=32)
Low-rank shift; labels on both domains                   D1        SigmaTaskConfig.for_subspace(rank=32)
Unstructured sensor / acquisition noise                  D2        SigmaTaskConfig.for_isotropic(dim, noise_level)
Known augmentation modes (color, blur, crop, …)          D3        SigmaTaskConfig.for_augmentation() + aug_deltas
Nuisance on specific coordinates (atoms, tokens)         D5        SigmaTaskConfig.for_compositional(indices)
Drift along time within a sequence                       D6        SigmaTaskConfig.for_temporal()
LLM style / format; semantics fixed                      D7        SigmaTaskConfig.for_alignment(rank=32)

pmh-train list-methods

Hybrid nuisances: estimate separate Sigma matrices and add separate PMHLoss terms.
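
A short NumPy check, using hypothetical toy matrices and not the library's PMHLoss, shows why separate terms are a natural fit. The trace penalty is linear in Sigma, so summing separately weighted terms is the same as penalizing with one weighted sum of the Sigma matrices, while still letting each nuisance have its own weight.

```python
import numpy as np

def penalty(J, sigma):
    """Matched Jacobian penalty Tr(J^T J Sigma)."""
    return float(np.trace(J.T @ J @ sigma))

# Toy Jacobian of an encoder with 3 inputs and 2 outputs.
J = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0]])

sigma_a = np.diag([1.0, 0.0, 0.0])  # hypothetical nuisance A on input axis 0
sigma_b = np.diag([0.0, 0.0, 0.5])  # hypothetical nuisance B on input axis 2
lam_a, lam_b = 0.1, 1.0             # one weight per nuisance

# Two separate penalty terms, each with its own weight.
separate = lam_a * penalty(J, sigma_a) + lam_b * penalty(J, sigma_b)

# One penalty on the weighted sum of the two geometries.
combined = penalty(J, lam_a * sigma_a + lam_b * sigma_b)
```

Here both routes give 4.6 (0.1 × 1.0 from nuisance A plus 1.0 × 4.5 from nuisance B). Keeping the terms separate makes it easier to tune or ablate one nuisance at a time.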


Install

pip install matching-pmh

Extra                                         Use case
-----                                         --------
pip install "matching-pmh[vision]"            ResNet / ViT examples
pip install "matching-pmh[hf]"                D7 style Gram (Transformers)
pip install "matching-pmh[hf-lora]"           LoRA + DPO example
pip install "matching-pmh[sklearn,vision]"    Office-31 pipeline
pip install "matching-pmh[lightning]"         Lightning callback
pip install "matching-pmh[all]"               Development + docs

From source:

git clone https://github.com/vishalstark512/matching-pmh.git
cd matching-pmh && pip install -e ".[dev]" && pytest -q

Documentation

Document                  Purpose
--------                  -------
GETTING_STARTED.md        Main adoption guide (start here)
CHOOSE_YOUR_SETUP.md      Pick API by stack and data
TROUBLESHOOTING.md        Errors, preflight, hook dim
gallery/                  Copy-paste: vision / tabular / NLP
hooks.md                  ResNet, timm, HF hooks
ADAPT_YOUR_PIPELINE.md    Integration checklist
walkthroughs/             18 stack-specific tutorials
THEORY.md                 Mathematics

Citation

Cite the Grand Unification / Matching Principle manuscript. See CITATION.cff in the repository.

@software{matching_pmh,
  title  = {matching-pmh: Matched PMH training from estimated deployment nuisance geometry},
  author = {Rajput, Vishal},
  year   = {2026},
  url    = {https://github.com/vishalstark512/matching-pmh}
}

Contributing

See CONTRIBUTING.md.

License

MIT — see LICENSE.

