Architecture-agnostic matching principle: estimate Sigma_task (D1-D7) and train any encoder with matched PMH penalties

matching-pmh

Deployment geometry in. Matched robustness out.

Estimate Sigma_task (D1–D7) · train any encoder with matched PMH · falsify with controls



matching-pmh is a research-grade PyTorch library for the Matching Principle:

  1. Name what changes at deployment without changing the label.
  2. Estimate that nuisance geometry Sigma_task (covariance of label-preserving deployment variation).
  3. Add a matched Jacobian penalty on your representations h = phi_theta(x).
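
As an illustration of step 2, the sketch below estimates a nuisance covariance from label-preserving pairs in pure Python. It is a minimal example under stated assumptions: the function name `nuisance_covariance` and the toy pairs are hypothetical, and the library's D1–D7 estimators may compute Sigma_task differently.

```python
# Hypothetical sketch: each pair (x, x_shifted) shares a label, so the
# difference x_shifted - x is treated as one sample of the nuisance n.

def nuisance_covariance(pairs):
    """Return the sample covariance of n = x_shifted - x as a list of rows."""
    deltas = [[b - a for a, b in zip(x, x_shifted)] for x, x_shifted in pairs]
    count, dim = len(deltas), len(deltas[0])
    mean = [sum(d[j] for d in deltas) / count for j in range(dim)]
    centered = [[d[j] - mean[j] for j in range(dim)] for d in deltas]
    return [
        [sum(c[i] * c[j] for c in centered) / (count - 1) for j in range(dim)]
        for i in range(dim)
    ]

# Toy data: only the first coordinate changes between the two views.
pairs = [
    ([1.0, 2.0], [2.0, 2.0]),
    ([0.0, 1.0], [-1.0, 1.0]),
    ([3.0, 0.0], [5.0, 0.0]),
    ([2.0, 2.0], [0.0, 2.0]),
]
sigma_hat = nuisance_covariance(pairs)
print(sigma_hat)
```

All of the estimated variance sits on the first coordinate, which is the only direction a matched penalty would then target.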

Works with your stack: ResNet, ViT, GNN, Whisper-style encoders, causal LMs with LoRA, or frozen features + sklearn. Full math (LaTeX): docs/THEORY.md.

Design goal: two phases, one hook tensor h, and no framework lock-in. This is not a paper reproduction kit: you bring your data, your architecture, and your trainer. Start with ADAPT_YOUR_PIPELINE.md.


30-second start

pip install matching-pmh
python examples/01_domain_shift_d4.py
pmh-train list-methods

import pmh
from pmh import SigmaTaskConfig, PMHConfig, PMHLoss, collect_features, estimate_from_config

# Phase A: estimate Sigma_task once, with the encoder frozen.
# h_source and h_target are features from your source and target data
# (collect_features is the optional helper for gathering them).
artifact = estimate_from_config(SigmaTaskConfig.for_domain(rank=32), h_source, h_target)
artifact.save("artifacts/sigma")

# Phase B: train in your own loop.
# task_loss and h (the hooked representation) come from your training step.
pmh_loss = PMHLoss(artifact, PMHConfig(weight=0.3, cap_ratio=0.3, warmup_epochs=2))
total, _ = pmh_loss.capped_total(task_loss, h)
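
The parameter names weight, cap_ratio and warmup_epochs suggest a penalty that is ramped in and then capped relative to the task loss. The sketch below is one plausible reading of those names, not the library's documented behaviour: `warmup_scale` and `capped_total_sketch` are hypothetical helpers, and the linear ramp and the min-based cap are assumptions.

```python
# Hypothetical sketch of a warmed-up, capped penalty. None of these
# functions belong to matching-pmh; the schedule and cap are assumed.

def warmup_scale(epoch, warmup_epochs):
    """Linear ramp from 0 to 1 across the warmup epochs (assumed schedule)."""
    if warmup_epochs <= 0:
        return 1.0
    return min(1.0, epoch / warmup_epochs)

def capped_total_sketch(task_loss, penalty, weight, cap_ratio, epoch, warmup_epochs):
    """Add the weighted penalty, never adding more than cap_ratio * task_loss."""
    weighted = weight * warmup_scale(epoch, warmup_epochs) * penalty
    capped = min(weighted, cap_ratio * task_loss)
    return task_loss + capped, capped

# Same raw penalty at the start of warmup and after warmup has finished.
total_start, added_start = capped_total_sketch(
    task_loss=2.0, penalty=10.0, weight=0.3, cap_ratio=0.3, epoch=0, warmup_epochs=2
)
total_late, added_late = capped_total_sketch(
    task_loss=2.0, penalty=10.0, weight=0.3, cap_ratio=0.3, epoch=2, warmup_epochs=2
)
print(total_start, total_late)
```

Under these assumptions the penalty contributes nothing at epoch 0 and is clipped to 0.3 times the task loss once warmup ends, so a large raw penalty cannot swamp the task objective.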



Problem, object, repair, unification

Problem: ERM uses every input direction that predicts labels, including nuisances that are harmful at deployment (lighting, site, sensor noise, formatting, renameable identifiers, …).
Object: Sigma_task is the covariance of the label-preserving deployment nuisance n (under law Q_n).
Repair: Matched PMH shrinks encoder sensitivity along Sigma_task rather than uniformly in every direction (as isotropic PMH or generic VAT does).
Unification: CORAL, domain Grams, augmentation stacks, metric-learning directions, adversarial subspaces, and style Grams are different estimators of the same Sigma_task (Lemmas D1–D7).

Matched loss (schematic): L = L_task + lambda * Tr(J_phi^T J_phi Sigma') with range(Sigma') covering range(Sigma_task). Details: THEORY.md.
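
To make the schematic concrete, here is a minimal pure-Python sketch. It assumes J_phi is the Jacobian of the encoder with respect to its input and that the encoder is linear (h = W x, so J_phi = W). The helper names and toy matrices are illustrative and are not part of matching-pmh.

```python
# Illustrative only: Tr(J^T J Sigma) for a linear encoder, where J = W.

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    """Swap rows and columns."""
    return [list(col) for col in zip(*A)]

def trace(A):
    """Sum of the diagonal entries of a square matrix."""
    return sum(A[i][i] for i in range(len(A)))

def jacobian_penalty(W, Sigma):
    """Tr(W^T W Sigma) for the linear encoder h = W x."""
    return trace(matmul(matmul(transpose(W), W), Sigma))

# 3-d inputs; the nuisance varies only along the first coordinate.
sigma_matched = [[1.0, 0.0, 0.0],
                 [0.0, 0.0, 0.0],
                 [0.0, 0.0, 0.0]]
sigma_isotropic = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]

# One encoder reads the nuisance coordinate, the other ignores it.
W_sensitive = [[1.0, 0.0, 0.0],
               [0.0, 1.0, 0.0]]
W_invariant = [[0.0, 1.0, 0.0],
               [0.0, 0.0, 1.0]]

matched_sensitive = jacobian_penalty(W_sensitive, sigma_matched)
matched_invariant = jacobian_penalty(W_invariant, sigma_matched)
iso_sensitive = jacobian_penalty(W_sensitive, sigma_isotropic)
iso_invariant = jacobian_penalty(W_invariant, sigma_isotropic)
print(matched_sensitive, matched_invariant, iso_sensitive, iso_invariant)
```

In this toy setup the matched Sigma penalizes only the encoder that reads the nuisance coordinate, while the isotropic Sigma assigns both encoders the same penalty and so cannot tell them apart.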


How it fits your codebase

 Phase A (once)              Phase B (every step)
 ----------------              --------------------
 source/target data    ->      x, y from your loader
       |                            |
 encoder (eval)        ->      encoder (train) -> h
       |                            |
 estimate D1-D7        ->      L_task(h, y) + PMHLoss(h, Sigma_hat)
       |
 artifact.pt

You keep                     Library adds
Model, optimizer, task loss  SigmaTaskConfig, estimate_from_config
Data loaders                 collect_features (optional)
Training loop / Trainer      PMHLoss.capped_total or PMHTrainer

Walkthroughs (17 templates)

#   Guide                          Run
1   PyTorch + D4                   examples/01_domain_shift_d4.py
2   ResNet + D4                    examples/12_resnet_hook_d4.py
3   Office-31 + sklearn            examples/06_office31_sklearn.py
4   Multi-layer CNN                examples/07_vision_multilayer.py
5   Compositional D5               examples/13_compositional_train_d5.py
6   LLM style D7                   examples/08_hf_style_d7.py
7   HF Trainer + DPO               examples/11_dpo_lora_style_pmh.py
8   Falsification controls         examples/04_falsification_controls.py
9   CLI JSON jobs                  pmh-train estimate --config ...
10  Lightning                      examples/09_lightning_module.py
11  Temporal D6                    API in guide
12  ViT / CLS + D4                 examples/14_vit_cls_d4.py
13  Speech encoder + D4            examples/15_speech_encoder_d4.py
14  QM9 / molecules D5             examples/16_qm9_molecule_d5.py
15  Code / tokens D5               examples/17_code_tokens_d5.py
16  Augmentations D3               examples/18_augmentation_d3.py
17  Compare arms on your pipeline  examples/20_compare_training_arms.py

Estimators at a glance (D1–D7)

Deployment story                                       Method  SigmaTaskConfig
Different site / camera / corpus; P(y given x) stable  D4      SigmaTaskConfig.for_domain(rank=32)
Low-rank shift; labels on both domains                 D1      SigmaTaskConfig.for_subspace(rank=32)
Unstructured sensor / acquisition noise                D2      SigmaTaskConfig.for_isotropic(dim, noise_level)
Known augmentation modes (color, blur, crop, …)        D3      SigmaTaskConfig.for_augmentation() + aug_deltas
Nuisance on specific coordinates (atoms, tokens)       D5      SigmaTaskConfig.for_compositional(indices)
Drift along time within a sequence                     D6      SigmaTaskConfig.for_temporal()
LLM style / format; semantics fixed                    D7      SigmaTaskConfig.for_alignment(rank=32)

pmh-train list-methods

Hybrid nuisances: estimate separate Sigma matrices and add separate PMHLoss terms.
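
If the separate terms share the same Jacobian (an assumption; terms hooked at different layers would not), linearity of the trace means their sum equals a single penalty on a weighted sum of the matrices. The weights lambda_1 and lambda_2 below are illustrative symbols, not library parameters:

```latex
\lambda_1 \operatorname{Tr}\!\left(J_\phi^\top J_\phi \,\Sigma_1\right)
+ \lambda_2 \operatorname{Tr}\!\left(J_\phi^\top J_\phi \,\Sigma_2\right)
= \operatorname{Tr}\!\left(J_\phi^\top J_\phi \left(\lambda_1 \Sigma_1 + \lambda_2 \Sigma_2\right)\right)
```

Keeping the terms separate still lets each nuisance carry its own weight and be ablated on its own.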


Install

pip install matching-pmh

Extra                                       Use case
pip install "matching-pmh[vision]"          ResNet / ViT examples
pip install "matching-pmh[hf]"              D7 style Gram (Transformers)
pip install "matching-pmh[hf-lora]"         LoRA + DPO example
pip install "matching-pmh[sklearn,vision]"  Office-31 pipeline
pip install "matching-pmh[lightning]"       Lightning callback
pip install "matching-pmh[all]"             Development + docs

From source:

git clone https://github.com/vishalstark512/matching-pmh.git
cd matching-pmh && pip install -e ".[dev]" && pytest -q

Documentation

Document                Purpose
ADAPT_YOUR_PIPELINE.md  Plug into your data, model, trainer
QUICKSTART.md           First run in 10 minutes
THEORY.md               Full mathematics (LaTeX)
ARCHITECTURES.md        Hook points per stack
walkthroughs/           16 end-to-end guides

Citation

Cite the Grand Unification / Matching Principle manuscript. See CITATION.cff in the repository.

@software{matching_pmh,
  title  = {matching-pmh: Matched PMH training from estimated deployment nuisance geometry},
  author = {Rajput, Vishal},
  year   = {2026},
  url    = {https://github.com/vishalstark512/matching-pmh}
}

Contributing

See CONTRIBUTING.md.

License

MIT — see LICENSE.
