Architecture-agnostic matching principle: estimate Sigma_task (D1-D7) and train any encoder with matched PMH penalties

Project description

matching-pmh

Deployment geometry in. Matched robustness out.
Estimate Σ_task (D1–D7) · train any encoder with matched PMH · falsify with controls

PyPI · GitHub · Walkthroughs · Theory · Integration · Quickstart


matching-pmh is a research-grade PyTorch library for the Matching Principle: name what changes at deployment without changing the label, estimate that nuisance geometry $\Sigma_{\mathrm{task}}$, and add a matched Jacobian penalty on your representations $h=\phi_\theta(x)$—ResNet, ViT, GNN, Whisper-style encoders, causal LMs with LoRA, or frozen features + sklearn.

Design goal: two phases, one hook tensor h, no framework lock-in. The paper’s thirteen task blocks are validation examples; this repo is built so a new lab member can integrate in an afternoon.


30-second start

pip install matching-pmh
python examples/01_domain_shift_d4.py          # minimal PyTorch loop
pmh-train list-methods                         # D1–D7 catalog

import pmh
from pmh import SigmaTaskConfig, PMHConfig, PMHLoss, collect_features, estimate_from_config

# Phase A — estimate (frozen encoder)
artifact = estimate_from_config(SigmaTaskConfig.for_domain(rank=32), h_source, h_target)
artifact.save("artifacts/sigma")

# Phase B — train (your loop)
pmh_loss = PMHLoss(artifact, PMHConfig(weight=0.3, cap_ratio=0.3, warmup_epochs=2))
total, _ = pmh_loss.capped_total(task_loss, h)

→ Full path: docs/QUICKSTART.md · Pick your stack: walkthroughs
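The parameter names `weight`, `cap_ratio`, and `warmup_epochs` suggest a penalty that is ramped in over the first epochs and bounded relative to the task loss. The sketch below illustrates one plausible rule of that kind in plain Python. The function `capped_total_sketch` and its exact formula are assumptions made for illustration; they are not taken from the library, whose `PMHLoss.capped_total` may behave differently.

```python
def capped_total_sketch(task_loss, penalty, weight, cap_ratio, epoch, warmup_epochs):
    """Hypothetical capping rule (an assumption, not the library's implementation).

    Returns (total_loss, penalty_term_actually_added).
    """
    # Linear warmup: scale the penalty weight from 0 up to `weight`.
    ramp = 1.0 if warmup_epochs <= 0 else min(1.0, epoch / warmup_epochs)
    term = ramp * weight * penalty
    # Cap: the penalty term may contribute at most cap_ratio * task_loss.
    term = min(term, cap_ratio * task_loss)
    return task_loss + term, term


# Large raw penalty after warmup: the cap binds.
total_capped, term_capped = capped_total_sketch(1.0, 10.0, 0.3, 0.3, epoch=2, warmup_epochs=2)
# Small raw penalty after warmup: the cap is inactive.
total_free, term_free = capped_total_sketch(1.0, 0.5, 0.3, 0.3, epoch=2, warmup_epochs=2)
# Epoch 0 of warmup: the penalty is switched off.
total_start, term_start = capped_total_sketch(1.0, 10.0, 0.3, 0.3, epoch=0, warmup_epochs=2)
print(total_capped, total_free, total_start)
```

Under this reading, the cap keeps a badly scaled penalty from dominating the task gradient early in training, while the warmup lets the encoder fit the task before the Jacobian is constrained.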


Problem → object → repair → unification

Problem. ERM uses every input direction that predicts training labels—including nuisances harmful at deployment (lighting, site, sensor noise, answer formatting, renameable identifiers, …).

Object.

$$ \Sigma_{\mathrm{task}} = \mathrm{Cov}_{Q_n}(n) $$

for label-preserving deployment nuisance $n \sim Q_n$.
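As a concrete reading of this definition, the NumPy sketch below draws synthetic nuisance vectors and takes their sample covariance. The nuisance distribution (two noisy input coordinates, one untouched) is a made-up example, and the code does not use the library's estimators.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical nuisance: perturbations confined to the first two input
# coordinates (say, a lighting offset); the third coordinate never moves.
num_samples, dim = 5000, 3
n = np.zeros((num_samples, dim))
n[:, 0] = rng.normal(scale=2.0, size=num_samples)
n[:, 1] = rng.normal(scale=0.5, size=num_samples)

# Sample covariance of the nuisance draws: an estimate of Sigma_task.
sigma_task = np.cov(n, rowvar=False)

# The diagonal shows how much each input coordinate varies under the nuisance.
print(np.round(np.diag(sigma_task), 2))
```

The third diagonal entry is exactly zero because that coordinate is never perturbed, so a matched penalty built from this estimate would leave sensitivity along it unconstrained.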

Repair. Matched PMH shrinks the encoder Jacobian along the directions of $\Sigma_{\mathrm{task}}$ rather than uniformly in every direction, as isotropic PMH or generic VAT does:

$$ \mathcal{L} = \mathcal{L}_{\mathrm{task}} + \lambda\,\mathbb{E}_x\left[\mathrm{Tr}\left(J_\phi(x)^\top J_\phi(x)\,\Sigma'\right)\right], \quad \mathrm{range}(\Sigma') \supseteq \mathrm{range}(\Sigma_{\mathrm{task}}). $$
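To make the trace term concrete, the sketch below evaluates $\mathrm{Tr}(J^\top J\,\Sigma')$ for a toy encoder, once with a matched $\Sigma'$ and once with an isotropic $\Sigma'$ of the same trace. The encoder `tanh(W x)`, the finite-difference Jacobian, and all names are illustrative assumptions; a real training loop would use automatic differentiation rather than finite differences.

```python
import numpy as np


def encoder(x, W):
    """Toy encoder h = tanh(W x); stands in for any differentiable phi."""
    return np.tanh(W @ x)


def jacobian(x, W, eps=1e-5):
    """Central finite-difference Jacobian of the encoder at x, shape (d_h, d_x)."""
    cols = []
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        cols.append((encoder(x + e, W) - encoder(x - e, W)) / (2 * eps))
    return np.stack(cols, axis=1)


def jacobian_penalty(x, W, sigma):
    """Tr(J^T J Sigma') for a single input x."""
    J = jacobian(x, W)
    return float(np.trace(J.T @ J @ sigma))


# This encoder reads input coordinates 0 and 2 and ignores coordinate 1.
W = np.array([[1.0, 0.0, 0.5],
              [0.3, 0.0, -1.0]])
x = np.array([0.2, -0.4, 0.1])

sigma_matched = np.diag([0.0, 1.0, 0.0])  # nuisance lives on coordinate 1 only
sigma_isotropic = np.eye(3) / 3.0         # same trace, spread over all directions

matched = jacobian_penalty(x, W, sigma_matched)
isotropic = jacobian_penalty(x, W, sigma_isotropic)
print(matched, isotropic)
```

Because the encoder is already insensitive to the nuisance coordinate, the matched penalty vanishes, while the isotropic penalty still charges for sensitivity along the two coordinates the encoder legitimately uses.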

Unification. CORAL, domain Grams, augmentation stacks, metric-learning directions, adversarial subspaces, and style Grams are estimators of the same object (D1–D7); matched PMH is one loss with $\Sigma' \approx \hat\Sigma_{\mathrm{task}}$.
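Estimators parameterized by a rank (for example `for_domain(rank=32)`) presumably keep only a few leading directions of $\hat\Sigma_{\mathrm{task}}$, which interacts with the range condition above. The sketch below truncates a toy covariance to its top eigen-directions and measures how much of the nuisance variance falls outside the kept subspace. Both helper functions are hypothetical and written only for this illustration.

```python
import numpy as np


def low_rank_sigma(sigma_hat, rank):
    """Keep the top-`rank` eigen-directions of a symmetric PSD estimate."""
    vals, vecs = np.linalg.eigh(sigma_hat)      # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:rank]
    U = vecs[:, top]
    return (U * vals[top]) @ U.T, U


def range_residual(sigma_task, U):
    """Fraction of Sigma_task's trace outside span(U); 0 means fully covered."""
    P = U @ U.T                                 # orthogonal projector onto span(U)
    outside = np.trace((np.eye(len(P)) - P) @ sigma_task)
    return float(outside / np.trace(sigma_task))


# Toy rank-2 nuisance covariance in R^4.
A = np.array([[1.0, 0.0],
              [0.0, 2.0],
              [1.0, 1.0],
              [0.0, 0.0]])
sigma_task = A @ A.T

_, U1 = low_rank_sigma(sigma_task, rank=1)
_, U2 = low_rank_sigma(sigma_task, rank=2)

residual_rank1 = range_residual(sigma_task, U1)  # rank too small: part is missed
residual_rank2 = range_residual(sigma_task, U2)  # rank matches: range is covered
print(residual_rank1, residual_rank2)
```

For a positive semidefinite $\Sigma_{\mathrm{task}}$, a zero residual is equivalent to $\mathrm{range}(\Sigma_{\mathrm{task}})$ lying inside the kept subspace, so this kind of check can flag a rank that is set too low.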


How it fits your codebase

 Phase A (once)              Phase B (every step)
 ───────────────              ────────────────────
 source/target data    →      x, y ~ your loader
       ↓                            ↓
 encoder (eval)        →      encoder (train) → h
       ↓                            ↓
 estimate D1–D7        →      L_task(h, y) + PMHLoss(h, Σ̂)
       ↓
 artifact.pt

| You keep | Library adds |
| --- | --- |
| Model, optimizer, task loss | SigmaTaskConfig, estimate_from_config |
| Data loaders | collect_features (optional) |
| Training loop / Trainer | PMHLoss.capped_total or PMHTrainer |

Walkthroughs (16 guides)

| # | Guide | Paper block | Run |
| --- | --- | --- | --- |
| 1 | PyTorch + D4 | Generic | examples/01_domain_shift_d4.py |
| 2 | ResNet + D4 | Vision | examples/12_resnet_hook_d4.py |
| 3 | Office-31 + sklearn | T1 | examples/06_office31_sklearn.py |
| 4 | Multi-layer CNN | T2 | examples/07_vision_multilayer.py |
| 5 | Compositional D5 | T5 | examples/13_compositional_train_d5.py |
| 6 | LLM style D7 | T7A | examples/08_hf_style_d7.py |
| 7 | HF Trainer + DPO | T7A | examples/11_dpo_lora_style_pmh.py |
| 8 | Falsification controls | All | examples/04_falsification_controls.py |
| 9 | CLI JSON jobs | Repro | pmh-train estimate --config … |
| 10 | Lightning | | examples/09_lightning_module.py |
| 11 | Temporal D6 | T6B | API in guide |
| 12 | ViT / CLS + D4 | T2 ViT | examples/14_vit_cls_d4.py |
| 13 | Speech encoder + D4 | T6A | examples/15_speech_encoder_d4.py |
| 14 | QM9 / molecules D5 | T5A | examples/16_qm9_molecule_d5.py |
| 15 | Code / tokens D5 | T5B | examples/17_code_tokens_d5.py |
| 16 | Augmentations D3 | T2 aug | examples/18_augmentation_d3.py |

Index: docs/walkthroughs/index.md · Example catalog: examples/README.md


Estimators at a glance (D1–D7)

| Story | Method | SigmaTaskConfig |
| --- | --- | --- |
| Domain / site; $P(y\mid x)$ stable | D4 | for_domain(rank=…) |
| Low-rank shift + labels | D1 | for_subspace(rank=…) |
| Unstructured noise | D2 | for_isotropic(dim, noise_level) |
| Known aug modes | D3 | for_augmentation() + aug_deltas |
| Nuisance coordinates (atoms, tokens) | D5 | for_compositional(indices) |
| Temporal drift in window | D6 | for_temporal() |
| LLM style vs fixed content | D7 | for_alignment(rank=…) |

pmh-train list-methods
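To show how different stories can yield the same kind of object, the sketch below builds four toy covariance estimates loosely modeled on D2, D3, D4, and D5. Every function here is a simplified, hypothetical analogue chosen for illustration; none is the library's implementation, and the actual estimators may use different formulas.

```python
import numpy as np


def sigma_isotropic(dim, noise_level):
    """D2-like: unstructured noise with equal variance in every direction."""
    return (noise_level ** 2) * np.eye(dim)


def sigma_from_deltas(deltas):
    """D3-like: second moment of known augmentation displacements (x_aug - x)."""
    return deltas.T @ deltas / len(deltas)


def sigma_domain_means(feats_a, feats_b):
    """D4-like (simplest reading): rank-1 Gram of the shift between domain means."""
    shift = feats_a.mean(axis=0) - feats_b.mean(axis=0)
    return np.outer(shift, shift)


def sigma_coordinates(dim, indices):
    """D5-like: unit variance on listed nuisance coordinates, zero elsewhere."""
    mask = np.zeros(dim)
    mask[list(indices)] = 1.0
    return np.diag(mask)


rng = np.random.default_rng(0)
dim = 4
estimates = {
    "isotropic": sigma_isotropic(dim, noise_level=0.1),
    "augmentation": sigma_from_deltas(rng.normal(size=(200, dim))),
    "domain": sigma_domain_means(rng.normal(size=(100, dim)),
                                 rng.normal(loc=1.0, size=(100, dim))),
    "coordinates": sigma_coordinates(dim, indices=[1, 3]),
}

# Each estimate is a symmetric positive semidefinite (dim, dim) matrix, so the
# same trace penalty can consume any of them.
for name, sigma in estimates.items():
    print(name, sigma.shape, float(np.linalg.eigvalsh(sigma).min()) > -1e-9)
```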

Install

pip install matching-pmh

| Extra | Use case |
| --- | --- |
| [vision] | ResNet / ViT walkthroughs |
| [hf] | D7 style Gram (Transformers) |
| [hf-lora] | LoRA + DPO example |
| [sklearn,vision] | Office-31 pipeline |
| [lightning] | LightningModule callback |
| [all] | Development + docs |

From source (contributors):

git clone https://github.com/vishalstark512/matching-pmh.git
cd matching-pmh
pip install -e ".[dev,all]"
pytest -q

Documentation

| Document | Purpose |
| --- | --- |
| QUICKSTART.md | First successful run in 10 minutes |
| THEORY.md | $\Sigma_{\mathrm{task}}$, recipe, falsification |
| ARCHITECTURES.md | Hook points per stack |
| PHILOSOPHY.md | Design principles for integrators |
| walkthroughs/ | End-to-end guides |
| nuisance_types.md | Data formats |
| cli.md | pmh-train reference |

Citation

If you use this software, cite the Grand Unification / Matching Principle manuscript (CITATION.cff).

@software{matching_pmh,
  title  = {matching-pmh: Matched PMH training from estimated deployment nuisance geometry},
  author = {Rajput, Vishal},
  year   = {2026},
  url    = {https://github.com/vishalstark512/matching-pmh}
}

Contributing

We welcome issues, walkthrough improvements, and estimator integrations. See CONTRIBUTING.md.


License

MIT — see LICENSE.
