Architecture-agnostic matching principle: estimate Sigma_task (D1-D7) and train any encoder with matched PMH penalties

matching-pmh

Deployment geometry in. Matched robustness out.

Estimate Sigma_task (D1–D7) · train any encoder with matched PMH · falsify with controls


matching-pmh is a research-grade PyTorch library for the Matching Principle:

  1. Name what changes at deployment without changing the label.
  2. Estimate that nuisance geometry Sigma_task (covariance of label-preserving deployment variation).
  3. Add a matched Jacobian penalty on your representations h = phi_theta(x).

Works with your stack: ResNet, ViT, GNN, Whisper-style encoders, causal LMs with LoRA, or frozen features + sklearn. Full math (LaTeX): docs/THEORY.md.

Design goal: two phases, one hook tensor h, and no framework lock-in. This is not a paper reproduction kit: you bring your data, your architecture, and your trainer. Start with: Adapt your pipeline.


30-second start

pip install matching-pmh
python examples/01_domain_shift_d4.py
pmh-train list-methods

import pmh
from pmh import SigmaTaskConfig, PMHConfig, PMHLoss, collect_features, estimate_from_config

# Phase A: estimate (frozen encoder).
# h_source and h_target are features from the frozen encoder on source and target data.
artifact = estimate_from_config(SigmaTaskConfig.for_domain(rank=32), h_source, h_target)
artifact.save("artifacts/sigma")

# Phase B: train (your loop).
# task_loss and the hooked representation h come from your own forward pass.
pmh_loss = PMHLoss(artifact, PMHConfig(weight=0.3, cap_ratio=0.3, warmup_epochs=2))
total, _ = pmh_loss.capped_total(task_loss, h)
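
The snippet above does not say what capped_total does internally. As a rough illustration only, here is one plausible reading of the PMHConfig fields: the weighted penalty is ramped up linearly over warmup_epochs and then clipped so it never exceeds cap_ratio times the task loss. The function name sketch_capped_total and this exact rule are assumptions made for the example, not the library's implementation.

```python
def sketch_capped_total(task_loss, penalty, weight=0.3, cap_ratio=0.3,
                        epoch=0, warmup_epochs=2):
    """Hypothetical capping rule (an assumption, not the matching-pmh code).

    Returns (total_loss, applied_penalty) for scalar float inputs.
    """
    # Linear warmup: the penalty weight grows until warmup_epochs is reached.
    ramp = 1.0 if warmup_epochs <= 0 else min(1.0, (epoch + 1) / warmup_epochs)
    weighted = weight * ramp * penalty
    # Cap: the penalty may contribute at most cap_ratio * task_loss.
    applied = min(weighted, cap_ratio * task_loss)
    return task_loss + applied, applied


# A large raw penalty is clipped to 30% of the task loss after warmup.
total_late, applied_late = sketch_capped_total(1.0, 10.0, epoch=5)
# A small raw penalty during warmup passes through at half weight.
total_early, applied_early = sketch_capped_total(1.0, 0.1, epoch=0)
print(total_late, applied_late, total_early, applied_early)
```

Under this reading, the cap keeps a badly scaled Sigma estimate from swamping the task objective, which is why the example clips relative to task_loss rather than to a fixed constant.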

Adapt your pipeline · Quickstart · 17 walkthrough templates


Problem, object, repair, unification

Problem: ERM uses every input direction that predicts labels, including nuisances that are harmful at deployment (lighting, site, sensor noise, formatting, renameable identifiers, …).
Object: Sigma_task = covariance of label-preserving deployment nuisance n (under law Q_n).
Repair: Matched PMH shrinks encoder sensitivity along Sigma_task, rather than uniformly as isotropic PMH or generic VAT do.
Unification: CORAL, domain Grams, augmentation stacks, metric-learning directions, adversarial subspaces, and style Grams are different estimators of the same Sigma_task (Lemma D1–D7).
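
The difference between matched and isotropic shrinkage can be seen in a toy linear model, where the encoder is h = w^T x and the penalty reduces to w^T Sigma w, so the minimizer has a closed form. The sketch below uses NumPy only and does not use the matching-pmh API. The synthetic data, the choice of coordinate 1 as the nuisance direction, and lam=10 are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 5

# Toy training set: coordinate 0 is a noisy view of the label signal,
# coordinate 1 is a cleaner shortcut that is unstable at deployment.
s = rng.normal(size=n)
X = rng.normal(scale=0.1, size=(n, d))
X[:, 0] = s + rng.normal(scale=0.5, size=n)
X[:, 1] = 0.9 * s + rng.normal(scale=0.1, size=n)
y = s

# Assumed nuisance geometry: deployment perturbs coordinate 1 only.
sigma_task = np.zeros((d, d))
sigma_task[1, 1] = 1.0
# Isotropic control with the same trace.
sigma_iso = np.eye(d) * np.trace(sigma_task) / d


def fit(S, lam):
    """Minimize mean squared error + lam * w^T S w in closed form."""
    A = X.T @ X / n + lam * S
    return np.linalg.solve(A, X.T @ y / n)


def nuisance_sensitivity(w):
    # For h = w^T x the Jacobian is w^T, so Tr(J^T J Sigma) = w^T Sigma w.
    return float(w @ sigma_task @ w)


w_erm = fit(np.zeros((d, d)), lam=0.0)
w_matched = fit(sigma_task, lam=10.0)
w_iso = fit(sigma_iso, lam=10.0)

for name, w in [("ERM", w_erm), ("matched", w_matched), ("isotropic", w_iso)]:
    print(f"{name:9s} signal weight={w[0]:+.3f}  "
          f"nuisance sensitivity={nuisance_sensitivity(w):.4f}")
```

In this setup the matched penalty removes almost all reliance on the shortcut coordinate and shifts weight to the signal coordinate, while the isotropic penalty with the same trace shrinks both.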

Matched loss (schematic): L = L_task + lambda * Tr(J_phi^T J_phi Sigma') with range(Sigma') covering range(Sigma_task). Details: THEORY.md.
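
To make the trace term concrete: if Sigma' = U U^T for a low-rank factor U, then Tr(J_phi^T J_phi Sigma') equals the sum over columns u_k of ||J_phi u_k||^2, so the penalty can be estimated from directional derivatives without forming the full Jacobian. The NumPy sketch below checks this on a small tanh encoder using central differences. The encoder, the dimensions, and the random factor U are illustrative assumptions, and this is not how the library necessarily computes the penalty.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank = 8, 4, 2

# Toy encoder phi(x) = tanh(W x); stands in for any differentiable encoder.
W = rng.normal(size=(d_out, d_in)) / np.sqrt(d_in)


def phi(x):
    return np.tanh(W @ x)


x = rng.normal(size=d_in)
U = rng.normal(size=(d_in, rank))      # low-rank factor, Sigma' = U U^T
sigma = U @ U.T

# Exact penalty from the analytic Jacobian J = diag(1 - tanh^2(W x)) W.
J = (1.0 - np.tanh(W @ x) ** 2)[:, None] * W
exact = float(np.trace(J.T @ J @ sigma))

# Same quantity from directional derivatives along the columns of U.
eps = 1e-5
estimate = 0.0
for k in range(rank):
    u = U[:, k]
    j_u = (phi(x + eps * u) - phi(x - eps * u)) / (2.0 * eps)  # approx. J u
    estimate += float(j_u @ j_u)

print(exact, estimate)
```

The cost of the directional form scales with the rank of Sigma' rather than with the input dimension, which is one reason a low-rank estimate is convenient.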


How it fits your codebase

 Phase A (once)              Phase B (every step)
 ----------------              --------------------
 source/target data    ->      x, y from your loader
       |                            |
 encoder (eval)        ->      encoder (train) -> h
       |                            |
 estimate D1-D7        ->      L_task(h, y) + PMHLoss(h, Sigma_hat)
       |
 artifact.pt

You keep                       Library adds
--------                       ------------
Model, optimizer, task loss    SigmaTaskConfig, estimate_from_config
Data loaders                   collect_features (optional)
Training loop / Trainer        PMHLoss.capped_total or PMHTrainer

Walkthroughs (17 templates)

#    Guide                            Run
--   -----                            ---
1    PyTorch + D4                     examples/01_domain_shift_d4.py
2    ResNet + D4                      examples/12_resnet_hook_d4.py
3    Office-31 + sklearn              examples/06_office31_sklearn.py
4    Multi-layer CNN                  examples/07_vision_multilayer.py
5    Compositional D5                 examples/13_compositional_train_d5.py
6    LLM style D7                     examples/08_hf_style_d7.py
7    HF Trainer + DPO                 examples/11_dpo_lora_style_pmh.py
8    Falsification controls           examples/04_falsification_controls.py
9    CLI JSON jobs                    pmh-train estimate --config ...
10   Lightning                        examples/09_lightning_module.py
11   Temporal D6                      API in guide
12   ViT / CLS + D4                   examples/14_vit_cls_d4.py
13   Speech encoder + D4              examples/15_speech_encoder_d4.py
14   QM9 / molecules D5               examples/16_qm9_molecule_d5.py
15   Code / tokens D5                 examples/17_code_tokens_d5.py
16   Augmentations D3                 examples/18_augmentation_d3.py
17   Compare arms on your pipeline    examples/20_compare_training_arms.py

Estimators at a glance (D1–D7)

Deployment story                                         Method   SigmaTaskConfig
----------------                                         ------   ---------------
Different site / camera / corpus; P(y given x) stable    D4       SigmaTaskConfig.for_domain(rank=32)
Low-rank shift; labels on both domains                   D1       SigmaTaskConfig.for_subspace(rank=32)
Unstructured sensor / acquisition noise                  D2       SigmaTaskConfig.for_isotropic(dim, noise_level)
Known augmentation modes (color, blur, crop, …)          D3       SigmaTaskConfig.for_augmentation() + aug_deltas
Nuisance on specific coordinates (atoms, tokens)         D5       SigmaTaskConfig.for_compositional(indices)
Drift along time within a sequence                       D6       SigmaTaskConfig.for_temporal()
LLM style / format; semantics fixed                      D7       SigmaTaskConfig.for_alignment(rank=32)

pmh-train list-methods
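
As a rough picture of what an estimator in this family might compute, consider the augmentation case: given feature differences between augmented and clean inputs, form their second-moment matrix and keep the top eigen-directions. The NumPy sketch below is a generic low-rank estimate written for illustration. The helper name low_rank_sigma, the use of an uncentered second moment, and the synthetic deltas are assumptions, and the library's D3 estimator may differ.

```python
import numpy as np


def low_rank_sigma(deltas, rank):
    """Hypothetical helper: rank-truncated second moment of feature deltas.

    deltas: array of shape (N, d), e.g. h(augment(x)) - h(x) per sample.
    Returns a factor U of shape (d, rank) with Sigma_hat = U @ U.T.
    """
    deltas = np.asarray(deltas, dtype=float)
    second_moment = deltas.T @ deltas / deltas.shape[0]  # uncentered, (d, d)
    evals, evecs = np.linalg.eigh(second_moment)         # ascending order
    top = np.argsort(evals)[::-1][:rank]                 # largest first
    scale = np.sqrt(np.clip(evals[top], 0.0, None))      # guard tiny negatives
    return evecs[:, top] * scale


# Synthetic deltas that vary along two hidden directions in a 6-dim space.
rng = np.random.default_rng(0)
basis, _ = np.linalg.qr(rng.normal(size=(6, 2)))
deltas = (rng.normal(size=(500, 2)) * np.array([2.0, 0.5])) @ basis.T

U = low_rank_sigma(deltas, rank=2)
sigma_hat = U @ U.T
print(U.shape, np.linalg.matrix_rank(sigma_hat))
```

Storing the factor U instead of the full matrix keeps the artifact small and makes the penalty a sum of squared directional derivatives along the columns of U.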

Hybrid nuisances: estimate separate Sigma matrices and add separate PMHLoss terms.
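
Because the trace is linear in Sigma, a sum of separately weighted penalties equals a single penalty on the weighted sum of the Sigma matrices. The short NumPy check below illustrates this with a random Jacobian. The two example nuisances (a low-rank site shift and isotropic noise) and their weights are made-up values, and pmh_penalty is a stand-in function, not a library call.

```python
import numpy as np


def pmh_penalty(J, sigma):
    """Schematic penalty Tr(J^T J Sigma) for a given Jacobian J."""
    return float(np.trace(J.T @ J @ sigma))


rng = np.random.default_rng(0)
J = rng.normal(size=(4, 6))            # stand-in encoder Jacobian

A = rng.normal(size=(6, 2))
sigma_site = A @ A.T                   # low-rank nuisance (e.g. site shift)
sigma_noise = 0.1 * np.eye(6)          # isotropic nuisance (e.g. sensor noise)

lam_site, lam_noise = 0.3, 0.1

# Two separate terms, each with its own weight.
separate = lam_site * pmh_penalty(J, sigma_site) \
    + lam_noise * pmh_penalty(J, sigma_noise)
# One term on the combined matrix.
combined = pmh_penalty(J, lam_site * sigma_site + lam_noise * sigma_noise)

print(separate, combined)
```

Keeping the terms separate is still useful in practice, since each nuisance can then have its own weight, cap, or warmup schedule and can be logged independently.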


Install

pip install matching-pmh

Extra                                         Use case
-----                                         --------
pip install "matching-pmh[vision]"            ResNet / ViT examples
pip install "matching-pmh[hf]"                D7 style Gram (Transformers)
pip install "matching-pmh[hf-lora]"           LoRA + DPO example
pip install "matching-pmh[sklearn,vision]"    Office-31 pipeline
pip install "matching-pmh[lightning]"         Lightning callback
pip install "matching-pmh[all]"               Development + docs

From source:

git clone https://github.com/vishalstark512/matching-pmh.git
cd matching-pmh && pip install -e ".[dev]" && pytest -q

Documentation

Document                  Purpose
--------                  -------
ADAPT_YOUR_PIPELINE.md    Plug into your data, model, trainer
QUICKSTART.md             First run in 10 minutes
THEORY.md                 Full mathematics (LaTeX)
ARCHITECTURES.md          Hook points per stack
walkthroughs/             17 end-to-end guides

Citation

Cite the Grand Unification / Matching Principle manuscript. See CITATION.cff in the repository.

@software{matching_pmh,
  title  = {matching-pmh: Matched PMH training from estimated deployment nuisance geometry},
  author = {Rajput, Vishal},
  year   = {2026},
  url    = {https://github.com/vishalstark512/matching-pmh}
}

Contributing

See CONTRIBUTING.md.

License

MIT — see LICENSE.
