
Peptide-MHC presentation, cross-reactivity and motif tools on the seqtree substrate


mhcmatch


Peptide–MHC presentation, cross-reactivity, and motif tools: the applied peptide–MHC layer built on the seqtree fuzzy-search substrate. mhcmatch productionizes the reference seqtree.pmhc methodology (anchor-masked TCR-facing homology, presentation-aware E-values, and allele guessing) and adds a pseudosequence-based cross-allele diffusion model that rescues rare alleles by borrowing from frequent alleles with similar binding grooves.
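
The anchor-masked, TCR-facing comparison mentioned above can be pictured with a minimal sketch. It assumes the canonical class I anchors at P2 and the C-terminal position (PΩ) and uses the same X-masking convention as `store.decompose`; mhcmatch itself may use allele-specific or learned anchor positions, and the function names below are hypothetical, not the package API.

```python
def anchor_positions(length: int) -> set[int]:
    """0-based canonical class I anchors: P2 and the C-terminus (an assumption, not allele-specific)."""
    return {1, length - 1}


def decompose(peptide: str) -> tuple[str, str]:
    """Split a peptide into (tcr_facing, presentation), masking the complementary positions with X."""
    anchors = anchor_positions(len(peptide))
    tcr_facing = "".join("X" if i in anchors else aa for i, aa in enumerate(peptide))
    presentation = "".join(aa if i in anchors else "X" for i, aa in enumerate(peptide))
    return tcr_facing, presentation


def tcr_mismatches(a: str, b: str) -> int:
    """Count mismatches between two equal-length peptides at TCR-facing (non-anchor) positions only."""
    if len(a) != len(b):
        raise ValueError("peptides must have equal length")
    anchors = anchor_positions(len(a))
    return sum(x != y for i, (x, y) in enumerate(zip(a, b)) if i not in anchors)


print(decompose("NLVPMVATV"))                    # ('NXVPMVATX', 'XLXXXXXXV')
print(tcr_mismatches("NLVPMVATV", "NMVPMVATL"))  # 0: the two differences fall on anchors only
```

Masking the anchors means two peptides that differ only at MHC-binding positions count as identical from the TCR's point of view, which is the intuition behind TCR-facing homology.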

The mathematical/statistical theory is in appendix/mhcmatch.tex; the development plan is in ROADMAP.md.

What it does (v0)

  1. MHC restriction and presentation: rank presenting alleles for a peptide (single allele / allele set / all alleles, human and mouse), flag non-binders, and scan a whole protein for presented peptides.
  2. Large-scale similarity search: find similar peptides across large peptide sets or whole proteomes, either by same-MHC binding (presentation signature) or by similar TCR recognition (anchor-masked, TCR-facing positions); includes neoantigen molecular-mimicry search with per-allele E-values.
  3. Anchor / TCR-facing split: decompose a peptide into its anchor and TCR-facing parts, each returned with the other part's positions masked as X.
  4. Near-exact source lookup: find, in a reference proteome, the self peptide a neoantigen derives from, together with its parent protein and mutated position.
  5. Motif logos: per-allele information-content logos with length distributions.
  6. Pseudosequence diffusion: allele similarity, clustering, and kernel-shrinkage pooling over 34-mer groove pseudosequences (rare-allele rescue).
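
Kernel-shrinkage pooling (item 6) can be sketched as an empirical-Bayes estimate: a rare allele's per-position residue frequency is pulled toward a similarity-weighted average of other alleles' frequencies, with weights that decay with groove pseudosequence distance. This is a generic illustration under assumed choices (Hamming distance, an exponential kernel with bandwidth `tau`, prior strength `kappa`); the model mhcmatch actually uses is specified in appendix/mhcmatch.tex and may differ. The toy pseudosequences are 4 residues long for readability rather than 34, and all names are hypothetical.

```python
import math


def pseudoseq_distance(a: str, b: str) -> float:
    """Fraction of mismatched positions between two equal-length groove pseudosequences."""
    if len(a) != len(b):
        raise ValueError("pseudosequences must have equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def shrunk_frequency(target, counts, freqs, pseudoseqs, kappa=10.0, tau=0.1):
    """Pool one residue frequency for `target` toward groove-similar alleles.

    counts[allele]     -> number of observed ligands (0 or missing for unseen alleles)
    freqs[allele]      -> observed frequency of one residue at one position
    pseudoseqs[allele] -> groove pseudosequence
    kappa              -> prior strength in pseudo-ligands (assumed value)
    tau                -> kernel bandwidth on the distance scale (assumed value)
    """
    # Exponential kernel: identical grooves get weight 1, very different grooves get ~0.
    weights = {
        b: math.exp(-pseudoseq_distance(pseudoseqs[target], pseudoseqs[b]) / tau)
        for b in freqs
        if b != target
    }
    prior = sum(w * freqs[b] for b, w in weights.items()) / sum(weights.values())
    n = counts.get(target, 0)
    own = freqs.get(target, 0.0)
    # With no data (n = 0) the estimate equals the prior; with many ligands it approaches `own`.
    return (n * own + kappa * prior) / (n + kappa)


pseudoseqs = {"rare": "AAAA", "close": "AAAA", "far": "TTTT"}  # toy 4-mers, not real 34-mers
freqs = {"close": 0.8, "far": 0.2}
counts = {"rare": 0, "close": 500, "far": 500}
print(round(shrunk_frequency("rare", counts, freqs, pseudoseqs), 3))  # 0.8: the close neighbour dominates
```

The design choice here is that borrowing fades automatically as an allele accumulates its own ligands, so frequent alleles are essentially unaffected while unseen alleles inherit their groove neighbours' preferences.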

Install

bash setup.sh            # repo-local .venv + editable install (uses sibling ../seqtree if present)
bash setup.sh --tests    # + pytest
bash setup.sh --logo     # + logomaker/matplotlib for rendering logos

Quickstart

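import mhcmatch

# my_protein, big_peptide_set and self_set below are placeholders for your own sequences / peptide collections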

# build from the isalgo/pmhc_data table (full or shortlist tier)
store = mhcmatch.Store.from_pmhc("pmhc_full.tsv.gz", species="human")

store.restriction("NLVPMVATV")                  # ranked presenting alleles + binder flags
store.is_binder("NLVPMVATV", "HLA-A*02:01")
store.scan_protein(my_protein, cls="mhc1")       # presented peptides in a protein
store.decompose("NLVPMVATV", cls="mhc1")         # (tcr_facing, presentation) with X masks

# similarity at scale
mhcmatch.search.search("NLVPMVATV", big_peptide_set, mode="tcr")   # TCR-facing homologs
mhcmatch.search.find_mimics("EAAGIGILTV", self_set, bacterial_sets={...})

# near-exact source of a neoantigen
pm = mhcmatch.Proteome.from_fasta("UP000005640_9606.fasta.gz")
pm.find_source("NLVPMVATV", max_subs=1)

# pseudosequence allele similarity + rare-allele diffusion
ps = mhcmatch.Pseudoseq("mhc1")
ps.neighbors("HLA-A*02:01", candidates=store.alleles("mhc1"))

# diffusion-powered forward scorer (rescues rare alleles by borrowing from groove-neighbours)
am = store.anchor_model("mhc1")          # learned anchor weights + bounded-prior shrinkage
am.score("NLVPMVATV", "HLA-A*02:01")     # anchor log-odds; am.score(..., raw=True) disables borrowing

mhcmatch.logo.motif(store, "HLA-A*02:01", "mhc1")
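
Per-position information content, the quantity a motif logo displays, can be computed from a set of aligned same-length ligands. The sketch below assumes a uniform background over the 20 standard amino acids and applies no small-sample correction; `mhcmatch.logo.motif` may use a different background or correction, and `information_content` is a hypothetical helper, not part of the package.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def information_content(peptides: list[str]) -> list[float]:
    """Bits per position: log2(20) minus the Shannon entropy of observed residues (uniform background)."""
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("peptides must share one length; logos are built per length")
    max_bits = math.log2(len(AMINO_ACIDS))
    ic = []
    for pos in range(length):
        counts = Counter(p[pos] for p in peptides)
        total = sum(counts.values())
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        ic.append(max_bits - entropy)
    return ic


ligands = ["NLVPMVATV", "GLCTLVAML", "ILKEPVHGV", "YLLEMLWRL"]  # toy HLA-A*02:01-style 9-mers
print([round(x, 2) for x in information_content(ligands)])
```

In this toy set P2 is always L, so it reaches the maximum of log2(20) ≈ 4.32 bits, while the C-terminal position (V or L) scores one bit less; a real logo would use many more ligands.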

Command line

mhcmatch decompose NLVPMVATV                                  # anchor / TCR-facing split (no data)
export MHCMATCH_PMHC=/path/to/pmhc_data                       # or pass --pmhc to each command; in fish: set -x MHCMATCH_PMHC /path/to/pmhc_data
mhcmatch restriction NLVPMVATV --allele 'A*02:01' --diffuse   # allele name auto-resolved; rare-aware
mhcmatch scan my_protein.fasta --correction bh                # presented windows, BH-FDR controlled
mhcmatch source MKTAYIAKW --proteome UP000005640_9606.fasta.gz
mhcmatch logo 'HLA-A*02:01'
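
`--correction bh` refers to Benjamini-Hochberg false-discovery-rate control. As a reminder of what that procedure does, here is a minimal, self-contained sketch of the standard step-up rule over per-window p-values; how mhcmatch derives those p-values is not shown, and `benjamini_hochberg` is a hypothetical helper rather than the package's implementation.

```python
def benjamini_hochberg(pvalues: list[float], alpha: float = 0.05) -> list[bool]:
    """Return a reject flag per hypothesis using the Benjamini-Hochberg step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p-value
    # Find the largest 1-based rank k with p_(k) <= k * alpha / m; every rank up to k is rejected.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            cutoff = rank
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


window_pvalues = [0.001, 0.045, 0.025, 0.20, 0.008]
print(benjamini_hochberg(window_pvalues, alpha=0.05))  # [True, False, True, False, True]
```

Because the rule is step-up, a p-value that fails its own threshold is still rejected if some larger p-value passes a later one; this is what distinguishes BH from a simple per-rank cutoff.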

Data

  • Reference ligands: isalgo/pmhc_data (full / shortlist tiers) — pass the path to Store.from_pmhc or set MHCMATCH_PMHC.
  • Pseudosequences: 34-mer groove pseudosequences vendored in src/mhcmatch/data/ (see its PROVENANCE.md).
  • Reference proteomes: not bundled — supply a UniProt reference proteome FASTA (UP000005640 human / UP000000589 mouse) to Proteome.from_fasta.
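
The near-exact source lookup (the `find_source(..., max_subs=1)` call in the quickstart) can be illustrated with a brute-force sketch: read a plain or gzipped FASTA such as a UniProt reference proteome and report every window within a given number of substitutions of the query, with the protein identifier and the substituted positions. This linear scan is for illustration only, since mhcmatch sits on the seqtree fuzzy-search substrate instead; the function names and the toy proteome are hypothetical.

```python
import gzip
import io


def parse_fasta(handle) -> dict[str, str]:
    """Parse FASTA text into {first word of header: sequence}."""
    records: dict[str, str] = {}
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = (line[1:].split() or [""])[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def read_fasta(path: str) -> dict[str, str]:
    """Read a plain or gzip-compressed FASTA file, e.g. a UniProt reference proteome."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return parse_fasta(handle)


def find_source(peptide: str, proteome: dict[str, str], max_subs: int = 1):
    """Yield (protein_id, start, substituted_positions) for every window within max_subs substitutions.

    substituted_positions are 0-based offsets within the peptide; the matching position in the
    parent protein is start + offset.
    """
    k = len(peptide)
    for pid, seq in proteome.items():
        for start in range(len(seq) - k + 1):
            diffs = [i for i, (a, b) in enumerate(zip(peptide, seq[start:start + k])) if a != b]
            if len(diffs) <= max_subs:
                yield pid, start, diffs


# Toy two-protein "proteome" (made-up entries, not real UniProt records)
toy = io.StringIO(">sp|TOY1|TOY_HUMAN toy protein\nMKTAYIAKWNLVPMVATV\n>sp|TOY2|TOY2_HUMAN\nGGGG\n")
proteome = parse_fasta(toy)
print(list(find_source("NLVPMVAAV", proteome, max_subs=1)))  # [('sp|TOY1|TOY_HUMAN', 9, [7])]
```

Here the query differs from the toy self window only at peptide offset 7, which corresponds to protein position 16 (0-based), the kind of parent-protein and mutated-position report described for item 4.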

Status

Alpha (v0). See ROADMAP.md for phased plans (tuned thresholds, learned anchor weights, future stability/affinity/cleavage/immunogenicity predictors, and the NetMHCpan / MixMHCpred benchmark).



Download files


Source Distribution

mhcmatch-0.1.0.tar.gz (997.0 kB)

Built Distribution


mhcmatch-0.1.0-py3-none-any.whl (85.7 kB)

File details

Details for the file mhcmatch-0.1.0.tar.gz.

File metadata

  • Download URL: mhcmatch-0.1.0.tar.gz
  • Upload date:
  • Size: 997.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for mhcmatch-0.1.0.tar.gz
Algorithm Hash digest
SHA256 2c9025850158aa483f34f1ba80ef61e19f06dcb5583eb2e9b15ab373e34dae5d
MD5 29f95ce82f429cbfc91400c6df32e02a
BLAKE2b-256 bc8f37d74c6a22b7abdde86f619b3790c5968dfcba9c5a63e72715bc48c1b4a4


Provenance

The following attestation bundles were made for mhcmatch-0.1.0.tar.gz:

Publisher: publish.yml on antigenomics/mhcmatch


File details

Details for the file mhcmatch-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: mhcmatch-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 85.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for mhcmatch-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 274a55b5af8569f0ddf997f6347a41d4909eba9950c28d73e86675b5fbf9eebf
MD5 fa2e6d198b9ba0c8f970351d334d59d6
BLAKE2b-256 c16742dd54215b2bd7a17e444785c0c7a9fb1b3bea94c12dcfd7abcc16cc97a0


Provenance

The following attestation bundles were made for mhcmatch-0.1.0-py3-none-any.whl:

Publisher: publish.yml on antigenomics/mhcmatch

