Peptide-MHC presentation, cross-reactivity and motif tools on the seqtree substrate
mhcmatch
Peptide–MHC presentation, cross-reactivity, and motif tools — the applied peptide–MHC layer on top
of the seqtree fuzzy-search substrate. mhcmatch
productionizes the reference seqtree.pmhc methodology (anchor-masked TCR-facing homology,
presentation-aware E-values, allele guessing) and adds a pseudosequence-based cross-allele
diffusion model that rescues rare alleles by borrowing from groove-similar frequent ones.
The mathematical/statistical theory is in appendix/mhcmatch.tex; the
development plan is in ROADMAP.md.
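As a rough illustration of the anchor-masked, TCR-facing view mentioned above, the sketch below masks anchor positions with X to get one view and masks everything else to get the other. The function name `split_views` is hypothetical, and the default anchor positions (P2 and the C-terminal residue, a common convention for MHC class I 9-mers) are an assumption; the masks mhcmatch actually applies may differ.

```python
def split_views(peptide, anchors=(1, -1), mask="X"):
    """Return (tcr_facing, presentation) views of a peptide.

    anchors: 0-based positions, negative values count from the C-terminus.
    The default (P2 and the last residue) is an illustrative assumption,
    not the mask definition used by mhcmatch.
    """
    n = len(peptide)
    anchor_idx = {a % n for a in anchors}
    # TCR-facing view: hide the anchors, keep the exposed residues.
    tcr_facing = "".join(
        mask if i in anchor_idx else aa for i, aa in enumerate(peptide)
    )
    # Presentation view: keep only the anchors.
    presentation = "".join(
        aa if i in anchor_idx else mask for i, aa in enumerate(peptide)
    )
    return tcr_facing, presentation


tcr_view, presentation_view = split_views("NLVPMVATV")
print(tcr_view, presentation_view)
```

Two peptides with the same TCR-facing view but different anchors would then compare as identical from the T-cell receptor's side, which is the intuition behind anchor-masked homology.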
What it does (v0)
- MHC restriction & presentation — rank presenting alleles for a peptide (single / set / all, human & mouse), flag non-binders, and scan a whole protein for presented peptides.
- Large-scale similarity search — find similar peptides across big sets / proteomes, either by same-MHC binding (presentation signature) or similar TCR recognition (anchor-masked, TCR-facing); neoantigen molecular mimicry with per-allele E-values.
- Anchor / TCR-facing split — decompose a peptide into anchor and TCR-facing parts (X masks).
- Near-exact source lookup — find the self peptide a neoantigen derives from + its parent protein / mutated position, against a reference proteome.
- Motif logos — per-allele information-content logos with length distributions.
- Pseudosequence diffusion — allele similarity, clustering, and kernel-shrinkage pooling over 34-mer groove pseudosequences (rare-allele rescue).
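The rare-allele rescue idea in the last bullet can be sketched as kernel-weighted pooling of residue counts: each allele contributes to the target's estimate in proportion to how similar its groove pseudosequence is. Everything below is an assumption made for illustration (the exponential Hamming kernel, the bandwidth, the toy 8-residue pseudosequences, and the placeholder allele names); the model described in appendix/mhcmatch.tex may be formulated differently.

```python
import math
from collections import Counter


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("pseudosequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def kernel(ps_a, ps_b, bandwidth=2.0):
    """Similarity in (0, 1]; identical pseudosequences give 1.0."""
    return math.exp(-hamming(ps_a, ps_b) / bandwidth)


def pooled_frequencies(target, pseudoseqs, counts, bandwidth=2.0):
    """Kernel-weighted residue frequencies at one anchor position."""
    pooled = Counter()
    for allele, allele_counts in counts.items():
        weight = kernel(pseudoseqs[target], pseudoseqs[allele], bandwidth)
        for residue, n in allele_counts.items():
            pooled[residue] += weight * n
    total = sum(pooled.values())
    return {residue: value / total for residue, value in pooled.items()}


# Toy data: placeholder names and shortened pseudosequences, not real alleles.
pseudoseqs = {"rare": "YFAMYGEK", "near": "YFAMYGER", "far": "HTKWQLDS"}
counts = {
    "rare": {"L": 2, "M": 1},      # very few observed ligands
    "near": {"L": 800, "M": 200},  # groove-similar, well sampled
    "far": {"R": 1000},            # dissimilar groove, contributes little
}
freqs = pooled_frequencies("rare", pseudoseqs, counts)
print(freqs)
```

With only three observations of its own, the rare allele's estimate is dominated by its groove-similar neighbour, while the dissimilar allele is down-weighted almost to zero.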
Install
bash setup.sh # repo-local .venv + editable install (uses sibling ../seqtree if present)
bash setup.sh --tests # + pytest
bash setup.sh --logo # + logomaker/matplotlib for rendering logos
Quickstart
import mhcmatch
# build from the isalgo/pmhc_data table (full or shortlist tier)
store = mhcmatch.Store.from_pmhc("pmhc_full.tsv.gz", species="human")
store.restriction("NLVPMVATV") # ranked presenting alleles + binder flags
store.is_binder("NLVPMVATV", "HLA-A*02:01")
store.scan_protein(my_protein, cls="mhc1") # presented peptides in a protein
store.decompose("NLVPMVATV", cls="mhc1") # (tcr_facing, presentation) with X masks
# similarity at scale
mhcmatch.search.search("NLVPMVATV", big_peptide_set, mode="tcr") # TCR-facing homologs
mhcmatch.search.find_mimics("EAAGIGILTV", self_set, bacterial_sets={...})
# near-exact source of a neoantigen
pm = mhcmatch.Proteome.from_fasta("UP000005640_9606.fasta.gz")
pm.find_source("NLVPMVATV", max_subs=1)
# pseudosequence allele similarity + rare-allele diffusion
ps = mhcmatch.Pseudoseq("mhc1")
ps.neighbors("HLA-A*02:01", candidates=store.alleles("mhc1"))
# diffusion-powered forward scorer (rescues rare alleles by borrowing from groove-neighbours)
am = store.anchor_model("mhc1") # learned anchor weights + bounded-prior shrinkage
am.score("NLVPMVATV", "HLA-A*02:01") # anchor log-odds; am.score(..., raw=True) disables borrowing
mhcmatch.logo.motif(store, "HLA-A*02:01", "mhc1")
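For readers unfamiliar with information-content logos, the quantity plotted per position is usually the maximum entropy of the alphabet minus the observed entropy, in bits. The sketch below computes it for a set of equal-length peptides; it is a plain textbook calculation with no small-sample correction or background frequencies, and `information_content` is a hypothetical helper, not the implementation behind `mhcmatch.logo.motif`.

```python
import math
from collections import Counter


def information_content(peptides, alphabet_size=20):
    """Per-position information content (bits) for equal-length peptides."""
    if len({len(p) for p in peptides}) != 1:
        raise ValueError("all peptides must have the same length")
    max_bits = math.log2(alphabet_size)
    result = []
    for column in zip(*peptides):
        total = len(column)
        counts = Counter(column)
        # Shannon entropy of the residue distribution at this position.
        entropy = -sum(
            (c / total) * math.log2(c / total) for c in counts.values()
        )
        result.append(max_bits - entropy)
    return result


# Toy 3-mers: the first two positions are fully conserved, the third is not.
ic = information_content(["ALA", "ALC", "ALD", "ALE"])
print([round(bits, 3) for bits in ic])
```

Conserved positions reach the maximum of log2(20) bits, which is why anchor residues stand out as tall stacks in a logo.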
Command line
mhcmatch decompose NLVPMVATV # anchor / TCR-facing split (no data)
set -x MHCMATCH_PMHC /path/to/pmhc_data # fish syntax; in bash/zsh use: export MHCMATCH_PMHC=/path/to/pmhc_data (or pass --pmhc to each command)
mhcmatch restriction NLVPMVATV --allele 'A*02:01' --diffuse # allele name auto-resolved; rare-aware
mhcmatch scan my_protein.fasta --correction bh # presented windows, BH-FDR controlled
mhcmatch source MKTAYIAKW --proteome UP000005640_9606.fasta.gz
mhcmatch logo 'HLA-A*02:01'
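The `--correction bh` option above is described as BH-FDR control. As background, the standard Benjamini-Hochberg adjustment can be sketched as follows; how mhcmatch derives the per-window p-values is not shown here, and `benjamini_hochberg` is a generic helper written for this example rather than a function from the package.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(adjusted)
```

Windows whose adjusted value falls below the chosen FDR level would be reported as presented.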
Data
- Reference ligands: isalgo/pmhc_data (full / shortlist tiers) — pass the path to Store.from_pmhc or set MHCMATCH_PMHC.
- Pseudosequences: 34-mer groove pseudosequences vendored in src/mhcmatch/data/ (see its PROVENANCE.md).
- Reference proteomes: not bundled — supply a UniProt reference proteome FASTA (UP000005640 human / UP000000589 mouse) to Proteome.from_fasta.
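To make the near-exact source lookup against a reference proteome concrete, here is a brute-force stand-in: slide a window over every protein and keep windows that differ from the query by at most `max_subs` substitutions. The function name `find_source_naive`, its return shape, and the toy protein are assumptions for illustration; the package builds on the seqtree substrate and does not necessarily work this way.

```python
def find_source_naive(peptide, proteins, max_subs=1):
    """Find protein windows within max_subs substitutions of peptide.

    proteins: mapping of protein name -> amino-acid sequence.
    Returns (name, 0-based start, window, mismatch positions) tuples.
    """
    k = len(peptide)
    hits = []
    for name, seq in proteins.items():
        for start in range(len(seq) - k + 1):
            window = seq[start:start + k]
            mismatches = [i for i in range(k) if window[i] != peptide[i]]
            if len(mismatches) <= max_subs:
                hits.append((name, start, window, mismatches))
    return hits


# Toy proteome with a made-up protein; the query carries one substitution.
toy_proteome = {"toy_protein": "MKTAYIAKQRNLVPMVATVQG"}
hits = find_source_naive("NLVPMVGTV", toy_proteome, max_subs=1)
print(hits)
```

The mismatch positions give the mutated position within the peptide, and adding the window start maps it back to the parent protein. A scan like this is linear in proteome size per query, which is the cost an indexed fuzzy search is meant to avoid.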
Status
Alpha (v0). See ROADMAP.md for phased plans (tuned thresholds, learned anchor
weights, future stability/affinity/cleavage/immunogenicity predictors, and the NetMHCpan /
MixMHCpred benchmark).
File details
Details for the file mhcmatch-0.1.0.tar.gz.
File metadata
- Download URL: mhcmatch-0.1.0.tar.gz
- Upload date:
- Size: 997.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2c9025850158aa483f34f1ba80ef61e19f06dcb5583eb2e9b15ab373e34dae5d |
| MD5 | 29f95ce82f429cbfc91400c6df32e02a |
| BLAKE2b-256 | bc8f37d74c6a22b7abdde86f619b3790c5968dfcba9c5a63e72715bc48c1b4a4 |
Provenance
The following attestation bundles were made for mhcmatch-0.1.0.tar.gz:
- Publisher: publish.yml on antigenomics/mhcmatch
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: mhcmatch-0.1.0.tar.gz
- Subject digest: 2c9025850158aa483f34f1ba80ef61e19f06dcb5583eb2e9b15ab373e34dae5d
- Sigstore transparency entry: 1916210284
- Sigstore integration time:
- Permalink: antigenomics/mhcmatch@6b8233944f8106ae34e9be5278608b4106fdc408
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/antigenomics
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@6b8233944f8106ae34e9be5278608b4106fdc408
- Trigger Event: release
File details
Details for the file mhcmatch-0.1.0-py3-none-any.whl.
File metadata
- Download URL: mhcmatch-0.1.0-py3-none-any.whl
- Upload date:
- Size: 85.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 274a55b5af8569f0ddf997f6347a41d4909eba9950c28d73e86675b5fbf9eebf |
| MD5 | fa2e6d198b9ba0c8f970351d334d59d6 |
| BLAKE2b-256 | c16742dd54215b2bd7a17e444785c0c7a9fb1b3bea94c12dcfd7abcc16cc97a0 |
Provenance
The following attestation bundles were made for mhcmatch-0.1.0-py3-none-any.whl:
- Publisher: publish.yml on antigenomics/mhcmatch
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: mhcmatch-0.1.0-py3-none-any.whl
- Subject digest: 274a55b5af8569f0ddf997f6347a41d4909eba9950c28d73e86675b5fbf9eebf
- Sigstore transparency entry: 1916210778
- Sigstore integration time:
- Permalink: antigenomics/mhcmatch@6b8233944f8106ae34e9be5278608b4106fdc408
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/antigenomics
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@6b8233944f8106ae34e9be5278608b4106fdc408
- Trigger Event: release