mhctools
Python interface to MHC binding, presentation, immunogenicity, and antigen processing predictors.
Installation
pip install mhctools
For MHCflurry support, also run:
mhcflurry-downloads fetch
Quick start
from mhctools import NetMHCpan41
predictor = NetMHCpan41(alleles=["HLA-A*02:01", "HLA-B*07:02"])
# predict() returns a list of PeptideResult — one per peptide
results = predictor.predict(["SIINFEKL", "GILGFVFTL"])
for r in results:
    if r.affinity:
        print(f"{r.peptide} -> {r.affinity.allele} IC50={r.affinity.value:.1f}nM")
Data model
predict() returns a list of PeptideResult — one per peptide. Each
result carries the peptide string and provides accessors for each
prediction kind (affinity, presentation, stability, etc.). Accessors
return None when a predictor doesn't produce that kind.
results = predictor.predict(["SIINFEKL", "GILGFVFTL"])
r = results[0]
r.peptide # "SIINFEKL"
r.affinity.value # IC50 in nM
r.affinity.percentile_rank # 0-100, lower = better
r.affinity.allele # best allele for this kind
r.presentation # None if predictor doesn't produce it
Under the hood, each PeptideResult wraps a tuple of Pred objects —
frozen dataclasses, one per allele-kind combination. Everything converts
to DataFrames with consistent column names.
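To make the wrapping concrete, here is a minimal sketch of the idea, not mhctools' actual classes. `SketchPred` and `SketchPeptideResult` are hypothetical stand-ins with only a few fields. The accessor is assumed to pick the highest-scoring prediction of a kind and to return None when that kind is absent.

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import Optional, Tuple


@dataclass(frozen=True)
class SketchPred:
    """Hypothetical stand-in for one allele-kind prediction."""
    kind: str
    allele: str
    score: float                   # higher = better
    value: Optional[float] = None  # native units, e.g. IC50 in nM


@dataclass(frozen=True)
class SketchPeptideResult:
    """Hypothetical stand-in: a peptide plus a tuple of predictions."""
    peptide: str
    preds: Tuple[SketchPred, ...]

    def best(self, kind: str) -> Optional[SketchPred]:
        # Highest-scoring prediction of this kind, or None if absent.
        matching = [p for p in self.preds if p.kind == kind]
        return max(matching, key=lambda p: p.score) if matching else None

    @property
    def affinity(self) -> Optional[SketchPred]:
        return self.best("pMHC_affinity")

    @property
    def stability(self) -> Optional[SketchPred]:
        return self.best("pMHC_stability")


# Made-up numbers, for illustration only.
result = SketchPeptideResult(
    peptide="SIINFEKL",
    preds=(
        SketchPred("pMHC_affinity", "HLA-A*02:01", score=0.30, value=2000.0),
        SketchPred("pMHC_affinity", "HLA-B*07:02", score=0.60, value=75.0),
    ),
)
```

Because both dataclasses are frozen, assigning to a field raises `FrozenInstanceError`, which is what makes each prediction safe to share between results.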
Python API
Predicting peptides
from mhctools import NetMHCpan41
predictor = NetMHCpan41(alleles=["HLA-A*02:01", "HLA-B*07:02"])
results = predictor.predict(["SIINFEKL", "GILGFVFTL"])
r = results[0]
r.peptide # "SIINFEKL"
r.offset # position in source protein (if scanned)
r.kinds # {"pMHC_affinity", "pMHC_presentation"}
r.alleles # {"HLA-A*02:01", "HLA-B*07:02"}
# best prediction by kind — None when the kind is absent
r.affinity # Pred or None
r.presentation # Pred or None
r.stability # None (predictor doesn't produce it)
if r.affinity:
    r.affinity.value # IC50 in nM
    r.affinity.percentile_rank # 0-100, lower = better
    r.affinity.score # ~0-1, higher = better
    r.affinity.allele # best allele for this kind
# by rank instead of score
r.best_affinity_by_rank # Pred with lowest percentile rank, or None
# all predictions
r.preds # tuple of all Pred objects
r.filter(kind="pMHC_affinity")
r.filter(allele="HLA-A*02:01")
NetMHCpan 4.1 automatically emits both pMHC_affinity and pMHC_presentation
predictions per peptide-allele pair.
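The difference between picking the best prediction by score and by percentile rank can be illustrated without running a predictor. The sketch below uses a hypothetical namedtuple `P` and made-up numbers; `filter_preds` is an illustrative helper and is not claimed to match the library's `filter()` implementation.

```python
from collections import namedtuple

# Hypothetical records; field names mirror the attributes shown above.
P = namedtuple("P", "kind allele score percentile_rank")

preds = (
    P("pMHC_affinity", "HLA-A*02:01", 0.71, 1.5),
    P("pMHC_affinity", "HLA-B*07:02", 0.68, 0.4),
    P("pMHC_presentation", "HLA-A*02:01", 0.90, 0.2),
)


def filter_preds(preds, kind=None, allele=None):
    """Keep predictions matching every criterion that is not None."""
    return tuple(
        p for p in preds
        if (kind is None or p.kind == kind)
        and (allele is None or p.allele == allele)
    )


affinity = filter_preds(preds, kind="pMHC_affinity")
best_by_score = max(affinity, key=lambda p: p.score)           # higher = better
best_by_rank = min(affinity, key=lambda p: p.percentile_rank)  # lower = better
```

With these numbers the two criteria disagree: the highest score belongs to HLA-A*02:01, while the lowest percentile rank belongs to HLA-B*07:02. This is why it matters which accessor you use.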
Scanning proteins
predict_proteins() takes a dictionary of protein sequences and returns
{sequence_name: list[PeptideResult]}:
proteins = predictor.predict_proteins(
    {"TP53": "MEEPQSDPSVEPPLSQETFS...", "KRAS": "MTEYKLVVVGAGGVGKS..."},
    peptide_lengths=[9, 10],
)
for r in proteins["TP53"]:
    if r.affinity and r.affinity.value < 500:
        print(f" offset={r.offset} {r.peptide} IC50={r.affinity.value:.0f}")
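Scanning amounts to sliding a window of each requested length along the sequence and recording where each window starts. The sketch below illustrates that idea and is not the library's code: `scan_peptides` is a hypothetical helper, and 0-based offsets are an assumption.

```python
def scan_peptides(sequence, peptide_lengths):
    """Yield (offset, peptide) for every window of each requested length."""
    for length in peptide_lengths:
        # A sequence of n residues contains n - length + 1 windows.
        for offset in range(len(sequence) - length + 1):
            yield offset, sequence[offset:offset + length]


# An 11-residue fragment yields three 9-mers and two 10-mers.
windows = list(scan_peptides("MEEPQSDPSVE", peptide_lengths=[9, 10]))
```

Sequences shorter than a requested length simply contribute no windows for that length, because the inner `range` is empty.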
DataFrames
Each prediction method has a _dataframe variant (predict_dataframe,
predict_proteins_dataframe) that flattens its results into a pandas DataFrame
with consistent columns:
df = predictor.predict_dataframe(["SIINFEKL"], sample_name="pat001")
df = predictor.predict_proteins_dataframe({"TP53": "MEEPQ..."}, sample_name="pat001")
Columns: sample_name, peptide, n_flank, c_flank,
source_sequence_name, offset, predictor_name, predictor_version,
allele, kind, score, value, percentile_rank.
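Since the columns include allele and kind, the frame is presumably in long format, with one row per peptide-allele-kind prediction. The sketch below mimics that layout with plain dictionaries so it runs without pandas or an installed predictor. `make_row` is a hypothetical helper and all values are made up.

```python
COLUMNS = [
    "sample_name", "peptide", "n_flank", "c_flank",
    "source_sequence_name", "offset", "predictor_name", "predictor_version",
    "allele", "kind", "score", "value", "percentile_rank",
]


def make_row(**fields):
    """Build one row with every documented column; missing ones are None."""
    unknown = set(fields) - set(COLUMNS)
    if unknown:
        raise KeyError(f"unexpected columns: {sorted(unknown)}")
    return {name: fields.get(name) for name in COLUMNS}


rows = [
    make_row(sample_name="pat001", peptide="SIINFEKL", allele="HLA-A*02:01",
             kind="pMHC_affinity", score=0.40, value=800.0),
    make_row(sample_name="pat001", peptide="SIINFEKL", allele="HLA-B*07:02",
             kind="pMHC_affinity", score=0.75, value=45.0),
    make_row(sample_name="pat001", peptide="SIINFEKL", allele="HLA-B*07:02",
             kind="pMHC_presentation", score=0.90),
]

# Strong binders: affinity rows with IC50 below 500 nM.
strong = [
    r for r in rows
    if r["kind"] == "pMHC_affinity"
    and r["value"] is not None
    and r["value"] < 500
]
```

On a real DataFrame the same selection would be a boolean mask, for example `df[(df.kind == "pMHC_affinity") & (df.value < 500)]`.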
Multi-sample predictions
MultiSample runs a predictor across multiple samples, each with its own
HLA genotype:
from mhctools import MultiSample, NetMHCpan41
ms = MultiSample(
    samples={
        "pat001": ["HLA-A*02:01", "HLA-B*07:02"],
        "pat002": ["HLA-A*01:01", "HLA-B*08:01"],
    },
    predictor_class=NetMHCpan41,
)
# {sample_name: list[PeptideResult]}
results = ms.predict(["SIINFEKL", "GILGFVFTL"])
# {sample_name: {seq_name: list[PeptideResult]}}
protein_results = ms.predict_proteins({"TP53": "MEEPQ..."})
# flat DataFrames with sample_name column
df = ms.predict_dataframe(["SIINFEKL"])
df = ms.predict_proteins_dataframe({"TP53": "MEEPQ..."})
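Conceptually, a multi-sample run builds one predictor per genotype and collects the results under each sample name. The sketch below shows that pattern with a hypothetical `StubPredictor` that does no real scoring. `predict_per_sample` is an illustrative helper, not the MultiSample implementation.

```python
class StubPredictor:
    """Hypothetical predictor: returns (peptide, allele) pairs, no scoring."""

    def __init__(self, alleles):
        self.alleles = list(alleles)

    def predict(self, peptides):
        return [(peptide, allele)
                for peptide in peptides
                for allele in self.alleles]


def predict_per_sample(samples, predictor_class, peptides):
    """Run one predictor per sample genotype; return {sample_name: results}."""
    return {
        name: predictor_class(alleles=alleles).predict(peptides)
        for name, alleles in samples.items()
    }


results = predict_per_sample(
    samples={
        "pat001": ["HLA-A*02:01", "HLA-B*07:02"],
        "pat002": ["HLA-A*01:01", "HLA-B*08:01"],
    },
    predictor_class=StubPredictor,
    peptides=["SIINFEKL", "GILGFVFTL"],
)
```

Passing the predictor class rather than an instance is what allows each sample to get its own allele list.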
Measurement kinds
Each Pred has a kind string describing what it measures:
| Kind | Meaning |
|---|---|
| pMHC_affinity | Peptide-MHC binding affinity |
| pMHC_presentation | Likelihood of surface presentation (EL/processing) |
| pMHC_stability | Peptide-MHC complex stability |
| immunogenicity | T-cell immunogenicity |
| antigen_processing | Combined processing score |
| proteasome_cleavage | Proteasomal cleavage score |
| tap_transport | TAP transport score (reserved, not yet used) |
| erap_trimming | ERAP trimming score (reserved, not yet used) |
The Pred object
Every prediction is a frozen, self-contained Pred dataclass:
from mhctools import Pred
pred = Pred(
    kind="pMHC_affinity",
    score=0.85, # ~0-1, higher = better
    peptide="SIINFEKL",
    allele="HLA-A*02:01",
    value=120.5, # IC50 in nM
    percentile_rank=0.8,
    source_sequence_name="TP53",
    offset=42,
    predictor_name="netMHCpan",
    predictor_version="4.1",
)
score is always higher-is-better. value is in native units (nM for
affinity, hours for stability). percentile_rank is optional; when present
it ranges from 0 to 100, and lower means stronger.
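To show how a lower-is-better value can relate to a higher-is-better score, the sketch below applies the log transform commonly used by NetMHC-family tools, 1 - log(IC50) / log(50000). This is an illustration only. This document does not say how mhctools derives score for any given predictor, so `ic50_to_score` and its 50000 nM ceiling are assumptions, and its output need not match the example values above.

```python
import math


def ic50_to_score(ic50_nm, max_ic50=50000.0):
    """Map IC50 in nM (lower = better) to 0-1 (higher = better), clipped."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    score = 1.0 - math.log(ic50_nm) / math.log(max_ic50)
    return min(1.0, max(0.0, score))


strong = ic50_to_score(50.0)    # tight binder, score closer to 1
weak = ic50_to_score(5000.0)    # weak binder, score closer to 0
```

Whatever the transform, the point is the same: sort by score in descending order, or by value in ascending order, and expect the ordering to agree only when one is a monotonic function of the other.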
Supported predictors
MHC binding & presentation
| Predictor | Kinds produced | Requires |
|---|---|---|
| NetMHCpan / NetMHCpan41 / NetMHCpan42 | affinity + presentation | NetMHCpan |
| NetMHCpan4 | affinity or presentation | NetMHCpan 4.0 |
| NetMHCpan3 / NetMHCpan28 | affinity | older NetMHCpan |
| NetMHC / NetMHC3 / NetMHC4 | affinity | NetMHC |
| NetMHCIIpan / NetMHCIIpan43 | affinity or presentation | NetMHCIIpan |
| NetMHCcons | affinity | NetMHCcons |
| NetMHCstabpan | stability | NetMHCstabpan |
| MHCflurry | affinity + presentation | pip install mhcflurry + mhcflurry-downloads fetch |
| BigMHC | presentation or immunogenicity | BigMHC clone (set BIGMHC_DIR) |
| MixMHCpred | presentation | MixMHCpred |
| IedbNetMHCpan / IedbSMM / IedbNetMHCIIpan | affinity | IEDB web API |
| RandomBindingPredictor | affinity | (built-in) |
Antigen processing
| Predictor | Kinds produced | Requires |
|---|---|---|
| Pepsickle | proteasome cleavage | pip install pepsickle (paper) |
| NetChop | proteasome cleavage | NetChop |
Processing predictors use configurable scoring to aggregate per-position
cleavage probabilities into peptide-level scores. See ProcessingPredictor
and ProteasomePredictor for details.
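As a rough illustration of such aggregation, the sketch below combines per-position cleavage probabilities into one peptide-level score: the probability of a cut at the peptide's C-terminus, multiplied by the probability that no internal cut destroys it. This is one plausible scheme, chosen for illustration. It is not claimed to be the default in ProcessingPredictor or ProteasomePredictor, and `peptide_processing_score` is a hypothetical helper.

```python
import math


def peptide_processing_score(cleavage_probs, start, length):
    """Score a peptide from per-position cleavage probabilities.

    cleavage_probs[i] is taken to be the probability of a cut immediately
    after residue i of the source protein (an assumed convention).
    """
    end = start + length - 1              # index of the peptide's last residue
    c_term = cleavage_probs[end]          # cut that liberates the C-terminus
    internal = cleavage_probs[start:end]  # cuts that would destroy the peptide
    survive = math.prod(1.0 - p for p in internal)
    return c_term * survive


# Made-up probabilities for a 5-residue protein; score the 3-mer at offset 1.
probs = [0.1, 0.0, 0.5, 0.9, 0.2]
score = peptide_processing_score(probs, start=1, length=3)
```

Other aggregations, such as taking only the C-terminal probability or averaging internal positions, would fit the same signature, which is presumably why the scoring is configurable.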
Command-line examples
Prediction for user-supplied peptide sequences
mhctools --sequence SIINFEKL SIINFEKLQ --mhc-predictor netmhc --mhc-alleles A0201
Automatically extract peptides as subsequences of specified length
mhctools --sequence AAAQQQSIINFEKL --extract-subsequences --mhc-peptide-lengths 8-10 --mhc-predictor mhcflurry --mhc-alleles A0201
Legacy API
The old predict_peptides() and predict_subsequences() methods still work
and return BindingPredictionCollection objects:
from mhctools import NetMHCpan
predictor = NetMHCpan(alleles=["A*02:01"])
collection = predictor.predict_subsequences(
    {"1L2Y": "NLYIQWLKDGGPSSGRPPPS"},
    peptide_lengths=[9],
)
df = collection.to_dataframe()
for bp in collection:
    if bp.affinity < 100:
        print("Strong binder: %s" % bp)
To convert legacy results to the new types:
preds = collection.to_preds() # list of Pred
pp_list = collection.to_peptide_preds() # list of PeptideResult