mhctools
Python interface to MHC binding, presentation, immunogenicity, and antigen processing predictors.
Installation
```sh
pip install mhctools
```
For MHCflurry support, also run:
```sh
mhcflurry-downloads fetch
```
Quick start
```python
from mhctools import NetMHCpan41

predictor = NetMHCpan41(alleles=["HLA-A*02:01", "HLA-B*07:02"])

# predict for specific peptides
results = predictor.predict(["SIINFEKL", "GILGFVFTL"])

# results is a list of PeptidePreds, one per peptide
for pp in results:
    best = pp.best_affinity
    if best:
        print(f"{best.peptide} -> {best.allele} IC50={best.value:.1f}nM")
```
Python API
Predicting peptides
predict() takes a list of peptide sequences and returns a list[PeptidePreds].
Each PeptidePreds contains Pred objects for every allele and measurement
kind the predictor supports.
```python
from mhctools import Kind, NetMHCpan41

predictor = NetMHCpan41(alleles=["HLA-A*02:01", "HLA-B*07:02"])
results = predictor.predict(["SIINFEKL", "GILGFVFTL"])

pp = results[0]
pp.best_affinity                  # Pred with the highest affinity score
pp.best_affinity.allele           # e.g. "HLA-A*02:01"
pp.best_affinity.value            # IC50 in nM
pp.best_affinity.score            # higher = better (~0-1)
pp.best_affinity.percentile_rank  # lower = better (0-100)
pp.best_affinity_by_rank          # Pred with the lowest affinity percentile rank
pp.best_presentation              # Pred with the best EL/presentation score
pp.best_presentation_by_rank      # Pred with the best (lowest) EL percentile rank
pp.best_stability                 # best pMHC stability (if available)
pp.best_stability_by_rank         # best pMHC stability by rank (if available)

# filter by kind or allele
pp.filter(kind=Kind.pMHC_affinity)
pp.filter(allele="HLA-A*02:01")
```
NetMHCpan 4.1 automatically emits both pMHC_affinity and pMHC_presentation
predictions per peptide-allele pair.
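The `best_*` properties (highest `score`) and the `best_*_by_rank` properties (lowest `percentile_rank`) can pick different alleles. The sketch below illustrates the two selection rules with a minimal stand-in; `SimplePred`, `best_by_score`, and `best_by_rank` are hypothetical simplifications written for this example, not the mhctools implementation, and the numbers are placeholder inputs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SimplePred:
    """Hypothetical, reduced stand-in for an mhctools Pred."""
    allele: str
    score: float                      # higher = better
    percentile_rank: Optional[float]  # lower = better, may be missing


def best_by_score(preds: list[SimplePred]) -> Optional[SimplePred]:
    # Pick the prediction with the highest score (None if the list is empty).
    return max(preds, key=lambda p: p.score, default=None)


def best_by_rank(preds: list[SimplePred]) -> Optional[SimplePred]:
    # Ignore predictions without a percentile rank, then pick the lowest rank.
    ranked = [p for p in preds if p.percentile_rank is not None]
    return min(ranked, key=lambda p: p.percentile_rank, default=None)


preds = [
    SimplePred("HLA-A*02:01", score=0.62, percentile_rank=1.5),
    SimplePred("HLA-B*07:02", score=0.55, percentile_rank=0.4),
]
print(best_by_score(preds).allele)  # HLA-A*02:01
print(best_by_rank(preds).allele)   # HLA-B*07:02
```

The two rules can disagree because, in NetMHCpan-style predictors, percentile ranks are computed against each allele's own score distribution, so an allele with generally lower raw scores can still yield the better rank.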
Scanning proteins
`predict_proteins()` takes a dictionary mapping sequence names to protein sequences, scans each protein for peptides of the requested lengths, and returns
`{sequence_name: list[PeptidePreds]}`:
```python
proteins = predictor.predict_proteins(
    {"TP53": "MEEPQSDPSVEPPLSQETFS...", "KRAS": "MTEYKLVVVGAGGVGKS..."},
    peptide_lengths=[9, 10],
)

# print TP53 peptides whose best predicted IC50 is below 500 nM
for pp in proteins["TP53"]:
    best = pp.best_affinity
    if best and best.value < 500:
        print(f" offset={best.offset} {best.peptide} IC50={best.value:.0f}")
```
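Conceptually, scanning a protein means sliding a window of each requested length along the sequence. The hypothetical `enumerate_peptides` helper below sketches that enumeration, assuming 0-based offsets (check how mhctools reports `offset` before relying on this); it is not part of mhctools.

```python
def enumerate_peptides(sequence: str, peptide_lengths: list[int]) -> list[tuple[int, str]]:
    """Return (offset, peptide) pairs for every window of each requested length.

    Hypothetical helper; offsets are assumed to be 0-based start positions.
    """
    pairs = []
    for length in peptide_lengths:
        # A sequence of n residues contains n - length + 1 windows of this length.
        for offset in range(len(sequence) - length + 1):
            pairs.append((offset, sequence[offset:offset + length]))
    return pairs


windows = enumerate_peptides("AAAQQQSIINFEKL", [8, 9])
print(len(windows))  # 7 windows of length 8 + 6 of length 9 = 13
print(windows[-1])   # (5, 'QSIINFEKL')
```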
DataFrames
Both `predict()` and `predict_proteins()` have a `_dataframe` variant that flattens the results into a pandas DataFrame with a consistent set of columns:
```python
df = predictor.predict_dataframe(["SIINFEKL"], sample_name="pat001")
df = predictor.predict_proteins_dataframe({"TP53": "MEEPQ..."}, sample_name="pat001")
```
Columns: `sample_name`, `peptide`, `n_flank`, `c_flank`,
`source_sequence_name`, `offset`, `predictor_name`, `predictor_version`,
`allele`, `kind`, `score`, `value`, `percentile_rank`.
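Because every row carries `kind`, `allele`, and `value`, the flat table can be filtered and reshaped with ordinary pandas operations. The sketch below builds a small DataFrame by hand with a subset of the documented columns; the values are placeholder inputs, not real predictions, and it assumes the `kind` column holds the enum member names as strings.

```python
import pandas as pd

# Placeholder rows using a subset of the documented columns.
df = pd.DataFrame(
    {
        "sample_name": ["pat001"] * 4,
        "peptide": ["SIINFEKL", "SIINFEKL", "GILGFVFTL", "GILGFVFTL"],
        "allele": ["HLA-A*02:01", "HLA-B*07:02", "HLA-A*02:01", "HLA-B*07:02"],
        "kind": ["pMHC_affinity"] * 4,
        "value": [850.0, 12000.0, 25.0, 9000.0],
    }
)

# Keep affinity predictions with IC50 below 500 nM.
affinity = df[df["kind"] == "pMHC_affinity"]
binders = affinity[affinity["value"] < 500]

# One row per peptide, one column per allele, cells holding IC50 (nM).
wide = affinity.pivot_table(index="peptide", columns="allele", values="value")

print(binders[["peptide", "allele", "value"]])
print(wide)
```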
Multi-sample predictions
MultiSample runs a predictor across multiple samples, each with its own
HLA genotype:
```python
from mhctools import MultiSample, NetMHCpan41

ms = MultiSample(
    samples={
        "pat001": ["HLA-A*02:01", "HLA-B*07:02"],
        "pat002": ["HLA-A*01:01", "HLA-B*08:01"],
    },
    predictor_class=NetMHCpan41,
)

# {sample_name: list[PeptidePreds]}
results = ms.predict(["SIINFEKL", "GILGFVFTL"])

# {sample_name: {seq_name: list[PeptidePreds]}}
protein_results = ms.predict_proteins({"TP53": "MEEPQ..."})

# flat DataFrames with a sample_name column
df = ms.predict_dataframe(["SIINFEKL"])
df = ms.predict_proteins_dataframe({"TP53": "MEEPQ..."})
```
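The data flow behind a per-sample run can be pictured as one predictor instance per genotype, with results keyed by sample name. The sketch below mirrors that shape using a hypothetical `FakePredictor` (a stand-in so the example runs without NetMHCpan installed) and a hypothetical `predict_per_sample` function; it illustrates the idea only and is not the `MultiSample` implementation.

```python
from typing import Callable


class FakePredictor:
    """Hypothetical stand-in predictor: returns (peptide, allele) pairs."""

    def __init__(self, alleles: list[str]):
        self.alleles = alleles

    def predict(self, peptides: list[str]) -> list[list[tuple[str, str]]]:
        # One entry per peptide, holding a result for each of this sample's alleles.
        return [[(pep, allele) for allele in self.alleles] for pep in peptides]


def predict_per_sample(
    samples: dict[str, list[str]],
    predictor_class: Callable[..., FakePredictor],
    peptides: list[str],
) -> dict[str, list[list[tuple[str, str]]]]:
    # Build one predictor per genotype, then key the results by sample name.
    return {
        name: predictor_class(alleles=alleles).predict(peptides)
        for name, alleles in samples.items()
    }


results = predict_per_sample(
    {"pat001": ["HLA-A*02:01", "HLA-B*07:02"], "pat002": ["HLA-A*01:01"]},
    FakePredictor,
    ["SIINFEKL", "GILGFVFTL"],
)
print(results["pat002"][0])  # [('SIINFEKL', 'HLA-A*01:01')]
```

Keeping a separate predictor per sample keeps each genotype's allele list independent, which matches the `{sample_name: list[PeptidePreds]}` return shape.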
Measurement kinds
The Kind enum describes what biological quantity a Pred measures:
| Kind | Meaning |
|---|---|
| `pMHC_affinity` | Peptide-MHC binding affinity |
| `pMHC_presentation` | Likelihood of surface presentation (EL/processing) |
| `pMHC_stability` | Peptide-MHC complex stability |
| `immunogenicity` | T-cell immunogenicity |
| `antigen_processing` | Combined processing score |
| `proteasome_cleavage` | Proteasomal cleavage score |
| `tap_transport` | TAP transport score (reserved, not yet used) |
| `erap_trimming` | ERAP trimming score (reserved, not yet used) |
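Since every prediction carries its kind, mixed output (for example NetMHCpan 4.1's affinity and presentation predictions) can be split by kind. The sketch below groups placeholder `(kind, allele, score)` triples with `collections.defaultdict`; `DemoKind` is a hypothetical enum copying a few names from the table so the example runs without mhctools. With the library itself, `Kind` and `pp.filter(kind=...)` cover the same need.

```python
from collections import defaultdict
from enum import Enum


class DemoKind(Enum):
    """Hypothetical stand-in mirroring some names from the Kind table."""
    pMHC_affinity = "pMHC_affinity"
    pMHC_presentation = "pMHC_presentation"
    pMHC_stability = "pMHC_stability"


# (kind, allele, score) triples standing in for Pred objects; placeholder values.
preds = [
    (DemoKind.pMHC_affinity, "HLA-A*02:01", 0.61),
    (DemoKind.pMHC_presentation, "HLA-A*02:01", 0.83),
    (DemoKind.pMHC_affinity, "HLA-B*07:02", 0.20),
]

# Group (allele, score) pairs under their measurement kind.
by_kind = defaultdict(list)
for kind, allele, score in preds:
    by_kind[kind].append((allele, score))

print(sorted(k.name for k in by_kind))  # ['pMHC_affinity', 'pMHC_presentation']
```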
The Pred object
Every prediction is a frozen, self-contained Pred dataclass:
```python
from mhctools import Pred, Kind

pred = Pred(
    kind=Kind.pMHC_affinity,
    score=0.85,             # ~0-1, higher = better
    peptide="SIINFEKL",
    allele="HLA-A*02:01",
    value=120.5,            # IC50 in nM
    percentile_rank=0.8,    # 0-100, lower = better
    source_sequence_name="TP53",
    offset=42,
    predictor_name="netMHCpan",
    predictor_version="4.1",
)
```
`score` is always higher-is-better. `value` is in the predictor's native units (nM for
affinity, hours for stability). `percentile_rank` is optional; when present it ranges
from 0 to 100, and lower means stronger.
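One way a 0-1, higher-is-better score can be derived from an IC50 value is the `1 - log(IC50)/log(50000)` transform that NetMHC-family tools report as their "1-log50k" output. It is shown here for illustration only: whether mhctools computes `score` exactly this way for every predictor is an assumption, and `ic50_to_score` is a hypothetical helper.

```python
import math


def ic50_to_score(ic50_nm: float, max_ic50: float = 50000.0) -> float:
    """Map IC50 (nM, must be > 0) to a 0-1 score where higher means stronger.

    Hypothetical helper using the NetMHC-style 1 - log(IC50)/log(max_ic50)
    transform, clipped to the range [0, 1].
    """
    raw = 1.0 - math.log(ic50_nm) / math.log(max_ic50)
    return min(1.0, max(0.0, raw))


for ic50 in (1.0, 50.0, 500.0, 50000.0):
    print(f"IC50={ic50:>8.1f} nM -> score={ic50_to_score(ic50):.3f}")
```

Under this transform a 500 nM binder maps to roughly 0.43, and anything weaker than 50,000 nM clips to 0.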
Supported predictors
MHC binding & presentation
| Predictor | Kinds produced | Requires |
|---|---|---|
| `NetMHCpan` / `NetMHCpan41` / `NetMHCpan42` | affinity + presentation | NetMHCpan |
| `NetMHCpan4` | affinity or presentation | NetMHCpan 4.0 |
| `NetMHCpan3` / `NetMHCpan28` | affinity | older NetMHCpan |
| `NetMHC` / `NetMHC3` / `NetMHC4` | affinity | NetMHC |
| `NetMHCIIpan` / `NetMHCIIpan43` | affinity or presentation | NetMHCIIpan |
| `NetMHCcons` | affinity | NetMHCcons |
| `NetMHCstabpan` | stability | NetMHCstabpan |
| `MHCflurry` | affinity + presentation | `pip install mhcflurry` + `mhcflurry-downloads fetch` |
| `BigMHC` | presentation or immunogenicity | BigMHC clone (set `BIGMHC_DIR`) |
| `MixMHCpred` | presentation | MixMHCpred |
| `IedbNetMHCpan` / `IedbSMM` / `IedbNetMHCIIpan` | affinity | IEDB web API |
| `RandomBindingPredictor` | affinity | (built-in) |
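Many predictors in this table wrap an external command-line tool, so a quick pre-flight check is whether the executables are on `PATH`. The sketch below uses the standard library's `shutil.which`; the executable names are assumptions based on the tools' usual command names and may differ on a given installation.

```python
import shutil

# Assumed executable names for some of the external tools listed above.
EXECUTABLES = {
    "NetMHCpan": "netMHCpan",
    "NetMHC": "netMHC",
    "NetMHCIIpan": "netMHCIIpan",
    "NetMHCcons": "netMHCcons",
    "NetMHCstabpan": "netMHCstabpan",
}


def find_external_tools(executables: dict[str, str]) -> dict[str, bool]:
    # shutil.which returns the full path if the command is on PATH, else None.
    return {tool: shutil.which(cmd) is not None for tool, cmd in executables.items()}


for tool, found in find_external_tools(EXECUTABLES).items():
    print(f"{tool:<14} {'found' if found else 'missing'}")
```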
Antigen processing
| Predictor | Kinds produced | Requires |
|---|---|---|
| `Pepsickle` | proteasome cleavage | `pip install mhctools[pepsickle]` |
| `NetChop` | proteasome cleavage | NetChop |
Processing predictors use configurable scoring to aggregate per-position
cleavage probabilities into peptide-level scores. See ProcessingPredictor
and ProteasomePredictor for details.
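As an illustration of what aggregating per-position cleavage probabilities into a peptide-level score can look like, the sketch below rewards a likely cut at the peptide's C-terminus and penalizes likely cuts inside the peptide. This formula (C-terminal probability minus mean internal probability) and the `peptide_cleavage_score` helper are hypothetical choices for the example, not the scoring used by `ProcessingPredictor` or `ProteasomePredictor`.

```python
def peptide_cleavage_score(cleavage_probs: list[float], offset: int, length: int) -> float:
    """Score a peptide from per-position cleavage probabilities.

    Hypothetical scheme: cleavage_probs[i] is the probability of a cut
    immediately after residue i. Returns the C-terminal cut probability
    minus the mean probability of cuts inside the peptide.
    """
    end = offset + length - 1               # index of the peptide's last residue
    c_term = cleavage_probs[end]            # cut that would release the C-terminus
    internal = cleavage_probs[offset:end]   # cuts after residues offset..end-1
    mean_internal = sum(internal) / len(internal) if internal else 0.0
    return c_term - mean_internal


# Placeholder probabilities for a 10-residue stretch.
probs = [0.1, 0.2, 0.1, 0.0, 0.1, 0.1, 0.2, 0.9, 0.3, 0.1]
print(round(peptide_cleavage_score(probs, offset=0, length=8), 3))
```

A peptide ending where cleavage is likely (position 7 here) scores higher than one shifted by a single residue, whose window then contains that strong cut site internally.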
Command-line examples
Prediction for user-supplied peptide sequences:
```sh
mhctools --sequence SIINFEKL SIINFEKLQ --mhc-predictor netmhc --mhc-alleles A0201
```
Automatically extract peptides as subsequences of the specified lengths:
```sh
mhctools --sequence AAAQQQSIINFEKL --extract-subsequences --mhc-peptide-lengths 8-10 --mhc-predictor mhcflurry --mhc-alleles A0201
```
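The same commands can be launched from a Python script. The sketch below only assembles the argument list from the flags shown in the examples and runs it when an `mhctools` executable is found on `PATH`; `build_command` and `run_if_available` are hypothetical helper names, and passing a single allele string mirrors the examples rather than any documented multi-allele syntax.

```python
import shutil
import subprocess
from typing import Optional


def build_command(
    sequences: list[str],
    predictor: str,
    allele: str,
    lengths: Optional[str] = None,
) -> list[str]:
    # Assemble argv using only the flags from the examples above.
    cmd = ["mhctools", "--sequence", *sequences,
           "--mhc-predictor", predictor, "--mhc-alleles", allele]
    if lengths is not None:
        # e.g. lengths="8-10" also turns on subsequence extraction
        cmd += ["--extract-subsequences", "--mhc-peptide-lengths", lengths]
    return cmd


def run_if_available(cmd: list[str]) -> Optional[str]:
    # Return None when the command is not installed; raise on a non-zero exit.
    if shutil.which(cmd[0]) is None:
        return None
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


cmd = build_command(["AAAQQQSIINFEKL"], "mhcflurry", "A0201", lengths="8-10")
print(" ".join(cmd))
```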
Legacy API
The old predict_peptides() and predict_subsequences() methods still work
and return BindingPredictionCollection objects:
```python
from mhctools import NetMHCpan

predictor = NetMHCpan(alleles=["A*02:01"])
collection = predictor.predict_subsequences(
    {"1L2Y": "NLYIQWLKDGGPSSGRPPPS"},
    peptide_lengths=[9],
)
df = collection.to_dataframe()

# report predictions with IC50 below 100 nM
for bp in collection:
    if bp.affinity < 100:
        print("Strong binder: %s" % bp)
```
To convert legacy results to the new types:
```python
preds = collection.to_preds()             # list of Pred
pp_list = collection.to_peptide_preds()   # list of PeptidePreds
```
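The step from a flat legacy collection to per-peptide groups can be pictured as grouping records by peptide sequence. The sketch below does this for plain `(peptide, allele, affinity)` tuples with placeholder affinities; `group_by_peptide` is a hypothetical helper illustrating the reshaping, not how `to_peptide_preds()` is implemented.

```python
from collections import defaultdict


def group_by_peptide(records: list[tuple[str, str, float]]) -> dict[str, list[tuple[str, float]]]:
    # Collect (allele, affinity) pairs per peptide; dicts keep first-seen order.
    grouped: dict[str, list[tuple[str, float]]] = defaultdict(list)
    for peptide, allele, affinity in records:
        grouped[peptide].append((allele, affinity))
    return dict(grouped)


# Two 9-mers from the 1L2Y sequence above, with placeholder IC50 values (nM).
records = [
    ("NLYIQWLKD", "HLA-A*02:01", 4200.0),
    ("LYIQWLKDG", "HLA-A*02:01", 15000.0),
    ("NLYIQWLKD", "HLA-B*07:02", 800.0),
]
grouped = group_by_peptide(records)

# Strongest allele per peptide = lowest IC50, analogous to best_affinity.
strongest = {pep: min(hits, key=lambda h: h[1]) for pep, hits in grouped.items()}
print(grouped["NLYIQWLKD"])  # [('HLA-A*02:01', 4200.0), ('HLA-B*07:02', 800.0)]
```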
Project details
File details
Details for the file mhctools-3.2.0.tar.gz.
File metadata
- Download URL: mhctools-3.2.0.tar.gz
- Upload date:
- Size: 79.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f59a38192dc149ad1ae1441477faf48ff1b85a38804cd2c146f09769fe66a3d6 |
| MD5 | 3cc98dc9cabdfcb68671afc1a9a0ba62 |
| BLAKE2b-256 | a1f1fa8f2cefaa75498d169bdc910d105ff17875be01f7786858ec64195cde5a |
File details
Details for the file mhctools-3.2.0-py3-none-any.whl.
File metadata
- Download URL: mhctools-3.2.0-py3-none-any.whl
- Upload date:
- Size: 79.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2c9a9e4f48fabe5f9e053e9dca35fdefd10f2a724d24d7a31b602d62c63df726 |
| MD5 | 3f08fe602a70451e6f2835499b9d3aad |
| BLAKE2b-256 | a72d7b66fc98b087b8fb40b8efb30b8fad22abee4e7e0ec215ec27461ba5e240 |