mhctools

Python interface to running command-line and web-based MHC binding predictors.

Installation

pip install mhctools

For MHCflurry support, also run:

mhcflurry-downloads fetch

Quick start

from mhctools import NetMHCpan41

predictor = NetMHCpan41(alleles=["HLA-A*02:01", "HLA-B*07:02"])

# predict for specific peptides
results = predictor.predict(["SIINFEKL", "GILGFVFTL"])

# results is a list of PeptidePreds — one per peptide
for pp in results:
    best = pp.best_affinity
    if best:
        print(f"{best.peptide} -> {best.allele} IC50={best.value:.1f}nM")

Python API

Predicting peptides

predict() takes a list of peptide sequences and returns a list[PeptidePreds]. Each PeptidePreds contains Pred objects for every allele and measurement kind the predictor supports.

from mhctools import Kind, NetMHCpan41

predictor = NetMHCpan41(alleles=["HLA-A*02:01", "HLA-B*07:02"])
results = predictor.predict(["SIINFEKL", "GILGFVFTL"])

pp = results[0]
pp.best_affinity                # Pred with highest affinity score
pp.best_affinity.allele         # "HLA-A*02:01"
pp.best_affinity.value          # IC50 in nM
pp.best_affinity.score          # higher = better (~0-1)
pp.best_affinity.percentile_rank  # lower = better (0-100)

pp.best_affinity_by_rank        # Pred with lowest percentile rank
pp.best_presentation            # best EL/presentation score
pp.best_presentation_by_rank    # best EL percentile rank
pp.best_stability               # best pMHC stability (if available)
pp.best_stability_by_rank

# filter by kind or allele
pp.filter(kind=Kind.pMHC_affinity)
pp.filter(allele="HLA-A*02:01")

NetMHCpan 4.1 automatically emits both pMHC_affinity and pMHC_presentation predictions per peptide-allele pair.
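The score-based and rank-based accessors can pick different alleles. The sketch below illustrates that with a hypothetical stand-in class, `FakePred`, which is not the library's `Pred`. The numbers are invented, and the code assumes only the conventions stated above: higher score is better, lower percentile rank is better.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FakePred:
    """Hypothetical stand-in for a prediction; not mhctools.Pred."""
    allele: str
    kind: str
    score: float                      # higher = better
    percentile_rank: Optional[float]  # lower = better, may be missing


# Invented numbers, for illustration only.
preds = [
    FakePred("HLA-A*02:01", "pMHC_affinity", score=0.62, percentile_rank=1.5),
    FakePred("HLA-B*07:02", "pMHC_affinity", score=0.58, percentile_rank=0.9),
    FakePred("HLA-A*02:01", "pMHC_presentation", score=0.71, percentile_rank=0.4),
]

# Keep only one measurement kind before comparing predictions.
affinity = [p for p in preds if p.kind == "pMHC_affinity"]

# "Best by score": the highest score among affinity predictions.
best_by_score = max(affinity, key=lambda p: p.score)

# "Best by rank": the lowest percentile rank, skipping predictions without one.
ranked = [p for p in affinity if p.percentile_rank is not None]
best_by_rank = min(ranked, key=lambda p: p.percentile_rank)

print(best_by_score.allele)  # HLA-A*02:01
print(best_by_rank.allele)   # HLA-B*07:02
```

The two selections disagree here, so it is worth deciding up front which criterion an analysis relies on.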

Scanning proteins

predict_proteins() takes a dictionary of protein sequences and returns {sequence_name: list[PeptidePreds]}:

proteins = predictor.predict_proteins(
    {"TP53": "MEEPQSDPSVEPPLSQETFS...", "KRAS": "MTEYKLVVVGAGGVGKS..."},
    peptide_lengths=[9, 10],
)

for pp in proteins["TP53"]:
    best = pp.best_affinity
    if best and best.value < 500:
        print(f"  offset={best.offset} {best.peptide} IC50={best.value:.0f}")
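Scanning a protein amounts to enumerating every window of each requested length. The sketch below shows that enumeration in plain Python. The helper `sliding_peptides` is hypothetical and is not part of mhctools, and the 0-based offsets are an assumption of this sketch rather than a documented property of `Pred.offset`.

```python
from typing import Iterator, Sequence, Tuple


def sliding_peptides(
    sequence: str, peptide_lengths: Sequence[int]
) -> Iterator[Tuple[int, str]]:
    """Yield (offset, peptide) for every window of each requested length.

    Offsets are 0-based here; that is an assumption of this sketch.
    """
    for length in peptide_lengths:
        # A sequence shorter than `length` gives an empty range, so no windows.
        for offset in range(len(sequence) - length + 1):
            yield offset, sequence[offset:offset + length]


# 20-residue sequence borrowed from the legacy example later in this README.
seq = "NLYIQWLKDGGPSSGRPPPS"
windows = list(sliding_peptides(seq, [9, 10]))

print(len(windows))  # 23 (12 nine-mers + 11 ten-mers)
print(windows[0])    # (0, 'NLYIQWLKD')
```

A sequence of length n contributes n - k + 1 peptides for each length k, so the number of predictions grows with both protein length and the number of lengths requested.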

DataFrames

Each prediction method has a _dataframe variant (predict_dataframe, predict_proteins_dataframe) that flattens the results into a pandas DataFrame with consistent columns:

df = predictor.predict_dataframe(["SIINFEKL"], sample_name="pat001")
df = predictor.predict_proteins_dataframe({"TP53": "MEEPQ..."}, sample_name="pat001")

Columns: sample_name, peptide, n_flank, c_flank, source_sequence_name, offset, predictor_name, predictor_version, allele, kind, score, value, percentile_rank.
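Because the flattened output has one row per peptide, allele, and kind, a common follow-up is to keep only the best row per peptide. The sketch below does this on plain dictionaries shaped like the rows that pandas `df.to_dict("records")` would produce. It uses only a subset of the columns listed above, and all values are invented for illustration.

```python
# Hypothetical rows using a subset of the documented columns; values are invented.
rows = [
    {"sample_name": "pat001", "peptide": "SIINFEKL", "allele": "HLA-A*02:01",
     "kind": "pMHC_affinity", "value": 8200.0, "percentile_rank": 12.0},
    {"sample_name": "pat001", "peptide": "SIINFEKL", "allele": "HLA-B*07:02",
     "kind": "pMHC_affinity", "value": 15000.0, "percentile_rank": 25.0},
    {"sample_name": "pat001", "peptide": "GILGFVFTL", "allele": "HLA-A*02:01",
     "kind": "pMHC_affinity", "value": 20.0, "percentile_rank": 0.05},
]

# For each (sample, peptide), keep the affinity row with the lowest percentile rank.
best = {}
for row in rows:
    if row["kind"] != "pMHC_affinity":
        continue
    key = (row["sample_name"], row["peptide"])
    if key not in best or row["percentile_rank"] < best[key]["percentile_rank"]:
        best[key] = row

for (sample, peptide), row in best.items():
    print(sample, peptide, row["allele"], row["percentile_rank"])
```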

Multi-sample predictions

MultiSample runs a predictor across multiple samples, each with its own HLA genotype:

from mhctools import MultiSample, NetMHCpan41

ms = MultiSample(
    samples={
        "pat001": ["HLA-A*02:01", "HLA-B*07:02"],
        "pat002": ["HLA-A*01:01", "HLA-B*08:01"],
    },
    predictor_class=NetMHCpan41,
)

# {sample_name: list[PeptidePreds]}
results = ms.predict(["SIINFEKL", "GILGFVFTL"])

# {sample_name: {seq_name: list[PeptidePreds]}}
protein_results = ms.predict_proteins({"TP53": "MEEPQ..."})

# flat DataFrames with sample_name column
df = ms.predict_dataframe(["SIINFEKL"])
df = ms.predict_proteins_dataframe({"TP53": "MEEPQ..."})
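The interface above suggests a simple pattern: build one predictor per sample from that sample's alleles, then collect the results under the sample name. The sketch below shows that pattern only; it is not a description of how MultiSample is implemented. `EchoPredictor` and `predict_per_sample` are hypothetical names introduced for illustration.

```python
from typing import Dict, List, Type


class EchoPredictor:
    """Hypothetical stand-in predictor; it only records its alleles."""

    def __init__(self, alleles: List[str]):
        self.alleles = alleles

    def predict(self, peptides: List[str]) -> List[str]:
        # A real predictor would return prediction objects; this returns labels.
        return [f"{p}@{'+'.join(self.alleles)}" for p in peptides]


def predict_per_sample(
    samples: Dict[str, List[str]],
    predictor_class: Type[EchoPredictor],
    peptides: List[str],
) -> Dict[str, List[str]]:
    """Build one predictor per sample and collect results keyed by sample name."""
    return {
        name: predictor_class(alleles=alleles).predict(peptides)
        for name, alleles in samples.items()
    }


results = predict_per_sample(
    {"pat001": ["HLA-A*02:01", "HLA-B*07:02"], "pat002": ["HLA-A*01:01"]},
    EchoPredictor,
    ["SIINFEKL"],
)
print(results["pat002"])  # ['SIINFEKL@HLA-A*01:01']
```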

Measurement kinds

The Kind enum describes what biological quantity a Pred measures:

| Kind | Meaning |
|---|---|
| pMHC_affinity | Peptide-MHC binding affinity |
| pMHC_presentation | Likelihood of surface presentation (EL) |
| pMHC_stability | Peptide-MHC complex stability |
| cellular_presentation | Cross-allele presentation (e.g. MHCflurry) |
| antigen_processing | Combined processing score |
| proteasome_cleavage | Proteasomal cleavage score |
| tap_transport | TAP transport score |
| erap_trimming | ERAP trimming score |

The Pred object

Every prediction is a frozen, self-contained Pred dataclass:

from mhctools import Pred, Kind

pred = Pred(
    kind=Kind.pMHC_affinity,
    score=0.85,           # ~0-1, higher = better
    peptide="SIINFEKL",
    allele="HLA-A*02:01",
    value=120.5,          # IC50 in nM
    percentile_rank=0.8,
    source_sequence_name="TP53",
    offset=42,
    predictor_name="netMHCpan",
    predictor_version="4.1",
)

score is always higher-is-better. value is in native units (nM for affinity, hours for stability). percentile_rank is optional; when present it ranges from 0 to 100, and lower means stronger.
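In practice, "frozen" means that fields cannot be reassigned after construction. The sketch below demonstrates the standard-library behaviour on a hypothetical two-field class, `FrozenExample`, which is not the library's `Pred`. It assumes `Pred` is an ordinary `dataclasses` dataclass, in which case the same rules would apply to it.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class FrozenExample:
    """Hypothetical two-field stand-in; not mhctools.Pred."""
    peptide: str
    score: float


original = FrozenExample(peptide="SIINFEKL", score=0.85)

try:
    original.score = 0.9  # assignment on a frozen dataclass is rejected
    mutated = True
except FrozenInstanceError:
    mutated = False

# replace() builds a new instance and leaves the original untouched.
updated = replace(original, score=0.9)

print(mutated)         # False
print(original.score)  # 0.85
print(updated.score)   # 0.9
```

Immutability makes predictions safe to share between collections and, when every field is hashable, usable as dictionary keys or set members.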

Supported predictors

| Predictor | Kinds produced | Requires |
|---|---|---|
| NetMHCpan / NetMHCpan41 | affinity + presentation | NetMHCpan |
| NetMHCpan4 | affinity or presentation | NetMHCpan 4.0 |
| NetMHCpan3 / NetMHCpan28 | affinity | older NetMHCpan |
| NetMHC / NetMHC3 / NetMHC4 | affinity | NetMHC |
| NetMHCIIpan | affinity or presentation | NetMHCIIpan |
| NetMHCcons | affinity | NetMHCcons |
| NetMHCstabpan | stability | NetMHCstabpan |
| MHCflurry | affinity | pip install mhcflurry + mhcflurry-downloads fetch |
| MixMHCpred | presentation | MixMHCpred |
| RandomBindingPredictor | affinity | (built-in) |
| NetChop | cleavage | NetChop |

Command-line examples

Prediction for user-supplied peptide sequences

mhctools --sequence SIINFEKL SIINFEKLQ --mhc-predictor netmhc --mhc-alleles A0201

Automatically extract peptides as subsequences of specified length

mhctools --sequence AAAQQQSIINFEKL --extract-subsequences --mhc-peptide-lengths 8-10 --mhc-predictor mhcflurry --mhc-alleles A0201

Legacy API

The old predict_peptides() and predict_subsequences() methods still work and return BindingPredictionCollection objects:

from mhctools import NetMHCpan

predictor = NetMHCpan(alleles=["A*02:01"])
collection = predictor.predict_subsequences(
    {"1L2Y": "NLYIQWLKDGGPSSGRPPPS"},
    peptide_lengths=[9],
)
df = collection.to_dataframe()

for bp in collection:
    if bp.affinity < 100:
        print("Strong binder: %s" % bp)
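The examples in this README filter on IC50 cutoffs of 100 nM and 500 nM. The sketch below wraps that idea in a small labelling helper. `binding_label` and its label names are hypothetical, and the default cutoffs are simply the two values used in the examples here, not a recommendation from the library.

```python
def binding_label(
    ic50_nm: float, strong_cutoff: float = 100.0, binder_cutoff: float = 500.0
) -> str:
    """Label an IC50 value in nM; a lower IC50 means tighter binding.

    The cutoffs are illustrative defaults taken from this README's examples.
    """
    if ic50_nm < strong_cutoff:
        return "strong"
    if ic50_nm < binder_cutoff:
        return "binder"
    return "non-binder"


# Invented IC50 values, for illustration only.
labels = [binding_label(v) for v in (20.0, 120.5, 8200.0)]
print(labels)  # ['strong', 'binder', 'non-binder']
```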

To convert legacy results to the new types:

preds = collection.to_preds()           # list of Pred
pp_list = collection.to_peptide_preds() # list of PeptidePreds
