mhctools

Python interface to MHC binding, presentation, immunogenicity, and antigen processing predictors.

Installation

pip install mhctools

For MHCflurry support, also run:

mhcflurry-downloads fetch

Quick start

from mhctools import NetMHCpan41

predictor = NetMHCpan41(alleles=["HLA-A*02:01", "HLA-B*07:02"])

# predict() returns a list of PeptideResult — one per peptide
results = predictor.predict(["SIINFEKL", "GILGFVFTL"])

for r in results:
    if r.affinity:
        print(f"{r.peptide} -> {r.affinity.allele} IC50={r.affinity.value:.1f}nM")

Data model

predict() returns a list of PeptideResult — one per peptide. Each result carries the peptide string and provides accessors for each prediction kind (affinity, presentation, stability, etc.). Accessors return None when a predictor doesn't produce that kind.

results = predictor.predict(["SIINFEKL", "GILGFVFTL"])
r = results[0]

r.peptide                    # "SIINFEKL"
r.affinity.value             # IC50 in nM
r.affinity.percentile_rank   # 0-100, lower = better
r.affinity.allele            # best allele for this kind
r.presentation               # None if predictor doesn't produce it

Under the hood, each PeptideResult wraps a tuple of Prediction objects: frozen dataclasses, one per allele-kind combination. Results from every prediction method can also be flattened to a pandas DataFrame with the same set of columns.
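The shape described above can be pictured with a small stand-in. This is a minimal sketch, not the mhctools implementation: SketchPrediction and SketchResult are hypothetical names, the numbers are made up, and the rule "best = highest score" is an assumption based on score being higher-is-better.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SketchPrediction:
    """Hypothetical stand-in for one allele-kind prediction."""
    kind: str
    allele: str
    score: float                   # higher = better
    value: Optional[float] = None  # e.g. IC50 in nM


@dataclass(frozen=True)
class SketchResult:
    """Hypothetical stand-in for a per-peptide result."""
    peptide: str
    preds: Tuple[SketchPrediction, ...]

    def best(self, kind: str) -> Optional[SketchPrediction]:
        # Highest-scoring prediction of this kind, or None if the kind is absent.
        matching = [p for p in self.preds if p.kind == kind]
        return max(matching, key=lambda p: p.score) if matching else None


# Illustrative, made-up values for two alleles.
result = SketchResult(
    peptide="SIINFEKL",
    preds=(
        SketchPrediction("pMHC_affinity", "HLA-A*02:01", score=0.30, value=1900.0),
        SketchPrediction("pMHC_affinity", "HLA-B*07:02", score=0.10, value=17000.0),
    ),
)
best_affinity = result.best("pMHC_affinity")       # the HLA-A*02:01 prediction
missing_stability = result.best("pMHC_stability")  # None: no such kind present
```

Freezing the dataclasses means a prediction cannot be modified after it is created, which makes results safe to share and to use as dictionary keys.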

Python API

Predicting peptides

from mhctools import NetMHCpan41

predictor = NetMHCpan41(alleles=["HLA-A*02:01", "HLA-B*07:02"])
results = predictor.predict(["SIINFEKL", "GILGFVFTL"])

r = results[0]
r.peptide                      # "SIINFEKL"
r.offset                       # position in source protein (if scanned)
r.kinds                        # {"pMHC_affinity", "pMHC_presentation"}
r.alleles                      # {"HLA-A*02:01", "HLA-B*07:02"}

# best prediction by kind — None when the kind is absent
r.affinity                     # Prediction or None
r.presentation                 # Prediction or None
r.stability                    # None (predictor doesn't produce it)

if r.affinity:
    r.affinity.value            # IC50 in nM
    r.affinity.percentile_rank  # 0-100, lower = better
    r.affinity.score            # ~0-1, higher = better
    r.affinity.allele           # best allele for this kind

# by rank instead of score
r.best_affinity_by_rank        # Prediction with lowest percentile rank, or None

# all predictions
r.preds                        # tuple of all Prediction objects
r.filter(kind="pMHC_affinity")
r.filter(allele="HLA-A*02:01")

NetMHCpan 4.1 automatically emits both pMHC_affinity and pMHC_presentation predictions per peptide-allele pair.

Scanning proteins

predict_proteins() takes a dictionary of protein sequences and returns {sequence_name: list[PeptideResult]}:

proteins = predictor.predict_proteins(
    {"TP53": "MEEPQSDPSVEPPLSQETFS...", "KRAS": "MTEYKLVVVGAGGVGKS..."},
    peptide_lengths=[9, 10],
)

for r in proteins["TP53"]:
    if r.affinity and r.affinity.value < 500:
        print(f"  offset={r.offset} {r.peptide} IC50={r.affinity.value:.0f}")
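To make the meaning of offset concrete, scanning can be thought of as a sliding window over each protein. The sketch below is a plain-Python illustration, not mhctools code: iter_peptides is a hypothetical helper, and 0-based offsets are an assumption.

```python
from typing import Iterable, Iterator, Tuple


def iter_peptides(sequence: str, lengths: Iterable[int]) -> Iterator[Tuple[int, str]]:
    """Yield (offset, peptide) for every window of each requested length."""
    for length in lengths:
        # A sequence shorter than `length` gives an empty range, so no windows.
        for offset in range(len(sequence) - length + 1):
            yield offset, sequence[offset:offset + length]


# A 20-residue sequence yields 12 nine-mers and 11 ten-mers.
windows = list(iter_peptides("NLYIQWLKDGGPSSGRPPPS", [9, 10]))
```

Each (offset, peptide) pair corresponds to one candidate that a predictor would then score against every allele.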

DataFrames

Each prediction method has a _dataframe variant (predict_dataframe, predict_proteins_dataframe) that flattens the results into a pandas DataFrame with the same columns:

df = predictor.predict_dataframe(["SIINFEKL"], sample_name="pat001")
df = predictor.predict_proteins_dataframe({"TP53": "MEEPQ..."}, sample_name="pat001")

Columns: sample_name, peptide, n_flank, c_flank, source_sequence_name, offset, predictor_name, predictor_version, allele, kind, score, value, percentile_rank.
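Because every row carries its own kind and allele, picking the strongest binder per peptide is a filter followed by a minimum over value. The sketch below uses plain dicts with a subset of the columns listed above so that it runs without pandas; the rows are made-up illustrative values and strongest_affinity_per_peptide is a hypothetical helper.

```python
from typing import Dict, List


def strongest_affinity_per_peptide(rows: List[dict]) -> Dict[str, dict]:
    """Keep, for each peptide, the pMHC_affinity row with the lowest IC50."""
    best: Dict[str, dict] = {}
    for row in rows:
        if row["kind"] != "pMHC_affinity":
            continue  # skip presentation, stability, and other kinds
        current = best.get(row["peptide"])
        if current is None or row["value"] < current["value"]:
            best[row["peptide"]] = row
    return best


# Made-up rows shaped like the flat output (only a few columns shown).
rows = [
    {"peptide": "SIINFEKL", "allele": "HLA-A*02:01", "kind": "pMHC_affinity", "value": 1900.0},
    {"peptide": "SIINFEKL", "allele": "HLA-B*07:02", "kind": "pMHC_affinity", "value": 17000.0},
    {"peptide": "SIINFEKL", "allele": "HLA-A*02:01", "kind": "pMHC_presentation", "value": None},
    {"peptide": "GILGFVFTL", "allele": "HLA-A*02:01", "kind": "pMHC_affinity", "value": 25.0},
]
best_rows = strongest_affinity_per_peptide(rows)
```

With a real DataFrame, the same selection amounts to filtering on the kind column and sorting on value.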

Multi-sample predictions

MultiSample runs a predictor across multiple samples, each with its own HLA genotype:

from mhctools import MultiSample, NetMHCpan41

ms = MultiSample(
    samples={
        "pat001": ["HLA-A*02:01", "HLA-B*07:02"],
        "pat002": ["HLA-A*01:01", "HLA-B*08:01"],
    },
    predictor_class=NetMHCpan41,
)

# {sample_name: list[PeptideResult]}
results = ms.predict(["SIINFEKL", "GILGFVFTL"])

# {sample_name: {seq_name: list[PeptideResult]}}
protein_results = ms.predict_proteins({"TP53": "MEEPQ..."})

# flat DataFrames with sample_name column
df = ms.predict_dataframe(["SIINFEKL"])
df = ms.predict_proteins_dataframe({"TP53": "MEEPQ..."})
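The return shape of ms.predict() can be reproduced with a short loop. The sketch below is only an illustration of that shape under stated assumptions: StubPredictor and predict_per_sample are hypothetical, and how MultiSample actually constructs or reuses predictors is not shown in this document.

```python
from typing import Dict, List, Sequence, Tuple


class StubPredictor:
    """Hypothetical predictor: pairs each peptide with the sample's first allele."""

    def __init__(self, alleles: Sequence[str]):
        self.alleles = list(alleles)

    def predict(self, peptides: Sequence[str]) -> List[Tuple[str, str]]:
        return [(peptide, self.alleles[0]) for peptide in peptides]


def predict_per_sample(
    samples: Dict[str, Sequence[str]],
    predictor_class,
    peptides: Sequence[str],
) -> Dict[str, list]:
    # One predictor instance per sample, configured with that sample's alleles.
    return {
        name: predictor_class(alleles=alleles).predict(peptides)
        for name, alleles in samples.items()
    }


samples = {
    "pat001": ["HLA-A*02:01", "HLA-B*07:02"],
    "pat002": ["HLA-A*01:01", "HLA-B*08:01"],
}
by_sample = predict_per_sample(samples, StubPredictor, ["SIINFEKL", "GILGFVFTL"])
```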

Measurement kinds

Each Prediction has a kind string describing what it measures:

| Kind | Meaning |
| --- | --- |
| pMHC_affinity | Peptide-MHC binding affinity |
| pMHC_presentation | Likelihood of surface presentation (EL/processing) |
| pMHC_stability | Peptide-MHC complex stability |
| immunogenicity | T-cell immunogenicity |
| antigen_processing | Combined processing score |
| proteasome_cleavage | Proteasomal cleavage score |
| tap_transport | TAP transport score (reserved, not yet used) |
| erap_trimming | ERAP trimming score (reserved, not yet used) |

The Prediction object

Every prediction is a frozen, self-contained Prediction dataclass:

from mhctools import Prediction

pred = Prediction(
    kind="pMHC_affinity",
    score=0.85,           # ~0-1, higher = better
    peptide="SIINFEKL",
    allele="HLA-A*02:01",
    value=120.5,          # IC50 in nM
    percentile_rank=0.8,
    source_sequence_name="TP53",
    offset=42,
    predictor_name="netMHCpan",
    predictor_version="4.1",
)

score is always higher-is-better. value is in the predictor's native units (nM for affinity, hours for stability). percentile_rank is optional; when present it ranges from 0 to 100, and lower means stronger.
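For affinity, the relationship between a lower-is-better IC50 and a higher-is-better score can be illustrated with the log transform used by the NetMHC family of tools, score = 1 - log(IC50) / log(50000). Whether a given predictor's score in mhctools follows exactly this formula is an assumption; the function names below are hypothetical.

```python
import math

MAX_IC50_NM = 50000.0  # IC50 values at or above this map to a score of 0


def ic50_to_score(ic50_nm: float) -> float:
    """Map IC50 (nM, lower = stronger) to a 0-1 score (higher = stronger)."""
    clipped = min(max(ic50_nm, 1.0), MAX_IC50_NM)
    return 1.0 - math.log(clipped) / math.log(MAX_IC50_NM)


def score_to_ic50(score: float) -> float:
    """Inverse transform: recover IC50 in nM from a 0-1 score."""
    return MAX_IC50_NM ** (1.0 - score)


strong = ic50_to_score(50.0)   # a tight binder gets a higher score...
weak = ic50_to_score(5000.0)   # ...than a weak one
```

The clipping keeps the score inside the 0-1 range for IC50 values below 1 nM or above 50000 nM.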

Supported predictors

MHC binding & presentation

| Predictor | Kinds produced | Requires |
| --- | --- | --- |
| NetMHCpan / NetMHCpan41 / NetMHCpan42 | affinity + presentation | NetMHCpan |
| NetMHCpan4 | affinity or presentation | NetMHCpan 4.0 |
| NetMHCpan3 / NetMHCpan28 | affinity | older NetMHCpan |
| NetMHC / NetMHC3 / NetMHC4 | affinity | NetMHC |
| NetMHCIIpan / NetMHCIIpan43 | affinity or presentation | NetMHCIIpan |
| NetMHCcons | affinity | NetMHCcons |
| NetMHCstabpan | stability | NetMHCstabpan |
| MHCflurry | affinity + presentation | pip install mhcflurry + mhcflurry-downloads fetch |
| MHCflurry_Affinity | affinity | pip install mhcflurry + mhcflurry-downloads fetch |
| BigMHC | presentation or immunogenicity | BigMHC clone (set BIGMHC_DIR) |
| MixMHCpred | presentation | MixMHCpred |
| IedbNetMHCpan / IedbSMM / IedbNetMHCIIpan | affinity | IEDB web API |
| RandomBindingPredictor | affinity | (built-in) |

Antigen processing

| Predictor | Kinds produced | Requires |
| --- | --- | --- |
| Pepsickle | proteasome cleavage | pip install pepsickle |
| NetChop | proteasome cleavage | NetChop |

Processing predictors use configurable scoring to aggregate per-position cleavage probabilities into peptide-level scores. See ProcessingPredictor and ProteasomePredictor for details.
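As an illustration of what aggregating per-position probabilities can mean, the sketch below scores a peptide as the probability of a cut at its C-terminus multiplied by the probability that no internal position is cut. This is one plausible rule chosen for illustration, not necessarily any of the scoring options mhctools provides; peptide_processing_score and the indexing convention are hypothetical.

```python
from math import prod
from typing import Sequence


def peptide_processing_score(
    cleavage_probs: Sequence[float], start: int, length: int
) -> float:
    """Score one peptide from per-position cleavage probabilities.

    Assumed convention: cleavage_probs[i] is the probability of a cut
    immediately after residue i of the source protein.
    """
    end = start + length - 1                  # index of the peptide's last residue
    c_terminal_cut = cleavage_probs[end]      # the cut that releases the C-terminus
    # Every position strictly inside the peptide must stay uncut.
    stays_intact = prod(1.0 - p for p in cleavage_probs[start:end])
    return c_terminal_cut * stays_intact


# Made-up probabilities for a 4-residue protein; score the first 3 residues.
probs = [0.1, 0.2, 0.9, 0.5]
score = peptide_processing_score(probs, start=0, length=3)  # 0.9 * 0.9 * 0.8
```

Other reasonable choices, such as using only the C-terminal probability or averaging the internal positions, trade sensitivity to internal cuts against robustness to noisy per-position estimates.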

Command-line examples

Prediction for user-supplied peptide sequences

mhctools --sequence SIINFEKL SIINFEKLQ --mhc-predictor netmhc --mhc-alleles A0201

Automatically extract peptides as subsequences of specified length

mhctools --sequence AAAQQQSIINFEKL --extract-subsequences --mhc-peptide-lengths 8-10 --mhc-predictor mhcflurry --mhc-alleles A0201

Legacy API

The old predict_peptides() and predict_subsequences() methods still work and return BindingPredictionCollection objects:

from mhctools import NetMHCpan

predictor = NetMHCpan(alleles=["A*02:01"])
collection = predictor.predict_subsequences(
    {"1L2Y": "NLYIQWLKDGGPSSGRPPPS"},
    peptide_lengths=[9],
)
df = collection.to_dataframe()

for bp in collection:
    if bp.affinity < 100:
        print("Strong binder: %s" % bp)

To convert legacy results to the new types:

preds = collection.to_preds()           # list of Prediction
pp_list = collection.to_peptide_preds() # list of PeptideResult
