MHC binding prediction based on modeled physicochemical properties of peptides

MHCLovac

New in version 4.0

  • Training data now comprises more than 4 million samples, combining binding affinity and eluted ligand data.
  • The separate ligand prediction of the previous version has been removed.
  • The binding score is reported as the log-transformed binding affinity: 1 - log50k(affinity), where log50k is the logarithm to base 50,000.
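
The score transform can be illustrated with a minimal sketch. It assumes that log50k denotes the logarithm to base 50,000 and that affinity is given in nM; the function names are hypothetical and are not part of the mhclovac API.

```python
import math

# Base of the logarithm in 1 - log50k(affinity); affinity is assumed to be in nM.
MAX_AFFINITY = 50_000.0


def affinity_to_score(affinity: float) -> float:
    """Map a binding affinity to the score 1 - log50k(affinity)."""
    if affinity <= 0:
        raise ValueError("affinity must be positive")
    return 1.0 - math.log(affinity) / math.log(MAX_AFFINITY)


def score_to_affinity(score: float) -> float:
    """Invert the transform: affinity = 50000 ** (1 - score)."""
    return MAX_AFFINITY ** (1.0 - score)
```

Under this reading, a lower affinity value (a stronger binder) gives a higher score, and an affinity of 50,000 maps to a score of 0.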

About

MHCLovac is an MHC binding prediction method that focuses on the physicochemical properties of peptides responsible for interaction with MHC molecules. The method models these properties in a way that captures the nearest-neighbor effect of amino acid residues. It rests on the following assumption: the physicochemical properties of adjacent residues have an additive effect on the local properties of the peptide as a whole, and the properties of a single residue affect the properties of the peptide at neighboring positions. Each peptide is therefore represented by a set of modeled physicochemical profiles (distributions of a given property; figure 1, upper subplots), which are then reduced to a predetermined number of discrete data points to obtain discrete physicochemical profiles (figure 1, bottom subplots). The discrete profiles serve as input features for the binding prediction models. Because every discrete profile has the same number of points, profiles of peptides of different lengths can be compared directly.

mhclovac-modeling-method-figure.png
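
The idea of a modeled profile reduced to a fixed number of points can be sketched as follows. This is an illustration, not MHCLovac's actual implementation: it assumes each residue contributes a Gaussian bell scaled by its property value, and the kernel shape, `sigma`, `n_points`, and the values in `TOY_INDEX` are all hypothetical.

```python
import math

# Illustrative hydropathy-like values for a few residues (not an AAindex entry).
TOY_INDEX = {"M": 1.9, "E": -3.5, "I": 4.5, "F": 2.8,
             "V": 4.2, "S": -0.8, "H": -3.2, "L": 3.8}


def discrete_profile(peptide, index, n_points=10, sigma=0.1):
    """Sample a smoothed property profile of `peptide` at `n_points` positions.

    Residues are placed on a length-independent [0, 1] axis. Each residue
    adds a Gaussian bell scaled by its property value, so neighboring
    residues contribute additively to the local value of the profile.
    """
    n = len(peptide)
    centres = [(i + 0.5) / n for i in range(n)]
    grid = [(j + 0.5) / n_points for j in range(n_points)]
    profile = []
    for x in grid:
        total = 0.0
        for centre, residue in zip(centres, peptide):
            weight = math.exp(-0.5 * ((x - centre) / sigma) ** 2)
            total += index[residue] * weight
        profile.append(total)
    return profile


# Peptides of different lengths yield feature vectors of the same size.
profile_11mer = discrete_profile("MEIFIEVFSHF", TOY_INDEX)
profile_9mer = discrete_profile("MEIFIEVFS", TOY_INDEX)
```

Sampling on a normalized axis is one simple way to obtain a fixed number of features regardless of peptide length.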

Materials

Training data was obtained from the NetMHCpan 4.1 website (Reynisson, B. et al., 2020) and contains preprocessed binding affinity and eluted ligand data. In addition, binding affinity data was obtained directly from IEDB (Vita, R. et al., 2018); this dataset was used to narrow down the set of physicochemical indexes used for prediction. The list of physicochemical properties and the corresponding amino acid index data was obtained from the AAindex database (Kawashima, S. et al., 2008).

Methods

MHCLovac uses a collection of out-of-the-box regression algorithms from the scikit-learn Python library, mostly with default parameters. The prediction model returns binding scores in the form of log-transformed binding affinity (1 - log50k(affinity)). The input features for the prediction models are the discrete physicochemical profiles of peptides. Since the AAindex database contains more than 500 entries, the following selection procedure was used to reduce the number of physicochemical properties that need to be modeled. For each physicochemical property index, a binding model was trained and evaluated with the r2 score for each MHC allele, and the average score across all alleles was calculated. The indexes were then sorted by average score in descending order. Starting from the highest-scoring index (selected by default), each subsequent index was compared with the previously selected ones by correlation coefficient, and was added to the selection only if its correlation coefficients with all already selected indexes fell within the range [-0.3, 0.3]. This resulted in a total of 9 indexes (table 1) that had high prediction potential for most alleles and were only weakly correlated with one another.

Table 1. Selected physicochemical indexes.

Accession number  Average r2 score  Title
ROSM880102        0.2819            Side chain hydropathy, corrected for solvation (Roseman, 1988)
ZIMJ680104        0.2740            Isoelectric point (Zimmerman et al., 1968)
OOBM770104        0.2498            Average non-bonded energy per residue (Oobatake-Ooi, 1977)
SNEP660101        0.2454            Principal component I (Sneath, 1966)
ROBB760111        0.2353            Information measure for C-terminal turn (Robson-Suzuki, 1976)
CHAM830102        0.2062            A parameter defined from the residuals obtained from the best correlation of the Chou-Fasman parameter of beta-sheet (Charton-Charton, 1983)
RACS820104        0.1833            Average relative fractional occurrence in EL(i) (Rackovsky-Scheraga, 1982)
WERD780103        0.1704            Free energy change of alpha(Ri) to alpha(Rh) (Wertz-Scheraga, 1978)
KARS160120        0.1413            Weighted minimum eigenvalue based on the atomic numbers (Karkbara-Knisley, 2016)
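
The greedy selection described in Methods can be sketched as below. This is a minimal illustration, not the project's code: it assumes the correlation is the Pearson coefficient between the per-residue values of two indexes, and the accession names and numbers in the example are made up.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # a constant index is treated here as uncorrelated
    return cov / (norm_x * norm_y)


def select_indexes(avg_scores, index_values, max_abs_corr=0.3):
    """Greedily keep high-scoring indexes that are weakly correlated.

    avg_scores:   {accession: average r2 score across alleles}
    index_values: {accession: per-residue property values}
    """
    ranked = sorted(avg_scores, key=avg_scores.get, reverse=True)
    selected = []
    for accession in ranked:
        # all() over an empty selection is True, so the top index is always kept.
        if all(abs(pearson(index_values[accession], index_values[kept])) <= max_abs_corr
               for kept in selected):
            selected.append(accession)
    return selected


# Hypothetical example with three made-up indexes over five residues.
example_scores = {"IDX_A": 0.28, "IDX_B": 0.27, "IDX_C": 0.20}
example_values = {
    "IDX_A": [1, 2, 3, 4, 5],
    "IDX_B": [2, 4, 6, 8, 11],   # nearly collinear with IDX_A
    "IDX_C": [1, -1, 0, -1, 1],  # uncorrelated with IDX_A
}
selected = select_indexes(example_scores, example_values)
```

In this example the second index is dropped because it is strongly correlated with the first, while the third is kept despite its lower score.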

Results

Prediction was evaluated using the FRANK method from the NetMHCpan 4.1 paper. The benchmark dataset was also obtained from the NetMHCpan website and contains about 1600 sequences with their corresponding epitopes, of which 200 were randomly selected for this benchmark. The FRANK score is the fraction of non-epitope peptides from the same sequence that score higher than the epitope; the best possible score is 0.

mhclovac-4-0-benchmark.png
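
The FRANK score defined above can be computed with a few lines. This is a minimal sketch assuming that the candidate peptides from one source sequence and their predicted scores are already available as a dictionary; the function name and the peptide labels in the example are hypothetical.

```python
def frank_score(peptide_scores, epitope):
    """Fraction of non-epitope peptides scoring higher than the epitope.

    peptide_scores: {peptide: predicted binding score} for all candidate
                    peptides derived from one source sequence.
    epitope:        the key of the known epitope in `peptide_scores`.
    """
    epitope_score = peptide_scores[epitope]
    others = [score for peptide, score in peptide_scores.items() if peptide != epitope]
    if not others:
        raise ValueError("at least one non-epitope peptide is required")
    higher = sum(1 for score in others if score > epitope_score)
    return higher / len(others)


# Hypothetical scores for four candidate peptides from one sequence.
example = {"PEP1": 0.9, "PEP2": 0.5, "PEP3": 0.7, "PEP4": 0.1}
frank_pep3 = frank_score(example, "PEP3")  # one of three other peptides scores higher
```

A score of 0 means the epitope ranks first among all candidates from its sequence.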

Installation

pip install mhclovac

Usage

From command line:

mhclovac -f example.fasta -m HLA-B*44:02 -l 11 --sort --n_cpu 6

Programmatically:

from mhclovac import predict
from mhclovac.utils import list_mhc_alleles

# list_mhc_alleles() returns the list of supported MHC alleles
alleles = list_mhc_alleles()

# predict() returns a pandas DataFrame with prediction results
predictions = predict(
    peptides=['MEIFIEVFSHF', 'LELPTGSLEKS', 'TELTLNMCLEL'],
    mhc_allele='HLA-B*44:02',
    sort=True,
    n_cpu=6
)

Example output:

    peptide   mhc_allele  peptide_length           sequence_name  binding_score
MEIFIEVFSHF  HLA-B*44:02              11  MEIFIEVFSHF HLA-B44:02       0.626139
LELPTGSLEKS  HLA-B*44:02              11  MEIFIEVFSHF HLA-B44:02       0.211701
TELTLNMCLEL  HLA-B*44:02              11  MEIFIEVFSHF HLA-B44:02       0.185610
IEVFSHFLLQL  HLA-B*44:02              11  MEIFIEVFSHF HLA-B44:02       0.171749
LEKSLMISSQV  HLA-B*44:02              11  MEIFIEVFSHF HLA-B44:02       0.147054

References

  • Reynisson, B., Alvarez, B., Paul, S., Peters, B., & Nielsen, M. (2020). NetMHCpan-4.1 and NetMHCIIpan-4.0: improved predictions of MHC antigen presentation by concurrent motif deconvolution and integration of MS MHC eluted ligand data. Nucleic Acids Research. https://doi.org/10.1093/nar/gkaa379
  • Vita, R., Mahajan, S., Overton, J. A., Dhanda, S. K., Martini, S., Cantrell, J. R., Wheeler, D. K., Sette, A., & Peters, B. (2018). The Immune Epitope Database (IEDB): 2018 update. Nucleic Acids Research. PMID: 30357391. https://doi.org/10.1093/nar/gky1006
  • Kawashima, S., Pokarowski, P., Pokarowska, M., Kolinski, A., Katayama, T., & Kanehisa, M. (2008). AAindex: amino acid index database, progress report 2008. Nucleic Acids Research, 36, D202-D205. PMID: 17998252. https://doi.org/10.1093/nar/gkm998
