MHC class I binding and epitope prediction based on modeled physicochemical properties of peptides
MHCLovac is the result of my personal interests and some free time. Although I spent a good amount of time researching this topic, I would not call this project thorough scientific research. That said, MHCLovac's predictions are not too bad, and I plan to improve it further if possible.
What's new in version 3?
- Epitope prediction in the form of an epitope probability score.
- Binding score is back. The IC50 predictions from version 2 are replaced with a binding score. A higher score means stronger binding.
- Prediction is carried out by a collection of regression and classification algorithms.
- MHCLovac can now be used as a Python package. See example usage for more info.
MHCLovac is an MHC class I binding and epitope prediction tool. It uses physicochemical properties of peptides to predict binding affinity and epitope probability, both reported as scores. One of the main challenges in MHC binding prediction, which MHCLovac aims to address, is that target peptides do not have to be uniform in length. Some alleles accept a wide range of peptide lengths: H2-Kb epitopes are known to span 7 to 13 residues. This makes it hard to build a numerical feature representation of peptides for prediction algorithms. MHCLovac handles this by modeling each peptide as a linear, wave-like representation of its physicochemical properties. The modeled array can then be scaled up or down to a fixed length, which lets MHCLovac work with a fixed number of features. The downside of this approach is that MHCLovac has to assume that peptides bind in a linear conformation, which may not always be the case.
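To make the idea concrete, here is a minimal sketch of one way to turn peptides of different lengths into a fixed number of features. It is not MHCLovac's actual implementation: the use of the Kyte-Doolittle hydropathy index as the property, the Gaussian bump placed on each residue, the linear interpolation step, and the names `peptide_profile`, `resample`, `featurize`, `points_per_residue`, `sigma` and `n_features` are all assumptions made for illustration.

```python
import math

# Kyte-Doolittle hydropathy index, used here only as an example property.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def peptide_profile(peptide, points_per_residue=10, sigma=0.4):
    """Sum one Gaussian bump per residue, weighted by its property value."""
    n_points = len(peptide) * points_per_residue
    profile = []
    for i in range(n_points):
        x = (i + 0.5) / points_per_residue  # sample position in residue units
        value = 0.0
        for pos, aa in enumerate(peptide):
            center = pos + 0.5  # each residue sits in the middle of its slot
            value += HYDROPATHY[aa] * math.exp(-((x - center) ** 2) / (2 * sigma ** 2))
        profile.append(value)
    return profile


def resample(profile, n_features):
    """Linearly interpolate a profile onto a fixed number of points."""
    if n_features < 2 or len(profile) < 2:
        raise ValueError("need at least two input points and two output points")
    last = len(profile) - 1
    out = []
    for j in range(n_features):
        t = j * last / (n_features - 1)  # fractional index into the profile
        lo = int(math.floor(t))
        hi = min(lo + 1, last)
        frac = t - lo
        out.append(profile[lo] * (1 - frac) + profile[hi] * frac)
    return out


def featurize(peptide, n_features=50):
    """Model a peptide as a curve, then scale it to a fixed length."""
    return resample(peptide_profile(peptide), n_features)


short = featurize("ELTLNMCL")      # 8 residues
longer = featurize("MEIFIEVFSHF")  # 11 residues
print(len(short), len(longer))
```

Both peptides end up with the same number of features even though one has 8 residues and the other 11, which is what lets a single model handle mixed lengths.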
MHCLovac is trained on data from two sources: the dataset used for retraining the IEDB class I binding prediction tools (http://tools.iedb.org/main/datasets/) and the IEDB database (www.iedb.org). Training results and a list of supported MHC alleles are available in the training/results folder. Trained models are benchmarked using ROC-AUC. The benchmarking method is described in the benchmark folder.
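For readers unfamiliar with ROC-AUC, the sketch below computes it from its pairwise definition: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. The labels and scores are made-up toy values, not MHCLovac results, and `roc_auc` is a hypothetical helper. The benchmark folder remains the reference for how the models are actually evaluated.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the chance that a random positive outscores a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy data: 1 = known binder or epitope, 0 = non-binder (values are invented).
labels = [1, 0, 1, 0, 0]
scores = [0.52, 0.09, 0.32, 0.11, 0.40]
print(roc_auc(labels, scores))
```

In this toy set the positives win 5 of the 6 possible pairs. A value of 0.5 corresponds to random ranking and 1.0 to a perfect separation of positives from negatives.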
pip install mhclovac
As command line tool:
mhclovac -f example.fasta -m HLA-B*44:02 -l 11
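The sample output further down lists consecutive, overlapping 11-residue peptides that share one sequence name, which suggests that `-l` sets the length of a window slid along each FASTA sequence. The sketch below shows that kind of windowing. It is an assumption about the behavior, not code taken from MHCLovac; `sliding_peptides` is a hypothetical helper, and the input string is simply the 15 residues spanned by the five rows of the sample output.

```python
def sliding_peptides(sequence, length):
    """Return every contiguous peptide of the given length, in order."""
    if length <= 0:
        raise ValueError("length must be positive")
    # A sequence shorter than the window yields an empty list.
    return [sequence[i:i + length] for i in range(len(sequence) - length + 1)]


for peptide in sliding_peptides("MEIFIEVFSHFLLQL", 11):
    print(peptide)
```

A 15-residue input with a window of 11 gives five peptides, starting with MEIFIEVFSHF and ending with IEVFSHFLLQL.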
As python package:
from mhclovac import predict
from mhclovac.utils import list_mhc_alleles

alleles = list_mhc_alleles()  # returns list of supported MHC alleles
predictions = predict(sequence=['MEIFIEVFSHF', 'ELTLNMCL'], mhc='HLA-B*44:02')  # returns pandas DataFrame with prediction results
sequence     mhc          peptide_length  sequence_name           binding_score  epitope_score  combined_score
MEIFIEVFSHF  HLA-B*44:02  11              MEIFIEVFSHF HLA-B44:02  0.523205       0.965484       1.488688
EIFIEVFSHFL  HLA-B*44:02  11              MEIFIEVFSHF HLA-B44:02  0.087188       0.512132       0.599320
IFIEVFSHFLL  HLA-B*44:02  11              MEIFIEVFSHF HLA-B44:02  0.039142       0.159362       0.198503
FIEVFSHFLLQ  HLA-B*44:02  11              MEIFIEVFSHF HLA-B44:02  0.114877       0.264553       0.379430
IEVFSHFLLQL  HLA-B*44:02  11              MEIFIEVFSHF HLA-B44:02  0.317922       0.964168       1.282090
- sequence_name: FASTA sequence name or name provided by
- mhc: MHC allele
- binding_score: Higher score means better binding
- epitope_score: Higher score means a better epitope
- combined_score: Sum of binding and epitope scores if both are available
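As a small illustration of how the combined score relates to the other two, the sketch below sums the binding and epitope scores and ranks the peptides from the sample output. `combined_score` is a hypothetical helper written for this example. Returning `None` when one of the scores is missing is an assumption, since the description above only covers the case where both are available.

```python
def combined_score(binding_score=None, epitope_score=None):
    """Sum the two scores when both are present; otherwise return None (assumed)."""
    if binding_score is None or epitope_score is None:
        return None
    return binding_score + epitope_score


# (sequence, binding_score, epitope_score) taken from the sample output.
rows = [
    ("MEIFIEVFSHF", 0.523205, 0.965484),
    ("EIFIEVFSHFL", 0.087188, 0.512132),
    ("IFIEVFSHFLL", 0.039142, 0.159362),
    ("FIEVFSHFLLQ", 0.114877, 0.264553),
    ("IEVFSHFLLQL", 0.317922, 0.964168),
]

# Rank peptides from the highest combined score to the lowest.
ranked = sorted(rows, key=lambda r: combined_score(r[1], r[2]), reverse=True)
for sequence, binding, epitope in ranked:
    print(sequence, round(combined_score(binding, epitope), 6))
```

The sums agree with the combined_score column up to rounding in the last displayed digit, and MEIFIEVFSHF ranks first.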
Donate to support my work
If you like this project and wish to support my work you can do so by donating Bitcoin. Any amount donated will be appreciated!