mhcflurry·PyPI

MHC Binding Predictor

Development Status
- 3 - Alpha
Environment
- Console
Intended Audience
- Science/Research
License
- OSI Approved :: Apache Software License
Operating System
- OS Independent
Programming Language
- Python
Topic
- Scientific/Engineering :: Bio-Informatics

Project description

mhcflurry

Open source neural network models for peptide-MHC binding affinity prediction

The adaptive immune system depends on the presentation of protein fragments by MHC molecules. Machine learning models of this interaction are used in studies of infectious diseases, autoimmune diseases, vaccine development, and cancer immunotherapy.

MHCflurry currently supports allele-specific peptide / MHC class I affinity prediction using two approaches:

- Ensembles of predictors trained on random halves of the training data (the default)
- Single-model predictors for each allele trained on all data

For both kinds of predictors, you can fit models to your own data or download trained models that we provide.
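
The ensemble idea described above (each member trained on a different random half of the data) can be sketched in a few lines. This is only an illustration under stated assumptions: `fit_mean_log_affinity` is a hypothetical toy stand-in for a neural network, the measurements are made up, and combining members by averaging in log space is an assumption, not something this description specifies.

```python
import math
import random

def fit_mean_log_affinity(rows):
    """Toy stand-in for a real model: predicts the mean log-affinity it was shown."""
    mean_log = sum(math.log(nm) for _, nm in rows) / len(rows)
    return lambda peptide: mean_log

def train_half_ensemble(rows, n_models=8, seed=0):
    """Train each ensemble member on a different random half of the data."""
    rng = random.Random(seed)
    half = len(rows) // 2
    return [fit_mean_log_affinity(rng.sample(rows, half)) for _ in range(n_models)]

def ensemble_predict(models, peptide):
    """Combine members by averaging in log space (assumed combination rule)."""
    logs = [model(peptide) for model in models]
    return math.exp(sum(logs) / len(logs))

# Hypothetical (peptide, affinity in nM) measurements for a single allele.
rows = [("SIINFEKL", 10000.0), ("SIINFEKD", 26000.0),
        ("SIINFEKQ", 26400.0), ("SLYNTVATL", 50.0)]
models = train_half_ensemble(rows)
print(ensemble_predict(models, "SIINFEKL"))
```

Because every member sees a different subset, the spread of the member predictions can also serve as a rough indication of uncertainty.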

The downloadable models were trained on data from IEDB and Kim 2014. The ensemble predictors include models trained on data that has been augmented with values imputed from other alleles (see Rubinsteyn 2016).

In validation experiments using presented peptides identified by mass spectrometry, the ensemble models perform best. We are working on a performance comparison of these models with other predictors such as netMHCpan, which we hope to make available soon.

We anticipate adding additional models, including pan-allele and class II predictors.

Setup

The MHCflurry predictors are implemented in Python using keras. To configure keras you’ll need to set an environment variable in your shell:

export KERAS_BACKEND=theano

If you’re familiar with keras, you can also try the tensorflow backend. Note, however, that MHCflurry is currently tested using theano.
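
If you drive MHCflurry from a script or notebook rather than a shell, the same variable can be set from Python. A minimal sketch, assuming the variable must be in place before keras is first imported (keras reads its backend setting at import time):

```python
import os

# Set the backend before anything imports keras.
# setdefault keeps a value that was already exported in the shell.
os.environ.setdefault("KERAS_BACKEND", "theano")

# import mhcflurry  # import only after the backend has been chosen
print(os.environ["KERAS_BACKEND"])
```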

Now install the package:

pip install mhcflurry

Then download our datasets and trained models:

mhcflurry-downloads fetch

From a checkout you can run the unit tests with:

nosetests .

Making predictions from the command-line

$ mhcflurry-predict --alleles HLA-A0201 HLA-A0301 --peptides SIINFEKL SIINFEKD SIINFEKQ
Predicting for 2 alleles and 3 peptides = 6 predictions
allele,peptide,mhcflurry_prediction
HLA-A0201,SIINFEKL,10672.34765625
HLA-A0201,SIINFEKD,26042.716796875
HLA-A0201,SIINFEKQ,26375.794921875
HLA-A0301,SIINFEKL,25532.703125
HLA-A0301,SIINFEKD,24997.876953125
HLA-A0301,SIINFEKQ,28262.828125

You can also specify the input and output as CSV files. Run mhcflurry-predict -h for details.
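
The CSV output shown above is easy to post-process. The sketch below parses a captured copy of that output with the standard library and ranks the rows by predicted affinity, where a lower value in nM means stronger predicted binding. The helper name `parse_predictions` is hypothetical, and skipping ahead to the header row is a precaution in case the status line is captured together with the CSV.

```python
import csv
import io

# Output copied from the example run above.
captured = """\
Predicting for 2 alleles and 3 peptides = 6 predictions
allele,peptide,mhcflurry_prediction
HLA-A0201,SIINFEKL,10672.34765625
HLA-A0201,SIINFEKD,26042.716796875
HLA-A0201,SIINFEKQ,26375.794921875
HLA-A0301,SIINFEKL,25532.703125
HLA-A0301,SIINFEKD,24997.876953125
HLA-A0301,SIINFEKQ,28262.828125
"""

def parse_predictions(text):
    """Return (allele, peptide, affinity_nM) tuples from mhcflurry-predict output."""
    lines = text.splitlines()
    # Skip any status lines that precede the CSV header.
    start = next(i for i, line in enumerate(lines) if line.startswith("allele,"))
    reader = csv.DictReader(io.StringIO("\n".join(lines[start:])))
    return [(r["allele"], r["peptide"], float(r["mhcflurry_prediction"]))
            for r in reader]

# Sort so that the strongest predicted binder comes first.
ranked = sorted(parse_predictions(captured), key=lambda row: row[2])
strongest = ranked[0]
print(strongest)
```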

Making predictions from Python

from mhcflurry import predict
predict(alleles=['A0201'], peptides=['SIINFEKL'])

  Allele   Peptide  Prediction
0  A0201  SIINFEKL  10672.347656

The predictions returned by predict are affinities (KD) in nM.
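
Affinities in nM span several orders of magnitude, so they are often easier to compare on a log scale. The sketch below uses a transform that is common in the peptide-MHC literature, mapping 1 nM to 1.0 and a cap of 50000 nM to 0.0. The cap and the function names are assumptions made for illustration; this description does not state which transform MHCflurry uses internally.

```python
import math

MAX_NM = 50000.0  # assumed cap; weaker affinities are clipped to this value

def nm_to_score(nm, max_nm=MAX_NM):
    """Map an affinity in nM to [0, 1]; higher means stronger binding."""
    clipped = min(max(nm, 1.0), max_nm)
    return 1.0 - math.log(clipped) / math.log(max_nm)

def score_to_nm(score, max_nm=MAX_NM):
    """Invert nm_to_score for scores in [0, 1]."""
    return max_nm ** (1.0 - score)

# The prediction for A0201 / SIINFEKL from the example above.
score = nm_to_score(10672.347656)
print(score, score_to_nm(score))
```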

Training your own models

See the class1_allele_specific_models.ipynb notebook for an overview of the Python API, including predicting, fitting, and scoring single-model predictors. There is also a script called mhcflurry-class1-allele-specific-cv-and-train that will perform cross validation and model selection given a CSV file of training data. Try mhcflurry-class1-allele-specific-cv-and-train --help for details.

The ensemble predictors are trained similarly using the mhcflurry-class1-allele-specific-ensemble-train command.
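
For readers unfamiliar with cross validation and model selection, the general procedure can be sketched as follows. This is a generic illustration, not the logic of the mhcflurry-class1-allele-specific-cv-and-train script: the function names, the toy data, and the "scaled training mean" candidates are all hypothetical.

```python
import random

def k_fold_splits(n_rows, k=3, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross validation."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test

def select_by_cv(candidates, score, n_rows, k=3):
    """Return the candidate with the best mean held-out score."""
    def mean_score(candidate):
        scores = [score(candidate, train, test)
                  for train, test in k_fold_splits(n_rows, k)]
        return sum(scores) / len(scores)
    return max(candidates, key=mean_score)

# Toy data: log-affinities, with "models" that predict a scaled training mean.
values = [4.0, 5.5, 6.1, 7.3, 8.8, 9.9]

def neg_mse(scale, train, test):
    """Score a candidate by negative mean squared error on the held-out fold."""
    pred = scale * sum(values[i] for i in train) / len(train)
    return -sum((values[i] - pred) ** 2 for i in test) / len(test)

candidates = [0.5, 1.0, 1.5]
best = select_by_cv(candidates, neg_mse, n_rows=len(values))
print(best)
```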

Details on the downloadable models

The scripts we use to train predictors, including hyperparameter selection using cross validation, are here for the ensemble predictors and here for the single-model predictors.

For the ensemble predictors, we also generate a report that describes the hyperparameters selected and the test performance of each model.

Besides the model weights, the data downloaded when you run mhcflurry-downloads fetch also includes a CSV file giving the hyperparameters used for each predictor. Run mhcflurry-downloads path models_class1_allele_specific_ensemble or mhcflurry-downloads path models_class1_allele_specific_single to get the directory where these files are stored.
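
From Python, the directory can be looked up by calling the same command and then scanning for CSV files. A minimal sketch: the helper names are hypothetical, the `run` parameter exists only so the example can be exercised without MHCflurry installed, and the path returned by `fake_run` is made up.

```python
import subprocess
from pathlib import Path

def downloads_path(name, run=subprocess.check_output):
    """Ask the mhcflurry-downloads command where a downloaded item is stored."""
    output = run(["mhcflurry-downloads", "path", name])
    return Path(output.decode().strip())

def hyperparameter_csvs(directory):
    """List the CSV files found directly inside the given directory."""
    return sorted(Path(directory).glob("*.csv"))

def fake_run(command):
    # Stand-in for the real command; the path below is hypothetical.
    return b"/tmp/mhcflurry/models_class1_allele_specific_ensemble\n"

models_dir = downloads_path("models_class1_allele_specific_ensemble", run=fake_run)
print(models_dir)
```

With MHCflurry installed, omit the `run` argument so that the real command is invoked, then pass the result to `hyperparameter_csvs`.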

Problems and Solutions

undefined symbol

You may see an import error like:

ImportError: _CVXcanon.cpython-35m-x86_64-linux-gnu.so: undefined symbol: _ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEED1Ev

If so, try installing cvxpy using conda instead of pip.

Environment variables

The path where MHCflurry looks for model weights and data can be set with the MHCFLURRY_DOWNLOADS_DIR environment variable. This directory should contain subdirectories like “models_class1_allele_specific_single”. Setting this variable overrides the other environment variables described below.

If you only want to change the version of the released data used, you can set MHCFLURRY_DOWNLOADS_CURRENT_RELEASE. If you want to change the base directory used for all releases, set MHCFLURRY_DATA_DIR.

By default, MHCFLURRY_DATA_DIR is a platform specific application storage directory, MHCFLURRY_DOWNLOADS_CURRENT_RELEASE is the latest release, and MHCFLURRY_DOWNLOADS_DIR is set to $MHCFLURRY_DATA_DIR/$MHCFLURRY_DOWNLOADS_CURRENT_RELEASE.
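
The precedence among these variables can be summarized in a short sketch. This restates the rules above and is not MHCflurry's own code: the function name is hypothetical, and the default data directory and release string in the example are placeholders.

```python
from pathlib import PurePosixPath

def resolve_downloads_dir(env, default_data_dir, latest_release):
    """Resolve the downloads directory following the precedence described above."""
    # An explicit MHCFLURRY_DOWNLOADS_DIR overrides the other two variables.
    if env.get("MHCFLURRY_DOWNLOADS_DIR"):
        return PurePosixPath(env["MHCFLURRY_DOWNLOADS_DIR"])
    # Otherwise combine the base directory with the release name.
    data_dir = env.get("MHCFLURRY_DATA_DIR", default_data_dir)
    release = env.get("MHCFLURRY_DOWNLOADS_CURRENT_RELEASE", latest_release)
    return PurePosixPath(data_dir) / release

# Placeholder defaults, for illustration only.
DEFAULT_DATA_DIR = "/home/user/.local/share/mhcflurry"
LATEST_RELEASE = "0.2.0"

print(resolve_downloads_dir({}, DEFAULT_DATA_DIR, LATEST_RELEASE))
print(resolve_downloads_dir({"MHCFLURRY_DATA_DIR": "/data"},
                            DEFAULT_DATA_DIR, LATEST_RELEASE))
print(resolve_downloads_dir({"MHCFLURRY_DOWNLOADS_DIR": "/models",
                             "MHCFLURRY_DATA_DIR": "/data"},
                            DEFAULT_DATA_DIR, LATEST_RELEASE))
```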

Release history

- 2.1.5 (Mar 31, 2025)
- 2.1.4 (Oct 2, 2024)
- 2.1.2 (Jul 28, 2024)
- 2.1.1 (Mar 15, 2024)
- 2.1.0 (Oct 18, 2023)
- 2.0.6 (Jun 8, 2022)
- 2.0.5 (Nov 30, 2021)
- 2.0.4 (Sep 24, 2021)
- 2.0.3 (Sep 24, 2021)
- 2.0.2 (Jun 5, 2021)
- 2.0.1 (Jul 20, 2020)
- 2.0.0 (Jul 13, 2020)
- 1.6.1 (May 1, 2020)
- 1.6.0 (Mar 23, 2020)
- 1.4.3 (Nov 11, 2019)
- 1.4.2 (Oct 30, 2019)
- 1.4.1 (Oct 29, 2019)
- 1.4.0 (Oct 5, 2019)
- 1.3.1 (Sep 10, 2019)
- 1.3.0 (Sep 10, 2019)
- 1.2.4 (Apr 10, 2019)
- 1.2.3 (Feb 15, 2019)
- 1.2.2 (May 21, 2018)
- 1.2.1 (Mar 19, 2018)
- 1.2.0 (Feb 26, 2018)
- 1.1.0 (Feb 6, 2018)
- 1.0.0 (Dec 22, 2017)
- 0.9.2 (Oct 13, 2017)
- 0.9.1 (Jul 31, 2017)
- 0.9.0 (May 25, 2017)
- 0.2.0 (Mar 24, 2017), this version
- 0.0.8 (Sep 17, 2016)

Download files

Source Distribution

mhcflurry-0.2.0.tar.gz (67.3 kB view details)

Uploaded Mar 24, 2017 Source

File details

Details for the file mhcflurry-0.2.0.tar.gz.

File metadata

Download URL: mhcflurry-0.2.0.tar.gz
Upload date: Mar 24, 2017
Size: 67.3 kB
Tags: Source
Uploaded using Trusted Publishing? No

File hashes

Hashes for mhcflurry-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`22c97a4fd2eb217c3424d6a4df5ca1a87c5aed5e416a12e389f5b1115aa35f8f`
MD5	`18d0b55627daef2ef63a255df131110f`
BLAKE2b-256	`3596070fec2dc7f14885c55321725c51e0fd3cf5b0a735fbfafd2550f5dbb1c9`
