MHC Binding Predictor
mhcflurry
MHC I ligand prediction package with competitive accuracy and a fast and documented implementation.
[!IMPORTANT] Version 2.3.0 keeps the same external API as 2.2.0 and ships substantial performance and tooling improvements for users training their own models or running large prediction workloads:
- Device-resident affinity training: Class1NeuralNetwork.fit() keeps peptides, alleles, targets, and the random-negative pool on the active torch device for the lifetime of one fit, eliminating per-batch host↔device copies.
- Multi-GPU prediction by default: mhcflurry-predict, mhcflurry-predict-scan, mhcflurry-calibrate-percentile-ranks, and the sweep eval script auto-discover visible GPUs and fan out across them.
- Orchestrator auto-tuning: mhcflurry-class1-train-pan-allele-models resolves --num-jobs, --max-workers-per-gpu, --dataloader-num-workers, and random_negative_pool_epochs from the box's hardware so the same recipe runs on a workstation, single-GPU node, or 8×A100 host. --dataloader-num-workers applies to streaming pretraining; affinity fine-tuning batches from device-resident tensors. torch.compile + TF32 + matmul-precision are first-class CLI flags on the train commands; the in-process Inductor cache is warmed by a single worker before the production pool launches.
If you are upgrading from 2.1.x or 2.2.x, simply pip install --upgrade mhcflurry. The published pre-trained models are unchanged and will be loaded automatically. Internal refactors (per-fit device-resident training tensors, torch-side peptide encodings) do not affect the public Python or CLI surface.
Earlier release: Version 2.2.0 was the first release to use PyTorch as its neural network backend, replacing TensorFlow/Keras. It introduced the Python 3.10+ and pandas >= 2.0 requirements and added Apple Silicon (MPS) support.
MHCflurry implements class I peptide/MHC binding affinity prediction. The current version provides pan-MHC I predictors supporting any MHC allele of known sequence. MHCflurry runs on Python 3.10+ using the PyTorch neural network library. It exposes command-line and Python library interfaces.
MHCflurry also includes two experimental predictors: an "antigen processing" predictor that attempts to model MHC allele-independent effects such as proteasomal cleavage, and a "presentation" predictor that integrates processing predictions with binding affinity predictions to give a composite "presentation score." Both models are trained on mass spec-identified MHC ligands.
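A minimal sketch of the Python library interface follows. It assumes the package is installed and the trained models have already been fetched with mhcflurry-downloads fetch. The helper name predict_presentation is hypothetical. Class1PresentationPredictor.load() and predict() are used as they appear in the MHCflurry documentation, to the best of our reading; check the documentation for the exact columns returned.

```python
def predict_presentation(peptides, alleles):
    """Return a table of predictions for the given peptides and alleles.

    Hypothetical helper. The import is deferred so that this function can be
    defined even on a machine where mhcflurry is not installed yet.
    """
    from mhcflurry import Class1PresentationPredictor

    # Assumption: load() with no arguments picks up the default models
    # installed by `mhcflurry-downloads fetch`.
    predictor = Class1PresentationPredictor.load()

    # verbose=0 suppresses progress messages during prediction.
    return predictor.predict(peptides=peptides, alleles=alleles, verbose=0)


# Example call (requires mhcflurry and its downloaded models):
# df = predict_presentation(["SIINFEKL", "SIINFEKD"], ["HLA-A0201", "HLA-A0301"])
# print(df)
```

The peptides and alleles in the commented call are the same ones used in the command-line examples in this README.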
If you find MHCflurry useful in your research please cite:
T. O'Donnell, A. Rubinsteyn, U. Laserson. "MHCflurry 2.0: Improved pan-allele prediction of MHC I-presented peptides by incorporating antigen processing," Cell Systems, 2020. https://doi.org/10.1016/j.cels.2020.06.010
T. O'Donnell, A. Rubinsteyn, M. Bonsack, A. B. Riemer, U. Laserson, and J. Hammerbacher, "MHCflurry: Open-Source Class I MHC Binding Affinity Prediction," Cell Systems, 2018. https://doi.org/10.1016/j.cels.2018.05.014
Please file an issue if you have questions or encounter problems.
Have a bugfix or other contribution? We would love your help. See our contributing guidelines.
Try it now
You can generate MHCflurry predictions without any setup by running our Google Colaboratory notebook.
Installation (pip)
Install the package:
$ pip install mhcflurry
Download our datasets and trained models:
$ mhcflurry-downloads fetch
You can now generate predictions:
$ mhcflurry-predict \
--alleles HLA-A0201 HLA-A0301 \
--peptides SIINFEKL SIINFEKD SIINFEKQ \
--out /tmp/predictions.csv
Wrote: /tmp/predictions.csv
Or scan protein sequences for potential epitopes:
$ mhcflurry-predict-scan \
--sequences MFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTRGVYYPDKVFRSSVLHS \
--alleles HLA-A*02:01 \
--out /tmp/predictions.csv
Wrote: /tmp/predictions.csv
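To illustrate what scanning a protein involves, here is a rough sketch that enumerates every candidate peptide window in a sequence. It is not MHCflurry's implementation. The function name enumerate_peptides is hypothetical, and the default lengths of 8 to 11 residues are an assumption about typical class I peptide lengths, not a documented default of the command.

```python
def enumerate_peptides(sequence, lengths=range(8, 12)):
    """Yield (start_offset, peptide) for every window of each given length.

    Hypothetical helper; lengths 8-11 are an assumed default.
    """
    for k in lengths:
        # A sequence of length n contains n - k + 1 windows of length k.
        for start in range(len(sequence) - k + 1):
            yield start, sequence[start:start + k]


# The protein fragment from the scan example above.
protein = "MFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTRGVYYPDKVFRSSVLHS"
candidates = list(enumerate_peptides(protein))
```

Each candidate would then be scored against the requested alleles; mhcflurry-predict-scan handles both steps for you.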
Unified mhcflurry parent command
Starting in 2.3.0 there is also a single mhcflurry command that dispatches
to every subcommand:
$ mhcflurry predict \
--alleles HLA-A0201 HLA-A0301 \
--peptides SIINFEKL SIINFEKD SIINFEKQ \
--out /tmp/predictions.csv
$ mhcflurry compare-models \
--a results/new_run/ \
--b public \
--out results/comparison/
$ mhcflurry plot-model-comparison --input results/comparison/
Every historical command is reachable as a subcommand
(mhcflurry-predict ↔ mhcflurry predict, mhcflurry-downloads ↔
mhcflurry downloads, mhcflurry-class1-train-pan-allele-models ↔
mhcflurry class1-train-pan-allele-models, etc.). Both forms run the
same underlying entry point; the legacy mhcflurry-* scripts remain
installed as compat shims and are not changing. mhcflurry --help
lists every available subcommand.
The two new-in-2.3.0 model-comparison tools, compare-models and
plot-model-comparison, only have the unified form.
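The naming rule between the legacy scripts and the unified subcommands can be sketched as a small string transformation. This is an illustration of the mapping described above, not code from the package; the function name to_unified is hypothetical.

```python
def to_unified(argv):
    """Rewrite ['mhcflurry-predict', ...] as ['mhcflurry', 'predict', ...].

    Hypothetical helper. Argument lists that do not start with a legacy
    'mhcflurry-*' script name are returned unchanged.
    """
    prefix = "mhcflurry-"
    command, rest = argv[0], list(argv[1:])
    if command.startswith(prefix) and len(command) > len(prefix):
        # Strip the prefix and pass the remainder as the subcommand name.
        return ["mhcflurry", command[len(prefix):]] + rest
    return [command] + rest


legacy = ["mhcflurry-predict", "--alleles", "HLA-A0201", "--peptides", "SIINFEKL"]
unified = to_unified(legacy)
```

The reverse mapping does not hold for compare-models and plot-model-comparison, which have no legacy script.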
See the documentation for more details.
Development and tests
From a checkout, source develop.sh to create and activate the editable
environment:
$ source develop.sh
For quick feedback, run lint plus a focused unit subset:
$ ./lint.sh
$ pytest -q test/test_amino_acid.py test/test_random_negative_peptides.py
pytest test/ is the full test suite, not a fast unit-only loop. It includes
small end-to-end training runs, command subprocess tests, public-model smoke
tests that require cached MHCflurry download bundles, and speed/regression
checks, so it can take many minutes. Use
pytest -q test -m "not slow and not downloads" for the broad fast tier, and
pytest -q test --durations=25 when auditing slow tests. See the
testing documentation for
the current test tiers.
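If you switch between these invocations often, a small lookup table can keep them in one place. The sketch below only restates the pytest commands given above; the tier names ("focused", "fast", "audit", "full") and the helper tier_command are hypothetical labels, not part of the project.

```python
import shlex

# Hypothetical tier names mapped to the pytest commands from this section.
TIERS = {
    "focused": "pytest -q test/test_amino_acid.py test/test_random_negative_peptides.py",
    "fast": 'pytest -q test -m "not slow and not downloads"',
    "audit": "pytest -q test --durations=25",
    "full": "pytest test/",
}


def tier_command(tier):
    """Return the argument list for a tier, suitable for subprocess.run."""
    # shlex.split keeps the quoted marker expression as a single argument.
    return shlex.split(TIERS[tier])


# Example (run from a checkout with the development environment active):
# import subprocess
# subprocess.run(tier_command("fast"), check=True)
```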
Docker
You can also try the latest (GitHub master) version of MHCflurry using the Docker image hosted on Docker Hub by running:
$ docker run -p 9999:9999 --rm openvax/mhcflurry:latest
This will start a Jupyter notebook server in an
environment that has MHCflurry installed. Go to http://localhost:9999 in a
browser to use it.
To build the Docker image yourself, from a checkout run:
$ docker build -t mhcflurry:latest .
$ docker run -p 9999:9999 --rm mhcflurry:latest
Predicted sequence motifs
Sequence logos for the binding motifs learned by MHCflurry BA are available here.
Common issues and fixes
Problems downloading data and models
Some users have reported HTTP connection issues when using mhcflurry-downloads fetch. As a workaround, you can download the data manually (e.g. using wget) and then use mhcflurry-downloads just to copy the data to the right place.
To do this, first get the URL(s) of the downloads you need using mhcflurry-downloads url:
$ mhcflurry-downloads url models_class1_presentation
https://github.com/openvax/mhcflurry/releases/download/1.6.0/models_class1_presentation.20200205.tar.bz2
Then make a directory and download the needed files to this directory:
$ mkdir downloads
$ wget --directory-prefix downloads https://github.com/openvax/mhcflurry/releases/download/1.6.0/models_class1_presentation.20200205.tar.bz2
HTTP request sent, awaiting response... 200 OK
Length: 72616448 (69M) [application/octet-stream]
Saving to: 'downloads/models_class1_presentation.20200205.tar.bz2'
Now call mhcflurry-downloads fetch with the --already-downloaded-dir option to indicate that the downloads should be retrieved from the specified directory:
$ mhcflurry-downloads fetch models_class1_presentation --already-downloaded-dir downloads
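The same workaround can be scripted. The sketch below chains the three steps shown above using only the mhcflurry-downloads subcommands and options from this section. The helper names are hypothetical, and it assumes that mhcflurry-downloads url prints one URL per line. If the connection problem also affects Python's urllib on your machine, stay with wget for the download step.

```python
import subprocess
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def download_urls(name):
    """Ask mhcflurry-downloads for the URL(s) of a named download."""
    result = subprocess.run(
        ["mhcflurry-downloads", "url", name],
        check=True, capture_output=True, text=True,
    )
    # Assumption: the command prints one URL per line.
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def local_path(url, directory):
    """Return where a URL's file should be saved inside the directory."""
    return Path(directory) / Path(urlparse(url).path).name


def fetch_manually(name, directory="downloads"):
    """Download the files ourselves, then let mhcflurry-downloads copy them."""
    Path(directory).mkdir(parents=True, exist_ok=True)
    for url in download_urls(name):
        urllib.request.urlretrieve(url, local_path(url, directory))
    subprocess.run(
        ["mhcflurry-downloads", "fetch", name,
         "--already-downloaded-dir", str(directory)],
        check=True,
    )


# Example (requires mhcflurry to be installed and network access):
# fetch_manually("models_class1_presentation")
```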