MS²PIP: MS² Peak Intensity Prediction

MS²PIP: MS² Peak Intensity Prediction - Fast and accurate peptide fragmentation spectrum prediction for multiple fragmentation methods, instruments and labeling techniques.



Introduction

MS²PIP is a tool to predict MS² signal peak intensities from peptide sequences. It employs the XGBoost machine learning algorithm and is written in Python.

You can install MS²PIP on your machine by following the instructions below. For a more user-friendly experience, go to the MS²PIP web server. There, you can upload a list of peptide sequences and download the corresponding predicted MS² spectra in multiple file formats. The web server can also be accessed through its RESTful API.

To generate a predicted spectral library starting from a FASTA file, we developed a pipeline called fasta2speclib. Usage of this pipeline is described on the fasta2speclib wiki page. Fasta2speclib was developed in collaboration with the ProGenTomics group for the MS²PIP for DIA project.

If you use MS²PIP for your research, please cite the following articles:

  • Gabriels, R., Martens, L., & Degroeve, S. (2019). Updated MS²PIP web server delivers fast and accurate MS² peak intensity prediction for multiple fragmentation methods, instruments and labeling techniques. Nucleic Acids Research doi:10.1093/nar/gkz299
  • Degroeve, S., Maddelein, D., & Martens, L. (2015). MS²PIP prediction server: compute and visualize MS² peak intensity predictions for CID and HCD fragmentation. Nucleic Acids Research, 43(W1), W326–W330. doi:10.1093/nar/gkv542
  • Degroeve, S., & Martens, L. (2013). MS²PIP: a tool for MS/MS peak intensity prediction. Bioinformatics (Oxford, England), 29(24), 3199–203. doi:10.1093/bioinformatics/btt544

Please also note and mention the MS²PIP version you used.


Installation

Install with pip

pip install ms2pip

We recommend using a conda or venv virtual environment.

For development

Clone the repository and use pip to install an editable version:

pip install --editable .

Usage

MS²PIP comes with pre-trained models for a variety of fragmentation methods and modifications. These models can easily be applied by configuring MS²PIP in the config file and providing a list of peptides in the form of a PEPREC file. Optionally, MS²PIP predictions can be compared to spectra in an MGF file.

Command line interface

usage: ms2pip [-h] -c CONFIG_FILE [-s MGF_FILE] [-w FEATURE_VECTOR_OUTPUT]
              [-r] [-x] [-t] [-n NUM_CPU]
              <PEPREC file>

positional arguments:
  <PEPREC file>         list of peptides

optional arguments:
  -h, --help            show this help message and exit
  -c CONFIG_FILE, --config-file CONFIG_FILE
                        config file
  -s MGF_FILE, --spectrum-file MGF_FILE
                        .mgf MS2 spectrum file (optional)
  -w FEATURE_VECTOR_OUTPUT, --vector-file FEATURE_VECTOR_OUTPUT
                        write feature vectors to FILE.{pkl,h5} (optional)
  -r, --retention-time  add retention time predictions (requires DeepLC python
                        package)
  -x, --correlations    calculate correlations (if MGF is given)
  -t, --tableau         create Tableau Reader file
  -n NUM_CPU, --num-cpu NUM_CPU
                        number of CPUs to use (default: all available)
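
As a minimal sketch, the options listed above can be assembled into an argument list from Python before handing it to `subprocess.run`. The helper name `build_ms2pip_command` and the file names are hypothetical; only the flags shown in the usage text are used.

```python
import shlex


def build_ms2pip_command(peprec, config, mgf=None, correlations=False, num_cpu=None):
    """Assemble an ms2pip argument list (hypothetical helper, flags from the usage text)."""
    cmd = ["ms2pip", "-c", config]
    if mgf is not None:
        cmd += ["-s", mgf]
    if correlations:
        # The usage text states that correlations are calculated only if an MGF is given.
        if mgf is None:
            raise ValueError("correlations (-x) require an MGF file (-s)")
        cmd.append("-x")
    if num_cpu is not None:
        cmd += ["-n", str(num_cpu)]
    cmd.append(peprec)  # positional <PEPREC file> goes last
    return cmd


# Placeholder file names, for illustration only.
cmd = build_ms2pip_command(
    "peptides.peprec", "config.txt", mgf="spectra.mgf", correlations=True, num_cpu=4
)
print(shlex.join(cmd))
# To run the command: subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.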

Input files

Config file

Several MS²PIP options need to be set in this config file.

  • model=X where X is one of the currently supported MS²PIP models (see Specialized prediction models).
  • frag_error=X where X is the fragmentation spectrum mass tolerance in Da (only relevant if an MGF file is passed).
  • out=X where X is a comma-separated list of a selection of the currently supported output file formats: csv, mgf, msp, spectronaut, or bibliospec (SSL/MS2, also for Skyline). For example: out=csv,msp.
  • ptm=X,Y,opt,Z for every peptide modification where:
    • X is the PTM name, which must match the name used in the PEPREC file. If the --retention-time option is used, PTM names must match the PSI-MOD/Unimod names embedded in DeepLC (see DeepLC documentation).
    • Y is the mass shift in Da associated with the PTM.
    • Z is the one-letter code of the amino acid AA that is modified by the PTM. For N- and C-terminal modifications, Z should be N-term or C-term, respectively.
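
To make the key=value layout concrete, here is a minimal sketch that reads such a config into a dictionary. It is not the parser MS²PIP itself uses; the function name `parse_config` and the returned structure are assumptions made for illustration.

```python
def parse_config(text):
    """Parse key=value config lines into a dict (illustrative, not MS²PIP's own parser)."""
    config = {"ptm": []}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if key == "ptm":
            # ptm=X,Y,opt,Z -> name, mass shift in Da, literal "opt", target residue or terminus
            name, mass_shift, _opt, target = value.split(",")
            config["ptm"].append(
                {"name": name, "mass_shift": float(mass_shift), "target": target}
            )
        elif key == "out":
            config["out"] = value.split(",")
        elif key == "frag_error":
            config["frag_error"] = float(value)
        else:
            config[key] = value
    return config


example = """\
model=HCD
frag_error=0.02
out=csv,msp
ptm=Carbamidomethyl,57.02146,opt,C
ptm=Acetyl,42.010565,opt,N-term
"""
config = parse_config(example)
print(config["model"], config["out"], [p["name"] for p in config["ptm"]])
```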

PEPREC file

To apply the pre-trained models you need to pass only a <PEPREC file> to MS²PIP. This file contains the peptide sequences for which you want to predict peak intensities. The file is space separated and contains at least the following four columns:

  • spec_id: unique id (string) for the peptide/spectrum. This must match the TITLE field in the corresponding MGF file, if given.
  • modifications: Amino acid modifications for the given peptide. Every modification is listed as location|name; a pipe (|) separates the location from the name and each modification from the next. location is an integer counted starting at 1 for the first AA. 0 is reserved for N-terminal modifications, -1 for C-terminal modifications. name has to correspond to a modification listed in the Config file. Unmodified peptides are marked with a hyphen (-).
  • peptide: the unmodified amino acid sequence.
  • charge: precursor charge state as an integer (without +).
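
The modifications column can be unpacked as sketched below: a minimal, hypothetical helper (`parse_modifications` is not part of MS²PIP) that turns the pipe-separated field into (location, name) pairs, following the rules described above.

```python
def parse_modifications(field):
    """Split a PEPREC modifications field into (location, name) pairs."""
    if field == "-":
        return []  # hyphen marks an unmodified peptide
    parts = field.split("|")
    if len(parts) % 2 != 0:
        raise ValueError(f"expected location|name pairs, got: {field!r}")
    # Even positions hold locations, odd positions hold modification names.
    return [(int(parts[i]), parts[i + 1]) for i in range(0, len(parts), 2)]


print(parse_modifications("-"))
print(parse_modifications("0|Acetyl|2|Carbamidomethyl"))
```

Location 0 refers to the N-terminus and -1 to the C-terminus, so the integer values should be interpreted accordingly rather than used directly as sequence indices.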

Peptides must be strictly longer than 2 and shorter than 100 amino acids and cannot contain the following amino acid one-letter codes: B, J, O, U, X or Z. Peptides not fulfilling these requirements will be filtered out and will not be reported in the output.
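
To check beforehand which peptides would be filtered out, the two requirements above can be expressed as a small predicate. This is a sketch of the stated rules, not the filter code MS²PIP runs internally; the function name is hypothetical.

```python
UNSUPPORTED_RESIDUES = set("BJOUXZ")


def is_supported_peptide(sequence):
    """Return True if the peptide meets the length and amino acid requirements above."""
    has_valid_length = 2 < len(sequence) < 100  # strictly longer than 2, shorter than 100
    has_valid_residues = not (set(sequence) & UNSUPPORTED_RESIDUES)
    return has_valid_length and has_valid_residues


peptides = ["ACDEK", "AC", "ACXDEK", "ACDEFGR"]
print([p for p in peptides if is_supported_peptide(p)])
```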

In the conversion_tools folder, we provide a host of Python scripts to convert common search engine output files to a PEPREC file.

To start from a FASTA file, see fasta2speclib.

MGF file (optional)

Optionally, an MGF file with measured spectra can be passed to MS²PIP. In this case, MS²PIP will calculate correlations between the measured and predicted peak intensities. Make sure that the PEPREC spec_id matches the MGF TITLE field. Spectra that are present in the MGF file but missing from the PEPREC file (and vice versa) will be skipped.
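
Because unmatched entries are skipped, it can help to compare the two files before running a prediction. The sketch below collects the TITLE values and the spec_id values and reports the overlap; the helper names and the inline sample lines are hypothetical, and in practice the lines would come from open file handles.

```python
def mgf_titles(lines):
    """Collect the TITLE values from the lines of an MGF file."""
    titles = set()
    for line in lines:
        line = line.strip()
        if line.startswith("TITLE="):
            titles.add(line[len("TITLE="):])
    return titles


def peprec_spec_ids(lines):
    """Collect the spec_id column from the lines of a space-separated PEPREC file."""
    rows = [line.split() for line in lines if line.strip()]
    header, records = rows[0], rows[1:]
    column = header.index("spec_id")
    return {record[column] for record in records}


# Illustrative sample data only.
peprec_lines = [
    "spec_id modifications peptide charge",
    "peptide1 - ACDEK 2",
    "peptide2 2|Carbamidomethyl ACDEFGR 3",
]
mgf_lines = ["BEGIN IONS", "TITLE=peptide1", "END IONS", "BEGIN IONS", "TITLE=peptide9", "END IONS"]

ids = peprec_spec_ids(peprec_lines)
titles = mgf_titles(mgf_lines)
print("matched:", sorted(ids & titles))
print("only in PEPREC:", sorted(ids - titles))
print("only in MGF:", sorted(titles - ids))
```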

Examples

Suppose the config file contains the following lines:

model=HCD
frag_error=0.02
out=csv,mgf,msp
ptm=Carbamidomethyl,57.02146,opt,C
ptm=Acetyl,42.010565,opt,N-term
ptm=Glyloss,-58.005479,opt,C-term

then the PEPREC file could look like this:

spec_id modifications peptide charge
peptide1 - ACDEK 2
peptide2 2|Carbamidomethyl ACDEFGR 3
peptide3 0|Acetyl|2|Carbamidomethyl ACDEFGHIK 2

In this example, peptide3 is N-terminally acetylated and carries a carbamidomethyl on its second amino acid.
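
A PEPREC file like the one above can also be generated from Python. The following is a minimal sketch using the standard `csv` module with a space delimiter; the function name `write_peprec` and the tuple layout of the input rows are assumptions made for this example.

```python
import csv
import io


def write_peprec(rows, handle):
    """Write (spec_id, modifications, peptide, charge) rows as a space-separated PEPREC file.

    `modifications` is a list of (location, name) pairs; an empty list means unmodified.
    """
    writer = csv.writer(handle, delimiter=" ", lineterminator="\n")
    writer.writerow(["spec_id", "modifications", "peptide", "charge"])
    for spec_id, modifications, peptide, charge in rows:
        mod_field = "|".join(f"{loc}|{name}" for loc, name in modifications) or "-"
        writer.writerow([spec_id, mod_field, peptide, charge])


rows = [
    ("peptide1", [], "ACDEK", 2),
    ("peptide2", [(2, "Carbamidomethyl")], "ACDEFGR", 3),
    ("peptide3", [(0, "Acetyl"), (2, "Carbamidomethyl")], "ACDEFGHIK", 2),
]
buffer = io.StringIO()  # stands in for an open file
write_peprec(rows, buffer)
print(buffer.getvalue(), end="")
```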

The corresponding (optional) MGF file can contain the following spectrum:

BEGIN IONS
TITLE=peptide1
PEPMASS=283.11849750978325
CHARGE=2+
72.04434967 0.00419513
147.11276245 0.17418982
175.05354309 0.03652963
...
END IONS

Output

The predictions are saved in the output file(s) specified in the config file. Note that the normalization of intensities depends on the output file format. In the CSV file output, intensities are log2-transformed. To "unlog" the intensities, use the following formula: intensity = (2 ** log2_intensity) - 0.001.
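
The conversion can be written as a small function. The forward transform shown alongside it is only the algebraic inverse of the formula above, included as an assumption so the round trip can be checked; the function names are hypothetical.

```python
import math


def unlog_intensity(log2_intensity):
    """Convert a log2-transformed intensity from the CSV output back to linear scale."""
    return (2 ** log2_intensity) - 0.001


def log_intensity(intensity):
    """Algebraic inverse of unlog_intensity (assumed here, derived from the formula above)."""
    return math.log2(intensity + 0.001)


value = 0.25
round_trip = unlog_intensity(log_intensity(value))
print(math.isclose(round_trip, value))
```

The 0.001 offset means that a linear intensity of zero corresponds to a finite log2 value, log2(0.001), instead of negative infinity.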


Specialized prediction models

MS²PIP contains multiple specialized prediction models, fit for peptide spectra with different properties. These properties include fragmentation method, instrument, labeling techniques and modifications. As all of these properties can influence fragmentation patterns, it is important to match the MS²PIP model to the properties of your experimental dataset.

Currently the following models are supported in MS²PIP: HCD, CID, iTRAQ, iTRAQphospho, TMT, TTOF5600, HCDch2 and CIDch2. The last two "ch2" models also include predictions for doubly charged fragment ions (b++ and y++), next to the predictions for singly charged b- and y-ions.

MS² acquisition information and peptide properties of the models' training datasets

| Model | Fragmentation method | MS² mass analyzer | Peptide properties |
|---|---|---|---|
| HCD | HCD | Orbitrap | Tryptic digest |
| CID | CID | Linear ion trap | Tryptic digest |
| iTRAQ | HCD | Orbitrap | Tryptic digest, iTRAQ-labeled |
| iTRAQphospho | HCD | Orbitrap | Tryptic digest, iTRAQ-labeled, enriched for phosphorylation |
| TMT | HCD | Orbitrap | Tryptic digest, TMT-labeled |
| TTOF5600 | CID | Quadrupole Time-of-Flight | Tryptic digest |
| HCDch2 | HCD | Orbitrap | Tryptic digest |
| CIDch2 | CID | Linear ion trap | Tryptic digest |

Models, version numbers, and the train and test datasets used to create each model

| Model | Current version | Train-test dataset (unique peptides) | Evaluation dataset (unique peptides) | Median Pearson correlation on evaluation dataset |
|---|---|---|---|---|
| HCD | v20190107 | MassIVE-KB (1 623 712) | PXD008034 (35 269) | 0.903786 |
| CID | v20190107 | NIST CID Human (340 356) | NIST CID Yeast (92 609) | 0.904947 |
| iTRAQ | v20190107 | NIST iTRAQ (704 041) | PXD001189 (41 502) | 0.905870 |
| iTRAQphospho | v20190107 | NIST iTRAQ phospho (183 383) | PXD001189 (9 088) | 0.843898 |
| TMT | v20190107 | Peng Lab TMT Spectral Library (1 185 547) | PXD009495 (36 137) | 0.950460 |
| TTOF5600 | v20190107 | PXD000954 (215 713) | PXD001587 (15 111) | 0.746823 |
| HCDch2 | v20190107 | MassIVE-KB (1 623 712) | PXD008034 (35 269) | 0.903786 (+) and 0.644162 (++) |
| CIDch2 | v20190107 | NIST CID Human (340 356) | NIST CID Yeast (92 609) | 0.904947 (+) and 0.813342 (++) |

To train custom MS²PIP models, please refer to Training new MS²PIP models on our Wiki pages.
