Helping to integrate Spectral Predictors and Rescoring.

Project description

inSPIRE

in silico Spectral Predictor Informed REscoring

inSPIRE allows easy rescoring of MaxQuant, Mascot or PEAKS DB search results using spectral prediction. inSPIRE is primarily developed to use Prosit predicted spectra but can also use MS²PIP predictions. inSPIRE enables the prediction of MS2 spectra with Prosit without the need of a GPU and it can be run on a standard workstation or laptop.

Set Up

Before Downloading

If you are working on Mac with an M1 chip you will require Miniforge. For all other users, any version of conda will suffice.

Setting up your environment:

To start with create a new conda environment with python version 3.11:

conda create --name inspire python=3.11

Activate this environment

conda activate inspire

You will then need to install the inspire package:

pip install inspirems

To check your installation, run the following command (it is normal for this call to hang for a few seconds on first execution)

inspire -h

If you wish to use Percolator for rescoring rather than Mokapot you will need to install it separately. On Linux, Percolator can be installed via conda with the command below. Otherwise see https://github.com/percolator/percolator.

conda install -c bioconda percolator

Once you have successfully installed inSPIRE you should run it specifying your pipeline and a config file. The core execution of inSPIRE will take the form:

inspire --config_file path-to-config-file --pipeline pipeline-to-execute

where the config file is a yaml file specifying details of the inSPIRE execution and the pipeline is one of the options described below.

Citation

inSPIRE

Please cite the following article if you are using inSPIRE in your research:

Cormican, J. A., Horokhovskyi, Y., Soh, W. T., Mishto, M., and Liepe, J. (2022) inSPIRE: An open-source tool for increased mass spectrometry identification rates using Prosit spectral prediction. Mol Cell Proteomics.
doi.org/10.1016/j.mcpro.2022.100432

Dependencies

Please also cite the relevant publications of the rescoring tools used, see details on https://github.com/percolator/percolator if using Percolator and details on https://github.com/wfondrie/mokapot if using Mokapot.

Please also cite the relevant publications of the spectral predictors, see details on https://www.proteomicsdb.org/prosit if using Prosit and details on https://github.com/compomics/ms2pip_c if using MS²PIP.

If using inSPIRE-affinity please also cite the relevant publications for NetMHCpan binding affinity predictions. See details on https://services.healthtech.dtu.dk/service.php?NetMHCpan-4.1.

Running a small example.

The example folder provides a simple example that you should be able to run immediately.

inspire --pipeline downloadExample

The "core" pipeline which takes search engine and mgf input and produces fully rescored identifications using Prosit prediction on CPU. Execute this with:

inspire --pipeline core --config_file example/config.yml

This will run the rescoring and produce a report upon completion (this should take 2-3 minutes). This uses Mokapot rather than Percolator as it is easier to install together with inSPIRE. If you have Percolator installed and with to use it you can change the "rescoreMethod" in example/config.yml

Plotting Spectra

To use the inSPIRE plotting and see example pair plots comparing the experimental data to prosit predictions call:

inspire --pipeline plotSpectra --config_file example/config.yml

This plots the PSMs specified in example/output/plotData.csv and saves the plots to example/output/spectralPlots.pdf. To generate these plots for a different PSM simply copy the first 4 columns of the relevant line from example/output/finalAssignments.csv into example/output/finalAssignments.csv and rerun the plotSpectra pipeline.

inSPIRE-affinity

The core inSPIRE functionality can be executed via the "core" pipeline which will run rescoring using predicted spectra and provide final results to the user. If you wish to integrate binding affinity prediction you will have to make two modifications.

Firstly, you will need to add:

useBindingAffinity: asFeature

to your config.yml file.

Secondly, the execution will be slightly different. You will have to run two subsections of the "core" pipeline separately.

Firstly running the "prepare" pipeline will provide the input for all predictors. It will have created an mhcpan folder in the output folder with all unique peptides of each length saved as "inputLen{length}.txt". These can then be input to NetMHCpan (see https://services.healthtech.dtu.dk/service.php?NetMHCpan-4.1). The prediction files should be placed in the mhcpan folder as "outputLen{length}preds.txt".

You can then run the "predictSpectra" pipeline, followed by the "rescore" pipeline which will use the predicted binding affinities together with the spectral predictions to produce rescored PSM assignments.

inSPIRE for in vitro protein digestion.

If using inSPIRE for in vitro protein digestion after MSFragger search please use the following settings to your config file. Firstly to ensure that the FASTA headers of your spliced peptide database match those generated by invitroSPI and set this as the accession format.

accessionFormat: invitroSPI

Secondly, set this flag so that inSPIRE will use accession as a feature:

useAccessionStrata: True

Thirdly, please provide the protein sequence in FASTA format which is used for validation within inSPIRE:

proteome: path-to-fasta-file

Finally, for the less diverse dataset of the in vitro digestion, we also recommend using a minimal feature set to avoid overfitting:

useMinimalFeatures: True

Prosit Collision Energy Calibration

Prosit uses the collision energy setting of the mass spectrometer as input for MS² prediction. While you could base this off the machine calibration, best results are expected if you callibrate prosit predictions for high scoring PSMs to the experimental data. Calibration will be executed automatically as part of the "core" pipeline if no collision energy is provided.

inSPIRE Config File

The configuration file is used to control inSPIRE execution using a set of keys and values specified in yaml format. yaml is a relatively simple format for entering many configs. The yaml file should take the form:

---
key1: value1
key2: value2
...

The keys needed are detailed below.

Required Configs

These are the minimal configs required to run inSPIRE.

Key	Description
experimentTitle	A title for the experiment.
searchResults	The file path to the results of your ms search or a list of file paths if using multiple search results.
searchEngine	The search engine used (maxquant, mascot, or peaks).
outputFolder	Specify an output folder location which inSPIRE should write to.
scansFolder	Specify a folder containing the experimental spectra files in mgf or mzML format.
scansFormat	Specify the format of the spectra file (must be either mgf or mzML).

Arguments Required for Prosit

Key	Description
collisionEnergy	The mass spectrometer collision energy setting (run --calibrate pipeline to estimate the optimal setting).
deltaMethod	Recommended to set to "predictor" for immunopeptidome or "ignore" for tryptic proteome digestion. This defaults to ignore if MS2PIP is used.

Arguments Required for MS2PIP Required

Key	Description
ms2pipModel	Specify the MS2PIP model to be used (recommended HCD2021 for tryptic proteome data and Immuno-HCD for immunopeptidome).

Optional Settings (Experiment Specifications, Recommended to Check)

The following settings are set by default but you should check that they are valid for your experimental set up.

Key	Description
spectralPredictor	Either Prosit or MS²PIP (default=prosit).
mzUnits	The units used for the m/z accuracy either Da for Daltons or ppm for Parts Per Million (default=Da).
mzAccuracy	The mz accuracy of the mass spectrometer in Daltons or ppm(default=0.02, default unit is Da).
rescoreMethod	inSPIRE supports either "mokapot" or "percolator" (default=mokapot).
nCores	The number of CPU cores you wish to use in rescoring (default=1).
fixedModifications	You must specify the fixed modifications used in a MaxQuant search.
forceReload	Boolean flag on whether to force models to be redownloaded in case you accidentally change the contents of your inSPIRE model folder.

Additional Options

These setting are optional and are completely dependent on user preference.

Key	Description
falseDiscoveryRate	This is the false discovery rate Percolator optimises for (default=0.01).
excludeFeatures	This specifies any features which you wish to exclude from rescoring (default=empty list).
includeFeatures	This specifies any features which you wish to include from rescoring and ignore all other features (default=empty list, meaning all features are used).
reduce	By default inSPIRE uses only the highest scoring hit per scan (and accession group if specified). If you set reduce to False this will consider all hits (default=True).
reuseInput	Boolean flag on whether to reuse formatted data after the first read in. When using Mascot in particular this may be useful as it reduces the time spend formatting data for input.
filterCysteine	Option to filter cysteins from rescoring if the sample contains unmodified cysteine and Prosit is being used.
dropUnknownPTMs	Whether to drop PSMs containing modifications other than oxidation of methionine and carbamidomethylation of cysteine. (default=True if prosit used, False if ms2pip used)

Additional Configs for Mascot Distiller

If you have a combined mgf file from Mascot Distiller, you must add the following configs.

Key	Description
sourceFileName	If you are using a Mascot search from an mgf file which does not contain the source file name specify it here.
scanTitleFormat	If Distiller used, set this argument to mascotDistiller.
distillerLog	If Distiller used, set this argument to the path to the "table_peptide_int.txt" file from Distiller.
combinedScansFile	Set this to the combined mgf file from Mascot Distiller and put the mgf file in folder specified by scansFolder.

NetMHCpan Configs

NetMHCpan predicts the binding affinity of a peptide for various HLA molecules. inSPIRE can use this as a validation of its predictions or as a feature for rescoring.

Key	Description
useBindingAffinity	Set to asValidation if you want to check the percentage binders in your standard inSPIRE identifications. Set to asFeature if you want to use predicted binding affinity as a rescoring feature.

inSPIRE Pipelines.

This section details all possible pipeline you can run with inSPIRE. The 3 most important pipelines are calibrate, core, and plotSpectra.

inspire --pipeline calibrate

As described above, this pipeline selects the highest scoring PSMs from the original search results and tests collision energy settings in the range 24 to 36 (inclusive) to find the optimal setting to be used. The calibrate collision energy is printed to the terminal.

inspire --pipeline core

The core pipeline reads in and formats the search results file, predicts MS² spectra for the peptides, and runs rescoring to provide final identifications of all peptides.

This pipeline can be executed in a number of sub-sections as detailed below.

inspire --pipeline plotSpectra

This pipeline will plot experimental vs. Prosit predicted spectra for all PSMs specified in the plotData.csv file which should be placed in the output folder. This must at minimum specify the source file, scan number, peptide sequence, and modified sequence as provided in the inSPIRE finalAssignments.csv file.

Subsections of the core pipeline

The "core" pipeline contains a large number of steps, which can be run individually. This can be important when using the binding affinity prediction (see above) but also allows the user to make minor changes to the config file and rerun only the relevant sections of the pipeline.

inspire --pipeline prepare

This is the first step of the "core" functionality. It reads in the search results and formats input for Prosit/MS²PIP predictions and NetMHCpan if required.

inspire --pipeline predictSpectra

This predicts MS² spectra using the specified spectral predictor and writes to msp format.

inspire --pipeline rescore

This option executes all of the remaining steps of the pipeline.

inspire --pipeline featureGeneration

This pipeline generate

inspire --pipeline featureSelection+

This pipeline filters the feature set as required by the config file (default does not apply any filter), runs rescoring, formats the output, and generates a html report with details of performance and comparison to a baseline rescoring without spectral prediction.

Project details

Release history Release notifications | RSS feed

2.0rc7 pre-release

Jul 11, 2024

2.0rc6 pre-release

Jun 9, 2024

2.0rc5 pre-release

Jan 25, 2024

2.0rc4 pre-release

Jan 16, 2024

2.0rc3 pre-release

Jan 16, 2024

2.0rc2 pre-release

Jan 15, 2024

This version

2.0rc1 pre-release

Jan 15, 2024

1.5

Aug 2, 2023

1.4

Dec 14, 2022

1.3

Nov 30, 2022

1.2

Nov 15, 2022

1.1

Nov 4, 2022

1.0

Oct 24, 2022

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distribution

inspirems-2.0rc1-py3-none-any.whl (142.3 kB view hashes)

Uploaded Jan 15, 2024 Python 3

Hashes for inspirems-2.0rc1-py3-none-any.whl

Hashes for inspirems-2.0rc1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`bc34cc118cd96ec2f3b4cfebbd0bc906af014b7c35be41f9d6538d6eb94d213c`
MD5	`4af901f49b119f6c5ab2d1af63410b8a`
BLAKE2b-256	`b7c98b430128e708a0a97c730b3ceb10541675286bcc59ecf44535451ff90d96`

inspirems 2.0rc1

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Project description

inSPIRE

Set Up

Before Downloading

Setting up your environment:

Citation

inSPIRE

Dependencies

Running a small example.

Plotting Spectra

inSPIRE-affinity

inSPIRE for in vitro protein digestion.

Prosit Collision Energy Calibration

inSPIRE Config File

Required Configs

Arguments Required for Prosit

Arguments Required for MS2PIP Required

Optional Settings (Experiment Specifications, Recommended to Check)

Additional Options

Additional Configs for Mascot Distiller

NetMHCpan Configs

inSPIRE Pipelines.

inspire --pipeline calibrate

inspire --pipeline core

inspire --pipeline plotSpectra

Subsections of the core pipeline

inspire --pipeline prepare

inspire --pipeline predictSpectra

inspire --pipeline rescore

inspire --pipeline featureGeneration

inspire --pipeline featureSelection+

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Release history Release notifications | RSS feed

Download files

Source Distributions

Built Distribution