A proteomics search engine for LC-MS1 spectra.

ms1searchpy - a DirectMS1 proteomics search engine for LC-MS1 spectra

ms1searchpy consumes LC-MS data (mzML) or peptide features (tsv) and performs protein identification and quantitation.

Basic usage

Basic command for protein identification:

ms1searchpy *.mzML -d path_to.FASTA

ms1searchpy *_peptideFeatures.tsv -d path_to.FASTA

Read further for detailed info, including quantitative analysis.

Citing ms1searchpy

Ivanov et al. DirectMS1Quant: Ultrafast Quantitative Proteomics with MS/MS-Free Mass Spectrometry. https://pubs.acs.org/doi/10.1021/acs.analchem.2c02255

Ivanov et al. Boosting MS1-only Proteomics with Machine Learning Allows 2000 Protein Identifications in Single-Shot Human Proteome Analysis Using 5 min HPLC Gradient. https://doi.org/10.1021/acs.jproteome.0c00863

Ivanov et al. DirectMS1: MS/MS-free identification of 1000 proteins of cellular proteomes in 5 minutes. https://doi.org/10.1021/acs.analchem.9b05095

Installation

Using pip:

pip install ms1searchpy

It is recommended to additionally install DeepLC, either version 1.1.2 (official) or 1.1.2.2 (an unofficial fork with small changes). Newer versions currently have some issues.

pip install deeplc==1.1.2

pip install https://github.com/markmipt/DeepLC/archive/refs/heads/alternative_best_model.zip

This should work on recent versions of Python (3.8-3.10).

Usage tutorial: protein identification

The script used for protein identification is called ms1searchpy. It needs input files (mzML or tsv) and a FASTA database.

Input files

If mzML files are provided, ms1searchpy will invoke biosaur2 to generate the features table. You can also use other software such as Dinosaur or Biosaur, but biosaur2 is recommended. You can also build the table yourself; it must contain the columns 'massCalib', 'rtApex', 'charge' and 'nIsotopes'.
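If you build the features table yourself, a quick check of the header can catch a missing column before the search starts. The sketch below is not part of ms1searchpy; the function name `missing_feature_columns` is hypothetical, and it assumes the table is tab-separated with a single header row.

```python
import csv

# Column names taken from the requirement stated above.
REQUIRED_COLUMNS = ("massCalib", "rtApex", "charge", "nIsotopes")


def missing_feature_columns(path):
    """Return the required columns absent from a tab-separated features table."""
    with open(path, newline="") as handle:
        # Only the header row is needed; an empty file yields an empty header.
        header = next(csv.reader(handle, delimiter="\t"), [])
    return [name for name in REQUIRED_COLUMNS if name not in header]
```

An empty list means all four columns are present; extra columns are ignored.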

How to get mzML files

To get mzML from RAW files, you can use Proteowizard MSConvert...

msconvert path_to_file.raw -o path_to_output_folder --mzML --filter "peakPicking true 1-" --filter "MS2Deisotope" --filter "zeroSamples removeExtra" --filter "threshold absolute 1 most-intense"

...or compomics ThermoRawFileParser, which produces suitable files with default parameters.

RT predictor

For protein identification, ms1searchpy needs a retention time prediction model. The recommended one is DeepLC, but you can also use the built-in additive model (default).

Examples

ms1searchpy test.mzML -d sprot_human.fasta -deeplc 1 -ad 1

This command runs ms1searchpy with the DeepLC RT predictor available as deeplc (this should work if you installed DeepLC alongside ms1searchpy). -ad 1 creates a shuffled decoy database for FDR estimation. Use this option only once, then reuse the created database for subsequent searches.
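To illustrate what a shuffled decoy database is, here is a minimal sketch that appends one decoy per target protein by permuting its residues. It is an illustration of the general idea only, not the code ms1searchpy runs for -ad 1; the function names, the `DECOY_` prefix and the fixed seed are assumptions made for this example.

```python
import random


def shuffled_decoys(proteins, prefix="DECOY_", seed=0):
    """Yield (name, sequence) decoy entries with residues shuffled per protein.

    `proteins` is an iterable of (name, sequence) pairs. The prefix and seed
    are illustrative defaults, not values confirmed for ms1searchpy.
    """
    rng = random.Random(seed)
    for name, sequence in proteins:
        residues = list(sequence)
        rng.shuffle(residues)  # keeps length and amino acid composition
        yield prefix + name, "".join(residues)


def with_decoys(proteins):
    """Return the target entries followed by one shuffled decoy per target."""
    targets = list(proteins)
    return targets + list(shuffled_decoys(targets))
```

Because each decoy keeps the length and composition of its target, decoy matches give an estimate of how often random matches occur, which is what FDR estimation relies on.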

ms1searchpy test.features.tsv -d sprot_human_shuffled.fasta -deeplc 1

Here, instead of mzML file, a file with peptide features is used.

Output files

ms1searchpy produces several tables:

identified proteins, FDR-filtered (sample.features_proteins.tsv) - this is the main result;
all identified proteins (sample.features_proteins_full.tsv);
all identified proteins based on all PFMs (sample.features_proteins_full_noexclusion.tsv);
all matched peptide match fingerprints, or peptide-feature matches (sample.features_PFMs.tsv);
all PFMs with features prepared for machine learning (sample.features_PFMs_ML.tsv);
number of theoretical peptides per protein (sample.features_protsN.tsv);
log file with estimated mass and RT accuracies (sample.features_log.txt).
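When scripting around a search, it can help to derive the expected output names from the features table name. The sketch below uses only the suffixes listed above; the helper name `expected_outputs` is hypothetical, and it assumes the outputs are written next to the features table, which you should confirm for your setup.

```python
from pathlib import Path

# Suffixes copied from the list of output tables above.
OUTPUT_SUFFIXES = {
    "proteins": "_proteins.tsv",
    "proteins_full": "_proteins_full.tsv",
    "proteins_full_noexclusion": "_proteins_full_noexclusion.tsv",
    "pfms": "_PFMs.tsv",
    "pfms_ml": "_PFMs_ML.tsv",
    "protsN": "_protsN.tsv",
    "log": "_log.txt",
}


def expected_outputs(features_path):
    """Map a short label to the output path expected for a features table."""
    path = Path(features_path)
    name = path.name
    # "sample.features.tsv" -> "sample.features"
    base = name[: -len(".tsv")] if name.endswith(".tsv") else name
    return {
        label: path.with_name(base + suffix)
        for label, suffix in OUTPUT_SUFFIXES.items()
    }
```

For example, `expected_outputs("sample.features.tsv")["proteins"]` points at sample.features_proteins.tsv, the main result table.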

Combine results from replicates

You can combine the results from several replicate runs with ms1combine by feeding it _PFMs_ML.tsv tables:

ms1combine sample_rep_*.features_PFMs_ML.tsv

Using Group-specific FDR for metaproteomics

In metaproteomics, group-specific FDR should be used to accurately estimate the number of proteins identified in each group. Use the ms1groups command for that:

ms1groups F04.features_PFMs_ML.tsv -d F04_top15_shuffled.fasta -out group_statistics_by -fdr 5.0 -groups genus

It produces a table with the number of identified proteins for each group using group-specific FDR. This is essentially equivalent to running multiple DirectMS1 searches against small protein databases, each containing a single group, and then combining the results. However, running ms1groups on top of a preliminary DirectMS1 search solves two problems of that approach: insufficient statistics for the mass, RT and machine learning calibration procedures of the DirectMS1 workflow in sparsely populated groups, and long computation time.

Currently supported groups are 'species', 'genus', 'family', 'order', 'class', 'phylum', 'kingdom' and 'domain'. The groups are extracted automatically using the ete3 Python module and the NCBI Taxonomy database. The script also supports the groups dbname and OX: the former is the taxonomy suffix of the Swiss-Prot protein name (_HUMAN, _YEAST, etc.) and the latter is the taxonomy given by 'OX=' in the protein description in the FASTA file.
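The idea of group-specific FDR can be sketched with a plain target-decoy count done separately for each group. This is a simplified illustration, not the algorithm implemented in ms1groups: all function names are hypothetical, the score is assumed to be "higher is better", and FDR is approximated as decoys divided by targets. The two small helpers show how the dbname and OX groups could be read from a protein name and a FASTA description.

```python
import re
from collections import defaultdict


def dbname_group(protein_name):
    """Return the Swiss-Prot style suffix, e.g. 'HUMAN' from 'ALBU_HUMAN'."""
    return protein_name.rsplit("_", 1)[-1]


def ox_group(description):
    """Return the taxonomy id after 'OX=' in a FASTA description, or None."""
    match = re.search(r"\bOX=(\d+)", description)
    return match.group(1) if match else None


def count_targets_per_group(hits, fdr_percent):
    """Count target proteins per group that pass a target-decoy FDR limit.

    `hits` is an iterable of (score, is_decoy, group) tuples.
    """
    by_group = defaultdict(list)
    for score, is_decoy, group in hits:
        by_group[group].append((score, is_decoy))

    counts = {}
    for group, items in by_group.items():
        items.sort(key=lambda item: item[0], reverse=True)
        targets = decoys = best = 0
        for _, is_decoy in items:
            if is_decoy:
                decoys += 1
            else:
                targets += 1
            # Keep the longest ranked prefix whose decoy/target ratio is within the limit.
            if targets and 100.0 * decoys / targets <= fdr_percent:
                best = targets
        counts[group] = best
    return counts
```

Filtering each group on its own ranked list keeps a large, well-identified group from masking a high error rate in a small one.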

Usage tutorial: Quantitation

After obtaining the protein identification results, you can proceed to compare your samples using LFQ.

Using directms1quant

A new LFQ method designed specifically for DirectMS1 is invoked like this:

directms1quant -S1 sample1_r{1,2,3}.features_proteins_full.tsv -S2 sample2_r{1,2,3}.features_proteins_full.tsv

It produces a filtered table of significantly changed proteins with p-values and fold changes, the full protein table, and a separate file that simply lists the IDs of all significantly changed proteins (e.g. for easy copy-paste into a StringDB search window).
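The brace pattern in the command above is expanded by the shell into three replicate files per sample. If you launch the comparison from a script where no shell expansion happens, the argument list can be built explicitly. The sketch below uses only the -S1 and -S2 options shown above; the helper names are hypothetical.

```python
import shlex


def replicate_files(prefix, replicates=(1, 2, 3)):
    """Expand 'sample1' into the per-replicate _proteins_full.tsv names."""
    return [f"{prefix}_r{i}.features_proteins_full.tsv" for i in replicates]


def directms1quant_command(s1_files, s2_files):
    """Build the argument list for a two-sample directms1quant comparison."""
    if not s1_files or not s2_files:
        raise ValueError("both sample groups need at least one file")
    return ["directms1quant", "-S1", *s1_files, "-S2", *s2_files]


def as_shell(command):
    """Render the argument list as a single shell-quoted string for logging."""
    return shlex.join(command)
```

The resulting list can be passed to `subprocess.run(command, check=True)`, and `as_shell(command)` gives a line you can copy into a terminal.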

Multi-condition protein profiling using directms1quantmulti

For complex projects, you can perform quantitation with the directms1quantmulti script. The example below comes from our project on time-series profiling of a glioblastoma cell line under interferon treatment.

The script takes a tab-separated table (.tsv) with details for all project files. An example sample table is available in the examples folder. It should contain the following columns:

File Name - filename of raw file. For example, “QEHFX_JB_000379”.

group - sample group of file. In our example, there are K (Control group), IFN30 (treatment with 30 units/ml of interferon) and IFN1000 groups. The first group mentioned in the table will be used as control for pairwise directms1quant runs.

condition - sample subgroup of file. In our example, there are multiple time points after treatment, such as 0h, 30min, 1h, 2h, etc. By default, only the same conditions will be used for pairwise comparisons. For example, IFN30 0h vs K 0h; IFN1000 0h vs K 0h, etc.

vs - column for specific condition comparison. For example, in our case, we did not have control samples at the 30 min time point. Thus, we wanted directms1quant to compare IFN30 30 min vs K 0h and IFN1000 30 min vs K 0h. To do that, we put “0h” in the “vs” column for the 30 min IFN30 and IFN1000 files. See the example table for details.

replicate - column for replicate number of specific condition and sample group.

BatchMS - column for the mass spectrometry batch. This parameter is used for extra normalization within a batch.
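A sample table with these columns can be written with the standard csv module. This is a minimal sketch: the helper name is hypothetical, the second file name and the batch labels are invented for illustration, and leaving “vs” empty for rows that need no special comparison is an assumption you should check against the example table.

```python
import csv

# Column names as listed above.
SAMPLE_COLUMNS = ["File Name", "group", "condition", "vs", "replicate", "BatchMS"]


def write_samples_table(path, rows):
    """Write rows (dicts keyed by SAMPLE_COLUMNS) as a tab-separated table."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=SAMPLE_COLUMNS, delimiter="\t", restval=""
        )
        writer.writeheader()
        writer.writerows(rows)


# Illustrative rows only. The control group (K) comes first because the first
# group mentioned in the table is used as the control.
EXAMPLE_ROWS = [
    {"File Name": "QEHFX_JB_000379", "group": "K", "condition": "0h",
     "replicate": 1, "BatchMS": 1},
    {"File Name": "QEHFX_JB_000380", "group": "IFN30", "condition": "30min",
     "vs": "0h", "replicate": 1, "BatchMS": 1},
]
```

Calling `write_samples_table("samples.tsv", EXAMPLE_ROWS)` produces a two-row table in which the 30 min IFN30 file is compared against the 0h control.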

The script runs in four stages. With the “-start_stage” option you can restart it from a later stage without rerunning the earlier ones.

Stage 1 is a set of pairwise DirectMS1Quant runs for different interferon treatment conditions versus control samples.

Stage 2 is preparation of the peptide LFQ table for all files using the results obtained in the previous stage.

Stage 3 is preparation of the protein LFQ table. Only the peptides labeled by DirectMS1Quant as significantly different between samples in at least X pairwise comparisons are used for protein quantitation. The X parameter is controlled by the “min_signif_for_pept” option.

Stage 4 is preparation of LFQ profiling figures for the proteins listed in the file given by the “proteins_for_figure” option. The file should be a tsv table with a “dbname” column containing protein database names in the Swiss-Prot format. Any default directms1quant output table with differentially expressed proteins can be used here.
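The peptide filter described for Stage 3 can be pictured as a simple count over the pairwise comparisons. The sketch below is not the script's own code; the function name and the input layout (one list of booleans per peptide) are assumptions made for this example.

```python
def peptides_for_quantitation(significance, min_signif_for_pept):
    """Return peptides significant in at least `min_signif_for_pept` comparisons.

    `significance` maps a peptide to a list of booleans, one per pairwise
    comparison, where True means the peptide was labeled as significantly
    different in that comparison.
    """
    return [
        peptide
        for peptide, flags in significance.items()
        if sum(bool(flag) for flag in flags) >= min_signif_for_pept
    ]
```

With a threshold of 2, a peptide flagged in only one comparison is left out of protein quantitation, while one flagged in two or more is kept.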

Example of script usage:

directms1quantmulti -db ~/fasta_folder/sprot_human_shuffled.fasta -pdir ~/folder_with_ms1searchpy_results/ -samples ~/samples.csv -min_signif_for_pept 2 -out DQmulti_2024 -pep_min_non_missing_samples 0.75 -start_stage 1 -proteins_for_figure ~/custom_list_of_proteins.tsv -figdir ~/output_figure_folder/

Release history

2.8.9 (this version): Sep 6, 2024
2.8.5: May 27, 2024
2.8.4: May 27, 2024
2.8.3: May 24, 2024
2.8.2: May 14, 2024
2.7.3: Mar 29, 2024
2.6.6: Jan 15, 2024
2.6.4: Dec 26, 2023
2.6.3: Sep 13, 2023
2.6.2: Sep 7, 2023
2.5.4: Sep 4, 2023
2.5.3: Aug 10, 2023
2.5.2: Aug 10, 2023
2.4.2: Jun 23, 2023
2.4.1: Jun 15, 2023
2.3.21: May 31, 2023
2.3.20: Mar 16, 2023
2.3.13: Oct 13, 2022
2.3.12: Oct 13, 2022
2.3.10: Apr 27, 2022
2.3.7: Apr 8, 2022
2.3.6: Feb 1, 2022
2.3.5: Jan 28, 2022
2.3.4: Jan 28, 2022
2.3.3: Jan 26, 2022
2.3.2: Jan 20, 2022
2.3.1: Dec 29, 2021
2.3.0: Dec 17, 2021
2.2.9: Dec 13, 2021
2.2.8: Dec 13, 2021
2.2.7: Dec 6, 2021
2.2.6: Dec 6, 2021
2.1.7: Nov 8, 2021
2.1.5: Feb 1, 2021
2.1.4: Feb 1, 2021
2.1.0: Jan 15, 2021
2.0.9: Jan 12, 2021
2.0.6: Oct 26, 2020
2.0.5: Oct 26, 2020
2.0.4: Oct 21, 2020
2.0.3: Oct 19, 2020
2.0.2: Oct 19, 2020
1.2.1: Feb 4, 2020
1.1.8: Jan 28, 2020
1.1.4: Sep 7, 2019
1.1.3: Sep 2, 2019
1.1.2: Jun 6, 2019
1.1.1: Jun 6, 2019
1.1.0: Jun 6, 2019

Download files

Source Distributions

No source distribution files available for this release.

Built Distribution

ms1searchpy-2.8.9-py3-none-any.whl (52.4 kB)

Uploaded Sep 6, 2024 Python 3

File details

Details for the file ms1searchpy-2.8.9-py3-none-any.whl.

File metadata

Download URL: ms1searchpy-2.8.9-py3-none-any.whl
Upload date: Sep 6, 2024
Size: 52.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.10.9

File hashes

Hashes for ms1searchpy-2.8.9-py3-none-any.whl
Algorithm	Hash digest
SHA256	`101d3bd564425dda19c2aa457b24c5b0825347e65f9252b1a61e113648326372`
MD5	`3fcb59054221cce530dacad67dca44a7`
BLAKE2b-256	`7ccf2c7d5dcfef6112751d0148498cf25a65ed1652ea338841c52873cf1dd57a`
