A proteomics search engine for LC-MS1 spectra.

ms1searchpy - a DirectMS1 proteomics search engine for LC-MS1 spectra

A .tsv (or mzML) file and a .fasta file are required for basic operation of the script. The .tsv file is a tab-separated text file with peptide features generated by biosaur2 (https://github.com/markmipt/biosaur2) from an mzML file. It can also be generated by any other peak-picking software, provided it contains the columns 'massCalib', 'rtApex', 'charge' and 'nIsotopes'. For convenience, mzML files can be used directly, in which case the script runs biosaur2 itself.
To make efficient use of retention time, the user can install the ELUDE prediction algorithm and pass -elude path_to_elude_binary in the parameters. For the most efficient use of retention time, the user can install the DeepLC prediction algorithm and pass -deeplc path_to_deeplc_binary in the parameters.
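
The column requirement for a feature table can be checked before a search is started. The following is a minimal sketch and not part of ms1searchpy: the helper name missing_feature_columns is hypothetical, and only the four column names are taken from the description above.

```python
import csv
import io

# Column names that the description above says a feature table must contain.
REQUIRED_COLUMNS = ("massCalib", "rtApex", "charge", "nIsotopes")


def missing_feature_columns(handle):
    """Return the required columns that are absent from a tab-separated table.

    `handle` is any open text file object; only the header row is read.
    This helper is hypothetical and is not provided by ms1searchpy.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader, [])
    return [name for name in REQUIRED_COLUMNS if name not in header]


# A tiny in-memory table standing in for a real features file.
example = io.StringIO(
    "massCalib\trtApex\tcharge\tintensity\n"
    "1000.5\t12.3\t2\t1e6\n"
)
print(missing_feature_columns(example))
```

For a file on disk, the same function can be called as missing_feature_columns(open(path, newline="")); an empty list means all four columns are present.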

The algorithm can be run with the following command:

ms1searchpy path_to_MZML -d path_to_fasta

OR

ms1searchpy path_to_peptideFeatures -d path_to_fasta
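
When searches are launched from a Python script, the same command line can be assembled as an argument list. This is only a sketch under stated assumptions: the function build_ms1searchpy_command is hypothetical, and it uses only options that appear in this description (-d, -deeplc, -ad). It builds the command without running it.

```python
import shlex


def build_ms1searchpy_command(input_path, fasta_path, deeplc_path=None, add_decoy=False):
    """Assemble an ms1searchpy argument list (hypothetical helper).

    input_path  -- mzML file or peptide features .tsv file
    fasta_path  -- protein database passed with -d
    deeplc_path -- optional path to the DeepLC binary, passed with -deeplc
    add_decoy   -- if True, append "-ad 1" to create a shuffled decoy database
    """
    command = ["ms1searchpy", input_path, "-d", fasta_path]
    if deeplc_path is not None:
        command += ["-deeplc", deeplc_path]
    if add_decoy:
        command += ["-ad", "1"]
    return command


command = build_ms1searchpy_command("path_to_MZML", "path_to_fasta")
print(shlex.join(command))
# To actually start the search, the list could be passed to
# subprocess.run(command, check=True) in an environment where
# ms1searchpy is installed.
```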

The script output contains the following files: all identified proteins without the "one peptide - one protein" rule (filename_proteins_full_noexclusion.tsv), all identified proteins (filename_proteins_full.tsv), filtered proteins (filename_proteins.tsv), all matched peptide match fingerprints (filename_PFMs.tsv), all matched peptide match fingerprints with features prepared for Machine Learning (filename_PFMs_ML.tsv) and a log file with the estimated mass and RT accuracies (filename_log.txt).
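
For downstream scripts it can help to derive the output names from the input base name. The sketch below is an illustration only: expected_outputs is a hypothetical helper, and the suffixes are copied from the list above, so the exact separator used by a given ms1searchpy version is an assumption.

```python
# Output suffixes as listed in the description above.
OUTPUT_SUFFIXES = {
    "proteins without exclusion rule": "_proteins_full_noexclusion.tsv",
    "all proteins": "_proteins_full.tsv",
    "filtered proteins": "_proteins.tsv",
    "PFMs": "_PFMs.tsv",
    "PFMs with ML features": "_PFMs_ML.tsv",
    "log": "_log.txt",
}


def expected_outputs(base_name):
    """Map each output description to the file name expected for base_name.

    Hypothetical helper; base_name is the input file name without extension.
    """
    return {label: base_name + suffix for label, suffix in OUTPUT_SUFFIXES.items()}


for label, name in expected_outputs("filename").items():
    print(f"{label}: {name}")
```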

Citing ms1searchpy

Ivanov et al. Boosting MS1-only Proteomics with Machine Learning Allows 2000 Protein Identifications in Single-Shot Human Proteome Analysis Using 5 min HPLC Gradient. https://doi.org/10.1021/acs.jproteome.0c00863

Ivanov et al. DirectMS1: MS/MS-free identification of 1000 proteins of cellular proteomes in 5 minutes. https://doi.org/10.1021/acs.analchem.9b05095

Installation

Using pip:

pip install ms1searchpy

Example for full installation and usage:

Convert raw files to mzML:

msconvert path_to_file.raw -o path_to_output_folder --mzML --filter "peakPicking true 1-" --filter "MS2Deisotope" --filter "zeroSamples removeExtra" --filter "threshold absolute 1 most-intense"

There are two suggested ways to install ms1searchpy with all external software (Diffacto, DeepLC) to get the maximum efficiency from ms1searchpy.

The first way, suggested for Linux users, is to use a Python virtual environment.

  1. “pip3 install virtualenv”
  2. “virtualenv3 --python=python3.8 /home/mark/env_ms1” . Comment: while ms1searchpy and Diffacto support both Python 3.8 and Python 3.9, DeepLC works stably only with Python 3.8. The name and path of the virtual environment are not limited to the example above.
  3. “source /home/mark/env_ms1/bin/activate” . Comment: to activate the virtual environment. You need to activate it every time you work with ms1searchpy.
  4. “pip3 install ms1searchpy” . Comment: to install the latest ms1searchpy from PyPI.
  5. “pip3 install deeplc” . Comment: to install the latest DeepLC from PyPI.
  6. “pip3 install https://github.com/statisticalbiotechnology/diffacto/archive/master.zip” . Comment: to install the latest Diffacto from GitHub. Note that the current PyPI version of Diffacto is outdated and has a critical bug.
  7. “deactivate” . Comment: to deactivate virtual environment.

Examples of using ms1searchpy from the virtual environment:

  1. “source /home/mark/env_ms1/bin/activate”
  2. “ms1searchpy /home/mark/test.mzML -d /home/mark/sprot_human.fasta -deeplc /home/mark/env_ms1/bin/deeplc -ad 1” . Comment: this command runs ms1searchpy with DeepLC RT prediction. The “-ad 1” option creates a shuffled decoy database for FDR estimation. Use it only once and then reuse the created database in other searches. Alternatively: “ms1searchpy /home/mark/test.features.tsv -d /home/mark/sprot_human_shuffled.fasta -deeplc /home/mark/env_ms1/bin/deeplc” . Comment: instead of an mzML file, a file with peptide features can be passed to ms1searchpy. This file is created automatically by ms1searchpy the first time the mzML file is processed.
  3. “ms1todiffacto -dif /home/mark/env_ms1/bin/diffacto -S1 sample1_r1.proteins.tsv sample1_r2.proteins.tsv sample1_r3.proteins.tsv -S2 sample2_r1.proteins.tsv sample2_r2.proteins.tsv sample2_r3.proteins.tsv -norm median -out diffacto_output.tsv -min_samples 3” . Comment: the ms1todiffacto command prepares an input file for Diffacto from the ms1searchpy output and runs Diffacto automatically.
  4. “deactivate” . Comment: to finish working with ms1searchpy.
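
The ms1todiffacto call in step 3 groups replicate files by sample. A minimal sketch of assembling that command for two samples is given below. The function name is hypothetical, and only the options shown in step 3 (-dif, -S1, -S2, -norm, -out, -min_samples) are used.

```python
def build_ms1todiffacto_command(diffacto_path, sample1_files, sample2_files,
                                out_path, norm="median", min_samples=None):
    """Assemble an ms1todiffacto argument list for two samples (hypothetical helper).

    sample1_files / sample2_files -- lists of ms1searchpy protein tables,
    one per replicate, passed after -S1 and -S2 respectively.
    """
    command = ["ms1todiffacto", "-dif", diffacto_path]
    command += ["-S1", *sample1_files]
    command += ["-S2", *sample2_files]
    command += ["-norm", norm, "-out", out_path]
    if min_samples is not None:
        command += ["-min_samples", str(min_samples)]
    return command


# File names follow the example in step 3.
sample1 = [f"sample1_r{i}.proteins.tsv" for i in (1, 2, 3)]
sample2 = [f"sample2_r{i}.proteins.tsv" for i in (1, 2, 3)]
command = build_ms1todiffacto_command(
    "/home/mark/env_ms1/bin/diffacto", sample1, sample2,
    out_path="diffacto_output.tsv", min_samples=3,
)
print(" ".join(command))
```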

An alternative way to install and use ms1searchpy is with docker. This method is suggested for Windows users because DeepLC is difficult to install and use under Windows.

  1. Install docker. For details: https://docs.docker.com/docker-for-windows/install/
  2. Open a terminal. This can be done by pressing the Win+R key combination and typing “cmd”.
  3. “docker pull abdrakhimov1/ms1searchpy”
  4. “docker tag abdrakhimov1/ms1searchpy name_for_docker_container” . Comment: this step is optional and only makes the docker image more convenient to use.
  5. “docker run -it -v C:\Users\mark\data_folder:/data name_for_docker_container ms1searchpy data/test.mzML -d data/sprot_human.fasta -deeplc /deeplc/bin/deeplc -ad 1” . Comment: running ms1searchpy with docker is similar to the general usage described in the virtualenv section. The main difference is that the command must always start with “docker run -it -v C:\Users\mark\data_folder:/data name_for_docker_container”. The path to data_folder lets docker use data from the Windows system inside the docker container. Note that DeepLC is already installed in the docker container and the default path (/deeplc/bin/deeplc) should be used. Note also that the command contains both types of slashes, “/” and “\”.
  6. “docker run -it -v C:\Users\mark\data_folder:/data name_for_docker_container ms1todiffacto -dif diffacto -S1 data/test_s1_r1.proteins.tsv data/test_s1_r2.proteins.tsv data/test_s1_r3.proteins.tsv -S2 data/test_s2_r1.proteins.tsv data/test_s2_r2.proteins.tsv data/test_s2_r3.proteins.tsv -norm median” . Comment: example of ms1todiffacto usage with docker. Note that Diffacto is already installed in the docker container and the default path (diffacto) should be used as the -dif argument.
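
The mix of Windows and container paths in steps 5 and 6 is easy to get wrong, so a small sketch of the mapping may help. Both helpers below are hypothetical and not part of ms1searchpy. They assume the host folder is mounted at /data as in the examples above, and they produce absolute container paths (/data/...) rather than the relative data/... form used in the steps.

```python
from pathlib import PurePosixPath, PureWindowsPath


def to_container_path(host_file, host_folder, mount_point="/data"):
    """Translate a Windows file path inside host_folder to its container path."""
    relative = PureWindowsPath(host_file).relative_to(PureWindowsPath(host_folder))
    return str(PurePosixPath(mount_point).joinpath(*relative.parts))


def wrap_in_docker(command, host_folder, image, mount_point="/data"):
    """Prefix a command list with the 'docker run -it -v ...' part from step 5."""
    volume = f"{host_folder}:{mount_point}"
    return ["docker", "run", "-it", "-v", volume, image, *command]


host_folder = r"C:\Users\mark\data_folder"
mzml = to_container_path(r"C:\Users\mark\data_folder\test.mzML", host_folder)
fasta = to_container_path(r"C:\Users\mark\data_folder\sprot_human.fasta", host_folder)

search = ["ms1searchpy", mzml, "-d", fasta, "-deeplc", "/deeplc/bin/deeplc", "-ad", "1"]
print(" ".join(wrap_in_docker(search, host_folder, "name_for_docker_container")))
```

PureWindowsPath and PurePosixPath only manipulate path strings, so the sketch behaves the same on Windows and Linux.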

Built Distribution

ms1searchpy-2.2.7-py3-none-any.whl (96.5 kB)
