
A proteomics search engine for LC-MS1 spectra.


ms1searchpy - a DirectMS1 proteomics search engine for LC-MS1 spectra

A .tsv (or mzML) file and a .fasta file are required for basic operation of the script. The .tsv file is a tab-separated text file with peptide features generated from an mzML file by Dinosaur (J. Teleman et al., "Dinosaur: A Refined Open-Source Peptide MS Feature Detector", JPR 2016) or Biosaur (https://github.com/abdrakhimov1/Biosaur). This file can also be generated by any other peak-picking software, provided it contains the columns 'massCalib', 'rtApex', 'charge' and 'nIsotopes'. For convenience, mzML files can be used directly, in which case the script runs an attached version of Dinosaur (an installed Java is required).
To make use of retention time, the user can install the ELUDE prediction algorithm and pass -elude path_to_elude_binary in the parameters. For the most efficient use of retention time, the user can install the DeepLC prediction algorithm and pass -deeplc path_to_deeplc_binary in the parameters.
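
When the feature file comes from third-party peak-picking software, it may help to check the header before starting a search. The following is a minimal sketch using only the Python standard library. The function name missing_feature_columns is hypothetical and is not part of ms1searchpy; only the four column names come from the description above.

```python
import csv
import io

# Column names required by the description above.
REQUIRED_COLUMNS = ("massCalib", "rtApex", "charge", "nIsotopes")


def missing_feature_columns(handle):
    """Return the required column names absent from a tab-separated header."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader, [])  # an empty file yields an empty header
    present = {name.strip() for name in header}
    return [name for name in REQUIRED_COLUMNS if name not in present]


# Tiny in-memory example; the numeric values are made up for illustration.
example = io.StringIO(
    "massCalib\trtApex\tcharge\tnIsotopes\n"
    "1234.5678\t12.3\t2\t4\n"
)
print(missing_feature_columns(example))  # [] when all columns are present
```

For a real file, open it with open(path, newline="") and pass the handle to the same function.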

The algorithm can be run with the following command:

ms1searchpy path_to_MZML -d path_to_fasta

OR

ms1searchpy path_to_peptideFeatures -d path_to_fasta
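
For scripted or batch use, the same command line can be assembled in Python. This is a sketch only: build_ms1searchpy_command is a hypothetical helper, and it uses just the -d and -deeplc options named in this description. It assumes ms1searchpy is installed and on the PATH when the command is actually run.

```python
import shlex


def build_ms1searchpy_command(input_path, fasta_path, deeplc_path=None):
    """Assemble the ms1searchpy argument list for an mzML or feature file."""
    command = ["ms1searchpy", str(input_path), "-d", str(fasta_path)]
    if deeplc_path is not None:
        # Optional DeepLC binary for retention time prediction.
        command += ["-deeplc", str(deeplc_path)]
    return command


command = build_ms1searchpy_command("path_to_MZML", "path_to_fasta")
print(shlex.join(command))
# To run the search for real: subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.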

The script output consists of the following files: all identified proteins (filename_proteins_full.tsv), filtered proteins (filename_proteins.tsv), all matched peptide match fingerprints (filename_PFMs.tsv), all matched peptide match fingerprints with features prepared for Machine Learning (filename_PFMs_ML.tsv), and a log file with estimated mass and RT accuracies (filename_log.txt).
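
After a run, a pipeline script may want to confirm that all output files exist. The sketch below derives the expected paths from a base name. The helpers expected_outputs and missing_outputs are hypothetical, and how ms1searchpy derives "filename" from the input file is an assumption here, so the base is passed in explicitly.

```python
from pathlib import Path

# Suffixes taken from the output list above.
OUTPUT_SUFFIXES = {
    "proteins_full": "_proteins_full.tsv",
    "proteins": "_proteins.tsv",
    "PFMs": "_PFMs.tsv",
    "PFMs_ML": "_PFMs_ML.tsv",
    "log": "_log.txt",
}


def expected_outputs(base):
    """Map each output kind to the path expected for the given base name."""
    base = Path(base)
    return {
        kind: base.with_name(base.name + suffix)
        for kind, suffix in OUTPUT_SUFFIXES.items()
    }


def missing_outputs(base):
    """List the output kinds whose files are not present on disk."""
    return [kind for kind, path in expected_outputs(base).items() if not path.exists()]


for kind, path in expected_outputs("/home/mark/test").items():
    print(kind, path)
```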

Citing ms1searchpy

Ivanov et al. DirectMS1: MS/MS-free identification of 1000 proteins of cellular proteomes in 5 minutes. https://doi.org/10.1021/acs.analchem.9b05095

Installation

Using pip:

pip install ms1searchpy

Example for full installation and usage:

Convert raw files to mzML:

msconvert path_to_file.raw -o path_to_output_folder --mzML --filter "peakPicking true 1-" --filter "MS2Deisotope" --filter "zeroSamples removeExtra" --filter "threshold absolute 1 most-intense"

There are two suggested ways to install ms1searchpy together with the external software (Diffacto, DeepLC) needed to get the maximum efficiency from it.

The first way, suggested for Linux users, is to use a Python virtual environment.

  1. “pip3 install virtualenv”
  2. “virtualenv3 --python=python3.6 /home/mark/env_ms1” . Comment: While ms1searchpy and Diffacto support all versions of Python 3.6+, DeepLC is stable only with Python 3.6. The name and path of the virtual environment are not limited to the example above.
  3. “source /home/mark/env_ms1/bin/activate” . Comment: to activate the virtual environment. You need to activate it every time you are going to work with ms1searchpy.
  4. “pip3 install ms1searchpy” . Comment: to install the latest ms1searchpy from PyPI.
  5. “pip3 install deeplc” . Comment: to install the latest DeepLC from PyPI.
  6. “pip3 install https://github.com/statisticalbiotechnology/diffacto/archive/master.zip” . Comment: to install the latest Diffacto from GitHub. Note that the current PyPI version of Diffacto is outdated and has a critical bug.
  7. “deactivate” . Comment: to deactivate the virtual environment.
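
The version notes in step 2 can be checked from inside the activated environment. This is a small sketch that encodes only the statements above (ms1searchpy and Diffacto on Python 3.6+, DeepLC stable on Python 3.6). The function tool_support is hypothetical and does not query the packages themselves.

```python
import sys


def tool_support(version):
    """Map each tool to whether the (major, minor) version fits the notes above."""
    major_minor = tuple(version[:2])
    return {
        "ms1searchpy": major_minor >= (3, 6),
        "Diffacto": major_minor >= (3, 6),
        "DeepLC": major_minor == (3, 6),  # described as stable only on 3.6
    }


# Report for the interpreter that is currently running.
print(tool_support(sys.version_info))
```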

Examples of using ms1searchpy from the virtual environment:

  1. “source /home/mark/env_ms1/bin/activate”
  2. “ms1searchpy /home/mark/test.mzML -d /home/mark/sprot_human.fasta -deeplc /home/mark/env_ms1/bin/deeplc -ad 1” . Comment: this command runs ms1searchpy with DeepLC RT prediction. The “-ad 1” option creates a shuffled decoy database for FDR estimation. Use it only once and then reuse the created database in other searches. Alternatively: “ms1searchpy /home/mark/test.features.tsv -d /home/mark/sprot_human_shuffled.fasta -deeplc /home/mark/env_ms1/bin/deeplc” . Comment: a file with peptide features can be used with ms1searchpy instead of the mzML file. ms1searchpy creates this file automatically the first time it processes the mzML file.
  3. “ms1todiffacto -dif /home/mark/env_ms1/bin/diffacto -S1 sample1_r1.proteins.tsv sample1_r2.proteins.tsv sample1_r3.proteins.tsv -S2 sample2_r1.proteins.tsv sample2_r2.proteins.tsv sample2_r3.proteins.tsv -norm median -out diffacto_output.tsv -min_samples 3” . Comment: the ms1todiffacto command prepares the input file for Diffacto from the ms1searchpy output and runs Diffacto automatically.
  4. “deactivate” . Comment: to finish work with ms1searchpy.
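
With several replicates per sample, the ms1todiffacto command in step 3 becomes long, so it can be convenient to generate it. The sketch below mirrors that example for two sample groups. The helper build_ms1todiffacto_command is hypothetical and uses only the options shown in step 3. The default values are copied from the example and are not claimed to be the tool's own defaults.

```python
def build_ms1todiffacto_command(diffacto_path, sample1_files, sample2_files,
                                out_path, norm="median", min_samples=3):
    """Assemble the ms1todiffacto argument list for two sample groups."""
    command = ["ms1todiffacto", "-dif", str(diffacto_path)]
    command += ["-S1", *map(str, sample1_files)]
    command += ["-S2", *map(str, sample2_files)]
    command += ["-norm", norm, "-out", str(out_path),
                "-min_samples", str(min_samples)]
    return command


# File names follow the pattern used in step 3.
sample1 = [f"sample1_r{i}.proteins.tsv" for i in (1, 2, 3)]
sample2 = [f"sample2_r{i}.proteins.tsv" for i in (1, 2, 3)]
command = build_ms1todiffacto_command(
    "/home/mark/env_ms1/bin/diffacto", sample1, sample2, "diffacto_output.tsv"
)
print(" ".join(command))
# To run it for real: subprocess.run(command, check=True)
```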

An alternative way to install and use ms1searchpy is with Docker. This method is suggested for Windows users because DeepLC is difficult to install and use under Windows.

  1. Install Docker. For details: https://docs.docker.com/docker-for-windows/install/
  2. Open a terminal, for example by pressing Win+R and typing “cmd”.
  3. “docker pull abdrakhimov1/ms1searchpy”
  4. “docker tag abdrakhimov1/ms1searchpy name_for_docker_container” . Comment: this step is optional and makes the Docker image more convenient to use.
  5. “docker run -it -v C:\Users\mark\data_folder:/data name_for_docker_container ms1searchpy data/test.mzML -d data/sprot_human.fasta -deeplc /deeplc/bin/deeplc -ad 1” . Comment: the command to run ms1searchpy using Docker is similar to the general ms1searchpy usage described in the virtual environment section. The main difference is that the command should always start with “docker run -it -v C:\Users\mark\data_folder:/data name_for_docker_container”. The path to data_folder lets Docker use data from the Windows system inside the container. Note that DeepLC is already installed in the Docker container, so the default path (/deeplc/bin/deeplc) should be used. Note also that the command contains two types of slashes, “/” and “\”.
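
The mix of “\” and “/” in step 5 is easy to get wrong, so the command can be assembled with path types that keep each side in its own convention. This is a sketch under assumptions: build_docker_command is a hypothetical helper, the image name and paths are the ones from the example above, and Docker must be installed for the command to run.

```python
from pathlib import PurePosixPath, PureWindowsPath


def build_docker_command(host_folder, image, mzml_name, fasta_name, add_decoy=False):
    """Assemble the docker run argument list used in step 5."""
    host = PureWindowsPath(host_folder)  # Windows side: backslashes
    inside = PurePosixPath("data")       # container side: forward slashes
    command = [
        "docker", "run", "-it",
        "-v", f"{host}:/data",           # mount the host folder at /data
        image,
        "ms1searchpy", str(inside / mzml_name),
        "-d", str(inside / fasta_name),
        "-deeplc", "/deeplc/bin/deeplc",  # DeepLC path inside the image
    ]
    if add_decoy:
        command += ["-ad", "1"]
    return command


command = build_docker_command(
    r"C:\Users\mark\data_folder", "name_for_docker_container",
    "test.mzML", "sprot_human.fasta", add_decoy=True,
)
print(" ".join(command))
```

The pure path classes only format strings and never touch the file system, so the sketch behaves the same on any operating system.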

Dependencies

  • pyteomics
  • numpy
  • scipy
  • sklearn
  • lightgbm
  • pandas
  • biosaur
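
To see which dependencies are importable in the current environment, a quick check with the standard library can be used. This is a sketch only: find_missing_modules is a hypothetical helper, and it assumes that each name in the list above is also the importable module name, which may not hold for every entry (biosaur in particular).

```python
import importlib.util


def find_missing_modules(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Names copied from the dependency list above (assumed to be module names).
DEPENDENCIES = ["pyteomics", "numpy", "scipy", "sklearn", "lightgbm", "pandas", "biosaur"]

missing = find_missing_modules(DEPENDENCIES)
print("missing:", missing)
```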
