A proteomics search engine for LC-MS1 spectra.

ms1searchpy - a DirectMS1 proteomics search engine for LC-MS1 spectra

A .tsv (or mzML) file and a .fasta file are required for basic operation of the script. The .tsv file is a tab-separated text file with peptide features generated from an mzML file by Dinosaur (J. Teleman et al., "Dinosaur: A Refined Open-Source Peptide MS Feature Detector", JPR 2016) or Biosaur (https://github.com/abdrakhimov1/Biosaur). It can also be generated by any other peak-picking software, provided it contains the columns 'massCalib', 'rtApex', 'charge' and 'nIsotopes'. For convenience, mzML files can be used directly, in which case the script runs a bundled version of Dinosaur (an installed Java is required).
To make efficient use of retention time, the user can install the ELUDE prediction algorithm and pass -elude path_to_elude_binary in the parameters. For the most efficient use of retention time, the user can install the DeepLC prediction algorithm and pass -deeplc path_to_deeplc_binary in the parameters.
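
When the feature table comes from third-party peak-picking software, it may help to check the header before starting a search. The following is a minimal sketch using only the Python standard library; the function name and the example table are hypothetical, and only the four column names come from the description above.

```python
import csv
import io

# Column names that the description above lists as required.
REQUIRED_COLUMNS = ("massCalib", "rtApex", "charge", "nIsotopes")


def missing_feature_columns(handle):
    """Return the required columns absent from a tab-separated feature table."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader, [])  # empty file -> empty header
    present = {name.strip() for name in header}
    return [name for name in REQUIRED_COLUMNS if name not in present]


# Hypothetical feature table used only for demonstration (no 'nIsotopes' column).
example = io.StringIO(
    "massCalib\trtApex\tcharge\tintensityApex\n"
    "1045.5321\t12.4\t2\t150000\n"
)
missing = missing_feature_columns(example)
print(missing)
```

For a real file, the same function can be called on `open(path, newline="")`; an empty result means all four required columns are present.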

The algorithm can be run with the following command:

ms1searchpy path_to_MZML -d path_to_fasta

OR

ms1searchpy path_to_peptideFeatures -d path_to_fasta
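
For scripted runs over many files, the command line above can be assembled in Python. This is only a sketch: the helper `build_search_command` is hypothetical, and it uses just the flags mentioned in this description (-d, -elude, -deeplc). It builds the argument list without executing anything.

```python
import shlex


def build_search_command(input_path, fasta_path, deeplc_path=None, elude_path=None):
    """Assemble the ms1searchpy argument list; nothing is executed here."""
    command = ["ms1searchpy", input_path, "-d", fasta_path]
    if deeplc_path is not None:
        command += ["-deeplc", deeplc_path]  # optional DeepLC RT prediction
    if elude_path is not None:
        command += ["-elude", elude_path]  # optional ELUDE RT prediction
    return command


cmd = build_search_command("path_to_MZML", "path_to_fasta")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once ms1searchpy is installed and on the PATH.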

The script produces the following output files: all identified proteins (filename_proteins_full.tsv), filtered proteins (filename_proteins.tsv), all matched peptide match fingerprints (filename_PFMs.tsv), all matched peptide match fingerprints with features prepared for machine learning (filename_PFMs_ML.tsv) and a log file with the estimated mass and RT accuracies (filename_log.txt).
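
After a batch of searches it can be useful to confirm that every expected output was written. The sketch below assumes that "filename" in the list above is a common prefix chosen per run; how ms1searchpy derives that prefix from the input file is not stated here, so the caller supplies it. The helper names are hypothetical.

```python
from pathlib import Path

# Suffixes taken from the output list above.
OUTPUT_SUFFIXES = (
    "_proteins_full.tsv",
    "_proteins.tsv",
    "_PFMs.tsv",
    "_PFMs_ML.tsv",
    "_log.txt",
)


def expected_outputs(prefix):
    """Return the output paths expected for a given prefix (an assumption)."""
    prefix = Path(prefix)
    return [prefix.with_name(prefix.name + suffix) for suffix in OUTPUT_SUFFIXES]


def missing_outputs(prefix):
    """Return the expected output paths that do not exist on disk."""
    return [path for path in expected_outputs(prefix) if not path.exists()]


names = [path.name for path in expected_outputs("results/test")]
print(names)
```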

Citing ms1searchpy

Ivanov et al. DirectMS1: MS/MS-free identification of 1000 proteins of cellular proteomes in 5 minutes. https://doi.org/10.1021/acs.analchem.9b05095

Installation

Using pip:

pip install ms1searchpy

Example for full installation and usage:

Convert raw files to mzML:

msconvert path_to_file.raw -o path_to_output_folder --mzML --filter "peakPicking true 1-" --filter "MS2Deisotope" --filter "zeroSamples removeExtra" --filter "threshold absolute 1 most-intense"

There are two suggested ways to install ms1searchpy together with the external software (Diffacto, DeepLC) needed to get the maximum efficiency from it.

The first way, suggested for Linux users, is to use a Python virtual environment.

  1. “pip3 install virtualenv”
  2. “virtualenv3 --python=python3.6 /home/mark/env_ms1” . Comment: While ms1searchpy and Diffacto support all versions of Python 3.6+, DeepLC is stable only with Python 3.6. The name and path of the virtual environment are not limited to the example above.
  3. “source /home/mark/env_ms1/bin/activate” . Comment: to activate the virtual environment. You need to activate it every time you are going to work with ms1searchpy.
  4. “pip3 install ms1searchpy” . Comment: to install the latest ms1searchpy from PyPI.
  5. “pip3 install deeplc” . Comment: to install the latest DeepLC from PyPI.
  6. “pip3 install https://github.com/statisticalbiotechnology/diffacto/archive/master.zip” . Comment: to install the latest Diffacto from GitHub. Note that the current PyPI version of Diffacto is outdated and has a critical bug.
  7. “deactivate” . Comment: to deactivate the virtual environment.

Examples of using ms1searchpy from the virtual environment:

  1. “source /home/mark/env_ms1/bin/activate”
  2. “ms1searchpy /home/mark/test.mzML -d /home/mark/sprot_human.fasta -deeplc /home/mark/env_ms1/bin/deeplc -ad 1” . Comment: this command runs ms1searchpy with DeepLC RT prediction. The “-ad 1” option creates a shuffled decoy database for FDR estimation. Use it only once, then use the created database for all other searches. Alternatively: “ms1searchpy /home/mark/test.features.tsv -d /home/mark/sprot_human_shuffled.fasta -deeplc /home/mark/env_ms1/bin/deeplc” . Comment: instead of an mzML file, a file with peptide features can be given to ms1searchpy. This file is created automatically by ms1searchpy after the first processing of the mzML file.
  3. “ms1todiffacto -dif /home/mark/env_ms1/bin/diffacto -S1 sample1_r1.proteins.tsv sample1_r2.proteins.tsv sample1_r3.proteins.tsv -S2 sample2_r1.proteins.tsv sample2_r2.proteins.tsv sample2_r3.proteins.tsv -norm median -out diffacto_output.tsv -min_samples 3” . Comment: the ms1todiffacto command prepares an input file for Diffacto from the ms1searchpy output and runs Diffacto automatically.
  4. “deactivate” . Comment: to finish working with ms1searchpy.
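
The ms1todiffacto call in step 3 becomes long when each sample has several replicates, so it may be easier to build it from lists of files. The sketch below only reproduces the flags shown in that step (-dif, -S1, -S2, -norm, -out, -min_samples); the helper name and its defaults are hypothetical, and it handles exactly two sample groups, as in the example.

```python
def build_diffacto_command(diffacto_path, group1, group2, out_path,
                           norm="median", min_samples=3):
    """Assemble the ms1todiffacto argument list for two sample groups."""
    if not group1 or not group2:
        raise ValueError("each sample group needs at least one proteins file")
    command = ["ms1todiffacto", "-dif", diffacto_path]
    command += ["-S1", *group1]  # replicates of the first sample
    command += ["-S2", *group2]  # replicates of the second sample
    command += ["-norm", norm, "-out", out_path,
                "-min_samples", str(min_samples)]
    return command


sample1 = [f"sample1_r{i}.proteins.tsv" for i in (1, 2, 3)]
sample2 = [f"sample2_r{i}.proteins.tsv" for i in (1, 2, 3)]
cmd = build_diffacto_command("/home/mark/env_ms1/bin/diffacto",
                             sample1, sample2, "diffacto_output.tsv")
print(" ".join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` from within the activated virtual environment.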

An alternative way to install and use ms1searchpy is with Docker. This method is suggested for Windows users because of the multiple difficulties of installing and using DeepLC under Windows.

  1. Install Docker. For details: https://docs.docker.com/docker-for-windows/install/
  2. Open a terminal. This can be done by pressing the Win+R key combination and typing “cmd”.
  3. “docker pull abdrakhimov1/ms1searchpy”
  4. “docker tag abdrakhimov1/ms1searchpy name_for_docker_container” . Comment: this step is optional and makes the Docker image more convenient to use.
  5. “docker run -it -v C:\Users\mark\data_folder:/data name_for_docker_container ms1searchpy data/test.mzML -d data/sprot_human.fasta -deeplc /deeplc/bin/deeplc -ad 1” . Comment: the command to run ms1searchpy using Docker is similar to the general ms1searchpy usage described in the virtualenv section. The main difference is that the command should always start with “docker run -it -v C:\Users\mark\data_folder:/data name_for_docker_container”. The path to data_folder lets Docker use data from the Windows system inside the container. Note that DeepLC is already installed in the Docker container and the default path (/deeplc/bin/deeplc) should be used. Note also that the command contains two types of slashes, “/” and “\”.
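
Because the Docker command mixes a Windows host path with paths inside the container, it is easy to mistype. The sketch below assembles the same command as step 5 from its parts; the helper name is hypothetical, and the image name, mount point and ms1searchpy arguments are copied from the step above.

```python
def build_docker_command(host_folder, image, search_args, mount_point="/data"):
    """Prefix ms1searchpy arguments with the 'docker run' part of the command."""
    volume = f"{host_folder}:{mount_point}"  # host path : container path
    return ["docker", "run", "-it", "-v", volume, image,
            "ms1searchpy", *search_args]


cmd = build_docker_command(
    r"C:\Users\mark\data_folder",  # Windows path, backslashes
    "name_for_docker_container",
    ["data/test.mzML", "-d", "data/sprot_human.fasta",  # container paths
     "-deeplc", "/deeplc/bin/deeplc", "-ad", "1"],
)
print(" ".join(cmd))
```

Keeping the host folder as a raw string (`r"..."`) avoids accidental escape sequences in the backslash-separated Windows path.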

Dependencies

  • pyteomics
  • numpy
  • scipy
  • sklearn
  • lightgbm
  • pandas
  • biosaur
