
Proteomics post-search algorithm


Scavager - a proteomics post-search validation tool

A pepXML or mzIdentML file is required for basic operation of the script. Currently supported search engines: IdentiPy, X!Tandem, Comet, MSFragger, MSGF+, Morpheus.

A FASTA file is required for calculating NSAF (a label-free quantitation index), protein sequence coverage, and amino acid statistics.
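As an illustration of why protein sequences are needed, here is a minimal sketch of the standard NSAF formula: each protein's spectral count is divided by its length, and the result is normalized so that all values sum to 1. The function name and the input dictionaries are hypothetical, and Scavager's own implementation may differ in detail.

```python
def nsaf(spectral_counts, lengths):
    """Return NSAF per protein: (SpC / L) divided by the sum of SpC / L over all proteins.

    spectral_counts: dict mapping protein ID to number of identified PSMs (hypothetical input).
    lengths: dict mapping protein ID to sequence length, e.g. taken from a FASTA file.
    """
    # Spectral abundance factor: spectral count normalized by protein length.
    saf = {protein: spectral_counts[protein] / lengths[protein]
           for protein in spectral_counts}
    total = sum(saf.values())
    if total == 0:
        return {protein: 0.0 for protein in saf}
    # Normalize so that the values of all proteins sum to 1.
    return {protein: value / total for protein, value in saf.items()}


# Toy data, not real results: two proteins with the same count-to-length ratio.
example_nsaf = nsaf({"P1": 10, "P2": 5}, {"P1": 100, "P2": 50})
```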

For the MSGF+ and Morpheus search engines, it is advisable to provide the cleavage rules used in the search, because these engines do not report the number of missed cleavages for peptides.
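To show how a cleavage rule lets the number of missed cleavages be recovered from a peptide sequence alone, here is a minimal sketch. It assumes a simplified trypsin rule (cleavage after K or R unless followed by P) written as a regular expression; the function and constant names are hypothetical and this is not Scavager's own code.

```python
import re

# Assumption: simplified trypsin rule, cleave after K or R unless the next residue is P.
TRYPSIN_RULE = r"[KR](?!P)"


def count_missed_cleavages(peptide, rule=TRYPSIN_RULE):
    """Count cleavage sites inside the peptide, ignoring a site at its C-terminus."""
    return sum(
        1
        for match in re.finditer(rule, peptide)
        if match.end() < len(peptide)  # a site at the last residue is not a missed cleavage
    )
```

For example, under this rule PEPTIDEKAAR contains one internal site (the K), while AKPR contains none because the K is followed by P.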

The output of Scavager contains:

  • tab-separated table with unfiltered peptide-spectrum matches (ends with _PSMs_full.tsv)
  • tab-separated table with identified peptide-spectrum matches at 1% PSM FDR (ends with _PSMs.tsv)
  • tab-separated table with identified peptides at 1% peptide FDR (ends with _peptides.tsv)
  • tab-separated table with identified proteins without grouping at 1% protein FDR (ends with _proteins.tsv)
  • tab-separated table with identified protein groups at 1% protein FDR (ends with _protein_groups.tsv)
  • PNG figure with PSM, peptide and protein feature distributions
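The file name endings listed above can be used to collect the output tables from a results directory. The sketch below relies only on those endings; the function name, the dictionary keys, and the assumption that all outputs sit in a single directory are illustrative.

```python
from pathlib import Path

# File name endings as listed in the output description above.
OUTPUT_SUFFIXES = {
    "psms_full": "_PSMs_full.tsv",
    "psms": "_PSMs.tsv",
    "peptides": "_peptides.tsv",
    "proteins": "_proteins.tsv",
    "protein_groups": "_protein_groups.tsv",
}


def find_scavager_tables(directory):
    """Group files in `directory` by the kind of output table they appear to be."""
    found = {kind: [] for kind in OUTPUT_SUFFIXES}
    for path in sorted(Path(directory).iterdir()):
        for kind, suffix in OUTPUT_SUFFIXES.items():
            if path.name.endswith(suffix):
                found[kind].append(path)
    return found
```

The returned paths can then be opened with any reader for tab-separated files.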

Citing Scavager

Ivanov et al. Scavager: A Versatile Postsearch Validation Algorithm for Shotgun Proteomics Based on Gradient Boosting. doi: 10.1002/pmic.201800280

Installation

Using pip:

pip install Scavager

Usage

The algorithm can be run with the following command (works with Python 2.7 and Python 3+):

scavager path_to_pepXML/MZID

To list all available options:

scavager -h
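When Scavager is called from a Python pipeline, the command line can be assembled and launched with the standard library. This is a minimal sketch: the helper names are hypothetical, and the only command-line elements used are the positional input file and the -ms1 option described in the DirectMS1 section of this document.

```python
import shutil
import subprocess


def build_scavager_command(input_file, ms1_table=None):
    """Return the argument list for a scavager call (hypothetical helper)."""
    command = ["scavager", str(input_file)]
    if ms1_table is not None:
        # Optional DirectMS1 protein table, passed with the documented -ms1 option.
        command += ["-ms1", str(ms1_table)]
    return command


def run_scavager(input_file, ms1_table=None):
    """Run scavager as a subprocess and raise an error if it fails."""
    if shutil.which("scavager") is None:
        raise RuntimeError("scavager executable not found on PATH")
    return subprocess.run(build_scavager_command(input_file, ms1_table), check=True)
```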

Protein grouping using DirectMS1 results

Protein groups can be generated using the parsimony principle combined with information from MS1 spectra:

scavager path_to_pepXML/MZID -ms1 path_to_DirectMS1_proteins_full_noexclusion.tsv

Details on combining the parsimony principle with MS1 information are available at: https://github.com/markmipt/protein_inference_using_DirectMS1
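To give a rough idea of what parsimony-based grouping with an additional score can look like, here is a minimal greedy sketch. It is an assumption for illustration only and is not the algorithm from the linked repository: at each step it picks the protein that explains the most still-unexplained peptides and breaks ties by a higher MS1 score, then by name. The function name and input structures are hypothetical.

```python
def parsimony_leaders(protein_peptides, ms1_scores=None):
    """Greedy parsimony: return proteins that together explain all peptides.

    protein_peptides: dict mapping protein ID to a set of identified peptides.
    ms1_scores: optional dict mapping protein ID to a score (higher is better);
                used here only as a tie-breaker, which is an assumption.
    """
    ms1_scores = ms1_scores or {}
    uncovered = set().union(*protein_peptides.values())
    leaders = []
    while uncovered:
        best = min(
            protein_peptides,
            key=lambda p: (
                -len(protein_peptides[p] & uncovered),  # most new peptides first
                -ms1_scores.get(p, 0.0),                # then higher MS1 score
                p,                                      # then alphabetical order
            ),
        )
        leaders.append(best)
        uncovered -= protein_peptides[best]
    return leaders
```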

Protein grouping for indistinguishable proteins

By default, when multiple proteins have the same set of peptides, Scavager chooses the protein group leader by alphabetical order. However, the group leader can be chosen randomly instead by using the "-sr" option. The same option can be used with MS1 spectra information when multiple proteins have both the same set of MS/MS identifications and the same DirectMS1 scores.
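The two leader-selection strategies described above can be sketched as follows. The function name and its parameters are hypothetical; the code only illustrates the difference between the alphabetical default and a random choice, and is not taken from Scavager.

```python
import random


def choose_group_leader(proteins, randomize=False, rng=None):
    """Pick a leader among indistinguishable proteins.

    randomize=False mimics the alphabetical default; randomize=True mimics
    the random choice. `rng` may be a random.Random instance for reproducibility.
    """
    candidates = sorted(proteins)
    if not candidates:
        raise ValueError("empty protein group")
    if randomize:
        return (rng or random).choice(candidates)
    return candidates[0]
```

Passing a seeded random.Random instance as rng makes the random choice repeatable between runs of this sketch.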
