Proteomics post-search algorithm

Scavager - a proteomics post-search validation tool

Scavager requires pepXML or mzIdentML files as input. Currently supported search engines: IdentiPy, X!Tandem, Comet, MSFragger, MSGF+, Morpheus.

A FASTA file is required to calculate NSAF (a label-free quantitation index), protein sequence coverage, and amino acid statistics.
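For reference, NSAF (normalized spectral abundance factor) is commonly defined as a protein's spectral count divided by its sequence length, normalized by the sum of that ratio over all identified proteins. This is why protein lengths from the FASTA file are needed. The sketch below illustrates that standard definition only. The function name and the input dictionaries are hypothetical, and Scavager's own implementation may differ in detail.

```python
def nsaf(spectral_counts, lengths):
    """Return NSAF per protein.

    spectral_counts: dict of protein id -> number of identified spectra (hypothetical input)
    lengths: dict of protein id -> sequence length in residues, e.g. taken from a FASTA file
    """
    # Spectral abundance factor: spectral count divided by protein length
    saf = {protein: spectral_counts[protein] / lengths[protein]
           for protein in spectral_counts}
    total = sum(saf.values())
    if total == 0:
        return {protein: 0.0 for protein in saf}
    # Normalize so that the values sum to 1 across all proteins
    return {protein: value / total for protein, value in saf.items()}


counts = {"P1": 10, "P2": 5}
lengths = {"P1": 100, "P2": 100}
values = nsaf(counts, lengths)
```

With equal lengths, a protein with twice the spectral count gets twice the NSAF value.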

For the MSGF+ and Morpheus search engines, it is advisable to provide the cleavage rules used in the search, because these search engines do not report the number of missed cleavages for peptides.
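To illustrate why a cleavage rule is enough to recover this number, here is a minimal sketch that counts missed cleavages in a peptide sequence from a regular expression. The trypsin rule used here (cleave after K or R, but not before P) and the function name are assumptions made for the example; this is not Scavager's code and does not show how the rule is passed to Scavager.

```python
import re

# Assumed trypsin-like rule: cleavage after K or R unless the next residue is P
TRYPSIN_RULE = r"[KR](?!P)"


def count_missed_cleavages(peptide, rule=TRYPSIN_RULE):
    """Count cleavage sites inside the peptide that the enzyme did not cut."""
    # A site at the peptide C-terminus is where the peptide was actually cut,
    # so only sites that end before the last residue count as missed.
    return sum(1 for match in re.finditer(rule, peptide)
               if match.end() < len(peptide))
```

For example, `count_missed_cleavages("PEPKTIDER")` counts the internal K but not the C-terminal R.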

The output of Scavager contains:

  • tab-separated table with unfiltered peptide-spectrum matches (ends with _PSMs_full.tsv)

  • tab-separated table with identified peptide-spectrum matches at 1% PSM FDR (ends with _PSMs.tsv)

  • tab-separated table with identified peptides at 1% peptide FDR (ends with _peptides.tsv)

  • tab-separated table with identified proteins without grouping at 1% protein FDR (ends with _proteins.tsv)

  • tab-separated table with identified protein groups at 1% protein FDR (ends with _protein_groups.tsv)

  • PNG figure with PSM, peptide, and protein feature distributions
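The file name endings listed above can be used to pick out the tables in a results folder. The following is a small sketch using only the Python standard library. The helper names are hypothetical, and it assumes the output tables sit together in one directory.

```python
from pathlib import Path

# File name endings as listed above, mapped to a short description
OUTPUT_SUFFIXES = {
    "_PSMs_full.tsv": "unfiltered PSMs",
    "_PSMs.tsv": "PSMs at 1% PSM FDR",
    "_peptides.tsv": "peptides at 1% peptide FDR",
    "_proteins.tsv": "proteins at 1% protein FDR (no grouping)",
    "_protein_groups.tsv": "protein groups at 1% protein FDR",
}


def classify_output(filename):
    """Return a description of a Scavager output table, or None if the name does not match."""
    name = Path(filename).name
    for suffix, description in OUTPUT_SUFFIXES.items():
        if name.endswith(suffix):
            return description
    return None


def find_outputs(directory):
    """Map each recognized .tsv file in the directory to its description."""
    found = {}
    for path in sorted(Path(directory).glob("*.tsv")):
        description = classify_output(path)
        if description is not None:
            found[str(path)] = description
    return found
```

The recognized tables are tab-separated, so they can then be read with any TSV reader, for example `csv.reader(handle, delimiter="\t")`.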

Citing Scavager

Ivanov et al. Scavager: A Versatile Postsearch Validation Algorithm for Shotgun Proteomics Based on Gradient Boosting. doi: 10.1002/pmic.201800280

Installation

Using pip:

pip install Scavager

Usage

The algorithm can be run with the following command (works with Python 2.7 and Python 3):

scavager path_to_pepXML/MZID

To list the available options, run:

scavager -h
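When several search results need processing, the command can also be driven from a Python script. This is a minimal sketch, assuming the `scavager` executable is on the PATH after installation. The helper names are hypothetical, and only the single-file invocation shown above is used.

```python
import shutil
import subprocess


def build_command(input_path):
    """Build the argument list for one Scavager run on a pepXML or mzIdentML file."""
    return ["scavager", str(input_path)]


def run_scavager(input_path):
    """Run Scavager on one file and raise an error if the run fails."""
    if shutil.which("scavager") is None:
        raise RuntimeError("scavager executable not found; try 'pip install Scavager'")
    return subprocess.run(build_command(input_path), check=True)
```

Calling `run_scavager` in a loop over input paths processes the files one at a time.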

Protein grouping using DirectMS1 results

Protein groups can be generated using the parsimony principle combined with information from MS1 spectra:

scavager path_to_pepXML/MZID -ms1 path_to_DirectMS1_proteins_full_noexclusion.tsv

Details on combining the parsimony principle with MS1 information are available at: https://github.com/markmipt/protein_inference_using_DirectMS1
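As background, the parsimony principle looks for a small set of proteins that explains all identified peptides. The sketch below shows a generic greedy approximation of that idea. It is not Scavager's implementation, it does not use DirectMS1 scores (see the linked repository for how those are combined), and the function name and input structure are hypothetical.

```python
def greedy_parsimony(protein_to_peptides):
    """Pick proteins until every peptide is explained (greedy set cover).

    protein_to_peptides: dict of protein id -> set of peptide sequences (hypothetical input)
    """
    # All peptides that still need to be explained by some selected protein
    remaining = set().union(*protein_to_peptides.values())
    selected = []
    while remaining:
        # Take the protein covering the most unexplained peptides;
        # ties are broken by alphabetical order of the protein id.
        best = min(protein_to_peptides,
                   key=lambda p: (-len(protein_to_peptides[p] & remaining), p))
        selected.append(best)
        remaining -= protein_to_peptides[best]
    return selected


example = {
    "A": {"pep1", "pep2", "pep3"},
    "B": {"pep1", "pep2"},
    "C": {"pep4"},
    "D": {"pep3", "pep4"},
}
leaders = greedy_parsimony(example)
```

In this example protein B is never selected, because all of its peptides are already explained by protein A.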

Protein grouping for indistinguishable proteins

By default, when multiple proteins share the same set of peptides, Scavager chooses the protein group leader by alphabetical order. The “-sr” option makes this choice random instead. The same option also applies when MS1 spectra information is used and multiple proteins have both the same set of MS/MS identifications and the same DirectMS1 scores.
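The two behaviours can be pictured with a short sketch. The function and parameter names are hypothetical and only mirror the description above; they are not taken from Scavager's source.

```python
import random


def choose_group_leader(proteins, random_leader=False, rng=None):
    """Pick a leader among proteins that cannot be told apart by their peptides."""
    candidates = sorted(proteins)
    if not candidates:
        raise ValueError("at least one protein is required")
    if random_leader:
        # Mirrors the random choice described for the "-sr" option
        rng = rng or random.Random()
        return rng.choice(candidates)
    # Default: the first protein in alphabetical order
    return candidates[0]
```

Passing a seeded `random.Random` instance as `rng` makes the random choice reproducible in this sketch.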
