
A self-denoising and machine-learning based statistical generalized peakgroup scoring algorithm for SWATH-MS data


GPS: Machine learning based generalized peakgroup scoring

This is a Python package for scoring SWATH-MS data using static, generalizable machine learning models trained on large curated datasets. It currently supports OpenSwath, and could be extended to other tools with relatively little effort.

Installation

The recommended way to install GPS is into a virtual environment, so that its dependencies are isolated and work correctly.

You can use any environment manager you like; the following example uses Miniconda:

conda create -n gpsenv -c conda-forge python=3.10 pip

conda activate gpsenv

With the environment activated, install GPS via pip:

pip install gps-ms

GPS is now installed and ready to use!
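To double-check the installation from Python, a small script like the one below can help. It is a sketch and not part of GPS: it assumes only the distribution name gps-ms (taken from the pip command above) and that the command-line entry point is named gps, and it relies on the standard-library importlib.metadata and shutil modules rather than any GPS API.

```python
from importlib.metadata import PackageNotFoundError, version
import shutil
from typing import Optional


def installed_version(dist_name: str = "gps-ms") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def cli_on_path(command: str = "gps") -> bool:
    """Return True if an executable with this name is found on PATH (assumed entry point name)."""
    return shutil.which(command) is not None


if __name__ == "__main__":
    print("gps-ms version:", installed_version())
    print("gps on PATH:", cli_on_path())
```

Run it inside the activated environment; a version of None or a False PATH check usually means the environment was not activated when pip ran.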

Usage

Scoring individual files

GPS is straightforward to use. To score a processed file, simply run the score command:

gps score --input extracted_peakgroups.osw --output extracted_peakgroups.scored.tsv

This command reads the OpenSwath output, scores the extracted peakgroups, and writes a TSV file containing q-values, scores, probabilities, and related values.
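The scored TSV can be inspected with any tabular tool. The sketch below uses only the standard-library csv module to keep the rows at or below a q-value cutoff. The column name q_value is a hypothetical placeholder, not a documented GPS column; check the header of your own output file and pass the real name.

```python
import csv
from typing import Dict, List


def read_confident_peakgroups(
    path: str, q_column: str = "q_value", cutoff: float = 0.01
) -> List[Dict[str, str]]:
    """Return rows of a scored TSV whose q-value is <= cutoff.

    q_column is a hypothetical column name; adjust it to match your file header.
    """
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        # fieldnames is read from the first line; None means the file is empty.
        if reader.fieldnames is None or q_column not in reader.fieldnames:
            raise KeyError(f"column {q_column!r} not found in {path}")
        return [row for row in reader if float(row[q_column]) <= cutoff]
```

For example, read_confident_peakgroups("extracted_peakgroups.scored.tsv", q_column="<your q-value column>") returns one dict per retained peakgroup.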

To increase the number of identifications at a given q-value cutoff, you can enable PIT estimation and correction. This uses a novel denoising algorithm to estimate the distribution of false targets among the target-labelled peakgroups, and then weights the decoys accordingly during q-value calculation.
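For intuition about what PIT correction changes, the sketch below shows the conventional way an estimated PIT (in the target-decoy literature, the percentage of incorrect targets, often written π0) enters a q-value calculation: the decoy count at each score threshold is multiplied by PIT, so a PIT below 1 yields smaller q-values and more identifications at a fixed cutoff. This is the standard Storey-style weighting, not GPS's denoising estimator; the function name and arguments are hypothetical, and PIT is passed in as a known number rather than estimated.

```python
from typing import List, Sequence


def pit_weighted_q_values(
    scores: Sequence[float], is_decoy: Sequence[bool], pit: float = 1.0
) -> List[float]:
    """Target-decoy q-values with decoy counts scaled by a given PIT.

    Higher scores are assumed to be better. At a score threshold t the FDR is
    estimated as pit * (#decoys >= t) / (#targets >= t); each q-value is the
    minimum FDR over all thresholds at which that peakgroup is still accepted.
    """
    if len(scores) != len(is_decoy):
        raise ValueError("scores and is_decoy must have the same length")
    if not 0.0 < pit <= 1.0:
        raise ValueError("pit must be in (0, 1]")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    fdr = [0.0] * len(scores)
    targets = decoys = 0
    k = 0
    while k < len(order):
        # Tied scores are processed together so they share one threshold.
        j = k
        while j < len(order) and scores[order[j]] == scores[order[k]]:
            if is_decoy[order[j]]:
                decoys += 1
            else:
                targets += 1
            j += 1
        estimate = min(pit * decoys / max(targets, 1), 1.0)
        for idx in order[k:j]:
            fdr[idx] = estimate
        k = j
    # Walk from the worst score upwards, keeping a running minimum (monotone q-values).
    q = [0.0] * len(scores)
    running = 1.0
    for idx in reversed(order):
        running = min(running, fdr[idx])
        q[idx] = running
    return q
```

For example, with scores [10, 9, 8, 7, 6] where the third and fifth entries are decoys, pit=1.0 gives q-values of 0, 0, 1/3, 1/3 and 2/3, and pit=0.5 halves each of them.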

gps score --input extracted_peakgroups.osw --output extracted_peakgroups.scored.tsv --estimate-pit

To use multiple CPU cores, add the --threads option:

gps score --input extracted_peakgroups.osw --output extracted_peakgroups.scored.tsv --estimate-pit --threads 10
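When many runs need scoring, the CLI can be driven from a script. The sketch below uses only the score flags shown above (--input, --output, --estimate-pit, --threads); the helper names and the convention of deriving the output name from the input file name are illustrative assumptions, not part of GPS.

```python
import subprocess
from pathlib import Path
from typing import List


def score_command(
    osw_path: Path, estimate_pit: bool = True, threads: int = 1
) -> List[str]:
    """Build a `gps score` argument list for one OpenSwath .osw file."""
    # e.g. run1.osw -> run1.scored.tsv (naming convention assumed for illustration)
    output = osw_path.with_name(osw_path.stem + ".scored.tsv")
    cmd = ["gps", "score", "--input", str(osw_path), "--output", str(output)]
    if estimate_pit:
        cmd.append("--estimate-pit")
    if threads > 1:
        # With threads == 1 the flag is omitted and the CLI default applies.
        cmd += ["--threads", str(threads)]
    return cmd


def score_directory(directory: str, threads: int = 1) -> None:
    """Score every .osw file in a directory, one run after another."""
    for osw_path in sorted(Path(directory).glob("*.osw")):
        # check=True stops the batch if any run exits with an error.
        subprocess.run(score_command(osw_path, threads=threads), check=True)
```

Runs are processed sequentially here so that each gps call can use all requested threads, which keeps the outputs in place for the wildcard-based build step.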

Controlling global peptide and protein FDR

Once all individual files have been scored with GPS, you can build models that control the global peptide- and protein-level FDR across the whole analysis.

Specify the level of the model to build with the --level CLI option:

gps build --level peptide --input *.scored.tsv --output peptide.model --estimate-pit

The command above reads all scored files at once (the shell expands the *.scored.tsv wildcard), builds a peptide-level model, and estimates the global PIT for q-value correction.

To build a protein-level model, change the --level option (and choose a different output file):

gps build --level protein --input *.scored.tsv --output protein.model --estimate-pit
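A common approach to experiment-wide (global) FDR is to collapse each peptide to its best score across all runs before estimating q-values, so that a peptide is counted once rather than once per run. The sketch below illustrates only that aggregation step; it is not how gps build fits its model, and the (identifier, is_decoy, score) tuple layout is a hypothetical input format.

```python
from typing import Dict, Iterable, Tuple


def best_score_per_peptide(
    rows: Iterable[Tuple[str, bool, float]]
) -> Dict[Tuple[str, bool], float]:
    """Collapse (peptide, is_decoy, score) rows from many runs to one best score each.

    Keys keep the decoy flag so target and decoy versions of a sequence stay separate.
    """
    best: Dict[Tuple[str, bool], float] = {}
    for peptide, is_decoy, score in rows:
        key = (peptide, is_decoy)
        if key not in best or score > best[key]:
            best[key] = score
    return best
```

The same collapsing step applies at the protein level if the rows carry protein identifiers instead of peptide sequences; the resulting best scores can then be passed to any target-decoy q-value routine.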

Combining scored files into a quantitative matrix for downstream analysis

Once all files have been scored and the global models have been built, GPS can combine all files into a quantitative matrix for convenient downstream analysis.

gps combine --input-files *.scored.tsv --peptide-model peptide.model --protein-model protein.model --output quantitative_matrix.tsv --max-peakgroup-q-value 0.01

Use --max-peakgroup-q-value to set the maximum q-value for included precursors. Raise it to be more lenient, or lower it to be more stringent, about which identifications enter the final quantitative matrix.
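Conceptually, a quantitative matrix turns per-run, long-format results into a wide table with one row per precursor and one column per run. The sketch below shows that pivot with plain Python, dropping peakgroups above a maximum q-value in the spirit of --max-peakgroup-q-value. The record fields (run, precursor, intensity, q-value) are hypothetical, and the sketch ignores the peptide and protein models that gps combine also takes.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Record = Tuple[str, str, float, float]  # (run, precursor, intensity, q_value), assumed layout


def build_matrix(
    records: Iterable[Record], max_q_value: float = 0.01
) -> Tuple[List[str], Dict[str, List[Optional[float]]]]:
    """Pivot long-format records into run names and {precursor: [intensity per run]}.

    Peakgroups above max_q_value are skipped, so their cells stay None (missing).
    If a (run, precursor) pair appears twice, the later record wins.
    """
    records = list(records)
    # Columns come from all runs, so a run with no passing peakgroups still appears.
    runs = sorted({record[0] for record in records})
    column = {run: i for i, run in enumerate(runs)}
    matrix: Dict[str, List[Optional[float]]] = {}
    for run, precursor, intensity, q_value in records:
        if q_value > max_q_value:
            continue
        row = matrix.setdefault(precursor, [None] * len(runs))
        row[column[run]] = intensity
    return runs, matrix
```

Raising max_q_value admits more peakgroups and fills more cells, at the cost of including less confident identifications.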



Download files


Source Distribution

gps-ms-0.0.1a0.tar.gz (263.9 kB)


Built Distribution

gps_ms-0.0.1a0-py3-none-any.whl (277.0 kB)


File details

Details for the file gps-ms-0.0.1a0.tar.gz.

File metadata

  • Download URL: gps-ms-0.0.1a0.tar.gz
  • Size: 263.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.10.6

File hashes

Hashes for gps-ms-0.0.1a0.tar.gz
Algorithm Hash digest
SHA256 25b593767c1b7a3816764c995cc580ab752c565cd50c1a02601e55c2b1cc1cc8
MD5 a7c89bdcc1789287b051cfda3fdcdec3
BLAKE2b-256 5c70256f9818f4d3a03743ed0fec3baae4f48a08f96063baada402e8fe8d4005



Details for the file gps_ms-0.0.1a0-py3-none-any.whl.

File metadata

  • Download URL: gps_ms-0.0.1a0-py3-none-any.whl
  • Size: 277.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.10.6

File hashes

Hashes for gps_ms-0.0.1a0-py3-none-any.whl
Algorithm Hash digest
SHA256 63f41a53ff18b96fb7300a461532b6a904c5eb43e7a44d9b4db7066f9d198f47
MD5 3fdbd0721de136f535c76f31748028ec
BLAKE2b-256 93d332b4be65a21c75d00b3c56ae1d8bbada5bd5aeb24839db2a989e619592d4

