A self-denoising and machine-learning based statistical generalized peakgroup scoring algorithm for SWATH-MS data
GPS: Machine learning based generalized peakgroup scoring
GPS is a Python package for scoring SWATH-MS data using static, generalizable machine learning models trained on large curated datasets. Only OpenSwath output is currently supported, but support could be extended to other tools quite easily.
Installation
We recommend installing GPS into a virtual environment so that all dependencies work correctly.
You can create the environment with your method of choice. The following demonstration uses miniconda:
conda create -n gpsenv -c conda-forge python=3.10 pip
conda activate gpsenv
With your environment activated, you can then install GPS via pip:
pip install gps-ms
GPS is now installed and ready to use!
Usage
Scoring individual files
GPS is very easy to use. To get started scoring a processed file, simply run the score command:
gps score --input extracted_peakgroups.osw --output extracted_peakgroups.scored.tsv
This command takes the output from OpenSwath, scores the extracted peakgroups, and writes a TSV file with the q-values, scores, probabilities, etc.
To increase the number of identifications at a given q-value cutoff, you can enable PIT estimation and correction. This uses a novel denoising algorithm to estimate the false target probability distribution of the target labels, and weights the decoys accordingly during q-value calculation.
gps score --input extracted_peakgroups.osw --output extracted_peakgroups.scored.tsv --estimate-pit
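To make the idea of weighting decoys concrete, here is a minimal sketch of a generic target-decoy q-value calculation in which the decoy count is scaled by a PIT estimate. It is not GPS's implementation: the function name `pit_weighted_q_values`, the use of a single scalar `pit` (the assumed fraction of targets that are false), and the tie handling are all assumptions made for illustration.

```python
def pit_weighted_q_values(scores, is_decoy, pit=1.0):
    """Return one q-value per input score, with decoy counts scaled by `pit`.

    Hypothetical helper for illustration only; GPS's own calculation may differ.
    """
    # Rank peakgroups from best (highest) score to worst.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    targets = decoys = 0
    fdrs = []
    for i in order:
        if is_decoy[i]:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR at this score threshold: weighted decoys over targets.
        fdrs.append(min(1.0, pit * decoys / max(targets, 1)))

    # A q-value is the smallest FDR at this threshold or any more lenient one.
    q_values = [0.0] * len(scores)
    running_min = 1.0
    for rank in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdrs[rank])
        q_values[order[rank]] = running_min
    return q_values


if __name__ == "__main__":
    scores = [5.0, 4.0, 3.0, 2.0, 1.0]
    is_decoy = [False, False, True, False, True]
    print(pit_weighted_q_values(scores, is_decoy, pit=1.0))
    print(pit_weighted_q_values(scores, is_decoy, pit=0.5))
```

With a PIT estimate below 1, each decoy counts for less, so the same score threshold maps to a lower q-value and more targets pass a fixed cutoff.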
You can also make use of multiple cores with the --threads option:
gps score --input extracted_peakgroups.osw --output extracted_peakgroups.scored.tsv --estimate-pit --threads 10
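When many OpenSwath files need scoring, the same command can be generated per file. The sketch below is one possible way to do that from Python; it only uses the flags shown above, and the helper names `build_score_command` and `score_all`, as well as the `.scored.tsv` output naming, are assumptions for illustration rather than part of GPS.

```python
import subprocess
from pathlib import Path


def build_score_command(osw_path, estimate_pit=True, threads=1):
    """Build the argument list for one `gps score` call (hypothetical helper)."""
    osw_path = Path(osw_path)
    # Mirror the naming used in the examples: sample.osw -> sample.scored.tsv
    output = osw_path.with_name(osw_path.stem + ".scored.tsv")
    command = ["gps", "score", "--input", str(osw_path), "--output", str(output)]
    if estimate_pit:
        command.append("--estimate-pit")
    command += ["--threads", str(threads)]
    return command


def score_all(directory, threads=1):
    """Run `gps score` on every .osw file in `directory`, one after another."""
    for osw_path in sorted(Path(directory).glob("*.osw")):
        # check=True stops the loop if gps exits with an error.
        subprocess.run(build_score_command(osw_path, threads=threads), check=True)


if __name__ == "__main__":
    print(" ".join(build_score_command("extracted_peakgroups.osw", threads=10)))
```

Running the files sequentially and giving each call all available threads keeps the sketch simple; whether that is the fastest arrangement depends on the machine and file sizes.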
Controlling global peptide and protein FDR
Once all individual files are scored using GPS, it is straightforward to build models that control the global FDR at the peptide and protein levels.
You can specify the level of the model to build using the --level CLI option:
gps build --level peptide --input *.scored.tsv --output peptide.model --estimate-pit
The above command takes in all scored files at once using a shell wildcard, builds a peptide-level model, and estimates the global PIT for q-value correction.
To build a protein-level model, you only need to change the --level option:
gps build --level protein --input *.scored.tsv --output protein.model --estimate-pit
Combining scored files into a quantitative matrix for downstream analysis
Once all files have been scored and the global models have been built, GPS can combine all files into a quantitative matrix for convenient downstream analysis:
gps combine --input-files *.scored.tsv --peptide-model peptide.model --protein-model protein.model --output quantitative_matrix.tsv --max-peakgroup-q-value 0.01
You can set the maximum q-value for the included precursors to be more, or less, lenient about the identifications that end up in your final quantitative_matrix.
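To illustrate what a peakgroup q-value cutoff does to the resulting matrix, here is a minimal sketch that filters per-run records and arranges them as precursor by run. It is not how `gps combine` works internally: the field names `run`, `precursor`, `q_value`, and `intensity` are hypothetical, the cutoff is assumed to be inclusive, and the peptide and protein models are not applied here.

```python
from collections import defaultdict


def build_matrix(rows, max_q_value=0.01):
    """Arrange filtered rows as {precursor: {run: intensity}} (hypothetical helper)."""
    matrix = defaultdict(dict)
    for row in rows:
        # Keep only peakgroups at or below the q-value cutoff.
        if float(row["q_value"]) <= max_q_value:
            matrix[row["precursor"]][row["run"]] = float(row["intensity"])
    return dict(matrix)


if __name__ == "__main__":
    rows = [
        {"run": "run1", "precursor": "PEPTIDEK_2", "q_value": "0.001", "intensity": "1500.0"},
        {"run": "run2", "precursor": "PEPTIDEK_2", "q_value": "0.050", "intensity": "900.0"},
        {"run": "run2", "precursor": "ANOTHERR_3", "q_value": "0.008", "intensity": "420.0"},
    ]
    print(build_matrix(rows, max_q_value=0.01))
    print(build_matrix(rows, max_q_value=0.05))
```

A stricter cutoff leaves more missing values in the matrix, while a more lenient one fills more cells at the cost of admitting less confident identifications.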