Python package for quality control of proteomics datasets, based on the MultiQC package

pmultiqc

A library for generating proteomics QC reports based on the MultiQC framework. It generates a QC report for the proteomicsLFQ pipeline from the pipeline's results, which have the following structure:

  • consensus_ids : Identification results from the ConsensusID tool in OpenMS
  • dbs : Database used for the peptide/protein identification step
  • ids : Identification results from each search engine
  • logs : Log files for each independent step
  • pipeline_info : Pipeline information
  • proteomics_lfq : Final results of the pipeline
    • out.consensusXML : OpenMS feature map output, including non-identified features
    • out.mzTab : mzTab file with the identification results
    • out_msstats.csv : Input file for the MSstats software
    • out_triqler.tsv : Input file for the Triqler software
  • raw_ids : Identification results from the search engines plus Percolator
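Before running the report, it can help to confirm that a results folder has the layout listed above. The following is a minimal sketch, not part of pmultiqc: `missing_entries` is a hypothetical helper, and it assumes the results folder contains exactly the directory and file names given in the list.

```python
from pathlib import Path

# Sub-directories listed in the structure above (assumed to be required).
EXPECTED_DIRS = ["consensus_ids", "dbs", "ids", "logs", "pipeline_info",
                 "proteomics_lfq", "raw_ids"]
# Files listed under proteomics_lfq/ in the structure above.
EXPECTED_LFQ_FILES = ["out.consensusXML", "out.mzTab",
                      "out_msstats.csv", "out_triqler.tsv"]


def missing_entries(result_dir):
    """Return the expected entries (as relative paths) that are absent."""
    root = Path(result_dir)
    # Directories first, in the order listed above.
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    # Then the final result files inside proteomics_lfq/.
    lfq = root / "proteomics_lfq"
    missing += [f"proteomics_lfq/{f}" for f in EXPECTED_LFQ_FILES
                if not (lfq / f).is_file()]
    return missing
```

An empty list means every listed entry is present; otherwise the returned paths point at what to look for in the pipeline output.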

Usage

multiqc {--exp_design expdesign_file | --sdrf sdrf_file} --mzMLs {mzMLs file dir} --raw_ids {raw identification dir} {proteomicslfq result dir} -o {output dir}

Example: multiqc --exp_design ./UPS1/experimental_design.tsv --mzMLs ./UPS1/shared-peptides-star-align-stricter-pep-protein-FDR/mzMLs --raw_ids ./UPS1/shared-peptides-star-align-stricter-pep-protein-FDR/raw_ids ./UPS1/shared-peptides-star-align-stricter-pep-protein-FDR/proteomics_lfq -o ./shared-peptides-star-align-stricter-pep-protein-FDR-statistics

Parameters

  • --exp_design: Path to the experimental design file; most of its entries can be derived from the SDRF file
  • --sdrf: Path to the Sample and Data Relationship Format (SDRF) file
  • --raw: Keep filenames in the experimental design output as raw when an exp_design file is provided
  • --condition: Create conditions from the provided (e.g., factor) columns when an exp_design file is provided
  • --quant_method: Quantification method (e.g. lfq or tmt; default: lfq)
  • --mzMLs: Directory containing the mzML files
  • --raw_ids: Directory containing the raw identification files
  • --remove_decoy: Whether to remove decoy peptides when counting
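The options above can also be assembled programmatically, for example when the report is run from a Python workflow. This is a hypothetical sketch, not pmultiqc's own API: `build_pmultiqc_command` is an illustrative name, it assumes `--exp_design` and `--sdrf` are alternatives (as the usage line suggests) and that `--remove_decoy` is a flag that takes no value, and it leaves out `--raw` and `--condition`.

```python
def build_pmultiqc_command(result_dir, output_dir, exp_design=None, sdrf=None,
                           mzmls=None, raw_ids=None, quant_method=None,
                           remove_decoy=False):
    """Assemble a multiqc argument list from the documented options."""
    # Assumption: exactly one of the two design inputs is given.
    if (exp_design is None) == (sdrf is None):
        raise ValueError("pass exactly one of exp_design or sdrf")
    cmd = ["multiqc"]
    if exp_design is not None:
        cmd += ["--exp_design", exp_design]
    else:
        cmd += ["--sdrf", sdrf]
    if mzmls is not None:
        cmd += ["--mzMLs", mzmls]
    if raw_ids is not None:
        cmd += ["--raw_ids", raw_ids]
    if quant_method is not None:
        cmd += ["--quant_method", quant_method]
    if remove_decoy:
        # Assumption: --remove_decoy is a boolean switch without a value.
        cmd.append("--remove_decoy")
    # Positional proteomicsLFQ result directory, then the output directory.
    cmd += [result_dir, "-o", output_dir]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; building a list rather than a shell string avoids quoting problems with paths that contain spaces.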

An example report can be found in multiqc_report.html.

Most of the metrics are computed from out.mzTab and consensus_ids, which contain the filtered peptide and protein identifications.
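Counting peptides and PSMs from out.mzTab only requires the generic mzTab layout, in which every line starts with a section prefix (MTD for metadata, PEH/PEP for the peptide header and rows, PSH/PSM for the PSM header and rows, and so on) followed by tab-separated values. The sketch below groups data rows by section under that assumption; `read_mztab_sections` is a hypothetical helper, not pmultiqc's parser.

```python
def read_mztab_sections(lines):
    """Group mzTab data rows by section.

    Returns a dict such as {"PEP": [...], "PSM": [...]}, where each row is a
    dict mapping column name to value. MTD and COM lines are ignored.
    """
    # Header prefix -> prefix of the data rows it describes.
    header_for = {"PRH": "PRT", "PEH": "PEP", "PSH": "PSM", "SMH": "SML"}
    headers = {}
    rows = {}
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        prefix = fields[0]
        if prefix in header_for:
            section = header_for[prefix]
            headers[section] = fields[1:]
            rows.setdefault(section, [])
        elif prefix in headers:
            # Data row whose header has already been seen.
            rows[prefix].append(dict(zip(headers[prefix], fields[1:])))
    return rows
```

For example, `with open("out.mzTab") as fh: sections = read_mztab_sections(fh)` followed by `len(sections.get("PSM", []))` gives the number of PSM rows.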

Metrics

General report

Results tables

Two tables show the first 500 peptides in the mzTab and the first 500 PSMs. These tables display some of the most relevant peptides and PSMs in the experiment.
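Extracting only the first 500 rows of a section does not require loading the whole table. A minimal sketch, assuming "first" means the order in which rows appear in the mzTab; `iter_section_rows` and `first_rows` are hypothetical helpers:

```python
from itertools import islice


def iter_section_rows(lines, header_prefix, row_prefix):
    """Yield rows of one mzTab section (e.g. PEH/PEP or PSH/PSM) as dicts."""
    columns = None
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if fields[0] == header_prefix:
            columns = fields[1:]
        elif fields[0] == row_prefix and columns is not None:
            yield dict(zip(columns, fields[1:]))


def first_rows(lines, header_prefix, row_prefix, limit=500):
    """Return at most `limit` rows; reading stops once the limit is reached."""
    return list(islice(iter_section_rows(lines, header_prefix, row_prefix), limit))
```

Because `iter_section_rows` consumes its input, the peptide and PSM tables each need a fresh file handle (or a list of lines), e.g. `first_rows(open(path), "PEH", "PEP")` and `first_rows(open(path), "PSH", "PSM")`.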

Identification Statistics

A table called Spectra Tracking summarizes the identification results per mzML file. The table captures the following numbers:

  • MS1_num: Number of MS1 spectra in the mzML file
  • MS2_num: Number of MS2 spectra in the mzML file
  • MSGF: Number of peptides identified using the MSGF+ search engine
  • Comet: Number of peptides identified using the Comet search engine
  • Final result of Spectra: Final number of PSMs reported in the mzTab
  • Final result of Peptides: Final number of peptides identified in the mzTab
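MS1_num and MS2_num can be derived from the mzML file itself: each `spectrum` element carries a `cvParam` with the PSI-MS accession MS:1000511 ("ms level"). The sketch below streams the file with `xml.etree.ElementTree.iterparse` and counts spectra per MS level. `count_ms_levels` is a hypothetical helper, and it assumes the ms level `cvParam` is a direct child of each `spectrum` (not inherited through a referenceable parameter group).

```python
import xml.etree.ElementTree as ET
from collections import Counter

MS_LEVEL = "MS:1000511"  # PSI-MS accession for the "ms level" term


def _local(tag):
    # Drop the "{namespace}" prefix that ElementTree adds to tag names.
    return tag.rsplit("}", 1)[-1]


def count_ms_levels(source):
    """Count spectra per MS level in an mzML file (path or binary file object)."""
    counts = Counter()
    for _event, elem in ET.iterparse(source, events=("end",)):
        if _local(elem.tag) != "spectrum":
            continue
        for child in elem:
            if _local(child.tag) == "cvParam" and child.get("accession") == MS_LEVEL:
                counts[int(child.get("value"))] += 1
                break
        elem.clear()  # release the peak arrays of spectra already counted
    return counts
```

`counts[1]` and `counts[2]` then correspond to MS1_num and MS2_num for one mzML file.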

Peak Intensity Distribution

The Peak Intensity Distribution shows the peak intensities in the MS2 spectra across the whole experiment, as well as for the identified spectra only. The plot splits the intensities into bins of 0-10, 10-100, 100-300, ... 6k-10k, >10k.
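The binning step can be sketched with `bisect`. The text elides the boundaries between 300 and 6k, so the functions below take the bin edges as a parameter rather than guessing them. Treating each bin as closed on the left and open on the right is also an assumption; `histogram` and `bin_labels` are hypothetical helpers.

```python
from bisect import bisect_right


def histogram(intensities, edges):
    """Count peak intensities per bin.

    `edges` are ascending bin boundaries: bin 0 is [0, edges[0]), bin k is
    [edges[k-1], edges[k]), and the last bin holds values >= edges[-1].
    """
    counts = [0] * (len(edges) + 1)
    for x in intensities:
        counts[bisect_right(edges, x)] += 1
    return counts


def bin_labels(edges):
    """Human-readable labels matching the bins produced by histogram()."""
    labels = [f"0-{edges[0]}"]
    labels += [f"{lo}-{hi}" for lo, hi in zip(edges, edges[1:])]
    labels.append(f">={edges[-1]}")
    return labels
```

For example, edges `[10, 100, 300, 6000, 10000]` contain only the boundaries named in the text and are illustrative; the actual plot has further bins between 300 and 6000 that would need to be inserted.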

This is a histogram of ion intensity vs. frequency for all MS2 spectra in a given experiment. The information can be filtered to show all, identified, or unidentified spectra. This plot gives a general estimate of the noise level of the spectra. Generally, one should expect a high number of low-intensity noise peaks and a low number of high-intensity signal peaks. A disproportionate number of high-intensity peaks may indicate heavy spectrum pre-filtering or potential experimental problems. In the case of data reuse, this plot can help identify whether the spectra need pre-processing prior to any downstream analysis. The quality of the identifications is not tied to this data, as most search engines perform internal spectrum pre-processing before matching the spectra. Thus, the reported spectra are not necessarily pre-processed, since the search engine may have applied the pre-processing step internally, and this pre-processing is not necessarily reported in the experimental metadata.

Download files

Source Distribution

pmultiqc-0.0.10.tar.gz (741.2 kB)

Uploaded Source

Built Distribution

pmultiqc-0.0.10-py3-none-any.whl (749.9 kB)

Uploaded Python 3

File details

Details for the file pmultiqc-0.0.10.tar.gz.

File metadata

  • Download URL: pmultiqc-0.0.10.tar.gz
  • Upload date:
  • Size: 741.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/32.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.8 tqdm/4.63.0 importlib-metadata/4.11.2 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.10.2

File hashes

Hashes for pmultiqc-0.0.10.tar.gz
  • SHA256: 2b39d7b4445c20e2eac567d24ea2b5f6224ba1d4edb9bf2b7829ac8a9696b5ee
  • MD5: 1f54de78061f1388a0526655e2967ff3
  • BLAKE2b-256: 33584b6940516d92de084e373829153b2c017ccd002c108cf1913c1ebabbf614
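The digests above can be used to check a downloaded file before installing it. A minimal sketch using Python's standard `hashlib`; `sha256_of` is a hypothetical helper name, and the commented check at the end assumes the sdist was saved to the current directory.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA256 hex digest of a file, read in 1 MiB chunks by default."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until fh.read() returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 published above for pmultiqc-0.0.10.tar.gz.
EXPECTED_SHA256 = "2b39d7b4445c20e2eac567d24ea2b5f6224ba1d4edb9bf2b7829ac8a9696b5ee"
# Example check, assuming the file is in the current directory:
# assert sha256_of("pmultiqc-0.0.10.tar.gz") == EXPECTED_SHA256
```

pip can enforce the same check in hash-checking mode by adding `--hash=sha256:<digest>` to the requirement line in a requirements file.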

File details

Details for the file pmultiqc-0.0.10-py3-none-any.whl.

File metadata

  • Download URL: pmultiqc-0.0.10-py3-none-any.whl
  • Upload date:
  • Size: 749.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/32.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.8 tqdm/4.63.0 importlib-metadata/4.11.2 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.10.2

File hashes

Hashes for pmultiqc-0.0.10-py3-none-any.whl
  • SHA256: 2035541a5efc1fce530b3b4e24e8ed24f55b2b1d8a8c74db4e6ada8c7b8e5b21
  • MD5: 02d214c4db5b9ebfa1318ace34f8477d
  • BLAKE2b-256: 67e0db443d0968d5f6b2fd74268c0dde1aa1b58080ac9fdf048d51a0bfdf51d4
