Python scripts and helpers for the quantMS workflow

quantms-utils

Python package with scripts and functions for the quantms workflow for the analysis of quantitative proteomics data.

The package is available on PyPI: quantms-utils

pip install quantms-utils

Available Scripts

The following functionalities are available in the package:

DIA-NN scripts

  • dianncfg - Create a configuration file for DIA-NN including enzymes, modifications, and other parameters.
  • diann2msstats - Convert DIA-NN output to MSstats format. The output is used for quality control and downstream analysis in quantms.

SDRF scripts

  • openms2sample - Extract sample information from an OpenMS experimental design file. An example of an OpenMS experimental design file is available here.
  • checksamplesheet - Check the sample sheet for errors and inconsistencies. The experimental design can be an OpenMS experimental design file or an SDRF file.

Other scripts

  • psmconvert - The convert_psm function converts peptide spectrum matches (PSMs) from an idXML file to a parquet file, optionally filtering out decoy matches. It extracts and processes data from both the idXML and an associated spectra file, handling multiple search engines and scoring systems.
  • mzmlstats - Processes .mzML mass spectrometry data files to extract and compile statistics about the spectra. It supports generating detailed parquet files with spectrum metadata and MS2 peak data.

mzml statistics

quantms-utils has multiple scripts to generate mzML statistics. These files are used by multiple tools and packages within the quantms ecosystem for quality control, mzTab generation, etc. Here are some details about the formats, the fields they contain, and how they are computed.

MS info and details

mzmlstats allows the user to produce a file containing all features for every signal in the MS/MS experiment. The produced file is a parquet file named after the original file with the following postfix: {file_name}_ms_info.parquet. The columns are defined, estimated, and used as follows:

  • scan: The scan accession of each MS and MS/MS signal in the mzML file. The format depends on the instrument manufacturer; for Thermo, for example: controllerType=0 controllerNumber=1 scan=43920. We try to follow the definition used in quantms.io.
  • ms_level: The MS level of the signal, 1 for MS and 2 for MS/MS.
  • num_peaks: The number of peaks in the spectrum. Computed with pyopenms using spectrum.get_peaks().
  • base_peak_intensity: The maximum intensity in the spectrum (MS or MS/MS).
  • summed_peak_intensities: The sum of all intensities in the spectrum (MS or MS/MS).
  • rt: The retention time of the spectrum, captured with pyopenms using spectrum.getRT().
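
The peak-derived columns above (num_peaks, base_peak_intensity, summed_peak_intensities) can be illustrated without pyopenms. The following is a minimal sketch, assuming the m/z and intensity values are already available as plain Python lists (for example, after converting the arrays returned by spectrum.get_peaks()). The function name summarize_peaks and the handling of empty spectra are hypothetical and not taken from quantms-utils.

```python
from typing import Dict, Optional, Sequence


def summarize_peaks(
    mz: Sequence[float], intensity: Sequence[float]
) -> Dict[str, Optional[float]]:
    """Compute peak-derived summary values for one spectrum (illustrative only)."""
    if len(mz) != len(intensity):
        raise ValueError("mz and intensity must have the same length")
    if not intensity:
        # Assumption: an empty spectrum has no base peak, so report None.
        return {
            "num_peaks": 0,
            "base_peak_intensity": None,
            "summed_peak_intensities": None,
        }
    return {
        "num_peaks": len(mz),                        # number of (m/z, intensity) pairs
        "base_peak_intensity": max(intensity),       # most intense peak
        "summed_peak_intensities": sum(intensity),   # total signal in the spectrum
    }


# Made-up peak values, for illustration only.
stats = summarize_peaks([100.0, 200.5, 300.25], [10.0, 50.0, 5.0])
print(stats)
```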

For MS/MS signals, we have the following additional columns:

  • precursor_charge: The charge of the precursor ion, if the signal is MS/MS. Captured with pyopenms using spectrum.getPrecursors()[0].getCharge().
  • precursor_mz: The m/z of the precursor ion, if the signal is MS/MS. Captured with pyopenms using spectrum.getPrecursors()[0].getMZ().
  • precursor_intensity: The intensity of the precursor ion, if the signal is MS/MS. Captured with pyopenms using spectrum.getPrecursors()[0].getIntensity(). If the precursor intensity is not annotated (i.e., absent), the purity object is used to obtain it; see the note below.
  • precursor_rt: The retention time of the precursor ion, if the signal is MS/MS. See note below.
  • precursor_total_intensity: The total intensity of the precursor ion, if the signal is MS/MS. See note below.
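
To make the row layout concrete, here is a small sketch of how one row of {file_name}_ms_info.parquet could be represented in memory. The class name MsInfoRow, the Python types, and the use of None for precursor fields on MS1 rows are assumptions for illustration; only the column names come from the lists above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MsInfoRow:
    """Hypothetical in-memory view of one row of the ms_info parquet file."""

    scan: str
    ms_level: int
    num_peaks: int
    base_peak_intensity: Optional[float]
    summed_peak_intensities: Optional[float]
    rt: float
    # Precursor columns only apply to MS/MS rows (assumed None for MS1 rows).
    precursor_charge: Optional[int] = None
    precursor_mz: Optional[float] = None
    precursor_intensity: Optional[float] = None
    precursor_rt: Optional[float] = None
    precursor_total_intensity: Optional[float] = None


# Made-up values, for illustration only.
ms1_row = MsInfoRow("controllerType=0 controllerNumber=1 scan=43919", 1, 850, 2.1e7, 3.4e8, 3601.2)
ms2_row = MsInfoRow(
    "controllerType=0 controllerNumber=1 scan=43920", 2, 120, 4.5e5, 6.0e6, 3601.5,
    precursor_charge=2, precursor_mz=652.34, precursor_intensity=1.2e6,
    precursor_rt=3601.2, precursor_total_intensity=1.9e6,
)
print(ms1_row.precursor_mz, ms2_row.precursor_mz)
```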

NOTE: All precursor-related information refers to the first precursor in the spectrum. The columns precursor_intensity (if not annotated), precursor_rt, and precursor_total_intensity are obtained with the following pyopenms code:

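```python
import pyopenms as oms

# mzml_exp, precursor_spectrum_index and precursor are assumed to be defined earlier
precursor_spectrum = mzml_exp.getSpectrum(precursor_spectrum_index)
precursor_rt = precursor_spectrum.getRT()
# 100 is the precursor mass tolerance; True means the tolerance is given in ppm
purity = oms.PrecursorPurity().computePrecursorPurity(precursor_spectrum, precursor, 100, True)
precursor_intensity = purity.target_intensity
total_intensity = purity.total_intensity
```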
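
In the computePrecursorPurity call, the arguments 100 and True are, as far as we read the OpenMS signature, the precursor mass tolerance and a flag stating that the tolerance is in ppm. A ppm tolerance scales with the m/z value, which the following minimal sketch makes explicit. The helper ppm_window is hypothetical and is not part of pyopenms or quantms-utils.

```python
from typing import Tuple


def ppm_window(mz: float, tolerance_ppm: float) -> Tuple[float, float]:
    """Return the (low, high) m/z bounds for a symmetric ppm tolerance."""
    delta = mz * tolerance_ppm / 1e6  # absolute tolerance in m/z units
    return mz - delta, mz + delta


# With a 100 ppm tolerance, a precursor at m/z 500 gets a +/- 0.05 window.
low, high = ppm_window(500.0, 100.0)
print(low, high)
```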

MS2 info and details

mzmlstats allows the user to produce a file containing all the MS2 spectra, including the intensities and masses of every peak. The produced file is a parquet file named after the original file with the following postfix: {file_name}_ms2_info.parquet. The columns are defined, estimated, and used as follows:

  • scan: The scan accession of each MS and MS/MS signal in the mzML file. The format depends on the instrument manufacturer; for Thermo, for example: controllerType=0 controllerNumber=1 scan=43920. We try to follow the definition used in quantms.io.
  • ms_level: The MS level of the signal, all of them will be 2.
  • mz_array: The m/z array of the peaks in the MS/MS signal. Captured with pyopenms using mz_array, intensity_array = spectrum.get_peaks().
  • intensity_array: The intensity array of the peaks in the MS/MS signal. Captured with pyopenms using mz_array, intensity_array = spectrum.get_peaks().
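
The Thermo-style scan accession shown above is a space-separated list of key=value pairs, so the numeric scan can be recovered with a few lines of Python. This is a minimal sketch that only covers the format from the example; other manufacturers use different formats, and the function name parse_scan_accession is hypothetical.

```python
from typing import Dict


def parse_scan_accession(accession: str) -> Dict[str, str]:
    """Split a 'key=value key=value' accession into a dict (illustrative only)."""
    fields: Dict[str, str] = {}
    for token in accession.split():
        key, sep, value = token.partition("=")
        if not sep:
            # The accession does not follow the key=value layout assumed here.
            raise ValueError(f"Unexpected token without '=': {token!r}")
        fields[key] = value
    return fields


parsed = parse_scan_accession("controllerType=0 controllerNumber=1 scan=43920")
scan_number = int(parsed["scan"])
print(scan_number)  # 43920
```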

MS1 Feature Maps

We use the FeatureFinderMultiplexAlgorithm from OpenMS to extract features from the MS1 spectra. Our implementation is based on the original implementation by Andy Lin. The output of this algorithm is a feature map, which contains the following information:

  • feature_mz: The m/z of the feature.
  • feature_rt: The retention time of the feature.
  • feature_intensity: The intensity of the feature.
  • feature_charge: The charge of the feature.
  • feature_quality: The quality of the feature.
  • feature_percentile_tic: The percentile of the feature in the total ion current.
  • feature_id: The unique identifier of the feature generated by OpenMS.
  • feature_min_rt: The minimum retention time of the feature within the feature map.
  • feature_min_mz: The minimum m/z of the feature within the feature map.
  • feature_max_rt: The maximum retention time of the feature within the feature map.
  • feature_max_mz: The maximum m/z of the feature within the feature map.
  • feature_num_scans: The number of scans in which the feature is present in the feature map.
  • feature_scans: The scans in which the feature is present in the feature map.

The tool generates a gzip-compressed parquet file named {file_name}_ms1_feature_info.parquet.
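
The three output files described in this section share one naming pattern. The sketch below derives the expected names from an input path, assuming that {file_name} is the mzML file name without its extension; that reading of the placeholder and the helper name expected_outputs are assumptions, not documented behavior.

```python
from pathlib import Path
from typing import Dict

# Postfixes taken from the sections above.
POSTFIXES = {
    "ms_info": "_ms_info.parquet",
    "ms2_info": "_ms2_info.parquet",
    "ms1_feature_info": "_ms1_feature_info.parquet",
}


def expected_outputs(mzml_path: str) -> Dict[str, str]:
    """Map each output kind to its expected file name (illustrative only)."""
    stem = Path(mzml_path).stem  # assumption: {file_name} excludes the extension
    return {kind: f"{stem}{postfix}" for kind, postfix in POSTFIXES.items()}


outputs = expected_outputs("data/sample.mzML")
for kind, name in outputs.items():
    print(kind, name)
```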

Contributions and issues

Contributions and issues are welcome. Please open an issue or a pull request in the GitHub repository.
