Targeted PRM glycomics analysis from mzML data

glycanPRMQuant

glycanPRMQuant is a Python package for targeted PRM glycomics analysis from .mzML data. It extracts MS2 spectra, matches precursor ions to N-glycan compositions, generates theoretical fragments from IUPAC structures, resolves likely structures, plots chromatograms/spectra, and quantifies glycan signal by AUC.

The package can be run from a Tkinter GUI for batch processing or called programmatically from Python.

What It Does

  • Reads vendor-converted .mzML files with pyteomics.
  • Matches MS1 precursor m/z values against glycan compositions.
  • Calculates precursor neutral masses from the bundled N_glycan_db.csv using glypy, computing each mass once per unique Composition.
  • Generates theoretical MS2 fragments from each candidate Condensed IUPAC structure for a matched numerical composition.
  • Scores candidate IUPAC structures and returns the most likely structure with the numerical composition.
  • Supports configurable fragment ion series, maximum cleavage count, m/z tolerances, intensity thresholds, smoothing, and AUC boundary logic.
  • Produces per-glycan MS2 CSV files, chromatograms, spectra, AUC tables, and optional Skyline transition lists.
  • Runs one file or many files in parallel.

Repository Layout

  • glycanPRMQuant/processmzML.py
    Single-file end-to-end pipeline: extraction, MS1 matching, MS2 matching, plotting, AUC, and optional Skyline export.
  • glycanPRMQuant/parallelProcess.py
    Parallel multi-file runner used by the GUI and programmatic batch workflows.
  • glycanPRMQuant/pipelineGUI.py
    Tkinter GUI for selecting input files, output folder, matching parameters, plotting options, DB overrides, and batch execution.
  • glycanPRMQuant/matchMS1.py
    Precursor matching. Uses the N-glycan database by default and calculates neutral masses from grouped IUPAC compositions.
  • glycanPRMQuant/matchMS2.py
    Fragment matching. Generates fragments from IUPAC candidates, matches observed fragments, and selects the best IUPAC structure.
  • glycanPRMQuant/fragment_structure.py
    glypy-based theoretical glycan fragmentation.
  • glycanPRMQuant/calculateAUC.py
    Peak picking, integration windows, smoothing, and AUC summarization.
  • glycanPRMQuant/plotFragmentIntensity.py and plotMS2spectrum.py
    Chromatogram and spectrum plotting utilities.
  • glycanPRMQuant/database/N_glycan_db.csv
    Default structure database with Condensed IUPAC, Composition, and Numerical Composition columns.

Installation

Install from PyPI. First create a virtual environment:

python -m venv .venv

Activate the environment:

# Windows
.venv\Scripts\activate

# macOS/Linux
source .venv/bin/activate

Install:

python -m pip install --upgrade pip
pip install glycanprmquant

Check the command-line entry point and bundled database:

glycan-prmquant --help
python -c "from glycanPRMQuant.constants import DEFAULT_PRECURSOR_DB; import os; print(os.path.exists(DEFAULT_PRECURSOR_DB), DEFAULT_PRECURSOR_DB)"

The package requires Python 3.12 or later.

Development Install

For local development, clone the repository and install it in editable mode:

git clone https://github.com/Elquimico09/GlycanPRMQuant.git
cd GlycanPRMQuant
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

On Windows, activate the environment with:

.venv\Scripts\activate

Dependencies

Installed from pyproject.toml:

  • numpy
  • pandas
  • scipy
  • matplotlib
  • seaborn
  • statsmodels
  • scikit-learn
  • openpyxl
  • scienceplots
  • pyteomics
  • glypy
  • lxml

External requirement:

  • Input data must be in .mzML format. Convert vendor files with ProteoWizard msconvert before running the pipeline.
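After conversion, it can help to confirm that a file actually contains MS2 scans before running the pipeline. The following is a minimal sketch, not part of glycanPRMQuant: the helper names count_ms_levels and count_ms_levels_in_file are hypothetical, and the only library call used is pyteomics' mzml.read, which yields one dictionary per spectrum with an "ms level" key.

```python
from collections import Counter


def count_ms_levels(spectra):
    """Count spectra per MS level from an iterable of spectrum dicts.

    Hypothetical helper: each dict is expected to carry an "ms level" key,
    as the dicts yielded by pyteomics.mzml.read do.
    """
    return Counter(spectrum.get("ms level") for spectrum in spectra)


def count_ms_levels_in_file(path):
    """Open an .mzML file with pyteomics and count spectra per MS level."""
    # Imported lazily so count_ms_levels stays usable without pyteomics.
    from pyteomics import mzml

    with mzml.read(path) as reader:
        return count_ms_levels(reader)


if __name__ == "__main__":
    # Stand-in spectra; a real run would call count_ms_levels_in_file(path).
    demo = [{"ms level": 1}, {"ms level": 2}, {"ms level": 2}]
    print(count_ms_levels(demo))
```

A file that reports no level-2 spectra was probably converted with the wrong settings or is not a PRM acquisition.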

Development Checks

Install the development extra, run the tests, then build and validate the distribution:

pip install -e ".[dev]"
python -m pytest
python -m build
python -m twine check dist/*

Quick Start: GUI

Run:

glycan-prmquant gui

In the GUI:

  1. Select one or more .mzML files.
  2. Select an output folder.
  3. Optionally provide custom precursor/structure DB files. Leave blank to use the bundled N_glycan_db.csv.
  4. Set MS1/MS2 tolerances and intensity thresholds.
  5. Set fragment options:
    • Fragment ion series: any combination of A, B, C, X, Y, Z. Default: ABCXYZ.
    • Max cleavages: maximum number of cleavages used during theoretical fragmentation. Default: 2.
  6. Choose output options and run.
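The fragment ion series option in step 5 accepts any combination of the letters A, B, C, X, Y, Z. If you build the value in your own scripts, a small check like the one below can catch typos before a long run. This is an illustrative sketch only: normalize_ion_series is a hypothetical helper, and the package may validate the value differently.

```python
VALID_SERIES = "ABCXYZ"


def normalize_ion_series(series):
    """Return the series upper-cased, de-duplicated, and in A..Z order.

    Hypothetical helper: raises ValueError for an empty string or for any
    letter outside A, B, C, X, Y, Z.
    """
    cleaned = series.strip().upper()
    invalid = sorted(set(cleaned) - set(VALID_SERIES))
    if not cleaned or invalid:
        raise ValueError(f"invalid fragment ion series {series!r}: {invalid}")
    # Keep one copy of each requested letter, in canonical order.
    return "".join(letter for letter in VALID_SERIES if letter in cleaned)


if __name__ == "__main__":
    print(normalize_ion_series("yb"))
```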

You can also launch the GUI as a module:

python -m glycanPRMQuant.pipelineGUI

Quick Start: Command Line

Process one file:

glycan-prmquant run path/to/sample.mzML path/to/output_dir \
  --ppm-ms1-tol 10 \
  --ppm-ms2-tol 10 \
  --mz-tol 0.02 \
  --fragment-ion-series BY \
  --fragment-max-cleavages 2

Process a folder of .mzML files:

glycan-prmquant batch \
  --input-dir path/to/mzml_folder \
  --output-root path/to/results \
  --workers 4

Process specific files:

glycan-prmquant batch \
  --input-files path/to/file1.mzML path/to/file2.mzML \
  --output-root path/to/results \
  --workers 2

Useful CLI flags:

  • --precursor-db-path and --structure-db-path override the bundled N_glycan_db.csv.
  • --skyline-transition writes Skyline transition lists.
  • --disable-smoothing disables chromatogram/AUC smoothing.
  • --quiet shows warnings/errors only.
  • -v and -vv increase logging verbosity.

Quick Start: Single File

from glycanPRMQuant.processmzML import process_mzml_pipeline

process_mzml_pipeline(
    mzml_file="path/to/sample.mzML",
    output_dir="path/to/output_dir",
    ppm_ms1_tol=10,
    mz_min=400,
    mz_max=2000,
    intensity_threshold=1e2,
    ppm_ms2_tol=10,
    mz_tol=0.02,
    fragment_ion_series="BY",
    fragment_max_cleavages=2,
)

Quick Start: Multiple Files

On Windows, multiprocessing starts each worker by re-importing the main module, so keep the call inside the if __name__ == "__main__" guard.

import multiprocessing
from glycanPRMQuant.parallelProcess import run_parallel_pipeline

if __name__ == "__main__":
    multiprocessing.freeze_support()
    run_parallel_pipeline(
        input_files=[
            r"path\to\file1.mzML",
            r"path\to\file2.mzML",
        ],
        output_root=r"path\to\results",
        n_workers=4,
        ppm_ms1_tol=10,
        ppm_ms2_tol=10,
        mz_tol=0.02,
        fragment_ion_series="ABCXYZ",
        fragment_max_cleavages=2,
    )
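When the inputs live in one folder, the input_files list can be built rather than typed out. The sketch below uses only the standard library; find_mzml_files is a hypothetical helper, not a function provided by the package.

```python
from pathlib import Path


def find_mzml_files(folder):
    """Return sorted paths of the .mzML files directly inside folder.

    The extension check is case-insensitive, so .mzml and .MZML also match.
    """
    return sorted(
        str(path)
        for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() == ".mzml"
    )


if __name__ == "__main__":
    # Lists the .mzML files in the current directory, if any.
    for name in find_mzml_files("."):
        print(name)
```

The returned list can be passed as input_files to run_parallel_pipeline in the example above.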

Custom Databases

By default, both MS1 and MS2 use the bundled N_glycan_db.csv.

You can override the database paths:

from glycanPRMQuant.processmzML import process_mzml_pipeline

process_mzml_pipeline(
    mzml_file="path/to/sample.mzML",
    output_dir="path/to/output_dir",
    precursor_db_path="path/to/N_glycan_db.csv",
    structure_db_path="path/to/N_glycan_db.csv",
)

The N-glycan structure database should include:

  • Condensed IUPAC
  • Composition
  • Numerical Composition

matchMS1 groups by Composition and calculates mass once per composition. matchMS2 groups by Numerical Composition and fragments each candidate IUPAC structure for that composition.
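Before pointing the pipeline at a custom database, it may be worth checking that the three required column headers are present. The sketch below is an assumption-labeled convenience, not package code: missing_db_columns is a hypothetical helper that inspects only the header row and does not validate the IUPAC strings or composition values.

```python
import csv

REQUIRED_COLUMNS = ("Condensed IUPAC", "Composition", "Numerical Composition")


def missing_db_columns(csv_path):
    """Return the required column names that are absent from the CSV header."""
    # utf-8-sig tolerates the byte-order mark that spreadsheet exports may add.
    with open(csv_path, newline="", encoding="utf-8-sig") as handle:
        header = next(csv.reader(handle), [])
    present = {name.strip() for name in header}
    return [name for name in REQUIRED_COLUMNS if name not in present]


if __name__ == "__main__":
    import sys

    # Usage: python check_db.py path/to/custom_db.csv
    if len(sys.argv) > 1:
        print(missing_db_columns(sys.argv[1]) or "all required columns present")
```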

Matching Details

MS1

matchMS1 calculates neutral masses from the first parsable IUPAC structure for each unique Composition, then generates precursor adduct m/z values:

  • 2H
  • 3H
  • 4H
  • H+NH4
  • 2NH4
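The adduct m/z values follow from the neutral mass by adding the charge carriers and dividing by the charge. The sketch below illustrates that arithmetic under stated assumptions: the function name adduct_mz is hypothetical, H+NH4 and 2NH4 are treated as doubly charged, and mass_offset and mz_offset are assumed to be added (the package's sign convention is not documented here).

```python
PROTON = 1.007276467   # monoisotopic mass of H+ in Da
AMMONIUM = 18.033826   # monoisotopic mass of NH4+ in Da

# Adduct name -> (total mass of charge carriers, charge state).
ADDUCTS = {
    "2H": (2 * PROTON, 2),
    "3H": (3 * PROTON, 3),
    "4H": (4 * PROTON, 4),
    "H+NH4": (PROTON + AMMONIUM, 2),
    "2NH4": (2 * AMMONIUM, 2),
}


def adduct_mz(neutral_mass, mass_offset=0.0, mz_offset=0.0):
    """Return {adduct: m/z} for one neutral mass.

    Assumption: mass_offset shifts the neutral mass before the adduct
    calculation and mz_offset shifts each resulting m/z value.
    """
    mass = neutral_mass + mass_offset
    return {
        name: (mass + carriers) / charge + mz_offset
        for name, (carriers, charge) in ADDUCTS.items()
    }


if __name__ == "__main__":
    for name, mz in adduct_mz(1000.0).items():
        print(f"{name}: {mz:.4f}")
```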

The output includes:

  • precursor_mz
  • Glycan (the numerical composition ID when available)
  • Adduct
  • database_mz
  • ppm_error
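For reference, a ppm error relates an observed m/z to a database m/z as a relative difference scaled by one million. The following sketch shows the conventional formula and how a ppm tolerance translates to Daltons; the function names are hypothetical and the sign convention (observed minus database) is an assumption about how ppm_error is reported.

```python
def ppm_error(observed_mz, database_mz):
    """Signed relative error in parts per million (observed minus database)."""
    return (observed_mz - database_mz) / database_mz * 1e6


def within_ppm(observed_mz, database_mz, ppm_tol=10.0):
    """True when the absolute ppm error does not exceed ppm_tol."""
    return abs(ppm_error(observed_mz, database_mz)) <= ppm_tol


def ppm_to_da(mz, ppm_tol):
    """Width in Da of a ppm tolerance at a given m/z."""
    return mz * ppm_tol / 1e6


if __name__ == "__main__":
    print(ppm_error(1000.005, 1000.0))
    print(ppm_to_da(1000.0, 10.0))
```

At m/z 1000, a 10 ppm window is 0.01 Da wide on each side, which gives a feel for how ppm_ms1_tol compares with the absolute mz_tol used for fragments.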

MS2

matchMS2 uses the matched numerical composition to find all candidate IUPAC structures, generates theoretical fragments, and matches observed fragments by m/z tolerance. It scores candidate structures by:

  1. Total matched fragment count
  2. Unique matched fragment count
  3. Total matched fragment intensity
  4. Mean absolute ppm error
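One way to read the four criteria above is as a lexicographic ranking: compare match counts first and fall back to the next criterion only on ties. The sketch below implements that reading with plain Python. It is an assumption, not the package's confirmed logic; the function names, the candidate labels, and the fragment names are all hypothetical, and real candidates would come from glypy-based fragmentation.

```python
from statistics import mean


def match_fragments(observed, theoretical, mz_tol=0.02, intensity_threshold=100.0):
    """Pair observed (mz, intensity) peaks with theoretical fragments.

    A peak matches a fragment when it passes the intensity threshold and
    lies within mz_tol (in Da) of the fragment's theoretical m/z.
    """
    matches = []
    for obs_mz, intensity in observed:
        if intensity < intensity_threshold:
            continue
        for name, theo_mz in theoretical.items():
            if abs(obs_mz - theo_mz) <= mz_tol:
                ppm = (obs_mz - theo_mz) / theo_mz * 1e6
                matches.append((name, obs_mz, intensity, ppm))
    return matches


def ranking_key(matches):
    """Sort key: more matches, more unique fragments, more intensity, less error."""
    if not matches:
        return (0, 0, 0.0, float("inf"))
    unique = len({name for name, _, _, _ in matches})
    total_intensity = sum(intensity for _, _, intensity, _ in matches)
    mean_abs_ppm = mean(abs(ppm) for _, _, _, ppm in matches)
    # Negate the "higher is better" terms so min() picks the best candidate.
    return (-len(matches), -unique, -total_intensity, mean_abs_ppm)


def best_candidate(observed, candidates, **match_options):
    """Return the candidate label whose fragments best explain the peaks."""
    return min(
        candidates,
        key=lambda label: ranking_key(
            match_fragments(observed, candidates[label], **match_options)
        ),
    )


if __name__ == "__main__":
    peaks = [(204.087, 500.0), (366.140, 800.0), (528.190, 50.0)]
    structures = {
        "candidate_A": {"frag_1": 204.0867, "frag_2": 366.1395},
        "candidate_B": {"frag_1": 204.0867, "frag_3": 500.0},
    }
    print(best_candidate(peaks, structures))
```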

The returned rows are restricted to the selected best-scoring IUPAC and include:

  • Glycan
  • NumericalComposition
  • Composition
  • IUPAC
  • Fragment
  • FragmentType
  • fragment_mz
  • fragment_intensity
  • Charge
  • Adduct
  • IUPAC_match_count
  • IUPAC_unique_fragments
  • IUPAC_total_intensity

Important Parameters

  • ppm_ms1_tol: precursor matching tolerance in ppm.
  • mz_min, mz_max: precursor m/z search range.
  • mz_offset: offset applied to calculated precursor adduct m/z values.
  • mass_offset: offset applied to neutral masses before precursor adduct calculation.
  • intensity_threshold: minimum MS2 fragment intensity used during extraction and matching.
  • ppm_ms2_tol: tolerance used to associate MS2 scans with matched precursors.
  • mz_tol: fragment m/z tolerance in Da.
  • fragment_ion_series: allowed theoretical fragment ion series. Use any combination of A, B, C, X, Y, Z.
  • fragment_max_cleavages: maximum number of cleavages during theoretical fragmentation.
  • smoothing_window: smoothing strength/window for chromatograms and AUC.
  • smoothing_method: gaussian or savgol.
  • rel_height: AUC boundary relative height.
  • rel_height_mode: prominence or height.
  • skyline_transition: write a Skyline transition list when True.
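To make the role of rel_height concrete, the sketch below integrates a single chromatographic peak with the trapezoidal rule, placing the boundaries where the signal drops below a fraction of the apex height. It is a simplified illustration under assumptions: integrate_peak is a hypothetical function, it mimics only a "height"-style boundary (cutoff = apex × (1 − rel_height)), and it applies no smoothing. The package's calculateAUC module may define boundaries differently.

```python
def integrate_peak(times, intensities, rel_height=0.95):
    """Return (left_index, right_index, auc) for the tallest peak.

    Boundaries extend outward from the apex while neighbouring points stay
    above apex * (1 - rel_height); the area uses the trapezoidal rule.
    """
    apex = max(range(len(intensities)), key=intensities.__getitem__)
    cutoff = intensities[apex] * (1.0 - rel_height)

    left = apex
    while left > 0 and intensities[left - 1] > cutoff:
        left -= 1

    right = apex
    while right < len(intensities) - 1 and intensities[right + 1] > cutoff:
        right += 1

    auc = sum(
        (times[i + 1] - times[i]) * (intensities[i] + intensities[i + 1]) / 2.0
        for i in range(left, right)
    )
    return left, right, auc


if __name__ == "__main__":
    rt = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    signal = [0.0, 1.0, 10.0, 100.0, 10.0, 1.0, 0.0]
    print(integrate_peak(rt, signal, rel_height=0.95))
```

Raising rel_height toward 1.0 widens the integration window toward the baseline; lowering it restricts the area to the core of the peak.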

Outputs

Each sample output directory can include:

  • ms1_results.csv
    Matched precursor assignments.
  • ms2_<glycan>.csv
    Matched MS2 rows for a numerical glycan composition, including selected IUPAC structure information.
  • <sample>_auc_values.csv
    Glycan-level total AUC.
  • <sample>_auc_values_by_adduct.csv
    Per-adduct AUC values.
  • <sample>_skyline_transitions.xlsx
    Optional Skyline transition export.
  • images/*.pdf
    Fragment chromatograms, precursor-adduct chromatograms, total chromatograms, shaded AUC plots, and averaged MS2 spectra.

For multi-file runs:

  • combined_auc_values.csv is written at the output root when more than one file is processed.
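If you post-process results yourself, note that a naive *_auc_values.csv pattern also matches combined_auc_values.csv. The sketch below collects only the per-sample glycan-level AUC tables under an output root. find_sample_auc_files is a hypothetical helper, and searching subfolders recursively is an assumption about where sample output directories sit.

```python
from pathlib import Path


def find_sample_auc_files(output_root):
    """Return sorted per-sample <sample>_auc_values.csv paths under output_root.

    Per-adduct tables end in _auc_values_by_adduct.csv and so never match the
    pattern; the combined table does match and is excluded explicitly.
    """
    return sorted(
        path
        for path in Path(output_root).rglob("*_auc_values.csv")
        if path.name != "combined_auc_values.csv"
    )


if __name__ == "__main__":
    for path in find_sample_auc_files("."):
        print(path)
```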

Notes For Packaging

Default database paths are resolved through glycanPRMQuant.resources, which supports both source-tree execution and PyInstaller-style bundled resources. When building an executable, include glycanPRMQuant/database/ as bundled data.

Data Availability

Development and benchmarking data are available through MassIVE: MSV000101208.

The package is archived on Zenodo: DOI

License

MIT. Requires Python 3.12+.

