Targeted PRM glycomics analysis from mzML data
glycanPRMQuant
glycanPRMQuant is a Python package for targeted PRM glycomics analysis from
.mzML data. It extracts MS2 spectra, matches precursor ions to N-glycan
compositions, generates theoretical fragments from IUPAC structures, resolves
likely structures, plots chromatograms/spectra, and quantifies glycan signal by
AUC.
The package can be run from a Tkinter GUI for batch processing or called programmatically from Python.
What It Does
- Reads vendor-converted .mzML files with pyteomics.
- Matches MS1 precursor m/z values against glycan compositions.
- Calculates precursor neutral masses from the bundled N_glycan_db.csv using glypy, grouped once per Composition.
- Generates theoretical MS2 fragments from each candidate Condensed IUPAC structure for a matched numerical composition.
- Scores candidate IUPAC structures and returns the most likely structure with the numerical composition.
- Supports configurable fragment ion series, maximum cleavage count, m/z tolerances, intensity thresholds, smoothing, and AUC boundary logic.
- Produces per-glycan MS2 CSV files, chromatograms, spectra, AUC tables, and optional Skyline transition lists.
- Runs one file or many files in parallel.
Repository Layout
- glycanPRMQuant/processmzML.py: Single-file end-to-end pipeline: extraction, MS1 matching, MS2 matching, plotting, AUC, and optional Skyline export.
- glycanPRMQuant/parallelProcess.py: Parallel multi-file runner used by the GUI and programmatic batch workflows.
- glycanPRMQuant/pipelineGUI.py: Tkinter GUI for selecting input files, output folder, matching parameters, plotting options, DB overrides, and batch execution.
- glycanPRMQuant/matchMS1.py: Precursor matching. Uses the N-glycan database by default and calculates neutral masses from grouped IUPAC compositions.
- glycanPRMQuant/matchMS2.py: Fragment matching. Generates fragments from IUPAC candidates, matches observed fragments, and selects the best IUPAC structure.
- glycanPRMQuant/fragment_structure.py: glypy-based theoretical glycan fragmentation.
- glycanPRMQuant/calculateAUC.py: Peak picking, integration windows, smoothing, and AUC summarization.
- glycanPRMQuant/plotFragmentIntensity.py and plotMS2spectrum.py: Chromatogram and spectrum plotting utilities.
- glycanPRMQuant/database/N_glycan_db.csv: Default structure database with Condensed IUPAC, Composition, and Numerical Composition columns.
Installation
Install from PyPI. First create a virtual environment:
python -m venv .venv
Activate the environment:
# Windows
.venv\Scripts\activate
# macOS/Linux
source .venv/bin/activate
Install:
python -m pip install --upgrade pip
pip install glycanprmquant
Check the command-line entry point and bundled database:
glycan-prmquant --help
python -c "from glycanPRMQuant.constants import DEFAULT_PRECURSOR_DB; import os; print(os.path.exists(DEFAULT_PRECURSOR_DB), DEFAULT_PRECURSOR_DB)"
The package expects Python >=3.12.
Development Install
For local development, clone the repository and install it in editable mode:
git clone https://github.com/Elquimico09/GlycanPRMQuant.git
cd GlycanPRMQuant
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
On Windows, activate the environment with:
.venv\Scripts\activate
Dependencies
Installed from pyproject.toml:
- numpy
- pandas
- scipy
- matplotlib
- seaborn
- statsmodels
- scikit-learn
- openpyxl
- scienceplots
- pyteomics
- glypy
- lxml
External requirement:
- Input data must be in .mzML format. Convert vendor files with ProteoWizard msconvert before running the pipeline.
Development Checks
Install the development extra, then run the tests and the packaging checks:
pip install -e ".[dev]"
python -m pytest
python -m build
python -m twine check dist/*
Quick Start: GUI
Run:
glycan-prmquant gui
In the GUI:
- Select one or more .mzML files.
- Select an output folder.
- Optionally provide custom precursor/structure DB files. Leave blank to use the bundled N_glycan_db.csv.
- Set MS1/MS2 tolerances and intensity thresholds.
- Set fragment options:
  - Fragment ion series: any combination of A, B, C, X, Y, Z. Default: ABCXYZ.
  - Max cleavages: maximum number of cleavages used during theoretical fragmentation. Default: 2.
- Choose output options and run.
You can also launch the GUI as a module:
python -m glycanPRMQuant.pipelineGUI
Quick Start: Command Line
Process one file:
glycan-prmquant run path/to/sample.mzML path/to/output_dir \
  --ppm-ms1-tol 10 \
  --ppm-ms2-tol 10 \
  --mz-tol 0.02 \
  --fragment-ion-series BY \
  --fragment-max-cleavages 2
Process a folder of .mzML files:
glycan-prmquant batch \
  --input-dir path/to/mzml_folder \
  --output-root path/to/results \
  --workers 4
Process specific files:
glycan-prmquant batch \
  --input-files path/to/file1.mzML path/to/file2.mzML \
  --output-root path/to/results \
  --workers 2
Useful CLI flags:
- --precursor-db-path and --structure-db-path override the bundled N_glycan_db.csv.
- --skyline-transition writes Skyline transition lists.
- --disable-smoothing disables chromatogram/AUC smoothing.
- --quiet shows warnings/errors only.
- -v and -vv increase logging verbosity.
Quick Start: Single File
from glycanPRMQuant.processmzML import process_mzml_pipeline

process_mzml_pipeline(
    mzml_file="path/to/sample.mzML",
    output_dir="path/to/output_dir",
    ppm_ms1_tol=10,
    mz_min=400,
    mz_max=2000,
    intensity_threshold=1e2,
    ppm_ms2_tol=10,
    mz_tol=0.02,
    fragment_ion_series="BY",
    fragment_max_cleavages=2,
)
Quick Start: Multiple Files
On Windows, keep the if __name__ == "__main__" guard for multiprocessing.
import multiprocessing

from glycanPRMQuant.parallelProcess import run_parallel_pipeline

if __name__ == "__main__":
    multiprocessing.freeze_support()
    run_parallel_pipeline(
        input_files=[
            r"path\to\file1.mzML",
            r"path\to\file2.mzML",
        ],
        output_root=r"path\to\results",
        n_workers=4,
        ppm_ms1_tol=10,
        ppm_ms2_tol=10,
        mz_tol=0.02,
        fragment_ion_series="ABCXYZ",
        fragment_max_cleavages=2,
    )
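When the inputs live in one folder, the file list can be collected programmatically instead of typed out. The sketch below is illustrative only: find_mzml_files and run_folder are hypothetical helpers, not part of the package, and the keyword arguments passed to run_parallel_pipeline are simply the ones shown in the example above.

```python
from pathlib import Path


def find_mzml_files(input_dir):
    """Return sorted .mzML paths found directly inside input_dir."""
    folder = Path(input_dir)
    # Compare suffixes case-insensitively so .mzML and .mzml both match.
    return sorted(
        str(p) for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() == ".mzml"
    )


def run_folder(input_dir, output_root, n_workers=4):
    """Hypothetical wrapper: process every .mzML file in one folder."""
    # Imported lazily so find_mzml_files works without the package installed.
    from glycanPRMQuant.parallelProcess import run_parallel_pipeline

    files = find_mzml_files(input_dir)
    if not files:
        raise FileNotFoundError(f"No .mzML files found in {input_dir}")
    run_parallel_pipeline(
        input_files=files,
        output_root=str(output_root),
        n_workers=n_workers,
        ppm_ms1_tol=10,
        ppm_ms2_tol=10,
        mz_tol=0.02,
        fragment_ion_series="ABCXYZ",
        fragment_max_cleavages=2,
    )
```

On Windows, call run_folder from inside the same if __name__ == "__main__" guard.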
Custom Databases
By default, both MS1 and MS2 use the bundled N_glycan_db.csv.
You can override the database paths:
from glycanPRMQuant.processmzML import process_mzml_pipeline

process_mzml_pipeline(
    mzml_file="path/to/sample.mzML",
    output_dir="path/to/output_dir",
    precursor_db_path="path/to/N_glycan_db.csv",
    structure_db_path="path/to/N_glycan_db.csv",
)
The N-glycan structure database should include:
- Condensed IUPAC
- Composition
- Numerical Composition
matchMS1 groups by Composition and calculates mass once per composition.
matchMS2 groups by Numerical Composition and fragments each candidate IUPAC
structure for that composition.
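Before pointing the pipeline at a custom database, it can help to confirm that the three required columns are present. This is a minimal sketch using only the standard library; missing_db_columns is a hypothetical helper, not a package function, and it checks the header row only (it does not validate the IUPAC strings).

```python
import csv

REQUIRED_COLUMNS = ("Condensed IUPAC", "Composition", "Numerical Composition")


def missing_db_columns(csv_path):
    """Return the required column names that are absent from the CSV header."""
    # utf-8-sig strips a byte-order mark if a spreadsheet export added one.
    with open(csv_path, newline="", encoding="utf-8-sig") as handle:
        header = next(csv.reader(handle), [])
    present = {name.strip() for name in header}
    return [name for name in REQUIRED_COLUMNS if name not in present]
```

An empty list means the header has every required column.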
Matching Details
MS1
matchMS1 calculates neutral masses from the first parsable IUPAC structure for
each unique Composition, then generates precursor adduct m/z values:
- 2H
- 3H
- 4H
- H+NH4
- 2NH4
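As a rough illustration of the adduct step, the sketch below converts a neutral mass to m/z for each label. It assumes the labels mean [M+2H]2+, [M+3H]3+, [M+4H]4+, [M+H+NH4]2+, and [M+2NH4]2+; that reading, the constants, and the adduct_mz helper are assumptions for illustration and may differ from what matchMS1 does internally (for example, mz_offset and mass_offset are ignored here).

```python
PROTON = 1.007276467            # mass of H+ in Da
AMMONIUM = 17.026549 + PROTON   # monoisotopic NH3 plus a proton (NH4+)

# Assumed reading of each adduct label: (total added mass, charge).
ADDUCTS = {
    "2H": (2 * PROTON, 2),
    "3H": (3 * PROTON, 3),
    "4H": (4 * PROTON, 4),
    "H+NH4": (PROTON + AMMONIUM, 2),
    "2NH4": (2 * AMMONIUM, 2),
}


def adduct_mz(neutral_mass, adduct):
    """Return m/z = (M + added mass) / charge for one adduct label."""
    added_mass, charge = ADDUCTS[adduct]
    return (neutral_mass + added_mass) / charge


if __name__ == "__main__":
    # Print the m/z of every adduct for an arbitrary example mass.
    for label in ADDUCTS:
        print(label, round(adduct_mz(1000.0, label), 4))
```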
The output includes:
- precursor_mz
- Glycan, using the numerical composition ID when available
- Adduct
- database_mz
- ppm_error
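The relationship between precursor_mz, database_mz, and ppm_error can be sketched as follows. The sign convention (observed minus database) and both function names are assumptions made for illustration; the package may report the error differently.

```python
def ppm_error(observed_mz, database_mz):
    """Signed mass error in parts per million, relative to the database value."""
    return (observed_mz - database_mz) / database_mz * 1e6


def within_tolerance(observed_mz, database_mz, ppm_tol):
    """True when the absolute ppm error does not exceed ppm_tol."""
    return abs(ppm_error(observed_mz, database_mz)) <= ppm_tol
```

With ppm_ms1_tol=10, an observed precursor at m/z 1000.005 would match a database value of 1000.000 (about 5 ppm), while 1000.020 (about 20 ppm) would not.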
MS2
matchMS2 uses the matched numerical composition to find all candidate IUPAC
structures, generates theoretical fragments, and matches observed fragments by
m/z tolerance. It scores candidate structures by:
- Total matched fragment count
- Unique matched fragment count
- Total matched fragment intensity
- Mean absolute ppm error
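One way to combine these four criteria is a lexicographic ranking, sketched below. The priority order, the tie-breaking, and the CandidateScore and pick_best names are assumptions for illustration; matchMS2 may weight or combine the metrics differently.

```python
from dataclasses import dataclass


@dataclass
class CandidateScore:
    iupac: str               # candidate structure label
    match_count: int         # total matched fragments
    unique_fragments: int    # distinct matched fragments
    total_intensity: float   # summed intensity of matched fragments
    mean_abs_ppm: float      # mean absolute ppm error of the matches


def pick_best(candidates):
    """Pick one candidate by comparing the four metrics in order."""
    # Higher counts and intensity win; a lower ppm error wins, so it is negated.
    return max(
        candidates,
        key=lambda c: (
            c.match_count,
            c.unique_fragments,
            c.total_intensity,
            -c.mean_abs_ppm,
        ),
    )
```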
The returned rows are restricted to the selected best-scoring IUPAC and include:
- Glycan
- NumericalComposition
- Composition
- IUPAC
- Fragment
- FragmentType
- fragment_mz
- fragment_intensity
- Charge
- Adduct
- IUPAC_match_count
- IUPAC_unique_fragments
- IUPAC_total_intensity
Important Parameters
- ppm_ms1_tol: precursor matching tolerance in ppm.
- mz_min, mz_max: precursor m/z search range.
- mz_offset: offset applied to calculated precursor adduct m/z values.
- mass_offset: offset applied to neutral masses before precursor adduct calculation.
- intensity_threshold: minimum MS2 fragment intensity used during extraction and matching.
- ppm_ms2_tol: tolerance used to associate MS2 scans with matched precursors.
- mz_tol: fragment m/z tolerance in Da.
- fragment_ion_series: allowed theoretical fragment ion series. Use any combination of A, B, C, X, Y, Z.
- fragment_max_cleavages: maximum number of cleavages during theoretical fragmentation.
- smoothing_window: smoothing strength/window for chromatograms and AUC.
- smoothing_method: gaussian or savgol.
- rel_height: AUC boundary relative height.
- rel_height_mode: prominence or height.
- skyline_transition: write a Skyline transition list when True.
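To make rel_height more concrete, here is a small pure-Python sketch of a height-based integration window followed by trapezoidal integration. It assumes the scipy.signal.peak_widths convention, where the boundary sits at apex height times (1 - rel_height); the trapezoid_auc function is hypothetical, skips smoothing, handles only the tallest peak, and is not the calculateAUC implementation.

```python
def trapezoid_auc(times, intensities, rel_height):
    """Integrate the tallest peak between height-based boundaries.

    Returns (area, left_time, right_time).
    """
    apex = max(range(len(intensities)), key=intensities.__getitem__)
    # Assumed convention: rel_height=1.0 extends to baseline, 0.5 to half height.
    cutoff = intensities[apex] * (1.0 - rel_height)

    # Walk outward from the apex while the signal stays above the cutoff.
    left = apex
    while left > 0 and intensities[left - 1] > cutoff:
        left -= 1
    right = apex
    while right < len(intensities) - 1 and intensities[right + 1] > cutoff:
        right += 1

    # Trapezoidal rule over the selected window.
    area = 0.0
    for i in range(left, right):
        width = times[i + 1] - times[i]
        area += 0.5 * (intensities[i] + intensities[i + 1]) * width
    return area, times[left], times[right]
```

A larger rel_height widens the window toward the baseline, which increases the reported area for tailing peaks.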
Outputs
Each sample output directory can include:
- ms1_results.csv: Matched precursor assignments.
- ms2_<glycan>.csv: Matched MS2 rows for a numerical glycan composition, including selected IUPAC structure information.
- <sample>_auc_values.csv: Glycan-level total AUC.
- <sample>_auc_values_by_adduct.csv: Per-adduct AUC values.
- <sample>_skyline_transitions.xlsx: Optional Skyline transition export.
- images/*.pdf: Fragment chromatograms, precursor-adduct chromatograms, total chromatograms, shaded AUC plots, and averaged MS2 spectra.
For multi-file runs:
- combined_auc_values.csv is written at the output root when more than one file is processed.
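After a run, a quick inventory of one sample output directory can confirm which files were produced. The sketch below relies only on the file name patterns listed above; summarize_outputs is a hypothetical helper, and it makes no assumption about the columns inside each file.

```python
from pathlib import Path


def summarize_outputs(sample_dir):
    """Group files in one sample output directory by the documented patterns."""
    folder = Path(sample_dir)

    def names(pattern):
        return sorted(p.name for p in folder.glob(pattern))

    return {
        "ms1": names("ms1_results.csv"),
        "ms2": names("ms2_*.csv"),
        "auc": names("*_auc_values.csv"),
        "auc_by_adduct": names("*_auc_values_by_adduct.csv"),
        "skyline": names("*_skyline_transitions.xlsx"),
        "plots": names("images/*.pdf"),
    }
```

An empty "ms2" list, for example, suggests that no precursor in that sample produced matched MS2 rows.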
Notes For Packaging
Default database paths are resolved through glycanPRMQuant.resources, which
supports both source-tree execution and PyInstaller-style bundled resources.
When building an executable, include glycanPRMQuant/database/ as bundled data.
Data Availability
Development and benchmarking data are available through MassIVE: MSV000101208.
The package is archived on Zenodo:
License