Package for predicting glycan structure from LC-MS/MS data

CandyCrunch

What is CandyCrunch?

CandyCrunch is a package for predicting glycan structure from LC-MS/MS data. It contains the CandyCrunch model, along with the rest of the inference pipeline and downstream spectrum processing tools. These are further described in our manuscript, Urban et al. (2023), "Predicting glycan structure from tandem mass spectrometry via deep learning", on bioRxiv.

Install CandyCrunch

Development version:

pip install git+https://github.com/BojarLab/CandyCrunch.git

Development version bundled with GlycoDraw:

Note: The operating-system-specific installations for GlycoDraw are still required; read more in the GlycoDraw installation guide.

pip install 'CandyCrunch[draw] @ git+https://github.com/Bojarlab/CandyCrunch'

PyPI:

pip install CandyCrunch

CandyCrunch.ipynb

If you are looking for a convenient and easy-to-run version of the code that does not require any local installations, we have also created a Google Colaboratory notebook.
The notebook contains an example pipeline ready to run, which can be copied, executed, and customised in any way.
The example file included in the notebook is the same as in examples/ and is ready for use in the notebook workflow.

Using CandyCrunch – LC-MS/MS glycan annotation

wrap_inference (in CandyCrunch.prediction)

Wrapper function to predict glycan structures from raw LC-MS/MS spectra using CandyCrunch

Requires at a minimum:

  • a filepath to an mzML/mzXML file or a .xlsx file containing their extracted spectra
  • the glycan class measured ("N", "O", "lipid", "free")
from CandyCrunch.prediction import wrap_inference
annotated_spectra_df = wrap_inference("C:/myfiles/my_spectra.mzML", glycan_class)
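
Before starting a long inference run, it can help to check the two required inputs against the requirements listed above. The following is a minimal sketch, not part of CandyCrunch: the helper name `check_inference_inputs` and both constants are hypothetical, and the accepted values are simply those named in this section.

```python
from pathlib import Path

# Values copied from the requirements listed above; this helper is
# hypothetical and is not part of the CandyCrunch package.
SUPPORTED_EXTENSIONS = {".mzml", ".mzxml", ".xlsx"}
SUPPORTED_GLYCAN_CLASSES = {"N", "O", "lipid", "free"}


def check_inference_inputs(spectra_filepath, glycan_class):
    """Raise ValueError if the inputs do not match the documented requirements."""
    # Compare the extension case-insensitively (".mzML" and ".mzml" both pass).
    extension = Path(spectra_filepath).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported file type: {extension!r}")
    if glycan_class not in SUPPORTED_GLYCAN_CLASSES:
        raise ValueError(f"Unsupported glycan class: {glycan_class!r}")
    return spectra_filepath, glycan_class


# Passes silently for a supported file type and glycan class.
check_inference_inputs("C:/myfiles/my_spectra.mzML", "O")
```

The check only looks at the file extension and the class label; it does not verify that the file exists or that its contents are valid.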

This is what a truncated example of annotated_spectra_df would look like

| | predictions | composition | num_spectra | charge | RT | top_fragments | adduct | evidence |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 384.157 | [('Gal(b1-3)GalNAc', 0.9625)] | {'Hex': 1, 'HexNAc': 1} | 8 | -1 | 6.75 | [204.0202, 222.1731, 156.0888, 179.031, 160.7594, ...] | nan | strong |
| 425.036 | [('GalNAc(a1-3)GalNAc', 0.7947394540942927), ('GlcNAc(b1-3)GalNAc', 0.17965260545905706), ('HexNAc(?1-3)GalNAc', 0.025607940446650122)] | {'HexNAc': 2} | 2 | -1 | 38.88 | [381.005, 389.9802, 406.871, 326.8488, 212.01, ...] | nan | strong |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
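
Each entry in the predictions column shown above is a list of (glycan, confidence) pairs. A minimal sketch of how the most confident candidate could be picked from one such entry follows; the helper `top_prediction` is hypothetical and not part of CandyCrunch, and the example values are copied from the second row of the table.

```python
def top_prediction(predictions):
    """Return the (glycan, confidence) pair with the highest confidence.

    Hypothetical helper; returns None when the list is empty.
    """
    if not predictions:
        return None
    return max(predictions, key=lambda pair: pair[1])


# Values copied from the second row of the example table.
example_predictions = [
    ("GalNAc(a1-3)GalNAc", 0.7947394540942927),
    ("GlcNAc(b1-3)GalNAc", 0.17965260545905706),
    ("HexNAc(?1-3)GalNAc", 0.025607940446650122),
]

best_glycan, best_score = top_prediction(example_predictions)
print(best_glycan, round(best_score, 3))  # GalNAc(a1-3)GalNAc 0.795
```

Assuming the predictions column really holds lists of tuples as displayed, something like `annotated_spectra_df['predictions'].apply(top_prediction)` should give one top candidate per row.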

Using CandyCrumbs – MS2 fragment annotation

CandyCrumbs (in CandyCrunch.analysis)

Wrapper function to annotate MS2 fragments using CandyCrumbs

Requires at a minimum:

  • a hypothesized glycan structure
  • a list of peak m/z values
  • a mass threshold
from CandyCrunch.analysis import CandyCrumbs
condensed_iupac_glycan = 'Gal(a1-3)Gal(b1-4)GlcNAc(b1-6)[GalNAc(b1-4)GlcNAc(b1-3)]Gal(b1-4)Glc'
ms2_fragment_masses = [425.07, 443.07, 546.19, 1216.32]
annotated_fragments_dict = CandyCrumbs(condensed_iupac_glycan, fragment_masses=ms2_fragment_masses, mass_threshold=1)

This is what annotated_fragments_dict would look like

{425.07: {'Theoretical fragment masses': [425.12955],
  'Domon-Costello nomenclatures': [['02A_3_Alpha', 'M_H2O']],
  'Fragment charges': [-1]},
 443.07: {'Theoretical fragment masses': [443.1401],
  'Domon-Costello nomenclatures': [['02A_3_Alpha']],
  'Fragment charges': [-1]},
 546.19: {'Theoretical fragment masses': [546.18775],
  'Domon-Costello nomenclatures': [['Y_3_Beta', 'Y_2_Alpha']],
  'Fragment charges': [-1]},
 1216.32: {'Theoretical fragment masses': [1216.43105],
  'Domon-Costello nomenclatures': [['M_C2H4O2']],
  'Fragment charges': [-1]}}
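
Because the dictionary pairs each observed m/z with its theoretical fragment masses, the mass error of every match can be computed directly from it. The sketch below assumes only the dictionary layout shown above; the helper `fragment_mass_errors` is hypothetical and not part of CandyCrunch, and the example uses two entries copied from that output.

```python
def fragment_mass_errors(annotated_fragments):
    """Map each observed m/z to (theoretical mass, error in Da, error in ppm) tuples.

    Hypothetical helper that only relies on the 'Theoretical fragment masses' key.
    """
    errors = {}
    for observed_mz, annotation in annotated_fragments.items():
        errors[observed_mz] = [
            (
                theoretical,
                observed_mz - theoretical,
                (observed_mz - theoretical) / theoretical * 1e6,
            )
            for theoretical in annotation["Theoretical fragment masses"]
        ]
    return errors


# Two entries copied from the example output above.
example_fragments = {
    425.07: {
        "Theoretical fragment masses": [425.12955],
        "Domon-Costello nomenclatures": [["02A_3_Alpha", "M_H2O"]],
        "Fragment charges": [-1],
    },
    546.19: {
        "Theoretical fragment masses": [546.18775],
        "Domon-Costello nomenclatures": [["Y_3_Beta", "Y_2_Alpha"]],
        "Fragment charges": [-1],
    },
}

for observed_mz, matches in fragment_mass_errors(example_fragments).items():
    for theoretical, delta_da, delta_ppm in matches:
        print(f"{observed_mz} vs {theoretical}: {delta_da:+.4f} Da ({delta_ppm:+.1f} ppm)")
```

Sorting or filtering on the error in Da is one way to see which annotations sit close to the chosen mass threshold.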

It isn't always easy to quickly visualise the Domon-Costello nomenclature. Here is an example of how we can use GlycoDraw to visualise one of the outputs:

from CandyCrunch.analysis import domon_costello_to_fragIUPAC
from glycowork.motif.draw import GlycoDraw  # GlycoDraw is provided by the glycowork package

# This will calculate where on the glycan the fragments occurred and return a valid GlycoDraw input
fragment_iupac = domon_costello_to_fragIUPAC('Gal(a1-3)Gal(b1-4)GlcNAc(b1-6)[GalNAc(b1-4)GlcNAc(b1-3)]Gal(b1-4)Glc', ['Y_3_Beta', 'Y_2_Alpha'])

# Then we can simply draw the result with GlycoDraw
GlycoDraw(fragment_iupac)

Modules

prediction

  • Includes all functions used in wrap_inference.
  • Contains process_mzML_stack and process_mzXML_stack to extract spectra from .mzML and .mzXML files

analysis

  • Includes all functions used in CandyCrumbs.
  • Contains functions to analyze and compare averaged spectra
  • Contains other functions to manipulate glycan string representations, e.g., domon_costello_to_fragIUPAC

model

  • Includes code for model definition, dataset handling, and data augmentation; only used in the back-end

examples

  • Includes the extracted spectra of an example mzML file from Kouka et al. 2022

Citation

If you use CandyCrunch or any of our datasets in your work, please cite Urban et al., bioRxiv 2023.
The data used to train CandyCrunch can be found at Zenodo, under doi:10.5281/zenodo.7940047
