Package for predicting glycan structure from LC-MS/MS data
CandyCrunch
What is CandyCrunch?
CandyCrunch is a package for predicting glycan structure from LC-MS/MS data. It contains the CandyCrunch model, along with the rest of the inference pipeline and downstream spectrum processing tools. These are further described in our manuscript, Urban et al. (2023), "Predicting glycan structure from tandem mass spectrometry via deep learning", on bioRxiv.
Install CandyCrunch
Development version:
pip install git+https://github.com/BojarLab/CandyCrunch.git
Development version bundled with GlycoDraw:
Note: The operating-system-specific installations for GlycoDraw are still required; read more in the GlycoDraw installation guide.
pip install 'CandyCrunch[draw] @ git+https://github.com/Bojarlab/CandyCrunch'
PyPI:
pip install CandyCrunch
CandyCrunch.ipynb
If you are looking for a convenient and easy-to-run version of the code that does not require any local installations, we have also created a Google Colaboratory notebook.
The notebook contains an example pipeline ready to run, which can be copied, executed, and customised in any way.
The example file included in the notebook is the same as in examples/ and is ready for use in the notebook workflow.
Using CandyCrunch – LC-MS/MS glycan annotation
wrap_inference (in CandyCrunch.prediction)
Wrapper function to predict glycan structures from raw LC-MS/MS spectra using CandyCrunch
Requires at a minimum:
- a filepath to an mzML/mzXML file or a .xlsx file containing their extracted spectra
- the glycan class measured ("N", "O", "lipid", "free")
from CandyCrunch.prediction import wrap_inference
annotated_spectra_df = wrap_inference("C:/myfiles/my_spectra.mzML", glycan_class)
This is what a truncated example of annotated_spectra_df would look like:
| predictions | composition | num_spectra | charge | RT | top_fragments | adduct | evidence |
---|---|---|---|---|---|---|---|---|
384.157 | [('Gal(b1-3)GalNAc', 0.9625)] | {'Hex': 1, 'HexNAc': 1} | 8 | -1 | 6.75 | [204.0202, 222.1731, 156.0888, 179.031, 160.7594, ...] | nan | strong |
425.036 | [('GalNAc(a1-3)GalNAc', 0.7947394540942927), ('GlcNAc(b1-3)GalNAc', 0.17965260545905706), ('HexNAc(?1-3)GalNAc', 0.025607940446650122)] | {'HexNAc': 2} | 2 | -1 | 38.88 | [381.005, 389.9802, 406.871, 326.8488, 212.01, ...] | nan | strong |
... | ... | ... | ... | ... | ... | ... | ... | ... |
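Each entry in the predictions column is a list of (glycan, score) tuples, so a common follow-up step is picking the highest-scoring candidate per row. The sketch below does this in plain Python on the two rows shown above; the helper name top_prediction and the dictionary predictions_by_row are illustrative only and are not part of CandyCrunch.

```python
# Two rows copied from the example table above, keyed by the value in the
# first (unlabelled) column. Each candidate is a (glycan, score) tuple,
# as in the `predictions` column.
predictions_by_row = {
    384.157: [("Gal(b1-3)GalNAc", 0.9625)],
    425.036: [
        ("GalNAc(a1-3)GalNAc", 0.7947394540942927),
        ("GlcNAc(b1-3)GalNAc", 0.17965260545905706),
        ("HexNAc(?1-3)GalNAc", 0.025607940446650122),
    ],
}


def top_prediction(candidates):
    """Return the (glycan, score) pair with the highest score, or None if empty."""
    if not candidates:
        return None
    return max(candidates, key=lambda pair: pair[1])


# Hypothetical helper usage: one best candidate per row.
top_hits = {row: top_prediction(c) for row, c in predictions_by_row.items()}

for row, (glycan, score) in top_hits.items():
    print(f"{row}: {glycan} (score {score:.3f})")
```

On a real annotated_spectra_df, the same helper could presumably be applied to the predictions column with pandas, assuming the column holds Python lists as displayed here.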
Using CandyCrumbs – MS2 fragment annotation
CandyCrumbs (in CandyCrunch.analysis)
Wrapper function to annotate MS2 fragments using CandyCrumbs
Requires at a minimum:
- a hypothesized glycan structure
- a list of peak m/z values
- a mass threshold
from CandyCrunch.analysis import CandyCrumbs
condensed_iupac_glycan = 'Gal(a1-3)Gal(b1-4)GlcNAc(b1-6)[GalNAc(b1-4)GlcNAc(b1-3)]Gal(b1-4)Glc'
ms2_fragment_masses = [425.07, 443.07, 546.19, 1216.32]
annotated_fragments_dict = CandyCrumbs(condensed_iupac_glycan, fragment_masses=ms2_fragment_masses, mass_threshold=1)
This is what annotated_fragments_dict would look like:
{425.07: {'Theoretical fragment masses': [425.12955], 'Domon-Costello nomenclatures': [['02A_3_Alpha', 'M_H2O']], 'Fragment charges': [-1]},
 443.07: {'Theoretical fragment masses': [443.1401], 'Domon-Costello nomenclatures': [['02A_3_Alpha']], 'Fragment charges': [-1]},
 546.19: {'Theoretical fragment masses': [546.18775], 'Domon-Costello nomenclatures': [['Y_3_Beta', 'Y_2_Alpha']], 'Fragment charges': [-1]},
 1216.32: {'Theoretical fragment masses': [1216.43105], 'Domon-Costello nomenclatures': [['M_C2H4O2']], 'Fragment charges': [-1]}}
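To judge how closely each annotation matches its peak, the observed m/z (the dictionary key) can be compared with the listed theoretical masses. The following is a minimal sketch in plain Python using the example output above; the function mass_errors is a hypothetical helper, not part of CandyCrunch, and comparing the deviations with mass_threshold=1 assumes that the threshold is an absolute mass difference in the same units as the peaks.

```python
# Example output copied from the text above.
annotated_fragments_dict = {
    425.07: {"Theoretical fragment masses": [425.12955],
             "Domon-Costello nomenclatures": [["02A_3_Alpha", "M_H2O"]],
             "Fragment charges": [-1]},
    443.07: {"Theoretical fragment masses": [443.1401],
             "Domon-Costello nomenclatures": [["02A_3_Alpha"]],
             "Fragment charges": [-1]},
    546.19: {"Theoretical fragment masses": [546.18775],
             "Domon-Costello nomenclatures": [["Y_3_Beta", "Y_2_Alpha"]],
             "Fragment charges": [-1]},
    1216.32: {"Theoretical fragment masses": [1216.43105],
              "Domon-Costello nomenclatures": [["M_C2H4O2"]],
              "Fragment charges": [-1]},
}


def mass_errors(annotations):
    """Map each observed m/z to its smallest absolute deviation
    from the theoretical fragment masses listed for it."""
    errors = {}
    for observed, annotation in annotations.items():
        theoretical = annotation["Theoretical fragment masses"]
        if theoretical:  # skip peaks without any theoretical match
            errors[observed] = min(abs(observed - t) for t in theoretical)
    return errors


errors = mass_errors(annotated_fragments_dict)
worst_peak = max(errors, key=errors.get)  # peak with the largest deviation

for observed, error in errors.items():
    print(f"{observed}: |observed - theoretical| = {error:.5f}")
```

In this example every deviation is well below 1, and the largest one belongs to the peak at 1216.32.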
It isn't always easy to quickly visualise the Domon-Costello nomenclature. Here is an example of how we can use GlycoDraw to visualise one of the outputs:
from CandyCrunch.analysis import domon_costello_to_fragIUPAC
from glycowork.motif.draw import GlycoDraw
# This calculates where on the glycan the fragments occurred and returns a valid GlycoDraw input
fragment_iupac = domon_costello_to_fragIUPAC('Gal(a1-3)Gal(b1-4)GlcNAc(b1-6)[GalNAc(b1-4)GlcNAc(b1-3)]Gal(b1-4)Glc', ['Y_3_Beta', 'Y_2_Alpha'])
# Then we can simply draw the result with GlycoDraw
GlycoDraw(fragment_iupac)
Modules
prediction
- Includes all functions used in wrap_inference.
- Contains process_mzML_stack and process_mzXML_stack to extract spectra from .mzML and .mzXML files
analysis
- Includes all functions used in CandyCrumbs.
- Contains functions to analyze and compare averaged spectra
- Contains other functions to manipulate glycan string representations, e.g., domon_costello_to_fragIUPAC
model
- Includes code for model definition, dataset handling, and data augmentation; only used in the back-end
examples
- Includes the extracted spectra of an example mzML file from Kouka et al. 2022
Citation
If you use CandyCrunch or any of our datasets in your work, please cite Urban et al., bioRxiv 2023.
The data used to train CandyCrunch can be found at Zenodo, under doi:10.5281/zenodo.7940047