Annotate LC-MS1 data, MS imaging data or pseudo MS/MS spectra using reference MS/MS libraries
ms1_id
Full-scan MS data from both LC-MS and MS imaging capture multiple ion forms, including their in/post-source fragments. Here we leverage such fragments to structurally annotate full-scan data from LC-MS or MS imaging by matching against MS/MS spectral libraries.
ms1_id is a Python package that annotates full-scan MS data using tandem MS libraries. Specifically, it can:
- annotate pseudo MS/MS spectra: mgf files
- annotate LC-MS data: mzML or mzXML files
- annotate MS imaging data: imzML and ibd files
- build indexed MS/MS libraries from mgf or msp files (see Flash entropy for more details)
Workflow
Example annotations
Installation
pip install ms1_id
Python 3.9+ is required. It has been tested on macOS (14.6, M2 Max) and Linux (Ubuntu 20.04).
Usage
Note: Indexed libraries are required for the workflow. The indexed GNPS libraries can be downloaded and unpacked as follows:
wget https://github.com/Philipbear/ms1_id/releases/latest/download/indexed_gnps_libs.zip
unzip indexed_gnps_libs.zip -d db
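Before running any workflow, it can be worth confirming that the library files are where the later commands expect them. The following is a minimal sketch, assuming the archive unpacks to the two files referenced in the example commands of this README (`db/gnps.pkl` and `db/gnps_k10.pkl`); the helper name `missing_libraries` is hypothetical and not part of ms1_id.

```python
from pathlib import Path


def missing_libraries(db_dir, names=("gnps.pkl", "gnps_k10.pkl")):
    """Return the expected library file names that are absent from db_dir.

    The default names are taken from the example commands in this README;
    whether the archive contains exactly these files is an assumption.
    """
    db = Path(db_dir)
    return [name for name in names if not (db / name).is_file()]
```

An empty list from `missing_libraries("db")` means both files were found.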
Annotate pseudo MS/MS spectra
If you have pseudo MS/MS spectra in mgf format, you can directly annotate them:
ms1_id annotate --input_file pseudo_msms.mgf --libs db/gnps.pkl db/gnps_k10.pkl --min_score 0.7 --min_matched_peak 3
Here, two indexed libraries are searched, and the resulting tsv files are saved in the same directory as the input file.
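To illustrate how `--min_score` and `--min_matched_peak` act together as filters, here is a minimal sketch. It uses plain cosine similarity with greedy peak matching purely as a stand-in; the package's own scoring may differ, and the m/z tolerance, the inclusive comparisons, and all function names below are assumptions made for illustration.

```python
import math


def match_peaks(query, reference, mz_tol=0.02):
    """Greedily pair query and reference peaks whose m/z differ by <= mz_tol.

    Peaks are (mz, intensity) tuples; each reference peak is used at most once.
    Returns a list of (query_intensity, reference_intensity) pairs.
    """
    pairs = []
    used = set()
    for q_mz, q_int in query:
        best = None
        for i, (r_mz, _) in enumerate(reference):
            if i in used or abs(q_mz - r_mz) > mz_tol:
                continue
            if best is None or abs(q_mz - r_mz) < abs(q_mz - reference[best][0]):
                best = i
        if best is not None:
            used.add(best)
            pairs.append((q_int, reference[best][1]))
    return pairs


def cosine_score(query, reference, mz_tol=0.02):
    """Return (cosine similarity, number of matched peaks)."""
    pairs = match_peaks(query, reference, mz_tol)
    q_norm = math.sqrt(sum(i * i for _, i in query))
    r_norm = math.sqrt(sum(i * i for _, i in reference))
    if not pairs or q_norm == 0 or r_norm == 0:
        return 0.0, len(pairs)
    dot = sum(q * r for q, r in pairs)
    return dot / (q_norm * r_norm), len(pairs)


def passes_thresholds(query, reference, min_score=0.7, min_matched_peak=3):
    """Keep a match only if both the score and the peak count are high enough."""
    score, n_matched = cosine_score(query, reference)
    return score >= min_score and n_matched >= min_matched_peak
```

Requiring a minimum number of matched peaks guards against high scores that rest on one or two dominant fragments.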
For more options, run:
ms1_id annotate --help
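Before annotating, a quick sanity check of the mgf input can save a failed run. The sketch below is a small stand-alone reader for the common `BEGIN IONS` / `END IONS` layout; it is not the parser used by ms1_id, and the function names are hypothetical.

```python
def read_mgf(lines):
    """Parse mgf text lines into a list of {"params": dict, "peaks": list} spectra."""
    spectra = []
    current = None
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line == "BEGIN IONS":
            current = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is not None:
            if "=" in line:
                # Header line such as TITLE=... or PEPMASS=...
                key, value = line.split("=", 1)
                current["params"][key.strip().upper()] = value.strip()
            else:
                # Peak line: m/z and intensity separated by whitespace
                parts = line.split()
                if len(parts) >= 2:
                    current["peaks"].append((float(parts[0]), float(parts[1])))
    return spectra


def spectra_below_peak_count(spectra, min_peaks=3):
    """Return spectra with fewer peaks than min_peaks."""
    return [s for s in spectra if len(s["peaks"]) < min_peaks]
```

With `--min_matched_peak 3`, a spectrum holding fewer than three peaks cannot reach the matched-peak threshold, so such spectra are worth flagging early.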
Annotate LC-MS data
To annotate LC-MS data, here is an example command:
ms1_id lcms --project_dir lc_ms --sample_dir data --ms1_id_libs db/gnps.pkl db/gnps_k10.pkl --ms2_id_lib db/gnps.pkl
Here, lc_ms is the project directory. Raw mzML or mzXML files are stored in the lc_ms/data folder. Both MS1 and MS/MS annotations will be performed, and the results can be accessed from aligned_feature_table.tsv.
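For a first look at the output table, the standard library is enough. This sketch makes no assumption about column names, since they are not listed here; it only reports the header and the number of rows. The helper name `summarize_tsv` is hypothetical.

```python
import csv


def summarize_tsv(handle):
    """Return (column_names, row_count) for a tab-separated table."""
    reader = csv.DictReader(handle, delimiter="\t")
    row_count = sum(1 for _ in reader)
    return list(reader.fieldnames or []), row_count
```

Open aligned_feature_table.tsv from wherever the workflow wrote it inside the project directory (for example with `open(path, newline="")`) and pass the handle to `summarize_tsv`.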
For more options, run:
ms1_id lcms --help
Expected runtime is <3 min for a single LC-MS file. If it takes longer than 10 min, please increase the --mass_detect_int_tol parameter (default: 2e5 for Orbitraps, 5e2 for QTOFs).
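When scripting runs across instruments, it may be convenient to assemble the command programmatically and raise `--mass_detect_int_tol` only where needed. The sketch below uses only the flags shown in this section; the helper name is hypothetical, and passing the tolerance as a plain number is an assumption about how the CLI parses it.

```python
import shlex


def build_lcms_command(project_dir, sample_dir, ms1_id_libs, ms2_id_lib,
                       mass_detect_int_tol=None):
    """Build the `ms1_id lcms` argument list from the flags shown in this README."""
    cmd = [
        "ms1_id", "lcms",
        "--project_dir", project_dir,
        "--sample_dir", sample_dir,
        "--ms1_id_libs", *ms1_id_libs,
        "--ms2_id_lib", ms2_id_lib,
    ]
    if mass_detect_int_tol is not None:
        # Assumption: the flag accepts a plain numeric string such as 400000
        cmd += ["--mass_detect_int_tol", format(mass_detect_int_tol, "g")]
    return cmd


# Example: double the Orbitrap default (2e5) for a slow run
command = build_lcms_command(
    "lc_ms", "data", ["db/gnps.pkl", "db/gnps_k10.pkl"], "db/gnps.pkl",
    mass_detect_int_tol=4e5,
)
print(shlex.join(command))
```

The resulting list can be handed to `subprocess.run(command, check=True)`. A higher tolerance discards more low-intensity signals during mass detection, which is why it shortens the runtime.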
Annotate MS imaging data
To annotate MS imaging data, here is an example command:
ms1_id msi --project_dir msi --libs db/gnps.pkl db/gnps_k10.pkl --n_cores 12
Here, msi is the project directory. Raw imzML and ibd files are stored in the msi folder, and 12 cores will be used for parallel processing. Annotation results can be accessed from ms1_id_annotations_derep.tsv.
For more options, run:
ms1_id msi --help
Expected runtime is <5 min for a single MS imaging dataset.
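Because each MS imaging dataset consists of an imzML file and an ibd file, a missing partner file is an easy mistake to make when copying data. The sketch below lists imzML files without a same-named ibd file in the project directory; it assumes the usual convention that both files share a base name, and the helper name is hypothetical.

```python
from pathlib import Path


def unpaired_imzml(project_dir):
    """Return names of .imzML files that have no same-stem .ibd file beside them."""
    folder = Path(project_dir)
    files = list(folder.iterdir())
    ibd_stems = {p.stem for p in files if p.suffix.lower() == ".ibd"}
    return sorted(
        p.name for p in files
        if p.suffix.lower() == ".imzml" and p.stem not in ibd_stems
    )
```

An empty list from `unpaired_imzml("msi")` means every imzML file has its ibd counterpart.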
Build indexed MS/MS libraries
To build your own indexed library, run:
ms1_id index --ms2db library.msp --peak_scale_k 10 --peak_intensity_power 0.5
For more options, run:
ms1_id index --help
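As a rough intuition for `--peak_intensity_power 0.5`, the sketch below assumes the option raises each peak intensity to the given power, a common transform in spectral matching. This reading of the flag is an assumption and not confirmed here; `--peak_scale_k` is deliberately not modeled, and the function name is hypothetical.

```python
def apply_intensity_power(peaks, power=0.5):
    """Raise each intensity to `power`, then rescale so the largest equals 1.0.

    Peaks are (mz, intensity) tuples with non-negative intensities.
    """
    transformed = [(mz, intensity ** power) for mz, intensity in peaks]
    top = max((i for _, i in transformed), default=0.0)
    if top == 0:
        return transformed
    return [(mz, i / top) for mz, i in transformed]
```

With a power of 0.5, two peaks at a 100:1 intensity ratio end up at 10:1, so weaker fragments contribute more to a match than they would with raw intensities.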
Demo
We provide a demo script to prepare the environment, download libraries, download LC-MS data and run the annotation workflow.
bash run.sh
Citation
Shipei Xing, Vincent Charron-Lamoureux, Yasin El Abiead, Huaxu Yu, Oliver Fiehn, Theodore Alexandrov, Pieter C. Dorrestein. Annotating full-scan MS data using tandem MS libraries. bioRxiv 2024.
Data
- GNPS MS/MS library
  - ALL_GNPS_NO_PROPOGATED.msp, downloaded on July 17, 2024
  - Indexed version available here
- LC-MS data
  - Pooled chemical standards (GNPS/MassIVE MSV000095789)
  - NIST human feces (Q Exactive) (GNPS/MassIVE MSV000095787)
  - IBD dataset (Q Exactive) (original paper, data)
  - Mouse feces (lipidomics, Q-TOF) (GNPS/MassIVE MSV000095868)
  - Mouse bone tissue (lipidomics, Q Exactive) (GNPS/MassIVE MSV000096539)
  - Komagataella phaffii (yeast, Q Exactive) (GNPS/MassIVE MSV000090053)
  - Bacterial isolates (Q Exactive) (GNPS/MassIVE MSV000085024)
  - Sea water DOM (Q Exactive) (GNPS/MassIVE MSV000094338)
  - Foam DOM (Q Exactive) (GNPS/MassIVE MSV000083888)
  - Ocean DOM (Q Exactive) (GNPS/MassIVE MSV000083632)
  - Psychotria plant extracts (Q-TOF) (GNPS/MassIVE MSV000078931)
  - Bidens sulphurea and Bidens gardneri plant extracts (Q-TOF) (GNPS/MassIVE MSV000078727)
- MS imaging data
  - Mouse brain (original paper, data)
  - Mouse body (METASPACE dataset)
  - Hepatocytes (METASPACE dataset)
License
This project is licensed under the Apache 2.0 License (Copyright 2024 Shipei Xing).