
Annotate LC-MS1 data, MS imaging data or pseudo MS/MS spectra using reference MS/MS libraries

Project description

ms1_id


Full-scan MS data from both LC-MS and MS imaging capture multiple ion forms, including their in/post-source fragments. Here we leverage such fragments to structurally annotate full-scan data from LC-MS or MS imaging by matching against MS/MS spectral libraries.

ms1_id is a Python package that annotates full-scan MS data using tandem MS libraries, specifically:

  • annotate pseudo MS/MS spectra: mgf files
  • annotate LC-MS data: mzML or mzXML files
  • annotate MS imaging data: imzML and ibd files
  • build indexed MS/MS libraries from mgf or msp files (see Flash entropy for more details)
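The indexed libraries build on Flash entropy, a fast search method for spectral entropy similarity. The sketch below shows roughly what matching a pseudo MS/MS spectrum against one library spectrum involves. It computes the unweighted entropy similarity (1 minus the normalized Jensen-Shannon divergence) using a naive greedy peak pairing.

This is not ms1_id's scoring code, and ms1_id may use a different variant of the score. The 0.01 Da tolerance, the greedy pairing and all function names are assumptions.

```python
import math
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def _normalize(peaks: List[Peak]) -> List[Peak]:
    """Scale intensities so they sum to 1 (treat the spectrum as a distribution)."""
    total = sum(i for _, i in peaks)
    return [(mz, i / total) for mz, i in peaks]


def _entropy(intensities: List[float]) -> float:
    """Shannon entropy (natural log) of a normalized intensity vector."""
    return -sum(p * math.log(p) for p in intensities if p > 0)


def entropy_similarity(query: List[Peak], ref: List[Peak],
                       tol: float = 0.01) -> Tuple[float, int]:
    """Return (unweighted entropy similarity, number of matched peaks).

    Hypothetical helper: each query peak is paired with the closest unused
    reference peak within `tol` Da (greedy, assumed tolerance).
    """
    q = _normalize(query)
    r = _normalize(ref)
    used = set()
    merged = []  # intensities of the merged spectrum (q + r) / 2
    matched = 0
    for mz_q, i_q in q:
        best = None
        for j, (mz_r, _) in enumerate(r):
            if j in used or abs(mz_q - mz_r) > tol:
                continue
            if best is None or abs(mz_q - mz_r) < abs(mz_q - r[best][0]):
                best = j
        if best is None:
            merged.append(i_q / 2)  # unmatched query peak
        else:
            used.add(best)
            matched += 1
            merged.append((i_q + r[best][1]) / 2)  # matched pair
    # Unmatched reference peaks also enter the merged spectrum.
    merged.extend(i_r / 2 for j, (_, i_r) in enumerate(r) if j not in used)
    s_q = _entropy([i for _, i in q])
    s_r = _entropy([i for _, i in r])
    s_m = _entropy(merged)
    # 1 at identical spectra, 0 when no peaks are shared.
    score = 1 - (2 * s_m - s_q - s_r) / math.log(4)
    return score, matched
```

Example: two spectra with two equally intense peaks each, sharing one of them, score 0.5 with one matched peak.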

Workflow

[Figure: annotation workflow]

Example annotations

[Figure: example annotation]

Installation

pip install ms1_id

Python 3.9+ is required. It has been tested on macOS (14.6, M2 Max) and Linux (Ubuntu 20.04).

Usage

Note: The workflow requires indexed libraries. Prebuilt indexed GNPS libraries can be downloaded from the GitHub releases page:

# For LC-MS data
wget https://github.com/Philipbear/ms1_id/releases/latest/download/gnps.zip
unzip gnps.zip -d db

# For MS imaging data (fragments with m/z < 100 are removed, as they are usually not included in MS imaging data)
wget https://github.com/Philipbear/ms1_id/releases/latest/download/gnps_minmz100.zip
unzip gnps_minmz100.zip -d db
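As noted above, the MS imaging library has all fragments with m/z < 100 removed. The sketch below shows that rule applied to a single library spectrum. It assumes spectra are lists of (m/z, intensity) pairs.

The function name is hypothetical. This is not how the prebuilt files were generated; it only illustrates the cutoff.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def drop_low_mz_fragments(peaks: List[Peak], min_mz: float = 100.0) -> List[Peak]:
    """Keep only fragment peaks with m/z >= min_mz (peaks below m/z 100 are removed)."""
    return [(mz, inten) for mz, inten in peaks if mz >= min_mz]
```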

Annotate pseudo MS/MS spectra

If you have pseudo MS/MS spectra in mgf format, you can directly annotate them:

ms1_id annotate --input_file pseudo_msms.mgf --libs db/gnps.pkl db/gnps_k10.pkl --min_score 0.7 --min_matched_peak 3

Here, the spectra are searched against two indexed libraries, and the resulting TSV files are saved in the same directory as the input file.
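Judging by their names, --min_score and --min_matched_peak act as two thresholds on each candidate library hit. The sketch below shows that filtering on hypothetical match records.

The Match class and its fields are illustrative, not ms1_id's result schema. Inclusive comparisons are an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Match:
    # Hypothetical record for one query-vs-library hit
    query_id: str
    library_id: str
    score: float
    matched_peaks: int


def filter_matches(matches: List[Match], min_score: float = 0.7,
                   min_matched_peak: int = 3) -> List[Match]:
    """Keep hits that pass both thresholds (inclusive comparisons assumed)."""
    return [m for m in matches
            if m.score >= min_score and m.matched_peaks >= min_matched_peak]
```

A high score backed by only two matched peaks is rejected with the values used in the command above. This is why both thresholds matter.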

For more options, run:

ms1_id annotate --help
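ms1_id reads the mgf input itself. For reference, the sketch below parses mgf text into Python dicts to show the format's structure:

- BEGIN IONS / END IONS blocks.
- KEY=VALUE header lines.
- One "m/z intensity" pair per peak line.

read_mgf is a hypothetical helper. Which header fields ms1_id expects for pseudo MS/MS spectra is not specified here.

```python
from typing import Dict, List


def read_mgf(text: str) -> List[Dict]:
    """Parse mgf text into spectra: {'params': {KEY: value}, 'peaks': [(mz, intensity), ...]}."""
    spectra = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        if line == "BEGIN IONS":
            current = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is not None:
            if "=" in line and not line[0].isdigit():
                # Header line such as TITLE=... or PEPMASS=...
                key, value = line.split("=", 1)
                current["params"][key.upper()] = value
            else:
                # Peak line: m/z and intensity separated by whitespace
                mz, intensity = line.split()[:2]
                current["peaks"].append((float(mz), float(intensity)))
    return spectra
```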

Annotate LC-MS data

To annotate LC-MS data, here is an example command:

ms1_id lcms --project_dir lc_ms --sample_dir data --ms1_id_libs db/gnps.pkl db/gnps_k10.pkl --ms2_id_lib db/gnps.pkl

Here, lc_ms is the project directory and data is the sample folder inside it, so the raw mzML or mzXML files are stored in lc_ms/data. Both MS1 and MS/MS annotations are performed: MS1 annotation uses both gnps.pkl and gnps_k10.pkl, and MS/MS annotation uses gnps.pkl. Results are reported in aligned_feature_table.tsv.
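In the example, the raw files are read from <project_dir>/<sample_dir>. A small pre-flight check can confirm this layout before a long run. The check sketched below assumes a flat folder and case-insensitive .mzML/.mzXML extensions; ms1_id's own file discovery may differ.

find_raw_files is a hypothetical helper.

```python
from pathlib import Path
from typing import List


def find_raw_files(project_dir: str, sample_dir: str = "data") -> List[Path]:
    """List mzML/mzXML files directly inside project_dir/sample_dir."""
    folder = Path(project_dir) / sample_dir
    return sorted(p for p in folder.iterdir()
                  if p.is_file() and p.suffix.lower() in {".mzml", ".mzxml"})
```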

For more options, run:

ms1_id lcms --help

Expected runtime is ~5-7 min for a single LC-MS file. If it takes longer than 10 min, please increase the --mass_detect_int_tol parameter (default: 2e5 for Orbitraps, 5e2 for QTOFs).
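The name --mass_detect_int_tol suggests an intensity cutoff applied during MS1 mass detection. That would explain why raising it speeds up processing: fewer centroids survive, so fewer features are built and searched.

The sketch below illustrates this assumed behavior on one centroided scan. It is not ms1_id's implementation, and both the inclusive cutoff and the helper name are assumptions.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity) from one centroided MS1 scan


def detect_masses(scan: List[Peak], int_tol: float) -> List[Peak]:
    """Keep centroids with intensity >= int_tol (hypothetical mass-detection rule)."""
    return [(mz, inten) for mz, inten in scan if inten >= int_tol]
```

Example: on a scan with centroid intensities 1e4, 3e5 and 2e5, the Orbitrap default of 2e5 keeps two centroids, while the QTOF default of 5e2 keeps all three.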


Annotate MS imaging data

To annotate MS imaging data, here is an example command:

ms1_id msi --input_dir msi --libs db/gnps_minmz100.pkl db/gnps_minmz100_k10.pkl --n_cores 12

Here, msi is the input directory containing the imzML and ibd files. Each imzML file in the directory is annotated individually. Both libraries are searched, and 12 cores are used for parallel processing. Annotation results are reported in ms1_id_annotations_derep.tsv.
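In imzML, each dataset consists of an .imzML metadata file and a binary .ibd file with the same base name. Every imzML file in --input_dir is annotated, so a quick check that each one has its .ibd partner can save a failed run.

find_msi_datasets is a hypothetical helper, and the case-sensitive *.imzML pattern is an assumption.

```python
from pathlib import Path
from typing import List, Tuple


def find_msi_datasets(input_dir: str) -> List[Tuple[Path, Path]]:
    """Pair each .imzML file in input_dir with its same-named .ibd binary file."""
    pairs = []
    for imzml in sorted(Path(input_dir).glob("*.imzML")):
        ibd = imzml.with_suffix(".ibd")
        if not ibd.exists():
            raise FileNotFoundError(f"missing binary data file for {imzml.name}: {ibd.name}")
        pairs.append((imzml, ibd))
    return pairs
```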

For more options, run:

ms1_id msi --help

Expected runtime is ~3-20 min for a single MS imaging dataset if at least 12 cores are available.


Build indexed MS/MS libraries

To build your own indexed library, run:

ms1_id index --ms2db library.msp --peak_scale_k 10 --peak_intensity_power 0.5

For more options, run:

ms1_id index --help
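--peak_intensity_power 0.5 in the command above presumably raises each peak intensity to the power 0.5 (a square-root transform) when the library is indexed. This compresses the dynamic range so that a few dominant peaks do not swamp the score.

The sketch below shows that transform followed by normalization to a unit sum, as entropy-based scores require. The normalization step and the helper name are assumptions. --peak_scale_k is not illustrated because its meaning is not described here.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def transform_intensities(peaks: List[Peak], power: float = 0.5) -> List[Peak]:
    """Raise intensities to `power`, then rescale them to sum to 1 (assumed normalization)."""
    scaled = [(mz, inten ** power) for mz, inten in peaks]
    total = sum(i for _, i in scaled)
    return [(mz, i / total) for mz, i in scaled]
```

With power 0.5, a 9:1 intensity ratio between two peaks becomes 3:1.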

Demo

We provide a demo script to prepare the environment, download libraries, download LC-MS data and run the annotation workflow.

bash run.sh

Citation

Shipei Xing, Vincent Charron-Lamoureux, Yasin El Abiead, Huaxu Yu, Oliver Fiehn, Theodore Alexandrov, Pieter C. Dorrestein. Annotating full-scan MS data using tandem MS libraries. bioRxiv 2024.

Data

| Data type | Dataset | Link | Instrument |
|---|---|---|---|
| LC-MS | Pooled chemical standards | MSV000095789 | Q Exactive |
| LC-MS | NIST human feces | MSV000095787 | Q Exactive |
| LC-MS | IBD dataset | PR000639 | Q Exactive |
| LC-MS | Mouse feces (lipidomics) | MSV000095868 | Q-TOF |
| LC-MS | Komagataella phaffii (yeast) | MSV000090053 | Q Exactive |
| LC-MS | Bacterial isolates | MSV000085024 | Q Exactive |
| LC-MS | Odontotaenius disjunctus microbe isolates | MSV000090030 | Q Exactive |
| LC-MS | Environmental fungal strains | MSV000090000 | Q Exactive |
| LC-MS | Sea water DOM | MSV000094338 | Q Exactive |
| LC-MS | Foam DOM | MSV000083888 | Q Exactive |
| LC-MS | Ocean DOM | MSV000083632 | Q Exactive |
| LC-MS | Plant extracts | MSV000090975 | Q Exactive |
| LC-MS | 32 plant species | MSV000090968 | Q Exactive |
| MS imaging | Mouse liver with spotted standards | METASPACE | MALDI-Orbitrap |
| MS imaging | Mouse brain | MTBLS313 | MALDI-FTICR |
| MS imaging | Mouse body | METASPACE | MALDI-FTICR |
| MS imaging | Hepatocytes | METASPACE project | MALDI-Orbitrap |
| MS imaging | Populus trichocarpa root | METASPACE | MALDI-timsTOF |
| MS imaging | Human liver | METASPACE | MALDI-TOF |
| MS imaging | Human kidney | METASPACE | MALDI-timsTOF |
| MS imaging | Mouse kidney | METASPACE | MALDI-FTICR |
| MS imaging | Mouse brain (TOF) | METASPACE | MALDI-TOF |

License

This project is licensed under the Apache 2.0 License (Copyright 2024 Shipei Xing).



Download files


Source Distribution

ms1_id-0.2.1.tar.gz (204.2 kB)

Uploaded Source

Built Distribution


ms1_id-0.2.1-py3-none-any.whl (264.2 kB)

Uploaded Python 3

File details

Details for the file ms1_id-0.2.1.tar.gz.

File metadata

  • Download URL: ms1_id-0.2.1.tar.gz
  • Size: 204.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.8.18

File hashes

Hashes for ms1_id-0.2.1.tar.gz
Algorithm Hash digest
SHA256 3d6fea3ed4e424895eeb375e39e5b3b1340c8a8e9ce133d048673fc9b775a1a6
MD5 e6604223714810d8273c5e050f647bd0
BLAKE2b-256 ee5cdd3f51cb11c057f15b0bcb4c3c56c711da4b737915a0335cd7ba61c60367


File details

Details for the file ms1_id-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: ms1_id-0.2.1-py3-none-any.whl
  • Size: 264.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.8.18

File hashes

Hashes for ms1_id-0.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 7c018bfbbda70937b39e4a71dafb32f81402e7942a5cdc89e9c9044def1cd27b
MD5 f96f4b59b0659b0679c1f13af5fd0e18
BLAKE2b-256 545a50fdf472441bd5af72610af6e288c40b4fc6a578e6b60bddfe4ae26eecc5

