
Deep learning similarity measure for comparing MS/MS spectra.


ms2deepscore

ms2deepscore provides a Siamese neural network that is trained to predict molecular structural similarities (Tanimoto scores) from pairs of mass spectrometry spectra.
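
For readers unfamiliar with the target quantity: the Tanimoto score between two molecular fingerprints is the number of bits set in both, divided by the number of bits set in either. The sketch below illustrates this on toy fingerprints stored as Python sets. The function name and the example bit sets are made up for illustration and are not part of ms2deepscore, which predicts this score from spectra rather than computing it from structures.

```python
def tanimoto(bits_a: set, bits_b: set) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    union = bits_a | bits_b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(bits_a & bits_b) / len(union)


# Toy fingerprints (hypothetical bit positions)
fingerprint_1 = {1, 4, 7, 9}
fingerprint_2 = {1, 4, 8}

# 2 shared bits out of 5 distinct bits
score = tanimoto(fingerprint_1, fingerprint_2)
print(score)
```

A score of 1 means identical fingerprints, and a score of 0 means no bits in common.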

The library provides intuitive classes to prepare data, train a Siamese model, and compute similarities between pairs of spectra.

In addition to the prediction of a structural similarity, MS2DeepScore can also make use of Monte-Carlo dropout to assess the model uncertainty.

Reference

If you use MS2DeepScore for your research, please cite the following:

"MS2DeepScore - a novel deep learning similarity measure to compare tandem mass spectra" Florian Huber, Sven van der Burg, Justin J.J. van der Hooft, Lars Ridder, Journal of Cheminformatics 13, Article number: 84 (2021), doi: https://doi.org/10.1186/s13321-021-00558-4

If you use MS2DeepScore 2.0 or higher, please also cite: "Reliable cross-ion mode chemical similarity prediction between MS2 spectra" Niek de Jonge, David Joas, Lem-Joe Truong, Justin J.J. van der Hooft, Florian Huber, bioRxiv 2024.03.25.586580, doi: https://doi.org/10.1101/2024.03.25.586580

Setup

Requirements

Python 3.9, 3.10, 3.11 (higher will likely work but is not tested systematically).

Installation

Simply install using pip: pip install ms2deepscore

Prepare environment

We recommend creating an Anaconda environment with

conda create --name ms2deepscore python=3.9
conda activate ms2deepscore
pip install ms2deepscore

Alternatively, simply install in the environment of your choice by running pip install ms2deepscore.

Or, to also include the full matchms functionality, including rdkit:

conda create --name ms2deepscore python=3.9
conda activate ms2deepscore
pip install ms2deepscore[chemistry]

Or, via conda:

conda create --name ms2deepscore python=3.9
conda activate ms2deepscore
conda install --channel bioconda --channel conda-forge matchms
pip install ms2deepscore
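
After any of the installation routes above, a quick check from Python can confirm which versions ended up in the active environment. This is a minimal sketch using only the standard library; the helper name installed_version is hypothetical and not part of ms2deepscore or matchms.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report the interpreter version and the two packages this README relies on
print("Python:", sys.version.split()[0])
for name in ("ms2deepscore", "matchms"):
    print(name, "->", installed_version(name) or "not installed")
```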

Getting started: How to prepare data, train a model, and compute similarities.

See notebooks/MS2DeepScore_tutorial.ipynb for a more extensive, fully working example on test data. If you are not familiar with matchms yet, we also recommend our tutorial on how to get started using matchms.

There are two ways to use MS2DeepScore to compute spectral similarities. You can train a new model on a dataset of your choice; that dataset should preferably contain a substantial number of spectra so the model can learn relevant features, say > 10,000 spectra of sufficiently diverse types. The second way is much simpler: use a model that was pretrained on a large dataset.

1) Use a pretrained model to compute spectral similarities

We provide a model that was trained on > 200,000 MS/MS spectra from GNPS, which can be downloaded from Zenodo. Only the ms2deepscore_model.pt file is needed. To then compute the similarities between spectra of your choice, you can run something like:

from matchms import calculate_scores
from matchms.importing import load_from_msp
from ms2deepscore import MS2DeepScore
from ms2deepscore.models import load_model

# Import data
references = load_from_msp("my_reference_spectra.msp")
queries = load_from_msp("my_query_spectra.msp")

# Load pretrained model
model = load_model("ms2deepscore_model.pt")

similarity_measure = MS2DeepScore(model)
# Calculate scores and get matchms.Scores object
scores = calculate_scores(references, queries, similarity_measure)

If you want to calculate all-vs-all spectral similarities, e.g. to build a network, then you can run:

scores = calculate_scores(references, references, similarity_measure, is_symmetric=True)
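
One way to turn all-vs-all similarities into a network is to keep only the pairs whose score reaches a cutoff. The sketch below works on a plain nested list standing in for the symmetric score matrix. The function name, the cutoff of 0.8, and the toy values are illustrative assumptions, and extracting the matrix from the matchms Scores object is not shown here.

```python
def edges_above_cutoff(score_matrix, cutoff):
    """List (i, j, score) for i < j where the similarity reaches the cutoff.

    Only the upper triangle is visited because the matrix is symmetric,
    which also skips the trivial self-matches on the diagonal.
    """
    edges = []
    n = len(score_matrix)
    for i in range(n):
        for j in range(i + 1, n):
            if score_matrix[i][j] >= cutoff:
                edges.append((i, j, score_matrix[i][j]))
    return edges


# Toy symmetric similarity matrix for three spectra (made-up values)
toy_scores = [
    [1.00, 0.85, 0.10],
    [0.85, 1.00, 0.92],
    [0.10, 0.92, 1.00],
]
edges = edges_above_cutoff(toy_scores, cutoff=0.8)
print(edges)
```

The resulting edge list can be passed to a graph library of your choice; a suitable cutoff depends on the data and should be chosen with care.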

To use Monte-Carlo dropout to also get an uncertainty measure with each score, run the following:

from matchms import calculate_scores
from matchms.importing import load_from_msp
from ms2deepscore import MS2DeepScoreMonteCarlo
from ms2deepscore.models import load_model

# Import data
references = load_from_msp("my_reference_spectra.msp")
queries = load_from_msp("my_query_spectra.msp")

# Load pretrained model
model = load_model("ms2deepscore_model.pt")

similarity_measure = MS2DeepScoreMonteCarlo(model, n_ensembles=10)
# Calculate scores and get matchms.Scores object
scores = calculate_scores(references, queries, similarity_measure)

In that scenario, scores["score"] contains the similarity scores (median of the ensemble of 10x10 scores) and scores["uncertainty"] gives an uncertainty estimate (interquartile range of the ensemble of 10x10 scores).
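
To make the "median and interquartile range of an ensemble" idea concrete, the sketch below computes both from toy embeddings. It assumes that each spectrum yields one embedding per dropout pass and that the pairwise score is the cosine similarity between embeddings; the helper names, the two-pass ensembles, and the two-dimensional vectors are hypothetical stand-ins, and this is not the library's implementation.

```python
import math
import statistics


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ensemble_score(embeddings_a, embeddings_b):
    """Median and interquartile range over all pairwise similarities."""
    sims = [cosine(u, v) for u in embeddings_a for v in embeddings_b]
    q1, _, q3 = statistics.quantiles(sims, n=4, method="inclusive")
    return statistics.median(sims), q3 - q1


# Toy stand-ins: 2 dropout passes per spectrum, giving 2x2 = 4 pairwise scores
emb_a = [[1.0, 0.0], [0.8, 0.6]]
emb_b = [[1.0, 0.0], [0.6, 0.8]]
score, uncertainty = ensemble_score(emb_a, emb_b)
print(score, uncertainty)
```

With n_ensembles=10 the same calculation would run over 10x10 = 100 pairwise scores per spectrum pair. A wide interquartile range indicates that the prediction changes a lot between dropout passes.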

2) Train your own MS2DeepScore model

Training your own model is only recommended if you have some familiarity with machine learning. To train your own model you can run the code below. Please make sure to clean your spectra first; we recommend using the cleaning pipeline in matchms.

from ms2deepscore.SettingsMS2Deepscore import SettingsMS2Deepscore
from ms2deepscore.wrapper_functions.training_wrapper_functions import (
    train_ms2deepscore_wrapper,
)

# Model architecture and training settings
settings = SettingsMS2Deepscore(
    epochs=300,
    base_dims=(1000, 1000, 1000),
    embedding_dim=500,
    ionisation_mode="positive",
    batch_size=32,
    learning_rate=0.00025,
    patience=30,
)

train_ms2deepscore_wrapper(
    spectra_file_path="path/to/your/spectra_file",  # placeholder: add your path
    model_settings=settings,
    validation_split_fraction=20,
)

Contributing

We welcome contributions to the development of ms2deepscore! Have a look at the contribution guidelines.

