Tools for mass spectrometry data analysis

This project has been archived.

The maintainers of this project have marked this project as archived. No new releases are expected.

ms-toolkit

Tools for mass spectrometry (MS) library searching and model training.

This library provides a pipeline for vectorizing spectra, training Word2Vec models, preselecting candidates with KMeans clustering or Gaussian Mixture Models (GMM), and searching by weighted cosine similarity or embedding similarity. Portions of the code are adapted from the Spec2Vec project.
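
To make the weighted cosine step concrete, here is a minimal standalone sketch; it is not the code in similarity.py. It assumes a spectrum is a list of (m/z, intensity) pairs, matches peaks on integer-rounded m/z, and weights each peak as (m/z ** mz_power) * (intensity ** intensity_power). The function name, the exponent defaults, and the rounding rule are illustrative assumptions.

```python
import math


def weighted_cosine(spec_a, spec_b, mz_power=1.0, intensity_power=0.5):
    """Cosine similarity between two spectra after peak weighting.

    Hypothetical helper: the exponents and the integer m/z binning are
    assumptions made for illustration, not values taken from ms-toolkit.
    """
    def to_weights(spectrum):
        # Map integer-rounded m/z -> summed weight of the peaks in that bin.
        weights = {}
        for mz, intensity in spectrum:
            key = round(mz)
            w = (mz ** mz_power) * (intensity ** intensity_power)
            weights[key] = weights.get(key, 0.0) + w
        return weights

    wa, wb = to_weights(spec_a), to_weights(spec_b)
    # Only bins present in both spectra contribute to the dot product.
    dot = sum(w * wb[k] for k, w in wa.items() if k in wb)
    norm_a = math.sqrt(sum(w * w for w in wa.values()))
    norm_b = math.sqrt(sum(w * w for w in wb.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty spectrum matches nothing
    return dot / (norm_a * norm_b)


query = [(100, 0.5), (150, 1.0), (200, 0.8)]
reference = [(100, 0.4), (150, 1.0), (210, 0.3)]
print(weighted_cosine(query, query))
print(weighted_cosine(query, reference))
```

Raising the m/z exponent gives more influence to heavier fragments, while an intensity exponent below 1 keeps the base peak from dominating the score.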

Features

  • Parse MS library text files with optional progress UI
  • Create SpectrumDocument objects for Word2Vec training
  • Train and load Word2Vec models (w2v.py)
  • Vectorize spectra and perform similarity search (preprocessing.py, similarity.py)
  • Preselect candidates using KMeans or Gaussian Mixture Models (preselector.py)
  • High-level MSToolkit facade wrapping the workflow (api.py)
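
As an illustration of the document-creation step, the sketch below turns a spectrum into a list of peak "words" in the style used by Spec2Vec (for example peak@100.00). It is a standalone approximation, not the SpectrumDocument class from this package; the function name and its parameters are hypothetical.

```python
def spectrum_to_words(spectrum, n_decimals=2, min_intensity=0.0):
    """Convert (m/z, intensity) pairs into Spec2Vec-style peak tokens.

    Hypothetical helper: n_decimals controls m/z binning and min_intensity
    drops weak peaks; both defaults are assumptions for illustration.
    """
    words = []
    for mz, intensity in sorted(spectrum):  # order tokens by ascending m/z
        if intensity > min_intensity:
            words.append(f"peak@{mz:.{n_decimals}f}")
    return words


spectrum = [(150, 1.0), (100, 0.5), (200, 0.8)]
print(spectrum_to_words(spectrum))
print(spectrum_to_words(spectrum, n_decimals=0, min_intensity=0.6))
```

A collection of such token lists, one per library spectrum, has the shape that gensim's Word2Vec accepts as training sentences.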

Installation

From the root of a source checkout, install with pip using the included setup.py:

pip install .

Dependencies include numpy, joblib, gensim, and scikit-learn. Optional UI features require customtkinter or PySide6.

Usage Example

from ms_toolkit.api import MSToolkit

# Initialize toolkit
ms = MSToolkit(library_txt="NIST14.txt", cache_json="library.json")

# Load library (shows progress UI by default)
ms.load_library()

# Vectorize and train models
ms.vectorize_library()
ms.train_preselector()
ms.train_w2v()

# Search using a query spectrum
query = [(100, 0.5), (150, 1.0), (200, 0.8)]
results = ms.search_w2v(query)
for compound, score in results:
    print(compound, score)
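
The loop above consumes (compound, score) pairs. As a rough sketch of how an embedding search could produce such pairs, the following ranks library entries by cosine similarity to a query vector. It assumes each spectrum has already been reduced to a fixed-length embedding; rank_candidates, the compound names, and the example vectors are all hypothetical, and this is not the implementation behind search_w2v.

```python
import math


def cosine(u, v):
    """Plain cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_candidates(query_vec, library_vecs, top_n=3):
    """Return the top_n (compound, score) pairs, best match first."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in library_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


# Toy embeddings standing in for vectors derived from a trained model.
library_vecs = {
    "compound_a": [1.0, 0.0, 0.0],
    "compound_b": [0.6, 0.8, 0.0],
    "compound_c": [0.0, 0.0, 1.0],
}
for compound, score in rank_candidates([1.0, 0.1, 0.0], library_vecs, top_n=2):
    print(compound, round(score, 3))
```

In a preselected search, library_vecs would contain only the candidates that survive the clustering step, which keeps the ranking pass small.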

License

This project is licensed under the Apache License 2.0. See LICENSE for details. The NOTICE file explains that some code derives from Spec2Vec, which is also Apache 2.0 licensed.

Download files

Source distribution: ms_toolkit_nrel-0.1.1.tar.gz (22.5 kB)

  • Tags: Source
  • Uploaded using Trusted Publishing: Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9
  • Publisher: python-publish.yml on calebcoatney/ms-toolkit
  • SHA256: 11b4d82042f5f437746bcc9bb4774868f6293fcf020bc44cc93c78b6a333b0e7
  • MD5: 554e2fefd0a2f8bef7d19bd1b0a3b1db
  • BLAKE2b-256: f4c087a40409b58c0887ce82db2ceedffdda9c174ba16df935d4132ac87f286e

Built distribution: ms_toolkit_nrel-0.1.1-py3-none-any.whl (24.4 kB, Python 3)

  • Publisher: python-publish.yml on calebcoatney/ms-toolkit
  • SHA256: 68133000b827c2f6a4d0f935b24642c19945d3728a35d7d457d7b179bffc614e
  • MD5: f4637ed723a75893bcbcfbbaafe07993
  • BLAKE2b-256: 94a41265922eaca4edbaa00ec378592e5cad905e61349b381f0ad3b883af2dd7