Tools for mass spectrometry data analysis

This project has been archived by its maintainers. No new releases are expected.

ms-toolkit

Tools for mass spectrometry (MS) library searching and model training.

This library provides a pipeline for vectorizing spectra, training Word2Vec models, preselecting candidates using clustering/GMM, and searching using weighted cosine or embedding similarity. Portions of the code are adapted from the Spec2Vec project.
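To make the "weighted cosine" step concrete, here is a minimal standard-library sketch of one common formulation: each peak is weighted by a power of its m/z and a power of its intensity before the cosine is taken. The function name, the integer binning, and the default exponents are illustrative assumptions, not the scheme implemented in similarity.py.

```python
import math


def weighted_cosine(spec_a, spec_b, mz_power=1.0, intensity_power=0.5):
    """Cosine similarity between two spectra after weighting each peak.

    Each spectrum is a list of (m/z, intensity) tuples. Peaks are binned by
    rounding m/z to an integer, and each peak contributes
    (m/z ** mz_power) * (intensity ** intensity_power) to its bin.
    The exponents are illustrative defaults, not values taken from ms-toolkit.
    """
    def to_weighted_bins(spectrum):
        bins = {}
        for mz, intensity in spectrum:
            key = round(mz)
            weight = (mz ** mz_power) * (intensity ** intensity_power)
            bins[key] = bins.get(key, 0.0) + weight
        return bins

    a = to_weighted_bins(spec_a)
    b = to_weighted_bins(spec_b)

    # Only bins present in both spectra contribute to the dot product.
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Weighting by m/z gives more influence to heavier fragments, which tend to be more specific to a compound, while the fractional intensity exponent damps the dominance of the base peak.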

Features

  • Parse MS library text files with optional progress UI
  • Create SpectrumDocument objects for Word2Vec training
  • Train and load Word2Vec models (w2v.py)
  • Vectorize spectra and perform similarity search (preprocessing.py, similarity.py)
  • Preselect candidates using KMeans or Gaussian Mixture Models (preselector.py)
  • High-level MSToolkit facade wrapping the workflow (api.py)
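Word2Vec training needs each spectrum expressed as a "sentence" of tokens. The sketch below follows the Spec2Vec convention of naming each peak `peak@<m/z>`; the helper `spectrum_to_words` is hypothetical, and the actual SpectrumDocument class in ms-toolkit may tokenize differently.

```python
def spectrum_to_words(peaks, n_decimals=0):
    """Turn (m/z, intensity) peaks into word tokens such as 'peak@150'.

    Tokens are emitted in ascending m/z order. Intensities are ignored here;
    this hypothetical helper only builds the vocabulary entries.
    """
    return [f"peak@{mz:.{n_decimals}f}" for mz, _intensity in sorted(peaks)]


query = [(150, 1.0), (100, 0.5), (200, 0.8)]
print(spectrum_to_words(query))
# ['peak@100', 'peak@150', 'peak@200']
```

Rounding m/z to a fixed number of decimals keeps the vocabulary small enough that the same fragment seen in different spectra maps to the same token.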

Installation

From the root of a source checkout, install with pip using the included setup.py:

pip install .

Dependencies include numpy, joblib, gensim, and scikit-learn. Optional UI features require customtkinter or PySide6.
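Because the UI packages are optional, it can be useful to check which of them are importable before requesting a progress UI. This is a small sketch using only the standard library; the function name is hypothetical and is not part of ms-toolkit.

```python
import importlib.util


def available_ui_backends(candidates=("customtkinter", "PySide6")):
    """Return the candidate UI packages that can be imported in this environment."""
    return [name for name in candidates
            if importlib.util.find_spec(name) is not None]


print(available_ui_backends())
```

An empty list means neither optional package is installed.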

Usage Example

from ms_toolkit.api import MSToolkit

# Initialize toolkit
ms = MSToolkit(library_txt="NIST14.txt", cache_json="library.json")

# Load library (shows progress UI by default)
ms.load_library()

# Vectorize and train models
ms.vectorize_library()
ms.train_preselector()
ms.train_w2v()

# Search using a query spectrum
query = [(100, 0.5), (150, 1.0), (200, 0.8)]
results = ms.search_w2v(query)
for compound, score in results:
    print(compound, score)
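The query above is a list of (m/z, intensity) tuples whose largest intensity is 1.0. If your raw data uses absolute counts, a helper along these lines can bring it into that shape. Whether MSToolkit requires base-peak normalization is an assumption here, and `normalize_spectrum` is a hypothetical helper rather than part of the package.

```python
def normalize_spectrum(peaks):
    """Sort peaks by m/z and scale intensities so the base peak equals 1.0."""
    if not peaks:
        return []
    base = max(intensity for _mz, intensity in peaks)
    if base <= 0:
        raise ValueError("spectrum has no positive intensities")
    return [(mz, intensity / base) for mz, intensity in sorted(peaks)]


raw = [(200, 400.0), (100, 250.0), (150, 500.0)]
print(normalize_spectrum(raw))
# [(100, 0.5), (150, 1.0), (200, 0.8)]
```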

License

This project is licensed under the Apache License 2.0. See LICENSE for details. The NOTICE file explains that some code derives from Spec2Vec, which is also Apache 2.0 licensed.

Download files

Source Distribution

ms_toolkit_nrel-0.1.0.tar.gz (20.6 kB)

Uploaded Source

Built Distribution

ms_toolkit_nrel-0.1.0-py3-none-any.whl (22.4 kB)

Uploaded Python 3

File details

Details for the file ms_toolkit_nrel-0.1.0.tar.gz.

File metadata

  • Download URL: ms_toolkit_nrel-0.1.0.tar.gz
  • Size: 20.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for ms_toolkit_nrel-0.1.0.tar.gz:

  • SHA256: 14582e06c32b9019b6106e94d56617b85ced60b21503fc17395469b6cc4daeb6
  • MD5: 51cdea4305fda66661681bd03d09277d
  • BLAKE2b-256: 79fc657ebc4fc90ccd08da69e0a2eaca86aba8f0126f136742e5ae354fa753d6
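
A downloaded file can be checked against its published SHA256 digest with Python's standard hashlib module. The sketch below is one way to do it; the function names are illustrative, and the commented usage assumes the source distribution has been saved to the current directory.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with an expected hex string (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Example usage (assumes the file exists in the current directory):
# matches_published_hash(
#     "ms_toolkit_nrel-0.1.0.tar.gz",
#     "14582e06c32b9019b6106e94d56617b85ced60b21503fc17395469b6cc4daeb6",
# )
```

A result of False means the file differs from the one that was published and should not be installed.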

Provenance

The following attestation bundles were made for ms_toolkit_nrel-0.1.0.tar.gz:

Publisher: python-publish.yml on calebcoatney/ms-toolkit

File details

Details for the file ms_toolkit_nrel-0.1.0-py3-none-any.whl.

File hashes

Hashes for ms_toolkit_nrel-0.1.0-py3-none-any.whl:

  • SHA256: 257aa149a74a7eda33324729dad7ac9b1e039807e2ed0c68bd11613fbf1b669d
  • MD5: f321a53a539ad89cc2ea74814f56d658
  • BLAKE2b-256: d1c37d4e3c10a089943664345db055096fe91d1a7dc0c086455bdd8e1311f99a

Provenance

The following attestation bundles were made for ms_toolkit_nrel-0.1.0-py3-none-any.whl:

Publisher: python-publish.yml on calebcoatney/ms-toolkit
