Tools for mass spectrometry data analysis
This project has been archived by its maintainers. No new releases are expected.
ms-toolkit
Tools for mass spectrometry (MS) library searching and model training.
This library provides a pipeline for vectorizing spectra, training Word2Vec models, preselecting candidates with KMeans clustering or Gaussian Mixture Models (GMM), and searching by weighted cosine or embedding similarity. Portions of the code are adapted from the Spec2Vec project.
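To make the weighted cosine step concrete, here is a minimal sketch in plain Python. It is not the code in `similarity.py`: the function name `weighted_cosine`, the exponents `mz_power` and `intensity_power`, and the choice to match peaks by m/z rounded to the nearest integer are all assumptions made for illustration.

```python
import math


def weighted_cosine(spec_a, spec_b, mz_power=1.0, intensity_power=0.5):
    """Cosine similarity between two peak lists after weighting each peak.

    Each spectrum is a list of (m/z, intensity) pairs. A peak's weight is
    (m/z ** mz_power) * (intensity ** intensity_power). Peaks are matched by
    m/z rounded to the nearest integer (an assumption for this sketch).
    """
    def weighted(spec):
        vec = {}
        for mz, intensity in spec:
            key = round(mz)
            weight = (mz ** mz_power) * (intensity ** intensity_power)
            vec[key] = vec.get(key, 0.0) + weight
        return vec

    a, b = weighted(spec_a), weighted(spec_b)
    dot = sum(w * b[k] for k, w in a.items() if k in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty or all-zero spectrum matches nothing
    return dot / (norm_a * norm_b)


query = [(100, 0.5), (150, 1.0), (200, 0.8)]
reference = [(100, 0.4), (150, 1.0), (210, 0.3)]  # made-up comparison spectrum
score_self = weighted_cosine(query, query)
score_other = weighted_cosine(query, reference)
print(round(score_self, 3), round(score_other, 3))
```

Raising `mz_power` gives more influence to high-mass peaks, while an `intensity_power` below 1 dampens the dominance of the base peak.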
Features
- Parse MS library text files with optional progress UI
- Create `SpectrumDocument` objects for Word2Vec training
- Train and load Word2Vec models (`w2v.py`)
- Vectorize spectra and perform similarity search (`preprocessing.py`, `similarity.py`)
- Preselect candidates using KMeans or Gaussian Mixture Models (`preselector.py`)
- High-level `MSToolkit` facade wrapping the workflow (`api.py`)
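Word2Vec trains on "documents" made of tokens, so each spectrum has to be turned into a list of words first. The sketch below follows the Spec2Vec convention of `peak@<m/z>` tokens. Whether `SpectrumDocument` in this package uses the same token format is an assumption, and `peaks_to_words` with its parameters is a hypothetical helper, not part of the library.

```python
def peaks_to_words(peaks, n_decimals=0, min_intensity=0.0):
    """Turn (m/z, intensity) pairs into Spec2Vec-style 'peak@<m/z>' tokens."""
    words = []
    for mz, intensity in sorted(peaks):
        # Keep only peaks above the intensity threshold (zero-intensity peaks are dropped).
        if intensity > min_intensity:
            words.append(f"peak@{mz:.{n_decimals}f}")
    return words


document = peaks_to_words([(150, 1.0), (100, 0.5), (200, 0.8)])
print(document)  # ['peak@100', 'peak@150', 'peak@200']
```

A list of such documents, one per library spectrum, is the kind of input a Word2Vec trainer expects.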
Installation
Install with pip using the included setup.py:
```shell
pip install .
```
Dependencies include numpy, joblib, gensim, and scikit-learn. Optional UI
features require customtkinter or PySide6.
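Because the UI packages are optional, it can help to check which of them is importable before relying on the progress UI. The following is a small sketch using only the standard library; `available_ui_backends` is a hypothetical helper and not part of ms-toolkit.

```python
from importlib.util import find_spec


def available_ui_backends(candidates=("customtkinter", "PySide6")):
    """Return the optional UI packages that can be imported in this environment."""
    return [name for name in candidates if find_spec(name) is not None]


backends = available_ui_backends()
print("UI backends found:", backends or "none")
```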
Usage Example
```python
from ms_toolkit.api import MSToolkit

# Initialize toolkit
ms = MSToolkit(library_txt="NIST14.txt", cache_json="library.json")

# Load library (shows progress UI by default)
ms.load_library()

# Vectorize and train models
ms.vectorize_library()
ms.train_preselector()
ms.train_w2v()

# Search using a query spectrum given as (m/z, intensity) pairs
query = [(100, 0.5), (150, 1.0), (200, 0.8)]
results = ms.search_w2v(query)
for compound, score in results:
    print(compound, score)
```
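The query above lists (m/z, intensity) pairs whose largest intensity is 1.0. If your instrument exports raw counts, one way to reach that shape is to scale by the base peak. This is a sketch only: `normalize_peaks` is a hypothetical helper, and whether the toolkit requires or performs this normalization itself is an assumption.

```python
def normalize_peaks(peaks):
    """Scale intensities so the most intense (base) peak equals 1.0."""
    if not peaks:
        return []
    base = max(intensity for _, intensity in peaks)
    if base <= 0:
        raise ValueError("spectrum has no positive intensities")
    # Sort by m/z so the output order is predictable.
    return [(mz, intensity / base) for mz, intensity in sorted(peaks)]


raw = [(150, 2000.0), (100, 1000.0), (200, 1600.0)]  # made-up raw counts
query = normalize_peaks(raw)
print(query)  # [(100, 0.5), (150, 1.0), (200, 0.8)]
```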
License
This project is licensed under the Apache License 2.0. See LICENSE for details.
The NOTICE file explains that some code derives from Spec2Vec, which is also
Apache 2.0 licensed.
File details
Details for the file ms_toolkit_nrel-0.1.0.tar.gz.
File metadata
- Download URL: ms_toolkit_nrel-0.1.0.tar.gz
- Upload date:
- Size: 20.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 14582e06c32b9019b6106e94d56617b85ced60b21503fc17395469b6cc4daeb6 |
| MD5 | 51cdea4305fda66661681bd03d09277d |
| BLAKE2b-256 | 79fc657ebc4fc90ccd08da69e0a2eaca86aba8f0126f136742e5ae354fa753d6 |
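A published digest is only useful if you compare it against the file you actually downloaded. The sketch below does that for SHA256 with the standard library. The helper names are hypothetical, and the archive is assumed to sit in the current directory under its published file name.

```python
import hashlib
import os


def sha256_of(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Return True when the file's SHA256 equals the published digest."""
    return sha256_of(path) == expected_hex.strip().lower()


EXPECTED = "14582e06c32b9019b6106e94d56617b85ced60b21503fc17395469b6cc4daeb6"
ARCHIVE = "ms_toolkit_nrel-0.1.0.tar.gz"  # assumed download location

if os.path.exists(ARCHIVE):
    ok = matches_published_hash(ARCHIVE, EXPECTED)
    print("hash OK" if ok else "hash MISMATCH")
```

The same function works for the wheel if you swap in its file name and its SHA256 digest.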
Provenance
The following attestation bundles were made for ms_toolkit_nrel-0.1.0.tar.gz:
Publisher: python-publish.yml on calebcoatney/ms-toolkit
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: ms_toolkit_nrel-0.1.0.tar.gz
  - Subject digest: 14582e06c32b9019b6106e94d56617b85ced60b21503fc17395469b6cc4daeb6
- Sigstore transparency entry: 230412588
- Sigstore integration time:
- Permalink: calebcoatney/ms-toolkit@f921e63b766cb8e4f8c08e77dcaa693f5b668798
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/calebcoatney
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@f921e63b766cb8e4f8c08e77dcaa693f5b668798
- Trigger Event: release
File details
Details for the file ms_toolkit_nrel-0.1.0-py3-none-any.whl.
File metadata
- Download URL: ms_toolkit_nrel-0.1.0-py3-none-any.whl
- Upload date:
- Size: 22.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 257aa149a74a7eda33324729dad7ac9b1e039807e2ed0c68bd11613fbf1b669d |
| MD5 | f321a53a539ad89cc2ea74814f56d658 |
| BLAKE2b-256 | d1c37d4e3c10a089943664345db055096fe91d1a7dc0c086455bdd8e1311f99a |
Provenance
The following attestation bundles were made for ms_toolkit_nrel-0.1.0-py3-none-any.whl:
Publisher: python-publish.yml on calebcoatney/ms-toolkit
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: ms_toolkit_nrel-0.1.0-py3-none-any.whl
  - Subject digest: 257aa149a74a7eda33324729dad7ac9b1e039807e2ed0c68bd11613fbf1b669d
- Sigstore transparency entry: 230412590
- Sigstore integration time:
- Permalink: calebcoatney/ms-toolkit@f921e63b766cb8e4f8c08e77dcaa693f5b668798
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/calebcoatney
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@f921e63b766cb8e4f8c08e77dcaa693f5b668798
- Trigger Event: release