SSTR: Spectrum2Structure Transformer Ranker
The Spectrum2Structure Transformer Ranker (SSTR) is a tool that generates and ranks chemical structures from MS/MS spectra. It uses transformer models for de novo structure generation and for ranking candidate molecules.
Installation
We recommend using conda to create a virtual environment and install the dependencies.
conda create -n SSTR python=3.10
conda activate SSTR
To install the package, run the following command:
pip install SSTR
To test the installation, run the following command:
sstr --help
If the installation is successful, you should see the help message for the SSTR CLI.
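The same check can be scripted, for example at the start of a pipeline. The following is a minimal sketch that relies only on the Python standard library and on the `sstr --help` command shown above; the helper names `cli_available` and `show_help` are hypothetical and not part of SSTR.

```python
import shutil
import subprocess


def cli_available(command: str = "sstr") -> bool:
    """Return True if `command` can be found on the current PATH."""
    return shutil.which(command) is not None


def show_help(command: str = "sstr") -> str:
    """Run `<command> --help` and return what it prints to stdout."""
    result = subprocess.run(
        [command, "--help"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


if __name__ == "__main__":
    if cli_available():
        print(show_help())
    else:
        print("sstr not found on PATH; is the SSTR conda environment active?")
```

If the command is not found, the most likely cause is that the conda environment created above is not active in the current shell.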
Usage Instructions
Example data is provided in the example directory. You can use it to try out the CLI commands described here.
De Novo Generation of Chemical Structures
To start de novo generation, provide an MS/MS spectrum in MSP or MGF format. The file must have a .msp or .mgf extension and must contain exactly one spectrum.
The file must include the following fields:
- FORMULA: Molecular formula of the compound.
- IONMODE: Ionization mode (positive or negative).
- PRECURSOR_MZ: Precursor mass-to-charge ratio (m/z).
- ADDUCT: Adduct form of the precursor ion.
An example of such a file is provided at example/lipid.mgf.
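Before running SSTR, it can help to confirm that a spectrum file meets these requirements. The sketch below is not part of SSTR and does not reflect how SSTR parses files. It assumes that MGF metadata lines look like `KEY=value`, that MSP metadata lines look like `KEY: value`, and that field names can be matched case-insensitively. The function names and the glucose-like values in the usage comment are illustrative placeholders.

```python
import re
from pathlib import Path

REQUIRED_FIELDS = {"FORMULA", "IONMODE", "PRECURSOR_MZ", "ADDUCT"}
ALLOWED_EXTENSIONS = {".msp", ".mgf"}

# A metadata key: letters/underscores followed by "=" (MGF) or ":" (MSP).
KEY_PATTERN = re.compile(r"^\s*([A-Za-z_]+)\s*[=:]")


def missing_fields(text):
    """Return the required field names that never appear as a metadata key."""
    found = set()
    for line in text.splitlines():
        match = KEY_PATTERN.match(line)
        if match:
            # Assumption: key matching is case-insensitive.
            found.add(match.group(1).upper())
    return REQUIRED_FIELDS - found


def find_problems(filename, text):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    suffix = Path(filename).suffix
    if suffix not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported extension: {suffix!r}")
    missing = missing_fields(text)
    if missing:
        problems.append("missing fields: " + ", ".join(sorted(missing)))
    if suffix == ".mgf":
        # In MGF, each spectrum is opened by a "BEGIN IONS" line.
        count = sum(1 for line in text.splitlines() if line.strip() == "BEGIN IONS")
        if count != 1:
            problems.append(f"expected 1 spectrum, found {count}")
    return problems


# Usage with a file on disk (illustrative path):
#   path = Path("example/lipid.mgf")
#   print(find_problems(path.name, path.read_text()))
```

The spectrum count is only checked for MGF files here, because MSP files have no equally unambiguous per-spectrum delimiter to count.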
If the molecular formula is not yet known, you can annotate it with external tools such as Buddy or SIRIUS.
To generate one structure, run the following command:
sstr generate <path_to_msp_or_mgf_file>
To watch the generation process as it runs, enable stream mode:
sstr generate --stream <path_to_msp_or_mgf_file>
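Because each file holds a single spectrum, processing several spectra means invoking the command once per file. The following sketch shows one way to do that from Python. It uses only the `sstr generate` subcommand and the `--stream` flag shown above; the helper names and the directory layout are assumptions.

```python
import subprocess
from pathlib import Path


def build_generate_command(spectrum_file, stream=False):
    """Build the argument list for `sstr generate`, optionally with --stream."""
    command = ["sstr", "generate"]
    if stream:
        command.append("--stream")
    command.append(str(spectrum_file))
    return command


def generate_for_directory(directory):
    """Run `sstr generate` once for every .msp or .mgf file in `directory`."""
    for spectrum_file in sorted(Path(directory).iterdir()):
        if spectrum_file.suffix in {".msp", ".mgf"}:
            # check=True stops the loop at the first failing file.
            subprocess.run(build_generate_command(spectrum_file), check=True)


# Usage (illustrative directory name):
#   generate_for_directory("example")
```

Passing the command as a list rather than a single string avoids shell quoting problems with file names that contain spaces.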
To generate 10 structures using beam search, run the following command:
sstr generate --beam 10 <path_to_msp_or_mgf_file>
Ranking Candidate Structures
To rank candidate chemical structures, provide the MS/MS spectrum file along with a file containing candidate structures in SMILES format.
A recommended approach is to annotate the molecular formula first, then retrieve the candidate structures with the same molecular formula from a database like PubChem.
The candidate SMILES should be stored in a .txt file, with one SMILES string per line.
An example candidate SMILES file is provided at example/isomers.txt.
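If the candidates come from a database query, they may contain blank entries or repeated strings. The sketch below shows one way to write them in the one-SMILES-per-line layout described above. The function names are hypothetical, and the two SMILES in the usage comment (ethanol and dimethyl ether, both C2H6O) are illustrative only.

```python
from pathlib import Path


def clean_candidates(smiles_list):
    """Strip whitespace, drop empty entries, and remove exact duplicates in order."""
    seen = set()
    cleaned = []
    for smiles in smiles_list:
        smiles = smiles.strip()
        if smiles and smiles not in seen:
            seen.add(smiles)
            cleaned.append(smiles)
    return cleaned


def candidates_text(smiles_list):
    """Return the file content: one SMILES string per line."""
    return "".join(smiles + "\n" for smiles in clean_candidates(smiles_list))


def write_candidates(smiles_list, path):
    """Write the cleaned candidates to `path` as a plain .txt file."""
    Path(path).write_text(candidates_text(smiles_list), encoding="utf-8")


# Usage (illustrative file name):
#   write_candidates(["CCO", "COC"], "candidates.txt")
```

Deduplication here is purely textual: two different SMILES strings that encode the same molecule are both kept. Removing those would need a cheminformatics toolkit to canonicalize the strings first.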
To rank the candidate structures based on the MS/MS spectrum:
sstr rank <path_to_msp_or_mgf_file> --candidates <path_to_candidate_smiles_file>
File details
Details for the file sstr-1.0.0.tar.gz.
File metadata
- Download URL: sstr-1.0.0.tar.gz
- Size: 14.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.3 CPython/3.12.5 Darwin/23.2.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | f211dafdb65aa25d7e62660771f46b4e0ff8217d93fe40f258ff5df20b9c003c
MD5 | ddd302fb02f59be8a1f004e2161293f2
BLAKE2b-256 | 907746e4248da24f8cae1053508ca89b82fb7d491bf4efeb56561dd1483a1e90
File details
Details for the file sstr-1.0.0-py3-none-any.whl.
File metadata
- Download URL: sstr-1.0.0-py3-none-any.whl
- Size: 16.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.3 CPython/3.12.5 Darwin/23.2.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | fe580deb64dd2d7b693b56c9872b303cf98da3c1f894a97847c23e532eb9f558
MD5 | c459e37342a2296021450abd7c079b7b
BLAKE2b-256 | d82657c920714fb857cb5dd84e892e189d94dd031c1c66f2ef1dcfe0a1e6dd09