SSTR: Spectrum2Structure Transformer Ranker

The Spectrum2Structure Transformer Ranker (SSTR) generates and ranks chemical structures from MS/MS spectrum data. It uses transformer models for de novo structure generation and for ranking candidate molecules.

Installation

We recommend using conda to create a virtual environment and install the dependencies.

conda create -n SSTR python=3.10
conda activate SSTR

To install the package, run the following command:

pip install SSTR

To test the installation, run the following command:

sstr --help

If the installation is successful, you should see the help message for the SSTR CLI.

Usage Instructions

Example data for trying out the CLI commands is provided in the example directory.

De Novo Generation of Chemical Structures

To start de novo generation, provide an MS/MS spectrum in MSP or MGF format. The file must have a .msp or .mgf extension and must contain exactly one spectrum.
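Because each input file holds a single spectrum, a multi-spectrum export has to be split before use. The sketch below shows one way to do that for MGF files. It assumes the standard `BEGIN IONS` / `END IONS` block delimiters; the helper names `split_mgf` and `write_single_spectrum_files` are hypothetical and are not part of SSTR. MSP files are not handled here.

```python
from pathlib import Path


def split_mgf(text: str) -> list[str]:
    """Split MGF text into one string per BEGIN IONS ... END IONS block."""
    spectra: list[str] = []
    current: list[str] = []
    inside = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.upper() == "BEGIN IONS":
            # Start collecting a new spectrum block.
            inside = True
            current = [stripped]
        elif stripped.upper() == "END IONS" and inside:
            # Close the block and store it as one spectrum.
            current.append(stripped)
            spectra.append("\n".join(current) + "\n")
            inside = False
        elif inside and stripped:
            # Metadata or peak line inside the current block.
            current.append(stripped)
    return spectra


def write_single_spectrum_files(mgf_path: str, out_dir: str) -> list[Path]:
    """Write each spectrum of a multi-spectrum MGF to its own .mgf file."""
    source = Path(mgf_path)
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, spectrum in enumerate(split_mgf(source.read_text()), start=1):
        path = target / f"{source.stem}_{index}.mgf"
        path.write_text(spectrum)
        paths.append(path)
    return paths
```

Each resulting file can then be passed to the CLI on its own.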

The essential properties required in the file are:

  • FORMULA: Molecular formula of the compound.
  • IONMODE: Ionization mode (positive or negative).
  • PRECURSOR_MZ: Precursor mass/charge ratio.
  • ADDUCT: The adduct form.

An example of such a file is provided at example/lipid.mgf.
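A quick pre-flight check for the four required properties can be sketched as follows. This is an illustration only: it assumes metadata lines look like `KEY=VALUE` (common in MGF) or `KEY: VALUE` (common in MSP) and that the keys are spelled exactly as listed above. How SSTR itself parses the file is not documented here, and `missing_fields` is a hypothetical helper.

```python
import re

REQUIRED_FIELDS = ("FORMULA", "IONMODE", "PRECURSOR_MZ", "ADDUCT")

# A metadata line: a key starting with a letter or underscore, then '=' or ':',
# then the value. Peak lines start with a digit, so they do not match.
_FIELD = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_ ]*?)\s*[=:]\s*(.*?)\s*$")


def missing_fields(text: str) -> list[str]:
    """Return the required fields that are absent or empty in the spectrum text."""
    present = set()
    for line in text.splitlines():
        match = _FIELD.match(line)
        if match and match.group(2):
            present.add(match.group(1).upper())
    return [name for name in REQUIRED_FIELDS if name not in present]


# Illustrative spectrum text with made-up values (not taken from example/lipid.mgf).
sample = """BEGIN IONS
FORMULA=C6H12O6
IONMODE=positive
PRECURSOR_MZ=181.0707
50.0 10
END IONS
"""
print(missing_fields(sample))
```

An empty list means all four properties were found; otherwise the returned names indicate what to add before running the CLI.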

To annotate the molecular formula, you can use external tools like Buddy or SIRIUS.

To generate one structure, run the following command:

sstr generate <path_to_msp_or_mgf_file>

To watch the generation process as it runs, enable stream mode:

sstr generate --stream <path_to_msp_or_mgf_file>

To generate 10 structures using beam search, run the following command:

sstr generate --beam 10 <path_to_msp_or_mgf_file>

Ranking Candidate Structures

To rank candidate chemical structures, provide the MS/MS spectrum file along with a file containing candidate structures in SMILES format.

A recommended approach is to annotate the molecular formula first, then retrieve the candidate structures with the same molecular formula from a database like PubChem.

The candidate SMILES should be stored in a .txt file, with one SMILES string per line.

An example candidate SMILES file is provided at example/isomers.txt.
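Once candidates have been retrieved, writing them in the expected layout is straightforward. The sketch below writes one SMILES string per line to a .txt file, skipping blank entries and exact duplicates. `write_candidates` is a hypothetical helper, and the deduplication is by string only: two different SMILES spellings of the same molecule are both kept, since no canonicalization is attempted.

```python
from pathlib import Path


def write_candidates(smiles: list[str], path: str) -> int:
    """Write unique, non-empty SMILES to a .txt file, one per line.

    Returns the number of lines written.
    """
    target = Path(path)
    if target.suffix != ".txt":
        raise ValueError("candidate file should have a .txt extension")
    seen = set()
    unique = []
    for entry in smiles:
        entry = entry.strip()
        if entry and entry not in seen:
            seen.add(entry)
            unique.append(entry)
    target.write_text("\n".join(unique) + "\n")
    return len(unique)
```

For instance, `write_candidates(["CCO", "COC"], "candidates.txt")` writes the two C2H6O isomers ethanol and dimethyl ether, one per line.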

To rank the candidate structures based on the MS/MS spectrum:

sstr rank <path_to_msp_or_mgf_file> --candidates <path_to_candidate_smiles_file>
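For scripted workflows, the same command can be called from Python. The sketch below builds exactly the argument list shown above and runs it with the standard library's `subprocess` module. It assumes the `sstr` executable is on the PATH; the output format is not specified here, so the function simply returns whatever the command prints to standard output. `rank_command` and `run_rank` are hypothetical names.

```python
import subprocess


def rank_command(spectrum_path: str, candidates_path: str) -> list[str]:
    """Build the argument list for `sstr rank` as shown in the usage above."""
    return ["sstr", "rank", str(spectrum_path), "--candidates", str(candidates_path)]


def run_rank(spectrum_path: str, candidates_path: str) -> str:
    """Run `sstr rank` and return its raw standard output.

    Raises subprocess.CalledProcessError if the command exits with a non-zero status.
    """
    result = subprocess.run(
        rank_command(spectrum_path, candidates_path),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

With the bundled example data, the call would be `run_rank("example/lipid.mgf", "example/isomers.txt")`.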
