3DMolMS: prediction of tandem mass spectra from 3D molecular conformations
3DMolMS
3D Molecular Network for Mass Spectra Prediction (3DMolMS) is a deep neural network model to predict the MS/MS spectra of compounds from their 3D conformations. This model's molecular representation, learned through MS/MS prediction tasks, can be further applied to enhance performance in other molecular-related tasks, such as predicting retention times (RT) and collision cross sections (CCS).
Read paper in Bioinformatics | Try online service at GNPS | Try model on Konia | Install from PyPI
🆕 3DMolMS v1.1.10 is now available for inference on Konia, GNPS, and PyPI!
The change log is available in CHANGE_LOG.md.
Installation
3DMolMS is available on PyPI (molnetpack). You can install the latest version using pip:
pip install molnetpack
# PyTorch must be installed separately.
# Please check the official website of PyTorch for the proper version:
# https://pytorch.org/get-started/locally/
# e.g.
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
3DMolMS can also be installed from source:
git clone https://github.com/JosieHong/3DMolMS.git
cd 3DMolMS
pip install .
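Because PyTorch is installed separately, it can help to confirm that both packages are visible to the interpreter before running predictions. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical and not part of molnetpack.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Report the two distributions that the installation steps above cover.
    for name, version in installed_versions(["molnetpack", "torch"]).items():
        print(f"{name}: {version or 'not installed'}")
```

A `None` entry means the corresponding `pip install` step still needs to be run in the active environment.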
Usage
To get started quickly, instantiate a MolNet object and load a CSV or MGF file for MS/MS prediction:
import torch
from molnetpack import MolNet, plot_msms
# Set the device to CPU for CPU-only usage:
device = torch.device("cpu")
# For GPU usage, set the device as follows (replace '0' with your desired GPU index):
# gpu_index = 0
# device = torch.device(f"cuda:{gpu_index}")
# Instantiate a MolNet object
molnet_engine = MolNet(device, seed=42) # The random seed can be any integer.
# Load input data (here we use a CSV file as an example)
molnet_engine.load_data(path_to_test_data='./test/input_msms.csv')
"""Load data from the specified path.
Args:
path_to_test_data (str): Path to the test data file. Supported formats are 'csv', 'mgf', and 'pkl'.
Returns:
None
"""
# Predict MS/MS
pred_spectra_df = molnet_engine.pred_msms(instrument='qtof')
"""Predict MS/MS spectra.
Args:
path_to_results (Optional[str]): Path to save the prediction results. Supports '.mgf' or '.csv' formats. If None, the results won't be saved.
path_to_checkpoint (Optional[str]): Path to the model checkpoint. If None, the model will be downloaded from a default URL.
instrument (str): Type of instrument used ('qtof' or 'orbitrap').
Returns:
pd.DataFrame: DataFrame containing the predicted MS/MS results.
"""
We also implement a function to plot the predicted results.
# Plot the predicted MS/MS with 3D molecular conformation
plot_msms(pred_spectra_df, dir_to_img='./img/')
Sample input files in CSV and MGF format are located at ./test/demo_input.csv and ./test/demo_input.mgf, respectively. During data loading, any entries that fall outside the supported inputs are automatically excluded. The table below lists the supported inputs:
| Item | Supported input |
|---|---|
| Atom number | <=300 |
| Atom types | 'C', 'O', 'N', 'H', 'P', 'S', 'F', 'Cl', 'B', 'Br', 'I', 'Na' |
| Precursor types | '[M+H]+', '[M-H]-', '[M+H-H2O]+', '[M+Na]+', '[M+2H]2+' |
| Collision energy | any number |
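Since unsupported entries are dropped silently at load time, a standalone pre-check can show which entries would be excluded and why. The sketch below mirrors the table above but is not molnetpack's own filtering code: the function names are hypothetical, the input is assumed to be a simple molecular formula (no brackets or charges), and counting hydrogens toward the 300-atom limit is an assumption.

```python
import re

# Values copied from the table of supported inputs.
SUPPORTED_ATOMS = {"C", "O", "N", "H", "P", "S", "F", "Cl", "B", "Br", "I", "Na"}
SUPPORTED_PRECURSORS = {"[M+H]+", "[M-H]-", "[M+H-H2O]+", "[M+Na]+", "[M+2H]2+"}
MAX_ATOMS = 300


def parse_formula(formula):
    """Parse a simple formula such as 'C6H12O6' into {element: count}."""
    if not re.fullmatch(r"(?:[A-Z][a-z]?\d*)+", formula):
        raise ValueError(f"Cannot parse formula: {formula!r}")
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def check_input(formula, precursor_type):
    """Return the reasons an entry falls outside the table (empty list if none)."""
    problems = []
    counts = parse_formula(formula)
    unsupported = sorted(set(counts) - SUPPORTED_ATOMS)
    if unsupported:
        problems.append(f"unsupported atom types: {unsupported}")
    total = sum(counts.values())  # assumption: hydrogens count toward the limit
    if total > MAX_ATOMS:
        problems.append(f"too many atoms: {total} > {MAX_ATOMS}")
    if precursor_type not in SUPPORTED_PRECURSORS:
        problems.append(f"unsupported precursor type: {precursor_type}")
    return problems
```

For example, `check_input("C6H12O6", "[M+H]+")` returns an empty list, while a formula containing silicon or a `[M+K]+` precursor yields one reason each.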
Below is an example of a predicted MS/MS spectrum plot.
More detailed documentation for various tasks, using either molnetpack or the source code, is in the docs/ directory:
- ./docs/
  - PROP_USAGE.md: Guide on using molnetpack for RT prediction, CCS prediction, and molecular embedding.
  - MSMS_PRED.md: Instructions for using 3DMolMS to predict MS/MS spectra from your own CSV files via the source code. The training details can be found in the next section.
  - GEN_REFER_LIB.md: Instructions for using 3DMolMS to generate MS/MS reference libraries from small molecule databases, such as HMDB and RefMet, via the source code.
  - PROP_PRED.md: Instructions for training and testing 3DMolMS on RT and CCS prediction via the source code.
  - PRETRAIN.md: Instructions for pretraining 3DMolMS on the QM9 dataset via the source code.
Train your own model
Step 0: Clone the Repository and Set Up the Environment
Clone the 3DMolMS repository and install the required packages using the following commands:
git clone https://github.com/JosieHong/3DMolMS.git
cd 3DMolMS
# Please install the packages if you have not installed them yet.
pip install .
Step 1: Obtain the Pretrained Model
Download the pretrained model (molnet_pre_etkdgv3.pt.zip) from Releases. You can also train the model from scratch. For details on pretraining the model on the QM9 dataset, refer to PRETRAIN.md.
Step 2: Prepare the Datasets
Download and organize the datasets into the ./data/ directory. The current version uses four datasets:
- Agilent DPCL, provided by Agilent Technologies.
- NIST20, available under license for academic use.
- MoNA, publicly available.
- Waters QTOF, our own experimental dataset.
The data directory structure should look like this:
|- data
  |- origin
    |- Agilent_Combined.sdf
    |- Agilent_Metlin.sdf
    |- hr_msms_nist.SDF
    |- MoNA-export-All_LC-MS-MS_QTOF.sdf
    |- MoNA-export-All_LC-MS-MS_Orbitrap.sdf
    |- waters_qtof.mgf
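Before preprocessing, the layout above can be verified with a short script. This is a minimal sketch, not part of the repository: the helper name `missing_dataset_files` is hypothetical, and it only checks that the listed files exist, not that their contents are valid.

```python
from pathlib import Path

# File names taken from the directory structure shown above.
EXPECTED_FILES = [
    "Agilent_Combined.sdf",
    "Agilent_Metlin.sdf",
    "hr_msms_nist.SDF",
    "MoNA-export-All_LC-MS-MS_QTOF.sdf",
    "MoNA-export-All_LC-MS-MS_Orbitrap.sdf",
    "waters_qtof.mgf",
]


def missing_dataset_files(data_root="./data"):
    """Return the expected file names that are absent from <data_root>/origin."""
    origin = Path(data_root) / "origin"
    return [name for name in EXPECTED_FILES if not (origin / name).is_file()]


if __name__ == "__main__":
    missing = missing_dataset_files()
    if missing:
        print("Missing from ./data/origin:", ", ".join(missing))
    else:
        print("All expected dataset files are present.")
```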
Step 3: Preprocess the Datasets
Run the following command to preprocess the datasets. Specify the datasets with --dataset and the instrument types with --instrument_type (the command below processes both qtof and orbitrap). Add --maxmin_pick to select training molecules with the MaxMin algorithm; otherwise, they are selected randomly. The dataset configurations are in ./src/molnetpack/config/preprocess_etkdgv3.yml.
python ./src/preprocess.py --dataset agilent nist mona waters gnps \
--instrument_type qtof orbitrap \
--data_config_path ./src/molnetpack/config/preprocess_etkdgv3.yml \
--mgf_dir ./data/mgf_debug/
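For readers unfamiliar with the MaxMin algorithm behind --maxmin_pick, the idea is to greedily pick items so that each new pick is as far as possible from everything already picked, which spreads the selection across the data. The sketch below is a generic pure-Python version and is not the code in preprocess.py; the function name, the `first` argument, and the use of a plain distance callback are assumptions for illustration. For molecules, the distance would typically be a fingerprint dissimilarity.

```python
def maxmin_pick(items, distance, k, first=0):
    """Greedily pick k indices; each new pick maximizes its minimum
    distance to the items picked so far."""
    if not 0 < k <= len(items):
        raise ValueError("k must be between 1 and len(items)")
    picked = [first]
    # min_dist[i] is the distance from item i to its nearest picked item.
    min_dist = [distance(items[first], x) for x in items]
    while len(picked) < k:
        candidate = max(
            (i for i in range(len(items)) if i not in picked),
            key=lambda i: min_dist[i],
        )
        picked.append(candidate)
        # Update nearest-picked distances with the new pick.
        for i, x in enumerate(items):
            min_dist[i] = min(min_dist[i], distance(items[candidate], x))
    return picked


# Toy example on numbers, using absolute difference as the distance.
values = [0.0, 0.1, 0.2, 5.0, 10.0]
print(maxmin_pick(values, lambda a, b: abs(a - b), k=3))  # prints [0, 4, 3]
```

In the toy example, the clustered values 0.1 and 0.2 are skipped in favor of 10.0 and then 5.0, which is the behavior that distinguishes MaxMin from random selection.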
Step 4: Train the Model
Use the following commands to train the model. Configuration settings for the model and training process are located in ./src/molnetpack/config/molnet.yml.
# Train the model from the pretrained checkpoint:
# Q-TOF (the Orbitrap command is omitted here):
python ./src/train.py --train_data ./data/qtof_etkdgv3_train.pkl \
--test_data ./data/qtof_etkdgv3_test.pkl \
--model_config_path ./src/molnetpack/config/molnet.yml \
--data_config_path ./src/molnetpack/config/preprocess_etkdgv3.yml \
--checkpoint_path ./check_point/molnet_qtof_etkdgv3.pt \
--transfer --resume_path ./check_point/molnet_pre_etkdgv3.pt \
--ex_model_path ./check_point/molnet_qtof_etkdgv3_jit.pt
# Train the model from scratch
# Q-TOF:
python ./src/train.py --train_data ./data/qtof_etkdgv3_train.pkl \
--test_data ./data/qtof_etkdgv3_test.pkl \
--model_config_path ./src/molnetpack/config/molnet.yml \
--data_config_path ./src/molnetpack/config/preprocess_etkdgv3.yml \
--checkpoint_path ./check_point/molnet_qtof_etkdgv3.pt \
--ex_model_path ./check_point/molnet_qtof_etkdgv3_jit.pt
# Orbitrap:
python ./src/train.py --train_data ./data/orbitrap_etkdgv3_train.pkl \
--test_data ./data/orbitrap_etkdgv3_test.pkl \
--model_config_path ./src/molnetpack/config/molnet.yml \
--data_config_path ./src/molnetpack/config/preprocess_etkdgv3.yml \
--checkpoint_path ./check_point/molnet_orbitrap_etkdgv3.pt \
--ex_model_path ./check_point/molnet_orbitrap_etkdgv3_jit.pt
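The training commands differ only in the instrument name and in whether --transfer and --resume_path are passed, so they can be generated from one helper. The following is a sketch under that assumption: `build_train_command` is a hypothetical name, it uses only the flags and path patterns shown above, and applying the pretrained checkpoint to Orbitrap is an extrapolation from the Q-TOF command.

```python
import shlex


def build_train_command(instrument, resume_path=None):
    """Build the argument list for ./src/train.py.

    If resume_path is given, training starts from that checkpoint
    (--transfer --resume_path); otherwise it starts from scratch.
    """
    if instrument not in ("qtof", "orbitrap"):
        raise ValueError(f"unknown instrument: {instrument!r}")
    stem = f"{instrument}_etkdgv3"
    cmd = [
        "python", "./src/train.py",
        "--train_data", f"./data/{stem}_train.pkl",
        "--test_data", f"./data/{stem}_test.pkl",
        "--model_config_path", "./src/molnetpack/config/molnet.yml",
        "--data_config_path", "./src/molnetpack/config/preprocess_etkdgv3.yml",
        "--checkpoint_path", f"./check_point/molnet_{stem}.pt",
    ]
    if resume_path is not None:
        cmd += ["--transfer", "--resume_path", resume_path]
    cmd += ["--ex_model_path", f"./check_point/molnet_{stem}_jit.pt"]
    return cmd


if __name__ == "__main__":
    # Print a shell-ready command; pass the list to subprocess.run to execute it.
    print(shlex.join(build_train_command("orbitrap")))
```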
Step 5: Evaluation
Let's evaluate the model trained above!
# Predict the spectra:
# Q-TOF:
python ./src/pred.py \
--test_data ./data/qtof_etkdgv3_test.pkl \
--model_config_path ./src/molnetpack/config/molnet.yml \
--data_config_path ./src/molnetpack/config/preprocess_etkdgv3.yml \
--resume_path ./check_point/molnet_qtof_etkdgv3.pt \
--result_path ./result/pred_qtof_etkdgv3_test.mgf
# Orbitrap:
python ./src/pred.py \
--test_data ./data/orbitrap_etkdgv3_test.pkl \
--model_config_path ./src/molnetpack/config/molnet.yml \
--data_config_path ./src/molnetpack/config/preprocess_etkdgv3.yml \
--resume_path ./check_point/molnet_orbitrap_etkdgv3.pt \
--result_path ./result/pred_orbitrap_etkdgv3_test.mgf
# Evaluate the cosine similarity between experimental spectra and predicted spectra:
# Q-TOF:
python ./src/eval.py ./data/qtof_etkdgv3_test.pkl ./result/pred_qtof_etkdgv3_test.mgf \
./eval_qtof_etkdgv3_test.csv ./eval_qtof_etkdgv3_test.png
# Orbitrap:
python ./src/eval.py ./data/orbitrap_etkdgv3_test.pkl ./result/pred_orbitrap_etkdgv3_test.mgf \
./eval_orbitrap_etkdgv3_test.csv ./eval_orbitrap_etkdgv3_test.png
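To make the evaluation metric concrete, the sketch below computes the cosine similarity between two spectra after summing peak intensities into fixed-width m/z bins. It illustrates the metric only and is not the code in eval.py: the function names, the 1.0 bin width, the 1500 m/z upper limit, and the absence of any intensity transformation are assumptions.

```python
import math


def bin_spectrum(peaks, bin_width=1.0, max_mz=1500.0):
    """Sum (m/z, intensity) peaks into fixed-width m/z bins."""
    n_bins = int(math.ceil(max_mz / bin_width))
    vector = [0.0] * n_bins
    for mz, intensity in peaks:
        index = int(mz / bin_width)
        if 0 <= index < n_bins:  # peaks outside the range are ignored
            vector[index] += intensity
    return vector


def cosine_similarity(peaks_a, peaks_b, bin_width=1.0, max_mz=1500.0):
    """Cosine similarity of two binned spectra; 0.0 if either is empty."""
    a = bin_spectrum(peaks_a, bin_width, max_mz)
    b = bin_spectrum(peaks_b, bin_width, max_mz)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


# Toy spectra: the 100.x peaks share a bin, the 200.2 peak is unmatched.
experimental = [(100.1, 1.0), (200.2, 0.5)]
predicted = [(100.3, 1.0)]
print(round(cosine_similarity(experimental, predicted), 4))  # prints 0.8944
```

A narrower bin width makes the comparison stricter, since peaks must agree more closely in m/z to land in the same bin.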
Additional applications
3DMolMS can also predict molecular properties and generate reference libraries for molecular identification. For more details, refer to PROP_PRED.md and GEN_REFER_LIB.md, respectively.
Citation
@article{hong20233dmolms,
title={3DMolMS: prediction of tandem mass spectra from 3D molecular conformations},
author={Hong, Yuhui and Li, Sujun and Welch, Christopher J and Tichy, Shane and Ye, Yuzhen and Tang, Haixu},
journal={Bioinformatics},
volume={39},
number={6},
pages={btad354},
year={2023},
publisher={Oxford University Press}
}
@article{hong2024enhanced,
title={Enhanced structure-based prediction of chiral stationary phases for chromatographic enantioseparation from 3D molecular conformations},
author={Hong, Yuhui and Welch, Christopher J and Piras, Patrick and Tang, Haixu},
journal={Analytical Chemistry},
volume={96},
number={6},
pages={2351--2359},
year={2024},
publisher={ACS Publications}
}
License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.