3DMolMS: prediction of tandem mass spectra from 3D molecular conformations
3DMolMS
3D Molecular Network for Mass Spectra Prediction (3DMolMS) is a deep neural network model to predict the MS/MS spectra of compounds from their 3D conformations. This model's molecular representation, learned through MS/MS prediction tasks, can be further applied to enhance performance in other molecular-related tasks, such as predicting retention times and collision cross sections.
Read our paper in Bioinformatics | Try our online service at GNPS | Install from PyPI
Installation
3DMolMS is available on PyPI. You can install the latest version using pip:
pip install molnetpack
# PyTorch must be installed separately.
# For CUDA 11.6, install PyTorch with the following command:
pip install torch==1.13.0+cu116 torchvision==0.14.0+cu116 torchaudio==0.13.0 --extra-index-url https://download.pytorch.org/whl/cu116
# For CUDA 11.7, use:
pip install torch==1.13.0+cu117 torchvision==0.14.0+cu117 torchaudio==0.13.0 --extra-index-url https://download.pytorch.org/whl/cu117
# For CPU-only usage, use:
pip install torch==1.13.0+cpu torchvision==0.14.0+cpu torchaudio==0.13.0 --extra-index-url https://download.pytorch.org/whl/cpu
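If you script your environment setup, the three PyTorch commands above differ only in the build tag. The sketch below assembles the matching command from that tag. The helper name `torch_install_command` is hypothetical and is not part of molnetpack; the versions and index URLs are copied from the commands above.

```python
# Hypothetical helper (not part of molnetpack): builds the pip command for one
# of the three PyTorch 1.13.0 builds listed in the installation instructions.
TORCH_INDEX = "https://download.pytorch.org/whl"
SUPPORTED_BUILDS = ("cu116", "cu117", "cpu")


def torch_install_command(build: str) -> str:
    """Return the pip command for the requested PyTorch build tag."""
    if build not in SUPPORTED_BUILDS:
        raise ValueError(
            f"unsupported build {build!r}; expected one of {SUPPORTED_BUILDS}"
        )
    return (
        f"pip install torch==1.13.0+{build} torchvision==0.14.0+{build} "
        f"torchaudio==0.13.0 --extra-index-url {TORCH_INDEX}/{build}"
    )


if __name__ == "__main__":
    # Print the CPU-only command so it can be pasted into a shell.
    print(torch_install_command("cpu"))
```

Other CUDA versions are deliberately rejected because only these three builds are listed here.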
Usage
To get started quickly, you can load a CSV or MGF file to predict MS/MS and then plot the predicted results:
import torch
from molnetpack import MolNet
# Set the device to CPU for CPU-only usage:
device = torch.device("cpu")
# For GPU usage, set the device as follows (replace '0' with your desired GPU index):
# gpu_index = 0
# device = torch.device(f"cuda:{gpu_index}")
# Instantiate a MolNet object
molnet_engine = MolNet(device, seed=42) # The random seed can be any integer.
# Load data (here we use a CSV file as an example)
molnet_engine.load_data(path_to_test_data='./test/demo_input.csv')
# molnet_engine.load_data(path_to_test_data='./test/demo_input.mgf') # MGF file is also supported
# Predict MS/MS
spectra = molnet_engine.pred_msms(path_to_results='./test/demo_msms.mgf')
# Plot the predicted MS/MS with 3D molecular conformation
molnet_engine.plot_msms(dir_to_img='./img/')
Please note that unsupported input data will be automatically filtered out during the data loading process. The table below shows the supported inputs:
Item | Supported input |
---|---|
Atom number | <=300 |
Atom types | 'C', 'O', 'N', 'H', 'P', 'S', 'F', 'Cl', 'B', 'Br', 'I', 'Na' |
Precursor types | '[M+H]+', '[M-H]-', '[M+H-H2O]+', '[M+Na]+' |
Collision energy | any number |
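Because unsupported records are dropped silently, it can help to check your inputs against the table before loading them. The following is a minimal sketch of such a pre-check, not the filter that molnetpack itself applies. The function name `unsupported_reasons` is hypothetical, and it assumes you already have the element symbols of every atom in the molecule (whether the 300-atom limit counts hydrogens is an assumption here).

```python
# Limits copied from the supported-input table; the check itself is only an
# illustration and may differ from the library's internal filtering.
SUPPORTED_ATOMS = {"C", "O", "N", "H", "P", "S", "F", "Cl", "B", "Br", "I", "Na"}
SUPPORTED_PRECURSORS = {"[M+H]+", "[M-H]-", "[M+H-H2O]+", "[M+Na]+"}
MAX_ATOMS = 300


def unsupported_reasons(atom_symbols, precursor_type, collision_energy):
    """Return the reasons an input falls outside the table (empty list if none)."""
    reasons = []
    if len(atom_symbols) > MAX_ATOMS:
        reasons.append(f"atom number {len(atom_symbols)} exceeds {MAX_ATOMS}")
    unknown = sorted(set(atom_symbols) - SUPPORTED_ATOMS)
    if unknown:
        reasons.append(f"unsupported atom types: {unknown}")
    if precursor_type not in SUPPORTED_PRECURSORS:
        reasons.append(f"unsupported precursor type: {precursor_type}")
    try:
        float(collision_energy)  # the table accepts any number
    except (TypeError, ValueError):
        reasons.append("collision energy is not a number")
    return reasons


if __name__ == "__main__":
    # Methane with a supported precursor type passes every check.
    print(unsupported_reasons(["C", "H", "H", "H", "H"], "[M+H]+", 20))
```

Returning reasons rather than a boolean makes it easier to see why a record would be filtered out.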
Here is an example of a predicted MS/MS plot.
Documentation for running MS/MS prediction from the source code is in MSMS_PRED.md.
Train your own model
Step 0: Clone the Repository and Set Up the Environment
Clone the 3DMolMS repository and install the required packages using the following commands:
git clone https://github.com/JosieHong/3DMolMS.git
cd 3DMolMS
pip install .
Step 1: Obtain the Pretrained Model
Download the pretrained model (molnet_pre_etkdgv3.pt.zip) from Google Drive or train the model yourself. For details on pretraining the model on the QM9 dataset, refer to PRETRAIN.md.
Step 2: Prepare the Datasets
Download and organize the datasets into the ./data/ directory. The current version uses four datasets:
- Agilent DPCL, provided by Agilent Technologies.
- NIST20, available under license for academic use.
- MoNA, publicly available.
- Waters QTOF, our own experimental dataset.
The data directory structure should look like this:
|- data
  |- origin
    |- Agilent_Combined.sdf
    |- Agilent_Metlin.sdf
    |- hr_msms_nist.SDF
    |- MoNA-export-All_LC-MS-MS_QTOF.sdf
    |- MoNA-export-All_LC-MS-MS_Orbitrap.sdf
    |- waters_qtof.mgf
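Before preprocessing, you may want to confirm that the layout matches the listing above. The sketch below reports which expected files are absent. It assumes every file sits directly under ./data/origin/, and the function name `missing_dataset_files` is hypothetical (it is not provided by the repository).

```python
from pathlib import Path

# File names copied from the directory listing; note the upper-case .SDF suffix.
EXPECTED_FILES = [
    "Agilent_Combined.sdf",
    "Agilent_Metlin.sdf",
    "hr_msms_nist.SDF",
    "MoNA-export-All_LC-MS-MS_QTOF.sdf",
    "MoNA-export-All_LC-MS-MS_Orbitrap.sdf",
    "waters_qtof.mgf",
]


def missing_dataset_files(data_dir="./data"):
    """Return the expected file names not found under <data_dir>/origin."""
    origin = Path(data_dir) / "origin"
    return [name for name in EXPECTED_FILES if not (origin / name).is_file()]


if __name__ == "__main__":
    missing = missing_dataset_files()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All expected dataset files are present.")
```

If you only have access to some of the datasets, the missing entries are expected; pass only the datasets you have to the preprocessing step.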
Step 3: Preprocess the Datasets
Run the following command to preprocess the datasets. Specify the datasets with --dataset and set the instrument type to qtof with --instrument_type. Use --maxmin_pick to apply the MaxMin algorithm for selecting training molecules; otherwise, selection will be random. The dataset configurations are in ./src/molnetpack/config/preprocess_etkdgv3.yml.
python ./src/scripts/preprocess.py --dataset agilent nist mona waters \
--instrument_type qtof \
--data_config_path ./src/molnetpack/config/preprocess_etkdgv3.yml \
--mgf_dir ./data/mgf_debug/
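For readers unfamiliar with MaxMin selection, here is a minimal sketch of the general greedy heuristic: starting from one item, repeatedly add the item whose distance to its nearest already-picked item is largest. This illustrates the idea only and is not necessarily how preprocess.py implements --maxmin_pick. The function name `maxmin_pick`, the fixed starting index, and the use of a precomputed distance matrix are all assumptions made for the example.

```python
def maxmin_pick(distance, n_pick, first=0):
    """Greedy MaxMin selection over a symmetric distance matrix (list of lists).

    Returns the picked indices in selection order. `first` is the starting
    item; real implementations often choose it at random.
    """
    n = len(distance)
    if not 0 < n_pick <= n:
        raise ValueError("n_pick must be between 1 and the number of items")
    picked = [first]
    # min_dist[i] is the distance from item i to its nearest picked item.
    min_dist = list(distance[first])
    while len(picked) < n_pick:
        candidates = [i for i in range(n) if i not in picked]
        # Choose the candidate that is farthest from everything picked so far.
        nxt = max(candidates, key=lambda i: min_dist[i])
        picked.append(nxt)
        min_dist = [min(d, e) for d, e in zip(min_dist, distance[nxt])]
    return picked


if __name__ == "__main__":
    # Toy example: four points on a line, with absolute difference as distance.
    points = [0, 1, 2, 10]
    dist = [[abs(a - b) for b in points] for a in points]
    print(maxmin_pick(dist, 3))
```

Compared with random selection, this spreads the picked items across the distance space, which is why it is commonly used to choose a structurally diverse subset of molecules.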
Step 4: Train the Model
Use the following command to train the model. Configuration settings for the model and training process are located in ./src/molnetpack/config/molnet.yml.
python ./src/scripts/train.py --train_data ./data/qtof_etkdgv3_train.pkl \
--test_data ./data/qtof_etkdgv3_test.pkl \
--model_config_path ./src/molnetpack/config/molnet.yml \
--data_config_path ./src/molnetpack/config/preprocess_etkdgv3.yml \
--checkpoint_path ./check_point/molnet_qtof_etkdgv3.pt \
--transfer --resume_path ./check_point/molnet_pre_etkdgv3.pt
Additional applications
3DMolMS can also predict molecular properties, such as retention time and collision cross section, and generate reference libraries for molecular identification. For more details, refer to PROP_PRED.md and GEN_REFER_LIB.md, respectively.
Citation
If you use 3DMolMS in your research, please cite our paper:
@article{hong20233dmolms,
title={3DMolMS: prediction of tandem mass spectra from 3D molecular conformations},
author={Hong, Yuhui and Li, Sujun and Welch, Christopher J and Tichy, Shane and Ye, Yuzhen and Tang, Haixu},
journal={Bioinformatics},
volume={39},
number={6},
pages={btad354},
year={2023},
publisher={Oxford University Press}
}
Thank you for considering 3DMolMS for your research needs!
License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.