
Predict the actual molecules in LC-MS/MS data through an interpretation of the ions detected via combinatorial triangulation.

Project description


MolNotator is a Python package that predicts the molecules actually present in LC-MS/MS data. The results are represented as molecular networks in which the predicted molecules appear as nodes among the ions they generated. The method aims to help LC-MS users pinpoint the molecules of interest in their data without manually sorting through ions to find their target compounds.

Features

  • Predicts molecule nodes in spectral networks
  • Dereplicates with spectral and exact mass data
  • Adds retention time and adduct filters to dereplication
  • Supervised adduct search

Documentation

Note: This README provides instructions for setup and using basic functions of MolNotator. For more details, see the paper.

MolNotator works within a user-defined project folder with a specific file structure. An example is given in the examples folder:

working_directory
|   input.py
|
|___databases
|   |   211005_MIX_LDB.mgf
|   |   211018_COLOTUS_DB.tsv
|   
|___mzmine_out
|   |   200909_LDB_Thermo_NEG.csv
|   |   200909_LDB_Thermo_NEG.mgf
|   |   200912_LDB_Thermo_POS.csv
|   |   200912_LDB_Thermo_POS.mgf
|    
|___params
|   |   fragnotator_table.tsv
|   |   NEG_adduct_table_primary.tsv
|   |   NEG_adduct_table_secondary.tsv
|   |   POS_adduct_table_primary.tsv
|   |   POS_adduct_table_secondary.tsv
|   |   params.yaml
|   |   params_colotus.yaml
|   |   params_ldb_ions.yaml

The databases folder contains database files in MGF, TSV or CSV format. Two files are provided in the examples. The mzmine_out folder contains the input MGF (MS/MS spectra) and CSV (metadata) files for the positive mode, the negative mode or, ideally, both ionization modes. The params folder contains all parameter files for annotation and dereplication, as well as the folder names to be used in the project:

  • The fragnotator table.
  • Primary negative mode adduct table.
  • Secondary negative mode adduct table.
  • Primary positive mode adduct table.
  • Secondary positive mode adduct table.
  • The main "params.yaml" file for global parameters.
  • The secondary params files, one for each dereplication process.
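Before starting a run, it can help to confirm that the project folder has the expected layout. The following is a minimal sketch, not part of MolNotator: the helper names `missing_project_dirs` and `create_project_skeleton` are hypothetical, and the three folder names are simply those of the example layout above.

```python
from pathlib import Path

# Folder names taken from the example layout above (assumption: your project
# uses the same names for its input folders).
REQUIRED_DIRS = ("databases", "mzmine_out", "params")


def missing_project_dirs(working_directory):
    """Return the required sub-folders that are absent from the project folder."""
    root = Path(working_directory)
    return [name for name in REQUIRED_DIRS if not (root / name).is_dir()]


def create_project_skeleton(working_directory):
    """Create the project folder and any missing sub-folders (existing ones are kept)."""
    root = Path(working_directory)
    for name in REQUIRED_DIRS:
        (root / name).mkdir(parents=True, exist_ok=True)
    return root
```

Calling `missing_project_dirs("./working_directory")` returns an empty list when all three folders are present.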

The fragnotator file is a simple two-column table containing the annotation and the corresponding mass difference:

loss mass
CH3 15.023475
H2O 18.010565
CO 27.994915
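To illustrate how such a table can be used, the sketch below reads the three example rows and looks up which annotated loss matches the mass difference between two ions. It is an illustration only, not MolNotator's own implementation: the tab delimiter is assumed from the .tsv extension, and the function names, the example m/z values and the 0.002 Da tolerance are hypothetical.

```python
import csv
import io

# The three example rows shown above, assumed to be tab-separated.
EXAMPLE_TABLE = "loss\tmass\nCH3\t15.023475\nH2O\t18.010565\nCO\t27.994915\n"


def read_loss_table(handle):
    """Parse a two-column loss table into a list of (annotation, mass) tuples."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [(row["loss"], float(row["mass"])) for row in reader]


def annotate_difference(mz_high, mz_low, losses, tolerance=0.002):
    """Return the annotations whose mass matches mz_high - mz_low within tolerance (Da)."""
    delta = mz_high - mz_low
    return [name for name, mass in losses if abs(delta - mass) <= tolerance]


losses = read_loss_table(io.StringIO(EXAMPLE_TABLE))
# Hypothetical pair of ions separated by roughly the mass of one water molecule.
print(annotate_difference(181.0707, 163.0601, losses))
```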

The adduct tables all share the same format. The primary tables are used for triangulation, whereas the secondary tables are only used to annotate the remaining ions once the neutral node has been created.

Adduct_code      Adduct          Charge  Adduct_mass  Mol_multiplier  Complexity  Group
M1|m1H|pC4H11N   [M-H+C4H11N]-   -1      72.081324    1               3           H
M1|m1H|pHCOOH    [M-H+HCOOH]-    -1      44.997655    1               3           H
M1|m1H|          [M-H]-          -1      -1.007825    1               1           H
M1|p1Cl|         [M+Cl]-         -1      34.968853    1               2           Cl

The user can add or delete rows in these tables to fit the needs of the experiment, or transfer adducts from the primary to the secondary tables if computing times become too long. When transferring adducts from the primary to the secondary tables, the less abundant ion species should be moved first, because removing common species such as [M+H]+ from the primary table would severely degrade the triangulation. Processing of multiply charged adducts is not implemented yet, so we suggest using only singly charged ions.
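The idea behind triangulation can be sketched with two rows of the example table: if two ions, each interpreted as a different adduct, point to the same neutral mass, they probably come from the same molecule. The code below is a simplified illustration and not MolNotator's algorithm. It assumes the relation ion m/z = (Mol_multiplier × M + Adduct_mass) / |Charge|, which is an interpretation of the table columns, and the function names, the m/z values and the 0.002 Da tolerance are hypothetical.

```python
# Two rows from the example table: (Adduct, Charge, Adduct_mass, Mol_multiplier)
ADDUCTS = [
    ("[M-H]-", -1, -1.007825, 1),
    ("[M+Cl]-", -1, 34.968853, 1),
]


def neutral_mass(ion_mz, charge, adduct_mass, mol_multiplier):
    """Neutral mass M under the assumed relation
    ion_mz = (mol_multiplier * M + adduct_mass) / |charge|."""
    return (ion_mz * abs(charge) - adduct_mass) / mol_multiplier


def shared_neutral(mz_a, mz_b, adducts=ADDUCTS, tolerance=0.002):
    """List the adduct pairs for which the two ions point to the same neutral mass."""
    hits = []
    for name_a, z_a, mass_a, mult_a in adducts:
        for name_b, z_b, mass_b, mult_b in adducts:
            if name_a == name_b:
                continue  # two different ions cannot be the same adduct of one molecule
            m_a = neutral_mass(mz_a, z_a, mass_a, mult_a)
            m_b = neutral_mass(mz_b, z_b, mass_b, mult_b)
            if abs(m_a - m_b) <= tolerance:
                hits.append((name_a, name_b, round((m_a + m_b) / 2, 4)))
    return hits


# Hypothetical ions observed at the same retention time in negative mode.
print(shared_neutral(179.0556, 215.0322))
```

With a full adduct table the number of pairs to test grows quadratically, which is consistent with the advice above to move rare species to the secondary table when computing becomes too long.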

The params.yaml file contains all the parameters to be used in the project. Each parameter has a short description as a comment, and we suggest starting with the default values. The other params files are dedicated to dereplication: each one holds the parameters specific to one dereplication process and the database file (in the databases folder) to be used.

Once all parameters are set, use the example MolNotator script provided to start the process. After most steps, CSV files are exported, including a node table and an edge table. Networks can thus be visualized after each step by importing the two tables into software such as Cytoscape. The final network, with molecules, adducts, in-source fragments, dereplication results and a degree of cosine clustering, can be opened after the Cosiner function. Simplified versions of the network (neutrals and adducts only, or neutrals only) can be produced after the MolNet function.

Global networks containing all samples are produced at each step, but they can be divided to contain only the data for each specific sample. To do this, refer to the "export_samples" parameters in the params.yaml file.

Installation

Dependencies

Before installing MolNotator, make sure you have the following requirements installed:

  • pandas
  • NumPy
  • matchms <= 0.6.2
  • tqdm
  • PyYaml

These dependencies can be installed using the following command:

 pip install -U pandas numpy matchms==0.6.2 tqdm pyyaml
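To verify that the requirements are in place, the installed versions can be inspected from Python using only the standard library. The helper names below are hypothetical, the distribution names are assumed from the list above, and the version parser is deliberately simple: it only understands plain numeric releases such as 0.6.2.

```python
from importlib import metadata

# Distribution names assumed from the dependency list above.
REQUIRED = ("pandas", "numpy", "matchms", "tqdm", "PyYAML")
MAX_MATCHMS = (0, 6, 2)


def version_tuple(version):
    """Convert a plain release string such as '0.6.2' into (0, 6, 2)."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break  # stop at suffixes such as 'post1' or 'rc1'
        parts.append(int(piece))
    return tuple(parts)


def matchms_is_supported(version):
    """True if the given matchms version is at most 0.6.2."""
    return version_tuple(version) <= MAX_MATCHMS


def check_environment():
    """Map each required package to its installed version, or None if missing."""
    report = {}
    for name in REQUIRED:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


for package, version in check_environment().items():
    print(package, version if version is not None else "NOT INSTALLED")
```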

Via PyPI

We deploy the MolNotator package to PyPI. You can install MolNotator as a Python module with:

 pip install MolNotator

Note: This is the recommended installation method.

From source

If you cannot use the PyPI package or want to install MolNotator from source, we suggest the following steps.
Open a terminal and clone this repository using:

 git clone https://github.com/ZzakB/MolNotator.git

Move to the root directory of your MolNotator repository and run the following command in it:

 pip install .

Note: You still have to install the dependencies listed above yourself.

Usage/Examples

MolNotator is run from a Python input file. The example below can be used as a template:

import os 
import yaml
from MolNotator.Duplicate_filter import Duplicate_filter
from MolNotator.MGF_sample_slicer import mgf_slicer
from MolNotator.Fragnotator import Fragnotator
from MolNotator.Adnotator import Adnotator
from MolNotator.MGF_updater import MGF_updater
from MolNotator.Mode_merger import Moder_merger
from MolNotator.Dereplicator import Dereplicator
from MolNotator.Cosiner import Cosiner
from MolNotator.MolNet import MolNet

wd = './working_directory' # <---- change the path to your working directory
os.chdir(wd)

for files in os.listdir(os.getcwd()):
    if files not in ['databases','mzmine_out','params']:
        raise Exception('Potential output files already exist! They need to be removed or moved outside the working directory.')

with open("./params/params.yaml") as info:
    params = yaml.load(info, Loader=yaml.FullLoader)


# Duplicate filtering on MZmine's MGF and CSV files (NEG):
Duplicate_filter(params = params,
                 ion_mode = "NEG")

# Duplicate filtering on MZmine's MGF and CSV files (POS):
Duplicate_filter(params = params,
                 ion_mode = "POS")

# Slicing the negative mode MGF file
mgf_slicer(params = params,
           ion_mode = "NEG")

# Slicing the positive mode MGF file
mgf_slicer(params = params,
           ion_mode = "POS")

# Use fragnotator on the negative mode sliced MGF files
Fragnotator(params = params,
            ion_mode = "NEG")

# Use fragnotator on the positive mode sliced MGF files
Fragnotator(params = params,
            ion_mode = "POS")

# Use adnotator on the negative mode data
Adnotator(params = params,
          ion_mode = "NEG")

# Use adnotator on the positive mode data
Adnotator(params = params,
          ion_mode = "POS")

# Use Moder Merger to merge negative and positive mode data :
Moder_merger(params = params)

# Update the MGF files and the node tables with SIRIUS formulas and other annotations
MGF_updater(params = params)

# Dereplicate the data using the database specified in the YAML file
for db_params_file in params['db_params']:
    print("Dereplicating using the " + db_params_file + " file...")
    with open("./params/" + db_params_file) as info:
        db_params = yaml.load(info, Loader=yaml.FullLoader)
    Dereplicator(params = params,
                 db_params = db_params)

# Compute cosine similarity between some nodes.
Cosiner(params = params)

# Produce molecular networks, neutral nodes only
MolNet(params = params)

MolNotator can then be run with the following command, where input_file.py stands for the name of your input file (input.py in the example folder structure above):

python input_file.py

Note: The output is written to the working directory.
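The per-mode steps of the template are each called twice, once for "NEG" and once for "POS". As an optional convenience, the repetition can be folded into a small helper. This is a sketch, not part of MolNotator: `run_for_modes` is a hypothetical name, and it only assumes that each step accepts the `params` and `ion_mode` keyword arguments used in the template. A stand-in function replaces the real steps so that the block runs on its own.

```python
def run_for_modes(steps, params, ion_modes=("NEG", "POS")):
    """Call each step once per ionization mode, in order, and log what was run."""
    log = []
    for step in steps:
        for ion_mode in ion_modes:
            step(params=params, ion_mode=ion_mode)
            log.append((step.__name__, ion_mode))
    return log


def demo_step(params, ion_mode):
    """Stand-in for a MolNotator step; it only reports how it was called."""
    print("running demo_step in", ion_mode, "mode")


print(run_for_modes([demo_step], params={}))
```

In the template, the equivalent call would be `run_for_modes([Duplicate_filter, mgf_slicer, Fragnotator, Adnotator], params)`, which preserves the original order: each step is run on the negative mode data, then on the positive mode data, before the next step starts. For single-mode data, pass `ion_modes=("NEG",)` or `ion_modes=("POS",)`.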

License

MolNotator is published under the MIT license. For more information, please read the LICENSE file. Using MolNotator in your commercial or non-commercial project is generally possible, provided that you give a proper reference to this project and the related paper.

Citation Information

If you are using MolNotator in your work, please cite:

(1) Olivier-Jimenez, D.; Bouchouireb, Z.; Ollivier, S.; Mocquard, J.; Allard, P.-M.; Bernadat, G.; Chollet-Krugler, M.; Rondeau, D.; Boustie, J.; van der Hooft, J. J. J.; Wolfender, J.-L. From Mass Spectral Features to Molecules in Molecular Networks: A Novel Workflow for Untargeted Metabolomics, 2021. https://doi.org/10.1101/2021.12.21.473622.

Contact / Maintainer

Do you have a feature request, have you found a bug, or do you want to use MolNotator in your project?
Please get in touch: damien.olivier.jimenez@gmail.com
