MutAIverse

MutAIverse facilitates the identification of DNA adducts from untargeted metabolomics mass spectrometry data. It also predicts the potential source genotoxins responsible for forming the identified adducts, whether novel or previously known.

The single strong dependency for this resource is RDKit, which can be installed in a local Conda environment.

Other dependencies

  1. matchms==0.13.0
  2. hnswlib==0.8.0
  3. gensim==4.3.3
  4. pandas==1.5.3
  5. numpy==1.23.0
  6. matplotlib==3.7.1
  7. tqdm==4.65.0
  8. seaborn==0.12.2
  9. rdkit==2023.3.1
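
Because every dependency above is pinned to an exact version, it can help to compare the pins against the active environment before using the package. The following is a minimal sketch using only the standard library; `PINNED` and `check_pins` are illustrative names, not part of MutAIverse, and it assumes each package is registered under the distribution name shown in the list (a Conda-installed RDKit may be registered differently).

```python
from importlib import metadata

# Pins copied from the dependency list above.
PINNED = {
    "matchms": "0.13.0",
    "hnswlib": "0.8.0",
    "gensim": "4.3.3",
    "pandas": "1.5.3",
    "numpy": "1.23.0",
    "matplotlib": "3.7.1",
    "tqdm": "4.65.0",
    "seaborn": "0.12.2",
    "rdkit": "2023.3.1",
}


def check_pins(pins, get_version=metadata.version):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # distribution not found in this environment
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in check_pins(PINNED).items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

An empty result means every pinned package is present at the listed version.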

Adduct Mapper module

MutAIverse provides two approaches for mapping query MS spectra against an in silico MS/MS spectral library of either experimentally validated adducts or the synthetic DNA adducts of MutAIverse.

MutAIverse Library setup

This is a one-time task that must be completed after installing the package for the first time, before the Mapper module can be used.

from MutAIverse import Mapper
Mapper.load_library()

The function fetches the library data (1.7 GB) for later use by the Mapper module.

Brute force Approach

Cosine Similarity-based mapping

from MutAIverse import Mapper
Mapper.map('bonafide_adducts', sample_file_path='/path-to-mzML-file', MS_level=1, plot=True)

Additional arguments

Parameters:
- library (str): Library to map against, either bonafide_adducts or MutAIversee.
- sample_file_path (str): Path to the mzML file containing mass spectrometry data.
- MS_level (int): 1 (MS spectrum) or 2 (MS/MS spectrum)
- plot (bool; default True): Whether to produce visualizations.

Returns:
- Result CSV file whose name ends in _MutAIversee_results.csv or _bonafide_adducts_results.csv
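
To make the scoring idea behind cosine similarity-based mapping concrete, here is a minimal standard-library sketch of cosine similarity between two peak lists. It is not MutAIverse's implementation: the fixed-width m/z binning, the `bin_width` value, and the function names are all assumptions chosen for illustration.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=0.01):
    """Sum intensities of (mz, intensity) peaks that fall into the same m/z bin."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[round(mz / bin_width)] += intensity
    return binned


def cosine_similarity(peaks_a, peaks_b, bin_width=0.01):
    """Cosine of the angle between two binned intensity vectors (0.0 to 1.0)."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    # Only bins present in both spectra contribute to the dot product.
    dot = sum(value * b[key] for key, value in a.items() if key in b)
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty spectrum matches nothing
    return dot / (norm_a * norm_b)
```

A brute-force mapper in this style would score each query spectrum against every library spectrum and keep the best match, which is exact but scales with library size.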

Quick Search Approach

Approximate Nearest Neighbour-based mapping, which runs in two steps:

  1. Generation of spectral embeddings from query MS spectra
  2. Mapping using the HNSW index of the spectral embeddings

from MutAIverse import Mapper
Mapper.fast_map(mzml_file_path)

Additional arguments

Parameters:
- mzml_file_path (str): Path to the mzML file containing mass spectrometry data.
- level (int; default 2): 1 (MS spectrum) or 2 (MS/MS spectrum)
- k (int; default 1): Number of nearest neighbors to search for.
- ef_query (int; default 300): Parameter controlling the number of elements to visit during a query.
- Energy (int; default 0): Collision energy

Returns:
- pandas.DataFrame: DataFrame containing search results with columns ['Query_Index', 'Nearest_Neighbor_Index', 'Cosine Similarity', 'SMILES', 'COMPID', 'Structures'].
- Visualizations (density plot and histograms)
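
As a point of reference for the parameter k, the sketch below performs an exact k-nearest-neighbour search by cosine similarity over a small list of embedding vectors. This is the result that an HNSW index approximates while visiting only part of the library. It is a standard-library illustration with hypothetical names (`cosine`, `exact_knn`), not the code MutAIverse runs.

```python
import heapq
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def exact_knn(query, library, k=1):
    """Return the k (library_index, similarity) pairs with the highest similarity."""
    scored = ((index, cosine(query, emb)) for index, emb in enumerate(library))
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])
```

The exact search compares the query with every library embedding. An approximate index trades a small loss in recall for far fewer comparisons, and a query-time parameter such as ef_query controls that trade-off.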

Adduct Linker module

MutAIverse can also trace a DNA adduct back to its possible source genotoxin.

Fragment-based linking

Biotransformation backtracking based on abnormalities spliced from the base nucleotides.

from MutAIverse import Linker
query_smiles = 'OC[C@H]1O[C@H](CC1O)n1c[n+](c2c1nc(N)[nH]c2=O)C1OC2C(C1O)c1c(O2)cc(c2c1oc(=O)c1c2CCC1=O)OC' 
Linker.backtrace(Adduct=query_smiles)

Additional arguments

Parameters:
- Adduct (str): SMILES string of the query DNA adduct.
- knn (int; default 20): Number of nearest neighbors to narrow down the search space.
- tophit (int; default 5): Minimum number of genotoxins to be linked.
- plot (bool; default False): Draw the traced SMILES as 2D structures in rows.
- cutoff (int; default 80): Link probability (%) cutoff.

Returns:
- pandas.DataFrame: DataFrame containing search results with columns ['Query', 'Fragment', 'Metabolites', 'N-Transformation', 'Genotoxin', 'Probability'].
- Visualizations (traced SMILES 2D structures in rows)

This module also has a sub-function dedicated to visualizing the backtrace() output with a user-supplied probability threshold.

import pandas as pd
from MutAIverse import Linker 
Linker.plot_trace(file='/Path-to-Output_file.csv')

Additional arguments

Parameters:
- file (str): Output CSV file (with path) of Linker.backtrace() function
- cutoff (int; default 80): Minimum probability threshold 
Returns:
- visualizations(Traced SMILES 2D structures in rows)
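
To illustrate what a probability cutoff does to a backtrace() result, the sketch below filters a CSV with the documented column names using only the standard library. It assumes the Probability column holds a percentage as a plain number and that the cutoff is inclusive; neither is confirmed by the description above. The rows in `SAMPLE` are placeholders, not real MutAIverse output, and `rows_above_cutoff` is a hypothetical helper.

```python
import csv
import io

# Placeholder rows using the documented column names; the values are invented.
SAMPLE = """Query,Fragment,Metabolites,N-Transformation,Genotoxin,Probability
Q1,F1,M1,1,G1,92.5
Q1,F2,M2,2,G2,79.9
Q1,F3,M3,1,G3,80
"""


def rows_above_cutoff(csv_text, cutoff=80):
    """Keep rows whose Probability (assumed to be in percent) is >= cutoff."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if float(row["Probability"]) >= cutoff]


if __name__ == "__main__":
    for row in rows_above_cutoff(SAMPLE):
        print(row["Genotoxin"], row["Probability"])
```

For a real output file, the same function can be given the file contents, for example via `open(path, newline="").read()`.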
