DeepHalo: A deep learning-integrated workflow for high-throughput discovery of halogenated metabolites from HRMS data.

Core Features

1. Halogen Prediction

  • Element Prediction Model (EPM)
    • Dual-branch Isotope Neural Network (IsoNN) architecture
    • High-accuracy Cl/Br detection (>99.9% precision in benchmark results)
    • Wide mass range coverage (50-2000 Da)
    • Robust against interference from B, Se, Fe, and dehydro isomers
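
As background for why isotope patterns carry halogen information, here is a minimal sketch (not the IsoNN model itself) that computes the M, M+2, M+4, ... intensity pattern contributed by Cl and Br atoms alone from their natural isotopic abundances. The function names are illustrative and not part of DeepHalo.

```python
from math import comb

# Natural abundance (fraction) of the heavy isotope of each halogen.
P_37CL = 0.2424  # 37Cl; 35Cl makes up the remaining 0.7576
P_81BR = 0.4931  # 81Br; 79Br makes up the remaining 0.5069


def binomial(n, p):
    """Probability of k heavy isotopes among n atoms, for k = 0..n."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def halogen_pattern(n_cl=0, n_br=0):
    """Relative intensities of the M, M+2, M+4, ... peaks from Cl/Br only.

    Contributions from C, H, N, O, S are ignored, so this is the halogen
    part of the pattern. The result is normalized to the tallest peak.
    """
    cl = binomial(n_cl, P_37CL)
    br = binomial(n_br, P_81BR)
    peaks = [0.0] * (n_cl + n_br + 1)
    # Convolve the two distributions: i heavy Cl plus j heavy Br atoms
    # land on the peak shifted by 2 * (i + j) Da.
    for i, a in enumerate(cl):
        for j, b in enumerate(br):
            peaks[i + j] += a * b
    top = max(peaks)
    return [x / top for x in peaks]


if __name__ == "__main__":
    print(halogen_pattern(n_cl=1))  # M+2 is roughly a third of M
    print(halogen_pattern(n_br=1))  # M and M+2 are nearly equal
```

A single Cl gives an M+2 peak at about 32% of M, while a single Br gives M and M+2 of nearly equal height, which is the kind of signature a pattern-based model can learn.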

2. Isotope Pattern Validation

  • Dual Validation System
    • Mass Dimension: Statistical rule-based correction.
    • Intensity Dimension: Autoencoder-based Anomaly Detection Model (ADM).
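
To illustrate what a mass-dimension check can look like, the sketch below tests whether the spacing between the M and M+2 peaks is close to the 37Cl-35Cl or 81Br-79Br mass difference. It is a simplified stand-in, not DeepHalo's statistical rules; the function name and the 0.002 Da tolerance are assumptions chosen for illustration.

```python
# Isotope masses in Da, taken from standard atomic mass tables.
DELTA_CL = 36.96590259 - 34.96885268   # 37Cl - 35Cl, about 1.99705
DELTA_BR = 80.9162897 - 78.9183376     # 81Br - 79Br, about 1.99795
DELTA_13C2 = 2 * (13.00335484 - 12.0)  # two 13C atoms, about 2.00671


def m2_spacing_is_halogen_like(mz_m, mz_m2, charge=1, tol_da=0.002):
    """Return True if the M to M+2 spacing matches Cl or Br within tol_da.

    mz_m and mz_m2 are the m/z values of the two peaks; multiplying the
    difference by the charge converts it to a mass difference. tol_da is
    a hypothetical tolerance and should reflect the instrument accuracy.
    """
    spacing = (mz_m2 - mz_m) * charge
    nearest = min(abs(spacing - DELTA_CL), abs(spacing - DELTA_BR))
    return nearest <= tol_da


if __name__ == "__main__":
    print(m2_spacing_is_halogen_like(300.0, 300.0 + DELTA_CL))    # Cl-like
    print(m2_spacing_is_halogen_like(300.0, 300.0 + DELTA_13C2))  # 13C2
```

The two-13C spacing is roughly 9 mDa larger than the halogen spacings, so on high-resolution data the mass dimension alone can already separate some non-halogen M+2 peaks.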

3. Multi-Level Halogen Confidence Scoring (H-score)

  • Dual levels
    • Prediction based on centroid-level isotope patterns
    • Prediction based on scan-level isotope patterns
    • H-score integration for a comprehensive assessment across both levels

4. Enhanced Dereplication

  • Dual-Strategy Approach
    • MS1-Based Dereplication Using Custom Database Matching
      • Exact mass analysis
      • Halogen presence verification
      • Isotope intensity similarity scoring
    • MS2-Based Dereplication by Integrating GNPS
      • MS2 molecular networking
      • Halogenated compound annotation
      • GraphML file enhancement
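
The exact mass analysis step of MS1-based dereplication can be pictured as a ppm-tolerance lookup. The sketch below is a generic illustration under stated assumptions, not DeepHalo's implementation: the function names, the 5 ppm default, and the database entries (placeholder names with made-up masses) are all hypothetical.

```python
def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def match_by_mass(observed_mass, database, tol_ppm=5.0):
    """Return (name, ppm_error) for entries within tol_ppm, best first.

    database is an iterable of (name, monoisotopic_mass) pairs; a real
    custom database would be loaded from the user's CSV file.
    """
    hits = []
    for name, mass in database:
        err = ppm_error(observed_mass, mass)
        if abs(err) <= tol_ppm:
            hits.append((name, round(err, 2)))
    return sorted(hits, key=lambda hit: abs(hit[1]))


if __name__ == "__main__":
    # Placeholder entries; the masses are invented for the example.
    demo_db = [
        ("compound_A", 400.0000),
        ("compound_B", 400.0025),
        ("compound_C", 401.0000),
    ]
    for name, err in match_by_mass(400.0010, demo_db):
        print(name, err)
```

In a workflow like the one described above, candidates that pass the mass filter would then be checked for halogen presence and isotope intensity similarity.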

Technical Advantages

  • High Throughput

    • End-to-end automated analysis
    • Batch processing of unlimited LC-MS/MS datasets
    • Rapid processing (several to dozens of seconds per sample) on standard hardware (Core i9, 16 GB RAM)
  • High Accuracy

    • 98.3% precision in halogen detection across simulated and experimental LC-MS datasets.

    • Comprehensive validation on both simulated and experimental LC-MS datasets
  • Comprehensive Integration

    • Input: Supports .mzML format
    • Output: Cytoscape-compatible network files
    • Seamless integration with GNPS molecular networking
  • Enhanced Dereplication

    • Embeds halogen prediction results into GNPS output GraphML files
    • Significantly higher dereplication rate than molecular networking alone

Target Applications

  • Natural product discovery
  • Halogenated metabolite annotation

Key Differentiators

  1. Deep learning-based halogen prediction that resists interference from Fe and dehydro isomers
  2. First isotope pattern validation strategies specific to halogenated molecules
  3. Hierarchical halogen scoring system (H-score)
  4. Comprehensive dereplication workflow
  5. Enhanced GNPS molecular networking

For methodology details and validation datasets, see Methods.

Where to get it?

The source code is hosted on GitHub at: https://github.com/xieyying/DeepHalo

Binary installers of DeepHalo are available at the Python Package Index (PyPI).

Dependencies

  • pandas == 2.0.3
  • numpy == 1.22.0
  • molmass == 2023.8.30
  • scikit-learn == 1.3.1
  • tensorflow == 2.10.1
  • keras == 2.10.0
  • keras_tuner == 1.4.6
  • matplotlib == 3.8.0
  • pyopenms == 3.1.0
  • scipy == 1.11.4
  • tomli == 2.0.1
  • tomli-w == 1.0.0
  • importlib_resources == 6.4.0
  • mzml2gnps == 1.0.3
  • networkx == 3.4.2
  • typer == 0.15.1

Installation

Note
Python 3.10 is required. Verify your Python version with:

 python --version
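
The same check can be done from inside Python, for example at the top of a setup script. This is a small sketch that assumes "Python 3.10 is required" means any 3.10.x interpreter; `is_supported` is an illustrative helper, not part of DeepHalo.

```python
import sys


def is_supported(version_info=sys.version_info):
    """True when the interpreter is Python 3.10.x (assumed requirement)."""
    return tuple(version_info[:2]) == (3, 10)


if __name__ == "__main__":
    current = sys.version.split()[0]
    if is_supported():
        print(f"Python {current}: OK")
    else:
        print(f"Python {current}: DeepHalo requires Python 3.10")
```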

Install from PyPI

pip install DeepHalo

Install from Local Wheel

pip install path/to/DeepHalo-xxx.whl

Install from Source

git clone https://github.com/xieyying/DeepHalo.git
cd DeepHalo
pip install -e .

Quickstart

High-throughput Detection of Halogenated Compounds

halo detect -i /path/to/mzml_files -o /path/to/output_directory -ms2

Dereplication

halo dereplicate -o /path/to/output_directory -g /path/to/GNPS_results -ud /path/to/custom_database.csv

Full Usage Guide

Get help

halo --help              # Show all commands
halo detect --help       # Detailed parameters for the 'detect' subcommand
halo dereplicate --help  # Detailed parameters for the 'dereplicate' subcommand

Main Functions

  • Analyze mzML files:
    halo detect -i <input_path> -o <project_path> [-c <config_file>] [-b <blank_samples_dir>] [-ob] [-ms2]
    
  • Dereplication:
    halo dereplicate -o <project_path> [-g <GNPS_folder>] [-ud <user_database.csv>]
    
  • Create training dataset:
    halo create-ds <project_path> [-c <config_file>]
    
  • Train model:
    halo train <project_path> [-c <config_file>] [-m search]
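
For scripted or batch use, the `detect` command above can be driven from Python. The sketch below only assembles flags that appear in the usage line (`-i`, `-o`, `-c`, `-b`, `-ms2`); the `-ob` flag is left out because its meaning is not described here. The helper names are hypothetical, and running the command assumes the `halo` executable is on the PATH.

```python
import subprocess


def build_detect_command(input_path, project_path, config=None,
                         blank_dir=None, ms2=False):
    """Assemble the argument list for `halo detect`."""
    cmd = ["halo", "detect", "-i", str(input_path), "-o", str(project_path)]
    if config is not None:
        cmd += ["-c", str(config)]     # user config file
    if blank_dir is not None:
        cmd += ["-b", str(blank_dir)]  # directory of blank samples
    if ms2:
        cmd.append("-ms2")
    return cmd


def run_detect(*args, **kwargs):
    """Run `halo detect`; raises CalledProcessError on a non-zero exit."""
    return subprocess.run(build_detect_command(*args, **kwargs), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works
    # even where DeepHalo is not installed.
    print(" ".join(build_detect_command(
        "/path/to/mzml_files", "/path/to/output_directory", ms2=True)))
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.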
    

If you need to modify configuration parameters, edit the config file (download it here) and override the default settings by specifying:

-c [user_config_file]

See documentation for more applications.

License

This code repository is licensed under the MIT License.
