DeepHalo: A deep learning-integrated workflow for high-throughput discovery of halogenated metabolites from HRMS data.
Core Features
1. Halogen Prediction
- Element Prediction Model (EPM)
- Dual-branch Isotope Neural Network (IsoNN) architecture
- High accuracy Cl/Br detection (>99.9% precision based on benchmark results)
- Wide mass range coverage (50-2000 Da)
- Robust to interference from B-, Se-, and Fe-containing compounds and dehydro isomers
2. Isotope Pattern Validation
- Dual Validation System
- Mass Dimension: Statistical rule-based correction.
- Intensity Dimension: Autoencoder-based Anomaly Detection Model (ADM).
3. Multi-Level Halogen Confidence Scoring (H-score)
- Dual levels
- Prediction based on centroid-level isotope patterns
- Prediction based on scan-level isotope patterns
- H-score: integrates the predictions from both levels into a single confidence assessment
4. Enhanced Dereplication
- Dual-Strategy Approach
- MS1-Based Dereplication Using Custom Database Matching
- Exact mass analysis
- Halogen presence verification
- Isotope intensity similarity scoring
- MS2-Based Dereplication by Integrating GNPS
- MS2 molecular networking
- Halogenated compound annotation
- GraphML file enhancement
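The isotope reasoning behind Halogen Prediction and Isotope Pattern Validation can be illustrated with a small pure-Python sketch. It is not DeepHalo's EPM/IsoNN model or its statistical rule set; it only shows why Cl and Br leave distinctive M+2 signatures in both the intensity and the mass dimension. The abundance values, the 2 mDa tolerance, and all function names are assumptions chosen for illustration.

```python
from math import comb

# Approximate natural abundances (light, heavy); each heavy isotope sits about +2 Da higher.
CL_ABUNDANCE = (0.7576, 0.2424)  # 35Cl, 37Cl
BR_ABUNDANCE = (0.5069, 0.4931)  # 79Br, 81Br

# Exact heavy-minus-light mass differences in Da.
DELTA_CL = 36.96590259 - 34.96885268  # about 1.99705
DELTA_BR = 80.9162906 - 78.9183371    # about 1.99795


def binomial_pattern(n, light, heavy):
    """Probability of carrying 0..n heavy isotopes, i.e. peaks M, M+2, ..., M+2n."""
    return [comb(n, k) * light ** (n - k) * heavy ** k for k in range(n + 1)]


def convolve(a, b):
    """Combine two independent isotope distributions."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def halogen_pattern(n_cl=0, n_br=0):
    """Relative intensities (most intense peak = 100) of the M, M+2, M+4, ... cluster
    contributed by Cl and Br only; C, H, N, O and S isotopes are ignored."""
    pattern = convolve(binomial_pattern(n_cl, *CL_ABUNDANCE),
                       binomial_pattern(n_br, *BR_ABUNDANCE))
    top = max(pattern)
    return [round(100 * p / top, 1) for p in pattern]


def m2_spacing_matches_halogen(mz_m, mz_m2, charge=1, tol_mda=2.0):
    """Mass-dimension check: does the observed M -> M+2 spacing match 37Cl or 81Br?"""
    spacing = (mz_m2 - mz_m) * abs(charge)
    return any(abs(spacing - delta) * 1000 <= tol_mda for delta in (DELTA_CL, DELTA_BR))
```

For example, `halogen_pattern(n_cl=1)` gives roughly 100:32 and `halogen_pattern(n_br=2)` the familiar 1:2:1 cluster. Note that with a 2 mDa window a 34S isotope peak (spacing about 1.9958 Da) would also pass the mass check, which illustrates why a second, intensity-based validation is useful.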
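How the H-score combines the centroid-level and scan-level predictions is not specified here. The sketch below is a purely hypothetical combination, shown only to make the "two levels, one score" idea concrete: a weighted average of the centroid-level probability and the fraction of scans predicted as halogenated. The function `h_score` and its `w_centroid` and `threshold` parameters are invented for illustration.

```python
def h_score(centroid_prob, scan_probs, w_centroid=0.5, threshold=0.5):
    """Hypothetical H-score in [0, 1] (not DeepHalo's formula).

    centroid_prob: halogen probability predicted from the centroid-level isotope pattern.
    scan_probs: halogen probabilities predicted from individual scan-level patterns.
    The scan-level term is the fraction of scans that call the feature halogenated.
    """
    if not 0.0 <= w_centroid <= 1.0:
        raise ValueError("w_centroid must lie in [0, 1]")
    if not scan_probs:
        return centroid_prob  # no scan-level evidence: fall back to the centroid call
    scan_support = sum(p >= threshold for p in scan_probs) / len(scan_probs)
    return w_centroid * centroid_prob + (1.0 - w_centroid) * scan_support
```

Counting agreeing scans, rather than averaging their probabilities, rewards features whose halogen signal is consistent across the chromatographic peak; that is one design choice among several.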
Technical Advantages
- High Throughput
- End-to-end automated analysis
- Batch processing of any number of LC-MS/MS datasets
- Rapid processing (seconds to tens of seconds per sample) on standard hardware (Core i9, 16 GB RAM)
- High Accuracy
- 98.3% precision in halogen detection, validated on both simulated and experimental LC-MS datasets
- Comprehensive Integration
- Input: supports .mzML format
- Output: Cytoscape-compatible network files
- Seamless integration with GNPS molecular networking
- Enhanced Dereplication
- Embeds halogen prediction results into GNPS output GraphML files
- Significantly higher dereplication rate than molecular networking alone
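GNPS exports molecular networks as GraphML, where each node can carry `<data>` values declared by `<key>` elements, and Cytoscape shows these as node columns. The sketch below uses only the standard library to show how per-node halogen results could be attached as a new node attribute. It is not DeepHalo's implementation: the attribute name `H_score`, the `halo_` key prefix, and the node ids are hypothetical. (DeepHalo lists networkx among its dependencies, which offers a higher-level route to the same result.)

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"
ET.register_namespace("", GRAPHML_NS)  # serialize without an "ns0:" prefix


def q(tag):
    """Qualify a tag name with the GraphML namespace."""
    return f"{{{GRAPHML_NS}}}{tag}"


def add_node_attribute(graphml_text, attr_name, values, attr_type="string"):
    """Return GraphML text with a new node attribute `attr_name`.

    values maps node id -> attribute value; nodes missing from values are left untouched.
    """
    root = ET.fromstring(graphml_text)
    key_id = f"halo_{attr_name}"
    # GraphML requires <key> declarations to precede the <graph> element.
    key = ET.Element(q("key"), {"id": key_id, "for": "node",
                                "attr.name": attr_name, "attr.type": attr_type})
    root.insert(0, key)
    for node in root.iter(q("node")):
        node_id = node.get("id")
        if node_id in values:
            data = ET.SubElement(node, q("data"), {"key": key_id})
            data.text = str(values[node_id])
    return ET.tostring(root, encoding="unicode")
```

Leaving unmatched nodes without a `<data>` element keeps them as empty cells in Cytoscape rather than inventing a default value.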
Target Applications
- Natural product discovery
- Halogenated metabolite annotation
Key Differentiators
- Deep learning-based halogen prediction that resists interference from Fe-containing compounds and dehydro isomers
- First isotope pattern validation strategy specific to halogenated molecules
- Hierarchical halogen scoring system (H-score)
- Comprehensive dereplication workflow
- Enhanced GNPS molecular networking
For methodology details and validation datasets, see Methods.
Where to get it?
The source code is hosted on GitHub at: https://github.com/xieyying/DeepHalo
Binary installers of DeepHalo are available at the Python Package Index (PyPI).
Dependencies
- pandas == 2.0.3
- numpy == 1.22.0
- molmass == 2023.8.30
- scikit-learn == 1.3.1
- tensorflow == 2.10.1
- keras == 2.10.0
- keras_tuner == 1.4.6
- matplotlib == 3.8.0
- pyopenms == 3.1.0
- scipy == 1.11.4
- tomli == 2.0.1
- tomli-w == 1.0.0
- importlib_resources == 6.4.0
- mzml2gnps == 1.0.3
- networkx == 3.4.2
- typer == 0.15.1
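Because the dependencies are pinned to exact versions, it can help to check what is actually installed in the active environment. This sketch uses the standard library's `importlib.metadata`; the `check_pins` helper is hypothetical and the dictionary covers only a subset of the list above.

```python
from importlib import metadata

# Subset of the pinned versions listed above.
PINS = {
    "pandas": "2.0.3",
    "numpy": "1.22.0",
    "tensorflow": "2.10.1",
    "pyopenms": "3.1.0",
}


def check_pins(pins, get_version=metadata.version):
    """Return {package: (required, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, required in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != required:
            mismatches[name] = (required, installed)
    return mismatches
```

Calling `check_pins(PINS)` returns an empty dict when every listed package matches; the injectable `get_version` argument exists only to make the helper easy to test.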
Installation
Note
Python 3.10 is required. Verify your Python version with:
python --version
Install from PyPI
pip install DeepHalo
Install from Local Wheel
pip install path/to/DeepHalo-xxx.whl
Install from Source
git clone https://github.com/xieyying/DeepHalo.git
cd DeepHalo
pip install -e .
Quickstart
High-throughput Detection of Halogenated Compounds
halo detect -i /path/to/mzml_files -o /path/to/output_directory -ms2
Dereplication
halo dereplicate -o /path/to/output_directory -g /path/to/GNPS_results -ud /path/to/custom_database.csv
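`halo detect` appears to take a folder of mzML files via `-i`. If sample sets are kept in separate folders, a small Python wrapper can call the CLI once per folder, each writing to its own project directory. The sketch uses only the flags documented in this README (`-i`, `-o`, `-c`, `-ms2`); the helper names and the one-project-per-folder layout are assumptions. It only prints the commands unless `dry_run=False`.

```python
import shlex
import subprocess
from pathlib import Path


def detect_command(input_dir, output_dir, ms2=True, config=None):
    """Build a `halo detect` argument list from the documented flags."""
    cmd = ["halo", "detect", "-i", str(input_dir), "-o", str(output_dir)]
    if config is not None:
        cmd += ["-c", str(config)]
    if ms2:
        cmd.append("-ms2")
    return cmd


def run_batches(batch_dirs, output_root, dry_run=True):
    """Run one detection per input folder; output goes to output_root/<folder name>."""
    commands = []
    for batch in map(Path, batch_dirs):
        cmd = detect_command(batch, Path(output_root) / batch.name)
        commands.append(cmd)
        if dry_run:
            print(shlex.join(cmd))  # show what would run
        else:
            subprocess.run(cmd, check=True)  # stop on the first failing batch
    return commands
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with paths that contain spaces.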
Full Usage Guide
Get help
halo --help # Show all commands
halo detect --help # Detailed parameters for the subcommand 'detect'
halo dereplicate --help # Detailed parameters for the subcommand 'dereplicate'
Main Functions
- Analyze mzML files:
halo detect -i <input_path> -o <project_path> [-c <config_file>] [-b <blank_samples_dir>] [-ob] [-ms2]
- Dereplication:
halo dereplicate -o <project_path> [-g <GNPS_folder>] [-ud <user_database.csv>]
- Create training dataset:
halo create-ds <project_path> [-c <config_file>]
- Train model:
halo train <project_path> [-c <config_file>] [-m search]
If you need to modify configuration parameters, edit the config file (download it here) and override the default settings by specifying:
-c [user_config_file]
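The config file format is not stated here; the tomli and tomli-w dependencies suggest TOML. Assuming that, the sketch shows one common way a user file can override defaults: nested tables are merged key by key, so unspecified settings keep their default values. The section and key names (`detect`, `mz_tol_ppm`, `train`, ...) are invented, and whether DeepHalo merges partial configs like this or expects a complete file is an assumption; editing the full downloaded config file is the safer route.

```python
import copy


def parse_toml(text):
    """Parse TOML text with the standard library (3.11+) or the tomli backport."""
    try:
        import tomllib  # standard library on Python 3.11+
    except ModuleNotFoundError:
        import tomli as tomllib  # backport; DeepHalo already depends on tomli
    return tomllib.loads(text)


def deep_merge(defaults, overrides):
    """Return a copy of defaults with overrides applied; nested tables merge key by key."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged
```

Working on deep copies leaves the default configuration unchanged, so it can be reused for other runs.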
See documentation for more applications.
License
This code repository is licensed under the MIT License.
Download files
File details
Details for the file deephalo-1.0.0.tar.gz.
File metadata
- Download URL: deephalo-1.0.0.tar.gz
- Upload date:
- Size: 2.0 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ebbfe9ce3e001631d2a723101791109bcb56542a9ee2d513f68dd1bcaea45d10 |
| MD5 | 2c9a353d50ff69fc385fb9c1ac45617b |
| BLAKE2b-256 | 5c61252fde3d76f7fefd618e45456139b159329fc08de30420ceb0a0397ba2b4 |
File details
Details for the file deephalo-1.0.0-py3-none-any.whl.
File metadata
- Download URL: deephalo-1.0.0-py3-none-any.whl
- Upload date:
- Size: 2.0 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6d25cafa50d3934a0adf125d6717f8505a2742c7fbbaf3b378ee487cf5c784b2 |
| MD5 | a517f15c2a642af095cc342e9e4809ea |
| BLAKE2b-256 | c2a4c5187ace68cbb60a1f4500146fd4c1acd4a40bb0d380b1b749f673b57866 |
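To confirm that a downloaded file matches the digests listed above, its hash can be computed locally with Python's `hashlib`. This is a minimal sketch; the `file_digest` and `verify` helper names are illustrative. The file is read in chunks so large archives do not need to fit in memory.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path, expected_hex, algorithm="sha256"):
    """True if the file's digest equals the published one (case-insensitive)."""
    return file_digest(path, algorithm) == expected_hex.lower()
```

For example, `verify("deephalo-1.0.0-py3-none-any.whl", "6d25cafa50d3934a0adf125d6717f8505a2742c7fbbaf3b378ee487cf5c784b2")` should return `True` for an intact download of the wheel.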