A package to convert mzML files to HDF5 for deep learning.
mzrt2h5
A Python package to convert mzML files to HDF5 format for deep learning applications. Version 0.1.8
Installation
pip install mzrt2h5
After installation, a new command mzrt2h5 will be available in your terminal.
CLI Usage
This is the most straightforward way to use the package. After installation, you can call the mzrt2h5 command from your terminal.
Batch Processing (Multiple Files)
Example:
mzrt2h5 process \
/path/to/your/mzml_folder/ \
/path/to/your/output.h5 \
--metadata-csv-path /path/to/your/metadata.csv \
--rt-precision 0.1 \
--mz-precision 0.01
Single File Conversion
To convert a single mzML file without needing metadata:
Example:
mzrt2h5 process-single \
/path/to/your/file.mzML \
/path/to/your/output.h5 \
--rt-precision 0.1 \
--mz-precision 0.01
Options:
Use mzrt2h5 --help to see all available options.
Python Usage
Batch Processing (Multiple Files)
from mzrt2h5.processing import save_dataset_as_sparse_h5
from mzrt2h5.dataset import DynamicSparseH5Dataset
from mzrt2h5.visualization import plot_sample_image
# Process mzML files and save to HDF5
save_dataset_as_sparse_h5(
folder="path/to/your/mzML_files",
save_path="output.h5",
rt_precision=0.1,
mz_precision=0.01,
metadata_csv_path="path/to/your/metadata.csv",
)
Single File Conversion
from mzrt2h5.processing import save_single_mzml_as_sparse_h5
# Process a single mzML file and save to HDF5
save_single_mzml_as_sparse_h5(
mzml_file_path="path/to/your/file.mzML",
save_path="output.h5",
rt_precision=0.1,
mz_precision=0.01,
)
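The rt_precision and mz_precision arguments control how finely the converted data are gridded. This README does not describe the internal binning. The following is a minimal sketch that assumes each precision acts as a bin width, that every (retention time, m/z, intensity) point is snapped to the nearest grid index, and that intensities sharing a cell are summed into a sparse dictionary. bin_peaks is a hypothetical helper, not part of mzrt2h5.

```python
from collections import defaultdict


def bin_peaks(points, rt_precision=0.1, mz_precision=0.01):
    """Snap (rt, mz, intensity) points onto a sparse 2D grid.

    Hypothetical helper: assumes each precision is a bin width and that
    intensities landing in the same cell are summed.
    """
    grid = defaultdict(float)
    for rt, mz, intensity in points:
        # Nearest grid index along each axis
        cell = (round(rt / rt_precision), round(mz / mz_precision))
        grid[cell] += intensity
    return dict(grid)


points = [(60.02, 100.004, 10.0), (60.04, 100.003, 5.0), (61.0, 200.0, 1.0)]
print(bin_peaks(points))  # {(600, 10000): 15.0, (610, 20000): 1.0}
```

Storing only non-empty cells keeps the representation small, because most of an LC-MS retention time by m/z plane contains no signal.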
Create a PyTorch dataset
dataset = DynamicSparseH5Dataset(
h5_path="output.h5",
target_rt_precision=0.5,
target_mz_precision=0.05,
)
# Create a dataset with on-the-fly augmentation for training
# with a random retention time shift of +/- 30 seconds
# and a random m/z shift of +/- 5 ppm.
train_dataset = DynamicSparseH5Dataset(
h5_path="output.h5",
target_rt_precision=0.5,
target_mz_precision=0.05,
augment=True,
aug_rt_shift_s=30,
aug_mz_shift_ppm=5
)
# Plot a sample image from the HDF5 file
plot_sample_image(
h5_path="output.h5",
sample_id="Sample_A", # Or an integer index like 0
target_rt_precision=0.5,
target_mz_precision=0.05,
output_path="sample_A_plot.png" # Saves to file; omit output_path to display interactively
)
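The DynamicSparseH5Dataset calls above read data stored at rt_precision=0.1 and mz_precision=0.01 onto a coarser target grid (0.5 and 0.05), optionally with random shifts. The package's loading code is not shown here. The following is a hypothetical sketch of that idea, not the library's implementation. It assumes the stored grid maps (rt_index, mz_index) to intensity, that retention times are in seconds (matching aug_rt_shift_s), and that one RT shift and one ppm shift are drawn per sample. load_sample is an illustrative name.

```python
import random


def load_sample(grid, src_precision=(0.1, 0.01), target_precision=(0.5, 0.05),
                augment=False, rt_shift_s=30.0, mz_shift_ppm=5.0, rng=None):
    """Re-bin a stored sparse grid to a coarser target grid, optionally augmenting.

    Hypothetical sketch: `grid` maps (rt_index, mz_index) -> intensity at
    `src_precision`; one RT shift and one ppm shift are drawn per call.
    """
    rt_p, mz_p = src_precision
    t_rt, t_mz = target_precision
    d_rt, mz_scale = 0.0, 1.0
    if augment:
        rng = rng or random.Random()
        d_rt = rng.uniform(-rt_shift_s, rt_shift_s)
        # A shift of p ppm multiplies every m/z by (1 + p * 1e-6)
        mz_scale = 1.0 + rng.uniform(-mz_shift_ppm, mz_shift_ppm) * 1e-6
    out = {}
    for (i, j), value in grid.items():
        rt = i * rt_p + d_rt          # cell position in RT units, shifted
        mz = j * mz_p * mz_scale      # cell position in m/z units, scaled
        key = (round(rt / t_rt), round(mz / t_mz))
        out[key] = out.get(key, 0.0) + value
    return out


stored = {(600, 10000): 15.0, (601, 10001): 1.0, (610, 20000): 2.0}
print(load_sample(stored))  # {(120, 2000): 16.0, (122, 4000): 2.0}
train_view = load_sample(stored, augment=True, rng=random.Random(0))
```

For scale: at m/z 500, a 5 ppm shift is 500 × 5 × 10⁻⁶ = 0.0025, a twentieth of a 0.05 m/z bin. A 30 s RT shift spans 60 bins of 0.5 s (assuming seconds).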
Visualization
To visualize a mass spectrometry image from your HDF5 file, use the mzrt2h5 plot command:
mzrt2h5 plot \
/path/to/your/output.h5 \
"Sample_A" \
--rt-precision 0.5 \
--mz-precision 0.05 \
--output-path sample_A_plot.png
Options:
Use mzrt2h5 plot --help to see all available options for plotting.
Changelog
Version 0.1.8
- Added CNN end-to-end deep learning model (MzrtCNN) for sample classification directly on sparse 2D mass spec images via mzrt2h5.model and mzrt2h5.trainer.
- Added RT alignment module (align_h5) using base peak chromatogram (BPC) cross-correlation via mzrt2h5.alignment.
- Enhanced DynamicSparseH5Dataset by supporting target_covariate for classification tasks.
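The changelog says align_h5 aligns retention times by cross-correlating base peak chromatograms. Its exact behavior is not documented here. The following is a minimal plain-Python sketch of the general technique, with hypothetical helper names. It builds each sample's BPC (the most intense signal in each scan), then picks the integer scan lag that maximizes the unnormalized cross-correlation with a reference BPC.

```python
def base_peak_chromatogram(scans):
    """Most intense signal in each scan; empty scans contribute 0."""
    return [max(intensities, default=0.0) for intensities in scans]


def best_lag(reference, sample, max_lag):
    """Integer scan lag that maximizes sum(reference[i] * sample[i + lag])."""
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(
            r * sample[i + lag]
            for i, r in enumerate(reference)
            if 0 <= i + lag < len(sample)
        )
        if score > best_score:
            best, best_score = lag, score
    return best


ref = base_peak_chromatogram([[0], [0], [1], [5, 2], [1], [], [0], [0]])
shifted = base_peak_chromatogram([[0], [0], [0], [0], [1], [5], [1], []])
print(best_lag(ref, shifted, max_lag=3))  # 2
```

A positive lag means the sample's peaks elute later than the reference's. If the scan interval is uniform, multiplying the lag by that interval converts it to a retention-time offset.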
Version 0.1.7
- Added simulated intensity column (sim_ins) to CSV output of mzML simulation:
  - The new column shows the maximum simulated intensity (peak height) for each compound peak.
  - Values match the theoretical maximum that peak detection algorithms should find.
  - Supports both simmzml and simmzml_background simulation functions.
  - Useful for validating peak finding algorithms and understanding simulation parameters.
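sim_ins records each simulated peak's height, which a peak detector should recover. The simulator's actual peak model, including tailing, is not described here. Assuming a Gaussian peak shape, the following sketch samples a peak of known height and checks it with a naive local-maximum detector. gaussian_peak and local_maxima are hypothetical helpers.

```python
import math


def gaussian_peak(times, apex_rt, height, sigma):
    """Sample a Gaussian elution profile whose maximum value is `height`."""
    return [height * math.exp(-0.5 * ((t - apex_rt) / sigma) ** 2) for t in times]


def local_maxima(signal):
    """Indices of points higher than the left neighbor and not lower than the right."""
    return [
        i for i in range(1, len(signal) - 1)
        if signal[i - 1] < signal[i] >= signal[i + 1]
    ]


times = [0.5 * k for k in range(41)]        # 0.0 .. 20.0
profile = gaussian_peak(times, apex_rt=10.0, height=1000.0, sigma=1.0)
apexes = local_maxima(profile)
print(apexes, profile[apexes[0]])  # [20] 1000.0
```

On this grid the apex falls exactly on a sample, so the detected maximum equals the simulated height.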
Version 0.1.6
- Enhanced simulation capabilities in generate_simulation_data:
  - pwidth, snr, and rtime now accept vectors to specify values per compound.
  - baseline accepts a vector to simulate baseline shifts over time.
  - tailingindex allows specifying which compounds exhibit tailing.
- Fixed DynamicSparseH5Dataset to correctly handle samples with no peaks (empty spectra), ensuring robust loading and label handling.
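Arguments that accept either one value or a vector are usually normalized to one value per compound before simulation. The following sketch shows that pattern for parameters such as pwidth, snr, and rtime, plus a mask built from a tailing index list. per_compound and tailing_mask are hypothetical helpers, not generate_simulation_data's code. The 0-based indexing in tailing_mask is also an assumption.

```python
from numbers import Number


def per_compound(value, n_compounds, name):
    """Expand a scalar to one value per compound, or check a vector's length."""
    if isinstance(value, Number):
        return [value] * n_compounds
    values = list(value)
    if len(values) != n_compounds:
        raise ValueError(f"{name}: got {len(values)} values for {n_compounds} compounds")
    return values


def tailing_mask(tailing_index, n_compounds):
    """True for compounds listed in tailing_index (0-based here, by assumption)."""
    chosen = set(tailing_index)
    return [k in chosen for k in range(n_compounds)]


pwidth = per_compound(5.0, 3, "pwidth")          # [5.0, 5.0, 5.0]
snr = per_compound([10, 50, 100], 3, "snr")      # [10, 50, 100]
tailing = tailing_mask([1], 3)                   # [False, True, False]
```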
Version 0.1.5
- Added support for 0-compound simulation in mzrtsim to enable matrix-only simulations, useful for generating blank matrix data.
- Added support for mzrtsim for mzML simulation.
Version 0.1.4
- Fixed path resolution issues in the web interface to ensure HDF5 files are properly located
- Improved error handling in HDF5 file writing
- Updated default precision values in the web interface (rt_precision: 1.0, mz_precision: 0.001)
- Enhanced progress tracking and debugging in both CLI and web interface
- Added better file extension handling for output filenames
- Fixed version consistency across all package files
Web Interface
This package includes a web interface with real-time progress indicators for both single-file and batch processing.
- Run the Flask app:
  python app/app.py
- Access the web interface: Open your web browser and go to http://127.0.0.1:5002.
- Use the interface: The web interface has two modes:
  - Batch Processing: Upload a metadata file and multiple mzML files for processing
  - Single File: Upload a single mzML file without needing metadata
  Select the appropriate tab, set the parameters, and click the "Process" button.
- Monitor progress:
  - Real-time progress bar shows processing status from 0% to 100%
  - Detailed status messages indicate current processing stage
  - Progress updates automatically without page refresh
- Download results:
  - Download button appears automatically when processing completes
  - Click to download the generated HDF5 file
  - Temporary files are automatically cleaned up after download
Download files
File details
Details for the file mzrt2h5-0.1.8.tar.gz.
File metadata
- Download URL: mzrt2h5-0.1.8.tar.gz
- Upload date:
- Size: 1.3 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4516969478a693a5ee2b8c5422ff91eb69aed3bea5c415e601abbf992be23e8e |
| MD5 | b37a96651fc9283ba9b5d453a9bcd05f |
| BLAKE2b-256 | 950d7dc30657d34977a07d26a9ac837a7f4af46cccb499dc634bbb858ec1f4c2 |
File details
Details for the file mzrt2h5-0.1.8-py3-none-any.whl.
File metadata
- Download URL: mzrt2h5-0.1.8-py3-none-any.whl
- Upload date:
- Size: 1.3 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6fdebc085a01a8558a0be26d1e301045bd502944181fe4842f8059c01e0631d2 |
| MD5 | 55a398755e66957ebb14e8ad593d4b4c |
| BLAKE2b-256 | 34f912126436577420c677771f609f08a41396e52a58064525752dfa4bb95c4a |
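You can check a downloaded file against the SHA256 digests listed above using Python's standard hashlib module. The following sketch reads the file in chunks so that memory use stays bounded. sha256_of is an illustrative helper name.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Digest published above for mzrt2h5-0.1.8.tar.gz
EXPECTED_SDIST_SHA256 = "4516969478a693a5ee2b8c5422ff91eb69aed3bea5c415e601abbf992be23e8e"
```

Assuming the source distribution was saved to the working directory, `sha256_of("mzrt2h5-0.1.8.tar.gz") == EXPECTED_SDIST_SHA256` should be True. pip's hash-checking mode (`--require-hashes`) can perform the same check at install time.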