
A package to convert mzML files to HDF5 for deep learning.

mzrt2h5

License: MIT

A Python package to convert mzML files to HDF5 format for deep learning applications. Current version: 0.1.5.

Installation

pip install mzrt2h5

After installation, a new command mzrt2h5 will be available in your terminal.

CLI Usage

The command-line interface is the most straightforward way to use the package.

Batch Processing (Multiple Files)

To convert a folder of mzML files together with a metadata CSV file:

mzrt2h5 process \
    /path/to/your/mzml_folder/ \
    /path/to/your/output.h5 \
    --metadata-csv-path /path/to/your/metadata.csv \
    --rt-precision 0.1 \
    --mz-precision 0.01

Single File Conversion

To convert a single mzML file without metadata:

mzrt2h5 process-single \
    /path/to/your/file.mzML \
    /path/to/your/output.h5 \
    --rt-precision 0.1 \
    --mz-precision 0.01

Options:

Use mzrt2h5 --help to see all available options.
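
When each mzML file should end up in its own HDF5 file, the process-single command shown above can be driven from a short Python script. The sketch below is only an illustration: the helper names build_single_command and convert_folder are hypothetical, and writing one .h5 file per input, named after the input file, is an assumption. It uses only the positional arguments and flags shown in the examples above.

```python
import subprocess
from pathlib import Path


def build_single_command(mzml_path, output_path, rt_precision=0.1, mz_precision=0.01):
    """Build the argument list for `mzrt2h5 process-single` (hypothetical helper)."""
    return [
        "mzrt2h5", "process-single",
        str(mzml_path),
        str(output_path),
        "--rt-precision", str(rt_precision),
        "--mz-precision", str(mz_precision),
    ]


def convert_folder(folder, out_dir, run=subprocess.run):
    """Convert every *.mzML file in `folder` to its own HDF5 file in `out_dir`."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    outputs = []
    # Note: the "*.mzML" pattern is case-sensitive on most Linux file systems.
    for mzml_path in sorted(Path(folder).glob("*.mzML")):
        output_path = out_dir / (mzml_path.stem + ".h5")
        # check=True raises CalledProcessError if the command exits non-zero.
        run(build_single_command(mzml_path, output_path), check=True)
        outputs.append(output_path)
    return outputs
```

Calling convert_folder("/path/to/your/mzml_folder", "/path/to/your/h5_folder") would run one conversion per file and return the list of output paths. The run parameter exists only so the loop can be exercised without invoking the real command.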

Python Usage

Batch Processing (Multiple Files)

from mzrt2h5.processing import save_dataset_as_sparse_h5
from mzrt2h5.dataset import DynamicSparseH5Dataset
from mzrt2h5.visualization import plot_sample_image

# Process mzML files and save to HDF5
save_dataset_as_sparse_h5(
    folder="path/to/your/mzML_files",
    save_path="output.h5",
    rt_precision=0.1,
    mz_precision=0.01,
    metadata_csv_path="path/to/your/metadata.csv",
)

Single File Conversion

from mzrt2h5.processing import save_single_mzml_as_sparse_h5

# Process a single mzML file and save to HDF5
save_single_mzml_as_sparse_h5(
    mzml_file_path="path/to/your/file.mzML",
    save_path="output.h5",
    rt_precision=0.1,
    mz_precision=0.01,
)
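
The rt_precision and mz_precision arguments set the resolution of the stored data. One plausible reading, which is an assumption here and not taken from the package source, is that each peak is assigned to a grid cell by dividing its retention time and m/z by the corresponding precision, and that intensities landing in the same cell are summed. A minimal sketch of that idea, with a hypothetical bin_peaks helper:

```python
from collections import defaultdict


def bin_peaks(peaks, rt_precision=0.1, mz_precision=0.01):
    """Sum intensities of (rt, mz, intensity) peaks that share a grid cell.

    Illustrative only: the binning rule used inside mzrt2h5 may differ.
    """
    grid = defaultdict(float)
    for rt, mz, intensity in peaks:
        # round() picks the nearest integer cell index on each axis.
        cell = (round(rt / rt_precision), round(mz / mz_precision))
        grid[cell] += intensity
    return dict(grid)
```

Storing only the occupied cells, as the dictionary does here, is what makes a sparse representation compact: most (rt, m/z) cells of an LC-MS run are empty.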

Create a PyTorch dataset

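from mzrt2h5.dataset import DynamicSparseH5Dataset
from mzrt2h5.visualization import plot_sample_image

# Load the HDF5 file as a dataset at the requested target precision
dataset = DynamicSparseH5Dataset(
    h5_path="output.h5",
    target_rt_precision=0.5,
    target_mz_precision=0.05,
)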

# Create a dataset with on-the-fly augmentation for training
# with a random retention time shift of +/- 30 seconds
# and a random m/z shift of +/- 5 ppm.
train_dataset = DynamicSparseH5Dataset(
    h5_path="output.h5",
    target_rt_precision=0.5,
    target_mz_precision=0.05,
    augment=True,
    aug_rt_shift_s=30,
    aug_mz_shift_ppm=5,
)

# Plot a sample image from the HDF5 file
plot_sample_image(
    h5_path="output.h5",
    sample_id="Sample_A",  # or an integer index such as 0
    target_rt_precision=0.5,
    target_mz_precision=0.05,
    output_path="sample_A_plot.png",  # saves to file; omit to display interactively
)
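
The augmentation limits above are given in seconds for retention time and in ppm for m/z. A ppm shift is relative to the m/z value itself: the absolute offset is mz * ppm / 1e6, so 5 ppm at m/z 500 corresponds to 0.0025. The sketch below applies such a random shift to a single point. The function names and the uniform distribution are assumptions for illustration and are not taken from mzrt2h5, which may apply its shifts differently (for example, one offset per sample).

```python
import random


def ppm_to_delta(mz, ppm):
    """Absolute m/z offset that corresponds to `ppm` parts per million at `mz`."""
    return mz * ppm / 1e6


def random_shift(rt, mz, rt_shift_s=30.0, mz_shift_ppm=5.0, rng=random):
    """Shift one (rt, mz) point by a uniform random offset within the given limits."""
    new_rt = rt + rng.uniform(-rt_shift_s, rt_shift_s)
    new_mz = mz + ppm_to_delta(mz, rng.uniform(-mz_shift_ppm, mz_shift_ppm))
    return new_rt, new_mz
```

Because the m/z offset scales with the m/z value, the same ppm limit gives a larger absolute shift for heavier ions than for lighter ones.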

Visualization

To visualize a mass spectrometry image from your HDF5 file, use the mzrt2h5 plot command:

mzrt2h5 plot \
    /path/to/your/output.h5 \
    "Sample_A" \
    --rt-precision 0.5 \
    --mz-precision 0.05 \
    --output-path sample_A_plot.png

Options:

Use mzrt2h5 plot --help to see all available options for plotting.

Changelog

Version 0.1.5

  • Added support for 0-compound simulations in mzrtsim, enabling matrix-only simulations that are useful for generating blank matrix data.
  • Added mzrtsim support for mzML simulation.

Version 0.1.4

  • Fixed path resolution issues in the web interface to ensure HDF5 files are properly located.
  • Improved error handling in HDF5 file writing.
  • Updated default precision values in the web interface (rt_precision: 1.0, mz_precision: 0.001).
  • Enhanced progress tracking and debugging in both CLI and web interface.
  • Added better file extension handling for output filenames.
  • Fixed version consistency across all package files.

Web Interface

This package includes a web interface with real-time progress indicators for both single-file and batch processing.

  1. Run the Flask app:

    python app/app.py
    
  2. Access the web interface: Open your web browser and go to http://127.0.0.1:5002.

  3. Use the interface: The web interface has two modes:

    • Batch Processing: Upload a metadata file and multiple mzML files for processing
    • Single File: Upload a single mzML file without needing metadata

    Select the appropriate tab, set the parameters, and click the "Process" button.

  4. Monitor progress:

    • Real-time progress bar shows processing status from 0% to 100%
    • Detailed status messages indicate current processing stage
    • Progress updates automatically without page refresh

  5. Download results:

    • Download button appears automatically when processing completes
    • Click to download the generated HDF5 file
    • Temporary files are automatically cleaned up after download
