MZMLpy

A lightweight Python library for parsing mzML mass spectrometry files. Implements a type-safe, lazy-loading API with direct support for modern mzML structures (>= 1.1.0).

Requires Python 3.12+. License: MIT.

Installation

pip install mzmlpy

Optional extras:

pip install mzmlpy[numpress]   # MS-Numpress decoding
pip install mzmlpy[zstd]       # Zstandard compression
pip install mzmlpy[rapidgzip]  # Parallel gzip decompression (recommended for .gz files)

Quick Start

from mzmlpy import Mzml

with Mzml("path/to/file.mzML") as reader:
    print(f"File: {reader.file_name}  |  Spectra: {len(reader.spectra)}")

    for spectrum in reader.spectra:
        mz = spectrum.mz
        intensity = spectrum.intensity
        print(f"  {spectrum.id}  |  MS{spectrum.ms_level}  |  {len(mz)} peaks")

Both .mzML and .mzML.gz files are supported. Metadata is parsed eagerly; binary data is decoded on demand.
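As a small illustration of working with the spectrum attributes shown in the Quick Start, the sketch below tallies spectra per MS level and counts peaks. The helper `summarize_ms_levels` is hypothetical (not part of mzmlpy) and relies only on the `ms_level` and `mz` attributes used above; stand-in objects replace a real file so the snippet runs on its own.

```python
from collections import Counter
from types import SimpleNamespace


def summarize_ms_levels(spectra):
    """Return (Counter of spectra per MS level, total number of peaks).

    Hypothetical helper: it only assumes each item exposes `ms_level`
    and `mz`, as the spectra in the Quick Start do.
    """
    levels = Counter()
    total_peaks = 0
    for spectrum in spectra:
        levels[spectrum.ms_level] += 1
        # The README states binary data is decoded on demand, so reading
        # `.mz` is presumably the point where an array gets decoded.
        total_peaks += len(spectrum.mz)
    return levels, total_peaks


# Stand-in spectra so the sketch runs without a file. With mzmlpy you
# would pass `reader.spectra` from inside the `with Mzml(...)` block.
demo = [
    SimpleNamespace(ms_level=1, mz=[100.0, 200.0, 300.0]),
    SimpleNamespace(ms_level=2, mz=[150.0]),
    SimpleNamespace(ms_level=2, mz=[175.0, 250.0]),
]
levels, total_peaks = summarize_ms_levels(demo)
print(dict(levels), total_peaks)
```

Because the function touches `mz` for every spectrum, it decodes every array; a count by MS level alone could skip that access.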

Reading Gzipped Files

When opening .mzML.gz files, the gzip_mode parameter controls how the file is accessed:

| Mode | Description |
| --- | --- |
| "extract" (default) | Decompress to <tmpdir>/mzmlpy/ and cache across sessions. First open pays decompression cost; subsequent opens reuse the cache instantly. The OS clears tmp on reboot. |
| "indexed" | Seekable access to the compressed file using rapidgzip. No decompression to disk. Requires pip install mzmlpy[rapidgzip]. |
| "stream" | Stream sequentially. Lowest startup cost but no efficient random access. |

For most use cases, "extract" or "indexed" is recommended:

# Default — extracts to tmp, cached across sessions
with Mzml("data.mzML.gz") as reader:
    spec = reader.spectra[0]

# Indexed — no extraction, seekable access (requires rapidgzip)
with Mzml("data.mzML.gz", gzip_mode="indexed") as reader:
    spec = reader.spectra[0]
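One way to turn the mode descriptions above into a decision rule is sketched here. `pick_gzip_mode` and its parameters are hypothetical, not part of mzmlpy; the only library fact it depends on is that the rapidgzip package is importable as `rapidgzip`, which is checked with the standard `importlib.util.find_spec`.

```python
import importlib.util


def pick_gzip_mode(sequential_only=False, rapidgzip_installed=None,
                   allow_tmp_extract=True):
    """Suggest a gzip_mode string following the table above.

    Hypothetical helper; all parameter names are illustrative.
    """
    if rapidgzip_installed is None:
        # find_spec returns None when a top-level module is not installed.
        rapidgzip_installed = importlib.util.find_spec("rapidgzip") is not None
    if sequential_only:
        return "stream"      # lowest startup cost, one pass front to back
    if allow_tmp_extract:
        return "extract"     # default: decompress once, reuse the cache
    if rapidgzip_installed:
        return "indexed"     # seekable access without writing to disk
    return "stream"          # fallback: random access will be slow


print(pick_gzip_mode())
print(pick_gzip_mode(rapidgzip_installed=True, allow_tmp_extract=False))
```

The returned string would then be passed as `gzip_mode=` when constructing `Mzml`.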

To reclaim disk space before the OS clears tmp on reboot:

from mzmlpy import clear_cache
clear_cache()
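To judge whether clearing is worthwhile, the cache directory can be measured first. This is a rough sketch using only the standard library: it assumes `<tmpdir>` resolves to `tempfile.gettempdir()` on your machine, which the README does not confirm, and `dir_size_bytes` is a hypothetical helper rather than an mzmlpy function.

```python
import tempfile
from pathlib import Path


def dir_size_bytes(directory):
    """Total size in bytes of all regular files under `directory`.

    Returns 0 if the directory does not exist.
    """
    directory = Path(directory)
    if not directory.is_dir():
        return 0
    return sum(p.stat().st_size for p in directory.rglob("*") if p.is_file())


# Assumption: the cache lives in <system temp dir>/mzmlpy.
cache_dir = Path(tempfile.gettempdir()) / "mzmlpy"
print(f"{cache_dir}: {dir_size_bytes(cache_dir) / 1e6:.1f} MB")
```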

Performance

Benchmarked on a real-world DDA file (33,535 spectra, first-open cold start, with rapidgzip):

| Mode | Startup | Iterate (500 spectra) | Random access (5 reads) |
| --- | --- | --- | --- |
| plain .mzML | 0.042s | 0.087s | 0.001s |
| in_memory=True | 1.499s | 0.362s | 0.002s |
| gzip_mode="extract" | 0.957s | 0.083s | 0.001s |
| gzip_mode="indexed" ¹ | 6.850s | 0.135s | 0.074s |
| gzip_mode="stream" | 0.089s | 0.155s | 22.8s |

¹ "indexed" startup includes building the gzip seek index and mzML offset index on first open — both are cached alongside the file, so subsequent opens are fast.

"extract" pays a one-time decompression cost (~1s for a large file) then matches plain .mzML speed. "stream" is sequential-only — random access requires re-scanning from the start.
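The trade-off between startup cost and random-access cost can be made concrete with a back-of-the-envelope estimate. The sketch below copies the cold-start figures for the three gzip modes from the table above and assumes, as a simplification, that random-access time scales linearly with the number of reads; it ignores iteration time and the faster warm opens that "extract" and "indexed" get from their caches.

```python
# (startup seconds, seconds for 5 random reads), copied from the table above.
BENCHMARK = {
    "extract": (0.957, 0.001),
    "indexed": (6.850, 0.074),
    "stream": (0.089, 22.8),
}


def estimated_seconds(mode, n_random_reads):
    """Startup plus a linear extrapolation of the 5-read timing."""
    startup, five_reads = BENCHMARK[mode]
    return startup + n_random_reads * (five_reads / 5)


def cheapest_mode(n_random_reads):
    """Mode with the lowest estimated cold-start total for this workload."""
    return min(BENCHMARK, key=lambda mode: estimated_seconds(mode, n_random_reads))


for n in (0, 1, 100):
    best = cheapest_mode(n)
    print(n, best, round(estimated_seconds(best, n), 3))
```

Under these assumptions "stream" only wins when no random reads are needed at all; a single random read already costs it about 4.6 s, more than the full "extract" startup.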

For full usage examples see the Getting Started guide and API Reference.

Development

just lint     # ruff check
just format   # ruff isort + format
just ty       # ty type checker
just test     # pytest

# or all at once:
just check
