Pyquet

Memory-efficient mzML to Parquet converter for mass spectrometry files.

Overview

Pyquet streams mzML files into Parquet format with minimal memory usage, so large mass spectrometry datasets can be converted without running out of memory. It started as a side project inspired by GSoC 2025 with OpenMS, with the goal of providing a simple CLI for converting .mzML files to .parquet files, which is especially useful in big-data projects (e.g., machine learning).
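
To illustrate what "streaming" means here, the following is a minimal sketch of the general technique using only the Python standard library. It is not Pyquet's actual code: `iter_spectrum_ids` and `batched` are hypothetical helpers, and the tiny inline document stands in for a real mzML file. The idea is to visit one `<spectrum>` element at a time and pass elements downstream in fixed-size batches, so that memory use is driven mainly by the batch size rather than the file size.

```python
import io
import xml.etree.ElementTree as ET
from itertools import islice
from typing import Iterator


def iter_spectrum_ids(source) -> Iterator[str]:
    """Yield the id attribute of each <spectrum> element, one at a time."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        # Strip any XML namespace so the check works with or without one.
        if elem.tag.rsplit("}", 1)[-1] == "spectrum":
            yield elem.get("id", "")
            # Release the element's children and text once it has been used.
            elem.clear()


def batched(items: Iterator[str], batch_size: int) -> Iterator[list[str]]:
    """Group an iterator into lists of at most batch_size items."""
    while batch := list(islice(items, batch_size)):
        yield batch


if __name__ == "__main__":
    # Hypothetical stand-in for an mzML file; a real one carries far more detail.
    demo = io.BytesIO(
        b"<mzML><run><spectrumList>"
        b'<spectrum id="scan=1"/><spectrum id="scan=2"/><spectrum id="scan=3"/>'
        b"</spectrumList></run></mzML>"
    )
    for batch in batched(iter_spectrum_ids(demo), batch_size=2):
        print(batch)
```

Only one batch is held in a list at any moment; a converter built this way would write each batch out before reading the next.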

Installation

From PyPI

pip install pyquetms

From source

git clone https://github.com/Avni2000/pyquet.git
cd pyquet
pip install .

Development installation

git clone https://github.com/Avni2000/pyquet.git
cd pyquet
pip install -e ".[dev]"

Usage

CLI

Basic conversion:

pyquet input.mzML

or

pyquet ~/Downloads/input.mzML

Specify the output file (by default, output is written to the working directory):

pyquet input.mzML -o output.parquet
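
If you have a whole folder of files, the same command can be scripted. This is a minimal sketch, assuming `pyquet` is on your PATH; `build_command` and `convert_directory` are hypothetical helper names, and only the `pyquet INPUT -o OUTPUT` form shown above is used.

```python
import subprocess
from pathlib import Path


def build_command(mzml_path: Path, out_dir: Path) -> list[str]:
    """Build the `pyquet INPUT -o OUTPUT` command for one file."""
    output = out_dir / mzml_path.with_suffix(".parquet").name
    return ["pyquet", str(mzml_path), "-o", str(output)]


def convert_directory(in_dir: Path, out_dir: Path) -> None:
    """Convert every .mzML file in in_dir, stopping at the first failure."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for mzml_path in sorted(in_dir.glob("*.mzML")):
        # check=True raises CalledProcessError if pyquet exits with an error.
        subprocess.run(build_command(mzml_path, out_dir), check=True)


if __name__ == "__main__":
    # Hypothetical folder names; adjust them to your own layout.
    convert_directory(Path("raw"), Path("converted"))
```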

Customize batch size and compression. I recommend:

pyquet input.mzML --batch-size 5000 --compression gzip

Get file information without converting:

pyquet input.mzML --info

Output Format

The columns in the converted Parquet file vary slightly depending on the type of mzML file. Some columns may be blank, which is perfectly okay! It doesn't mean your mzML file is wrong. The main expected values are time, m/z, and intensity.

Contributions

It's quite a small project, so feel free to open a PR or an issue!
