pyquetms

Memory-efficient mzML to Parquet converter for mass spectrometry files.

Overview

pyquetms converts mzML files to Parquet in a streaming fashion with minimal memory usage, so large mass spectrometry datasets can be processed without running out of memory. It began as a side project inspired by GSoC '25 with OpenMS. The goal is a simple CLI for converting .mzML files to .parquet files, which is especially useful in big data projects (e.g., machine learning).
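
To illustrate what streaming means here, the following is a minimal sketch and not pyquetms's actual implementation. It walks an mzML document one spectrum at a time using only the Python standard library and releases each spectrum's parsed content after use. The function name iter_spectra and the tiny SAMPLE document are hypothetical; the namespace and the MS:1000016 (scan start time) accession come from the mzML format and its controlled vocabulary.

```python
import io
import xml.etree.ElementTree as ET
from typing import Iterator

# XML namespace used by mzML documents.
MZML_NS = "{http://psi.hupo.org/ms/mzml}"


def iter_spectra(source) -> Iterator[dict]:
    """Yield one small dict per <spectrum> element (hypothetical helper)."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != MZML_NS + "spectrum":
            continue
        record = {"id": elem.get("id"), "time": None}
        for cv in elem.iter(MZML_NS + "cvParam"):
            if cv.get("accession") == "MS:1000016":  # scan start time
                record["time"] = float(cv.get("value"))
        yield record
        # Release the parsed children of this spectrum before moving on.
        elem.clear()


# A tiny, made-up mzML fragment for demonstration only.
SAMPLE = b"""<mzML xmlns="http://psi.hupo.org/ms/mzml">
  <run id="demo">
    <spectrumList count="2">
      <spectrum id="scan=1" index="0" defaultArrayLength="0">
        <scanList count="1"><scan>
          <cvParam cvRef="MS" accession="MS:1000016" name="scan start time" value="0.5"/>
        </scan></scanList>
      </spectrum>
      <spectrum id="scan=2" index="1" defaultArrayLength="0">
        <scanList count="1"><scan>
          <cvParam cvRef="MS" accession="MS:1000016" name="scan start time" value="1.25"/>
        </scan></scanList>
      </spectrum>
    </spectrumList>
  </run>
</mzML>"""

spectra = list(iter_spectra(io.BytesIO(SAMPLE)))
```

Because the generator hands out one record at a time, the caller never needs the whole file in memory at once.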

Installation

From PyPI

pip install pyquetms

From source

git clone https://github.com/Avni2000/pyquetms.git
cd pyquetms
pip install .

Development installation

git clone https://github.com/Avni2000/pyquetms.git
cd pyquetms
pip install -e ".[dev]"

Usage

CLI

Basic conversion:

pyquetms input.mzML

or

pyquetms ~/Downloads/input.mzML

Specify the output file (by default, it is written to the working directory):

pyquetms input.mzML -o output.parquet
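
To convert a whole directory of files, the same command can be driven from Python. This is a minimal sketch, assuming pyquetms is installed and on your PATH. The helpers build_command and convert_all are hypothetical, and only the -o flag shown above is used.

```python
import subprocess
from pathlib import Path


def build_command(mzml_path: Path, out_dir: Path) -> list[str]:
    """Build the CLI call for one file (hypothetical helper)."""
    out_path = out_dir / mzml_path.with_suffix(".parquet").name
    return ["pyquetms", str(mzml_path), "-o", str(out_path)]


def convert_all(in_dir: Path, out_dir: Path) -> None:
    """Run the converter once per .mzML file found in in_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for mzml_path in sorted(in_dir.glob("*.mzML")):
        # check=True raises CalledProcessError if a conversion fails.
        subprocess.run(build_command(mzml_path, out_dir), check=True)
```

Calling convert_all(Path("raw"), Path("parquet")) would then write one .parquet file per input file into the parquet directory.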

Customize batch size and compression. I recommend:

pyquetms input.mzML --batch-size 5000 --compression gzip
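
To give a feel for what these two options control, here is a rough sketch of batched Parquet writing with pyarrow. It is an assumption about how such options typically map onto a writer, not a description of pyquetms internals. The helper names, the column names (time, mz, intensity), and the schema are hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator


def in_batches(records: Iterable[dict], batch_size: int) -> Iterator[list[dict]]:
    """Group an iterable of records into lists of at most batch_size items."""
    iterator = iter(records)
    while batch := list(islice(iterator, batch_size)):
        yield batch


def write_parquet(records: Iterable[dict], path: str,
                  batch_size: int = 5000, compression: str = "gzip") -> None:
    """Write records batch by batch (hypothetical helper; requires pyarrow)."""
    import pyarrow as pa
    import pyarrow.parquet as pq

    # Assumed column names and types, for illustration only.
    schema = pa.schema([
        ("time", pa.float64()),
        ("mz", pa.float64()),
        ("intensity", pa.float64()),
    ])
    with pq.ParquetWriter(path, schema, compression=compression) as writer:
        for batch in in_batches(records, batch_size):
            # Only one batch of records is held in memory at a time.
            writer.write_table(pa.Table.from_pylist(batch, schema=schema))
```

In a design like this, a larger batch size means fewer writes but more memory per batch, and gzip generally trades conversion speed for smaller files.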

Get file information without converting:

pyquetms input.mzML --info

Output Format

The columns in the converted Parquet file depend on the type of mzML file, so they differ slightly from file to file. Some columns may be blank, which is perfectly okay! It doesn't mean your mzML file is wrong. The main expected values are time, m/z, and intensity.
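
To check which columns a converted file actually fills, you can load it and count the non-null values per column. This is a minimal sketch, assuming pyarrow is installed and that blank values are stored as nulls. The helper names and the column names in the usage note are hypothetical, so inspect your own file for the real ones.

```python
def count_non_null(columns: dict[str, list]) -> dict[str, int]:
    """Count the values that are not None in each column."""
    return {
        name: sum(value is not None for value in values)
        for name, values in columns.items()
    }


def load_columns(path: str) -> dict[str, list]:
    """Read a Parquet file into a dict of column name -> list of values."""
    import pyarrow.parquet as pq  # imported lazily; requires pyarrow

    return pq.read_table(path).to_pydict()
```

For example, count_non_null(load_columns("output.parquet")) returns a count of 0 for any column that is entirely blank. Note that load_columns reads the whole file into memory, so it is meant for spot checks on smaller outputs.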

Contributions

It's quite a small project, so feel free to open a PR or an issue!
