CLI tool to benchmark compression algorithms on Parquet datasets

compressbench

Benchmark compression algorithms on Parquet datasets.

Why

Compression settings affect performance, storage cost, and latency, but most data engineers inherit the defaults without testing them.
compressbench lets you benchmark compression ratio, compression speed, and decompression speed across algorithms using your own Parquet files.

Features

  • Accepts local Parquet files as input.
  • Supports gzip and snappy.
  • Outputs:
    • Compression ratio.
    • Compression time.
    • Decompression time.
  • CLI built with Typer.
  • Unit tests with pytest.
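
To make the three reported metrics concrete, here is a minimal sketch of how they could be measured with only the Python standard library. It is not compressbench's implementation: it compresses a raw byte string with gzip rather than rewriting a Parquet file, the function name `benchmark_gzip` is hypothetical, and it assumes the ratio is defined as original size divided by compressed size.

```python
import gzip
import time


def benchmark_gzip(data: bytes) -> dict:
    """Measure ratio and timings for gzip on an in-memory byte string.

    Hypothetical helper for illustration; not part of compressbench.
    """
    # Time the compression step.
    start = time.perf_counter()
    compressed = gzip.compress(data)
    compress_time = time.perf_counter() - start

    # Time the decompression step.
    start = time.perf_counter()
    restored = gzip.decompress(compressed)
    decompress_time = time.perf_counter() - start

    # A benchmark is only meaningful if the round trip is lossless.
    if restored != data:
        raise ValueError("round trip did not reproduce the input")

    return {
        "algorithm": "gzip",
        # Assumed definition: original size divided by compressed size.
        "ratio": len(data) / len(compressed),
        "compress_time": compress_time,
        "decompress_time": decompress_time,
    }


if __name__ == "__main__":
    # Repetitive sample data, so the ratio is well above 1.
    sample = b"user_id,event,timestamp\n" * 50_000
    result = benchmark_gzip(sample)
    print(f"Algorithm: {result['algorithm']}")
    print(f"Compression ratio: {result['ratio']:.2f}")
    print(f"Compression time: {result['compress_time']:.2f}s")
    print(f"Decompression time: {result['decompress_time']:.2f}s")
```

Timings from a single run are noisy. Repeating the measurement and taking the median gives more stable numbers, and results on real Parquet files will differ because Parquet applies compression per column chunk.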

Installation

pip install compressbench

Usage

compressbench input.parquet --algorithms gzip snappy

If --algorithms is omitted, compressbench runs benchmarks for all available algorithms.

Example Output

Algorithm: gzip
Compression ratio: 2.91
Compression time: 0.43s
Decompression time: 0.12s

Algorithm: snappy
Compression ratio: 1.67
Compression time: 0.12s
Decompression time: 0.05s

CLI Options

  • input.parquet: Path to the Parquet file to benchmark.
  • --algorithms: List of algorithms to test (gzip, snappy).
  • --level: Not supported in v0.1.0. Reserved for future versions.

Roadmap

See ROADMAP.md for planned features.

License

MIT

Download files

Source Distribution

compressbench-0.1.0.tar.gz (5.1 kB)

Uploaded Source

Built Distribution

compressbench-0.1.0-py3-none-any.whl (5.3 kB)

Uploaded Python 3

File details

Details for the file compressbench-0.1.0.tar.gz.

File metadata

  • Download URL: compressbench-0.1.0.tar.gz
  • Upload date:
  • Size: 5.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.3

File hashes

Hashes for compressbench-0.1.0.tar.gz
Algorithm Hash digest
SHA256 41204ae22615ca25d59b68c65074fefc632270bd0c849690f34335ec23e9905f
MD5 33730f3aec0a928c518ea4e9e9c7aaba
BLAKE2b-256 4555f6dc3b34a296c84f97f765aff73e3b1c20751555300b7d07a8521c1235ad

File details

Details for the file compressbench-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: compressbench-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 5.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.3

File hashes

Hashes for compressbench-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 a9bc5fb5baa4494d2c611f9032d653b44279eaec398d4dd7f3cfcbd6ebb009a3
MD5 8922f2678a1fc4227ddac0425f0e1f6e
BLAKE2b-256 234eef948c448b1103a800487c11d1e38c0e3485ff3c831b100f0c3a1d4c7d90
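
The published digests can be used to check that a downloaded file is intact. The following is a minimal sketch using Python's standard `hashlib` module. The helper names `sha256_of` and `matches_published` are hypothetical, and the file path in the usage note is assumed to point at a local copy of the download.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches_published(Path("compressbench-0.1.0.tar.gz"), "41204ae22615ca25d59b68c65074fefc632270bd0c849690f34335ec23e9905f")` should return True for an unmodified copy of the source distribution.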
