A collection of utilities for efficiently working with **large-scale** Parquet datasets.

Project description

parq-tools

Overview

parq-tools is a collection of utilities for efficiently working with large-scale Parquet datasets. Typical use cases include asset-based workflows built on large scientific datasets.

Note: If your datasets are not large, you may find the pandas library more convenient.

Features

  • Filtering → Efficiently filter large Parquet files.
  • Concatenation → Combine multiple Parquet files efficiently along rows (axis=0) or columns (axis=1).
  • Tokenized Filtering → Convert pandas-style expressions into efficient PyArrow queries.
  • Profiling Enhancements → Extend ydata-profiling by profiling selected columns incrementally and merging the results, so that large files can be profiled.
  • Block Model Generation → Create a Parquet block model larger than the machine's available memory, which is useful for testing pipelines.
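The Tokenized Filtering feature turns a pandas-style expression into a PyArrow query. The sketch below illustrates the idea with a hypothetical helper, `to_dnf` (not part of parq-tools, whose tokenizer may work differently). It tokenizes a simple expression and produces the list-of-lists (disjunctive normal form) structure accepted by the `filters` argument of `pyarrow.parquet.read_table`. It assumes only `and`/`or` connectives, no parentheses, and that `and` binds tighter than `or`.

```python
import re
from typing import Any, List, Tuple

# (column, operator, value) triple, the shape PyArrow's `filters` argument uses.
Predicate = Tuple[str, str, Any]

# Hypothetical tokenizer, not parq-tools' implementation. Two-character
# operators are listed before '<' and '>' so that '>=' is not split in two.
_TOKEN_RE = re.compile(
    r"\s*(?:(?P<num>-?\d+(?:\.\d+)?)"
    r"|(?P<str>'[^']*'|\"[^\"]*\")"
    r"|(?P<op>==|!=|<=|>=|<|>)"
    r"|(?P<word>[A-Za-z_]\w*))"
)


def tokenize(expr: str) -> List[Tuple[str, str]]:
    """Split an expression into (kind, text) tokens."""
    expr = expr.strip()
    tokens, pos = [], 0
    while pos < len(expr):
        match = _TOKEN_RE.match(expr, pos)
        if match is None:
            raise ValueError(f"Unexpected character at {pos}: {expr[pos]!r}")
        tokens.append((match.lastgroup, match.group(match.lastgroup)))
        pos = match.end()
    return tokens


def _literal(kind: str, text: str) -> Any:
    """Convert a number or quoted-string token to a Python value."""
    if kind == "num":
        return float(text) if "." in text else int(text)
    if kind == "str":
        return text[1:-1]
    raise ValueError(f"Expected a number or quoted string, got {text!r}")


def to_dnf(expr: str) -> List[List[Predicate]]:
    """Convert "a > 1 and b == 'x' or c < 2" into DNF filters.

    'and' binds tighter than 'or', so every 'or' starts a new inner
    (AND-ed) list; the outer list is OR-ed.
    """
    tokens = tokenize(expr)
    if not tokens:
        raise ValueError("Empty expression")
    dnf: List[List[Predicate]] = [[]]
    i = 0
    while i < len(tokens):
        if len(tokens) - i < 3:
            raise ValueError("Incomplete comparison at end of expression")
        (k_col, column), (k_op, op), (k_val, value) = tokens[i:i + 3]
        if k_col != "word" or k_op != "op":
            raise ValueError(f"Expected '<column> <op> <value>' near {column!r}")
        dnf[-1].append((column, op, _literal(k_val, value)))
        i += 3
        if i < len(tokens):
            kind, word = tokens[i]
            if kind != "word" or word not in ("and", "or"):
                raise ValueError(f"Expected 'and' or 'or', got {word!r}")
            if word == "or":
                dnf.append([])
            i += 1
            if i == len(tokens):
                raise ValueError(f"Expression ends with {word!r}")
    return dnf


print(to_dnf("grade > 0.5 and domain == 'ore' or tonnage >= 100"))
```

The nested list could then be passed as `pyarrow.parquet.read_table(path, filters=to_dnf(...))`, where `path` stands for your own file. PyArrow treats the outer list as OR and each inner list as AND, and it can use Parquet row-group statistics to skip data that cannot match.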
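Profiling a large file incrementally relies on per-chunk results that can be merged. The way parq-tools merges ydata-profiling reports is not described here. As a generic illustration only, the hypothetical `ColumnStats` class below summarizes one numeric column per chunk using Welford's single-pass update and merges chunk summaries with the pairwise formula of Chan et al. This lets count, mean, variance, minimum and maximum be computed one row group at a time.

```python
import math
from dataclasses import dataclass, replace
from typing import Iterable


@dataclass
class ColumnStats:
    """Mergeable summary of one numeric column (hypothetical, for illustration)."""

    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the mean
    minimum: float = math.inf
    maximum: float = -math.inf

    @classmethod
    def from_values(cls, values: Iterable[float]) -> "ColumnStats":
        """Summarize one chunk with Welford's single-pass update."""
        stats = cls()
        for x in values:
            stats.count += 1
            delta = x - stats.mean
            stats.mean += delta / stats.count
            stats.m2 += delta * (x - stats.mean)
            stats.minimum = min(stats.minimum, x)
            stats.maximum = max(stats.maximum, x)
        return stats

    def merge(self, other: "ColumnStats") -> "ColumnStats":
        """Combine two chunk summaries (Chan et al. pairwise formula)."""
        if other.count == 0:
            return replace(self)
        if self.count == 0:
            return replace(other)
        n = self.count + other.count
        delta = other.mean - self.mean
        return ColumnStats(
            count=n,
            mean=self.mean + delta * other.count / n,
            m2=self.m2 + other.m2 + delta * delta * self.count * other.count / n,
            minimum=min(self.minimum, other.minimum),
            maximum=max(self.maximum, other.maximum),
        )

    @property
    def variance(self) -> float:
        """Sample variance (ddof=1); NaN when fewer than two values were seen."""
        return self.m2 / (self.count - 1) if self.count > 1 else math.nan


# Usage: summarize one chunk (e.g. one row group) at a time, then merge.
chunks = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
total = ColumnStats()
for chunk in chunks:
    total = total.merge(ColumnStats.from_values(chunk))
print(total.count, total.mean, total.variance)  # 6 3.5 3.5
```

Only the small summary objects are kept between chunks, so memory use does not grow with the number of rows.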
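A block model that exceeds memory has to be generated and written in batches. The sketch below uses a hypothetical generator, `block_centroids` (not a parq-tools function), that yields the centroids of a regular grid as column-oriented batches. The block size, origin and batch size are assumed parameters. Each batch could be converted with `pyarrow.Table.from_pydict` and appended to a file with `pyarrow.parquet.ParquetWriter.write_table`, so only one batch is built in memory at a time.

```python
from itertools import islice, product
from typing import Dict, Iterator, List, Tuple


def block_centroids(
    nx: int,
    ny: int,
    nz: int,
    size: Tuple[float, float, float] = (10.0, 10.0, 5.0),
    origin: Tuple[float, float, float] = (0.0, 0.0, 0.0),
    batch_rows: int = 100_000,
) -> Iterator[Dict[str, List[float]]]:
    """Yield x/y/z centroids of a regular nx*ny*nz grid in column batches.

    Hypothetical helper: cell indices come lazily from itertools.product,
    so only `batch_rows` centroids are built per yielded batch.
    """
    if batch_rows < 1:
        raise ValueError("batch_rows must be positive")
    dx, dy, dz = size
    ox, oy, oz = origin
    cells = product(range(nx), range(ny), range(nz))
    while True:
        chunk = list(islice(cells, batch_rows))
        if not chunk:
            return
        yield {
            "x": [ox + (i + 0.5) * dx for i, _, _ in chunk],
            "y": [oy + (j + 0.5) * dy for _, j, _ in chunk],
            "z": [oz + (k + 0.5) * dz for _, _, k in chunk],
        }


# Usage: a 2 x 2 x 2 grid streamed in batches of 3 rows.
for batch in block_centroids(2, 2, 2, batch_rows=3):
    print(len(batch["x"]), batch["z"])
```

Because the generator never materializes the full grid, the total model size is limited by disk space rather than RAM.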

Download files

Download the file for your platform.

Source Distribution

parq_tools-0.3.0.tar.gz (18.8 kB)

Uploaded Source

Built Distribution

parq_tools-0.3.0-py3-none-any.whl (24.5 kB)

Uploaded Python 3

File details

Details for the file parq_tools-0.3.0.tar.gz.

File metadata

  • Download URL: parq_tools-0.3.0.tar.gz
  • Upload date:
  • Size: 18.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.23

File hashes

Hashes for parq_tools-0.3.0.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 8e28fa02ec7ce3a9caa89d49f20b7fe68f3827d4c5924cc3d30260c4c49fbc97 |
| MD5 | cd6d9394180747011104e879f2fa8966 |
| BLAKE2b-256 | a2c485bc28a867a770c28269ffc93059868bd4d56fc2c192763c064aea64cd11 |
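To check a downloaded archive against the published digests, you can stream the file through Python's standard `hashlib` module. The helper name `file_digest_matches` is hypothetical. This is a minimal sketch and not part of parq-tools or PyPI tooling. pip's hash-checking mode (a `--hash=sha256:...` option on a requirements-file line) performs a similar check during installation.

```python
import hashlib


def file_digest_matches(path: str, expected_hex: str, algorithm: str = "sha256",
                        chunk_size: int = 1 << 20) -> bool:
    """Return True if the file's digest equals the published hex digest.

    The file is read in chunks so large archives are never loaded whole.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest() == expected_hex.strip().lower()
```

For example, assuming the source distribution was saved to the current directory, `file_digest_matches("parq_tools-0.3.0.tar.gz", "8e28fa02ec7ce3a9caa89d49f20b7fe68f3827d4c5924cc3d30260c4c49fbc97")` should return `True` for an unmodified download.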

File details

Details for the file parq_tools-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: parq_tools-0.3.0-py3-none-any.whl
  • Upload date:
  • Size: 24.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.23

File hashes

Hashes for parq_tools-0.3.0-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 88c2419afb692256a5e0bad4b0678b9cb3f19e59f649db8d4b538e585d0d8d1f |
| MD5 | 9fea3bec5ad045a94c20bb5ecf1581e3 |
| BLAKE2b-256 | 40ef7a17e17a2e039cfd9ecbf20112bb827a2177ee62212f8a08779b5d965eba |