
A collection of utilities for efficiently working with **large-scale** Parquet datasets.


parq-tools


Overview

parq-tools is a collection of utilities for efficiently working with large-scale Parquet datasets. Typical use cases include asset-based workflows on large scientific datasets.

Note: If your datasets are not large, the pandas library may be more convenient.
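A common pattern for datasets that do not fit in memory is to read them in chunks (for example, Parquet row groups or record batches), reduce each chunk to a small summary, and merge the summaries. The sketch below shows that pattern in plain Python for a single numeric column. It uses Welford's single-pass update within a chunk and the pairwise merge formula of Chan et al. across chunks. `ColumnStats` is a hypothetical helper for illustration and is not part of the parq-tools API. The chunks are plain lists standing in for batches read from a Parquet file.

```python
import math
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ColumnStats:
    """Mergeable summary of one numeric column (hypothetical helper, not a parq-tools API)."""

    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean
    minimum: float = math.inf
    maximum: float = -math.inf

    @classmethod
    def from_chunk(cls, values: Iterable[float]) -> "ColumnStats":
        """Summarize one chunk with Welford's single-pass update."""
        stats = cls()
        for x in values:
            stats.count += 1
            delta = x - stats.mean
            stats.mean += delta / stats.count
            stats.m2 += delta * (x - stats.mean)
            stats.minimum = min(stats.minimum, x)
            stats.maximum = max(stats.maximum, x)
        return stats

    def merge(self, other: "ColumnStats") -> "ColumnStats":
        """Combine two summaries (Chan et al. pairwise formula) without revisiting the data."""
        if self.count == 0:
            return other
        if other.count == 0:
            return self
        n = self.count + other.count
        delta = other.mean - self.mean
        return ColumnStats(
            count=n,
            mean=self.mean + delta * other.count / n,
            m2=self.m2 + other.m2 + delta * delta * self.count * other.count / n,
            minimum=min(self.minimum, other.minimum),
            maximum=max(self.maximum, other.maximum),
        )

    @property
    def variance(self) -> float:
        """Sample variance (ddof=1), matching the pandas default."""
        return self.m2 / (self.count - 1) if self.count > 1 else math.nan


# Each inner list stands in for one batch read from a large Parquet file.
chunks = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
total = ColumnStats()
for chunk in chunks:
    total = total.merge(ColumnStats.from_chunk(chunk))
print(total.count, total.mean, total.variance)
```

Because `merge` is associative (up to floating-point rounding), chunk summaries can be computed in any order or in parallel and combined afterwards. Only a handful of numbers per column is ever held in memory, regardless of file size.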

Features

  • Filtering → Efficiently filters large Parquet files.
  • Concatenation → Efficiently combines multiple Parquet files along rows (axis=0) or columns (axis=1).
  • Tokenized Filtering → Converts pandas-style expressions into efficient PyArrow queries.
  • Profiling Enhancements → Extends ydata-profiling by profiling selected columns incrementally and merging the results, so that large files can be profiled.
  • DataFrame Enhancements → Provides a LazyParquetDataFrame class that extends pandas.DataFrame with lazy loading from Parquet files.
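The idea behind tokenized filtering can be illustrated with the standard library alone. The following is a sketch under stated assumptions, not parq-tools' implementation. It parses a pandas-style expression with Python's `ast` module and rewrites it into disjunctive normal form (DNF): a list of AND-groups that are OR-ed together, which is the list-of-lists-of-tuples shape `pyarrow.parquet.read_table` accepts in its `filters=` argument. `to_pyarrow_filters` is a hypothetical name. Only single comparisons of a column name against a literal, combined with `and`/`or` or `&`/`|`, are handled.

```python
import ast
import itertools
from typing import Any

# Operator strings accepted in pyarrow's DNF `filters=` argument.
_OPS = {
    ast.Eq: "==", ast.NotEq: "!=", ast.Lt: "<", ast.LtE: "<=",
    ast.Gt: ">", ast.GtE: ">=", ast.In: "in", ast.NotIn: "not in",
}

Filter = tuple[str, str, Any]  # (column, operator, value)


def _combine(parts: list[list[list[Filter]]], conjunction: bool) -> list[list[Filter]]:
    if conjunction:
        # AND distributes over OR: take one AND-group from each part and concatenate them.
        return [sum(combo, []) for combo in itertools.product(*parts)]
    # OR simply collects every AND-group from every part.
    return [group for part in parts for group in part]


def _to_dnf(node: ast.AST) -> list[list[Filter]]:
    """Return the expression as an OR (outer list) of ANDs (inner lists)."""
    if isinstance(node, ast.BoolOp):  # `and` / `or`
        parts = [_to_dnf(value) for value in node.values]
        return _combine(parts, isinstance(node.op, ast.And))
    if isinstance(node, ast.BinOp) and isinstance(node.op, (ast.BitAnd, ast.BitOr)):  # `&` / `|`
        parts = [_to_dnf(node.left), _to_dnf(node.right)]
        return _combine(parts, isinstance(node.op, ast.BitAnd))
    if isinstance(node, ast.Compare) and len(node.ops) == 1:
        left, op, right = node.left, node.ops[0], node.comparators[0]
        if isinstance(left, ast.Name) and type(op) in _OPS:
            # literal_eval only accepts literals (numbers, strings, lists, ...), never code.
            value = ast.literal_eval(right)
            return [[(left.id, _OPS[type(op)], value)]]
    raise ValueError(f"unsupported expression: {ast.unparse(node)}")


def to_pyarrow_filters(expr: str) -> list[list[Filter]]:
    """Hypothetical helper: translate a pandas-style query string into pyarrow DNF filters."""
    return _to_dnf(ast.parse(expr, mode="eval").body)


filters = to_pyarrow_filters("(depth > 100) & (grade >= 0.5) | (domain in ['ore', 'waste'])")
print(filters)
```

The returned list could then be passed as `filters=` to `pyarrow.parquet.read_table`, which lets PyArrow skip data that cannot match, using Parquet row-group statistics where possible. Note that converting to DNF can multiply the number of clauses when `and` is applied across several `or` groups. A production tokenizer might therefore target PyArrow compute expressions instead.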


Download files

Download the file for your platform.

Source Distribution

parq_tools-0.3.1.tar.gz (26.6 kB)

Built Distribution

parq_tools-0.3.1-py3-none-any.whl (34.7 kB)

File details

Details for the file parq_tools-0.3.1.tar.gz.

File metadata

  • Download URL: parq_tools-0.3.1.tar.gz
  • Upload date:
  • Size: 26.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.23

File hashes

Hashes for parq_tools-0.3.1.tar.gz
  • SHA256: 3d8bb11744f9b0dc32284f7549cb6f304583076c7b5050886acdc076b9ad4218
  • MD5: e76e6a7d6191d0136283f7e50770a9c8
  • BLAKE2b-256: dd8ea6f9f6dd8afdaade3c0d16865cf8e83da2e70e27164751f67eb677d449ab

File details

Details for the file parq_tools-0.3.1-py3-none-any.whl.

File metadata

  • Download URL: parq_tools-0.3.1-py3-none-any.whl
  • Upload date:
  • Size: 34.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.23

File hashes

Hashes for parq_tools-0.3.1-py3-none-any.whl
  • SHA256: 7fb1b529c9a7244455f1d8c64e2d6aa9087332c280ebc15dd4a1a7dd585e3689
  • MD5: d02f89375b0fd40ac392bb58abf92a8e
  • BLAKE2b-256: 402ea942724ce5c2faf4650e10fa226cc677ca07583d25e81272db5e35f8c3b1
