A collection of utilities for efficiently working with **large-scale** Parquet datasets.

Project description

parq-tools

Overview

parq-tools is a collection of utilities for efficiently working with large-scale Parquet datasets. A typical use case is asset-based workflows with large scientific datasets.

:::note
If your datasets are not large, you might find the pandas library more convenient.
:::

Features

  • Filtering → Filters large Parquet files efficiently.
  • Concatenation → Combines multiple Parquet files efficiently along rows (axis=0) or columns (axis=1).
  • Tokenized Filtering → Converts pandas-style expressions into efficient PyArrow queries.
  • Profiling Enhancements → Extends ydata-profiling to large files by profiling selected columns incrementally and merging the results.
  • DataFrame Enhancements → Provides a LazyParquetDataFrame class that extends pandas.DataFrame with lazy loading from Parquet files.
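
To make the Tokenized Filtering idea concrete, here is a minimal sketch of how a pandas-style expression could be turned into the tuple-based filter format that `pyarrow.parquet.read_table` accepts through its `filters` argument. It is not parq-tools' implementation or API: the function name `to_dnf_filters` and its helpers are hypothetical, and the sketch only handles `column <op> literal` comparisons joined by `and`, with `or` allowed at the top level.

```python
import ast

# Map Python comparison AST nodes to the operator strings used in
# PyArrow's tuple-based filter format.
_OPS = {
    ast.Eq: "==", ast.NotEq: "!=",
    ast.Lt: "<", ast.LtE: "<=",
    ast.Gt: ">", ast.GtE: ">=",
    ast.In: "in", ast.NotIn: "not in",
}


def _comparison(node):
    """Turn a single `column <op> literal` node into a filter tuple."""
    if not (
        isinstance(node, ast.Compare)
        and len(node.ops) == 1
        and isinstance(node.left, ast.Name)
    ):
        raise ValueError("expected a comparison of the form `column <op> literal`")
    op = _OPS.get(type(node.ops[0]))
    if op is None:
        raise ValueError("unsupported comparison operator")
    # literal_eval only accepts plain literals, so arbitrary code is rejected.
    value = ast.literal_eval(node.comparators[0])
    return (node.left.id, op, value)


def _conjunction(node):
    """Turn `a and b and ...` (or a lone comparison) into a list of tuples."""
    if isinstance(node, ast.BoolOp) and isinstance(node.op, ast.And):
        return [_comparison(value) for value in node.values]
    return [_comparison(node)]


def to_dnf_filters(expression):
    """Hypothetical helper: parse an expression into a list of AND-groups.

    The outer list is OR-ed and each inner list is AND-ed
    (disjunctive normal form).
    """
    tree = ast.parse(expression, mode="eval").body
    if isinstance(tree, ast.BoolOp) and isinstance(tree.op, ast.Or):
        return [_conjunction(value) for value in tree.values]
    return [_conjunction(tree)]


filters = to_dnf_filters("depth > 5 and rock == 'ore' or grade <= 0.2")
print(filters)
```

The resulting list could then be passed as `filters=` to `pyarrow.parquet.read_table`, which lets PyArrow apply the predicates while reading rather than after the whole file is in memory. The column names in the example expression are made up for illustration.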

Download files

Source Distribution

parq_tools-0.3.2.tar.gz (27.3 kB view details)

Uploaded Source

Built Distribution

parq_tools-0.3.2-py3-none-any.whl (35.8 kB view details)

Uploaded Python 3

File details

Details for the file parq_tools-0.3.2.tar.gz.

File metadata

  • Download URL: parq_tools-0.3.2.tar.gz
  • Upload date:
  • Size: 27.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.25

File hashes

Hashes for parq_tools-0.3.2.tar.gz
Algorithm Hash digest
SHA256 0ab718b3f5e9eec06dba17b02c1703736f5926e9c0afdc6461f5150aa6f4bc6a
MD5 5b6a53e45ead51c122aa09c49ba24a16
BLAKE2b-256 c2eec67c51695d404f85db93d043849d6ac8e7394b2255d5faf1adea423df50f

File details

Details for the file parq_tools-0.3.2-py3-none-any.whl.

File metadata

  • Download URL: parq_tools-0.3.2-py3-none-any.whl
  • Upload date:
  • Size: 35.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.25

File hashes

Hashes for parq_tools-0.3.2-py3-none-any.whl
Algorithm Hash digest
SHA256 ab94c61715efed17cf0d573d865166a367f8dd6e0a77358a487310863f44a1a3
MD5 ce4c66403454429edf88801235ed1722
BLAKE2b-256 c5a9e70dd6e1a7a5c1d011b7c8313dd608b25cd2d79ed72c1880acf5179ba4bc
