parq-tools

Overview

parq-tools is a collection of utilities for efficiently working with large-scale Parquet datasets. A typical use case is asset-based workflows with large scientific datasets.

Note: If your datasets are not large, you might find the pandas library more convenient.

Features

  • Filtering → Efficiently filters large Parquet files.
  • Concatenation → Efficiently combines multiple Parquet files along rows (axis=0) or columns (axis=1).
  • Tokenized Filtering → Converts pandas-style expressions into efficient PyArrow queries.
  • Profiling Enhancements → Extends ydata-profiling to profile specific columns incrementally and merge the results, so that large files can be profiled.
  • Block Model Generation → Creates a Parquet block model larger than the machine's memory capacity, useful for testing pipelines.
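
To make the tokenized filtering idea concrete, here is a minimal sketch of how a pandas-style expression could be turned into the list-of-tuples form that PyArrow accepts for its Parquet `filters` argument. It is not parq-tools's own implementation or API: the helper name `expression_to_filters` is hypothetical, and the sketch assumes expressions made only of `column <op> literal` comparisons joined by `and`.

```python
import ast

# Map Python comparison AST nodes to the operator strings used in
# PyArrow's tuple-style Parquet filters.
_OPERATORS = {
    ast.Eq: "==",
    ast.NotEq: "!=",
    ast.Lt: "<",
    ast.LtE: "<=",
    ast.Gt: ">",
    ast.GtE: ">=",
}


def expression_to_filters(expression: str) -> list:
    """Convert "x > 5 and y == 'a'" into [("x", ">", 5), ("y", "==", "a")].

    Hypothetical helper for illustration only. It supports single
    `column <op> literal` comparisons joined by `and`; anything else
    raises ValueError.
    """
    tree = ast.parse(expression, mode="eval").body

    # A chain of `and` terms parses to one BoolOp; a lone comparison does not.
    if isinstance(tree, ast.BoolOp):
        if not isinstance(tree.op, ast.And):
            raise ValueError("only 'and' is supported in this sketch")
        terms = tree.values
    else:
        terms = [tree]

    filters = []
    for term in terms:
        is_simple = (
            isinstance(term, ast.Compare)
            and len(term.ops) == 1
            and isinstance(term.left, ast.Name)
        )
        if not is_simple:
            raise ValueError("expected comparisons of the form: column <op> literal")
        operator = _OPERATORS.get(type(term.ops[0]))
        if operator is None:
            raise ValueError("unsupported comparison operator")
        # literal_eval accepts an AST node and rejects non-literal values.
        value = ast.literal_eval(term.comparators[0])
        filters.append((term.left.id, operator, value))
    return filters


print(expression_to_filters("x > 5 and y == 'a'"))
```

The resulting list can be passed as `filters=` to `pyarrow.parquet.read_table`, which treats a flat list of tuples as a conjunction and can skip row groups that cannot match. The column names `x` and `y` are placeholders.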

Download files

Source Distribution

parq_tools-0.2.1.tar.gz (15.8 kB)

Built Distribution

parq_tools-0.2.1-py3-none-any.whl (20.7 kB)

File details

Details for the file parq_tools-0.2.1.tar.gz.

File metadata

  • Download URL: parq_tools-0.2.1.tar.gz
  • Upload date:
  • Size: 15.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.22

File hashes

Hashes for parq_tools-0.2.1.tar.gz
Algorithm Hash digest
SHA256 8492a0a14c324f9d96292b0a910629e4d74de7d208431e0fdac9a748452aeeb8
MD5 d407d1f33e2c68c565361ea058f1352f
BLAKE2b-256 ad6f7f9653970fc03df57b13c320a9995aaab543b64ac1957aab56fbe61fb0c5

File details

Details for the file parq_tools-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: parq_tools-0.2.1-py3-none-any.whl
  • Upload date:
  • Size: 20.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.22

File hashes

Hashes for parq_tools-0.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 772d94bff5f93e641b5fb5d025c4d264d20184ac806b84a5859f544ea879153d
MD5 bea69a070fd1e86fc24d06196d92785e
BLAKE2b-256 8fef940fcf7c70c001d7c6726b18d61d839430f135cc2c8e4a75e02a11c85008
