Project description

parq-tools

Overview

parq-tools is a collection of utilities for efficiently working with large-scale Parquet datasets. A typical use case is asset-based workflows with large scientific datasets.

:::note
If your datasets are not large, you might find the pandas library more convenient.
:::

Features

  • Filtering → Filters large Parquet files efficiently.
  • Concatenation → Combines multiple Parquet files efficiently along rows (axis=0) or columns (axis=1).
  • Tokenized Filtering → Converts pandas-style expressions into efficient PyArrow queries.
  • Profiling Enhancements → Extends ydata-profiling to profile specific columns incrementally and merge the results, so that large files can be profiled.
  • Block Model Generation → Creates a Parquet block model larger than the machine's memory, useful for testing pipelines.
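
To illustrate the idea behind tokenized filtering, here is a minimal sketch of how a pandas-style expression might be translated into the filter tuples that PyArrow accepts. It is not parq-tools' own implementation or API: the helper name `expression_to_filters` and the column names `depth` and `rock_type` are hypothetical, and only `and`-joined simple comparisons are handled.

```python
import ast

# Map Python comparison AST nodes to the operator strings used in
# PyArrow-style filter tuples.
_OPS = {
    ast.Eq: "==",
    ast.NotEq: "!=",
    ast.Lt: "<",
    ast.LtE: "<=",
    ast.Gt: ">",
    ast.GtE: ">=",
}


def _comparison(node):
    """Turn one `column <op> literal` node into a (column, op, value) tuple."""
    if not (isinstance(node, ast.Compare) and len(node.ops) == 1):
        raise ValueError("only simple comparisons are supported")
    left, op, right = node.left, node.ops[0], node.comparators[0]
    if not isinstance(left, ast.Name) or type(op) not in _OPS:
        raise ValueError("expected `column <op> literal`")
    # literal_eval only accepts literals, so arbitrary code is never evaluated.
    return (left.id, _OPS[type(op)], ast.literal_eval(right))


def expression_to_filters(expression):
    """Hypothetical helper: parse an expression into a list of filter tuples."""
    tree = ast.parse(expression, mode="eval").body
    # `a and b and c` parses as a single BoolOp holding several values.
    if isinstance(tree, ast.BoolOp) and isinstance(tree.op, ast.And):
        return [_comparison(value) for value in tree.values]
    return [_comparison(tree)]


# Hypothetical column names, for illustration only.
filters = expression_to_filters("depth > 100 and rock_type == 'basalt'")
print(filters)  # [('depth', '>', 100), ('rock_type', '==', 'basalt')]
```

A list in this form can be passed as the `filters=` argument of `pyarrow.parquet.read_table`, so the predicate is applied while reading rather than after the whole file has been loaded into memory.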

Download files

Source Distribution

parq_tools-0.2.2.tar.gz (16.5 kB view details)

Uploaded Source

Built Distribution

parq_tools-0.2.2-py3-none-any.whl (21.6 kB view details)

Uploaded Python 3

File details

Details for the file parq_tools-0.2.2.tar.gz.

File metadata

  • Download URL: parq_tools-0.2.2.tar.gz
  • Upload date:
  • Size: 16.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.22

File hashes

Hashes for parq_tools-0.2.2.tar.gz
  • SHA256: 9b54a94ec93fd0391a57306b5a02106cab715581cdb99330849437d67d42f349
  • MD5: 8550320e3486f1682dd24d3f0b5dc749
  • BLAKE2b-256: 0941b4a4461d07019b55f74c3abb59292d669f031de68b0714b9f178af4607a2

File details

Details for the file parq_tools-0.2.2-py3-none-any.whl.

File metadata

  • Download URL: parq_tools-0.2.2-py3-none-any.whl
  • Upload date:
  • Size: 21.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.22

File hashes

Hashes for parq_tools-0.2.2-py3-none-any.whl
  • SHA256: 29dda95e1953dba293a221228563d66fdc328b16b2bb1ac1341edf0adef098c9
  • MD5: abfed508874cebf58ccf7ea6e1a162e5
  • BLAKE2b-256: 2a446349279f42d5156ff9d280c2de44c7b8d053bcdd21dac7b83433a8ca1499
