parq-tools

Overview

parq-tools is a collection of utilities for efficiently working with large-scale Parquet datasets. Designed for scalability, it supports chunk-wise processing, metadata handling, and optimized workflows for datasets too large to fit into memory.

Features

  • Filtering → Efficiently filters large Parquet files.
  • Concatenation → Combines multiple Parquet files efficiently along rows (axis=0) or columns (axis=1).
  • Tokenized Filtering → Converts pandas-style expressions into efficient PyArrow queries.
  • Block Model Generation → Creates massive Parquet datasets that exceed memory limits, useful for testing pipelines.
  • Profiling Enhancements → Improves ydata-profiling by profiling specific columns incrementally, merging results for large files.
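
As a rough illustration of the tokenized-filtering idea, the sketch below parses a pandas `query`-style expression with Python's `ast` module and turns it into the list-of-tuples filter format that `pyarrow.parquet.read_table` accepts through its `filters` argument. This is an assumption about how such a conversion could work, not a description of how parq-tools implements it. The name `expression_to_filters` is hypothetical, and the sketch only handles simple comparisons joined with `and`.

```python
import ast

# Comparison operators that the tuple-based filter format can express.
_OPS = {
    ast.Eq: "==",
    ast.NotEq: "!=",
    ast.Lt: "<",
    ast.LtE: "<=",
    ast.Gt: ">",
    ast.GtE: ">=",
}


def _comparison_to_tuple(node):
    """Turn a node such as `x > 5` into ("x", ">", 5)."""
    if not (isinstance(node, ast.Compare) and len(node.ops) == 1):
        raise ValueError("only simple comparisons such as `x > 5` are supported")
    left, op, right = node.left, node.ops[0], node.comparators[0]
    if not isinstance(left, ast.Name) or type(op) not in _OPS:
        raise ValueError("expected `<column> <operator> <literal>`")
    # literal_eval rejects anything that is not a plain literal value.
    return (left.id, _OPS[type(op)], ast.literal_eval(right))


def expression_to_filters(expression):
    """Convert "a > 1 and b == 'x'" into [("a", ">", 1), ("b", "==", "x")].

    Hypothetical helper for illustration; not part of parq-tools.
    """
    tree = ast.parse(expression, mode="eval").body
    if isinstance(tree, ast.BoolOp) and isinstance(tree.op, ast.And):
        return [_comparison_to_tuple(value) for value in tree.values]
    return [_comparison_to_tuple(tree)]


filters = expression_to_filters("x > 5 and y == 'a'")
```

The resulting list can be passed as `filters=filters` when reading a file with `pyarrow.parquet.read_table`, which lets PyArrow apply the conditions while reading instead of filtering a fully loaded table afterwards.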

Download files

Source Distribution

parq_tools-0.1.0.tar.gz (10.4 kB)

Uploaded Source

Built Distribution

parq_tools-0.1.0-py3-none-any.whl (13.0 kB)

Uploaded Python 3

File details

Details for the file parq_tools-0.1.0.tar.gz.

File metadata

  • Download URL: parq_tools-0.1.0.tar.gz
  • Upload date:
  • Size: 10.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.22

File hashes

Hashes for parq_tools-0.1.0.tar.gz
Algorithm Hash digest
SHA256 29d271a01da4af9af42dbc2998c8a87395dfc755170b7757e9e0b2d9e84a9b76
MD5 b8147b7fb26f3731ec85272aecc2c61a
BLAKE2b-256 8a926ccc34cf627dc0cc3e8558bbad961dc027643c593df26608639c836d8d64

File details

Details for the file parq_tools-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: parq_tools-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 13.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.22

File hashes

Hashes for parq_tools-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 0cb403437d194c6c0bde183154ffc24e2f64cb37d53834037854150bd5068a83
MD5 d4c03a0bf854fa332610b7b9fb71ab0f
BLAKE2b-256 2b40f650f156ce151f006600df91a16ba2b3cb526308fdb3b12c3152369eec4b
