Generic Parquet filtering tool (CLI + API)

pqfilt

Generic Parquet predicate-pushdown filter tool — CLI and Python API.

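pqfilt wraps pyarrow.dataset so that filters are applied while Parquet files are being read, not after they have been fully loaded into memory. It relies on row-group-level predicate pushdown: row groups whose stored statistics rule out a match can be skipped.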
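To make the idea of row-group-level pushdown concrete, here is a minimal pure-Python sketch. It is a toy model, not pqfilt's or pyarrow's implementation: the `RowGroup` class and the `read_less_than` helper are hypothetical names, and the min/max values stand in for the statistics that Parquet stores per row group.

```python
from dataclasses import dataclass


@dataclass
class RowGroup:
    """Toy stand-in for a Parquet row group (hypothetical, for illustration)."""
    vmin: float   # stored minimum of the filtered column
    vmax: float   # stored maximum of the filtered column
    rows: list    # the actual values, only touched if the group is scanned


def read_less_than(groups, bound):
    """Return values < bound and the number of row groups actually scanned."""
    matched, scanned = [], 0
    for group in groups:
        if group.vmin >= bound:
            # Statistics alone prove no row can match: skip without reading.
            continue
        scanned += 1
        matched.extend(v for v in group.rows if v < bound)
    return matched, scanned


groups = [
    RowGroup(15.2, 18.9, [15.2, 18.9]),
    RowGroup(21.0, 23.5, [21.0, 23.5]),   # skipped for "vmag < 20"
    RowGroup(19.5, 22.1, [19.5, 22.1]),
]
values, scanned = read_less_than(groups, 20)
print(values, scanned)
```

Only two of the three toy row groups are scanned here. The real saving depends on how well the data is sorted or clustered on the filtered column.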

Installation

pip install pqfilt
# or
uv add pqfilt

Python API

import pqfilt

# Simple filter
df = pqfilt.read("data.parquet", filters="vmag < 20")

# AND + OR with expression syntax
df = pqfilt.read("data.parquet", filters="(a < 30 & b > 50) | c == 1")

# Tuple syntax (flat AND)
df = pqfilt.read("data.parquet", filters=[("a", "<", 30), ("b", ">", 50)])

# DNF syntax (OR of ANDs)
df = pqfilt.read("data.parquet", filters=[
    [("a", "<", 30)],
    [("b", ">", 50)],
])

# Column selection + output
df = pqfilt.read("data/*.parquet", columns=["a", "b"], output="out.parquet")
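The difference between the flat tuple syntax (all conditions AND-ed) and the DNF syntax (an OR over AND-groups) can be sketched with a small row-level evaluator. This is an illustration of the semantics only, assuming just the operators that appear in the examples above; `normalize` and `matches` are hypothetical helpers, not part of pqfilt.

```python
import operator

# Only the operators used in the examples above; pqfilt may support more.
OPS = {"<": operator.lt, ">": operator.gt, "==": operator.eq}


def normalize(filters):
    """Return the filters as a list of AND-groups (DNF form)."""
    if filters and isinstance(filters[0], tuple):
        return [list(filters)]               # flat list of tuples: one AND-group
    return [list(group) for group in filters]


def matches(row, filters):
    """True if any AND-group has all of its conditions satisfied by the row."""
    return any(
        all(OPS[op](row[col], val) for col, op, val in group)
        for group in normalize(filters)
    )


row = {"a": 10, "b": 10}
flat_and = [("a", "<", 30), ("b", ">", 50)]
dnf_or = [[("a", "<", 30)], [("b", ">", 50)]]

print(matches(row, flat_and))   # a < 30 AND b > 50
print(matches(row, dnf_or))     # a < 30 OR b > 50
```

For this row the flat form rejects it (b is not greater than 50), while the DNF form accepts it because the first group alone is satisfied.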

CLI

# Basic filter
pqfilt data/*.parquet -f "vmag < 20" -o filtered.parquet

# AND + OR expression
pqfilt data/*.parquet -f "(a < 30 & b > 50) | c == 1" -o filtered.parquet

# Multiple -f flags (AND-ed together)
pqfilt data/*.parquet -f "vmag < 20" -f "dec > 30" -o filtered.parquet

# Column selection
pqfilt data/*.parquet -f "vmag < 20" --columns vmag,ra,dec -o filtered.parquet

# Membership filter
pqfilt data/*.parquet -f "desig in 1,2,3" -o filtered.parquet
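As a rough illustration of how repeated -f flags can be collected and AND-ed, the sketch below mirrors the flags shown above with argparse. It is not pqfilt's actual parser: `build_parser` and `combine_filters` are hypothetical, and treating the combined flags as a single `&` expression is an assumption based on the expression syntax shown earlier.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags used in the CLI examples."""
    parser = argparse.ArgumentParser(prog="pqfilt-sketch")
    parser.add_argument("paths", nargs="+")
    # action="append" gathers every -f occurrence into one list.
    parser.add_argument("-f", dest="filters", action="append", default=[])
    parser.add_argument("--columns", type=lambda s: s.split(","))
    parser.add_argument("-o", dest="output")
    return parser


def combine_filters(filters):
    """Join several expressions into one, parenthesised and AND-ed with '&'."""
    return " & ".join(f"({expr})" for expr in filters)


args = build_parser().parse_args(
    ["data/a.parquet", "-f", "vmag < 20", "-f", "dec > 30",
     "--columns", "vmag,ra,dec", "-o", "filtered.parquet"]
)
combined = combine_filters(args.filters)
print(combined)
print(args.columns)
```

Parenthesising each expression before joining keeps an `|` inside one flag from leaking across the AND boundary.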

Column names with special characters

Column names that contain operator characters (such as *) can be wrapped in backticks so they are read as a single name:

pqfilt.read("data.parquet", filters="`alpha*360` > 100")
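The sketch below shows why the backticks matter, using a small regular expression that splits a single "column operator value" clause. It is a simplified stand-in, not pqfilt's parser: `parse_clause` is a hypothetical helper and it only knows the operators <, > and == used in the examples above.

```python
import re

CLAUSE = re.compile(
    r"""^\s*
        (?:`(?P<quoted>[^`]+)`|(?P<bare>[A-Za-z_]\w*))  # column name
        \s*(?P<op>==|<|>)\s*                            # comparison operator
        (?P<value>.+?)\s*$                              # right-hand side, as text
    """,
    re.VERBOSE,
)


def parse_clause(text):
    """Return (column, operator, value) or None if the clause does not parse."""
    m = CLAUSE.match(text)
    if m is None:
        return None
    column = m.group("quoted") or m.group("bare")
    return column, m.group("op"), m.group("value")


print(parse_clause("`alpha*360` > 100"))
print(parse_clause("vmag < 20"))
print(parse_clause("alpha*360 > 100"))
```

Without the backticks, the * ends the bare identifier and the clause no longer looks like "column operator value", so this toy parser rejects it.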

License

MIT

Download files

Source Distribution

pqfilt-0.1.0.tar.gz (9.5 MB view details)

Uploaded Source

Built Distribution

pqfilt-0.1.0-py3-none-any.whl (12.1 kB view details)

Uploaded Python 3

File details

Details for the file pqfilt-0.1.0.tar.gz.

File metadata

  • Download URL: pqfilt-0.1.0.tar.gz
  • Upload date:
  • Size: 9.5 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for pqfilt-0.1.0.tar.gz
Algorithm Hash digest
SHA256 8beb532392bd45495fbacf5260b71d1ff52d4e04544d6267b95913b1fa9378fa
MD5 49545a7f1e4c7ce0a171523cf2a56ab5
BLAKE2b-256 2090f96e620b06581617a7c38bb44ddaf6d10f296a3c4724fd039a7deb7ab40b

Provenance

The following attestation bundles were made for pqfilt-0.1.0.tar.gz:

Publisher: publish.yml on ysBach/pqfilt

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file pqfilt-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: pqfilt-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 12.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for pqfilt-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 77920fe63f6db248971303af9497e4940848e40f76d3416a6503dbbcc275922c
MD5 1ee1cef252776f7cedab9300959eca0f
BLAKE2b-256 c14f0471e24a236dcb1be8f8092da13b7f34a9496ad8b0987001f63e1724a328

Provenance

The following attestation bundles were made for pqfilt-0.1.0-py3-none-any.whl:

Publisher: publish.yml on ysBach/pqfilt

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
