Generic Parquet filtering tool (CLI + API)

pqfilt

Generic Parquet filtering tool (CLI and Python API).

Documentation is available on Read the Docs.

Originally developed to handle large Parquet files from the SPHEREx mission (GitHub).

pqfilt wraps pyarrow.dataset to let you filter Parquet files before they are fully read into memory, using row-group-level filtering.
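To make the idea of row-group-level filtering concrete, here is a minimal conceptual sketch in plain Python. It is not pqfilt's or pyarrow's code: the `RowGroup` class and `read_vmag_below` function are hypothetical stand-ins. It only illustrates how per-row-group minimum/maximum statistics let a reader skip whole groups that cannot contain a match.

```python
from dataclasses import dataclass


@dataclass
class RowGroup:
    """Hypothetical stand-in for one Parquet row group and its statistics."""
    vmag_min: float
    vmag_max: float
    rows: list  # the vmag values stored in this row group


def read_vmag_below(row_groups, limit):
    """Return (matching values, number of row groups actually scanned)."""
    kept, scanned = [], 0
    for rg in row_groups:
        # Statistics check: if even the smallest value fails "vmag < limit",
        # no row in this group can match, so the group is skipped entirely.
        if rg.vmag_min >= limit:
            continue
        scanned += 1
        # Only groups that might contain matches are filtered row by row.
        kept.extend(v for v in rg.rows if v < limit)
    return kept, scanned


groups = [
    RowGroup(15.0, 18.5, [15.0, 17.2, 18.5]),
    RowGroup(19.0, 21.0, [19.0, 20.4, 21.0]),
    RowGroup(22.0, 25.0, [22.0, 23.1, 25.0]),
]
values, scanned = read_vmag_below(groups, 20)
print(values, scanned)
```

In this toy data set the third group is never scanned, which is where the memory and I/O savings come from on real files.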

Installation

pip install pqfilt
# or
uv add pqfilt

Python API

import pqfilt

# Simple filter
df = pqfilt.read("data.parquet", filters="vmag < 20")

# AND + OR with expression syntax
df = pqfilt.read("data.parquet", filters="(a < 30 & b > 50) | c == 1")

# Membership filter (explicit quotes preserve string types, e.g., to prevent Parquet type errors)
df = pqfilt.read("data.parquet", filters="desig in '3200', '356', '134'")

# Tuple syntax (flat AND)
df = pqfilt.read("data.parquet", filters=[("a", "<", 30), ("b", ">", 50)])

# DNF syntax (OR of ANDs)
df = pqfilt.read("data.parquet", filters=[
    [("a", "<", 30)],
    [("b", ">", 50)],
])

# Column selection + output
df = pqfilt.read("data/*.parquet", columns=["a", "b"], output="out.parquet")
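The difference between the tuple syntax (flat AND) and the DNF syntax (OR of ANDs) can be illustrated with a small pure-Python evaluator. This is a sketch of the semantics only, assuming rows are plain dictionaries; the `matches` function and `OPS` table are hypothetical and cover just the operators shown in the examples above, not pqfilt's full operator set.

```python
import operator

# Only the comparison operators that appear in the examples above.
OPS = {"<": operator.lt, ">": operator.gt, "==": operator.eq}


def matches(row, filters):
    """Evaluate a tuple-style or DNF-style filter against one row (a dict)."""
    # A flat list of tuples is treated as a single AND group.
    if filters and isinstance(filters[0], tuple):
        filters = [filters]
    # DNF: the row matches if ANY group has ALL of its conditions satisfied.
    return any(
        all(OPS[op](row[col], val) for col, op, val in group)
        for group in filters
    )


row = {"a": 10, "b": 10}

flat_and = [("a", "<", 30), ("b", ">", 50)]    # a < 30 AND b > 50
dnf_or = [[("a", "<", 30)], [("b", ">", 50)]]  # a < 30 OR b > 50

print(matches(row, flat_and))
print(matches(row, dnf_or))
```

The same row is rejected by the flat AND form and accepted by the DNF form, because only the first condition holds.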

CLI

# Basic filter
pqfilt data/*.parquet -f "vmag < 20" -o filtered.parquet

# AND + OR expression
pqfilt data/*.parquet -f "(a < 30 & b > 50) | c == 1" -o filtered.parquet

# Multiple -f flags (AND-ed together)
pqfilt data/*.parquet -f "vmag < 20" -f "dec > 30" -o filtered.parquet

# Column selection
pqfilt data/*.parquet -f "vmag < 20" --columns vmag,ra,dec -o filtered.parquet

# Membership filter
pqfilt data/*.parquet -f "desig in 1,2,3" -o filtered.parquet
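One way to picture how several -f expressions are AND-ed together is as a single combined expression string. The helper below is hypothetical (`and_all` is not part of pqfilt) and assumes that the expression syntax accepts parenthesized sub-expressions joined with `&`, as the AND + OR example suggests.

```python
def and_all(expressions):
    """Join expression strings with '&', parenthesizing each one.

    Hypothetical helper for illustration; not part of pqfilt.
    """
    if not expressions:
        raise ValueError("at least one filter expression is required")
    # Parentheses keep each original expression intact even if it contains '|'.
    return " & ".join(f"({expr})" for expr in expressions)


combined = and_all(["vmag < 20", "dec > 30"])
print(combined)  # (vmag < 20) & (dec > 30)
```

Under that assumption, the combined string could be passed as a single filter expression in place of the two separate flags.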

Column names with special characters

Columns containing operator characters can be backtick-quoted:

pqfilt.read("data.parquet", filters="`alpha*360` > 100")
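When filter strings are built programmatically, a small helper can apply the backtick quoting only where it is needed. The sketch below is an assumption-laden illustration: `quote_column` is a hypothetical helper, and the rule it uses (quote anything that is not a plain identifier) is a conservative guess rather than pqfilt's documented behavior.

```python
import re

# Names made only of letters, digits, and underscores are left unquoted.
_PLAIN_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def quote_column(name):
    """Backtick-quote a column name unless it is a plain identifier.

    Hypothetical helper for illustration; not part of pqfilt.
    """
    if "`" in name:
        # How (or whether) embedded backticks can be escaped is not known here.
        raise ValueError("column names containing backticks are not handled")
    return name if _PLAIN_NAME.fullmatch(name) else f"`{name}`"


expr = f"{quote_column('alpha*360')} > 100"
print(expr)  # `alpha*360` > 100
```

The resulting string has the same form as the filter shown above.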

License

MIT
