pqfilt

Generic Parquet filtering tool (CLI and Python API).

Documentation is available on Read the Docs.

Originally developed to handle large Parquet files for the SPHEREx mission (GitHub).

pqfilt wraps pyarrow.dataset so that filters are applied at the row-group level while reading, which lets you select rows from Parquet files without first loading them fully into memory.
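
To make the idea of row-group-level filtering concrete, here is a small conceptual sketch in plain Python. It is not pqfilt's or pyarrow's code: the `RowGroup` class and `read_vmag_below` function are hypothetical stand-ins, assuming only that each row group carries min/max statistics for a column, so that groups which cannot contain a match are skipped without being scanned.

```python
from dataclasses import dataclass


@dataclass
class RowGroup:
    """Toy stand-in for a Parquet row group with min/max statistics (hypothetical)."""
    vmag_min: float
    vmag_max: float
    rows: list  # the vmag values stored in this group


def read_vmag_below(row_groups, limit):
    """Return values with vmag < limit and the number of row groups scanned."""
    kept, scanned = [], 0
    for rg in row_groups:
        # If even the smallest value fails the predicate, no row in the
        # group can match, so the group is skipped entirely.
        if rg.vmag_min >= limit:
            continue
        scanned += 1
        kept.extend(v for v in rg.rows if v < limit)
    return kept, scanned


groups = [
    RowGroup(12.0, 18.5, [12.0, 15.2, 18.5]),
    RowGroup(19.0, 23.0, [19.0, 21.4, 23.0]),
    RowGroup(20.5, 25.0, [20.5, 22.2, 25.0]),
]
values, scanned = read_vmag_below(groups, 20)
print(values, scanned)
```

In this toy data the third group is never scanned, because its minimum already exceeds the limit. The second group is scanned but only partly kept.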

Installation

pip install pqfilt
# or
uv add pqfilt

Python API

import pqfilt

# Simple filter
df = pqfilt.read("data.parquet", filters="vmag < 20")

# AND + OR with expression syntax
df = pqfilt.read("data.parquet", filters="(a < 30 & b > 50) | c == 1")

# Membership filter (quote the values to keep them as strings, e.g., to avoid Parquet type errors on string columns)
# Supported array formats: "val1, val2", "(val1, val2)", "[val1, val2]"
df = pqfilt.read("data.parquet", filters="desig in '1', '2', '3'")
df = pqfilt.read("data.parquet", filters="desig in ('1', '2', '3')")
df = pqfilt.read("data.parquet", filters="desig in ['1', '2', '3']")

# Tuple syntax (flat AND)
df = pqfilt.read("data.parquet", filters=[("a", "<", 30), ("b", ">", 50)])

# DNF syntax (OR of ANDs)
df = pqfilt.read("data.parquet", filters=[
    [("a", "<", 30)],
    [("b", ">", 50)],
])

# Column selection + output
df = pqfilt.read("data/*.parquet", columns=["a", "b"], output="out.parquet")
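
The difference between the flat tuple syntax (AND) and the DNF syntax (OR of ANDs) can be illustrated with a small pure-Python evaluator. This is only a sketch of the semantics described above, not pqfilt's implementation: the `matches` helper and the `OPS` table are hypothetical, and the set of operators pqfilt actually accepts is an assumption.

```python
import operator

# Operator table assumed for illustration; pqfilt may support a different set.
OPS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
    "!=": operator.ne,
}


def matches(row, filters):
    """Evaluate tuple-style filters against a single row given as a dict."""
    # A flat list of tuples is treated as a single AND group.
    if filters and isinstance(filters[0], tuple):
        filters = [filters]
    # DNF: the row matches if ANY group has ALL of its conditions satisfied.
    return any(
        all(OPS[op](row[col], val) for col, op, val in group)
        for group in filters
    )


row = {"a": 10, "b": 10}
flat_and = [("a", "<", 30), ("b", ">", 50)]
dnf_or = [[("a", "<", 30)], [("b", ">", 50)]]
print(matches(row, flat_and), matches(row, dnf_or))
```

The same two conditions reject the row when AND-ed and accept it when placed in separate DNF groups.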

CLI

# Basic filter
pqfilt data/*.parquet -f "vmag < 20" -o filtered.parquet

# AND + OR expression
pqfilt data/*.parquet -f "(a < 30 & b > 50) | c == 1" -o filtered.parquet

# Multiple -f flags (AND-ed together)
pqfilt data/*.parquet -f "vmag < 20" -f "dec > 30" -o filtered.parquet

# Column selection
pqfilt data/*.parquet -f "vmag < 20" --columns vmag,ra,dec -o filtered.parquet

# Membership filter (enclosing brackets [] or () are automatically stripped)
pqfilt data/*.parquet -f "desig in [1, 2, 3]" -o filtered.parquet
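
As a rough illustration of how the three membership list formats could be normalized, the sketch below strips one pair of enclosing brackets and keeps quoted values as strings. The `parse_members` function is hypothetical and is not taken from pqfilt. In particular, converting unquoted values with `ast.literal_eval` is an assumption about the behavior, and values that themselves contain commas are not handled.

```python
import ast


def parse_members(text):
    """Turn "1, 2", "(1, 2)" or "[1, 2]" into a list of Python values."""
    text = text.strip()
    # Strip one pair of enclosing brackets, if present.
    if len(text) >= 2 and (text[0], text[-1]) in {("(", ")"), ("[", "]")}:
        text = text[1:-1]
    values = []
    for item in text.split(","):
        item = item.strip()
        try:
            # Quoted items become str; bare numbers become int or float.
            values.append(ast.literal_eval(item))
        except (ValueError, SyntaxError):
            # Anything that is not a Python literal is kept as a raw string.
            values.append(item)
    return values


print(parse_members("[1, 2, 3]"))
print(parse_members("('1', '2', '3')"))
```

Under these assumptions, quoting is what separates the string `'1'` from the integer `1`, which matters when the column being filtered has a string type.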

Column names with special characters

Columns containing operator characters can be backtick-quoted:

pqfilt.read("data.parquet", filters="`alpha*360` > 100")
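
Why backticks help can be seen with a toy tokenizer. The sketch below is a hypothetical illustration, not pqfilt's parser: the `TOKEN` pattern and `tokenize` function are made up for this example. Everything between backticks is read as a single column name, so operator characters inside it are not split off.

```python
import re

# Hypothetical token pattern for illustration only.
TOKEN = re.compile(
    r"""
    `(?P<quoted>[^`]+)`            # column name wrapped in backticks
  | (?P<op><=|>=|==|!=|<|>)        # comparison operators
  | (?P<word>[^\s`<>=!&|()]+)      # bare column name or literal
    """,
    re.VERBOSE,
)


def tokenize(expr):
    """Split a filter expression into (kind, text) pairs."""
    tokens = []
    for m in TOKEN.finditer(expr):
        kind = m.lastgroup  # name of the alternative that matched
        tokens.append((kind, m.group(kind)))
    return tokens


print(tokenize("`a<b` == 1"))
print(tokenize("a<b == 1"))
```

With backticks, `a<b` stays one column name. Without them, the same characters are read as a comparison between `a` and `b`.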

License

MIT
