
pqfilt

Generic Parquet filtering tool (CLI and Python API).

Documentation is available on Read the Docs.

Originally developed while handling large Parquet files in the SPHEREx mission (GitHub).

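pqfilt wraps pyarrow.dataset so that filters are applied while the Parquet files are being scanned, instead of after they have been loaded into memory in full. Filtering starts at the row-group level: row groups that cannot contain matching rows can be skipped entirely.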
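Row-group-level filtering works because a Parquet file is split into row groups, and each row group usually stores per-column statistics such as the minimum and maximum value. pyarrow uses these statistics to skip row groups that cannot contain a matching row. The pure-Python sketch below only illustrates that pruning decision for a single `column <op> value` predicate; `RowGroupStats`, `may_match`, and `row_groups_to_read` are hypothetical names, not part of pqfilt or pyarrow.

```python
from dataclasses import dataclass


@dataclass
class RowGroupStats:
    """Hypothetical min/max statistics for one column within one row group."""
    minimum: float
    maximum: float


def may_match(stats: RowGroupStats, op: str, value: float) -> bool:
    """Return False only when no row in the group can satisfy `column <op> value`."""
    if op == "<":
        return stats.minimum < value
    if op == "<=":
        return stats.minimum <= value
    if op == ">":
        return stats.maximum > value
    if op == ">=":
        return stats.maximum >= value
    if op == "==":
        return stats.minimum <= value <= stats.maximum
    if op == "!=":
        # Only a group whose values all equal `value` can be skipped.
        return not (stats.minimum == stats.maximum == value)
    # Unknown operator: keep the group, since pruning must never drop a real match.
    return True


def row_groups_to_read(all_stats: list[RowGroupStats], op: str, value: float) -> list[int]:
    """Indices of the row groups that still have to be read and checked row by row."""
    return [i for i, stats in enumerate(all_stats) if may_match(stats, op, value)]


# Hypothetical statistics of a `vmag` column stored in three row groups.
vmag_stats = [RowGroupStats(5, 15), RowGroupStats(18, 25), RowGroupStats(21, 30)]
print(row_groups_to_read(vmag_stats, "<", 20))  # [0, 1]
```

For `vmag < 20`, row group 2 is never decoded because its smallest `vmag` (21) already fails the test; the rows inside groups 0 and 1 still need an exact, row-by-row check.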

Installation

pip install pqfilt
# or
uv add pqfilt

Python API

import pqfilt

# Simple filter
df = pqfilt.read("data.parquet", filters="vmag < 20")

# AND + OR with expression syntax
df = pqfilt.read("data.parquet", filters="(a < 30 & b > 50) | c == 1")

# Tuple syntax (flat AND)
df = pqfilt.read("data.parquet", filters=[("a", "<", 30), ("b", ">", 50)])

# DNF syntax (OR of ANDs)
df = pqfilt.read("data.parquet", filters=[
    [("a", "<", 30)],
    [("b", ">", 50)],
])

# Column selection + output
df = pqfilt.read("data/*.parquet", columns=["a", "b"], output="out.parquet")
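The tuple and DNF forms appear to mirror the convention of the `filters` argument of `pyarrow.parquet.read_table`: a flat list of `(column, op, value)` tuples is AND-ed, and a list of such lists is an OR of AND clauses. The sketch below evaluates both forms against plain Python dicts to make those semantics concrete. It is an illustration only: `OPS`, `normalize`, and `matches` are hypothetical helpers, and pqfilt itself delegates the actual filtering to pyarrow.dataset rather than looping over rows in Python.

```python
import operator
from typing import Any, Callable

# Operators accepted in (column, op, value) tuples in this sketch.
OPS: dict[str, Callable[[Any, Any], bool]] = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
    "!=": operator.ne,
    "in": lambda left, right: left in right,
    "not in": lambda left, right: left not in right,
}


def normalize(filters: list) -> list[list[tuple]]:
    """Turn a flat AND list into a single-clause DNF; copy a DNF list clause by clause."""
    if filters and isinstance(filters[0], tuple):
        return [list(filters)]
    return [list(clause) for clause in filters]


def matches(row: dict[str, Any], filters: list) -> bool:
    """True if the row satisfies every tuple of at least one clause."""
    if not filters:
        return True  # no filter keeps every row
    return any(
        all(OPS[op](row[column], value) for column, op, value in clause)
        for clause in normalize(filters)
    )


rows = [{"a": 10, "b": 60}, {"a": 40, "b": 60}, {"a": 40, "b": 10}]
flat_and = [("a", "<", 30), ("b", ">", 50)]
dnf_or = [[("a", "<", 30)], [("b", ">", 50)]]
print([matches(r, flat_and) for r in rows])  # [True, False, False]
print([matches(r, dnf_or) for r in rows])    # [True, True, False]
```

The second row shows the difference: it fails the flat AND form (`a` is 40) but passes the DNF form through its second clause (`b > 50`).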

CLI

# Basic filter
pqfilt data/*.parquet -f "vmag < 20" -o filtered.parquet

# AND + OR expression
pqfilt data/*.parquet -f "(a < 30 & b > 50) | c == 1" -o filtered.parquet

# Multiple -f flags (AND-ed together)
pqfilt data/*.parquet -f "vmag < 20" -f "dec > 30" -o filtered.parquet

# Column selection
pqfilt data/*.parquet -f "vmag < 20" --columns vmag,ra,dec -o filtered.parquet

# Membership filter
pqfilt data/*.parquet -f "desig in 1,2,3" -o filtered.parquet
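Accepting a repeatable `-f` flag and AND-ing the expressions is a common CLI pattern. One way to get this behavior from the standard library is `argparse` with `action="append"`, as in the hypothetical sketch below. It only mimics the flags used above (`-f`, `--columns`, `-o`); it is not pqfilt's actual CLI code, and `build_parser` and `combined_expression` are illustrative names.

```python
import argparse
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    """A hypothetical parser mimicking the flags used in the examples above."""
    parser = argparse.ArgumentParser(prog="pqfilt-sketch")
    parser.add_argument("inputs", nargs="+", help="input Parquet files")
    # action="append" collects every -f occurrence into a list (None if absent).
    parser.add_argument("-f", dest="filters", action="append", help="filter expression")
    # Split "vmag,ra,dec" into ["vmag", "ra", "dec"].
    parser.add_argument("--columns", type=lambda s: s.split(","), default=None)
    parser.add_argument("-o", dest="output", help="output Parquet file")
    return parser


def combined_expression(filters: Optional[list[str]]) -> Optional[str]:
    """AND the individual -f expressions, parenthesizing each to keep its own grouping."""
    if not filters:
        return None
    return " & ".join(f"({expr})" for expr in filters)


args = build_parser().parse_args(
    ["a.parquet", "b.parquet", "-f", "vmag < 20", "-f", "dec > 30",
     "--columns", "vmag,ra,dec", "-o", "filtered.parquet"]
)
print(combined_expression(args.filters))  # (vmag < 20) & (dec > 30)
print(args.columns)  # ['vmag', 'ra', 'dec']
```

Wrapping each expression in parentheses matters when one `-f` value contains `|`: without it, joining `a < 1 | b < 2` with another filter by `&` could regroup the OR.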

Column names with special characters

Column names that contain operator characters can be quoted with backticks:

pqfilt.read("data.parquet", filters="`alpha*360` > 100")
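The expression strings used on this page (`&`, `|`, parentheses, comparisons, `in` with a comma-separated list, and backtick-quoted names) can be converted into the same OR-of-ANDs list form used by the DNF tuple syntax. The recursive-descent sketch below shows one way to do that. Its grammar is an assumption inferred from the examples here, not pqfilt's documented grammar: `&` binds tighter than `|`, bare values become `int` or `float` when possible and strings otherwise. `tokenize`, `parse_value`, and `Parser` are hypothetical names, not pqfilt's API.

```python
import re
from itertools import product

# One token per match: a backtick-quoted name, an operator/punctuation symbol,
# or a bare word (column name, number, or the keyword `in`). Assumed grammar.
TOKEN_RE = re.compile(r"\s*(?:(`[^`]*`)|(<=|>=|==|!=|<|>|&|\||\(|\)|,)|([^\s<>=!&|(),`]+))")
COMPARISONS = {"<", "<=", ">", ">=", "==", "!="}


def tokenize(text: str) -> list[tuple[str, str]]:
    """Split an expression into (kind, text) tokens: 'name', 'sym', or 'word'."""
    tokens, pos, text = [], 0, text.strip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"cannot tokenize {text[pos:]!r}")
        quoted, symbol, word = match.groups()
        if quoted is not None:
            tokens.append(("name", quoted[1:-1]))  # strip the backticks
        else:
            tokens.append(("sym", symbol) if symbol is not None else ("word", word))
        pos = match.end()
    return tokens


def parse_value(token: tuple[str, str]):
    """Convert a bare word to int, then float, otherwise keep it as a string."""
    kind, text = token
    if kind != "word":
        raise ValueError(f"expected a value, got {text!r}")
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


class Parser:
    def __init__(self, text: str):
        self.tokens, self.i = tokenize(text), 0

    def peek(self) -> tuple:
        return self.tokens[self.i] if self.i < len(self.tokens) else (None, None)

    def take(self) -> tuple:
        if self.i >= len(self.tokens):
            raise ValueError("unexpected end of expression")
        self.i += 1
        return self.tokens[self.i - 1]

    def parse(self) -> list:
        dnf = self.expr()
        if self.i != len(self.tokens):
            raise ValueError(f"unexpected token {self.peek()[1]!r}")
        return dnf

    def expr(self) -> list:  # expr := term ('|' term)*
        dnf = self.term()
        while self.peek() == ("sym", "|"):
            self.take()
            dnf = dnf + self.term()  # OR: concatenate the clause lists
        return dnf

    def term(self) -> list:  # term := factor ('&' factor)*
        dnf = self.factor()
        while self.peek() == ("sym", "&"):
            self.take()
            right = self.factor()
            # AND: distribute, pairing every left clause with every right clause
            dnf = [left + other for left, other in product(dnf, right)]
        return dnf

    def factor(self) -> list:
        if self.peek() == ("sym", "("):
            self.take()
            dnf = self.expr()
            if self.take() != ("sym", ")"):
                raise ValueError("expected ')'")
            return dnf
        kind, column = self.take()
        if kind not in ("name", "word"):
            raise ValueError(f"expected a column name, got {column!r}")
        op_kind, op = self.take()
        if (op_kind, op) == ("word", "in"):  # membership: column in v1,v2,...
            values = [parse_value(self.take())]
            while self.peek() == ("sym", ","):
                self.take()
                values.append(parse_value(self.take()))
            return [[(column, "in", values)]]
        if op_kind != "sym" or op not in COMPARISONS:
            raise ValueError(f"expected a comparison operator, got {op!r}")
        return [[(column, op, parse_value(self.take()))]]


print(Parser("(a < 30 & b > 50) | c == 1").parse())
# [[('a', '<', 30), ('b', '>', 50)], [('c', '==', 1)]]
print(Parser("`alpha*360` > 100").parse())  # [[('alpha*360', '>', 100)]]
```

Distributing `&` over `|` keeps the output in the flat OR-of-ANDs shape, but the number of clauses grows multiplicatively: `(a < 1 | b < 2) & (c < 3 | d < 4)` already expands to four clauses.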

License

MIT

Download files

Source Distribution

pqfilt-0.1.2.tar.gz (9.5 MB)

Uploaded Source

Built Distribution

pqfilt-0.1.2-py3-none-any.whl (12.2 kB)

Uploaded Python 3

File details

Details for the file pqfilt-0.1.2.tar.gz.

File metadata

  • Download URL: pqfilt-0.1.2.tar.gz
  • Upload date:
  • Size: 9.5 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for pqfilt-0.1.2.tar.gz
Algorithm Hash digest
SHA256 27a4631c3cc0eeb49a464cf72e6358be120bd844b195b7dcfa45b7d684623f6e
MD5 24dc6902b2b141b15a8cae0189025bf1
BLAKE2b-256 b41565b17cd5f8f5c466b3d61a24e379a7b02613ebb136fd25dd7b192b4db74a

Provenance

The following attestation bundles were made for pqfilt-0.1.2.tar.gz:

Publisher: publish.yml on ysBach/pqfilt

File details

Details for the file pqfilt-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: pqfilt-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 12.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for pqfilt-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 11e0d1db2bfb760600bfb0d1a3f52466724acd0e4749f7f4dda865cd3477ecc4
MD5 92e2a7f1ac1c738d3d0cb277428411b9
BLAKE2b-256 6cff905628cf64429414871d1a43f591e0d2c459620bc59b2b016e1e37de2768

Provenance

The following attestation bundles were made for pqfilt-0.1.2-py3-none-any.whl:

Publisher: publish.yml on ysBach/pqfilt
