Generic Parquet filtering tool (CLI + API)
pqfilt
Generic Parquet filtering tool (CLI and Python API).
Originally developed for working with large Parquet files in the SPHEREx mission (GitHub).
pqfilt wraps pyarrow.dataset to let you filter Parquet files before they
are fully read into memory, using row-group-level filtering.
Installation
pip install pqfilt
# or
uv add pqfilt
Python API
import pqfilt
# Simple filter
df = pqfilt.read("data.parquet", filters="vmag < 20")
# AND + OR with expression syntax
df = pqfilt.read("data.parquet", filters="(a < 30 & b > 50) | c == 1")
# Membership filter (quoted values stay strings, which avoids type errors on string columns)
df = pqfilt.read("data.parquet", filters="desig in '3200', '356', '134'")
# Tuple syntax (flat AND)
df = pqfilt.read("data.parquet", filters=[("a", "<", 30), ("b", ">", 50)])
# DNF syntax (OR of ANDs)
df = pqfilt.read("data.parquet", filters=[
    [("a", "<", 30)],
    [("b", ">", 50)],
])
# Column selection + output
df = pqfilt.read("data/*.parquet", columns=["a", "b"], output="out.parquet")
CLI
# Basic filter
pqfilt data/*.parquet -f "vmag < 20" -o filtered.parquet
# AND + OR expression
pqfilt data/*.parquet -f "(a < 30 & b > 50) | c == 1" -o filtered.parquet
# Multiple -f flags (AND-ed together)
pqfilt data/*.parquet -f "vmag < 20" -f "dec > 30" -o filtered.parquet
# Column selection
pqfilt data/*.parquet -f "vmag < 20" --columns vmag,ra,dec -o filtered.parquet
# Membership filter
pqfilt data/*.parquet -f "desig in 1,2,3" -o filtered.parquet
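One way to get the "multiple `-f` flags are AND-ed together" behavior is to collect the flags with argparse's `action="append"`. Each expression is then wrapped in parentheses and the results are joined with `&`. The parser below is a hypothetical sketch that mirrors the flags shown above. It is not pqfilt's actual command-line implementation, and the helper name `combined_filter` is made up.

```python
import argparse

# Hypothetical parser mirroring the flags shown above; not pqfilt's real CLI code.
parser = argparse.ArgumentParser(prog="pqfilt")
parser.add_argument("paths", nargs="+")  # the shell expands data/*.parquet into many paths
parser.add_argument("-f", dest="filters", action="append", default=[])
parser.add_argument("--columns", type=lambda s: s.split(","))
parser.add_argument("-o", dest="output")


def combined_filter(filters):
    """AND together every -f expression, parenthesizing each one first."""
    return " & ".join(f"({f})" for f in filters)


args = parser.parse_args(
    ["data/a.parquet", "data/b.parquet", "-f", "vmag < 20", "-f", "dec > 30",
     "--columns", "vmag,ra,dec", "-o", "filtered.parquet"]
)
print(combined_filter(args.filters))  # (vmag < 20) & (dec > 30)
print(args.columns)  # ['vmag', 'ra', 'dec']
```

Parenthesizing each flag matters when a single `-f` already contains `|`. With the usual precedence, where `&` binds tighter than `|`, `-f "a < 1 | b > 2" -f "c == 3"` should mean `(a < 1 | b > 2) & (c == 3)`, not `a < 1 | (b > 2 & c == 3)`.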
Column names with special characters
Columns containing operator characters can be backtick-quoted:
pqfilt.read("data.parquet", filters="`alpha*360` > 100")
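Backtick quoting can work because a tokenizer matches a backtick-delimited name before it looks for operators. Characters such as `*` inside the backticks therefore never reach the operator rules. The regex tokenizer below is a hypothetical sketch of that idea, not pqfilt's parser. Its token names and the set of operators it accepts are assumptions.

```python
import re

# Hypothetical tokenizer, not pqfilt's parser. The backtick alternative is
# tried first, so everything between backticks becomes one column name.
TOKEN_RE = re.compile(
    r"""
    \s*(?:
        `(?P<QNAME>[^`]+)`                      # backtick-quoted column name
      | (?P<OP><=|>=|==|!=|<|>|&|\||\(|\)|,)   # comparison/boolean operators
      | (?P<NUMBER>-?\d+(?:\.\d+)?)             # integer or decimal literal
      | '(?P<STRING>[^']*)'                     # single-quoted string literal
      | (?P<NAME>[A-Za-z_][A-Za-z0-9_]*)        # plain column name or keyword
    )
    """,
    re.VERBOSE,
)


def tokenize(text):
    """Split a filter expression into (kind, value) tuples."""
    tokens = []
    text = text.strip()
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at {pos}: {text[pos]!r}")
        kind = m.lastgroup
        # Quoted and plain names are reported the same way downstream.
        tokens.append(("NAME" if kind == "QNAME" else kind, m.group(kind)))
        pos = m.end()
    return tokens


print(tokenize("`alpha*360` > 100"))
# [('NAME', 'alpha*360'), ('OP', '>'), ('NUMBER', '100')]
```

In this sketch the unquoted form `alpha*360 > 100` raises `ValueError`, because `*` is not a valid name character outside backticks.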
License
MIT
Download files
- Source distribution: pqfilt-0.1.4.tar.gz
- Built distribution: pqfilt-0.1.4-py3-none-any.whl
File details
Details for the file pqfilt-0.1.4.tar.gz.
File metadata
- Download URL: pqfilt-0.1.4.tar.gz
- Upload date:
- Size: 76.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 44ff1093a659395c04f5bc697ecd13c751f01b4203d6208f2ae509fa78ed5e05 |
| MD5 | 38c56b2b71a7e5057fc05ffde34da5ab |
| BLAKE2b-256 | e311aa78785fc100e282dc8146efb262d13ec16f8e1975c01ea59dd7f06f8540 |
Provenance
The following attestation bundles were made for pqfilt-0.1.4.tar.gz:
Publisher: publish.yml on ysBach/pqfilt
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pqfilt-0.1.4.tar.gz
- Subject digest: 44ff1093a659395c04f5bc697ecd13c751f01b4203d6208f2ae509fa78ed5e05
- Sigstore transparency entry: 984683177
- Sigstore integration time:
- Permalink: ysBach/pqfilt@d9a0089839d9141697c6ef4c807544194b3de8af
- Branch / Tag: refs/tags/v0.1.4
- Owner: https://github.com/ysBach
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@d9a0089839d9141697c6ef4c807544194b3de8af
- Trigger Event: release
File details
Details for the file pqfilt-0.1.4-py3-none-any.whl.
File metadata
- Download URL: pqfilt-0.1.4-py3-none-any.whl
- Upload date:
- Size: 12.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 51667483b426d301d18c99f396103e2ef5a1e9ab534d833e23e44d30e5716b9d |
| MD5 | 3ac81adc0654acb5e56f730b01e61204 |
| BLAKE2b-256 | 2d5d47e8152405131572b83c635b2c21b4f5d76bb9ff0ed09990eb7bbd61fb0f |
Provenance
The following attestation bundles were made for pqfilt-0.1.4-py3-none-any.whl:
Publisher: publish.yml on ysBach/pqfilt
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pqfilt-0.1.4-py3-none-any.whl
- Subject digest: 51667483b426d301d18c99f396103e2ef5a1e9ab534d833e23e44d30e5716b9d
- Sigstore transparency entry: 984683179
- Sigstore integration time:
- Permalink: ysBach/pqfilt@d9a0089839d9141697c6ef4c807544194b3de8af
- Branch / Tag: refs/tags/v0.1.4
- Owner: https://github.com/ysBach
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@d9a0089839d9141697c6ef4c807544194b3de8af
- Trigger Event: release