Programmatic, injection-safe builder for DataFusion SQL.
datafusion-query-builder
A programmatic, injection-safe builder for DataFusion SQL — a typed Rust core with a pyo3-exposed Python API. It replaces hand-rolled f-string / template-literal SQL with a typed, composable surface, while still emitting SQL text (so the SQL stays visible in logs/traces, greppable, and cache-keyable).
Values are escaped by construction, so untrusted input (column values, user filters) is safe to
embed. Strings, arrays, and numbers are quoted/encoded for you; an explicit raw(...) escape hatch
is the only unescaped path.
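The sketch below shows why escaping matters. It contrasts naive f-string interpolation with standard SQL quoting, in which a single quote inside a string literal is written as two. This is a minimal, hypothetical model: quote_str is not the package's API, and the sketch covers only plain string literals.

```python
def quote_str(value: str) -> str:
    """Render a Python str as a SQL string literal (hypothetical helper).

    Standard SQL escapes a single quote inside a literal by doubling it,
    so the value can never terminate the literal early.
    """
    return "'" + value.replace("'", "''") + "'"


user_input = "prod' OR '1'='1"

# Naive interpolation: the quote in user_input closes the literal and the
# remainder of the input is parsed as SQL.
unsafe = f"SELECT * FROM records WHERE env = '{user_input}'"

# Escaped: the whole input stays inside a single string literal.
safe = f"SELECT * FROM records WHERE env = {quote_str(user_input)}"

print(unsafe)  # SELECT * FROM records WHERE env = 'prod' OR '1'='1'
print(safe)    # SELECT * FROM records WHERE env = 'prod'' OR ''1''=''1'
```

The unsafe version turns the filter into a tautology. The escaped version compares env against one odd-looking string.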
Install
pip install datafusion-query-builder
Prebuilt abi3 wheels are published for CPython 3.9+ on macOS (Apple Silicon + Intel) and Linux
(x86_64 + aarch64).
Quick start (Python)
```python
from datafusion_query_builder import col, lit, param, raw, when, and_, table
from datafusion_query_builder import functions as f

q = (
    table("records")
    .filter((col("kind") == "span") & col("deployment_environment").is_in(["prod", "staging"]))
    .select(
        f.coalesce(col("service_name"), "(unknown)").alias("service"),
        f.approx_distinct(col("trace_id")).alias("request_count"),
        f.approx_percentile_cont(col("duration"), 0.95).alias("p95"),
    )
    .group_by(col("service_name"))
    .order_by(col("request_count").desc())
    .limit(200)
)
print(q.to_sql())
```
Bare Python scalars auto-promote to literals (so col("x") == "prod" works), and string/array
literals are escaped — values are injection-safe by construction. Reach for raw(...) (a SQL
fragment), f.call("name", ...) (any function), or param("name") (a ${name} placeholder) when
you step outside the v1 grammar.
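Scalar auto-promotion has one subtlety. Python's bool is a subclass of int, so a coercion routine must test for bool before int, or True would render as 1. The sketch below is a hypothetical, pure-Python model of such a routine; to_literal is not part of the package. Three behaviors in it are assumptions made for illustration, not documented behavior of this library:

- NaN and infinity are rejected.
- None maps to NULL.
- Lists and tuples render as a parenthesised list.

```python
import math


def quote_str(value: str) -> str:
    # Standard SQL: a single quote inside a literal is written as two.
    return "'" + value.replace("'", "''") + "'"


def to_literal(value: object) -> str:
    """Hypothetical scalar -> SQL literal coercion (illustration only)."""
    if value is None:
        return "NULL"  # assumption: None maps to NULL
    # bool must be checked before int, because isinstance(True, int) is True.
    if isinstance(value, bool):
        return "TRUE" if value else "FALSE"
    if isinstance(value, int):
        return str(value)  # Python ints are arbitrary precision
    if isinstance(value, float):
        # Assumption: non-finite floats are rejected rather than rendered.
        if not math.isfinite(value):
            raise ValueError(f"cannot render non-finite float {value!r}")
        # repr() gives the shortest string that round-trips to the same float.
        return repr(value)
    if isinstance(value, str):
        return quote_str(value)
    if isinstance(value, (list, tuple)):
        # Assumption: rendered as a parenthesised list, as used by IN (...).
        return "(" + ", ".join(to_literal(v) for v in value) + ")"
    raise TypeError(f"unsupported literal type: {type(value).__name__}")


print(to_literal(True))              # TRUE
print(to_literal(["prod", "it's"]))  # ('prod', 'it''s')
```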
Architecture
façade types ──lower.rs──▶ sqlparser::ast ──Display──▶ SQL text
(façade types live in expr.rs / query.rs / functions.rs; lower.rs is the only file that names sqlparser::ast)
- expr.rs, query.rs, functions.rs: span-free, Default-friendly façade enums. Immutable / generative: every method returns a new value.
- lower.rs: the single boundary to sqlparser::ast. A sqlparser version bump surfaces here and nowhere else.
- render.rs: to_sql() plus validate() (renders, then re-parses to prove well-formedness).
- python.rs: Expr/Query wrappers, the f.* functions namespace, and operator overloading with scalar→literal coercion. Gated behind the python feature.
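Under the "immutable / generative" contract, a builder method never mutates its receiver. Instead it returns a new value, so a partially built query can be shared and extended safely. The sketch below is a minimal Python model of that pattern, built on frozen dataclasses and dataclasses.replace. The Query class here is a hypothetical stand-in, not the package's Query. Predicates are plain strings only to keep the sketch short; the real façade builds typed expressions and lowers them through sqlparser::ast.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Query:
    """Hypothetical immutable query façade (illustration only)."""

    table: str
    filters: Tuple[str, ...] = ()
    limit_: Optional[int] = None

    def filter(self, predicate: str) -> "Query":
        # Return a copy with one more predicate; self is left untouched.
        return replace(self, filters=self.filters + (predicate,))

    def limit(self, n: int) -> "Query":
        return replace(self, limit_=n)

    def to_sql(self) -> str:
        sql = f"SELECT * FROM {self.table}"
        if self.filters:
            sql += " WHERE " + " AND ".join(self.filters)
        if self.limit_ is not None:
            sql += f" LIMIT {self.limit_}"
        return sql


base = Query("records").filter("kind = 'span'")
recent = base.limit(200)
print(base.to_sql())    # SELECT * FROM records WHERE kind = 'span'
print(recent.to_sql())  # SELECT * FROM records WHERE kind = 'span' LIMIT 200
```

Because base is never modified, several queries can branch from it without interfering with each other.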
sqlparser is pinned to the pydantic dollar-brace-0.62.0 fork via [patch.crates-io] — the same
parser the DataFusion ecosystem uses, including the ${var} placeholder extension. The crate does
not depend on DataFusion itself (only the test oracle does, optionally).
Develop & test
```shell
# Rust core (no Python toolchain needed):
cargo test --test core                                     # rendering snapshots + regressions
cargo test --features datafusion-oracle --test properties  # property tests, see below
cargo clippy --all-targets --features datafusion-oracle -- -D warnings

# Coercion tests against a real embedded interpreter (needs PYO3_PYTHON -> a 3.9+ interpreter):
PYO3_PYTHON=$PWD/.venv/bin/python cargo test --lib --features test-embed

# Python extension:
uv venv && uvx maturin develop
python tests/test_python.py
```
How the tests are layered
The crate is correctness-critical (it generates SQL from user-controlled input), so the test surface is layered:
- tests/properties.rs (proptest, behind datafusion-oracle) uses DataFusion itself as the oracle. It renders a query to SQL, plans and executes it through real DataFusion, and reads the value back. This upgrades "the SQL re-parses" to "the value survives a real engine round-trip", catching silent mis-encoding (escaping, float formatting, operator precedence). DataFusion is an optional, test-only dependency and is never compiled into the library or the wheel.
- tests/properties.proptest-regressions pins seeds that previously found bugs.
- src/python.rs coercion_tests (behind test-embed) unit-test the Python-type → façade-literal boundary, which can only be exercised with real Python objects (bool-before-int, big ints, non-finite floats, list/tuple). They compose with the property tests: coercion proves "Python value → correct value", and the property tests prove "value → correct SQL → correct result".
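The oracle idea can be sketched without DataFusion: render a value, execute it in a real engine, and compare what comes back. The example below swaps in two stand-ins:

- Python's built-in sqlite3 module replaces DataFusion as the engine.
- A seeded random generator replaces proptest.

It only shows the shape of such a round-trip check. quote_str is a hypothetical escaping helper, not the crate's encoder.

```python
import random
import sqlite3


def quote_str(value: str) -> str:
    # Hypothetical encoder under test: standard SQL quote doubling.
    return "'" + value.replace("'", "''") + "'"


# Characters likely to break naive encoders. NUL is excluded because
# Python's sqlite3 module rejects SQL text that contains it.
ALPHABET = "abc XYZ'\"\\;-%_\n\té€😀"


def round_trips(conn: sqlite3.Connection, value: str) -> bool:
    """Render value as a literal, execute it, and compare what comes back."""
    (got,) = conn.execute(f"SELECT {quote_str(value)}").fetchone()
    return got == value


def check(cases: int = 500, seed: int = 0) -> None:
    rng = random.Random(seed)  # fixed seed keeps failures reproducible
    conn = sqlite3.connect(":memory:")
    try:
        for _ in range(cases):
            length = rng.randint(0, 12)
            value = "".join(rng.choice(ALPHABET) for _ in range(length))
            assert round_trips(conn, value), f"mis-encoded: {value!r}"
    finally:
        conn.close()


check()
print("all cases survived the engine round-trip")
```

A real property-testing framework adds shrinking and a persisted regression file on top of this loop. The core check is the same: the engine, not a string comparison, decides whether the encoding was correct.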
When a property/coercion test finds a bug, fix it in the crate — that is the whole point of the library: one fix covers every caller.
Release
Wheels and the source distribution are built and published to PyPI by CI (.github/workflows/ci.yml) on a v* tag, using
PyPI Trusted Publishing (OIDC), so no API tokens are involved. To cut a release: bump the version in
Cargo.toml and pyproject.toml, tag vX.Y.Z, and push the tag.
License
MIT
Download files
File details
Details for the file datafusion_query_builder-0.1.2.tar.gz.
File metadata
- Download URL: datafusion_query_builder-0.1.2.tar.gz
- Upload date:
- Size: 57.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5bf5bb2a48d97edb2b4899b29a612e4e8ec8713b67e641fd316f467b3295b5c7 |
| MD5 | 83f96f4c017f2deb8d33a8908199b390 |
| BLAKE2b-256 | 5b0651b3230bde7ad2cfeed02a8d481d28532a1fbde58710aa19c6164de9aafb |
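You can check a downloaded file against its published SHA256 digest by hashing it in chunks and comparing the result. The sketch below uses only hashlib. The file path in the usage comment is a placeholder for wherever the file was saved, and EXPECTED is the sdist digest listed above.

```python
import hashlib
import io
from typing import BinaryIO


def sha256_of_stream(fh: BinaryIO, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a binary stream, read in chunks."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: fh.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def sha256_of(path: str) -> str:
    with open(path, "rb") as fh:
        return sha256_of_stream(fh)


EXPECTED = "5bf5bb2a48d97edb2b4899b29a612e4e8ec8713b67e641fd316f467b3295b5c7"

# Sanity check on in-memory data (the standard "abc" test vector):
print(sha256_of_stream(io.BytesIO(b"abc")))
# ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad

# Usage (placeholder path):
# assert sha256_of("datafusion_query_builder-0.1.2.tar.gz") == EXPECTED
```

pip's hash-checking mode does the same comparison at install time. To use it, add --hash=sha256:... entries to a requirements file and install with --require-hashes.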
Provenance
The following attestation bundles were made for datafusion_query_builder-0.1.2.tar.gz:
Publisher: ci.yml on pydantic/datafusion-query-builder
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: datafusion_query_builder-0.1.2.tar.gz
- Subject digest: 5bf5bb2a48d97edb2b4899b29a612e4e8ec8713b67e641fd316f467b3295b5c7
- Sigstore transparency entry: 1994466318
- Sigstore integration time:
- Permalink: pydantic/datafusion-query-builder@f8d4f53242f4d4d013160053cc6cab2fcf73cd9a
- Branch / Tag: refs/tags/v0.1.2
- Owner: https://github.com/pydantic
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: ci.yml@f8d4f53242f4d4d013160053cc6cab2fcf73cd9a
- Trigger Event: push
File details
Details for the file datafusion_query_builder-0.1.2-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: datafusion_query_builder-0.1.2-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Upload date:
- Size: 3.0 MB
- Tags: CPython 3.9+, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 104b1bd831e9a983463b9c089150397bb3f707cab4368acbf19e823623ac6aa1 |
| MD5 | cb64c1a92a7174f26b684840e2170e42 |
| BLAKE2b-256 | 88acb2a41d46feb016724df9390a6e6ee83859647412203ea5bfce40636c3b0b |
Provenance
The following attestation bundles were made for datafusion_query_builder-0.1.2-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:
Publisher: ci.yml on pydantic/datafusion-query-builder
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: datafusion_query_builder-0.1.2-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Subject digest: 104b1bd831e9a983463b9c089150397bb3f707cab4368acbf19e823623ac6aa1
- Sigstore transparency entry: 1994466399
- Sigstore integration time:
- Permalink: pydantic/datafusion-query-builder@f8d4f53242f4d4d013160053cc6cab2fcf73cd9a
- Branch / Tag: refs/tags/v0.1.2
- Owner: https://github.com/pydantic
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: ci.yml@f8d4f53242f4d4d013160053cc6cab2fcf73cd9a
- Trigger Event: push
File details
Details for the file datafusion_query_builder-0.1.2-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.
File metadata
- Download URL: datafusion_query_builder-0.1.2-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
- Upload date:
- Size: 2.8 MB
- Tags: CPython 3.9+, manylinux: glibc 2.17+ ARM64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 23da0bdda25734d2e254b369fb7a7226102b47182b5e2060ce92b9983cc3d400 |
| MD5 | b0cd04a2f8e1a202946e21e26f9ed893 |
| BLAKE2b-256 | 9c22d2e3df5a2df239f40eecb3c7c866cdfbb644fec4afde3555cc5fba70549f |
Provenance
The following attestation bundles were made for datafusion_query_builder-0.1.2-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl:
Publisher: ci.yml on pydantic/datafusion-query-builder
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: datafusion_query_builder-0.1.2-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
- Subject digest: 23da0bdda25734d2e254b369fb7a7226102b47182b5e2060ce92b9983cc3d400
- Sigstore transparency entry: 1994466589
- Sigstore integration time:
- Permalink: pydantic/datafusion-query-builder@f8d4f53242f4d4d013160053cc6cab2fcf73cd9a
- Branch / Tag: refs/tags/v0.1.2
- Owner: https://github.com/pydantic
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: ci.yml@f8d4f53242f4d4d013160053cc6cab2fcf73cd9a
- Trigger Event: push
File details
Details for the file datafusion_query_builder-0.1.2-cp39-abi3-macosx_11_0_arm64.whl.
File metadata
- Download URL: datafusion_query_builder-0.1.2-cp39-abi3-macosx_11_0_arm64.whl
- Upload date:
- Size: 2.6 MB
- Tags: CPython 3.9+, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 568cd8556016635fa78379325882d54de0f5db6432fd9ff0bf62c3b3e6043e86 |
| MD5 | 794d43bcb19075e1569b60a5b4e9d79b |
| BLAKE2b-256 | 6702e8e0541584cb92adaf2ab4479d9814246fdb0f0f823cdf071be4c07ff5fb |
Provenance
The following attestation bundles were made for datafusion_query_builder-0.1.2-cp39-abi3-macosx_11_0_arm64.whl:
Publisher: ci.yml on pydantic/datafusion-query-builder
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: datafusion_query_builder-0.1.2-cp39-abi3-macosx_11_0_arm64.whl
- Subject digest: 568cd8556016635fa78379325882d54de0f5db6432fd9ff0bf62c3b3e6043e86
- Sigstore transparency entry: 1994466497
- Sigstore integration time:
- Permalink: pydantic/datafusion-query-builder@f8d4f53242f4d4d013160053cc6cab2fcf73cd9a
- Branch / Tag: refs/tags/v0.1.2
- Owner: https://github.com/pydantic
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: ci.yml@f8d4f53242f4d4d013160053cc6cab2fcf73cd9a
- Trigger Event: push