Programmatic, injection-safe builder for DataFusion SQL.

datafusion-query-builder

A programmatic, injection-safe builder for DataFusion SQL — a typed Rust core with a pyo3-exposed Python API. It replaces hand-rolled f-string / template-literal SQL with a typed, composable surface, while still emitting SQL text (so the SQL stays visible in logs/traces, greppable, and cache-keyable).

Values are escaped by construction, so untrusted input (column values, user filters) is safe to embed. Strings, arrays, and numbers are quoted/encoded for you; an explicit raw(...) escape hatch is the only unescaped path.
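To make "escaped by construction" concrete, here is a minimal standalone sketch of the general technique for string values: standard SQL string literals escape a single quote by doubling it. The helper names `quote_string` and `quote_list` are hypothetical and are not part of this package's API; the library's real encoder lives in the Rust core and may differ in detail.

```python
def quote_string(value: str) -> str:
    """Render a Python string as a SQL string literal by doubling single quotes."""
    return "'" + value.replace("'", "''") + "'"


def quote_list(values) -> str:
    """Render strings as a parenthesised, comma-separated list, e.g. for IN (...)."""
    return "(" + ", ".join(quote_string(v) for v in values) + ")"


# An attacker-controlled value that tries to break out of the literal.
hostile = "prod' OR '1'='1"

# Hand-rolled f-string: the quote in the value terminates the literal early.
unsafe = f"SELECT * FROM records WHERE env = '{hostile}'"

# Quoted by a helper: the whole value stays inside one string literal.
safe = f"SELECT * FROM records WHERE env = {quote_string(hostile)}"

print(unsafe)
print(safe)
```

The point of doing this inside a builder rather than at each call site is that callers never get the chance to forget the quoting step.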

Install

pip install datafusion-query-builder

Prebuilt abi3 wheels are published for CPython 3.9+ on macOS (Apple Silicon + Intel) and Linux (x86_64 + aarch64).

Quick start (Python)

from datafusion_query_builder import col, lit, param, raw, when, and_, table
from datafusion_query_builder import functions as f

q = (
    table("records")
    .filter((col("kind") == "span") & col("deployment_environment").is_in(["prod", "staging"]))
    .select(
        f.coalesce(col("service_name"), "(unknown)").alias("service"),
        f.approx_distinct(col("trace_id")).alias("request_count"),
        f.approx_percentile_cont(col("duration"), 0.95).alias("p95"),
    )
    .group_by(col("service_name"))
    .order_by(col("request_count").desc())
    .limit(200)
)
print(q.to_sql())

Bare Python scalars auto-promote to literals (so col("x") == "prod" works), and string/array literals are escaped — values are injection-safe by construction. Reach for raw(...) (a SQL fragment), f.call("name", ...) (any function), or param("name") (a ${name} placeholder) when you step outside the v1 grammar.
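The auto-promotion described above is a standard operator-overloading pattern. The following pure-Python sketch shows how `==` and `&` can build SQL text while coercing bare scalars into literals. `SketchExpr`, `to_expr`, and `sketch_col` are hypothetical names used only for illustration; the real wrappers are implemented in Rust via pyo3 and may render differently.

```python
class SketchExpr:
    """Hypothetical expression node that simply carries its SQL text."""

    def __init__(self, sql: str):
        self.sql = sql

    def __eq__(self, other):
        # Builds a SQL comparison instead of returning a Python bool.
        return SketchExpr(f"{self.sql} = {to_expr(other).sql}")

    def __and__(self, other):
        return SketchExpr(f"({self.sql}) AND ({to_expr(other).sql})")


def to_expr(value) -> SketchExpr:
    """Promote a bare Python scalar to a literal expression."""
    if isinstance(value, SketchExpr):
        return value
    if isinstance(value, bool):  # checked before int: bool is a subclass of int
        return SketchExpr("TRUE" if value else "FALSE")
    if isinstance(value, int):
        return SketchExpr(str(value))
    if isinstance(value, str):
        return SketchExpr("'" + value.replace("'", "''") + "'")
    raise TypeError(f"unsupported literal type: {type(value).__name__}")


def sketch_col(name: str) -> SketchExpr:
    """Quote an identifier with double quotes, doubling any embedded ones."""
    return SketchExpr('"' + name.replace('"', '""') + '"')


expr = (sketch_col("kind") == "span") & (sketch_col("n") == 3)
print(expr.sql)
```

The parentheses around each comparison matter in Python itself: `&` binds more tightly than `==`, so `a == b & c` parses as `a == (b & c)`. This is why the quick start wraps `col("kind") == "span"` in parentheses before combining it with `&`.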

Architecture

façade types  ──lower.rs──▶  sqlparser::ast  ──Display──▶  SQL text
expr.rs / query.rs / functions.rs        (the only file that names sqlparser::ast)
  • expr.rs, query.rs, functions.rs: span-free, Default-friendly façade enums. Immutable / generative: every method returns a new value.
  • lower.rs: the single boundary to sqlparser::ast. A sqlparser version bump surfaces here and nowhere else.
  • render.rs: to_sql() plus validate() (renders then re-parses to prove well-formedness).
  • python.rs: Expr / Query wrappers, the f.* functions namespace, operator overloading with scalar→literal coercion. Gated behind the python feature.
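The "immutable / generative" property of the façade types can be illustrated with a small Python sketch in which every builder method returns a new value and leaves the receiver untouched. `SketchQuery` and its fields are hypothetical, and predicates are plain strings here only to keep the example short; the real types are Rust enums.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class SketchQuery:
    """Hypothetical immutable query value; each method returns a modified copy."""

    table: str
    filters: Tuple[str, ...] = ()
    limit_n: Optional[int] = None

    def filter(self, predicate: str) -> "SketchQuery":
        return replace(self, filters=self.filters + (predicate,))

    def limit(self, n: int) -> "SketchQuery":
        return replace(self, limit_n=n)

    def to_sql(self) -> str:
        sql = f"SELECT * FROM {self.table}"
        if self.filters:
            sql += " WHERE " + " AND ".join(self.filters)
        if self.limit_n is not None:
            sql += f" LIMIT {self.limit_n}"
        return sql


base = SketchQuery("records").filter("kind = 'span'")
recent = base.limit(200)                  # derived from base
errors = base.filter("level = 'error'")   # also derived from base

print(base.to_sql())
print(recent.to_sql())
print(errors.to_sql())
```

Because `base` is never mutated, it can be shared as a common prefix for several derived queries without one caller's changes leaking into another's.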

sqlparser is pinned to the pydantic dollar-brace-0.62.0 fork via [patch.crates-io] — the same parser the DataFusion ecosystem uses, including the ${var} placeholder extension. The crate does not depend on DataFusion itself (only the test oracle does, optionally).

Develop & test

# Rust core (no Python toolchain needed):
cargo test --test core                       # rendering snapshots + regressions
cargo test --features datafusion-oracle --test properties   # property tests, see below
cargo clippy --all-targets --features datafusion-oracle -- -D warnings

# Coercion tests against a real embedded interpreter (needs PYO3_PYTHON -> a 3.9+ interpreter):
PYO3_PYTHON=$PWD/.venv/bin/python cargo test --lib --features test-embed

# Python extension:
uv venv && uvx maturin develop
python tests/test_python.py

How the tests are layered

The crate is correctness-critical (it generates SQL from user-controlled input), so the test surface is layered:

  • tests/properties.rs (proptest, behind datafusion-oracle) uses DataFusion itself as the oracle — it renders a query to SQL, then plans + executes it through real DataFusion and reads the value back. This upgrades "the SQL re-parses" to "the value survives a real engine round-trip", catching silent mis-encoding (escaping, float formatting, operator precedence). DataFusion is an optional, test-only dependency — never compiled into the lib or the wheel. tests/properties.proptest-regressions pins seeds that previously found bugs.
  • src/python.rs coercion_tests (behind test-embed) unit-test the Python-type → façade-literal boundary that can only be exercised with real Python objects (bool-before-int, big ints, non-finite floats, list/tuple). They compose with the property tests: coercion proves "Python value → correct value", the property tests prove "value → correct SQL → correct result".
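The coercion cases listed above (bool-before-int, big ints, non-finite floats, list/tuple) each correspond to a real Python pitfall. The sketch below is a hypothetical pure-Python stand-in for that boundary, not the crate's code: the tag names, the 64-bit cutoff, and the choice to reject non-finite floats are all assumptions made for illustration.

```python
import math


def coerce_literal(value):
    """Map a Python value to a (kind, payload) pair; names are hypothetical."""
    # bool must be tested first: isinstance(True, int) is True in Python,
    # so an int check placed earlier would turn True into the integer 1.
    if isinstance(value, bool):
        return ("bool", value)
    if isinstance(value, int):
        if -(2**63) <= value < 2**63:
            return ("int64", value)
        # Python ints are unbounded; keep the exact digits rather than
        # passing the value through a float and losing precision.
        return ("bigint", str(value))
    if isinstance(value, float):
        if not math.isfinite(value):
            # Assumption of this sketch: NaN and infinities are rejected.
            raise ValueError(f"non-finite float not supported: {value!r}")
        return ("float", value)
    if isinstance(value, str):
        return ("string", value)
    if isinstance(value, (list, tuple)):
        return ("array", [coerce_literal(v) for v in value])
    raise TypeError(f"unsupported literal type: {type(value).__name__}")


print(coerce_literal(True)[0])    # bool
print(coerce_literal(1)[0])       # int64
print(coerce_literal(2**70)[0])   # bigint
```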

When a property/coercion test finds a bug, fix it in the crate — that is the whole point of the library: one fix covers every caller.
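The "value survives a real engine round-trip" idea behind the property tests can be demonstrated without DataFusion. The sketch below swaps in SQLite (from the Python standard library) as the oracle and uses a fixed list of samples instead of proptest-generated inputs, so it is an analogy to tests/properties.rs rather than a reproduction of it. `quote_string` and `round_trip` are hypothetical helpers.

```python
import sqlite3


def quote_string(value: str) -> str:
    """Render a Python string as a SQL string literal by doubling single quotes."""
    return "'" + value.replace("'", "''") + "'"


def round_trip(value: str) -> str:
    """Render the value to SQL text, execute it, and read the value back."""
    conn = sqlite3.connect(":memory:")
    try:
        (result,) = conn.execute("SELECT " + quote_string(value)).fetchone()
        return result
    finally:
        conn.close()


samples = [
    "plain",
    "O'Brien",
    "'; DROP TABLE t; --",
    "back\\slash",
    "uni\u00e7ode",
    "",
]

for sample in samples:
    # The engine must hand back exactly the value that went in.
    assert round_trip(sample) == sample, sample

print(f"{len(samples)} samples survived the round trip")
```

A re-parse check alone would accept a literal that is well-formed but encodes the wrong value; comparing what the engine returns against the original input is what catches silent mis-encoding.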

Release

Wheels are built and published to PyPI by CI (.github/workflows/ci.yml) on a v* tag, using PyPI Trusted Publishing (OIDC) — no API tokens. To cut a release: bump version in Cargo.toml and pyproject.toml, tag vX.Y.Z, and push the tag.

License

MIT
