sqlframe-gizmosql

GizmoSQL adapter for SQLFrame - a PySpark-like DataFrame API for GizmoSQL.

Overview

This package provides a GizmoSQL backend for SQLFrame, allowing you to use PySpark-compatible DataFrame operations against a GizmoSQL server. GizmoSQL is a database server that uses DuckDB as its execution engine with an Arrow Flight SQL interface.

Installation

pip install sqlframe-gizmosql

Requirements

  • Python >= 3.10
  • GizmoSQL server running and accessible

Quick Start

First, start a GizmoSQL server (see Running GizmoSQL with Docker below), then:

from sqlframe_gizmosql import GizmoSQLSession

# Create a session connected to GizmoSQL
session = GizmoSQLSession.builder \
    .config("gizmosql.uri", "grpc+tls://localhost:31337") \
    .config("gizmosql.username", "gizmosql_user") \
    .config("gizmosql.password", "gizmosql_password") \
    .config("gizmosql.tls_skip_verify", True) \
    .getOrCreate()

# Create a DataFrame from a SQL query
df = session.sql("SELECT 1 as id, 'hello' as message")

# Show the results
df.show()

# Use PySpark-like DataFrame API
df2 = session.createDataFrame([
    (1, "Alice", 30),
    (2, "Bob", 25),
    (3, "Charlie", 35),
], ["id", "name", "age"])

# Filter rows and select columns
result = df2.filter("age > 25").select("name", "age")
result.show()

# Group by and aggregate
df2.groupBy("age").count().show()

Configuration

The session can be configured using the builder pattern:

session = GizmoSQLSession.builder \
    .config("gizmosql.uri", "grpc+tls://localhost:31337") \
    .config("gizmosql.username", "gizmosql_user") \
    .config("gizmosql.password", "gizmosql_password") \
    .config("gizmosql.tls_skip_verify", True) \
    .getOrCreate()

Using PySpark Imports (activate mode)

You can use the activate() function to enable standard PySpark imports while running on GizmoSQL:

from sqlframe_gizmosql import activate

# Activate GizmoSQL as the backend
activate(
    uri="grpc+tls://localhost:31337",
    username="gizmosql_user",
    password="gizmosql_password",
    tls_skip_verify=True  # For self-signed certificates
)

# Now use standard PySpark imports!
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Create DataFrame and use PySpark-like functions
df = spark.createDataFrame([
    (1, "alice", 100),
    (2, "bob", 200),
    (3, "alice", 150),
], ["id", "name", "amount"])

# Use functions like F.upper, F.sum, F.col, etc.
result = df.select(
    F.col("id"),
    F.upper(F.col("name")).alias("name_upper"),
    F.col("amount")
)
result.show()

# Aggregations
df.groupBy("name").agg(
    F.sum("amount").alias("total"),
    F.count("*").alias("count")
).show()

You can also activate with an existing connection:

from sqlframe_gizmosql import activate, GizmoSQLSession

# Create session first
session = GizmoSQLSession.builder \
    .config("gizmosql.uri", "grpc+tls://localhost:31337") \
    .config("gizmosql.username", "gizmosql_user") \
    .config("gizmosql.password", "gizmosql_password") \
    .config("gizmosql.tls_skip_verify", True) \
    .getOrCreate()

# Activate with existing connection
activate(conn=session._conn)

# Use PySpark imports
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

Configuration Options

Option                   | Description                                                    | Default
gizmosql.uri             | GizmoSQL server URI (grpc://host:port or grpc+tls://host:port) | grpc://localhost:31337
gizmosql.username        | Username for authentication                                    | None
gizmosql.password        | Password for authentication                                    | None
gizmosql.tls_skip_verify | Skip TLS certificate verification (for self-signed certs)      | False
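
To keep credentials out of source code, the options above can be assembled from environment variables before they are passed to the builder. This is a sketch under stated assumptions: options_from_env is a hypothetical helper, and the variable names GIZMOSQL_URI and GIZMOSQL_TLS_SKIP_VERIFY are this example's own convention. Nothing here implies that the package reads these variables itself. The fallback values mirror the defaults listed in the table.

```python
import os
from typing import Any, Mapping, Optional

# URI schemes listed in the options table.
ALLOWED_SCHEMES = ("grpc://", "grpc+tls://")


def options_from_env(env: Optional[Mapping[str, str]] = None) -> dict[str, Any]:
    """Build a gizmosql.* options dict from environment variables.

    Hypothetical helper: the environment variable names are an assumption
    of this sketch, not something the package is known to read.
    """
    env = os.environ if env is None else env

    uri = env.get("GIZMOSQL_URI", "grpc://localhost:31337")
    if not uri.startswith(ALLOWED_SCHEMES):
        raise ValueError(f"URI must start with one of {ALLOWED_SCHEMES}: {uri!r}")

    # Treat "1", "true", and "yes" (any case) as True; anything else as False.
    skip_raw = env.get("GIZMOSQL_TLS_SKIP_VERIFY", "")
    skip_verify = skip_raw.strip().lower() in {"1", "true", "yes"}

    return {
        "gizmosql.uri": uri,
        "gizmosql.username": env.get("GIZMOSQL_USERNAME"),
        "gizmosql.password": env.get("GIZMOSQL_PASSWORD"),
        "gizmosql.tls_skip_verify": skip_verify,
    }


# With an empty environment, the documented defaults are returned.
print(options_from_env({}))
```

Each key/value pair in the returned dictionary can then be passed to .config() on the session builder.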

Features

  • Full PySpark DataFrame API compatibility via SQLFrame
  • Arrow Flight SQL protocol for high-performance data transfer
  • Support for reading/writing various file formats (Parquet, CSV, JSON)
  • Window functions
  • Aggregations and groupBy operations
  • Joins
  • UDF registration
  • Catalog operations

Running GizmoSQL with Docker

You can run GizmoSQL locally using Docker:

docker run -d \
    --name gizmosql \
    -p 31337:31337 \
    -e GIZMOSQL_USERNAME=gizmosql_user \
    -e GIZMOSQL_PASSWORD=gizmosql_password \
    -e DATABASE_FILENAME=/tmp/test.duckdb \
    -e TLS_ENABLED=1 \
    gizmodata/gizmosql:latest

The command above starts the server with TLS enabled (TLS_ENABLED=1), so connect with a grpc+tls:// URI. If the server presents a self-signed certificate, also set gizmosql.tls_skip_verify to True.
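
A container started with docker run -d may need a moment before it accepts connections. The sketch below polls the port until a TCP connection succeeds. wait_for_port is a hypothetical helper, not part of sqlframe-gizmosql, and it only confirms that something is listening on the port; it does not check TLS, authentication, or Flight SQL readiness.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to (host, port) succeeds.

    Retries every `interval` seconds and returns False if no connection
    succeeds before `timeout` seconds have elapsed. Hypothetical helper.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means a process is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, calling wait_for_port("localhost", 31337) before building the session avoids a connection error from a server that is still starting.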

Development

Setup

# Clone the repository
git clone https://github.com/gizmodata/sqlframe-gizmosql.git
cd sqlframe-gizmosql

# Create a virtual environment
python -m venv .venv
source .venv/bin/activate

# Install dev dependencies
pip install -e ".[dev]"

Running Tests

# Run unit tests
pytest tests/unit

# Run integration tests (requires GizmoSQL server)
pytest tests/integration

Code Quality

# Run linting
ruff check .

# Run formatting
ruff format .

License

Apache License 2.0

Related Projects

  • SQLFrame - PySpark-like DataFrame API for multiple SQL backends
  • GizmoSQL - Database server using DuckDB with Arrow Flight SQL interface
  • sqlmesh-gizmosql - GizmoSQL adapter for SQLMesh
