GizmoSQL adapter for SQLFrame - PySpark-like DataFrame API for GizmoSQL

sqlframe-gizmosql

GizmoSQL adapter for SQLFrame - a PySpark-like DataFrame API for GizmoSQL.

Overview

This package provides a GizmoSQL backend for SQLFrame, so you can run PySpark-compatible DataFrame operations against a GizmoSQL server. GizmoSQL is a database server that uses DuckDB as its execution engine and exposes an Arrow Flight SQL interface.

Installation

pip install sqlframe-gizmosql

Requirements

  • Python >= 3.10
  • GizmoSQL server running and accessible

Quick Start

First, start a GizmoSQL server (see Running GizmoSQL with Docker below), then:

from sqlframe_gizmosql import GizmoSQLSession

# Create a session connected to GizmoSQL
session = GizmoSQLSession.builder \
    .config("gizmosql.uri", "grpc+tls://localhost:31337") \
    .config("gizmosql.username", "gizmosql_user") \
    .config("gizmosql.password", "gizmosql_password") \
    .config("gizmosql.tls_skip_verify", True) \
    .getOrCreate()

# Create a DataFrame from a SQL query
df = session.sql("SELECT 1 as id, 'hello' as message")

# Show the results
df.show()

# Use PySpark-like DataFrame API
df2 = session.createDataFrame([
    (1, "Alice", 30),
    (2, "Bob", 25),
    (3, "Charlie", 35),
], ["id", "name", "age"])

# Filter rows and select columns
result = df2.filter("age > 25").select("name", "age")
result.show()

# Group by and aggregate
df2.groupBy("age").count().show()
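To make the expected results concrete without a running server, the sketch below runs roughly equivalent SQL for the filter and group-by steps against an in-memory SQLite database from the Python standard library. SQLite is only a stand-in here, and the SQL text is an assumption about what these DataFrame operations mean, not the SQL that SQLFrame generates for GizmoSQL.

```python
import sqlite3

# Stand-in database: in-memory SQLite, NOT GizmoSQL or DuckDB.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [(1, "Alice", 30), (2, "Bob", 25), (3, "Charlie", 35)],
)

# Roughly what df2.filter("age > 25").select("name", "age") computes.
filtered = conn.execute(
    "SELECT name, age FROM people WHERE age > 25 ORDER BY id"
).fetchall()

# Roughly what df2.groupBy("age").count() computes.
counts = conn.execute(
    "SELECT age, COUNT(*) AS n FROM people GROUP BY age ORDER BY age"
).fetchall()

print(filtered)  # [('Alice', 30), ('Charlie', 35)]
print(counts)    # [(25, 1), (30, 1), (35, 1)]
conn.close()
```

Row order in the real show() output may differ, since neither DataFrame operation sorts its result.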

Configuration

The session can be configured using the builder pattern:

session = GizmoSQLSession.builder \
    .config("gizmosql.uri", "grpc+tls://localhost:31337") \
    .config("gizmosql.username", "gizmosql_user") \
    .config("gizmosql.password", "gizmosql_password") \
    .config("gizmosql.tls_skip_verify", True) \
    .getOrCreate()
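Hard-coding credentials is convenient for a local demo but awkward elsewhere. As a minimal sketch, the helpers below collect the same config keys from environment variables and apply them to any builder that exposes a chainable config(key, value) method. The environment variable names and both helper functions are hypothetical; only the gizmosql.* keys come from this README.

```python
import os

# Hypothetical environment variable names; only the config keys on the
# right-hand side come from this README.
ENV_TO_CONFIG_KEY = {
    "GIZMOSQL_URI": "gizmosql.uri",
    "GIZMOSQL_USERNAME": "gizmosql.username",
    "GIZMOSQL_PASSWORD": "gizmosql.password",
    "GIZMOSQL_TLS_SKIP_VERIFY": "gizmosql.tls_skip_verify",
}


def load_gizmosql_config(env=None):
    """Collect GizmoSQL settings from environment variables into a dict."""
    env = os.environ if env is None else env
    config = {}
    for env_name, key in ENV_TO_CONFIG_KEY.items():
        value = env.get(env_name)
        if value is None:
            continue  # Leave unset options to the adapter's defaults.
        if key == "gizmosql.tls_skip_verify":
            # Interpret common truthy strings as a boolean.
            value = value.strip().lower() in {"1", "true", "yes"}
        config[key] = value
    return config


def apply_config(builder, config):
    """Call builder.config(key, value) for each setting; return the builder."""
    for key, value in config.items():
        builder = builder.config(key, value)
    return builder


example = load_gizmosql_config({
    "GIZMOSQL_URI": "grpc+tls://localhost:31337",
    "GIZMOSQL_TLS_SKIP_VERIFY": "true",
})
print(example)
```

With the package installed, apply_config(GizmoSQLSession.builder, load_gizmosql_config()).getOrCreate() would then stand in for the chained calls above.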

Using PySpark Imports (activate mode)

You can use the activate() function to enable standard PySpark imports while running on GizmoSQL:

from sqlframe_gizmosql import activate

# Activate GizmoSQL as the backend
activate(
    uri="grpc+tls://localhost:31337",
    username="gizmosql_user",
    password="gizmosql_password",
    tls_skip_verify=True  # For self-signed certificates
)

# Now use standard PySpark imports!
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Create DataFrame and use PySpark-like functions
df = spark.createDataFrame([
    (1, "alice", 100),
    (2, "bob", 200),
    (3, "alice", 150),
], ["id", "name", "amount"])

# Use functions like F.upper, F.sum, F.col, etc.
result = df.select(
    F.col("id"),
    F.upper(F.col("name")).alias("name_upper"),
    F.col("amount")
)
result.show()

# Aggregations
df.groupBy("name").agg(
    F.sum("amount").alias("total"),
    F.count("*").alias("count")
).show()

You can also activate with an existing connection:

from sqlframe_gizmosql import activate, GizmoSQLSession

# Create session first
session = GizmoSQLSession.builder \
    .config("gizmosql.uri", "grpc+tls://localhost:31337") \
    .config("gizmosql.username", "gizmosql_user") \
    .config("gizmosql.password", "gizmosql_password") \
    .config("gizmosql.tls_skip_verify", True) \
    .getOrCreate()

# Activate with existing connection
activate(conn=session._conn)

# Use PySpark imports
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

Configuration Options

| Option                   | Description                                                        | Default                |
|--------------------------|--------------------------------------------------------------------|------------------------|
| gizmosql.uri             | GizmoSQL server URI (grpc://host:port or grpc+tls://host:port)     | grpc://localhost:31337 |
| gizmosql.username        | Username for authentication                                        | None                   |
| gizmosql.password        | Password for authentication                                        | None                   |
| gizmosql.tls_skip_verify | Skip TLS certificate verification (for self-signed certs)          | False                  |
| gizmosql.auth_type       | Authentication type (e.g., "external" for browser-based OAuth/SSO) | None                   |
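A malformed URI is an easy mistake to make, so it can help to check the value before building a session. The sketch below uses the standard library to accept only the two URI forms listed for gizmosql.uri. The function name check_gizmosql_uri is hypothetical, and the check may be stricter than the adapter itself (for example, it insists on an explicit port).

```python
from urllib.parse import urlsplit

# The two URI schemes listed for gizmosql.uri.
ALLOWED_SCHEMES = {"grpc", "grpc+tls"}


def check_gizmosql_uri(uri="grpc://localhost:31337"):
    """Return (scheme, host, port) or raise ValueError for a malformed URI."""
    parts = urlsplit(uri)
    if parts.scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"unsupported scheme {parts.scheme!r} in {uri!r}")
    if not parts.hostname:
        raise ValueError(f"missing host in {uri!r}")
    # Accessing .port raises ValueError on its own for a non-numeric port.
    if parts.port is None:
        raise ValueError(f"missing port in {uri!r}")
    return parts.scheme, parts.hostname, parts.port


print(check_gizmosql_uri())  # ('grpc', 'localhost', 31337)
print(check_gizmosql_uri("grpc+tls://gizmosql.example.com:31337"))
```

The returned scheme also tells you whether gizmosql.tls_skip_verify is relevant, since that option only matters for grpc+tls:// connections.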

OAuth/SSO Authentication

GizmoSQL supports browser-based OAuth/SSO via auth_type="external". With external auth, no username or password is needed; a browser window opens for authentication:

from sqlframe_gizmosql import GizmoSQLSession

session = GizmoSQLSession.builder \
    .config("gizmosql.uri", "grpc+tls://gizmosql.example.com:31337") \
    .config("gizmosql.auth_type", "external") \
    .config("gizmosql.tls_skip_verify", True) \
    .getOrCreate()

Or with activate mode:

from sqlframe_gizmosql import activate

activate(
    uri="grpc+tls://gizmosql.example.com:31337",
    auth_type="external",
    tls_skip_verify=True
)

from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
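If the same script has to work with both password and external auth, one option is to choose the keyword arguments for activate() in one place. The helper below is a hypothetical sketch: the keyword names match the activate() calls shown in this README, but the rule of falling back to external auth when no credentials are given is an assumption, not adapter behavior.

```python
def build_activate_kwargs(uri, username=None, password=None, tls_skip_verify=False):
    """Pick password auth or external (browser) auth from what is supplied."""
    kwargs = {"uri": uri, "tls_skip_verify": tls_skip_verify}
    if username is not None and password is not None:
        # Both credentials given: use username/password authentication.
        kwargs["username"] = username
        kwargs["password"] = password
    elif username is None and password is None:
        # No credentials given: assume browser-based OAuth/SSO is wanted.
        kwargs["auth_type"] = "external"
    else:
        raise ValueError("provide both username and password, or neither")
    return kwargs


print(build_activate_kwargs("grpc+tls://gizmosql.example.com:31337"))
print(build_activate_kwargs("grpc+tls://localhost:31337", "gizmosql_user", "gizmosql_password"))
```

The result can be passed straight through as activate(**build_activate_kwargs(...)).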

Features

  • Full PySpark DataFrame API compatibility via SQLFrame
  • Arrow Flight SQL protocol for high-performance data transfer
  • Support for reading/writing various file formats (Parquet, CSV, JSON)
  • Window functions
  • Aggregations and groupBy operations
  • Joins
  • UDF registration
  • Catalog operations

Running GizmoSQL with Docker

You can run GizmoSQL locally using Docker:

docker run -d \
    --name gizmosql \
    -p 31337:31337 \
    -e GIZMOSQL_USERNAME=gizmosql_user \
    -e GIZMOSQL_PASSWORD=gizmosql_password \
    -e DATABASE_FILENAME=/tmp/test.duckdb \
    -e TLS_ENABLED=1 \
    gizmodata/gizmosql:latest

For TLS connections, use grpc+tls:// in the URI and set gizmosql.tls_skip_verify to True for self-signed certificates.
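Because docker run -d returns before the server is ready, a script that connects immediately may fail. A minimal sketch of a readiness wait, using only the standard library, is shown below. The helper name wait_for_port and its defaults are hypothetical, and a successful TCP connection only shows that the port is open, not that GizmoSQL is ready to serve Flight SQL requests.

```python
import socket
import time


def wait_for_port(host="localhost", port=31337, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, else False."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Open and immediately close a connection to probe the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Calling wait_for_port() before building the session gives the container up to 30 seconds to start listening on port 31337.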

Development

Setup

# Clone the repository
git clone https://github.com/gizmodata/sqlframe-gizmosql.git
cd sqlframe-gizmosql

# Create a virtual environment
python -m venv .venv
source .venv/bin/activate

# Install dev dependencies
pip install -e ".[dev]"

Running Tests

# Run unit tests
pytest tests/unit

# Run integration tests (requires GizmoSQL server)
pytest tests/integration

Code Quality

# Run linting
ruff check .

# Run formatting
ruff format .

License

Apache License 2.0

Related Projects

  • SQLFrame - PySpark-like DataFrame API for multiple SQL backends
  • GizmoSQL - Database server using DuckDB with Arrow Flight SQL interface
  • sqlmesh-gizmosql - GizmoSQL adapter for SQLMesh
