sqlframe-gizmosql

GizmoSQL adapter for SQLFrame - a PySpark-like DataFrame API for GizmoSQL.

Overview

This package provides a GizmoSQL backend for SQLFrame, allowing you to use PySpark-compatible DataFrame operations against a GizmoSQL server. GizmoSQL is a database server that uses DuckDB as its execution engine with an Arrow Flight SQL interface.

Installation

pip install sqlframe-gizmosql

Requirements

  • Python >= 3.10
  • GizmoSQL server running and accessible

Quick Start

First, start a GizmoSQL server (see Running GizmoSQL with Docker below), then:

from sqlframe_gizmosql import GizmoSQLSession

# Create a session connected to GizmoSQL
session = GizmoSQLSession.builder \
    .config("gizmosql.uri", "grpc+tls://localhost:31337") \
    .config("gizmosql.username", "gizmosql_user") \
    .config("gizmosql.password", "gizmosql_password") \
    .config("gizmosql.tls_skip_verify", True) \
    .getOrCreate()

# Create a DataFrame from a SQL query
df = session.sql("SELECT 1 as id, 'hello' as message")

# Show the results
df.show()

# Use PySpark-like DataFrame API
df2 = session.createDataFrame([
    (1, "Alice", 30),
    (2, "Bob", 25),
    (3, "Charlie", 35),
], ["id", "name", "age"])

# Filter and select
result = df2.filter("age > 25").select("name", "age")
result.show()

# Group by and aggregate
df2.groupBy("age").count().show()

Configuration

The session can be configured using the builder pattern:

session = GizmoSQLSession.builder \
    .config("gizmosql.uri", "grpc+tls://localhost:31337") \
    .config("gizmosql.username", "gizmosql_user") \
    .config("gizmosql.password", "gizmosql_password") \
    .config("gizmosql.tls_skip_verify", True) \
    .getOrCreate()
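
To keep credentials out of source code, the same options could be read from environment variables and applied to the builder in a loop. The following is a minimal sketch, not part of sqlframe-gizmosql: the helper names `options_from_env` and `apply_options` and the client-side variable names (`GIZMOSQL_URI`, `GIZMOSQL_USERNAME`, `GIZMOSQL_PASSWORD`, `GIZMOSQL_TLS_SKIP_VERIFY`) are assumptions made for illustration. It relies only on the `.config(key, value)` chaining shown above.

```python
import os
from typing import Any, Mapping, Optional

# Hypothetical environment variable names mapped to the documented option keys.
ENV_TO_OPTION = {
    "GIZMOSQL_URI": "gizmosql.uri",
    "GIZMOSQL_USERNAME": "gizmosql.username",
    "GIZMOSQL_PASSWORD": "gizmosql.password",
}


def options_from_env(env: Optional[Mapping[str, str]] = None) -> dict[str, Any]:
    """Collect gizmosql.* options from the environment, skipping unset or empty values."""
    env = os.environ if env is None else env
    options: dict[str, Any] = {
        option: env[var] for var, option in ENV_TO_OPTION.items() if env.get(var)
    }
    # Hypothetical boolean flag; "1", "true" and "yes" are treated as True.
    flag = env.get("GIZMOSQL_TLS_SKIP_VERIFY", "").strip().lower()
    if flag:
        options["gizmosql.tls_skip_verify"] = flag in {"1", "true", "yes"}
    return options


def apply_options(builder: Any, options: Mapping[str, Any]) -> Any:
    """Call builder.config(key, value) for each option and return the final builder."""
    for key, value in options.items():
        builder = builder.config(key, value)
    return builder


# Intended usage (requires sqlframe-gizmosql and a reachable server):
#   from sqlframe_gizmosql import GizmoSQLSession
#   session = apply_options(GizmoSQLSession.builder, options_from_env()).getOrCreate()
```

Options that are not set in the environment are left out, so the defaults listed under Configuration Options still apply.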

Using PySpark Imports (activate mode)

You can use the activate() function to enable standard PySpark imports while running on GizmoSQL:

from sqlframe_gizmosql import activate

# Activate GizmoSQL as the backend
activate(
    uri="grpc+tls://localhost:31337",
    username="gizmosql_user",
    password="gizmosql_password",
    tls_skip_verify=True  # For self-signed certificates
)

# Now use standard PySpark imports!
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Create DataFrame and use PySpark-like functions
df = spark.createDataFrame([
    (1, "alice", 100),
    (2, "bob", 200),
    (3, "alice", 150),
], ["id", "name", "amount"])

# Use functions like F.upper, F.sum, F.col, etc.
result = df.select(
    F.col("id"),
    F.upper(F.col("name")).alias("name_upper"),
    F.col("amount")
)
result.show()

# Aggregations
df.groupBy("name").agg(
    F.sum("amount").alias("total"),
    F.count("*").alias("count")
).show()

You can also activate with an existing connection:

from sqlframe_gizmosql import activate, GizmoSQLSession

# Create session first
session = GizmoSQLSession.builder \
    .config("gizmosql.uri", "grpc+tls://localhost:31337") \
    .config("gizmosql.username", "gizmosql_user") \
    .config("gizmosql.password", "gizmosql_password") \
    .config("gizmosql.tls_skip_verify", True) \
    .getOrCreate()

# Activate with existing connection
activate(conn=session._conn)

# Use PySpark imports
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

Configuration Options

Option                   | Description                                                        | Default
gizmosql.uri             | GizmoSQL server URI (grpc://host:port or grpc+tls://host:port)     | grpc://localhost:31337
gizmosql.username        | Username for authentication                                        | None
gizmosql.password        | Password for authentication                                        | None
gizmosql.tls_skip_verify | Skip TLS certificate verification (for self-signed certs)          | False
gizmosql.auth_type       | Authentication type (e.g., "external" for browser-based OAuth/SSO) | None
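
A typo in the URI scheme or a missing port tends to surface only as a connection error, so it can help to check the options before building a session. The sketch below is one possible pre-flight check using only the standard library. `check_gizmosql_options` is a hypothetical helper, not part of the package, and its rules are this sketch's own reading of the table above (the two URI forms and the default URI) and of the note that external auth needs no username or password.

```python
from typing import Any, Mapping
from urllib.parse import urlsplit

# The two URI forms listed in the options table.
ALLOWED_SCHEMES = {"grpc", "grpc+tls"}
DEFAULT_URI = "grpc://localhost:31337"


def check_gizmosql_options(options: Mapping[str, Any]) -> list[str]:
    """Return a list of human-readable problems; an empty list means none were found."""
    problems: list[str] = []
    uri = options.get("gizmosql.uri", DEFAULT_URI)
    parts = urlsplit(uri)

    if parts.scheme not in ALLOWED_SCHEMES:
        problems.append(f"unexpected URI scheme {parts.scheme!r} in {uri!r}")
    elif not parts.hostname:
        problems.append(f"no host in {uri!r}")
    else:
        try:
            port = parts.port  # raises ValueError for a non-numeric or out-of-range port
        except ValueError:
            problems.append(f"invalid port in {uri!r}")
        else:
            if port is None:
                problems.append(f"no port in {uri!r}")

    # With external (browser-based) auth, username and password are not needed.
    if options.get("gizmosql.auth_type") == "external" and (
        options.get("gizmosql.username") or options.get("gizmosql.password")
    ):
        problems.append("username/password are not needed with auth_type='external'")

    return problems
```

Returning a list instead of raising lets the caller report every problem at once; raising on the first problem would work equally well.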

OAuth/SSO Authentication

GizmoSQL supports browser-based OAuth/SSO via auth_type="external". With external auth, no username or password is needed; a browser window opens for authentication:

from sqlframe_gizmosql import GizmoSQLSession

session = GizmoSQLSession.builder \
    .config("gizmosql.uri", "grpc+tls://gizmosql.example.com:31337") \
    .config("gizmosql.auth_type", "external") \
    .config("gizmosql.tls_skip_verify", True) \
    .getOrCreate()

Or with activate mode:

from sqlframe_gizmosql import activate

activate(
    uri="grpc+tls://gizmosql.example.com:31337",
    auth_type="external",
    tls_skip_verify=True
)

from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

Features

  • Full PySpark DataFrame API compatibility via SQLFrame
  • Arrow Flight SQL protocol for high-performance data transfer
  • Support for reading/writing various file formats (Parquet, CSV, JSON)
  • Window functions
  • Aggregations and groupBy operations
  • Joins
  • UDF registration
  • Catalog operations

Running GizmoSQL with Docker

You can run GizmoSQL locally using Docker:

docker run -d \
    --name gizmosql \
    -p 31337:31337 \
    -e GIZMOSQL_USERNAME=gizmosql_user \
    -e GIZMOSQL_PASSWORD=gizmosql_password \
    -e DATABASE_FILENAME=/tmp/test.duckdb \
    -e TLS_ENABLED=1 \
    gizmodata/gizmosql:latest

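For TLS connections, use grpc+tls:// in the URI. If the server uses a self-signed certificate, set gizmosql.tls_skip_verify to True. This turns off certificate verification, so it is best kept to local development and testing.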
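
Because the container is started in the background with -d, the server may not be listening yet when the next command runs. A script can wait for the published port before creating a session. This is a minimal sketch using only the standard library; `wait_for_port` is a hypothetical helper, and an open TCP port only shows that something is listening, not that GizmoSQL is ready to serve queries.

```python
import socket
import time


def wait_for_port(host: str = "localhost", port: int = 31337,
                  timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll until a TCP connection to host:port succeeds or `timeout` seconds pass.

    Returns True once a connection is accepted, False if the deadline is reached.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # The connection is opened and closed immediately; nothing is sent.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Intended usage after `docker run`:
#   if not wait_for_port("localhost", 31337):
#       raise RuntimeError("GizmoSQL port 31337 did not open in time")
```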

Development

Setup

# Clone the repository
git clone https://github.com/gizmodata/sqlframe-gizmosql.git
cd sqlframe-gizmosql

# Create a virtual environment
python -m venv .venv
source .venv/bin/activate

# Install dev dependencies
pip install -e ".[dev]"

Running Tests

# Run unit tests
pytest tests/unit

# Run integration tests (requires GizmoSQL server)
pytest tests/integration

Code Quality

# Run linting
ruff check .

# Run formatting
ruff format .

License

Apache License 2.0

Related Projects

  • SQLFrame - PySpark-like DataFrame API for multiple SQL backends
  • GizmoSQL - Database server using DuckDB with Arrow Flight SQL interface
  • sqlmesh-gizmosql - GizmoSQL adapter for SQLMesh
