sqlframe-gizmosql

GizmoSQL adapter for SQLFrame - a PySpark-like DataFrame API for GizmoSQL.

Overview

This package provides a GizmoSQL backend for SQLFrame, allowing you to use PySpark-compatible DataFrame operations against a GizmoSQL server. GizmoSQL is a database server that uses DuckDB as its execution engine with an Arrow Flight SQL interface.

Installation

pip install sqlframe-gizmosql

Requirements

  • Python >= 3.10
  • GizmoSQL server running and accessible

Quick Start

First, start a GizmoSQL server (see Running GizmoSQL with Docker below), then:

from sqlframe_gizmosql import GizmoSQLSession

# Create a session connected to GizmoSQL
session = GizmoSQLSession.builder \
    .config("gizmosql.uri", "grpc+tls://localhost:31337") \
    .config("gizmosql.username", "gizmosql_user") \
    .config("gizmosql.password", "gizmosql_password") \
    .config("gizmosql.tls_skip_verify", True) \
    .getOrCreate()

# Create a DataFrame from a SQL query
df = session.sql("SELECT 1 as id, 'hello' as message")

# Show the results
df.show()

# Use PySpark-like DataFrame API
df2 = session.createDataFrame([
    (1, "Alice", 30),
    (2, "Bob", 25),
    (3, "Charlie", 35),
], ["id", "name", "age"])

# Filter and select
result = df2.filter("age > 25").select("name", "age")
result.show()

# Group by and aggregate
df2.groupBy("age").count().show()
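
SQLFrame turns DataFrame calls into SQL that the backend executes. As a rough illustration of what the two operations above ask for, the sketch below runs hand-written SQL against Python's built-in sqlite3 module. Both the SQL text and the use of sqlite3 are assumptions made for illustration: this is not the SQL that SQLFrame generates, and sqlite3 is only a local stand-in for a GizmoSQL server.

```python
import sqlite3

# sqlite3 is only a local stand-in here; it is not GizmoSQL or DuckDB.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [(1, "Alice", 30), (2, "Bob", 25), (3, "Charlie", 35)],
)

# Roughly what df2.filter("age > 25").select("name", "age") asks for.
filtered = conn.execute(
    "SELECT name, age FROM people WHERE age > 25 ORDER BY id"
).fetchall()

# Roughly what df2.groupBy("age").count() asks for.
counts = conn.execute(
    'SELECT age, COUNT(*) AS "count" FROM people GROUP BY age ORDER BY age'
).fetchall()

print(filtered)  # [('Alice', 30), ('Charlie', 35)]
print(counts)    # [(25, 1), (30, 1), (35, 1)]
```

The ORDER BY clauses are only there to make the printed output deterministic; the DataFrame calls above do not promise any row order.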

Configuration

The session can be configured using the builder pattern:

session = GizmoSQLSession.builder \
    .config("gizmosql.uri", "grpc+tls://localhost:31337") \
    .config("gizmosql.username", "gizmosql_user") \
    .config("gizmosql.password", "gizmosql_password") \
    .config("gizmosql.tls_skip_verify", True) \
    .getOrCreate()
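
To keep credentials out of source code, the same keys can be collected from environment variables and applied in a loop. The following is a minimal sketch, not part of the package: the helper names (config_from_env, build_session) and the client-side variable names (GIZMOSQL_URI, GIZMOSQL_USERNAME, GIZMOSQL_PASSWORD, GIZMOSQL_TLS_SKIP_VERIFY) are hypothetical. Only the builder calls .config(key, value) and .getOrCreate() come from the example above.

```python
import os
from typing import Any, Dict, Mapping

# Hypothetical mapping from environment variable names to the config keys
# shown above; the variable names are an assumption of this sketch.
ENV_TO_KEY = {
    "GIZMOSQL_URI": "gizmosql.uri",
    "GIZMOSQL_USERNAME": "gizmosql.username",
    "GIZMOSQL_PASSWORD": "gizmosql.password",
}


def config_from_env(env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Collect gizmosql.* settings from the environment, skipping unset ones."""
    config: Dict[str, Any] = {
        key: env[name] for name, key in ENV_TO_KEY.items() if env.get(name)
    }
    # Interpret "1", "true", or "yes" (case-insensitive) as True.
    flag = env.get("GIZMOSQL_TLS_SKIP_VERIFY", "").strip().lower()
    if flag in {"1", "true", "yes"}:
        config["gizmosql.tls_skip_verify"] = True
    return config


def build_session(config: Mapping[str, Any]):
    """Apply each setting with .config(key, value), then call getOrCreate()."""
    # Imported lazily so the helper above works without the package installed.
    from sqlframe_gizmosql import GizmoSQLSession

    builder = GizmoSQLSession.builder
    for key, value in config.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

With the variables exported in the shell, build_session(config_from_env()) would then take the place of the hard-coded builder chain.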

Using PySpark Imports (activate mode)

You can use the activate() function to enable standard PySpark imports while running on GizmoSQL:

from sqlframe_gizmosql import activate

# Activate GizmoSQL as the backend
activate(
    uri="grpc+tls://localhost:31337",
    username="gizmosql_user",
    password="gizmosql_password",
    tls_skip_verify=True  # For self-signed certificates
)

# Now use standard PySpark imports!
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Create DataFrame and use PySpark-like functions
df = spark.createDataFrame([
    (1, "alice", 100),
    (2, "bob", 200),
    (3, "alice", 150),
], ["id", "name", "amount"])

# Use functions like F.upper, F.sum, F.col, etc.
result = df.select(
    F.col("id"),
    F.upper(F.col("name")).alias("name_upper"),
    F.col("amount")
)
result.show()

# Aggregations
df.groupBy("name").agg(
    F.sum("amount").alias("total"),
    F.count("*").alias("count")
).show()

You can also activate with an existing connection:

from sqlframe_gizmosql import activate, GizmoSQLSession

# Create session first
session = GizmoSQLSession.builder \
    .config("gizmosql.uri", "grpc+tls://localhost:31337") \
    .config("gizmosql.username", "gizmosql_user") \
    .config("gizmosql.password", "gizmosql_password") \
    .config("gizmosql.tls_skip_verify", True) \
    .getOrCreate()

# Activate with existing connection
activate(conn=session._conn)

# Use PySpark imports
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

Configuration Options

Option                   | Description                                                        | Default
gizmosql.uri             | GizmoSQL server URI (grpc://host:port or grpc+tls://host:port)     | grpc://localhost:31337
gizmosql.username        | Username for authentication                                        | None
gizmosql.password        | Password for authentication                                        | None
gizmosql.tls_skip_verify | Skip TLS certificate verification (for self-signed certs)          | False
gizmosql.auth_type       | Authentication type (e.g., "external" for browser-based OAuth/SSO) | None
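
A malformed URI is easier to diagnose before it reaches the driver. The sketch below checks a value against the two URI forms listed above using only the standard library. The helper name parse_gizmosql_uri is hypothetical and not part of the package, and treating the port as mandatory is an assumption based on the host:port form shown in the table.

```python
from typing import Tuple
from urllib.parse import urlsplit

# The two URI schemes listed in the options table.
ALLOWED_SCHEMES = {"grpc", "grpc+tls"}


def parse_gizmosql_uri(uri: str) -> Tuple[str, str, int]:
    """Return (scheme, host, port) or raise ValueError for a malformed URI."""
    parts = urlsplit(uri)
    if parts.scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"unsupported scheme {parts.scheme!r} in {uri!r}")
    # parts.port itself raises ValueError for a non-numeric or out-of-range port.
    if not parts.hostname or parts.port is None:
        raise ValueError(f"expected host:port in {uri!r}")
    return parts.scheme, parts.hostname, parts.port


print(parse_gizmosql_uri("grpc+tls://localhost:31337"))
# ('grpc+tls', 'localhost', 31337)
```

A scheme of grpc+tls is also a natural place to decide whether gizmosql.tls_skip_verify is relevant at all, since that option only concerns TLS certificate checks.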

OAuth/SSO Authentication

GizmoSQL supports browser-based OAuth/SSO via auth_type="external". With external auth, no username or password is needed; a browser window opens for authentication:

from sqlframe_gizmosql import GizmoSQLSession

session = GizmoSQLSession.builder \
    .config("gizmosql.uri", "grpc+tls://gizmosql.example.com:31337") \
    .config("gizmosql.auth_type", "external") \
    .config("gizmosql.tls_skip_verify", True) \
    .getOrCreate()

Or with activate mode:

from sqlframe_gizmosql import activate

activate(
    uri="grpc+tls://gizmosql.example.com:31337",
    auth_type="external",
    tls_skip_verify=True
)

from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

Features

  • Full PySpark DataFrame API compatibility via SQLFrame
  • Arrow Flight SQL protocol for high-performance data transfer
  • Support for reading/writing various file formats (Parquet, CSV, JSON)
  • Window functions
  • Aggregations and groupBy operations
  • Joins
  • UDF registration
  • Catalog operations

Running GizmoSQL with Docker

You can run GizmoSQL locally using Docker:

docker run -d \
    --name gizmosql \
    -p 31337:31337 \
    -e GIZMOSQL_USERNAME=gizmosql_user \
    -e GIZMOSQL_PASSWORD=gizmosql_password \
    -e DATABASE_FILENAME=/tmp/test.duckdb \
    -e TLS_ENABLED=1 \
    gizmodata/gizmosql:latest

For TLS connections, use grpc+tls:// in the URI and set gizmosql.tls_skip_verify to True for self-signed certificates.
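
Because docker run -d returns before the server is ready, a script that connects immediately afterwards may fail. One way to handle this, sketched below with only the standard library, is to poll the published port until it accepts TCP connections. The helper name wait_for_port and its timeout values are hypothetical choices for this example, not part of GizmoSQL or this package.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, else False."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect only shows the port is open; it does not
            # prove the Flight SQL service has finished starting.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            # Connection refused or timed out: wait briefly and retry.
            time.sleep(0.5)
    return False
```

For the container above, wait_for_port("localhost", 31337) could be called before creating the session, with the script exiting with a clear message if it returns False.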

Development

Setup

# Clone the repository
git clone https://github.com/gizmodata/sqlframe-gizmosql.git
cd sqlframe-gizmosql

# Create a virtual environment
python -m venv .venv
source .venv/bin/activate

# Install dev dependencies
pip install -e ".[dev]"

Running Tests

# Run unit tests
pytest tests/unit

# Run integration tests (requires GizmoSQL server)
pytest tests/integration

Code Quality

# Run linting
ruff check .

# Run formatting
ruff format .

License

Apache License 2.0

Related Projects

  • SQLFrame - PySpark-like DataFrame API for multiple SQL backends
  • GizmoSQL - Database server using DuckDB with Arrow Flight SQL interface
  • sqlmesh-gizmosql - GizmoSQL adapter for SQLMesh
