sqlframe-gizmosql
GizmoSQL adapter for SQLFrame - a PySpark-like DataFrame API for GizmoSQL.
Overview
This package provides a GizmoSQL backend for SQLFrame, allowing you to use PySpark-compatible DataFrame operations against a GizmoSQL server. GizmoSQL is a database server that uses DuckDB as its execution engine with an Arrow Flight SQL interface.
Installation
pip install sqlframe-gizmosql
Requirements
- Python >= 3.10
- GizmoSQL server running and accessible
Quick Start
First, start a GizmoSQL server (see Running GizmoSQL with Docker below), then:
from sqlframe_gizmosql import GizmoSQLSession
# Create a session connected to GizmoSQL
session = GizmoSQLSession.builder \
    .config("gizmosql.uri", "grpc+tls://localhost:31337") \
    .config("gizmosql.username", "gizmosql_user") \
    .config("gizmosql.password", "gizmosql_password") \
    .config("gizmosql.tls_skip_verify", True) \
    .getOrCreate()
# Create a DataFrame from a SQL query
df = session.sql("SELECT 1 as id, 'hello' as message")
# Show the results
df.show()
# Use PySpark-like DataFrame API
df2 = session.createDataFrame([
    (1, "Alice", 30),
    (2, "Bob", 25),
    (3, "Charlie", 35),
], ["id", "name", "age"])
# Filter, select, and aggregate
result = df2.filter("age > 25").select("name", "age")
result.show()
# Group by and aggregate
df2.groupBy("age").count().show()
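SQLFrame works by turning DataFrame calls into SQL that the backend executes. As a rough illustration of what the two queries above compute, the sketch below runs approximately equivalent SQL against an in-memory SQLite database from the Python standard library. SQLite is only a stand-in here, the table name `people` is hypothetical, and the SQL is hand-written; it is not the SQL that SQLFrame actually generates for GizmoSQL.

```python
import sqlite3

# In-memory SQLite stands in for the server; this is NOT GizmoSQL or DuckDB.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [(1, "Alice", 30), (2, "Bob", 25), (3, "Charlie", 35)],
)

# Roughly what df2.filter("age > 25").select("name", "age") asks for.
filtered = conn.execute(
    "SELECT name, age FROM people WHERE age > 25 ORDER BY id"
).fetchall()

# Roughly what df2.groupBy("age").count() asks for.
counts = conn.execute(
    "SELECT age, COUNT(*) AS row_count FROM people GROUP BY age ORDER BY age"
).fetchall()

print(filtered)
print(counts)
conn.close()
```

The `ORDER BY` clauses are there only to make the printed output deterministic; the DataFrame calls above do not request any ordering.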
Configuration
The session can be configured using the builder pattern:
session = GizmoSQLSession.builder \
    .config("gizmosql.uri", "grpc+tls://localhost:31337") \
    .config("gizmosql.username", "gizmosql_user") \
    .config("gizmosql.password", "gizmosql_password") \
    .config("gizmosql.tls_skip_verify", True) \
    .getOrCreate()
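Hard-coded credentials are easy to leak, so one option is to collect the `gizmosql.*` options from environment variables and feed them to the builder in a loop. This is a minimal sketch under assumptions: the variable names `GIZMOSQL_URI`, `GIZMOSQL_USERNAME`, `GIZMOSQL_PASSWORD` and `GIZMOSQL_TLS_SKIP_VERIFY`, and the helpers `options_from_env` and `apply_options`, are hypothetical and not part of sqlframe-gizmosql. Only the option keys and the `.config(key, value)` call come from the example above.

```python
import os
from typing import Any, Mapping

# Hypothetical mapping from environment variable names to option keys.
_ENV_TO_OPTION = {
    "GIZMOSQL_URI": "gizmosql.uri",
    "GIZMOSQL_USERNAME": "gizmosql.username",
    "GIZMOSQL_PASSWORD": "gizmosql.password",
}
_TRUTHY = {"1", "true", "yes"}


def options_from_env(env: Mapping[str, str] = os.environ) -> dict[str, Any]:
    """Collect gizmosql.* options from the (assumed) environment variables."""
    options: dict[str, Any] = {
        option: env[name] for name, option in _ENV_TO_OPTION.items() if name in env
    }
    if "GIZMOSQL_TLS_SKIP_VERIFY" in env:
        flag = env["GIZMOSQL_TLS_SKIP_VERIFY"].strip().lower()
        options["gizmosql.tls_skip_verify"] = flag in _TRUTHY
    return options


def apply_options(builder, options: Mapping[str, Any]):
    """Call builder.config(key, value) for each option and return the builder."""
    for key, value in options.items():
        builder = builder.config(key, value)
    return builder
```

With the package installed, `apply_options(GizmoSQLSession.builder, options_from_env()).getOrCreate()` would then create the session; options whose variables are unset are simply not passed.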
Using PySpark Imports (activate mode)
You can use the activate() function to enable standard PySpark imports while running on GizmoSQL:
from sqlframe_gizmosql import activate
# Activate GizmoSQL as the backend
activate(
    uri="grpc+tls://localhost:31337",
    username="gizmosql_user",
    password="gizmosql_password",
    tls_skip_verify=True,  # For self-signed certificates
)
# Now use standard PySpark imports!
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
spark = SparkSession.builder.getOrCreate()
# Create DataFrame and use PySpark-like functions
df = spark.createDataFrame([
    (1, "alice", 100),
    (2, "bob", 200),
    (3, "alice", 150),
], ["id", "name", "amount"])
# Use functions like F.upper, F.sum, F.col, etc.
result = df.select(
    F.col("id"),
    F.upper(F.col("name")).alias("name_upper"),
    F.col("amount"),
)
result.show()
# Aggregations
df.groupBy("name").agg(
    F.sum("amount").alias("total"),
    F.count("*").alias("count"),
).show()
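When trying a new backend, it can help to know in advance what the aggregation should return. The sketch below recomputes the per-name totals and counts for the same three rows in plain Python, with no GizmoSQL connection. It is only a cross-check of the expected values, not a description of how SQLFrame or GizmoSQL evaluate the query.

```python
from collections import defaultdict

# Same rows as the DataFrame above: (id, name, amount).
rows = [(1, "alice", 100), (2, "bob", 200), (3, "alice", 150)]

totals = defaultdict(int)
counts = defaultdict(int)
for _id, name, amount in rows:
    totals[name] += amount
    counts[name] += 1

# name -> (total, count), mirroring the "total" and "count" aliases.
expected = {name: (totals[name], counts[name]) for name in totals}
print(expected)
```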
You can also activate with an existing connection:
from sqlframe_gizmosql import activate, GizmoSQLSession
# Create session first
session = GizmoSQLSession.builder \
    .config("gizmosql.uri", "grpc+tls://localhost:31337") \
    .config("gizmosql.username", "gizmosql_user") \
    .config("gizmosql.password", "gizmosql_password") \
    .config("gizmosql.tls_skip_verify", True) \
    .getOrCreate()
# Activate with existing connection
activate(conn=session._conn)
# Use PySpark imports
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
Configuration Options
| Option | Description | Default |
|---|---|---|
| gizmosql.uri | GizmoSQL server URI (grpc://host:port or grpc+tls://host:port) | grpc://localhost:31337 |
| gizmosql.username | Username for authentication | None |
| gizmosql.password | Password for authentication | None |
| gizmosql.tls_skip_verify | Skip TLS certificate verification (for self-signed certs) | False |
| gizmosql.auth_type | Authentication type (e.g., "external" for browser-based OAuth/SSO) | None |
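The URI scheme decides whether TLS is used, so checking it before connecting may give a clearer error than a failed connection attempt. The following is a minimal sketch using only the standard library. `parse_gizmosql_uri` is a hypothetical helper, not part of sqlframe-gizmosql, and falling back to port 31337 is an assumption taken from the default URI in the table.

```python
from urllib.parse import urlsplit

# The two URI forms listed in the options table.
ALLOWED_SCHEMES = {"grpc", "grpc+tls"}
ASSUMED_DEFAULT_PORT = 31337  # assumption: taken from the default URI


def parse_gizmosql_uri(uri: str) -> dict:
    """Split a GizmoSQL URI into host, port and a TLS flag, or raise ValueError."""
    parts = urlsplit(uri)
    if parts.scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"expected grpc:// or grpc+tls://, got {uri!r}")
    if not parts.hostname:
        raise ValueError(f"no host in URI {uri!r}")
    return {
        "host": parts.hostname,
        "port": parts.port or ASSUMED_DEFAULT_PORT,
        "tls": parts.scheme == "grpc+tls",
    }


print(parse_gizmosql_uri("grpc+tls://localhost:31337"))
```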
OAuth/SSO Authentication
GizmoSQL supports browser-based OAuth/SSO via auth_type="external". With external auth, no username or password is needed; a browser window opens for authentication:
from sqlframe_gizmosql import GizmoSQLSession
session = GizmoSQLSession.builder \
    .config("gizmosql.uri", "grpc+tls://gizmosql.example.com:31337") \
    .config("gizmosql.auth_type", "external") \
    .config("gizmosql.tls_skip_verify", True) \
    .getOrCreate()
Or with activate mode:
from sqlframe_gizmosql import activate
activate(
    uri="grpc+tls://gizmosql.example.com:31337",
    auth_type="external",
    tls_skip_verify=True,
)
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
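The two authentication modes use different option sets, which is easy to mix up when the mode is chosen at runtime. Below is a minimal sketch of a helper that builds the `gizmosql.*` options for either mode. `auth_options` is hypothetical and not part of sqlframe-gizmosql, and rejecting a username or password together with external auth is a design choice of this sketch; the text above only says they are not needed.

```python
from typing import Optional


def auth_options(
    auth_type: Optional[str] = None,
    username: Optional[str] = None,
    password: Optional[str] = None,
) -> dict:
    """Return only the authentication options that were actually provided."""
    if auth_type == "external" and (username is not None or password is not None):
        # Sketch-level policy: fail early instead of silently ignoring credentials.
        raise ValueError("external auth does not take a username or password")
    candidates = {
        "gizmosql.auth_type": auth_type,
        "gizmosql.username": username,
        "gizmosql.password": password,
    }
    return {key: value for key, value in candidates.items() if value is not None}


print(auth_options(auth_type="external"))
print(auth_options(username="gizmosql_user", password="gizmosql_password"))
```

Each key in the returned dictionary can be passed to the builder with `.config(key, value)`.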
Features
- Full PySpark DataFrame API compatibility via SQLFrame
- Arrow Flight SQL protocol for high-performance data transfer
- Support for reading/writing various file formats (Parquet, CSV, JSON)
- Window functions
- Aggregations and groupBy operations
- Joins
- UDF registration
- Catalog operations
Running GizmoSQL with Docker
You can run GizmoSQL locally using Docker:
docker run -d \
  --name gizmosql \
  -p 31337:31337 \
  -e GIZMOSQL_USERNAME=gizmosql_user \
  -e GIZMOSQL_PASSWORD=gizmosql_password \
  -e DATABASE_FILENAME=/tmp/test.duckdb \
  -e TLS_ENABLED=1 \
  gizmodata/gizmosql:latest
For TLS connections, use grpc+tls:// in the URI. If the server uses a self-signed certificate, set gizmosql.tls_skip_verify to True; note that this turns off certificate verification for the connection.
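A freshly started container may need a moment before it accepts connections, so a script that starts it and connects immediately can fail. Below is a minimal sketch that polls the TCP port with the standard library before creating a session. `wait_for_port` is a hypothetical helper, and an open port only shows that something is listening, not that GizmoSQL has finished starting or that the credentials are valid.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection succeeds, or False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Open and immediately close a plain TCP connection (no TLS, no auth).
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For the container above, `wait_for_port("localhost", 31337)` could be called before building the session.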
Development
Setup
# Clone the repository
git clone https://github.com/gizmodata/sqlframe-gizmosql.git
cd sqlframe-gizmosql
# Create a virtual environment
python -m venv .venv
source .venv/bin/activate
# Install dev dependencies
pip install -e ".[dev]"
Running Tests
# Run unit tests
pytest tests/unit
# Run integration tests (requires GizmoSQL server)
pytest tests/integration
Code Quality
# Run linting
ruff check .
# Run formatting
ruff format .
License
Apache License 2.0
Related Projects
- SQLFrame - PySpark-like DataFrame API for multiple SQL backends
- GizmoSQL - Database server using DuckDB with Arrow Flight SQL interface
- sqlmesh-gizmosql - GizmoSQL adapter for SQLMesh