sqlframe-gizmosql

GizmoSQL adapter for SQLFrame - a PySpark-like DataFrame API for GizmoSQL.

Overview

This package provides a GizmoSQL backend for SQLFrame, allowing you to use PySpark-compatible DataFrame operations against a GizmoSQL server. GizmoSQL is a database server that uses DuckDB as its execution engine with an Arrow Flight SQL interface.

Installation

pip install sqlframe-gizmosql

Requirements

  • Python >= 3.10
  • GizmoSQL server running and accessible

Quick Start

First, start a GizmoSQL server (see Running GizmoSQL with Docker below), then:

from sqlframe_gizmosql import GizmoSQLSession

# Create a session connected to GizmoSQL
session = GizmoSQLSession.builder \
    .config("gizmosql.uri", "grpc+tls://localhost:31337") \
    .config("gizmosql.username", "gizmosql_username") \
    .config("gizmosql.password", "gizmosql_password") \
    .config("gizmosql.tls_skip_verify", True) \
    .getOrCreate()

# Create a DataFrame from a SQL query
df = session.sql("SELECT 1 as id, 'hello' as message")

# Show the results
df.show()

# Use PySpark-like DataFrame API
df2 = session.createDataFrame([
    (1, "Alice", 30),
    (2, "Bob", 25),
    (3, "Charlie", 35),
], ["id", "name", "age"])

# Filter rows and select columns
result = df2.filter("age > 25").select("name", "age")
result.show()

# Group by and aggregate
df2.groupBy("age").count().show()

Configuration

The session can be configured using the builder pattern:

session = GizmoSQLSession.builder \
    .config("gizmosql.uri", "grpc+tls://localhost:31337") \
    .config("gizmosql.username", "gizmosql_username") \
    .config("gizmosql.password", "gizmosql_password") \
    .config("gizmosql.tls_skip_verify", True) \
    .getOrCreate()

Using PySpark Imports (activate mode)

You can use the activate() function to enable standard PySpark imports while running on GizmoSQL:

from sqlframe_gizmosql import activate

# Activate GizmoSQL as the backend
activate(
    uri="grpc+tls://localhost:31337",
    username="gizmosql_username",
    password="gizmosql_password",
    tls_skip_verify=True  # For self-signed certificates
)

# Now use standard PySpark imports!
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Create DataFrame and use PySpark-like functions
df = spark.createDataFrame([
    (1, "alice", 100),
    (2, "bob", 200),
    (3, "alice", 150),
], ["id", "name", "amount"])

# Use functions like F.upper, F.sum, F.col, etc.
result = df.select(
    F.col("id"),
    F.upper(F.col("name")).alias("name_upper"),
    F.col("amount")
)
result.show()

# Aggregations
df.groupBy("name").agg(
    F.sum("amount").alias("total"),
    F.count("*").alias("count")
).show()

You can also activate with an existing connection:

from sqlframe_gizmosql import activate, GizmoSQLSession

# Create session first
session = GizmoSQLSession.builder \
    .config("gizmosql.uri", "grpc+tls://localhost:31337") \
    .config("gizmosql.username", "gizmosql_username") \
    .config("gizmosql.password", "gizmosql_password") \
    .config("gizmosql.tls_skip_verify", True) \
    .getOrCreate()

# Activate with existing connection
activate(conn=session._conn)

# Use PySpark imports
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
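
The example above reads session._conn, and the leading underscore marks that attribute as non-public by Python convention, so it may change between releases. As a minimal sketch, a small guard can turn a missing attribute into a readable error. The helper name get_session_connection is hypothetical and not part of sqlframe-gizmosql; the sketch assumes only that the attribute is called _conn, as shown above.

```python
def get_session_connection(session):
    """Return the connection stored on a session, or raise a clear error.

    Hypothetical helper: it assumes the connection lives on the private
    attribute `_conn`, as in the README example, and nothing else.
    """
    conn = getattr(session, "_conn", None)
    if conn is None:
        raise AttributeError(
            "session has no usable '_conn' attribute; the private attribute "
            "may have been renamed in this version of sqlframe-gizmosql"
        )
    return conn


# Intended use (requires the package and a running server):
#   activate(conn=get_session_connection(session))
```

If the attribute is ever renamed, the failure then points at the cause instead of surfacing later inside activate().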

Configuration Options

Option                    Description                                                     Default
gizmosql.uri              GizmoSQL server URI (grpc://host:port or grpc+tls://host:port)  grpc://localhost:31337
gizmosql.username         Username for authentication                                     None
gizmosql.password         Password for authentication                                     None
gizmosql.tls_skip_verify  Skip TLS certificate verification (for self-signed certs)       False
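
To keep credentials out of source code, the options above can be collected from environment variables before they are passed to the builder. The following is a minimal sketch that uses only the standard library. The function name load_gizmosql_config and the client-side GIZMOSQL_* variable names are assumptions made for illustration; nothing here implies that the package reads these variables itself. The fallback values are the defaults listed in the table.

```python
import os
from urllib.parse import urlparse

# Default URI copied from the Configuration Options table.
DEFAULT_URI = "grpc://localhost:31337"
# URI schemes listed in the table.
ALLOWED_SCHEMES = ("grpc", "grpc+tls")


def load_gizmosql_config(env=None):
    """Build a dict of gizmosql.* options from environment variables.

    The GIZMOSQL_* names are hypothetical, chosen for this sketch.
    """
    env = os.environ if env is None else env

    uri = env.get("GIZMOSQL_URI", DEFAULT_URI)
    scheme = urlparse(uri).scheme
    if scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"unsupported URI scheme {scheme!r} in {uri!r}")

    # Treat "1", "true" and "yes" (any case) as True; anything else is False.
    skip_raw = env.get("GIZMOSQL_TLS_SKIP_VERIFY", "")
    skip_verify = skip_raw.strip().lower() in ("1", "true", "yes")

    return {
        "gizmosql.uri": uri,
        "gizmosql.username": env.get("GIZMOSQL_USERNAME"),
        "gizmosql.password": env.get("GIZMOSQL_PASSWORD"),
        "gizmosql.tls_skip_verify": skip_verify,
    }


# Intended use (requires the package and a running server):
#   builder = GizmoSQLSession.builder
#   for key, value in load_gizmosql_config().items():
#       if value is not None:
#           builder = builder.config(key, value)
#   session = builder.getOrCreate()
```

Checking the URI scheme up front means a typo such as http:// fails with a clear message before any connection attempt is made.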

Features

  • Full PySpark DataFrame API compatibility via SQLFrame
  • Arrow Flight SQL protocol for high-performance data transfer
  • Support for reading/writing various file formats (Parquet, CSV, JSON)
  • Window functions
  • Aggregations and groupBy operations
  • Joins
  • UDF registration
  • Catalog operations

Running GizmoSQL with Docker

You can run GizmoSQL locally using Docker:

docker run -d \
    --name gizmosql \
    -p 31337:31337 \
    -e GIZMOSQL_USERNAME=gizmosql_username \
    -e GIZMOSQL_PASSWORD=gizmosql_password \
    -e DATABASE_FILENAME=/tmp/test.duckdb \
    -e TLS_ENABLED=1 \
    gizmodata/gizmosql:latest

The command above enables TLS (TLS_ENABLED=1), so use grpc+tls:// in the URI. If the server presents a self-signed certificate, also set gizmosql.tls_skip_verify to True.
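
A container started with docker run -d may need a moment before it accepts connections. As a minimal sketch, the helper below polls the published port until a TCP connection succeeds. The function name wait_for_port and its timeout values are assumptions for illustration. The check confirms only that something is listening on the port, not that the Flight SQL service is ready or that the credentials are valid.

```python
import socket
import time


def wait_for_port(host="localhost", port=31337, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds.

    Returns False if no connection succeeds before `timeout` seconds elapse.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Open and immediately close a plain TCP connection.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Intended use before creating a session:
#   if not wait_for_port("localhost", 31337):
#       raise RuntimeError("GizmoSQL server did not become reachable")
```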

Development

Setup

# Clone the repository
git clone https://github.com/gizmodata/sqlframe-gizmosql.git
cd sqlframe-gizmosql

# Create a virtual environment
python -m venv .venv
source .venv/bin/activate

# Install dev dependencies
pip install -e ".[dev]"

Running Tests

# Run unit tests
pytest tests/unit

# Run integration tests (requires GizmoSQL server)
pytest tests/integration

Code Quality

# Run linting
ruff check .

# Run formatting
ruff format .

License

Apache License 2.0

Related Projects

  • SQLFrame - PySpark-like DataFrame API for multiple SQL backends
  • GizmoSQL - Database server using DuckDB with Arrow Flight SQL interface
  • sqlmesh-gizmosql - GizmoSQL adapter for SQLMesh
