A Flight SQL proxy for Delta Lake

flydelta

A Flight SQL proxy for Delta Lake. Query Delta tables via Apache Arrow Flight with efficient streaming and predicate pushdown.

To keep things simple, flydelta is read-only on existing data and has no authentication logic.

Why?

When multiple client applications query a Delta Lake storage backend (on S3, disk, etc.), having each client read Parquet files directly from the source storage is inefficient because of the network traffic involved, even with predicate pushdown.

flydelta solves this by acting as a query proxy deployed close to the data.

[Figure: Architecture]

Installation

pip install flydelta

Usage

Server

Start a flydelta server with Delta tables:

flydelta serve -t users=s3://bucket/users -t orders=/data/orders

Options:

flydelta serve \
  --host 0.0.0.0 \
  --port 8815 \
  --table users=s3://bucket/users \
  --table orders=/data/orders \
  --pool-size 20 \
  --batch-size 100000
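
The `--table` option pairs a table name with a storage location in `name=location` form. As a rough illustration of that format only (this is not flydelta's actual argument handling), splitting such specs might look like the sketch below. The `parse_table_spec` helper is hypothetical and exists only for this example.

```python
def parse_table_spec(spec: str) -> tuple[str, str]:
    """Split a 'name=location' spec on the first '=' only.

    Hypothetical helper for illustration; flydelta's real parsing may differ.
    """
    name, sep, location = spec.partition("=")
    if not sep or not name or not location:
        raise ValueError(f"expected name=location, got {spec!r}")
    return name, location


# The same specs used in the serve examples above.
specs = ["users=s3://bucket/users", "orders=/data/orders"]
tables = dict(parse_table_spec(s) for s in specs)
print(tables)
```

Splitting on the first `=` only keeps locations that themselves contain `=` (for example, in query strings) intact.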

Docker

docker build -t flydelta .
docker run -p 8815:8815 flydelta -t users=/data/users -t orders=/data/orders

Python Client

from flydelta import Client

with Client("grpc://localhost:8815") as client:
    # Query to pandas DataFrame
    df = client.query_df("SELECT * FROM users WHERE active = true")

    # Query to polars DataFrame
    df = client.query_polars("SELECT * FROM users LIMIT 1000")

    # Query to Arrow table
    table = client.query("SELECT COUNT(*) FROM orders")

    # List available tables
    tables = client.list_tables()

Streaming Large Results

For memory-efficient processing of large result sets:

from flydelta import Client

with Client("grpc://localhost:8815") as client:
    for batch in client.stream_query("SELECT * FROM huge_table"):
        # Process each batch (default 100k rows)
        for row in batch.to_pylist():
            process(row)  # process() is a placeholder for your own row handler

        # Or process columnar (faster)
        ids = batch.column('id')
        values = batch.column('value')
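
To illustrate why batch-wise iteration keeps memory bounded, here is a self-contained sketch that does not use flydelta at all. A hypothetical `fake_stream` generator stands in for `client.stream_query`, and it yields plain dicts of column lists rather than Arrow record batches. The running aggregates only ever hold one batch in memory at a time.

```python
from typing import Iterator


def fake_stream(total_rows: int, batch_size: int) -> Iterator[dict[str, list[int]]]:
    """Yield column-oriented batches of at most `batch_size` rows.

    Hypothetical stand-in for a real result stream.
    """
    for start in range(0, total_rows, batch_size):
        stop = min(start + batch_size, total_rows)
        ids = list(range(start, stop))
        yield {"id": ids, "value": [i * 2 for i in ids]}


row_count = 0
value_sum = 0
for batch in fake_stream(total_rows=250, batch_size=100):
    # Columnar access: aggregate a whole column instead of looping row by row.
    row_count += len(batch["id"])
    value_sum += sum(batch["value"])

print(row_count, value_sum)
```

With real Arrow record batches the same pattern applies, with column-level operations replacing the plain `sum` calls.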

CLI Client

# Query with table output
flydelta query "SELECT * FROM users LIMIT 10"

# Query with JSON output
flydelta query "SELECT * FROM users" -o json

# Query with CSV output
flydelta query "SELECT * FROM users" -o csv

# List tables
flydelta tables

Architecture

flydelta uses:

  • delta-rs: Rust-based Delta Lake implementation (no Spark needed)
  • DuckDB: Fast SQL execution with predicate pushdown
  • Apache Arrow Flight: Efficient gRPC-based data transfer

On startup, flydelta:

  1. Loads Delta table metadata
  2. Creates a connection pool with tables pre-registered
  3. Caches schemas for fast query planning

Queries are executed via DuckDB and streamed back as Arrow record batches.
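
The startup steps above (pre-registered tables in a connection pool, plus a schema cache) can be sketched with the standard library alone. This is an assumption-laden illustration, not flydelta's implementation: `sqlite3` stands in for DuckDB, a small in-memory table stands in for a Delta table, and the `ConnectionPool` class is hypothetical.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """Fixed-size pool of in-memory connections, each with the same tables pre-registered."""

    def __init__(self, size: int, setup_sql: list[str]) -> None:
        self._pool: queue.Queue = queue.Queue(maxsize=size)
        for _ in range(size):
            conn = sqlite3.connect(":memory:", check_same_thread=False)
            for statement in setup_sql:
                conn.execute(statement)
            self._pool.put(conn)

    @contextmanager
    def connection(self):
        conn = self._pool.get()  # blocks while all connections are in use
        try:
            yield conn
        finally:
            self._pool.put(conn)  # always hand the connection back


# Stand-in for "loading table metadata": each connection gets its own copy.
setup = [
    "CREATE TABLE users (id INTEGER, active INTEGER)",
    "INSERT INTO users VALUES (1, 1), (2, 0), (3, 1)",
]
pool = ConnectionPool(size=2, setup_sql=setup)

# Cache column names once at startup so later requests can skip the lookup.
with pool.connection() as conn:
    schema_cache = {
        "users": [row[1] for row in conn.execute("PRAGMA table_info(users)")]
    }

with pool.connection() as conn:
    active = conn.execute("SELECT COUNT(*) FROM users WHERE active = 1").fetchone()[0]

print(schema_cache, active)
```

Unlike this sketch, where every connection holds a private copy of the data, a real deployment would have each pooled connection point at the same underlying Delta tables.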

Development

This package uses poetry for packaging and dependency management.

# Clone and install
git clone https://github.com/dataresearchcenter/flydelta.git
cd flydelta
poetry install --with dev

# Setup pre-commit hooks
poetry run pre-commit install

# Run tests
make test

# Run linting
make lint

Disclaimer

Despite the name suggesting otherwise, flydelta has no affiliation with Delta Air Lines. We cannot help you book flights, upgrade your SkyMiles status, or locate your lost luggage. Actually, please stop flying at all if possible. 🌱

License

flydelta is licensed under the AGPLv3 or later license. See LICENSE.
