flydelta
A Flight SQL proxy for Delta Lake. Query Delta tables via Apache Arrow Flight with efficient streaming and predicate pushdown.
flydelta only reads existing data and, to stay simple, has no authentication logic.
Why?
When multiple client applications query a Delta Lake storage backend (on S3, local disk, etc.), having each client read Parquet files directly from the source storage is inefficient because of the network traffic involved, even with predicate pushdown.
flydelta addresses this by acting as a query proxy deployed close to the data: clients send SQL and receive only the resulting Arrow record batches.
Installation
```shell
pip install flydelta
```
Usage
Server
Start a flydelta server with Delta tables:
```shell
flydelta serve -t users=s3://bucket/users -t orders=/data/orders
```
Options:
```shell
flydelta serve \
  --host 0.0.0.0 \
  --port 8815 \
  --table users=s3://bucket/users \
  --table orders=/data/orders \
  --pool-size 20 \
  --batch-size 100000
```
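The `-t`/`--table` option can be repeated, and each value pairs a table name with a storage location as `NAME=URI`. The following is a hypothetical sketch of how such options could be parsed with Python's standard `argparse`; it is not flydelta's actual CLI code, and the helpers `parse_table_specs` and `build_parser` are illustrative names. No defaults are set because flydelta's real defaults are not stated here.

```python
import argparse
from typing import Optional


def parse_table_specs(specs: Optional[list[str]]) -> dict[str, str]:
    """Map ["users=s3://bucket/users", ...] to {"users": "s3://bucket/users", ...}.

    Splits on the first "=" only, so URIs that themselves contain "=" stay intact.
    """
    tables: dict[str, str] = {}
    for spec in specs or []:
        name, sep, uri = spec.partition("=")
        if not sep or not name or not uri:
            raise ValueError(f"expected NAME=URI, got {spec!r}")
        if name in tables:
            raise ValueError(f"duplicate table name {name!r}")
        tables[name] = uri
    return tables


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented `serve` options."""
    parser = argparse.ArgumentParser(prog="serve")
    parser.add_argument("--host")
    parser.add_argument("--port", type=int)
    # action="append" collects every -t/--table occurrence into one list
    parser.add_argument("-t", "--table", action="append", dest="tables")
    parser.add_argument("--pool-size", type=int)
    parser.add_argument("--batch-size", type=int)
    return parser


args = build_parser().parse_args(
    ["-t", "users=s3://bucket/users", "--table", "orders=/data/orders", "--port", "8815"]
)
print(parse_table_specs(args.tables))  # {'users': 's3://bucket/users', 'orders': '/data/orders'}
```

Splitting on the first `=` only is a deliberate choice in this sketch: object-store URIs can carry query parameters, and those must not be cut off.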
Docker
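```shell
docker build -t flydelta .
# Table paths such as /data/users refer to the container's filesystem;
# mount host data as needed, e.g. with -v /data:/data
docker run -p 8815:8815 flydelta -t users=/data/users -t orders=/data/orders
```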
Python Client
```python
from flydelta import Client

with Client("grpc://localhost:8815") as client:
    # Query into an Arrow table
    table = client.query("SELECT * FROM users WHERE active = true")

    # Convert to a pandas DataFrame
    df = table.to_pandas()

    # List available tables
    tables = client.list_tables()
```
Streaming Large Results
For memory-efficient processing of large result sets:
```python
from flydelta import Client

with Client("grpc://localhost:8815") as client:
    for batch in client.stream_query("SELECT * FROM huge_table"):
        # Each batch is an Arrow record batch (default 100k rows)

        # Row-wise access; process() stands for your own handler
        for row in batch.to_pylist():
            process(row)

        # Or process columnar (faster)
        ids = batch.column("id")
        values = batch.column("value")
```
CLI Client
```shell
# Query with table output
flydelta query "SELECT * FROM users LIMIT 10"

# Query with JSON output
flydelta query "SELECT * FROM users" -o json

# Query with CSV output
flydelta query "SELECT * FROM users" -o csv

# List tables
flydelta tables
```
Architecture
flydelta uses:
- delta-rs: Rust-based Delta Lake implementation (no Spark needed)
- DuckDB: Fast SQL execution with predicate pushdown
- Apache Arrow Flight: Efficient gRPC-based data transfer
On startup, flydelta:
- Loads Delta table metadata
- Creates a connection pool with tables pre-registered
- Caches schemas for fast query planning
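The startup steps above pay setup costs once instead of on every query. A minimal sketch of a fixed-size pool whose connections are all created and prepared up front, assuming a `factory` callable; `ConnectionPool` and `factory` are hypothetical names, not flydelta's API. In a flydelta-like setup the factory would presumably open a DuckDB connection and register the Delta tables, and `--pool-size` would plausibly correspond to `size`, but both are assumptions.

```python
import queue
from contextlib import contextmanager
from typing import Callable, Generic, Iterator, Optional, TypeVar

T = TypeVar("T")


class ConnectionPool(Generic[T]):
    """Fixed-size pool whose connections are created and prepared up front.

    Hypothetical sketch. A flydelta-style factory might (assumption) look like:

        def make_conn():
            con = duckdb.connect()
            con.register("users", DeltaTable("s3://bucket/users").to_pyarrow_dataset())
            return con
    """

    def __init__(self, factory: Callable[[], T], size: int) -> None:
        if size < 1:
            raise ValueError("size must be at least 1")
        self._idle: "queue.Queue[T]" = queue.Queue(maxsize=size)
        for _ in range(size):
            # Expensive setup (e.g. registering tables) runs once per connection
            self._idle.put(factory())

    @contextmanager
    def connection(self, timeout: Optional[float] = None) -> Iterator[T]:
        """Borrow a connection; blocks (up to `timeout` seconds) while all are in use."""
        conn = self._idle.get(timeout=timeout)  # raises queue.Empty on timeout
        try:
            yield conn
        finally:
            # Always hand the connection back, even if the query raised
            self._idle.put(conn)
```

Borrowing through a context manager guarantees the connection is returned even when a query fails; under this sketch, at most `size` queries run concurrently and further requests wait.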
Queries are executed via DuckDB and streamed back as Arrow record batches.
Development
This package uses Poetry for packaging and dependency management.
```shell
# Clone and install
git clone https://github.com/dataresearchcenter/flydelta.git
cd flydelta
poetry install --with dev

# Set up pre-commit hooks
poetry run pre-commit install

# Run tests
make test

# Run linting
make lint
```
Disclaimer
Despite what the name suggests, flydelta has no affiliation with Delta Air Lines. We cannot help you book flights, upgrade your SkyMiles status, or locate your lost luggage. Actually, please avoid flying altogether if possible. 🌱
License
flydelta is licensed under the AGPLv3 or any later version. See LICENSE.