Thin Python wrapper for reading Delta tables from Azure Blob Storage with low and stable latency.

deltabridge

Thin Python wrapper for reading Delta tables from object storage (currently Azure Blob Storage) or a local filesystem, with low and stable latency. Optimized for repeated reads from long-running Python services. A typical use case is exposing the final products of a data pipeline via a REST API, where request latency should stay predictable.

Note: The low latency comes from two things: Delta tables are loaded through the Rust-based delta-rs library, and Delta transaction logs are cached automatically and updated incrementally.
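To illustrate the idea of incremental transaction log caching, here is a minimal standard-library sketch. It is not deltabridge's or delta-rs's implementation: the IncrementalLogCache class and write_commit helper are hypothetical, and the log format is heavily simplified (only add and remove actions, no checkpoints or metadata). It only shows why a long-running process that remembers the last applied commit has to read just the new commit files on each refresh.

```python
import json
import tempfile
from pathlib import Path


class IncrementalLogCache:
    """Hypothetical cache: tracks live data files and replays only unseen commits."""

    def __init__(self, log_dir: Path) -> None:
        self.log_dir = log_dir
        self.version = -1              # last commit version already applied
        self.files: set[str] = set()   # data files that currently make up the table
        self.commits_read = 0          # counter used to show the saving

    def refresh(self) -> set[str]:
        # Delta commit files are named by a zero-padded 20-digit version number.
        while True:
            commit = self.log_dir / f"{self.version + 1:020d}.json"
            if not commit.exists():
                break
            # Each line of a commit file is one JSON action.
            for line in commit.read_text().splitlines():
                action = json.loads(line)
                if "add" in action:
                    self.files.add(action["add"]["path"])
                elif "remove" in action:
                    self.files.discard(action["remove"]["path"])
            self.version += 1
            self.commits_read += 1
        return self.files


def write_commit(log_dir: Path, version: int, actions: list) -> None:
    # Demo helper only: writes one JSON action per line.
    path = log_dir / f"{version:020d}.json"
    path.write_text("\n".join(json.dumps(a) for a in actions))


log_dir = Path(tempfile.mkdtemp()) / "_delta_log"
log_dir.mkdir()
write_commit(log_dir, 0, [{"add": {"path": "part-0.parquet"}}])
write_commit(log_dir, 1, [{"add": {"path": "part-1.parquet"}}])

cache = IncrementalLogCache(log_dir)
first = sorted(cache.refresh())    # reads commits 0 and 1

write_commit(log_dir, 2, [
    {"remove": {"path": "part-0.parquet"}},
    {"add": {"path": "part-2.parquet"}},
])
second = sorted(cache.refresh())   # reads only commit 2
```

The second refresh touches a single commit file regardless of how long the log already is, which is what keeps repeated reads cheap.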

Installation

pip install deltabridge

Or, with uv:

uv add deltabridge

Usage

Examples

Azure

import os

import deltalake
import polars as pl

from deltabridge import PartitionFilterOperator
from deltabridge.azure import AzureDeltaClient

azure_delta_client = AzureDeltaClient()
table_client = azure_delta_client.get_table_client(
    table_uri=os.environ['MY_TABLE_STORAGE_URI'],
)

# Get a DeltaTable instance
delta_table: deltalake.DeltaTable = table_client.load_as_delta()

# Load the data as a Polars LazyFrame
table_ldf: pl.LazyFrame = table_client.load_as_polars()
# Collect to a Polars DataFrame
table_df: pl.DataFrame = table_ldf.filter(pl.col('x') > 3).collect()

# For partitioned tables, push filters down to the partition columns so that
# only matching partitions are read from storage (avoiding a full scan).
# Multiple partition filters are combined using the logical AND operator.
table_df = table_client.load_as_polars(
    partition_filter=[
        ('country', PartitionFilterOperator.IN, ['CZ', 'SK']),
        ('year', PartitionFilterOperator.EQUAL, '2024'),
    ],
).collect()
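The AND semantics of partition filters can be sketched in plain Python. This is an illustration only, not deltabridge's code: the matches function, the file listing and the string operators "=" and "in" are assumptions standing in for PartitionFilterOperator.EQUAL and PartitionFilterOperator.IN.

```python
from typing import Dict, List, Tuple, Union

# (column, operator, value) triples, mirroring the shape used in the example above.
PartitionFilter = Tuple[str, str, Union[str, List[str]]]


def matches(partition: Dict[str, str], filters: List[PartitionFilter]) -> bool:
    """Return True only if the partition satisfies every filter (logical AND)."""
    for column, op, value in filters:
        actual = partition[column]
        if op == "=":
            ok = actual == value
        elif op == "in":
            ok = actual in value
        else:
            raise ValueError(f"unsupported operator: {op}")
        if not ok:
            return False
    return True


# Hypothetical data files with their partition values (stored as strings).
files = {
    "country=CZ/year=2024/a.parquet": {"country": "CZ", "year": "2024"},
    "country=SK/year=2023/b.parquet": {"country": "SK", "year": "2023"},
    "country=DE/year=2024/c.parquet": {"country": "DE", "year": "2024"},
    "country=SK/year=2024/d.parquet": {"country": "SK", "year": "2024"},
}

filters: List[PartitionFilter] = [
    ("country", "in", ["CZ", "SK"]),
    ("year", "=", "2024"),
]

# Only files whose partition passes all filters need to be read from storage.
selected = [path for path, part in files.items() if matches(part, filters)]
```

Files that fail any one filter are skipped without being opened, which is where the saving over a full scan comes from.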

Local filesystem

import polars as pl

from deltabridge.local import LocalDeltaClient

MY_TABLE_PATH = '/tmp/my_table'

# Write a table to a local filesystem
pl.DataFrame({'x': [1, 2, 3]}).write_delta(
    target=MY_TABLE_PATH
)

local_delta_client = LocalDeltaClient()
table_client = local_delta_client.get_table_client(
    table_uri=MY_TABLE_PATH  # File path can be used as table URI
)

# Load the data as a Polars LazyFrame and collect it into a DataFrame
table_df = table_client.load_as_polars().collect()
print(table_df)

Databricks tables

If your Delta tables are managed by Databricks (Unity Catalog), they are still stored as ordinary Delta tables in object storage. deltabridge can read them directly from storage, so you can access them without a Databricks SQL warehouse or cluster:

  • Use the table's storage location (in Azure Blob Storage) as the table URI.
    • You can find it in the Databricks Catalog Explorer UI under Details of the table.
  • The reading identity needs at least the Storage Blob Data Reader permission on the storage location (storage account/container).
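For illustration, a storage location on Azure commonly has the form abfss://<container>@<account>.dfs.core.windows.net/<path>. The sketch below splits such a URI to find the container and storage account where the reader permission is needed. The URI and all names in it are made up, and whether your table's location uses exactly this scheme is an assumption to check against the value shown in Catalog Explorer.

```python
from urllib.parse import urlsplit

# Hypothetical storage location, as it might be copied from the table details.
storage_location = "abfss://data@myaccount.dfs.core.windows.net/tables/sales"

parts = urlsplit(storage_location)

container = parts.username              # the part before "@"
account = parts.hostname.split(".")[0]  # first label of the host name
table_path = parts.path.lstrip("/")     # path of the table inside the container

print(f"storage account: {account}, container: {container}, path: {table_path}")
```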

Writing to Delta tables

deltabridge is read-focused: it provides no write API, and its optimizations don't apply to writes. This is deliberate:

  • Write use cases are more varied and harder to abstract well: appends, overwrites, merges/upserts, schema evolution, and concurrency control all behave differently.
  • Writes are typically handled upstream by the systems that produce the tables (often Spark/PySpark pipelines).

Writing is still possible: load_as_delta() returns a deltalake.DeltaTable with deltabridge's auth already configured, which you can pass to deltalake's write API:

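import deltalake

# `table_client` is created as in the examples above; `df` is the data to
# append (for example, a pyarrow.Table) and is not defined here.
deltalake.write_deltalake(table_client.load_as_delta(), df, mode='append')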

Cloud provider support

Object storage support currently covers Azure Blob Storage (plus the local filesystem).
