Local Databricks Development Bridge - Intercept Spark operations for local Unity Catalog access

MangledDLT - Local Databricks Development Bridge

MangledDLT enables developers to write and test Databricks code locally by intercepting Spark operations and fetching data from remote Unity Catalog environments. Write your PySpark code once and run it anywhere - locally or on Databricks - without changes.
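
The interception idea can be illustrated without Spark. The sketch below is not MangledDLT's actual implementation; it only shows one common way to do this kind of interception: swap a method on a class and restore it later. `FakeReader` and `Interceptor` are hypothetical names that stand in for `spark.read` and the bridge.

```python
class FakeReader:
    """Hypothetical stand-in for spark.read, for illustration only."""

    def table(self, name):
        return f"local:{name}"


class Interceptor:
    """Swap FakeReader.table for a wrapper and restore it on disable()."""

    def __init__(self, fetch_remote):
        self._fetch_remote = fetch_remote
        self._original = None

    def enable(self):
        if self._original is not None:
            return  # already enabled
        self._original = FakeReader.table
        fetch = self._fetch_remote

        def patched(reader, name):
            # Route the call to the remote fetcher instead of the local lookup.
            return fetch(name)

        FakeReader.table = patched

    def disable(self):
        if self._original is not None:
            FakeReader.table = self._original
            self._original = None


interceptor = Interceptor(fetch_remote=lambda name: f"remote:{name}")
reader = FakeReader()

before = reader.table("main.default.customers")
interceptor.enable()
during = reader.table("main.default.customers")
interceptor.disable()
after = reader.table("main.default.customers")
```

Calling code stays the same in all three phases; only the target of `table()` changes while the interceptor is enabled.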

Features

Transparent Spark Interception: Automatically intercepts spark.read.table() and spark.readStream.table() calls
Unity Catalog Integration: Fetches data directly from remote Unity Catalog tables
Smart Caching: LRU cache with TTL for improved development performance
Multiple Auth Methods: Supports PAT, OAuth, and Service Principal authentication
Zero Code Changes: Same code works locally and on Databricks
Connection Pooling: Efficient connection management for better performance
Error Recovery: Automatic retry with exponential backoff
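
To make the "retry with exponential backoff" item concrete, here is a minimal sketch of the general technique. It is not taken from MangledDLT; `retry_with_backoff` and its parameters are hypothetical, and the library's real retry policy (attempt count, delays, which errors are retried) may differ.

```python
import time


def retry_with_backoff(func, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call func(); on failure wait base_delay * 2**attempt, then try again."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt))


# Usage: a flaky call that succeeds on the third attempt.
calls = {"count": 0}
delays = []


def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("transient failure")
    return "ok"


# Record the delays instead of sleeping so the example runs instantly.
result = retry_with_backoff(flaky, sleep=delays.append)
```

The delay doubles after each failure, which keeps early retries fast while backing off from a service that stays unavailable.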

Installation

pip install MangledDlt

Or with all dependencies:

pip install "MangledDlt[all]"

Quick Start

from pyspark.sql import SparkSession
from mangledlt import MangledDLT

# Create Spark session as usual
spark = SparkSession.builder \
    .appName("LocalDev") \
    .getOrCreate()

# Enable MangledDLT
mdlt = MangledDLT()
mdlt.enable()

# Now you can read from Unity Catalog!
df = spark.read.table("main.default.customers")
df.show()

# When done, disable interception
mdlt.disable()

Configuration

Using Environment Variables

export DATABRICKS_HOST="https://your-workspace.cloud.databricks.com"
export DATABRICKS_TOKEN="dapi..."
export DATABRICKS_WAREHOUSE_ID="your-warehouse-id"
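
Before enabling the bridge, it can help to confirm that the three variables above are actually set. The helper below is a hypothetical pre-flight check, not part of MangledDLT; it only reads the variable names shown in this section.

```python
import os

REQUIRED_VARS = ("DATABRICKS_HOST", "DATABRICKS_TOKEN", "DATABRICKS_WAREHOUSE_ID")


def missing_databricks_vars(environ=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Usage with an explicit mapping, so the example does not depend on your shell.
example_env = {
    "DATABRICKS_HOST": "https://your-workspace.cloud.databricks.com",
    "DATABRICKS_TOKEN": "dapi...",
}
missing = missing_databricks_vars(example_env)
```

Call `missing_databricks_vars()` with no arguments to check the real process environment.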

Using Databricks CLI Config

# Configure Databricks CLI
databricks configure --token

# MangledDLT will automatically use your configuration
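
The Databricks CLI stores its settings in an INI-style file (by default `~/.databrickscfg`) with one section per profile. The sketch below shows how such a file can be read with the standard library; how MangledDLT itself loads it is not documented here, and `read_profile` and the sample tokens are hypothetical.

```python
import configparser

SAMPLE_CFG = """\
[DEFAULT]
host = https://your-workspace.cloud.databricks.com
token = dapi-default

[DEV]
host = https://dev-workspace.cloud.databricks.com
token = dapi-dev
"""


def read_profile(cfg_text, profile="DEFAULT"):
    """Return host and token for one profile of a .databrickscfg-style file."""
    # interpolation=None keeps '%' characters in values from being interpreted.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    if profile != "DEFAULT" and not parser.has_section(profile):
        raise KeyError(f"profile not found: {profile}")
    section = parser[profile]
    return {"host": section.get("host"), "token": section.get("token")}


dev = read_profile(SAMPLE_CFG, "DEV")
default = read_profile(SAMPLE_CFG)
```

To read the real file, pass its contents, for example `pathlib.Path.home().joinpath(".databrickscfg").read_text()`.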

Using Custom Config

from mangledlt import MangledDLT

config = {
    "host": "https://workspace.cloud.databricks.com",
    "token": "your-token",
    "warehouse_id": "warehouse-id",
    "cache_enabled": True,
    "cache_ttl": 600  # 10 minutes
}

mdlt = MangledDLT(config=config)
mdlt.enable()

Development vs Production

from pyspark.sql import SparkSession
from mangledlt import MangledDLT

spark = SparkSession.builder.appName("MyApp").getOrCreate()

# Auto-detect environment; pass a default so a missing key returns None instead of raising
if not spark.conf.get("spark.databricks.service.clusterId", None):
    # Running locally - enable MangledDLT
    mdlt = MangledDLT()
    mdlt.enable()
    print("Running locally with MangledDLT")
else:
    print("Running on Databricks")

# Your code works the same in both environments
customers = spark.read.table("catalog.schema.customers")
orders = spark.read.table("catalog.schema.orders")
result = customers.join(orders, "customer_id")
result.show()
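
An alternative way to detect the environment is to look at environment variables rather than Spark configuration. The sketch below assumes that Databricks Runtime sets `DATABRICKS_RUNTIME_VERSION` on cluster nodes; verify that assumption for your runtime before relying on it. `running_on_databricks` is a hypothetical helper, not part of MangledDLT.

```python
import os


def running_on_databricks(environ=None):
    """True when DATABRICKS_RUNTIME_VERSION is set (assumed runtime marker)."""
    env = os.environ if environ is None else environ
    return bool(env.get("DATABRICKS_RUNTIME_VERSION"))


# Usage with explicit mappings so the result does not depend on this machine.
on_cluster = running_on_databricks({"DATABRICKS_RUNTIME_VERSION": "14.3"})
on_laptop = running_on_databricks({})
```

This check needs no Spark session, so it can run before the session is created.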

Caching

MangledDLT caches query results (an LRU cache with a TTL) to speed up iterative development:

mdlt = MangledDLT(config={
    "cache_enabled": True,
    "cache_ttl": 1800,  # 30 minutes
    "cache_max_size": 100  # Max 100 cached queries
})
mdlt.enable()

# First read - fetches from Unity Catalog
df1 = spark.read.table("catalog.schema.large_table")  # Slow: remote fetch (for example, ~5 seconds)

# Subsequent reads - served from cache
df2 = spark.read.table("catalog.schema.large_table")  # Fast: cache hit (for example, <100 ms)

# Check cache statistics
stats = mdlt.get_cache_stats()
print(f"Cache hits: {stats['hits']}")
print(f"Hit rate: {stats['hit_rate']}%")

# Clear cache when needed
mdlt.clear_cache()
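
The behaviour behind `cache_ttl`, `cache_max_size`, and the hit statistics can be sketched in a few lines. This is an illustrative LRU cache with a TTL, not MangledDLT's implementation; `TTLCache` and its methods are hypothetical names.

```python
import time
from collections import OrderedDict


class TTLCache:
    """LRU cache whose entries also expire ttl seconds after being stored."""

    def __init__(self, max_size=100, ttl=600, clock=time.monotonic):
        self._data = OrderedDict()  # key -> (expires_at, value)
        self._max_size = max_size
        self._ttl = ttl
        self._clock = clock
        self.hits = 0
        self.misses = 0

    def get(self, key):
        entry = self._data.get(key)
        if entry is None or entry[0] <= self._clock():
            self._data.pop(key, None)  # drop the expired entry, if any
            self.misses += 1
            return None
        self._data.move_to_end(key)  # mark as most recently used
        self.hits += 1
        return entry[1]

    def put(self, key, value):
        self._data[key] = (self._clock() + self._ttl, value)
        self._data.move_to_end(key)
        while len(self._data) > self._max_size:
            self._data.popitem(last=False)  # evict the least recently used

    def hit_rate(self):
        total = self.hits + self.misses
        return 100.0 * self.hits / total if total else 0.0


# Usage with a fake clock so expiry is deterministic.
now = [0.0]
cache = TTLCache(max_size=2, ttl=10, clock=lambda: now[0])
cache.put("a", 1)
cache.put("b", 2)
first = cache.get("a")      # hit; "a" becomes most recently used
cache.put("c", 3)           # evicts "b", the least recently used
evicted = cache.get("b")    # miss
now[0] = 11.0
expired = cache.get("a")    # miss: past the 10-second TTL
```

A short TTL keeps local data close to the remote tables; a longer one trades freshness for speed.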

Error Handling

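from pyspark.sql import SparkSession
from mangledlt import MangledDLT
from mangledlt.exceptions import AuthError, TableNotFoundError

spark = SparkSession.builder.appName("LocalDev").getOrCreate()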

try:
    mdlt = MangledDLT()
    mdlt.enable()

    df = spark.read.table("catalog.schema.table")
    df.show()

except AuthError as e:
    print(f"Authentication failed: {e}")
    print("Please check your Databricks credentials")

except TableNotFoundError as e:
    print(f"Table not found: {e}")
    print("Please verify the table exists and you have access")

Multiple Workspaces

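from pyspark.sql import SparkSession
from mangledlt import MangledDLT
from mangledlt.config import Config

spark = SparkSession.builder.appName("LocalDev").getOrCreate()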

# Connect to development workspace
dev_config = Config.from_file(profile="DEV")
dev_mdlt = MangledDLT(config=dev_config)
dev_mdlt.enable()

# Read from dev
dev_data = spark.read.table("dev_catalog.schema.table")

# Switch to production
dev_mdlt.disable()
prod_config = Config.from_file(profile="PROD")
prod_mdlt = MangledDLT(config=prod_config)
prod_mdlt.enable()

# Read from production
prod_data = spark.read.table("prod_catalog.schema.table")
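
When switching workspaces, forgetting a `disable()` call leaves the previous interception active. One way to guard against that is a context manager that pairs the two calls. The sketch below is an assumption about usage, not a documented MangledDLT feature; `interception` and `FakeBridge` are hypothetical, and any object with `enable()` and `disable()` methods would work in place of the stand-in.

```python
from contextlib import contextmanager


@contextmanager
def interception(bridge):
    """Enable a bridge for the duration of a with-block, then disable it."""
    bridge.enable()
    try:
        yield bridge
    finally:
        bridge.disable()  # runs even if the block raises


class FakeBridge:
    """Hypothetical stand-in exposing the enable()/disable() pair."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def enable(self):
        self.log.append(f"{self.name}:enable")

    def disable(self):
        self.log.append(f"{self.name}:disable")


log = []
with interception(FakeBridge("dev", log)):
    log.append("read dev")
with interception(FakeBridge("prod", log)):
    log.append("read prod")
```

Each workspace is disabled before the next one is enabled, so the two never overlap.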

API Reference

MangledDLT

Main class for enabling local Databricks development.

__init__(config=None): Initialize with optional configuration
enable(): Enable Spark operation interception
disable(): Disable interception
get_status(): Get connection status
clear_cache(): Clear query cache
get_cache_stats(): Get cache statistics

Config

Configuration management class.

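from_file(path, profile): Load a profile from a Databricks CLI config file (path appears to be optional, since the Multiple Workspaces example passes only profile)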
from_env(): Load from environment variables
validate(): Validate configuration

Exceptions

ConfigError: Configuration issues
AuthError: Authentication failures
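ConnectionError: Connection problems (shares its name with Python's built-in ConnectionError, so import it explicitly)
TableNotFoundError: Table doesn't exist
PermissionError: Insufficient permissions (shares its name with Python's built-in PermissionError, so import it explicitly)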
InvalidReferenceError: Invalid table reference format
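
The examples in this README use three-part table references of the form catalog.schema.table. The sketch below shows what validating that format could look like. The exact rules MangledDLT applies are not documented here, so the pattern is an assumption (plain identifiers only, no backtick quoting), and the `InvalidReferenceError` class defined below is a local stand-in rather than the library's own.

```python
import re


class InvalidReferenceError(ValueError):
    """Local stand-in for illustration; the real class lives in the library."""


_PART = r"[A-Za-z_][A-Za-z0-9_]*"
_REFERENCE = re.compile(rf"^({_PART})\.({_PART})\.({_PART})$")


def parse_table_reference(reference):
    """Split 'catalog.schema.table' into its three parts or raise."""
    match = _REFERENCE.match(reference)
    if match is None:
        raise InvalidReferenceError(
            f"expected catalog.schema.table, got {reference!r}"
        )
    catalog, schema, table = match.groups()
    return catalog, schema, table


parts = parse_table_reference("main.default.customers")

try:
    parse_table_reference("customers")
    rejected = False
except InvalidReferenceError:
    rejected = True
```

Failing early on a malformed reference avoids a round trip to the workspace for a query that cannot succeed.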

Requirements

Python 3.9+
PySpark 3.4+ (user must install separately)
databricks-sql-connector 2.9+

Development

# Clone the repository
git clone https://github.com/mangledlt/mangledlt.git
cd mangledlt

# Install in development mode
pip install -e ".[dev]"

# Run tests
pytest tests/

License

MIT License - see LICENSE file for details.

Support

Issues: https://github.com/mangledlt/mangledlt/issues
Discussions: https://github.com/mangledlt/mangledlt/discussions
Documentation: https://github.com/mangledlt/mangledlt/docs

Release history

0.1.4: Oct 7, 2025
0.1.3: Sep 30, 2025 (this version)
0.1.2: Sep 29, 2025
0.1.1: Sep 29, 2025
0.1.0: Sep 27, 2025

Download files

Source Distribution

mangleddlt-0.1.3.tar.gz (36.8 kB), uploaded Sep 30, 2025, Source

Built Distribution

mangleddlt-0.1.3-py3-none-any.whl (43.5 kB), uploaded Sep 30, 2025, Python 3

File details

Details for the file mangleddlt-0.1.3.tar.gz.

File metadata

Download URL: mangleddlt-0.1.3.tar.gz
Upload date: Sep 30, 2025
Size: 36.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.18

File hashes

Hashes for mangleddlt-0.1.3.tar.gz
Algorithm	Hash digest
SHA256	`b0eb9bd1c9ec77a5111a12f5861d1a6b2ef9c4256edefd1c9bd72292ff4091fd`
MD5	`c43f2bc7432ae7df0256e8cb049c69ad`
BLAKE2b-256	`c6102d69a3df55a404ccc7a24c18b856bfbc068b4d27aafcd5e5efa3e0c494ef`

File details

Details for the file mangleddlt-0.1.3-py3-none-any.whl.

File metadata

Download URL: mangleddlt-0.1.3-py3-none-any.whl
Upload date: Sep 30, 2025
Size: 43.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.18

File hashes

Hashes for mangleddlt-0.1.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`1bf74501ceb62e4025b22890bdf81dea880457aff976f1e29080f472c933b5bb`
MD5	`637db10a67f7875f0fe0684d8c891aa2`
BLAKE2b-256	`4780728581f2cd5118db7ac058ee7b3530a8858caaf9764acd8ad1b1cb537920`
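
The published digests can be checked against a downloaded file with the standard library. The helper names below are hypothetical; the approach is plain SHA256 hashing of the file contents, compared with the SHA256 value listed for that file.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, call `matches_published_hash("mangleddlt-0.1.3-py3-none-any.whl", "<SHA256 from the table>")` after downloading the wheel; a False result means the file differs from the one that was uploaded.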
