delta-lake-utils
Production-grade utilities for Delta Lake table management and optimization on Databricks.
What Does This Package Do?
Automates Delta Lake table optimization, health monitoring, and pipeline generation for Databricks data engineers.
Main Features:
- Smart OPTIMIZE - Automatically consolidates small files and improves query performance
  - Detects when your Delta table has too many small files
  - Intelligently chooses which columns to Z-ORDER by
  - Reduces query time by up to 10x
- Health Checker - Diagnoses table problems before they impact production
  - Identifies small file problems
  - Detects data skew across partitions
  - Finds configuration issues
- Performance Profiler - Measures how fast your Delta operations run
  - Tracks read/write speeds
  - Identifies bottlenecks
  - Compares performance before and after optimization
- Medallion Generator - Auto-creates Bronze/Silver/Gold pipeline code
  - Generates production-ready notebooks
  - Follows best practices
  - Saves hours of boilerplate coding
- Unity Catalog Auditor - Manages permissions and access control
  - Audits table permissions
  - Generates permission scripts
  - Ensures security compliance
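The before/after comparison described under Performance Profiler comes down to timing the same operation twice. The package's profiler API is not shown in this README, so the following is only a minimal sketch of the idea in plain Python; `timed` and `speedup` are hypothetical helper names, not part of `delta_utils`.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Record wall-clock seconds for the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


def speedup(before_s, after_s):
    """Return how many times faster the 'after' run was than the 'before' run."""
    if after_s <= 0:
        raise ValueError("after_s must be positive")
    return before_s / after_s


timings = {}
with timed("before_optimize", timings):
    # Stand-in workload; on a cluster this could be a Delta read such as
    # spark.read.format("delta").load(path).count()
    sum(range(100_000))
with timed("after_optimize", timings):
    sum(range(10_000))

print(timings)
print(speedup(600, 60))  # 10 minutes vs 1 minute, expressed in seconds
```

Collecting timings into a plain dictionary keeps the measurements easy to log or compare across runs.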
Installation
pip install delta-lake-utils
Quick Start
from pyspark.sql import SparkSession
from delta_utils import DeltaOptimizer
spark = SparkSession.builder.getOrCreate()
optimizer = DeltaOptimizer(spark)
# Optimize a table - reduces files, improves performance
result = optimizer.auto_optimize('/mnt/delta/my_table')
print(f"Optimized! Removed {result.files_removed} files")
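This README does not document how `auto_optimize` works internally. For orientation, file compaction in Delta Lake is normally done with the `OPTIMIZE` SQL command, optionally followed by `ZORDER BY`. The sketch below only builds such a statement for a path-based table; `build_optimize_sql` is a hypothetical helper and the column names are illustrative.

```python
import re

_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_optimize_sql(table_path, zorder_cols=()):
    """Build a Delta Lake OPTIMIZE statement for a table addressed by path."""
    if "`" in table_path:
        raise ValueError("table path must not contain backticks")
    sql = f"OPTIMIZE delta.`{table_path}`"
    cols = list(zorder_cols)
    for col in cols:
        # Accept only simple column names so the statement stays well-formed.
        if not _IDENTIFIER.match(col):
            raise ValueError(f"unsupported column name: {col!r}")
    if cols:
        sql += " ZORDER BY (" + ", ".join(cols) + ")"
    return sql


print(build_optimize_sql("/mnt/delta/my_table"))
print(build_optimize_sql("/mnt/delta/my_table", ["event_date", "user_id"]))
# With a live session, the statement would be run as: spark.sql(sql)
```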
Use Cases
Use Case 1: Your queries are slow
Problem: A Delta table has 5,000 small files, and queries take 10 minutes.
Solution: Run the optimizer. It consolidates the table to 50 files, and queries now take about 1 minute.
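One way to decide whether a table is in this situation is to compare its average file size against a target size. The sketch below is an assumption about how such a check could look, not the package's actual logic: `compaction_plan`, the 128 MiB target, and the 0.5 ratio are all illustrative. The inputs correspond to the `numFiles` and `sizeInBytes` columns returned by Delta Lake's `DESCRIBE DETAIL`, and the total table size used in the example is assumed, since the README gives only file counts.

```python
import math


def compaction_plan(num_files, size_in_bytes,
                    target_file_bytes=128 * 1024**2, min_avg_ratio=0.5):
    """Estimate whether compaction is worthwhile and how many files to aim for."""
    avg_file_bytes = size_in_bytes / num_files if num_files else 0
    target_files = math.ceil(size_in_bytes / target_file_bytes) if size_in_bytes else 0
    needs_compaction = (
        num_files > target_files
        and avg_file_bytes < min_avg_ratio * target_file_bytes
    )
    return {
        "avg_file_bytes": avg_file_bytes,
        "target_files": target_files,
        "needs_compaction": needs_compaction,
    }


# Assumed total size: 50 files' worth of 128 MiB, spread over 5,000 files.
assumed_size = 50 * 128 * 1024**2
plan = compaction_plan(num_files=5000, size_in_bytes=assumed_size)
print(plan["needs_compaction"], plan["target_files"])
```

Under these assumed numbers the average file is a little over 1 MiB, far below the target, so the plan recommends compacting toward 50 files.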
Use Case 2: Starting a new data pipeline
Problem: You need to build a Bronze/Silver/Gold architecture from scratch.
Solution: Use the medallion generator to get a complete pipeline in 30 seconds.
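The generator's interface is not shown in this README. As a rough illustration of what template-based pipeline generation can look like, the sketch below fills a small template with PySpark read and write calls for each layer. `generate_medallion_code`, its parameters, and the example paths are hypothetical, and the generated transformations (de-duplication, a count per group) are placeholders.

```python
from string import Template

_PIPELINE_TEMPLATE = Template('''\
# Bronze: land raw data as-is
bronze = spark.read.format("$source_format").load("$source_path")
bronze.write.format("delta").mode("append").save("$base_path/bronze/$name")

# Silver: de-duplicated records
silver = spark.read.format("delta").load("$base_path/bronze/$name").dropDuplicates()
silver.write.format("delta").mode("overwrite").save("$base_path/silver/$name")

# Gold: aggregate for reporting
gold = silver.groupBy("$group_col").count()
gold.write.format("delta").mode("overwrite").save("$base_path/gold/$name")
''')


def generate_medallion_code(name, source_path, base_path, group_col,
                            source_format="json"):
    """Return PySpark source code for a three-layer pipeline as a string."""
    return _PIPELINE_TEMPLATE.substitute(
        name=name,
        source_path=source_path,
        base_path=base_path,
        group_col=group_col,
        source_format=source_format,
    )


code = generate_medallion_code("orders", "/mnt/raw/orders", "/mnt/delta", "customer_id")
print(code)
```

The output is plain source text, so it can be pasted into a notebook cell where a `spark` session already exists.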
Use Case 3: Data quality issues
Problem: You are not sure whether a table is healthy, and production keeps failing.
Solution: Run the health checker to get specific recommendations for fixing the issues.
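One of the checks listed for the health checker is data skew across partitions. A simple way to express that, assuming per-partition row counts are already available (for example from `df.groupBy(partition_col).count()`), is to compare the largest partition with the median. `partition_skew` and the threshold of 5 are illustrative assumptions, not the package's documented behavior.

```python
from statistics import median


def partition_skew(row_counts, threshold=5.0):
    """Flag skew when the largest partition exceeds `threshold` times the median."""
    if not row_counts:
        return {"largest": None, "skew_ratio": 0.0, "skewed": False}
    largest = max(row_counts, key=row_counts.get)
    med = median(row_counts.values())
    if med > 0:
        ratio = row_counts[largest] / med
    elif row_counts[largest] == 0:
        ratio = 0.0  # every partition is empty
    else:
        ratio = float("inf")
    return {"largest": largest, "skew_ratio": ratio, "skewed": ratio > threshold}


counts = {
    "2024-01-01": 1000,
    "2024-01-02": 1100,
    "2024-01-03": 900,
    "2024-01-04": 20000,
}
report = partition_skew(counts)
print(report["largest"], round(report["skew_ratio"], 1), report["skewed"])
```

The median is used instead of the mean so that a single oversized partition does not hide itself by inflating the baseline.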
Use Case 4: Permission audit required
Problem: You need to verify that all tables have the correct access controls.
Solution: Use the catalog auditor to check and fix permissions.
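The auditor's API is not documented here. Conceptually, a permission audit compares the privileges each principal should have with the privileges actually granted (which Unity Catalog reports through `SHOW GRANTS ON TABLE ...`) and emits `GRANT` or `REVOKE` statements for the difference. The sketch below assumes both sides are already available as dictionaries; `permission_fix_statements`, the table name, and the group names are hypothetical.

```python
def permission_fix_statements(table, expected, actual):
    """Return GRANT/REVOKE statements that bring `actual` in line with `expected`.

    Both mappings go from principal name to a collection of privilege names.
    """
    statements = []
    for principal in sorted(set(expected) | set(actual)):
        want = set(expected.get(principal, ()))
        have = set(actual.get(principal, ()))
        for privilege in sorted(want - have):
            statements.append(f"GRANT {privilege} ON TABLE {table} TO `{principal}`")
        for privilege in sorted(have - want):
            statements.append(f"REVOKE {privilege} ON TABLE {table} FROM `{principal}`")
    return statements


expected = {"analysts": {"SELECT"}, "engineers": {"SELECT", "MODIFY"}}
actual = {"analysts": {"SELECT", "MODIFY"}, "engineers": {"SELECT"}}

fixes = permission_fix_statements("main.sales.orders", expected, actual)
for statement in fixes:
    print(statement)
```

Generating statements instead of executing them leaves room for a review step before any permission changes are applied.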
Documentation
- See QUICKSTART.md for a 5-minute tutorial
- See EXPLANATION.md for detailed use cases
- See examples/EXAMPLES.md for code examples
Requirements
- Python 3.8+
- PySpark 3.2+
- Delta Lake 2.0+
- Databricks Runtime 11.0+ recommended
Author
Nalini Panwar (GitHub: @panwarnalini-hub)
License
MIT License - see LICENSE file for details
File details
Details for the file delta_lake_utils-1.0.0.tar.gz.
File metadata
- Download URL: delta_lake_utils-1.0.0.tar.gz
- Upload date:
- Size: 9.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | be946bf4b966fc4cd3eec2cd9b1c7b0fcdc448b2b3ea0e58a151c1a0690e8c08 |
| MD5 | 11357e2b80e91c67de7562f4058afbc9 |
| BLAKE2b-256 | f11d428b4fcf6ef283669d212b99994649377498ce2605baecf7d47e1d133816 |
File details
Details for the file delta_lake_utils-1.0.0-py3-none-any.whl.
File metadata
- Download URL: delta_lake_utils-1.0.0-py3-none-any.whl
- Upload date:
- Size: 11.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f78744dbf77ad52838e27c9c79ca6592b8e78b373b4a880de18a3f99acdeb3e5 |
| MD5 | fe84cac6b7de25994b503e78b679eeb3 |
| BLAKE2b-256 | 99f29a73cf2aeb8d9cb1650c1daeca2589070eec998c8a1290431f3ddc978afb |