dbx-marker
Easily manage incremental progress using watermarks in your Databricks data pipelines.
Overview
dbx-marker is a Python library that helps you manage watermarks in your Databricks data pipelines using Delta tables.
It provides a simple interface to track and manage pipeline progress, making it easier to implement incremental processing and resume operations.
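To make the idea of a watermark concrete, here is a minimal sketch in plain Python that does not use dbx-marker at all. The `select_new_records` helper and the `event_date` field are hypothetical names chosen for illustration. The sketch only shows the general pattern of keeping records newer than the last watermark and then advancing it.

```python
from datetime import date


def select_new_records(records, watermark):
    """Return the records newer than the watermark, plus the advanced watermark."""
    fresh = [r for r in records if r["event_date"] > watermark]
    # If nothing is new, the watermark stays where it was.
    new_watermark = max((r["event_date"] for r in fresh), default=watermark)
    return fresh, new_watermark


records = [
    {"id": 1, "event_date": date(2024, 1, 20)},
    {"id": 2, "event_date": date(2024, 1, 21)},
    {"id": 3, "event_date": date(2024, 1, 22)},
]

# Everything up to and including 2024-01-20 is assumed to be processed already.
fresh, watermark = select_new_records(records, date(2024, 1, 20))
```

In a real pipeline the watermark would be persisted between runs, which is the part dbx-marker takes care of.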
Features
- Simple API for managing pipeline watermarks
- Persistent storage using Delta tables
- Thread-safe operations
- Comprehensive error handling
- Built for Databricks environments
Installation
Install using pip:
pip install dbx-marker
Quick Start
from dbx_marker import DbxMarker
from pyspark.sql import SparkSession
# Initialize SparkSession
spark = SparkSession.builder.getOrCreate()
# Create a marker manager
manager = DbxMarker(
    delta_table_path="/path/to/markers",
    spark=spark
)
# Update a marker (will upsert if it doesn't exist)
manager.update_marker("my_pipeline", "2024-01-21")
# Get the current marker
current_marker = manager.get_marker("my_pipeline")
# Delete a marker when needed
manager.delete_marker("my_pipeline")
Usage
Initialization
Create a DbxMarker instance by specifying the Delta table path where markers will be stored:
from dbx_marker import DbxMarker
from pyspark.sql import SparkSession
# Initialize SparkSession
spark = SparkSession.builder.getOrCreate()
manager = DbxMarker(
    delta_table_path="/path/to/markers",
    spark=spark  # Optional: will create new session if not provided
)
Managing Markers
Update a Marker
manager.update_marker("pipeline_name", "marker_value")
Get Current Marker
current_value = manager.get_marker("pipeline_name")
Delete a Marker
manager.delete_marker("pipeline_name")
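The calls above can be combined into a resumable incremental run. The following is a sketch under stated assumptions, not the library's own implementation. `run_incremental`, `MarkerStore`, and the `day` field are hypothetical names. `InMemoryMarkers` is a stand-in that only mimics the `get_marker` / `update_marker` calls shown above, so the example runs without Spark. It also assumes markers are ISO date strings, which sort correctly as plain strings.

```python
from typing import Callable, Protocol


class MarkerStore(Protocol):
    """The two calls this sketch relies on (return types are assumptions)."""

    def get_marker(self, pipeline_name: str) -> str: ...
    def update_marker(self, pipeline_name: str, value: str) -> None: ...


class InMemoryMarkers:
    """Hypothetical stand-in for DbxMarker, used only to run this sketch locally."""

    def __init__(self) -> None:
        self._values: dict[str, str] = {}

    def get_marker(self, pipeline_name: str) -> str:
        return self._values[pipeline_name]

    def update_marker(self, pipeline_name: str, value: str) -> None:
        self._values[pipeline_name] = value


def run_incremental(
    store: MarkerStore,
    pipeline_name: str,
    rows: list[dict],
    process: Callable[[list[dict]], None],
) -> int:
    """Process rows newer than the stored marker, then advance the marker."""
    last = store.get_marker(pipeline_name)
    batch = [row for row in rows if row["day"] > last]
    if not batch:
        return 0
    # If process() raises, the marker is left untouched and a rerun resumes here.
    process(batch)
    store.update_marker(pipeline_name, max(row["day"] for row in batch))
    return len(batch)


store = InMemoryMarkers()
store.update_marker("my_pipeline", "2024-01-21")
rows = [{"day": "2024-01-21"}, {"day": "2024-01-22"}, {"day": "2024-01-23"}]
processed = run_incremental(store, "my_pipeline", rows, process=lambda batch: None)
```

Updating the marker only after processing succeeds is a design choice of this sketch: a failed run can be retried without skipping data, at the cost of possibly reprocessing a batch.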
Error Handling
The library provides specific exceptions for different scenarios:
- MarkerExistsError: When trying to create a duplicate marker
- MarkerNotFoundError: When a requested marker doesn't exist
- MarkerUpdateError: When marker update fails
- MarkerDeleteError: When marker deletion fails
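A common use of these exceptions is falling back to a starting value on a pipeline's first run. The sketch below assumes that `get_marker` raises `MarkerNotFoundError` when no marker is stored. The import path for the real exception is not stated here, so the example defines a local stand-in class with the same name. `get_marker_or_default` and `EmptyMarkers` are hypothetical helpers for illustration only.

```python
class MarkerNotFoundError(Exception):
    """Local stand-in; in a pipeline, import the real exception from dbx-marker."""


class EmptyMarkers:
    """Hypothetical manager with no stored markers, used to exercise the fallback."""

    def get_marker(self, pipeline_name):
        raise MarkerNotFoundError(f"No marker found for {pipeline_name!r}")


def get_marker_or_default(manager, pipeline_name, default):
    """Return the stored marker, or `default` if the pipeline has none yet."""
    try:
        return manager.get_marker(pipeline_name)
    except MarkerNotFoundError:
        # First run: start from the caller-supplied value.
        return default


start = get_marker_or_default(EmptyMarkers(), "my_pipeline", "1970-01-01")
```

Catching only the specific exception keeps other failures, such as storage or permission errors, visible instead of silently restarting from the default.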
Requirements
- Python >= 3.13
- PySpark >= 3.5.4
- Delta-Spark >= 3.3.0
- Loguru >= 0.7.3
Development
- Clone the repository
- Install development dependencies:
pdm install -G dev
- Run tests:
pdm run test
- Run all checks:
pdm run all-checks
License
This project is licensed under the MIT License - see the LICENSE file for details.