dbx-marker

Easily manage incremental progress using watermarks in your Databricks data pipelines.

Overview

dbx-marker is a Python library that helps you manage watermarks in your Databricks data pipelines using Delta tables.

It provides a simple interface to track and manage pipeline progress, making it easier to implement incremental processing and resume operations.

Features

  • Simple API for managing pipeline watermarks
  • Persistent storage using Delta tables
  • Thread-safe operations
  • Comprehensive error handling
  • Built for Databricks environments

Installation

Install using pip:

pip install dbx-marker

Quick Start

from dbx_marker import DbxMarker
from pyspark.sql import SparkSession

# Initialize SparkSession
spark = SparkSession.builder.getOrCreate()

# Create a marker manager
manager = DbxMarker(
    delta_table_path="/path/to/markers",
    spark=spark
)

# Update a marker (creates it if it doesn't exist yet)
manager.update_marker("my_pipeline", "2024-01-21")

# Get the current marker
current_marker = manager.get_marker("my_pipeline")

# Delete a marker when needed
manager.delete_marker("my_pipeline")

Usage

Initialization

Create a DbxMarker instance by specifying the Delta table path where markers will be stored:

from dbx_marker import DbxMarker
from pyspark.sql import SparkSession

# Initialize SparkSession
spark = SparkSession.builder.getOrCreate()

manager = DbxMarker(
    delta_table_path="/path/to/markers",
    spark=spark  # Optional: will create new session if not provided
)

Managing Markers

Update a Marker

manager.update_marker("pipeline_name", "marker_value")

Get Current Marker

current_value = manager.get_marker("pipeline_name")

Delete a Marker

manager.delete_marker("pipeline_name")
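
The three calls above combine into a typical incremental run: read the last watermark, process only the records newer than it, then advance the watermark. The sketch below is a minimal illustration, not the library's implementation. `InMemoryMarkers` and `run_incremental` are hypothetical names introduced here; the in-memory class only mimics the `get_marker` and `update_marker` calls shown above so the example runs without Spark. It also assumes markers are ISO-8601 date strings, which compare correctly as plain text.

```python
class InMemoryMarkers:
    """Hypothetical stand-in for DbxMarker, backed by a dict instead of a Delta table."""

    def __init__(self):
        self._values = {}

    def get_marker(self, pipeline_name):
        return self._values[pipeline_name]

    def update_marker(self, pipeline_name, value):
        self._values[pipeline_name] = value


def run_incremental(store, pipeline_name, records):
    """Return only records newer than the stored watermark, then advance it."""
    last_seen = store.get_marker(pipeline_name)
    # ISO-8601 dates sort lexicographically, so string comparison is enough here.
    new_records = [r for r in records if r["event_date"] > last_seen]
    if new_records:
        newest = max(r["event_date"] for r in new_records)
        store.update_marker(pipeline_name, newest)
    return new_records


markers = InMemoryMarkers()
markers.update_marker("my_pipeline", "2024-01-21")

records = [
    {"id": 1, "event_date": "2024-01-20"},
    {"id": 2, "event_date": "2024-01-22"},
    {"id": 3, "event_date": "2024-01-23"},
]

# Only records after 2024-01-21 are picked up; the watermark then moves forward.
processed = run_incremental(markers, "my_pipeline", records)
```

Because `run_incremental` only relies on `get_marker` and `update_marker`, a real `DbxMarker` instance should be usable in place of `InMemoryMarkers`, provided its methods behave as described in this README.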

Error Handling

The library provides specific exceptions for different scenarios:

  • MarkerExistsError: When trying to create a duplicate marker
  • MarkerNotFoundError: When a requested marker doesn't exist
  • MarkerUpdateError: When marker update fails
  • MarkerDeleteError: When marker deletion fails
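
On a pipeline's first run there is no marker yet, so a fallback value is usually needed. The sketch below assumes that `get_marker` raises `MarkerNotFoundError` for an unknown pipeline, which the list above suggests but does not state outright. To stay runnable without Spark, it uses local stand-ins: the `MarkerNotFoundError` class and `InMemoryMarkers` defined here are hypothetical substitutes for the library's own, and `get_marker_or_default` is a helper name introduced for illustration. With the real library, the exception would be imported from dbx-marker; the exact import path is not shown in this README.

```python
class MarkerNotFoundError(Exception):
    """Local stand-in for the library exception of the same name."""


class InMemoryMarkers:
    """Hypothetical stand-in for DbxMarker, backed by a dict instead of a Delta table."""

    def __init__(self):
        self._values = {}

    def get_marker(self, pipeline_name):
        try:
            return self._values[pipeline_name]
        except KeyError:
            # Assumed behavior: a missing marker surfaces as MarkerNotFoundError.
            raise MarkerNotFoundError(pipeline_name) from None

    def update_marker(self, pipeline_name, value):
        self._values[pipeline_name] = value


def get_marker_or_default(store, pipeline_name, default):
    """Return the stored marker, or `default` when none has been written yet."""
    try:
        return store.get_marker(pipeline_name)
    except MarkerNotFoundError:
        return default


markers = InMemoryMarkers()

# First run: no marker exists, so the pipeline starts from the default.
start = get_marker_or_default(markers, "new_pipeline", "1970-01-01")

# Later runs resume from whatever was stored.
markers.update_marker("new_pipeline", "2024-01-21")
resumed = get_marker_or_default(markers, "new_pipeline", "1970-01-01")
```

Catching only `MarkerNotFoundError` keeps other failures, such as storage errors, visible instead of silently restarting the pipeline from the default.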

Requirements

  • Python >= 3.13
  • PySpark >= 3.5.4
  • Delta-Spark >= 3.3.0
  • Loguru >= 0.7.3

Development

  1. Clone the repository
  2. Install development dependencies:
pdm install -G dev
  3. Run tests:
pdm run test
  4. Run all checks:
pdm run all-checks

License

This project is licensed under the MIT License - see the LICENSE file for details.
