
Easily manage incremental progress using watermarks in your Databricks data pipelines

Reason this release was yanked: the required Python version was too high.

Project description

dbx-marker

Overview

dbx-marker is a Python library that helps you manage watermarks in your Databricks data pipelines using Delta tables.

It provides a simple interface to record and look up how far each pipeline has progressed, making it easier to implement incremental processing and to resume a pipeline from where it last stopped.

Features

  • Simple API for managing pipeline watermarks
  • Persistent storage using Delta tables
  • Thread-safe operations
  • Comprehensive error handling
  • Built for Databricks environments
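The README does not say how dbx-marker achieves thread safety. As a rough illustration of what "thread-safe" means for a marker store, the sketch below guards a plain dictionary with a `threading.Lock` so that a read-compare-write on a marker never interleaves with another thread. `LockedMarkerStore` and `advance_marker` are hypothetical names, not part of dbx-marker, and the real library persists markers to a Delta table rather than to memory.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Optional


class LockedMarkerStore:
    """Hypothetical in-memory marker store; NOT how dbx-marker is implemented."""

    def __init__(self) -> None:
        self._markers: Dict[str, str] = {}
        self._lock = threading.Lock()

    def update_marker(self, name: str, value: str) -> None:
        # Unconditional upsert, guarded by the lock.
        with self._lock:
            self._markers[name] = value

    def advance_marker(self, name: str, value: str) -> None:
        # Compare-and-set under a single lock: the marker only ever moves forward,
        # even when several threads race to update it.
        with self._lock:
            current = self._markers.get(name)
            if current is None or value > current:
                self._markers[name] = value

    def get_marker(self, name: str) -> Optional[str]:
        with self._lock:
            return self._markers.get(name)


store = LockedMarkerStore()
dates = [f"2024-01-{day:02d}" for day in range(1, 32)]
with ThreadPoolExecutor(max_workers=8) as pool:
    # list() forces the results so any exception raised in a worker surfaces here.
    list(pool.map(lambda d: store.advance_marker("my_pipeline", d), dates))
print(store.get_marker("my_pipeline"))  # 2024-01-31
```

Without the lock, two threads could both read the same old value in `advance_marker`, and the smaller date could end up being written last.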

Installation

Install using pip:

pip install dbx-marker

Quick Start

from dbx_marker import DbxMarker
from pyspark.sql import SparkSession

# Initialize SparkSession
spark = SparkSession.builder.getOrCreate()

# Create a marker manager
manager = DbxMarker(
    delta_table_path="/path/to/markers",
    spark=spark
)

# Update a marker (creates it if it doesn't exist yet)
manager.update_marker("my_pipeline", "2024-01-21")

# Get the current marker
current_marker = manager.get_marker("my_pipeline")

# Delete a marker when needed
manager.delete_marker("my_pipeline")
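A typical incremental run reads the stored marker, processes only records newer than it, and then writes the new high-water mark back. The sketch below shows that loop in plain Python so it runs without Spark. `InMemoryMarker` is a hypothetical stand-in that mimics only the `get_marker`/`update_marker` calls shown above, and `process_increment`, the `rows` list, and the `event_date` field are also hypothetical. It assumes markers are ISO-8601 date strings, which sort chronologically when compared as plain strings. With a real `DbxMarker` you would filter a Spark DataFrame instead of a list.

```python
from typing import Dict, List


class InMemoryMarker:
    """Hypothetical stand-in exposing the same method names as DbxMarker."""

    def __init__(self) -> None:
        self._markers: Dict[str, str] = {}

    def update_marker(self, name: str, value: str) -> None:
        self._markers[name] = value  # upsert, as the Quick Start describes

    def get_marker(self, name: str) -> str:
        return self._markers[name]  # raises KeyError if the marker is missing


def process_increment(manager, pipeline: str, rows: List[dict]) -> List[dict]:
    """Process rows newer than the stored marker, then advance the marker."""
    watermark = manager.get_marker(pipeline)
    # ISO-8601 date strings sort chronologically, so string comparison is safe here.
    new_rows = [r for r in rows if r["event_date"] > watermark]
    if new_rows:
        # ... transform / write new_rows to the target table here ...
        manager.update_marker(pipeline, max(r["event_date"] for r in new_rows))
    return new_rows


manager = InMemoryMarker()
manager.update_marker("my_pipeline", "2024-01-21")
rows = [
    {"id": 1, "event_date": "2024-01-20"},
    {"id": 2, "event_date": "2024-01-22"},
    {"id": 3, "event_date": "2024-01-23"},
]
first = process_increment(manager, "my_pipeline", rows)
second = process_increment(manager, "my_pipeline", rows)  # nothing new the second time
print([r["id"] for r in first], [r["id"] for r in second], manager.get_marker("my_pipeline"))
# [2, 3] [] 2024-01-23
```

The strict `>` treats the marker as "everything up to and including this value is done". If late records can arrive with a value equal to the marker, a `>=` filter plus deduplication on the target side is the safer choice.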

Usage

Initialization

Create a DbxMarker instance by specifying the Delta table path where markers will be stored:

from dbx_marker import DbxMarker
from pyspark.sql import SparkSession

# Initialize SparkSession
spark = SparkSession.builder.getOrCreate()

manager = DbxMarker(
    delta_table_path="/path/to/markers",
    spark=spark  # Optional: will create new session if not provided
)

Managing Markers

Update a Marker

manager.update_marker("pipeline_name", "marker_value")

Get Current Marker

current_value = manager.get_marker("pipeline_name")

Delete a Marker

manager.delete_marker("pipeline_name")

Error Handling

The library provides specific exceptions for different scenarios:

  • MarkerExistsError: raised when trying to create a duplicate marker
  • MarkerNotFoundError: raised when a requested marker doesn't exist
  • MarkerUpdateError: raised when a marker update fails
  • MarkerDeleteError: raised when a marker deletion fails
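A common place to handle `MarkerNotFoundError` is the very first run of a pipeline, or the run after `delete_marker`, when no marker exists yet. The sketch below falls back to a default start value in that case. The README does not show where the exception is imported from, so the sketch defines a local stand-in class with the same name; `StubManager`, `read_start_marker`, and the `"1970-01-01"` default are hypothetical.

```python
from typing import Dict


class MarkerNotFoundError(Exception):
    """Local stand-in for dbx-marker's exception of the same name (assumption)."""


class StubManager:
    """Hypothetical manager exposing get/update/delete like DbxMarker."""

    def __init__(self) -> None:
        self._markers: Dict[str, str] = {}

    def update_marker(self, name: str, value: str) -> None:
        self._markers[name] = value

    def get_marker(self, name: str) -> str:
        try:
            return self._markers[name]
        except KeyError:
            raise MarkerNotFoundError(name) from None

    def delete_marker(self, name: str) -> None:
        self._markers.pop(name, None)


def read_start_marker(manager, pipeline: str, default: str = "1970-01-01") -> str:
    """Return the stored marker, or `default` if none exists yet (first run / after reset)."""
    try:
        return manager.get_marker(pipeline)
    except MarkerNotFoundError:
        return default


manager = StubManager()
print(read_start_marker(manager, "my_pipeline"))  # 1970-01-01 (first run)
manager.update_marker("my_pipeline", "2024-01-21")
print(read_start_marker(manager, "my_pipeline"))  # 2024-01-21
manager.delete_marker("my_pipeline")  # e.g. to force a full reprocess
print(read_start_marker(manager, "my_pipeline"))  # 1970-01-01
```

Catching only `MarkerNotFoundError` keeps genuine failures such as `MarkerUpdateError` visible instead of silently restarting the pipeline from the default.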

Requirements

  • Python >= 3.13
  • PySpark >= 3.5.4
  • Delta-Spark >= 3.3.0
  • Loguru >= 0.7.3
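Since this release was yanked over its Python requirement, it can be worth checking an environment against the minimums above before installing. The sketch below does that with `importlib.metadata` from the standard library. The helper names are hypothetical, and the deliberately naive parser keeps only the leading digits of each version field, so a pre-release such as `3.5.4rc1` is treated as `3.5.4`; `packaging.version` would be the stricter choice.

```python
import itertools
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import List, Tuple

# Minimum versions copied from the Requirements list above.
MIN_PYTHON = (3, 13)
REQUIREMENTS = {"pyspark": "3.5.4", "delta-spark": "3.3.0", "loguru": "0.7.3"}


def parse(v: str) -> Tuple[int, ...]:
    """Naive parser: keep the leading digits of each dot-separated field."""
    parts = []
    for field in v.split("."):
        digits = "".join(itertools.takewhile(str.isdigit, field))
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets(installed: str, required: str) -> bool:
    # Compare as integer tuples so that 3.10 correctly ranks above 3.5.
    return parse(installed) >= parse(required)


def check_environment() -> List[str]:
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        current = "%d.%d" % sys.version_info[:2]
        problems.append(f"Python {current} is below {MIN_PYTHON[0]}.{MIN_PYTHON[1]}")
    for dist, minimum in REQUIREMENTS.items():
        try:
            installed = version(dist)
        except PackageNotFoundError:
            problems.append(f"{dist} is not installed")
            continue
        if not meets(installed, minimum):
            problems.append(f"{dist} {installed} is below {minimum}")
    return problems


for problem in check_environment():
    print(problem)
```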

Development

  1. Clone the repository
  2. Install development dependencies:
pdm install -G dev
  3. Run tests:
pdm run test
  4. Run all checks:
pdm run all-checks

License

This project is licensed under the MIT License - see the LICENSE file for details.

Download files

Download the file for your platform.

Source Distribution

dbx_marker-1.0.2.tar.gz (5.0 kB)

Uploaded Source

Built Distribution

dbx_marker-1.0.2-py3-none-any.whl (5.8 kB)

Uploaded Python 3

File details

Details for the file dbx_marker-1.0.2.tar.gz.

File metadata

  • Download URL: dbx_marker-1.0.2.tar.gz
  • Upload date:
  • Size: 5.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.0.1 CPython/3.12.8

File hashes

Hashes for dbx_marker-1.0.2.tar.gz
Algorithm Hash digest
SHA256 9ad85fa25d69566c716dd7505be32acb989625d7c987c2f8bcd09f2aaf95d015
MD5 556062cd5e755d0477d936ec6d719810
BLAKE2b-256 c1044d6458f1774ad91335b527d3623788af84016c9e7f118feb57656860d414
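To check a downloaded archive against the SHA256 digest published above, hash the file locally and compare the results. A minimal sketch using the standard library's `hashlib`; the local file name is an assumption about where you saved the download, and the helper names are hypothetical.

```python
import hashlib
from pathlib import Path

# Published SHA256 for dbx_marker-1.0.2.tar.gz (copied from the table above).
EXPECTED_SHA256 = "9ad85fa25d69566c716dd7505be32acb989625d7c987c2f8bcd09f2aaf95d015"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Stream the file in chunks so large downloads don't need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    # hexdigest() is lower-case, so normalise the expected value before comparing.
    return sha256_of(path) == expected.lower()


archive = Path("dbx_marker-1.0.2.tar.gz")  # assumed download location
if archive.exists():
    print("OK" if verify(archive) else "MISMATCH")
```

The same check applies to the wheel by swapping in its file name and published SHA256 digest.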

Provenance

The following attestation bundles were made for dbx_marker-1.0.2.tar.gz:

Publisher: release.yml on jelther/dbx-marker

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file dbx_marker-1.0.2-py3-none-any.whl.

File metadata

  • Download URL: dbx_marker-1.0.2-py3-none-any.whl
  • Upload date:
  • Size: 5.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.0.1 CPython/3.12.8

File hashes

Hashes for dbx_marker-1.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 56806587db97b79cc3f7f1be913146fa22de4cb06df614bf6575641885ad60f5
MD5 3e610a45426652bfec70dbe1750b234b
BLAKE2b-256 74536de487ed4328d2d79c9e9e7977ae53646d2a8f980ceec7cd5fe1c2e0a6b7

Provenance

The following attestation bundles were made for dbx_marker-1.0.2-py3-none-any.whl:

Publisher: release.yml on jelther/dbx-marker

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
