
dlt_utils

PyPI version · Python 3.13+ · License: MIT

Shared utilities for dlt data pipelines with multi-company support.

Features

  • PartitionedIncremental: Incremental state tracking per partition key (e.g., company_id)
  • Date utilities: Generate (year, week) and (year, month) tuples for time-based partitioning
  • Schema utilities: Ensure tables exist in destination database

Installation

# From PyPI
pip install dlt_utils

# For development
pip install -e ".[dev]"

Usage

PartitionedIncremental

Track incremental state per company (or any partition key):

import dlt
from dlt_utils import PartitionedIncremental

@dlt.resource
def sync_resource():
    state = dlt.current.resource_state()
    inc = PartitionedIncremental(
        state=state,
        state_key="sequences",
        cursor_path="sequenceNumber",
        initial_value=0,
    )

    for company_id in ["company_a", "company_b"]:
        start_seq = inc.get_last_value(company_id)
        for record in fetch_data(company_id, since=start_seq):
            inc.track(company_id, record["sequenceNumber"])
            yield record
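Under the hood, per-partition tracking amounts to keeping the highest cursor value seen for each partition key inside the resource state. Here is a minimal sketch of that idea, assuming `PartitionedIncremental` behaves this way (this is not the library's actual implementation):

```python
class PartitionedIncrementalSketch:
    """Sketch: track the max cursor value seen per partition key."""

    def __init__(self, state, state_key, initial_value=0):
        # state is any mutable mapping (the dlt resource state in practice)
        self.partitions = state.setdefault(state_key, {})
        self.initial_value = initial_value

    def get_last_value(self, partition_key):
        # Where to resume extraction for this partition
        return self.partitions.get(partition_key, self.initial_value)

    def track(self, partition_key, cursor_value):
        # Advance the stored cursor only if this record is newer
        if cursor_value > self.get_last_value(partition_key):
            self.partitions[partition_key] = cursor_value


state = {}
inc = PartitionedIncrementalSketch(state, "sequences")
inc.track("company_a", 5)
inc.track("company_a", 3)  # older record: cursor stays at 5
inc.track("company_b", 7)
```

Because each partition's cursor lives under its own key, a failed sync for one company never rolls back progress recorded for another.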

Date utilities

Generate time periods for partitioned data extraction:

from dlt_utils import generate_year_weeks, generate_year_months

# Generate weeks from 2024 to now + 52 weeks
weeks = generate_year_weeks(start_year=2024)
# [(2024, 1), (2024, 2), ..., (2025, 52)]

# Generate months from October 2024 to February 2025
months = generate_year_months(2024, 10, 2025, 2)
# [(2024, 10), (2024, 11), (2024, 12), (2025, 1), (2025, 2)]
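If you need a variant the library doesn't cover, the month generator is straightforward to hand-roll. A sketch that reproduces the documented output (the `generate_year_months` signature is assumed from the example above):

```python
def year_months(start_year, start_month, end_year, end_month):
    """Yield (year, month) tuples, inclusive of both endpoints."""
    year, month = start_year, start_month
    while (year, month) <= (end_year, end_month):  # tuples compare lexicographically
        yield (year, month)
        month += 1
        if month > 12:
            year, month = year + 1, 1

months = list(year_months(2024, 10, 2025, 2))
# [(2024, 10), (2024, 11), (2024, 12), (2025, 1), (2025, 2)]
```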

Schema utilities

Ensure tables exist before running pipeline:

from dlt_utils import ensure_all_tables_exist, ensure_tables_for_resources

# Create all tables from schema
ensure_all_tables_exist(pipeline)

# Create only specific resource tables (including child tables)
ensure_tables_for_resources(pipeline, ["trade_items", "organizations"])
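Conceptually, "ensure tables exist" boils down to issuing an idempotent `CREATE TABLE IF NOT EXISTS` per table in the schema. An illustrative sketch against `sqlite3` (not the library's implementation; the table definitions are invented):

```python
import sqlite3

schema = {
    "trade_items": "id INTEGER PRIMARY KEY, name TEXT",
    "organizations": "id INTEGER PRIMARY KEY, country TEXT",
}

def ensure_tables(conn, schema):
    # Idempotent: safe to call before every pipeline run
    for table, columns in schema.items():
        conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({columns})")

conn = sqlite3.connect(":memory:")
ensure_tables(conn, schema)
ensure_tables(conn, schema)  # second call is a no-op
tables = {row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")}
```

The idempotency is what makes this safe to wire into a pipeline's startup path.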

Development

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run linter
ruff check dlt_utils/

CI/CD Pipeline

The pipeline runs automatically on:

  • Push to main: runs the tests
  • Tag with a v* prefix: runs the tests and publishes to PyPI

Pipeline Workflow

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   Push/Tag      │────▶│   Test Stage    │────▶│  Publish Stage  │
│   to repo       │     │   (always)      │     │  (tags only)    │
└─────────────────┘     └─────────────────┘     └─────────────────┘
                              │                        │
                              ▼                        ▼
                        - Install deps           - Build package
                        - Run pytest             - Upload to PyPI
                        - Publish results

Releasing a New Version

Option 1: Via the Git CLI

# 1. Make sure all changes are committed
git add .
git commit -m "Release v0.2.0"

# 2. Create a tag
git tag v0.2.0

# 3. Push the commit and the tag to the remote
git push origin main
git push origin v0.2.0

Option 2: Via Azure DevOps

  1. Go to Repos → Tags
  2. Click New tag
  3. Fill in:
    • Name: v0.2.0 (must start with v)
    • Based on: select the commit or branch (e.g. main)
    • Description: optional, e.g. "Added new feature X"
  4. Click Create

The pipeline is triggered automatically and publishes to PyPI.

Versioning

Use Semantic Versioning:

  • vMAJOR.MINOR.PATCH (e.g. v1.2.3)
  • MAJOR: breaking changes
  • MINOR: new features (backwards compatible)
  • PATCH: bug fixes
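Note that version tags order correctly only when compared as integer tuples, not as strings. A quick illustration (`parse_tag` is a hypothetical helper, not part of dlt_utils):

```python
def parse_tag(tag):
    # "v1.2.3" -> (1, 2, 3); assumes a well-formed vMAJOR.MINOR.PATCH tag
    return tuple(int(part) for part in tag.lstrip("v").split("."))

assert parse_tag("v0.10.0") > parse_tag("v0.2.0")  # tuple comparison is correct
assert "v0.10.0" < "v0.2.0"                        # string comparison is not
```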

⚠️ Important: Don't forget to bump the version in pyproject.toml before tagging!

License

MIT

