
Manifest-driven data loading framework with canonical SQLDB manifests, Airflow helpers, and pluggable private integrations.


dltaf


dltaf is a manifest-driven data loading framework built around three ideas:

  • canonical, reviewable YAML manifests
  • a stable OSS core for generic sources
  • extension registries that let private integrations stay private

The public repository ships a clean stage63-based core with:

  • canonical source.kind: sqldb for relational ingestion
  • built-in mongodb support
  • compatibility aliases for legacy SQL manifests such as sql_database, oracle_custom_sql, and oracle
  • Airflow DAG generation helpers
  • manifest linting, doctoring, scaffolding, and lineage tooling
  • Vault-backed secrets resolution through vault-kv-client

Private connectors such as internal APIs, Kafka-backed flows, or company-specific uploaders are intentionally not bundled into the OSS package. They should live in your monorepo or private package index and plug into the same runner, hook, and infra-check registries.

Why dltaf

  • Manifest-first: pipeline behavior stays diffable and reviewable
  • Canonical SQL model: one public SQL contract, with legacy aliases supported as migration shims
  • Plugin-first: private integrations extend the framework without forking it
  • Airflow-friendly: the same manifest can be linted locally, planned in CI, and executed in DAG wrappers
  • Self-service: example manifests, template generation, and migration guidance ship with the package

Installation

Runtime install:

pip install dltaf

Developer install:

git clone https://github.com/PaulKov/dltaf.git
cd dltaf
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -e .[dev]

Quick start

Validate the canonical SQL example:

dltaf manifest lint --manifest dltaf/examples/manifests/smoke_sqldb_catalog.yaml --allow-filename-mismatch

Render a safe execution plan without side effects:

dltaf manifest run \
  --manifest dltaf/examples/manifests/smoke_sqldb_catalog.yaml \
  --plan

Generate a new public-safe template:

dltaf manifest doctor \
  --template-kind sqldb_query \
  --pipeline-name dlt__oracle__to__clickhouse__raw

Generate Airflow DAG wrappers:

dltaf dags generate --manifests-dir ./manifests --output-dir ./generated_dags

Show lineage:

dltaf lineage show --format mermaid

Canonical built-ins

sqldb

sqldb is the canonical relational source kind.

Use mode: catalog when you want schema-and-table driven extraction:

  • PostgreSQL, MySQL, MSSQL, or other generic SQL databases
  • catalog-level table selection
  • canonical shape under source.catalog
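As a hedged sketch of catalog mode: `source.kind`, `mode`, and the `source.catalog` block come from the contract above, but the keys inside `catalog` (schema and table selection) are illustrative assumptions, not a verbatim schema:

```yaml
source:
  kind: sqldb          # canonical relational source kind
  mode: catalog        # schema-and-table driven extraction
  catalog:             # canonical shape under source.catalog
    schema: public     # illustrative: which schema to scan
    tables:            # illustrative: explicit table selection
      - customers
      - orders
```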

Use mode: query when you want explicit Oracle SQL queries:

  • one or more named queries
  • query files under dltaf/examples/sql/ or your own repo
  • Oracle-specific options under source.dialect_options
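A hedged sketch of query mode: `source.dialect_options` is named above, but the `queries` list, its `name`/`sql_file` keys, the example file name, and the `arraysize` option are all illustrative assumptions:

```yaml
source:
  kind: sqldb
  mode: query
  dialect: oracle
  queries:                  # illustrative: one or more named queries
    - name: daily_orders
      sql_file: dltaf/examples/sql/daily_orders.sql  # illustrative path
  dialect_options:          # Oracle-specific options (see above)
    arraysize: 5000         # illustrative option name
```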

mongodb

Use mongodb when you want one or more collections loaded through the bundled generic runtime:

  • explicit collection selection
  • optional table nesting control
  • manifest-level replace/append behavior through run.write_disposition
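A minimal mongodb sketch under the same caveat: `source.kind` and `run.write_disposition` come from the docs above, while the `collections` key and sample names are illustrative assumptions:

```yaml
source:
  kind: mongodb
  collections:               # illustrative key: explicit collection selection
    - users
    - events
run:
  write_disposition: replace # manifest-level replace/append behavior
```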

Compatibility aliases

dltaf still accepts older SQL source kinds as compatibility shims:

  • sql_database -> canonicalized to sqldb + dialect=generic + mode=catalog
  • oracle_custom_sql -> canonicalized to sqldb + dialect=oracle + mode=query
  • oracle -> canonical alias for Oracle query mode

The public recommendation is still to write new manifests directly in canonical sqldb form.
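As a hedged illustration of what the shim does (the exact placement of `dialect` and `mode` inside `source` is an assumption), a legacy manifest and its canonical equivalent:

```yaml
# Legacy form, still accepted as a compatibility shim:
source:
  kind: sql_database

# Canonical equivalent after shimming:
source:
  kind: sqldb
  dialect: generic
  mode: catalog
```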

Private integrations

The OSS core uses three extension registries:

  • runner plugins
  • hook plugins
  • infra-check plugins

You can load private modules either from the environment or directly from a manifest:

run:
  runners:
    plugins:
      - internal.dltaf_plugins.customer_export.runner_plugin
  hooks:
    plugins:
      - internal.dltaf_plugins.shared.hooks
  online_checks:
    plugins:
      - internal.dltaf_plugins.customer_export.infra_checks

Or through environment variables:

export DLT_RUNNER_PLUGINS="internal.dltaf_plugins.customer_export.runner_plugin"
export DLT_HOOK_PLUGINS="internal.dltaf_plugins.shared.hooks"
export DLT_INFRA_CHECK_PLUGINS="internal.dltaf_plugins.customer_export.infra_checks"

This keeps the manifest contract stable even if the private catalog later moves from a monorepo to a private wheel.

The roadmap for evolving this split between OSS core and private integrations lives in ROADMAP.md.

Vault integration

dltaf resolves manifest Vault references through vault-kv-client.

Supported reference forms:

  • vault://mount/path
  • mount:path
  • mapping form with mount_point, path, and optional kv_version
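The three reference shapes above might appear in a manifest like this; the key names (`db_password`, `api_key`, `oracle_credentials`) and mount/path values are illustrative assumptions, while the reference forms themselves are the supported contract:

```yaml
# URI form
db_password: vault://secret/dlt/clickhouse

# mount:path form
api_key: "secret:dlt/api"

# mapping form with mount_point, path, and optional kv_version
oracle_credentials:
  mount_point: secret
  path: dlt/oracle
  kv_version: 2
```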

That contract is intentionally simple and portable across local runs, CI, and Airflow.

Shipped examples

Canonical examples live under dltaf/examples/manifests/:

  • smoke_sqldb_catalog.yaml
  • smoke_sqldb_query.yaml
  • smoke_mongodb.yaml

Compatibility examples are also shipped for migration and search continuity:

  • smoke_sql_database_catalog.yaml
  • smoke_oracle_custom_sql.yaml
  • smoke_mongodb_catalog.yaml

All examples are sanitized. Replace the sample Vault refs and connection overrides with values from your own environment.

Documentation

Full docs live on GitHub Pages.

Development

Run the standard checks locally:

ruff check .
pytest
python -m build
mkdocs build --strict

Roadmap

The near-term focus is:

  • keep sqldb and mongodb boring, explicit, and stable
  • improve self-service docs, templates, and examples
  • make private registries easy to adopt from a monorepo or a private package index
  • preserve compatibility aliases long enough for staged migrations without surprise breakage

License

Apache-2.0. See LICENSE.
