dltaf

Manifest-driven data loading framework with canonical SQLDB manifests, Airflow helpers, and pluggable private integrations.

dltaf is a manifest-driven data loading framework built around three ideas:

  • canonical, reviewable YAML manifests
  • a stable OSS core for generic sources
  • extension registries that let private integrations stay private

The public repository ships a clean stage63-based core with:

  • canonical source.kind: sqldb for relational ingestion
  • built-in mongodb support
  • compatibility aliases for legacy SQL manifests such as sql_database, oracle_custom_sql, and oracle
  • Airflow DAG generation helpers
  • manifest linting, doctoring, scaffolding, and lineage tooling
  • Vault-backed secrets resolution through vault-kv-client

Private connectors such as internal APIs, Kafka-backed flows, or company-specific uploaders are intentionally not bundled into the OSS package. They should live in your monorepo or private package index and plug into the same runner, hook, and infra-check registries.

Why dltaf

  • Manifest-first: pipeline behavior stays diffable and reviewable
  • Canonical SQL model: one public SQL contract, with legacy aliases supported as migration shims
  • Plugin-first: private integrations extend the framework without forking it
  • Airflow-friendly: the same manifest can be linted locally, planned in CI, and executed in DAG wrappers
  • Self-service: example manifests, template generation, and migration guidance ship with the package

Installation

Lean core install for linting, planning, docs, template generation, and non-runtime tooling:

pip install dltaf

Common runtime profiles:

# Generic Airflow bridge + DAG builder helpers
pip install "dltaf[airflow]"

# ClickHouse destination + Vault-backed private plugin flows
pip install "dltaf[runtime]"

# PostgreSQL or other SQLDB catalog ingestion into ClickHouse
pip install "dltaf[clickhouse,sqldb,postgres]"

# Oracle query-driven ingestion into ClickHouse
pip install "dltaf[clickhouse,sqldb,oracle]"

# MongoDB ingestion into ClickHouse
pip install "dltaf[clickhouse,mongodb]"

Developer install:

git clone https://github.com/PaulKov/dltaf.git
cd dltaf
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -e ".[dev]"

The public package is intentionally split into extras so Airflow PythonVirtualenvOperator tasks and CI smoke jobs do not have to install Oracle, MongoDB, Vault, and every SQL driver when they only need one runtime slice.

Quick start

Validate the canonical SQL example:

dltaf manifest lint --manifest dltaf/examples/manifests/smoke_sqldb_catalog.yaml --allow-filename-mismatch

Render a safe execution plan without side effects:

dltaf manifest run \
  --manifest dltaf/examples/manifests/smoke_sqldb_catalog.yaml \
  --plan

Generate a new public-safe template:

dltaf manifest doctor \
  --template-kind sqldb_query \
  --pipeline-name dlt__oracle__to__clickhouse__raw

Generate Airflow DAG wrappers:

dltaf dags generate --manifests-dir ./manifests --output-dir ./generated_dags

Show lineage:

dltaf lineage show --format mermaid

Canonical built-ins

sqldb

sqldb is the canonical relational source kind.

Use mode: catalog when you want schema-and-table-driven extraction (sketched after the list):

  • PostgreSQL, MySQL, MSSQL, or other generic SQL databases
  • catalog-level table selection
  • canonical shape under source.catalog
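
A minimal catalog-mode sketch. Only source.kind, mode, source.catalog, the connections/vault block, and run.write_disposition are confirmed by this page; the schema and table selection fields are illustrative assumptions:

source:
  kind: sqldb
  mode: catalog
  catalog:
    schema: public            # assumed field: schema to extract from
    tables:                   # assumed field: explicit table selection
      - customers
      - orders
connections:
  source:
    kind: postgres
    vault:
      ref: company:postgres/example   # replace with your own Vault ref
      kv_version: "2"
run:
  write_disposition: append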

Use mode: query when you want explicit Oracle SQL queries (sketched after the list):

  • one or more named queries
  • query files under dltaf/examples/sql/ or your own repo
  • Oracle-specific options under source.dialect_options
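
A minimal query-mode sketch. source.kind, mode, the dialect=oracle mapping, and source.dialect_options come from this page; the queries mapping, the dialect key placement, and the example file name are illustrative assumptions:

source:
  kind: sqldb
  dialect: oracle             # assumed key placement, per the alias mapping below
  mode: query
  dialect_options:            # Oracle-specific options (option names assumed)
    arraysize: 5000
  queries:                    # assumed shape: one or more named queries
    daily_orders:
      file: dltaf/examples/sql/daily_orders.sql   # hypothetical query file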

mongodb

Use mongodb when you want one or more collections loaded through the bundled generic runtime (sketched after the list):

  • explicit collection selection
  • optional table nesting control
  • manifest-level replace/append behavior through run.write_disposition
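
A minimal mongodb sketch. source.kind: mongodb and run.write_disposition are documented above; the collections list and the nesting flag are assumed field names:

source:
  kind: mongodb
  collections:                # assumed field: explicit collection selection
    - users
    - events
  nested_tables: false        # assumed knob for table nesting control
run:
  write_disposition: replace  # manifest-level replace/append behavior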

Compatibility aliases

dltaf still accepts older SQL source kinds as compatibility shims:

  • sql_database -> canonicalized to sqldb + dialect=generic + mode=catalog
  • oracle_custom_sql -> canonicalized to sqldb + dialect=oracle + mode=query
  • oracle -> canonical alias for Oracle query mode

The public recommendation is still to write new manifests directly in canonical sqldb form.
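
As a sketch of what the shims do, a legacy manifest and its canonical equivalent might look like this (the dialect key placement is an assumption based on the mapping above):

# Legacy alias, still accepted:
source:
  kind: sql_database

# Canonical form after normalization:
source:
  kind: sqldb
  dialect: generic
  mode: catalog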

Private integrations

The OSS core uses three extension registries:

  • runner plugins
  • hook plugins
  • infra-check plugins

You can load private modules either from the environment or directly from a manifest:

run:
  runners:
    plugins:
      - internal.dltaf_plugins.customer_export.runner_plugin
  hooks:
    plugins:
      - internal.dltaf_plugins.shared.hooks
  online_checks:
    plugins:
      - internal.dltaf_plugins.customer_export.infra_checks

Or through environment variables:

export DLT_RUNNER_PLUGINS="internal.dltaf_plugins.customer_export.runner_plugin"
export DLT_HOOK_PLUGINS="internal.dltaf_plugins.shared.hooks"
export DLT_INFRA_CHECK_PLUGINS="internal.dltaf_plugins.customer_export.infra_checks"

This keeps the manifest contract stable even if the private catalog later moves from a monorepo to a private wheel.

The roadmap for evolving this split between OSS core and private integrations lives in ROADMAP.md.

Vault integration

dltaf resolves manifest Vault references through vault-kv-client.

Supported reference forms (the string forms are sketched after the list):

  • vault://mount/path
  • mount:path
  • mapping form with mount_point, path, and optional kv_version
  • mapping form with ref plus explicit kv_version
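
For example, the two string forms might appear directly as the vault value; a sketch, assuming the surrounding connections layout from the KV v2 example below:

connections:
  source:
    kind: postgres
    vault: vault://company/postgres/example   # URI form
    # vault: company:postgres/example         # equivalent mount:path form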

Recommended explicit KV v2 pattern:

connections:
  source:
    kind: postgres
    vault:
      ref: ${ENV:POSTGRES__VAULT_REF|company:postgres/example}
      kv_version: "2"

That contract is intentionally simple and portable across local runs, CI, and Airflow.

Shipped examples

Canonical examples live under dltaf/examples/manifests/:

  • smoke_sqldb_catalog.yaml
  • smoke_sqldb_query.yaml
  • smoke_mongodb.yaml

Compatibility examples are also shipped for migration and search continuity:

  • smoke_sql_database_catalog.yaml
  • smoke_oracle_custom_sql.yaml
  • smoke_mongodb_catalog.yaml

All examples are sanitized. Replace the sample Vault refs and connection overrides with values from your own environment.

Documentation

Full docs live on GitHub Pages.

Development

Run the standard checks locally:

ruff check .
pytest
python -m build
mkdocs build --strict

Roadmap

The near-term focus is:

  • keep sqldb and mongodb boring, explicit, and stable
  • improve self-service docs, templates, and examples
  • make private registries easy to adopt from a monorepo or a private package index
  • preserve compatibility aliases long enough for staged migrations without surprise breakage

License

Apache-2.0. See LICENSE.

