Manifest-driven data loading framework with canonical SQLDB manifests, Airflow helpers, and pluggable private integrations.
dltaf
dltaf is a manifest-driven data loading framework built around three ideas:
- canonical, reviewable YAML manifests
- a stable OSS core for generic sources
- extension registries that let private integrations stay private
The public repository ships a clean stage63-based core with:
- canonical source.kind: sqldb for relational ingestion
- built-in mongodb support
- compatibility aliases for legacy SQL manifests such as sql_database, oracle_custom_sql, and oracle
- Airflow DAG generation helpers
- manifest linting, doctoring, scaffolding, and lineage tooling
- Vault-backed secrets resolution through vault-kv-client
Private connectors such as internal APIs, Kafka-backed flows, or company-specific uploaders are intentionally not bundled into the OSS package. They should live in your monorepo or private package index and plug into the same runner, hook, and infra-check registries.
Why dltaf
- Manifest-first: pipeline behavior stays diffable and reviewable
- Canonical SQL model: one public SQL contract, with legacy aliases supported as migration shims
- Plugin-first: private integrations extend the framework without forking it
- Airflow-friendly: the same manifest can be linted locally, planned in CI, and executed in DAG wrappers
- Self-service: example manifests, template generation, and migration guidance ship with the package
Installation
Lean core install for linting, planning, docs, template generation, and non-runtime tooling:
pip install dltaf
Common runtime profiles:
# Generic Airflow bridge + DAG builder helpers
pip install "dltaf[airflow]"
# ClickHouse destination + Vault-backed private plugin flows
pip install "dltaf[runtime]"
# PostgreSQL or other SQLDB catalog ingestion into ClickHouse
pip install "dltaf[clickhouse,sqldb,postgres]"
# Oracle query-driven ingestion into ClickHouse
pip install "dltaf[clickhouse,sqldb,oracle]"
# MongoDB ingestion into ClickHouse
pip install "dltaf[clickhouse,mongodb]"
Developer install:
git clone https://github.com/PaulKov/dltaf.git
cd dltaf
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -e ".[dev]"
The public package is intentionally split into extras so Airflow PythonVirtualenvOperator
tasks and CI smoke jobs do not have to install Oracle, MongoDB, Vault, and every SQL driver
when they only need one runtime slice.
Quick start
Validate the canonical SQL example:
dltaf manifest lint --manifest dltaf/examples/manifests/smoke_sqldb_catalog.yaml --allow-filename-mismatch
Render a safe execution plan without side effects:
dltaf manifest run \
--manifest dltaf/examples/manifests/smoke_sqldb_catalog.yaml \
--plan
Generate a new public-safe template:
dltaf manifest doctor \
--template-kind sqldb_query \
--pipeline-name dlt__oracle__to__clickhouse__raw
Generate Airflow DAG wrappers:
dltaf dags generate --manifests-dir ./manifests --output-dir ./generated_dags
Show lineage:
dltaf lineage show --format mermaid
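For CI jobs, the lint and plan commands above can be wrapped in a small script. The following is a minimal sketch, not part of the package: it shells out to the dltaf CLI using only the flags shown in this section, the helper names lint_command, plan_command, and run_checks are hypothetical, and it assumes the CLI exits with a non-zero status when a check fails.

```python
import subprocess
from typing import List


def lint_command(manifest: str, allow_filename_mismatch: bool = False) -> List[str]:
    """Build the argv for `dltaf manifest lint` (flags as shown above)."""
    argv = ["dltaf", "manifest", "lint", "--manifest", manifest]
    if allow_filename_mismatch:
        argv.append("--allow-filename-mismatch")
    return argv


def plan_command(manifest: str) -> List[str]:
    """Build the argv for rendering an execution plan without side effects."""
    return ["dltaf", "manifest", "run", "--manifest", manifest, "--plan"]


def run_checks(manifest: str) -> None:
    """Lint first, then render the plan.

    Assumption: the CLI returns a non-zero exit code on failure, in which
    case subprocess raises CalledProcessError and the CI job fails.
    """
    for argv in (lint_command(manifest), plan_command(manifest)):
        subprocess.run(argv, check=True)
```

Calling run_checks("dltaf/examples/manifests/smoke_sqldb_catalog.yaml") from a CI step would run both commands in order and stop at the first failure.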
Canonical built-ins
sqldb
sqldb is the canonical relational source kind.
Use mode: catalog when you want schema-and-table driven extraction:
- PostgreSQL, MySQL, MSSQL, or other generic SQL databases
- catalog-level table selection
- canonical shape under source.catalog
Use mode: query when you want explicit Oracle SQL queries:
- one or more named queries
- query files under dltaf/examples/sql/ or your own repo
- Oracle-specific options under source.dialect_options
mongodb
Use mongodb when you want one or more collections loaded through the bundled generic runtime:
- explicit collection selection
- optional table nesting control
- manifest-level replace/append behavior through run.write_disposition
Compatibility aliases
dltaf still accepts older SQL source kinds as compatibility shims:
- sql_database -> canonicalized to sqldb + dialect=generic + mode=catalog
- oracle_custom_sql -> canonicalized to sqldb + dialect=oracle + mode=query
- oracle -> canonical alias for Oracle query mode
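The alias mapping can be pictured as a small lookup table. The sketch below is illustrative and is not dltaf's internal code: canonicalize_source is a hypothetical helper, it assumes dialect and mode sit next to kind in the source block, and it assumes the oracle alias expands to the same dialect and mode as oracle_custom_sql.

```python
from typing import Any, Dict

# Alias table taken from the list above. The "oracle" entry is an assumption:
# the text only says it is an alias for Oracle query mode.
_ALIASES: Dict[str, Dict[str, str]] = {
    "sql_database": {"dialect": "generic", "mode": "catalog"},
    "oracle_custom_sql": {"dialect": "oracle", "mode": "query"},
    "oracle": {"dialect": "oracle", "mode": "query"},
}


def canonicalize_source(source: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `source` with a legacy kind rewritten to `sqldb`."""
    kind = source.get("kind")
    if kind not in _ALIASES:
        # Already canonical, or not a SQL alias at all (for example mongodb).
        return dict(source)
    canonical = dict(source)
    canonical["kind"] = "sqldb"
    # Keep values the manifest author set explicitly; fill in only the rest.
    for key, value in _ALIASES[kind].items():
        canonical.setdefault(key, value)
    return canonical
```

Returning a copy rather than mutating the input keeps the original manifest available for diffing against its canonical form during a migration.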
The public recommendation is still to write new manifests directly in canonical sqldb form.
Private integrations
The OSS core uses three extension registries:
- runner plugins
- hook plugins
- infra-check plugins
You can load private modules either from the environment or directly from a manifest:
run:
  runners:
    plugins:
      - internal.dltaf_plugins.customer_export.runner_plugin
  hooks:
    plugins:
      - internal.dltaf_plugins.shared.hooks
  online_checks:
    plugins:
      - internal.dltaf_plugins.customer_export.infra_checks
Or through environment variables:
export DLT_RUNNER_PLUGINS="internal.dltaf_plugins.customer_export.runner_plugin"
export DLT_HOOK_PLUGINS="internal.dltaf_plugins.shared.hooks"
export DLT_INFRA_CHECK_PLUGINS="internal.dltaf_plugins.customer_export.infra_checks"
This keeps the manifest contract stable even if the private catalog later moves from a monorepo to a private wheel.
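Because plugins are referenced by dotted module path, the same string works whether the module comes from a monorepo checkout or an installed wheel. The sketch below shows one way such a list could be resolved from an environment variable. It is not dltaf's loader: load_plugin_modules is a hypothetical name, and the comma as a separator between multiple modules is an assumption.

```python
import importlib
import os
from types import ModuleType
from typing import List


def load_plugin_modules(env_var: str) -> List[ModuleType]:
    """Import every dotted module path listed in the environment variable.

    Assumption: multiple module paths are separated by commas. An unset or
    empty variable yields an empty list.
    """
    raw = os.environ.get(env_var, "")
    names = [name.strip() for name in raw.split(",") if name.strip()]
    # import_module raises ModuleNotFoundError if a path cannot be resolved.
    return [importlib.import_module(name) for name in names]
```

For example, load_plugin_modules("DLT_RUNNER_PLUGINS") would import internal.dltaf_plugins.customer_export.runner_plugin when the variable is exported as shown above and that package is importable.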
The roadmap for evolving this split between OSS core and private integrations lives in ROADMAP.md.
Vault integration
dltaf resolves manifest Vault references through vault-kv-client.
Supported reference forms:
- vault://mount/path
- mount:path
- mapping form with mount_point, path, and optional kv_version
- mapping form with ref plus explicit kv_version
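The four forms carry the same information: a mount point, a path, and optionally a KV version. The following sketch normalizes them into one shape to make that concrete. It is a hypothetical illustration, not the parser used by dltaf or vault-kv-client; parse_vault_ref and the returned dictionary keys are assumptions.

```python
from collections.abc import Mapping
from typing import Any, Dict, Optional, Union


def parse_vault_ref(ref: Union[str, Mapping]) -> Dict[str, Optional[str]]:
    """Normalize a reference into mount_point, path, and kv_version."""
    if isinstance(ref, Mapping):
        raw_version: Any = ref.get("kv_version")
        kv_version = None if raw_version is None else str(raw_version)
        if "ref" in ref:
            # Mapping form with `ref`: parse the string, then attach the version.
            parsed = parse_vault_ref(str(ref["ref"]))
            parsed["kv_version"] = kv_version
            return parsed
        # Mapping form with explicit mount_point and path.
        return {
            "mount_point": str(ref["mount_point"]),
            "path": str(ref["path"]),
            "kv_version": kv_version,
        }
    if ref.startswith("vault://"):
        # vault://mount/path: the first segment is the mount point.
        mount, sep, path = ref[len("vault://"):].partition("/")
    else:
        # mount:path: split on the first colon only.
        mount, sep, path = ref.partition(":")
    if not (mount and sep and path):
        raise ValueError(f"unrecognized Vault reference: {ref!r}")
    return {"mount_point": mount, "path": path, "kv_version": None}
```

Under these assumptions, vault://company/postgres/example and company:postgres/example resolve to the same mount point and path, which is why the string forms are interchangeable in a manifest.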
Recommended explicit KV v2 pattern:
connections:
  source:
    kind: postgres
    vault:
      ref: ${ENV:POSTGRES__VAULT_REF|company:postgres/example}
      kv_version: "2"
That contract is intentionally simple and portable across local runs, CI, and Airflow.
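The ref value in the example uses an ${ENV:NAME|default} placeholder. The sketch below shows how such a placeholder could be expanded, assuming the part after the pipe is a fallback used when the variable is not set. That reading is inferred from the example rather than confirmed by the documentation, and expand_env is a hypothetical helper, not a dltaf function.

```python
import os
import re
from typing import Mapping, Optional

# Matches ${ENV:NAME} and ${ENV:NAME|default}; the default may be empty.
_PLACEHOLDER = re.compile(r"\$\{ENV:([A-Za-z_][A-Za-z0-9_]*)(?:\|([^}]*))?\}")


def expand_env(value: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Replace each placeholder with the variable's value or its default."""
    lookup = os.environ if env is None else env

    def _replace(match: "re.Match[str]") -> str:
        name, default = match.group(1), match.group(2)
        if name in lookup:
            return lookup[name]
        if default is not None:
            return default
        raise KeyError(f"environment variable {name} is not set and has no default")

    return _PLACEHOLDER.sub(_replace, value)
```

With POSTGRES__VAULT_REF unset, the example value expands to company:postgres/example; setting the variable in CI or Airflow overrides it without touching the manifest.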
Shipped examples
Canonical examples live under dltaf/examples/manifests/:
- smoke_sqldb_catalog.yaml
- smoke_sqldb_query.yaml
- smoke_mongodb.yaml
Compatibility examples are also shipped for migration and search continuity:
- smoke_sql_database_catalog.yaml
- smoke_oracle_custom_sql.yaml
- smoke_mongodb_catalog.yaml
All examples are sanitized. Replace the sample Vault refs and connection overrides with values from your own environment.
Documentation
Full docs live on GitHub Pages:
- Docs: https://paulkov.github.io/dltaf/
- Getting started: https://paulkov.github.io/dltaf/getting-started/
- Installation profiles: https://paulkov.github.io/dltaf/installation-profiles/
- Examples: https://paulkov.github.io/dltaf/examples/
- Plugins: https://paulkov.github.io/dltaf/plugins/
- Airflow: https://paulkov.github.io/dltaf/airflow/
Development
Run the standard checks locally:
ruff check .
pytest
python -m build
mkdocs build --strict
Roadmap
The near-term focus is:
- keep sqldb and mongodb boring, explicit, and stable
- improve self-service docs, templates, and examples
- make private registries easy to adopt from a monorepo or a private package index
- preserve compatibility aliases long enough for staged migrations without surprise breakage
License
Apache-2.0. See LICENSE.