
Manifest-driven data loading toolkit with Airflow helpers and pluggable source integrations.


dltaf

dltaf is a manifest-driven toolkit for building repeatable data-loading pipelines with dlt, optional Airflow DAG generation, and a plugin-first extension model.

The public package ships a clean OSS core:

  • built-in source kinds for oracle_custom_sql, sql_database, and mongodb
  • a unified source plugin registry
  • Airflow runtime helpers for local, packaged, and virtualenv execution
  • a Vault integration layer powered by vault-kv-client
  • documentation and examples that stay safe to publish

Private integrations are intentionally not bundled into this repository. They can live in your monorepo, a private package index, or both, while still using the same source.kind contract.

Why dltaf

  • YAML-first: manifests stay readable and reviewable
  • plugin-first: internal connectors plug in without forking the OSS core
  • Airflow-friendly: isolated virtualenv tasks can resolve both the core package and private plugin requirements
  • Vault-ready: one consistent secrets contract for source and destination credentials
  • self-service: examples, docs, CLI inspection tools, and smoke-friendly workflows are included

Installation

Runtime install:

pip install dltaf

Developer install:

git clone https://github.com/PaulKov/dltaf.git
cd dltaf
python3 -m venv .venv
source .venv/bin/activate
pip install -U pip
pip install -e .[dev]

Quick start

Validate an example manifest:

dltaf-run --manifest dltaf/examples/manifests/smoke_sql_database_catalog.yaml --validate-only

Inspect available plugins:

dltaf plugins list
dltaf plugins inspect sql_database
dltaf plugins doctor --manifest dltaf/examples/manifests/smoke_mongodb_catalog.yaml

Generate Airflow DAG files from a manifests directory:

dltaf-generate-dags --manifests-dir ./manifests --output-dir ./generated_dags

Render lineage for a manifests directory:

dltaf-show-lineage --manifests-dir ./manifests --format mermaid

Built-in source kinds

oracle_custom_sql

Use explicit SQL files and per-query metadata:

  • one manifest can drive multiple queries
  • merge mode can enforce primary_key
  • SQL files stay separate from YAML
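A minimal manifest sketch of those points (only `kind`, `primary_key`, and the merge behavior come from the bullets above; the other keys are illustrative, so check the sanitized examples under dltaf/examples/manifests for the real schema):

```yaml
# Illustrative sketch only -- consult dltaf/examples/manifests for the real layout.
source:
  kind: oracle_custom_sql
  queries:                          # hypothetical key: one manifest, multiple queries
    - name: customers
      sql_file: sql/customers.sql   # SQL stays in its own file, separate from the YAML
      write_disposition: merge
      primary_key: customer_id      # merge mode can enforce a primary key
```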

sql_database

Use dlt.sources.sql_database in either:

  • single-schema mode with schema + tables
  • multi-schema mode with schemas: {schema_name: {tables: [...]}}
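The two modes can be sketched side by side (key names follow the bullets above; surrounding structure is illustrative):

```yaml
# Illustrative sketch only -- single-schema mode:
source:
  kind: sql_database
  schema: sales
  tables: [orders, order_items]
---
# Multi-schema mode:
source:
  kind: sql_database
  schemas:
    sales:
      tables: [orders, order_items]
    finance:
      tables: [invoices]
```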

mongodb

Use the bundled MongoDB runtime for:

  • one or many collections
  • optional collection filters and nesting control
  • replace/append behavior through the manifest run section
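A rough manifest sketch of those points (the `collections` and `run` keys are illustrative; see dltaf/examples/manifests/smoke_mongodb_catalog.yaml for the real layout):

```yaml
# Illustrative sketch only.
source:
  kind: mongodb
  collections: [orders, customers]   # one or many collections
run:
  write_disposition: replace         # replace/append comes from the run section
```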

Private plugin UX

The public core is designed so private connectors can stay private without degrading developer experience.

Option 1: local monorepo catalog

Point dltaf to a local plugin catalog:

export DLTAF_PLUGIN_PATHS="/path/to/monorepo/internal/dltaf_plugins"
dltaf plugins list

This is the lowest-friction rollout path when your private catalog still lives inside an existing monorepo.

Option 2: importable plugin modules

Point dltaf to importable module names:

export DLTAF_PLUGIN_MODULES="company_private_plugins,team_connectors"
dltaf plugins list

Option 3: installed private packages

Install a private package that exposes entry points in the dltaf.plugins group. dltaf will discover them automatically.

Plugin contract

Every plugin registers one or more SourcePlugin objects with:

  • kind
  • validate(manifest)
  • build_runtime_env(manifest) if needed
  • run(manifest)

Canonical recommendation for private kinds:

internal.customer_export
internal.partner_events
company.some_connector
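As a rough sketch of the contract (the actual SourcePlugin base class and registration hook ship with dltaf and may differ; the plain class below only mirrors the documented members):

```python
# Illustrative sketch of the SourcePlugin contract described above.
# The real dltaf base class and registration API may differ; the member
# names mirror the documented contract: kind, validate, build_runtime_env, run.

class CustomerExportPlugin:
    kind = "internal.customer_export"  # canonical "namespace.connector" form

    def validate(self, manifest: dict) -> None:
        # Fail fast on manifest problems before any runtime work happens.
        if "source" not in manifest:
            raise ValueError("manifest is missing a 'source' section")

    def build_runtime_env(self, manifest: dict) -> dict:
        # Optional: extra environment variables for isolated execution.
        return {"CUSTOMER_EXPORT_BATCH_SIZE": "500"}

    def run(self, manifest: dict) -> None:
        # Execute the actual load described by the manifest.
        ...
```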

Scaffold a new plugin

dltaf scaffold plugin --kind internal.customer_export --output-dir ./internal/dltaf_plugins

Airflow

dltaf ships Airflow helpers for:

  • generating DAGs from manifests
  • loading run_manifest() inside standard or virtualenv tasks
  • propagating plugin paths, plugin modules, and plugin-specific requirements to isolated runtimes

Useful runtime environment variables:

  • DLTAF_PACKAGE_ROOT
  • DLTAF_PLUGIN_PATHS
  • DLTAF_PLUGIN_MODULES
  • DLTAF_PLUGIN_REQUIREMENTS
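In a standard Python task, propagating plugin discovery settings into an isolated runtime can be as simple as setting these variables before the load runs (a minimal sketch; the paths, module, and requirement pin below are placeholders for your environment):

```python
import os

# Point isolated tasks at the same plugin catalog the scheduler sees.
# All values are placeholders for your environment.
os.environ["DLTAF_PLUGIN_PATHS"] = "/opt/airflow/plugins/dltaf_plugins"
os.environ["DLTAF_PLUGIN_MODULES"] = "company_private_plugins"
os.environ["DLTAF_PLUGIN_REQUIREMENTS"] = "company-private-plugins==1.0.0"
```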

See the full guide on GitHub Pages.

Vault integration

dltaf resolves manifest Vault references through vault-kv-client.

Supported reference forms:

  • vault://mount/path
  • mount:path
  • mapping form with mount_point, path, and optional kv_version

This keeps the secrets contract stable across local runs, CI, and Airflow.
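The three reference forms can be shown side by side (the keys under which a manifest accepts them are not specified above, so the field names here are placeholders; only the reference shapes come from the list):

```yaml
# Placeholder manifest fragment -- only the reference shapes are documented.
db_password: vault://secret/data-platform/oracle   # vault://mount/path form
api_token: secret:data-platform/api                # mount:path form
service_account:                                   # mapping form
  mount_point: secret
  path: data-platform/service-account
  kv_version: 2
```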

Examples

The repository ships sanitized examples under dltaf/examples/:

  • smoke_oracle_custom_sql.yaml
  • smoke_sql_database_catalog.yaml
  • smoke_mongodb_catalog.yaml

They are intentionally generic. Replace the sample Vault refs and connection settings with your own environment before running them against a live system.

Documentation

Full docs live on GitHub Pages.

Development

Run the standard checks locally:

ruff check .
pytest
python -m build
mkdocs build

License

Apache-2.0. See LICENSE.
