Config-driven data ingestion and historization framework built on dlt

Maintainers

grindheim

dlt-saga

Config-driven data ingestion and historization framework, built on dlt.

Why dlt-saga?

dlt is an excellent Python library for building data pipelines. dlt-saga adds the operational layer that teams need to run dlt at scale:

What you get	How
Zero-code pipelines	Drop a YAML file in `configs/` — no Python needed for common sources
SCD2 historization	`write_disposition: append+historize` turns any snapshot table into a full change history with `_dlt_valid_from` / `_dlt_valid_to`
dbt-style selectors	`saga ingest --select "tag:daily,group:api"` — union, intersection, glob patterns
Multi-environment profiles	`profiles.yml` with dev/prod targets, service account impersonation, per-environment datasets
Plugin architecture	Register custom sources and destinations via `packages.yml` or Python entry points — no framework fork needed
Cloud-agnostic	BigQuery, Databricks, and DuckDB included; more destinations via plugins

If you already use dlt directly and find yourself re-implementing incremental state management, environment switching, or SCD2 transforms, dlt-saga is the config layer you are already building by hand.

Installation

pip install dlt-saga[bigquery]          # BigQuery
pip install dlt-saga[databricks,azure]  # Databricks on Azure
pip install dlt-saga                    # DuckDB only (no cloud dependencies)

Quick Start

# 1. Create and scaffold a project
mkdir my-pipelines && cd my-pipelines
saga init                               # prompts for destination and credentials

# 2. Authenticate to your destination (skip for DuckDB)
#    See: https://github.com/Glitni/dlt-saga/wiki/Getting-Started

# 3. List available pipelines
saga list

# 4. Run a pipeline
saga ingest --select "example__sample"

See the Getting Started guide for a full walkthrough, or browse example/ for a minimal runnable setup.

Local execution is the default. Use --orchestrate to fan out to parallel workers; this requires the orchestration: key to be configured in saga_project.yml.

CLI Commands

All commands are subcommands under the saga entry point and share common options: --select, --verbose, --profile, --target.

Selectors (dbt-style)

Selectors filter which pipelines to run. They work across all commands.

Syntax	Meaning	Example
`name`	Exact pipeline name	`--select google_sheets__my_pipeline`
`glob`	Glob pattern	`--select "*balance*"`
`tag:name`	Filter by tag	`--select "tag:daily"` (schedule-aware — see Configuration → Scheduling tags)
`group:name`	Filter by source group	`--select "group:google_sheets"`
space-separated	UNION (OR)	`--select "tag:daily group:filesystem"`
comma-separated	INTERSECTION (AND)	`--select "tag:daily,group:google_sheets"`
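
The union and intersection rules in the table can be sketched in a few lines of Python. This is an illustration only, not dlt-saga's implementation: the `Pipeline` class and the `select` and `matches_criterion` functions are hypothetical names, the sketch assumes (as in dbt) that commas bind tighter than spaces, and it ignores the schedule-aware behaviour of tag selectors.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Pipeline:
    """Hypothetical stand-in for a discovered pipeline config."""
    name: str
    group: str
    tags: frozenset = field(default_factory=frozenset)


def matches_criterion(pipeline: Pipeline, criterion: str) -> bool:
    """Evaluate one criterion: tag:<name>, group:<name>, or a name/glob."""
    if criterion.startswith("tag:"):
        return criterion[len("tag:"):] in pipeline.tags
    if criterion.startswith("group:"):
        return criterion[len("group:"):] == pipeline.group
    # An exact name is a glob without wildcards, so one check covers both.
    return fnmatchcase(pipeline.name, criterion)


def select(pipelines, selector: str):
    """Space-separated terms are OR-ed; comma-separated criteria are AND-ed."""
    terms = selector.split()
    return [
        p for p in pipelines
        if any(
            all(matches_criterion(p, c) for c in term.split(","))
            for term in terms
        )
    ]


pipelines = [
    Pipeline("google_sheets__balance", "google_sheets", frozenset({"daily"})),
    Pipeline("filesystem__orders", "filesystem", frozenset({"hourly"})),
    Pipeline("api__users", "api", frozenset({"daily"})),
]

union = select(pipelines, "tag:daily group:filesystem")
intersection = select(pipelines, "tag:daily,group:google_sheets")
globbed = select(pipelines, "*balance*")
```

With this sample data, the union selector keeps all three pipelines, while the intersection and glob selectors each keep only `google_sheets__balance`.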

Common Examples

# List pipelines
saga list                                        # All enabled pipelines
saga list --resource-type ingest                 # Ingest-enabled only
saga list --resource-type historize              # Historize-enabled only
saga list --select "tag:daily"                   # Filtered by tag

# Ingest
saga ingest --select "tag:daily"
saga ingest --select "group:api" --workers 8
saga ingest --full-refresh --select "my_pipeline"
saga ingest --select "group:api" --start-value-override "2026-01-01"  # Backfill

# Historize (SCD2)
saga historize --select "tag:daily"
saga historize --full-refresh --select "filesystem__*"

# Run (ingest + historize sequentially)
saga run --select "tag:daily"

# Update BigQuery access controls
saga update-access --select "group:google_sheets"

# Target a specific environment
saga ingest --target prod --select "tag:daily"   # production (with impersonation)

Adding a New Pipeline

Create a YAML config file in configs/<source_type>/ — that's it. The framework auto-discovers configs.
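
To make auto-discovery concrete, here is a rough sketch of how a directory scan of this kind could work. It is not taken from the dlt-saga source: `discover_pipelines` is a hypothetical function, and the `<source_type>__<file stem>` naming is an assumption inferred from pipeline names such as `google_sheets__my_pipeline` used elsewhere in this README.

```python
import tempfile
from pathlib import Path


def discover_pipelines(config_root: Path) -> dict:
    """Map assumed pipeline names to the YAML files found under config_root."""
    found = {}
    for path in sorted(config_root.rglob("*")):
        if not path.is_file() or path.suffix not in {".yml", ".yaml"}:
            continue
        parts = path.relative_to(config_root).parts
        if len(parts) < 2:
            continue  # skip files that are not inside a source-type folder
        source_type = parts[0]
        # Assumption: pipeline name is "<source_type>__<file stem>".
        found[f"{source_type}__{path.stem}"] = path
    return found


# Build a throwaway configs/ tree to show what the scan would return.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "configs"
    (root / "google_sheets").mkdir(parents=True)
    (root / "api").mkdir()
    (root / "google_sheets" / "my_pipeline.yml").write_text("write_disposition: append\n")
    (root / "api" / "users.yaml").write_text("write_disposition: merge\n")
    (root / "README.md").write_text("not a pipeline config\n")
    names = sorted(discover_pipelines(root))
```

Here `names` ends up holding `api__users` and `google_sheets__my_pipeline`; the stray `README.md` is ignored because it is neither YAML nor inside a source-type folder.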

Supported source types out of the box: API, Database (PostgreSQL, MySQL, SQL Server, and more via ConnectorX), Filesystem (GCS, SFTP, local), Google Sheets, and SharePoint.

See the Pipeline Types guide for config examples for each source type, and the Configuration reference for all available fields.

Write Dispositions and Historize

The write_disposition field controls what operations are enabled for a pipeline:

Value	Ingest	Historize	Use Case
`append`	Yes	No	Raw event/log data
`merge`	Yes	No	Upsert on primary key
`replace`	Yes	No	Full refresh each run
`append+historize`	Yes	Yes	Snapshot → SCD2
`historize`	No	Yes	External data → SCD2
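
Read as data, the table above is a lookup from `write_disposition` to the operations it enables. The sketch below encodes exactly those five rows; the names `OPERATIONS`, `enabled_operations`, and `filter_by_resource_type` are hypothetical and only mimic what a filter such as `saga list --resource-type historize` might do.

```python
# One entry per row of the write disposition table.
OPERATIONS = {
    "append": {"ingest"},
    "merge": {"ingest"},
    "replace": {"ingest"},
    "append+historize": {"ingest", "historize"},
    "historize": {"historize"},
}


def enabled_operations(write_disposition: str) -> set:
    """Return the operations enabled for a write_disposition value."""
    try:
        return OPERATIONS[write_disposition]
    except KeyError:
        raise ValueError(f"Unknown write_disposition: {write_disposition!r}") from None


def filter_by_resource_type(configs: dict, resource_type: str) -> list:
    """Keep pipeline names whose disposition enables the given operation."""
    return [
        name
        for name, disposition in configs.items()
        if resource_type in enabled_operations(disposition)
    ]


# Hypothetical pipelines mapped to their configured write_disposition.
configs = {
    "api__events": "append",
    "api__users": "append+historize",
    "external__customers": "historize",
}

ingest_enabled = filter_by_resource_type(configs, "ingest")
historize_enabled = filter_by_resource_type(configs, "historize")
```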

Historize transforms raw snapshot data into SCD2 tables with _dlt_valid_from, _dlt_valid_to, and _dlt_is_deleted columns. See the Historize guide for the full reference.
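
For readers new to SCD2, the following in-memory sketch shows the general idea of turning snapshots into a change history. It is a conceptual illustration, not the dlt-saga historization engine: the `historize` function, the `key` and `values` fields, and the choice to represent a deletion as a new open row with `_dlt_is_deleted` set are all assumptions made for this example. Only the three `_dlt_*` column names come from the text above.

```python
def historize(snapshots):
    """Turn ordered (timestamp, {primary_key: values}) snapshots into SCD2 rows."""
    history = []  # every version, closed or open, in creation order
    current = {}  # primary key -> currently open version

    def open_version(key, values, ts, deleted):
        row = {
            "key": key,
            "values": dict(values),
            "_dlt_valid_from": ts,
            "_dlt_valid_to": None,  # None marks the open (current) version
            "_dlt_is_deleted": deleted,
        }
        history.append(row)
        current[key] = row

    for ts, rows in snapshots:
        # New or changed keys: close the open version and start a new one.
        for key, values in rows.items():
            open_row = current.get(key)
            if (
                open_row is not None
                and not open_row["_dlt_is_deleted"]
                and open_row["values"] == values
            ):
                continue  # unchanged since the previous snapshot
            if open_row is not None:
                open_row["_dlt_valid_to"] = ts
            open_version(key, values, ts, deleted=False)

        # Keys missing from this snapshot: close them and record a deletion.
        for key, open_row in list(current.items()):
            if key not in rows and not open_row["_dlt_is_deleted"]:
                open_row["_dlt_valid_to"] = ts
                open_version(key, open_row["values"], ts, deleted=True)

    return history


snapshots = [
    ("2026-01-01", {1: {"status": "new"}, 2: {"status": "new"}}),
    ("2026-01-02", {1: {"status": "paid"}, 2: {"status": "new"}}),
    ("2026-01-03", {1: {"status": "paid"}}),
]
history = historize(snapshots)
```

In this example key 1 gets two versions (the first closed on 2026-01-02), and key 2 gets its original version closed on 2026-01-03 followed by a deletion row, for four history rows in total.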

Community

GitHub Issues — bug reports and feature requests
GitHub Discussions — questions, ideas, show & tell
Contributing guide — how to get involved
dlt community — dlt Slack / Discord

Origin

dlt-saga is derived from an internal data ingestion framework originally built by Glitni for Amedia, a leading Nordic media group, as the ingestion layer of Amedia's data platform. Amedia supported open-sourcing the project and continues to fund ongoing development through their partnership with Glitni, enabling the framework to be shared with the broader community.

Project Structure

dlt-saga/
├── dlt_saga/              # Main package
│   ├── cli.py            #   CLI entry point (saga command)
│   ├── pipelines/        #   Built-in source implementations
│   │   ├── api/          #     Generic REST API pipeline
│   │   ├── database/     #     Database source (ConnectorX)
│   │   ├── filesystem/   #     Filesystem / GCS source
│   │   ├── google_sheets/#     Google Sheets source
│   │   └── sharepoint/   #     SharePoint source
│   ├── historize/        #   SCD2 historization engine
│   ├── destinations/     #   Destination implementations
│   │   ├── bigquery/     #     BigQuery
│   │   └── duckdb/       #     DuckDB (local development)
│   ├── pipeline_config/  #   Config discovery and parsing
│   ├── schemas/          #   Bundled static schemas (dlt_common.json)
│   └── utility/          #   Shared utilities (CLI, naming, orchestration)
├── example/              # Minimal runnable consumer project (DuckDB)
├── wiki/                 # Documentation (synced to GitHub wiki)
└── .dlt/                 # dlt runtime config overrides
