DVT — Data Virtualization Tool
DVT is a cross-engine data transformation tool built on top of dbt-core. Write SQL models that reference sources on any database, and DVT automatically handles cross-engine data movement and materializes results to any target.
No custom connectors. No complex config. Just SQL.
Architecture
DVT extends dbt with two execution paths:
┌─────────────────────────────────────────────────────────────┐
│ DVT Execution Engine │
│ │
│ ┌──────────────┐ Same engine? ┌──────────────────────┐ │
│ │ Model SQL │───── YES ───────▶│ Adapter Pushdown │ │
│ │ │ │ (standard dbt) │ │
│ │ source A │ └──────────────────────┘ │
│ │ source B │ │
│ │ ... │ Different ┌──────────────────────┐ │
│ │ │── engines? ──────▶│ Federation Path │ │
│ └──────────────┘ YES │ │ │
│ │ Sling extracts ──┐ │ │
│ │ DuckDB joins ──┤ │ │
│ │ Sling loads ──┘ │ │
│ └──────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
Pushdown Path (Same Engine)
When all sources and the target are on the same engine, DVT behaves identically to dbt — the adapter executes SQL directly on the database.
Extraction Path (Cross-Engine Federation)
When sources live on different engines than the target:
- Extract: Sling pulls source data from each remote engine
- Cache: Data is cached in a local DuckDB instance (.dvt/cache.duckdb)
- Transform: DuckDB executes the model SQL (joins, aggregations, etc.)
- Load: Sling pushes the result to the target engine
This is invisible to the user — DVT shows standard dbt-like output.
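The routing decision between the two paths can be sketched as follows. This is a minimal illustration, not DVT's actual implementation; the function name and engine labels are hypothetical:

```python
def choose_execution_path(source_engines: set[str], target_engine: str) -> str:
    """Route a model: pushdown if everything lives on one engine,
    otherwise federate through DuckDB."""
    engines = source_engines | {target_engine}
    if len(engines) == 1:
        return "pushdown"    # adapter executes SQL directly (standard dbt)
    return "federation"      # Sling extracts -> DuckDB joins -> Sling loads

# Example: Postgres + MySQL sources materializing to Databricks
path = choose_execution_path({"postgres", "mysql"}, "databricks")
```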
Supported Engines
DVT supports 13 database engines through dvt-adapters (one package, all engines):
| Engine | Type | Pushdown | Extract From | Load To |
|---|---|---|---|---|
| PostgreSQL | OLTP | Yes | Yes | Yes |
| MySQL | OLTP | Yes | Yes | Yes |
| MariaDB | OLTP | Yes | Yes | Yes |
| SQL Server | OLTP | Yes | Yes | Yes |
| Oracle | OLTP | Yes | Yes | Yes |
| Snowflake | Cloud DW | Yes | Yes | Yes |
| Databricks | Cloud DW | Yes | Yes | Yes |
| BigQuery | Cloud DW | Yes | Yes | Yes |
| Redshift | Cloud DW | Yes | Yes | Yes |
| DuckDB | Embedded | Yes | Yes | Yes |
| Spark | Distributed | Yes | Yes | Yes |
| Fabric | Cloud DW | Yes | Yes | Yes |
| MySQL 5 | Legacy | Yes | Yes | Yes |
Any source can feed into any target. DVT handles the data movement automatically.
Installation
pip install dvt-ce dvt-adapters
Or with uv (recommended):
uv add dvt-ce dvt-adapters
Environment Bootstrap
After installing, run dvt sync to automatically install:
- Database drivers for all configured connections
- DuckDB extensions (httpfs, postgres, mysql, etc.)
- Cloud SDKs (for Snowflake, BigQuery, Databricks)
- Sling binary (for data extraction/loading)
dvt sync
dvt sync is self-healing — it can run even when dbt imports are broken, and will fix the environment.
Quick Start
# 1. Initialize a new project
dvt init my_project
cd my_project
# 2. Configure connections in ~/.dvt/profiles.yml (see below)
# 3. Bootstrap the environment
dvt sync
# 4. Test all connections
dvt debug
# 5. Load seed data
dvt seed
# 6. Run all models
dvt run
# 7. Generate documentation with cross-engine lineage
dvt docs generate
dvt docs serve
Configuration
Profiles (~/.dvt/profiles.yml)
DVT looks for profiles in ~/.dvt/ first, then falls back to ~/.dbt/.
my_project:
  target: pg_dev  # Default target
  outputs:
    # ─── PostgreSQL ───
    pg_dev:
      type: postgres
      host: localhost
      port: 5432
      user: analyst
      password: secret
      dbname: warehouse
      schema: public
    # ─── Snowflake ───
    sf_prod:
      type: snowflake
      account: my-account
      user: loader
      password: secret
      database: ANALYTICS
      schema: PUBLIC
      warehouse: COMPUTE_WH
    # ─── MySQL ───
    mysql_prod:
      type: mysql
      host: mysql.example.com
      port: 3306
      user: reader
      password: secret
      database: crm
      schema: crm
    # ─── Databricks ───
    dbx_dev:
      type: databricks
      host: my-workspace.cloud.databricks.com
      http_path: /sql/1.0/warehouses/abc123
      token: dapi...
      catalog: main
      schema: analytics
Sources (models/sources.yml)
The connection: field tells DVT which engine each source lives on:
version: 2
sources:
  # Sources on PostgreSQL (default target — no connection: needed)
  - name: app_db
    schema: public
    tables:
      - name: users
      - name: orders
      - name: products
  # Sources on MySQL (remote — needs connection:)
  - name: crm
    connection: mysql_prod  # Maps to profiles.yml output name
    schema: crm
    tables:
      - name: customers
      - name: contacts
  # Sources on Snowflake (remote)
  - name: marketing
    connection: sf_prod
    schema: PUBLIC
    tables:
      - name: campaigns
      - name: ad_spend
  # Sources on Databricks (remote)
  - name: data_lake
    connection: dbx_dev
    schema: raw
    tables:
      - name: events
      - name: sessions
Rule: Sources on the default target's engine type don't need connection: — they follow --target automatically. Remote sources (different engines) must have connection:.
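The rule reduces to a one-line lookup, sketched here with hypothetical names (DVT's real resolution logic may differ):

```python
def resolve_connection(source: dict, default_target: str) -> str:
    """Remote sources carry an explicit connection:; local ones follow --target."""
    return source.get("connection", default_target)

# A remote source resolves to its own connection; a local one follows the target.
resolve_connection({"name": "crm", "connection": "mysql_prod"}, "pg_dev")  # "mysql_prod"
resolve_connection({"name": "app_db"}, "pg_dev")                           # "pg_dev"
```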
Model Configuration
-- models/dim_customer_360.sql
-- Cross-engine join: Postgres + MySQL + Snowflake → Databricks (target: dbx_dev)
{{
    config(
        materialized='table',
        target='dbx_dev'
    )
}}
SELECT
u.user_id,
u.email,
c.customer_name,
c.lifetime_value,
m.total_ad_spend,
e.session_count
FROM {{ source('app_db', 'users') }} u -- Postgres
LEFT JOIN {{ source('crm', 'customers') }} c -- MySQL
ON u.email = c.email
LEFT JOIN {{ source('marketing', 'ad_spend') }} m -- Snowflake
ON u.user_id = m.user_id
LEFT JOIN (
SELECT user_id, COUNT(*) as session_count
FROM {{ source('data_lake', 'sessions') }} -- Databricks
GROUP BY user_id
) e ON u.user_id = e.user_id
DVT detects the four different engines, extracts all sources to DuckDB, executes the join, and loads the result to Databricks. The user sees standard dbt output.
Incremental Models
Incremental models work across engines with watermark-based extraction:
{{
    config(
        materialized='incremental',
        incremental_strategy='append',
        target='sf_prod'
    )
}}
SELECT *
FROM {{ source('app_db', 'orders') }}
{% if is_incremental() %}
WHERE order_date > (SELECT MAX(order_date) FROM {{ this }})
{% endif %}
DVT reads the watermark from the target, extracts only new rows from the source, and appends them.
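The watermark logic can be mirrored in a few lines of plain Python. This is an illustrative sketch of the filtering behavior only, not DVT's extraction code:

```python
from datetime import date

def incremental_filter(source_rows, watermark):
    """Keep only rows newer than the target's current watermark
    (mirrors: WHERE order_date > (SELECT MAX(order_date) FROM {{ this }}))."""
    if watermark is None:  # first run: full load
        return list(source_rows)
    return [r for r in source_rows if r["order_date"] > watermark]

rows = [
    {"id": 1, "order_date": date(2024, 1, 1)},
    {"id": 2, "order_date": date(2024, 2, 1)},
]
new_rows = incremental_filter(rows, date(2024, 1, 15))  # only id=2 is appended
```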
Two Dialects in One Project
A DVT project naturally contains SQL in two dialects:
| Execution Path | SQL Dialect | When |
|---|---|---|
| Pushdown | Target's native SQL (Snowflake SQL, PostgreSQL, etc.) | Source and target on same engine |
| Extraction | DuckDB SQL (Postgres-like, universal) | Sources on different engines than target |
Both coexist in the same project. The dialect is determined automatically by the execution path.
Commands Reference
Core Commands
| Command | Description |
|---|---|
| dvt run | Compile and execute models against targets |
| dvt run --full-refresh | Rebuild all models from scratch |
| dvt run --select model_name | Run specific models |
| dvt run --select +model_name | Run a model and all its ancestors |
| dvt run --select tag:daily | Run models by tag |
| dvt build | Seeds + models + snapshots + tests in DAG order |
| dvt seed | Load CSV seed files via Sling (10-100x faster than dbt) |
| dvt test | Run data tests |
| dvt compile | Compile SQL without executing |
DVT-Specific Commands
| Command | Description |
|---|---|
| dvt sync | Self-healing environment bootstrap — installs all drivers, DuckDB extensions, Sling binary, and cloud SDKs |
| dvt debug | Test all connections with clean status output |
| dvt debug --target sf_prod | Test a specific connection |
| dvt show --select model_name | Query a model locally via DuckDB (no target needed) |
| dvt retract | Drop all models from targets in reverse DAG order |
| dvt retract --select +model | Drop a model and all its ancestors |
| dvt clean | Remove build artifacts and the DuckDB cache |
Documentation Commands
| Command | Description |
|---|---|
| dvt docs generate | Generate a cross-engine catalog with an engine-colored lineage graph |
| dvt docs serve | Serve the documentation website locally |
The documentation UI shows:
- Engine-colored nodes in the lineage graph (each engine has its brand color)
- Connection badges on source and model nodes
- Column metadata from all engines with native data types
- Target and engine info in model detail panels
Passthrough Commands (Standard dbt)
| Command | Description |
|---|---|
| dvt snapshot | Run snapshot models |
| dvt deps | Install dbt packages |
| dvt init | Initialize a new project |
| dvt list | List resources in the project |
DuckDB Cache
DVT maintains a persistent DuckDB cache at .dvt/cache.duckdb:
- Source tables: {source}__{table} — shared across models, reused between runs
- Model results: __model__{name} — for incremental {{ this }} references
- dvt run --full-refresh rebuilds the cache
- dvt clean deletes the .dvt/ directory entirely
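The naming conventions above can be captured in a small helper. This is an illustrative sketch of the documented naming scheme; the function itself is hypothetical:

```python
def cache_table_name(kind: str, *parts: str) -> str:
    """Build table names for .dvt/cache.duckdb using DVT's documented scheme."""
    if kind == "source":       # {source}__{table}
        source, table = parts
        return f"{source}__{table}"
    if kind == "model":        # __model__{name}
        (name,) = parts
        return f"__model__{name}"
    raise ValueError(f"unknown kind: {kind}")

print(cache_table_name("source", "crm", "customers"))  # crm__customers
print(cache_table_name("model", "dim_customer_360"))   # __model__dim_customer_360
```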
--target Philosophy
--target switches between same-engine environments, not between engines:
# Correct: switch between Snowflake environments
dvt run --target dev_snowflake
dvt run --target prod_snowflake
# Risky: switching engine types breaks pushdown models
dvt run --target dev_snowflake # Snowflake SQL
dvt run --target mysql_docker # MySQL can't parse Snowflake SQL
Extraction models (DuckDB SQL) are unaffected by --target changes. Pushdown models are written in the target's dialect and will fail if the engine type changes.
dbt Compatibility
When using a single adapter with no cross-engine references, DVT works identically to dbt. All dbt projects are valid DVT projects. DVT adds federation capabilities on top — it never removes dbt functionality.
Links
- PyPI: dvt-ce | dvt-adapters
- GitHub: dvt-ce | dvt-adapters
- Issues: Report a bug
License
DVT is built on top of dbt-core (Apache 2.0).