Git-native data modeling for dbt users
DataLex
Point us at your dbt project and warehouse — we produce versioned, reviewable YAML with contracts, lineage, ERDs, and clean round-trip back to dbt.
60-second demo
git clone https://github.com/duckcode-ai/DataLex.git
cd DataLex
pip install -e '.[duckdb]'
# 1. Build a local DuckDB warehouse (no external credentials)
python examples/jaffle_shop_demo/setup.py
# 2. Sync the dbt project into DataLex YAML
./datalex datalex dbt sync examples/jaffle_shop_demo \
--out-root examples/jaffle_shop_demo/datalex-out
# 3. Emit dbt-parseable YAML back, with contracts enforced
./datalex datalex dbt emit examples/jaffle_shop_demo/datalex-out \
--out-dir examples/jaffle_shop_demo/dbt-out
Open examples/jaffle_shop_demo/datalex-out/sources/jaffle_shop_raw.yaml —
every column has its warehouse type, descriptions from the manifest, and a
meta.datalex.dbt.unique_id stamp so re-running the sync never clobbers
anything you've hand-authored.
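The "never clobbers" behaviour can be pictured as a merge keyed on the stamped ID. The following is a minimal sketch, not DataLex's actual implementation: entities are plain dicts, the stamp is flattened to a top-level `unique_id` key for brevity, and the function names and sample ID are hypothetical.

```python
from copy import deepcopy

# Fields treated as user-authored in this sketch (an assumption).
USER_FIELDS = ("description", "tags", "sensitivity", "tests")


def merge_column(existing, synced):
    """Start from the freshly synced column, then restore hand-authored fields."""
    merged = deepcopy(synced)
    for field in USER_FIELDS:
        if existing.get(field):  # a non-empty existing value wins
            merged[field] = deepcopy(existing[field])
    return merged


def merge_entities(existing, synced):
    """Match entities by unique_id, then merge their columns by name."""
    by_id = {e["unique_id"]: e for e in existing}
    result = []
    for new in synced:
        old = by_id.get(new["unique_id"])
        if old is None:  # first time this entity is seen
            result.append(deepcopy(new))
            continue
        old_cols = {c["name"]: c for c in old["columns"]}
        cols = [merge_column(old_cols.get(c["name"], {}), c) for c in new["columns"]]
        result.append({**new, "columns": cols})
    return result


# Hypothetical data: one hand-described column, one column new in the warehouse.
existing = [{"unique_id": "source.demo.raw.customers",
             "columns": [{"name": "id", "type": "INTEGER",
                          "description": "Customer primary key"}]}]
synced = [{"unique_id": "source.demo.raw.customers",
           "columns": [{"name": "id", "type": "BIGINT", "description": ""},
                       {"name": "email", "type": "VARCHAR",
                        "description": "from manifest"}]}]
merged = merge_entities(existing, synced)
```

Here the warehouse type is refreshed to `BIGINT` while the hand-written description is kept, and merging the same sync result a second time changes nothing.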
What it does
DataLex treats your data models as code. On top of a stricter YAML
substrate (the DataLex layout — one file per entity, kind:-dispatched,
streaming-safe for 10K+ entities), it gives you:
- `datalex datalex dbt sync <project>`: reads `target/manifest.json` + your `profiles.yml`, introspects live column types, and merges them into DataLex YAML. Idempotent: user-authored `description:`, `tags:`, `sensitivity:`, and `tests:` survive re-sync.
- `datalex datalex dbt emit`: writes `sources.yml` and `schema.yml` with `contract.enforced: true` and `data_type:` on every column. `dbt parse` succeeds out of the box.
- `datalex datalex emit ddl --dialect ...`: Postgres, Snowflake, BigQuery, Databricks, MySQL, SQL Server, Redshift. Same source, all dialects.
- `datalex datalex diff`: semantic diff with explicit rename tracking (`previous_name:`), breaking-change gate for CI.
- Cross-repo package imports: pin `acme/warehouse-core@1.4.0` in `imports:`, lockfile + content hash drift detection, Git-or-path resolution, on-disk parse cache for large projects.
- Visual studio: React Flow UI for editing entities, relationships, and metadata; same YAML files as the CLI.
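To illustrate why explicit rename tracking matters for a breaking-change gate, here is a small sketch of a column-level diff. It is an illustration under assumptions, not the logic of `datalex datalex diff`: columns are plain dicts, `diff_columns` and `has_breaking` are hypothetical names, and the choice of which change kinds count as breaking is mine.

```python
def diff_columns(old, new):
    """Classify changes between two {column_name: column_dict} mappings.

    Returns (kind, column, detail) tuples. A column in `new` may declare a
    rename through a `previous_name` key.
    """
    changes = []
    renamed_from = set()
    for name, col in new.items():
        prev = col.get("previous_name")
        if prev and prev in old and name not in old:
            renamed_from.add(prev)
            changes.append(("renamed", name, prev))
            base = old[prev]
        elif name in old:
            base = old[name]
        else:
            changes.append(("added", name, None))
            continue
        if base.get("type") != col.get("type"):
            detail = f"{base.get('type')} -> {col.get('type')}"
            changes.append(("type_changed", name, detail))
    for name in old:
        if name not in new and name not in renamed_from:
            changes.append(("removed", name, None))
    return changes


# Assumption of this sketch: removals and type changes are breaking.
BREAKING = {"removed", "type_changed"}


def has_breaking(changes):
    return any(kind in BREAKING for kind, _, _ in changes)


old = {"cust_id": {"type": "integer"}}
tracked = {"customer_id": {"type": "integer", "previous_name": "cust_id"}}
untracked = {"customer_id": {"type": "integer"}}

print(diff_columns(old, tracked))    # one "renamed" entry, not breaking
print(diff_columns(old, untracked))  # "added" plus "removed", breaking
```

Without `previous_name`, the same edit is indistinguishable from dropping one column and adding another, which is why the rename has to be stated rather than inferred.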
Supported warehouses
| Warehouse | dbt sync introspection | Forward DDL | Reverse engineering |
|---|---|---|---|
| DuckDB | ✓ | — | — |
| PostgreSQL | ✓ | ✓ | ✓ |
| Snowflake | (fallback) | ✓ | ✓ |
| BigQuery | (fallback) | ✓ | ✓ |
| Databricks | (fallback) | ✓ | ✓ |
| MySQL | (fallback) | ✓ | ✓ |
| SQL Server / Azure SQL | (fallback) | ✓ | ✓ |
| Redshift | (fallback) | ✓ | ✓ |
"Fallback" means dbt sync uses the existing full-schema connector, which is slower than the per-table path but already works today. A narrower introspection path will ship per dialect over time.
Install
git clone https://github.com/duckcode-ai/DataLex.git
cd DataLex
python3 -m venv .venv
source .venv/bin/activate
pip install -e . # puts `datalex` on PATH
pip install -e '.[duckdb]' # add warehouse drivers you need
# optional — only needed for the Visual Studio
npm --prefix packages/api-server install
npm --prefix packages/web-app install
Available extras: duckdb, postgres, mysql, snowflake,
bigquery, databricks, sqlserver, redshift, or all.
Prereqs: Python 3.9+, Git. Node.js 18+ if you want the UI.
Project layout
DataLex/
packages/
core_engine/ # Python: loader, dialects, dbt integration, packages
src/datalex_core/
_schemas/datalex/ # JSON Schema per `kind:` — bundled with the package
cli/ # `datalex` entry point
api-server/ # Node.js API (UI backend)
web-app/ # React Flow studio
examples/
jaffle_shop_demo/ # zero-setup dbt-sync demo (DuckDB)
model-examples/ # sample projects and scenario walkthroughs
docs/ # architecture, specs, runbooks
tests/ # unittest suite (core engine + datalex)
Visual Studio (optional)
If you want the UI on top of your DataLex project, run the two dev servers:
# Terminal 1
npm --prefix packages/api-server run dev
# Terminal 2
npm --prefix packages/web-app run dev
Then open http://localhost:5173. The UI reads and writes the same YAML
files the CLI does — no database, no hosted service.
CI / GitOps
DataLex is designed to live in your repo next to your dbt project. A typical CI step:
./datalex datalex validate datalex/
./datalex datalex diff datalex-main/ datalex/ --exit-on-breaking
./datalex datalex dbt emit datalex/ --out-dir dbt/
dbt parse
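If your CI system prefers a single entry point over four shell lines, the same sequence can be wrapped in a small fail-fast runner. This is a sketch under assumptions: `run_ci` is a hypothetical helper, and it assumes each command signals failure through a non-zero exit code (in particular, that `--exit-on-breaking` does so when a breaking change is found).

```python
import subprocess

# The four commands from the CI step above, in order.
CI_STEPS = [
    ["./datalex", "datalex", "validate", "datalex/"],
    ["./datalex", "datalex", "diff", "datalex-main/", "datalex/", "--exit-on-breaking"],
    ["./datalex", "datalex", "dbt", "emit", "datalex/", "--out-dir", "dbt/"],
    ["dbt", "parse"],
]


def run_ci(steps=CI_STEPS, run=None):
    """Run each step in order; return the first non-zero exit code, else 0.

    `run` takes a command list and returns its exit code. It defaults to
    subprocess.run and can be swapped out for a dry run.
    """
    if run is None:
        run = lambda cmd: subprocess.run(cmd).returncode
    for cmd in steps:
        code = run(cmd)
        if code != 0:
            print("step failed:", " ".join(cmd))
            return code  # later steps are skipped
    return 0
```

Calling `raise SystemExit(run_ci())` from a script makes the job fail on the first failing step, so `dbt emit` never runs against a model that failed validation or the diff gate.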
Documentation
- Tutorial: dbt sync in 5 minutes — the full jaffle_shop walkthrough with explanations.
- DataLex layout reference: what each `kind:` file looks like and how the loader discovers them.
- CLI cheat sheet: every `datalex datalex …` subcommand on one page.
- Architecture: core engine modules and end-to-end data flow.
- Pre-DataLex specs have moved to docs/archive/.
Community