Git-native data modeling for dbt users
Project description
DataLex
Git-native data modeling for dbt users.
Point us at your dbt project and warehouse — we produce versioned, reviewable YAML with contracts, lineage, ERDs, and clean round-trip back to dbt.
Quickstart — two commands
pip install -U 'datalex-cli[serve]' # CLI + bundled Node — one command, no prereqs
datalex serve # opens http://localhost:3030
That's it for most machines. No Docker, no database, and only one
terminal. The [serve] extra pulls a portable Node runtime. If you
already have Node 20+ on PATH, plain pip install datalex-cli works
too.
Prefer Docker?
Skip the local Python/Node setup — clone the repo, build the image once,
run it. Useful when a company laptop blocks pip install, or when
you'd rather not pin a venv:
git clone https://github.com/duckcode-ai/DataLex.git
cd DataLex
docker build -t datalex:local .
docker run --rm -p 3030:3001 datalex:local # opens http://localhost:3030
To point Docker at an existing dbt repo on your host, mount it and set
REPO_ROOT:
cd ~/path/to/your-dbt-project
docker run --rm -p 3030:3001 \
-v "$PWD":/workspace \
-e REPO_ROOT=/workspace \
-e DM_CLI=/app/datalex \
datalex:local
In the UI, use /workspace as the dbt repository path. Full Docker
notes (env vars, troubleshooting) live in
Install → Docker Fallback.
When the app opens — the Onboarding Journey (1.4.1)
A 480px right-rail panel slides in on first launch and walks you through six concrete actions. Each step has a primary button that opens the right dialog and auto-advances when the underlying event fires. Progress is saved — close the panel anytime and resume where you left off.
| # | Step | What it does |
|---|---|---|
| 1 | Welcome to DataLex | Two-line value prop — click Let's go |
| 2 | Connect your project | Opens the Import dialog. Paste a Git URL (try https://github.com/duckcode-ai/jaffle-shop-DataLex) or pick a local dbt folder. Edit in place writes back to your real YAML. |
| 3 | See what's missing | Activates the Validation drawer; click any red file to view readiness gaps |
| 4 | Design your first business domain | + opens the New Logical Entity dialog (Customer, Order, …) |
| 5 | Add your AI provider | Opens Settings → AI; paste an OpenAI / Anthropic / local-LLM key |
| 6 | Ask AI to draw a diagram | One-click Conceptualizer proposes entities + relationships from your staging models |
Replay anytime via Settings → Replay onboarding. The deeper 13-step spotlight tour is still there under Settings → Deep feature tour.
Point it at your dbt repo:
cd ~/my-dbt-project # folder containing dbt_project.yml
datalex serve --project-dir .
The folder auto-registers as your active project; the browser opens
straight into your real file tree. Every UI edit writes back to the
original .yml files — git status shows real diffs.
See docs/getting-started.md for the full path matrix (demo → local dbt → git URL → live warehouse).
Want your warehouse drivers too?
pip install 'datalex-cli[serve,postgres]' # or snowflake, bigquery, databricks…
pip install 'datalex-cli[serve,all]' # every driver + Node
Pick a tutorial
Once datalex serve is running, follow the path that matches what you
have in hand:
| You have... | Tutorial | Time |
|---|---|---|
| Nothing — want to try with a known-good dbt repo | Walk through jaffle-shop end-to-end | 5 min |
| An existing dbt project (folder or git) | Import an existing dbt project | 5 min |
| A live warehouse (Snowflake/Postgres/…) | Pull a warehouse schema | 7 min |
| CLI-only, no UI | CLI dbt-sync tutorial | 5 min |
New here? Start with docs/getting-started.md — it's the map across all four paths plus the mental model.
60-second demo (offline, no warehouse)
pip install 'datalex-cli[duckdb]'
git clone https://github.com/duckcode-ai/DataLex.git
cd DataLex
# 1. Build a local DuckDB warehouse (no external credentials)
python examples/jaffle_shop_demo/setup.py
# 2. Sync the dbt project into DataLex YAML
datalex datalex dbt sync examples/jaffle_shop_demo \
--out-root examples/jaffle_shop_demo/datalex-out
# 3. Emit dbt-parseable YAML back, with contracts enforced
datalex datalex dbt emit examples/jaffle_shop_demo/datalex-out \
--out-dir examples/jaffle_shop_demo/dbt-out
Open examples/jaffle_shop_demo/datalex-out/sources/jaffle_shop_raw.yaml —
every column has its warehouse type, descriptions from the manifest, and a
meta.datalex.dbt.unique_id stamp so re-running the sync never clobbers
anything you've hand-authored.
What it does
DataLex treats your data models as code. On top of a stricter YAML
substrate (the DataLex layout — one file per entity, kind:-dispatched,
streaming-safe for 10K+ entities), it gives you:
datalex datalex dbt sync <project>— readstarget/manifest.json+ yourprofiles.yml, introspects live column types, and merges them into DataLex YAML. Idempotent: user-authoreddescription:,tags:,sensitivity:, andtests:survive re-sync.datalex datalex dbt emit— writessources.ymlandschema.ymlwithcontract.enforced: trueanddata_type:on every column.dbt parsesucceeds out of the box.datalex datalex emit ddl --dialect ...— Postgres, Snowflake, BigQuery, Databricks, MySQL, SQL Server, Redshift. Same source, all dialects.datalex datalex diff— semantic diff with explicit rename tracking (previous_name:), breaking-change gate for CI.datalex datalex mesh check <repo> --strict— verifies dbt mesh Interface readiness for shared models declared withmeta.datalex.interface. See docs/mesh-interfaces.md.- Cross-repo package imports — pin
acme/warehouse-core@1.4.0inimports:, lockfile + content hash drift detection, Git-or-path resolution, on-disk parse cache for large projects. - Visual studio — React Flow UI for editing entities, relationships, and metadata; same YAML files as the CLI.
- Agentic modeling assistant — local-first AI workflow for explaining
selected objects, reverse-engineering dbt repos into conceptual/logical
views, proposing focused YAML patches, and applying approved changes
through the same guarded save APIs as manual edits. Context comes from
structured dbt/DataLex facts, manifest/catalog metadata, BM25 lexical
search, validation output, project memory, and team skills under
DataLex/Skills/*.md; no vector search is used for code/YAML retrieval.
Supported warehouses
| Warehouse | dbt sync introspection |
Forward DDL | Reverse engineering |
|---|---|---|---|
| DuckDB | ✓ | — | — |
| PostgreSQL | ✓ | ✓ | ✓ |
| Snowflake | (fallback) | ✓ | ✓ |
| BigQuery | (fallback) | ✓ | ✓ |
| Databricks | (fallback) | ✓ | ✓ |
| MySQL | (fallback) | ✓ | ✓ |
| SQL Server / Azure SQL | (fallback) | ✓ | ✓ |
| Redshift | (fallback) | ✓ | ✓ |
"Fallback" = uses the existing full-schema connector (slower than the per-table path but already works today; a narrow introspection path ships per-dialect over time).
Install
Use the path that matches what you are trying to do:
| Goal | Recommended path |
|---|---|
| Try DataLex or use it with your dbt repo | PyPI install |
| Develop DataLex itself from this repo | Source checkout |
| Avoid local Python/Node setup differences | Docker fallback |
PyPI Install (Recommended)
From PyPI:
pip install -U 'datalex-cli[serve]' # CLI + UI (recommended)
pip install -U 'datalex-cli[serve,postgres]' # add a warehouse driver
pip install -U 'datalex-cli[serve,all]' # every driver + UI
pip install -U datalex-cli # CLI-only, no UI
Available extras: serve, duckdb, postgres, mysql, snowflake,
bigquery, databricks, sqlserver, redshift, all.
Prereqs: Python 3.9+ and Git. Node 20+ is optional because
[serve] bundles a portable Node runtime.
Verify the installed package:
datalex --version
Configure AI providers in Settings → AI. DataLex supports local fallback responses plus OpenAI, Anthropic, Gemini, and Ollama-compatible endpoints. Provider keys are stored locally in the browser; generated YAML is never written until you approve an explicit proposal.
For the local DuckDB-based example repo, install the matching driver too:
pip install -U 'datalex-cli[serve,duckdb]'
Source Checkout For Contributors
git clone https://github.com/duckcode-ai/DataLex.git
cd DataLex
python3 -m venv .venv && source .venv/bin/activate
pip install -e '.[serve,duckdb]'
npm --prefix packages/api-server install
npm --prefix packages/web-app install
datalex serve # auto-builds the UI on first run
Source checkouts need Node 20+ with npm. If you skip the npm install
commands, datalex serve will try to install missing API/web
dependencies on first run.
Docker Fallback (Optional)
Docker is useful when you want to avoid local Python/Node version drift or when a company laptop blocks global installs. It is not required for normal use.
git clone https://github.com/duckcode-ai/DataLex.git
cd DataLex
docker build -t datalex:local .
docker run --rm -p 3030:3001 datalex:local
Open http://localhost:3030.
To run DataLex against an existing dbt repo from Docker, mount that repo
and point REPO_ROOT at the mounted path:
cd ~/path/to/your-dbt-project
docker run --rm -p 3030:3001 \
-v "$PWD":/workspace \
-e REPO_ROOT=/workspace \
-e DM_CLI=/app/datalex \
datalex:local
In the UI, use /workspace as the dbt repository path.
Update to the latest
When you want to pick up the newest fixes (or a fix that's already on
main but not yet on PyPI), run:
bash scripts/update.sh
The script auto-detects how datalex-cli is installed in your active
Python environment and runs the right upgrade — git pull + pip install -e . for source checkouts, pip install -U from PyPI for
PyPI installs, and a re-install from main for git-based installs.
To force-pull the very latest from GitHub (skipping PyPI), pass:
bash scripts/update.sh --from-source
That's the workaround when a recent fix has merged to main but the
next PyPI release hasn't shipped yet. Equivalent one-liner without the
script:
pip install -U "git+https://github.com/duckcode-ai/DataLex.git#subdirectory=packages/cli"
Install Troubleshooting
If datalex serve fails with:
ERR_MODULE_NOT_FOUND ... datalex_core/_server/ai/providerMeta.js
you are using a wheel that did not include the full API server runtime.
Upgrade to datalex-cli 1.3.4 or newer:
pip install -U 'datalex-cli[serve]'
Until that patch is available in your package index, install from the current source checkout:
git clone https://github.com/duckcode-ai/DataLex.git
cd DataLex
python3 -m venv .venv && source .venv/bin/activate
pip install -e '.[serve,duckdb]'
datalex serve
Project layout
DataLex/
packages/
core_engine/ # Python: loader, dialects, dbt integration, packages
src/datalex_core/
_schemas/datalex/ # JSON Schema per `kind:` — bundled with the package
cli/ # `datalex` entry point
api-server/ # Node.js API (UI backend)
web-app/ # React Flow studio
examples/
jaffle_shop_demo/ # zero-setup dbt-sync demo (DuckDB)
model-examples/ # sample projects and scenario walkthroughs
docs/ # architecture, specs, runbooks
tests/ # unittest suite (core engine + datalex)
Visual Studio
datalex serve ships the full UI — no extra setup. If you're hacking
on the web app itself and want hot-reload, run the two dev servers from
a source checkout:
# Terminal 1 — api (port 3030)
npm --prefix packages/api-server run dev
# Terminal 2 — web (port 5173)
npm --prefix packages/web-app run dev
The UI reads and writes the same YAML files the CLI does — no database, no hosted service.
CI / GitOps
DataLex is designed to live in your repo next to your dbt project. A typical CI step:
./datalex datalex validate datalex/
./datalex datalex diff datalex-main/ datalex/ --exit-on-breaking
./datalex datalex dbt emit datalex/ --out-dir dbt/
dbt parse
Documentation
Onboarding
- Getting started — the one-page map covering install, the three GUI paths, and the mental model.
- Jaffle-shop walkthrough — end-to-end demo: clone the DataLex-ready jaffle-shop repo, build it with DuckDB, review conceptual/logical/physical diagrams, and commit normal dbt/DataLex YAML diffs.
- Import an existing dbt project — 5-minute bring-your-own-repo flow (local folder or git URL).
- Pull a warehouse schema — 7-minute live-connection flow with inferred PKs/FKs and streaming progress.
- Agentic AI modeling — how Ask AI, skills, memory, search/indexing, proposal review, and auto-refresh work.
- CLI dbt-sync tutorial — original CLI-only jaffle_shop walkthrough.
Reference
- DataLex layout reference — what each
kind:file looks like and how the loader discovers them. - CLI cheat sheet — every
datalex datalex …subcommand on one page. - API contracts — HTTP API reference for integrators.
- Architecture — core engine modules and end-to-end data flow.
- Pre-DataLex specs have moved to docs/archive/.
Community
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file datalex_cli-1.7.3.tar.gz.
File metadata
- Download URL: datalex_cli-1.7.3.tar.gz
- Upload date:
- Size: 5.6 MB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
85aaf9ca189db2947b22729d4e54c510fdbf3ebfbc78f69769c79e3e4d8034e6
|
|
| MD5 |
6741312b7dce5dd051c584c207e4288e
|
|
| BLAKE2b-256 |
a5e26431df0e512b60f61518f2510eb9b97b83e5620ed03ed314de5ad1b50847
|
Provenance
The following attestation bundles were made for datalex_cli-1.7.3.tar.gz:
Publisher:
publish.yml on duckcode-ai/DataLex
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
datalex_cli-1.7.3.tar.gz -
Subject digest:
85aaf9ca189db2947b22729d4e54c510fdbf3ebfbc78f69769c79e3e4d8034e6 - Sigstore transparency entry: 1409678514
- Sigstore integration time:
-
Permalink:
duckcode-ai/DataLex@e8b0133e58b4e60ddf42e902e1b886045f8489a9 -
Branch / Tag:
refs/tags/v1.7.3 - Owner: https://github.com/duckcode-ai
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@e8b0133e58b4e60ddf42e902e1b886045f8489a9 -
Trigger Event:
push
-
Statement type:
File details
Details for the file datalex_cli-1.7.3-py3-none-any.whl.
File metadata
- Download URL: datalex_cli-1.7.3-py3-none-any.whl
- Upload date:
- Size: 6.0 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
3f9734512359738a9215d4d9ed6634078c76d0c6558231cf82a089ebae0d6a2e
|
|
| MD5 |
22070f7f036d884ba2b3f907d99788ba
|
|
| BLAKE2b-256 |
1af96598686001f05d276dc7f4d84e3124c56a76ee6fd6eb11fee9b91363c739
|
Provenance
The following attestation bundles were made for datalex_cli-1.7.3-py3-none-any.whl:
Publisher:
publish.yml on duckcode-ai/DataLex
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
datalex_cli-1.7.3-py3-none-any.whl -
Subject digest:
3f9734512359738a9215d4d9ed6634078c76d0c6558231cf82a089ebae0d6a2e - Sigstore transparency entry: 1409678519
- Sigstore integration time:
-
Permalink:
duckcode-ai/DataLex@e8b0133e58b4e60ddf42e902e1b886045f8489a9 -
Branch / Tag:
refs/tags/v1.7.3 - Owner: https://github.com/duckcode-ai
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@e8b0133e58b4e60ddf42e902e1b886045f8489a9 -
Trigger Event:
push
-
Statement type: