Git-native data modeling for dbt users

Maintainers

duckcode.ai

DataLex

Git-native data modeling for dbt users.

Point us at your dbt project and warehouse — we produce versioned, reviewable YAML with contracts, lineage, ERDs, and clean round-trip back to dbt.

DataLex Visual Studio — file tree, YAML editor, and React Flow ERD on the same entity

Quickstart — two commands

pip install -U 'datalex-cli[serve]'    # CLI + bundled Node runtime; no separate Node install
datalex serve                          # opens http://localhost:3030

That's it for most machines. No Docker, no database, and only one terminal. The [serve] extra pulls a portable Node runtime. If you already have Node 20+ on PATH, plain pip install datalex-cli works too.

Point it at your dbt repo:

cd ~/my-dbt-project                    # folder containing dbt_project.yml
datalex serve --project-dir .

The folder auto-registers as your active project; the browser opens straight into your real file tree. Every UI edit writes back to the original .yml files — git status shows real diffs.

Build your first ER diagram:

1. Click Import dbt repo → Local folder → pick your project root.
2. Click New modeling asset and choose Conceptual, Logical, or Physical. New assets use the domain-first structure DataLex/<domain>/<conceptual|logical|physical>/....
3. Open the new .diagram.yaml. Conceptual and logical diagrams can create boxes directly; physical diagrams are dbt-first, so drag any schema.yml / .model.yaml from the Explorer onto the canvas. Relationship handles on each card create business, logical, or physical relationships for the active layer.
4. Open Ask AI from the right panel, canvas, Explorer, validation row, or selected object when you want the agent to explain the model, reverse-engineer business concepts, or propose YAML changes. AI proposals are approval-gated; use Review plan to inspect the full context and proposed YAML before applying.
5. Drag cards to reposition them, then click Save All. Positions persist in the diagram file, so git commit picks them up. Save All is merge-safe: multiple in-memory docs targeting the same schema.yml are merged through the core-engine merge_models_preserving_docs helper instead of clobbering siblings.
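
The merge-safe save described in the last step can be pictured with a small sketch. This is not the actual merge_models_preserving_docs implementation, whose signature is not shown here; the function name merge_models_by_name and the dict shapes are assumptions modelled on a dbt schema.yml with a top-level models: list.

```python
from copy import deepcopy


def merge_models_by_name(on_disk: dict, edited_docs: list) -> dict:
    """Merge several edited documents into one schema.yml-shaped dict.

    Hypothetical illustration: models are matched by `name`. An edited model
    replaces the on-disk entry with the same name, and models that no edit
    touches are kept exactly as they were.
    """
    merged = deepcopy(on_disk)
    models = merged.setdefault("models", [])
    index = {model["name"]: position for position, model in enumerate(models)}

    for doc in edited_docs:
        for model in doc.get("models", []):
            if model["name"] in index:
                models[index[model["name"]]] = deepcopy(model)
            else:
                index[model["name"]] = len(models)
                models.append(deepcopy(model))
    return merged


on_disk = {"version": 2, "models": [
    {"name": "customers", "description": "One row per customer"},
    {"name": "orders", "description": "One row per order"},
]}
edit_a = {"models": [{"name": "customers", "description": "Customer dimension"}]}
edit_b = {"models": [{"name": "payments", "description": "One row per payment"}]}

result = merge_models_by_name(on_disk, [edit_a, edit_b])
print([model["name"] for model in result["models"]])
```

Neither edit mentions orders, yet it survives the save untouched, which is the property that "instead of clobbering siblings" refers to.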

See docs/getting-started.md for the full path matrix (demo → local dbt → git URL → live warehouse).

Want your warehouse drivers too?

pip install 'datalex-cli[serve,postgres]'        # or snowflake, bigquery, databricks…
pip install 'datalex-cli[serve,all]'             # every driver + Node

Pick a tutorial

Once datalex serve is running, follow the path that matches what you have in hand:

You have...	Tutorial	Time
Nothing — want to try with a known-good dbt repo	Walk through jaffle-shop end-to-end	5 min
An existing dbt project (folder or git)	Import an existing dbt project	5 min
A live warehouse (Snowflake/Postgres/…)	Pull a warehouse schema	7 min
CLI-only, no UI	CLI dbt-sync tutorial	5 min

New here? Start with docs/getting-started.md — it's the map across all four paths plus the mental model.

60-second demo (offline, no warehouse)

DataLex dbt sync demo — build a DuckDB warehouse, sync into DataLex YAML, emit back to dbt with contracts enforced

pip install 'datalex-cli[duckdb]'
git clone https://github.com/duckcode-ai/DataLex.git
cd DataLex

# 1. Build a local DuckDB warehouse (no external credentials)
python examples/jaffle_shop_demo/setup.py

# 2. Sync the dbt project into DataLex YAML
datalex datalex dbt sync examples/jaffle_shop_demo \
    --out-root examples/jaffle_shop_demo/datalex-out

# 3. Emit dbt-parseable YAML back, with contracts enforced
datalex datalex dbt emit examples/jaffle_shop_demo/datalex-out \
    --out-dir examples/jaffle_shop_demo/dbt-out

Open examples/jaffle_shop_demo/datalex-out/sources/jaffle_shop_raw.yaml — every column has its warehouse type, descriptions from the manifest, and a meta.datalex.dbt.unique_id stamp so re-running the sync never clobbers anything you've hand-authored.
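
A rough sketch of what "never clobbers anything you've hand-authored" can mean in practice: warehouse-derived fields are refreshed on every sync, while fields a person filled in are left alone. The column dict shape, the type key, and the function name resync_column are assumptions for illustration, not the DataLex implementation.

```python
# Fields treated as user-owned in this sketch; the list mirrors the
# description:, tags:, sensitivity:, and tests: keys that survive re-sync.
USER_OWNED = ("description", "tags", "sensitivity", "tests")


def resync_column(existing: dict, introspected: dict) -> dict:
    """Refresh a column from introspection without overwriting user-owned fields."""
    merged = dict(existing)
    for key, value in introspected.items():
        if key in USER_OWNED and existing.get(key):
            continue  # a hand-authored value already exists, so keep it
        merged[key] = value
    return merged


existing = {
    "name": "customer_id",
    "type": "integer",  # hypothetical key for the warehouse type
    "description": "Surrogate key for the customer",
    "tags": ["pii"],
}
introspected = {"name": "customer_id", "type": "bigint", "description": "id column"}

first = resync_column(existing, introspected)
second = resync_column(first, introspected)
print(first["type"], "|", first["description"], "|", first == second)
```

Running the merge twice with the same input gives the same result, which is the idempotence property a re-runnable sync needs.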

What it does

DataLex treats your data models as code. On top of a stricter YAML substrate (the DataLex layout — one file per entity, kind:-dispatched, streaming-safe for 10K+ entities), it gives you:

datalex datalex dbt sync <project> — reads target/manifest.json + your profiles.yml, introspects live column types, and merges them into DataLex YAML. Idempotent: user-authored description:, tags:, sensitivity:, and tests: survive re-sync.
datalex datalex dbt emit — writes sources.yml and schema.yml with contract.enforced: true and data_type: on every column. dbt parse succeeds out of the box.
datalex datalex emit ddl --dialect ... — Postgres, Snowflake, BigQuery, Databricks, MySQL, SQL Server, Redshift. Same source, all dialects.
datalex datalex diff — semantic diff with explicit rename tracking (previous_name:), breaking-change gate for CI.
datalex datalex mesh check <repo> --strict — verifies dbt mesh Interface readiness for shared models declared with meta.datalex.interface. See docs/mesh-interfaces.md.
Cross-repo package imports — pin acme/warehouse-core@1.4.0 in imports:, lockfile + content hash drift detection, Git-or-path resolution, on-disk parse cache for large projects.
Visual studio — React Flow UI for editing entities, relationships, and metadata; same YAML files as the CLI.
Agentic modeling assistant — local-first AI workflow for explaining selected objects, reverse-engineering dbt repos into conceptual/logical views, proposing focused YAML patches, and applying approved changes through the same guarded save APIs as manual edits. Context comes from structured dbt/DataLex facts, manifest/catalog metadata, BM25 lexical search, validation output, project memory, and team skills under DataLex/Skills/*.md; no vector search is used for code/YAML retrieval.
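
To make the rename-tracking idea from the diff item concrete, here is a minimal sketch. It assumes columns are dicts with a name key and an optional previous_name key, and it assumes that a tracked rename is not counted as breaking while an unexplained removal is; the real diff rules may differ. The function name diff_columns is hypothetical.

```python
def diff_columns(old: list, new: list) -> dict:
    """Compare two column lists, treating `previous_name` as an explicit rename."""
    old_names = {column["name"] for column in old}
    new_names = {column["name"] for column in new}

    renamed = {}
    for column in new:
        previous = column.get("previous_name")
        # Only count it as a rename if the old name existed and is now gone.
        if previous in old_names and previous not in new_names:
            renamed[previous] = column["name"]

    added = sorted(new_names - old_names - set(renamed.values()))
    removed = sorted(old_names - new_names - set(renamed))
    return {
        "renamed": renamed,
        "added": added,
        "removed": removed,
        "breaking": bool(removed),  # assumption: only removals are breaking
    }


old = [{"name": "id"}, {"name": "fullname"}, {"name": "email"}]
new = [
    {"name": "id"},
    {"name": "full_name", "previous_name": "fullname"},
    {"name": "signup_date"},
]
report = diff_columns(old, new)
print(report)
```

Without previous_name, the same change would look like one removed column plus one added column, so the explicit marker is what lets a CI gate tell a rename from a drop.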

Supported warehouses

Warehouse	`dbt sync` introspection	Forward DDL	Reverse engineering
DuckDB	✓	—	—
PostgreSQL	✓	✓	✓
Snowflake	(fallback)	✓	✓
BigQuery	(fallback)	✓	✓
Databricks	(fallback)	✓	✓
MySQL	(fallback)	✓	✓
SQL Server / Azure SQL	(fallback)	✓	✓
Redshift	(fallback)	✓	✓

"(fallback)" means dbt sync uses the existing full-schema connector for that warehouse. It already works today but is slower than the per-table path; narrower per-dialect introspection paths are expected to ship over time.

Install

Use the path that matches what you are trying to do:

Goal	Recommended path
Try DataLex or use it with your dbt repo	PyPI install
Develop DataLex itself from this repo	Source checkout
Avoid local Python/Node setup differences	Docker fallback

PyPI Install (Recommended)

From PyPI:

pip install -U 'datalex-cli[serve]'                 # CLI + UI (recommended)
pip install -U 'datalex-cli[serve,postgres]'        # add a warehouse driver
pip install -U 'datalex-cli[serve,all]'             # every driver + UI
pip install -U datalex-cli                          # CLI-only, no UI

Available extras: serve, duckdb, postgres, mysql, snowflake, bigquery, databricks, sqlserver, redshift, all.

Prereqs: Python 3.9+ and Git. Node 20+ is optional because [serve] bundles a portable Node runtime.
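
If you want to check these prerequisites from a script, a sketch along the following lines may help. It uses only the Python standard library; the function names preflight and parse_node_major are made up for this example, and the Node check is informational because Node is optional with the [serve] extra.

```python
import shutil
import subprocess
import sys


def parse_node_major(version_output: str) -> int:
    """Extract the major version from `node --version` output such as 'v20.11.1'."""
    return int(version_output.strip().lstrip("v").split(".")[0])


def preflight() -> dict:
    """Report whether Python 3.9+, Git, and (optionally) Node 20+ are available."""
    report = {
        "python_ok": sys.version_info >= (3, 9),
        "git_found": shutil.which("git") is not None,
        "node_20_plus": False,  # optional when the [serve] extra is installed
    }
    node = shutil.which("node")
    if node:
        result = subprocess.run([node, "--version"], capture_output=True, text=True)
        try:
            report["node_20_plus"] = parse_node_major(result.stdout) >= 20
        except ValueError:
            pass  # unexpected version string; leave the flag as False
    return report


print(preflight())
```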

Verify the installed package:

datalex --version

Configure AI providers in Settings → AI. DataLex supports local fallback responses plus OpenAI, Anthropic, Gemini, and Ollama-compatible endpoints. Provider keys are stored locally in the browser; generated YAML is never written until you approve an explicit proposal.

For the local DuckDB-based example repo, install the matching driver too:

pip install -U 'datalex-cli[serve,duckdb]'

Source Checkout For Contributors

git clone https://github.com/duckcode-ai/DataLex.git
cd DataLex
python3 -m venv .venv && source .venv/bin/activate
pip install -e '.[serve,duckdb]'
npm --prefix packages/api-server install
npm --prefix packages/web-app install
datalex serve                                    # auto-builds the UI on first run

Source checkouts need Node 20+ with npm. If you skip the npm install commands, datalex serve will try to install missing API/web dependencies on first run.

Docker Fallback (Optional)

Docker is useful when you want to avoid local Python/Node version drift or when a company laptop blocks global installs. It is not required for normal use.

git clone https://github.com/duckcode-ai/DataLex.git
cd DataLex
docker build -t datalex:local .
docker run --rm -p 3030:3001 datalex:local

Open http://localhost:3030.

To run DataLex against an existing dbt repo from Docker, mount that repo and point REPO_ROOT at the mounted path:

cd ~/path/to/your-dbt-project
docker run --rm -p 3030:3001 \
  -v "$PWD":/workspace \
  -e REPO_ROOT=/workspace \
  -e DM_CLI=/app/datalex \
  datalex:local

In the UI, use /workspace as the dbt repository path.

Install Troubleshooting

If datalex serve fails with:

ERR_MODULE_NOT_FOUND ... datalex_core/_server/ai/providerMeta.js

you are using a wheel that did not include the full API server runtime. Upgrade to datalex-cli 1.3.4 or newer:

pip install -U 'datalex-cli[serve]'

Until that patch is available in your package index, install from the current source checkout:

git clone https://github.com/duckcode-ai/DataLex.git
cd DataLex
python3 -m venv .venv && source .venv/bin/activate
pip install -e '.[serve,duckdb]'
datalex serve
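
To see whether an installed copy predates the 1.3.4 fix mentioned above, something like the following sketch can be used. It relies on importlib.metadata from the standard library and assumes plain numeric versions such as those in the release history; a pre-release suffix would need a proper version parser. The helper names are illustrative.

```python
from importlib import metadata


def version_tuple(version: str) -> tuple:
    """Turn '1.10.0' into (1, 10, 0) so comparisons are numeric, not lexical."""
    return tuple(int(part) for part in version.split("."))


def needs_upgrade(installed: str, minimum: str = "1.3.4") -> bool:
    """Return True when the installed version is older than the minimum."""
    return version_tuple(installed) < version_tuple(minimum)


try:
    installed = metadata.version("datalex-cli")
    print(installed, "needs upgrade:", needs_upgrade(installed))
except metadata.PackageNotFoundError:
    print("datalex-cli is not installed in this environment")
```

Comparing tuples rather than strings matters here: as text, "1.10.0" sorts before "1.3.4", even though it is the newer release.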

Project layout

DataLex/
  packages/
    core_engine/           # Python: loader, dialects, dbt integration, packages
      src/datalex_core/
        _schemas/datalex/  # JSON Schema per `kind:` — bundled with the package
    cli/                   # `datalex` entry point
    api-server/            # Node.js API (UI backend)
    web-app/               # React Flow studio
  examples/
    jaffle_shop_demo/      # zero-setup dbt-sync demo (DuckDB)
  model-examples/          # sample projects and scenario walkthroughs
  docs/                    # architecture, specs, runbooks
  tests/                   # unittest suite (core engine + datalex)

Visual Studio

datalex serve ships the full UI — no extra setup. If you're hacking on the web app itself and want hot-reload, run the two dev servers from a source checkout:

# Terminal 1 — api (port 3030)
npm --prefix packages/api-server run dev
# Terminal 2 — web (port 5173)
npm --prefix packages/web-app run dev

The UI reads and writes the same YAML files the CLI does — no database, no hosted service.

CI / GitOps

DataLex is designed to live in your repo next to your dbt project. A typical CI step:

./datalex datalex validate datalex/
./datalex datalex diff datalex-main/ datalex/ --exit-on-breaking
./datalex datalex dbt emit datalex/ --out-dir dbt/
dbt parse
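
If your CI system runs Python more comfortably than shell, the same four steps can be wrapped in a small fail-fast runner. This is a sketch: it assumes each command signals failure through a non-zero exit status and that --exit-on-breaking does so when a breaking change is found. The names CI_STEPS and run_steps are made up for the example.

```python
import subprocess

# The same commands as the CI step above, one argv list per command.
CI_STEPS = [
    ["./datalex", "datalex", "validate", "datalex/"],
    ["./datalex", "datalex", "diff", "datalex-main/", "datalex/", "--exit-on-breaking"],
    ["./datalex", "datalex", "dbt", "emit", "datalex/", "--out-dir", "dbt/"],
    ["dbt", "parse"],
]


def run_steps(steps, runner=subprocess.call) -> int:
    """Run each command in order; return the first non-zero exit code, else 0."""
    for argv in steps:
        print("$", " ".join(argv))
        code = runner(argv)
        if code != 0:
            return code
    return 0


# Dry run with a stand-in runner so the sketch works without DataLex or dbt.
exit_code = run_steps(CI_STEPS, runner=lambda argv: 0)
print("exit code:", exit_code)
```

In a real pipeline you would call run_steps(CI_STEPS) with the default runner and pass the result to sys.exit so the job fails on the first failing step.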

Documentation

Onboarding

Getting started — the one-page map covering install, the three GUI paths, and the mental model.
Jaffle-shop walkthrough — end-to-end demo: clone the DataLex-ready jaffle-shop repo, build it with DuckDB, review conceptual/logical/physical diagrams, and commit normal dbt/DataLex YAML diffs.
Import an existing dbt project — 5-minute bring-your-own-repo flow (local folder or git URL).
Pull a warehouse schema — 7-minute live-connection flow with inferred PKs/FKs and streaming progress.
Agentic AI modeling — how Ask AI, skills, memory, search/indexing, proposal review, and auto-refresh work.
CLI dbt-sync tutorial — original CLI-only jaffle_shop walkthrough.

Reference

DataLex layout reference — what each kind: file looks like and how the loader discovers them.
CLI cheat sheet — every datalex datalex … subcommand on one page.
API contracts — HTTP API reference for integrators.
Architecture — core engine modules and end-to-end data flow.
Pre-DataLex specs have moved to docs/archive/.

Community

Discord:
Issues:
Contributing: CONTRIBUTING.md
License:

Release history

Version	Released
1.10.0	May 1, 2026
1.8.2	May 1, 2026
1.8.1	Apr 30, 2026
1.8.0	Apr 30, 2026
1.7.4	Apr 30, 2026
1.7.3	Apr 30, 2026
1.7.2	Apr 29, 2026
1.7.1	Apr 29, 2026
1.7.0	Apr 29, 2026
1.6.1	Apr 29, 2026
1.6.0	Apr 29, 2026
1.5.0	Apr 28, 2026
1.4.1	Apr 28, 2026
1.4.0	Apr 28, 2026
1.3.7 (this version)	Apr 26, 2026
1.3.6	Apr 26, 2026
1.3.5	Apr 26, 2026
1.3.4	Apr 26, 2026
1.3.3	Apr 25, 2026
1.3.2	Apr 24, 2026
1.3.1	Apr 24, 2026
1.3.0	Apr 24, 2026
1.2.0	Apr 24, 2026
1.1.1	Apr 22, 2026
1.1.0	Apr 22, 2026
1.0.6	Apr 22, 2026
1.0.5	Apr 21, 2026
1.0.4	Apr 21, 2026
1.0.3	Apr 21, 2026
1.0.2	Apr 21, 2026
1.0.1	Apr 21, 2026
1.0.0	Apr 21, 2026
0.5.1	Apr 21, 2026
0.5.0	Apr 21, 2026
0.4.2	Apr 21, 2026
0.4.1	Apr 21, 2026
0.4.0	Apr 21, 2026
0.3.4	Apr 21, 2026
0.3.3	Apr 21, 2026
0.3.2	Apr 21, 2026
0.3.1	Apr 21, 2026
0.3.0	Apr 21, 2026
0.2.3	Apr 21, 2026
0.2.2	Apr 20, 2026
0.2.1	Apr 20, 2026
0.2.0	Apr 20, 2026
0.1.1	Apr 19, 2026

Download files

Source Distribution

datalex_cli-1.3.7.tar.gz (4.6 MB)

Uploaded Apr 26, 2026 Source

Built Distribution

datalex_cli-1.3.7-py3-none-any.whl (5.0 MB)

Uploaded Apr 26, 2026 Python 3

File details

Details for the file datalex_cli-1.3.7.tar.gz.

File metadata

Download URL: datalex_cli-1.3.7.tar.gz
Upload date: Apr 26, 2026
Size: 4.6 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for datalex_cli-1.3.7.tar.gz
Algorithm	Hash digest
SHA256	`375dc360fce6ed1038a49162caa2a531f1cc8390ef31ed888dcf2218e1657dc7`
MD5	`32df91cb716fd3d3a57af8ad1349aeba`
BLAKE2b-256	`2ba84937bee713d3b5e95ba0cc59c6a677aee577dfa59c6bc5271ec71391f3a8`
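
One way to check a downloaded archive against the SHA256 digest listed above is sketched below, using only hashlib from the standard library. The local file location is an assumption (the current directory); the digest is the one published for datalex_cli-1.3.7.tar.gz.

```python
import hashlib
from pathlib import Path

# SHA256 digest published for datalex_cli-1.3.7.tar.gz
EXPECTED_SHA256 = "375dc360fce6ed1038a49162caa2a531f1cc8390ef31ed888dcf2218e1657dc7"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


archive = Path("datalex_cli-1.3.7.tar.gz")  # assumed to be in the current directory
if archive.exists():
    print("match" if sha256_of(archive) == EXPECTED_SHA256 else "MISMATCH")
else:
    print(f"{archive} not found; download it first")
```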

Provenance

The following attestation bundles were made for datalex_cli-1.3.7.tar.gz:

Publisher: publish.yml on duckcode-ai/DataLex

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: datalex_cli-1.3.7.tar.gz
- Subject digest: 375dc360fce6ed1038a49162caa2a531f1cc8390ef31ed888dcf2218e1657dc7
- Sigstore transparency entry: 1385862697
- Sigstore integration time: Apr 26, 2026
Source repository:
- Permalink: duckcode-ai/DataLex@259baa03fb1e846df1d0519e25e947ff0f37b03b
- Branch / Tag: refs/tags/v1.3.7
- Owner: https://github.com/duckcode-ai
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@259baa03fb1e846df1d0519e25e947ff0f37b03b
- Trigger Event: push

File details

Details for the file datalex_cli-1.3.7-py3-none-any.whl.

File metadata

Download URL: datalex_cli-1.3.7-py3-none-any.whl
Upload date: Apr 26, 2026
Size: 5.0 MB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for datalex_cli-1.3.7-py3-none-any.whl
Algorithm	Hash digest
SHA256	`02e7087aeafc619761f005106e82918145b4cf36c412239055011da54dd6590f`
MD5	`2a3be9cbd8cfb2b15e93eb17246588b5`
BLAKE2b-256	`a1e0bae04a399182d9d35efc92f88d891222776483a4c5a099b98f1e5493dfa4`

Provenance

The following attestation bundles were made for datalex_cli-1.3.7-py3-none-any.whl:

Publisher: publish.yml on duckcode-ai/DataLex

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: datalex_cli-1.3.7-py3-none-any.whl
- Subject digest: 02e7087aeafc619761f005106e82918145b4cf36c412239055011da54dd6590f
- Sigstore transparency entry: 1385862824
- Sigstore integration time: Apr 26, 2026
Source repository:
- Permalink: duckcode-ai/DataLex@259baa03fb1e846df1d0519e25e947ff0f37b03b
- Branch / Tag: refs/tags/v1.3.7
- Owner: https://github.com/duckcode-ai
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@259baa03fb1e846df1d0519e25e947ff0f37b03b
- Trigger Event: push

datalex-cli 1.3.7
