DataLex by DuckCode AI Labs

DataLex

Git-native data modeling for dbt users.

Point us at your dbt project and warehouse — we produce versioned, reviewable YAML with contracts, lineage, ERDs, and clean round-trip back to dbt.


DataLex Visual Studio — file tree, YAML editor, and React Flow ERD on the same entity

Quickstart (recommended)

pip install 'datalex-cli[serve]'       # CLI + bundled Node — one command, no prereqs
datalex serve                          # opens http://localhost:3030

That's it. [serve] pulls a portable Node so you don't need to install Node separately — and yes, that's intentional. If you already have Node 20+ on your PATH, plain pip install datalex-cli works too.

Working with your own dbt project?

cd ~/my-dbt-project                    # folder containing dbt_project.yml
datalex serve --project-dir .

DataLex auto-registers the folder as a project on first launch, so the browser opens directly into your real tree — no "set up a workspace" click-through. Every UI edit writes back to the original .yml files on disk. See docs/getting-started.md for the full path matrix (demo → local dbt → git URL → live warehouse).

Want your warehouse drivers too?

pip install 'datalex-cli[serve,postgres]'        # or snowflake, bigquery, databricks…
pip install 'datalex-cli[serve,all]'             # every driver + Node

Pick a tutorial

Once datalex serve is running, follow the path that matches what you have in hand:

  • Nothing — just want the demo: Jaffle-shop one-click walkthrough (3 min)
  • An existing dbt project (folder or git): Import an existing dbt project (5 min)
  • A live warehouse (Snowflake/Postgres/…): Pull a warehouse schema (7 min)
  • CLI-only, no UI: CLI dbt-sync tutorial (5 min)

New here? Start with docs/getting-started.md — it's the map across all four paths plus the mental model.

60-second demo (offline, no warehouse)

DataLex dbt sync demo — build a DuckDB warehouse, sync into DataLex YAML, emit back to dbt with contracts enforced

pip install 'datalex-cli[duckdb]'
git clone https://github.com/duckcode-ai/DataLex.git
cd DataLex

# 1. Build a local DuckDB warehouse (no external credentials)
python examples/jaffle_shop_demo/setup.py

# 2. Sync the dbt project into DataLex YAML
datalex dbt sync examples/jaffle_shop_demo \
    --out-root examples/jaffle_shop_demo/datalex-out

# 3. Emit dbt-parseable YAML back, with contracts enforced
datalex dbt emit examples/jaffle_shop_demo/datalex-out \
    --out-dir examples/jaffle_shop_demo/dbt-out

Open examples/jaffle_shop_demo/datalex-out/sources/jaffle_shop_raw.yaml — every column has its warehouse type, descriptions from the manifest, and a meta.datalex.dbt.unique_id stamp so re-running the sync never clobbers anything you've hand-authored.
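The "never clobbers" behaviour boils down to an idempotent merge. A minimal sketch of the idea (illustrative only, not DataLex's actual merge code; the set of user-owned keys follows the fields this README says survive re-sync):

```python
# Sketch of an idempotent sync merge: warehouse-introspected facts
# (column types) are refreshed on every run, while user-authored
# fields survive. Illustrative only -- not DataLex's real merge code.

USER_OWNED = {"description", "tags", "sensitivity", "tests"}

def merge_column(existing: dict, synced: dict) -> dict:
    merged = dict(synced)            # start from fresh warehouse facts
    for key in USER_OWNED & existing.keys():
        merged[key] = existing[key]  # hand-authored values win
    return merged

def merge_entity(existing: dict, synced: dict) -> dict:
    # Columns are matched by name; the meta.datalex.dbt.unique_id stamp
    # is what lets a re-run update this entity instead of duplicating it.
    old = {c["name"]: c for c in existing.get("columns", [])}
    merged = dict(synced)
    merged["columns"] = [
        merge_column(old.get(c["name"], {}), c)
        for c in synced.get("columns", [])
    ]
    return merged
```

Re-running the sync with a changed warehouse type updates `data_type:` on a column while its hand-written `description:` stays put.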

What it does

DataLex treats your data models as code. On top of a stricter YAML substrate (the DataLex layout — one file per entity, kind:-dispatched, streaming-safe for 10K+ entities), it gives you:

  • datalex dbt sync <project> — reads target/manifest.json + your profiles.yml, introspects live column types, and merges them into DataLex YAML. Idempotent: user-authored description:, tags:, sensitivity:, and tests: survive re-sync.
  • datalex dbt emit — writes sources.yml and schema.yml with contract.enforced: true and data_type: on every column. dbt parse succeeds out of the box.
  • datalex emit ddl --dialect ... — Postgres, Snowflake, BigQuery, Databricks, MySQL, SQL Server, Redshift. Same source, all dialects.
  • datalex diff — semantic diff with explicit rename tracking (previous_name:), breaking-change gate for CI.
  • Cross-repo package imports — pin acme/warehouse-core@1.4.0 in imports:, lockfile + content hash drift detection, Git-or-path resolution, on-disk parse cache for large projects.
  • Visual studio — React Flow UI for editing entities, relationships, and metadata; same YAML files as the CLI.
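To make the diff semantics concrete, here is a rough sketch of rename-aware column diffing (a hypothetical simplification; the real rules and output shape are defined by the tool, not by this snippet):

```python
# Sketch of a semantic diff in the spirit of rename tracking via
# `previous_name:`. Hypothetical simplification, not DataLex code.

def diff_columns(old: list[dict], new: list[dict]) -> dict:
    old_by_name = {c["name"]: c for c in old}
    breaking, compatible, seen = [], [], set()
    for col in new:
        prev = col.get("previous_name")
        if prev and prev in old_by_name:
            seen.add(prev)                       # explicit rename: not a drop
            compatible.append(f"rename {prev} -> {col['name']}")
        elif col["name"] in old_by_name:
            seen.add(col["name"])
            before = old_by_name[col["name"]]
            if before.get("data_type") != col.get("data_type"):
                breaking.append(f"type change on {col['name']}")
        else:
            compatible.append(f"add {col['name']}")
    for name in old_by_name.keys() - seen:
        breaking.append(f"drop {name}")          # silently removed: breaking
    return {"breaking": breaking, "compatible": compatible}
```

A CI gate can then fail the build whenever the breaking list is non-empty, which is the behaviour the `--exit-on-breaking` flag suggests.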

Supported warehouses

Capabilities tracked per warehouse: dbt sync introspection, forward DDL, and reverse engineering.

  • DuckDB
  • PostgreSQL
  • Snowflake (fallback)
  • BigQuery (fallback)
  • Databricks (fallback)
  • MySQL (fallback)
  • SQL Server / Azure SQL (fallback)
  • Redshift (fallback)

"Fallback" = uses the existing full-schema connector (slower than the per-table path but already works today; a narrow introspection path ships per-dialect over time).
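The fallback behaviour is a straightforward dispatch pattern. A minimal sketch, assuming hypothetical connector callables (the names and the set of per-table dialects here are illustrative, not DataLex internals):

```python
# Sketch of the "fallback" pattern described above: use a narrow
# per-table introspection path where one exists, otherwise fall back
# to the full-schema connector. Names are hypothetical.

PER_TABLE = {"duckdb", "postgres"}   # dialects with a narrow path

def introspect(dialect: str, table: str, per_table_fn, full_schema_fn) -> dict:
    if dialect in PER_TABLE:
        # Cheap: query metadata for just this one table.
        return per_table_fn(table)
    # Fallback: pull the whole schema once, then pick the table out.
    return full_schema_fn()[table]
```

Shipping a narrow path for a new dialect then just means adding it to the per-table set with its own metadata query.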

Install

From PyPI:

pip install datalex-cli               # puts `datalex` on PATH
pip install 'datalex-cli[duckdb]'     # add warehouse drivers you need

From source (for contributors or editable installs):

git clone https://github.com/duckcode-ai/DataLex.git
cd DataLex
python3 -m venv .venv
source .venv/bin/activate
pip install -e '.[duckdb]'

# optional — only needed for the Visual Studio
npm --prefix packages/api-server install
npm --prefix packages/web-app install

Available extras: duckdb, postgres, mysql, snowflake, bigquery, databricks, sqlserver, redshift, or all.

Prereqs: Python 3.9+, Git. Node.js 18+ if you want the UI.

Project layout

DataLex/
  packages/
    core_engine/           # Python: loader, dialects, dbt integration, packages
      src/datalex_core/
        _schemas/datalex/  # JSON Schema per `kind:` — bundled with the package
    cli/                   # `datalex` entry point
    api-server/            # Node.js API (UI backend)
    web-app/               # React Flow studio
  examples/
    jaffle_shop_demo/      # zero-setup dbt-sync demo (DuckDB)
  model-examples/          # sample projects and scenario walkthroughs
  docs/                    # architecture, specs, runbooks
  tests/                   # unittest suite (core engine + datalex)

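The `_schemas/datalex/` note above refers to the `kind:`-dispatched layout: each file declares a `kind:`, which selects how it is interpreted. A stdlib-only sketch of that dispatch (the handler names are hypothetical; real loading would use a YAML parser and the bundled JSON Schemas):

```python
# Sketch of `kind:`-dispatched loading: read a file's `kind:` first,
# then pick the handler (in DataLex, also the JSON Schema) for it.
# Hand-rolled scanning keeps this stdlib-only and illustrative.

def read_kind(text: str) -> str:
    for line in text.splitlines():
        if line.startswith("kind:"):
            return line.split(":", 1)[1].strip()
    raise ValueError("file has no top-level kind:")

HANDLERS = {
    "entity": lambda text: {"type": "entity", "raw": text},
    "relationship": lambda text: {"type": "relationship", "raw": text},
}

def load(text: str) -> dict:
    # Dispatch on kind before doing any heavier parsing/validation.
    return HANDLERS[read_kind(text)](text)
```

Because only the `kind:` line is needed up front, files can be routed cheaply, which is what makes a one-file-per-entity layout workable at 10K+ entities.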
Visual Studio (optional)

If you want the UI on top of your DataLex project, run the two dev servers:

# Terminal 1
npm --prefix packages/api-server run dev
# Terminal 2
npm --prefix packages/web-app run dev

Then open http://localhost:5173. The UI reads and writes the same YAML files the CLI does — no database, no hosted service.

CI / GitOps

DataLex is designed to live in your repo next to your dbt project. A typical CI step:

datalex validate datalex/
datalex diff datalex-main/ datalex/ --exit-on-breaking
datalex dbt emit datalex/ --out-dir dbt/
dbt parse
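Wired into GitHub Actions, those steps might look like the following sketch (the workflow name, trigger, checkout of a `datalex-main/` baseline, and the `[all]` extra are assumptions, not a published workflow):

```yaml
# Hypothetical CI sketch wrapping the gate steps above; adapt paths/extras.
name: datalex-gate
on: [pull_request]
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install 'datalex-cli[all]'
      - run: datalex validate datalex/
      - run: datalex diff datalex-main/ datalex/ --exit-on-breaking
      - run: datalex dbt emit datalex/ --out-dir dbt/
      - run: dbt parse
```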

Documentation

See docs/ for onboarding (start with docs/getting-started.md) and reference material.

Community

  • Discord: Join Discord
  • Issues: GitHub Issues
  • Contributing: CONTRIBUTING.md
  • License: MIT

Download files


Source Distribution

datalex_cli-0.2.3.tar.gz (4.4 MB)


Built Distribution


datalex_cli-0.2.3-py3-none-any.whl (4.8 MB)


File details

Details for the file datalex_cli-0.2.3.tar.gz.

File metadata

  • Download URL: datalex_cli-0.2.3.tar.gz
  • Upload date:
  • Size: 4.4 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for datalex_cli-0.2.3.tar.gz
Algorithm Hash digest
SHA256 3965ce9dfe4b6fcf28989da58fdde85e10c7aa1f413734ec6c54aa75c789e91f
MD5 639d6f6ef2e98743fbce813c7b9de10c
BLAKE2b-256 8538e1940734487f042135f0279c0f4de1ce6abb42722c67337113251eab921d


Provenance

The following attestation bundles were made for datalex_cli-0.2.3.tar.gz:

Publisher: publish.yml on duckcode-ai/DataLex

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file datalex_cli-0.2.3-py3-none-any.whl.

File metadata

  • Download URL: datalex_cli-0.2.3-py3-none-any.whl
  • Upload date:
  • Size: 4.8 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for datalex_cli-0.2.3-py3-none-any.whl
Algorithm Hash digest
SHA256 905bd4be43d0eadc3d2a04f3ecb46a07c4b9eab315d3dc7e96aac74e745cdf80
MD5 dabaa09ffd5f60693dbb329cd52faf73
BLAKE2b-256 28e3f9457a2253bed512ba6010fe0ed156e4a6b58955a4e1aa10db5b3529636d


Provenance

The following attestation bundles were made for datalex_cli-0.2.3-py3-none-any.whl:

Publisher: publish.yml on duckcode-ai/DataLex

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
