Git-native data modeling for dbt users

DataLex by DuckCode AI Labs

DataLex

Point us at your dbt project and warehouse — we produce versioned, reviewable YAML with contracts, lineage, ERDs, and clean round-trip back to dbt.

Screenshot: DataLex Visual Studio, showing the file tree, YAML editor, and React Flow ERD for the same entity.

60-second demo

git clone https://github.com/duckcode-ai/DataLex.git
cd DataLex
pip install -e '.[duckdb]'

# 1. Build a local DuckDB warehouse (no external credentials)
python examples/jaffle_shop_demo/setup.py

# 2. Sync the dbt project into DataLex YAML
./datalex datalex dbt sync examples/jaffle_shop_demo \
    --out-root examples/jaffle_shop_demo/datalex-out

# 3. Emit dbt-parseable YAML back, with contracts enforced
./datalex datalex dbt emit examples/jaffle_shop_demo/datalex-out \
    --out-dir examples/jaffle_shop_demo/dbt-out

Open examples/jaffle_shop_demo/datalex-out/sources/jaffle_shop_raw.yaml. Every column carries its warehouse type, its description from the dbt manifest, and a meta.datalex.dbt.unique_id stamp, so re-running the sync never clobbers anything you've hand-authored.
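A minimal sketch of how a stamp-keyed re-sync can avoid clobbering hand-authored work. It assumes each column is a plain dict, matches old and new records by their meta.datalex.dbt.unique_id stamp, and preserves the user-authored fields that the "What it does" section says survive re-sync (description, tags, sensitivity, tests). The function names and the sample unique_id value are hypothetical; this illustrates the idea, not DataLex's actual merge code.

```python
from copy import deepcopy
from typing import Dict, List, Optional

# Fields a user may hand-author; the README says these survive re-sync.
USER_FIELDS = ("description", "tags", "sensitivity", "tests")


def stamp_of(record: dict) -> Optional[str]:
    """Return meta.datalex.dbt.unique_id, or None if the record is unstamped."""
    return record.get("meta", {}).get("datalex", {}).get("dbt", {}).get("unique_id")


def resync(existing: List[dict], synced: List[dict]) -> List[dict]:
    """Merge freshly synced records into existing ones, matched by stamp.

    Warehouse-derived fields (such as the type) always come from `synced`;
    any USER_FIELDS already present on the matching existing record win.
    Simplification: unstamped existing records are not carried over here.
    """
    by_stamp: Dict[str, dict] = {
        stamp_of(r): r for r in existing if stamp_of(r) is not None
    }
    merged = []
    for record in synced:
        out = deepcopy(record)
        previous = by_stamp.get(stamp_of(record))
        if previous is not None:
            for field in USER_FIELDS:
                if field in previous:
                    out[field] = deepcopy(previous[field])
        merged.append(out)
    return merged


# Hypothetical stamp value, used only for this example.
META = {"datalex": {"dbt": {"unique_id": "source.demo.raw.customers.id"}}}
existing = [{"name": "id", "type": "INTEGER", "meta": META,
             "description": "Surrogate key (hand-written)", "tags": ["pk"]}]
synced = [{"name": "id", "type": "BIGINT", "meta": META,
           "description": "Description from manifest"}]

print(resync(existing, synced))
```

Here the merged column picks up BIGINT from the warehouse but keeps the hand-written description and tags, because the stamp, not the column name, is what links the two records.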

What it does

DataLex treats your data models as code. On top of a stricter YAML substrate (the DataLex layout: one file per entity, dispatched on its kind: field, and streaming-safe for 10K+ entities), it gives you:

  • datalex datalex dbt sync <project> — reads target/manifest.json + your profiles.yml, introspects live column types, and merges them into DataLex YAML. Idempotent: user-authored description:, tags:, sensitivity:, and tests: survive re-sync.
  • datalex datalex dbt emit — writes sources.yml and schema.yml with contract.enforced: true and data_type: on every column. dbt parse succeeds out of the box.
  • datalex datalex emit ddl --dialect ... — Postgres, Snowflake, BigQuery, Databricks, MySQL, SQL Server, Redshift. Same source, all dialects.
  • datalex datalex diff — semantic diff with explicit rename tracking (previous_name:), breaking-change gate for CI.
  • Cross-repo package imports — pin acme/warehouse-core@1.4.0 in imports:, lockfile + content hash drift detection, Git-or-path resolution, on-disk parse cache for large projects.
  • Visual Studio — React Flow UI for editing entities, relationships, and metadata; same YAML files as the CLI.
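For reference, the sketch below builds the shape of a dbt properties file for a model with an enforced contract: config.contract.enforced set to true on the model and a data_type on every column, which dbt expects for each column once a contract is enforced. It is a hypothetical illustration rather than DataLex's emitter; the input column dicts and the helper names are assumptions, and the result is printed as JSON using only the standard library (writing a real schema.yml would use a YAML serializer such as PyYAML's yaml.safe_dump).

```python
import json
from typing import Dict, List


def model_entry(name: str, columns: List[Dict[str, str]]) -> dict:
    """Build one `models:` entry with an enforced contract (hypothetical helper)."""
    untyped = [c["name"] for c in columns if not c.get("type")]
    if untyped:
        # Refuse to emit: an enforced contract needs data_type on every column.
        raise ValueError(f"columns without a type: {untyped}")
    out_columns = []
    for c in columns:
        col = {"name": c["name"], "data_type": c["type"]}
        if c.get("description"):
            col["description"] = c["description"]
        out_columns.append(col)
    return {
        "name": name,
        "config": {"contract": {"enforced": True}},
        "columns": out_columns,
    }


def properties_file(models: List[dict]) -> dict:
    """Wrap model entries in dbt's top-level `version: 2` layout."""
    return {"version": 2, "models": models}


doc = properties_file([
    model_entry("customers", [
        {"name": "customer_id", "type": "integer", "description": "Primary key"},
        {"name": "first_name", "type": "varchar"},
    ])
])
print(json.dumps(doc, indent=2))
```

Failing fast on an untyped column mirrors the goal stated above: if every emitted column has a data_type, dbt parse has nothing to reject on that front.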

Supported warehouses

Warehouse | dbt sync introspection | Forward DDL | Reverse engineering
DuckDB
PostgreSQL
Snowflake (fallback)
BigQuery (fallback)
Databricks (fallback)
MySQL (fallback)
SQL Server / Azure SQL (fallback)
Redshift (fallback)

"Fallback" means dbt sync uses the existing full-schema connector. That path is slower than per-table introspection but already works today; a narrower per-table introspection path is shipping dialect by dialect over time.

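The difference between the two introspection paths can be sketched with SQLite from Python's standard library standing in for a warehouse connector. That substitution is an assumption made only so the example runs anywhere; the queries are SQLite's, not DataLex's. The full-schema path enumerates every table and reads all of their columns, while the per-table path reads columns only for the tables a dbt project references, so it does less work on a large warehouse.

```python
import sqlite3
from typing import Dict, Iterable, List, Tuple

Columns = List[Tuple[str, str]]  # (column name, declared type)


def table_columns(conn: sqlite3.Connection, table: str) -> Columns:
    """Read one table's columns; PRAGMA rows are (cid, name, type, notnull, default, pk)."""
    quoted = '"' + table.replace('"', '""') + '"'  # quote the identifier safely
    rows = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
    return [(row[1], row[2]) for row in rows]


def full_schema(conn: sqlite3.Connection) -> Dict[str, Columns]:
    """Fallback-style path: list every table, then read all of their columns."""
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    return {name: table_columns(conn, name) for name in names}


def per_table(conn: sqlite3.Connection, wanted: Iterable[str]) -> Dict[str, Columns]:
    """Narrow path: read columns only for tables the project references."""
    return {name: table_columns(conn, name) for name in wanted}


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE raw_customers (id INTEGER, first_name TEXT);
    CREATE TABLE raw_orders (id INTEGER, customer_id INTEGER, amount REAL);
    CREATE TABLE audit_log (ts TEXT, payload TEXT);
""")
print(sorted(full_schema(conn)))            # every table in the database
print(per_table(conn, ["raw_customers"]))   # only the referenced table
```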
Install

git clone https://github.com/duckcode-ai/DataLex.git
cd DataLex

python3 -m venv .venv
source .venv/bin/activate
pip install -e .              # puts `datalex` on PATH
pip install -e '.[duckdb]'    # add warehouse drivers you need

# optional — only needed for the Visual Studio
npm --prefix packages/api-server install
npm --prefix packages/web-app install

Available extras: duckdb, postgres, mysql, snowflake, bigquery, databricks, sqlserver, redshift, or all.

Prereqs: Python 3.9+, Git. Node.js 18+ if you want the UI.

Project layout

DataLex/
  packages/
    core_engine/           # Python: loader, dialects, dbt integration, packages
      src/datalex_core/
        _schemas/datalex/  # JSON Schema per `kind:` — bundled with the package
    cli/                   # `datalex` entry point
    api-server/            # Node.js API (UI backend)
    web-app/               # React Flow studio
  examples/
    jaffle_shop_demo/      # zero-setup dbt-sync demo (DuckDB)
  model-examples/          # sample projects and scenario walkthroughs
  docs/                    # architecture, specs, runbooks
  tests/                   # unittest suite (core engine + datalex)
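The layout bundles one JSON Schema per kind: inside core_engine. The sketch below shows the general idea of kind-dispatched validation in a deliberately simplified form: each parsed document is routed by its kind field to a rule set, here reduced to a list of required keys instead of a full JSON Schema, and documents are checked one at a time from an iterator. The kinds, required keys, and function names are hypothetical, not DataLex's actual schemas or loader.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical rule sets keyed by `kind`. Real DataLex schemas are full
# JSON Schemas; here each kind only lists the keys it must contain.
REQUIRED_KEYS: Dict[str, List[str]] = {
    "model": ["kind", "name", "columns"],
    "source": ["kind", "name", "tables"],
}


def check(doc: dict) -> List[str]:
    """Dispatch on doc['kind'] and return problems (an empty list means valid)."""
    kind = doc.get("kind")
    rules = REQUIRED_KEYS.get(kind)
    if rules is None:
        return [f"unknown kind: {kind!r}"]
    return [f"{kind}: missing key {key!r}" for key in rules if key not in doc]


def check_stream(docs: Iterable[Tuple[str, dict]]) -> Iterator[Tuple[str, str]]:
    """Check (path, document) pairs one at a time, yielding (path, problem).

    If `docs` is lazy (for example, a generator that parses each entity
    file on demand), only one document needs to be in memory at a time.
    """
    for path, doc in docs:
        for problem in check(doc):
            yield path, problem


docs = [
    ("models/customers.yaml", {"kind": "model", "name": "customers", "columns": []}),
    ("sources/raw.yaml", {"kind": "source", "name": "raw"}),
    ("misc/x.yaml", {"kind": "dashboard"}),
]
for path, problem in check_stream(docs):
    print(path, "->", problem)
```

Keeping one entity per file pairs naturally with this style: each file can be parsed, dispatched, and released before the next one is read.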

Visual Studio (optional)

If you want the UI on top of your DataLex project, run the two dev servers:

# Terminal 1
npm --prefix packages/api-server run dev
# Terminal 2
npm --prefix packages/web-app run dev

Then open http://localhost:5173. The UI reads and writes the same YAML files the CLI does — no database, no hosted service.

CI / GitOps

DataLex is designed to live in your repo next to your dbt project. A typical CI step:

./datalex datalex validate datalex/
./datalex datalex diff datalex-main/ datalex/ --exit-on-breaking
./datalex datalex dbt emit datalex/ --out-dir dbt/
dbt parse
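The diff step above gates the pipeline on breaking changes. As a rough, hypothetical illustration of why explicit rename tracking matters (not the rules DataLex's diff actually applies), the sketch below compares two versions of an entity's columns: a column that disappears is breaking unless a new column claims it through previous_name, and a type change is breaking. Whether a tracked rename should itself fail the gate is a policy choice; this sketch only reports it. The function names, the column dict shape, and the 0/1 exit-code convention are assumptions.

```python
from typing import Dict, List, Tuple

Column = Dict[str, str]  # e.g. {"name": ..., "type": ..., "previous_name": ...}


def diff_columns(old: List[Column], new: List[Column]) -> Tuple[List[str], List[str]]:
    """Classify column changes into (breaking, informational) messages."""
    new_by_name = {c["name"]: c for c in new}
    # New columns that explicitly declare the old name they replace.
    renamed_from = {c["previous_name"]: c for c in new if c.get("previous_name")}
    old_names = {c["name"] for c in old}
    breaking: List[str] = []
    info: List[str] = []

    for col in old:
        name = col["name"]
        if name in new_by_name:
            successor = new_by_name[name]
        elif name in renamed_from:
            successor = renamed_from[name]
            info.append(f"renamed {name!r} -> {successor['name']!r}")
        else:
            breaking.append(f"dropped {name!r}")
            continue
        if successor.get("type") != col.get("type"):
            breaking.append(
                f"type of {name!r}: {col.get('type')} -> {successor.get('type')}")

    for col in new:
        if col["name"] not in old_names and col.get("previous_name") not in old_names:
            info.append(f"added {col['name']!r}")
    return breaking, info


def gate(old: List[Column], new: List[Column]) -> int:
    """Report changes and return a CI-style exit code (1 if anything breaks)."""
    breaking, info = diff_columns(old, new)
    for msg in info:
        print("info:", msg)
    for msg in breaking:
        print("BREAKING:", msg)
    return 1 if breaking else 0


old = [{"name": "cust_id", "type": "INTEGER"}, {"name": "email", "type": "TEXT"}]
tracked = [{"name": "customer_id", "type": "INTEGER", "previous_name": "cust_id"},
           {"name": "email", "type": "TEXT"}]
untracked = [{"name": "customer_id", "type": "INTEGER"},
             {"name": "email", "type": "TEXT"}]

print(gate(old, tracked))    # rename is recognized, nothing breaks
print(gate(old, untracked))  # looks like a drop plus an add, so the gate fails
```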

Documentation

Community

  • Discord: Join Discord
  • Issues: GitHub Issues
  • Contributing: CONTRIBUTING.md
  • License: MIT

Download files

Source Distribution

datalex_cli-0.1.1.tar.gz (185.5 kB)

Uploaded Source

Built Distribution

datalex_cli-0.1.1-py3-none-any.whl (180.6 kB)

Uploaded Python 3

File details

Details for the file datalex_cli-0.1.1.tar.gz.

File metadata

  • Download URL: datalex_cli-0.1.1.tar.gz
  • Size: 185.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for datalex_cli-0.1.1.tar.gz
Algorithm | Hash digest
SHA256 | d41b567c2db2f458bf432e3416a4eaf1a4ac1a75736478a16d5fc4db73eda820
MD5 | 386adb0399587e934c75dac639828f8c
BLAKE2b-256 | c3b3f2deb6c425d9552c7ad1c9caf325560f260f345c3dbed25b753616740a97
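To verify a downloaded source archive against the SHA256 digest published above, hash the file and compare. A minimal standard-library sketch; the local file path is whatever you downloaded, and the function names are illustrative. pip can also enforce digests itself through its hash-checking mode (--require-hashes).

```python
import hashlib

# SHA256 published for datalex_cli-0.1.1.tar.gz in the table above.
EXPECTED_SDIST_SHA256 = (
    "d41b567c2db2f458bf432e3416a4eaf1a4ac1a75736478a16d5fc4db73eda820"
)


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large archives never load fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: str, expected_hex: str = EXPECTED_SDIST_SHA256) -> bool:
    """True if the file's SHA256 equals the published hex digest (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()


# Usage, after downloading the sdist into the current directory:
#   matches_published("datalex_cli-0.1.1.tar.gz")
```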

Provenance

The following attestation bundles were made for datalex_cli-0.1.1.tar.gz:

Publisher: publish.yml on duckcode-ai/DataLex

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file datalex_cli-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: datalex_cli-0.1.1-py3-none-any.whl
  • Size: 180.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for datalex_cli-0.1.1-py3-none-any.whl
Algorithm | Hash digest
SHA256 | 96d184b1c0044416d786c1ec5cc5b2a14d6b225a9db80f71542d9b3f946594cd
MD5 | 66a36098d39a215e56d513b673c76125
BLAKE2b-256 | 3268cf6a91c70704287c56db891a7e1dca337b0f72798c1b8f130c3cff5bab7c

Provenance

The following attestation bundles were made for datalex_cli-0.1.1-py3-none-any.whl:

Publisher: publish.yml on duckcode-ai/DataLex

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
