DataEngineX — open-source, self-hosted, local-first Data + ML + AI workbench library
dataenginex
The Python library that powers DEX Studio — an open-source, self-hosted, local-first Data + ML + AI workbench. Use the library directly when you want code, not a UI.
Pre-1.0 status. 0.4.0 is honest about that. See the scope reset CHANGELOG for the rationale.
Install
pip install dataenginex # lean base — DuckDB, structlog, Pydantic, Click, pyarrow
Optional integrations — install only what you need:
pip install 'dataenginex[postgres]' # asyncpg-backed lineage, persistence
pip install 'dataenginex[qdrant]' # Qdrant vector store backend
pip install 'dataenginex[queue]' # ARQ async job queue (pulls redis)
pip install 'dataenginex[cloud]' # S3, GCS, BigQuery storage backends
pip install 'dataenginex[ml]' # scikit-learn, xgboost, sentence-transformers
pip install 'dataenginex[tracking]' # MLflow integration
pip install 'dataenginex[data]' # PySpark, databricks-cli
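To see which optional integrations are importable in the current environment, something like the following may help. It is a minimal sketch, not part of dataenginex: the mapping from extras to import names is an assumption based on the comments above (the cloud extra is left out because its exact packages are not listed here), and `available_extras` is a hypothetical helper.

```python
from importlib.util import find_spec

# Assumed mapping from each extra to one top-level module it is expected
# to pull in. Adjust this to match the extras you actually installed.
EXTRA_MODULES = {
    "postgres": "asyncpg",
    "qdrant": "qdrant_client",
    "queue": "arq",
    "ml": "sklearn",
    "tracking": "mlflow",
    "data": "pyspark",
}


def available_extras(modules: dict[str, str] = EXTRA_MODULES) -> dict[str, bool]:
    """Report, per extra, whether its marker module can be found.

    find_spec locates the module without importing it, so the check stays
    cheap even for heavy packages.
    """
    return {extra: find_spec(module) is not None for extra, module in modules.items()}


if __name__ == "__main__":
    for extra, present in available_extras().items():
        status = "installed" if present else "missing"
        print(f"dataenginex[{extra}]: {status}")
```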
LiteLLM: install separately. It pins python-dotenv==1.0.1, which conflicts with our >=1.2.2 requirement:
pip install 'litellm>=1.83.3' --no-deps
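Because --no-deps skips dependency resolution, it can be worth confirming afterwards that the installed python-dotenv still satisfies the >=1.2.2 floor. The check below is a rough sketch using only the standard library; `parse_version`, `meets_minimum`, and `check_dotenv` are hypothetical helpers, and the parser only handles plain dotted numeric versions.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '1.2.2' into (1, 2, 2); stops at the first non-numeric piece."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare two dotted versions, padding the shorter one with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_dotenv(minimum: str = "1.2.2") -> str:
    """Describe whether the installed python-dotenv meets the minimum."""
    try:
        installed = version("python-dotenv")
    except PackageNotFoundError:
        return "python-dotenv is not installed"
    verdict = "ok" if meets_minimum(installed, minimum) else "too old"
    return f"python-dotenv {installed}: {verdict} (need >= {minimum})"


if __name__ == "__main__":
    print(check_dotenv())
```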
Quick start
from pathlib import Path
from dataenginex.engine import DexEngine
# Load config and initialize all backends
engine = DexEngine(Path("dex.yaml"))
# Data — run pipelines defined in dex.yaml
engine.run_pipeline("clean_users")
# ML — train, register, predict
models = engine.model_registry.list_models()
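features = {"tenure_months": 12, "plan": "pro"}  # hypothetical input; the expected shape depends on your model
result = engine.model_registry.predict("churn_model", features)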
# AI — chat with an agent over your data
response = engine.agents["assistant"].chat("summarise the latest pipeline run")
# Persistence — query DuckDB-backed history
runs = engine.store.list_pipeline_runs(limit=10)
Smaller surfaces — use only what you need:
from dataenginex.config import load_config
cfg = load_config("dex.yaml")
from dataenginex.core.interfaces import BaseConnector
from dataenginex.core.registry import BackendRegistry
from dataenginex.ml import ModelRegistry
from dataenginex.ai.llm import get_llm_provider
from dataenginex.ai.vectorstore import VectorStoreBackend
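The BaseConnector and BackendRegistry imports suggest an abstract-base-class plus registry design. The standalone sketch below illustrates that general pattern only. It does not use dataenginex, and every name in it (`SketchConnector`, `SketchRegistry`, `read`, `register`, `create`) is hypothetical rather than the library's actual interface.

```python
from abc import ABC, abstractmethod
from typing import Callable, Iterator


class SketchConnector(ABC):
    """Hypothetical connector contract: yield rows as dictionaries."""

    @abstractmethod
    def read(self) -> Iterator[dict]:
        ...


class SketchRegistry:
    """Hypothetical name-to-class registry for connector backends."""

    def __init__(self) -> None:
        self._backends: dict[str, type[SketchConnector]] = {}

    def register(self, name: str) -> Callable[[type[SketchConnector]], type[SketchConnector]]:
        # Used as a class decorator so a backend registers itself on definition.
        def decorator(cls: type[SketchConnector]) -> type[SketchConnector]:
            self._backends[name] = cls
            return cls
        return decorator

    def create(self, name: str, **kwargs) -> SketchConnector:
        # Raises KeyError for an unknown backend name.
        return self._backends[name](**kwargs)


registry = SketchRegistry()


@registry.register("inline")
class InlineConnector(SketchConnector):
    """Serves rows held in memory; handy for tests."""

    def __init__(self, rows: list[dict]) -> None:
        self.rows = rows

    def read(self) -> Iterator[dict]:
        yield from self.rows


if __name__ == "__main__":
    connector = registry.create("inline", rows=[{"id": 1}, {"id": 2}])
    print(list(connector.read()))
```

Looking backends up by name keeps calling code independent of concrete classes, which is presumably why a config-driven engine would favour this shape.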
Submodules
| Module | Description |
|---|---|
| dataenginex.engine | DexEngine: single entry point; loads config, inits store, wires backends |
| dataenginex.store | DexStore: DuckDB-backed persistence (.dex/store.duckdb) |
| dataenginex.config | dex.yaml schema, loader, env-var resolution |
| dataenginex.core | Exceptions, Base* ABCs, BackendRegistry |
| dataenginex.cli | dex CLI (validate, version, init) |
| dataenginex.data | Connectors (CSV, Parquet, DuckDB, HTTP, …), pipeline runner, schema registry |
| dataenginex.ml | Classical ML: training, model registry, serving, drift |
| dataenginex.ai | LLM providers, agents, RAG, vector store, memory, observability |
| dataenginex.orchestration | Scheduler, background workers |
| dataenginex.middleware | structlog config, Prometheus metrics |
| dataenginex.lakehouse | Storage backends, catalog, partitioning |
| dataenginex.warehouse | Transforms, lineage tracking |
| dataenginex.secops | PII detection, masking, audit logging |
| dataenginex.api | Pydantic response models (no HTTP server bundled) |
| dataenginex.plugins | Entry-point plugin discovery |
Want the UI?
dataenginex is the engine. The web UI lives in a separate repo:
git clone https://github.com/TheDataEngineX/dex-studio && cd dex-studio
docker compose up # open http://localhost:7860
DEX Studio imports dataenginex directly — no separate API server.
Links
- Source: github.com/TheDataEngineX/dex
- Docs: docs.thedataenginex.org
- Roadmap: docs/docs/roadmap/DESIGN-2026.md
- ADRs: docs/adr/
- Issues: github.com/TheDataEngineX/dex/issues
- License: MIT