DataEngineX — open-source, self-hosted, local-first Data + ML + AI workbench library


dataenginex

The Python library that powers DEX Studio — an open-source, self-hosted, local-first Data + ML + AI workbench. Use the library directly when you want code, not a UI.

dataenginex is pre-1.0, and the 0.4.0 release, which reset the project's scope, is candid about that. See the CHANGELOG entry for the scope reset for the rationale.

Install

pip install dataenginex                    # lean base — DuckDB, structlog, Pydantic, Click, pyarrow

Optional integrations — install only what you need:

pip install 'dataenginex[postgres]'        # asyncpg-backed lineage, persistence
pip install 'dataenginex[qdrant]'          # Qdrant vector store backend
pip install 'dataenginex[queue]'           # ARQ async job queue (pulls redis)
pip install 'dataenginex[cloud]'           # S3, GCS, BigQuery storage backends
pip install 'dataenginex[ml]'              # scikit-learn, xgboost, sentence-transformers
pip install 'dataenginex[tracking]'        # MLflow integration
pip install 'dataenginex[data]'            # PySpark, databricks-cli
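Because the extras are opt-in, code that uses an integration may want to check what is actually installed before wiring it up. The sketch below probes with the standard library's importlib.util.find_spec. The extra-to-module mapping in EXTRA_PROBES is an assumption: one representative import per extra, inferred from the descriptions above rather than read from dataenginex's package metadata. The cloud extra is left out because its exact client packages are not listed here.

```python
import importlib.util

# ASSUMPTION: one representative top-level module per extra, inferred from the
# extra descriptions; these are not taken from dataenginex's own metadata.
EXTRA_PROBES: dict[str, str] = {
    "postgres": "asyncpg",
    "qdrant": "qdrant_client",
    "queue": "arq",
    "ml": "sklearn",
    "tracking": "mlflow",
    "data": "pyspark",
}


def installed_extras(probes: dict[str, str] = EXTRA_PROBES) -> dict[str, bool]:
    """Map each extra name to True if its probe module can be imported."""
    # find_spec returns None (without importing) when a top-level module is absent.
    return {
        extra: importlib.util.find_spec(module) is not None
        for extra, module in probes.items()
    }


if __name__ == "__main__":
    for extra, ok in installed_extras().items():
        print(f"{extra:10} {'installed' if ok else 'missing'}")
```

Run it as a script for a quick installed/missing report for the probed extras.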

LiteLLM must be installed separately: it pins python-dotenv==1.0.1, which conflicts with dataenginex's requirement of python-dotenv>=1.2.2. Install it without its dependencies:

pip install 'litellm>=1.83.3' --no-deps
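With --no-deps, pip neither downgrades python-dotenv nor installs LiteLLM's other requirements, so you may need to add some of those yourself. The sketch below is a minimal way to confirm afterwards that the python-dotenv>=1.2.2 requirement still holds, using importlib.metadata. The version parsing is a deliberate simplification that compares only the leading numeric release segment. If you need full PEP 440 semantics, the packaging library's version module is the better tool. The function names are illustrative.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Lower bound stated above for dataenginex: python-dotenv>=1.2.2
MIN_DOTENV: tuple[int, ...] = (1, 2, 2)


def release_tuple(ver: str) -> tuple[int, ...]:
    """Parse the leading numeric release segment, e.g. '1.10.0' -> (1, 10, 0).

    Simplification: pre-release or local suffixes (rc1, +local) are ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", ver)
    if match is None:
        raise ValueError(f"unparseable version: {ver!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def dotenv_ok(minimum: tuple[int, ...] = MIN_DOTENV) -> bool:
    """True if the installed python-dotenv meets the minimum release."""
    try:
        installed = version("python-dotenv")
    except PackageNotFoundError:
        return False
    # Tuple comparison avoids string pitfalls: "1.10.0" < "1.2.2" as strings.
    return release_tuple(installed) >= minimum


if __name__ == "__main__":
    print("python-dotenv OK" if dotenv_ok() else "python-dotenv too old or missing")
```

Expect pip check to flag LiteLLM's python-dotenv pin after this install. That is the conflict being deliberately bypassed.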

Quick start

from pathlib import Path
from dataenginex.engine import DexEngine

# Load config and initialize all backends
engine = DexEngine(Path("dex.yaml"))

# Data: run a pipeline defined in dex.yaml
engine.run_pipeline("clean_users")

# ML: list registered models, then predict with one of them
models = engine.model_registry.list_models()
# Hypothetical input row; the shape predict() expects depends on your model
features = {"tenure_months": 12, "monthly_spend": 49.0}
result = engine.model_registry.predict("churn_model", features)

# AI: chat with an agent over your data
response = engine.agents["assistant"].chat("summarise the latest pipeline run")

# Persistence: query DuckDB-backed history
runs = engine.store.list_pipeline_runs(limit=10)

You don't have to go through DexEngine. Import individual components and use only the pieces you need:

from dataenginex.config import load_config
cfg = load_config("dex.yaml")

from dataenginex.core.interfaces import BaseConnector
from dataenginex.core.registry import BackendRegistry

from dataenginex.ml import ModelRegistry
from dataenginex.ai.llm import get_llm_provider
from dataenginex.ai.vectorstore import VectorStoreBackend
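BaseConnector and BackendRegistry point to a common plugin-style design. Abstract base classes define each backend contract, and a registry maps configuration names to concrete classes. The sketch below illustrates that general pattern in plain Python. Registry, Connector and InlineConnector are hypothetical names, and this is not dataenginex's actual BackendRegistry API, whose methods and signatures may differ.

```python
from abc import ABC, abstractmethod
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class Registry(Generic[T]):
    """Hypothetical name -> class registry (NOT dataenginex's BackendRegistry)."""

    def __init__(self) -> None:
        self._entries: dict[str, type[T]] = {}

    def register(self, name: str) -> Callable[[type[T]], type[T]]:
        """Class decorator that records cls under `name`."""
        def decorator(cls: type[T]) -> type[T]:
            if name in self._entries:
                raise ValueError(f"backend {name!r} already registered")
            self._entries[name] = cls
            return cls
        return decorator

    def create(self, name: str, **kwargs: object) -> T:
        """Instantiate the class registered under `name` with kwargs."""
        try:
            cls = self._entries[name]
        except KeyError:
            known = sorted(self._entries)
            raise KeyError(f"unknown backend {name!r}; known: {known}") from None
        return cls(**kwargs)


class Connector(ABC):
    """Hypothetical stand-in for a Base* connector contract."""

    @abstractmethod
    def read(self) -> list[dict]: ...


connectors: Registry[Connector] = Registry()


@connectors.register("inline")
class InlineConnector(Connector):
    """Toy connector that serves rows held in memory."""

    def __init__(self, rows: list[dict]) -> None:
        self.rows = rows

    def read(self) -> list[dict]:
        return list(self.rows)
```

Registering at class-definition time keeps the name-to-class mapping next to each implementation, and rejecting duplicate names catches accidental overrides early. A config value such as a connector type string can then be passed straight to create().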

Submodules

| Module | Description |
|---|---|
| dataenginex.engine | DexEngine: single entry point; loads config, initializes the store, wires backends |
| dataenginex.store | DexStore: DuckDB-backed persistence (.dex/store.duckdb) |
| dataenginex.config | dex.yaml schema, loader, env-var resolution |
| dataenginex.core | Exceptions, Base* ABCs, BackendRegistry |
| dataenginex.cli | dex CLI (validate, version, init) |
| dataenginex.data | Connectors (CSV, Parquet, DuckDB, HTTP, …), pipeline runner, schema registry |
| dataenginex.ml | Classical ML: training, model registry, serving, drift |
| dataenginex.ai | LLM providers, agents, RAG, vector store, memory, observability |
| dataenginex.orchestration | Scheduler, background workers |
| dataenginex.middleware | structlog config, Prometheus metrics |
| dataenginex.lakehouse | Storage backends, catalog, partitioning |
| dataenginex.warehouse | Transforms, lineage tracking |
| dataenginex.secops | PII detection, masking, audit logging |
| dataenginex.api | Pydantic response models (no HTTP server bundled) |
| dataenginex.plugins | Entry-point plugin discovery |
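dataenginex.config lists env-var resolution among its responsibilities. Typically that means config values can reference environment variables so secrets stay out of dex.yaml. The sketch below shows one common way to resolve such references in an already-parsed config dict. The ${NAME} and ${NAME:-default} placeholder syntax and the resolve_env function are assumptions for illustration, not dex.yaml's documented syntax.

```python
import os
import re
from typing import Any, Mapping

# ASSUMED placeholder syntax: ${NAME} or ${NAME:-default}; dex.yaml may differ.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def resolve_env(value: Any, env: Mapping[str, str] = os.environ) -> Any:
    """Recursively substitute placeholders in strings nested in dicts and lists."""
    if isinstance(value, str):
        def substitute(match: re.Match) -> str:
            name, default = match.group(1), match.group(2)
            if name in env:
                return env[name]
            if default is not None:  # ${NAME:-} yields an empty-string default
                return default
            raise KeyError(f"environment variable {name!r} is not set")
        return _PLACEHOLDER.sub(substitute, value)
    if isinstance(value, dict):
        return {key: resolve_env(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [resolve_env(item, env) for item in value]
    return value  # numbers, booleans, None pass through unchanged
```

Failing loudly on an unset variable with no default surfaces misconfiguration at load time rather than deep inside a backend connection.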

Want the UI?

dataenginex is the engine. The web UI lives in a separate repo:

git clone https://github.com/TheDataEngineX/dex-studio && cd dex-studio
docker compose up         # open http://localhost:7860

DEX Studio imports dataenginex directly as a library, so no separate API server is needed.


Download files


Source Distribution

dataenginex-0.4.5.tar.gz (516.3 kB)

Uploaded Source

Built Distribution


dataenginex-0.4.5-py3-none-any.whl (211.7 kB)

Uploaded Python 3

File details

Details for the file dataenginex-0.4.5.tar.gz.

File metadata

  • Download URL: dataenginex-0.4.5.tar.gz
  • Upload date:
  • Size: 516.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for dataenginex-0.4.5.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | ce6410ddc8b4a539bd89d8dc6279738a45e46b7ae43d26493dcc98aa3a7f110c |
| MD5 | bca8d6349ca50e536e0e69f80a9479a9 |
| BLAKE2b-256 | 36cba15f80d0120eb14b7927866a97f181644c824a259985e4e83428b55304f1 |
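To check a downloaded archive against its published digest, hash the file in chunks and compare hex strings. This is a minimal standard-library sketch. The SDIST_SHA256 constant is copied from the SHA256 row for dataenginex-0.4.5.tar.gz, and the function names are illustrative.

```python
import hashlib
from pathlib import Path

# Published SHA256 for dataenginex-0.4.5.tar.gz (copied from the hash table)
SDIST_SHA256 = "ce6410ddc8b4a539bd89d8dc6279738a45e46b7ae43d26493dcc98aa3a7f110c"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA256 of a file, read in 1 MiB chunks to keep memory flat."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_hex: str) -> bool:
    """True if the file's SHA256 matches the expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    archive = Path("dataenginex-0.4.5.tar.gz")
    print("OK" if verify(archive, SDIST_SHA256) else "MISMATCH")
```

pip can also enforce this itself. pip hash prints a local file's digest, and in hash-checking mode a requirements line such as dataenginex==0.4.5 --hash=sha256:<digest>, installed with pip install --require-hashes -r requirements.txt, is rejected on mismatch. In that mode every requirement, including dependencies, needs a pinned hash.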


Provenance

The following attestation bundles were made for dataenginex-0.4.5.tar.gz:

Publisher: release.yml on TheDataEngineX/dataenginex

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file dataenginex-0.4.5-py3-none-any.whl.

File metadata

  • Download URL: dataenginex-0.4.5-py3-none-any.whl
  • Upload date:
  • Size: 211.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for dataenginex-0.4.5-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | e54c1dc75b37169139b2b6020477736f14ea4557738eb7075b68f60254d7f9b9 |
| MD5 | e7b44f1d0ebdc892a004cef900f4f932 |
| BLAKE2b-256 | 3234bbc55da2f7c85947c8ccc23b0866b0383d5814948329c7c0585d46ef0f3c |


Provenance

The following attestation bundles were made for dataenginex-0.4.5-py3-none-any.whl:

Publisher: release.yml on TheDataEngineX/dataenginex

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
