
DataEngineX — open-source, self-hosted, local-first Data + ML + AI workbench library


dataenginex

The Python library that powers DEX Studio — an open-source, self-hosted, local-first Data + ML + AI workbench. Use the library directly when you want code, not a UI.

Pre-1.0 status. Version 0.4.0 is explicitly a pre-1.0 release; see the scope reset CHANGELOG for the rationale.

Install

pip install dataenginex                    # lean base — DuckDB, structlog, Pydantic, Click, pyarrow

Optional integrations — install only what you need:

pip install 'dataenginex[postgres]'        # asyncpg-backed lineage, persistence
pip install 'dataenginex[qdrant]'          # Qdrant vector store backend
pip install 'dataenginex[queue]'           # ARQ async job queue (pulls redis)
pip install 'dataenginex[cloud]'           # S3, GCS, BigQuery storage backends
pip install 'dataenginex[ml]'              # scikit-learn, xgboost, sentence-transformers
pip install 'dataenginex[tracking]'        # MLflow integration
pip install 'dataenginex[data]'            # PySpark, databricks-cli
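Because each extra only adds third-party packages, a quick way to see which integrations are usable in the current environment is to probe for one importable module per extra. This is a minimal sketch: the extra-to-module mapping below is an assumption made for illustration (only asyncpg is named above), and `available_extras` is a hypothetical helper, not part of dataenginex.

```python
from importlib.util import find_spec

# Assumed mapping from extra name to one module that extra should provide.
# These import names are illustrative guesses, not read from the package metadata.
EXTRA_PROBES = {
    "postgres": "asyncpg",
    "qdrant": "qdrant_client",
    "queue": "arq",
    "ml": "sklearn",
    "tracking": "mlflow",
    "data": "pyspark",
}


def available_extras(probes: dict[str, str] = EXTRA_PROBES) -> dict[str, bool]:
    """Return {extra_name: bool}, True when the probe module can be found."""
    # find_spec locates a top-level module without importing it.
    return {extra: find_spec(module) is not None for extra, module in probes.items()}


if __name__ == "__main__":
    for extra, present in available_extras().items():
        print(f"{extra:10s} {'installed' if present else 'missing'}")
```

Probing with `find_spec` avoids importing heavy packages such as PySpark just to check that they are present.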

LiteLLM: install it separately with --no-deps. It pins python-dotenv==1.0.1, which conflicts with the python-dotenv>=1.2.2 that dataenginex requires:

pip install 'litellm>=1.83.3' --no-deps
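Since --no-deps skips dependency resolution, it can be worth confirming afterwards that the installed python-dotenv still satisfies the >=1.2.2 floor mentioned above. The sketch below uses only the standard library; `version_tuple`, `meets_minimum`, and `check_dotenv` are hypothetical helpers, and the version parsing is deliberately simplified (pre-release suffixes are ignored).

```python
from importlib.metadata import PackageNotFoundError, version


def version_tuple(text: str) -> tuple[int, ...]:
    """Parse leading numeric components, e.g. '1.2.2' -> (1, 2, 2).

    Simplification: suffixes such as 'rc1' are dropped, so '1.2.2rc1'
    compares equal to '1.2.2'.
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """True when the installed version is at least the minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


def check_dotenv(minimum: str = "1.2.2") -> str:
    """Describe whether the installed python-dotenv meets the minimum."""
    try:
        installed = version("python-dotenv")
    except PackageNotFoundError:
        return "python-dotenv is not installed"
    if meets_minimum(installed, minimum):
        return f"python-dotenv {installed}: ok"
    return f"python-dotenv {installed}: older than {minimum}"


if __name__ == "__main__":
    print(check_dotenv())
```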

Quick start

from pathlib import Path
from dataenginex.engine import DexEngine

# Load config and initialize all backends
engine = DexEngine(Path("dex.yaml"))

# Data — run pipelines defined in dex.yaml
engine.run_pipeline("clean_users")

# ML — train, register, predict
models = engine.model_registry.list_models()
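features = {"tenure_months": 12}  # placeholder input (hypothetical); match your model's feature schema
result = engine.model_registry.predict("churn_model", features)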

# AI — chat with an agent over your data
response = engine.agents["assistant"].chat("summarise the latest pipeline run")

# Persistence — query DuckDB-backed history
runs = engine.store.list_pipeline_runs(limit=10)

Smaller surfaces — use only what you need:

from dataenginex.config import load_config
cfg = load_config("dex.yaml")

from dataenginex.core.interfaces import BaseConnector
from dataenginex.core.registry import BackendRegistry

from dataenginex.ml import ModelRegistry
from dataenginex.ai.llm import get_llm_provider
from dataenginex.ai.vectorstore import VectorStoreBackend
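The config module is described as handling env-var resolution for dex.yaml. As a rough illustration of what that kind of resolution can look like, here is a standalone sketch. The `${VAR}` and `${VAR:-default}` placeholder syntax and the `resolve_env` function are assumptions for this example, not dataenginex's documented behavior; check the library's config documentation for the real syntax.

```python
import os
import re

# Assumed placeholder syntax: ${NAME} or ${NAME:-default}
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def resolve_env(value, env=None):
    """Recursively substitute placeholders in strings inside dicts and lists."""
    env = os.environ if env is None else env

    def substitute(match):
        name, default = match.group(1), match.group(2)
        if name in env:
            return env[name]
        if default is not None:
            return default
        raise KeyError(f"environment variable {name!r} is not set")

    if isinstance(value, str):
        return _PLACEHOLDER.sub(substitute, value)
    if isinstance(value, dict):
        return {key: resolve_env(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [resolve_env(item, env) for item in value]
    return value  # numbers, booleans, None pass through unchanged


if __name__ == "__main__":
    raw = {"store": {"path": "${DEX_HOME:-.dex}/store.duckdb"}}
    print(resolve_env(raw, env={}))
```

Raising on a missing variable with no default is a design choice in this sketch: a config with an unresolved secret fails at load time instead of at first use.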

Submodules

| Module | Description |
| --- | --- |
| dataenginex.engine | DexEngine: single entry point; loads config, inits store, wires backends |
| dataenginex.store | DexStore: DuckDB-backed persistence (.dex/store.duckdb) |
| dataenginex.config | dex.yaml schema, loader, env-var resolution |
| dataenginex.core | Exceptions, Base* ABCs, BackendRegistry |
| dataenginex.cli | dex CLI (validate, version, init) |
| dataenginex.data | Connectors (CSV, Parquet, DuckDB, HTTP, …), pipeline runner, schema registry |
| dataenginex.ml | Classical ML: training, model registry, serving, drift |
| dataenginex.ai | LLM providers, agents, RAG, vector store, memory, observability |
| dataenginex.orchestration | Scheduler, background workers |
| dataenginex.middleware | structlog config, Prometheus metrics |
| dataenginex.lakehouse | Storage backends, catalog, partitioning |
| dataenginex.warehouse | Transforms, lineage tracking |
| dataenginex.secops | PII detection, masking, audit logging |
| dataenginex.api | Pydantic response models (no HTTP server bundled) |
| dataenginex.plugins | Entry-point plugin discovery |

Want the UI?

dataenginex is the engine. The web UI lives in a separate repo:

git clone https://github.com/TheDataEngineX/dex-studio && cd dex-studio
docker compose up         # open http://localhost:7860

DEX Studio imports dataenginex directly — no separate API server.
