DataEngineX — open-source, self-hosted, local-first Data + ML + AI workbench library

dataenginex

The Python library that powers DEX Studio — an open-source, self-hosted, local-first Data + ML + AI workbench. Use the library directly when you want code, not a UI.

Pre-1.0 status: the 0.4.0 version number reflects that. See the scope reset entry in the CHANGELOG for the rationale.

Install

pip install dataenginex                    # lean base — DuckDB, structlog, Pydantic, Click, pyarrow

Optional integrations — install only what you need:

pip install 'dataenginex[postgres]'        # asyncpg-backed lineage, persistence
pip install 'dataenginex[qdrant]'          # Qdrant vector store backend
pip install 'dataenginex[queue]'           # ARQ async job queue (pulls redis)
pip install 'dataenginex[cloud]'           # S3, GCS, BigQuery storage backends
pip install 'dataenginex[ml]'              # scikit-learn, xgboost, sentence-transformers
pip install 'dataenginex[tracking]'        # MLflow integration
pip install 'dataenginex[data]'            # PySpark, databricks-cli
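
To see which optional integrations are importable in the current environment, a small check along these lines may help. It is only a sketch: the mapping from extra names to import names (for example `ml` to `sklearn`) is an assumption inferred from the packages named in the install comments, and `available_extras` is a hypothetical helper, not part of dataenginex.

```python
from importlib.util import find_spec

# Assumed mapping: extra name -> top-level module that extra is expected to install.
# Inferred from the install comments; not taken from dataenginex itself.
EXTRA_MODULES = {
    "postgres": "asyncpg",
    "qdrant": "qdrant_client",
    "queue": "arq",
    "ml": "sklearn",
    "tracking": "mlflow",
    "data": "pyspark",
}


def available_extras(mapping: dict[str, str] = EXTRA_MODULES) -> dict[str, bool]:
    """Return {extra: bool} telling whether each extra's module can be found."""
    # find_spec locates a top-level module without importing it.
    return {extra: find_spec(module) is not None for extra, module in mapping.items()}


if __name__ == "__main__":
    for extra, ok in available_extras().items():
        print(f"{extra:10s} {'installed' if ok else 'missing'}")
```

Because `find_spec` only looks the module up, the check stays cheap even for heavy packages such as PySpark.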

LiteLLM: install it separately and without its dependencies. It pins python-dotenv==1.0.1, which conflicts with the python-dotenv>=1.2.2 that dataenginex requires:

pip install 'litellm>=1.83.3' --no-deps
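
Since `--no-deps` skips dependency resolution, it can be worth confirming afterwards that the installed python-dotenv still meets the `>=1.2.2` floor mentioned above. The following is a minimal sketch using only the standard library. The helper names are hypothetical, and the version parsing is deliberately simplified (numeric release segments only), so it is not a full PEP 440 comparison.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_release(text: str) -> tuple[int, ...]:
    """Turn '1.2.2' into (1, 2, 2); stops at the first non-numeric segment."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """True if the installed release is at least the minimum release."""
    return parse_release(installed) >= parse_release(minimum)


def check_dotenv(minimum: str = "1.2.2") -> str:
    """Report whether the installed python-dotenv satisfies the minimum."""
    try:
        installed = version("python-dotenv")
    except PackageNotFoundError:
        return "python-dotenv is not installed"
    status = "ok" if meets_minimum(installed, minimum) else "too old"
    return f"python-dotenv {installed}: {status} (need >={minimum})"


if __name__ == "__main__":
    print(check_dotenv())
```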

Quick start

from pathlib import Path
from dataenginex.engine import DexEngine

# Load config and initialize all backends
engine = DexEngine(Path("dex.yaml"))

# Data — run pipelines defined in dex.yaml
engine.run_pipeline("clean_users")

# ML — train, register, predict
models = engine.model_registry.list_models()
# `features` is the model input you supply; it is not defined in this snippet
result = engine.model_registry.predict("churn_model", features)

# AI — chat with an agent over your data
response = engine.agents["assistant"].chat("summarise the latest pipeline run")

# Persistence — query DuckDB-backed history
runs = engine.store.list_pipeline_runs(limit=10)

Smaller surfaces — use only what you need:

from dataenginex.config import load_config
cfg = load_config("dex.yaml")

from dataenginex.core.interfaces import BaseConnector
from dataenginex.core.registry import BackendRegistry

from dataenginex.ml import ModelRegistry
from dataenginex.ai.llm import get_llm_provider
from dataenginex.ai.vectorstore import VectorStoreBackend

Submodules

Module                     Description
dataenginex.engine         DexEngine: single entry point; loads config, initializes store, wires backends
dataenginex.store          DexStore: DuckDB-backed persistence (.dex/store.duckdb)
dataenginex.config         dex.yaml schema, loader, env-var resolution
dataenginex.core           Exceptions, Base* ABCs, BackendRegistry
dataenginex.cli            dex CLI (validate, version, init)
dataenginex.data           Connectors (CSV, Parquet, DuckDB, HTTP, …), pipeline runner, schema registry
dataenginex.ml             Classical ML: training, model registry, serving, drift
dataenginex.ai             LLM providers, agents, RAG, vector store, memory, observability
dataenginex.orchestration  Scheduler, background workers
dataenginex.middleware     structlog config, Prometheus metrics
dataenginex.lakehouse      Storage backends, catalog, partitioning
dataenginex.warehouse      Transforms, lineage tracking
dataenginex.secops         PII detection, masking, audit logging
dataenginex.api            Pydantic response models (no HTTP server bundled)
dataenginex.plugins        Entry-point plugin discovery

Want the UI?

dataenginex is the engine. The web UI lives in a separate repo:

git clone https://github.com/TheDataEngineX/dex-studio && cd dex-studio
docker compose up         # open http://localhost:7860

DEX Studio imports dataenginex directly — no separate API server.

Download files

Source Distribution

dataenginex-0.4.4.tar.gz (514.2 kB)

Uploaded Source

Built Distribution

dataenginex-0.4.4-py3-none-any.whl (211.5 kB)

Uploaded Python 3

File details

Details for the file dataenginex-0.4.4.tar.gz.

File metadata

  • Download URL: dataenginex-0.4.4.tar.gz
  • Size: 514.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for dataenginex-0.4.4.tar.gz
Algorithm    Hash digest
SHA256       260c405e438f2cf83bf04ea1dd43cbe151fa3406949cd495dea90c97bd1bef59
MD5          fdb27b62098447a9e55a4a7d30928941
BLAKE2b-256  f7f85c4537f925df2589d23b96d728f5e4a1f5c691db4be35b697988db1f0970
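
A downloaded archive can be checked against the published SHA256 digest with a few lines of standard-library Python. This is a minimal sketch: the local file path is an assumption about where the download was saved, and `sha256_of` and `verify` are illustrative helper names.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for dataenginex-0.4.4.tar.gz
EXPECTED_SHA256 = "260c405e438f2cf83bf04ea1dd43cbe151fa3406949cd495dea90c97bd1bef59"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("dataenginex-0.4.4.tar.gz")  # assumed local download path
    if archive.exists():
        print("match" if verify(archive) else "MISMATCH")
    else:
        print(f"{archive} not found")
```

pip can enforce the same check at install time through its hash-checking mode (`--require-hashes` with a requirements file).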

Provenance

The following attestation bundles were made for dataenginex-0.4.4.tar.gz:

Publisher: release.yml on TheDataEngineX/dataenginex

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file dataenginex-0.4.4-py3-none-any.whl.

File metadata

  • Download URL: dataenginex-0.4.4-py3-none-any.whl
  • Size: 211.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for dataenginex-0.4.4-py3-none-any.whl
Algorithm    Hash digest
SHA256       e20ff397d25f7f7f83d48cc912d8c4da9cd8065d3037e0c592bcdede2c288672
MD5          ccb6a74d28ebbb0692adaf749ddaa9fa
BLAKE2b-256  87edd70f0a2ecf9fb0a4956d70fc1f0e3662513a042f36c702d44b0af644cf9b

Provenance

The following attestation bundles were made for dataenginex-0.4.4-py3-none-any.whl:

Publisher: release.yml on TheDataEngineX/dataenginex

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
