DataEngineX — open-source, self-hosted, local-first Data + ML + AI workbench library

dataenginex

The Python library that powers DEX Studio — an open-source, self-hosted, local-first Data + ML + AI workbench. Use the library directly when you want code, not a UI.

Pre-1.0 status. The 0.4.0 release is upfront about that; see the scope-reset entry in the CHANGELOG for the rationale.

Install

pip install dataenginex                    # lean base — DuckDB, structlog, Pydantic, Click, pyarrow

Optional integrations — install only what you need:

pip install 'dataenginex[postgres]'        # asyncpg-backed lineage, persistence
pip install 'dataenginex[qdrant]'          # Qdrant vector store backend
pip install 'dataenginex[queue]'           # ARQ async job queue (pulls redis)
pip install 'dataenginex[cloud]'           # S3, GCS, BigQuery storage backends
pip install 'dataenginex[ml]'              # scikit-learn, xgboost, sentence-transformers
pip install 'dataenginex[tracking]'        # MLflow integration
pip install 'dataenginex[data]'            # PySpark, databricks-cli
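
To see which optional integrations are already importable in an environment, a small standard-library check can help. This is a minimal sketch: the mapping from extras to import names is an assumption based on the packages named in the install comments above (for example, `sklearn` for scikit-learn), not something the library is known to expose.

```python
from importlib.util import find_spec

# Assumed mapping from dataenginex extras to the top-level import names of
# the packages mentioned in the install comments. Adjust to your setup.
EXTRA_IMPORTS = {
    "postgres": ["asyncpg"],
    "qdrant": ["qdrant_client"],
    "queue": ["arq", "redis"],
    "ml": ["sklearn", "xgboost", "sentence_transformers"],
    "tracking": ["mlflow"],
    "data": ["pyspark"],
}


def missing_modules(names):
    """Return the names that cannot be found on the import path."""
    return [name for name in names if find_spec(name) is None]


def report(extras=EXTRA_IMPORTS):
    """Map each extra to the list of its modules that are not installed."""
    return {extra: missing_modules(mods) for extra, mods in extras.items()}


if __name__ == "__main__":
    for extra, missing in report().items():
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print(f"{extra:10s} {status}")
```

`find_spec` only locates a top-level module; it does not import it, so the check stays cheap even for heavy packages.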

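LiteLLM must be installed separately because it pins python-dotenv==1.0.1, which conflicts with the python-dotenv>=1.2.2 that dataenginex requires. Installing with --no-deps skips LiteLLM's dependency resolution, so none of its own dependencies are installed automatically: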

pip install 'litellm>=1.83.3' --no-deps
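
After a `--no-deps` install it can be worth confirming that the python-dotenv version in the environment still satisfies the `>=1.2.2` floor mentioned above. The sketch below uses only `importlib.metadata`; the helper names (`parse_release`, `meets_minimum`, `check_dotenv`) are hypothetical, and the version parsing is deliberately simplistic (numeric release segments only).

```python
from importlib.metadata import PackageNotFoundError, version


def parse_release(text):
    """Turn '1.2.2' into (1, 2, 2); suffixes such as 'rc1' are ignored."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    # Pad so that '1.2' compares equal to '1.2.0'.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed, minimum):
    """True if the installed release is at least the minimum release."""
    return parse_release(installed) >= parse_release(minimum)


def check_dotenv(minimum="1.2.2"):
    """Return (installed_version, ok), or None if python-dotenv is absent."""
    try:
        installed = version("python-dotenv")
    except PackageNotFoundError:
        return None
    return installed, meets_minimum(installed, minimum)


if __name__ == "__main__":
    print(check_dotenv())
```

For anything beyond a quick sanity check, the third-party `packaging` library handles the full version grammar.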

Quick start

from pathlib import Path
from dataenginex.engine import DexEngine

# Load config and initialize all backends
engine = DexEngine(Path("dex.yaml"))

# Data — run pipelines defined in dex.yaml
engine.run_pipeline("clean_users")

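# ML: list registered models, then predict with one of them
models = engine.model_registry.list_models()
features = {"tenure_months": 12, "plan": "pro"}  # placeholder input; the expected shape depends on your model
result = engine.model_registry.predict("churn_model", features)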

# AI — chat with an agent over your data
response = engine.agents["assistant"].chat("summarise the latest pipeline run")

# Persistence — query DuckDB-backed history
runs = engine.store.list_pipeline_runs(limit=10)
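
The quick start assumes that dex.yaml exists in the working directory. A minimal sketch of a guard around that assumption follows; `resolve_config` and `open_engine` are hypothetical helper names, and the only library call used is `DexEngine(Path(...))` as shown above. The import is deferred so the path check works even where dataenginex is not installed.

```python
from pathlib import Path


def resolve_config(path="dex.yaml"):
    """Return an absolute path to the config file, or raise if it is missing."""
    config_path = Path(path).expanduser().resolve()
    if not config_path.is_file():
        raise FileNotFoundError(f"config file not found: {config_path}")
    return config_path


def open_engine(path="dex.yaml"):
    """Build a DexEngine only after the config path has been validated."""
    # Deferred import: dataenginex is only needed once the path is known good.
    from dataenginex.engine import DexEngine

    return DexEngine(resolve_config(path))
```

Failing early with the resolved absolute path in the message tends to make a wrong working directory easier to spot.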

Smaller surfaces — use only what you need:

from dataenginex.config import load_config
cfg = load_config("dex.yaml")

from dataenginex.core.interfaces import BaseConnector
from dataenginex.core.registry import BackendRegistry

from dataenginex.ml import ModelRegistry
from dataenginex.ai.llm import get_llm_provider
from dataenginex.ai.vectorstore import VectorStoreBackend

Submodules

Module                     Description
dataenginex.engine         DexEngine: single entry point; loads config, inits store, wires backends
dataenginex.store          DexStore: DuckDB-backed persistence (.dex/store.duckdb)
dataenginex.config         dex.yaml schema, loader, env-var resolution
dataenginex.core           Exceptions, Base* ABCs, BackendRegistry
dataenginex.cli            dex CLI (validate, version, init)
dataenginex.data           Connectors (CSV, Parquet, DuckDB, HTTP, …), pipeline runner, schema registry
dataenginex.ml             Classical ML: training, model registry, serving, drift
dataenginex.ai             LLM providers, agents, RAG, vector store, memory, observability
dataenginex.orchestration  Scheduler, background workers
dataenginex.middleware     structlog config, Prometheus metrics
dataenginex.lakehouse      Storage backends, catalog, partitioning
dataenginex.warehouse      Transforms, lineage tracking
dataenginex.secops         PII detection, masking, audit logging
dataenginex.api            Pydantic response models (no HTTP server bundled)
dataenginex.plugins        Entry-point plugin discovery

Want the UI?

dataenginex is the engine. The web UI lives in a separate repo:

git clone https://github.com/TheDataEngineX/dex-studio && cd dex-studio
docker compose up         # open http://localhost:7860

DEX Studio imports dataenginex directly — no separate API server.

Download files

Source Distribution

dataenginex-0.4.1.tar.gz (517.8 kB)

Uploaded Source

Built Distribution

dataenginex-0.4.1-py3-none-any.whl (204.0 kB)

Uploaded Python 3

File details

Details for the file dataenginex-0.4.1.tar.gz.

File metadata

  • Download URL: dataenginex-0.4.1.tar.gz
  • Upload date:
  • Size: 517.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for dataenginex-0.4.1.tar.gz
Algorithm    Hash digest
SHA256       d8baaaad7f8c2b73a9ecbe229fef0d18c4c1269aa141ff766ffd6b51cf1cae74
MD5          1bd02affa8766f612c6fac1aac78406a
BLAKE2b-256  6fc4c5ab7f31353e314139f8db288d5f12f9b4f3bd7f3a2e50a1a6ad5b9da9c0
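
A downloaded file can be checked against the published SHA256 digest with the standard library. This is a minimal sketch; the helper names `sha256_of` and `matches` are hypothetical, and the file path in the usage line assumes the sdist was saved to the current directory.

```python
import hashlib
import hmac


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Compare the file's SHA256 digest with the expected hex string."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())


if __name__ == "__main__":
    expected = "d8baaaad7f8c2b73a9ecbe229fef0d18c4c1269aa141ff766ffd6b51cf1cae74"
    # Assumes the source distribution is in the current directory.
    print(matches("dataenginex-0.4.1.tar.gz", expected))
```

pip can enforce the same check at install time through hash-checking mode (`--require-hashes` with a requirements file).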

Provenance

The following attestation bundles were made for dataenginex-0.4.1.tar.gz:

Publisher: release.yml on TheDataEngineX/dex

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file dataenginex-0.4.1-py3-none-any.whl.

File metadata

  • Download URL: dataenginex-0.4.1-py3-none-any.whl
  • Upload date:
  • Size: 204.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for dataenginex-0.4.1-py3-none-any.whl
Algorithm    Hash digest
SHA256       bd963b3cfa7f0f8af996270fb35d01dd325b3b75b2b2e3a7102a7dd93a25d3af
MD5          da60d6b5c0e4477263c9731f4f65dd4f
BLAKE2b-256  34d40c7aaef4a334eab55f4a2294b44ae20fa8ca372551bcba6873deca29f9eb

Provenance

The following attestation bundles were made for dataenginex-0.4.1-py3-none-any.whl:

Publisher: release.yml on TheDataEngineX/dex

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
