DataEngineX - Core framework for data engineering projects
dataenginex
dataenginex is the core DataEngineX framework package for building observable, production-ready data and API services.
It provides:
- FastAPI application primitives and API extensions
- Middleware for structured logging, metrics, and tracing
- Data quality and validation utilities
- Lakehouse and warehouse building blocks (S3, GCS, BigQuery, Parquet)
- Reusable ML support modules for model-serving workflows
Install
# Core (no web framework dependencies)
pip install dataenginex
# With FastAPI, middleware, auth, health checks
pip install dataenginex[api]
# With cloud storage backends
pip install dataenginex[s3] # AWS S3 via boto3
pip install dataenginex[gcs] # Google Cloud Storage
pip install dataenginex[bq] # Google BigQuery
pip install dataenginex[cloud] # All cloud storage (S3 + GCS)
# Everything
pip install dataenginex[all]
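To see which extras are usable in the current environment, one option is to probe for the third-party modules they pull in. The sketch below uses only the standard library. The mapping from extra to module name is an assumption: only boto3 is named in the install notes above, and `fastapi`, `google.cloud.storage`, and `google.cloud.bigquery` are guesses based on the extras' descriptions. The helper names are hypothetical and not part of dataenginex.

```python
from importlib.util import find_spec

# Assumed mapping from extra name to a module that extra is expected to install.
# Only boto3 is confirmed by the install notes; the rest are illustrative guesses.
EXTRA_MODULES = {
    "api": "fastapi",
    "s3": "boto3",
    "gcs": "google.cloud.storage",
    "bq": "google.cloud.bigquery",
}


def is_importable(module_name: str) -> bool:
    """Return True if the module can be located without executing it."""
    try:
        return find_spec(module_name) is not None
    except (ImportError, ValueError):
        # A missing parent package (for example "google") raises
        # ModuleNotFoundError, which is a subclass of ImportError.
        return False


def available_extras(mapping=EXTRA_MODULES) -> dict:
    """Map each extra name to whether its probe module is importable."""
    return {extra: is_importable(module) for extra, module in mapping.items()}


if __name__ == "__main__":
    for extra, ok in available_extras().items():
        print(f"[{extra}] {'available' if ok else 'missing'}")
```

A check like this can run at application startup to fail early with a clear message instead of hitting an import error deep inside a request handler.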
Package Scope
dataenginex is the core library from the DEX monorepo. It is the only published package — applications and examples are built on top of it.
Submodules
| Module | Requires Extra | Description |
|---|---|---|
| dataenginex.core | none | Medallion architecture, schemas, quality gates, validators |
| dataenginex.data | none | Schema registry, data contracts, catalog |
| dataenginex.lakehouse | optional [s3] [gcs] [bq] | Storage backends (JSON, Parquet, S3, GCS, BigQuery), catalog, partitioning |
| dataenginex.warehouse | none | Warehouse layers, lineage tracking |
| dataenginex.ml | none | Model registry, vectorstore, LLM adapters, drift detection |
| dataenginex.api | [api] | Auth (JWT), health checks, error handling, pagination, rate limiting |
| dataenginex.middleware | [api] | Structured logging, Prometheus metrics, OpenTelemetry tracing |
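For readers new to the terms in the dataenginex.core row, the following is a conceptual sketch of a quality gate between two medallion layers: records are promoted from a raw "bronze" layer to a cleaned "silver" layer only if enough of them are complete. It does not use or mirror the dataenginex API. `GateResult`, `completeness_gate`, the field names, and the threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GateResult:
    passed: bool
    completeness: float


def completeness_gate(records, required_fields, threshold=0.95):
    """Pass when the share of records with every required field set meets the threshold."""
    if not records:
        return GateResult(passed=False, completeness=0.0)
    complete = sum(
        all(rec.get(field) is not None for field in required_fields)
        for rec in records
    )
    ratio = complete / len(records)
    return GateResult(passed=ratio >= threshold, completeness=ratio)


# Hypothetical bronze-layer records; one row is missing its email.
bronze = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
    {"id": 3, "email": "c@example.com"},
    {"id": 4, "email": "d@example.com"},
]
required = ["id", "email"]

result = completeness_gate(bronze, required, threshold=0.75)

# Promote only complete rows to silver, and only if the gate passed.
silver = (
    [rec for rec in bronze if all(rec.get(f) is not None for f in required)]
    if result.passed
    else []
)
```

The gate decides on the batch as a whole, while the filter decides row by row. Keeping the two apart makes it possible to reject a badly degraded batch outright instead of silently promoting a small remainder.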
Quick Usage
# Core — always available
from dataenginex.core import MedallionArchitecture, QualityGate
from dataenginex.data import SchemaRegistry
from dataenginex.ml import ModelRegistry
# API — requires pip install dataenginex[api]
from dataenginex.api import HealthChecker, AuthMiddleware, paginate
from dataenginex.middleware import configure_logging, configure_tracing
# Storage — requires the relevant extra
from dataenginex.lakehouse import JsonStorage, get_storage
storage = get_storage("file://./data") # always works
storage = get_storage("s3://my-bucket") # requires [s3]
storage = get_storage("gs://my-bucket") # requires [gcs]
storage = get_storage("bq://my-project/ds") # requires [bq]
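The `get_storage` calls above select a backend from the URI scheme. How dataenginex does this internally is not shown here, so the following is only a minimal sketch of scheme-based dispatch using the standard library. `resolve_backend`, the `BACKENDS` table, and the backend labels are hypothetical. The scheme-to-extra pairing follows the comments in the usage example.

```python
from typing import Optional, Tuple
from urllib.parse import urlparse

# Hypothetical table: URI scheme -> (backend label, required extra or None).
BACKENDS = {
    "file": ("local", None),
    "s3": ("s3", "s3"),
    "gs": ("gcs", "gcs"),
    "bq": ("bigquery", "bq"),
}


def resolve_backend(uri: str) -> Tuple[str, Optional[str], str]:
    """Split a storage URI into (backend, required_extra, location)."""
    parsed = urlparse(uri)
    if parsed.scheme not in BACKENDS:
        raise ValueError(f"unsupported storage scheme: {parsed.scheme!r}")
    backend, extra = BACKENDS[parsed.scheme]
    # netloc holds the bucket/project (or "." for a relative file path),
    # path holds whatever follows it.
    location = parsed.netloc + parsed.path
    return backend, extra, location


if __name__ == "__main__":
    for uri in ("file://./data", "s3://my-bucket", "gs://my-bucket", "bq://my-project/ds"):
        print(uri, "->", resolve_backend(uri))
```

Returning the required extra alongside the backend lets a caller produce a targeted hint such as "install dataenginex[s3]" when the matching dependency is missing.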
Source and Docs
- Repository: https://github.com/TheDataEngineX/DEX
- CI/CD guide: docs/CI_CD.md
- Release notes: src/dataenginex/RELEASE_NOTES.md