data operations related code - extended
tgedr-dataops-ext
Data operations library — extended.
motivation
tgedr-dataops-ext builds on top of tgedr-dataops (the abstract contracts layer) and provides concrete, tested implementations for distributed data processing with PySpark and Delta Lake. It covers session management, ETL pipelines, Delta table storage, data validation, and Databricks job integration, all following consistent code quality and structural standards.
installation
pip install tgedr-dataops-ext
package contents
commons
| Class | Description | Example |
|---|---|---|
| Dataset | Immutable wrapper pairing a Spark DataFrame with its Metadata | test |
| Metadata | Immutable dataclass describing a dataset (name, version, framing, sources) | test |
| UtilsSpark | Utility class for creating and configuring Spark sessions (local, AWS Glue, or active session) and building PySpark schemas from type dictionaries | test |
| UtilsDatabricks | Utility class for retrieving the Databricks dbutils object from the active Spark session | test |
| EtlDatabricks | Abstract intermediate ETL class extending Etl with Databricks job integration: captures run_id, publishes outputs via dbutils.jobs.taskValues, and provides the inject_configuration decorator for auto-wiring method parameters from configuration or upstream task values | test |
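
To illustrate what "auto-wiring method parameters from configuration" can look like, here is a minimal sketch in plain Python. It is not the library's implementation: the `configuration` attribute, the `ExampleEtl` class, and its `extract` method are hypothetical names chosen for the example. The lookup of upstream Databricks task values is left out, so only the configuration half of the idea is shown.

```python
import functools
import inspect
from typing import Any, Callable


def inject_configuration(func: Callable) -> Callable:
    """Fill arguments the caller did not pass from ``self.configuration``.

    Hypothetical sketch: the decorator shipped with the library may differ,
    for example by also consulting upstream task values.
    """
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(self, *args: Any, **kwargs: Any) -> Any:
        # Work out which parameters the caller already supplied.
        bound = signature.bind_partial(self, *args, **kwargs)
        for name, param in signature.parameters.items():
            if name in bound.arguments:
                continue  # explicitly passed arguments always win
            if param.kind in (param.VAR_POSITIONAL, param.VAR_KEYWORD):
                continue  # *args / **kwargs are never auto-wired
            if name in self.configuration:
                kwargs[name] = self.configuration[name]
        return func(self, *args, **kwargs)

    return wrapper


class ExampleEtl:
    """Hypothetical stand-in for an ETL class holding a configuration dict."""

    def __init__(self, configuration: dict) -> None:
        self.configuration = dict(configuration)

    @inject_configuration
    def extract(self, source_path: str, limit: int = 10) -> str:
        return f"{source_path}:{limit}"
```

Calling `ExampleEtl({"source_path": "/data/in"}).extract()` resolves `source_path` from the configuration and keeps the default for `limit`. Any argument passed explicitly takes precedence over the configured value.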
quality
| Class | Description | Example |
|---|---|---|
| PysparkValidation | GreatExpectationsValidation implementation for validating PySpark DataFrames using the Great Expectations library | test |
source
| Class | Description | Example |
|---|---|---|
| DeltaTableSource | Abstract Source base class for reading Delta Lake datasets, returning a pandas DataFrame | test |
| LocalDeltaTable | Concrete Source reading Delta Lake datasets from the local filesystem using pure Python (no PySpark required) | test |
| S3DeltaTable | Concrete Source reading Delta Lake datasets from S3 using pure Python (no PySpark required) | test |
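
Reading a Delta table into pandas without Spark can be sketched with the `deltalake` package (the delta-rs Python bindings). This is an assumption about the general technique, not a description of how the classes above are implemented. The function name `read_delta_as_pandas` and its parameters are hypothetical.

```python
from typing import Any, Optional


def read_delta_as_pandas(
    table_uri: str,
    version: Optional[int] = None,
    storage_options: Optional[dict] = None,
    columns: Optional[list] = None,
) -> Any:
    """Load a Delta table into a pandas DataFrame without starting Spark.

    Hypothetical helper: assumes the ``deltalake`` and ``pandas`` packages
    are installed; it does not use tgedr-dataops-ext itself.
    """
    # Imported lazily so that defining the helper does not require deltalake.
    from deltalake import DeltaTable

    # ``version=None`` loads the latest table version.
    table = DeltaTable(table_uri, version=version, storage_options=storage_options)
    return table.to_pandas(columns=columns)
```

With this sketch, a local table would be read by passing a filesystem path as `table_uri`. An S3 table would be read by passing an `s3://` URI, with credentials or region supplied through `storage_options` as needed.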
store
| Class | Description | Example |
|---|---|---|
| SparkDeltaStore | Store implementation for PySpark distributed processing with Delta Lake format. Supports versioned reads, append/overwrite writes, upserts, partitioning, schema evolution, retention policies, metadata management, and column comments | test |
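
As an illustration of the upsert capability mentioned above, the following sketch uses the Delta Lake merge API from the `delta-spark` package directly. It does not show the SparkDeltaStore interface, whose method names are not documented here. The helpers `merge_condition` and `upsert` are hypothetical.

```python
from typing import Any, Sequence


def merge_condition(
    keys: Sequence[str], target: str = "target", source: str = "source"
) -> str:
    """Build the join condition that matches rows on the given key columns."""
    if not keys:
        raise ValueError("at least one key column is required")
    return " AND ".join(f"{target}.{key} = {source}.{key}" for key in keys)


def upsert(spark: Any, updates: Any, path: str, keys: Sequence[str]) -> None:
    """Merge ``updates`` (a Spark DataFrame) into the Delta table at ``path``.

    Hypothetical helper: assumes ``delta-spark`` is installed and that
    ``spark`` is a session configured with the Delta Lake extensions.
    """
    # Imported lazily so that defining the helper does not require delta-spark.
    from delta.tables import DeltaTable

    (
        DeltaTable.forPath(spark, path)
        .alias("target")
        .merge(updates.alias("source"), merge_condition(keys))
        .whenMatchedUpdateAll()      # overwrite rows whose keys already exist
        .whenNotMatchedInsertAll()   # insert rows with new keys
        .execute()
    )
```

Rows in `updates` whose key columns match an existing row replace it, and all other rows are appended.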
development
Requirements:
- uv
# clone
git clone git@github.com:tgedr/dataops-ext
cd dataops-ext
# install dependencies
./helper.sh reqs
# run tests
./helper.sh test