tgedr-dataops-ext
data operations related code - extended
motivation
dataops-ext is a library of tested, in-use code that follows shared standards for code structure and quality, so that common functionality does not have to be reinvented. It builds on dataops-abs and dataops, adding distributed processing features based on pyspark.
installation
`pip install tgedr-dataops-ext`
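One way to confirm which version ended up in the environment after installing is to query the distribution metadata. This is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper, not something tgedr-dataops-ext provides.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints the version string, or None when the package is not installed.
    print(installed_version("tgedr-dataops-ext"))
```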
package namespaces and their contents
commons
- Dataset: immutable class to wrap up a dataframe along with metadata (example)
- Metadata: immutable class depicting dataset metadata (example)
- UtilsSpark: utility class to work with spark, mostly helping on creating a session (example)
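The idea of an immutable wrapper pairing a dataframe with its metadata can be sketched with frozen dataclasses. The class names `MetadataSketch` and `DatasetSketch` and all of their fields are assumptions made for illustration; the actual `Dataset` and `Metadata` classes in this package may be shaped differently.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class MetadataSketch:
    """Hypothetical stand-in for a metadata class; the fields are assumptions."""

    name: str
    version: str
    columns: Tuple[str, ...] = ()


@dataclass(frozen=True)
class DatasetSketch:
    """Hypothetical stand-in: pairs a dataframe-like object with its metadata."""

    metadata: MetadataSketch
    data: Any  # e.g. a pandas or pyspark DataFrame


meta = MetadataSketch(name="sales", version="1", columns=("id", "amount"))
dataset = DatasetSketch(metadata=meta, data=[{"id": 1, "amount": 9.5}])

try:
    # frozen=True makes attribute reassignment raise FrozenInstanceError.
    dataset.metadata = MetadataSketch(name="other", version="2")
except FrozenInstanceError:
    print("attributes cannot be reassigned")
```

Note that `frozen=True` only blocks rebinding of attributes; it does not stop in-place mutation of the wrapped dataframe itself.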
quality
- PysparkValidation: GreatExpectationsValidation implementation to validate pyspark dataframes with the Great Expectations library (example)
source
- DeltaTableSource: abstract Source class used to read delta lake format datasets, returning a pandas dataframe (example)
- LocalDeltaTable: Source class used to read delta lake format datasets from local fs with python only, pyspark not needed, returning a pandas dataframe (example)
- S3DeltaTable: Source class used to read delta lake format datasets from s3 bucket with python only, pyspark not needed, returning a pandas dataframe (example)
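The split between an abstract delta source and its local and S3 variants could look roughly like the sketch below. It assumes the `deltalake` package (the delta-rs Python bindings) for pyspark-free reads; this README does not say which library the package uses internally, and every class and method name here is hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional


class DeltaSourceSketch(ABC):
    """Hypothetical base class: subclasses only decide where the table lives."""

    @abstractmethod
    def storage_options(self) -> Optional[Dict[str, str]]:
        """Backend-specific options, or None for the local filesystem."""

    def get(self, table_uri: str) -> Any:
        # Imported lazily so the classes can be defined without deltalake installed.
        from deltalake import DeltaTable

        table = DeltaTable(table_uri, storage_options=self.storage_options())
        return table.to_pandas()  # pandas DataFrame, no pyspark involved


class LocalDeltaSketch(DeltaSourceSketch):
    def storage_options(self) -> Optional[Dict[str, str]]:
        return None


class S3DeltaSketch(DeltaSourceSketch):
    def __init__(self, region: str) -> None:
        self._region = region

    def storage_options(self) -> Optional[Dict[str, str]]:
        # Assumption: credentials are picked up from the environment.
        return {"AWS_REGION": self._region}
```

With this shape, reading would be `LocalDeltaSketch().get("/path/to/table")` or `S3DeltaSketch("eu-west-1").get("s3://bucket/table")`, where both paths are placeholders.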
store
- SparkDeltaStore: Store implementation for pyspark distributed processing with delta table format (example)
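For orientation, writing a pyspark dataframe in delta format generally needs a session configured for Delta Lake plus a `format("delta")` write. The sketch below assumes the `pyspark` and `delta-spark` packages; the function names are hypothetical and do not represent the `SparkDeltaStore` or `UtilsSpark` API.

```python
from typing import Any, Dict


def delta_session_conf() -> Dict[str, str]:
    """Spark settings that enable Delta Lake SQL support."""
    return {
        "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
        "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
    }


def build_session(app_name: str = "delta-sketch") -> Any:
    # Imported lazily; requires pyspark and the delta-spark package.
    from delta import configure_spark_with_delta_pip
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in delta_session_conf().items():
        builder = builder.config(key, value)
    return configure_spark_with_delta_pip(builder).getOrCreate()


def save_as_delta(df: Any, path: str, mode: str = "append") -> None:
    """Write a pyspark DataFrame to `path` in delta format."""
    df.write.format("delta").mode(mode).save(path)
```

Keeping the configuration in a plain dictionary makes it easy to inspect or override before a session is created.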
development
- main requirements:
  - uv
  - bash
- Clone the repository like this: `git clone git@github.com:tgedr/dataops-ext`
- cd into the folder: `cd dataops-ext`
- install requirements: `./helper.sh reqs`