tgedr-dataops-ext

data operations related code - extended

motivation

dataops-ext is a library of tested, in-use code that follows common standards for code structure and quality, so that recurring data operations do not have to be reinvented. It builds on top of dataops-abs and dataops, adding distributed processing features based on pyspark.

installation

    pip install tgedr-dataops-ext
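
To confirm that the install step took effect, the standard library can report the installed version of a distribution. This is a minimal sketch, not part of the package: the helper name `installed_version` is hypothetical, and the only name taken from this document is the distribution name `tgedr-dataops-ext`.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution name as used in the pip command above.
    print(installed_version("tgedr-dataops-ext"))
```

The helper returns `None` instead of raising, so it can be used in a setup check without a try/except at the call site.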

package namespaces and their contents

commons

  • Dataset: immutable class to wrap up a dataframe along with metadata (example)
  • Metadata: immutable class depicting dataset metadata (example)
  • UtilsSpark: utility class to work with spark, mostly helping on creating a session (example)
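
The idea of an immutable dataset wrapper with attached metadata can be illustrated with frozen dataclasses. This is only a sketch of the pattern under assumed names: `SketchMetadata`, `SketchDataset`, their fields, and `is_frozen` are hypothetical and do not reflect the actual `Dataset` and `Metadata` classes, whose signatures are not described here.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class SketchMetadata:
    """Hypothetical stand-in for dataset metadata (not the library's Metadata)."""

    name: str
    version: Optional[str] = None


@dataclass(frozen=True)
class SketchDataset:
    """Hypothetical stand-in pairing a dataframe-like object with its metadata."""

    metadata: SketchMetadata
    data: Any  # e.g. a pyspark or pandas dataframe; any object works in this sketch


def is_frozen(obj: Any, attr: str, value: Any) -> bool:
    """Return True if assigning to the attribute is rejected by the frozen dataclass."""
    try:
        setattr(obj, attr, value)
    except FrozenInstanceError:
        return True
    return False


if __name__ == "__main__":
    ds = SketchDataset(SketchMetadata(name="orders", version="1"), data=[1, 2, 3])
    print(is_frozen(ds, "data", []))
```

A frozen dataclass blocks rebinding of its fields; it does not stop mutation of the wrapped object itself, so a truly immutable wrapper also needs the held data to be treated as read-only.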

quality

  • PysparkValidation: GreatExpectationsValidation implementation to validate pyspark dataframes with the Great Expectations library (example)

source

  • DeltaTableSource: abstract Source class used to read delta lake format datasets, returning a pandas dataframe (example)
  • LocalDeltaTable: Source class used to read delta lake format datasets from the local filesystem with python only (pyspark not needed), returning a pandas dataframe (example)
  • S3DeltaTable: Source class used to read delta lake format datasets from an S3 bucket with python only (pyspark not needed), returning a pandas dataframe (example)
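
The split between an abstract source and storage-specific subclasses can be sketched as a base class that owns the shared read flow and defers only the location logic. Everything here is a hypothetical illustration: the class names, the `_table_uri` and `describe` methods, and the constructor arguments are assumptions, not the API of `DeltaTableSource`, `LocalDeltaTable`, or `S3DeltaTable`. No table is actually read.

```python
from abc import ABC, abstractmethod


class SketchTableSource(ABC):
    """Hypothetical base class: shared flow here, storage details in subclasses."""

    @abstractmethod
    def _table_uri(self, key: str) -> str:
        """Translate a logical key into a storage-specific table location."""

    def describe(self, key: str) -> str:
        # A real source would load the table and return a dataframe at this point;
        # this sketch only reports the location it would read from.
        return f"would read delta table at {self._table_uri(key)}"


class SketchLocalSource(SketchTableSource):
    """Hypothetical local-filesystem variant."""

    def __init__(self, root: str):
        self._root = root.rstrip("/")

    def _table_uri(self, key: str) -> str:
        return f"{self._root}/{key}"


class SketchS3Source(SketchTableSource):
    """Hypothetical S3 variant."""

    def __init__(self, bucket: str):
        self._bucket = bucket

    def _table_uri(self, key: str) -> str:
        return f"s3://{self._bucket}/{key}"


if __name__ == "__main__":
    print(SketchLocalSource("/tmp/lake/").describe("sales"))
    print(SketchS3Source("my-bucket").describe("sales"))
```

With this layout, callers depend only on the base class, and adding another storage backend means adding one subclass.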

store

  • SparkDeltaStore: Store implementation for pyspark distributed processing with delta table format (example)

development

  • main requirements:

    • uv
    • bash
  • clone the repository:

    git clone git@github.com:tgedr/dataops-ext
    
  • cd into the folder: cd dataops-ext

  • install requirements: ./helper.sh reqs
