tgedr-dataops-ext

data operations related code - extended

motivation

dataops-ext is a library of tested, in-use code that follows shared standards for code structure and quality, so that common data operations do not have to be reinvented. It builds on dataops-abs and dataops, adding distributed processing features based on pyspark.

installation

    pip install tgedr-dataops-ext

package namespaces and their contents

commons

  • Dataset: immutable class that wraps a dataframe along with its metadata
  • Metadata: immutable class describing dataset metadata
  • UtilsSpark: utility class for working with spark, mostly for creating a session
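
As a rough illustration of what an immutable wrapper around a dataframe and its metadata can look like, here is a minimal sketch using frozen dataclasses. The names `MetadataSketch` and `DatasetSketch` and all their fields are hypothetical; this is not the library's actual `Dataset` or `Metadata` API, and a plain list stands in for the dataframe.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from typing import Any, Optional, Tuple


@dataclass(frozen=True)
class MetadataSketch:
    """Hypothetical metadata record; the field names are assumptions."""

    name: str
    version: Optional[str] = None
    fields: Tuple[str, ...] = ()


@dataclass(frozen=True)
class DatasetSketch:
    """Hypothetical wrapper pairing data (e.g. a dataframe) with its metadata."""

    metadata: MetadataSketch
    data: Any


meta = MetadataSketch(name="sales", version="1", fields=("id", "amount"))
ds = DatasetSketch(metadata=meta, data=[{"id": 1, "amount": 9.5}])

# Frozen dataclasses reject attribute assignment after construction.
try:
    ds.metadata = MetadataSketch(name="other")
    mutated = True
except FrozenInstanceError:
    mutated = False

# "Changing" a value means building a new instance; the original is untouched.
ds_v2 = replace(ds, metadata=replace(meta, version="2"))
```

Note that `frozen=True` only prevents rebinding attributes; the wrapped data object itself can still be mutated through its own methods.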

quality

  • PysparkValidation: GreatExpectationsValidation implementation that validates pyspark dataframes with the Great Expectations library

source

  • DeltaTableSource: abstract Source class used to read delta lake format datasets, returning a pandas dataframe
  • LocalDeltaTable: Source class used to read delta lake format datasets from the local filesystem with python only (pyspark not needed), returning a pandas dataframe
  • S3DeltaTable: Source class used to read delta lake format datasets from an s3 bucket with python only (pyspark not needed), returning a pandas dataframe
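
Reading a delta table into pandas without pyspark can be sketched roughly as follows. This assumes the third-party `deltalake` package, which is one common way to do it; whether the Source classes above use it internally is not stated here. The function names `s3_storage_options` and `read_delta_as_pandas` are hypothetical helpers, not part of this library.

```python
from typing import Any, Dict, Optional, Sequence


def s3_storage_options(
    region: str,
    access_key_id: Optional[str] = None,
    secret_access_key: Optional[str] = None,
) -> Dict[str, str]:
    """Build a storage options mapping for an s3-hosted table (hypothetical helper)."""
    options = {"AWS_REGION": region}
    # Explicit credentials are optional; omit them to rely on the environment.
    if access_key_id and secret_access_key:
        options["AWS_ACCESS_KEY_ID"] = access_key_id
        options["AWS_SECRET_ACCESS_KEY"] = secret_access_key
    return options


def read_delta_as_pandas(
    table_uri: str,
    columns: Optional[Sequence[str]] = None,
    storage_options: Optional[Dict[str, str]] = None,
) -> Any:
    """Load a delta table into a pandas dataframe, without pyspark."""
    # Imported lazily so this module loads even where deltalake is not installed.
    from deltalake import DeltaTable

    table = DeltaTable(table_uri, storage_options=storage_options)
    return table.to_pandas(columns=list(columns) if columns else None)


# Usage (paths and bucket names are placeholders):
#   local_df = read_delta_as_pandas("/tmp/my_table")
#   s3_df = read_delta_as_pandas(
#       "s3://my-bucket/my_table",
#       storage_options=s3_storage_options("eu-west-1"),
#   )
```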

store

  • SparkDeltaStore: Store implementation for pyspark distributed processing with the delta table format
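
For orientation, writing a pyspark dataframe in delta format with the standard `DataFrameWriter` API looks roughly like the sketch below. The helper `save_as_delta` is hypothetical and is not the `SparkDeltaStore` interface; it also assumes a spark session already configured for delta lake (for instance via the `delta-spark` package).

```python
from typing import Any, Optional, Sequence


def save_as_delta(
    df: Any,
    path: str,
    mode: str = "append",
    partition_by: Optional[Sequence[str]] = None,
) -> None:
    """Write a pyspark dataframe to `path` in delta format (hypothetical helper)."""
    # df.write returns a DataFrameWriter; each call below returns the writer again.
    writer = df.write.format("delta").mode(mode)
    if partition_by:
        writer = writer.partitionBy(*partition_by)
    writer.save(path)


# Usage, given an existing delta-enabled SparkSession named `spark`:
#   df = spark.createDataFrame([(1, "2024-01-01")], ["id", "day"])
#   save_as_delta(df, "/tmp/events", mode="overwrite", partition_by=["day"])
```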

development

  • main requirements:

    • uv
    • bash
  • Clone the repository like this:

    git clone git@github.com:tgedr/dataops-ext
    
  • cd into the folder: cd dataops-ext

  • install requirements: ./helper.sh reqs
