data operations related code

data-ops

motivation

data-ops is a library of tested, in-use code that follows shared standards for code structure and quality, so that common data operations do not have to be reinvented. It builds on top of dataops-abs.

installation

    `pip install tgedr-dataops`
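
After installing, it can be useful to confirm which version ended up in the environment. The following is a minimal sketch using only the Python standard library; the helper name `installed_version` is hypothetical and not part of data-ops.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str = "tgedr-dataops") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


if __name__ == "__main__":
    print(installed_version() or "tgedr-dataops is not installed")
```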

package namespaces and their contents

commons

  • S3Connector: base class to be extended, providing a connection session with aws s3 resources
  • utils_fs: utility module with file system related functions (example)

quality

  • PandasValidation: GreatExpectationsValidation implementation to validate pandas dataframes with the Great Expectations library (example)

sink

  • LocalFsFileSink: Sink implementation class used to save/persist an object/file to a local fs location (example)
  • S3FileSink: Sink implementation class used to save/persist a local object/file to an s3 bucket (example)

source

  • AbstractS3FileSource: abstract Source class used to retrieve objects/files from an s3 bucket to a local fs location, working around download limitations of some formats
  • LocalFsFileSource: Source implementation class used to retrieve local objects/files to another local fs location (example)
  • PdDfS3Source: Source implementation class used to read a pandas dataframe from s3, from either a csv or an excel (xlsx) file (example csv, example excel)
  • S3FileCopy: Source implementation class used to copy objects/files from an s3 bucket to another s3 bucket (example)
  • S3FileExtendedSource: Source implementation class used to retrieve objects/files from an s3 bucket to a local fs location, with the extra method get_metadata providing file metadata ("LastModified", "ContentLength", "ETag", "VersionId", "ContentType") (example)
  • S3FileSource: Source implementation class used to retrieve objects/files from s3 bucket to local fs location (example)
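
The metadata fields listed for S3FileExtendedSource match keys found in an S3 HeadObject response. The sketch below shows one way such fields could be collected; it is not the library's implementation. The function `get_metadata` and its signature are assumptions made for illustration, and `client` is assumed to behave like a boto3 S3 client, which exposes `head_object(Bucket=..., Key=...)`.

```python
from typing import Any, Dict

# Field names taken from the S3FileExtendedSource description.
METADATA_KEYS = ("LastModified", "ContentLength", "ETag", "VersionId", "ContentType")


def get_metadata(client: Any, bucket: str, key: str) -> Dict[str, Any]:
    """Hypothetical helper: pick selected fields from a HeadObject response.

    `client` is assumed to expose head_object(Bucket=..., Key=...) and to
    return a dict, as a boto3 S3 client does.
    """
    response = client.head_object(Bucket=bucket, Key=key)
    # VersionId is only returned for versioned objects, so absent keys are skipped.
    return {name: response[name] for name in METADATA_KEYS if name in response}
```

With boto3 installed and credentials configured, passing `boto3.client("s3")` as `client` would query a real bucket; any object with a compatible `head_object` method works for local testing.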

store

  • FsSinglePartitionParquetStore: abstract Store implementation defining persistence on parquet files with an optional single partition, regardless of the location it persists to
  • LocalFsSinglePartitionParquetStore : FsSinglePartitionParquetStore implementation using local file system (example)
  • S3FsSinglePartitionParquetStore : FsSinglePartitionParquetStore implementation using aws s3 file system (example)
  • ParquetStore : Store implementation class for interacting with Parquet files using a filesystem interface (example)

known issues/further development

development

  • main requirements:

    • uv
    • bash
  • Clone the repository like this:

    git clone git@github.com:tgedr/dataops
    
  • cd into the folder: cd dataops

  • install requirements: ./helper.sh reqs
