data-ops

data operations related code

motivation

data-ops is a library of tested, in-use code that follows shared standards for code structure and quality, so common data operations do not have to be reinvented. It builds on top of dataops-abs.

installation

    `pip install tgedr-dataops`
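
To confirm the install worked, a minimal sketch using only the standard library might look like the following. It looks up the distribution name `tgedr-dataops` taken from the pip command above; the helper name `installed_version` is hypothetical and not part of the package.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "tgedr-dataops" is the distribution name used in the pip command above
    print(installed_version("tgedr-dataops"))
```

A `None` result means pip did not install the distribution into the interpreter that is running the check.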

package namespaces and their contents

commons

  • S3Connector: base class to be extended, providing a connection session with aws s3 resources
  • utils_fs: utility module with file system related functions (example)

quality

  • PandasValidation : GreatExpectationsValidation implementation to validate pandas dataframes with the Great Expectations library (example)

sink

  • LocalFsFileSink: Sink implementation class used to save/persist an object/file to a local fs location (example)
  • S3FileSink: Sink implementation class used to save/persist a local object/file to an s3 bucket (example)
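
To illustrate the idea of a sink that persists a file to a local fs location, here is a minimal standard-library sketch. The class `LocalFileSinkSketch` and its `put` method are hypothetical names chosen for illustration; they are not the signatures of `LocalFsFileSink`, so refer to the linked examples for the library's own usage.

```python
import shutil
from pathlib import Path


class LocalFileSinkSketch:
    """Hypothetical stand-in for a local file sink; not the library's class."""

    def put(self, source: str, target: str) -> Path:
        src = Path(source)
        if not src.is_file():
            raise FileNotFoundError(f"source file not found: {src}")
        dst = Path(target)
        # if the target is an existing directory, keep the source file name
        if dst.is_dir():
            dst = dst / src.name
        # create any missing parent folders before copying
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        return dst
```

Returning the final path lets the caller see where the file landed when only a directory was given as the target.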

source

  • AbstractS3FileSource: abstract Source class used to retrieve objects/files from an s3 bucket to a local fs location, working around download limitations that affect some formats
  • LocalFsFileSource: Source implementation class used to retrieve local objects/files to another local fs location (example)
  • PdDfS3Source: Source implementation class used to read a pandas dataframe from s3, from either a csv or an excel (xlsx) file (example csv, example excel)
  • S3FileCopy: Source implementation class used to copy objects/files from an s3 bucket to another s3 bucket (example)
  • S3FileExtendedSource: Source implementation class used to retrieve objects/files from s3 bucket to local fs location with the extra method get_metadata providing file metadata ("LastModified", "ContentLength", "ETag", "VersionId", "ContentType") (example)
  • S3FileSource: Source implementation class used to retrieve objects/files from s3 bucket to local fs location (example)
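
The metadata fields listed for `S3FileExtendedSource` match keys found in an S3 HeadObject response. As a rough sketch of that selection, assuming (it is not confirmed by this README) that the metadata comes from such a response, the hypothetical helper `pick_metadata` below filters a response dict down to those keys. A hand-written sample dict stands in for a real response so the snippet runs without AWS access.

```python
from datetime import datetime, timezone

# keys named in the S3FileExtendedSource description above
METADATA_KEYS = ("LastModified", "ContentLength", "ETag", "VersionId", "ContentType")


def pick_metadata(head_response: dict) -> dict:
    """Keep only the listed keys; keys absent from the response are skipped."""
    return {k: head_response[k] for k in METADATA_KEYS if k in head_response}


# stand-in for a HeadObject response; with boto3 this would come from
# boto3.client("s3").head_object(Bucket=..., Key=...)
sample = {
    "LastModified": datetime(2024, 1, 1, tzinfo=timezone.utc),
    "ContentLength": 17,
    "ETag": '"abc123"',
    "ContentType": "text/csv",
    "ResponseMetadata": {"HTTPStatusCode": 200},
}

print(pick_metadata(sample))
```

The sample omits `VersionId` on purpose, since that key may be missing for objects in buckets without versioning, and the helper tolerates its absence.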

store

  • FsSinglePartitionParquetStore : abstract Store implementation that persists data to parquet files with an optional single partition, independent of the storage location
  • LocalFsSinglePartitionParquetStore : FsSinglePartitionParquetStore implementation using local file system (example)
  • S3FsSinglePartitionParquetStore : FsSinglePartitionParquetStore implementation using aws s3 file system (example)
  • ParquetStore : Store implementation class for interacting with Parquet files using a filesystem interface (example)

known issues/further development

development

  • main requirements:

    • uv
    • bash
  • Clone the repository like this:

    git clone git@github.com:tgedr/dataops
    
  • cd into the folder: cd dataops

  • install requirements: ./helper.sh reqs
