data operations related code

data-ops

motivation

data-ops is a library of tested, in-use code that follows shared standards for code structure and quality, so that common data operations do not have to be reinvented. It builds on top of dataops-abs.

installation

    `pip install tgedr-dataops`
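
To confirm that the install succeeded, the standard library can report the installed distribution version. This is a minimal sketch: `installed_version` is a hypothetical helper name, not part of the package, and it only assumes the distribution name `tgedr-dataops` used in the command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


if __name__ == "__main__":
    # Prints a version string when the package is installed, otherwise None.
    print(installed_version("tgedr-dataops"))
```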

package namespaces and their contents

commons

  • S3Connector: base class to be extended, providing a connection session with aws s3 resources
  • utils_fs: utility module with file system related functions (example)

quality

  • PandasValidation: GreatExpectationsValidation implementation to validate pandas dataframes with the Great Expectations library (example)

sink

  • LocalFsFileSink: Sink implementation class used to save/persist an object/file to a local fs location (example)
  • S3FileSink: Sink implementation class used to save/persist a local object/file to an s3 bucket (example)
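
The idea behind a local file sink, persisting a file to a target location on the local file system, can be sketched with the standard library alone. This is an illustration of the concept only: `put_local_file` and its parameters are hypothetical names and do not reflect the actual `LocalFsFileSink` interface, which is documented in the linked example.

```python
import shutil
from pathlib import Path


def put_local_file(source: str, target: str) -> Path:
    """Copy a file to a local target path, creating parent folders as needed."""
    target_path = Path(target)
    # Make sure the destination folder exists before copying.
    target_path.parent.mkdir(parents=True, exist_ok=True)
    # copy2 also preserves file metadata such as modification time.
    shutil.copy2(source, target_path)
    return target_path
```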

source

  • AbstractS3FileSource: abstract Source class used to retrieve objects/files from an s3 bucket to a local fs location, working around download limitations that affect some formats
  • LocalFsFileSource: Source implementation class used to retrieve local objects/files to another local fs location (example)
  • PdDfS3Source: Source implementation class used to read a pandas dataframe from s3, from either a csv or an excel (xlsx) file (example csv, example excel)
  • S3FileCopy: Source implementation class used to copy objects/files from an s3 bucket to another s3 bucket (example)
  • S3FileExtendedSource: Source implementation class used to retrieve objects/files from an s3 bucket to a local fs location, with the extra method get_metadata providing file metadata ("LastModified", "ContentLength", "ETag", "VersionId", "ContentType") (example)
  • S3FileSource: Source implementation class used to retrieve objects/files from s3 bucket to local fs location (example)
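
The five metadata fields listed for `S3FileExtendedSource` carry the same names as fields in an S3 HeadObject response. The sketch below shows how such a subset could be selected from a response-shaped dictionary. It is an assumption that the library works this way; `pick_metadata`, `METADATA_KEYS`, and the sample dictionary are hypothetical and hand-written for illustration, and no call to AWS is made.

```python
from datetime import datetime, timezone
from typing import Any, Dict

# Field names taken from the S3FileExtendedSource description above.
METADATA_KEYS = ("LastModified", "ContentLength", "ETag", "VersionId", "ContentType")


def pick_metadata(head_response: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the listed metadata fields; missing fields map to None."""
    return {key: head_response.get(key) for key in METADATA_KEYS}


# Hand-written stand-in for a HeadObject-style response (not real data).
sample_response = {
    "LastModified": datetime(2024, 1, 1, tzinfo=timezone.utc),
    "ContentLength": 1024,
    "ETag": '"abc123"',
    "ContentType": "text/csv",
    "ResponseMetadata": {"HTTPStatusCode": 200},
}

metadata = pick_metadata(sample_response)
```

In this sketch `VersionId` comes back as `None` because the sample has no such field, which mirrors a bucket without versioning.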

store

  • FsSinglePartitionParquetStore: abstract Store implementation defining persistence on parquet files with an optional single partition, regardless of the location it should persist
  • LocalFsSinglePartitionParquetStore: FsSinglePartitionParquetStore implementation using local file system (example)
  • S3FsSinglePartitionParquetStore: FsSinglePartitionParquetStore implementation using aws s3 file system (example)
  • ParquetStore: Store implementation class for interacting with Parquet files using a filesystem interface (example)

known issues/further development

development

  • main requirements:

    • uv
    • bash
  • Clone the repository like this:

    git clone git@github.com:tgedr/dataops
    
  • cd into the folder: cd dataops

  • install requirements: ./helper.sh reqs

Download files

Source Distribution

tgedr_dataops-1.0.5.tar.gz (17.6 kB view details)

Uploaded Source

Built Distribution

tgedr_dataops-1.0.5-py3-none-any.whl (24.8 kB view details)

Uploaded Python 3

File details

Details for the file tgedr_dataops-1.0.5.tar.gz.

File metadata

  • Download URL: tgedr_dataops-1.0.5.tar.gz
  • Upload date:
  • Size: 17.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.0 {"installer":{"name":"uv","version":"0.10.0","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for tgedr_dataops-1.0.5.tar.gz
Algorithm Hash digest
SHA256 8545f0f7186bd84b4df34ea79dfadb9e3ec0ab9ce78f2aa8dd865bca87debd68
MD5 2d900ae115cce28ced1fdf83c7bee875
BLAKE2b-256 e0bfbb7570c357ad5b13f11dfcdf2f8c550fbb913e73379961286cb69a149d6d

File details

Details for the file tgedr_dataops-1.0.5-py3-none-any.whl.

File metadata

  • Download URL: tgedr_dataops-1.0.5-py3-none-any.whl
  • Upload date:
  • Size: 24.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.0 {"installer":{"name":"uv","version":"0.10.0","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for tgedr_dataops-1.0.5-py3-none-any.whl
Algorithm Hash digest
SHA256 d42b81f8de3c6bf66c913d773a021cd6bcd9c9c034925440bed63bc008ec4b28
MD5 6daa94a2a6542e4ae24e37662cfad8f3
BLAKE2b-256 f62654bedc5bd2299c07076b860913786addfe2d8388c65a91eaa637e532e90a
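
The SHA256 digests listed above can be compared against a downloaded file with the standard library. This is a minimal sketch: `sha256_of` and `matches_sha256` are hypothetical helper names, and the file path passed in is whatever local path the archive was saved to.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until an empty bytes object signals end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sha256(path: str, expected_hex: str) -> bool:
    """Return True when the file's digest equals the expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches_sha256("tgedr_dataops-1.0.5.tar.gz", "<SHA256 from the table>")` returns `True` only if the local file is byte-for-byte the published one.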
