data operations related code

data-ops

motivation

data-ops is a library of tested, in-use code that follows shared standards for code structure and quality, so that common data operations do not have to be reimplemented in every project. It builds on top of dataops-abs.

installation

    pip install tgedr-dataops
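To confirm the installation from Python, the installed distribution version can be looked up with the standard library. This is a minimal sketch; the helper name `installed_version` is hypothetical and not part of the package.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("tgedr-dataops")
    print(version if version else "tgedr-dataops is not installed")
```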

package namespaces and their contents

commons

  • S3Connector: base class to be extended, providing a session for connecting to AWS S3 resources
  • utils_fs: utility module with file system related functions (example)

quality

  • PandasValidation: GreatExpectationsValidation implementation that validates pandas dataframes with the Great Expectations library (example)

sink

  • LocalFsFileSink: Sink implementation class used to save/persist an object/file to a local fs location (example)
  • S3FileSink: Sink implementation class used to save/persist a local object/file to an s3 bucket (example)
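The idea of a sink that persists a file to a local location can be sketched with the standard library alone. The class names `FileSink` and `LocalCopySink` and the `put` signature below are assumptions made for illustration; they are not the actual interface of LocalFsFileSink, which is defined by the package and dataops-abs.

```python
import shutil
from abc import ABC, abstractmethod
from pathlib import Path


class FileSink(ABC):
    """Hypothetical minimal sink interface (not the library's actual API)."""

    @abstractmethod
    def put(self, source: str, target: str) -> str:
        """Persist the file at `source` to `target` and return the final path."""


class LocalCopySink(FileSink):
    """Copies a local file to another local file system location."""

    def put(self, source: str, target: str) -> str:
        src = Path(source)
        dst = Path(target)
        # If the target is an existing directory, keep the source file name.
        if dst.is_dir():
            dst = dst / src.name
        # Create missing parent folders before copying.
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        return str(dst)
```

An S3-backed sink would keep the same shape and replace the copy step with an upload.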

source

  • AbstractS3FileSource: abstract Source class used to retrieve objects/files from an s3 bucket to a local fs location, working around download limitations that affect some formats
  • LocalFsFileSource: Source implementation class used to retrieve local objects/files to another local fs location (example)
  • PdDfS3Source: Source implementation class used to read a pandas dataframe from s3, from either a csv or an excel (xlsx) file (example csv, example excel)
  • S3FileCopy: Source implementation class used to copy objects/files from an s3 bucket to another s3 bucket (example)
  • S3FileExtendedSource: Source implementation class used to retrieve objects/files from an s3 bucket to a local fs location, with the extra method get_metadata providing file metadata ("LastModified", "ContentLength", "ETag", "VersionId", "ContentType") (example)
  • S3FileSource: Source implementation class used to retrieve objects/files from s3 bucket to local fs location (example)
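The metadata fields listed for S3FileExtendedSource match keys returned by the S3 HeadObject operation. The sketch below shows one way such a lookup could work. It is an assumption about the approach, not the library's implementation, and the function signature is hypothetical. The `client` argument is expected to behave like a boto3 S3 client, for example `boto3.client("s3")`.

```python
from typing import Any, Dict

# Fields named in the S3FileExtendedSource description above.
METADATA_KEYS = ("LastModified", "ContentLength", "ETag", "VersionId", "ContentType")


def get_metadata(client: Any, bucket: str, key: str) -> Dict[str, Any]:
    """Fetch object metadata without downloading the object body."""
    response = client.head_object(Bucket=bucket, Key=key)
    # "VersionId" is only present on versioned buckets, so fall back to None.
    return {name: response.get(name) for name in METADATA_KEYS}
```

Passing the client in as a parameter keeps the function easy to exercise with a stub in tests, without AWS credentials.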

store

  • FsSinglePartitionParquetStore : abstract Store implementation defining persistence on parquet files with an optional single partition, regardless of where the data is persisted
  • LocalFsSinglePartitionParquetStore : FsSinglePartitionParquetStore implementation using local file system (example)
  • S3FsSinglePartitionParquetStore : FsSinglePartitionParquetStore implementation using aws s3 file system (example)
  • ParquetStore : Store implementation class for interacting with Parquet files using a filesystem interface (example)
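To illustrate what "an optional single partition, regardless of location" can mean in practice, the sketch below builds the directory for a dataset with at most one partition field. It assumes the `field=value` directory layout commonly used by tools such as Spark and pyarrow. Whether the stores above use exactly this layout is not stated here, and the function name is hypothetical.

```python
from typing import Optional


def partition_dir(base: str, field: Optional[str] = None, value: Optional[object] = None) -> str:
    """Return the directory for a dataset with an optional single partition.

    Works on plain strings so the same logic applies to local paths
    and to URLs such as s3://bucket/prefix.
    """
    root = base.rstrip("/")
    if field is None:
        # No partition: data lives directly under the base location.
        return root
    return f"{root}/{field}={value}"


if __name__ == "__main__":
    print(partition_dir("data/sales"))
    print(partition_dir("s3://bucket/sales/", "year", 2024))
```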

known issues/further development

development

  • main requirements:

    • uv
    • bash
  • Clone the repository like this:

    git clone git@github.com:tgedr/dataops
    
  • cd into the folder: cd dataops

  • install requirements: ./helper.sh reqs
