# data-ops

data operations related code
## motivation

data-ops is a library of tested, actively used code that aligns on common standards for code structure and quality, avoiding reinventing the wheel. It builds on top of dataops-abs.
## installation
`pip install tgedr-dataops`
## package namespaces and their contents
### commons
- S3Connector: base class to be extended, providing a connection session to AWS S3 resources
- utils_fs: utility module with file system related functions (example)
### quality
- PandasValidation: GreatExpectationsValidation implementation to validate pandas dataframes with the Great Expectations library (example)
### sink
- LocalFsFileSink: Sink implementation class used to save/persist an object/file to a local fs location (example)
- S3FileSink: Sink implementation class used to save/persist a local object/file to an s3 bucket (example)
### source
- AbstractS3FileSource: abstract Source class used to retrieve objects/files from an s3 bucket to a local fs location, circumventing download limitations of some formats
- LocalFsFileSource: Source implementation class used to retrieve local objects/files to another local fs location (example)
- PdDfS3Source: Source implementation class used to read a pandas dataframe from s3, whether a csv or an excel (xlsx) file (example csv, example excel)
- S3FileCopy: Source implementation class used to copy objects/files from an s3 bucket to another s3 bucket (example)
- S3FileExtendedSource: Source implementation class used to retrieve objects/files from an s3 bucket to a local fs location, with an extra method get_metadata providing file metadata ("LastModified", "ContentLength", "ETag", "VersionId", "ContentType") (example)
- S3FileSource: Source implementation class used to retrieve objects/files from an s3 bucket to a local fs location (example)
### store
- FsSinglePartitionParquetStore: abstract Store implementation defining persistence in parquet files with an optional single partition, regardless of the target location
- LocalFsSinglePartitionParquetStore: FsSinglePartitionParquetStore implementation using the local file system (example)
- S3FsSinglePartitionParquetStore: FsSinglePartitionParquetStore implementation using the aws s3 file system (example)
- ParquetStore: Store implementation class for interacting with Parquet files using a filesystem interface (example)
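The single-partition layout these stores manage can be sketched as hive-style `key=value` directories; the helper below is illustrative of that convention, not the stores' actual code:

```python
from pathlib import Path
from typing import Optional


def partition_dir(root: str, field: Optional[str] = None, value=None) -> Path:
    """Resolve the directory a parquet file lands in: either the store
    root, or a single hive-style `field=value` partition beneath it."""
    base = Path(root)
    return base if field is None else base / f"{field}={value}"
```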
## known issues / further development
- update data while changing its partition value (check unit test)
## development

main requirements:

- uv
- bash

Clone the repository:

`git clone git@github.com:tgedr/dataops`

cd into the folder:

`cd dataops`

install requirements:

`./helper.sh reqs`
## File details
Details for the file tgedr_dataops-1.0.5.tar.gz.
File metadata
- Download URL: tgedr_dataops-1.0.5.tar.gz
- Upload date:
- Size: 17.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.10.0 {"installer":{"name":"uv","version":"0.10.0","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8545f0f7186bd84b4df34ea79dfadb9e3ec0ab9ce78f2aa8dd865bca87debd68 |
| MD5 | 2d900ae115cce28ced1fdf83c7bee875 |
| BLAKE2b-256 | e0bfbb7570c357ad5b13f11dfcdf2f8c550fbb913e73379961286cb69a149d6d |
## File details
Details for the file tgedr_dataops-1.0.5-py3-none-any.whl.
File metadata
- Download URL: tgedr_dataops-1.0.5-py3-none-any.whl
- Upload date:
- Size: 24.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.10.0 {"installer":{"name":"uv","version":"0.10.0","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d42b81f8de3c6bf66c913d773a021cd6bcd9c9c034925440bed63bc008ec4b28 |
| MD5 | 6daa94a2a6542e4ae24e37662cfad8f3 |
| BLAKE2b-256 | f62654bedc5bd2299c07076b860913786addfe2d8388c65a91eaa637e532e90a |