Data helper package
data-toolz
This repository contains reusable Python code for data projects.
The motivation for this project was to create a package that abstracts dataset read/write operations from:
- destination type (local, s3, <tbd...>)
- target file type (delimiter-separated values, jsonlines, parquet)
This makes it possible to write code that is easily transferable between local and cloud applications.
installation
pip install data-toolz
usage
The datatoolz.filesystem.FileSystem class gives you an abstraction for accessing both local and remote objects using the well-known pythonic open() interface.
from datatoolz.filesystem import FileSystem

for fs_type in ("local", "s3"):
    fs = FileSystem(name=fs_type)
    # common pythonic interface for both local and remote file systems
    with fs.open("my-folder-or-bucket/my-file", mode="wt") as fo:
        fo.write("Hello World!")
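To make the idea of one open() interface over several storage backends concrete, here is a minimal standard-library sketch. It is not how data-toolz is implemented: `LocalBackend`, `MemoryBackend`, `get_backend` and `roundtrip` are hypothetical names invented for this illustration, and the in-memory backend merely stands in for a remote object store such as s3.

```python
import io
import os
import tempfile


class LocalBackend:
    """Opens files on the local disk, creating parent folders when writing."""

    def open(self, path, mode="rt"):
        if any(flag in mode for flag in "wax"):
            os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
        return open(path, mode)


class _MemoryWriter(io.StringIO):
    """Text buffer that stores its content under `path` when it is closed."""

    def __init__(self, objects, path):
        super().__init__()
        self._objects, self._path = objects, path

    def close(self):
        if not self.closed:
            self._objects[self._path] = self.getvalue()
        super().close()


class MemoryBackend:
    """Hypothetical stand-in for a remote object store: text objects in a dict."""

    def __init__(self):
        self.objects = {}

    def open(self, path, mode="rt"):
        if "w" in mode:
            return _MemoryWriter(self.objects, path)
        return io.StringIO(self.objects[path])


def get_backend(name):
    """Pick a backend by name, mirroring the FileSystem(name=...) idea."""
    backends = {"local": LocalBackend, "memory": MemoryBackend}
    return backends[name]()


def roundtrip(fs, path, text):
    """Caller code is identical regardless of which backend `fs` is."""
    with fs.open(path, mode="wt") as fo:
        fo.write(text)
    with fs.open(path, mode="rt") as fi:
        return fi.read()


def demo():
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        local_path = os.path.join(tmp, "my-folder", "my-file")
        results["local"] = roundtrip(get_backend("local"), local_path, "Hello World!")
    results["memory"] = roundtrip(get_backend("memory"), "my-bucket/my-file", "Hello World!")
    return results
```

The point of the design is that `roundtrip` never inspects the backend type, so swapping the destination only changes the name passed to the factory.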
The datatoolz.io.DataIO class gives you a versatile Reader/Writer interface for handling typical data files (jsonlines, dsv, parquet).
import pandas as pd
from datatoolz.io import DataIO
df = pd.DataFrame({"col1": [1, 2, 3], "col2": ["a", "b", "c"]})
dio = DataIO() # defaults to "local" FileSystem
# write as parquet
dio.write(dataframe=df, path="my-file.parquet", filetype="parquet")
dio.read(path="my-file.parquet", filetype="parquet")
# write as gzip-compressed jsonlines
dio.write(dataframe=df, path="my-file.json.gz", filetype="jsonlines", gzip=True)
dio.read(path="my-file.json.gz", filetype="jsonlines", gzip=True)
# write as delimiter-separated-values in multiple partitions
dio.write(dataframe=df, path="my-file.tsv", filetype="dsv", sep="\t", partition_by=["col1"])
dio.read(path="my-file.tsv", filetype="dsv", sep="\t")
# write output in multiple chunks per partition
dio.write(dataframe=df, path="my-prefix", filetype="dsv", sep="\t", partition_by=["col1"], suffix=["chunk01.tsv", "chunk02.tsv"])
dio.read(path="my-prefix", filetype="dsv", sep="\t")
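For readers unfamiliar with the gzip-compressed jsonlines format used in the example above, the following standard-library sketch shows what such a file contains: one JSON object per line, written through a gzip stream. It does not use or reproduce DataIO; the helper names `write_jsonlines_gz` and `read_jsonlines_gz` are hypothetical and exist only for this illustration.

```python
import gzip
import json
import os
import tempfile


def write_jsonlines_gz(records, path):
    """Write each record as one JSON document per line, gzip-compressed."""
    with gzip.open(path, mode="wt", encoding="utf-8") as fo:
        for record in records:
            fo.write(json.dumps(record) + "\n")


def read_jsonlines_gz(path):
    """Read the records back, skipping any blank lines."""
    with gzip.open(path, mode="rt", encoding="utf-8") as fi:
        return [json.loads(line) for line in fi if line.strip()]


def demo():
    # Same rows as the DataFrame in the example above, as plain dicts.
    records = [
        {"col1": 1, "col2": "a"},
        {"col1": 2, "col2": "b"},
        {"col1": 3, "col2": "c"},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "my-file.json.gz")
        write_jsonlines_gz(records, path)
        return read_jsonlines_gz(path)
```

Because every line is an independent JSON document, such files can be appended to or processed line by line without loading the whole dataset.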
datatoolz.logging.JsonLogger is a wrapper logger that outputs JSON-structured logs.
from datatoolz.logging import JsonLogger
logger = JsonLogger(name="my-custom-logger", env="dev")
logger.info(msg="what is my purpose?", meaning_of_life=42)
{"logger": {"application": "my-custom-logger", "environment": "dev"}, "level": "info", "timestamp": "2020-11-03 18:31:07.757534", "message": "what is my purpose?", "extra": {"meaning_of_life": 42}}
It can also be used to decorate functions and log their execution details
from datatoolz.logging import JsonLogger
logger = JsonLogger(name="my-custom-logger", env="dev")
@logger.decorate(msg="my-custom-log", duration=True, memory=True, my_value="my-value", output_length=lambda x: len(x))
def my_func(x, y):
    return x + y, x * y
print(my_func(42, 2))
{"logger": {"application": "my-custom-logger", "environment": "dev"}, "level": "info", "timestamp": "2021-03-24 18:10:47.054703", "message": "my-custom-log", "extra": {"function": "my_func", "memory": {"current": 432, "peak": 432}, "duration": 2.5980000000203063e-06, "my_value": "my-value", "output_length": 2}}
(44, 84)
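As a rough illustration of how a decorator of this kind could be built, the sketch below uses only the standard library. It is not the JsonLogger implementation: `json_record` and `log_calls` are hypothetical names, the record layout simply copies the sample output above, and the rule that callable extras are applied to the return value is an assumption inferred from `output_length` being 2 for the two-element result.

```python
import functools
import json
import time
import tracemalloc
from datetime import datetime


def json_record(name, env, level, msg, **extra):
    """Build one log line shaped like the sample output shown above."""
    return json.dumps({
        "logger": {"application": name, "environment": env},
        "level": level,
        "timestamp": str(datetime.now()),
        "message": msg,
        "extra": extra,
    })


def log_calls(name, env, msg, emit=print, **extra):
    """Decorator factory that logs duration, memory and extra values per call.

    Assumption: callable extras receive the function's return value,
    all other extras are logged unchanged.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            tracemalloc.start()
            started = time.perf_counter()
            try:
                result = func(*args, **kwargs)
                duration = time.perf_counter() - started
                current, peak = tracemalloc.get_traced_memory()
            finally:
                tracemalloc.stop()
            resolved = {k: v(result) if callable(v) else v for k, v in extra.items()}
            emit(json_record(
                name, env, "info", msg,
                function=func.__name__,
                memory={"current": current, "peak": peak},
                duration=duration,
                **resolved,
            ))
            return result
        return wrapper
    return decorator


lines = []  # collect log lines here instead of printing them


@log_calls("my-custom-logger", "dev", "my-custom-log", emit=lines.append,
           my_value="my-value", output_length=len)
def my_func(x, y):
    return x + y, x * y


result = my_func(42, 2)
record = json.loads(lines[0])
```

Passing `emit=print` (the default) would write each record to standard output; collecting into a list here just makes the record easy to inspect.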
File details
Details for the file data-toolz-0.1.11.tar.gz.
File metadata
- Download URL: data-toolz-0.1.11.tar.gz
- Upload date:
- Size: 21.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e797f0debb194b1e8f4c82b6fd2d989370f0498eb16c47bb60077a2451faa333 |
| MD5 | 416fef45cadcd1e48804de241f5fd44a |
| BLAKE2b-256 | b4d96c6a0ba91e4695efe104ae13c3d2b15df0250c7f664f634a1595fea216db |
File details
Details for the file data_toolz-0.1.11-py3-none-any.whl.
File metadata
- Download URL: data_toolz-0.1.11-py3-none-any.whl
- Upload date:
- Size: 22.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 53af2e4b9a0d8884a2ff705869ee7926acf73d70b6ca3aaed4f068cc15fed1c8 |
| MD5 | c5a6e40ff1e4a898e1575283cdb92ae0 |
| BLAKE2b-256 | 941fe5e4291cee90a05d22640fe778ebcfe123dc8c9b4218f3ce03dbe3f9e873 |