Function decorators to check properties of pandas DataFrame transformations
Pandas transform checker
What is it?
This library focuses on data quality checks for pandas transformations. A transformation is a function that takes a pandas DataFrame as input (plus other parameters) and returns a DataFrame.
The library lets the user specify a contract that the function must respect. In this contract the user can specify:
- the added columns
- the deleted columns
- the modified columns
- whether the function adds or drops records
- whether the function modifies the index (e.g. resampling)
Once the contract is specified, the function will raise a RuntimeError if one of its specifications is violated.
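To illustrate the idea, here is a minimal sketch of how a contract check of this kind could work. It is not the library's implementation: `sketch_contract` and its parameters `added` and `deleted` are hypothetical names chosen for this example, and the sketch only covers column additions and deletions.

```python
import functools

import pandas as pd


def sketch_contract(added=(), deleted=()):
    """Hypothetical stand-in decorator: checks added/deleted columns only."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(df, *args, **kwargs):
            cols_before = set(df.columns)
            result = func(df, *args, **kwargs)
            cols_after = set(result.columns)
            # Compare the observed column changes with the declared ones.
            if cols_after - cols_before != set(added):
                raise RuntimeError("unexpected column additions")
            if cols_before - cols_after != set(deleted):
                raise RuntimeError("unexpected column deletions")
            return result
        return wrapper
    return decorator


@sketch_contract(added={"total"})
def add_total(df):
    # Respects the contract: adds exactly one column, "total".
    return df.assign(total=df["a"] + df["b"])


@sketch_contract(added={"total"})
def drop_b(df):
    # Violates the contract: deletes "b" and never adds "total".
    return df.drop(columns=["b"])


df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
ok = add_total(df)

try:
    drop_b(df)
    violation = None
except RuntimeError as exc:
    violation = str(exc)
```

The first call returns normally, while the second raises a RuntimeError because the observed column changes do not match the declared ones.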
How to use it?
The package contains the decorator that performs the check. It can be imported as follows:
from pandas_transform_checker.decorator_contract_checker import input_df_contract
Args
- df_param: name of the function parameter that holds the input df
- contract_params: dict defining the contract of the function, in the following format:
contract_dict = {
    "col_additions": {
        "col_a": "int",
        "col_b": "float"
    },
    "col_deletions": {
        "col_c",
        "col_d"
    },
    "col_editions": {
        "col_e",
        "col_f"
    },
    "allow_index_edition": False,
    "allow_drop_record": True
}
This means that the function must create "col_a" and "col_b", delete "col_c" and "col_d", must not modify the data of any column except "col_e" and "col_f", and must not edit the index. It is allowed to drop records.
Here is the list of keys allowed in this dict:
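As a sketch of what this contract demands, the hypothetical transformation below satisfies each clause, and the plain pandas checks after it spell the clauses out one by one. The function `transform`, the extra column "col_g", and the sample data are made up for illustration; the checks are written by hand and do not use the library.

```python
import pandas as pd

df_in = pd.DataFrame({
    "col_c": [1, 2, 3],
    "col_d": [4, 5, 6],
    "col_e": [1.0, 2.0, 3.0],
    "col_f": ["x", "y", "z"],
    "col_g": [10, 20, 30],
})


def transform(df):
    out = df.drop(columns=["col_c", "col_d"])   # col_deletions
    out = out.assign(
        col_a=df["col_c"] + df["col_d"],        # col_additions, integer
        col_b=df["col_e"] / 2,                  # col_additions, float
        col_e=df["col_e"] * 10,                 # col_editions
        col_f=df["col_f"].str.upper(),          # col_editions
    )
    return out


df_out = transform(df_in)

added = set(df_out.columns) - set(df_in.columns)
deleted = set(df_in.columns) - set(df_out.columns)
# Columns present on both sides whose values differ count as edited.
common = [c for c in df_in.columns if c in df_out.columns]
edited = {c for c in common if not df_in[c].equals(df_out[c])}
index_unchanged = df_in.index.equals(df_out.index)
```

Here "col_g" is left untouched, so it does not appear in `edited`, and the index of the output is the same as that of the input.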
- col_additions: dict where keys are column names and values are dtypes (string)
- col_deletions: set of str representing the deleted columns
- col_editions: set of str representing the modified columns
- allow_index_edition: bool indicating whether the function may modify the index
- allow_add_drop_record: bool indicating whether the function may add or drop records (e.g. when dropna is used)
Usage
When you have a function that takes a df as input:
def super_func(df_input):
    ...
just add the decorator to check its properties automatically:
@input_df_contract(df_param="df_input", contract_dict={"col_editions": {"col_e", "col_f"}})
def super_func(df_input):
    ...
File details
Details for the file pandas_transform_checker-0.1.1.tar.gz.
File metadata
- Download URL: pandas_transform_checker-0.1.1.tar.gz
- Upload date:
- Size: 3.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/3.6.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | 2a1f55f591d232353fe153b9da5cca410c0bb9b4f301852169534ed89b8769de
MD5 | 67295be1d52594e66acdc8ea0cbd6d00
BLAKE2b-256 | c5c9d8b9e8afc30b0215589ab2677273b754c0e589e925e86d2b8f1ed451b88e