function annotations to check properties on pandas dataframe transformations
Project description
Pandas transform checker
what is it ?
This library is focused on data quality checking on pandas transformations. Transformations are functions that takes a pandas DataFrame as input ( plus other params ) and output a DataFrame.
This library allow the user to specify a contract that the function must respect. In this contract the user can specify:
- the added columns
- the deleted columns
- the modified columns
- if the function add/drop records
- if the function modify the index ( ex: resampling )
Once the contract if specified, the function will raise a RuntimeError if one of it's specifications is violated.
how to use it ?
The package contains the decorator that performs the check it can be imported the following way:
from pandas_transform_checker.decorator_contract_checker import input_df_contract
Args
df_param: name of the param of the function that is the input df contract_params: dict defining the contract of the function in the following format:
contract_dict = {
"col_additions": {
"col_a": "int",
"col_b": "float"
},
"col_deletions": {
"col_c",
"col_d"
},
"col_editions": {
"col_e",
"col_f"
},
"allow_index_edition": False,
"allow_drop_record": True
}
which means that the function must create "col_a", "col_b", delete "col_c", "col_d", must not modify any column data except "col_e", "col_f", and must not edit the index
here is the list of keys allowed in this dict:
- col_additions: dict where keys are column names and values are dtypes (string)
- col_deletions: set of str representing the deleted columns
- col_editions: set of str representing the modified columns
- allow_index_edition: bool indicating if the function modify the index
- allow_add_drop_record (bool): indicate if the function can drop some records (ex. when dropna is used)
Usage
when you have a function that takes a df as input:
def super_func(df_input):
...
just add the annotation to automatically check properties
@input_df_contract(df_param="df_input", contract_dict={"col_editions": {"col_e","col_f"}})
def super_func(df_input):
...
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Hashes for pandas_transform_checker-0.1.1.tar.gz
Algorithm | Hash digest | |
---|---|---|
SHA256 | 2a1f55f591d232353fe153b9da5cca410c0bb9b4f301852169534ed89b8769de |
|
MD5 | 67295be1d52594e66acdc8ea0cbd6d00 |
|
BLAKE2b-256 | c5c9d8b9e8afc30b0215589ab2677273b754c0e589e925e86d2b8f1ed451b88e |