
Function decorators to check properties of pandas DataFrame transformations


Pandas transform checker

What is it?

This library focuses on data quality checks for pandas transformations. A transformation is a function that takes a pandas DataFrame as input (plus other parameters) and outputs a DataFrame.

This library allows the user to specify a contract that the function must respect. In this contract the user can specify:

  • the added columns
  • the deleted columns
  • the modified columns
  • whether the function adds or drops records
  • whether the function modifies the index (e.g. resampling)

Once the contract is specified, the decorated function raises a RuntimeError if any of its specifications is violated.

How to use it?

The package contains the decorator that performs the check. It can be imported the following way:
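To make the idea of a contract concrete, here is a minimal sketch of the column part of such a check. It works on plain collections of column names rather than real DataFrames, so it runs without pandas. `check_column_contract` is a hypothetical helper written for illustration, not the library's implementation, and the exact-match rule it applies is an assumption. Only the contract keys and the RuntimeError come from this description.

```python
def check_column_contract(cols_before, cols_after, contract):
    """Raise RuntimeError if column additions/deletions break the contract.

    Hypothetical helper: assumes the added and deleted columns must match
    the contract exactly.
    """
    before, after = set(cols_before), set(cols_after)

    # Columns the contract says should appear or disappear.
    expected_added = set(contract.get("col_additions", {}))
    expected_deleted = set(contract.get("col_deletions", set()))

    # Columns that actually appeared or disappeared.
    actual_added = after - before
    actual_deleted = before - after

    if actual_added != expected_added:
        raise RuntimeError(
            f"added columns {sorted(actual_added)}, "
            f"expected {sorted(expected_added)}"
        )
    if actual_deleted != expected_deleted:
        raise RuntimeError(
            f"deleted columns {sorted(actual_deleted)}, "
            f"expected {sorted(expected_deleted)}"
        )
```

With a real DataFrame, the two name collections would be the `columns` of the input and of the output.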

from pandas_transform_checker.decorator_contract_checker import input_df_contract

Args

  • df_param: name of the function parameter that holds the input DataFrame
  • contract_params: dict defining the contract of the function, in the following format:

contract_dict = {
    "col_additions": {
        "col_a": "int",
        "col_b": "float"
    },
    "col_deletions": {
        "col_c",
        "col_d"
    },
    "col_editions": {
        "col_e",
        "col_f"
    },
    "allow_index_edition": False,
    "allow_drop_record": True
}

This means that the function must create "col_a" and "col_b", must delete "col_c" and "col_d", must not modify any column data except "col_e" and "col_f", and must not edit the index. Dropping records is allowed.

The keys allowed in this dict are:

  • col_additions: dict where keys are column names and values are dtypes (string)
  • col_deletions: set of str representing the deleted columns
  • col_editions: set of str representing the modified columns
  • allow_index_edition: bool indicating whether the function may modify the index
  • allow_add_drop_record: bool indicating whether the function may add or drop records (e.g. when dropna is used)

Usage

When you have a function that takes a DataFrame as input:

def super_func(df_input):
    ...

just add the decorator to check its properties automatically:

@input_df_contract(df_param="df_input", contract_dict={"col_editions": {"col_e","col_f"}})
def super_func(df_input):
    ...
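For readers curious how a decorator can find the input by parameter name and detect unexpected edits, the following is a rough sketch, not the library's code. `sketch_contract` is a hypothetical stand-in that only handles "col_editions", and plain dicts of lists stand in for DataFrames so that the example runs with the standard library alone.

```python
import copy
import functools
import inspect


def sketch_contract(df_param, contract_dict):
    """Hypothetical stand-in for a contract decorator (col_editions only)."""
    allowed_edits = set(contract_dict.get("col_editions", set()))

    def decorator(func):
        signature = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Locate the input table by parameter name, whether it was
            # passed positionally or as a keyword argument.
            bound = signature.bind(*args, **kwargs)
            bound.apply_defaults()
            # Snapshot the input in case the function mutates it in place.
            before = copy.deepcopy(bound.arguments[df_param])

            result = func(*args, **kwargs)

            # Columns present before and after whose values differ.
            edited = {
                col for col in before
                if col in result and result[col] != before[col]
            }
            unexpected = edited - allowed_edits
            if unexpected:
                raise RuntimeError(
                    f"unexpected column edits: {sorted(unexpected)}"
                )
            return result

        return wrapper

    return decorator


@sketch_contract(df_param="table", contract_dict={"col_editions": {"col_e"}})
def double_col_e(table):
    # Only "col_e" changes, which the contract allows.
    return {**table, "col_e": [value * 2 for value in table["col_e"]]}
```

Calling `double_col_e` with a table that has a "col_e" column passes the check, while a function under the same contract that changed any other column would raise a RuntimeError.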


Files for pandas-transform-checker, version 0.1.1: pandas_transform_checker-0.1.1.tar.gz (3.7 kB, source distribution)
