Skip to main content

function annotations to check properties on pandas dataframe transformations

Project description

Pandas transform checker

what is it ?

This library is focused on data quality checking on pandas transformations. Transformations are functions that takes a pandas DataFrame as input ( plus other params ) and output a DataFrame.

This library allow the user to specify a contract that the function must respect. In this contract the user can specify:

  • the added columns
  • the deleted columns
  • the modified columns
  • if the function add/drop records
  • if the function modify the index ( ex: resampling )

Once the contract if specified, the function will raise a RuntimeError if one of it's specifications is violated.

how to use it ?

The package contains the decorator that performs the check it can be imported the following way:

from pandas_transform_checker.decorator_contract_checker import input_df_contract

Args

df_param: name of the param of the function that is the input df contract_params: dict defining the contract of the function in the following format:

contract_dict = {
    "col_additions": {
        "col_a": "int",
        "col_b": "float"
    },
    "col_deletions": {
        "col_c",
        "col_d"
    },
    "col_editions": {
        "col_e",
        "col_f"
    },
    "allow_index_edition": False,
    "allow_drop_record": True
}

which means that the function must create "col_a", "col_b", delete "col_c", "col_d", must not modify any column data except "col_e", "col_f", and must not edit the index

here is the list of keys allowed in this dict:

  • col_additions: dict where keys are column names and values are dtypes (string)
  • col_deletions: set of str representing the deleted columns
  • col_editions: set of str representing the modified columns
  • allow_index_edition: bool indicating if the function modify the index
  • allow_add_drop_record (bool): indicate if the function can drop some records (ex. when dropna is used)

Usage

when you have a function that takes a df as input:

def super_func(df_input):
    ...

just add the annotation to automatically check properties

@input_df_contract(df_param="df_input", contract_dict={"col_editions": {"col_e","col_f"}})
def super_func(df_input):
    ...

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pandas_transform_checker-0.1.1.tar.gz (3.7 kB view hashes)

Uploaded Source

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page