Function decorators to check properties of pandas DataFrame transformations

Pandas transform checker

What is it?

This library checks data quality in pandas transformations. A transformation is a function that takes a pandas DataFrame as input (plus other parameters) and returns a DataFrame.

The library lets the user specify a contract that the function must respect. The contract can specify:

  • the added columns
  • the deleted columns
  • the modified columns
  • whether the function adds or drops records
  • whether the function modifies the index (e.g. resampling)

Once the contract is specified, the decorated function raises a RuntimeError if any of its specifications is violated.
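To make the mechanism concrete, here is a minimal sketch of how a contract-checking decorator of this kind could work. It is not the library's implementation: the name columns_contract and its added/deleted parameters are hypothetical, it checks only added and deleted column names, and a plain dict mapping column names to lists stands in for the DataFrame so the sketch runs without pandas. It relies only on iterating the table to get column names, which is also how a pandas DataFrame iterates.

```python
import functools
import inspect


def columns_contract(df_param, added=(), deleted=()):
    """Hypothetical stand-in decorator: checks added/deleted column names only."""
    added, deleted = set(added), set(deleted)

    def decorator(func):
        signature = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Find the input table by parameter name, whether it was
            # passed positionally or by keyword.
            bound = signature.bind(*args, **kwargs)
            # Record the column names before the call, in case the
            # function mutates its input in place.
            before = set(bound.arguments[df_param])
            result = func(*args, **kwargs)
            after = set(result)
            if after - before != added:
                raise RuntimeError(
                    f"added columns {sorted(after - before)} "
                    f"do not match contract {sorted(added)}"
                )
            if before - after != deleted:
                raise RuntimeError(
                    f"deleted columns {sorted(before - after)} "
                    f"do not match contract {sorted(deleted)}"
                )
            return result

        return wrapper

    return decorator


@columns_contract(df_param="table", added={"total"})
def add_total(table):
    # Respects the contract: exactly one new column, nothing deleted.
    out = dict(table)
    out["total"] = [a + b for a, b in zip(table["a"], table["b"])]
    return out


@columns_contract(df_param="table", added={"total"})
def forgets_total(table):
    # Violates the contract: the promised column is never created.
    return dict(table)
```

Calling add_total returns the new table as usual, while calling forgets_total raises a RuntimeError because the promised column is missing.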

How to use it?

The package contains a decorator that performs the check. It can be imported as follows:

from pandas_transform_checker.decorator_contract_checker import input_df_contract

Args

df_param: name of the function parameter that holds the input DataFrame

contract_dict: dict defining the contract of the function, in the following format:

contract_dict = {
    "col_additions": {
        "col_a": "int",
        "col_b": "float"
    },
    "col_deletions": {
        "col_c",
        "col_d"
    },
    "col_editions": {
        "col_e",
        "col_f"
    },
    "allow_index_edition": False,
    "allow_drop_record": True
}

This contract means that the function must create "col_a" (dtype int) and "col_b" (dtype float), must delete "col_c" and "col_d", must not modify any column data except "col_e" and "col_f", and must not edit the index. It may drop records.
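As an illustration of what the column rules imply, the sketch below derives the set of column names the output should have from the input columns and a contract dict. The helper expected_columns and the input column list are hypothetical and not part of the package; the contract keys are the ones from the example contract.

```python
def expected_columns(input_columns, contract):
    """Column names the output should have if the contract is respected."""
    added = set(contract.get("col_additions", {}))
    deleted = set(contract.get("col_deletions", set()))
    return (set(input_columns) - deleted) | added


contract_dict = {
    "col_additions": {"col_a": "int", "col_b": "float"},
    "col_deletions": {"col_c", "col_d"},
    "col_editions": {"col_e", "col_f"},
}

# Hypothetical input columns, chosen for illustration only.
input_columns = ["col_c", "col_d", "col_e", "col_f", "col_g"]

print(sorted(expected_columns(input_columns, contract_dict)))
# → ['col_a', 'col_b', 'col_e', 'col_f', 'col_g']
```

Edited columns such as "col_e" and "col_f" stay in the output; the contract only permits their values to change.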

Here is the list of keys allowed in this dict:

  • col_additions: dict whose keys are column names and whose values are dtypes (as strings)
  • col_deletions: set of str naming the deleted columns
  • col_editions: set of str naming the modified columns
  • allow_index_edition: bool indicating whether the function may modify the index
  • allow_add_drop_record: bool indicating whether the function may add or drop records (e.g. when dropna is used)
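A misspelled key is easy to miss in a hand-written dict. The following sketch flags keys that are not in the list above before the contract is passed to the decorator. The helper unknown_contract_keys is hypothetical and not part of the package, and it assumes the key list above is exhaustive; the key names are copied from that list.

```python
# Keys copied from the list above; assumed to be exhaustive.
ALLOWED_KEYS = {
    "col_additions",
    "col_deletions",
    "col_editions",
    "allow_index_edition",
    "allow_add_drop_record",
}


def unknown_contract_keys(contract):
    """Return the keys of `contract` that are not in the documented list."""
    return set(contract) - ALLOWED_KEYS


# "col_edition" (missing the final "s") is reported as unknown.
typo_contract = {"col_edition": {"col_e"}, "allow_index_edition": False}
print(unknown_contract_keys(typo_contract))
# → {'col_edition'}
```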

Usage

When you have a function that takes a DataFrame as input:

def super_func(df_input):
    ...

add the decorator to check its contract automatically:

@input_df_contract(df_param="df_input", contract_dict={"col_editions": {"col_e","col_f"}})
def super_func(df_input):
    ...

Source Distribution

pandas_transform_checker-0.1.1.tar.gz (3.7 kB)

File metadata

  • Download URL: pandas_transform_checker-0.1.1.tar.gz
  • Size: 3.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/3.6.7

File hashes

Hashes for pandas_transform_checker-0.1.1.tar.gz
Algorithm Hash digest
SHA256 2a1f55f591d232353fe153b9da5cca410c0bb9b4f301852169534ed89b8769de
MD5 67295be1d52594e66acdc8ea0cbd6d00
BLAKE2b-256 c5c9d8b9e8afc30b0215589ab2677273b754c0e589e925e86d2b8f1ed451b88e
