Package to perform comparison between data frames

Project description

input_checker


The input_checker package enables users to compare a given data frame against a benchmark data frame. It does so by recording key characteristics of the benchmark data frame and cross-checking the comparison data frame against them.

The package currently contains five main checks:

  • Null checker: ensures that columns with missing values in the benchmark data frame are the only columns with missing values in the comparison data frame
  • Dtype checker: ensures that columns in the comparison data frame are of the same data type as in the benchmark data frame
  • Categorical value checker: ensures that categorical columns in the comparison data frame only contain values that exist in the benchmark data frame
  • Numerical checker: ensures that the values of the numerical columns in the comparison data frame lie within the minimum and maximum range of the numerical columns in the benchmark data frame
  • Datetime checker: ensures that the values of datetime columns in the comparison data frame fall after the minimum date (and, optionally, before the maximum date) of the datetime columns in the benchmark data frame
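
To illustrate what the five checks compare, here is a minimal sketch in plain pandas. It is not the package's implementation: the function names `fit_benchmark` and `failed_checks`, the dictionary layout, and the toy columns are all assumptions made for this example. It also treats a value equal to the benchmark minimum or maximum as acceptable, which may differ from the package's behaviour.

```python
import pandas as pd


def fit_benchmark(df, numerical, categorical, datetime_cols):
    """Record the benchmark characteristics the five checks rely on."""
    return {
        "null_columns": {c for c in df.columns if df[c].isnull().any()},
        "dtypes": {c: str(df[c].dtype) for c in df.columns},
        "categories": {c: set(df[c].dropna().unique()) for c in categorical},
        "ranges": {c: (df[c].min(), df[c].max()) for c in numerical},
        "min_dates": {c: df[c].min() for c in datetime_cols},
    }


def failed_checks(df, info):
    """Return {check name: [offending columns]} for a comparison data frame."""
    checks = {
        # nulls are only allowed where the benchmark also had nulls
        "null": [c for c in info["dtypes"]
                 if df[c].isnull().any() and c not in info["null_columns"]],
        # dtypes must match the benchmark
        "dtype": [c for c, dtype in info["dtypes"].items()
                  if str(df[c].dtype) != dtype],
        # no category values that the benchmark never saw
        "categorical": [c for c, seen in info["categories"].items()
                        if set(df[c].dropna().unique()) - seen],
        # values must stay within the benchmark minimum and maximum
        "numerical": [c for c, (low, high) in info["ranges"].items()
                      if df[c].min() < low or df[c].max() > high],
        # dates must not be earlier than the benchmark minimum (assumption)
        "datetime": [c for c, earliest in info["min_dates"].items()
                     if df[c].min() < earliest],
    }
    # keep only the checks that flagged at least one column
    return {name: cols for name, cols in checks.items() if cols}


# hypothetical toy data, not taken from the package
benchmark = pd.DataFrame({
    "color": ["red", "white", "red"],
    "alcohol": [12.0, 13.5, 14.1],
    "bottled": pd.to_datetime(["2020-01-01", "2020-06-01", "2021-01-01"]),
})
comparison = pd.DataFrame({
    "color": ["red", "rose"],
    "alcohol": [13.0, 15.2],
    "bottled": pd.to_datetime(["2020-03-01", "2019-12-31"]),
})

info = fit_benchmark(benchmark, numerical=["alcohol"],
                     categorical=["color"], datetime_cols=["bottled"])
failures = failed_checks(comparison, info)
print(failures)
```

For this toy data, `failures` flags the unseen category in `color`, the out-of-range value in `alcohol`, and the too-early date in `bottled`, while the null and dtype checks pass. With the package itself, this bookkeeping is handled by `InputChecker` through `fit` and `transform`.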

The package has many uses, including ensuring that data points sent to a model in a live environment match key characteristics of the data the model was initially trained on.

Here is a simple example of using input_checker to compare training data to test data:

import pandas as pd
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

import input_checker
from input_checker.checker import InputChecker

# load and prepare sklearn wine dataset
wine = load_wine()

df_wine = pd.DataFrame(wine['data'], columns = wine['feature_names'])
df_wine['target'] = wine['target']

# split into train/test sets
df_train, df_test = train_test_split(df_wine, test_size=0.2)

# define numerical columns
# note: the original wine dataset only has numerical fields;
# see the example notebook in the examples folder for
# using input_checker with different dtypes and missing values
numerical_columns = ['alcohol', 'malic_acid', 'ash', 'alcalinity_of_ash', 'magnesium',
       'total_phenols', 'flavanoids', 'nonflavanoid_phenols',
       'proanthocyanins', 'color_intensity', 'hue',
       'od280/od315_of_diluted_wines', 'proline']

# define input_checker
checker = InputChecker(columns=numerical_columns,
                       numerical_columns=numerical_columns) 

# fitting input_checker
checker.fit(df_train)

# compare test data frame to the training data frame
df_test_checked = checker.transform(df_test)

Installation

input_checker can be installed from PyPI with:

pip install input_checker
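
One way to confirm that the install worked is to ask Python for the installed version. This is a small sketch using only the standard library; the helper name `installed_version` is an assumption made for this example, and the distribution name is assumed to match the name used in the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# prints a version string if the install succeeded, otherwise None
print(installed_version("input_checker"))
```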

Documentation

Documentation for input_checker can be found on readthedocs.

Examples

To help get started there is an example notebook in the examples folder that shows how to use input_checker.

Build and test

This project uses pytest as its test framework. To run the tests, follow the steps below.

First clone the repo and move to the root directory:

git clone https://github.com/lvgig/input_checker.git
cd input_checker

Then install input_checker in editable mode:

pip install -e . -r requirements-dev.txt

Then run the tests with pytest:

pytest

Contribute

input_checker is under active development, and we're super excited if you're interested in contributing! See CONTRIBUTING.md for the full details of our working practices.

For bugs and feature requests please open an issue.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

input_checker-0.3.9.tar.gz (24.3 kB)

Uploaded Source

Built Distribution

input_checker-0.3.9-py3-none-any.whl (24.5 kB)

Uploaded Python 3

File details

Details for the file input_checker-0.3.9.tar.gz.

File metadata

  • Download URL: input_checker-0.3.9.tar.gz
  • Upload date:
  • Size: 24.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.10.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.1 CPython/3.9.5

File hashes

Hashes for input_checker-0.3.9.tar.gz
  • SHA256: 958cd4e50412704ac7f8e62f3be808b438504e702ddd6b0a4d123d8715c22589
  • MD5: eece36443f4b48895cd4ffaa04931092
  • BLAKE2b-256: 117684d7a4c8986906567b08f3a9f915167d257fba063c472085d8da2985d17a
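
A downloaded file can be checked against the published SHA256 digest with the standard library. The following is a minimal sketch: the helper names `sha256_of_file` and `matches_published_hash` are assumptions made for this example, and the commented-out call assumes the source distribution has been saved to the current directory.

```python
import hashlib

# SHA256 digest listed above for input_checker-0.3.9.tar.gz
EXPECTED_SHA256 = "958cd4e50412704ac7f8e62f3be808b438504e702ddd6b0a4d123d8715c22589"


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals the expected digest."""
    return sha256_of_file(path) == expected.lower()


# assumes the file was downloaded to the current directory:
# print(matches_published_hash("input_checker-0.3.9.tar.gz"))
```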


File details

Details for the file input_checker-0.3.9-py3-none-any.whl.

File metadata

  • Download URL: input_checker-0.3.9-py3-none-any.whl
  • Upload date:
  • Size: 24.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.10.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.1 CPython/3.9.5

File hashes

Hashes for input_checker-0.3.9-py3-none-any.whl
  • SHA256: b2a6c2b7bf90543534d71bd39e67effc973bbe449ebc72c35264379ca1697c9a
  • MD5: b52733accd5863ae00acd562a0b13fa5
  • BLAKE2b-256: fa28048e632219c2d0eed47ba75f0b5829d1be343973b6093bcb4430b34f8011

