Function decorators for Pandas DataFrame column name and data type validation

DAFFY DataFrame Column Validator

Description

In projects using Pandas, it's very common to have functions that take Pandas DataFrames as input or produce them as output, and it's hard to tell quickly what these DataFrames contain. This library offers simple decorators to annotate your functions so that they document themselves, and that documentation is kept up to date by validating the input and output at runtime.

For example,

@df_in(columns=["Brand", "Price"])     # the function expects a DataFrame as input parameter with columns Brand and Price
@df_out(columns=["Brand", "Price"])    # the function will return a DataFrame with columns Brand and Price
def filter_cars(car_df):
    # before this code is executed, the input DataFrame is validated according to the above decorator
    # filter some cars..
    return filtered_cars_df

Installation

Install with your favorite Python dependency manager, for example:

pip install daffy

Usage

Start by importing the needed decorators:

from daffy import df_in, df_out

To check a DataFrame input to a function, annotate the function with @df_in. For example, the following function expects to get a DataFrame with columns Brand and Price:

@df_in(columns=["Brand", "Price"])
def process_cars(car_df):
    # do stuff with cars
    ...

If your function takes multiple arguments, specify the argument to be checked by its name:

@df_in(name="car_df", columns=["Brand", "Price"])
def process_cars(year, style, car_df):
    # do stuff with cars
    ...
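
As an illustration of how a decorator can pick out one argument by name, here is a minimal sketch built on the standard library's inspect.signature. It is not daffy's implementation: check_named_arg and its predicate parameter are hypothetical names introduced for this example, and a plain dict stands in for the DataFrame so the snippet runs without pandas.

```python
import functools
import inspect


def check_named_arg(name, predicate):
    """Hypothetical helper: validate one argument, looked up by parameter name."""

    def decorator(func):
        signature = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # bind() maps positional and keyword arguments to parameter names,
            # so the lookup works however the caller passed the value.
            bound = signature.bind(*args, **kwargs)
            bound.apply_defaults()
            value = bound.arguments[name]
            assert predicate(value), f"Argument {name} failed validation: {value!r}"
            return func(*args, **kwargs)

        return wrapper

    return decorator


# A dict stands in for the DataFrame here (assumption for the sketch).
@check_named_arg("car_df", lambda value: isinstance(value, dict))
def process_cars(year, style, car_df):
    return sorted(car_df)
```

Because the lookup goes through the bound signature, the check behaves the same whether car_df is passed positionally or as a keyword.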

To check that a function returns a DataFrame with specific columns, use the @df_out decorator:

@df_out(columns=["Brand", "Price"])
def get_all_cars():
    # get those cars
    return all_cars_df

If one of the listed columns is missing from the DataFrame, a helpful assertion error is raised:

AssertionError("Column Price missing from DataFrame. Got columns: ['Brand']")
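
To make the failure mode concrete, the following sketch shows how a column check of this kind could produce such a message. It is an assumption-laden stand-in, not daffy's code: require_columns and FakeFrame are hypothetical names, and the check only relies on a .columns attribute, which pandas DataFrames also expose.

```python
import functools


def require_columns(columns):
    """Hypothetical stand-in for a column-checking decorator (checks the first argument)."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(df, *args, **kwargs):
            present = list(df.columns)
            for column in columns:
                assert column in present, (
                    f"Column {column} missing from DataFrame. Got columns: {present}"
                )
            return func(df, *args, **kwargs)

        return wrapper

    return decorator


class FakeFrame:
    """Tiny stand-in exposing only `.columns`, so the sketch runs without pandas."""

    def __init__(self, columns):
        self.columns = columns


@require_columns(["Brand", "Price"])
def process_cars(car_df):
    return len(car_df.columns)


try:
    process_cars(FakeFrame(["Brand"]))  # Price is missing, so the check fails
except AssertionError as error:
    message = str(error)
```

The validation runs before the function body, so a bad input fails at the call boundary rather than somewhere deep inside the processing code.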

To check both input and output, just use both annotations on the same function:

@df_in(columns=["Brand", "Price"])
@df_out(columns=["Brand", "Price"])
def filter_cars(car_df):
    # filter some cars
    return filtered_cars_df

If you also want to check the data type of each column, you can replace the column list:

columns=["Brand", "Price"]

with a dict:

columns={"Brand": "object", "Price": "int64"}

This checks not only that the specified columns are present in the DataFrame but also that each has the expected dtype. If a dtype does not match, an error message similar to the following explains the mismatch:

AssertionError("Column Price has wrong dtype. Was int64, expected float64")
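
A dtype check of this shape can be sketched as a comparison between the string form of each column's dtype and the expected name. This is only an illustration under stated assumptions: check_dtypes is a hypothetical helper, not part of daffy, and a SimpleNamespace with a dtypes mapping stands in for a DataFrame (pandas exposes a similar column-indexed dtypes attribute).

```python
from types import SimpleNamespace


def check_dtypes(df, expected):
    """Hypothetical helper: compare each column's dtype name to the expected string."""
    for column, dtype in expected.items():
        assert column in df.dtypes, f"Column {column} missing from DataFrame."
        actual = str(df.dtypes[column])
        assert actual == dtype, (
            f"Column {column} has wrong dtype. Was {actual}, expected {dtype}"
        )


# Stand-in for a DataFrame whose Price column holds integers (assumption).
cars = SimpleNamespace(dtypes={"Brand": "object", "Price": "int64"})

check_dtypes(cars, {"Brand": "object", "Price": "int64"})  # passes silently

try:
    check_dtypes(cars, {"Brand": "object", "Price": "float64"})
except AssertionError as error:
    dtype_message = str(error)
```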

To quickly check what the incoming and outgoing DataFrames contain, you can add a @df_log annotation to the function. For example, adding @df_log to the above filter_cars function will produce log lines:

Function filter_cars parameters contained a DataFrame: columns: ['Brand', 'Price']
Function filter_cars returned a DataFrame: columns: ['Brand', 'Price']

or with @df_log(include_dtypes=True) you get:

Function filter_cars parameters contained a DataFrame: columns: ['Brand', 'Price'] with dtypes ['object', 'int64']
Function filter_cars returned a DataFrame: columns: ['Brand', 'Price'] with dtypes ['object', 'int64']
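
For readers curious how such log lines could be assembled, here is a small sketch using the standard logging module. The names log_frames and describe_frame, the logger name, and the INFO level are all assumptions made for the example; they are not taken from daffy, and a SimpleNamespace stands in for the DataFrame.

```python
import functools
import logging
from types import SimpleNamespace

logger = logging.getLogger("df_log_sketch")  # hypothetical logger name


def describe_frame(df, include_dtypes=False):
    """Hypothetical helper: format the column names, optionally with dtypes."""
    text = f"columns: {list(df.columns)}"
    if include_dtypes:
        dtypes = [str(df.dtypes[column]) for column in df.columns]
        text += f" with dtypes {dtypes}"
    return text


def log_frames(include_dtypes=False):
    """Hypothetical logging decorator for a function taking and returning a frame."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(df, *args, **kwargs):
            logger.info(
                "Function %s parameters contained a DataFrame: %s",
                func.__name__,
                describe_frame(df, include_dtypes),
            )
            result = func(df, *args, **kwargs)
            logger.info(
                "Function %s returned a DataFrame: %s",
                func.__name__,
                describe_frame(result, include_dtypes),
            )
            return result

        return wrapper

    return decorator


# Stand-in for a DataFrame with two columns (assumption for the sketch).
cars = SimpleNamespace(
    columns=["Brand", "Price"],
    dtypes={"Brand": "object", "Price": "int64"},
)


@log_frames(include_dtypes=True)
def filter_cars(car_df):
    return car_df
```

The messages only appear once logging is configured at a level that includes INFO, for example with logging.basicConfig(level=logging.INFO).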

Contributing

Contributions are accepted. Include tests in PRs.

Development

To run the tests, clone the repository, install dependencies with Poetry, and run the tests with pytest:

poetry install
poetry shell
pytest

To enable linting on each commit, run pre-commit install. After that, every commit will be checked with isort, black and flake8.

License

MIT

Changelog

0.4.2

  • Added docstrings for the decorators
  • Fix import of @df_log

0.4.1

  • Add include_dtypes parameter for @df_log.
  • Fix handling of empty signature with @df_in.

0.4.0

  • Added @df_log for logging.
  • Improved assertion messages.

0.3.0

  • Added type hints.

0.2.1

  • Added PyPI classifiers.

0.2.0

  • Fixed decorator usage.
  • Added functools.wraps.

0.1.0

  • Initial release.
