
Structural type checking for Pandas data frames.


Pandas Type Checks


A Python library providing means for structural type checking of Pandas data frames and series:

  • A decorator pandas_type_check for specifying and checking the structure of Pandas DataFrame and Series arguments and return values of a function.
  • Support for "non-strict" type checking. In this mode, data frames can contain columns that are not part of the type specification against which they are checked. Non-strict type checking in that sense allows a form of structural subtyping for data frames.
  • Configuration options to raise exceptions for type errors or alternatively log them.
  • Configuration option to globally enable/disable the type checks. This allows users to enable type checking only in specific environments, e.g. in testing environments.
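The difference between strict and non-strict checking can be sketched in plain pandas. The helper `column_type_errors` below is hypothetical and not part of pandas-type-checks; it assumes a structural check amounts to comparing a frame's column dtypes against a column-to-dtype mapping, and in strict mode additionally rejecting columns outside that mapping.

```python
import numpy as np
import pandas as pd


def column_type_errors(df: pd.DataFrame, spec: dict, strict: bool = False) -> list:
    """Hypothetical helper: list structural mismatches between df and spec."""
    errors = []
    for column, expected in spec.items():
        if column not in df.columns:
            errors.append(f"missing column {column!r}")
        elif df[column].dtype != np.dtype(expected):
            errors.append(
                f"column {column!r}: expected {np.dtype(expected)}, found {df[column].dtype}"
            )
    if strict:
        # Strict mode: columns outside the specification are errors too.
        for column in df.columns:
            if column not in spec:
                errors.append(f"unexpected column {column!r}")
    return errors


spec = {"B": "int64", "C": "bool"}
frame = pd.DataFrame({
    "A": np.array([1.0, 2.0]),
    "B": np.array([1, 2], dtype="int64"),
    "C": np.array([True, False]),
})

print(column_type_errors(frame, spec))               # → []
print(column_type_errors(frame, spec, strict=True))  # → ["unexpected column 'A'"]
```

In non-strict mode the extra column A is tolerated, which is what makes a frame with more columns usable wherever a frame with fewer columns is specified.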

This library focuses on providing utilities to check the structure (i.e. columns and their types) of Pandas data frames and series arguments and return values of functions. For checking individual data frame and series values, including formulating more sophisticated constraints on column values, Pandera is a great alternative.

Installation

Packages for all released versions are available at the Python Package Index (PyPI) and can be installed with pip:

pip install pandas-type-checks

The library can also be installed with support for additional functionality:

pip install pandas-type-checks[pandera] # Support for Pandera data frame and series schemas

Usage Example

The function filter_rows_and_remove_column is annotated with type check hints for the Pandas DataFrame and Series arguments and return value of the function:

import pandas as pd
import numpy as np
import pandas_type_checks as pd_types

@pd_types.pandas_type_check(
    pd_types.DataFrameArgument('data', {
        'A': np.dtype('float64'),
        'B': np.dtype('int64'),
        'C': np.dtype('bool')
    }),
    pd_types.SeriesArgument('filter_values', 'int64'),
    pd_types.DataFrameReturnValue({
        'B': np.dtype('int64'),
        'C': np.dtype('bool')
    })
)
def filter_rows_and_remove_column(data: pd.DataFrame, filter_values: pd.Series) -> pd.DataFrame:
    return data[data['B'].isin(filter_values.values)].drop('A', axis=1)
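A decorator such as pandas_type_check has to map each named specification onto the actual call arguments, whether they are passed positionally or by keyword. The sketch below shows one way to do that with `inspect.signature`; the decorator `check_frame_argument` is hypothetical, handles a single DataFrame argument only, always raises TypeError, and is not the library's implementation.

```python
import functools
import inspect

import numpy as np
import pandas as pd


def check_frame_argument(name: str, spec: dict):
    """Hypothetical decorator: check the column dtypes of one DataFrame argument."""

    def decorator(func):
        signature = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Map positional and keyword arguments onto parameter names.
            bound = signature.bind(*args, **kwargs)
            bound.apply_defaults()
            frame = bound.arguments[name]

            problems = []
            for column, expected in spec.items():
                if column not in frame.columns:
                    problems.append(f"missing column {column!r}")
                elif frame[column].dtype != np.dtype(expected):
                    problems.append(f"column {column!r} has type {frame[column].dtype}")
            if problems:
                raise TypeError(f"Type error in argument {name!r}: " + "; ".join(problems))

            return func(*args, **kwargs)

        return wrapper

    return decorator


@check_frame_argument("data", {"B": "int64"})
def drop_a(data: pd.DataFrame) -> pd.DataFrame:
    return data.drop("A", axis=1)


ok = pd.DataFrame({"A": [1.0], "B": np.array([1], dtype="int64")})
print(drop_a(ok).columns.tolist())       # → ['B']
print(drop_a(data=ok).columns.tolist())  # keyword call is checked the same way
```

In this sketch, binding through the signature lets the name 'data' refer to the parameter regardless of how the caller passes it.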

Calling filter_rows_and_remove_column with a filter_values Series of the wrong type results in a TypeError exception with a detailed type error message:

test_data = pd.DataFrame({
    'A': pd.Series(1, index=list(range(4)), dtype='float64'),
    'B': np.array([1, 2, 3, 4], dtype='int64'),
    'C': np.array([True] * 4, dtype='bool')
})
test_filter_values_with_wrong_type = pd.Series([3, 4], dtype='int32')

filter_rows_and_remove_column(test_data, test_filter_values_with_wrong_type)
TypeError: Pandas type error in function 'filter_rows_and_remove_column'
Type error in argument 'filter_values':
	Expected Series of type 'int64' but found type 'int32'
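The error arises because int32 and int64 are distinct dtypes, even though both hold integers. A plain pandas sketch of the comparison, together with one way to make the argument conform by casting it with `Series.astype` before the call:

```python
import numpy as np
import pandas as pd

filter_values = pd.Series([3, 4], dtype="int32")

# int32 and int64 are different dtypes, so a check against 'int64' fails.
print(filter_values.dtype == np.dtype("int64"))  # → False

# Casting to the specified dtype makes the argument match the specification.
converted = filter_values.astype("int64")
print(converted.dtype == np.dtype("int64"))      # → True
```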

Calling filter_rows_and_remove_column with a data frame that has a wrong column type and a missing column results in a TypeError exception whose message lists the type errors for both the argument and the return value:

test_data_with_wrong_type_and_missing_column = pd.DataFrame({
    'A': pd.Series(1, index=list(range(4)), dtype='float64'),
    'B': np.array([1, 2, 3, 4], dtype='int32')
})
test_filter_values = pd.Series([3, 4], dtype='int64')

filter_rows_and_remove_column(test_data_with_wrong_type_and_missing_column, test_filter_values)
TypeError: Pandas type error in function 'filter_rows_and_remove_column'
Type error in argument 'data':
    Expected type 'int64' for column B' but found type 'int32'
    Missing column in DataFrame: 'C'
Type error in return value:
    Expected type 'int64' for column B' but found type 'int32'
    Missing column in DataFrame: 'C'
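The return value is reported as well because the function body only filters rows and drops column A, so the output inherits the int32 column B and still lacks column C. Running the same expression without the decorator in plain pandas makes this visible:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    "A": pd.Series(1, index=list(range(4)), dtype="float64"),
    "B": np.array([1, 2, 3, 4], dtype="int32"),
})
filter_values = pd.Series([3, 4], dtype="int64")

# Same expression as in filter_rows_and_remove_column, without the type check.
result = data[data["B"].isin(filter_values.values)].drop("A", axis=1)

print(result.dtypes.to_dict())  # column B keeps its int32 dtype
print(list(result.columns))     # → ['B']; column C is still missing
```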

Configuration

The global configuration object pandas_type_checks.config can be used to configure the behavior of the library:

  • config.enable_type_checks (bool): Flag for enabling/disabling type checks for specified arguments and return values. This flag can be used to globally enable or disable the type checker in certain environments.

    Default: True

  • config.strict_type_checks (bool): Flag for strict type check mode. If strict type checking is enabled, data frames cannot contain columns that are not part of the type specification against which they are checked. Non-strict type checking in that sense allows a form of structural subtyping for data frames.

    Default: False

  • config.log_type_errors (bool): Flag indicating that type errors for Pandas data frames or series values should be logged instead of raising a TypeError exception. Type errors are logged with log level ERROR.

    Default: False

  • config.logger (logging.Logger): Logger used for logging type errors when the log_type_errors flag is enabled. If no logger is specified in the configuration, a built-in default logger is used.
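How the enable and logging flags interact can be sketched with a stand-in configuration object. `TypeCheckConfig` and `report_type_errors` below are hypothetical: they only mirror some of the documented option names and defaults, while the real settings live on pandas_type_checks.config. In the sketch, nothing is reported when checks are disabled, and otherwise errors are either logged at level ERROR or raised as a TypeError.

```python
import logging
from dataclasses import dataclass, field


@dataclass
class TypeCheckConfig:
    """Hypothetical stand-in mirroring some documented options and defaults."""
    enable_type_checks: bool = True
    log_type_errors: bool = False
    logger: logging.Logger = field(
        default_factory=lambda: logging.getLogger("type_checks_sketch")
    )


def report_type_errors(config: TypeCheckConfig, errors: list) -> None:
    """Raise or log collected type errors, depending on the configuration."""
    if not config.enable_type_checks or not errors:
        return  # checks disabled globally, or nothing to report
    message = "Pandas type error\n" + "\n".join(errors)
    if config.log_type_errors:
        config.logger.error(message)  # logged with level ERROR instead of raising
    else:
        raise TypeError(message)


logging.basicConfig(level=logging.INFO)
# Log mode: the error is written to the logger and execution continues.
report_type_errors(TypeCheckConfig(log_type_errors=True), ["Missing column in DataFrame: 'C'"])
```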

Pandera Support

This library can be installed with additional support for Pandera:

pip install pandas-type-checks[pandera]

In this case, Pandera DataFrameSchema and SeriesSchema objects can be used as type specifications for data frame and series arguments and return values.

import pandas as pd
import pandera as pa
import numpy as np
import pandas_type_checks as pd_types

@pd_types.pandas_type_check(
    pd_types.DataFrameArgument('data',
                               pa.DataFrameSchema({
                                 'A': pa.Column(np.dtype('float64'), checks=pa.Check.le(10.0)),
                                 'B': pa.Column(np.dtype('int64'), checks=pa.Check.lt(2)),
                                 'C': pa.Column(np.dtype('bool'))
                               })),
    pd_types.SeriesArgument('filter_values', 'int64'),
    pd_types.DataFrameReturnValue({
        'B': np.dtype('int64'),
        'C': np.dtype('bool')
    })
)
def filter_rows_and_remove_column(data: pd.DataFrame, filter_values: pd.Series) -> pd.DataFrame:
    return data[data['B'].isin(filter_values.values)].drop('A', axis=1)
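Unlike the plain dtype mapping, the Pandera schema above also constrains values: pa.Check.le(10.0) requires every value in column A to be at most 10.0, and pa.Check.lt(2) requires every value in column B to be less than 2. Writing the same conditions in plain pandas (an illustration only, not how Pandera evaluates them) shows that the test_data frame from the usage example has the right structure but violates the constraint on B:

```python
import numpy as np
import pandas as pd

test_data = pd.DataFrame({
    "A": pd.Series(1, index=list(range(4)), dtype="float64"),
    "B": np.array([1, 2, 3, 4], dtype="int64"),
    "C": np.array([True] * 4, dtype="bool"),
})

# Value conditions corresponding to the schema's checks.
a_ok = bool((test_data["A"] <= 10.0).all())  # pa.Check.le(10.0) on column A
b_ok = bool((test_data["B"] < 2).all())      # pa.Check.lt(2) on column B

print(a_ok, b_ok)                                         # → True False
print(test_data.loc[test_data["B"] >= 2, "B"].tolist())  # → [2, 3, 4]
```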

Download files


Source Distribution

pandas_type_checks-1.1.3.tar.gz (14.4 kB)

Built Distribution

pandas_type_checks-1.1.3-py3-none-any.whl (11.1 kB)

File details

Details for the file pandas_type_checks-1.1.3.tar.gz.

File metadata

  • Download URL: pandas_type_checks-1.1.3.tar.gz
  • Size: 14.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.5

File hashes

Hashes for pandas_type_checks-1.1.3.tar.gz
Algorithm Hash digest
SHA256 02127fb0b85caf681eb31e1293bf110c558888abf39c3891ceb2bfaca0e50fee
MD5 a5aa8fcd91cd85e137893bb84791586d
BLAKE2b-256 f398e50baa275200cd86bbaa6eb761de96a23bfd2d5de6686727ec89098e2157


File details

Details for the file pandas_type_checks-1.1.3-py3-none-any.whl.

File hashes

Hashes for pandas_type_checks-1.1.3-py3-none-any.whl
Algorithm Hash digest
SHA256 05101d590f7f2feac9109d2967a32a740747531af1f5b8f7bcea1d3cd1aeeed7
MD5 dd7266664c7facca57cc3272de5b5247
BLAKE2b-256 ece992b77a8d9e2a83e6f074b659d9794def1649d14b271dcbe3145f8def2f5a

