
Define input and output columns for functions working on pandas dataframes.

Project description

pandas-contract

Provides decorators that check function arguments and return values that are pandas DataFrames.

The decorators use the pandera library (pandera.io) to validate the data types and constraints of a function's input arguments and output values.

Documentation

Full documentation is available at https://pandas-contract.readthedocs.io/en/latest/

Installation

pip install pandas-contract

Usage

ℹ️ Info: The examples use the standard abbreviations for the package imports:

import pandas as pd
import pandas_contract as pc
import pandera as pa

Setup

See Setup for first-time setup information.

Check Dataframe structure

The following defines a function that takes a DataFrame with a column 'x' of type integer as input and returns a DataFrame with the column 'x' of type string as output.

See pandera.io for the full documentation of schema definitions.

import pandas as pd
import pandas_contract as pc
import pandera as pa

@pc.argument("df", schema=pa.DataFrameSchema({"x": pa.Column(pa.Int)}))
@pc.result(schema=pa.DataFrameSchema({"x": pa.Column(pa.String)}))
def col_x_to_string(df: pd.DataFrame) -> pd.DataFrame:
    """Convert column x to string"""
    return df.assign(x=df["x"].astype(str))

Retrieve dataframes from a more complex argument

Sometimes the dataframe is not passed directly as an argument (or returned directly) but is part of a more complex object. In this case, use the decorator argument key to specify where the dataframe is located within that object.

If key is a callable, it is called with the argument and its return value is used as the dataframe. Otherwise, key is used to retrieve the dataframe from the argument, i.e. arg[key].
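The lookup rule for key can be sketched in a few lines. extract_frame is a hypothetical helper written only to illustrate the rule as stated; it is not pandas-contract's code.

```python
from typing import Any

import pandas as pd


def extract_frame(value: Any, key: Any) -> pd.DataFrame:
    """Hypothetical helper (not pandas-contract's code) mirroring the key rule."""
    if callable(key):
        # Callable key: call it with the whole argument or result.
        return key(value)
    # Any other key: plain item access, i.e. value[key].
    return value[key]


frame = pd.DataFrame({"x": [1, 2]})
from_dict = extract_frame({"data": frame}, "data")  # like key="data"
from_list = extract_frame([frame, None], 0)  # like key=0
from_attr = extract_frame(type("Out", (), {"foo": frame})(), lambda out: out.foo)
```

With this rule, a callable that is meant to be used as a dictionary key gets called instead of looked up, which is the pitfall discussed below.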

Dataframe result is wrapped within another object

import pandas as pd
import pandas_contract as pc

@pc.result(key="data")
def into_dict():
    """Dataframe wrapped in a dict"""
    return dict(data=pd.DataFrame())


@pc.result(key=0)
def into_list():
    """Dataframe wrapped in a list"""
    return [pd.DataFrame(), ...]


@pc.result(key=lambda out: out.foo)
def into_object():
    """Dataframe wrapped in an object"""
    class Out:
        foo = pd.DataFrame()
    # result.foo holds the dataframe
    return Out()

Note: if the dictionary key you need is itself a callable (for example a function), wrap the lookup in a lambda. Otherwise the key is treated as a callable and called with the result instead of being used for lookup:

import pandas as pd
import pandas_contract as pc
import pandera as pa

def f1():
    ...

# Get the dataframe from the output item `f1`.
# @pc.result(key=f1, schema=pa.DataFrameSchema({"name": pa.String}))  - this will fail
@pc.result(key=lambda res: res[f1], schema=pa.DataFrameSchema({"name": pa.String}))
def return_generators():
    # f1 is the key under which the dictionary holds the dataframe to be checked.
    return {
        f1: pd.DataFrame([{"name": "f1"}])
    }
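Since any callable that performs the lookup should do, operator.itemgetter from the standard library is a possible alternative to the lambda. The sketch below only checks, in plain Python, that both callables pull the same dataframe out of the result dict; that pc.result accepts key=itemgetter(f1) in the same way is an assumption based on the rule that a callable key is called with the result.

```python
from operator import itemgetter

import pandas as pd


def f1():
    ...


# Shape of the value returned by return_generators(): a dict keyed by f1.
result = {f1: pd.DataFrame([{"name": "f1"}])}

# Both callables receive the result and look up the item stored under f1.
by_lambda = (lambda res: res[f1])(result)
by_itemgetter = itemgetter(f1)(result)

# Using f1 itself as key would mean calling f1(result); f1 takes no
# arguments, so that raises TypeError instead of finding the dataframe.
try:
    f1(result)
    called_ok = True
except TypeError:
    called_ok = False
```

Choosing between the lambda and itemgetter is a matter of taste; the lambda makes the lookup explicit at the decorator site.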

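Dynamic arguments and return values

Required columns can also be specified dynamically, from the values of other function arguments, using pc.from_arg. In the example below, the column named by the argument col must exist in df, and the same column must be of type string in the result.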

import pandas as pd
import pandas_contract as pc
import pandera as pa

@pc.argument("df", schema=pa.DataFrameSchema({pc.from_arg("col"): pa.Column()}))
@pc.result(schema=pa.DataFrameSchema({pc.from_arg("col"): pa.Column(pa.String)}))
def col_to_string(df: pd.DataFrame, col: str) -> pd.DataFrame:
    return df.assign(**{col: df[col].astype(str)})
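Conceptually, pc.from_arg("col") acts as a placeholder that is replaced by whatever value the caller passes for col. The sketch below shows one way such a placeholder could be resolved with inspect.signature; FromArg and resolve_columns are hypothetical names for illustration and do not describe pandas-contract's actual implementation.

```python
import inspect
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)  # frozen makes instances hashable, so usable as dict keys
class FromArg:
    """Hypothetical placeholder: take the column name from argument `name`."""
    name: str


def resolve_columns(columns: dict, func: Callable, *args: Any, **kwargs: Any) -> dict:
    """Replace FromArg keys with the values bound to that argument in this call."""
    bound = inspect.signature(func).bind(*args, **kwargs)
    bound.apply_defaults()
    resolved = {}
    for key, spec in columns.items():
        if isinstance(key, FromArg):
            key = bound.arguments[key.name]
        resolved[key] = spec
    return resolved


def col_to_string(df, col):
    return df.assign(**{col: df[col].astype(str)})


spec = {FromArg("col"): "required"}
print(resolve_columns(spec, col_to_string, None, "price"))  # {'price': 'required'}
```

Binding against the signature means the name is found whether col is passed positionally or as a keyword.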

Multiple columns in function argument

pc.from_arg also works when the referenced argument holds a list of column names; each listed column is then checked.

import pandas as pd
import pandas_contract as pc
import pandera as pa

@pc.argument("df", schema=pa.DataFrameSchema({pc.from_arg("cols"): pa.Column()}))
@pc.result(schema=pa.DataFrameSchema({pc.from_arg("cols"): pa.Column(pa.String)}))
def cols_to_string(df: pd.DataFrame, cols: list[str]) -> pd.DataFrame:
    return df.assign(**{col: df[col].astype(str) for col in cols})



Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pandas_contract-0.7.0.tar.gz (94.1 kB)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

pandas_contract-0.7.0-py3-none-any.whl (20.5 kB)

Uploaded Python 3

File details

Details for the file pandas_contract-0.7.0.tar.gz.

File metadata

  • Download URL: pandas_contract-0.7.0.tar.gz
  • Size: 94.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for pandas_contract-0.7.0.tar.gz
Algorithm    Hash digest
SHA256       ba65d4879927c8c95e6dda81b7ab54ede1d55a9247ff81687481d38d346c48af
MD5          bd502ab03087e5bfba84e9f0492d62bf
BLAKE2b-256  650ae76dcd1018f3019219748b4780946775cea4292af2fdeb361c1c7d418527


Provenance

The following attestation bundles were made for pandas_contract-0.7.0.tar.gz:

Publisher: python.yml on schollm/pandas-contract

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file pandas_contract-0.7.0-py3-none-any.whl.


File hashes

Hashes for pandas_contract-0.7.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       d044e53bc8dc0052c1172861065502b071b13aff8ad00ad41516f2f01afbf958
MD5          4ea34207053053a0365ccffa16c57fe6
BLAKE2b-256  463e150f72999d138d00df417d7d6bed9018ebf121aa919faff12b0385177b8b


Provenance

The following attestation bundles were made for pandas_contract-0.7.0-py3-none-any.whl:

Publisher: python.yml on schollm/pandas-contract

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
