
pecking identifies the set of lowest-ranked groups and set of highest-ranked groups in a dataset using nonparametric statistical tests


Install

python3 -m pip install pecking

Example Usage

>>> import pecking
>>> samples = [[1, 2, 3, 4, 5], [2, 3, 4, 4, 4], [8, 9, 7, 6, 4]]
>>> labels = ['Group 1', 'Group 2', 'Group 3']
>>> pecking.skim_highest(samples, labels)
['Group 3']

import functools

from matplotlib import pyplot as plt
import pecking
import seaborn as sns

# Box plots of passenger age by "who" (man/woman/child) and ticket class,
# with one column facet per survival outcome.
g = pecking.peckplot(
    sns.load_dataset("titanic"),
    score="age",
    x="who",
    y="age",
    hue="class",
    col="survived",
    legend_kws=dict(prop={"size": 8}, bbox_to_anchor=(0.88, 0.5)),
    # Bind test settings with functools.partial; the titanic data has
    # missing ages, hence nan_policy="omit".
    skimmers=(
        functools.partial(
            pecking.skim_highest, alpha=0.05, min_obs=8, nan_policy="omit"
        ),
        functools.partial(
            pecking.skim_lowest, alpha=0.05, min_obs=8, nan_policy="omit"
        ),
    ),
    skim_labels=["Oldest", "Youngest"],
    palette=sns.color_palette("tab10")[:3],
)
assert g is not None

# Overlay the individual observations on every facet.
g.map_dataframe(
    sns.stripplot,
    x="who",
    y="age",
    hue="class",
    s=2,
    color="black",
    dodge=True,
    jitter=0.3,
)

plt.show()

[Figure: Example Plot]

API

See function docstrings for full parameter and return value descriptions.

pecking.skim_lowest/pecking.skim_highest

Direct interface to the underlying statistical tests.

def skim_highest(
    samples: typing.Sequence[typing.Sequence[float]],
    labels: typing.Optional[typing.Sequence[typing.Union[str, int]]] = None,
    alpha: float = 0.05,
) -> typing.List[typing.Union[str, int]]:
    """Identify the set of highest-ranked groups that are statistically
    indistinguishable amongst themselves based on a Kruskal-Wallis H-test
    followed by multiple Mann-Whitney U-tests."""

def skim_lowest(
    samples: typing.Sequence[typing.Sequence[float]],
    labels: typing.Optional[typing.Sequence[typing.Union[str, int]]] = None,
    alpha: float = 0.05,
) -> typing.List[typing.Union[str, int]]:
    """Identify the set of lowest-ranked groups that are statistically
    indistinguishable amongst themselves based on a Kruskal-Wallis H-test
    followed by multiple Mann-Whitney U-tests."""
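The docstrings describe a two-stage procedure: a Kruskal-Wallis H-test across all groups, followed by pairwise Mann-Whitney U-tests. The stdlib-only sketch below illustrates that idea; it is not pecking's implementation. Its assumptions: p-values come from large-sample approximations (chi-square for H, normal with tie and continuity corrections for U), there is no multiple-comparison correction, and the "highest" set is taken to be the top mean-rank group plus every group whose two-sided U-test against it is not significant at alpha. skim_highest_sketch and its helpers are hypothetical names.

```python
import math
from collections import Counter
from typing import List, Sequence, Union

Label = Union[str, int]


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start  # advance `end` across the run of values tied with `start`
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def tie_correction(values: Sequence[float]) -> float:
    """1 - sum(t^3 - t) / (n^3 - n), with t the size of each run of ties."""
    n = len(values)
    return 1 - sum(t**3 - t for t in Counter(values).values()) / (n**3 - n)


def chi2_sf(x: float, df: int) -> float:
    """Chi-square upper-tail probability from closed forms (integer df >= 1)."""
    x = max(x, 0.0)  # guard against tiny negative H from rounding
    if df % 2 == 0:  # exp(-x/2) * sum over k < df/2 of (x/2)^k / k!
        term = total = 1.0
        for k in range(1, df // 2):
            term *= x / 2 / k
            total += term
        return math.exp(-x / 2) * total
    term, total = 1.0, 0.0  # odd df: erfc term plus a finite series
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        total += term
    tail = math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * total
    return math.erfc(math.sqrt(x / 2)) + tail


def mean_ranks(samples: Sequence[Sequence[float]]) -> List[float]:
    """Mean rank of each group within the pooled data."""
    ranks = average_ranks([v for s in samples for v in s])
    means, start = [], 0
    for s in samples:
        means.append(sum(ranks[start:start + len(s)]) / len(s))
        start += len(s)
    return means


def kruskal_p(samples: Sequence[Sequence[float]]) -> float:
    """Kruskal-Wallis H-test p-value, tie-corrected, chi-square approximation."""
    pooled = [v for s in samples for v in s]
    n = len(pooled)
    weighted = sum(len(s) * m**2 for s, m in zip(samples, mean_ranks(samples)))
    h = 12 / (n * (n + 1)) * weighted - 3 * (n + 1)
    corr = tie_correction(pooled)
    return 1.0 if corr == 0 else chi2_sf(h / corr, len(samples) - 1)


def mann_whitney_p(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sided Mann-Whitney U p-value: normal approx. with tie/continuity fixes."""
    pooled = list(a) + list(b)
    n1, n2 = len(a), len(b)
    u1 = sum(average_ranks(pooled)[:n1]) - n1 * (n1 + 1) / 2
    var = n1 * n2 * (n1 + n2 + 1) / 12 * tie_correction(pooled)
    if var == 0:
        return 1.0  # every value identical: no evidence of a difference
    z = max(abs(u1 - n1 * n2 / 2) - 0.5, 0.0) / math.sqrt(var)
    return math.erfc(z / math.sqrt(2))


def skim_highest_sketch(
    samples: Sequence[Sequence[float]],
    labels: Sequence[Label],
    alpha: float = 0.05,
) -> List[Label]:
    """Hypothetical two-stage procedure, NOT pecking's implementation."""
    if kruskal_p(samples) >= alpha:
        return list(labels)  # omnibus test sees no difference: keep every group
    means = mean_ranks(samples)
    order = sorted(range(len(samples)), key=lambda i: -means[i])
    top = order[0]  # group with the highest mean rank
    return [
        labels[i]
        for i in order
        if i == top or mann_whitney_p(samples[top], samples[i]) >= alpha
    ]
```

On clearly separated data, for example samples [[1, 2, 3, 4, 5], [2, 3, 4, 5, 6], [20, 21, 22, 23, 24]] with labels A, B, C, the sketch returns ["C"]. On borderline data, pecking's own choices (exact versus approximate p-values, one- or two-sided tests, corrections, min_obs, nan_policy) may produce a different set.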

pecking.mask_skimmed_rows

Tidy-data interface to calculate the results of skim_lowest/skim_highest among row groups in a DataFrame.

def mask_skimmed_rows(
    data: pd.DataFrame,
    score: str,
    groupby_inner: typing.Union[typing.Sequence[str], str],
    groupby_outer: typing.Union[typing.Sequence[str], str] = tuple(),
    skimmer: typing.Callable = skim_highest,
    **kwargs: dict,
) -> pd.Series:
    """Create a boolean mask for a DataFrame, identifying rows within
    significantly outstanding groups.

    This function applies a two-level grouping to the input DataFrame: an outer
    grouping ('groupby_outer') followed by an inner grouping ('groupby_inner').
    For each inner group, it uses a 'skimmer' function to determine which rows
    are part of significantly outstanding groups based on a specified 'score'
    column. Only inner groups within the same outer group are compared.

    Rows identified as members of significantly outstanding inner groups are
    marked True in the returned Series, while all others are marked False."""
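The two-level grouping in this docstring can be sketched without pandas. The hypothetical mask_skimmed_rows_sketch below takes a list of dict rows instead of a DataFrame and returns a list of booleans instead of a pd.Series. Its skimmer follows the (samples, labels) calling convention of skim_highest, but the top_median skimmer used here is a deliberately simple stand-in that runs no statistical test. Only the grouping and masking logic is meant to mirror the documented behavior.

```python
import statistics
from collections import defaultdict
from typing import Any, Callable, Dict, List, Sequence, Tuple

Row = Dict[str, Any]
Skimmer = Callable[[List[List[float]], List[Tuple]], List[Tuple]]


def mask_skimmed_rows_sketch(
    rows: Sequence[Row],
    score: str,
    groupby_inner: Sequence[str],
    groupby_outer: Sequence[str],
    skimmer: Skimmer,
) -> List[bool]:
    """Hypothetical stdlib analogue of pecking.mask_skimmed_rows."""
    # outer key -> inner key -> indices of the rows in that inner group
    nested: Dict[Tuple, Dict[Tuple, List[int]]] = defaultdict(lambda: defaultdict(list))
    for idx, row in enumerate(rows):
        outer_key = tuple(row[c] for c in groupby_outer)
        inner_key = tuple(row[c] for c in groupby_inner)
        nested[outer_key][inner_key].append(idx)

    mask = [False] * len(rows)
    for inner_groups in nested.values():
        # the skimmer only ever sees inner groups that share one outer group
        labels = list(inner_groups)
        samples = [[rows[i][score] for i in inner_groups[key]] for key in labels]
        for key in skimmer(samples, labels):
            for i in inner_groups[key]:
                mask[i] = True
    return mask


def top_median(samples: List[List[float]], labels: List[Tuple]) -> List[Tuple]:
    """Stand-in skimmer (no statistical test): the group with the top median."""
    best = max(range(len(samples)), key=lambda i: statistics.median(samples[i]))
    return [labels[best]]


rows = [
    {"site": "north", "treatment": "a", "growth": 1.0},
    {"site": "north", "treatment": "a", "growth": 2.0},
    {"site": "north", "treatment": "b", "growth": 5.0},
    {"site": "south", "treatment": "a", "growth": 9.0},
    {"site": "south", "treatment": "b", "growth": 3.0},
]
print(mask_skimmed_rows_sketch(rows, "growth", ["treatment"], ["site"], top_median))
# [False, False, True, True, False]
print(mask_skimmed_rows_sketch(rows, "growth", ["treatment"], [], top_median))
# [False, False, True, False, True]
```

With site as the outer grouping, each site picks its own winning treatment. Without it, all rows are pooled and treatment b wins overall, so the masks differ.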

pecking.peckplot

Wraps seaborn.catplot to add hatched backgrounds behind the best and worst groups within each row/col facet. Comparison scope and pooling can be controlled with the *_group parameters (x_group, y_group, hue_group, col_group, row_group).

def peckplot(
    data: pd.DataFrame,
    score: str,
    x: typing.Optional[str] = None,
    y: typing.Optional[str] = None,
    hue: typing.Optional[str] = None,
    col: typing.Optional[str] = None,
    row: typing.Optional[str] = None,
    x_group: typing.Literal["inner", "outer", "ignore"] = "inner",
    y_group: typing.Literal["inner", "outer", "ignore"] = "inner",
    hue_group: typing.Literal["inner", "outer", "ignore"] = "inner",
    col_group: typing.Literal["inner", "outer", "ignore"] = "outer",
    row_group: typing.Literal["inner", "outer", "ignore"] = "outer",
    skimmers: typing.Sequence[typing.Callable] = (
        functools.partial(skim_highest, alpha=0.05),
        functools.partial(skim_lowest, alpha=0.05),
    ),
    skim_hatches: typing.Sequence[str] = ("*", "O.", "xx", "++"),
    skim_labels: typing.Sequence[str] = ("Best", "Worst"),
    skim_title: typing.Optional[str] = "Rank",
    orient: typing.Literal["v", "h"] = "v",
    **kwargs: dict,
) -> sns.FacetGrid:
    """Boxplot the distribution of a score across various categories,
    highlighting the best (and/or worst) performing groups.

    Uses nonparametric `skim_highest`/`skim_lowest` to distinguish the sets of
    groups with statistically indistinguishable highest/lowest scores. Uses
    `backstrip`'s `backplot` to add hatched backgrounds behind the best and
    worst groups."""
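One plausible reading of the *_group parameters is that variables set to "inner" are compared against each other, "outer" variables define separate comparison scopes (matching the default of comparing within each row/col facet), and "ignore" variables are pooled. This reading is inferred from the defaults and from mask_skimmed_rows' inner/outer grouping, not taken from pecking's source. The hypothetical split_group_keys helper below turns such settings into groupby_inner and groupby_outer column lists. It also assumes that the score axis (y="age" in the titanic example) does not define groups.

```python
from typing import Dict, List, Literal, Optional, Tuple

Scope = Literal["inner", "outer", "ignore"]


def split_group_keys(
    score: str,
    variables: Dict[str, Optional[str]],
    scopes: Dict[str, Scope],
) -> Tuple[List[str], List[str]]:
    """Hypothetical mapping from peckplot-style *_group settings to
    (groupby_inner, groupby_outer) column lists."""
    inner: List[str] = []
    outer: List[str] = []
    for role, column in variables.items():
        # unused roles and the score axis itself do not define groups
        if column is None or column == score:
            continue
        scope = scopes[role]
        if scope == "inner":
            inner.append(column)  # compared against each other
        elif scope == "outer":
            outer.append(column)  # separate comparison per value
        elif scope != "ignore":  # "ignore": pooled, not used for grouping
            raise ValueError(f"unknown scope {scope!r} for role {role!r}")
    return inner, outer


# Roles and defaults as in the peckplot signature and the titanic example.
variables = {"x": "who", "y": "age", "hue": "class", "col": "survived", "row": None}
defaults = {"x": "inner", "y": "inner", "hue": "inner", "col": "outer", "row": "outer"}
print(split_group_keys("age", variables, defaults))
# (['who', 'class'], ['survived'])
```

Under this reading, setting hue_group="outer" would compare the "who" groups separately within each ticket class, while hue_group="ignore" would pool the classes together.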

Citing

If pecking contributes to a scientific publication, please cite it as

Matthew Andres Moreno. (2024). mmore500/pecking. Zenodo. https://doi.org/10.5281/zenodo.10701185

@software{moreno2024pecking,
  author = {Matthew Andres Moreno},
  title = {mmore500/pecking},
  month = feb,
  year = 2024,
  publisher = {Zenodo},
  doi = {10.5281/zenodo.10701185},
  url = {https://doi.org/10.5281/zenodo.10701185}
}

Consider also citing matplotlib, seaborn, and SciPy. And don't forget to leave a star on GitHub!


Download files


Source Distribution

pecking-0.2.2.tar.gz (12.2 kB)

Uploaded Source

Built Distribution

pecking-0.2.2-py2.py3-none-any.whl (12.9 kB)

Uploaded Python 2 Python 3

File details

Details for the file pecking-0.2.2.tar.gz.

File metadata

  • Download URL: pecking-0.2.2.tar.gz
  • Upload date:
  • Size: 12.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.12.2

File hashes

Hashes for pecking-0.2.2.tar.gz
Algorithm Hash digest
SHA256 6885168f3969a2f1dfbbf51e850207fed37eca9e7d0f46adb28c10b535169155
MD5 8eced8d6c6be08022b5379c3a1711932
BLAKE2b-256 dcb36a5d610a540c1dcf75c67c2afdb76070e09bdc488d09ccf82bfdeda15766
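A downloaded archive can be checked against the published SHA256 digest with Python's standard hashlib. The minimal sketch below reads the file in chunks and compares hex digests. The helper names sha256_of and matches are illustrative, and the usage line assumes the source distribution was saved to the current directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hex SHA256 digest of a file, read in chunks to keep memory use flat."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: Path, expected_hex: str) -> bool:
    """True if the file's SHA256 digest equals the published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()


# SHA256 digest listed for pecking-0.2.2.tar.gz
SDIST_SHA256 = "6885168f3969a2f1dfbbf51e850207fed37eca9e7d0f46adb28c10b535169155"

# Usage, assuming the archive was downloaded into the current directory:
# print(matches(Path("pecking-0.2.2.tar.gz"), SDIST_SHA256))
```

pip can perform the same check itself in hash-checking mode (--require-hashes), with each accepted digest listed in a requirements file as --hash=sha256:<digest>.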


File details

Details for the file pecking-0.2.2-py2.py3-none-any.whl.

File metadata

  • Download URL: pecking-0.2.2-py2.py3-none-any.whl
  • Upload date:
  • Size: 12.9 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.12.2

File hashes

Hashes for pecking-0.2.2-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 10326e4aa8a28f587b511dbf5df4fa5ab4bda8f59f03b4c8fd704247441c3447
MD5 02bd2e7b099dc413c39b8665d567261e
BLAKE2b-256 8157d3d97e6e7fd759f661994f2ba1bee576303f9c33f412e05797fb4bb1e980

