Python port of TheseusPlot for decomposing differences in rate metrics.

Maintainers

hoxo_m

Project description

TheseusPlot: Visualizing Decomposition of Differences in Rate Metrics

1. Overview

In data analysis, when a metric differs between two groups, we sometimes want to investigate whether a particular subgroup is driving that difference. For example, when a key metric has declined compared to the previous year, you may want to analyze the decline in more detail. You might focus on gender among the attributes and examine whether the decline occurred among men, women, or both. However, this type of analysis is challenging when the metric is a rate, because, unlike a volume metric, a rate does not split into subgroup contributions that simply add up.

To address this issue, we propose an approach inspired by the story of the Ship of Theseus. This approach involves gradually replacing the components of one group with those of another, recalculating the metric at each step. The change in the metric at each step can then be interpreted as the contribution of each subgroup to the overall difference.

For instance, suppose the metric was 6.2% in 2024 and decreased to 5.2% in 2025. Again, we focus on gender. We replace the male data within the 2024 dataset with the male data from 2025 and recalculate the metric. As a result, the metric would drop by 0.8 percentage points, reaching 5.4%. In this case, the contribution of the male group to the change in the metric is -0.8 percentage points. Next, we replace the female data from 2024 with that from 2025. The dataset then consists entirely of 2025 data, and the metric drops by 0.2 percentage points, reaching 5.2%. Thus, the contribution of the female group is -0.2 percentage points.
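The replacement procedure described above can be sketched in plain Python. This is an illustration only, not the package's implementation: the function name `stepwise_contributions`, the toy data, and the fixed replacement order are all assumptions made for this example, and the package may order or combine the replacement steps differently.

```python
def stepwise_contributions(before, after, order):
    """Swap subgroups from `before` to `after` one at a time.

    `before` and `after` map a subgroup name to a list of 0/1 outcomes.
    Returns {subgroup: change in the overall rate caused by its swap}.
    """
    current = {group: list(values) for group, values in before.items()}

    def overall_rate(groups):
        outcomes = [v for values in groups.values() for v in values]
        return sum(outcomes) / len(outcomes)

    contributions = {}
    previous = overall_rate(current)
    for group in order:
        # Replace one "plank" of the ship with the one from the other group.
        current[group] = list(after.get(group, []))
        rate = overall_rate(current)
        contributions[group] = rate - previous
        previous = rate
    return contributions


# Hypothetical toy data: 1 = success, 0 = failure.
before = {"male": [1, 1, 1, 0], "female": [1, 0, 0, 0]}  # overall rate 0.5
after = {"male": [1, 0, 0, 0], "female": [0, 0, 0, 0]}   # overall rate 0.125

contrib = stepwise_contributions(before, after, order=["male", "female"])
print(contrib)
# {'male': -0.25, 'female': -0.125}
```

Because each step starts where the previous one ended, the contributions always sum to the overall difference (here -0.375). Note that in a sketch like this the individual values can depend on the replacement order when subgroup sizes differ between the two groups.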

When visualized, the results appear as follows:

From this plot, we can see that the decline in the metric is primarily driven by the male group. We call this visualization the “Theseus Plot.”

The TheseusPlot package is designed to make it easy to generate Theseus Plots for various attributes.

2. Installation

You can install the theseusplot package from PyPI with:

python -m pip install theseusplot

You can install the optional dependencies for examples and documentation data with:

python -m pip install "theseusplot[examples]"

You can install the development version from GitHub with:

python -m pip install "git+https://github.com/hoxo-m/TheseusPlot_py.git"

3. Details

3.1 Prepare Data

To create Theseus plots, you need two data frames that share common columns.

We use the 2013 New York City flight data from nycflights13 as a demo dataset. Here, we will define the rate metric as the proportion of flights that arrived on time. In December 2013, the on-time arrival rate dropped substantially compared to November. We investigate the cause using a Theseus plot.

First, we create an on_time column in the data frame to indicate whether each flight arrived on time, defined here as an arrival delay of at most 15 minutes. Next, we extract the flights for November and December into separate data frames to form two comparison groups. The on-time arrival rate was 83% in November and dropped to 67% in December.

from nycflights13 import airlines, flights

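data = (
    # Drop flights without a recorded arrival delay (e.g. cancelled flights).
    flights.dropna(subset=["arr_delay"])
    # A flight counts as on time if it arrived at most 15 minutes late.
    .assign(on_time=lambda df: df["arr_delay"] <= 15)
    # Replace the carrier code with the full airline name.
    .merge(airlines, on="carrier")
    .assign(carrier=lambda df: df["name"])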
    .loc[
        :,
        [
            "year",
            "month",
            "day",
            "origin",
            "dest",
            "carrier",
            "dep_delay",
            "on_time",
        ],
    ]
)

print(data.head())
#>    year  month  day origin dest                 carrier  dep_delay  on_time
#> 0  2013      1    1    EWR  IAH   United Air Lines Inc.        2.0     True
#> 1  2013      1    1    LGA  IAH   United Air Lines Inc.        4.0    False
#> 2  2013      1    1    JFK  MIA  American Airlines Inc.        2.0    False
#> 3  2013      1    1    JFK  BQN         JetBlue Airways       -1.0     True
#> 4  2013      1    1    LGA  ATL    Delta Air Lines Inc.       -6.0     True

data_nov = data[data["month"] == 11]
data_dec = data[data["month"] == 12]

print(data_nov["on_time"].mean())
#> 0.8264802936487339
print(data_dec["on_time"].mean())
#> 0.6738712065136936

3.2 Basics

Using the two prepared data frames, we first create a ship object. The ship object is an instance of the Python class ShipOfTheseus, designed to create Theseus plots.

from theseusplot import create_ship

ship = create_ship(
    data_nov,
    data_dec,
    y="on_time",
    labels=("November", "December"),
)

You can create a Theseus plot by passing column names to the plot method of a ship object. For example, to create a Theseus plot for the airport of origin:

fig, ax = ship.plot("origin")
fig.show()

New York City has three major airports, and Newark Liberty International Airport (EWR) accounted for the largest share of the decline in the on-time arrival rate.

Note that the number of flights at each airport matters, as a larger flight volume is expected to have a greater impact. To make this clear, the Theseus plot displays the data size for each group within each subgroup as a bar chart. From this, we see that the number of flights is similar across airports, allowing for direct comparison of contributions.

In summary, a Theseus plot consists of two components:

A waterfall plot showing how much each subgroup contributed to the change in the metric.
A bar chart representing the sample size for each group within each subgroup.
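The waterfall component can be thought of as a running total: each bar starts at the level where the previous bar ended. The following minimal sketch computes those bar positions for the numbers from the overview example (6.2%, then -0.8 and -0.2 percentage points). The variable names are illustrative, and the package's own drawing code may be organized differently.

```python
from itertools import accumulate

start = 6.2  # metric for the first group, in percent (overview example)
steps = [("male", -0.8), ("female", -0.2)]  # contributions in percentage points

# Running level after each replacement: 6.2 -> 5.4 -> 5.2.
levels = list(accumulate((delta for _, delta in steps), initial=start))

# Each waterfall bar spans from the previous level to the new one.
bars = [
    {"label": label, "from": round(low, 1), "to": round(high, 1)}
    for (label, _), low, high in zip(steps, levels, levels[1:])
]

for bar in bars:
    print(bar)
# {'label': 'male', 'from': 6.2, 'to': 5.4}
# {'label': 'female', 'from': 5.4, 'to': 5.2}
```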

A ship object also provides the table method to inspect the exact values used in the Theseus plot.

ship.table("origin")
#>   origin   contrib    n1    n2    x1    x2     rate1     rate2
#> 0    EWR -0.071873  9603  9410  7995  5910  0.832552  0.628055
#> 1    JFK -0.050249  8645  8923  7290  6142  0.843262  0.688334
#> 2    LGA -0.030487  8723  8687  7006  6156  0.803164  0.708645
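As a quick consistency check, the values in this table can be recombined by hand. The sketch below copies the numbers from the output above and reads n1/n2 as row counts and x1/x2 as on-time counts; that reading is an interpretation consistent with rate1 = x1 / n1, not something the table itself states.

```python
# Values copied from the ship.table("origin") output above.
rows = [
    # origin, contrib, n1, n2, x1, x2
    ("EWR", -0.071873, 9603, 9410, 7995, 5910),
    ("JFK", -0.050249, 8645, 8923, 7290, 6142),
    ("LGA", -0.030487, 8723, 8687, 7006, 6156),
]

# Overall rates recovered from the per-airport counts.
rate_nov = sum(r[4] for r in rows) / sum(r[2] for r in rows)
rate_dec = sum(r[5] for r in rows) / sum(r[3] for r in rows)

# The contributions should add up to the overall change in the rate.
total_contrib = sum(r[1] for r in rows)

print(round(rate_nov, 4), round(rate_dec, 4))
# 0.8265 0.6739
print(round(rate_dec - rate_nov, 4), round(total_contrib, 4))
# -0.1526 -0.1526
```

The recovered rates match the November and December means printed in section 3.1, and the three contributions sum to the overall drop of about 15.3 percentage points.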

3.3 Flipping the Plot

When there are many subgroups, a Theseus plot can become hard to read. In such cases, you can swap the x- and y-axes for better visualization.

fig, ax = ship.plot_flip("carrier")
fig.show()

When the number of subgroups is large, those with small contributions are automatically grouped together. By default, this happens when there are more than 10 subgroups, but the threshold can be adjusted with the n argument.

fig, ax = ship.plot_flip("carrier", n=6)
fig.show()

From this plot, JetBlue Airways and United Air Lines appear to have the largest contributions to the decline in on-time arrival rate.
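To illustrate the idea of grouping small contributions, here is a hypothetical sketch in plain Python. The exact rule the package applies is not described here, so the helper `lump_small`, the "Other" label, and the choice to keep the n - 1 largest contributions by magnitude are assumptions made for this example only.

```python
def lump_small(contributions, n, other_label="Other"):
    """Keep the n - 1 largest contributions by magnitude and sum the rest.

    Assumed rule for illustration; not taken from the package.
    """
    if len(contributions) <= n:
        return dict(contributions)
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    kept = dict(ranked[: n - 1])
    kept[other_label] = sum(value for _, value in ranked[n - 1 :])
    return kept


# Hypothetical contributions in percentage points for subgroups A to F.
contrib = {"A": -4.0, "B": -0.5, "C": 2.5, "D": 0.25, "E": -1.5, "F": -0.25}

lumped = lump_small(contrib, n=4)
print(lumped)
# {'A': -4.0, 'C': 2.5, 'E': -1.5, 'Other': -0.5}
```

In this sketch the combined row is a plain sum, so the total across rows is unchanged by the grouping.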

3.4 Automatic Discretization of Continuous Values

Theseus plots do not directly support continuous variables. If a continuous column is provided, it is automatically discretized. For example, we can create a Theseus plot for departure delays.

fig, ax = ship.plot_flip("dep_delay")
fig.show()

By default, continuous variables are discretized so that each subgroup has roughly equal sample sizes, with the number of bins set to 10. You can modify these settings by passing the return value of continuous_config() to the continuous argument.

from theseusplot import continuous_config

fig, ax = ship.plot_flip("dep_delay", continuous=continuous_config(n=3))
fig.show()

This result shows that both a decrease in on-time departures and an increase in delayed departures contributed to the decline in on-time arrival rate.
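Equal-frequency discretization of the kind described above can be tried out directly with `pandas.qcut`. This is a minimal sketch on made-up delay values; whether the package uses `qcut` internally, and how it labels the bins, is not stated here.

```python
import pandas as pd

# Hypothetical departure delays in minutes.
dep_delay = pd.Series([-8, -5, -3, -1, 0, 2, 6, 15, 40, 90, 120, 180])

# q=3 asks for three bins that each hold roughly the same number of rows.
bins = pd.qcut(dep_delay, q=3)

# Count rows per bin, keeping the bins in ascending order.
counts = bins.value_counts(sort=False)
print(counts)
```

With these twelve distinct values, each of the three bins receives four rows. The bin edges follow the data, so the intervals have very different widths.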

3.5 Ordering for Factor Columns

If a subgroup column is categorical, table() and plot() respect its category order. This is useful when you want to keep a meaningful predefined order, such as "Low", "Medium", and "High", instead of ordering categories by their contributions.

import pandas as pd
from pandas.api.types import CategoricalDtype

segment_type = CategoricalDtype(
    categories=["Low", "Medium", "High"],
    ordered=True,
)

data1 = pd.DataFrame(
    {
        "segment": pd.Series(
            ["Low", "Low", "Medium", "Medium", "High", "High"],
            dtype=segment_type,
        ),
        "y": [1, 1, 1, 0, 1, 1],
    }
)

data2 = pd.DataFrame(
    {
        "segment": pd.Series(
            ["Low", "Low", "Medium", "Medium", "High", "High"],
            dtype=segment_type,
        ),
        "y": [1, 0, 1, 1, 0, 0],
    }
)

ship = create_ship(data1, data2, y="y", labels=("Group 1", "Group 2"))

print(ship.table("segment"))
#>   segment   contrib  n1  n2  x1  x2  rate1  rate2
#> 0     Low -0.166667   2   2   2   1    1.0    0.5
#> 1  Medium  0.166667   2   2   1   2    0.5    1.0
#> 2    High -0.333333   2   2   2   0    1.0    0.0

fig, ax = ship.plot("segment")
fig.show()

Even though "High" has the largest contribution in magnitude, the rows and bars are shown in the order "Low" -> "Medium" -> "High" because segment is categorical.

By contrast, if segment were an object column, the output would be ordered by contribution rather than by a predefined level order.
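If your subgroup column is currently stored as plain strings, you can give it a predefined order by converting it to an ordered categorical dtype before building the ship. The following sketch uses only pandas and a made-up data frame; it shows how the category order differs from the alphabetical order of the raw strings.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical data with the segment stored as plain strings (object dtype).
df = pd.DataFrame(
    {"segment": ["High", "Low", "Medium", "Low"], "y": [1, 0, 1, 1]}
)

# Sorting plain strings is alphabetical.
alphabetical = df.sort_values("segment")["segment"].tolist()

# Declare the meaningful order once and convert the column.
segment_type = CategoricalDtype(categories=["Low", "Medium", "High"], ordered=True)
df["segment"] = df["segment"].astype(segment_type)

# Sorting a categorical column follows the declared category order.
by_level = df.sort_values("segment")["segment"].tolist()

print(alphabetical)
# ['High', 'Low', 'Low', 'Medium']
print(by_level)
# ['Low', 'Low', 'Medium', 'High']
```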

Release history

0.1.2 (this version): May 9, 2026
0.1.1: May 9, 2026
0.1.0: May 9, 2026

Download files

Source Distribution

theseusplot-0.1.2.tar.gz (303.8 kB), uploaded May 9, 2026 (Source)

Built Distribution

theseusplot-0.1.2-py3-none-any.whl (15.2 kB), uploaded May 9, 2026 (Python 3)

File details

Details for the file theseusplot-0.1.2.tar.gz.

File metadata

Download URL: theseusplot-0.1.2.tar.gz
Upload date: May 9, 2026
Size: 303.8 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for theseusplot-0.1.2.tar.gz
Algorithm	Hash digest
SHA256	`23e49cae67d1d74a636bfd2d021b9c6ef9de3c88f528ac5d884cdb5d9fd2fc11`
MD5	`9d5eaf7066729472f883efeb651c010e`
BLAKE2b-256	`b71eeb7f13b778daa22850d463d1d626ba3dbfe744d6023635c8326351087417`

Provenance

The following attestation bundles were made for theseusplot-0.1.2.tar.gz:

Publisher: publish.yml on hoxo-m/TheseusPlot_py

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: theseusplot-0.1.2.tar.gz
- Subject digest: 23e49cae67d1d74a636bfd2d021b9c6ef9de3c88f528ac5d884cdb5d9fd2fc11
- Sigstore transparency entry: 1485825105
- Sigstore integration time: May 9, 2026
Source repository:
- Permalink: hoxo-m/TheseusPlot_py@65a9e065ac182e7ff8980c471bcc589153b7f942
- Branch / Tag: refs/tags/v0.1.2
- Owner: https://github.com/hoxo-m
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@65a9e065ac182e7ff8980c471bcc589153b7f942
- Trigger Event: release

File details

Details for the file theseusplot-0.1.2-py3-none-any.whl.

File metadata

Download URL: theseusplot-0.1.2-py3-none-any.whl
Upload date: May 9, 2026
Size: 15.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for theseusplot-0.1.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`21ce4656b1624654465dc499add5b6e493346acccbc97bb236ec95c6aac46479`
MD5	`1305eadbf531af6105fc2f9b9b4d0eef`
BLAKE2b-256	`17c8890ec7cc985befaeaf8b985ffce386be42f3010881dacf53334f602a04f0`

Provenance

The following attestation bundles were made for theseusplot-0.1.2-py3-none-any.whl:

Publisher: publish.yml on hoxo-m/TheseusPlot_py

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: theseusplot-0.1.2-py3-none-any.whl
- Subject digest: 21ce4656b1624654465dc499add5b6e493346acccbc97bb236ec95c6aac46479
- Sigstore transparency entry: 1485825124
- Sigstore integration time: May 9, 2026
Source repository:
- Permalink: hoxo-m/TheseusPlot_py@65a9e065ac182e7ff8980c471bcc589153b7f942
- Branch / Tag: refs/tags/v0.1.2
- Owner: https://github.com/hoxo-m
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@65a9e065ac182e7ff8980c471bcc589153b7f942
- Trigger Event: release
