Simple funnel plots
Funnel plot
Simple funnel plots for visualising sub-group variance.
This package provides simple funnel plots in Python, using Matplotlib. This lets you quickly see whether sub-groups of a population are outliers compared to the full population.
Two methods are provided:
- parametric funnelplot which uses a standard distribution to estimate the intervals of the funnel (usually a normal distribution)
- bootstrap funnelplot which uses bootstrapped percentiles to estimate the intervals of the funnel
A utility function, funnel(), is also provided. It makes it easy to plot data by grouping Pandas DataFrames through a Seaborn-like API.
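The parametric method can be pictured with a short calculation. This is a minimal sketch, not the package's implementation: it assumes a normal fit to the pooled data and limits that shrink with the square root of the group size. The function name parametric_funnel_limits and the sample values are hypothetical, and only the standard library is used so the sketch runs without funnelplot installed.

```python
import math
from statistics import NormalDist, fmean, pstdev


def parametric_funnel_limits(values, group_size, percentage=95.0):
    """Return (lower, upper) limits for the mean of a group of `group_size`
    values, assuming the pooled `values` are normally distributed."""
    # Two-sided interval: split the excluded probability between both tails.
    tail = (100.0 - percentage) / 200.0
    z = NormalDist().inv_cdf(1.0 - tail)
    centre = fmean(values)
    band = z * pstdev(values)
    # The band narrows as the group grows, which produces the funnel shape.
    half_width = band / math.sqrt(group_size)
    return centre - half_width, centre + half_width


# Made-up pooled data with mean 5 and population standard deviation 2.
pooled = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
for n in (1, 4, 16):
    lower, upper = parametric_funnel_limits(pooled, n)
    print(n, round(lower, 2), round(upper, 2))
```

A group whose mean falls outside the limits for its size would be flagged as an outlier under these assumptions.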
Example
Test performance data for California schools, from the Caschool dataset in pydataset.
funnel(df=data("Caschool"), x="testscr", group="county")
Install
pip install funnelplot
Examples
Full caschool example
# load some example data
import pandas as pd
import matplotlib.pyplot as plt
from pydataset import data
from funnelplot.core import funnel
# create a suitable axis
fig,ax = plt.subplots(figsize=(4,6))
ax.set_frame_on(False)
# funnel plot, using a 99.5% interval (0.25% -> 99.75%)
funnel(df=data("Caschool"), x="testscr", group="county", percentage=99.5, error_mode="data")
C:\Users\John\Dropbox\devel\funnelplot\funnelplot\core.py:14: RuntimeWarning: invalid value encountered in true_divide
return band / np.sqrt(group_size)
C:\Users\John\Dropbox\devel\funnelplot\funnelplot\core.py:14: RuntimeWarning: divide by zero encountered in true_divide
return band / np.sqrt(group_size)
# use bootstrap instead of normal fit
fig,ax = plt.subplots(figsize=(5,6))
ax.set_frame_on(False)
funnel(df=data("Caschool"), x='testscr', group="county", bootstrap_mode=True, error_mode="bootstrap")
Synthetic data example
## Synthetic data
import random
import numpy as np
import matplotlib.pyplot as plt
# assumed import path, matching `from funnelplot.core import funnel` above
from funnelplot.core import funnel_plot, funnel_plot_bootstrap

# fix both random number generators so the example is reproducible
random.seed(2020)
np.random.seed(2020)
groups = []
p_mean, p_std = 0, 1
# random groups, with different sizes, means and std. devs.
for i in range(25):
    n_group = np.random.randint(1, 80)
    g_std = np.random.uniform(0.1, 4.5)
    g_mean = np.random.uniform(-1.9, 0.5)
    groups.append(np.random.normal(p_mean + g_mean,
                                   p_std + g_std,
                                   n_group))
# plt.subplots() returns (figure, axes), in that order
fig, ax = plt.subplots(figsize=(9, 4))
# parametric version, with one random four-letter label per group
funnel_plot(
    groups,
    labels=[random.choice("abcdefg") * 4 for i in range(len(groups))],
    percentage=95,
)
fig, ax = plt.subplots(figsize=(9, 4))
# bootstrap version, using medians instead of means
funnel_plot_bootstrap(
    groups,
    labels=[random.choice("abcdefg") * 4 for i in range(len(groups))],
    percentage=95,
    stat=np.median,
)
API
- funnel(df, x, group, bootstrap_mode=False, **kwargs)
  Show a DataFrame df as a funnel plot, rendering column x and grouping the data by group.
  Parameters:
    df: DataFrame
      The data to be shown.
    x: string, column name
      The column of the frame to render as datapoints.
    group: string, column name
      The column to group the frame by.
    bootstrap_mode: boolean, optional (default False)
      If True, uses the funnel_plot_bootstrap() function; otherwise uses the parametric funnel_plot() function.
    **kwargs:
      Passed to funnel_plot() / funnel_plot_bootstrap().
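Conceptually, funnel() has to turn one column of values into one array per group before handing them to funnel_plot() or funnel_plot_bootstrap(). How the package does this is not shown here, so the following is only a sketch of that grouping step in plain Python. The helper name split_into_groups and the rows are made up for illustration; with a real DataFrame, df.groupby(group)[x] would yield the same per-group values.

```python
from collections import defaultdict


def split_into_groups(rows, x, group):
    """Return (labels, data_groups): one list of `x` values per distinct
    value of the `group` key, with labels in sorted order."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row[group]].append(row[x])
    labels = sorted(buckets)
    return labels, [buckets[label] for label in labels]


# Made-up rows standing in for a DataFrame with "county" and "testscr" columns.
rows = [
    {"county": "A", "testscr": 650.0},
    {"county": "B", "testscr": 640.5},
    {"county": "A", "testscr": 661.2},
]
labels, data_groups = split_into_groups(rows, x="testscr", group="county")
print(labels)       # group labels, one per entry of data_groups
print(data_groups)  # list of per-group value lists
```

The resulting data_groups list has the shape that funnel_plot() documents for its first argument, and labels lines up with it one to one.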
- funnel_plot(data_groups, ...)
  Plot a list of arrays as a funnel plot.
  Parameters:
    data_groups: list of 1D arrays
      The individual groups to be analysed.
    ax: axis, optional
      A Matplotlib axis to draw onto.
    dist: distribution function, like scipy.stats.norm(0,1)
      Distribution used to get the ppf and cdf for plotting.
    percentage: float, 0.0 -> 100.0 (default 95)
      Percentage of the interval enclosed (e.g. percentage=95 will enclose 2.5% to 97.5%).
    labels: list of strings, optional
      One label string per group; shown only for those groups that lie outside the funnel.
    left_color: matplotlib color, optional (default C1)
      Color to render points to the left of the funnel bounds (negative outliers).
    right_color: matplotlib color, optional (default C2)
      Color to render points to the right of the funnel bounds (positive outliers).
    error_mode: string, optional (default "data")
      For each outlier group, can show:
        "data": original data values for that group as a dot plot
        "none": no error bars
        "bootstrap": 95% bootstrap intervals, as lines
        "ci": 95% CI intervals, as lines
    show_rug: boolean, optional (default False)
      If True, show a rug plot at the bottom of the graph, for the whole group population.
    show_contours: boolean, optional (default True)
      If True, additional contours are shown.
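The percentage parameter maps to a symmetric pair of quantiles, and left_color / right_color apply to groups that fall below or above the resulting bounds. The sketch below spells out that arithmetic. Both function names are hypothetical and the code is an illustration of the documented behaviour, not code taken from the package.

```python
def interval_quantiles(percentage=95.0):
    """Return the (lower, upper) quantiles of a central interval that
    encloses `percentage` percent of the distribution."""
    if not 0.0 < percentage < 100.0:
        raise ValueError("percentage must lie strictly between 0 and 100")
    tail = (100.0 - percentage) / 200.0
    return tail, 1.0 - tail


def classify(group_stat, lower, upper):
    """Say on which side of the funnel bounds a group's statistic falls."""
    if group_stat < lower:
        return "left"    # negative outlier
    if group_stat > upper:
        return "right"   # positive outlier
    return "inside"


print(interval_quantiles(95.0))   # quantiles for the default interval
print(interval_quantiles(99.5))   # a wider interval leaves smaller tails
print(classify(2.4, lower=3.0, upper=7.0))
```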
- funnel_plot_bootstrap(data_groups, ...)
  Plot a list of arrays as a funnel plot, using bootstrapped intervals instead of a parametric distribution.
  Parameters:
    data_groups: list of 1D arrays
      The individual groups to be analysed.
    ax: axis, optional
      A Matplotlib axis to draw onto.
    percentage: float, 0.0 -> 100.0 (default 95)
      Percentage of the interval enclosed (e.g. percentage=95 will enclose 2.5% to 97.5%).
    labels: list of strings, optional
      One label string per group; shown only for those groups that lie outside the funnel.
    left_color: matplotlib color, optional (default C1)
      Color to render points to the left of the funnel bounds (negative outliers).
    right_color: matplotlib color, optional (default C2)
      Color to render points to the right of the funnel bounds (positive outliers).
    bootstrap_n: int, optional (default 1000)
      Number of runs in the bootstrap.
    error_mode: string, optional (default "data")
      For each outlier group, can show:
        "data": original data values for that group as a dot plot
        "none": no error bars
        "bootstrap": 95% bootstrap intervals, as lines
        "ci": 95% CI intervals, as lines
    show_rug: boolean, optional (default False)
      If True, show a rug plot at the bottom of the graph, for the whole group population.
    show_contours: boolean, optional (default True)
      If True, additional contours are shown.
    stat: function like np.mean, optional
      Statistic to use when plotting the funnel plot.
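To make the bootstrap idea concrete, here is a minimal sketch of how funnel bounds could be estimated for one group size with an arbitrary statistic. It assumes the bounds come from repeatedly resampling the pooled data at that group size and taking percentiles of the statistic; whether the package resamples in exactly this way is an assumption. The function name bootstrap_funnel_limits and the seed argument are hypothetical, and the standard library stands in for NumPy.

```python
import random
from statistics import median


def bootstrap_funnel_limits(pooled, group_size, percentage=95.0,
                            bootstrap_n=1000, stat=median, seed=2020):
    """Estimate (lower, upper) bounds for `stat` of a group of `group_size`
    values by resampling `pooled` with replacement `bootstrap_n` times."""
    rng = random.Random(seed)
    resampled = sorted(
        stat(rng.choices(pooled, k=group_size)) for _ in range(bootstrap_n)
    )
    # Read off the two percentiles that enclose `percentage` of the runs.
    tail = (100.0 - percentage) / 200.0
    lower = resampled[int(tail * (bootstrap_n - 1))]
    upper = resampled[int((1.0 - tail) * (bootstrap_n - 1))]
    return lower, upper


# Synthetic pooled population, standing in for all groups combined.
data_rng = random.Random(1)
pooled = [data_rng.gauss(0.0, 1.0) for _ in range(500)]
for n in (5, 50):
    print(n, bootstrap_funnel_limits(pooled, n))
```

Because no distribution is assumed, the same routine works for medians, means or any other statistic passed as stat, at the cost of bootstrap_n resampling runs per group size.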
File details
Details for the file funnelplot-0.2.1.tar.gz.
File metadata
- Download URL: funnelplot-0.2.1.tar.gz
- Upload date:
- Size: 8.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.7.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 959d4d5a9be83bd5d52a5bef6dc016851b2e29bfbce8453a21992d9d42d55187
MD5 | ae257a1882efd9b5e4fc8280c99d3ea2
BLAKE2b-256 | dec2da5118d1ba413ce7425349d9c7fcdabac5c0681db4ccb3e3a1ba08490412
File details
Details for the file funnelplot-0.2.1-py3-none-any.whl.
File metadata
- Download URL: funnelplot-0.2.1-py3-none-any.whl
- Upload date:
- Size: 9.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.7.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 381b2268f421063b4ef86ea4b2d23b6d2c0d55b27610168f754a7ced903db56e
MD5 | 37bcc284f299412707e5c508c288ee95
BLAKE2b-256 | 9ef03d159f21f7e95827afaa6145dbbacc83352b24270173f2eaf1c344e8891a