
Simple funnel plots


Funnel plot

Simple funnel plots for visualising sub-group variance.

This package provides simple funnel plots in Python, using Matplotlib. This lets you quickly see whether sub-groups of a population are outliers compared to the full population.

Two methods are provided:

  • parametric funnelplot which uses a standard distribution to estimate the intervals of the funnel (usually a normal distribution)
  • bootstrap funnelplot which uses bootstrapped percentiles to estimate the intervals of the funnel
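To make the parametric idea concrete, here is a minimal sketch of how funnel bounds can narrow as group size grows, under a normal approximation. It uses only the Python standard library, and `parametric_funnel` is a hypothetical helper written for illustration; it is not part of funnelplot and may differ from how the package computes its bands.

```python
from statistics import NormalDist, mean, stdev

def parametric_funnel(population, group_size, percentage=95.0):
    """Bounds for the mean of a group of `group_size` values.

    Hypothetical helper (not funnelplot's API): assumes group means are
    roughly normal around the population mean, with standard error
    stdev(population) / sqrt(group_size).
    """
    # percentage=95 leaves 2.5% in each tail
    tail = (1.0 - percentage / 100.0) / 2.0
    z = NormalDist().inv_cdf(1.0 - tail)
    centre = mean(population)
    half_width = z * stdev(population) / group_size ** 0.5
    return centre - half_width, centre + half_width

population = [float(v) for v in range(1, 101)]
for n in (4, 16, 64):
    lower, upper = parametric_funnel(population, n)
    print(f"n={n:3d}  lower={lower:7.2f}  upper={upper:7.2f}")
```

The interval shrinks in proportion to 1/sqrt(n), which is what gives the plot its funnel shape: small groups need a much larger deviation before they count as outliers.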

A utility function, funnel(), is also provided. It plots data grouped from a Pandas DataFrame through a Seaborn-like API.

Example

Test-performance data for California schools, from the Caschool dataset in pydataset:

funnel(df=data("Caschool"), x="testscr", group="county")

Install

pip install funnelplot

Examples

Full caschool example

# load some example data
import pandas as pd
import matplotlib.pyplot as plt
from pydataset import data
from funnelplot.core import funnel

# create a suitable axis
fig,ax = plt.subplots(figsize=(4,6))
ax.set_frame_on(False)

# funnel plot, enclosing the central 99.5% (0.25% -> 99.75% interval)
funnel(df=data("Caschool"), x="testscr", group="county", percentage=99.5, error_mode="data")
C:\Users\John\Dropbox\devel\funnelplot\funnelplot\core.py:14: RuntimeWarning: invalid value encountered in true_divide
  return band / np.sqrt(group_size)
C:\Users\John\Dropbox\devel\funnelplot\funnelplot\core.py:14: RuntimeWarning: divide by zero encountered in true_divide
  return band / np.sqrt(group_size)


# use bootstrap instead of normal fit
fig,ax = plt.subplots(figsize=(5,6))
ax.set_frame_on(False)
funnel(df=data("Caschool"), x='testscr', group="county", bootstrap_mode=True, error_mode="bootstrap")


Synthetic data example

## Synthetic data
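import numpy as np
import random
# assumption: these are importable from funnelplot.core, alongside funnel()
from funnelplot.core import funnel_plot, funnel_plot_bootstrap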
random.seed(2020)
np.random.seed(2020)
groups = []
p_mean, p_std = 0, 1
# random groups, with different sizes, means and std. devs.
for i in range(25):
    n_group = np.random.randint(1, 80)
    g_std =  np.random.uniform(0.1, 4.5) 
    g_mean = np.random.uniform(-1.9, 0.5)
    groups.append(np.random.normal(p_mean + g_mean,
                                   p_std + g_std, 
                                   n_group))
fig, ax = plt.subplots(figsize=(9, 4))
funnel_plot(
    groups,
    labels=[random.choice("abcdefg") * 4 for i in range(len(groups))],
    percentage=95,
)


fig, ax = plt.subplots(figsize=(9, 4))
# bootstrap version, using medians instead of means
funnel_plot_bootstrap(
    groups,
    labels=[random.choice("abcdefg") * 4 for i in range(len(groups))],
    percentage=95,
    stat=np.median
)


API

  • funnel(df, x, group, bootstrap_mode=False): show a DataFrame df as a funnel plot, rendering column x and grouping the data by column group.

      Parameters:
          df: DataFrame
              The data to be shown.
          x:  string, column name
              The column of the frame to render as datapoints.
          group: string, column name
              The column to group the frame by
          bootstrap_mode: boolean, optional (default False)
              If True, use the funnel_plot_bootstrap() function; otherwise
              use the parametric funnel_plot() function
          **kwargs:
              passed to funnel_plot() / funnel_plot_bootstrap()
    
  • funnel_plot(data_groups, ...): plot a list of arrays as a funnel plot.

      Parameters:
          data_groups: list of 1D arrays
              one 1D array per group to be analysed
          ax: axis, optional
              a Matplotlib axis to draw onto
          dist: distribution, like scipy.stats.norm(0,1)
              distribution whose ppf and cdf are used for plotting
          percentage: float, 0.0 -> 100.0 (default 95)
              percentage of interval enclosed (e.g. percentage=95 will enclose 2.5% to 97.5%)
          labels: list of strings, optional
              one label string per group, will be shown only for those groups that lie outside the funnel
          left_color: matplotlib color, optional (default C1)
              color to render points to the left of the funnel bounds (negative outliers)
          right_color: matplotlib color, optional (default C2)
              color to render points to the right of the funnel bounds (positive outliers)        
          error_mode: string, optional (default "data")
              For each outlier group, can show:
                  "data": original data values for that group as a dot plot
                  "none": no error bars
                  "bootstrap": 95% bootstrap intervals, as lines
                  "ci": 95% confidence intervals, as lines
          show_rug: boolean, optional (default False)
              If True, show a rug plot at the bottom of the graph, for
              the whole group population
          show_contours: boolean, optional (default True)
              If True, show additional contours
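The sketch below illustrates how percentage, labels and the left/right distinction could fit together: each group mean is compared against a normal-approximation funnel around the pooled mean, and only groups outside it would be labelled. It uses only the standard library; `classify_groups` and its return format are hypothetical and are not funnelplot's implementation.

```python
from statistics import NormalDist, mean, stdev

def classify_groups(groups, labels, percentage=95.0):
    """Mark each group "left", "right" or "inside" relative to the funnel.

    Hypothetical helper: uses a normal approximation around the pooled
    mean, which is an assumption rather than the package's own logic.
    """
    pooled = [value for group in groups for value in group]
    centre, spread = mean(pooled), stdev(pooled)
    tail = (1.0 - percentage / 100.0) / 2.0
    z = NormalDist().inv_cdf(1.0 - tail)

    result = {}
    for label, group in zip(labels, groups):
        # the funnel is wider for smaller groups
        half_width = z * spread / len(group) ** 0.5
        group_mean = mean(group)
        if group_mean < centre - half_width:
            result[label] = "left"    # negative outlier
        elif group_mean > centre + half_width:
            result[label] = "right"   # positive outlier
        else:
            result[label] = "inside"
    return result

groups = [
    [5.0] * 10,          # small group, well below the pooled mean
    [9.0, 11.0] * 20,    # large group, centred on the pooled mean
    [15.0] * 10,         # small group, well above the pooled mean
]
outcome = classify_groups(groups, labels=["low", "mid", "high"])
print(outcome)
```

In a plot, the "left" and "right" groups would be the ones drawn in left_color and right_color and given their labels; "inside" groups would stay unlabelled.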
    
  • funnel_plot_bootstrap(data_groups, ...): plot a list of arrays as a funnel plot, using bootstrapped intervals instead of a parametric distribution.

      Parameters:
          data_groups: list of 1D arrays
              one 1D array per group to be analysed
          ax: axis, optional
              a Matplotlib axis to draw onto
          percentage: float, 0.0 -> 100.0 (default 95)
              percentage of interval enclosed (e.g. percentage=95 will enclose 2.5% to 97.5%)
          labels: list of strings, optional
              one label string per group, will be shown only for those groups that lie outside the funnel
          left_color: matplotlib color, optional (default C1)
              color to render points to the left of the funnel bounds (negative outliers)
          right_color: matplotlib color, optional (default C2)
              color to render points to the right of the funnel bounds (positive outliers)
          bootstrap_n: int, optional (default 1000)
              number of runs in the bootstrap
          error_mode: string, optional (default "data")
              For each outlier group, can show:
                  "data": original data values for that group as a dot plot
                  "none": no error bars
                  "bootstrap": 95% bootstrap intervals, as lines
                  "ci": 95% confidence intervals, as lines
          show_rug: boolean, optional (default False)
              If True, show a rug plot at the bottom of the graph, for
              the whole group population
          show_contours: boolean, optional (default True)
              If True, show additional contours
          stat: function like np.mean, optional
              statistic to use when plotting the funnel plot  
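For intuition about what bootstrap_n, percentage and stat control, the following sketch estimates a funnel interval by resampling: it draws bootstrap_n samples of a given group size from the pooled data, applies stat to each, and takes percentiles of the results. It is standard-library only; `bootstrap_funnel` and `percentile` are hypothetical helpers, and the package's own resampling scheme may differ.

```python
import random
from statistics import median

def percentile(sorted_values, q):
    """Linearly interpolated percentile of a pre-sorted list, q in [0, 100]."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] * (1.0 - frac) + sorted_values[upper] * frac

def bootstrap_funnel(population, group_size, percentage=95.0,
                     bootstrap_n=1000, stat=median, rng=None):
    """Percentile interval for `stat` of a resampled group (hypothetical helper)."""
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is repeatable
    tail = (100.0 - percentage) / 2.0
    # resample with replacement and record the statistic of each resample
    stats = sorted(
        stat(rng.choices(population, k=group_size))
        for _ in range(bootstrap_n)
    )
    return percentile(stats, tail), percentile(stats, 100.0 - tail)

population = [float(v) for v in range(1, 101)]
for n in (5, 50):
    lower, upper = bootstrap_funnel(population, n)
    print(f"n={n:2d}  lower={lower:6.2f}  upper={upper:6.2f}")
```

Because the interval comes from resampled statistics rather than a fitted distribution, any statistic can be swapped in through stat (for example a median, as in the synthetic example), at the cost of bootstrap_n evaluations per group size.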
    
