
A set of utilities for running and evaluating experiments at Greenhouse, primarily designed to be installed in a Mode Analytics Python notebook.


Greenhouse Data Utilities

The Greenhouse Data Utilities package includes a series of tools to streamline the evaluation of several common experimental designs. It is designed and maintained by the data science team at Greenhouse Software and is intended to be installed in a Mode Analytics Python notebook.

Please note that this package was designed for internal use by the data science team. You're welcome to use it, but we will prioritize the experimentation needs of our team when reading through issues/feature requests.

Sub-modules

  • gh_data_utils.data_visualization
  • gh_data_utils.stat_tests

data_visualization

Functions for generating visual representations of statistical tests.

get_overlapping_distributions

Generates a Seaborn plot with overlapping distributions for two or more groups in order to visualize potential differences that may be detected with parametric or non-parametric statistical tests. These can be used in the final presentation of results.

Parameters
  • data: pandas dataframe, required

    Dataframe used to generate charts. Data should be in a tidy format.

  • groups: str, required

    The name of the column in data by which you are grouping results.

  • data_col: str, required

    The name of the column in data with continuous data to compare by group.

  • order: list, optional; default = None

    The order in which groups should appear in the graphs (e.g., Before and After). Must match the list of distinct values in the groups column.

  • x-label: str, optional; default = '' (empty)

    x-axis label for the chart.

  • y-label: str, optional; default = '' (empty)

    y-axis label for the chart.

  • title: str, optional; default = '' (empty)

    Title for the chart.

  • bins: int, optional; default = 20

    Number of bins for the distributions.

  • tick_format: str; options = 'int', 'pct'; default = 'int'

    A string indicating the tick format for the graph axes based on the type of data used.
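As a rough illustration of the tidy format and the parameters above, the sketch below builds one row per observation and passes it to get_overlapping_distributions. The column names (user_id, period, value), the sample data, and the import path are assumptions made for this example; the keyword arguments are taken from the parameter list above. The package imports are deferred inside plot_distributions so the data-building part runs on its own.

```python
import random


def make_tidy_records(n_per_group=50, seed=0):
    """Build tidy records: one row per observation, one column per variable."""
    rng = random.Random(seed)
    records = []
    # Hypothetical groups and means, chosen only for illustration.
    for period, mean in (("Before", 10.0), ("After", 12.0)):
        for user_id in range(n_per_group):
            records.append(
                {"user_id": user_id, "period": period, "value": rng.gauss(mean, 2.0)}
            )
    return records


def plot_distributions(records):
    """Plot the two groups; requires pandas and gh_data_utils to be installed."""
    # Deferred imports; the import path below is assumed from the sub-module list.
    import pandas as pd
    from gh_data_utils.data_visualization import get_overlapping_distributions

    df = pd.DataFrame(records)
    return get_overlapping_distributions(
        data=df,
        groups="period",            # column holding the group labels
        data_col="value",           # column holding the continuous measure
        order=["Before", "After"],  # must match the distinct values in "period"
        title="Value by period",
        bins=20,
        tick_format="int",
    )


records = make_tidy_records()
```

In a notebook, plot_distributions(records) would then render the chart, assuming the package is installed.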

get_barplot

Generates a Seaborn barplot with error bars in order to visualize group means and their confidence intervals, and so the mean differences between two or more groups. These can be used in the final presentation of results.

Parameters
  • data: pandas dataframe; required

    Dataframe used to generate charts. Data should be in a tidy format.

  • groups: str; required

    The name of the column in data by which you are grouping results.

  • data_col: str; required

    The name of the column in data with continuous data to compare by group.

  • ci: int; default = 95

    Confidence level, as a percentage, of the intervals drawn as error bars.

  • order: list; optional; default = None

    The order in which groups should appear in the graphs (e.g., Before and After). Must match the list of distinct values in the groups column.

  • hue: str; optional; default = None

    The name of the column in data to use as the Seaborn plot hue in order to generate grouped comparisons.

  • x-label: str; default = '' (empty)

    x-axis label for the chart.

  • y-label: str; default = '' (empty)

    y-axis label for the chart.

  • title: str; default = '' (empty)

    Title for the chart.

  • palette: optional; default = None

    Seaborn palette for the chart.

  • tick_format: str; options = 'int', 'pct'; default = 'int'

    A string indicating the tick format for the graph axes based on the type of data used.
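To make the ci parameter more concrete, here is a minimal sketch of a percentile bootstrap confidence interval for a group mean, which is the kind of interval Seaborn's barplot has traditionally drawn as error bars. Whether get_barplot passes ci straight through to Seaborn is an assumption; the helper bootstrap_mean_ci and the sample values are hypothetical and not part of this package.

```python
import random
import statistics


def bootstrap_mean_ci(values, ci=95, n_boot=1000, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    boot_means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    # Cut off (100 - ci) / 2 percent of the resampled means in each tail.
    tail = (100 - ci) / 2 / 100
    lower = boot_means[int(tail * n_boot)]
    upper = boot_means[int((1 - tail) * n_boot) - 1]
    return lower, upper


values = [10, 12, 11, 13, 9, 14, 10, 12]  # made-up measurements for one group
lower, upper = bootstrap_mean_ci(values, ci=95)
print(f"mean = {statistics.fmean(values):.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

A higher ci value widens the error bars, so non-overlapping bars at ci = 95 are a useful visual cue but not a substitute for the tests in stat_tests.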

stat_tests

Functions for conducting the appropriate parametric and non-parametric statistical tests based on the specified experimental design and number of groups. We recommend conducting an a priori power analysis to ensure each group you're comparing has a sufficient sample size.
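The a priori power analysis recommended above is not part of this package. As a minimal sketch, assuming a two-sided comparison of two independent group means and using the normal approximation, the required sample size per group can be estimated from a standardized effect size (Cohen's d). The function name sample_size_per_group is hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample comparison of means.

    Uses the normal approximation n = 2 * ((z_(1 - alpha/2) + z_power) / d) ** 2,
    which slightly underestimates the exact t-test requirement.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the target power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# A medium effect (d = 0.5) needs far fewer observations than a small one (d = 0.2).
print(sample_size_per_group(0.5))
print(sample_size_per_group(0.2))
```

A dedicated power-analysis library will give slightly larger, exact values; the approximation is only meant to show the order of magnitude.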

run_stat_test

Conducts a parametric and a non-parametric statistical test for two or more groups. The type of test is determined automatically from the number of groups.

Parameters
  • data: pandas dataframe; required

    Dataframe containing the data to test. Data should be in a tidy format.

  • groups: str; required

    The name of the column in data by which you are grouping results.

  • data_col: str; required

    The name of the column in data with continuous data to compare by group.

  • index: str; required

    The dataframe column to use as an index when shaping data for a specific test. This should be your unit of comparison (e.g., user, day, or event).

  • dimensions: list; optional; default = None

    A subset of groups to use in statistical comparisons. If this isn't specified, dimensions for statistical tests will be a list of distinct values in groups.

  • comparison: str; optional; options = 'ind', 'rep'; default = 'ind'

    The type of experimental design (independent or repeated measures).

  • description: str; optional; default = ''

    A description of the statistical test to include at the top of the summary of results. Defaults to an empty string, in which case only the standard output from each test is shown.
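This page does not name the tests that run_stat_test selects. The sketch below shows one conventional mapping from design and number of groups to a parametric and non-parametric pair; it is an assumption for illustration, not the package's confirmed behavior. The choose_tests helper, the column names, and the import path are hypothetical, while the keyword arguments in run_example follow the parameter list above.

```python
def choose_tests(n_groups, comparison="ind"):
    """Return a conventional (parametric, non-parametric) pair of test names."""
    if comparison not in ("ind", "rep"):
        raise ValueError("comparison must be 'ind' or 'rep'")
    if n_groups < 2:
        raise ValueError("at least two groups are needed for a comparison")
    if n_groups == 2:
        if comparison == "ind":
            return ("independent-samples t-test", "Mann-Whitney U test")
        return ("paired-samples t-test", "Wilcoxon signed-rank test")
    if comparison == "ind":
        return ("one-way ANOVA", "Kruskal-Wallis H test")
    return ("repeated-measures ANOVA", "Friedman test")


def run_example(df):
    """Call run_stat_test on a tidy dataframe; requires gh_data_utils."""
    # Deferred import; the import path is assumed from the sub-module list.
    from gh_data_utils.stat_tests import run_stat_test

    return run_stat_test(
        data=df,
        groups="period",    # column holding the group labels
        data_col="value",   # column holding the continuous measure
        index="user_id",    # unit of comparison, measured in each period
        comparison="rep",   # repeated measures: the same users in both periods
        description="Value before vs. after the change",
    )


print(choose_tests(2, "rep"))
```

For a repeated-measures design, each value of index should appear once in every group, so that observations can be paired when the data is reshaped.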

