A set of utilities for running and evaluating experiments at Greenhouse. Primarily designed to be installed in a Mode Analytics Python notebook.

Project description

Greenhouse Data Utilities

The Greenhouse Data Utilities package includes a series of tools to streamline the evaluation of several common experimental designs. It is designed and maintained by the data science team at Greenhouse Software and is intended to be installed in a Mode Analytics Python notebook.

Please note that this package was designed for internal use by the data science team. You're welcome to use it, but we will prioritize the experimentation needs of our team when reading through issues/feature requests.

Sub-modules

gh_data_utils.data_visualization
gh_data_utils.stat_tests

`data_visualization`

Functions for generating visual representations of statistical tests.

`get_overlapping_distributions`

Generates a Seaborn plot of overlapping distributions for two or more groups, to visualize potential differences that may be detected with parametric or non-parametric statistical tests. These plots can be used in the final presentation of results.

Parameters

data: pandas dataframe, required

Dataframe used to generate charts. Data should be in a tidy format.
groups: str, required

The name of the column in data by which you are grouping results.
data_col: str, required

The name of the column in data with continuous data to compare by group.
order: list, optional; default = None

The order in which groups should appear in the graphs (e.g., Before and After). Must match the distinct values in the groups column.
x-label: str, optional; default = '' (empty)

x-axis label for the chart.
y-label: str, optional; default = '' (empty)

y-axis label for the chart.
title: str, optional; default = '' (empty)

Title for the chart.
bins: int, optional; default = 20

Number of bins for the distributions.
tick_format: str; options = 'int', 'pct'; default = 'int'

A string indicating the tick format for the graph axes based on the type of data used.
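
The parameters above expect tidy data: one row per observation, one column naming the group, and one column holding the continuous value. The following is a minimal sketch of reshaping a wide table into that form with pandas. The column names (`user_id`, `period`, `minutes`) and the values are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical wide-format data: one row per user, one column per condition.
wide = pd.DataFrame(
    {
        "user_id": [1, 2, 3],
        "Before": [10.0, 12.5, 9.0],
        "After": [11.0, 14.0, 9.5],
    }
)

# Tidy (long) format: one row per observation, with a single grouping
# column and a single column of continuous values.
tidy = wide.melt(
    id_vars="user_id",
    value_vars=["Before", "After"],
    var_name="period",     # column name to pass as `groups`
    value_name="minutes",  # column name to pass as `data_col`
)

print(tidy)
```

With data shaped this way, `groups` would be `'period'`, `data_col` would be `'minutes'`, and `order` could be `['Before', 'After']`.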

`get_barplot`

Generates a Seaborn barplot with error bars to visualize group means and their confidence intervals across two or more groups. These plots can be used in the final presentation of results.

Parameters

data: pandas dataframe; required

Dataframe used to generate charts. Data should be in a tidy format.
groups: str; required

The name of the column in data by which you are grouping results.
data_col: str; required

The name of the column in data with continuous data to compare by group.
ci: int; default = 95

Confidence level, in percent, for the error bars.
order: list; optional; default = None

The order in which groups should appear in the graphs (e.g., Before and After). Must match the distinct values in the groups column.
hue: str; optional; default = None

Name of the column passed to Seaborn as the plot hue, used to generate grouped comparisons.
x-label: str; default = '' (empty)

x-axis label for the chart.
y-label: str; default = '' (empty)

y-axis label for the chart.
title: str; default = '' (empty)

Title for the chart.
palette: optional; default = None

Seaborn palette for the chart.
tick_format: str; options = 'int', 'pct'; default = 'int'

A string indicating the tick format for the graph axes based on the type of data used.
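
To make the `ci` parameter concrete, here is a rough sketch of a percentile-bootstrap confidence interval around a group mean, which is the kind of interval Seaborn bar plots draw by default. It assumes `get_barplot` hands `ci` to Seaborn unchanged, which the description above does not confirm. The helper `bootstrap_ci` and the sample values are hypothetical, and this is not the package's implementation.

```python
import random
import statistics


def bootstrap_ci(values, ci=95, n_boot=1000, seed=0):
    """Percentile-bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    boot_means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    # Drop (100 - ci) / 2 percent of the resampled means from each tail.
    tail = (100 - ci) / 200
    lower = boot_means[int(tail * n_boot)]
    upper = boot_means[min(n_boot - 1, int((1 - tail) * n_boot))]
    return lower, upper


# Hypothetical data: one list of continuous values per group.
samples = {
    "Before": [10.0, 12.0, 9.0, 11.0, 13.0, 8.0, 10.5, 11.5],
    "After": [11.0, 14.5, 9.5, 13.0, 16.0, 9.5, 12.0, 13.5],
}

summary = {}
for name, values in samples.items():
    low, high = bootstrap_ci(values, ci=95)
    summary[name] = (statistics.fmean(values), low, high)
    print(f"{name}: mean={summary[name][0]:.2f}, CI=({low:.2f}, {high:.2f})")
```

A higher `ci` keeps more of the resampled means and therefore produces wider error bars.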

`stat_tests`

Functions for conducting the appropriate parametric and non-parametric statistical tests based on the specified experimental design and number of groups. We recommend conducting an a priori power analysis to ensure each group you're comparing has a sufficient sample size.
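
As a rough illustration of an a priori power analysis, the sketch below uses the standard normal-approximation formula for a two-sided comparison of two group means, n = 2 * ((z_alpha + z_power) / d)^2 per group, where d is the standardized effect size. The helper `n_per_group` is hypothetical and not part of this package. The approximation runs slightly below an exact t-based calculation, so treat the result as a lower bound.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided, two-group comparison of means."""
    # Critical value for the two-sided significance level.
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    # Quantile corresponding to the desired power.
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Smaller expected effects need much larger groups.
for d in (0.2, 0.5, 0.8):
    print(f"effect size {d}: about {n_per_group(d)} per group")
```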

`run_stat_test`

Conducts a parametric and a non-parametric statistical test for two or more groups. The type of test is determined automatically from the number of groups.

Parameters

data: pandas dataframe; required

Dataframe containing the data to test. Data should be in a tidy format.
groups: str; required

The name of the column in data by which you are grouping results.
data_col: str; required

The name of the column in data with continuous data to compare by group.
index: str; required

The dataframe column to use as an index when shaping data for a specific test. This should be your unit of comparison (e.g., user, day, or event).
dimensions: list; optional; default = None

A subset of groups to use in statistical comparisons. If this isn't specified, dimensions for statistical tests will be a list of distinct values in groups.
comparison: str; optional; options = 'ind', 'rep'; default = 'ind'

The type of experimental design: 'ind' for independent measures or 'rep' for repeated measures.
description: str; optional; default = ''

A description of the statistical test to include at the top of the summary of results. Defaults to an empty string, in which case the summary uses the standard output from each test.
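
The description above does not say which tests are run, so the following is only a sketch of how such a selection could work, using conventional pairings from SciPy: t-test and Mann-Whitney U for two independent groups, paired t-test and Wilcoxon signed-rank for two repeated measures, one-way ANOVA and Kruskal-Wallis for three or more independent groups, and the Friedman test for three or more repeated measures. The function `sketch_stat_test`, its return format, and the sample data are hypothetical; the pairing of tests is an assumption, not the package's confirmed behavior.

```python
import pandas as pd
from scipy import stats


def sketch_stat_test(data, groups, data_col, index, dimensions=None, comparison="ind"):
    """Run a parametric and a non-parametric test chosen by design and group count."""
    if dimensions is None:
        # Fall back to every distinct value in the grouping column.
        dimensions = list(data[groups].unique())
    subset = data[data[groups].isin(dimensions)]

    if comparison == "rep":
        # Repeated measures: one row per unit and one column per group, so
        # values are paired by `index`. Units missing a group are dropped.
        wide = subset.pivot(index=index, columns=groups, values=data_col).dropna()
        samples = [wide[d].to_numpy() for d in dimensions]
    else:
        samples = [
            subset.loc[subset[groups] == d, data_col].to_numpy() for d in dimensions
        ]

    if len(samples) == 2:
        a, b = samples
        if comparison == "rep":
            return {
                "parametric": stats.ttest_rel(a, b),
                "non_parametric": stats.wilcoxon(a, b),
            }
        return {
            "parametric": stats.ttest_ind(a, b),
            "non_parametric": stats.mannwhitneyu(a, b, alternative="two-sided"),
        }
    if comparison == "rep":
        # SciPy has no repeated-measures ANOVA, so only the rank-based test runs here.
        return {"non_parametric": stats.friedmanchisquare(*samples)}
    return {
        "parametric": stats.f_oneway(*samples),
        "non_parametric": stats.kruskal(*samples),
    }


# Hypothetical tidy data: six users measured before and after a change.
df = pd.DataFrame(
    {
        "user": [1, 2, 3, 4, 5, 6] * 2,
        "period": ["Before"] * 6 + ["After"] * 6,
        "minutes": [10.0, 12.0, 9.0, 11.0, 13.0, 8.0,
                    11.0, 14.5, 9.5, 13.0, 16.0, 9.5],
    }
)

paired = sketch_stat_test(df, "period", "minutes", "user", comparison="rep")
independent = sketch_stat_test(df, "period", "minutes", "user", comparison="ind")
for label, result in (("rep", paired), ("ind", independent)):
    for kind, outcome in result.items():
        print(f"{label} / {kind}: p = {outcome.pvalue:.4f}")
```

The `pivot` step shows why an `index` column matters for repeated measures: it aligns each unit's values across groups so that the paired tests compare like with like.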

Release history

This version

0.0.1

Jun 10, 2020

Download files

Source Distribution

gh-data-utils-0.0.1.tar.gz (4.6 kB)

Uploaded Jun 10, 2020 Source

Built Distribution

gh_data_utils-0.0.1-py3-none-any.whl (6.4 kB)

Uploaded Jun 10, 2020 Python 3

Hashes for gh-data-utils-0.0.1.tar.gz
Algorithm	Hash digest
SHA256	`ced84a9e8c0ce05a61e78db6d066ffe760443dc5f6e47d80d07f2b34a2617045`
MD5	`c7d0fdaa305bd1dde1523578c1f017db`
BLAKE2b-256	`695d92bf26d0214c48a7bd2a051152afdde1067b2cbcd58079e140dc16203a33`

Hashes for gh_data_utils-0.0.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`e30588dd99546ccbfbbea9057d570d0ab86c2bab8e68d3ef2560fad2040dd8e3`
MD5	`e8734b99dbe1c4ec9f2a7868dbec7d8e`
BLAKE2b-256	`126ecc30572ae5e06bb377029be84f805757a714faf6e8eff5c7f43769c55d3f`