Python package for conducting power analysis given ratio metrics, clustered data, and covariate adjustment.


olspow: Power Analysis for Experiments Using Regression / Clustered Data

What is it?

olspow is a Python package that computes statistical power, required sample size, and the minimum detectable effect (MDE) for randomized controlled trials (i.e. A/B tests) in which the experimenter uses OLS to estimate the effect of a dichotomous treatment variable. The underlying methodology applies equally to clustered data and to simpler experimental designs where the relationship between observations and units of experimental assignment is 1:1.

Why OLS?

In an A/B test, the expected difference in any covariate's mean between the treatment and control groups is zero, provided experimental assignment is properly random. In any single experiment, however, the observed difference will rarely be precisely zero. These non-zero differences (typically referred to as 'covariate imbalance') introduce noise into our estimate of the effect of treatment on the response variable.
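Covariate imbalance is easy to see in a small simulation. The sketch below is illustrative only and uses just the Python standard library: the covariate, the sample size, and the seed are arbitrary assumptions, not anything taken from olspow.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is reproducible
n_units = 1000

# A hypothetical pre-experiment covariate, drawn from a single population,
# so treatment cannot have caused any difference between the groups.
covariate = [rng.gauss(0.0, 1.0) for _ in range(n_units)]

# Randomly assign exactly half of the units to treatment.
units = list(range(n_units))
rng.shuffle(units)
treated = set(units[: n_units // 2])

treated_mean = statistics.fmean(covariate[i] for i in treated)
control_mean = statistics.fmean(
    covariate[i] for i in range(n_units) if i not in treated
)

# The expected difference is zero, but the realized difference is not.
imbalance = treated_mean - control_mean
print(f"observed covariate imbalance: {imbalance:+.4f}")
```

Re-running with different seeds gives differences that scatter around zero; that scatter is the noise that covariate adjustment aims to remove.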

Ordinary Least Squares (OLS) is a well-understood estimator, available in a variety of Python packages (scipy, statsmodels, sklearn), that is well suited to mitigating covariate imbalance, provided the experimenter supplies appropriate covariates to adjust for (i.e. predictors that are orthogonal to the treatment variable). In other words, thoughtful use of OLS is a form of covariate adjustment. Most power analysis tools presume the use of a t-test and therefore cannot account for the degree to which covariate adjustment improves the sensitivity of the statistical test. olspow was specifically designed to address this problem.
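To make the sensitivity gain concrete, here is a minimal sketch that compares the standard error of the treatment coefficient with and without one covariate. It is not olspow's implementation: the helper names, the simulated data, and the effect sizes are assumptions for illustration, the standard errors are the classical (homoskedastic, non-clustered) ones, and the regression is computed by hand via the Frisch-Waugh-Lovell decomposition so that only the standard library is needed.

```python
import math
import random

def residualize(v, x=None):
    """Residuals of v after regressing it on an intercept (and on x, if given)."""
    n = len(v)
    v_mean = sum(v) / n
    if x is None:
        return [vi - v_mean for vi in v]
    x_mean = sum(x) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxv = sum((xi - x_mean) * (vi - v_mean) for xi, vi in zip(x, v))
    slope = sxv / sxx
    return [(vi - v_mean) - slope * (xi - x_mean) for xi, vi in zip(x, v)]

def treatment_effect(y, t, x=None):
    """OLS coefficient on t and its classical standard error."""
    y_res = residualize(y, x)
    t_res = residualize(t, x)
    stt = sum(ti ** 2 for ti in t_res)
    coef = sum(ti * yi for ti, yi in zip(t_res, y_res)) / stt
    n_params = 2 if x is None else 3  # intercept, treatment, optional covariate
    rss = sum((yi - coef * ti) ** 2 for ti, yi in zip(t_res, y_res))
    se = math.sqrt(rss / (len(y) - n_params) / stt)
    return coef, se

# Simulated experiment: true effect 0.2, covariate x strongly predicts y.
rng = random.Random(0)
n = 2000
t = [float(rng.random() < 0.5) for _ in range(n)]
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [0.2 * ti + 2.0 * xi + rng.gauss(0.0, 1.0) for ti, xi in zip(t, x)]

coef_raw, se_raw = treatment_effect(y, t)      # equivalent to a pooled t-test
coef_adj, se_adj = treatment_effect(y, t, x)   # covariate-adjusted OLS
print(f"unadjusted: effect={coef_raw:.3f}, se={se_raw:.3f}")
print(f"adjusted:   effect={coef_adj:.3f}, se={se_adj:.3f}")
```

Because the covariate explains most of the outcome variance in this simulated data, the adjusted standard error is markedly smaller. A power analysis that assumes a plain t-test would miss this gain.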

Calling the solve_power() Method

All functionality is accessed via the solve_power() method, which returns the minimum detectable effect (MDE), power, or required sample size (as measured in number of units of experimental assignment).
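MDE, power, and sample size are linked: once alpha and the outcome's standard deviation are fixed, any two of them determine the third. The sketch below shows that relationship with the textbook normal approximation for a two-sided test. It is not olspow's internal method, and every function name and parameter here (including `sigma`, the residual standard deviation of the outcome) is a hypothetical chosen for illustration.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def standard_error(sigma, n, ratio=0.5):
    """SE of a difference in means with n total units, a share `ratio` treated."""
    return sigma / math.sqrt(n * ratio * (1 - ratio))

def _z_total(alpha, power):
    """Sum of the two-sided critical value and the power quantile."""
    return _STD_NORMAL.inv_cdf(1 - alpha / 2) + _STD_NORMAL.inv_cdf(power)

def mde_for(sigma, n, power=0.8, alpha=0.05, ratio=0.5):
    """Smallest effect detectable with the given power and sample size."""
    return _z_total(alpha, power) * standard_error(sigma, n, ratio)

def n_for(sigma, mde, power=0.8, alpha=0.05, ratio=0.5):
    """Total units of assignment needed to detect `mde` with the given power."""
    z = _z_total(alpha, power)
    return math.ceil((z * sigma / mde) ** 2 / (ratio * (1 - ratio)))

def power_for(sigma, n, mde, alpha=0.05, ratio=0.5):
    """Approximate power; ignores the negligible opposite-tail rejection mass."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return _STD_NORMAL.cdf(mde / standard_error(sigma, n, ratio) - z_crit)

mde = mde_for(sigma=1.0, n=1000)
print(f"MDE at n=1000: {mde:.4f}")
print(f"power at that MDE: {power_for(1.0, 1000, mde):.3f}")
print(f"n needed for that MDE: {n_for(1.0, mde)}")
```

In this approximation, covariate adjustment enters through `sigma`: halving the residual standard deviation halves the MDE at a fixed sample size.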

solve_power(data, endog, exog, cluster, ratio, alpha, mde, power, n, alternative, verbose):

is_ratio (boolean, required) : Whether the metric is a ratio of two variables. If True, the 'numerator' and 'denominator' arguments are mandatory and the 'endog' argument is not used. If False, the 'endog' argument is mandatory and the 'numerator' and 'denominator' arguments are not used.
data (pandas.core.frame.DataFrame, required) : A pandas DataFrame containing (at a minimum) historical values for the response variable and the cluster key. Ideally, the DataFrame contains 100 unique units of experimental assignment per covariate.
endog (string, optional) : The name of the response variable being modeled, i.e. the metric of interest. Required when 'is_ratio' = False.
exog (list of strings, required) : The names of the covariates being adjusted for (i.e. column names within the pandas DataFrame).
numerator (string, optional) : The name of the numerator variable (e.g. if the metric is items fulfilled per unit of time, then this value would be the number of items fulfilled). Required when 'is_ratio' = True. Defaults to None.
denominator (string, optional) : The name of the denominator variable (e.g. if the metric is items fulfilled per unit of time, then this value would be the amount of time). Required when 'is_ratio' = True. Defaults to None.
cluster (string, required) : The name of the column in the pandas DataFrame that serves as the cluster key (unit of experimental assignment).
ratio (float, optional) : Assumed share of units of assignment that are treated. Defaults to 0.5 (i.e. 50:50 assignment between treated and control).
alpha (float, optional) : The false positive rate. Defaults to 0.05.
mde (float, optional) : The minimum detectable effect.
power (float, optional) : The level of statistical power.
n (integer, optional) : The number of units of experimental assignment.
alternative (string, optional) : Either 'one-sided' or 'two-sided', denoting whether the experimenter is designing a one-tailed or two-tailed test. Defaults to 'two-sided'.
verbose (boolean) : If True, prints all steps in the power analysis workflow.
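The 'cluster', 'numerator', and 'denominator' arguments describe data with several rows per unit of experimental assignment. The sketch below shows one way to think about such data, using the standard library rather than pandas. The column names (`courier_id`, `items_fulfilled`, `hours_worked`) and the helper functions are hypothetical, and treating the ratio metric as the sum of numerators over the sum of denominators is an assumption rather than a documented detail of olspow.

```python
from collections import defaultdict

# Hypothetical observation-level rows: several rows per unit of assignment.
rows = [
    {"courier_id": "a", "items_fulfilled": 12, "hours_worked": 4.0},
    {"courier_id": "a", "items_fulfilled": 9, "hours_worked": 3.0},
    {"courier_id": "b", "items_fulfilled": 20, "hours_worked": 8.0},
    {"courier_id": "c", "items_fulfilled": 5, "hours_worked": 2.0},
    {"courier_id": "c", "items_fulfilled": 7, "hours_worked": 2.0},
    {"courier_id": "c", "items_fulfilled": 6, "hours_worked": 2.0},
]

def aggregate_by_cluster(rows, cluster, numerator, denominator):
    """Sum numerator and denominator within each unit of assignment."""
    totals = defaultdict(lambda: [0.0, 0.0])
    for row in rows:
        totals[row[cluster]][0] += row[numerator]
        totals[row[cluster]][1] += row[denominator]
    return {key: (num, den) for key, (num, den) in totals.items()}

def ratio_metric(cluster_totals):
    """Overall ratio: total numerator divided by total denominator (assumed)."""
    total_num = sum(num for num, _ in cluster_totals.values())
    total_den = sum(den for _, den in cluster_totals.values())
    return total_num / total_den

def has_enough_units(cluster_totals, exog, units_per_covariate=100):
    """Rule of thumb from the 'data' argument: 100 units per covariate."""
    return len(cluster_totals) >= units_per_covariate * len(exog)

totals = aggregate_by_cluster(rows, "courier_id", "items_fulfilled", "hours_worked")
print(f"units of assignment: {len(totals)}")
print(f"items per hour: {ratio_metric(totals):.3f}")
print(f"enough units for one covariate: {has_enough_units(totals, ['tenure'])}")
```

The sample size 'n' counts units of assignment (three here), not rows (six here), which is why the cluster key matters when sizing an experiment.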


Release history

1.1.2 (Apr 17, 2024)
1.0.0 (Feb 20, 2024)

Download files

Source Distribution: olspow-1.1.2.tar.gz (11.7 kB), uploaded Apr 17, 2024
Built Distribution: olspow-1.1.2-py3-none-any.whl (10.0 kB), uploaded Apr 17, 2024, Python 3

Hashes for olspow-1.1.2.tar.gz

Algorithm	Hash digest
SHA256	`644a37772cd7d77957f392cb1e9391fdabc2a70246112eae04cb28960260c107`
MD5	`e5c6543672bd8e540ce4969b4f4190f4`
BLAKE2b-256	`0de05b8000d5f9653ec3bd96efac6d04f2d12473c7b3471daad195e1557df0f5`

Hashes for olspow-1.1.2-py3-none-any.whl

Algorithm	Hash digest
SHA256	`b0cc3a9853d99ce9515beb1a3c9835148a6da6a32b5f8c4b4d0baa2606e14f6f`
MD5	`cd702a545048e98a9da70b911e1404db`
BLAKE2b-256	`c12d56935520c4a5d523e8fd5726d7d53ffdaf26796703d986bf227e0e54faaf`