Python package for conducting power analysis given ratio metrics, clustered data, and covariate adjustment.

olspow: Power Analysis for Experiments Using Regression / Clustered Data

What is it?

olspow is a Python package for calculating statistical power, required sample size, or the minimum detectable effect (MDE) for randomized controlled trials (i.e. A/B tests) in which the experimenter uses OLS to estimate the effect of a dichotomous treatment variable. The underlying methodology applies equally to clustered data and to simpler experimental designs where the relationship between observations and units of experimental assignment is 1:1.
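Power, sample size, and MDE are linked: once the significance level and the outcome variance are fixed, any two of them determine the third. A minimal sketch of that relationship, assuming a two-sided test, a normal approximation, and a known outcome standard deviation. The function names are hypothetical, and this is not olspow's implementation, which works from an OLS model:

```python
from math import sqrt
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution


def _std_error(n, sigma, p_treated):
    """Standard error of a difference in means with n total units."""
    return sigma * sqrt(1.0 / (n * p_treated * (1.0 - p_treated)))


def mde_for(n, sigma, power=0.8, alpha=0.05, p_treated=0.5):
    """Smallest effect detectable with n total units (two-sided test)."""
    z = _Z.inv_cdf(1 - alpha / 2) + _Z.inv_cdf(power)
    return z * _std_error(n, sigma, p_treated)


def power_for(n, sigma, mde, alpha=0.05, p_treated=0.5):
    """Approximate power to detect mde, ignoring the negligible opposite tail."""
    se = _std_error(n, sigma, p_treated)
    return _Z.cdf(mde / se - _Z.inv_cdf(1 - alpha / 2))


def n_for(sigma, mde, power=0.8, alpha=0.05, p_treated=0.5):
    """Total number of units (not rounded) needed to detect mde."""
    z = _Z.inv_cdf(1 - alpha / 2) + _Z.inv_cdf(power)
    return (z * sigma / mde) ** 2 / (p_treated * (1.0 - p_treated))
```

Solving for one quantity and feeding it back into the other two functions should return the original inputs, which is a quick sanity check on any power calculation.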

Why OLS?

In an A/B test, the expected difference in any covariate between the treatment and control groups is zero, assuming that experimental assignment is appropriately random. However, the observed difference in a given sample will rarely be precisely zero. These non-zero differences (typically referred to as 'covariate imbalance') introduce noise into our estimate of the effect of treatment on the response variable.

Ordinary Least Squares (OLS) is a well-understood estimator that is available in a variety of Python packages (scipy, statsmodels, sklearn). It is well suited to mitigating covariate imbalance, provided that the experimenter supplies appropriate covariates to adjust for (i.e. predictors that are orthogonal to the treatment variable). In other words, thoughtful use of OLS is a form of covariate adjustment. Most power analysis tools presume the use of a t-test and therefore cannot account for the degree to which covariate adjustment improves the sensitivity of the statistical test. olspow was designed specifically to address this problem.
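To illustrate why covariate adjustment matters for power, the sketch below simulates a randomized experiment with one strongly predictive covariate and compares the standard error of the raw difference in means with the standard error after removing the covariate's linear contribution. It uses a simple residualization step as a stand-in for a full OLS fit, relies only on the standard library, and does not reflect how olspow computes anything. All names and the simulated data are assumptions made for illustration:

```python
import random
import statistics


def diff_in_means_se(treated, control):
    """Standard error of the difference in means of two independent samples."""
    return (statistics.variance(treated) / len(treated)
            + statistics.variance(control) / len(control)) ** 0.5


def simulate(n=2000, effect=0.1, seed=0):
    """Return (unadjusted SE, covariate-adjusted SE) for one simulated test."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]      # pre-treatment covariate
    t = [rng.random() < 0.5 for _ in range(n)]   # random 50:50 assignment
    y = [3 * xi + effect * ti + rng.gauss(0, 1) for xi, ti in zip(x, t)]

    # Unadjusted comparison: what a plain two-sample test would use.
    se_raw = diff_in_means_se(
        [yi for yi, ti in zip(y, t) if ti],
        [yi for yi, ti in zip(y, t) if not ti],
    )

    # Pooled least-squares slope of y on x, then subtract the fitted part.
    mx, my = statistics.fmean(x), statistics.fmean(y)
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    resid = [yi - slope * (xi - mx) for xi, yi in zip(x, y)]

    se_adj = diff_in_means_se(
        [ri for ri, ti in zip(resid, t) if ti],
        [ri for ri, ti in zip(resid, t) if not ti],
    )
    return se_raw, se_adj


se_raw, se_adj = simulate()
print(f"unadjusted SE: {se_raw:.4f}, adjusted SE: {se_adj:.4f}")
```

Because the covariate explains most of the outcome variance in this simulation, the adjusted standard error is much smaller, which translates directly into a smaller MDE for the same number of units.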

Calling the solve_power() Method

All functionality is accessed via the solve_power() method, which returns the minimum detectable effect (MDE), power, or required sample size (as measured in number of units of experimental assignment).
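Because sample size is counted in units of experimental assignment rather than rows, it can help to check how many distinct units a historical dataset contains before running a power analysis. A minimal sketch, assuming the data is held as a list of dictionaries and using hypothetical helper names; the 100-units-per-covariate threshold comes from the guidance in the parameter notes. With a Pandas dataframe the same count would be `df[cluster].nunique()`:

```python
def count_units(rows, cluster):
    """Number of distinct units of experimental assignment in row-level data."""
    return len({row[cluster] for row in rows})


def enough_units(rows, cluster, exog, per_covariate=100):
    """Check the rule of thumb of ~100 units of assignment per covariate."""
    needed = per_covariate * max(len(exog), 1)
    return count_units(rows, cluster) >= needed


# Example: 600 rows that belong to only 150 distinct users.
rows = [{"user_id": i % 150, "orders": 1} for i in range(600)]
print(count_units(rows, "user_id"))                      # distinct units, not rows
print(enough_units(rows, "user_id", ["prior_orders"]))   # one covariate
```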

solve_power(is_ratio, data, endog, exog, numerator, denominator, cluster, ratio, alpha, mde, power, n, alternative, verbose):

is_ratio (boolean, required) : Whether the metric is a ratio of two variables. Setting this value to True makes the 'numerator' and 'denominator' arguments mandatory and the 'endog' argument superfluous. Conversely, a value of False makes the 'endog' argument mandatory and the 'numerator' and 'denominator' arguments unnecessary.
data (pandas.core.frame.DataFrame, required) : A Pandas dataframe containing (at a minimum) historical values for the response variable and the cluster key. Ideally, the dataframe contains 100 unique units of experimental assignment per covariate.
endog (string, optional) : The name of the response variable being modeled, i.e. the metric of interest. Required when 'is_ratio' = False.
exog (list of strings, required) : A list of strings representing the names of covariates that are being adjusted for (i.e. column names within the Pandas dataframe).
numerator (string, optional) : The name of the numerator variable (e.g. if the metric is items fulfilled per unit of time, then this value would be the number of items fulfilled). Required when 'is_ratio' = True. Defaults to None.
denominator (string, optional) : The name of the denominator variable (e.g. if the metric is items fulfilled per unit of time, then this value would be the amount of time). Required when 'is_ratio' = True. Defaults to None.
cluster (string, required) : The name of the column in the Pandas dataframe that serves as the cluster key (unit of experimental assignment).
ratio (float, optional) : Assumed proportion of units of assignment that are treated. Defaults to 0.5 (i.e. 50:50 assignment between treated and control).
alpha (float, optional) : A float corresponding to the significance level (false positive rate). The default value is 0.05.
mde (float, optional) : A float corresponding to the minimum detectable effect.
power (float, optional) : A float corresponding to the level of statistical power.
n (integer, optional) : An integer corresponding to the number of units of experimental assignment.
alternative (string, optional) : A string that can be 'one-sided' or 'two-sided' denoting if the experimenter is designing a one-tailed or two-tailed test. The default value is 'two-sided'.
verbose (boolean, optional) : If set to True, prints all steps in the power analysis workflow.
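The interaction between 'is_ratio', 'endog', 'numerator', and 'denominator' described above can be checked before calling solve_power(). The helper below is a hypothetical sketch of those documented rules, not part of olspow. It returns the metric-related keyword arguments, and the example column names are assumptions:

```python
def check_metric_args(is_ratio, endog=None, numerator=None, denominator=None):
    """Validate the metric arguments and return them as keyword arguments.

    Mirrors the documented rules: ratio metrics need 'numerator' and
    'denominator'; non-ratio metrics need 'endog'.
    """
    if is_ratio:
        candidates = (("numerator", numerator), ("denominator", denominator))
        missing = [name for name, value in candidates if value is None]
        kwargs = {"is_ratio": True,
                  "numerator": numerator,
                  "denominator": denominator}
    else:
        missing = ["endog"] if endog is None else []
        kwargs = {"is_ratio": False, "endog": endog}

    if missing:
        raise ValueError(f"is_ratio={is_ratio} requires: {', '.join(missing)}")
    return kwargs


# Ratio metric: items fulfilled per unit of time.
ratio_kwargs = check_metric_args(True, numerator="items", denominator="hours")

# Simple metric: a single response column.
simple_kwargs = check_metric_args(False, endog="revenue")
```

The returned dictionary could then be unpacked into a solve_power() call alongside 'data', 'exog', 'cluster', and the remaining arguments. The import path for solve_power() is not stated here, so it is left out of the sketch.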
