Advanced variance reduction methods.

Twiser is a Python library for variance reduction in A/B tests using pre-experiment covariates. It accompanies publication [1]. Its functions extend the idea of using pre-experiment data for variance reduction previously proposed in publication [2].

Installation

Only Python>=3.7 is officially supported, but older versions of Python likely work as well.

The package is installed with:

pip install twiser

See GitHub, PyPI, and Read the Docs.

Example Usage

A full demo notebook of the package is given in demo/survey_loan.ipynb. Here is a snippet of the different methods from the notebook:

Set up a predictor as a control variate

First, we need to define a regression model. We can use anything that fits the sklearn idiom of fit and predict methods. This predictor takes the n x d array of treatment unit covariates x_covariates and predicts the n-length array of treatment outcomes x. Likewise, it takes the m x d array of control unit covariates y_covariates and predicts the m-length array of control outcomes y.

from sklearn.ensemble import RandomForestRegressor

predictor = RandomForestRegressor(criterion="squared_error", random_state=0)
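
To make the control variate idea concrete, here is a minimal sketch of a generic textbook adjustment. It is not necessarily the estimator Twiser uses internally. The function name control_variate_adjust and the toy numbers are hypothetical. Each outcome has a scaled, centered prediction subtracted from it, which leaves the difference in means essentially unchanged when both groups have similar average predictions but shrinks the variance when predictions track outcomes.

```python
from statistics import mean, variance


def control_variate_adjust(x, x_pred, y, y_pred):
    """Adjust outcomes x and y using predictions as a control variate (textbook form)."""
    pooled_out = list(x) + list(y)
    pooled_pred = list(x_pred) + list(y_pred)
    mu_out, mu_pred = mean(pooled_out), mean(pooled_pred)
    # theta = cov(outcome, prediction) / var(prediction), estimated on the pooled data
    cov = sum(
        (o - mu_out) * (p - mu_pred) for o, p in zip(pooled_out, pooled_pred)
    ) / (len(pooled_out) - 1)
    theta = cov / variance(pooled_pred)
    # Center predictions on the pooled mean so both groups are shifted consistently.
    x_adj = [o - theta * (p - mu_pred) for o, p in zip(x, x_pred)]
    y_adj = [o - theta * (p - mu_pred) for o, p in zip(y, y_pred)]
    return x_adj, y_adj, theta


# Toy data (made up): predictions follow the outcomes closely in both groups.
x = [5.5, 6.5, 9.5, 10.5]
y = [2.5, 5.5, 6.5, 9.5]
x_pred = [4.0, 6.0, 8.0, 10.0]
y_pred = [4.0, 6.0, 8.0, 10.0]

x_adj, y_adj, theta = control_variate_adjust(x, x_pred, y, y_pred)
print("raw variance:", variance(x), "adjusted variance:", variance(x_adj))
```

The better the predictions track the outcomes, the more variance is removed. A predictor with no skill gives a theta near zero and leaves the outcomes almost untouched.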

Basic z-test

First, we apply the basic two-sample z-test included in Twiser. Its usage is similar to scipy.stats.ttest_ind, except that it also returns the effect estimate and a confidence interval along with the p-value.

estimate, (lb, ub), pval = twiser.ztest(x, y, alpha=0.05)
show_output(estimate, (lb, ub), pval)
ATE estimate: 0.80 in (-0.14, 1.75), CI width of 1.89, p = 0.0954
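
For readers who want to see what such a test computes, the following is a sketch of the textbook two-sample z-test, assuming the estimate is the treatment mean minus the control mean. It is not Twiser's implementation. The name two_sample_ztest is hypothetical, and statistics.NormalDist requires Python 3.8 or later.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def two_sample_ztest(x, y, alpha=0.05):
    """Textbook two-sample z-test for the difference in means, mean(x) - mean(y)."""
    estimate = mean(x) - mean(y)
    # Standard error of the difference, using the sample variance of each group
    std_err = sqrt(variance(x) / len(x) + variance(y) / len(y))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    lb, ub = estimate - z_crit * std_err, estimate + z_crit * std_err
    # Two-sided p-value under the normal approximation
    pval = 2 * (1 - NormalDist().cdf(abs(estimate) / std_err))
    return estimate, (lb, ub), pval


# Toy data (made up) to show the call pattern
estimate, (lb, ub), pval = two_sample_ztest([2.0, 4.0, 6.0, 8.0], [1.0, 3.0, 5.0, 7.0])
print(f"estimate={estimate:.2f}, CI=({lb:.2f}, {ub:.2f}), p={pval:.4f}")
```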

Variance reduction with held-out data

Next, we apply variance reduction where the predictor is trained on a held-out 30% of the data. Validity is easiest to show for this approach, but some of the added power is lost because the held-out data is not used in the test.

estimate, (lb, ub), pval = twiser.ztest_held_out_train(
  x,
  x_covariates,
  y,
  y_covariates,
  alpha=0.05,
  train_frac=0.3,
  predictor=predictor,
  random=np.random.RandomState(123),
)
show_output(estimate, (lb, ub), pval)
ATE estimate: 1.40 in (0.20, 2.59), CI width of 2.39, p = 0.0217*
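
The held-out scheme can be sketched in plain Python as follows. This is an illustration of the general idea only: the helpers split_held_out and fit_line are hypothetical, the data is synthetic, and a single-covariate least-squares line stands in for the predictor. How Twiser splits the data internally is not shown here.

```python
import random
from statistics import mean


def split_held_out(n, train_frac, rng):
    """Return (train_idx, test_idx), a random partition of range(n)."""
    idx = list(range(n))
    rng.shuffle(idx)
    n_train = int(round(train_frac * n))
    return sorted(idx[:n_train]), sorted(idx[n_train:])


def fit_line(cov, out):
    """Least-squares fit of out ~ a + b * cov for a single covariate."""
    mc, mo = mean(cov), mean(out)
    b = sum((c - mc) * (o - mo) for c, o in zip(cov, out)) / sum(
        (c - mc) ** 2 for c in cov
    )
    return mo - b * mc, b


rng = random.Random(123)
covariate = [float(i) for i in range(20)]
outcome = [2.0 * c + rng.gauss(0.0, 1.0) for c in covariate]  # synthetic outcomes

train_idx, test_idx = split_held_out(len(outcome), train_frac=0.3, rng=rng)
a, b = fit_line([covariate[i] for i in train_idx], [outcome[i] for i in train_idx])
# Residuals are computed only on units the predictor never saw during fitting.
test_residuals = [outcome[i] - (a + b * covariate[i]) for i in test_idx]
```

Only the 70% in test_idx enters the test, which is the source of the lost power.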

Variance reduction with cross validation

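To be more statistically efficient, we train and predict using 10-fold cross validation, so no data is wasted. The result is more significant than with held-out training: the confidence interval is narrower (1.74 versus 2.39) and the p-value is smaller (0.0019 versus 0.0217).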

estimate, (lb, ub), pval = twiser.ztest_cross_val_train(
  x,
  x_covariates,
  y,
  y_covariates,
  alpha=0.05,
  k_fold=10,
  predictor=predictor,
  random=np.random.RandomState(123),
)
show_output(estimate, (lb, ub), pval)
ATE estimate: 1.38 in (0.51, 2.25), CI width of 1.74, p = 0.0019*
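
The reason no data is wasted is that every unit receives a prediction from a model fitted on the other folds. Below is a minimal sketch of such out-of-fold prediction, assuming a single covariate and a least-squares line as the model. The function names are hypothetical and this is not Twiser's code.

```python
import random
from statistics import mean


def k_fold_indices(n, k, rng):
    """Shuffle range(n) and deal the indices into k folds of near-equal size."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def out_of_fold_predictions(cov, out, k, rng):
    """Predict every unit with a line fitted on the other k - 1 folds."""
    preds = [None] * len(out)
    for fold in k_fold_indices(len(out), k, rng):
        held = set(fold)
        train = [i for i in range(len(out)) if i not in held]
        mc = mean(cov[i] for i in train)
        mo = mean(out[i] for i in train)
        slope = sum((cov[i] - mc) * (out[i] - mo) for i in train) / sum(
            (cov[i] - mc) ** 2 for i in train
        )
        # Predictions for this fold come from a model that never saw it.
        for i in fold:
            preds[i] = mo + slope * (cov[i] - mc)
    return preds


rng = random.Random(123)
covariate = [float(i) for i in range(30)]
outcome = [2.0 * c + rng.gauss(0.0, 1.0) for c in covariate]  # synthetic outcomes
preds = out_of_fold_predictions(covariate, outcome, k=10, rng=rng)
```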

Variance reduction in-sample

In the literature, it is popular to train the predictor on the same sample that is used for the test. This often gives the most power, but any overfitting in the predictor can invalidate the results.

estimate, (lb, ub), pval = twiser.ztest_in_sample_train(
  x,
  x_covariates,
  y,
  y_covariates,
  alpha=0.05,
  predictor=predictor,
  random=np.random.RandomState(123),
)
show_output(estimate, (lb, ub), pval)
ATE estimate: 0.86 in (0.24, 1.49), CI width of 1.24, p = 0.0065*
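
The overfitting risk can be illustrated with a deliberately extreme toy model. The MemorizingPredictor class below is hypothetical and uses one hashable covariate per unit rather than an n x d array. It memorizes its training outcomes, so in-sample residuals have zero variance even though the covariate carries no information about the outcome.

```python
import random
from statistics import mean, pvariance


class MemorizingPredictor:
    """Deliberately overfit toy model: looks up each training outcome by covariate value."""

    def fit(self, covariates, outcomes):
        self.table_ = dict(zip(covariates, outcomes))
        self.fallback_ = mean(outcomes)  # used for covariate values never seen in fit
        return self

    def predict(self, covariates):
        return [self.table_.get(c, self.fallback_) for c in covariates]


rng = random.Random(0)
covariates = list(range(50))  # unit ids, unrelated to the outcome
outcomes = [rng.gauss(0.0, 1.0) for _ in covariates]

model = MemorizingPredictor().fit(covariates, outcomes)
in_sample_residuals = [o - p for o, p in zip(outcomes, model.predict(covariates))]
# The residual variance collapses to zero in-sample, which would make a
# confidence interval built from these residuals far too narrow.
print(pvariance(outcomes), pvariance(in_sample_residuals))
```

On covariate values it has not seen, the same model can only return the training mean, which shows that the apparent in-sample fit reflects memorization rather than predictive skill.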

Other interfaces

It is also possible to call these methods using raw control predictions instead of training the predictor inside the Twiser method. Twiser also supports a sufficient statistics interface for working with large datasets. See the documentation for details.
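
As an illustration of why sufficient statistics help with large datasets, the sketch below reduces data that arrives in chunks to a mean, a sample variance, and a count, which is all a z-test needs. The function names are hypothetical and this does not reproduce the signatures of Twiser's sufficient statistics interface, which are described in its documentation.

```python
from math import sqrt


def summarize(chunks):
    """Reduce an iterable of chunks to (mean, sample variance, count) in one pass."""
    n, total, total_sq = 0, 0.0, 0.0
    for chunk in chunks:
        n += len(chunk)
        total += sum(chunk)
        total_sq += sum(v * v for v in chunk)
    mean = total / n
    # Sum-of-squares form: simple, but can lose precision when the mean is
    # large relative to the spread.
    var = (total_sq - n * mean * mean) / (n - 1)
    return mean, var, n


def effect_and_std_err(stats_x, stats_y):
    """Difference in means and its standard error from two (mean, var, n) triples."""
    (mean_x, var_x, n_x), (mean_y, var_y, n_y) = stats_x, stats_y
    return mean_x - mean_y, sqrt(var_x / n_x + var_y / n_y)


# Toy chunks (made up); in practice each chunk could be one file or one partition.
stats_x = summarize([[2.0, 4.0], [6.0, 8.0]])
stats_y = summarize([[1.0, 3.0], [5.0, 7.0]])
estimate, std_err = effect_and_std_err(stats_x, stats_y)
```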

Support

Create a new issue.

References

License

This project is licensed under the Apache 2 License - see the LICENSE file for details.
