A/B test framework: power analysis, frequentist & Bayesian tests, CUPED, sequential testing, novelty detection

abtest-framework

A production-grade A/B testing library for data scientists. Covers the full experiment lifecycle — from power analysis through Bayesian decision-making — with a Streamlit dashboard for interactive analysis.

Features

Module	What it does
`power`	Sample size & MDE calculations for proportions and means
`frequentist`	z-test, Welch's t-test, CUPED variance reduction, Mann-Whitney, Bonferroni/BH correction
`bayesian`	Beta-Binomial & Normal-Normal Bayesian tests with expected loss
`sequential`	SPRT, always-valid p-values, novelty/primacy effect detection, SRM check

Quick Start

from abtest.power import sample_size_proportion
from abtest.frequentist import proportion_test, cuped_test
from abtest.bayesian import bayesian_proportion_test
from abtest.sequential import SPRTMonitor, detect_novelty_effect, check_srm

# 1. Pre-experiment: how many users do I need?
# mde=0.02 is an absolute lift here (10% -> 12%)
result = sample_size_proportion(baseline_rate=0.10, mde=0.02, alpha=0.05, power=0.80)
print(result)
# Per-group n: 3,841  |  Total n: 7,682

# 2. Sanity check before analysis
srm = check_srm(actual_ctrl=3841, actual_trt=3841)
print(srm["message"])  # ✅ No SRM detected

# 3. Frequentist test
# Positional args: control conversions, control n, treatment conversions, treatment n
result = proportion_test(388, 3841, 469, 3841, alpha=0.05)
print(result)
# Lift: +20.9%  |  p=0.0033  |  ✅ Significant

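# 4. CUPED variance reduction (secondary metric)
# ctrl_revenue / trt_revenue and their *_pre_revenue counterparts are placeholders
# for your own per-user data (in-experiment and pre-experiment values)
cuped_result = cuped_test(ctrl_revenue, trt_revenue, ctrl_pre_revenue, trt_pre_revenue)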
print(f"Variance reduced by {cuped_result.variance_reduction_pct}%")

# 5. Bayesian decision
bayes = bayesian_proportion_test(388, 3841, 469, 3841, loss_threshold=0.001)
print(f"P(treatment better): {bayes.prob_treatment_better:.1%}")
print(f"Decision: {bayes.decision}")
# P(treatment better): 99.8%
# Decision: ✅ Ship Treatment

# 6. Sequential monitoring (stop early when evidence is strong)
# data_stream is a placeholder for your own iterable of (control, treatment) observations
monitor = SPRTMonitor(p0=0.10, p1=0.12, alpha=0.05, beta=0.20)
for ctrl_obs, trt_obs in data_stream:
    result = monitor.update(ctrl_obs, trt_obs)
    if result.decision != "continue":
        print(result)
        break

# 7. Check for novelty effects
# ctrl_timeseries / trt_timeseries are placeholders for your own time-ordered metric values
novelty = detect_novelty_effect(ctrl_timeseries, trt_timeseries, n_windows=4)
print(novelty)
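
The figures printed in steps 1 and 3 can be cross-checked against textbook formulas using only the standard library. The sketch below is not the library's code: `two_proportion_sample_size` and `two_proportion_z_test` are hypothetical helper names, and it assumes a two-sided test, the pooled-variance sample size formula, and an absolute MDE.

```python
import math
from statistics import NormalDist

def two_proportion_sample_size(baseline_rate, mde, alpha=0.05, power=0.80):
    """Per-group n for a two-sided two-proportion z-test (mde is absolute)."""
    p1, p2 = baseline_rate, baseline_rate + mde
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / mde ** 2)

def two_proportion_z_test(x_ctrl, n_ctrl, x_trt, n_trt):
    """Return (relative lift, two-sided p-value) using a pooled standard error."""
    p_c, p_t = x_ctrl / n_ctrl, x_trt / n_trt
    pooled = (x_ctrl + x_trt) / (n_ctrl + n_trt)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_ctrl + 1 / n_trt))
    z = (p_t - p_c) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_t / p_c - 1, p_value

n = two_proportion_sample_size(0.10, 0.02)
lift, p_value = two_proportion_z_test(388, 3841, 469, 3841)
print(n)                                  # 3841
print(f"{lift:+.1%}  p={p_value:.4f}")    # +20.9%  p=0.0033
```

Reading the MDE as relative (10% -> 10.2%) would require a far larger sample, which is why the absolute reading is assumed here.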

Installation

# From PyPI
pip install abtest-framework

# From source
git clone https://github.com/yourusername/abtest-framework
cd abtest-framework
pip install -e ".[app]"

Streamlit Dashboard

streamlit run app.py

The dashboard has four pages:

Power Analysis — interactive sample size calculator with power curves
Significance Tests — proportion z-test, CUPED, multiple testing correction
Bayesian Analysis — posterior visualisation and expected loss decisions
Sequential & Novelty — SPRT monitoring chart and novelty effect detector

Project Structure

abtest-framework/
├── abtest/
│   ├── __init__.py       # Clean public API
│   ├── power.py          # Sample size & MDE calculations
│   ├── frequentist.py    # z-test, t-test, CUPED, BH correction
│   ├── bayesian.py       # Beta-Binomial & Normal-Normal Bayesian tests
│   └── sequential.py     # SPRT, always-valid p-values, novelty detection
├── tests/
│   └── test_abtest.py    # 23 unit tests (pytest)
├── examples/
│   └── full_experiment_walkthrough.py
├── app.py                # Streamlit dashboard
├── pyproject.toml        # Package config (PyPI-ready)
└── README.md

Key Concepts Implemented

CUPED (Controlled-experiment Using Pre-Experiment Data)

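A variance reduction technique from Microsoft Research (Deng et al., 2013). It regresses a pre-experiment covariate (typically the same metric measured before the experiment started) out of the experiment metric. With the optimal coefficient, metric variance shrinks by a factor of 1 - ρ², where ρ is the correlation between the covariate and the metric, so the same power is reached with fewer users.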
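
A minimal sketch of the CUPED adjustment follows, to make the idea concrete. It is an illustration, not the library's `cuped_test`: the helper name `cuped_adjust`, the pooled estimate of theta, and the synthetic data are all assumptions.

```python
import random

def mean(values):
    return sum(values) / len(values)

def variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

def cuped_adjust(y_ctrl, x_ctrl, y_trt, x_trt):
    """Return CUPED-adjusted metrics for both arms using one pooled theta."""
    y_all, x_all = y_ctrl + y_trt, x_ctrl + x_trt
    y_bar, x_bar = mean(y_all), mean(x_all)
    cov_xy = sum((x - x_bar) * (y - y_bar)
                 for x, y in zip(x_all, y_all)) / (len(x_all) - 1)
    theta = cov_xy / variance(x_all)  # regression slope of metric on covariate
    adj_ctrl = [y - theta * (x - x_bar) for y, x in zip(y_ctrl, x_ctrl)]
    adj_trt = [y - theta * (x - x_bar) for y, x in zip(y_trt, x_trt)]
    return adj_ctrl, adj_trt

# Synthetic data: the pre-period value x drives most of the in-experiment metric y
rng = random.Random(0)
x_ctrl = [rng.gauss(10, 2) for _ in range(2000)]
x_trt = [rng.gauss(10, 2) for _ in range(2000)]
y_ctrl = [0.8 * x + rng.gauss(0, 1) for x in x_ctrl]
y_trt = [0.8 * x + 0.2 + rng.gauss(0, 1) for x in x_trt]

adj_ctrl, adj_trt = cuped_adjust(y_ctrl, x_ctrl, y_trt, x_trt)
reduction = 1 - variance(adj_ctrl + adj_trt) / variance(y_ctrl + y_trt)
print(f"Variance reduced by {reduction:.0%}")
```

The adjusted values can then go through an ordinary two-sample test. Centring the covariate on its pooled mean leaves the overall metric mean unchanged.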

Expected Loss (Bayesian Decision Rule)

Instead of a binary significant/not-significant decision, compute the expected loss of shipping treatment:

E[loss | choose treatment] = average amount lost, taken over the posterior, in the cases where treatment turns out worse

Ship when the expected loss falls below a business-defined threshold.
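
The decision rule can be sketched with Monte Carlo draws from Beta posteriors. This is an illustration under stated assumptions, not the library's `bayesian_proportion_test`: the function name, the uniform Beta(1, 1) prior, and measuring loss in absolute conversion rate are all assumed here.

```python
import random

def expected_loss_beta_binomial(x_ctrl, n_ctrl, x_trt, n_trt, draws=20000, seed=0):
    """Return (P(treatment better), expected loss of shipping treatment)."""
    rng = random.Random(seed)
    wins = 0
    loss_if_ship = 0.0
    for _ in range(draws):
        # Posterior under a Beta(1, 1) prior: Beta(1 + successes, 1 + failures)
        p_c = rng.betavariate(1 + x_ctrl, 1 + n_ctrl - x_ctrl)
        p_t = rng.betavariate(1 + x_trt, 1 + n_trt - x_trt)
        wins += p_t > p_c
        # Loss is incurred only in draws where treatment is actually worse
        loss_if_ship += max(p_c - p_t, 0.0)
    return wins / draws, loss_if_ship / draws

prob_better, loss = expected_loss_beta_binomial(388, 3841, 469, 3841)
decision = "ship treatment" if loss < 0.001 else "keep testing"
print(f"P(treatment better): {prob_better:.1%}  expected loss: {loss:.6f}  -> {decision}")
```

Unlike a p-value, the expected loss weighs how wrong the decision could be, not only how likely it is to be wrong.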

SPRT (Sequential Probability Ratio Test)

Wald's sequential test that lets you stop experiments early when evidence accumulates — without inflating the false positive rate the way repeated classical testing does.
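
A minimal sketch of Wald's test for a single Bernoulli stream shows the mechanics. It is simpler than the two-arm `SPRTMonitor` in the Quick Start and is not the library's implementation: the class `BernoulliSPRT`, the helper `run_sprt`, and the decision labels are hypothetical.

```python
import math

class BernoulliSPRT:
    """Wald's SPRT for H0: p = p0 versus H1: p = p1 on 0/1 observations."""

    def __init__(self, p0, p1, alpha=0.05, beta=0.20):
        # Wald's approximate stopping boundaries on the log-likelihood ratio
        self.upper = math.log((1 - beta) / alpha)
        self.lower = math.log(beta / (1 - alpha))
        # Log-likelihood-ratio contribution of one success / one failure
        self.step_success = math.log(p1 / p0)
        self.step_failure = math.log((1 - p1) / (1 - p0))
        self.llr = 0.0
        self.n = 0

    def update(self, converted):
        self.n += 1
        self.llr += self.step_success if converted else self.step_failure
        if self.llr >= self.upper:
            return "reject_h0"
        if self.llr <= self.lower:
            return "accept_h0"
        return "continue"

def run_sprt(monitor, stream):
    """Feed observations until a boundary is crossed or the stream ends."""
    decision = "continue"
    for converted in stream:
        decision = monitor.update(converted)
        if decision != "continue":
            break
    return decision, monitor.n

print(run_sprt(BernoulliSPRT(p0=0.10, p1=0.12), [True] * 100))
print(run_sprt(BernoulliSPRT(p0=0.10, p1=0.12), [False] * 100))
```

The boundaries depend only on alpha and beta, so the stream can be checked after every observation without adjusting them.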

Novelty Effect Detection

Fits a linear regression on per-window treatment lift over time. Flags experiments where lift significantly decays — indicating users were reacting to novelty, not real value.
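
The windowed regression can be sketched as follows. This is an illustration, not the library's `detect_novelty_effect`: the helper names, the equal-sized windows, and the example series are assumptions. It returns the slope and its t-statistic only. A p-value would come from a t distribution with n_windows - 2 degrees of freedom, which the standard library does not provide.

```python
import math

def window_lifts(ctrl_series, trt_series, n_windows=4):
    """Relative lift per window; trailing points that do not fill a window are dropped."""
    size = len(ctrl_series) // n_windows
    lifts = []
    for w in range(n_windows):
        c = ctrl_series[w * size:(w + 1) * size]
        t = trt_series[w * size:(w + 1) * size]
        lifts.append(sum(t) / sum(c) - 1)
    return lifts

def lift_trend(lifts):
    """OLS slope of lift on window index, plus the slope's t-statistic (needs >= 3 windows)."""
    n = len(lifts)
    xs = range(n)
    x_bar = sum(xs) / n
    y_bar = sum(lifts) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, lifts)) / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, lifts))
    se = math.sqrt(sse / (n - 2) / sxx)
    t_stat = slope / se if se > 0 else math.copysign(math.inf, slope)
    return slope, t_stat

# Made-up daily conversion rates: treatment starts strong, then fades
ctrl = [0.10] * 8
trt = [0.130, 0.130, 0.122, 0.122, 0.115, 0.115, 0.108, 0.108]
lifts = window_lifts(ctrl, trt, n_windows=4)
slope, t_stat = lift_trend(lifts)
print([round(v, 2) for v in lifts], round(slope, 3), round(t_stat, 1))
```

A clearly negative slope with a large t-statistic in magnitude is the pattern a novelty check looks for. With only four windows the regression has two degrees of freedom, so borderline cases deserve caution.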

Sample Ratio Mismatch (SRM)

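A chi-square goodness-of-fit test that checks whether the observed user split matches the intended randomisation ratio. A significant SRM usually points to a bug in assignment, logging, or filtering, and the experiment's results should not be trusted until the cause is found.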
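
A minimal sketch of the check for a two-arm experiment follows. It is not the library's `check_srm`: the function name `srm_check`, its return shape, and the strict default threshold of 0.001 are assumptions. With one degree of freedom the chi-square tail probability has a closed form through `math.erfc`, so no extra dependencies are needed.

```python
import math

def srm_check(actual_ctrl, actual_trt, expected_ratio=0.5, alpha=0.001):
    """Return (chi2, p_value, srm_detected) for a two-arm split."""
    total = actual_ctrl + actual_trt
    exp_ctrl = total * expected_ratio
    exp_trt = total - exp_ctrl
    chi2 = ((actual_ctrl - exp_ctrl) ** 2 / exp_ctrl
            + (actual_trt - exp_trt) ** 2 / exp_trt)
    # Chi-square with 1 df is a squared standard normal: P(X > c) = erfc(sqrt(c / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value, p_value < alpha

print(srm_check(3841, 3841))   # perfectly balanced split
print(srm_check(5000, 5400))   # 400-user imbalance on a 50/50 design
```

A small threshold is assumed because SRM checks are often run repeatedly and a false alarm halts an otherwise healthy experiment.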

Testing

pytest tests/ -v
# 23 passed in 1.55s

Publishing to PyPI

pip install build twine
python -m build
twine upload dist/*

References

Deng, A., et al. (2013). Improving the Sensitivity of Online Controlled Experiments by Utilizing Pre-Experiment Data. WSDM.
Johari, R., et al. (2017). Peeking at A/B Tests: Why it matters, and what to do about it. KDD.
Wald, A. (1945). Sequential Tests of Statistical Hypotheses. Annals of Mathematical Statistics.
Kohavi, R., Tang, D., & Xu, Y. (2020). Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing. Cambridge University Press.

License

MIT


Version 0.1.0, released Jun 30, 2026

