abforge

A Python package for the full A/B test lifecycle, from power analysis to sequential monitoring and variance reduction.

Overview

abforge is a statistics library for designing and analyzing randomized controlled experiments. It was built to cover the parts of experiment design that most tutorials skip.

abforge covers four stages of an experiment.

Power analysis determines the sample size needed before data collection starts, so you do not end up with an underpowered experiment or stop too early by accident.
Hypothesis testing evaluates whether an observed difference between control and treatment is statistically significant.
Sequential testing monitors results at planned intervals during the experiment without inflating the false positive rate, which is what happens when you peek at results repeatedly without a stopping rule.
CUPED variance reduction uses pre-experiment data to make the outcome metric less noisy, so you get more statistical power out of the same sample size.
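The cost of peeking without a stopping rule can be checked with a short simulation. The sketch below does not use abforge; it simulates A/A experiments (no true effect) under a normal approximation, tests at the nominal 5% level at every look, and counts how often any look comes out "significant". The function name and the trial count are illustrative choices.

```python
import random
from statistics import NormalDist


def false_positive_rate(n_looks, n_trials=4000, alpha=0.05, seed=0):
    """Share of A/A experiments declared significant at any of n_looks peeks."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    rejections = 0
    for _ in range(n_trials):
        cumulative = 0.0
        for look in range(1, n_looks + 1):
            # Each look adds one equally sized batch of data. Under the null,
            # the standardized batch contribution is a standard normal draw.
            cumulative += rng.gauss(0.0, 1.0)
            z = cumulative / look ** 0.5  # z-statistic on all data so far
            if abs(z) > z_crit:
                rejections += 1
                break  # the experimenter stops at the first "significant" peek
    return rejections / n_trials


single_look = false_positive_rate(n_looks=1)
ten_looks = false_positive_rate(n_looks=10)
print(f"1 look:   {single_look:.3f}")
print(f"10 looks: {ten_looks:.3f}")
```

With a single look the rate stays near the nominal 5%; with ten uncorrected looks it rises well above that, which is the problem a sequential stopping rule is designed to remove.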

The demo notebook applies all four stages to a real e-commerce dataset, framed around a concrete business question: does a promotional banner increase mobile conversion rates at the Google Merchandise Store?

Why abforge?

Running a t-test is the easy part of an experiment. Before collecting any data, you need to know how many users to enroll. During the experiment, checking results without inflating the false positive rate requires a stopping rule. Getting more out of noisy metrics without collecting more data is the next challenge. abforge covers each of these steps.

Features

| Module | What it does |
| --- | --- |
| `power` | Sample size calculation, MDE estimation, power curves |
| `stats` | Two-proportion z-test, Welch's t-test, chi-square test |
| `sequential` | Sequential testing with O'Brien-Fleming and Pocock alpha spending |
| `cuped` | CUPED variance reduction using pre-experiment covariates |
| `viz` | Plotly visualizations for all of the above |

Quickstart

import abforge

# 1. How many users do I need?
n = abforge.sample_size(
    baseline_rate=0.041,        # current conversion rate
    min_detectable_effect=0.20, # detect 20% relative lift
    alpha=0.05,
    power=0.80,
)
print(f"Required sample size: {n:,} per variant")

# 2. Analyze results
result = abforge.proportions_test(
    control_conversions=417,  control_n=n,
    treatment_conversions=530, treatment_n=n,
)
print(result)

# 3. Monitor safely with sequential testing
test = abforge.SequentialTest(max_n=n, spending='obrien_fleming')
status = test.evaluate(
    control_conversions=417,  control_n=n // 2,
    treatment_conversions=530, treatment_n=n // 2,
)
print(status)

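# 4. Reduce variance with CUPED
# Each argument is a sequence of per-user values that you supply:
# the experiment-period metric and the same users' pre-experiment covariate.
cuped_result = abforge.cuped(
    control_metric=control_revenue,
    treatment_metric=treatment_revenue,
    control_covariate=control_prior_revenue,
    treatment_covariate=treatment_prior_revenue,
)
print(cuped_result)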

Installation

git clone https://github.com/aditiputtur/abforge
cd abforge
pip install -r requirements.txt

Demo Notebook

notebooks/ecommerce_ab_analysis.ipynb

The notebook is a full end-to-end analysis of the Google Analytics Merchandise Store public dataset, covering 903,653 sessions from August 2016 through August 2017.

Part 1. Consumer Behavior Analysis

Half of all sessions bounce after a single page. Overall conversion sits at 1.28%. CPM traffic produces $21 in revenue per session while organic produces $1.03. Desktop users convert at 1.67% compared to 0.41% on mobile. Despite traffic from many countries, 93% of revenue comes from the United States. December shows a conversion spike and Thursday is the strongest day of the week.

Part 2. A/B Test Simulation with abforge

The power analysis found that 104,814 sessions per variant are needed to detect a 20% relative lift in mobile conversion from its 0.41% baseline. The z-test detected a 27.1% lift at p = 0.0002. Pocock sequential boundaries were crossed at look 3 of 10. CUPED is demonstrated using pageviews as a pre-experiment covariate. A 20% mobile conversion lift translates to roughly $11,000 in additional annual revenue at current traffic levels.
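The sample size figure above can be sanity-checked by hand. The sketch below applies a textbook normal-approximation formula for comparing two proportions (pooled variance under the null, unpooled under the alternative) with a 0.41% baseline, a 20% relative lift, alpha = 0.05, and 80% power. Whether abforge uses this exact variant is an assumption; the function name is illustrative and is not part of the abforge API.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_two_proportions(baseline_rate, relative_lift, alpha=0.05, power=0.80):
    """Per-variant sample size for a two-sided, two-proportion z-test."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))           # spread under the null
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))    # spread under the alternative
    ) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


n_mobile = sample_size_two_proportions(baseline_rate=0.0041, relative_lift=0.20)
print(f"{n_mobile:,} sessions per variant")
```

The result should land within a few sessions of the number reported by the notebook; small differences between formula variants (pooled versus unpooled variance, continuity corrections) are expected.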

Module Reference

`abforge.power`

sample_size(baseline_rate, min_detectable_effect, alpha=0.05, power=0.80)
minimum_detectable_effect(baseline_rate, n, alpha=0.05, power=0.80)
power_curve(baseline_rate, effects, n, alpha=0.05)

`abforge.stats`

proportions_test(control_conversions, control_n,
                 treatment_conversions, treatment_n, alpha=0.05)
means_test(control_values, treatment_values, alpha=0.05)
chi_square_test(contingency_table, alpha=0.05)
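For orientation, a pooled two-proportion z-test can be written in a few lines of standard-library Python. This is an independent sketch, not abforge's implementation, and the returned keys are illustrative. The conversion counts are the ones from the Quickstart; pairing them with 104,814 sessions per variant is an assumption made to mirror the demo notebook.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(control_conversions, control_n,
                          treatment_conversions, treatment_n):
    """Two-sided z-test for a difference in conversion rates (pooled variance)."""
    p_c = control_conversions / control_n
    p_t = treatment_conversions / treatment_n
    pooled = (control_conversions + treatment_conversions) / (control_n + treatment_n)
    se = sqrt(pooled * (1 - pooled) * (1 / control_n + 1 / treatment_n))
    z = (p_t - p_c) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return {"relative_lift": p_t / p_c - 1, "z": z, "p_value": p_value}


result = two_proportion_z_test(
    control_conversions=417, control_n=104_814,
    treatment_conversions=530, treatment_n=104_814,
)
print(f"lift = {result['relative_lift']:.1%}, z = {result['z']:.2f}, "
      f"p = {result['p_value']:.4f}")
```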

`abforge.sequential`

SequentialTest(max_n, alpha=0.05, spending='obrien_fleming')
    .evaluate(control_conversions, control_n,
              treatment_conversions, treatment_n)
    .simulate(true_control_rate, true_treatment_rate, n_looks=10)

Spending functions available: obrien_fleming, pocock, linear

`abforge.cuped`

cuped(control_metric, treatment_metric,
      control_covariate, treatment_covariate, alpha=0.05)
check_covariate_quality(metric, covariate)

`abforge.viz`

plot_power_curve(baseline_rate, effects, n, alpha, title)
plot_test_result(result, title)
plot_sequential_boundaries(looks_data, title)
plot_cuped_comparison(cuped_result, title)

Methodology Notes

Repeatedly checking results at the nominal significance level before an experiment finishes inflates the false positive rate. Sequential testing with alpha spending functions lets you check results at planned intervals while keeping the overall false positive rate at alpha. abforge supports three spending functions: O'Brien-Fleming, which spends very little alpha early and is recommended for most experiments; Pocock, which applies roughly constant boundaries at each look; and linear, which spends alpha in proportion to the information fraction.
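The three spending functions can be compared numerically. The sketch below uses the standard Lan-DeMets forms of the O'Brien-Fleming-type and Pocock-type spending functions plus a linear rule; that abforge implements exactly these forms is an assumption, and the function names are illustrative. Each function returns the cumulative alpha spent by information fraction t (with 0 < t <= 1), not the z-boundary itself, which requires a further numerical step.

```python
from math import e, log, sqrt
from statistics import NormalDist

_norm = NormalDist()


def obrien_fleming_spend(t, alpha=0.05):
    """Lan-DeMets O'Brien-Fleming-type spending: very little alpha early on."""
    z = _norm.inv_cdf(1 - alpha / 2)
    return 2 * (1 - _norm.cdf(z / sqrt(t)))


def pocock_spend(t, alpha=0.05):
    """Lan-DeMets Pocock-type spending: alpha is spent faster early on."""
    return alpha * log(1 + (e - 1) * t)


def linear_spend(t, alpha=0.05):
    """Linear spending: alpha spent in proportion to the information fraction."""
    return alpha * t


print("   t   O'Brien-Fleming   Pocock   linear")
for t in (0.1, 0.25, 0.5, 0.75, 1.0):
    print(f"{t:4.2f}   {obrien_fleming_spend(t):15.5f}   "
          f"{pocock_spend(t):6.4f}   {linear_spend(t):6.4f}")
```

All three reach the full alpha at t = 1; they differ only in how early that budget is released, which is why O'Brien-Fleming makes early stopping hard and preserves most of the power for the final analysis.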

CUPED removes the variance explained by a pre-experiment covariate from the outcome metric. With the optimal adjustment coefficient, the metric's variance shrinks by a factor of 1 - ρ², where ρ is the correlation between the covariate and the metric, so a stronger correlation removes more variance and lowers the sample size needed to reach the same power. In practice, prior-period values of the same metric work best as covariates (Deng et al., 2013).

Deng, A., Xu, Y., Kohavi, R., and Walker, T. (2013). Improving the sensitivity of online controlled experiments by utilizing pre-experiment data. WSDM 2013.
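The CUPED adjustment described above can be sketched with the standard library alone. The code below is a minimal illustration on synthetic data, not abforge's implementation: `cuped_adjust` is a hypothetical helper, and the simulated revenue figures are made up for the example. In a real experiment the coefficient theta is usually estimated on data pooled across both variants.

```python
import random
from statistics import mean, variance


def cuped_adjust(metric, covariate):
    """Return the CUPED-adjusted metric and the coefficient theta."""
    x_bar = mean(covariate)
    y_bar = mean(metric)
    n = len(metric)
    cov_xy = sum((x - x_bar) * (y - y_bar)
                 for x, y in zip(covariate, metric)) / (n - 1)
    theta = cov_xy / variance(covariate)  # theta = cov(Y, X) / var(X)
    # Subtracting theta * (X - mean(X)) leaves the mean of Y unchanged
    # but removes the part of Y's variance that X explains.
    adjusted = [y - theta * (x - x_bar) for x, y in zip(covariate, metric)]
    return adjusted, theta


# Synthetic per-user data: current revenue correlates with prior revenue.
rng = random.Random(42)
prior = [rng.gauss(50, 10) for _ in range(5000)]
current = [0.8 * x + rng.gauss(0, 6) for x in prior]

adjusted, theta = cuped_adjust(current, prior)
reduction = 1 - variance(adjusted) / variance(current)
print(f"theta = {theta:.3f}, variance reduction = {reduction:.1%}")
```

The synthetic data is built so that the squared correlation is about 0.64, and the measured variance reduction comes out close to that value, in line with the 1 - ρ² relationship.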

Repo Structure

abforge/
├── abforge/
│   ├── __init__.py
│   ├── power.py        # Sample size and power calculations
│   ├── stats.py        # Hypothesis tests
│   ├── sequential.py   # Sequential testing and alpha spending
│   ├── cuped.py        # CUPED variance reduction
│   └── viz.py          # Plotly visualizations
├── notebooks/
│   └── ecommerce_ab_analysis.ipynb
└── requirements.txt

Project details


Version 0.1.0, released May 18, 2026.

