cluster_experiments

A Python library for end-to-end A/B testing workflows, featuring:

- Experiment analysis and scorecards
- Power analysis (simulation-based and normal approximation)
- Variance reduction techniques (CUPED, CUPAC)
- Support for complex experimental designs (cluster randomization, switchback experiments)

Key Features

1. Power Analysis

- Simulation-based: Run Monte Carlo simulations to estimate power
- Normal approximation: Fast power estimation using the central limit theorem (CLT)
- Minimum Detectable Effect: Calculate the smallest effect detectable at a given power level
- Multiple designs: Support for:
  - Simple randomization
  - Variance reduction techniques in power analysis
  - Cluster randomization
  - Switchback experiments
- Dict config: Easy to configure power analysis with a dictionary
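
To make the simulation-based approach concrete, here is a minimal sketch in plain Python that does not use the library. It assumes a non-clustered 50/50 split, a constant additive effect, and a two-sample z-test. The function `simulated_power` and its parameters are illustrative names, not part of cluster_experiments.

```python
import math
import random
from statistics import NormalDist, mean, variance


def simulated_power(values, effect, n_simulations=200, alpha=0.05, seed=0):
    """Estimate power by Monte Carlo: split, perturb, test, count rejections."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_simulations):
        # Random 50/50 split of units into control and treatment
        shuffled = values[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        control = shuffled[:half]
        # Perturb the treatment group with a constant additive effect
        treatment = [v + effect for v in shuffled[half:]]
        # Two-sample z-test on the difference in means
        se = math.sqrt(
            variance(control) / len(control) + variance(treatment) / len(treatment)
        )
        z = (mean(treatment) - mean(control)) / se
        rejections += abs(z) > z_crit
    return rejections / n_simulations


data_rng = random.Random(42)
outcomes = [data_rng.gauss(0, 1) for _ in range(1000)]
power_null = simulated_power(outcomes, effect=0.0)
power_large = simulated_power(outcomes, effect=0.5)
print(power_null, power_large)
```

With no effect, the estimate should sit near the significance level; with a large effect, it should approach 1.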

2. Experiment Analysis

- Analysis Plans: Define structured analysis plans
- Metrics:
  - Simple metrics
  - Ratio metrics
- Dimensions: Slice results by dimensions
- Statistical Methods:
  - GEE
  - Mixed Linear Models
  - Clustered / regular OLS
  - T-tests
  - Synthetic Control
- Dict config: Easy to define analysis plans with a dictionary

3. Variance Reduction

- CUPED (Controlled-experiment Using Pre-Experiment Data):
  - Use historical outcome data to reduce variance, at any granularity
  - Support for several covariates
- CUPAC (Control Using Predictors as Covariates):
  - Use any scikit-learn compatible estimator to predict the outcome with pre-experiment data
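
As a rough illustration of the idea behind CUPED, the sketch below adjusts an outcome with a single pre-experiment covariate using the textbook formula `y - theta * (x - mean(x))`, where `theta = cov(x, y) / var(x)`. It is written in plain Python under those assumptions; `cuped_adjust` is a hypothetical helper, not the library's implementation.

```python
import random
from statistics import mean, variance


def cuped_adjust(outcome, covariate):
    """Return CUPED-adjusted outcomes: y - theta * (x - mean(x))."""
    x_bar, y_bar = mean(covariate), mean(outcome)
    # theta is the OLS slope of the outcome on the covariate
    cov_xy = sum(
        (x - x_bar) * (y - y_bar) for x, y in zip(covariate, outcome)
    ) / (len(outcome) - 1)
    theta = cov_xy / variance(covariate)
    return [y - theta * (x - x_bar) for x, y in zip(covariate, outcome)]


rng = random.Random(0)
pre = [rng.gauss(100, 10) for _ in range(2000)]   # pre-experiment outcome
post = [0.8 * x + rng.gauss(20, 5) for x in pre]  # correlated in-experiment outcome
adjusted = cuped_adjust(post, pre)
print(variance(post), variance(adjusted))
```

The adjustment leaves the mean unchanged and removes the part of the variance explained by the covariate, so the stronger the correlation with pre-experiment data, the larger the reduction.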

Quick Start

Power Analysis Example

import numpy as np
import pandas as pd
from cluster_experiments import PowerAnalysis, NormalPowerAnalysis

# Create sample data
N = 1_000
df = pd.DataFrame({
    "target": np.random.normal(0, 1, size=N),
    "date": pd.to_datetime(
        np.random.randint(
            pd.Timestamp("2024-01-01").value,
            pd.Timestamp("2024-01-31").value,
            size=N,
        )
    ),
})

# Simulation-based power analysis (OLS analysis, constant perturbation, non-clustered split)
config = {
    "analysis": "ols",
    "perturbator": "constant",
    "splitter": "non_clustered",
    "n_simulations": 50,
}
pw = PowerAnalysis.from_dict(config)
power = pw.power_analysis(df, average_effect=0.1)

# Normal approximation (faster)
npw = NormalPowerAnalysis.from_dict({
    "analysis": "ols",
    "splitter": "non_clustered",
    "n_simulations": 5,
    "time_col": "date",
})
power_normal = npw.power_analysis(df, average_effect=0.1)
power_line_normal = npw.power_line(df, average_effects=[0.1, 0.2, 0.3])


# MDE calculation
mde = npw.mde(df, power=0.8)

# MDE for several experiment lengths
mde_timeline = npw.mde_time_line(
    df,
    powers=[0.8],
    experiment_length=[7, 14, 21]
)

print(power, power_line_normal, power_normal, mde, mde_timeline)
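
For intuition about what a normal-approximation power calculation involves, the following sketch computes two-sided power and the minimum detectable effect from a standard error alone. It uses only the Python standard library. `normal_power` and `normal_mde` are illustrative names, and the standard error is assumed to be known here, whereas in practice it would come from the chosen analysis method.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def normal_power(effect, standard_error, alpha=0.05):
    """Two-sided power under a normal approximation of the estimator."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = abs(effect) / standard_error
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


def normal_mde(standard_error, power=0.8, alpha=0.05):
    """Smallest effect detectable with the requested power (ignores the far tail)."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return (z_alpha + z_power) * standard_error


se = (2 / 500) ** 0.5  # unit-variance outcome, 500 units per group
mde = normal_mde(se)
print(mde, normal_power(mde, se))
```

Plugging the MDE back into `normal_power` recovers the requested power up to the negligible far-tail term, which is a quick consistency check on the two formulas.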

Experiment Analysis Example

import numpy as np
import pandas as pd
from cluster_experiments import AnalysisPlan

N = 1_000
experiment_data = pd.DataFrame({
    "order_value": np.random.normal(100, 10, size=N),
    "delivery_time": np.random.normal(10, 1, size=N),
    "experiment_group": np.random.choice(["control", "treatment"], size=N),
    "city": np.random.choice(["NYC", "LA"], size=N),
    "customer_id": np.random.randint(1, 100, size=N),
    "customer_age": np.random.randint(20, 60, size=N),
})

# Create analysis plan
plan = AnalysisPlan.from_metrics_dict({
    "metrics": [
        {"alias": "AOV", "name": "order_value"},
        {"alias": "delivery_time", "name": "delivery_time"},
    ],
    "variants": [
        {"name": "control", "is_control": True},
        {"name": "treatment", "is_control": False},
    ],
    "variant_col": "experiment_group",
    "alpha": 0.05,
    "dimensions": [
        {"name": "city", "values": ["NYC", "LA"]},
    ],
    "analysis_type": "clustered_ols",
    "analysis_config": {"cluster_cols": ["customer_id"]},
})
# Run analysis
print(plan.analyze(experiment_data).to_dataframe())

Variance Reduction Example

import numpy as np
import pandas as pd
from cluster_experiments import (
    AnalysisPlan,
    SimpleMetric,
    Variant,
    Dimension,
    TargetAggregation,
    HypothesisTest
)

N = 1000

experiment_data = pd.DataFrame({
    "order_value": np.random.normal(100, 10, size=N),
    "delivery_time": np.random.normal(10, 1, size=N),
    "experiment_group": np.random.choice(["control", "treatment"], size=N),
    "city": np.random.choice(["NYC", "LA"], size=N),
    "customer_id": np.random.randint(1, 100, size=N),
    "customer_age": np.random.randint(20, 60, size=N),
})

pre_experiment_data = pd.DataFrame({
    "order_value": np.random.normal(100, 10, size=N),
    "customer_id": np.random.randint(1, 100, size=N),
})

# Define test
cupac_model = TargetAggregation(
    agg_col="customer_id",
    target_col="order_value"
)

hypothesis_test = HypothesisTest(
    metric=SimpleMetric(alias="AOV", name="order_value"),
    analysis_type="clustered_ols",
    analysis_config={
        "cluster_cols": ["customer_id"],
        "covariates": ["customer_age", "estimate_order_value"],
    },
    cupac_config={
        "cupac_model": cupac_model,
        "target_col": "order_value",
    },
)

# Create analysis plan
plan = AnalysisPlan(
    tests=[hypothesis_test],
    variants=[
        Variant("control", is_control=True),
        Variant("treatment", is_control=False),
    ],
    variant_col="experiment_group",
)

# Run analysis
results = plan.analyze(experiment_data, pre_experiment_data)
print(results.to_dataframe())
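
The example above lists a covariate named `estimate_order_value`, which appears to be the CUPAC prediction of `order_value`. As a simplified picture of what a target-aggregation model could do, the sketch below computes the mean pre-experiment target per customer and attaches it to the experiment rows. The helpers `fit_target_means` and `add_estimate` are hypothetical, and the library's own implementation may differ, for instance in how it treats customers with few observations.

```python
from collections import defaultdict
from statistics import mean


def fit_target_means(pre_rows, key, target):
    """Mean pre-experiment target per key, plus a global fallback mean."""
    grouped = defaultdict(list)
    for row in pre_rows:
        grouped[row[key]].append(row[target])
    global_mean = mean(row[target] for row in pre_rows)
    return {k: mean(v) for k, v in grouped.items()}, global_mean


def add_estimate(rows, key, means, global_mean, out_col):
    """Attach the pre-experiment mean as a covariate; unseen keys get the global mean."""
    return [{**row, out_col: means.get(row[key], global_mean)} for row in rows]


pre = [
    {"customer_id": 1, "order_value": 90.0},
    {"customer_id": 1, "order_value": 110.0},
    {"customer_id": 2, "order_value": 80.0},
]
experiment = [
    {"customer_id": 1, "order_value": 105.0},
    {"customer_id": 3, "order_value": 95.0},
]
means, fallback = fit_target_means(pre, "customer_id", "order_value")
enriched = add_estimate(experiment, "customer_id", means, fallback, "estimate_order_value")
print(enriched)
```

Customer 1 receives its own historical mean, while customer 3, unseen before the experiment, falls back to the global mean.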

Installation

You can install this package via pip.

pip install cluster-experiments

For detailed documentation and examples, visit our documentation site.
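
After installing, one quick way to confirm which version is available in the current environment is to query the package metadata from the standard library. `installed_version` is an illustrative helper; it returns `None` rather than raising when the distribution is not installed.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution="cluster-experiments"):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


print(installed_version())
```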

Features

The library offers the following classes:

Regarding power analysis:
- PowerAnalysis: to run power analysis on any experiment design, using simulation
- PowerAnalysisWithPreExperimentData: to run power analysis on a clustered/switchback design, but adding pre-experiment df during split and perturbation (especially useful for Synthetic Control)
- NormalPowerAnalysis: to run power analysis on any experiment design using the central limit theorem for the distribution of the estimator. It can be used to compute the minimum detectable effect (MDE) for a given power level.
- ConstantPerturbator: to artificially perturb treated group with constant perturbations
- BinaryPerturbator: to artificially perturb treated group for binary outcomes
- RelativePositivePerturbator: to artificially perturb treated group with relative positive perturbations
- RelativeMixedPerturbator: to artificially perturb treated group with relative perturbations for positive and negative targets
- NormalPerturbator: to artificially perturb treated group with normal distribution perturbations
- BetaRelativePositivePerturbator: to artificially perturb treated group with relative positive beta distribution perturbations
- BetaRelativePerturbator: to artificially perturb treated group with relative beta distribution perturbations in a specified support interval
- SegmentedBetaRelativePerturbator: to artificially perturb treated group with relative beta distribution perturbations in a specified support interval, but using clusters
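
To illustrate what perturbing the treated group means, here is a minimal sketch of a constant and a relative perturbation on plain lists. The function names and the boolean `treated` mask are assumptions made for the example; the library's perturbator classes work on DataFrames and may differ in detail.

```python
def constant_perturbation(targets, treated, effect):
    """Add a fixed effect to the target of every treated row."""
    return [t + effect if is_treated else t for t, is_treated in zip(targets, treated)]


def relative_perturbation(targets, treated, relative_effect):
    """Scale the target of every treated row by (1 + relative_effect)."""
    return [
        t * (1 + relative_effect) if is_treated else t
        for t, is_treated in zip(targets, treated)
    ]


targets = [10.0, 20.0, 30.0, 40.0]
treated = [True, False, True, False]
print(constant_perturbation(targets, treated, effect=1.0))           # [11.0, 20.0, 31.0, 40.0]
print(relative_perturbation(targets, treated, relative_effect=0.5))  # [15.0, 20.0, 45.0, 40.0]
```
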
Regarding splitting data:
- ClusteredSplitter: to split data based on clusters
- FixedSizeClusteredSplitter: to split data based on clusters with a fixed size (example: only 1 treatment cluster and the rest in control)
- BalancedClusteredSplitter: to split data based on clusters in a balanced way
- NonClusteredSplitter: Regular data splitting, no clusters
- StratifiedClusteredSplitter: to split based on clusters and strata, balancing the number of clusters in each stratum
- RepeatedSampler: for backtests where we have access to counterfactuals, does not split the data, just duplicates the data for all groups
- Switchback splitters (the same can be done with clustered splitters, but there is a convenient way to define switchback splitters using switch frequency):
  - SwitchbackSplitter: to split data based on clusters and dates, for switchback experiments
  - BalancedSwitchbackSplitter: to split data based on clusters and dates, for switchback experiments, balancing treatment and control among all clusters
  - StratifiedSwitchbackSplitter: to split data based on clusters and dates, for switchback experiments, balancing the number of clusters in each stratum
  - Washover for switchback experiments:
    - EmptyWashover: no washover done at all.
    - ConstantWashover: accepts a timedelta parameter and removes the data when we switch from A to B for the timedelta interval.
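
The switchback and washover ideas above can be sketched without the library as follows. The example assumes that each (cluster, time window) pair is randomized independently, that windows are aligned to the Unix epoch, and that a switch is only detected when the previous window has data. `switchback_assign` and `constant_washover` are hypothetical names used for illustration.

```python
import random
from datetime import datetime, timedelta, timezone


def switchback_assign(rows, switch_hours, seed=0):
    """Assign every (cluster, time window) pair to variant A or B at random."""
    rng = random.Random(seed)
    window_seconds = switch_hours * 3600
    assignment = {}
    assigned = []
    for row in rows:
        window = int(row["time"].timestamp() // window_seconds)
        key = (row["cluster"], window)
        if key not in assignment:
            assignment[key] = rng.choice(["A", "B"])
        assigned.append({**row, "window": window, "treatment": assignment[key]})
    return assigned


def constant_washover(rows, switch_hours, washover):
    """Drop rows that fall within `washover` after a change of variant."""
    window_seconds = switch_hours * 3600
    treatment_of = {(r["cluster"], r["window"]): r["treatment"] for r in rows}
    kept = []
    for r in rows:
        previous = treatment_of.get((r["cluster"], r["window"] - 1))
        switched = previous is not None and previous != r["treatment"]
        window_start = datetime.fromtimestamp(r["window"] * window_seconds, tz=timezone.utc)
        if switched and r["time"] - window_start < washover:
            continue  # still inside the washover interval
        kept.append(r)
    return kept


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
rows = [
    {"cluster": c, "time": start + timedelta(minutes=30 * i)}
    for c in ("north", "south")
    for i in range(96)
]
assigned = switchback_assign(rows, switch_hours=4)
clean = constant_washover(assigned, switch_hours=4, washover=timedelta(hours=1))
print(len(assigned), len(clean))
```

Rows in the same cluster and window always share a variant, and only rows recorded shortly after a change of variant are discarded.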
Regarding analysis methods:
- GeeExperimentAnalysis: to run GEE analysis on the results of a clustered design
- MLMExperimentAnalysis: to run Mixed Linear Model analysis on the results of a clustered design
- TTestClusteredAnalysis: to run a t-test on aggregated data for clusters
- PairedTTestClusteredAnalysis: to run a paired t-test on aggregated data for clusters
- ClusteredOLSAnalysis: to run OLS analysis on the results of a clustered design
- OLSAnalysis: to run OLS analysis for non-clustered data
- DeltaMethodAnalysis: to run Delta Method Analysis for clustered designs
- TargetAggregation: to add pre-experimental data of the outcome to reduce variance
- SyntheticControlAnalysis: to run synthetic control analysis
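
As an example of the cluster-aggregated approach, the sketch below reduces the data to one mean per cluster and then computes Welch's t statistic on those means. It is a plain-Python illustration under the assumption that each cluster belongs to exactly one variant; `clustered_t_statistic` is not a library function, and a p-value would additionally require a t distribution.

```python
import math
from collections import defaultdict
from statistics import mean, variance


def clustered_t_statistic(rows, cluster_col, treatment_col, target_col):
    """Aggregate the target to cluster means, then compute Welch's t statistic."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[(row[cluster_col], row[treatment_col])].append(row[target_col])
    # One observation per cluster: its mean target
    control = [mean(v) for (_, arm), v in grouped.items() if arm == "control"]
    treatment = [mean(v) for (_, arm), v in grouped.items() if arm == "treatment"]
    diff = mean(treatment) - mean(control)
    se = math.sqrt(
        variance(control) / len(control) + variance(treatment) / len(treatment)
    )
    return diff / se


raw = [
    (1, "control", [9.0, 11.0]),
    (2, "control", [10.0, 12.0]),
    (3, "control", [11.0, 13.0]),
    (4, "treatment", [12.0, 14.0]),
    (5, "treatment", [13.0, 15.0]),
    (6, "treatment", [14.0, 16.0]),
]
rows = [
    {"cluster": c, "group": arm, "target": value}
    for c, arm, values in raw
    for value in values
]
t_stat = clustered_t_statistic(rows, "cluster", "group", "target")
print(t_stat)
```

Aggregating first means the test treats clusters, not rows, as the independent units, which is the point of analysing a clustered design this way.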
Regarding experiment analysis workflow:
- Metric: abstract class to define a metric to be used in the analysis
- SimpleMetric: to create a metric defined at the same level as the data used for the analysis
- RatioMetric: to create a metric defined at a lower level than the data used for the analysis
- Variant: to define a variant of the experiment
- Dimension: to define a dimension to slice the results of the experiment
- HypothesisTest: to define a Hypothesis Test with a metric, analysis method, optional analysis configuration, and optional dimensions
- AnalysisPlan: to define a plan of analysis with a list of Hypothesis Tests for a dataset and the experiment variants. The analyze() method runs the analysis and returns the results
- AnalysisResults: to store the results of an analysis
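
For ratio metrics, a common way to obtain a standard error is the delta method applied to per-cluster numerator and denominator totals. The sketch below shows the textbook first-order formula in plain Python. `ratio_with_delta_se` is an illustrative helper and is not claimed to match how the library computes it.

```python
import math
from statistics import mean, variance


def ratio_with_delta_se(numerators, denominators):
    """Ratio of sums and its delta-method standard error across clusters."""
    n = len(numerators)
    mu_n, mu_d = mean(numerators), mean(denominators)
    ratio = mu_n / mu_d
    cov_nd = sum(
        (a - mu_n) * (b - mu_d) for a, b in zip(numerators, denominators)
    ) / (n - 1)
    # First-order Taylor expansion of N / D around (mu_n, mu_d)
    var_ratio = (
        variance(numerators) - 2 * ratio * cov_nd + ratio**2 * variance(denominators)
    ) / (n * mu_d**2)
    return ratio, math.sqrt(max(var_ratio, 0.0))


order_value_per_customer = [100.0, 220.0, 290.0, 410.0]  # numerator totals
orders_per_customer = [1.0, 2.0, 3.0, 4.0]               # denominator totals
ratio, se = ratio_with_delta_se(order_value_per_customer, orders_per_customer)
print(ratio, se)
```

When the numerator is exactly proportional to the denominator in every cluster, the formula returns a standard error of zero, as expected for a ratio that does not vary.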
Other:
- PowerConfig: to conveniently configure PowerAnalysis class
- ConfidenceInterval: to store the data representation of a confidence interval
- InferenceResults: to store the structure of complete statistical analysis results

Project details

Development Status
- 4 - Beta
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Operating System
- OS Independent
Release history

- 0.30.0: Mar 3, 2026
- 0.29.0: Dec 22, 2025
- 0.28.0: Oct 17, 2025
- 0.27.0: Aug 24, 2025
- 0.26.0: May 16, 2025 (this version)
- 0.25.0: Feb 11, 2025
- 0.24.0: Jan 15, 2025
- 0.23.0: Dec 20, 2024
- 0.22.0: Dec 20, 2024
- 0.21.0: Dec 17, 2024
- 0.20.2: Dec 13, 2024
- 0.20.1: Dec 12, 2024
- 0.20.0: Nov 7, 2024
- 0.19.0: Jun 21, 2024
- 0.18.0: Jun 14, 2024
- 0.17.0: Jun 14, 2024
- 0.16.0: Jun 12, 2024
- 0.15.0: May 27, 2024
- 0.14.1: May 3, 2024
- 0.14.0: Mar 4, 2024
- 0.13.0: Feb 28, 2024
- 0.12.0: Feb 6, 2024
- 0.11.0: Jan 12, 2024
- 0.10.4: Jun 23, 2023
- 0.10.3: May 31, 2023
- 0.10.2: May 31, 2023
- 0.10.1: May 29, 2023
- 0.10.0: May 26, 2023
- 0.9.1: May 26, 2023
- 0.9.0: May 26, 2023
- 0.8.5: May 17, 2023
- 0.8.4: May 17, 2023
- 0.8.3: May 17, 2023
- 0.8.2: May 12, 2023
- 0.8.1: May 10, 2023
- 0.8.0: May 10, 2023
- 0.7.1: May 9, 2023
- 0.7.0: May 9, 2023
- 0.6.5: May 8, 2023
- 0.6.4: May 8, 2023
- 0.6.3: Apr 28, 2023
- 0.6.2: Apr 14, 2023
- 0.6.1: Apr 12, 2023
- 0.6.0: Mar 9, 2023
- 0.5.4: Mar 1, 2023
- 0.5.3: Feb 10, 2023
- 0.5.2: Feb 10, 2023
- 0.5.1: Feb 8, 2023
- 0.5.0: Dec 30, 2022
- 0.4.1: Dec 27, 2022
- 0.4.0: Dec 23, 2022
- 0.3.5: Dec 23, 2022
- 0.3.4: Dec 19, 2022
- 0.3.3: Dec 9, 2022
- 0.3.2: Nov 15, 2022
- 0.3.1: Nov 9, 2022
- 0.3.0: Nov 7, 2022
- 0.2.8: Nov 2, 2022
- 0.2.7: Oct 26, 2022
- 0.2.6: Oct 25, 2022
- 0.2.5: Oct 24, 2022
- 0.2.4: Oct 24, 2022
- 0.2.3: Oct 21, 2022
- 0.2.2: Oct 11, 2022
- 0.2.1: Oct 7, 2022
- 0.2.0: Oct 3, 2022
- 0.1.7: Oct 2, 2022
- 0.1.6: Sep 30, 2022
- 0.1.5: Sep 29, 2022
- 0.1.4: Sep 26, 2022
- 0.1.3: Sep 19, 2022
- 0.1.2: Sep 13, 2022
- 0.1.1: Sep 5, 2022
- 0.1.0: Sep 2, 2022

Download files

Source Distribution

cluster_experiments-0.26.0.tar.gz (77.3 kB), uploaded May 16, 2025, Source

Built Distribution

cluster_experiments-0.26.0-py3-none-any.whl (98.0 kB), uploaded May 16, 2025, Python 3

File details

Details for the file cluster_experiments-0.26.0.tar.gz.

File metadata

Download URL: cluster_experiments-0.26.0.tar.gz
Upload date: May 16, 2025
Size: 77.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for cluster_experiments-0.26.0.tar.gz
Algorithm	Hash digest
SHA256	`cf44c1d14aad204b64bec3df3743a3ba54f8d80fee2c09908660ffa139cb348a`
MD5	`4302b128ad0b6b49a9a6cd6f2d2a22e3`
BLAKE2b-256	`fb1815b030f7e429e6901720c5cc2792ec7cd72aec27e47d026c8518a59bd2af`

File details

Details for the file cluster_experiments-0.26.0-py3-none-any.whl.

File metadata

Download URL: cluster_experiments-0.26.0-py3-none-any.whl
Upload date: May 16, 2025
Size: 98.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for cluster_experiments-0.26.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`4eec29db3cb32a2cd1dd53e1b700fd63177c52f0588639e491b9a0fabe6aaf63`
MD5	`e0e00dab53a5491a7addb4991ce7b468`
BLAKE2b-256	`9e727716402297413a0d9c3c1bf6132675226c4dc63589123215aecaf1f570c6`
