A low-code Python library for enterprise-grade experiment design, classical DoE, and statistical analysis.

xpyrment 🧪

xpyrment is an enterprise-grade, low-code Python library designed for experiment design, classical Design of Experiments (DoE), and statistical causal inference.

It provides an object-oriented fluent API that covers the full lifecycle of digital experimentation (A/B testing), together with the statistical techniques used on enterprise-scale experimentation platforms. It has native support for CUPED (variance reduction), ratio metrics via the Delta method, multiple comparison corrections, Sample Ratio Mismatch (SRM) diagnostics, continuous monitoring with the mixture sequential probability ratio test (mSPRT), Bayesian inference, and classical industrial DoE design matrices.


🌟 Key Features

  • Unified Fluent Orchestrator API: Initialize experiments, define metric structures, run statistical evaluations, and compile publication-ready summaries or plots in a clean, state-gated object-oriented pipeline.
  • Rigorous Variance Reduction (CUPED): Built-in support for standard CUPED (continuous metrics) and Ratio CUPED (numerator and denominator adjustment). Can reduce variance, and with it the required sample size, by up to 88%.
  • Ratio Metric Precision: Precise variance estimation of ratio metrics (e.g., CTR, revenue per click) where both numerator and denominator are stochastic, using first-order Taylor expansion (Delta method).
  • Classical Design of Experiments (DoE): Full and Fractional Factorial, Plackett-Burman, Taguchi Orthogonal Arrays, Definitive Screening Designs (DSD), Response Surface Methodologies (CCD & Box-Behnken), and D-Optimal coordinate exchange.
  • Continuous Monitoring & Early Stopping: Always-valid confidence intervals and sequential monitoring boundaries via mixture SPRT (mSPRT) and Pocock/O'Brien-Fleming alpha-spending functions.
  • Experimental Diagnostics: Built-in automated Chi-square tests to detect Sample Ratio Mismatch (SRM), pre-experiment covariate balance validation with Standardized Mean Differences (SMD), and time-series novelty/primacy effect detectors.
  • Multi-Testing Correction: Guard against Type I error inflation by automatically adjusting p-values for multiple metrics using Holm-Bonferroni, Bonferroni, or Benjamini-Hochberg (FDR).
  • Multi-Armed Bandits & Adaptive Traffic: Dynamically allocate traffic using Beta-Binomial / Normal-Normal Thompson Sampling, standard/decaying $\varepsilon$-Greedy, and classical UCB1 optimistic exploration. Supports sliding-window and discounted Thompson Sampling for drifting baselines.
  • Heterogeneous Treatment Effects (HTE): Personalize variant targeting using CATE estimators (S-Learner, T-Learner, and propensity-weighted X-Learner) alongside custom bootstrapped Causal Forests.
  • Synthetic Controls & Quasi-Experiments: Analyze unrandomized policy deployments using Abadie SLSQP-constrained Synthetic Controls, multi-variable Difference-in-Differences (DiD) regressions, and Synthetic DiD (SDID).
  • Premium Standalone Reports: Instantly export summaries into beautiful, portable, responsive CSS-styled HTML dashboards and GitHub-compatible Markdown summary tables.
  • Audit Trail Security: Cryptographically chain and sign state updates via a SHA-256 tamper-evident ledger, ensuring experiment metadata and configuration parameters remain auditable.
  • Interactive CLI Toolchain: Perform analytical power sizing, calculate Standardized Mean Differences (SMD) on pre-period covariates, and run rapid ordinary least squares regressions directly from your terminal.
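
The tamper-evident ledger mentioned under Audit Trail Security can be illustrated with a plain hash chain. This is a minimal sketch that uses only Python's `hashlib` and `json`. The names `append_entry` and `verify_chain` are hypothetical and are not xpyrment's API, and signing is left out.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, payload: dict) -> str:
    """SHA-256 over the previous hash plus a canonical JSON payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(ledger: list, payload: dict) -> None:
    """Append a state update, chaining it to the previous entry's hash."""
    prev_hash = ledger[-1]["hash"] if ledger else GENESIS
    ledger.append(
        {"payload": payload, "prev": prev_hash, "hash": _digest(prev_hash, payload)}
    )


def verify_chain(ledger: list) -> bool:
    """Recompute every hash; an edited payload breaks the chain."""
    prev_hash = GENESIS
    for entry in ledger:
        expected = _digest(prev_hash, entry["payload"])
        if entry["prev"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True


ledger = []
append_entry(ledger, {"event": "setup", "alpha": 0.05})
append_entry(ledger, {"event": "add_metric", "name": "revenue"})
intact = verify_chain(ledger)          # chain verifies before tampering

ledger[0]["payload"]["alpha"] = 0.10   # simulate a silent configuration change
tampered_ok = verify_chain(ledger)     # chain no longer verifies
```

Because each hash covers the previous one, changing any earlier entry invalidates that entry and everything chained after it.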

⚙️ Installation

To install the stable release of xpyrment from PyPI, simply run:

pip install xpyrment

For development and contributor setups (including pytest, black, and mypy), clone the repository and install in editable mode:

git clone https://github.com/sadatian/xpyrment.git
cd xpyrment
pip install -e ".[dev]"  # quotes keep shells such as zsh from expanding the brackets

🚀 Quickstart Tutorial

This quickstart guides you through the entire A/B testing lifecycle: designing, simulating, configuring, and analyzing.

1. Experiment Design (Power Analysis)

Before launching your test, calculate the sample size required to detect a 5% relative lift in a key continuous metric (e.g., Average Order Value = $100, standard deviation = $35).

import xpyrment as xp

# Calculate sample size for a standard t-test
design = xp.design_experiment(
    metric_type="mean",
    baseline_value=100.0,
    standard_deviation=35.0,
    mde=0.05,                  # 5% relative lift
    mde_type="relative",
    alpha=0.05,                # Significance level (Type I error)
    power=0.80,                # Target power (1 - Type II error)
    pre_post_correlation=0.75, # Optional: Pre-Post correlation to calculate CUPED savings!
    daily_traffic=5000         # Optional: Daily user traffic to calculate duration
)

print(design)

Output:

=========================================
       Experiment Design Summary        
=========================================
Metric Type                   : Mean
Baseline Value                : 100.0000
Target MDE (Absolute)         : 5.0000
Target MDE (Relative)         : 5.00%
Significance Level (Alpha)    : 5.00%
Statistical Power (1-Beta)    : 80.00%
Sample Size Per Variant       : 1,537
Total Sample Size Required    : 3,074
Pre-Post Correlation          : 0.75
CUPED Sample Size Per Variant : 672
CUPED Total Sample Size       : 1,344
CUPED Sample Size Savings     : 56.3%
Daily Traffic                 : 5,000/day
Estimated Duration (Standard) : 0.6 days
Estimated Duration (CUPED)    : 0.3 days
=========================================
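
The CUPED rows in this summary are consistent with scaling the standard sample size by $1 - \rho^2$, the variance factor given in the Mathematical Framework section. The snippet below is a small arithmetic check under that assumption. The rounding rule is a guess, and the constants are copied from the summary rather than recomputed.

```python
rho = 0.75             # pre_post_correlation passed to the design call
n_standard = 1537      # "Sample Size Per Variant" from the summary
daily_traffic = 5000   # users per day, shared by both variants

retained = 1 - rho ** 2                   # fraction of the sample still needed
n_cuped = round(n_standard * retained)    # per-variant size after CUPED

days_standard = 2 * n_standard / daily_traffic
days_cuped = 2 * n_cuped / daily_traffic

print(retained, n_cuped, round(days_standard, 1), round(days_cuped, 1))
# prints: 0.4375 672 0.6 0.3
```

In other words, with a pre/post correlation of 0.75 the CUPED design needs about 44% of the users that the unadjusted design needs.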

Visualizing Power Curves

Generate coordinates and plot required sample sizes against a range of MDEs to see the impact of CUPED:

# Generate power curve coordinates
curve_data = xp.generate_power_curve_data(
    metric_type="mean",
    baseline_value=100.0,
    standard_deviation=35.0,
    pre_post_correlation=0.75
)

# Plot standard vs. CUPED required sample sizes
xp.plot_power_curve(curve_data)

2. Generate Synthetic A/B Test Data

Let's generate simulated experimental data of 10,000 users split 50/50, complete with pre-period covariates so we can demonstrate CUPED and ratio metric evaluations:

df = xp.generate_ab_data(
    n_samples=10000,
    treatment_effect_revenue=2.5,       # +$2.50 absolute lift
    treatment_effect_conversion=0.015,  # +1.5 percentage points (absolute lift)
    treatment_effect_clicks=0.06,       # +6% relative lift in the click ratio
    pre_period_correlation=0.82,        # Correlation between pre- and post-period values
    random_seed=42
)

print(df.head())

Output:

user_id      variant    pre_revenue  revenue  converted  pre_clicks  pre_impressions  clicks  impressions
USER_000001  control           4.47     4.42          0           4               93       5           96
USER_000002  treatment        56.78    61.22          1           6              112       8          108
USER_000003  control          51.12    48.91          0           5              105       3           99
USER_000004  treatment        32.54    36.90          0           3               82       4           88

3. Setup and Run Analysis

Initialize the experiment environment using the setup function, define your metrics (with pre-period specifications for automatic CUPED), and run your analysis!

# 1. Initialize experiment setup
exp = xp.setup(
    data=df, 
    treatment_col="variant", 
    id_col="user_id"
)

# 2. Define your metrics
# Continuous metric (Average revenue) with automatic CUPED!
revenue = xp.MeanMetric(
    name="Average Revenue per User", 
    value_col="revenue", 
    pre_period_col="pre_revenue"
)

# Proportion metric (Conversion rate)
conversion = xp.ProportionMetric(
    name="Purchase Conversion Rate", 
    value_col="converted"
)

# Ratio metric (Click-Through-Rate = sum(clicks)/sum(impressions)) with ratio CUPED!
ctr = xp.RatioMetric(
    name="Click-Through-Rate (CTR)", 
    numerator_col="clicks", 
    denominator_col="impressions",
    pre_numerator_col="pre_clicks",
    pre_denominator_col="pre_impressions"
)

# 3. Add metrics to the experiment container
exp.add_metrics([revenue, conversion, ctr])

# 4. Run Analysis (optionally apply multi-test corrections like 'fdr_bh')
results = exp.run_analysis(
    control="control", 
    treatment="treatment",
    multi_test_correction="fdr_bh"
)
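
For readers who want to see what an 'fdr_bh' style correction does to a set of p-values, here is a textbook Benjamini-Hochberg adjustment in plain Python. It is a generic sketch, not xpyrment's internal code, and the function name `benjamini_hochberg` is made up for this example.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


adj = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
print([round(p, 4) for p in adj])
# prints: [0.04, 0.0533, 0.0533, 0.2]
```

A metric is declared significant at FDR level 0.05 when its adjusted p-value is below 0.05, so only the first metric passes in this example.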

4. Review and Visualize Results

Standard Summary DataFrame

Call .summary() to get a polished, publication-ready pandas DataFrame with automatic statistical significance annotations (* for $p < 0.05$, ** for $p < 0.01$, *** for $p < 0.001$).

summary_df = results.summary()
print(summary_df)

Output:

Metric                    Type        Control Mean  Treatment Mean  Relative Lift  95% CI (Rel)        p-value    Post-hoc Power  CUPED  Var Reduction
Average Revenue per User  Mean             49.9542         52.4712         +5.04%  [+3.78%, +6.30%]    0.0000***          100.0%  Yes            68.3%
Purchase Conversion Rate  Proportion        0.0990          0.1172        +18.42%  [+4.12%, +32.72%]   0.0112*             73.1%  No                 -
Click-Through-Rate (CTR)  Ratio             0.0498          0.0528         +5.95%  [+4.11%, +7.78%]    0.0000***          100.0%  Yes            71.2%

Tip: CUPED was automatically applied to both Average Revenue per User and Click-Through-Rate, reducing variance by 68.3% and 71.2% respectively. This narrows the confidence intervals and increases statistical power.
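
To put those variance reductions in terms of interval width: at a fixed sample size, and assuming a normal-approximation interval whose half-width is proportional to the standard error, the width scales with the square root of the remaining variance. The helper name `ci_width_ratio` is illustrative only.

```python
import math


def ci_width_ratio(variance_reduction: float) -> float:
    """Adjusted CI width divided by unadjusted CI width."""
    return math.sqrt(1.0 - variance_reduction)


revenue_ratio = ci_width_ratio(0.683)  # about 0.56
ctr_ratio = ci_width_ratio(0.712)      # about 0.54
```

Under these assumptions the CUPED-adjusted intervals are a little more than half as wide as the unadjusted ones.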

Forest Plot Visualization

Call .plot() to render a forest plot of the relative lifts and their confidence intervals. Statistically significant lifts are drawn in teal; the others are drawn in gray.

# Render the forest plot
results.plot()

Covariate Balance Verification (Love Plot)

# Print an ASCII love plot directly in the console
print(results.love_plot())
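
A love plot displays one Standardized Mean Difference (SMD) per covariate. The sketch below shows the usual SMD definition with an equally weighted pooled standard deviation. Whether xpyrment uses exactly this pooling is an assumption, and `standardized_mean_difference` is a name invented for the example.

```python
import math
import statistics


def standardized_mean_difference(treatment, control):
    """SMD = (mean_T - mean_C) / sqrt((var_T + var_C) / 2)."""
    pooled_sd = math.sqrt(
        (statistics.variance(treatment) + statistics.variance(control)) / 2
    )
    return (statistics.mean(treatment) - statistics.mean(control)) / pooled_sd


# Toy pre-period covariate values for each arm
smd = standardized_mean_difference([2.0, 3.0, 4.0], [1.0, 2.0, 3.0])
print(smd)  # prints: 1.0
```

A common rule of thumb treats an absolute SMD below 0.1 as acceptable balance; the threshold xpyrment applies is not stated here.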

5. Generate Standalone HTML Reports

With the v1 release, you can export beautiful standalone HTML dashboards or Markdown cards representing your experimental results, complete with embedded modern styling, KPI metrics, and covariate balance logs.

from xpyrment.report.generator import ExperimentReportGenerator

# Initialize the report generator with the analysis results
reporter = ExperimentReportGenerator(results, experiment_name="Mobile Landing Page Redesign")

# Save a premium responsive HTML dashboard (fully styled, self-contained)
reporter.save_html("reports/ab_experiment_dashboard.html")

# Save a GitHub-compatible Markdown summary card
reporter.save_markdown("reports/ab_experiment_summary.md")

🔬 Subpackage Taxonomy & Dependency Flow

To support industrial-scale digital tests and classical DoE, the package has been structured under src/xpyrment following a one-way dependency gating layout to avoid circular references:

metrics/       ← Houses core metric taxonomy and guardrail thresholds.
core/          ← Powers the phase gating lifecycle & spec registries.
plan/          ← Computes pre-registration power/durations.
design/        ← Handles randomizations, splits & DoE matrices.
validate/      ← Houses SRM checks and covariate balance tests.
run/           ← Handles ingestion & mSPRT monitors.
analyze/       ← Orchestrates frequentist/Bayesian engines.
interactions/  ← Decomposes multi-factor ANOVA interaction terms.
interpret/     ← Infers ship/no-ship decisions.
report/        ← Terminal consumer of all phases. Compiles audit trails & exportable reports.

📖 Mathematical Framework

Welch's t-test

For continuous metrics without a pre-period covariate, the standard error of the mean difference is:

$$ SE = \sqrt{\frac{s_C^2}{n_C} + \frac{s_T^2}{n_T}} $$

Degrees of freedom are computed via the Welch-Satterthwaite equation to handle unequal sample sizes and variances.
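
As a concrete illustration, the standard error and the Welch-Satterthwaite degrees of freedom can be computed from summary statistics alone. This is a minimal sketch, and the function name `welch_se_and_df` is made up for the example.

```python
import math


def welch_se_and_df(var_c, n_c, var_t, n_t):
    """Standard error and Welch-Satterthwaite degrees of freedom.

    var_c and var_t are sample variances; n_c and n_t are sample sizes (>= 2).
    """
    a = var_c / n_c
    b = var_t / n_t
    se = math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n_c - 1) + b ** 2 / (n_t - 1))
    return se, df


se, df = welch_se_and_df(var_c=4.0, n_c=10, var_t=9.0, n_t=15)
print(round(se, 4), round(df, 2))  # prints: 1.0 22.99
```

The degrees of freedom land between min(n_C, n_T) - 1 and n_C + n_T - 2, which is how the test stays valid when the two arms differ in size or variance.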

Delta Method (Ratio Metrics)

Because click-through-rates or revenue ratios are calculated as:

$$ R = \frac{\sum_i X_i}{\sum_i Y_i} = \frac{\bar{X}}{\bar{Y}} $$

the variance of the ratio cannot be computed using standard methods because the denominator $Y$ is a random variable. We employ a first-order Taylor expansion (Delta method) to estimate variance:

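$$ Var(R) \approx \frac{1}{\mu_Y^2} Var(\bar{X}) + \frac{\mu_X^2}{\mu_Y^4} Var(\bar{Y}) - 2\frac{\mu_X}{\mu_Y^3} Cov(\bar{X}, \bar{Y}) $$

where, for $n$ independent users, $Var(\bar{X}) = Var(X)/n$, $Var(\bar{Y}) = Var(Y)/n$, and $Cov(\bar{X}, \bar{Y}) = Cov(X, Y)/n$.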
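
A minimal sketch of this estimator on per-user data follows. It assumes the variance and covariance terms are per-user sample statistics, so the whole expression is divided by the number of users $n$ to give the variance of the ratio of means. The function name `ratio_delta_variance` is hypothetical.

```python
import statistics


def ratio_delta_variance(x, y):
    """Delta-method variance of mean(x) / mean(y) from paired per-user values."""
    n = len(x)
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    var_x, var_y = statistics.variance(x), statistics.variance(y)
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    per_user = (
        var_x / mean_y ** 2
        + mean_x ** 2 * var_y / mean_y ** 4
        - 2 * mean_x * cov_xy / mean_y ** 3
    )
    return per_user / n  # variance of the ratio of sample means


# Clicks and impressions for four users (toy values)
ctr_var = ratio_delta_variance([5, 8, 3, 4], [96, 108, 99, 88])

# Sanity check: with a constant denominator the result is Var(x) / (n * y^2)
sanity = ratio_delta_variance([1, 2, 3, 4], [10, 10, 10, 10])
```

The covariance term matters in practice: users with more impressions usually have more clicks, and ignoring that positive covariance overstates the variance of the ratio.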

CUPED (Controlled-experiments Using Pre-Experiment Data)

CUPED adjusts post-period metrics by subtracting the portion of variance explained by pre-period performance:

$$ Y_i^* = Y_i - \theta (X_i - \mu_{X, global}) $$

where $\theta = \frac{Cov(Y, X)}{Var(X)}$ is computed across the pooled data. The variance of the CUPED-adjusted metric is reduced by a factor of $1 - \rho^2$ (where $\rho$ is the correlation coefficient between $X$ and $Y$):

$$ Var(Y^*) = Var(Y) (1 - \rho^2) $$

For ratio metrics, xpyrment applies the CUPED adjustment separately to the numerator and denominator before applying the Delta method to the adjusted vectors, a technique pioneered by Netflix and Uber.
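
The adjustment for a single continuous metric can be sketched with the standard library alone. The data are simulated, `cuped_adjust` is a hypothetical helper rather than xpyrment's API, and $\theta$ is estimated on the one pooled sample used here.

```python
import random
import statistics


def cuped_adjust(y, x):
    """Return CUPED-adjusted values of y and the estimated theta."""
    n = len(y)
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    cov_yx = sum((a - mean_y) * (b - mean_x) for a, b in zip(y, x)) / (n - 1)
    theta = cov_yx / statistics.variance(x)
    return [a - theta * (b - mean_x) for a, b in zip(y, x)], theta


rng = random.Random(42)
pre = [rng.gauss(50, 10) for _ in range(2000)]     # pre-period metric
post = [0.8 * p + rng.gauss(10, 6) for p in pre]   # correlated post-period metric

adjusted, theta = cuped_adjust(post, pre)

var_y = statistics.variance(post)
var_adj = statistics.variance(adjusted)
# Sample correlation, written in terms of theta = Cov / Var(X)
rho = theta * statistics.stdev(pre) / statistics.stdev(post)
```

With $\theta$ estimated on the same sample, `var_adj` equals `var_y * (1 - rho ** 2)` up to floating-point error, and the mean of the adjusted values equals the mean of `post`, so the adjustment changes the variance of the metric but not its average.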

Sample Ratio Mismatch (SRM) Goodness-of-Fit

A Pearson Chi-square test is calculated on the observed sample counts against the expected design weights to flag assignment bugs early:

$$ \chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i} $$

If the test p-value $< 0.001$, an SRMError is raised.
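
For a two-arm test the check has one degree of freedom, and the p-value can be computed without SciPy. The sketch below is restricted to that two-arm case and returns a flag instead of raising; `srm_check` is an illustrative name, not the library's function.

```python
import math


def srm_check(observed, weights, threshold=0.001):
    """Two-arm Pearson chi-square SRM check (1 degree of freedom)."""
    if len(observed) != 2 or len(weights) != 2:
        raise ValueError("this sketch handles exactly two arms")
    total = sum(observed)
    weight_sum = sum(weights)
    expected = [total * w / weight_sum for w in weights]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Chi-square survival function with 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value, p_value < threshold


chi2_ok, p_ok, flagged_ok = srm_check([5050, 4950], [0.5, 0.5])
chi2_bad, p_bad, flagged_bad = srm_check([5200, 4800], [0.5, 0.5])
```

A 5,050 / 4,950 split gives a chi-square of 1.0 (p about 0.32) and is not flagged, while a 5,200 / 4,800 split gives a chi-square of 16 and falls well below the 0.001 threshold.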

DerSimonian-Laird Random-Effects Meta-Analysis

To pool historical experiment estimates $\hat{\theta}_j$ with study variances $v_j$ across $k$ independent studies, the DerSimonian-Laird random-effects model accounts for between-study variance $\tau^2$:

$$ \tau^2 = \max\left(0, \ \frac{Q - (k - 1)}{\sum w_j - \frac{\sum w_j^2}{\sum w_j}}\right) $$

where $w_j = \frac{1}{v_j}$ are inverse-variance fixed weights, $\bar{\theta}_F = \frac{\sum w_j \hat{\theta}_j}{\sum w_j}$ is the fixed-effect pooled estimate, and $Q = \sum w_j (\hat{\theta}_j - \bar{\theta}_F)^2$ is Cochran's $Q$ heterogeneity statistic. Random-effects weights $w_j^* = \frac{1}{v_j + \tau^2}$ are then applied to yield the pooled random-effects estimate:

$$ \bar{\theta}_R = \frac{\sum w_j^* \hat{\theta}_j}{\sum w_j^*} $$
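
The formulas above translate directly into a few lines of Python. This is a sketch of the estimator as written here, with a hypothetical function name and toy inputs. The standard error it returns, $\sqrt{1 / \sum w_j^*}$, is the usual companion formula and is an addition to what this section states.

```python
def dersimonian_laird(estimates, variances):
    """Return (tau2, pooled random-effects estimate, its standard error)."""
    k = len(estimates)
    if k < 2:
        raise ValueError("need at least two studies")
    w = [1.0 / v for v in variances]                 # fixed-effect weights
    sum_w = sum(w)
    theta_fixed = sum(wi * t for wi, t in zip(w, estimates)) / sum_w
    q = sum(wi * (t - theta_fixed) ** 2 for wi, t in zip(w, estimates))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)               # between-study variance
    w_star = [1.0 / (v + tau2) for v in variances]   # random-effects weights
    sum_w_star = sum(w_star)
    theta_random = sum(wi * t for wi, t in zip(w_star, estimates)) / sum_w_star
    return tau2, theta_random, (1.0 / sum_w_star) ** 0.5


tau2, pooled, se_pooled = dersimonian_laird([0.1, 0.3, 0.5], [0.01, 0.01, 0.01])
print(round(tau2, 4), round(pooled, 4), round(se_pooled, 4))
# prints: 0.03 0.3 0.1155
```

Here $Q = 8$ against $k - 1 = 2$, so $\tau^2 = 6 / 200 = 0.03$; the pooled standard error grows from about 0.058 under fixed effects to about 0.115 once between-study variance is included.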

Simonsohn P-Curve Distribution Audits

To detect p-hacking, early peeking, or selective publication bias across independent experiments, the p-curve binomial test looks only at the significant p-values ($p < 0.05$) and asks how many of them lie in the low half ($p \le 0.025$). Let $N_{total}$ be the number of significant p-values, $N_{low}$ the number of those in the low half, and $F_{binom}(\cdot; N_{total}, 0.5)$ the binomial CDF under a flat p-curve:

  • Evidential Value (Right-Skewed):

    $$ p_{right-skew} = 1 - F_{binom}(N_{low} - 1; N_{total}, 0.5) $$

  • Reporting Bias / Selective Stopping (Left-Skewed):

    $$ p_{left-skew} = F_{binom}(N_{low}; N_{total}, 0.5) $$
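
Both tail probabilities can be computed with `math.comb`. The sketch below follows the two formulas as written; the function names and the example p-values are made up for illustration.

```python
from math import comb


def binom_cdf(k, n, p=0.5):
    """P(K <= k) for K ~ Binomial(n, p)."""
    if k < 0:
        return 0.0
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(min(k, n) + 1))


def p_curve_binomial(p_values, alpha=0.05):
    """Return (n_low, n_total, p_right_skew, p_left_skew)."""
    significant = [p for p in p_values if p < alpha]
    n_total = len(significant)
    n_low = sum(1 for p in significant if p <= alpha / 2)
    p_right = 1.0 - binom_cdf(n_low - 1, n_total)   # P(K >= n_low)
    p_left = binom_cdf(n_low, n_total)              # P(K <= n_low)
    return n_low, n_total, p_right, p_left


history = [0.001, 0.004, 0.010, 0.012, 0.015, 0.020, 0.021, 0.003, 0.024, 0.040, 0.200]
n_low, n_total, p_right, p_left = p_curve_binomial(history)
print(n_low, n_total, round(p_right, 4), round(p_left, 4))
# prints: 9 10 0.0107 0.999
```

Nine of the ten significant results sit in the low half, so the right-skew test rejects a flat curve (p about 0.011), which is the pattern expected when the underlying effects are real.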


🛠️ Local Development & Testing

We use pytest for unit testing. To set up your local environment:

  1. Create a virtual environment and activate it:

    python -m venv .venv
    .venv\Scripts\activate  # On Windows
    source .venv/bin/activate  # On macOS/Linux
    
  2. Install the package in editable mode with development dependencies:

    pip install -e ".[dev]"  # quotes keep shells such as zsh from expanding the brackets
    
  3. Run the unit test suite:

    pytest
    

📄 License

Distributed under the AI Slop License.
