Skip to main content

Set of utility functions for analyzing experimental and observational data

Project description

ci PyPI version

Experiment utils

Generic functions for experiment analysis and design:

Installation

PyPI

pip install experiment-utils-pd

From GitHub

pip install git+https://github.com/sdaza/experiment-utils-pd.git

How to use it

Experiment Analyzer

Suppose you have a DataFrame df with columns for experiment group, treatment assignment, outcomes, and covariates.

import pandas as pd
from experiment_utils import ExperimentAnalyzer

# Example data
df = pd.DataFrame({
    "experiment_id": [1, 1, 1, 2, 2, 2],
    "user_id": [101, 102, 103, 201, 202, 203],
    "treatment": [0, 1, 0, 1, 0, 1],
    "age": [25, 34, 29, 40, 22, 31],
    "gender": [1, 0, 1, 0, 1, 0],
    "outcome1": [0, 1, 0, 1, 0, 1],
    "outcome2": [5.2, 6.1, 5.8, 7.0, 5.5, 6.8],
})

covariates = ["age", "gender"]

# Initialize analyzer with balance adjustment
analyzer = ExperimentAnalyzer(
    df,
    treatment_col="treatment",
    outcomes=["outcome1", "outcome2"],
    covariates=covariates,
    experiment_identifier=["experiment_id"],
    unit_identifier=["user_id"], # Optional: To retrieve balance weights
    adjustment="balance",  # Options: 'balance', 'IV', or None
    balance_method="ps-logistic",  # Options: 'ps-logistic', 'ps-xgboost', 'entropy'
    target_effect="ATE",  # Options: 'ATT', 'ATE', 'ATC'
    bootstrap=True,  # Enable bootstrap inference (default: False)
    bootstrap_iterations=1000,  # Number of bootstrap iterations (default: 1000)
    bootstrap_seed=42  # Seed for reproducibility (optional)
)

# Estimate effects
analyzer.get_effects()
print(analyzer.results)

Parameters:

  • adjustment: Choose 'balance' for covariate balancing (using balance_method), 'IV' for instrumental variable adjustment, or None for unadjusted analysis.
  • balance_method: Selects the method for balancing: 'ps-logistic' (logistic regression), 'ps-xgboost' (XGBoost), or 'entropy' (entropy balancing).
  • target_effect: Specifies the estimand: 'ATT', 'ATE', or 'ATC'.
  • bootstrap: Enable bootstrap inference for p-values and confidence intervals (default: False for asymptotic inference).
  • bootstrap_iterations: Number of bootstrap resampling iterations (default: 1000).
  • bootstrap_ci_method: Method for computing bootstrap CIs: 'percentile' or 'basic' (default: 'percentile').
  • bootstrap_stratify: Whether to use stratified resampling by treatment group (default: True).
  • bootstrap_seed: Random seed for reproducible bootstrap results (optional).

Retrieve IPW Weights

To inspect the weights and selected sample after balancing:

# Get the DataFrame with weights and experiment identifiers
weights_df = analyzer.weights
print(weights_df.head())

Non-inferiority test

Test for non-inferiority after estimating effects:

# Test non-inferiority with a 10% margin
analyzer.test_non_inferiority(relative_margin=0.10)
print(analyzer.results[["outcome1", "non_inferiority_margin", "ci_lower_bound", "is_non_inferior"]])

Multiple comparison adjustment

Adjust p-values for multiple outcomes per experiment:

# Bonferroni adjustment
analyzer.adjust_pvalues(method="bonferroni")
print(analyzer.results[["outcome1", "pvalue", "pvalue_adj", "stat_significance_adj"]])

# Or use FDR (Benjamini-Hochberg)
analyzer.adjust_pvalues(method="fdr_bh")
print(analyzer.results[["outcome1", "pvalue", "pvalue_adj", "stat_significance_adj"]])

Power Analysis

from experiment_utils import PowerSim
p = PowerSim(metric='proportion', relative_effect=False,
  variants=1, nsim=1000, alpha=0.05, alternative='two-tailed')

p.get_power(baseline=[0.33], effect=[0.03], sample_size=[3000])

Utilities

Balanced Random Assignment

You can use the balanced_random_assignment utility to assign units to experimental groups with forced balance. Optionally stratify by covariates to ensure balance within strata.

from experiment_utils.utils import balanced_random_assignment
import pandas as pd

# Example DataFrame
users = pd.DataFrame({
    "user_id": range(100),
    "age_group": ["young", "old"] * 50,
    "gender": ["M", "F"] * 50
})

# Binary assignment (test/control, 50/50) without stratification
users["assignment"] = balanced_random_assignment(users, allocation_ratio=0.5)
print(users)

# Binary assignment with stratification by age_group and gender
users["assignment_stratified"] = balanced_random_assignment(
    users, 
    allocation_ratio=0.5, 
    balance_covariates=["age_group", "gender"]
)
print(users)

# Multiple variants with equal allocation
users["assignment_multi"] = balanced_random_assignment(
    users, 
    variants=["control", "A", "B"]
)
print(users)

# Multiple variants with custom allocation and stratification
users["assignment_custom"] = balanced_random_assignment(
    users,
    variants=["control", "A", "B"],
    allocation_ratio={"control": 0.5, "A": 0.3, "B": 0.2},
    balance_covariates=["age_group"]
)
print(users)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

experiment_utils_pd-0.1.10.tar.gz (35.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

experiment_utils_pd-0.1.10-py3-none-any.whl (31.7 kB view details)

Uploaded Python 3

File details

Details for the file experiment_utils_pd-0.1.10.tar.gz.

File metadata

  • Download URL: experiment_utils_pd-0.1.10.tar.gz
  • Upload date:
  • Size: 35.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.19

File hashes

Hashes for experiment_utils_pd-0.1.10.tar.gz
Algorithm Hash digest
SHA256 88d54a02019b2e88919b5ccf1e7fe9fb8e8ac9b09cccb4f2ad57c42b2d352dfa
MD5 3ee01f3d3c26d2b8ca3edc5375b470d8
BLAKE2b-256 f2c2a213fa2cceca2837c43d338c541748e8cad96d36e8cfb75894b910ae45f4

See more details on using hashes here.

File details

Details for the file experiment_utils_pd-0.1.10-py3-none-any.whl.

File metadata

File hashes

Hashes for experiment_utils_pd-0.1.10-py3-none-any.whl
Algorithm Hash digest
SHA256 8c0cd6a474c92493977ba3f9bbdc8b031c7da29330d6e67a4d5bf594ce15011d
MD5 a250e077050e37e4801a3373632adddb
BLAKE2b-256 b45893a5421b9e670126fe1135b4b5c1c529d0515ff9aab3e63ff66b32be86da

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page