Set of utility functions for analyzing experimental and observational data

Experiment utils

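Generic functions for the design and analysis of experiments: effect estimation with optional covariate balancing or IV adjustment, non-inferiority tests, multiple-comparison corrections, simulation-based power analysis, and balanced random assignment.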

Installation

PyPI

pip install experiment-utils-pd

From GitHub

pip install git+https://github.com/sdaza/experiment-utils-pd.git

How to use it

Experiment Analyzer

Suppose you have a DataFrame df with columns for the experiment identifier, unit identifier, treatment assignment, outcomes, and covariates.

import pandas as pd
from experiment_utils import ExperimentAnalyzer

# Example data
df = pd.DataFrame({
    "experiment_id": [1, 1, 1, 2, 2, 2],
    "user_id": [101, 102, 103, 201, 202, 203],
    "treatment": [0, 1, 0, 1, 0, 1],
    "age": [25, 34, 29, 40, 22, 31],
    "gender": [1, 0, 1, 0, 1, 0],
    "outcome1": [0, 1, 0, 1, 0, 1],
    "outcome2": [5.2, 6.1, 5.8, 7.0, 5.5, 6.8],
})

covariates = ["age", "gender"]

# Initialize analyzer with balance adjustment
analyzer = ExperimentAnalyzer(
    df,
    treatment_col="treatment",
    outcomes=["outcome1", "outcome2"],
    covariates=covariates,
    experiment_identifier=["experiment_id"],
    unit_identifier=["user_id"],  # Optional: needed to retrieve balance weights
    adjustment="balance",  # Options: 'balance', 'IV', or None
    balance_method="ps-logistic",  # Options: 'ps-logistic', 'ps-xgboost', 'entropy'
    target_effect="ATE"  # Options: 'ATT', 'ATE', 'ATC'
)

# Estimate effects
analyzer.get_effects()
print(analyzer.results)

Parameters:

  • adjustment: Choose 'balance' for covariate balancing (using balance_method), 'IV' for instrumental variable adjustment, or None for unadjusted analysis.
  • balance_method: Selects the method for balancing: 'ps-logistic' (logistic regression), 'ps-xgboost' (XGBoost), or 'entropy' (entropy balancing).
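  • target_effect: Specifies the estimand: 'ATT' (average treatment effect on the treated), 'ATE' (average treatment effect), or 'ATC' (average treatment effect on the controls).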
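The library's internal implementation is not shown here. As a minimal NumPy sketch of what 'ps-logistic' balancing combined with target_effect usually means, the code below fits a logistic propensity model, converts the scores into inverse-probability weights for ATE, ATT, or ATC, and compares weighted means. The helpers fit_propensity, ipw_weights, and weighted_effect are hypothetical names, not part of experiment_utils.

```python
import numpy as np


def fit_propensity(X, t, n_iter=25):
    """Fit a logistic regression of t on X by Newton-Raphson; return P(t=1 | X)."""
    X1 = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X1 @ beta))
        grad = X1.T @ (t - p)                        # score vector
        hess = X1.T @ (X1 * (p * (1 - p))[:, None])  # Fisher information
        beta += np.linalg.solve(hess, grad)
    return 1.0 / (1.0 + np.exp(-X1 @ beta))


def ipw_weights(t, e, target_effect="ATE"):
    """Inverse-probability weights for the chosen estimand, given propensity scores e."""
    if target_effect == "ATE":  # reweight both groups to the full sample
        return np.where(t == 1, 1 / e, 1 / (1 - e))
    if target_effect == "ATT":  # reweight controls to resemble the treated
        return np.where(t == 1, 1.0, e / (1 - e))
    if target_effect == "ATC":  # reweight treated units to resemble the controls
        return np.where(t == 1, (1 - e) / e, 1.0)
    raise ValueError(f"unknown target_effect: {target_effect}")


def weighted_effect(y, t, w):
    """Difference in normalized weighted means, treated minus control."""
    treated = np.average(y[t == 1], weights=w[t == 1])
    control = np.average(y[t == 0], weights=w[t == 0])
    return treated - control


# Simulated confounded data: x raises both the treatment probability and the outcome.
rng = np.random.default_rng(0)
n = 20_000
x = rng.normal(size=n)
t = rng.binomial(1, 1 / (1 + np.exp(-x)))
y = 2.0 * t + 3.0 * x + rng.normal(size=n)  # true effect is 2.0 for every unit

e = fit_propensity(x, t)
naive = y[t == 1].mean() - y[t == 0].mean()  # biased upward by confounding
ate = weighted_effect(y, t, ipw_weights(t, e, "ATE"))
att = weighted_effect(y, t, ipw_weights(t, e, "ATT"))
print(naive, ate, att)
```

Because the simulated effect is constant, ATE, ATT, and ATC all equal 2.0 here, so the weighted estimates should land near 2.0 while the naive difference does not.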

Retrieve IPW Weights

To inspect the weights and selected sample after balancing:

# Get the DataFrame with weights and experiment identifiers
weights_df = analyzer.weights
print(weights_df.head())
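Once weights are available, a common diagnostic is the standardized mean difference (SMD) of each covariate before and after weighting; absolute values below about 0.1 are often read as adequate balance. The column layout of analyzer.weights is not documented here, so this sketch works on plain arrays you would extract from weights_df and df yourself; weighted_smd is a hypothetical helper.

```python
import numpy as np


def weighted_smd(x, t, w=None):
    """Standardized mean difference of covariate x, treated (t=1) minus control (t=0)."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(t)
    w = np.ones_like(x) if w is None else np.asarray(w, dtype=float)
    m1 = np.average(x[t == 1], weights=w[t == 1])
    m0 = np.average(x[t == 0], weights=w[t == 0])
    # Denominator: pooled unweighted SD, so before/after values share one scale
    sd = np.sqrt((x[t == 1].var(ddof=1) + x[t == 0].var(ddof=1)) / 2)
    return (m1 - m0) / sd


age = [25, 34, 29, 40]
treatment = [1, 1, 0, 0]
print(weighted_smd(age, treatment))                          # before weighting
print(weighted_smd(age, treatment, w=[1.0, 2.0, 1.0, 0.5]))  # with illustrative weights
```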

Non-inferiority test

Test for non-inferiority after estimating effects:

# Test non-inferiority with a 10% margin
analyzer.test_non_inferiority(relative_margin=0.10)
print(analyzer.results[["outcome1", "non_inferiority_margin", "ci_lower_bound", "is_non_inferior"]])
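How test_non_inferiority converts relative_margin into an absolute margin is not stated here. A common convention, assumed in this sketch, is margin = relative_margin * |control mean|, with the treatment declared non-inferior when the lower confidence bound of (treatment - control) lies above -margin, for an outcome where higher is better. The function non_inferiority and its one-sided alpha are assumptions, not the library's API.

```python
from statistics import NormalDist


def non_inferiority(mean_treatment, mean_control, se_diff, relative_margin=0.10, alpha=0.05):
    """Return (margin, ci_lower_bound, is_non_inferior) for a higher-is-better outcome."""
    margin = relative_margin * abs(mean_control)  # relative margin -> absolute units
    z = NormalDist().inv_cdf(1 - alpha)           # one-sided critical value
    ci_lower_bound = (mean_treatment - mean_control) - z * se_diff
    return margin, ci_lower_bound, ci_lower_bound > -margin


print(non_inferiority(5.9, 6.0, se_diff=0.05))  # small drop, inside the 10% margin
print(non_inferiority(5.0, 6.0, se_diff=0.05))  # drop larger than 10% of the control mean
```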

Multiple comparison adjustment

Adjust p-values for multiple outcomes per experiment:

# Bonferroni adjustment
analyzer.adjust_pvalues(method="bonferroni")
print(analyzer.results[["outcome1", "pvalue", "pvalue_adj", "stat_significance_adj"]])

# Or use FDR (Benjamini-Hochberg)
analyzer.adjust_pvalues(method="fdr_bh")
print(analyzer.results[["outcome1", "pvalue", "pvalue_adj", "stat_significance_adj"]])
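Both corrections are short enough to write out. The method strings mirror those accepted by statsmodels' multipletests; the NumPy functions below are a hypothetical sketch, not the library's code. Bonferroni multiplies each p-value by the number of tests m; Benjamini-Hochberg scales the i-th smallest p-value by m/i and then enforces monotonicity from the largest p-value downward.

```python
import numpy as np


def bonferroni(pvalues):
    """Bonferroni-adjusted p-values, capped at 1."""
    p = np.asarray(pvalues, dtype=float)
    return np.minimum(p * len(p), 1.0)


def fdr_bh(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the original order."""
    p = np.asarray(pvalues, dtype=float)
    m = len(p)
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # Running minimum from the largest p-value down keeps adjusted values monotone
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.20]
print(bonferroni(pvals))
print(fdr_bh(pvals))
```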

Power Analysis

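Estimate statistical power by simulation:

from experiment_utils import PowerSim

p = PowerSim(
    metric='proportion',
    relative_effect=False,  # effect is given on the absolute scale
    variants=1,
    nsim=1000,              # number of simulated experiments
    alpha=0.05,
    alternative='two-tailed',
)

# Baseline rate of 0.33, effect of 0.03, sample size of 3000
p.get_power(baseline=[0.33], effect=[0.03], sample_size=[3000])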
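PowerSim's internals are not documented above. As a minimal sketch of simulation-based power for a two-proportion comparison, the function below draws nsim pairs of binomial samples, applies a two-sided pooled z-test to each, and reports the share of rejections. It assumes sample_size is per group and effect is absolute (as with relative_effect=False); simulated_power is a hypothetical name, not the library's API.

```python
from statistics import NormalDist

import numpy as np


def simulated_power(baseline, effect, sample_size, nsim=1000, alpha=0.05, seed=0):
    """Share of simulated experiments in which a two-sided pooled z-test rejects H0."""
    rng = np.random.default_rng(seed)
    x_c = rng.binomial(sample_size, baseline, size=nsim)           # control successes
    x_t = rng.binomial(sample_size, baseline + effect, size=nsim)  # treatment successes
    diff = (x_t - x_c) / sample_size
    pooled = (x_c + x_t) / (2 * sample_size)
    se = np.sqrt(pooled * (1 - pooled) * 2 / sample_size)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return float(np.mean(np.abs(diff / se) > z_crit))


print(simulated_power(baseline=0.33, effect=0.03, sample_size=3000))
```

With effect=0 the rejection rate should hover near alpha, which is a quick sanity check on any power simulator.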

Utilities

Balanced Random Assignment

You can use the balanced_random_assignment utility to assign units to experimental groups with forced balance. Optionally stratify by covariates to ensure balance within strata.

from experiment_utils.utils import balanced_random_assignment
import pandas as pd

# Example DataFrame
users = pd.DataFrame({
    "user_id": range(100),
    "age_group": ["young", "old"] * 50,
    "gender": ["M", "F"] * 50
})

# Binary assignment (test/control, 50/50) without stratification
users["assignment"] = balanced_random_assignment(users, allocation_ratio=0.5)
print(users)

# Binary assignment with stratification by age_group and gender
users["assignment_stratified"] = balanced_random_assignment(
    users, 
    allocation_ratio=0.5, 
    balance_covariates=["age_group", "gender"]
)
print(users)

# Multiple variants with equal allocation
users["assignment_multi"] = balanced_random_assignment(
    users, 
    variants=["control", "A", "B"]
)
print(users)

# Multiple variants with custom allocation and stratification
users["assignment_custom"] = balanced_random_assignment(
    users,
    variants=["control", "A", "B"],
    allocation_ratio={"control": 0.5, "A": 0.3, "B": 0.2},
    balance_covariates=["age_group"]
)
print(users)
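The library's own algorithm is not described here. One standard way to get exact ("forced") balance is to compute each variant's count from the allocation ratios within every stratum, round with largest remainders so the counts sum to the stratum size, and shuffle the resulting labels. balanced_assignment_sketch is a hypothetical helper illustrating that idea, not the implementation of balanced_random_assignment.

```python
import numpy as np
import pandas as pd


def balanced_assignment_sketch(df, variants, allocation, strata=None, seed=0):
    """Assign exact per-stratum counts of each variant, then shuffle within the stratum."""
    rng = np.random.default_rng(seed)
    labels = pd.Series(index=df.index, dtype=object)
    groups = df.groupby(strata).groups.values() if strata else [df.index]
    for idx in groups:
        n = len(idx)
        raw = np.array([allocation[v] for v in variants]) * n
        counts = np.floor(raw).astype(int)
        # Largest-remainder rounding so the counts sum exactly to n
        extra = np.argsort(-(raw - counts))[: n - counts.sum()]
        counts[extra] += 1
        labels.loc[idx] = rng.permutation(np.repeat(variants, counts))
    return labels


users = pd.DataFrame({"user_id": range(100), "age_group": ["young", "old"] * 50})
users["assignment_sketch"] = balanced_assignment_sketch(
    users,
    variants=["control", "A", "B"],
    allocation={"control": 0.5, "A": 0.3, "B": 0.2},
    strata=["age_group"],
)
print(users.groupby("age_group")["assignment_sketch"].value_counts())
```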
