A comprehensive library for Bayesian and frequentist inference on grouped binomial proportions

Proportions: Bayesian and Frequentist Inference for Grouped Binomial Data

Requires Python 3.10+

A comprehensive Python library for estimating average success rates across multiple groups using Bayesian and frequentist methods.

Overview

When you have binomial data from multiple groups (e.g., success rates across experiments, conversion rates across user segments, test pass rates across scenarios), this library helps you:

  1. Estimate the average success rate across all groups
  2. Quantify uncertainty with credible/confidence intervals
  3. Account for heterogeneity between groups
  4. Compare different modeling approaches with automatic diagnostics
  5. Detect data quality issues and get recommendations
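
To make the notion of heterogeneity between groups concrete, here is a minimal standard-library sketch of a Pearson dispersion check. It is a generic textbook statistic, not the library's own data quality check, and the function name `pearson_dispersion` is hypothetical.

```python
def pearson_dispersion(successes, trials):
    """Return (chi-square statistic, degrees of freedom) for a homogeneity check."""
    pooled = sum(successes) / sum(trials)  # common rate if all groups were identical
    if pooled in (0.0, 1.0):
        raise ValueError("pooled rate is 0 or 1; the statistic is undefined")
    stat = sum(
        (x - n * pooled) ** 2 / (n * pooled * (1.0 - pooled))
        for x, n in zip(successes, trials)
    )
    return stat, len(successes) - 1


x = [8, 7, 9, 6, 8]          # successes per group
n = [10, 10, 10, 10, 10]     # trials per group
stat, dof = pearson_dispersion(x, n)
print(f"chi-square = {stat:.3f} on {dof} degrees of freedom")
```

A statistic far above its degrees of freedom suggests more between-group variation than binomial noise alone would produce, which is the situation a pooled model handles poorly.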

Supported Methods

| Method | Description | When to Use |
|---|---|---|
| Hierarchical Bayes (RECOMMENDED) | Full Bayesian with importance sampling | Need honest uncertainty, unusual data |
| Single-Theta Bayesian | Pooled model (homogeneous groups) | Groups believed identical |
| Clopper-Pearson | Frequentist exact confidence intervals | Baseline comparison |
| Empirical Bayes ⚠️ (NOT RECOMMENDED) | Data-driven hyperparameter estimation | ⚠️ Under-covers (17% when nominal is 95%); use HB instead |
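
For readers unfamiliar with the Clopper-Pearson baseline, the sketch below computes the exact interval for a single success count by bisecting on binomial tail probabilities, using only the standard library. The names `binom_cdf` and `clopper_pearson` are hypothetical and do not describe the library's API; how the library combines groups before applying the interval is not shown here.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for a binomial proportion."""

    def solve(f, target):
        # f is increasing in p on [0, 1]; bisect until f(p) = target.
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2.0
            if f(mid) < target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2.0

    # Lower bound: the p at which P(X >= x) equals alpha / 2.
    lower = 0.0 if x == 0 else solve(
        lambda p: 1.0 - binom_cdf(x - 1, n, p), alpha / 2.0
    )
    # Upper bound: the p at which P(X <= x) equals alpha / 2.
    upper = 1.0 if x == n else solve(
        lambda p: 1.0 - binom_cdf(x, n, p), 1.0 - alpha / 2.0
    )
    return lower, upper


low, high = clopper_pearson(38, 50)  # 38 successes in 50 pooled trials
print(f"95% CI: [{low:.3f}, {high:.3f}]")
```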

Installation

Recommended: Using uv (fast, modern)

First, install uv if you don't have it:

# macOS/Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# Windows
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"

# Or via pip
pip install uv

For more installation options, see: https://docs.astral.sh/uv/getting-started/installation/

Then, set up the project:

# Clone the repository
git clone git@gitlab.com:movellan/proportions.git
cd proportions

# Install with uv (automatically creates .venv and installs all dependencies)
uv sync

# Verify installation by running tests
uv run pytest tests/ -v

# Run commands with uv (no activation needed!)
uv run python examples/01_basic_usage.py

# Or activate manually if you prefer
source .venv/bin/activate  # On macOS/Linux
# or
.venv\Scripts\activate  # On Windows

Why uv?

  • No need to activate: uv run automatically uses the project venv
  • Fast installation (10-100x faster than pip)
  • Reproducible builds with uv.lock

Alternative: Using pip

# Clone the repository
git clone git@gitlab.com:movellan/proportions.git
cd proportions

# Create virtual environment (optional but recommended)
python -m venv .venv
source .venv/bin/activate  # On macOS/Linux

# Install package
pip install -e ".[dev]"  # With development tools

# Verify installation by running tests
pytest tests/ -v

From PyPI (Recommended for Users)

pip install proportions

Quick Start

import numpy as np
from proportions.core.models import BinomialData
from proportions.inference import hierarchical_bayes, single_theta_bayesian

# Your data: success counts (x) and trial counts (n) per group
x = np.array([8, 7, 9, 6, 8])  # successes
n = np.array([10, 10, 10, 10, 10])  # trials
data = BinomialData(x=x, n=n)

# Hierarchical Bayes (RECOMMENDED - accounts for all uncertainty)
hb_result = hierarchical_bayes(data, random_seed=42)
print(f"Average success rate: {hb_result.posterior.mu:.3f}")
print(f"95% CI: [{hb_result.posterior.ci_lower:.3f}, {hb_result.posterior.ci_upper:.3f}]")

# Single-Theta Bayesian (simpler, assumes homogeneity)
st_result = single_theta_bayesian(data, alpha_prior=1.0, beta_prior=1.0)
print(f"Average success rate: {st_result.posterior.mu:.3f}")
print(f"95% CI: [{st_result.posterior.ci_lower:.3f}, {st_result.posterior.ci_upper:.3f}]")
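
As a hand check on the pooled model, the sketch below applies the standard conjugate Beta-Binomial update to the same counts. It assumes the single-theta model pools all groups into one binomial with a Beta(alpha_prior, beta_prior) prior, which this README implies but does not spell out. The function name `pooled_beta_posterior` is hypothetical.

```python
from math import sqrt


def pooled_beta_posterior(successes, trials, alpha_prior=1.0, beta_prior=1.0):
    """Conjugate update: Beta(a, b) prior plus pooled counts gives a Beta posterior."""
    total_x = sum(successes)
    total_n = sum(trials)
    a_post = alpha_prior + total_x
    b_post = beta_prior + (total_n - total_x)
    total = a_post + b_post
    mean = a_post / total
    sd = sqrt(a_post * b_post / (total**2 * (total + 1.0)))
    return a_post, b_post, mean, sd


a_post, b_post, mean, sd = pooled_beta_posterior(
    [8, 7, 9, 6, 8], [10, 10, 10, 10, 10]
)
print(f"Posterior: Beta({a_post:.0f}, {b_post:.0f}), mean {mean:.3f}, sd {sd:.3f}")
```

With 38 successes in 50 trials and a uniform prior, the posterior is Beta(39, 13) with mean 39/52 = 0.75.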

Features

🎯 Core Capabilities

  • Multiple estimation methods with unified API
  • Automatic validation of input data (Pydantic models)
  • Numerically stable Beta distribution functions
  • Importance sampling for Hierarchical Bayes
  • Moment matching for posterior approximation
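
The sketch below illustrates what a numerically stable Beta function buys in practice: the Beta-Binomial log probability computed entirely with `math.lgamma`, so that large counts never overflow. It is a generic illustration under that assumption, and the helper names are hypothetical rather than the library's utilities.

```python
from math import exp, lgamma


def log_beta(a, b):
    """log B(a, b) via log-gamma, finite even for very large arguments."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_binom_coeff(n, x):
    """log of n choose x."""
    return lgamma(n + 1) - lgamma(x + 1) - lgamma(n - x + 1)


def beta_binomial_logpmf(x, n, a, b):
    """log P(x successes in n trials) when the rate is drawn from Beta(a, b)."""
    return log_binom_coeff(n, x) + log_beta(x + a, n - x + b) - log_beta(a, b)


# With a uniform Beta(1, 1) prior every count 0..n is equally likely: 1 / (n + 1).
print(exp(beta_binomial_logpmf(8, 10, 1.0, 1.0)))

# Counts in the millions stay finite in log space.
print(beta_binomial_logpmf(600_000, 1_000_000, 2.0, 3.0))
```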

📊 Comprehensive Diagnostics

  • Model evidence (marginal likelihood) for method comparison
  • Bayes factors with interpretation (decisive/strong/moderate/weak)
  • Effective Sample Size (ESS) for importance sampling
  • Boundary detection for prior specification issues
  • Data quality checks (heterogeneity, sample sizes, extreme rates)
  • Variance decomposition (within-group vs between-group uncertainty)
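
To show what the Effective Sample Size diagnostic measures, here is a minimal sketch of the usual Kish formula, (sum of weights)^2 / (sum of squared weights), computed from unnormalized log weights. Whether the library uses exactly this estimator is an assumption, and `effective_sample_size` is a hypothetical name.

```python
from math import exp


def effective_sample_size(log_weights):
    """Kish ESS from unnormalized log importance weights."""
    if not log_weights:
        raise ValueError("need at least one weight")
    shift = max(log_weights)  # subtracting the max keeps exp() from overflowing
    weights = [exp(lw - shift) for lw in log_weights]
    return sum(weights) ** 2 / sum(w * w for w in weights)


print(effective_sample_size([0.0, 0.0, 0.0, 0.0]))   # equal weights: ESS equals the sample count
print(effective_sample_size([0.0, -1000.0, -1000.0]))  # one dominant weight: ESS near 1
```

An ESS close to the number of samples means the proposal matches the posterior well; an ESS near 1 means a single sample dominates and the estimate is unreliable.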

📈 Visualization Tools

  • Data overview plots (scatter, histograms, heterogeneity)
  • Prior and posterior distributions
  • Method comparison plots
  • Importance sampling diagnostics
  • HTML reports with embedded plots

🔬 Based on Solid Theory

  • Beta-Binomial hierarchical models with conjugate priors
  • Importance sampling for intractable posteriors
  • Law of total variance for proper uncertainty propagation
  • Numerical stability via log-space computation
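
The law of total variance can be illustrated with per-group Beta posteriors: the variance of a rate drawn from a randomly chosen group splits into the average within-group variance plus the variance of the group means. This is a generic sketch that assumes independent Beta(1, 1) priors per group; it is not the library's decomposition, and the function names are hypothetical.

```python
def beta_mean_var(a, b):
    """Mean and variance of a Beta(a, b) distribution."""
    total = a + b
    return a / total, a * b / (total**2 * (total + 1.0))


def variance_decomposition(successes, trials, alpha_prior=1.0, beta_prior=1.0):
    """Return (within, between, total) for an equal-weight mixture of group posteriors."""
    moments = [
        beta_mean_var(alpha_prior + x, beta_prior + n - x)
        for x, n in zip(successes, trials)
    ]
    groups = len(moments)
    grand_mean = sum(m for m, _ in moments) / groups
    within = sum(v for _, v in moments) / groups                       # E[Var(theta | group)]
    between = sum((m - grand_mean) ** 2 for m, _ in moments) / groups  # Var(E[theta | group])
    return within, between, within + between


within, between, total = variance_decomposition(
    [8, 7, 9, 6, 8], [10, 10, 10, 10, 10]
)
print(f"within = {within:.5f}, between = {between:.5f}, total = {total:.5f}")
```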

Example: Comparing Methods

from proportions.core.models import BinomialData
from proportions.inference import hierarchical_bayes, single_theta_bayesian
import numpy as np

# Prepare data
x = np.array([8, 7, 9, 6, 8])
n = np.array([10, 10, 10, 10, 10])
data = BinomialData(x=x, n=n)

# Fit multiple methods
hb_result = hierarchical_bayes(data, random_seed=42)
st_result = single_theta_bayesian(data, alpha_prior=1.0, beta_prior=1.0)

# Compare via model evidence (marginal likelihood)
# Evidence is automatically computed for both methods!
print("Model Evidence Comparison:")
print(f"Hierarchical Bayes: {hb_result.log_marginal_likelihood:.2f}")
print(f"Single-Theta: {st_result.log_marginal_likelihood:.2f}")

# Calculate Bayes Factor
log_bf = hb_result.log_marginal_likelihood - st_result.log_marginal_likelihood
bf = np.exp(log_bf)
print(f"\nBayes Factor (HB vs ST): {bf:.2e}")
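
Because `np.exp(log_bf)` overflows for large log Bayes factors, it can be safer to interpret the value directly on the log scale. The sketch below does that with illustrative thresholds on log10(BF) of 0.5, 1, and 2. These cut-offs and the function name are assumptions for the example; the library's own interpretation thresholds are not documented here.

```python
from math import log


def describe_log_bayes_factor(log_bf):
    """Map a natural-log Bayes factor to a (strength, favored model) pair."""
    strength = abs(log_bf) / log(10.0)  # evidence strength in log10 units
    if strength < 0.5:
        label = "weak"
    elif strength < 1.0:
        label = "moderate"
    elif strength < 2.0:
        label = "strong"
    else:
        label = "decisive"
    if log_bf > 0:
        favored = "first model"
    elif log_bf < 0:
        favored = "second model"
    else:
        favored = "neither"
    return label, favored


print(describe_log_bayes_factor(-3.0))    # ('strong', 'second model')
print(describe_log_bayes_factor(1000.0))  # ('decisive', 'first model')
```

In the comparison above, passing `log_bf` with Hierarchical Bayes as the first model reports which of the two the data favor and how strongly.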

Example: Custom Priors

from proportions.core.models import BinomialData
from proportions.inference import hierarchical_bayes
import numpy as np

# Prepare data
x = np.array([8, 7, 9, 6, 8])
n = np.array([10, 10, 10, 10, 10])
data = BinomialData(x=x, n=n)

# Hierarchical Bayes with custom prior parameters
result = hierarchical_bayes(
    data,
    m_prior_alpha=2.0,    # Beta prior for m: E[m] = 2/(2+2) = 0.5
    m_prior_beta=2.0,     # More informative than uniform
    k_prior_min=0.1,      # Allow low concentration (high heterogeneity)
    k_prior_max=100.0,    # Moderate maximum concentration
    n_samples=10000,      # More samples for better approximation
    random_seed=42
)

# Check diagnostics
print(f"Posterior mean for m: {result.m_posterior_mean:.3f}")
print(f"Posterior mean for k: {result.k_posterior_mean:.3f}")
print(f"Effective Sample Size: {result.diagnostics.effective_sample_size:.1f}")
print(f"ESS Ratio: {result.diagnostics.ess_ratio:.3f}")

# Check for boundary issues
if result.diagnostics.k_at_upper_boundary:
    print("⚠️ Warning: k posterior near upper boundary, consider increasing k_prior_max")
if result.diagnostics.k_at_lower_boundary:
    print("⚠️ Warning: k posterior near lower boundary, consider decreasing k_prior_min")
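
The comments above describe `m` as the mean and `k` as a concentration. Under the common mean/concentration parameterization, alpha = m·k and beta = (1 − m)·k, so the variance of group rates is m(1 − m)/(k + 1). The sketch below assumes that parameterization, which this README does not state explicitly; the function name is hypothetical.

```python
def beta_from_mean_concentration(m, k):
    """Convert mean m and concentration k to Beta shape parameters and variance."""
    if not 0.0 < m < 1.0:
        raise ValueError("m must lie strictly between 0 and 1")
    if k <= 0.0:
        raise ValueError("k must be positive")
    alpha = m * k
    beta = (1.0 - m) * k
    variance = m * (1.0 - m) / (k + 1.0)
    return alpha, beta, variance


for k in (0.1, 100.0):
    alpha, beta, variance = beta_from_mean_concentration(0.5, k)
    print(f"k = {k}: alpha = {alpha}, beta = {beta}, variance = {variance:.4f}")
```

A small `k` spreads group rates widely (high heterogeneity), while a large `k` pulls them toward `m`, which is why a posterior pressed against `k_prior_max` or `k_prior_min` is worth a warning.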

Project Status

Current Version: 0.1.2 (Production-Ready)

✅ Completed

  • Core Pydantic data models with validation
  • Prior specification interface
  • Stable Beta distribution utilities
  • Hierarchical Bayes ⭐ - Importance sampling with full uncertainty (RECOMMENDED)
  • Single-Theta Bayesian - Pooled Bayesian estimation
  • Clopper-Pearson - Frequentist confidence intervals
  • Empirical Bayes ⚠️ - Grid search MLE (NOT RECOMMENDED - use Hierarchical Bayes instead)
  • Comprehensive diagnostics (ESS, evidence, coverage analysis)
  • Visualization tools (importance sampling, distributions)
  • 194 passing tests with extensive coverage

📅 Future Enhancements

  • Additional visualization options
  • HTML report generation
  • Interactive dashboards
  • Extended documentation and tutorials

Development

Running Tests

uv run pytest                      # All tests
uv run pytest -v --cov=proportions # With coverage

Code Quality

uv run ruff format .  # Format code
uv run ruff check .   # Lint
uv run mypy proportions/  # Type check

Design Principles

  1. Modularity - Separate concerns (models, priors, inference, diagnostics)
  2. Type Safety - Pydantic models throughout
  3. Diagnostics First - Always compute ESS, boundaries, evidence
  4. Numerical Stability - Log-space computation, stable algorithms
  5. User-Friendly - Simple API for common cases, power for experts

Documentation

  • prompts/SESSION_STATE.md - Current development status and recent changes
  • prompts/LIBRARY_DESIGN_PLAN.md - Complete architecture and design
  • prompts/HIERARCHICAL_BAYES_SUMMARY.md - Mathematical foundations and algorithms
  • examples/ - Jupyter notebooks demonstrating all methods and comparisons

References

This library implements methods based on:

  • Beta-Binomial hierarchical models - Conjugate Bayesian inference
  • Hierarchical Bayes (RECOMMENDED) - Importance sampling for posterior inference under hyperparameter uncertainty
  • Single-Theta Bayesian - Pooled Bayesian estimation assuming homogeneity
  • Empirical Bayes ⚠️ NOT RECOMMENDED - MLE of hyperparameters (under-covers, use HB instead)
  • Theory - Law of total variance, model evidence (marginal likelihood), Bayes factors

License

MIT License

Contact

Author: Javier Movellan
Email: jmovellan@apple.com
Repository: https://gitlab.com/movellan/proportions

Citation

If you use this library in your research, please cite:

@software{proportions2025,
  author = {Movellan, Javier},
  title = {Proportions: Bayesian and Frequentist Inference for Grouped Binomial Data},
  year = {2025},
  url = {https://gitlab.com/movellan/proportions}
}
