A comprehensive library for Bayesian and frequentist inference on grouped binomial proportions

Proportions: Bayesian and Frequentist Inference for Grouped Binomial Data

A comprehensive Python library for estimating average success rates across multiple groups using Bayesian and frequentist methods.

Overview

When you have binomial data from multiple groups (e.g., success rates across experiments, conversion rates across user segments, test pass rates across scenarios), this library helps you:

  1. Estimate the average success rate across all groups
  2. Quantify uncertainty with credible/confidence intervals
  3. Account for heterogeneity between groups
  4. Compare different modeling approaches with automatic diagnostics
  5. Detect data quality issues and get recommendations

Supported Methods

| Method | Description | When to Use |
|---|---|---|
| Hierarchical Bayes (RECOMMENDED) | Full Bayesian with importance sampling | Need honest uncertainty, unusual data |
| Single-Theta Bayesian | Pooled model (homogeneous groups) | Groups believed identical |
| Clopper-Pearson | Frequentist exact confidence intervals | Baseline comparison |
| Empirical Bayes ⚠️ (NOT RECOMMENDED) | Data-driven hyperparameter estimation | Under-covers (17% when nominal is 95%), use HB instead |
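
To make the "exact confidence intervals" row concrete, here is a minimal sketch of a Clopper-Pearson interval for a single binomial count, using only the standard library. It is not the library's implementation: the function names are hypothetical, and applying it to counts pooled across groups (as in the usage line) is an assumption that ignores heterogeneity.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))


def _bisect_decreasing(f, lo: float = 0.0, hi: float = 1.0, iters: int = 100) -> float:
    """Root of a function that decreases from positive to negative on [lo, hi]."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def clopper_pearson(x: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact two-sided interval for a binomial proportion (hypothetical helper)."""
    # Lower bound: the p at which P(X >= x | p) equals alpha / 2.
    if x == 0:
        lower = 0.0
    else:
        lower = _bisect_decreasing(
            lambda p: alpha / 2 - (1.0 - binom_cdf(x - 1, n, p))
        )
    # Upper bound: the p at which P(X <= x | p) equals alpha / 2.
    if x == n:
        upper = 1.0
    else:
        upper = _bisect_decreasing(lambda p: binom_cdf(x, n, p) - alpha / 2)
    return lower, upper


# Pooled counts from the Quick Start data: 38 successes in 50 trials.
pooled_lower, pooled_upper = clopper_pearson(38, 50)
print(f"95% CI: [{pooled_lower:.3f}, {pooled_upper:.3f}]")
```

The interval is conservative by construction, which is why it serves as a baseline rather than as the recommended method.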

Installation

Recommended: Using uv (fast, modern)

First, install uv if you don't have it:

# macOS/Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# Windows
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"

# Or via pip
pip install uv

For more installation options, see: https://docs.astral.sh/uv/getting-started/installation/

Then, set up the project:

# Clone the repository
git clone git@gitlab.com:movellan/proportions.git
cd proportions

# Install with uv (automatically creates .venv and installs all dependencies)
uv sync

# Verify installation by running tests
uv run pytest tests/ -v

# Run commands with uv (no activation needed!)
uv run python examples/01_basic_usage.py

# Or activate manually if you prefer
source .venv/bin/activate  # On macOS/Linux
# or
.venv\Scripts\activate  # On Windows

Why uv?

  • No need to activate: uv run automatically uses the project venv
  • Fast installation (10-100x faster than pip)
  • Reproducible builds with uv.lock

Alternative: Using pip

# Clone the repository
git clone git@gitlab.com:movellan/proportions.git
cd proportions

# Create virtual environment (optional but recommended)
python -m venv .venv
source .venv/bin/activate  # On macOS/Linux

# Install package
pip install -e ".[dev]"  # With development tools

# Verify installation by running tests
pytest tests/ -v

For Users (from PyPI)

pip install proportions

Quick Start

import numpy as np
from proportions.core.models import BinomialData
from proportions.inference import hierarchical_bayes, single_theta_bayesian

# Your data: success counts (x) and trial counts (n) per group
x = np.array([8, 7, 9, 6, 8])  # successes
n = np.array([10, 10, 10, 10, 10])  # trials
data = BinomialData(x=x, n=n)

# Hierarchical Bayes (RECOMMENDED - accounts for all uncertainty)
hb_result = hierarchical_bayes(data, random_seed=42)
print(f"Average success rate: {hb_result.posterior.mu:.3f}")
print(f"95% CI: [{hb_result.posterior.ci_lower:.3f}, {hb_result.posterior.ci_upper:.3f}]")

# Single-Theta Bayesian (simpler, assumes homogeneity)
st_result = single_theta_bayesian(data)
print(f"Average success rate: {st_result.mu:.3f}")
print(f"95% CI: [{st_result.ci_lower:.3f}, {st_result.ci_upper:.3f}]")
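
For intuition about the pooled model, the single-theta posterior has a closed form under a Beta prior. The sketch below assumes a uniform Beta(1, 1) prior and uses only the standard library. It illustrates the conjugate update and is not necessarily how single_theta_bayesian computes its result.

```python
from math import sqrt

# Same data as the Quick Start example.
x = [8, 7, 9, 6, 8]         # successes per group
n = [10, 10, 10, 10, 10]    # trials per group

# Assumption: uniform Beta(1, 1) prior on the shared success rate.
prior_alpha, prior_beta = 1.0, 1.0

# Conjugate update: add pooled successes and pooled failures to the prior.
post_alpha = prior_alpha + sum(x)
post_beta = prior_beta + sum(n) - sum(x)

# Mean and standard deviation of the Beta(post_alpha, post_beta) posterior.
total = post_alpha + post_beta
post_mean = post_alpha / total
post_sd = sqrt(post_alpha * post_beta / (total**2 * (total + 1.0)))

print(f"Posterior: Beta({post_alpha:.0f}, {post_beta:.0f})")
print(f"Posterior mean: {post_mean:.3f}, sd: {post_sd:.3f}")
```

Because every group shares one rate here, the posterior spread reflects only sampling noise. Any real difference between groups is ignored, which is the motivation for the hierarchical model.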

Features

🎯 Core Capabilities

  • Multiple estimation methods with unified API
  • Automatic validation of input data (Pydantic models)
  • Numerically stable Beta distribution functions
  • Importance sampling for Hierarchical Bayes
  • Moment matching for posterior approximation

📊 Comprehensive Diagnostics

  • Model evidence (marginal likelihood) for method comparison
  • Bayes factors with interpretation (decisive/strong/moderate/weak)
  • Effective Sample Size (ESS) for importance sampling
  • Boundary detection for prior specification issues
  • Data quality checks (heterogeneity, sample sizes, extreme rates)
  • Variance decomposition (within-group vs between-group uncertainty)
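
As an illustration of the ESS diagnostic, the sketch below computes the standard Kish effective sample size from log importance weights. The function name is hypothetical, and whether the library uses exactly this estimator is an assumption.

```python
import math


def effective_sample_size(log_weights: list[float]) -> float:
    """Kish effective sample size: (sum w)^2 / sum(w^2).

    Weights are supplied as logs and shifted by their maximum before
    exponentiating, so very small weights cannot cause overflow.
    """
    max_lw = max(log_weights)
    weights = [math.exp(lw - max_lw) for lw in log_weights]
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return total * total / total_sq


# Equal weights: every sample counts, so ESS equals the sample count.
ess_equal = effective_sample_size([0.0] * 100)

# One dominant weight: ESS collapses toward 1.
ess_degenerate = effective_sample_size([0.0, -1000.0, -1000.0])

print(f"ESS (equal weights): {ess_equal:.1f}")
print(f"ESS (one dominant weight): {ess_degenerate:.1f}")
```

Dividing the ESS by the number of samples gives a ratio between 0 and 1. A low ratio suggests that the proposal is a poor match for the posterior.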

📈 Visualization Tools

  • Data overview plots (scatter, histograms, heterogeneity)
  • Prior and posterior distributions
  • Method comparison plots
  • Importance sampling diagnostics
  • HTML reports with embedded plots

🔬 Based on Solid Theory

  • Beta-Binomial hierarchical models with conjugate priors
  • Importance sampling for intractable posteriors
  • Law of total variance for proper uncertainty propagation
  • Numerical stability via log-space computation
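
The law of total variance splits the variance of a mixture into the average variance inside each component plus the variance of the component means. The sketch below shows that decomposition for weighted components. The function name and inputs are hypothetical, and how the library maps these two terms onto its "within-group" and "between-group" labels is an assumption.

```python
def decompose_variance(
    weights: list[float], means: list[float], variances: list[float]
) -> tuple[float, float, float]:
    """Return (overall mean, within term, between term) for a weighted mixture.

    Total variance = E[Var(theta | component)] + Var(E[theta | component]).
    """
    total_w = sum(weights)
    probs = [w / total_w for w in weights]  # normalize the weights
    overall_mean = sum(p * mu for p, mu in zip(probs, means))
    # Average of the conditional variances.
    within = sum(p * v for p, v in zip(probs, variances))
    # Spread of the conditional means around the overall mean.
    between = sum(p * (mu - overall_mean) ** 2 for p, mu in zip(probs, means))
    return overall_mean, within, between


# Two equally weighted components with different means.
mix_mean, mix_within, mix_between = decompose_variance(
    [0.5, 0.5], [0.6, 0.8], [0.01, 0.01]
)
print(f"mean={mix_mean:.3f}, within={mix_within:.4f}, between={mix_between:.4f}")
print(f"total variance={mix_within + mix_between:.4f}")
```

Dropping the second term is what makes a plug-in estimate look more certain than it should.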

Example: Comparing Methods

import numpy as np
from proportions.core.models import BinomialData
from proportions.inference import hierarchical_bayes, single_theta_bayesian
from proportions.diagnostics.evidence import compute_single_theta_evidence

# Prepare data
x = np.array([8, 7, 9, 6, 8])
n = np.array([10, 10, 10, 10, 10])
data = BinomialData(x=x, n=n)

# Fit multiple methods
hb_result = hierarchical_bayes(data, random_seed=42)
st_result = single_theta_bayesian(data)

# Compare via model evidence (marginal likelihood)
st_evidence = compute_single_theta_evidence(data, prior_type="uniform")

print("Model Evidence Comparison:")
print(f"Hierarchical Bayes: {hb_result.log_marginal_likelihood:.2f}")
print(f"Single-Theta: {st_evidence.log_evidence:.2f}")

# Calculate Bayes Factor
log_bf = hb_result.log_marginal_likelihood - st_evidence.log_evidence
bf = np.exp(log_bf)
print(f"\nBayes Factor (HB vs ST): {bf:.2e}")
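
Exponentiating a large log Bayes factor can overflow, so it is often easier to report it on a log10 scale. The sketch below does that and attaches a verbal label. The function name is hypothetical, and the thresholds are illustrative assumptions rather than the cutoffs the library uses for its decisive/strong/moderate/weak labels.

```python
import math


def describe_log_bayes_factor(log_bf: float) -> tuple[float, str]:
    """Convert a natural-log Bayes factor to log10 and label its strength.

    Thresholds on |log10 BF| are assumptions for illustration:
    >= 2 decisive, >= 1 strong, >= 0.5 moderate, otherwise weak.
    """
    log10_bf = log_bf / math.log(10.0)
    strength = abs(log10_bf)  # direction is given by the sign of log10_bf
    if strength >= 2.0:
        label = "decisive"
    elif strength >= 1.0:
        label = "strong"
    elif strength >= 0.5:
        label = "moderate"
    else:
        label = "weak"
    return log10_bf, label


# A log Bayes factor this large would overflow if passed to exp().
big_log10, big_label = describe_log_bayes_factor(2000.0)
print(f"log10 BF = {big_log10:.1f} ({big_label})")

small_log10, small_label = describe_log_bayes_factor(math.log(2.0))
print(f"log10 BF = {small_log10:.2f} ({small_label})")
```

A positive value favors the first model in the difference (here Hierarchical Bayes), a negative value the second.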

Example: Custom Priors

from proportions.core.models import BinomialData
from proportions.inference import hierarchical_bayes
import numpy as np

# Prepare data
x = np.array([8, 7, 9, 6, 8])
n = np.array([10, 10, 10, 10, 10])
data = BinomialData(x=x, n=n)

# Hierarchical Bayes with custom prior parameters
result = hierarchical_bayes(
    data,
    m_prior_alpha=2.0,    # Beta prior for m: E[m] = 2/(2+2) = 0.5
    m_prior_beta=2.0,     # More informative than uniform
    k_prior_min=0.1,      # Allow low concentration (high heterogeneity)
    k_prior_max=100.0,    # Moderate maximum concentration
    n_samples=10000,      # More samples for better approximation
    random_seed=42
)

# Check diagnostics
print(f"Posterior mean for m: {result.m_posterior_mean:.3f}")
print(f"Posterior mean for k: {result.k_posterior_mean:.3f}")
print(f"Effective Sample Size: {result.diagnostics.effective_sample_size:.1f}")
print(f"ESS Ratio: {result.diagnostics.ess_ratio:.3f}")

# Check for boundary issues
if result.diagnostics.k_at_upper_boundary:
    print("⚠️ Warning: k posterior near upper boundary, consider increasing k_prior_max")
if result.diagnostics.k_at_lower_boundary:
    print("⚠️ Warning: k posterior near lower boundary, consider decreasing k_prior_min")
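
To show how the prior parameters above enter the computation, here is a minimal standard-library sketch of importance sampling for a Beta-Binomial hierarchy. It assumes group rates theta_i ~ Beta(m*k, (1-m)*k), a Beta(2, 2) prior on m, a uniform prior on k over [0.1, 100], and the prior itself as the proposal. All function names are hypothetical, and the library's actual priors, proposal, and parameterization may differ.

```python
import math
import random


def log_beta_fn(a: float, b: float) -> float:
    """Log of the Beta function, computed in log space for stability."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_loglik(x: list[int], n: list[int], m: float, k: float) -> float:
    """Log-likelihood of (m, k) with each group's rate integrated out.

    Binomial coefficients are omitted because they do not depend on (m, k).
    """
    a, b = m * k, (1.0 - m) * k
    return sum(
        log_beta_fn(a + xi, b + ni - xi) - log_beta_fn(a, b)
        for xi, ni in zip(x, n)
    )


def posterior_means(
    x: list[int], n: list[int], n_samples: int = 5000, seed: int = 42
) -> tuple[float, float]:
    """Self-normalized importance sampling with the prior as proposal."""
    rng = random.Random(seed)
    ms, ks, log_w = [], [], []
    for _ in range(n_samples):
        # Assumed priors: m ~ Beta(2, 2), k ~ Uniform(0.1, 100).
        m = min(max(rng.betavariate(2.0, 2.0), 1e-12), 1.0 - 1e-12)
        k = rng.uniform(0.1, 100.0)
        ms.append(m)
        ks.append(k)
        # Proposal equals prior, so the weight is just the likelihood.
        log_w.append(beta_binomial_loglik(x, n, m, k))
    max_lw = max(log_w)
    w = [math.exp(lw - max_lw) for lw in log_w]
    total = sum(w)
    m_mean = sum(wi * mi for wi, mi in zip(w, ms)) / total
    k_mean = sum(wi * ki for wi, ki in zip(w, ks)) / total
    return m_mean, k_mean


m_mean, k_mean = posterior_means([8, 7, 9, 6, 8], [10, 10, 10, 10, 10])
print(f"Posterior mean for m: {m_mean:.3f}")
print(f"Posterior mean for k: {k_mean:.3f}")
```

Sampling from the prior is the simplest proposal, but it wastes samples when the posterior is much narrower than the prior. That waste shows up as a low ESS ratio.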

Project Status

Current Version: 0.1.1 (Production-Ready)

✅ Completed

  • Core Pydantic data models with validation
  • Prior specification interface
  • Stable Beta distribution utilities
  • Hierarchical Bayes ⭐ - Importance sampling with full uncertainty (RECOMMENDED)
  • Single-Theta Bayesian - Pooled Bayesian estimation
  • Clopper-Pearson - Frequentist confidence intervals
  • Empirical Bayes ⚠️ - Grid search MLE (NOT RECOMMENDED - use Hierarchical Bayes instead)
  • Comprehensive diagnostics (ESS, evidence, coverage analysis)
  • Visualization tools (importance sampling, distributions)
  • 194 passing tests with extensive coverage

📅 Future Enhancements

  • Additional visualization options
  • HTML report generation
  • Interactive dashboards
  • Extended documentation and tutorials

Development

Running Tests

uv run pytest                      # All tests
uv run pytest -v --cov=proportions # With coverage

Code Quality

uv run ruff format .  # Format code
uv run ruff check .   # Lint
uv run mypy proportions/  # Type check

Design Principles

  1. Modularity - Separate concerns (models, priors, inference, diagnostics)
  2. Type Safety - Pydantic models throughout
  3. Diagnostics First - Always compute ESS, boundaries, evidence
  4. Numerical Stability - Log-space computation, stable algorithms
  5. User-Friendly - Simple API for common cases, power for experts

Documentation

  • prompts/SESSION_STATE.md - Current development status and recent changes
  • prompts/LIBRARY_DESIGN_PLAN.md - Complete architecture and design
  • prompts/HIERARCHICAL_BAYES_SUMMARY.md - Mathematical foundations and algorithms
  • examples/ - Jupyter notebooks demonstrating all methods and comparisons

References

This library implements methods based on:

  • Beta-Binomial hierarchical models - Conjugate Bayesian inference
  • Hierarchical Bayes (RECOMMENDED) - Importance sampling for posterior inference under hyperparameter uncertainty
  • Single-Theta Bayesian - Pooled Bayesian estimation assuming homogeneity
  • Empirical Bayes ⚠️ NOT RECOMMENDED - MLE of hyperparameters (under-covers, use HB instead)
  • Theory - Law of total variance, model evidence (marginal likelihood), Bayes factors

License

MIT License

Contact

Author: Javier Movellan
Email: jmovellan@apple.com
Repository: https://gitlab.com/movellan/proportions

Citation

If you use this library in your research, please cite:

@software{proportions2025,
  author = {Movellan, Javier},
  title = {Proportions: Bayesian and Frequentist Inference for Grouped Binomial Data},
  year = {2025},
  url = {https://gitlab.com/movellan/proportions}
}
