A comprehensive library for Bayesian and frequentist inference on grouped binomial proportions

Proportions: Bayesian and Frequentist Inference for Grouped Binomial Data

A comprehensive Python library for estimating average success rates across multiple groups using Bayesian and frequentist methods.

Overview

When you have binomial data from multiple groups (e.g., success rates across experiments, conversion rates across user segments, test pass rates across scenarios), this library helps you:

  1. Estimate the average success rate across all groups
  2. Quantify uncertainty with credible/confidence intervals
  3. Account for heterogeneity between groups
  4. Compare different modeling approaches with automatic diagnostics
  5. Detect data quality issues and get recommendations

Supported Methods

Method | Description | When to Use
Hierarchical Bayes (RECOMMENDED) | Full Bayesian with importance sampling | Need honest uncertainty, unusual data
Single-Theta Bayesian | Pooled model (homogeneous groups) | Groups believed identical
Clopper-Pearson | Frequentist exact confidence intervals | Baseline comparison
Empirical Bayes ⚠️ NOT RECOMMENDED | Data-driven hyperparameter estimation | ⚠️ Under-covers (17% coverage when nominal is 95%); use HB instead
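
For readers unfamiliar with the Clopper-Pearson baseline, the interval can be sketched directly from Beta quantiles. This is a minimal illustration using SciPy, not this library's API; the function name `clopper_pearson` and the pooled counts are assumptions made for the example.

```python
from scipy.stats import beta


def clopper_pearson(x: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Exact (Clopper-Pearson) confidence interval for a binomial proportion.

    Hypothetical helper for illustration; not part of the proportions package.
    """
    if n <= 0 or not 0 <= x <= n:
        raise ValueError("require n > 0 and 0 <= x <= n")
    alpha = 1.0 - confidence
    # The bounds are Beta quantiles; the edge cases x == 0 and x == n are
    # handled explicitly because one Beta shape parameter would be zero.
    lower = 0.0 if x == 0 else float(beta.ppf(alpha / 2, x, n - x + 1))
    upper = 1.0 if x == n else float(beta.ppf(1 - alpha / 2, x + 1, n - x))
    return lower, upper


# Illustrative pooled counts: 38 successes in 50 trials
lower, upper = clopper_pearson(38, 50)
print(f"95% CI: [{lower:.3f}, {upper:.3f}]")
```

Because this interval pools all groups into one binomial, it ignores between-group heterogeneity, which is why the table lists it only as a baseline.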

Installation

Recommended: Using uv (fast, modern)

First, install uv if you don't have it:

# macOS/Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# Windows
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"

# Or via pip
pip install uv

For more installation options, see: https://docs.astral.sh/uv/getting-started/installation/

Then, set up the project:

# Clone the repository
git clone git@gitlab.com:movellan/proportions.git
cd proportions

# Install with uv (automatically creates .venv and installs all dependencies)
uv sync

# Verify installation by running tests
uv run pytest tests/ -v

# Run commands with uv (no activation needed!)
uv run python examples/01_basic_usage.py

# Or activate manually if you prefer
source .venv/bin/activate  # On macOS/Linux
# or
.venv\Scripts\activate  # On Windows

Why uv?

  • No need to activate: uv run automatically uses the project venv
  • Fast installation (10-100x faster than pip)
  • Reproducible builds with uv.lock

Alternative: Using pip

# Clone the repository
git clone git@gitlab.com:movellan/proportions.git
cd proportions

# Create virtual environment (optional but recommended)
python -m venv .venv
source .venv/bin/activate  # On macOS/Linux

# Install package
pip install -e ".[dev]"  # With development tools

# Verify installation by running tests
pytest tests/ -v

For Users (when published to PyPI)

pip install proportions

Quick Start

import numpy as np
from proportions.core.models import BinomialData
from proportions.inference import hierarchical_bayes, single_theta_bayesian

# Your data: success counts (x) and trial counts (n) per group
x = np.array([8, 7, 9, 6, 8])  # successes
n = np.array([10, 10, 10, 10, 10])  # trials
data = BinomialData(x=x, n=n)

# Hierarchical Bayes (RECOMMENDED - accounts for all uncertainty)
hb_result = hierarchical_bayes(data, random_seed=42)
print(f"Average success rate: {hb_result.posterior.mu:.3f}")
print(f"95% CI: [{hb_result.posterior.ci_lower:.3f}, {hb_result.posterior.ci_upper:.3f}]")

# Single-Theta Bayesian (simpler, assumes homogeneity)
st_result = single_theta_bayesian(data)
print(f"Average success rate: {st_result.mu:.3f}")
print(f"95% CI: [{st_result.ci_lower:.3f}, {st_result.ci_upper:.3f}]")
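
To make the pooled model concrete, the sketch below computes a single-theta posterior by hand with NumPy and SciPy. It assumes a uniform Beta(1, 1) prior and an equal-tailed interval; the prior and interval type used by `single_theta_bayesian` may differ, so treat this as an illustration of the idea rather than a reproduction of the library's output.

```python
import numpy as np
from scipy.stats import beta

x = np.array([8, 7, 9, 6, 8])       # successes per group
n = np.array([10, 10, 10, 10, 10])  # trials per group

# Assumed prior: Beta(1, 1), i.e. uniform on [0, 1]
prior_a, prior_b = 1.0, 1.0

# Conjugate update: pool all successes and failures across groups
post_a = prior_a + x.sum()        # 1 + 38 = 39
post_b = prior_b + (n - x).sum()  # 1 + 12 = 13

posterior_mean = post_a / (post_a + post_b)  # 39 / 52 = 0.75
ci_lower, ci_upper = beta.ppf([0.025, 0.975], post_a, post_b)

print(f"Pooled posterior mean: {posterior_mean:.3f}")
print(f"95% equal-tailed interval: [{ci_lower:.3f}, {ci_upper:.3f}]")
```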

Features

🎯 Core Capabilities

  • Multiple estimation methods with unified API
  • Automatic validation of input data (Pydantic models)
  • Numerically stable Beta distribution functions
  • Importance sampling for Hierarchical Bayes
  • Moment matching for posterior approximation
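
As an illustration of moment matching, a Beta distribution can be chosen so that its mean and variance equal a given pair of moments. The helper below is a hypothetical sketch of that standard calculation; its name and signature are not taken from the library.

```python
def beta_from_moments(mean: float, variance: float) -> tuple[float, float]:
    """Return (alpha, beta) of the Beta distribution with the given mean and variance.

    Hypothetical helper for illustration; not part of the proportions package.
    """
    if not 0.0 < mean < 1.0:
        raise ValueError("mean must lie strictly between 0 and 1")
    max_variance = mean * (1.0 - mean)
    if not 0.0 < variance < max_variance:
        raise ValueError("variance must lie strictly between 0 and mean * (1 - mean)")
    # For Beta(a, b): var = mean * (1 - mean) / (a + b + 1),
    # so the concentration a + b follows from the variance.
    concentration = max_variance / variance - 1.0
    return mean * concentration, (1.0 - mean) * concentration


# Round trip: Beta(39, 13) has mean 0.75 and variance 0.1875 / 53
alpha, beta_param = beta_from_moments(0.75, 0.1875 / 53)
print(f"alpha = {alpha:.2f}, beta = {beta_param:.2f}")
```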

📊 Comprehensive Diagnostics

  • Model evidence (marginal likelihood) for method comparison
  • Bayes factors with interpretation (decisive/strong/moderate/weak)
  • Effective Sample Size (ESS) for importance sampling
  • Boundary detection for prior specification issues
  • Data quality checks (heterogeneity, sample sizes, extreme rates)
  • Variance decomposition (within-group vs between-group uncertainty)
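
The Effective Sample Size diagnostic is commonly defined as the Kish estimate, (sum of weights)^2 / (sum of squared weights). The sketch below computes it from log-weights with NumPy; whether the library uses exactly this estimator is an assumption, and `effective_sample_size` is a hypothetical name.

```python
import numpy as np


def effective_sample_size(log_weights) -> float:
    """Kish effective sample size computed from unnormalized log-weights.

    Hypothetical helper for illustration; not part of the proportions package.
    """
    log_w = np.asarray(log_weights, dtype=float)
    # Subtracting the maximum avoids overflow/underflow; ESS is invariant
    # to rescaling all weights by a common factor.
    w = np.exp(log_w - log_w.max())
    return float(w.sum() ** 2 / np.sum(w ** 2))


equal = effective_sample_size(np.zeros(1000))                     # all weights equal
dominated = effective_sample_size(np.array([0.0, -50.0, -50.0]))  # one weight dominates
print(f"ESS with equal weights: {equal:.1f}")
print(f"ESS with one dominant weight: {dominated:.3f}")
```

An ESS close to the number of samples suggests the proposal matches the posterior well; an ESS near 1 means a handful of samples carry nearly all the weight.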

📈 Visualization Tools

  • Data overview plots (scatter, histograms, heterogeneity)
  • Prior and posterior distributions
  • Method comparison plots
  • Importance sampling diagnostics
  • HTML reports with embedded plots

🔬 Based on Solid Theory

  • Beta-Binomial hierarchical models with conjugate priors
  • Importance sampling for intractable posteriors
  • Law of total variance for proper uncertainty propagation
  • Numerical stability via log-space computation
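
To show what log-space computation looks like for a Beta-Binomial model, the sketch below evaluates the log marginal likelihood of grouped counts with SciPy's `betaln` and `gammaln`. It assumes group rates are drawn from Beta(m*k, (1-m)*k), with m the mean and k the concentration; this parameterization and the function names are assumptions for illustration, not the library's confirmed internals.

```python
import numpy as np
from scipy.special import betaln, gammaln


def log_binomial_coefficient(n, x):
    """log C(n, x) via log-gamma, stable for large counts."""
    return gammaln(n + 1) - gammaln(x + 1) - gammaln(n - x + 1)


def beta_binomial_log_likelihood(x, n, m: float, k: float) -> float:
    """Sum over groups of log P(x_i | n_i, m, k) with theta_i ~ Beta(m*k, (1-m)*k).

    Hypothetical helper for illustration; not part of the proportions package.
    """
    x = np.asarray(x, dtype=float)
    n = np.asarray(n, dtype=float)
    a, b = m * k, (1.0 - m) * k
    # Everything stays in log space, so no Beta function value is ever
    # formed directly (those can underflow for large counts).
    per_group = log_binomial_coefficient(n, x) + betaln(x + a, n - x + b) - betaln(a, b)
    return float(per_group.sum())


x = np.array([8, 7, 9, 6, 8])
n = np.array([10, 10, 10, 10, 10])
ll_uniform = beta_binomial_log_likelihood(x, n, m=0.5, k=2.0)  # Beta(1, 1) case
print(f"Log marginal likelihood: {ll_uniform:.3f}")
```

With m = 0.5 and k = 2 the group-level prior is uniform, so each group contributes log(1 / (n_i + 1)), which gives a quick sanity check on the implementation.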

Example: Comparing Methods

import numpy as np
from proportions.core.models import BinomialData
from proportions.diagnostics.evidence import compute_single_theta_evidence
from proportions.inference import hierarchical_bayes, single_theta_bayesian

# Prepare data
x = np.array([8, 7, 9, 6, 8])
n = np.array([10, 10, 10, 10, 10])
data = BinomialData(x=x, n=n)

# Fit multiple methods
hb_result = hierarchical_bayes(data, random_seed=42)
st_result = single_theta_bayesian(data)

# Compare via model evidence (marginal likelihood)
st_evidence = compute_single_theta_evidence(data, prior_type="uniform")

print("Model Evidence Comparison:")
print(f"Hierarchical Bayes: {hb_result.log_marginal_likelihood:.2f}")
print(f"Single-Theta: {st_evidence.log_evidence:.2f}")

# Calculate Bayes Factor
log_bf = hb_result.log_marginal_likelihood - st_evidence.log_evidence
bf = np.exp(log_bf)
print(f"\nBayes Factor (HB vs ST): {bf:.2e}")
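
A log Bayes factor can be turned into a verbal label without exponentiating, which avoids overflow when the evidence gap is large. The sketch below uses cutoffs loosely based on the Jeffreys scale; the thresholds and the function name are assumptions, and the library's own interpretation rules may differ.

```python
import math


def interpret_log_bayes_factor(log_bf: float) -> str:
    """Map a natural-log Bayes factor to a strength-of-evidence label.

    Hypothetical helper with assumed cutoffs; not part of the proportions package.
    """
    # Convert to base 10 and use the magnitude, so the label describes the
    # strength of evidence for whichever model is favored.
    log10_bf = abs(log_bf) / math.log(10)
    if log10_bf >= 2.0:    # Bayes factor of at least 100
        return "decisive"
    if log10_bf >= 1.0:    # Bayes factor of at least 10
        return "strong"
    if log10_bf >= 0.5:    # Bayes factor of at least about 3.2
        return "moderate"
    return "weak"


for bf in (2.0, 5.0, 20.0, 150.0):
    print(f"BF = {bf:>5.1f}: {interpret_log_bayes_factor(math.log(bf))}")
```

The sign of the log Bayes factor indicates which model is favored: positive values favor the first model in the ratio, negative values the second.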

Example: Custom Priors

from proportions.core.models import BinomialData
from proportions.inference import hierarchical_bayes
import numpy as np

# Prepare data
x = np.array([8, 7, 9, 6, 8])
n = np.array([10, 10, 10, 10, 10])
data = BinomialData(x=x, n=n)

# Hierarchical Bayes with custom prior parameters
result = hierarchical_bayes(
    data,
    m_prior_alpha=2.0,    # Beta prior for m: E[m] = 2/(2+2) = 0.5
    m_prior_beta=2.0,     # More informative than uniform
    k_prior_min=0.1,      # Allow low concentration (high heterogeneity)
    k_prior_max=100.0,    # Moderate maximum concentration
    n_samples=10000,      # More samples for better approximation
    random_seed=42
)

# Check diagnostics
print(f"Posterior mean for m: {result.m_posterior_mean:.3f}")
print(f"Posterior mean for k: {result.k_posterior_mean:.3f}")
print(f"Effective Sample Size: {result.diagnostics.effective_sample_size:.1f}")
print(f"ESS Ratio: {result.diagnostics.ess_ratio:.3f}")

# Check for boundary issues
if result.diagnostics.k_at_upper_boundary:
    print("⚠️ Warning: k posterior near upper boundary, consider increasing k_prior_max")
if result.diagnostics.k_at_lower_boundary:
    print("⚠️ Warning: k posterior near lower boundary, consider decreasing k_prior_min")

Project Status

Current Version: 0.1.0 (Production-Ready)

✅ Completed

  • Core Pydantic data models with validation
  • Prior specification interface
  • Stable Beta distribution utilities
  • Hierarchical Bayes ⭐ - Importance sampling with full uncertainty (RECOMMENDED)
  • Single-Theta Bayesian - Pooled Bayesian estimation
  • Clopper-Pearson - Frequentist confidence intervals
  • Empirical Bayes ⚠️ - Grid search MLE (NOT RECOMMENDED - use Hierarchical Bayes instead)
  • Comprehensive diagnostics (ESS, evidence, coverage analysis)
  • Visualization tools (importance sampling, distributions)
  • 194 passing tests with extensive coverage

📅 Future Enhancements

  • Additional visualization options
  • HTML report generation
  • Interactive dashboards
  • Extended documentation and tutorials

Development

Running Tests

uv run pytest                      # All tests
uv run pytest -v --cov=proportions # With coverage

Code Quality

uv run ruff format .  # Format code
uv run ruff check .   # Lint
uv run mypy proportions/  # Type check

Design Principles

  1. Modularity - Separate concerns (models, priors, inference, diagnostics)
  2. Type Safety - Pydantic models throughout
  3. Diagnostics First - Always compute ESS, boundaries, evidence
  4. Numerical Stability - Log-space computation, stable algorithms
  5. User-Friendly - Simple API for common cases, power for experts

Documentation

  • prompts/SESSION_STATE.md - Current development status and recent changes
  • prompts/LIBRARY_DESIGN_PLAN.md - Complete architecture and design
  • prompts/HIERARCHICAL_BAYES_SUMMARY.md - Mathematical foundations and algorithms
  • examples/ - Jupyter notebooks demonstrating all methods and comparisons

References

This library implements methods based on:

  • Beta-Binomial hierarchical models - Conjugate Bayesian inference
  • Hierarchical Bayes (RECOMMENDED) - Importance sampling for posterior inference under hyperparameter uncertainty
  • Single-Theta Bayesian - Pooled Bayesian estimation assuming homogeneity
  • Empirical Bayes ⚠️ NOT RECOMMENDED - MLE of hyperparameters (under-covers, use HB instead)
  • Theory - Law of total variance, model evidence (marginal likelihood), Bayes factors

License

MIT License

Contact

Author: Javier Movellan
Email: jmovellan@apple.com
Repository: https://gitlab.com/movellan/proportions

Citation

If you use this library in your research, please cite:

@software{proportions2025,
  author = {Movellan, Javier},
  title = {Proportions: Bayesian and Frequentist Inference for Grouped Binomial Data},
  year = {2025},
  url = {https://gitlab.com/movellan/proportions}
}
