A comprehensive library for Bayesian and frequentist inference on grouped binomial proportions
Proportions: Bayesian and Frequentist Inference for Grouped Binomial Data
A comprehensive Python library for estimating average success rates across multiple groups using Bayesian and frequentist methods.
Overview
When you have binomial data from multiple groups (e.g., success rates across experiments, conversion rates across user segments, test pass rates across scenarios), this library helps you:
- Estimate the average success rate across all groups
- Quantify uncertainty with credible/confidence intervals
- Account for heterogeneity between groups
- Compare different modeling approaches with automatic diagnostics
- Detect data quality issues and get recommendations
Supported Methods
| Method | Description | When to Use |
|---|---|---|
| Hierarchical Bayes ⭐ RECOMMENDED | Full Bayesian with importance sampling | Need honest uncertainty, unusual data |
| Single-Theta Bayesian | Pooled model (homogeneous groups) | Groups believed identical |
| Clopper-Pearson | Frequentist exact confidence intervals | Baseline comparison |
| Empirical Bayes ⚠️ NOT RECOMMENDED | Data-driven hyperparameter estimation | ⚠️ Under-covers (17% actual coverage at 95% nominal); use HB instead |
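The Clopper-Pearson row in the table above refers to exact frequentist intervals. The sketch below is a stdlib-only version for a single binomial count. It finds each bound by bisecting the binomial tail probability that defines the interval. Applying it to the pooled totals (sum of x over sum of n) is an illustrative assumption, and the library may build its interval differently. With SciPy, `scipy.stats.beta.ppf(alpha/2, x, n-x+1)` and `scipy.stats.beta.ppf(1-alpha/2, x+1, n-x)` give the same bounds in closed form.

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def bisect(f, lo: float = 0.0, hi: float = 1.0, iters: int = 100) -> float:
    """Root of a function f that is increasing on [lo, hi]."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid  # root lies above mid
        else:
            hi = mid  # root lies at or below mid
    return 0.5 * (lo + hi)


def clopper_pearson(x: int, n: int, level: float = 0.95) -> tuple[float, float]:
    """Exact two-sided (Clopper-Pearson) interval for a binomial proportion."""
    alpha = 1.0 - level
    # Lower bound: p where P(X >= x | p) = alpha/2 (defined as 0 when x == 0).
    lower = 0.0 if x == 0 else bisect(lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2)
    # Upper bound: p where P(X <= x | p) = alpha/2 (defined as 1 when x == n).
    upper = 1.0 if x == n else bisect(lambda p: alpha / 2 - binom_cdf(x, n, p))
    return lower, upper


# Pooled Quick Start data: 38 successes out of 50 trials.
print(clopper_pearson(38, 50))
```

The bisection approach trades speed for transparency. Each bound is literally the success rate at which the observed count becomes a 2.5% tail event.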
Installation
Recommended: Using uv (fast, modern)
First, install uv if you don't have it:
# macOS/Linux
curl -LsSf https://astral.sh/uv/install.sh | sh
# Windows
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
# Or via pip
pip install uv
For more installation options, see: https://docs.astral.sh/uv/getting-started/installation/
Then, set up the project:
# Clone the repository
git clone git@gitlab.com:movellan/proportions.git
cd proportions
# Install with uv (automatically creates .venv and installs all dependencies)
uv sync
# Verify installation by running tests
uv run pytest tests/ -v
# Run commands with uv (no activation needed!)
uv run python examples/01_basic_usage.py
# Or activate manually if you prefer
source .venv/bin/activate # On macOS/Linux
# or
.venv\Scripts\activate # On Windows
Why uv?
- No need to activate: uv run automatically uses the project's virtual environment
- Fast installation (10-100x faster than pip)
- Reproducible builds with uv.lock
Alternative: Using pip
# Clone the repository
git clone git@gitlab.com:movellan/proportions.git
cd proportions
# Create virtual environment (optional but recommended)
python -m venv .venv
source .venv/bin/activate # On macOS/Linux
# or
.venv\Scripts\activate # On Windows
# Install package
pip install -e ".[dev]" # With development tools
# Verify installation by running tests
pytest tests/ -v
From PyPI (Recommended for Users)
pip install proportions
Quick Start
import numpy as np
from proportions.core.models import BinomialData
from proportions.inference import hierarchical_bayes, single_theta_bayesian
# Your data: success counts (x) and trial counts (n) per group
x = np.array([8, 7, 9, 6, 8]) # successes
n = np.array([10, 10, 10, 10, 10]) # trials
data = BinomialData(x=x, n=n)
# Hierarchical Bayes (RECOMMENDED - accounts for all uncertainty)
hb_result = hierarchical_bayes(data, random_seed=42)
print(f"Average success rate: {hb_result.posterior.mu:.3f}")
print(f"95% CI: [{hb_result.posterior.ci_lower:.3f}, {hb_result.posterior.ci_upper:.3f}]")
# Single-Theta Bayesian (simpler, assumes homogeneity)
st_result = single_theta_bayesian(data, alpha_prior=1.0, beta_prior=1.0)
print(f"Average success rate: {st_result.posterior.mu:.3f}")
print(f"95% CI: [{st_result.posterior.ci_lower:.3f}, {st_result.posterior.ci_upper:.3f}]")
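The Single-Theta model pools all groups into one rate θ. Suppose it is the standard conjugate Beta-Binomial update. The alpha_prior and beta_prior arguments above are consistent with this, but the page does not confirm it. Under that assumption the posterior is Beta(alpha_prior + Σx, beta_prior + Σ(n - x)), which you can check by hand with this stdlib sketch:

```python
import math


def single_theta_posterior(x, n, alpha_prior=1.0, beta_prior=1.0):
    """Conjugate update for a pooled rate theta ~ Beta(alpha_prior, beta_prior).

    Returns the posterior Beta parameters, its mean, and its standard deviation.
    """
    successes = sum(x)
    failures = sum(ni - xi for xi, ni in zip(x, n))
    a = alpha_prior + successes
    b = beta_prior + failures
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))  # variance of Beta(a, b)
    return a, b, mean, math.sqrt(var)


a, b, mean, sd = single_theta_posterior([8, 7, 9, 6, 8], [10] * 5)
print(f"Posterior Beta({a:.0f}, {b:.0f}): mean {mean:.3f}, sd {sd:.3f}")
# prints: Posterior Beta(39, 13): mean 0.750, sd 0.059
```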
Features
🎯 Core Capabilities
- Multiple estimation methods with unified API
- Automatic validation of input data (Pydantic models)
- Numerically stable Beta distribution functions
- Importance sampling for Hierarchical Bayes
- Moment matching for posterior approximation
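The page says inputs are validated with Pydantic models but does not list the rules. The sketch below shows checks such a model would plausibly enforce: equal lengths, at least one group, positive trial counts, and 0 ≤ x ≤ n. The class GroupedCounts is hypothetical and is not the library's BinomialData. It uses a stdlib dataclass so that it runs without Pydantic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroupedCounts:
    """Hypothetical stand-in for a validated grouped-binomial container."""

    x: tuple[int, ...]  # successes per group
    n: tuple[int, ...]  # trials per group

    def __post_init__(self) -> None:
        if len(self.x) != len(self.n):
            raise ValueError("x and n must have the same length")
        if len(self.x) == 0:
            raise ValueError("need at least one group")
        for i, (xi, ni) in enumerate(zip(self.x, self.n)):
            if ni <= 0:
                raise ValueError(f"group {i}: n must be positive, got {ni}")
            if not 0 <= xi <= ni:
                raise ValueError(f"group {i}: need 0 <= x <= n, got x={xi}, n={ni}")


ok = GroupedCounts(x=(8, 7, 9, 6, 8), n=(10, 10, 10, 10, 10))
try:
    GroupedCounts(x=(11,), n=(10,))
except ValueError as err:
    print(err)  # group 0: need 0 <= x <= n, got x=11, n=10
```

Failing at construction time means every inference function downstream can assume clean counts.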
📊 Comprehensive Diagnostics
- Model evidence (marginal likelihood) for method comparison
- Bayes factors with interpretation (decisive/strong/moderate/weak)
- Effective Sample Size (ESS) for importance sampling
- Boundary detection for prior specification issues
- Data quality checks (heterogeneity, sample sizes, extreme rates)
- Variance decomposition (within-group vs between-group uncertainty)
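The page does not spell out the exact data quality checks. One plausible heterogeneity check is the classical chi-square test of homogeneity. It compares each group's count x_i with n_i·p̂, the count predicted by the pooled rate p̂, using χ² = Σ (x_i - n_i·p̂)² / (n_i·p̂·(1 - p̂)). Under homogeneity this statistic is roughly chi-square with G - 1 degrees of freedom, where G is the number of groups. The sketch also flags small groups and groups at 0% or 100%. The function name and the min_n=20 threshold are hypothetical, not library defaults.

```python
def heterogeneity_check(x, n, min_n=20):
    """Chi-square homogeneity statistic plus simple data-quality flags.

    min_n is a hypothetical threshold for flagging small groups.
    """
    p_hat = sum(x) / sum(n)  # pooled success rate
    if p_hat in (0.0, 1.0):
        raise ValueError("pooled rate is 0 or 1; the statistic is undefined")
    chi2 = sum(
        (xi - ni * p_hat) ** 2 / (ni * p_hat * (1 - p_hat)) for xi, ni in zip(x, n)
    )
    return {
        "pooled_rate": p_hat,
        "chi2": chi2,
        "df": len(x) - 1,
        "small_groups": [i for i, ni in enumerate(n) if ni < min_n],
        "extreme_groups": [i for i, (xi, ni) in enumerate(zip(x, n)) if xi in (0, ni)],
    }


report = heterogeneity_check([8, 7, 9, 6, 8], [10] * 5)
print(report)
```

For the Quick Start data the statistic is about 2.85 on 4 degrees of freedom. That is well below the 5% critical value of about 9.49, so these groups show no sign of extra-binomial variation. Every group has n = 10, so all five are flagged as small under the hypothetical threshold.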
📈 Visualization Tools
- Data overview plots (scatter, histograms, heterogeneity)
- Prior and posterior distributions
- Method comparison plots
- Importance sampling diagnostics
- HTML reports with embedded plots
🔬 Based on Solid Theory
- Beta-Binomial hierarchical models with conjugate priors
- Importance sampling for intractable posteriors
- Law of total variance for proper uncertainty propagation
- Numerical stability via log-space computation
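The law of total variance and the moment matching listed above can be illustrated together. Assume, since this page does not state it, that a group's rate follows θ | m, k ~ Beta(mk, (1 - m)k), with m the mean and k the concentration. Assume also that we hold weighted posterior samples of (m, k). The law of total variance then gives Var(θ) = E[Var(θ | m, k)] + Var(E[θ | m, k]), where Var(θ | m, k) = m(1 - m)/(k + 1) and E[θ | m, k] = m. Matching the resulting mean and variance to a Beta distribution yields a closed-form approximation. Both function names are hypothetical.

```python
def total_variance(samples, weights):
    """Mean of theta and both variance components over weighted (m, k) samples.

    Assumes theta | m, k ~ Beta(m * k, (1 - m) * k), so
    E[theta | m, k] = m and Var(theta | m, k) = m * (1 - m) / (k + 1).
    """
    total_w = sum(weights)
    w = [wi / total_w for wi in weights]  # normalize the importance weights
    mean = sum(wi * m for wi, (m, _) in zip(w, samples))
    # E[Var(theta | m, k)]: spread of theta around its conditional mean.
    e_var = sum(wi * m * (1 - m) / (k + 1) for wi, (m, k) in zip(w, samples))
    # Var(E[theta | m, k]): uncertainty about the conditional mean m itself.
    v_mean = sum(wi * (m - mean) ** 2 for wi, (m, _) in zip(w, samples))
    return mean, e_var, v_mean


def moment_match_beta(mean, var):
    """Beta(a, b) with the given mean and variance (needs var < mean * (1 - mean))."""
    if not 0 < var < mean * (1 - mean):
        raise ValueError("variance is not achievable by a Beta distribution")
    common = mean * (1 - mean) / var - 1  # equals a + b
    return mean * common, (1 - mean) * common


mean, e_var, v_mean = total_variance([(0.7, 20.0), (0.8, 50.0)], [0.5, 0.5])
a, b = moment_match_beta(mean, e_var + v_mean)
print(f"mean={mean:.3f}, E[Var]={e_var:.4f}, Var[E]={v_mean:.4f}, Beta({a:.2f}, {b:.2f})")
```

With a single sample (m, k), the second component is zero and moment matching recovers Beta(mk, (1 - m)k) exactly. This makes it a convenient self-check.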
Example: Comparing Methods
from proportions.core.models import BinomialData
from proportions.inference import hierarchical_bayes, single_theta_bayesian
import numpy as np
# Prepare data
x = np.array([8, 7, 9, 6, 8])
n = np.array([10, 10, 10, 10, 10])
data = BinomialData(x=x, n=n)
# Fit multiple methods
hb_result = hierarchical_bayes(data, random_seed=42)
st_result = single_theta_bayesian(data, alpha_prior=1.0, beta_prior=1.0)
# Compare via model evidence (marginal likelihood)
# Evidence is automatically computed for both methods!
print("Model Evidence Comparison:")
print(f"Hierarchical Bayes: {hb_result.log_marginal_likelihood:.2f}")
print(f"Single-Theta: {st_result.log_marginal_likelihood:.2f}")
# Calculate Bayes Factor
log_bf = hb_result.log_marginal_likelihood - st_result.log_marginal_likelihood
bf = np.exp(log_bf)
print(f"\nBayes Factor (HB vs ST): {bf:.2e}")
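For the Single-Theta model the evidence has a closed form, which makes it a useful sanity check on st_result.log_marginal_likelihood. Assume a Beta(a, b) prior on the pooled rate and include the binomial coefficients. Then log p(x) = Σ log C(n_i, x_i) + log B(a + Σx, b + Σ(n - x)) - log B(a, b). The page does not say whether the library includes the coefficients. They cancel in a Bayes factor as long as both models treat them the same way. The labelling helper compares log Bayes factors directly, which avoids the overflow np.exp can hit for large evidence gaps. Its cutoffs of 3, 10 and 100 follow a Jeffreys-style scale and are an assumption, so the library's own decisive/strong/moderate/weak thresholds may differ.

```python
import math


def log_beta(a: float, b: float) -> float:
    """log B(a, b) via log-gamma, stable for large arguments."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def single_theta_log_evidence(x, n, a=1.0, b=1.0):
    """Closed-form log p(x) for a pooled rate theta ~ Beta(a, b)."""
    log_binom = sum(math.log(math.comb(ni, xi)) for xi, ni in zip(x, n))
    s = sum(x)                                 # total successes
    f = sum(ni - xi for xi, ni in zip(x, n))   # total failures
    return log_binom + log_beta(a + s, b + f) - log_beta(a, b)


def describe_bayes_factor(log_bf: float) -> str:
    """Label a Bayes factor; the 3 / 10 / 100 cutoffs are an assumption."""
    size = abs(log_bf)  # compare in log space, so no overflow
    if size > math.log(100):
        strength = "decisive"
    elif size > math.log(10):
        strength = "strong"
    elif size > math.log(3):
        strength = "moderate"
    else:
        strength = "weak"
    favored = "first model" if log_bf >= 0 else "second model"
    return f"{strength} evidence for the {favored}"


print(single_theta_log_evidence([8, 7, 9, 6, 8], [10] * 5))
```

If the library uses the same prior and the same likelihood constant, the printed value should agree with st_result.log_marginal_likelihood. Passing log_bf from the example above to describe_bayes_factor then labels the comparison.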
Example: Custom Priors
from proportions.core.models import BinomialData
from proportions.inference import hierarchical_bayes
import numpy as np
# Prepare data
x = np.array([8, 7, 9, 6, 8])
n = np.array([10, 10, 10, 10, 10])
data = BinomialData(x=x, n=n)
# Hierarchical Bayes with custom prior parameters
result = hierarchical_bayes(
data,
m_prior_alpha=2.0, # Beta prior for m: E[m] = 2/(2+2) = 0.5
m_prior_beta=2.0, # More informative than uniform
k_prior_min=0.1, # Allow low concentration (high heterogeneity)
k_prior_max=100.0, # Moderate maximum concentration
n_samples=10000, # More samples for better approximation
random_seed=42
)
# Check diagnostics
print(f"Posterior mean for m: {result.m_posterior_mean:.3f}")
print(f"Posterior mean for k: {result.k_posterior_mean:.3f}")
print(f"Effective Sample Size: {result.diagnostics.effective_sample_size:.1f}")
print(f"ESS Ratio: {result.diagnostics.ess_ratio:.3f}")
# Check for boundary issues
if result.diagnostics.k_at_upper_boundary:
print("⚠️ Warning: k posterior near upper boundary, consider increasing k_prior_max")
if result.diagnostics.k_at_lower_boundary:
print("⚠️ Warning: k posterior near lower boundary, consider decreasing k_prior_min")
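The minimal importance-sampling sketch below makes the posterior means and ESS diagnostics above concrete. It draws (m, k) from the prior, weights each draw by the Beta-Binomial likelihood of the data, and normalizes in log space. It is not the library's implementation. Three choices are assumptions: the Beta(mk, (1 - m)k) parameterization, the log-uniform prior on k over [k_min, k_max], and the use of the prior as the proposal distribution. The keyword names only mirror the m_prior_*/k_prior_* arguments above. ESS uses the standard (Σw)² / Σw² formula.

```python
import math
import random


def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_beta_binomial(x, n, m, k):
    """Sum over groups of log BetaBinomial(x_i | n_i, m*k, (1-m)*k)."""
    a, b = m * k, (1 - m) * k
    return sum(
        math.log(math.comb(ni, xi)) + log_beta(xi + a, ni - xi + b) - log_beta(a, b)
        for xi, ni in zip(x, n)
    )


def importance_sample(x, n, m_alpha=1.0, m_beta=1.0, k_min=0.1, k_max=100.0,
                      n_samples=10_000, seed=42):
    rng = random.Random(seed)
    draws, log_w = [], []
    for _ in range(n_samples):
        # m ~ Beta(m_alpha, m_beta), clamped away from 0 and 1 for lgamma.
        m = min(max(rng.betavariate(m_alpha, m_beta), 1e-12), 1 - 1e-12)
        # k ~ log-uniform on [k_min, k_max] (assumed prior).
        k = math.exp(rng.uniform(math.log(k_min), math.log(k_max)))
        draws.append((m, k))
        log_w.append(log_beta_binomial(x, n, m, k))
    # Subtract the max log weight before exponentiating to avoid underflow.
    top = max(log_w)
    raw = [math.exp(lw - top) for lw in log_w]
    total = sum(raw)
    w = [r / total for r in raw]
    ess = 1.0 / sum(wi * wi for wi in w)  # (sum w)^2 / sum w^2 with sum w = 1
    return {
        "m_mean": sum(wi * m for wi, (m, _) in zip(w, draws)),
        "k_mean": sum(wi * k for wi, (_, k) in zip(w, draws)),
        "ess": ess,
        "ess_ratio": ess / n_samples,
        # log of the average likelihood: a simple estimate of log p(x).
        "log_evidence": top + math.log(total / n_samples),
    }


result = importance_sample([8, 7, 9, 6, 8], [10] * 5)
print({key: round(val, 3) for key, val in result.items()})
```

A low ess_ratio means a few draws carry most of the weight. Together with posterior mass piled against k_max, that is the situation the boundary warnings above are meant to catch.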
Project Status
Current Version: 0.1.0 (Production-Ready)
✅ Completed
- Core Pydantic data models with validation
- Prior specification interface
- Stable Beta distribution utilities
- Hierarchical Bayes ⭐ - Importance sampling with full uncertainty (RECOMMENDED)
- Single-Theta Bayesian - Pooled Bayesian estimation
- Clopper-Pearson - Frequentist confidence intervals
- Empirical Bayes ⚠️ - Grid search MLE (NOT RECOMMENDED - use Hierarchical Bayes instead)
- Comprehensive diagnostics (ESS, evidence, coverage analysis)
- Visualization tools (importance sampling, distributions)
- 194 passing tests with extensive coverage
📅 Future Enhancements
- Additional visualization options
- HTML report generation
- Interactive dashboards
- Extended documentation and tutorials
Development
Running Tests
uv run pytest # All tests
uv run pytest -v --cov=proportions # With coverage
Code Quality
uv run ruff format . # Format code
uv run ruff check . # Lint
uv run mypy proportions/ # Type check
Design Principles
- Modularity - Separate concerns (models, priors, inference, diagnostics)
- Type Safety - Pydantic models throughout
- Diagnostics First - Always compute ESS, boundaries, evidence
- Numerical Stability - Log-space computation, stable algorithms
- User-Friendly - Simple API for common cases, power for experts
Documentation
- prompts/SESSION_STATE.md - Current development status and recent changes
- prompts/LIBRARY_DESIGN_PLAN.md - Complete architecture and design
- prompts/HIERARCHICAL_BAYES_SUMMARY.md - Mathematical foundations and algorithms
- examples/ - Example scripts (e.g., 01_basic_usage.py) and Jupyter notebooks demonstrating all methods and comparisons
References
This library implements methods based on:
- Beta-Binomial hierarchical models - Conjugate Bayesian inference
- Hierarchical Bayes ⭐ RECOMMENDED - Importance sampling for posterior inference under hyperparameter uncertainty
- Single-Theta Bayesian - Pooled Bayesian estimation assuming homogeneity
- Empirical Bayes ⚠️ NOT RECOMMENDED - MLE of hyperparameters (under-covers, use HB instead)
- Theory - Law of total variance, model evidence (marginal likelihood), Bayes factors
License
MIT License
Contact
- Author: Javier Movellan
- Email: jmovellan@apple.com
- Repository: https://gitlab.com/movellan/proportions
Citation
If you use this library in your research, please cite:
@software{proportions2025,
author = {Movellan, Javier},
title = {Proportions: Bayesian and Frequentist Inference for Grouped Binomial Data},
year = {2025},
url = {https://gitlab.com/movellan/proportions}
}
Download files
File details
Details for the file proportions-0.1.2.tar.gz.
File metadata
- Download URL: proportions-0.1.2.tar.gz
- Upload date:
- Size: 61.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5aeb95e4d6292a87644f7b8e69f37b1e334f5bd3bc7dc0a232112651a812ee93 |
| MD5 | bc4d8c080042bb37898261c107a3ff22 |
| BLAKE2b-256 | d691409f761a9e16f47d8ebb6bf14811d72b4267abc74cfdd8b50bdba874cec7 |
File details
Details for the file proportions-0.1.2-py3-none-any.whl.
File metadata
- Download URL: proportions-0.1.2-py3-none-any.whl
- Upload date:
- Size: 50.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3d99e3f1d7d0cd6016a702d5c5c7328025ed1f13c9c4c06811a0df702b0bf4c7 |
| MD5 | 03a47c9a43f0b2b731b90bde07912e07 |
| BLAKE2b-256 | 301e1a47e2327875d642fe9f55ab5eff19809781bd0a4889580f2272b259f727 |