A comprehensive library for Bayesian and frequentist inference on binomial proportions (grouped and single) with truncated Beta prior support
Project description
Proportions: Bayesian Inference for Grouped Binomial Data
A Python library for estimating average success rates across multiple groups using Bayesian and frequentist methods.
Installation
pip install proportions
That's it! No need to clone the repository unless you want to contribute (see Developer Setup below).
Quick Start
import numpy as np
from proportions.core.models import BinomialData
from proportions.inference import hierarchical_bayes, single_theta_bayesian
# Your data: success counts (x) and trial counts (n) per group
x = np.array([8, 7, 9, 6, 8]) # successes
n = np.array([10, 10, 10, 10, 10]) # trials
data = BinomialData(x=x, n=n)
# Hierarchical Bayes (RECOMMENDED - accounts for all uncertainty)
hb_result = hierarchical_bayes(data, random_seed=42)
print(f"Average success rate: {hb_result.posterior.mu:.3f}")
print(f"95% CI: [{hb_result.posterior.ci_lower:.3f}, {hb_result.posterior.ci_upper:.3f}]")
# Single-Theta Bayesian (simpler, assumes homogeneity)
st_result = single_theta_bayesian(data, alpha_prior=1.0, beta_prior=1.0)
print(f"Average success rate: {st_result.posterior.mu:.3f}")
print(f"95% CI: [{st_result.posterior.ci_lower:.3f}, {st_result.posterior.ci_upper:.3f}]")
When to Use This Library
When you have binomial data from multiple groups and want to:
- Estimate the average success rate across all groups
- Quantify uncertainty with credible/confidence intervals
- Account for heterogeneity between groups
- Compare different modeling approaches with automatic diagnostics
Examples: Success rates across experiments, conversion rates across user segments, test pass rates across scenarios.
Available Methods
| Method | Description | When to Use |
|---|---|---|
| Hierarchical Bayes ⭐ | Full Bayesian with importance sampling | Default choice - honest uncertainty |
| Single-Theta Bayesian | Pooled model (homogeneous groups) | Groups believed identical |
| Clopper-Pearson | Frequentist exact confidence intervals | Baseline comparison |
Comparing Methods
from proportions.core.models import BinomialData
from proportions.inference import hierarchical_bayes, single_theta_bayesian
import numpy as np
# Prepare data
x = np.array([8, 7, 9, 6, 8])
n = np.array([10, 10, 10, 10, 10])
data = BinomialData(x=x, n=n)
# Fit multiple methods
hb_result = hierarchical_bayes(data, random_seed=42)
st_result = single_theta_bayesian(data, alpha_prior=1.0, beta_prior=1.0)
# Compare via model evidence (marginal likelihood)
print("Model Evidence Comparison:")
print(f"Hierarchical Bayes: {hb_result.log_marginal_likelihood:.2f}")
print(f"Single-Theta: {st_result.log_marginal_likelihood:.2f}")
# Calculate Bayes Factor
log_bf = hb_result.log_marginal_likelihood - st_result.log_marginal_likelihood
bf = np.exp(log_bf)
print(f"\nBayes Factor (HB vs ST): {bf:.2e}")
Single Proportion Inference
For analyzing a single binomial proportion (not grouped data), the library provides specialized functions with support for truncated Beta priors:
from proportions.inference import (
conf_interval_proportion,
upper_bound_proportion,
prob_larger_than_threshold,
)
from proportions.diagnostics import plot_posterior_proportion
# Example: Rare failure rate estimation
# Domain knowledge: failure rate is between 1-10 per million
n_trials = 500_000
n_failures = 1
# Credible interval with truncated prior
interval = conf_interval_proportion(
n_failures, n_trials,
confidence=0.95,
prior_alpha=1.0, # Uniform prior
prior_beta=1.0,
lower=1e-6, # 1 per million (lower bound)
upper=10e-6 # 10 per million (upper bound)
)
print(f"95% CI: [{interval[0]*1e6:.2f}, {interval[1]*1e6:.2f}] per million")
# One-sided bounds
upper_95 = upper_bound_proportion(n_failures, n_trials, confidence=0.95,
lower=1e-6, upper=10e-6)
print(f"95% upper bound: {upper_95*1e6:.2f} per million")
# Threshold probabilities
prob = prob_larger_than_threshold(n_failures, n_trials, threshold=3e-6,
lower=1e-6, upper=10e-6)
print(f"P(rate > 3/million): {prob:.1%}")
# Visualize posterior
fig = plot_posterior_proportion(
n_failures, n_trials,
confidence=0.95,
lower=1e-6,
upper=10e-6,
title="Failure Rate Posterior"
)
fig.savefig('posterior.png', dpi=150, bbox_inches='tight')
Available Functions:
conf_interval_proportion()- Equal-tails credible intervalsupper_bound_proportion()- Upper credible boundslower_bound_proportion()- Lower credible boundsprob_larger_than_threshold()- P(θ > threshold | data)prob_smaller_than_threshold()- P(θ < threshold | data)prob_of_interval()- P(θ ∈ [a,b] | data)plot_posterior_proportion()- Visualize posterior distribution
Key Features:
- Truncated priors - Incorporate domain knowledge about plausible ranges
- Standard Beta priors - Use
lower=0.0, upper=1.0(default) for standard Beta - All priors supported - Uniform, Jeffreys, or custom Beta(α, β)
- Comprehensive inference - Intervals, bounds, probabilities, visualization
See demo_failure_rate.py for a complete example.
Features
- Multiple estimation methods with unified API
- Single proportion inference with truncated Beta priors
- Automatic validation of input data (Pydantic models)
- Model evidence (marginal likelihood) for method comparison
- Bayes factors with interpretation
- Effective Sample Size (ESS) for importance sampling diagnostics
- Variance decomposition (within-group vs between-group uncertainty)
- Numerically stable Beta distribution functions (log-space computation)
- Visualization tools for posterior distributions
Custom Priors
from proportions.inference import hierarchical_bayes
# Hierarchical Bayes with custom prior parameters
result = hierarchical_bayes(
data,
m_prior_alpha=2.0, # Beta prior for m: E[m] = 2/(2+2) = 0.5
m_prior_beta=2.0, # More informative than uniform
k_prior_min=0.1, # Allow low concentration (high heterogeneity)
k_prior_max=100.0, # Moderate maximum concentration
n_samples=10000, # More samples for better approximation
random_seed=42
)
# Check diagnostics
print(f"Posterior mean for m: {result.m_posterior_mean:.3f}")
print(f"Posterior mean for k: {result.k_posterior_mean:.3f}")
print(f"Effective Sample Size: {result.diagnostics.effective_sample_size:.1f}")
Documentation
- Repository: https://gitlab.com/movellan/proportions
- Examples: See
examples/directory in the repository - Theory: Based on Beta-Binomial hierarchical models with conjugate priors
License
MIT License
Citation
If you use this library in your research, please cite:
@software{proportions2025,
author = {Movellan, Javier},
title = {Proportions: Bayesian and Frequentist Inference for Grouped Binomial Data},
year = {2025},
url = {https://pypi.org/project/proportions/}
}
Developer Setup (Optional)
Only needed if you want to contribute to the library or run examples from the repository.
Using uv (Recommended - Fast and Modern)
First, install uv if you don't have it:
# macOS/Linux
curl -LsSf https://astral.sh/uv/install.sh | sh
# Windows
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
# Or via pip
pip install uv
Then, set up the project:
# Clone the repository
git clone git@gitlab.com:movellan/proportions.git
cd proportions
# Install with uv (automatically creates .venv and installs all dependencies)
uv sync
# Run tests to verify installation
uv run pytest tests/ -v
# Run commands with uv (no activation needed!)
uv run python examples/demo_script.py
Why uv?
- No need to activate:
uv runautomatically uses the project venv - Fast installation (10-100x faster than pip)
- Reproducible builds with uv.lock
Using pip (Alternative)
# Clone the repository
git clone git@gitlab.com:movellan/proportions.git
cd proportions
# Create virtual environment (optional but recommended)
python -m venv .venv
source .venv/bin/activate # On macOS/Linux
# or
.venv\Scripts\activate # On Windows
# Install package in editable mode with dev dependencies
pip install -e ".[dev]"
# Run tests to verify installation
pytest tests/ -v
Running Tests
# With uv
uv run pytest # All tests
uv run pytest -v --cov=proportions # With coverage
# With pip (after activation)
pytest # All tests
pytest -v --cov=proportions # With coverage
Code Quality
# With uv
uv run ruff format . # Format code
uv run ruff check . # Lint
uv run mypy proportions/ # Type check
# With pip (after activation)
ruff format . # Format code
ruff check . # Lint
mypy proportions/ # Type check
Contact
Author: Javier Movellan Email: jrmovellan@gmail.com Repository: https://gitlab.com/movellan/proportions
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file proportions-0.2.1.tar.gz.
File metadata
- Download URL: proportions-0.2.1.tar.gz
- Upload date:
- Size: 71.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.2
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
1de33738526e291faacf285a8a0306c6675b1276c835ca22a293ac4107f8caba
|
|
| MD5 |
94f10e861bae0a20bcd29c78949b11bf
|
|
| BLAKE2b-256 |
223e60df6d10f2ee9ccf9677a4f2ab77c8d9ab9fb53a4e9bb22cdbb9f1edb04d
|
File details
Details for the file proportions-0.2.1-py3-none-any.whl.
File metadata
- Download URL: proportions-0.2.1-py3-none-any.whl
- Upload date:
- Size: 57.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.2
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
9cc664ff8c1e50a768af350c49de60afa305d0b1a38dffafecc9bb5d568ea525
|
|
| MD5 |
7e0ba7af552951033880ba9b5a615050
|
|
| BLAKE2b-256 |
832c9714f9405ec5fc5412853f4ee242c36e5ec8b36e91157fa8c285e3525f13
|