Online False Discovery Rate (FDR) control algorithms for multiple hypothesis testing
Online FDR: Online False Discovery Rate Control Algorithms
🎯 Overview
online-fdr is a comprehensive Python library for controlling False Discovery Rate (FDR) and Family-Wise Error Rate (FWER) in online multiple hypothesis testing scenarios. Unlike traditional methods that require all p-values upfront, our library provides truly online algorithms that make decisions sequentially as data arrives.
Why Online FDR Control?
In modern data science and scientific research, hypotheses often arrive sequentially:
- 🔬 Clinical Trials: Interim analyses as patient data accumulates
- 📊 A/B Testing: Continuous experimentation in tech companies
- 🧬 Genomics: Sequential gene discovery studies
- 📈 Finance: Real-time anomaly detection in trading
- 🌐 Web Analytics: Ongoing feature testing and optimization
Traditional FDR control methods require batch processing of all hypotheses simultaneously. This library implements state-of-the-art online algorithms that:
- ✅ Make immediate decisions without waiting for future data
- ✅ Maintain rigorous statistical guarantees
- ✅ Adapt to the sequential nature of modern data collection
- ✅ Support both independent and dependent p-values
🚀 Quick Start
Installation
pip install online-fdr
Basic Usage
from online_fdr.investing.alpha.alpha import Gai
from online_fdr.utils.generation import DataGenerator, GaussianLocationModel
# Initialize a data generator for demonstration
dgp = GaussianLocationModel(alt_mean=3.0, alt_std=1.0, one_sided=True)
generator = DataGenerator(n=1000, pi0=0.9, dgp=dgp) # 10% alternatives
# Create an online FDR procedure
alpha_investing = Gai(alpha=0.05)
# Test hypotheses sequentially
discoveries = []
for i in range(100):
    p_value, label = generator.sample_one()
    is_discovery = alpha_investing.test_one(p_value)
    if is_discovery:
        discoveries.append(i)
        print(f"Discovery at test {i}: p-value = {p_value:.4f}")
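To make the demo data less of a black box, here is a minimal standard-library sketch of what a two-group Gaussian location model could look like. It is an assumption about the general technique, not the library's `GaussianLocationModel` or `DataGenerator`; the helper names `one_sided_p_value` and `sample_one` are hypothetical.

```python
import math
import random


def one_sided_p_value(z: float) -> float:
    """Upper-tail p-value P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def sample_one(rng: random.Random, pi0: float = 0.9, alt_mean: float = 3.0):
    """Draw one (p_value, is_alternative) pair from a two-group model."""
    # With probability pi0 the hypothesis is a true null (mean 0),
    # otherwise it is a true alternative (mean alt_mean).
    is_alternative = rng.random() >= pi0
    mean = alt_mean if is_alternative else 0.0
    z = rng.gauss(mean, 1.0)  # observed test statistic
    return one_sided_p_value(z), is_alternative


rng = random.Random(0)  # fixed seed so the stream is reproducible
stream = [sample_one(rng) for _ in range(1000)]
```

Under the null the p-values are uniform on [0, 1]; under the alternative they concentrate near zero, which is what gives an online procedure something to discover.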
📚 Implemented Methods
Sequential Testing Methods
Methods that test one hypothesis at a time:
Alpha Investing Family
- Generalized Alpha Investing (GAI): Generalization of the classic alpha investing framework of Foster & Stine
- SAFFRON: Adaptive algorithm with improved power
- ADDIS: Adaptive algorithm that discards conservative nulls
LORD Family
- LORD3: Wealth-based testing with rewards
- LORD++: Improved variant with better power
- D-LORD: Version for dependent p-values
- LORD with Memory Decay: For non-stationary time series
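As an illustration of the wealth-and-reward idea, the following is a minimal sketch of the LORD++ test levels as they are commonly stated in the literature: the level at time t is `gamma(t) * W0`, plus `(alpha - W0) * gamma(t - tau_1)` for the first rejection time, plus `alpha * gamma(t - tau_j)` for every later rejection time. The class name `LordPlusPlusSketch` and the 1/j² gamma sequence are assumptions made for this example and do not reflect the library's implementation or defaults.

```python
import math


def gamma(j: int) -> float:
    """Illustrative sequence whose sum over j >= 1 equals 1."""
    return 6.0 / (math.pi ** 2 * j ** 2) if j >= 1 else 0.0


class LordPlusPlusSketch:
    """Minimal LORD++ sketch (hypothetical, not the library's class)."""

    def __init__(self, alpha: float = 0.05, wealth: float = 0.025):
        if not 0.0 < wealth <= alpha:
            raise ValueError("initial wealth must lie in (0, alpha]")
        self.alpha = alpha
        self.wealth = wealth        # initial wealth W0
        self.t = 0                  # number of hypotheses tested so far
        self.rejection_times = []   # times of past rejections, increasing

    def next_level(self) -> float:
        """Test level for the upcoming hypothesis (time t + 1)."""
        t = self.t + 1
        level = gamma(t) * self.wealth
        for k, tau in enumerate(self.rejection_times):
            # The first rejection earns alpha - W0, every later one earns alpha.
            reward = self.alpha - self.wealth if k == 0 else self.alpha
            level += reward * gamma(t - tau)
        return level

    def test_one(self, p_value: float) -> bool:
        level = self.next_level()
        self.t += 1
        rejected = p_value <= level
        if rejected:
            self.rejection_times.append(self.t)
        return rejected
```

Each rejection restarts a copy of the gamma sequence, so test levels rise right after a discovery and decay again during long runs of nulls.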
LOND Family
- LOND: Levels based on Number of Discoveries
- Modified LOND: Improved variant with max(R_t, 1)
- LOND for Dependent p-values: Handles arbitrary dependence
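The LOND rule is compact enough to sketch in a few lines. This example assumes the original rule scales the base level `alpha * gamma(t)` by the number of discoveries so far plus one, and that the modified variant scales it by `max(R_t, 1)` as named in the list above. `LondSketch` and the 1/j² gamma sequence are hypothetical choices for illustration, not the library's `Lond` class.

```python
import math


def gamma(j: int) -> float:
    """Illustrative sequence whose sum over j >= 1 equals 1."""
    return 6.0 / (math.pi ** 2 * j ** 2)


class LondSketch:
    """Minimal LOND sketch (hypothetical, not the library's class)."""

    def __init__(self, alpha: float = 0.05, modified: bool = False):
        self.alpha = alpha
        self.modified = modified
        self.t = 0                # number of hypotheses tested so far
        self.num_rejections = 0   # discoveries made so far

    def next_level(self) -> float:
        """Test level for the upcoming hypothesis (time t + 1)."""
        if self.modified:
            multiplier = max(self.num_rejections, 1)
        else:
            multiplier = self.num_rejections + 1
        return self.alpha * gamma(self.t + 1) * multiplier

    def test_one(self, p_value: float) -> bool:
        level = self.next_level()
        self.t += 1
        rejected = p_value <= level
        if rejected:
            self.num_rejections += 1
        return rejected


lond = LondSketch(alpha=0.05)
decisions = [lond.test_one(p) for p in [0.001, 0.5, 0.004, 0.2]]
```

Here the third p-value (0.004) is rejected only because the earlier discovery doubled the level at that step.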
Alpha Spending
- Bonferroni-like procedures: Classic FWER control adapted for sequential testing
- Online Fallback: Guarantees FWER control in sequential settings
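The simplest alpha spending rule can be sketched as follows: each test gets a fixed slice `alpha * gamma(t)` of the error budget, and because the slices sum to at most `alpha`, a union bound gives FWER control. `AlphaSpendingSketch` and the 1/j² gamma sequence are assumptions for this example, not the library's implementation.

```python
import math


def gamma(j: int) -> float:
    """Illustrative sequence whose sum over j >= 1 equals 1."""
    return 6.0 / (math.pi ** 2 * j ** 2)


class AlphaSpendingSketch:
    """Bonferroni-style online alpha spending (hypothetical sketch)."""

    def __init__(self, alpha: float = 0.05):
        self.alpha = alpha
        self.t = 0  # number of hypotheses tested so far

    def test_one(self, p_value: float) -> bool:
        self.t += 1
        # The level depends only on the position in the stream, never on past results.
        return p_value <= self.alpha * gamma(self.t)
```

Unlike the investing-style procedures, nothing is earned back after a discovery, which is why levels shrink monotonically over the stream.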
Batch Testing Methods
Methods that test hypotheses in batches:
- BatchBH: Online version of Benjamini-Hochberg
- BatchStBH: Storey's improvement to BH for batches
- BatchPRDS: For positive regression dependency
- BatchBY: Benjamini-Yekutieli for arbitrary dependence
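The batch methods build on the classical step-up procedure applied within a batch. The sketch below shows only that offline building block (Benjamini-Hochberg, with the Benjamini-Yekutieli harmonic correction as an option); it does not reproduce how the library allocates test levels across successive batches. The function name `benjamini_hochberg` and its `arbitrary_dependence` flag are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float], alpha: float = 0.05,
                       arbitrary_dependence: bool = False) -> List[bool]:
    """Step-up BH on one batch; returns a rejection flag per input p-value."""
    m = len(p_values)
    if m == 0:
        return []
    if arbitrary_dependence:
        # Benjamini-Yekutieli: shrink alpha by the harmonic number H_m.
        alpha = alpha / sum(1.0 / k for k in range(1, m + 1))
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up: keep the largest rank whose p-value is below its threshold.
        if p_values[idx] <= alpha * rank / m:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected
```

Because the cutoff is the largest qualifying rank, a p-value that misses its own threshold can still be rejected when a larger p-value qualifies, which is why the whole batch must be available before any decision is made.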
💡 Key Features
1. True Online API
Unlike other implementations that require pre-collected data, our library offers genuinely sequential testing:
from online_fdr.investing.addis.addis import Addis
# Real-world scenario: testing as data arrives
procedure = Addis(alpha=0.05, wealth=0.025, tau=0.5)
# In production, p-values arrive one by one
# (data_stream, compute_p_value, and trigger_alert are application-specific placeholders)
while data_stream.is_active():
    p_value = compute_p_value(data_stream.get_next())
    decision = procedure.test_one(p_value)
    if decision:
        trigger_alert()
2. Unified Interface
All procedures follow the same simple interface:
# Sequential testing
result = procedure.test_one(p_value)
# Batch testing
results = batch_procedure.test_batch(p_values_list)
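One way to picture the shared interface is as a structural type: anything with a `test_one(p_value)` method returning a boolean can be driven by the same loop. The sketch below uses `typing.Protocol` for this; `SequentialProcedure`, `FixedLevel`, and `run_stream` are hypothetical names for illustration, and the library's actual base classes may look different.

```python
from typing import Iterable, List, Protocol


class SequentialProcedure(Protocol):
    """Anything that can test one p-value at a time."""

    def test_one(self, p_value: float) -> bool: ...


class FixedLevel:
    """Toy stand-in: rejects below a fixed level, with no online FDR guarantee."""

    def __init__(self, level: float = 0.05):
        self.level = level

    def test_one(self, p_value: float) -> bool:
        return p_value <= self.level


def run_stream(procedure: SequentialProcedure, p_values: Iterable[float]) -> List[int]:
    """Feed p-values one at a time and return the indices of the discoveries."""
    return [i for i, p in enumerate(p_values) if procedure.test_one(p)]


hits = run_stream(FixedLevel(0.05), [0.01, 0.2, 0.03])
```

Since `run_stream` only iterates, the p-values can come from a list, a generator, or any other lazy source.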
3. Flexible Configuration
Each method supports various configurations for different scenarios:
# For independent p-values
lond_indep = Lond(alpha=0.05)
# For dependent p-values
lond_dep = Lond(alpha=0.05, dependent=True)
# With decay for time series
lord_decay = LORDMemoryDecay(alpha=0.05, wealth=0.025, delta=0.95)
4. Rich Utilities
Built-in tools for evaluation and testing:
from online_fdr.utils.evaluation import evaluate_procedures
from online_fdr.utils.visualization import plot_wealth_trajectory
# Compare different procedures
results = evaluate_procedures(
    procedures=[lord3, lond, saffron],
    data_generator=generator,
    n_runs=100
)
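The two metrics usually reported in such comparisons can be computed by hand as follows. This is a sketch, assuming each label is `True` for a true alternative and `False` for a true null; how `evaluate_procedures` computes its results internally is not shown here, and both function names are hypothetical.

```python
from typing import Sequence


def false_discovery_proportion(decisions: Sequence[bool],
                               is_alternative: Sequence[bool]) -> float:
    """Share of rejections that were true nulls (0 if nothing was rejected)."""
    rejections = sum(decisions)
    false_rejections = sum(d and not a for d, a in zip(decisions, is_alternative))
    return false_rejections / max(rejections, 1)


def power(decisions: Sequence[bool], is_alternative: Sequence[bool]) -> float:
    """Share of true alternatives that were rejected (0 if there are none)."""
    n_alternatives = sum(is_alternative)
    true_rejections = sum(d and a for d, a in zip(decisions, is_alternative))
    return true_rejections / n_alternatives if n_alternatives else 0.0


decisions = [True, True, False, True, False]
labels = [True, False, False, True, True]
fdp = false_discovery_proportion(decisions, labels)
pwr = power(decisions, labels)
```

Averaging the false discovery proportion over many simulated runs gives an empirical estimate of the FDR, which is the quantity the procedures bound.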
📊 Performance Comparison
The library includes comprehensive benchmarking tools:
from online_fdr.benchmarks import compare_methods
# Compare methods on your data
comparison = compare_methods(
    p_values=your_p_values,
    methods=['lord3', 'saffron', 'addis'],
    alpha=0.05
)
comparison.plot_power_curves()
🔬 Mathematical Foundations
Each implemented method provides rigorous theoretical guarantees:
- FDR Control: $\text{FDR} = \mathbb{E}[\text{FDP}] \leq \alpha$ for all methods, under each method's assumptions on the p-values
- FWER Control: $\mathbb{P}(\text{Any false rejection}) \leq \alpha$ for alpha spending methods
- Power: Optimized algorithms that maximize discovery rate while maintaining control
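For reference, these quantities are conventionally defined as follows, writing $R_t$ for the number of rejections up to time $t$ and $V_t$ for the number of false rejections among them (this notation is assumed here, not taken from the library):

$$
\text{FDP}(t) = \frac{V_t}{\max(R_t, 1)}, \qquad
\text{FDR}(t) = \mathbb{E}\left[\text{FDP}(t)\right], \qquad
\text{FWER}(t) = \mathbb{P}(V_t \geq 1).
$$

Since $\text{FDP}(t) \leq \mathbf{1}\{V_t \geq 1\}$, taking expectations gives $\text{FDR}(t) \leq \text{FWER}(t)$, so any procedure that controls FWER at level $\alpha$ also controls FDR at level $\alpha$.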
🛠️ Advanced Usage
Custom Gamma Sequences
import numpy as np
from online_fdr.utils.sequence import AbstractGammaSequence
class MyGammaSequence(AbstractGammaSequence):
    def calc_gamma(self, j: int) -> float:
        # Squared log keeps the series summable; c / (j * log(j + 1)) alone diverges
        return self.c / (j * np.log(j + 1) ** 2)
# Use with any compatible method
lord_custom = LordThree(alpha=0.05, wealth=0.025, gamma_sequence=MyGammaSequence(c=0.07))
Handling Dependent P-values
# For arbitrary dependence
lond_dep = Lond(alpha=0.05, dependent=True)
# For positive dependence (PRDS)
batch_prds = BatchPRDS(alpha=0.05)
📖 Documentation
For detailed documentation, tutorials, and API reference, visit our documentation site.
🤝 Contributing
We welcome contributions! Please see our Contributing Guide for details.
# Clone the repository
git clone https://github.com/yourusername/online-fdr.git
cd online-fdr
# Install in development mode
pip install -e ".[dev]"
# Run tests
python -m pytest
# Format code
black online_fdr tests
📝 Citation
If you use this library in your research, please cite:
@software{online_fdr,
title = {online-fdr: Online False Discovery Rate Control Algorithms},
author = {Your Name},
year = {2024},
url = {https://github.com/yourusername/online-fdr}
}
🙏 Acknowledgements
This library is inspired by and validated against the R package onlineFDR. We extend our gratitude to the authors of the original papers and the onlineFDR package maintainers.
📄 License
This project is licensed under the BSD 3-Clause License - see the LICENSE file for details.
Original Content
Most implementations of online methods for FDR control are either part of an experimental setup that does not readily generalize to applications outside that setup, or are geared towards tests for which all results are already available (i.e. they do not have an actual online API).
For that reason, this repository implements a wide range of methods for FDR/FWER control for actual online multiple
hypothesis testing with an intuitive test_one() method:
- Alpha Spending (Bonferroni, ...)
- Online Fallback
- [Generalized] Alpha Investing (Foster/Stine, ...)
- LOND (Original, Modified, Dependent)
- LORD (LORD3, LORD++, D-LORD, Dependent, DecayLORD)
- SAFFRON (Standard, DecaySAFFRON)
- ADDIS (Standard, DecayADDIS)
- Batch-BH and Batch-StBH
Instantiate an online testing procedure (e.g. Addis()) and simply test p-values sequentially with .test_one():
from online_fdr.investing.addis.addis import Addis
from online_fdr.utils.generation import DataGenerator, GaussianLocationModel
N = 100
dgp = GaussianLocationModel(alt_mean=3.0)
generator = DataGenerator(n=N, pi0=0.9, dgp=dgp) # 10% alternatives
addis = Addis(alpha=0.05, wealth=0.025, lambda_=0.25, tau=0.5) # procedure
for i in range(N):
    p_value, label = generator.sample_one()
    result = addis.test_one(p_value) # sequential testing
5. Advanced Data Generation
from online_fdr.utils.generation import (
    BetaMixtureModel, DependentGaussianModel, SparseGaussianModel,
    create_genomics_generator, create_screening_generator
)
# Genomics-style data (many nulls, beta-distributed alternatives)
gen_genomics = create_genomics_generator(n=10000, pi0=0.95)
# Screening study with sparse signals
gen_screening = create_screening_generator(n=1000, pi0=0.9,
                                           min_effect=2.0, max_effect=5.0)
# Dependent p-values with block correlation
dgp_dep = DependentGaussianModel(alt_mean=3.0, correlation=0.5,
                                 structure="block", block_size=20)
gen_dependent = ImprovedDataGenerator(n=500, pi0=0.8, dgp=dgp_dep)
# Batch generation for batch methods
p_vals_batch, labels_batch = gen_genomics.sample_batch(size=100)
This work is inspired by the R package 'onlineFDR', and most of its methods are validated against that package's implementations. The key difference is the design of the method calls for sequential testing: this implementation allows for truly temporal applications, whereas 'onlineFDR' requires a [static] data.frame for testing.
Getting started
The advanced data generation features require numpy and scipy. Python 3.8+ is recommended; the library is tested on Python 3.12.
File details
Details for the file online_fdr-0.0.1.tar.gz.
File metadata
- Download URL: online_fdr-0.0.1.tar.gz
- Upload date:
- Size: 28.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b29d940b58c186be5b86a479a8cd132fe5ce03bc3298f6ed16d99fc660eb8205 |
| MD5 | b866f95c398c6da5026caa32a0187a3a |
| BLAKE2b-256 | 816c4ee2f1781a5c9498d1324733d7951e6dacb1ed867c6745aa51c02b0761a1 |
File details
Details for the file online_fdr-0.0.1-py3-none-any.whl.
File metadata
- Download URL: online_fdr-0.0.1-py3-none-any.whl
- Upload date:
- Size: 37.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 511b1920567e49badd6b6caadbb6f797eb518944556ac6a6209b8745fbb7474f |
| MD5 | 535883184e65fdad62a869106110f3be |
| BLAKE2b-256 | b97ee31939997362e585aa3a5b7957744cf1ffd1844b94280d5fa99e5574e9aa |