Online False Discovery Rate (FDR) control algorithms for multiple hypothesis testing

Online FDR: Online False Discovery Rate Control Algorithms

🎯 Overview

online-fdr is a comprehensive Python library for controlling False Discovery Rate (FDR) and Family-Wise Error Rate (FWER) in online multiple hypothesis testing scenarios. Unlike traditional methods that require all p-values upfront, our library provides truly online algorithms that make decisions sequentially as data arrives.

Why Online FDR Control?

In modern data science and scientific research, hypotheses often arrive sequentially:

🔬 Clinical Trials: Interim analyses as patient data accumulates
📊 A/B Testing: Continuous experimentation in tech companies
🧬 Genomics: Sequential gene discovery studies
📈 Finance: Real-time anomaly detection in trading
🌐 Web Analytics: Ongoing feature testing and optimization

Traditional FDR control methods require batch processing of all hypotheses simultaneously. This library implements state-of-the-art online algorithms that:

✅ Make immediate decisions without waiting for future data
✅ Maintain rigorous statistical guarantees
✅ Adapt to the sequential nature of modern data collection
✅ Support both independent and dependent p-values

🚀 Quick Start

Installation

pip install online-fdr

Basic Usage

from online_fdr.investing.alpha.alpha import Gai
from online_fdr.utils.generation import DataGenerator, GaussianLocationModel

# Initialize a data generator for demonstration
dgp = GaussianLocationModel(alt_mean=3.0, alt_std=1.0, one_sided=True)
generator = DataGenerator(n=1000, pi0=0.9, dgp=dgp)  # 10% alternatives

# Create an online FDR procedure
alpha_investing = Gai(alpha=0.05)

# Test hypotheses sequentially
discoveries = []
for i in range(100):
    p_value, label = generator.sample_one()
    is_discovery = alpha_investing.test_one(p_value)

    if is_discovery:
        discoveries.append(i)
        print(f"Discovery at test {i}: p-value = {p_value:.4f}")
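
For readers who want to see what the simulated stream looks like, here is a minimal standard-library sketch of a two-group Gaussian location model with one-sided p-values. It is an assumption about what a generator of this kind does, not the library's implementation; `one_sided_p_value` and `sample_one` are hypothetical helper names.

```python
import math
import random


def one_sided_p_value(z: float) -> float:
    """Upper-tail p-value P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def sample_one(rng: random.Random, pi0: float = 0.9, alt_mean: float = 3.0):
    """Draw one (p_value, is_alternative) pair.

    With probability pi0 the hypothesis is a true null (mean 0),
    otherwise the test statistic is shifted to alt_mean.
    """
    is_alternative = rng.random() >= pi0
    mean = alt_mean if is_alternative else 0.0
    z = rng.gauss(mean, 1.0)
    return one_sided_p_value(z), is_alternative


rng = random.Random(0)
samples = [sample_one(rng) for _ in range(1000)]
null_p = [p for p, alt in samples if not alt]
alt_p = [p for p, alt in samples if alt]
```

Null p-values are roughly uniform on [0, 1], while p-values under the alternative concentrate near zero, which is what gives an online procedure something to discover.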

📚 Implemented Methods

Sequential Testing Methods

Methods that test one hypothesis at a time:

Alpha Investing Family

Generalized Alpha Investing (GAI): Generalization of the classic alpha investing framework by Foster & Stine
SAFFRON: Adaptive algorithm with improved power
ADDIS: Adaptive algorithm that discards conservative nulls
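
To make the wealth idea behind this family concrete, the sketch below implements a plain Foster & Stine style alpha-investing rule: a rejection earns a fixed payout, and a non-rejection costs `level / (1 - level)`. The class name, the default parameters, and the "stake a fixed fraction of current wealth" spending rule are illustrative assumptions, not the library's `Gai` class.

```python
class AlphaInvestingSketch:
    """Minimal alpha-investing rule (illustrative, not the library's code)."""

    def __init__(self, initial_wealth: float = 0.025,
                 payout: float = 0.05, fraction: float = 0.1):
        self.wealth = initial_wealth
        self.payout = payout      # wealth earned per rejection
        self.fraction = fraction  # share of wealth staked on each test

    def test_one(self, p_value: float) -> bool:
        stake = self.fraction * self.wealth
        # Choose the level so that a non-rejection costs exactly `stake`.
        level = stake / (1.0 + stake)
        if p_value <= level:
            self.wealth += self.payout
            return True
        self.wealth -= level / (1.0 - level)
        return False


proc = AlphaInvestingSketch()
first = proc.test_one(0.5)     # not rejected, wealth shrinks by 10%
second = proc.test_one(1e-6)   # rejected, wealth grows by the payout
```

Because each failed test removes only a fraction of the remaining wealth, the procedure never runs out of budget entirely, but its levels shrink until the next discovery replenishes them.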

LORD Family

LORD3: Wealth-based testing with rewards
LORD++: Improved variant with better power
D-LORD: Version for dependent p-values
LORD with Memory Decay: For non-stationary time series

LOND Family

LOND: Levels based On Number of Discoveries
Modified LOND: Improved variant with max(R_t, 1)
LOND for Dependent p-values: Handles arbitrary dependence
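
The difference between the original and modified rules is easiest to see in code. The sketch below assumes the original rule scales the base level by the number of discoveries plus one, and the modified rule scales it by max(R_t, 1). `LondSketch`, its `original` flag, and the gamma sequence are illustrative assumptions, not the library's `Lond` class.

```python
import math


def gamma(j: int) -> float:
    """Illustrative gamma sequence that sums to 1 over j = 1, 2, ..."""
    return 6.0 / (math.pi ** 2 * j ** 2)


class LondSketch:
    def __init__(self, alpha: float = 0.05, original: bool = True):
        self.alpha = alpha
        self.original = original
        self.num_tests = 0
        self.num_discoveries = 0

    def test_one(self, p_value: float) -> bool:
        self.num_tests += 1
        if self.original:
            multiplier = self.num_discoveries + 1
        else:
            multiplier = max(self.num_discoveries, 1)
        level = self.alpha * gamma(self.num_tests) * multiplier
        rejected = p_value <= level
        if rejected:
            self.num_discoveries += 1
        return rejected


original = LondSketch(original=True)
modified = LondSketch(original=False)
original_decisions = [original.test_one(p) for p in (0.02, 0.015)]
modified_decisions = [modified.test_one(p) for p in (0.02, 0.015)]
```

In this toy run both variants reject the first p-value, but only the original rule rejects the second, because its multiplier has already grown to 2.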

Alpha Spending

Bonferroni-like procedures: Classic FWER control adapted for sequential testing
Online Fallback: Guarantees FWER control in sequential settings
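
A Bonferroni-like alpha spending rule can be sketched in a few lines: the total budget alpha is split across tests by a sequence that sums to at most 1, so the levels spent never exceed alpha. The class name and the gamma sequence below are assumptions for illustration, not the library's implementation.

```python
import math


def gamma(j: int) -> float:
    """Illustrative gamma sequence that sums to 1 over j = 1, 2, ..."""
    return 6.0 / (math.pi ** 2 * j ** 2)


class AlphaSpendingSketch:
    def __init__(self, alpha: float = 0.05):
        self.alpha = alpha
        self.t = 0
        self.spent = 0.0  # running total of the levels used so far

    def test_one(self, p_value: float) -> bool:
        self.t += 1
        level = self.alpha * gamma(self.t)
        self.spent += level
        return p_value <= level


spender = AlphaSpendingSketch(alpha=0.05)
decisions = [spender.test_one(p) for p in (0.03, 0.03)]
```

The same p-value is rejected at the first step and not at the second, because the level drops from `alpha * gamma(1)` to `alpha * gamma(2)`; unlike the investing methods, discoveries earn nothing back.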

Batch Testing Methods

Methods that test hypotheses in batches:

BatchBH: Online version of Benjamini-Hochberg
BatchStBH: Storey's improvement to BH for batches
BatchPRDS: For positive regression dependency
BatchBY: Benjamini-Yekutieli for arbitrary dependence
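
For reference, the classical Benjamini-Hochberg step-up rule applied to a single batch looks like the sketch below. The online batch procedures adjust the level used for each batch over time; that part is not shown here, and `benjamini_hochberg` is a hypothetical helper rather than the library's `BatchBH`.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Return one rejection decision per p-value, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k with p_(k) <= alpha * k / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            k_max = rank

    # Reject the k_max smallest p-values, even those above their own threshold.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


batch = [0.03, 0.01, 0.035, 0.5]
batch_decisions = benjamini_hochberg(batch, alpha=0.05)
```

In this batch, 0.03 exceeds its own threshold (0.025) but is still rejected because 0.035 passes at rank 3, which is the step-up behavior.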

💡 Key Features

1. True Online API

Unlike other implementations that require pre-collected data, our library offers genuinely sequential testing:

from online_fdr.investing.addis.addis import Addis

# Real-world scenario: testing as data arrives
procedure = Addis(alpha=0.05, wealth=0.025, tau=0.5)

# In production, p-values arrive one by one.
# data_stream, compute_p_value, and trigger_alert are placeholders for your own code.
while data_stream.is_active():
    p_value = compute_p_value(data_stream.get_next())
    decision = procedure.test_one(p_value)

    if decision:
        trigger_alert()

2. Unified Interface

All procedures follow the same simple interface:

# Sequential testing
result = procedure.test_one(p_value)

# Batch testing
results = batch_procedure.test_batch(p_values_list)

3. Flexible Configuration

Each method supports various configurations for different scenarios:

# For independent p-values
lond_indep = Lond(alpha=0.05)

# For dependent p-values  
lond_dep = Lond(alpha=0.05, dependent=True)

# With decay for time series
lord_decay = LORDMemoryDecay(alpha=0.05, wealth=0.025, delta=0.95)

4. Rich Utilities

Built-in tools for evaluation and testing:

from online_fdr.utils.evaluation import evaluate_procedures
from online_fdr.utils.visualization import plot_wealth_trajectory

# Compare different procedures
results = evaluate_procedures(
    procedures=[lord3, lond, saffron],
    data_generator=generator,
    n_runs=100
)

📊 Performance Comparison

The library includes comprehensive benchmarking tools:

from online_fdr.benchmarks import compare_methods

# Compare methods on your data
comparison = compare_methods(
    p_values=your_p_values,
    methods=['lord3', 'saffron', 'addis'],
    alpha=0.05
)
comparison.plot_power_curves()

🔬 Mathematical Foundations

Each implemented method provides rigorous theoretical guarantees:

FDR Control: $\text{FDR} = \mathbb{E}[\text{FDP}] \leq \alpha$ for all methods, where FDP is the proportion of false discoveries among all discoveries
FWER Control: $\mathbb{P}(\text{Any false rejection}) \leq \alpha$ for alpha spending methods
Power: Optimized algorithms that maximize discovery rate while maintaining control
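
When ground-truth labels are available, as in simulations, these quantities can be estimated directly from a run. The sketch below computes the false discovery proportion and the empirical power from a list of decisions; the function names are hypothetical and independent of the library's evaluation utilities.

```python
from typing import Sequence


def false_discovery_proportion(decisions: Sequence[bool],
                               is_alternative: Sequence[bool]) -> float:
    """Share of rejections that are true nulls (0 when nothing is rejected)."""
    rejections = sum(1 for d in decisions if d)
    false_rejections = sum(
        1 for d, alt in zip(decisions, is_alternative) if d and not alt
    )
    return false_rejections / max(rejections, 1)


def empirical_power(decisions: Sequence[bool],
                    is_alternative: Sequence[bool]) -> float:
    """Share of true alternatives that were rejected."""
    n_alternatives = sum(1 for alt in is_alternative if alt)
    if n_alternatives == 0:
        return 0.0
    true_rejections = sum(
        1 for d, alt in zip(decisions, is_alternative) if d and alt
    )
    return true_rejections / n_alternatives


decisions = [True, True, False, True]
labels = [True, False, False, True]   # True marks a real effect
fdp = false_discovery_proportion(decisions, labels)
power = empirical_power(decisions, labels)
```

Averaging the false discovery proportion over many simulated runs gives an estimate of the FDR, which is the quantity the guarantees refer to.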

🛠️ Advanced Usage

Custom Gamma Sequences

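import numpy as np

from online_fdr.utils.sequence import AbstractGammaSequence

class MyGammaSequence(AbstractGammaSequence):
    def calc_gamma(self, j: int) -> float:
        # The squared log keeps the series summable; c / (j * log(j + 1)) alone diverges.
        # self.c is assumed to be set by the base-class constructor.
        return self.c / (j * np.log(j + 1) ** 2)

# Use with any compatible method
lord_custom = LordThree(alpha=0.05, wealth=0.025, gamma_sequence=MyGammaSequence(c=0.07))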
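
Before plugging in a custom sequence, it can help to check numerically that its terms are non-negative and that its partial sums stay at or below 1, since the test levels are built by spreading a fixed budget along the sequence. The helper below is a hypothetical standard-library sketch and does not use the library's `AbstractGammaSequence`.

```python
import math
from typing import Callable


def partial_sum(calc_gamma: Callable[[int], float], n_terms: int) -> float:
    """Sum of calc_gamma(j) for j = 1..n_terms, rejecting negative terms."""
    terms = [calc_gamma(j) for j in range(1, n_terms + 1)]
    if any(term < 0 for term in terms):
        raise ValueError("gamma sequence must be non-negative")
    return math.fsum(terms)


def p_series_gamma(j: int) -> float:
    """Example sequence 6 / (pi^2 j^2), which sums to 1 over all j."""
    return 6.0 / (math.pi ** 2 * j ** 2)


total = partial_sum(p_series_gamma, 100_000)
```

A finite partial sum cannot prove convergence, but a sum that is already above 1, or that keeps growing noticeably as `n_terms` increases, is a sign the sequence needs rescaling or a faster decay.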

Handling Dependent P-values

# For arbitrary dependence
lond_dep = Lond(alpha=0.05, dependent=True)

# For positive dependence (PRDS)
batch_prds = BatchPRDS(alpha=0.05)

📖 Documentation

For detailed documentation, tutorials, and API reference, visit our documentation site.

🤝 Contributing

We welcome contributions! Please see our Contributing Guide for details.

# Clone the repository
git clone https://github.com/yourusername/online-fdr.git
cd online-fdr

# Install in development mode
pip install -e ".[dev]"

# Run tests
python -m pytest

# Format code
black online_fdr tests

📝 Citation

If you use this library in your research, please cite:

@software{online_fdr,
  title = {online-fdr: Online False Discovery Rate Control Algorithms},
  author = {Your Name},
  year = {2024},
  url = {https://github.com/yourusername/online-fdr}
}

🙏 Acknowledgements

This library is inspired by and validated against the R package onlineFDR. We extend our gratitude to the authors of the original papers and the onlineFDR package maintainers.

📄 License

This project is licensed under the BSD 3-Clause License - see the LICENSE file for details.

Original Content

The vast majority of implementations of online methods for FDR control are either part of an experimental setup that does not straightforwardly generalize to applications outside that setup, or are geared towards tests for which all results are already available (i.e. they do not have an actual online API).

For that reason, this repository implements a wide range of methods for FDR/FWER control for actual online multiple hypothesis testing with an intuitive test_one() method:

Alpha Spending (Bonferroni, ...)
Online Fallback
[Generalized] Alpha Investing (Foster/Stine, ...)
LOND (Original, Modified, Dependent)
LORD (LORD3, LORD++, D-LORD, Dependent, DecayLORD)
SAFFRON (Standard, DecaySAFFRON)
ADDIS (Standard, DecayADDIS)
Batch-BH and Batch-StBH

Instantiate an online testing procedure (e.g. Addis()) and simply test p-values sequentially with .test_one():

from online_fdr.investing.addis.addis import Addis
from online_fdr.utils.generation import DataGenerator, GaussianLocationModel

N = 100
dgp = GaussianLocationModel(alt_mean=3.0)
generator = DataGenerator(n=N, pi0=0.9, dgp=dgp)  # 10% alternatives

addis = Addis(alpha=0.05, wealth=0.025, lambda_=0.25, tau=0.5)  # procedure

for i in range(0, N):
    p_value, label = generator.sample_one()
    result = addis.test_one(p_value)  # sequential testing

5. Advanced Data Generation

from online_fdr.utils.generation import (
    DataGenerator, DependentGaussianModel,
    create_genomics_generator, create_screening_generator
)

# Genomics-style data (many nulls, beta-distributed alternatives)
gen_genomics = create_genomics_generator(n=10000, pi0=0.95)

# Screening study with sparse signals
gen_screening = create_screening_generator(n=1000, pi0=0.9,
                                           min_effect=2.0, max_effect=5.0)

# Dependent p-values with block correlation
dgp_dep = DependentGaussianModel(alt_mean=3.0, correlation=0.5,
                                 structure="block", block_size=20)
gen_dependent = DataGenerator(n=500, pi0=0.8, dgp=dgp_dep)

# Batch generation for batch methods
p_vals_batch, labels_batch = gen_genomics.sample_batch(size=100)

This work is inspired by the R package 'onlineFDR', and most of its methods are validated against that package's implementations. The key difference is the design of the method calls for sequential testing: this implementation supports truly temporal applications, whereas 'onlineFDR' requires a static data.frame for testing.

Getting started

The library requires numpy and scipy for the advanced data generation features. Python 3.8 or later is recommended; testing is performed on Python 3.12.

Release history

0.0.3 (Jul 3, 2025)
0.0.2 (Jul 3, 2025)
0.0.1 (Jul 2, 2025, this version)
