
A fast and covariate-adaptive method for multiple hypothesis testing


AdaFDR

A fast and covariate-adaptive method for multiple hypothesis testing.

Software accompanying the paper "AdaFDR: a Fast, Powerful and Covariate-Adaptive Approach to Multiple Hypothesis Testing", 2018.

Requirement

  • AdaFDR runs on Python 3.

Installation

pip install adafdr

Usage

Import package

adafdr.method contains all methods while adafdr.data_loader contains the data. They can be imported as

import adafdr.method as md
import adafdr.data_loader as dl

Other import styles also work. For example, one can import the package with import adafdr and call a method xxx in the method module via adafdr.method.xxx().

Input format

For a set of N hypotheses, the input data consists of the p-values p and the d-dimensional covariates x, in the following format:

  • p: (N,) numpy.ndarray.
  • x: (N,d) numpy.ndarray.

When d=1, x is allowed to be either (N,) numpy.ndarray or (N,1) numpy.ndarray.
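The shape rules above can be checked before calling the package. The following is a minimal sketch using only NumPy; check_inputs is a hypothetical helper written for illustration and is not part of adafdr, and the range check on p is an added assumption.

```python
import numpy as np

def check_inputs(p, x):
    """Hypothetical helper (not part of adafdr): coerce p and x to the documented shapes."""
    p = np.asarray(p, dtype=float)
    x = np.asarray(x, dtype=float)
    if p.ndim != 1:
        raise ValueError("p must have shape (N,)")
    # When d=1, a (N,) covariate vector is also accepted; store it as (N, 1).
    if x.ndim == 1:
        x = x.reshape(-1, 1)
    if x.ndim != 2 or x.shape[0] != p.shape[0]:
        raise ValueError("x must have shape (N,) or (N, d) with the same N as p")
    # Assumption: p-values are expected to lie in [0, 1].
    if np.any((p < 0) | (p > 1)):
        raise ValueError("p-values must lie in [0, 1]")
    return p, x

# Synthetic inputs for illustration only.
rng = np.random.default_rng(0)
p = rng.uniform(size=100)
x = rng.uniform(size=100)
p, x = check_inputs(p, x)
print(p.shape, x.shape)  # (100,) (100, 1)
```

The returned arrays can then be passed to adafdr_explore or adafdr_test as described below.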

Covariate visualization

The covariate visualization method adafdr_explore can be used as

adafdr.method.adafdr_explore(p, x, output_folder=None)

If the output_folder is not None, the covariate visualization figures will be saved into output_folder. Otherwise, they will show up on the console.

Multiple testing

The multiple hypothesis testing method adafdr_test can be used as

  • fast version (default): res = adafdr.method.adafdr_test(p, x, alpha=0.1)
  • regular version: res = adafdr.method.adafdr_test(p, x, alpha=0.1, fast_mode=False)
  • regular version with multi-core: res = adafdr.method.adafdr_test(p, x, alpha=0.1, fast_mode=False, single_core=False)

res is a dictionary containing the results, including:

  • res['h_hat']: a (N,) boolean vector with the testing result for each hypothesis, with value 1 (True) meaning rejection.
  • res['n_rej']: the number of rejections (on each fold).
  • res['t_rej']: a (N,) float vector with decision threshold for each hypothesis.
  • res['theta']: a list of learned parameters.

If the output_folder argument of adafdr_test is a folder path, log files will be saved in that folder.
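As an illustration of how the result dictionary might be inspected, here is a minimal sketch. summarize_result is a hypothetical helper, and fake_res is a stand-in dictionary with made-up values that only mimics the documented keys res['h_hat'] and res['t_rej']; it is not output from adafdr_test.

```python
import numpy as np

def summarize_result(res):
    """Hypothetical helper: summarize the documented 'h_hat' and 't_rej' entries."""
    h_hat = np.asarray(res['h_hat'], dtype=bool)
    t_rej = np.asarray(res['t_rej'], dtype=float)
    return {
        'n_hypotheses': int(h_hat.shape[0]),
        'n_discoveries': int(h_hat.sum()),   # number of rejected hypotheses
        'min_threshold': float(t_rej.min()),
        'max_threshold': float(t_rej.max()),
    }

# Stand-in dictionary with made-up values, for illustration only.
fake_res = {
    'h_hat': np.array([True, False, True, False]),
    't_rej': np.array([0.02, 0.01, 0.03, 0.01]),
}
print(summarize_result(fake_res))
```

With a real run, the dictionary returned by adafdr_test would be passed in place of fake_res.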

Example on airway RNA-seq data

The following is an example on the airway RNA-seq data used in the paper.

Import package and load data

adafdr.method contains the algorithm implementation while adafdr.data_loader can be used to load the data used in the paper. Here we load the airway data. See the vignette for other data included with the package.

import adafdr.method as md
import adafdr.data_loader as dl
p,x = dl.data_airway()

Covariate visualization using adafdr_explore

md.adafdr_explore(p, x, output_folder=None)

[Figure: p_scatter (left), ratio (right)]

Here, the left panel is a scatter plot of the hypotheses, with p-values (y-axis) plotted against the covariate (x-axis). The right panel shows the estimated null hypothesis distribution (blue) and the estimated alternative hypothesis distribution (orange) with respect to the covariate. From these plots we can conclude that a hypothesis is more likely to be significant if the covariate (gene expression) value is larger.

Multiple hypothesis testing using adafdr_test

res = md.adafdr_test(p, x, fast_mode=True, output_folder=None)

Here, the learned threshold res['t_rej'] looks as follows. Note that the two lines correspond to the data from two folds via hypothesis splitting.

[Figure: p_scatter]

Quick Test

Here is a quick test. First check that the package can be successfully imported:

import adafdr.method as md
import adafdr.data_loader as dl

Next, run a small example which should take a few seconds:

import numpy as np
p,x,h,_,_ = dl.load_1d_bump_slope()
res = md.adafdr_test(p, x, alpha=0.1)
t_rej = res['t_rej']
D = np.sum(p<=t_rej)
FD = np.sum((p<=t_rej)&(~h))
print('# AdaFDR successfully finished!')
print('# D=%d, FD=%d, FDP=%0.3f'%(D, FD, FD/D))

It runs AdaFDR-fast on 1D simulated data. If the package is working correctly, the result should look like:

# AdaFDR successfully finished! 
# D=840, FD=80, FDP=0.095
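The FDP computation in the quick test divides by D, which fails when there are no discoveries. A minimal sketch of a guarded version follows; false_discovery_proportion is a hypothetical helper, it assumes that h marks true alternatives (as the use of ~h in the quick test suggests), and it adopts the common convention that the FDP is 0 when nothing is rejected. The toy arrays are made up for illustration.

```python
import numpy as np

def false_discovery_proportion(p, t_rej, h):
    """Hypothetical helper: count discoveries and false discoveries, guarding D = 0."""
    p = np.asarray(p, dtype=float)
    t_rej = np.asarray(t_rej, dtype=float)
    # Assumption: h is truthy for true alternatives; cast so that ~h is a logical NOT.
    h = np.asarray(h, dtype=bool)
    discoveries = p <= t_rej
    n_disc = int(discoveries.sum())
    n_false = int((discoveries & ~h).sum())
    # Convention (assumption): FDP is defined as 0 when there are no discoveries.
    fdp = n_false / n_disc if n_disc > 0 else 0.0
    return n_disc, n_false, fdp

# Toy values for illustration only.
p_toy = np.array([0.01, 0.20, 0.03, 0.50])
t_toy = np.full(4, 0.05)
h_toy = np.array([1, 0, 0, 1])
D, FD, FDP = false_discovery_proportion(p_toy, t_toy, h_toy)
print('# D=%d, FD=%d, FDP=%0.3f' % (D, FD, FDP))  # D=2, FD=1, FDP=0.500
```

In the quick test, the same function could be called with p, res['t_rej'], and h.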

Citation information

Coming soon.
