Several two-sample tests for contingency tables with count data
Project description
TwoSampleHC -- Higher Criticism Test between Two Frequency Tables
This package provides an adaptation of the Donoho-Jin-Tukey Higher Criticism (HC) test to frequency tables. This adaptation uses a binomial allocation model for the number of occurrences of each feature in two samples, each of which is associated with a frequency table. The exact binomial test associated with each feature yields a P-value. The HC statistic combines these P-values into a global test of the null hypothesis that the two tables are two realizations of the same data-generating mechanism.
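The per-feature step described above can be sketched as follows. This is a minimal illustration and not the package's implementation: the function names (`binom_logpmf`, `exact_binom_pvalue`, `sketch_two_sample_pvals`) are hypothetical, and the null success probability `T1 / (T1 + T2)` (the share of the first sample in the pooled total) is an assumption made for simplicity. The package's `two_sample_pvals` may use a different allocation probability.

```python
import math

def binom_logpmf(k, n, p):
    """Log of the Binomial(n, p) probability mass at k (requires 0 < p < 1)."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))

def exact_binom_pvalue(x, n, p):
    """Two-sided exact binomial P-value: the total probability of all
    outcomes that are no more likely than the observed count x."""
    if n == 0:
        return 1.0  # feature absent from both samples: no evidence either way
    log_obs = binom_logpmf(x, n, p)
    tol = 1e-9  # guards against floating-point ties between equal masses
    total = 0.0
    for k in range(n + 1):
        log_k = binom_logpmf(k, n, p)
        if log_k <= log_obs + tol:
            total += math.exp(log_k)
    return min(1.0, total)

def sketch_two_sample_pvals(counts1, counts2):
    """One P-value per feature. Assumption: given the pooled count x + y of a
    feature, x ~ Binomial(x + y, T1 / (T1 + T2)) under the null, where T1 and
    T2 are the sample totals. Both samples must be non-empty."""
    t1, t2 = sum(counts1), sum(counts2)
    p = t1 / (t1 + t2)
    return [exact_binom_pvalue(x, x + y, p) for x, y in zip(counts1, counts2)]
```

A feature that is split evenly between two equally sized samples gets a P-value near 1, while a feature that appears in only one sample gets a small P-value.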
This test is particularly useful for detecting departures from the null under weak and sparse alternatives, i.e., when the difference between the tables is due to only a few features and the evidence each such feature provides is relatively weak. See the references below for more details.

[1] Alon Kipnis. (2022). Higher Criticism for Discriminating Word Frequency Tables and Testing Authorship. Annals of Applied Statistics.
[2] David L. Donoho and Alon Kipnis. (2022). Higher criticism to compare two large frequency tables, with sensitivity to possible rare and weak differences. Annals of Statistics.
Example:
```python
import numpy as np
from TwoSampleHC import two_sample_pvals, HC

N = 1000   # number of features
n = 5 * N  # number of samples

# Zipf base distribution over the N features
P = 1 / np.arange(1, N + 1)
P = P / P.sum()

ep = 0.02   # fraction of features to perturb
mu = 0.005  # intensity of perturbation

# Perturb a random subset of features, then renormalize
TH = np.random.rand(N) < ep
Q = P.copy()
Q[TH] += mu
Q = Q / np.sum(Q)

smp_P = np.random.multinomial(n, P)  # sample from P
smp_Q = np.random.multinomial(n, Q)  # sample from Q

pv = two_sample_pvals(smp_Q, smp_P)  # per-feature binomial P-values
hc = HC(pv)
hc_val, p_th = hc.HCstar(gamma=0.25)  # small-sample Higher Criticism test

print("TV distance between P and Q: ", 0.5 * np.sum(np.abs(P - Q)))
print("Higher-Criticism score for testing P == Q: ", hc_val)
```
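To make the role of `gamma` and of the returned P-value threshold more concrete, here is a rough sketch of a textbook form of the Higher Criticism statistic. It is not the package's code: `hc_sketch` is a hypothetical name, and the normalization by `i/n` and the rule of skipping P-values below `1/n` follow the common small-sample "HC star" convention from the literature, which the package may implement with different details.

```python
import numpy as np

def hc_sketch(pvals, gamma=0.25):
    """Textbook-style Higher Criticism over the smallest gamma-fraction of
    P-values. Returns (score, threshold), where threshold is the P-value
    at which the maximum is attained."""
    p = np.sort(np.asarray(pvals, dtype=float))
    n = p.size
    if n < 2 or not 0 < gamma < 1:
        raise ValueError("need at least two P-values and 0 < gamma < 1")

    k = max(1, int(np.floor(gamma * n)))  # number of smallest P-values used
    frac = np.arange(1, k + 1) / n        # expected uniform quantiles i/n

    # Standardized gap between the uniform quantile and the observed P-value
    z = np.sqrt(n) * (frac - p[:k]) / np.sqrt(frac * (1 - frac))

    # Assumed "HC star" convention: ignore P-values smaller than 1/n
    keep = p[:k] >= 1.0 / n
    if not keep.any():
        keep[:] = True  # nothing passes the filter; use the plain maximum

    best = int(np.argmax(np.where(keep, z, -np.inf)))
    return float(z[best]), float(p[best])
```

Calling `hc_sketch(pv, gamma=0.25)` on the `pv` array from the example gives a score and threshold that can be compared with `hc_val` and `p_th`, keeping in mind that the two need not agree exactly. Features whose P-values fall below the threshold are the ones that contribute most to the score.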
File details
Details for the file twosamplehc-0.3.3.tar.gz.
File metadata
- Download URL: twosamplehc-0.3.3.tar.gz
- Upload date:
- Size: 9.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.11
File hashes
Algorithm | Hash digest
---|---
SHA256 | 3c56d7024c3d1dc6d2e35e5211137d508222dbbb16fbe3ab075924ea9ea30a67
MD5 | 46fca47954dfb43163d60b87d41f83c4
BLAKE2b-256 | f3c935387d4c19bb1b3b1a920407d0113256eb8ba9edae56bd7661179996c147
File details
Details for the file TwoSampleHC-0.3.3-py3-none-any.whl.
File metadata
- Download URL: TwoSampleHC-0.3.3-py3-none-any.whl
- Upload date:
- Size: 8.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.11
File hashes
Algorithm | Hash digest
---|---
SHA256 | 72cab9c49a09c0adce6acba7f1577cb8c3e339b2c024d692b7bd621de6de7b91
MD5 | b1485b2967e06b2c6ce1f2898b262bbe
BLAKE2b-256 | 438fc6a7fc6019855eeed33196acd736534fb61a77ce01999ab70be3baadf833