Two-Sample Hypothesis Test. A hypothesis testing tool for multi-dimensional data.
Introduction
TATTER (Two-sAmple TesT EstimatoR) is a tool for performing two-sample hypothesis tests. A two-sample hypothesis test asks whether two distributions p(x) and q(x) differ, on the basis of finite samples drawn from each of them. This ubiquitous problem appears in a legion of applications, ranging from data mining to data analysis and inference. This implementation can perform the Kolmogorov-Smirnov test (for one-dimensional data only), a Kullback-Leibler divergence test, and the Maximum Mean Discrepancy (MMD) test. The module uses a bootstrap algorithm to estimate the null distribution and compute the p-value.
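To make the idea of a resampled null distribution concrete, here is a minimal sketch of an MMD test with an RBF kernel. It is not TATTER's implementation: it uses only the Python standard library, the function names (`rbf_kernel_matrix`, `mmd2`, `mmd_permutation_test`) are hypothetical, and it builds the null by permuting the pooled sample, which may differ from the bootstrap scheme TATTER uses.

```python
import math
import random


def rbf_kernel_matrix(points, gamma):
    """Return the RBF kernel matrix k(a, b) = exp(-gamma * ||a - b||^2)."""
    size = len(points)
    kernel = [[1.0] * size for _ in range(size)]  # k(a, a) = 1 on the diagonal
    for i in range(size):
        for j in range(i + 1, size):
            sq_dist = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            kernel[i][j] = kernel[j][i] = math.exp(-gamma * sq_dist)
    return kernel


def mmd2(kernel, idx_x, idx_y):
    """Biased MMD^2 estimate from a precomputed kernel and two index sets."""
    def mean_block(rows, cols):
        total = sum(kernel[i][j] for i in rows for j in cols)
        return total / (len(rows) * len(cols))

    return (mean_block(idx_x, idx_x) + mean_block(idx_y, idx_y)
            - 2.0 * mean_block(idx_x, idx_y))


def mmd_permutation_test(x, y, gamma=1.0, iterations=200, seed=0):
    """Return (observed statistic, list of null statistics, p-value)."""
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    kernel = rbf_kernel_matrix(pooled, gamma)  # computed once, reused below
    n = len(x)
    indices = list(range(len(pooled)))
    observed = mmd2(kernel, indices[:n], indices[n:])

    null = []
    for _ in range(iterations):
        rng.shuffle(indices)  # relabel the points as if p and q were equal
        null.append(mmd2(kernel, indices[:n], indices[n:]))

    # The add-one correction keeps the p-value strictly positive.
    p_value = (1 + sum(s >= observed for s in null)) / (1 + iterations)
    return observed, null, p_value
```

Each sample is a list of equal-length tuples, one tuple per point. Precomputing the kernel on the pooled sample means every resampling step only re-indexes the matrix rather than re-evaluating the kernel.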
Dependencies
numpy, matplotlib, sklearn, joblib, tqdm, pathlib
Cautions
- The employed implementation of the Kullback-Leibler divergence is slow, so generating a few thousand bootstrap realizations is not practical when the sample sizes are large (n, m > 1000).
- The provided tests reproduce Figures X, X, and X in the paper. Running all of these tests takes ~30 minutes. If you are impatient to reproduce one of the figures, try mnist_digits_distance.py first.
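The cost mentioned in the first caution can be illustrated with a brute-force k-nearest-neighbor estimator of the Kullback-Leibler divergence in the style of Wang, Kulkarni, and Verdú. This is a sketch under assumptions, not TATTER's code: the function names are hypothetical, it uses only the standard library, and it assumes no duplicate points (a zero neighbor distance would break the logarithm).

```python
import math


def kth_nn_distance(point, others, k):
    """Euclidean distance from `point` to its k-th nearest neighbor in `others`."""
    distances = sorted(math.dist(point, other) for other in others)
    return distances[k - 1]


def knn_kl_divergence(x, y, k=1):
    """k-NN estimate of D(p || q) from samples x ~ p and y ~ q.

    x and y are lists of equal-length tuples, one tuple per point.
    """
    n, m, dim = len(x), len(y), len(x[0])
    log_ratio_sum = 0.0
    for i, point in enumerate(x):
        # Distance to the k-th neighbor within x, excluding the point itself.
        rho = kth_nn_distance(point, x[:i] + x[i + 1:], k)
        # Distance to the k-th neighbor among the points of y.
        nu = kth_nn_distance(point, y, k)
        log_ratio_sum += math.log(nu / rho)
    return dim * log_ratio_sum / n + math.log(m / (n - 1))
```

A single evaluation in this brute-force form computes on the order of n(n + m) distances, and a resampled null distribution repeats that work once per realization, which is why thousands of realizations become impractical for large samples.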
References
[1]. A. Farahi, Y. Chen "TATTER: A hypothesis testing tool for multi-dimensional data." Astronomy and Computing, Volume 34, January (2021).
[2]. A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Schölkopf, and A. Smola, "A kernel two-sample test." Journal of Machine Learning Research 13, no. Mar (2012): 723-773.
[3]. Q. Wang, S. R. Kulkarni, and S. Verdú, "Divergence estimation for multidimensional densities via k-nearest-neighbor distances." IEEE Transactions on Information Theory 55, no. 5 (2009): 2392-2405.
[4]. W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling, "Numerical recipes." (1989).
Quickstart
To start using TATTER, simply use from tatter import two_sample_test to access the primary function. The exact requirements for the inputs are listed in the docstring of the two_sample_test() function.
An example for using TATTER looks like this:
    from tatter import two_sample_test

    # X, Y, and gamma must be defined before this call.
    test_value, test_null, p_value = two_sample_test(X, Y,
                                                     model='MMD',
                                                     iterations=1000,
                                                     kernel_function='rbf',
                                                     gamma=gamma,
                                                     n_jobs=4,
                                                     verbose=True,
                                                     random_state=0)
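The example above passes gamma without defining it. One common way to pick an RBF bandwidth is the median heuristic, sketched below with the standard library only. This is an assumption rather than something TATTER documents: the helper name `median_heuristic_gamma` is hypothetical, and the formula assumes the kernel has the form exp(-gamma * ||x - y||^2), as in scikit-learn's rbf_kernel.

```python
import math
import statistics


def median_heuristic_gamma(x, y):
    """Return gamma = 1 / (2 * sigma^2), with sigma the median pairwise distance.

    x and y are sequences of equal-length tuples, one tuple per point.
    """
    pooled = list(x) + list(y)
    # All pairwise Euclidean distances between distinct pooled points.
    distances = [
        math.dist(pooled[i], pooled[j])
        for i in range(len(pooled))
        for j in range(i + 1, len(pooled))
    ]
    sigma = statistics.median(distances)
    return 1.0 / (2.0 * sigma ** 2)
```

The number of pairwise distances grows quadratically with the pooled sample size, so for large samples it may be preferable to compute the median on a random subset.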