
Two-Sample Hypothesis Test. A hypothesis testing tool for multi-dimensional data.

Project description

ascl:2006.007

Introduction

TATTER (Two-sAmple TesT EstimatoR) is a tool to perform two-sample hypothesis tests. The two-sample hypothesis test asks whether two distributions p(x) and q(x) differ, on the basis of finite samples drawn from each of them. This ubiquitous problem appears in a wide range of applications, from data mining to data analysis and inference. This implementation can perform the Kolmogorov-Smirnov test (for one-dimensional data only), a test based on the Kullback-Leibler divergence, and the Maximum Mean Discrepancy (MMD) test. The module uses a bootstrap algorithm to estimate the null distribution of the test statistic and to compute the p-value.
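The MMD test with a resampled null distribution can be sketched in a few lines of NumPy. This is a minimal illustration, not TATTER's code: it assumes a Gaussian (RBF) kernel k(a, b) = exp(-gamma ||a - b||^2), uses the unbiased MMD^2 estimator of Gretton et al. [2], and approximates the null distribution by permuting the pooled sample, which is a permutation variant of resampling; TATTER's bootstrap scheme may differ. The function names rbf_kernel, mmd2_unbiased and permutation_test are hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """Gaussian (RBF) kernel matrix k(a, b) = exp(-gamma * ||a - b||^2)."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def mmd2_unbiased(X, Y, gamma):
    """Unbiased estimate of the squared MMD between X (n, d) and Y (m, d)."""
    n, m = len(X), len(Y)
    Kxx = rbf_kernel(X, X, gamma)
    Kyy = rbf_kernel(Y, Y, gamma)
    Kxy = rbf_kernel(X, Y, gamma)
    # Drop the i == j terms from the within-sample averages.
    term_x = (Kxx.sum() - np.trace(Kxx)) / (n * (n - 1))
    term_y = (Kyy.sum() - np.trace(Kyy)) / (m * (m - 1))
    return term_x + term_y - 2.0 * Kxy.mean()


def permutation_test(X, Y, gamma, iterations=500, random_state=0):
    """Approximate the null distribution by reshuffling the pooled sample."""
    rng = np.random.default_rng(random_state)
    observed = mmd2_unbiased(X, Y, gamma)
    pooled = np.vstack([X, Y])
    n = len(X)
    null = np.empty(iterations)
    for b in range(iterations):
        idx = rng.permutation(len(pooled))
        null[b] = mmd2_unbiased(pooled[idx[:n]], pooled[idx[n:]], gamma)
    # Add-one correction keeps the p-value strictly positive.
    p_value = (1 + np.sum(null >= observed)) / (1 + iterations)
    return observed, null, p_value


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(0.0, 1.0, size=(100, 2))
    Y = rng.normal(1.0, 1.0, size=(120, 2))
    stat, null, p = permutation_test(X, Y, gamma=0.5, iterations=200)
    print(f"MMD^2 = {stat:.4f}, p-value = {p:.4f}")
```

Under the null hypothesis p(x) = q(x) the sample labels are exchangeable, which is why reshuffling the pooled sample yields draws of the statistic under the null.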

Dependencies

numpy, matplotlib, sklearn, joblib, tqdm, pathlib

Cautions

  • The employed implementation of the Kullback-Leibler divergence is slow; generating a few thousand bootstrap realizations is not practical when the sample sizes are large (n, m > 1000).

  • The provided tests reproduce Figures X, X, and X in the paper [1]. Running all of these tests takes ~30 minutes. If you are impatient to reproduce one of the figures, try mnist_digits_distance.py first.
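The cost noted in the first caution is easier to see with a k-nearest-neighbour KL estimator in the form of Wang, Kulkarni and Verdú [3]. The sketch below is an assumption-laden illustration, not TATTER's implementation: it uses brute-force Euclidean distances, so each evaluation costs O(n(n + m)) distance computations, and that cost is paid again for every bootstrap realization. The names pairwise_distances and kl_divergence_knn are hypothetical.

```python
import numpy as np


def pairwise_distances(A, B):
    """Euclidean distance matrix between the rows of A and the rows of B."""
    sq = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return np.sqrt(np.maximum(sq, 0.0))


def kl_divergence_knn(X, Y, k=1):
    """k-NN estimate of D(p || q) from X ~ p, shape (n, d), and Y ~ q, shape (m, d).

    Uses D ~= (d / n) * sum_i log(nu_k(i) / rho_k(i)) + log(m / (n - 1)),
    where rho_k(i) is the distance from x_i to its k-th neighbour in X
    (excluding x_i) and nu_k(i) is the distance from x_i to its k-th
    neighbour in Y. Assumes continuous data (no duplicate points).
    """
    n, d = X.shape
    m = len(Y)
    d_xx = pairwise_distances(X, X)
    np.fill_diagonal(d_xx, np.inf)  # exclude x_i as its own neighbour
    d_xy = pairwise_distances(X, Y)
    rho = np.sort(d_xx, axis=1)[:, k - 1]
    nu = np.sort(d_xy, axis=1)[:, k - 1]
    return d / n * np.sum(np.log(nu / rho)) + np.log(m / (n - 1))
```

For two unit-variance Gaussians whose means differ by 1, the true divergence is 0.5, so kl_divergence_knn on a few thousand one-dimensional draws should land near that value.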

References

[1]. A. Farahi and Y. Chen, "TATTER: A hypothesis testing tool for multi-dimensional data." Astronomy and Computing 34 (January 2021).

[2]. A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Schölkopf, and A. Smola, "A kernel two-sample test." Journal of Machine Learning Research 13, no. Mar (2012): 723-773.

[3]. Q. Wang, S. R. Kulkarni, and S. Verdú, "Divergence estimation for multidimensional densities via k-nearest-neighbor distances." IEEE Transactions on Information Theory 55, no. 5 (2009): 2392-2405.

[4]. W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling, "Numerical Recipes." (1989).

Quickstart

To start using TATTER, import the primary function with from tatter import two_sample_test. The exact requirements for the inputs are listed in the docstring of the two_sample_test() function. An example of using TATTER looks like this:

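  import numpy as np
  from tatter import two_sample_test

  # Hypothetical example data: X has n rows and Y has m rows, with the same
  # number of columns (features). See the docstring for the exact input format.
  rng = np.random.default_rng(0)
  X = rng.normal(0.0, 1.0, size=(500, 2))
  Y = rng.normal(0.3, 1.0, size=(400, 2))
  gamma = 1.0  # RBF kernel parameter; an illustrative value, tune for your data

  test_value, test_null, p_value = two_sample_test(
      X, Y,
      model='MMD',
      iterations=1000,
      kernel_function='rbf',
      gamma=gamma,
      n_jobs=4,
      verbose=True,
      random_state=0)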
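The Quickstart passes a gamma value for the 'rbf' kernel without saying how to choose it. One common choice, offered here as an assumption rather than TATTER's documented default, is the median heuristic: set the kernel width sigma to the median pairwise distance of the pooled sample. Assuming the scikit-learn convention k(a, b) = exp(-gamma ||a - b||^2), that corresponds to gamma = 1 / (2 sigma^2). The function name median_heuristic_gamma is hypothetical.

```python
import numpy as np


def median_heuristic_gamma(X, Y):
    """RBF parameter from the median pairwise distance of the pooled sample.

    With k(a, b) = exp(-gamma * ||a - b||^2) and sigma the median distance
    between distinct points, returns gamma = 1 / (2 * sigma**2).
    """
    Z = np.vstack([X, Y])
    sq_norms = np.sum(Z**2, axis=1)
    sq = sq_norms[:, None] + sq_norms[None, :] - 2.0 * Z @ Z.T
    iu = np.triu_indices(len(Z), k=1)  # each distinct pair counted once
    sigma = np.median(np.sqrt(np.maximum(sq[iu], 0.0)))
    return 1.0 / (2.0 * sigma**2)
```

The result could then be passed as gamma=median_heuristic_gamma(X, Y). For large samples, computing the median on a random subsample keeps the O((n + m)^2) distance matrix manageable.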

Download files


Source Distribution

tatter-1.0.0.tar.gz (12.9 kB)

Uploaded Source

Built Distribution

tatter-1.0.0-py3-none-any.whl (16.6 kB)

Uploaded Python 3

File details

Details for the file tatter-1.0.0.tar.gz.

File metadata

  • Download URL: tatter-1.0.0.tar.gz
  • Size: 12.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/32.0 requests/2.25.1 requests-toolbelt/0.9.1 urllib3/1.26.4 tqdm/4.59.0 importlib-metadata/3.10.0 keyring/22.3.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.8.8

File hashes

Hashes for tatter-1.0.0.tar.gz
Algorithm    Hash digest
SHA256       70f427b598db810c61a43e637a7d7a4deb02d085dd381769eeec2d8eca31cec0
MD5          8cf1017f51e63dd4d776cae3bc3f739b
BLAKE2b-256  6cdd9248d85bfe44f66590a984f801c4ae9db1125a673590ec8b08d33dc56af5


File details

Details for the file tatter-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: tatter-1.0.0-py3-none-any.whl
  • Size: 16.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/32.0 requests/2.25.1 requests-toolbelt/0.9.1 urllib3/1.26.4 tqdm/4.59.0 importlib-metadata/3.10.0 keyring/22.3.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.8.8

File hashes

Hashes for tatter-1.0.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       fd96dca50029f29cc000851c24262a52285afbee32939de856564c121237c20b
MD5          206380c7fa6e341ea1e1b287df025e63
BLAKE2b-256  c95f12d79172bd17fa9241a8967be6b98906f365635071a820641cb526c1ffb0

