eummd
Description
Computes the maximum mean discrepancy (MMD) two-sample test for univariate data using the Laplacian kernel. The p-value can optionally be computed using permutations. The package also includes an implementation of the robust median difference statistic 'Q_n' from Croux and Rousseeuw (1992) doi:10.1007/978-3-662-26811-7_58, based on Johnson and Mizoguchi (1978) doi:10.1137/0207013.
Installation instructions
Install using pip:
python3 -m pip install eummd
Methods are implemented in C++, and then called using Cython.
Functions
There are various functions for computing the maximum mean discrepancy (MMD).
eummd: efficient univariate MMD with the Laplacian kernel, O(n log n) for a total of n samples.
mmd: MMD (univariate/multivariate).
meammd: multivariate efficient approximate MMD, using either random projections or interpoint distances.
mediandiff: computes the median difference. If univariate, can be computed in O(n log n), otherwise is O(n^2).
medianheuristic: the inverse of the median difference.
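For orientation, the quantity these functions estimate can be sketched in quadratic time with plain Python. This is only an illustrative sketch, not the package's implementation: it assumes the Laplacian kernel has the form k(x, y) = exp(-beta * |x - y|) and uses the biased estimate of the squared MMD, whereas the estimator and normalisation used inside eummd may differ. The names laplacian_kernel and naive_mmd2_biased are hypothetical and not part of the eummd API.

```python
import math


def laplacian_kernel(x, y, beta):
    # Assumed kernel form for scalars: exp(-beta * |x - y|).
    return math.exp(-beta * abs(x - y))


def naive_mmd2_biased(xs, ys, beta):
    """Biased O(n^2) estimate of the squared MMD for univariate samples."""

    def mean_kernel(a, b):
        # Average kernel value over all pairs (u, v) with u in a, v in b.
        total = sum(laplacian_kernel(u, v, beta) for u in a for v in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


if __name__ == "__main__":
    X = [7.1, 1.2, 4.3, 0.4]
    Y = [5.5, 2.6, 8.7]
    print(naive_mmd2_biased(X, Y, beta=0.1))
```

The double loop over all pairs is what makes this version O(n^2); the eummd function listed above is described as reaching O(n log n) for univariate data.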
eummd and mmd return a dictionary with the items pval for the p-value (if requested), stat for the statistic and beta for the kernel parameter value that was used. If beta is specified as a non-positive number, it is set using the median heuristic.
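The median heuristic can be illustrated with a naive quadratic-time sketch. It assumes the median difference is the median of the absolute differences over all distinct pairs of the pooled sample, and that the heuristic is its inverse, as the function list states. The names naive_median_diff and naive_median_heuristic are hypothetical, and details such as how ties or the pairs are handled inside the package may differ.

```python
import statistics


def naive_median_diff(xs, ys):
    """Median of |z_i - z_j| over all pairs i < j of the pooled sample."""
    pooled = list(xs) + list(ys)
    diffs = [
        abs(pooled[i] - pooled[j])
        for i in range(len(pooled))
        for j in range(i + 1, len(pooled))
    ]
    return statistics.median(diffs)


def naive_median_heuristic(xs, ys):
    # Kernel parameter chosen as the inverse of the median difference.
    return 1.0 / naive_median_diff(xs, ys)


if __name__ == "__main__":
    X = [7.1, 1.2, 4.3, 0.4]
    Y = [5.5, 2.6, 8.7]
    print(naive_median_heuristic(X, Y))
```

Choosing beta this way ties the kernel width to the typical spacing of the data, so the result does not depend on the units in which the samples are measured.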
Examples:
import numpy as np
from eummd import eummd, mediandiff, mmd
X = np.array([7.1, 1.2, 4.3, 0.4]).astype(np.float64)
Y = np.array([5.5, 2.6, 8.7]).astype(np.float64)
beta = 0.1
# only computes the statistic, no p-value
d = eummd(X, Y, beta, pval=False)
print(d['stat'])
# computes statistic and p-value; beta <= 0, so the median heuristic is used for beta
d = eummd(X, Y, beta=-0.1, pval=True)
print("p-value for univariate data: ", d['pval'])
print("statistic for univariate data: ", d['stat'])
print("beta for univariate data: ", d['beta'])
# show median heuristic was used
print(1.0 / mediandiff(X, Y, fast=True))
# a multivariate example, using median heuristic for beta
X = np.array([[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]]).reshape((6, 2))
Y = np.array([[13, 14, 15, 16], [17, 18, 19, 20]]).reshape((4, 2))
d = mmd(X=X, Y=Y, beta=-0.5, pval=True, kernel="Laplacian")
print("p-value for multivariate data: ", d['pval'])
print("statistic for multivariate data: ", d['stat'])
print("beta for multivariate data: ", d['beta'])
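The p-values in the examples above are described as being computed using permutations. A generic permutation test can be sketched as follows. This is an assumption-laden illustration rather than the package's procedure: the function permutation_pvalue, its parameters num_perm and seed, and the (count + 1) / (num_perm + 1) convention are all hypothetical choices, and the compact biased_mmd2 helper assumes the kernel exp(-beta * |x - y|).

```python
import math
import random


def biased_mmd2(xs, ys, beta):
    # Compact biased squared-MMD estimate with an assumed Laplacian kernel.
    def mean_k(a, b):
        return sum(math.exp(-beta * abs(u - v)) for u in a for v in b) / (len(a) * len(b))

    return mean_k(xs, xs) + mean_k(ys, ys) - 2.0 * mean_k(xs, ys)


def permutation_pvalue(xs, ys, stat_fn, num_perm=200, seed=0):
    """Permutation p-value: share of shuffled splits with a statistic at least as large."""
    rng = random.Random(seed)
    observed = stat_fn(xs, ys)
    pooled = list(xs) + list(ys)
    n = len(xs)
    count = 0
    for _ in range(num_perm):
        rng.shuffle(pooled)  # random relabelling of the pooled sample
        if stat_fn(pooled[:n], pooled[n:]) >= observed:
            count += 1
    # Adding 1 to numerator and denominator keeps the p-value strictly positive.
    return (count + 1) / (num_perm + 1)


if __name__ == "__main__":
    X = [7.1, 1.2, 4.3, 0.4]
    Y = [5.5, 2.6, 8.7]
    p = permutation_pvalue(X, Y, lambda a, b: biased_mmd2(a, b, beta=0.1))
    print("permutation p-value:", p)
```

Passing the statistic as a callable keeps the resampling logic separate from the choice of kernel and estimator.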