eummd

Description

Computes the maximum mean discrepancy (MMD) two-sample test for univariate data using the Laplacian kernel. The p-value can optionally be computed using permutations. The package also includes an implementation of the robust median difference statistic 'Q_n' from Croux and Rousseeuw (1992) doi:10.1007/978-3-662-26811-7_58, based on Johnson and Mizoguchi (1978) doi:10.1137/0207013.

Installation instructions

Install using pip:

python3 -m pip install eummd

The methods are implemented in C++ and called from Python using Cython.

Functions

There are various functions for computing the maximum mean discrepancy (MMD).

  • eummd: efficient univariate MMD with the Laplacian kernel, O(n log n) for a total of n samples.
  • mmd: MMD (univariate/multivariate).
  • meammd: multivariate efficient approximate MMD, using either random projections or interpoint distances.
  • mediandiff: computes the median difference. For univariate data this can be computed in O(n log n); otherwise it is O(n^2).
  • medianheuristic: the inverse of the median difference.
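To give some intuition for why an O(n log n) cost is plausible for univariate data with the Laplacian kernel k(a, b) = exp(-beta * |a - b|), here is a minimal sketch of a sort-and-accumulate trick for the sum of kernel values over all distinct pairs in one sample. This is an illustration only: the function names are hypothetical, it covers just one within-sample sum, and it is not claimed to be the algorithm the package implements.

```python
import math


def pairwise_laplacian_sum_naive(z, beta):
    """Sum of exp(-beta * |z_i - z_j|) over all pairs i < j, in O(n^2)."""
    n = len(z)
    return sum(
        math.exp(-beta * abs(z[i] - z[j]))
        for i in range(n)
        for j in range(i + 1, n)
    )


def pairwise_laplacian_sum_sorted(z, beta):
    """Same sum, using one sort (O(n log n)) and one linear pass.

    After sorting, running_j = sum_{i<j} exp(-beta * (s_j - s_i)) satisfies
    running_j = exp(-beta * (s_j - s_{j-1})) * (running_{j-1} + 1).
    """
    s = sorted(z)
    total = 0.0
    running = 0.0
    for j in range(1, len(s)):
        running = math.exp(-beta * (s[j] - s[j - 1])) * (running + 1.0)
        total += running
    return total


Z = [7.1, 1.2, 4.3, 0.4, 5.5, 2.6, 8.7]
naive_sum = pairwise_laplacian_sum_naive(Z, beta=0.1)
fast_sum = pairwise_laplacian_sum_sorted(Z, beta=0.1)
```

The two functions agree up to floating-point rounding; the recurrence works because sorting removes the absolute value, so each exponential factors into a product over consecutive gaps.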

eummd and mmd return a dictionary with the items stat (the statistic), beta (the kernel parameter value that was used) and, if requested, pval (the p-value). If beta is specified as a non-positive number, the median heuristic is used to compute it.
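As a reference point for what the statistic and the permutation p-value mean, the following is a naive pure-Python sketch. It assumes the biased (V-statistic) estimate of the squared MMD with the Laplacian kernel and a "count plus one" permutation p-value; the estimator, normalisation and p-value convention actually used by the package may differ, and the helper names and the num_perm and seed parameters are hypothetical. Only the dictionary keys mirror the ones described above.

```python
import math
import random


def laplacian_kernel(a, b, beta):
    """Laplacian kernel for scalars: exp(-beta * |a - b|)."""
    return math.exp(-beta * abs(a - b))


def naive_mmd2(x, y, beta):
    """Biased (V-statistic) estimate of the squared MMD, O(n^2)."""
    def mean_kernel(u, v):
        total = sum(laplacian_kernel(a, b, beta) for a in u for b in v)
        return total / (len(u) * len(v))

    return mean_kernel(x, x) + mean_kernel(y, y) - 2.0 * mean_kernel(x, y)


def naive_mmd_test(x, y, beta, num_perm=200, seed=0):
    """Statistic plus a permutation p-value from reshuffled pooled data."""
    rng = random.Random(seed)
    observed = naive_mmd2(x, y, beta)
    pooled = list(x) + list(y)
    n = len(x)
    at_least_as_large = 0
    for _ in range(num_perm):
        rng.shuffle(pooled)
        if naive_mmd2(pooled[:n], pooled[n:], beta) >= observed:
            at_least_as_large += 1
    pval = (at_least_as_large + 1) / (num_perm + 1)
    return {"stat": observed, "pval": pval, "beta": beta}


X = [7.1, 1.2, 4.3, 0.4]
Y = [5.5, 2.6, 8.7]
result = naive_mmd_test(X, Y, beta=0.1)
```

Each permutation recomputes the statistic from scratch, so this sketch costs O(num_perm * n^2) and is only practical for small samples.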

Examples:

import numpy as np

from eummd import eummd, mmd
from eummd import mediandiff

X = np.array([7.1, 1.2, 4.3, 0.4]).astype(np.float64)
Y = np.array([5.5, 2.6, 8.7]).astype(np.float64)
beta = 0.1

# only computes the statistic, using the given beta
d = eummd(X, Y, beta, pval=False)
print(d['stat'])

# computes statistic and p-value; beta is non-positive, so the median heuristic is used
d = eummd(X, Y, beta=-0.1, pval=True)
print("p-value for univariate data: ", d['pval'])
print("statistic for univariate data: ", d['stat'])
print("beta for univariate data: ", d['beta'])

# show median heuristic was used
print(1.0 / mediandiff(X, Y, fast=True))

# a multivariate example, using median heuristic for beta
X = np.array([[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]]).reshape((6, 2))
Y = np.array([[13, 14, 15, 16], [17, 18, 19, 20]]).reshape((4, 2))

d = mmd(X=X, Y=Y, beta=-0.5, pval=True, kernel="Laplacian")
print("p-value for multivariate data: ", d['pval'])
print("statistic for multivariate data: ", d['stat'])
print("beta for multivariate data: ", d['beta'])
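The median heuristic check in the univariate example can also be reproduced without the package. The sketch below assumes that the median difference is the median of |a - b| over all distinct pairs of the pooled sample; that definition, the tie handling of statistics.median, and the function names are assumptions, not confirmed details of mediandiff.

```python
from itertools import combinations
from statistics import median


def naive_median_diff(x, y):
    """Median of |a - b| over all distinct pairs of the pooled sample, O(n^2)."""
    pooled = list(x) + list(y)
    return median(abs(a - b) for a, b in combinations(pooled, 2))


def naive_median_heuristic(x, y):
    """Kernel parameter beta taken as the inverse of the median difference."""
    return 1.0 / naive_median_diff(x, y)


X = [7.1, 1.2, 4.3, 0.4]
Y = [5.5, 2.6, 8.7]
med = naive_median_diff(X, Y)
beta = naive_median_heuristic(X, Y)
print("median difference:", med)
print("median heuristic beta:", beta)
```

If the pooled-pairs assumption holds, the printed beta should be close to the value of d['beta'] reported when beta is non-positive.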
