Skip to main content

Unbiased estimators for multivariate statistical moments

Project description

PyMoments

PyMoments is a toolkit for unbiased estimation of multivariate statistical moments. In the current version (1.0.0), only multivariate k-statistics are implemented, allowing for the unbiased estimation of cumulants. An implementation of h-statistics (for unbiased estimation of central moments) is planned for a future release.

Installation

PyMoments can be installed either from the GitHub source, or via PyPI. To install the package from GitHub, first clone the repository:

$ git clone https://github.com/KevinDalySmith/PyMoments.git

Navigate to the root directory, which contains the setup.py file. You can then use setuptools to install the package and run the unit tests:

$ python setup.py install
$ python setup.py test

Alternatively, PyMoments can be installed using PyPI:

$ pip install PyMoments

Note that PyMoments requires NumPy. I have only tested the package with NumPy version 1.16.5, but it should be compatible with older versions.

Basic Usage

The simplest use case is to compute a multivariate k-statistic from a 2D array of data. For example, generate a random sample from a multivariate normal distribution:

import numpy as np
mu = np.zeros(3,)
sigma = np.array([
    [2, 0, 1],
    [0, 2, -1],
    [1, -1, 2]
])
data = np.random.multivariate_normal(mu, sigma, size=(1000,))

The data variable is a 1000 x 3 array, where each column corresponds to one of the three random variables in the joint distribution, and each row is an observation.

The first k-statistic is identical to the sample mean. Thus, we can compare the first-order k-statistics for each column of the array, to the simple average:

from PyMoments import kstat

first_order_kstats = [
    kstat(data, (0,)),
    kstat(data, (1,)),
    kstat(data, (2,))
]

sample_means = np.mean(data, axis=0)

print('First-order k-statistics:', first_order_kstats)
print('Sample means:', sample_means) 

The method kstat(data, (0,)) computes the first-order k-statistic, from the first column of the data. As the console output of this example reveals, the first-order k-statistics are indeed equivalent to the sample means, up to floating point error.

Second-order k-statistics are identical to covariances. Therefore, we can compare the values of the kstat() function to the covariance matrix of the data:

second_order_kstats = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        second_order_kstats[i, j] = kstat(data, (i, j))

print('Second-order k-statistics:')
print(second_order_kstats)
print('Covariance matrix:')
print(np.cov(data.T))

Higher-order k-statistics become difficult to express in terms of familiar statistics, but they still provide insight into the underlying distribution. Third-order k-statistics, for example, are related to the skewness of a distribution. Normal distributions have zero skew, and therefore, all third-order k-statistics should be fairly close to zero:

print('Sample third-order k-statistics:') 
print(kstat(data, (0, 1, 2)))
print(kstat(data, (0, 0, 1)))
print(kstat(data, (2, 2, 2)))

The second argument to the kstat() method specifies a multiset of column indices (as a sequence of integers), which encodes the particular k-statistic to be computed. The order of the k-statistic is equal to the length of this multiset. Note that kstat() is symmetric with respect to this argument, i.e., permuting the indices will have no affect on the output. Repeated indices are allowed; in fact, repeated indices are required to compute classical univariate k-statistics:

new_data = np.random.randn(1000, 1)

print('First univariate k-stat:')
print(kstat(new_data, (0,)))

print('Second univariate k-stat:')
print(kstat(new_data, (0, 0)))

print('Third univariate k-stat:')
print(kstat(new_data, (0, 0, 0)))

print('Fourth univariate k-stat:')
print(kstat(new_data, (0, 0, 0, 0)))

It should also be pointed out that k-statistics become noisy and difficult to compute with increasing order. For example, try evaluating a 9th-order univariate cumulant on a new sample several times:

kstat(np.random.randn(1000, 1), (0,) * 9)

Not only does the function take several seconds to return, but repeats of this experiment can lead to widely different values of the k-statistic, despite the fact that normal distributions have ninth-order cumulants of zero. The runtime of kstat() on nth-order indices scales with Bell's number B(n), so... I do not recommend trying k-statistics of order 10 or higher.

License, Citation, and Acknowledgements

PyMoments by Kevin D. Smith is licensed under a non-commercial Creative Commons license (CC BY-NC 4.0). When possible, please cite this package:

@misc{PyMoments:2020,
  author = {Kevin D. Smith},
  title = {PyMoments: A Python toolkit for unbiased estimation of multivariate statistical moments},
  howpublished = {\texttt{https://github.com/KevinDalySmith/PyMoments}},
  year = 2020 
}

This work was supported in part by the U.S. Defense Threat Reduction Agency, under grant HDTRA1-19-1-0017.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

PyMoments-1.0.1.tar.gz (94.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

PyMoments-1.0.1-py3-none-any.whl (95.3 kB view details)

Uploaded Python 3

File details

Details for the file PyMoments-1.0.1.tar.gz.

File metadata

  • Download URL: PyMoments-1.0.1.tar.gz
  • Upload date:
  • Size: 94.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.4.0 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.7.4

File hashes

Hashes for PyMoments-1.0.1.tar.gz
Algorithm Hash digest
SHA256 eab9ee23e978567f4f28b24e8a1fd43cba9664cd86431fdb13cf74714840c49f
MD5 d36d92797ce40719c30bc65c94a4abdb
BLAKE2b-256 b1fc373d284320e5796ff241191c4b123e9c46768bbc141a697919bf6cfcb391

See more details on using hashes here.

File details

Details for the file PyMoments-1.0.1-py3-none-any.whl.

File metadata

  • Download URL: PyMoments-1.0.1-py3-none-any.whl
  • Upload date:
  • Size: 95.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.4.0 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.7.4

File hashes

Hashes for PyMoments-1.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 8ad5c69e7d0194634e357573827137b74575a39fd3b818067ea44dace71d0e85
MD5 d681733a2699be22de5df741fd30e156
BLAKE2b-256 220c6a8f5a54cb6b020897421bed9a39df408f1086135ea24f74153113bd4838

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page