Unbiased estimators for multivariate statistical moments
Project description
PyMoments
PyMoments is a toolkit for unbiased estimation of multivariate statistical moments. In the current version (1.0.0), only multivariate kstatistics are implemented, allowing for the unbiased estimation of cumulants. An implementation of hstatistics (for unbiased estimation of central moments) is planned for a future release.
Installation
PyMoments can be installed either from the GitHub source, or via PyPI. To install the package from GitHub, first clone the repository:
$ git clone https://github.com/KevinDalySmith/PyMoments.git
Navigate to the root directory, which contains the setup.py
file.
You can then use setuptools to install the package and run the unit tests:
$ python setup.py install
$ python setup.py test
Alternatively, PyMoments can be installed using PyPI:
$ pip install PyMoments
Note that PyMoments requires NumPy. I have only tested the package with NumPy version 1.16.5, but it should be compatible with older versions.
Basic Usage
The simplest use case is to compute a multivariate kstatistic from a 2D array of data. For example, generate a random sample from a multivariate normal distribution:
import numpy as np mu = np.zeros(3,) sigma = np.array([ [2, 0, 1], [0, 2, 1], [1, 1, 2] ]) data = np.random.multivariate_normal(mu, sigma, size=(1000,))
The data
variable is a 1000 x 3 array, where each column corresponds to one of the three
random variables in the joint distribution, and each row is an observation.
The first kstatistic is identical to the sample mean. Thus, we can compare the firstorder kstatistics for each column of the array, to the simple average:
from PyMoments import kstat first_order_kstats = [ kstat(data, (0,)), kstat(data, (1,)), kstat(data, (2,)) ] sample_means = np.mean(data, axis=0) print('Firstorder kstatistics:', first_order_kstats) print('Sample means:', sample_means)
The method kstat(data, (0,))
computes the firstorder kstatistic, from the first column
of the data. As the console output of this example reveals, the firstorder kstatistics are
indeed equivalent to the sample means, up to floating point error.
Secondorder kstatistics are identical to covariances. Therefore, we can compare the values
of the kstat()
function to the covariance matrix of the data:
second_order_kstats = np.zeros((3, 3)) for i in range(3): for j in range(3): second_order_kstats[i, j] = kstat(data, (i, j)) print('Secondorder kstatistics:') print(second_order_kstats) print('Covariance matrix:') print(np.cov(data.T))
Higherorder kstatistics become difficult to express in terms of familiar statistics, but they still provide insight into the underlying distribution. Thirdorder kstatistics, for example, are related to the skewness of a distribution. Normal distributions have zero skew, and therefore, all thirdorder kstatistics should be fairly close to zero:
print('Sample thirdorder kstatistics:') print(kstat(data, (0, 1, 2))) print(kstat(data, (0, 0, 1))) print(kstat(data, (2, 2, 2)))
The second argument to the kstat()
method specifies a multiset of column indices
(as a sequence of integers), which encodes the particular kstatistic to be computed.
The order of the kstatistic is equal to the length of this multiset.
Note that kstat()
is symmetric with respect to this argument, i.e., permuting the
indices will have no affect on the output. Repeated indices are allowed; in fact, repeated
indices are required to compute classical univariate kstatistics:
new_data = np.random.randn(1000, 1) print('First univariate kstat:') print(kstat(new_data, (0,))) print('Second univariate kstat:') print(kstat(new_data, (0, 0))) print('Third univariate kstat:') print(kstat(new_data, (0, 0, 0))) print('Fourth univariate kstat:') print(kstat(new_data, (0, 0, 0, 0)))
It should also be pointed out that kstatistics become noisy and difficult to compute with increasing order. For example, try evaluating a 9thorder univariate cumulant on a new sample several times:
kstat(np.random.randn(1000, 1), (0,) * 9)
Not only does the function take several seconds to return, but repeats of this experiment can
lead to widely different values of the kstatistic, despite the fact that normal
distributions have ninthorder cumulants of zero.
The runtime of kstat()
on nthorder indices scales with Bell's number B(n), so...
I do not recommend trying kstatistics of order 10 or higher.
License, Citation, and Acknowledgements
PyMoments by Kevin D. Smith is licensed under a noncommercial Creative Commons license (CC BYNC 4.0). When possible, please cite this package:
@misc{PyMoments:2020,
author = {Kevin D. Smith},
title = {PyMoments: A Python toolkit for unbiased estimation of multivariate statistical moments},
howpublished = {\texttt{https://github.com/KevinDalySmith/PyMoments}},
year = 2020
}
This work was supported in part by the U.S. Defense Threat Reduction Agency, under grant HDTRA11910017.
Project details
Release history Release notifications  RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Filename, size  File type  Python version  Upload date  Hashes 

Filename, size PyMoments1.0.1py3noneany.whl (95.3 kB)  File type Wheel  Python version py3  Upload date  Hashes View 
Filename, size PyMoments1.0.1.tar.gz (94.9 kB)  File type Source  Python version None  Upload date  Hashes View 
Hashes for PyMoments1.0.1py3noneany.whl
Algorithm  Hash digest  

SHA256  8ad5c69e7d0194634e357573827137b74575a39fd3b818067ea44dace71d0e85 

MD5  d681733a2699be22de5df741fd30e156 

BLAKE2256  220c6a8f5a54cb6b020897421bed9a39df408f1086135ea24f74153113bd4838 