OMoment: Efficient online calculation of statistical moments

The OMoment package calculates moments of statistical distributions (means, variances, covariances) in online or distributed settings, for univariate and bivariate distributions.

  • Suitable for large data – works well with numpy and pandas, and in distributed settings.

  • Moments calculated from different parts of data can be easily combined or updated for new data (supports addition of results).

  • Objects are lightweight; calculation is done in numpy where possible.

  • Weights for data can be provided.

  • Invalid values (NaNs, infinities) are omitted by default.

A typical application is calculating means and variances (or even the correlation of two variables) for many chunks of data, corresponding to different groups or to different parts of a distributed dataset. The results can be analyzed at the level of the groups or easily combined to get exact moments for the full dataset.
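The idea of combining per-chunk results can be sketched in plain Python. The `MeanVar` class below is a hypothetical stand-in written for illustration, not OMoment's actual implementation; it assumes the variance is the weighted population variance (divided by the total weight) and that non-finite values are skipped.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class MeanVar:
    """Hypothetical (mean, variance, total weight) summary of one chunk."""
    mean: float
    var: float      # weighted population variance (divides by total weight)
    weight: float

    @classmethod
    def compute(cls, xs, ws=None):
        """Summarise one chunk, skipping NaN and infinite values."""
        if ws is None:
            ws = [1.0] * len(xs)
        pairs = [(x, w) for x, w in zip(xs, ws)
                 if math.isfinite(x) and math.isfinite(w)]
        total = sum(w for _, w in pairs)
        if total == 0:
            return cls(math.nan, math.nan, 0.0)
        mean = sum(w * x for x, w in pairs) / total
        var = sum(w * (x - mean) ** 2 for x, w in pairs) / total
        return cls(mean, var, total)

    def __add__(self, other):
        """Merge two summaries into the summary of the pooled data."""
        if other.weight == 0:
            return self
        if self.weight == 0:
            return other
        total = self.weight + other.weight
        delta = other.mean - self.mean
        mean = self.mean + delta * other.weight / total
        # Pooled variance: within-chunk parts plus a between-chunk term.
        var = (self.weight * self.var + other.weight * other.var
               + delta ** 2 * self.weight * other.weight / total) / total
        return MeanVar(mean, var, total)


xs = [1.0, 2.0, 4.0, 8.0, float("nan"), 16.0]
ws = [1.0, 2.0, 1.0, 0.5, 1.0, 3.0]

whole = MeanVar.compute(xs, ws)
merged = MeanVar.compute(xs[:3], ws[:3]) + MeanVar.compute(xs[3:], ws[3:])
print(whole)
print(merged)
```

Up to floating-point rounding, both summaries agree, which is what makes group-level results composable into exact moments for the full dataset.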

Basic example

from omoment import OMeanVar
import numpy as np
import pandas as pd

rng = np.random.default_rng(12354)
g = rng.integers(low=0, high=10, size=1000)
x = g + rng.normal(loc=0, scale=10, size=1000)
w = rng.exponential(scale=1, size=1000)

# calculate overall moments
OMeanVar.compute(x, w)
# should give: OMeanVar(mean=4.6, var=108, weight=1.08e+03)

# or calculate moments for every group
df = pd.DataFrame({'g': g, 'x': x, 'w': w})
omvs = df.groupby('g').apply(OMeanVar.of_frame, x='x', w='w')

# and combine group moments to obtain the same overall results
OMeanVar.combine(omvs)

# addition is also supported
omvs.loc[0] + omvs.loc[1]

At the moment, univariate and bivariate distributions are supported. Bivariate moments allow fast calculation of a linear regression with two variables (and a constant). Multivariate distributions can be processed efficiently in a similar fashion, so support for them might be added in the future. Moments of multivariate distributions would also allow linear regression and other statistical methods (such as PCA or regularized regression) to be estimated in a single pass through large distributed datasets.
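To illustrate why bivariate moments are enough for a simple regression, here is a minimal, unweighted sketch in plain Python. The `BiMoments` class and its method names are hypothetical and are not OMoment's API; the point is only that the slope equals cov(x, y) / var(x), the intercept equals mean(y) - slope * mean(x), and both can be recovered from merged chunk summaries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BiMoments:
    """Hypothetical bivariate summary: means, variances, covariance, weight."""
    mean_x: float
    mean_y: float
    var_x: float
    var_y: float
    cov: float
    weight: float

    @classmethod
    def compute(cls, xs, ys):
        """Summarise one chunk of paired observations (equal weights)."""
        n = len(xs)
        mx = sum(xs) / n
        my = sum(ys) / n
        vx = sum((x - mx) ** 2 for x in xs) / n
        vy = sum((y - my) ** 2 for y in ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
        return cls(mx, my, vx, vy, cov, float(n))

    def __add__(self, other):
        """Merge two summaries into the summary of the pooled data."""
        total = self.weight + other.weight
        dx = other.mean_x - self.mean_x
        dy = other.mean_y - self.mean_y
        k = self.weight * other.weight / total  # between-chunk factor
        w1, w2 = self.weight, other.weight
        return BiMoments(
            mean_x=self.mean_x + dx * w2 / total,
            mean_y=self.mean_y + dy * w2 / total,
            var_x=(w1 * self.var_x + w2 * other.var_x + k * dx * dx) / total,
            var_y=(w1 * self.var_y + w2 * other.var_y + k * dy * dy) / total,
            cov=(w1 * self.cov + w2 * other.cov + k * dx * dy) / total,
            weight=total,
        )

    def regression(self):
        """Return (intercept, slope) of the least-squares line y = a + b*x."""
        slope = self.cov / self.var_x
        return self.mean_y - slope * self.mean_x, slope


xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0]  # constructed so that y = 1 + 2*x

# Summarise two chunks separately, merge, then fit the line.
chunked = BiMoments.compute(xs[:2], ys[:2]) + BiMoments.compute(xs[2:], ys[2:])
intercept, slope = chunked.regression()
print(intercept, slope)
```

The same pattern extends to more variables by tracking a full covariance matrix instead of a single covariance term.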

Similar packages

The OMoment package aims for fast calculation of weighted distribution moments (currently mean and variance), good compatibility with numpy and pandas, and suitability for distributed datasets (composability of results). I have not found a package that satisfies all of this, even though similar packages do exist.

RunStats

The RunStats package calculates several moments of a univariate distribution (including skewness and kurtosis) and a few other statistics (min and max), and the results can be combined. In addition, it provides a Regression object for bivariate statistics. It does not support weights, and the calculation was more than 100x slower in my testing (admittedly, I am not sure whether I used its Cython support correctly).

import numpy as np
from omoment import OMeanVar
from runstats import Statistics
import time

rng = np.random.Generator(np.random.PCG64(12345))
x = rng.normal(size=1_000_000)

start = time.time()
omv = OMeanVar.compute(x)
end = time.time()
print(f'{end - start:.3g} seconds')
# 0.0146 seconds

start = time.time()
st = Statistics(x)
end = time.time()
print(f'{end - start:.3g} seconds')
# 2.83 seconds
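A single `time.time()` measurement like the ones above can be noisy. A slightly more robust way to time such a comparison is the standard library's `timeit.repeat`, taking the fastest of several runs. The helper `best_time` and the pure-Python workload below are illustrative assumptions, not part of either package; in practice the workload would be a call such as the `OMeanVar.compute(x)` shown above.

```python
import timeit


def best_time(func, repeat=5):
    """Return the fastest of `repeat` single-call timings, in seconds."""
    return min(timeit.repeat(func, repeat=repeat, number=1))


data = [float(i) for i in range(100_000)]


def mean_var():
    """Stand-in workload: plain-Python mean and population variance."""
    mean = sum(data) / len(data)
    return mean, sum((v - mean) ** 2 for v in data) / len(data)


elapsed = best_time(mean_var)
print(f'{elapsed:.3g} seconds')
```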

Gym

OpenAI Gym (now Gymnasium) provides similar functionality as part of its normalization of observations and rewards (in gym.wrappers.normalize.RunningMeanStd). The functionality is fairly limited because it was developed for a particular use case, but the calculation is fast and the results can be composed. It does not support weights, though.

Documentation
