
Estimates statistical moments in online or distributed settings.


OMoment: Efficient online calculation of statistical moments

The OMoment package calculates moments of statistical distributions (mean and variance) in online or distributed settings.

  • Suitable for large data: works well with NumPy and pandas, and in distributed settings.

  • Moments calculated from different parts of the data can easily be combined, or updated with new data (results support addition).

  • Objects are lightweight; calculations are done in NumPy where possible.

  • Weights for data can be provided.

  • Invalid values (NaNs, infinities) are omitted by default.
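The "updated for new data" and "invalid values are omitted" points can be illustrated with a minimal pure-Python sketch of a weighted running mean and variance. It uses West's weighted variant of Welford's update and assumes the population (weight-normalized) variance; the class name `RunningMeanVar` is hypothetical and this is not OMoment's own code.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningMeanVar:
    """Hypothetical helper: weighted mean and population variance, updated one value at a time."""

    mean: float = 0.0
    var: float = 0.0     # weight-normalized (population) variance, an assumption
    weight: float = 0.0  # total weight seen so far

    def update(self, x: float, w: float = 1.0) -> None:
        # Skip NaNs, infinities and non-positive weights
        if not (math.isfinite(x) and math.isfinite(w)) or w <= 0:
            return
        new_weight = self.weight + w
        delta = x - self.mean
        new_mean = self.mean + delta * w / new_weight
        # West's update of the weighted sum of squared deviations
        m2 = self.var * self.weight + w * delta * (x - new_mean)
        self.mean, self.var, self.weight = new_mean, m2 / new_weight, new_weight


rm = RunningMeanVar()
for xi, wi in [(1.0, 1.0), (float("nan"), 1.0), (3.0, 1.0), (5.0, 2.0)]:
    rm.update(xi, wi)
print(rm)  # RunningMeanVar(mean=3.5, var=2.75, weight=4.0)
```

Each update costs O(1) memory and time, so values can be processed as they arrive without storing them.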

A typical application is calculating means and variances of many chunks of data (corresponding to different groups or to different parts of distributed data). The results can be analyzed at the group level or easily combined to get exact moments for the full dataset.
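Exact combination is possible because a chunk's mean, variance and total weight are enough to merge it with another chunk. The sketch below uses a weighted form of the pairwise formula of Chan et al.; `Moments`, `moments_of` and `combine` are hypothetical names, the variance is assumed to be the population (weight-normalized) one, and OMoment's internals may differ.

```python
from functools import reduce
from typing import NamedTuple

import numpy as np


class Moments(NamedTuple):
    """Hypothetical container: weighted mean, population variance, total weight."""
    mean: float
    var: float
    weight: float


def moments_of(x, w) -> Moments:
    """Moments of one chunk, dropping non-finite values and weights."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    ok = np.isfinite(x) & np.isfinite(w)
    x, w = x[ok], w[ok]
    total = w.sum()
    if total == 0:
        return Moments(float("nan"), float("nan"), 0.0)  # empty chunk
    mean = np.sum(w * x) / total
    var = np.sum(w * (x - mean) ** 2) / total
    return Moments(float(mean), float(var), float(total))


def combine(a: Moments, b: Moments) -> Moments:
    """Merge two chunks exactly (weighted pairwise update)."""
    if a.weight == 0:
        return b
    if b.weight == 0:
        return a
    total = a.weight + b.weight
    delta = b.mean - a.mean
    mean = a.mean + delta * b.weight / total
    # squared deviations of the union = within each chunk + between-chunk term
    m2 = a.var * a.weight + b.var * b.weight + delta**2 * a.weight * b.weight / total
    return Moments(mean, m2 / total, total)


rng = np.random.default_rng(0)
x = rng.normal(size=1000)
w = rng.exponential(size=1000)
x[::100] = np.nan  # a few invalid values, ignored by moments_of

chunks = zip(np.array_split(x, 7), np.array_split(w, 7))
merged = reduce(combine, (moments_of(xc, wc) for xc, wc in chunks))
full = moments_of(x, w)  # merged and full agree up to rounding
```

Because `combine` is associative (up to rounding), chunks can be merged in any order, for example as a tree reduction across workers.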

Basic example

from omoment import OMeanVar
import numpy as np
import pandas as pd

rng = np.random.default_rng(12354)
g = rng.integers(low=0, high=10, size=1000)
x = g + rng.normal(loc=0, scale=10, size=1000)
w = rng.exponential(scale=1, size=1000)

# calculate overall moments
OMeanVar(x, weight=w)
# should give: OMeanVar(mean=4.6, var=108, weight=1.08e+03)

# or calculate moments for every group
df = pd.DataFrame({'g': g, 'x': x, 'w': w})
omvs = df.groupby('g').apply(OMeanVar.of_frame, x='x', w='w')

# and combine group moments to obtain the same overall results
OMeanVar.combine(omvs)

# addition is also supported
omvs.loc[0] + omvs.loc[1]
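For comparison, the same group-then-combine result can be reproduced with plain pandas and NumPy by aggregating weighted power sums per group (Σw, Σwx, Σwx²) and adding them across groups. This is a sketch of a different technique, not OMoment's method, and it assumes the population (weight-normalized) variance. Power sums are additive, but the formula E[x²] - E[x]² can lose precision when the variance is small relative to the squared mean, which is one reason to carry means and variances instead.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(12354)
g = rng.integers(low=0, high=10, size=1000)
x = g + rng.normal(loc=0, scale=10, size=1000)
w = rng.exponential(scale=1, size=1000)
df = pd.DataFrame({'g': g, 'x': x, 'w': w})

# Per-group weighted power sums: sum(w), sum(w*x), sum(w*x^2)
sums = (
    df.assign(wx=df['w'] * df['x'], wx2=df['w'] * df['x'] ** 2)
    .groupby('g')[['w', 'wx', 'wx2']]
    .sum()
)
group_mean = sums['wx'] / sums['w']
group_var = sums['wx2'] / sums['w'] - group_mean ** 2

# Overall moments: add the sums across groups, then convert
total = sums.sum()
overall_mean = total['wx'] / total['w']
overall_var = total['wx2'] / total['w'] - overall_mean ** 2

# Direct NumPy reference on the full data
ref_mean = np.average(x, weights=w)
ref_var = np.average((x - ref_mean) ** 2, weights=w)
print(overall_mean, ref_mean, overall_var, ref_var)
```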

At the moment, only univariate distributions are supported. Bivariate or even multivariate distributions can be processed efficiently in a similar fashion, so support for them might be added in the future. Moments of multivariate distributions would also allow linear regression and other statistical methods (such as PCA or regularized regression) to be calculated in a single pass through large distributed datasets.
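The single-pass regression idea can be sketched for weighted least squares: each chunk contributes XᵀWX and XᵀWy, these matrices are additive across chunks, and the coefficients come from solving the normal equations once at the end. This illustrates the general technique only (OMoment currently handles univariate moments); `chunk_stats` and `solve` are hypothetical helpers, and for ill-conditioned designs a QR-based approach is numerically safer than the normal equations.

```python
import numpy as np


def chunk_stats(X, y, w):
    """Additive sufficient statistics for weighted least squares on one chunk."""
    X1 = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    Xw = X1 * w[:, None]
    return Xw.T @ X1, Xw.T @ y


def solve(stats):
    """Sum the per-chunk statistics, then solve the normal equations once."""
    xtwx = sum(s[0] for s in stats)
    xtwy = sum(s[1] for s in stats)
    return np.linalg.solve(xtwx, xtwy)


rng = np.random.default_rng(1)
X = rng.normal(size=(10_000, 3))
w = rng.exponential(size=10_000)
y = 2.0 + X @ np.array([1.0, -0.5, 0.25]) + rng.normal(scale=0.1, size=10_000)

# Each chunk could live on a different worker; only small matrices are shipped back
parts = zip(np.array_split(X, 5), np.array_split(y, 5), np.array_split(w, 5))
stats = [chunk_stats(Xc, yc, wc) for Xc, yc, wc in parts]
beta = solve(stats)  # intercept followed by the three slopes
print(beta)
```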

Documentation



Download files


Source Distribution

omoment-0.1.1.tar.gz (13.9 kB)

Uploaded Source

Built Distribution

omoment-0.1.1-py3-none-any.whl (12.1 kB)

Uploaded Python 3

File details

Details for the file omoment-0.1.1.tar.gz.

File metadata

  • Download URL: omoment-0.1.1.tar.gz
  • Upload date:
  • Size: 13.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: python-requests/2.28.1

File hashes

Hashes for omoment-0.1.1.tar.gz
  • SHA256: 2c475591c5ee5ce4487a6eb702ed2b0339a6f78eaa2f0d29ba826abdde7a5014
  • MD5: 18b442bd6918d678625e4e61f707b5bc
  • BLAKE2b-256: 6fba79a26a8c63f87fa654bffa40cea67e92bd0f0565fad7525ffc185b7030ed


File details

Details for the file omoment-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: omoment-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 12.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: python-requests/2.28.1

File hashes

Hashes for omoment-0.1.1-py3-none-any.whl
  • SHA256: 3acff6555f3996545d1fe37904c1347c63b1e7038a096212e2e52b31d71c5cab
  • MD5: 857efd8290271c548a900c6a3fb6e7f9
  • BLAKE2b-256: ae4cdac69015575b20411cdf55ccb8c5c080b2bf497c43c8288226827b9c159e

