Efficient batch statistics computation library for Python.

Project description

batchstats is a Python package for computing statistics on data that arrives in batches. It is suited to streaming data and to datasets too large to fit into memory.

For detailed information, please check out the full documentation.

Installation

Install batchstats using pip:

pip install batchstats

Or with conda:

conda install -c conda-forge batchstats

Quick Start

Here's how to compute the mean and variance of a dataset in batches:

import numpy as np
from batchstats import BatchMean, BatchVar

# Simulate a data stream
data_stream = (np.random.randn(100, 10) for _ in range(10))

# Initialize the stat objects
batch_mean = BatchMean()
batch_var = BatchVar()

# Process each batch
for batch in data_stream:
    batch_mean.update_batch(batch)
    batch_var.update_batch(batch)

# Get the final result
mean = batch_mean()
variance = batch_var()

print(f"Mean shape: {mean.shape}")
print(f"Variance shape: {variance.shape}")
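To make the idea of batch-wise statistics concrete, here is a minimal pure-Python sketch of one common way to merge per-batch means and variances (the pairwise update of Chan et al.). The class `RunningMoments` and its attributes are hypothetical and only mirror the `update_batch` naming used above; this is not necessarily how batchstats implements it internally.

```python
import statistics


class RunningMoments:
    """Hypothetical helper: running count, mean and sum of squared deviations (M2)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0

    def update_batch(self, batch):
        """Fold one batch of floats into the running state and return self."""
        batch = list(batch)
        if not batch:
            return self
        b_n = len(batch)
        b_mean = sum(batch) / b_n
        b_m2 = sum((x - b_mean) ** 2 for x in batch)
        delta = b_mean - self.mean
        total = self.n + b_n
        # Merge the batch summary into the running summary.
        self.mean += delta * b_n / total
        self.m2 += b_m2 + delta ** 2 * self.n * b_n / total
        self.n = total
        return self

    def variance(self, ddof=0):
        """Variance of everything seen so far (population variance by default)."""
        return self.m2 / (self.n - ddof)


batches = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0, 7.0, 8.0, 9.0]]
moments = RunningMoments()
for batch in batches:
    moments.update_batch(batch)

# The merged result matches a single pass over the concatenated data.
flat = [x for batch in batches for x in batch]
assert abs(moments.mean - statistics.fmean(flat)) < 1e-12
assert abs(moments.variance() - statistics.pvariance(flat)) < 1e-12
```

Only three numbers are kept between batches, which is why this style of update works for data that never fits in memory at once.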

Advanced Usage

batchstats accepts n-dimensional np.ndarray inputs and lets you reduce over several axes at once, as NumPy does.

import numpy as np
from batchstats import BatchMean

# Create a 3D data stream
data_stream = (np.random.rand(10, 5, 8) for _ in range(5))

# Compute the mean over the last two axes (1 and 2)
batch_mean_3d = BatchMean(axis=(1, 2))

for batch in data_stream:
    batch_mean_3d.update_batch(batch)

mean_3d = batch_mean_3d()

print(f"3D Mean shape: {mean_3d.shape}")

Handling NaN Values

batchstats provides BatchNan* classes that ignore NaN values, mirroring NumPy's nan* functions.

import numpy as np
from batchstats import BatchNanMean

# Create data with NaNs
data = np.random.randn(1000, 5)
data[::10] = np.nan

# Compute the mean, ignoring NaNs
nan_mean = BatchNanMean().update_batch(data)()

print(f"NaN-aware mean shape: {nan_mean.shape}")
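As an illustration of what "ignoring NaNs" means for a running statistic, the following pure-Python sketch keeps a sum and a count of the non-NaN values only. `RunningNanMean` is a hypothetical stand-in written for this example, under the assumption that NaN entries are simply skipped; it is not the library's own code.

```python
import math


class RunningNanMean:
    """Hypothetical helper: running mean over scalars that skips NaN values."""

    def __init__(self):
        self.count = 0
        self.total = 0.0

    def update_batch(self, batch):
        """Add the non-NaN values of one batch and return self for chaining."""
        for x in batch:
            if not math.isnan(x):
                self.total += x
                self.count += 1
        return self

    def __call__(self):
        # With no valid values seen, the mean is undefined, so return NaN.
        return self.total / self.count if self.count else math.nan


nan = math.nan
result = RunningNanMean().update_batch([1.0, nan, 3.0]).update_batch([nan, 5.0])()
print(result)  # 3.0
```

Because the count tracks only valid entries, batches with different numbers of NaNs are weighted correctly.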

Available Statistics

batchstats supports a variety of common statistics:

  • BatchSum / BatchNanSum
  • BatchMean / BatchNanMean
  • BatchMin / BatchNanMin
  • BatchMax / BatchNanMax
  • BatchPeakToPeak / BatchNanPeakToPeak
  • BatchVar
  • BatchStd
  • BatchCov

For more details on each class, see the API Reference.
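Several of the statistics listed above only need a small running state. As a rough sketch, peak-to-peak (maximum minus minimum) can be tracked across batches by keeping the smallest and largest value seen so far. `RunningPeakToPeak` below is a hypothetical pure-Python illustration for scalar data, not the BatchPeakToPeak implementation.

```python
class RunningPeakToPeak:
    """Hypothetical helper: running max - min over batches of scalars."""

    def __init__(self):
        self.lo = None
        self.hi = None

    def update_batch(self, batch):
        """Widen the running [lo, hi] range with one batch and return self."""
        for x in batch:
            self.lo = x if self.lo is None else min(self.lo, x)
            self.hi = x if self.hi is None else max(self.hi, x)
        return self

    def __call__(self):
        if self.lo is None:
            raise ValueError("no data seen yet")
        return self.hi - self.lo


ptp = RunningPeakToPeak()
for batch in ([3.0, 7.0], [-2.0, 4.0], [5.0]):
    ptp.update_batch(batch)
spread = ptp()
print(spread)  # 9.0
```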

Download files

Source Distribution

batchstats-0.5.2.tar.gz (15.1 kB, source)

Built Distribution

batchstats-0.5.2-py3-none-any.whl (25.5 kB, Python 3)

File details

Details for the file batchstats-0.5.2.tar.gz.

File metadata

  • Download URL: batchstats-0.5.2.tar.gz
  • Size: 15.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.5

File hashes

Hashes for batchstats-0.5.2.tar.gz

  • SHA256: 9c8e0cfbc4d8fb5f2c2886243bcffa12164a004faceebf56f212f252806a8756
  • MD5: 2b1323491192aee7ede865ca8fb0b132
  • BLAKE2b-256: 8415be59e38f81fe562666e14537288aaa92d33cec3b0e9a0fa3f04c11bad193

File details

Details for the file batchstats-0.5.2-py3-none-any.whl.

File metadata

  • Download URL: batchstats-0.5.2-py3-none-any.whl
  • Size: 25.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.5

File hashes

Hashes for batchstats-0.5.2-py3-none-any.whl

  • SHA256: 9bb0a5c37e0788834ddff15895654f11936ad5a4c951f720817e33024e081df1
  • MD5: f4d653d6a6d265a62badcd4464568ac7
  • BLAKE2b-256: d9139537a516e547309f42b047515b5e4b97be07903d6e787c794850c65be864
