
Efficient batch statistics computation library for Python.




batchstats is a Python package for computing statistics on data that arrives in batches. It is well suited to streaming data and to datasets too large to fit into memory.

For detailed information, please check out the full documentation.

Installation

Install batchstats using pip:

pip install batchstats

Or with conda:

conda install -c conda-forge batchstats

Quick Start

Here's how to compute the mean and variance of a dataset in batches:

import numpy as np
from batchstats import BatchMean, BatchVar

# Simulate a data stream
data_stream = (np.random.randn(100, 10) for _ in range(10))

# Initialize the stat objects
batch_mean = BatchMean()
batch_var = BatchVar()

# Process each batch
for batch in data_stream:
    batch_mean.update_batch(batch)
    batch_var.update_batch(batch)

# Get the final results
mean = batch_mean()
variance = batch_var()

print(f"Mean shape: {mean.shape}")
print(f"Variance shape: {variance.shape}")
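Processing batches one at a time can give the same answer as a single pass over the full array because per-batch statistics can be merged exactly. The sketch below shows one standard way to do this for the mean and variance, the pairwise update of Chan, Golub and LeVeque, using only NumPy. It illustrates the idea and is not batchstats' own implementation; the helper names `merge_moments` and `streaming_mean_var` are hypothetical. It computes the population variance (ddof=0, the np.var default).

```python
import numpy as np


def merge_moments(n_a, mean_a, m2_a, n_b, mean_b, m2_b):
    """Merge two sets of (count, mean, M2) statistics, where M2 is the sum of
    squared deviations from the mean (Chan, Golub and LeVeque's pairwise update)."""
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * (n_b / n)
    m2 = m2_a + m2_b + delta**2 * (n_a * n_b / n)
    return n, mean, m2


def streaming_mean_var(batches):
    """Per-column mean and population variance (ddof=0) over an iterable of 2-D batches."""
    n, mean, m2 = 0, None, None
    for batch in batches:
        batch = np.asarray(batch, dtype=float)
        n_b = batch.shape[0]
        if n_b == 0:
            continue  # empty batches contribute nothing
        mean_b = batch.mean(axis=0)
        m2_b = ((batch - mean_b) ** 2).sum(axis=0)
        if n == 0:
            n, mean, m2 = n_b, mean_b, m2_b
        else:
            n, mean, m2 = merge_moments(n, mean, m2, n_b, mean_b, m2_b)
    if n == 0:
        raise ValueError("no data received")
    return mean, m2 / n


# Usage: a stream shaped like the Quick Start example.
rng = np.random.default_rng(0)
batches = [rng.standard_normal((100, 10)) for _ in range(10)]
mean, var = streaming_mean_var(batches)

full = np.concatenate(batches)
print(np.allclose(mean, full.mean(axis=0)))  # True
print(np.allclose(var, full.var(axis=0)))  # True
```

Each batch contributes only a count, a mean and a sum of squared deviations, so memory use does not grow with the number of batches. Merging deviations rather than raw sums of squares also avoids the cancellation error of the naive sum-of-squares formula.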

Handling NaN Values

batchstats provides BatchNan* classes that ignore NaN values, similar to NumPy's nan* functions (such as np.nanmean and np.nansum).

import numpy as np
from batchstats import BatchNanMean

# Create data with NaNs
data = np.random.randn(1000, 5)
data[::10] = np.nan

# Compute the mean, ignoring NaNs
nan_mean = BatchNanMean().update_batch(data)()

print(f"NaN-aware mean shape: {nan_mean.shape}")
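A NaN-aware mean can be streamed in a similar way by keeping, per column, a running sum of the non-NaN values and a running count of them. The following is a minimal sketch under that assumption; `streaming_nanmean` is a hypothetical helper, and batchstats' BatchNanMean may work differently internally.

```python
import numpy as np


def streaming_nanmean(batches):
    """Per-column mean over 2-D batches, ignoring NaNs (hypothetical helper)."""
    total = None  # running sum of non-NaN values per column
    count = None  # running number of non-NaN values per column
    for batch in batches:
        batch = np.asarray(batch, dtype=float)
        batch_sum = np.nansum(batch, axis=0)  # NaNs are treated as 0
        batch_count = np.count_nonzero(~np.isnan(batch), axis=0)
        if total is None:
            total, count = batch_sum, batch_count
        else:
            total = total + batch_sum
            count = count + batch_count
    if total is None:
        raise ValueError("no data received")
    # Columns with no valid values become NaN, as with np.nanmean
    # (which also emits a RuntimeWarning in that case; it is suppressed here).
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(count > 0, total / count, np.nan)


rng = np.random.default_rng(1)
data = rng.standard_normal((1000, 5))
data[::10] = np.nan  # every 10th row is entirely NaN, as in the example above
result = streaming_nanmean(np.array_split(data, 8))
print(np.allclose(result, np.nanmean(data, axis=0)))  # True
```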

Available Statistics

batchstats supports a variety of common statistics:

  • BatchSum / BatchNanSum
  • BatchMean / BatchNanMean
  • BatchVar
  • BatchStd
  • BatchMin
  • BatchMax
  • BatchPeakToPeak
  • BatchCov

For more details on each class, see the API Reference.
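Minimum, maximum and peak-to-peak are the simplest statistics to compute in batches: the extreme value of the whole dataset is the extreme of the per-batch extremes. As an illustration only, here is a hypothetical `RunningRange` class (not part of batchstats) that follows the same update_batch pattern as the examples above.

```python
import numpy as np


class RunningRange:
    """Hypothetical helper (not part of batchstats): running per-column
    minimum, maximum and peak-to-peak over 2-D batches."""

    def __init__(self):
        self.min = None
        self.max = None

    def update_batch(self, batch):
        batch = np.asarray(batch)
        if batch.shape[0] == 0:
            return self  # .min()/.max() would raise on an empty batch
        batch_min = batch.min(axis=0)
        batch_max = batch.max(axis=0)
        if self.min is None:
            self.min, self.max = batch_min, batch_max
        else:
            # Element-wise: keep the most extreme value seen so far per column.
            self.min = np.minimum(self.min, batch_min)
            self.max = np.maximum(self.max, batch_max)
        return self  # returning self allows chaining

    def peak_to_peak(self):
        return self.max - self.min


rng = np.random.default_rng(2)
batches = [rng.standard_normal((50, 3)) for _ in range(4)]
rr = RunningRange()
for b in batches:
    rr.update_batch(b)

full = np.concatenate(batches)
print(np.array_equal(rr.peak_to_peak(), full.max(axis=0) - full.min(axis=0)))  # True
```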


Download files


Source Distribution

batchstats-0.5.1.tar.gz (12.0 kB)

Uploaded Source

Built Distribution


batchstats-0.5.1-py3-none-any.whl (11.6 kB)

Uploaded Python 3

File details

Details for the file batchstats-0.5.1.tar.gz.

File metadata

  • Download URL: batchstats-0.5.1.tar.gz
  • Upload date:
  • Size: 12.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.5

File hashes

Hashes for batchstats-0.5.1.tar.gz
Algorithm Hash digest
SHA256 a2e5c3b600c8023a49b71b428f54c479fcb54b457a6fe1f5874ca344c56c9c46
MD5 3b74d3a641fb9fd5f42ca643d034d6e6
BLAKE2b-256 116e342416fea891379364a874eefc90ab803018f363692fc0f9da2e17695fc3
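To check a downloaded file against the SHA256 digest listed above, you can hash it with Python's standard hashlib module. This is a minimal sketch; `sha256_of_file` and `verify_sha256` are hypothetical helper names, and the file path assumes the sdist was downloaded into the current working directory.

```python
import hashlib
import os


def sha256_of_file(path, chunk_size=1 << 20):
    """Hex SHA-256 digest of a file, read in chunks so it never has to be held in memory whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path, expected_hex):
    """True if the file's SHA-256 digest matches the published one."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Assumes the sdist was downloaded into the current working directory.
SDIST = "batchstats-0.5.1.tar.gz"
SDIST_SHA256 = "a2e5c3b600c8023a49b71b428f54c479fcb54b457a6fe1f5874ca344c56c9c46"

if os.path.exists(SDIST):
    print("hash OK" if verify_sha256(SDIST, SDIST_SHA256) else "hash MISMATCH")
```

pip can also perform this check itself in hash-checking mode (`--require-hashes`), using hashes pinned in a requirements file.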


File details

Details for the file batchstats-0.5.1-py3-none-any.whl.

File metadata

  • Download URL: batchstats-0.5.1-py3-none-any.whl
  • Upload date:
  • Size: 11.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.5

File hashes

Hashes for batchstats-0.5.1-py3-none-any.whl
Algorithm Hash digest
SHA256 1d7875b0fd1786e6ae5e702e2517320d9c45d4ad1305061d7a5f0bd646818bed
MD5 3f0838a2aa7a11ec0cad2da0f86c810c
BLAKE2b-256 faa5dd48017c3cc357946189d4ee34c35e8faf51b19e0a4def3ce4cac41e6797

