Statistics for Streaming Data

Project description

statstream is a lightweight Python package providing data analysis and statistics utilities for streaming data.

Its main goal is to provide single-pass variants of conventional numpy data analysis and statistics functionality for streaming data that is either generated on the fly or too large to be handled at once. Data can be streamed in chunks called mini-batches, which makes statstream extremely useful in combination with machine learning and deep learning packages like keras, tensorflow, or pytorch.

statstream functions consume iterators providing batches of data. They compute statistics of these batches and combine them to obtain statistics for the full data set.
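To illustrate the idea of combining per-batch statistics, here is a minimal sketch of a single-pass mean written in plain numpy. It is not statstream's actual implementation; the function name `combined_mean` is hypothetical, and the sketch assumes that the first axis of every batch is the batch axis.

```python
import numpy as np


def combined_mean(batches):
    """Single-pass mean over an iterable of arrays shaped (batch, ...).

    Hypothetical helper for illustration only, not part of statstream.
    """
    count = 0
    mean = None
    for batch in batches:
        batch = np.asarray(batch, dtype=float)
        n = batch.shape[0]
        if n == 0:
            continue  # an empty batch contributes nothing
        batch_mean = batch.mean(axis=0)
        if mean is None:
            count, mean = n, batch_mean
        else:
            total = count + n
            # Shift the running mean towards the batch mean, weighted by
            # the share of samples that the new batch contributes.
            mean = mean + (batch_mean - mean) * (n / total)
            count = total
    if mean is None:
        raise ValueError("no data was provided")
    return mean
```

Only the running count and the running mean are kept in memory, so the full data set never has to be held at once.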

import statstream

# some_iterable yields the mini-batches one at a time
mean = statstream.streaming_mean(some_iterable)
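One way to obtain such an iterable is a generator that slices an array into mini-batches. The sketch below is an assumption about typical usage rather than an excerpt from the statstream documentation: `mini_batches`, the batch size, and the random data are made up for illustration, and the statstream call is left as a comment so that the block runs without the package installed.

```python
import numpy as np


def mini_batches(data, batch_size):
    """Yield consecutive slices of `data` along its first axis."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]


rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 3))  # stand-in for data too large to hold at once

# Materialised here only to inspect the batches; in practice the generator
# would be consumed lazily.
batches = list(mini_batches(data, batch_size=64))

# Reference value computed in one go, for comparison.
reference = data.mean(axis=0)

# With statstream installed, the generator can be passed in directly:
#   import statstream
#   mean = statstream.streaming_mean(mini_batches(data, batch_size=64))
```

Assuming the first axis is treated as the batch axis, the streamed result should agree with `reference` up to floating-point rounding.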

The Overview and Examples sections of our documentation provide more realistic and complete examples.

Project Information

statstream is released under the MIT license, its documentation lives at Read the Docs, the code on GitHub, and the latest release can be found on PyPI. It’s tested on Python 2.7 and 3.5+.

If you’d like to contribute to statstream you’re most welcome. We have written a short guide to help you get started!

Further Reading

Additional information on the algorithmic aspects of statstream can be found in the following works:

  • Tony F. Chan & Gene H. Golub & Randall J. LeVeque, “Updating formulae and a pairwise algorithm for computing sample variances”, 1979

  • Radim Řehůřek, “Scalability of Semantic Analysis in Natural Language Processing”, 2011
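The pairwise update from the first reference can be sketched as follows. This is a textbook-style illustration in numpy, not a copy of statstream's code; the names `merge_moments` and `streaming_variance_sketch` are hypothetical, and the first axis of each batch is assumed to be the batch axis.

```python
import numpy as np


def merge_moments(n_a, mean_a, m2_a, n_b, mean_b, m2_b):
    """Combine (count, mean, sum of squared deviations) of two data sets."""
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * (n_b / n)
    # The cross term accounts for the distance between the two means.
    m2 = m2_a + m2_b + delta ** 2 * (n_a * n_b / n)
    return n, mean, m2


def streaming_variance_sketch(batches, ddof=0):
    """Single-pass variance over an iterable of arrays shaped (batch, ...)."""
    n, mean, m2 = 0, 0.0, 0.0
    for batch in batches:
        batch = np.asarray(batch, dtype=float)
        n_b = batch.shape[0]
        if n_b == 0:
            continue  # an empty batch contributes nothing
        mean_b = batch.mean(axis=0)
        m2_b = ((batch - mean_b) ** 2).sum(axis=0)
        n, mean, m2 = merge_moments(n, mean, m2, n_b, mean_b, m2_b)
    if n - ddof <= 0:
        raise ValueError("not enough data for the requested ddof")
    return m2 / (n - ddof)
```

Each batch is reduced to three quantities before merging, so memory use depends on the batch size rather than on the length of the stream.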

Acknowledgments

During the setup of this project we were heavily influenced and inspired by the works of Hynek Schlawack and in particular his attrs package and blog posts on testing and packaging and deploying to PyPI. Thank you for sharing your experiences and insights.


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

statstream-22.1.0.tar.gz (36.3 kB)

Uploaded Source

Built Distribution

statstream-22.1.0-py2.py3-none-any.whl (12.0 kB)

Uploaded Python 2 Python 3

File details

Details for the file statstream-22.1.0.tar.gz.

File metadata

  • Download URL: statstream-22.1.0.tar.gz
  • Size: 36.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/2.0.0 pkginfo/1.8.3 requests/2.28.1 setuptools/65.0.2 requests-toolbelt/0.9.1 tqdm/4.64.0 CPython/3.7.3

File hashes

Hashes for statstream-22.1.0.tar.gz
Algorithm Hash digest
SHA256 db87823438849985e7951f639a87182f5910f865e7c4fda9d085c22e23d72d54
MD5 6afe44f964dcfb5e13c11a2ebf178e59
BLAKE2b-256 ca7680436c7f3964c76b8581994ad0d75f8297f1887ef3fe43fbd19f79e41d57

See more details on using hashes here.
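A downloaded file can be checked against the SHA256 digest listed above with Python's standard `hashlib` module. The following is a minimal sketch: the helper names are hypothetical, and the file is read in chunks so that it never has to be loaded into memory as a whole.

```python
import hashlib

# SHA256 digest listed above for statstream-22.1.0.tar.gz
EXPECTED_SHA256 = "db87823438849985e7951f639a87182f5910f865e7c4fda9d085c22e23d72d54"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Check a downloaded file against the published digest."""
    return sha256_of_file(path) == expected
```

For example, `matches_expected("statstream-22.1.0.tar.gz")` should return `True` for an intact copy of the source distribution saved under that name in the current directory.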

File details

Details for the file statstream-22.1.0-py2.py3-none-any.whl.

File metadata

  • Download URL: statstream-22.1.0-py2.py3-none-any.whl
  • Size: 12.0 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/2.0.0 pkginfo/1.8.3 requests/2.28.1 setuptools/65.0.2 requests-toolbelt/0.9.1 tqdm/4.64.0 CPython/3.7.3

File hashes

Hashes for statstream-22.1.0-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 e862dc502786a6f33a48e0d62d2c5e8fa633fbd6ef5a92a689765e0829bc6b6a
MD5 0e5caba2214b398764eeecaa7a88aa78
BLAKE2b-256 c958d991aa0aa07feeec14bc5cf70964b0559fcf0d13c51861ffb58f2aa11a35

See more details on using hashes here.
