
Statistics for Streaming Data

Project description


statstream is a lightweight Python package providing data analysis and statistics utilities for streaming data.

Its main goal is to provide single-pass variants of conventional numpy data analysis and statistics functionality for streaming data that is either generated on the fly or too large to be handled at once. Data can be streamed in chunks called mini-batches, which makes statstream especially useful in combination with machine learning and deep learning packages like keras, tensorflow, or pytorch.

statstream functions consume iterators providing batches of data. They compute statistics of these batches and combine them to obtain statistics for the full data set.

import statstream
mean = statstream.streaming_mean(some_iterable)
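To make the idea of combining per-batch statistics concrete, here is a minimal pure-Python sketch of a single-pass mean. It is not statstream's implementation: the names running_mean and make_batches are hypothetical and exist only for this illustration. It also assumes each batch is a flat sequence of numbers.

```python
def running_mean(batches):
    """Single-pass mean over an iterable of batches of numbers.

    Hypothetical helper for illustration; not part of statstream.
    """
    count = 0    # number of values seen so far
    mean = 0.0   # mean of all values seen so far
    for batch in batches:
        batch = list(batch)
        n = len(batch)
        if n == 0:
            continue  # empty batches carry no information
        batch_mean = sum(batch) / float(n)
        count += n
        # Shift the running mean towards the batch mean,
        # weighted by the batch's share of all values seen.
        mean += (batch_mean - mean) * n / count
    if count == 0:
        raise ValueError("cannot compute the mean of an empty stream")
    return mean


def make_batches(data, size):
    """Yield consecutive mini-batches of at most `size` items (hypothetical)."""
    for start in range(0, len(data), size):
        yield data[start:start + size]


data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
result = running_mean(make_batches(data, 3))
```

Only the running count and mean are kept between batches, so memory use does not grow with the length of the stream.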

The Overview and Examples sections of our documentation provide more realistic and complete examples.

Project Information

statstream is released under the MIT license. Its documentation lives at Read the Docs, its code on GitHub, and the latest release can soon be found on PyPI. It’s tested on Python 2.7 and 3.4+.

If you’d like to contribute to statstream you’re most welcome. We have written a short guide to help you get started!

Further Reading

Additional information on the algorithmic aspects of statstream can be found in the following works:

  • Tony F. Chan, Gene H. Golub, and Randall J. LeVeque, “Updating formulae and a pairwise algorithm for computing sample variances”, 1979

  • Radim Řehůřek, “Scalability of Semantic Analysis in Natural Language Processing”, 2011
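As a rough illustration of the kind of updating formula discussed in the first reference, the sketch below merges per-batch (count, mean, sum of squared deviations) summaries into a variance for the whole stream. It is a sketch under stated assumptions, not necessarily how statstream implements it: the function names batch_summary, merge_summaries, and merged_variance are hypothetical, and batches are assumed to be flat sequences of numbers.

```python
def batch_summary(batch):
    """Return (count, mean, sum of squared deviations) for one non-empty batch."""
    n = len(batch)
    mean = sum(batch) / float(n)
    m2 = sum((x - mean) ** 2 for x in batch)
    return n, mean, m2


def merge_summaries(a, b):
    """Combine two summaries into the summary of their union."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    # The cross term accounts for the distance between the two batch means.
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return n, mean, m2


def merged_variance(batches, ddof=1):
    """Single-pass variance over an iterable of batches (hypothetical helper)."""
    total = (0, 0.0, 0.0)
    for batch in batches:
        batch = list(batch)
        if batch:  # skip empty batches
            total = merge_summaries(total, batch_summary(batch))
    n, _, m2 = total
    if n - ddof <= 0:
        raise ValueError("not enough data points")
    return m2 / (n - ddof)


stream = [[2.0, 4.0, 4.0], [4.0, 5.0, 5.0], [7.0, 9.0]]
sample_var = merged_variance(stream)               # divides by n - 1
population_var = merged_variance(stream, ddof=0)   # divides by n
```

Each batch is reduced to three numbers before merging, so the full data set never has to be held in memory at once.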

Acknowledgments

During the setup of this project we were heavily influenced and inspired by the works of Hynek Schlawack, in particular his attrs package and his blog posts on testing, packaging, and deploying to PyPI. Thank you for sharing your experiences and insights.

Download files


Source Distribution

statstream-19.1.0.tar.gz (36.0 kB)

Uploaded Source

Built Distribution

statstream-19.1.0-py2.py3-none-any.whl (12.0 kB)

Uploaded Python 2 Python 3

File details

Details for the file statstream-19.1.0.tar.gz.

File metadata

  • Download URL: statstream-19.1.0.tar.gz
  • Upload date:
  • Size: 36.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/2.0.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.6.0.post20191101 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.7.3

File hashes

Hashes for statstream-19.1.0.tar.gz
Algorithm     Hash digest
SHA256        81bc8203ebd95adeea38c4f47fa8b85fcaadc5b5d70e4a5cfcaa7e7ca988fa0e
MD5           16cddc8f25300ea077fc96f2f4828c5d
BLAKE2b-256   ab662b9e63cb51e6f747b0e4528b4ef760d7da9681aba05bb2332a42b699d374


File details

Details for the file statstream-19.1.0-py2.py3-none-any.whl.

File metadata

  • Download URL: statstream-19.1.0-py2.py3-none-any.whl
  • Upload date:
  • Size: 12.0 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/2.0.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.6.0.post20191101 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.7.3

File hashes

Hashes for statstream-19.1.0-py2.py3-none-any.whl
Algorithm     Hash digest
SHA256        73389f00729a682e29001777fcca60daace8fe56a436d4e8ea7c05294e9715bc
MD5           262cdf204538bfe9880f1dc2d3e267c9
BLAKE2b-256   b5676301ac929c2f9cd233db3819f976b185d685678b2b8de419ba4a0f4b7593

