Statistics for Streaming Data
statstream is a lightweight Python package providing data analysis and statistics utilities for streaming data.
Its main goal is to provide single-pass variants of conventional numpy data analysis and statistics functionality for streaming data that is either generated on the fly or too large to be handled at once. Data can be streamed in chunks called mini-batches, which makes statstream useful in combination with machine learning and deep learning packages like keras, tensorflow, or pytorch.
statstream functions consume iterators providing batches of data. They compute statistics of these batches and combine them to obtain statistics for the full data set.
import statstream
mean = statstream.streaming_mean(some_iterable)
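To illustrate the idea of combining per-batch statistics into a statistic for the full data set, here is a minimal sketch in plain Python. It is not statstream's implementation: the names streaming_mean_sketch and batch_generator are hypothetical helpers made up for this example, and batches are assumed to be plain lists of numbers rather than numpy arrays.

```python
def streaming_mean_sketch(batches):
    """Single-pass mean over an iterable of mini-batches (lists of numbers).

    Hypothetical helper for illustration only; not statstream's actual code.
    """
    count = 0    # number of samples seen so far
    mean = 0.0   # running mean of those samples
    for batch in batches:
        n = len(batch)
        if n == 0:
            continue  # an empty batch contributes nothing
        batch_mean = sum(batch) / float(n)
        count += n
        # Weighted update: move the running mean toward the batch mean
        # in proportion to the batch's share of all samples seen so far.
        mean += (batch_mean - mean) * n / count
    if count == 0:
        raise ValueError("the stream contained no data")
    return mean


def batch_generator(data, batch_size):
    """Yield consecutive slices of `data`, simulating a data stream."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]


data = [float(x) for x in range(1, 11)]
result = streaming_mean_sketch(batch_generator(data, 3))
```

Each batch is visited exactly once and only the running count and mean are kept, so memory use does not grow with the length of the stream.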
The Overview and Examples sections of our documentation provide more realistic and complete examples.
Project Information
statstream is released under the MIT license, its documentation lives at Read the Docs, the code on GitHub, and the latest release can be found on PyPI. It’s tested on Python 2.7 and 3.5+.
If you’d like to contribute to statstream you’re most welcome. We have written a short guide to help you get started!
Further Reading
Additional information on the algorithmic aspects of statstream can be found in the following works:
Tony F. Chan, Gene H. Golub, and Randall J. LeVeque, “Updating formulae and a pairwise algorithm for computing sample variances”, 1979
Radim Řehůřek, “Scalability of Semantic Analysis in Natural Language Processing”, 2011
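As a rough illustration of the kind of updating formula discussed by Chan, Golub, and LeVeque, the sketch below merges the count, mean, and sum of squared deviations of two data chunks and uses that merge to compute a variance in a single pass. The function names merge_stats, batch_stats, and streaming_variance_sketch are hypothetical and chosen for this example only; they are not part of statstream, and the package's own implementation may differ.

```python
def batch_stats(batch):
    """Return (count, mean, sum of squared deviations) for one batch."""
    n = len(batch)
    mean = sum(batch) / float(n)
    m2 = sum((x - mean) ** 2 for x in batch)
    return n, mean, m2


def merge_stats(n_a, mean_a, m2_a, n_b, mean_b, m2_b):
    """Combine the statistics of two disjoint chunks A and B."""
    n = n_a + n_b
    if n == 0:
        return 0, 0.0, 0.0
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / float(n)
    # Pairwise update: the cross term accounts for the difference in means.
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / float(n)
    return n, mean, m2


def streaming_variance_sketch(batches, ddof=0):
    """Single-pass variance over an iterable of batches (lists of numbers)."""
    n, mean, m2 = 0, 0.0, 0.0
    for batch in batches:
        if not batch:
            continue  # skip empty batches
        n, mean, m2 = merge_stats(n, mean, m2, *batch_stats(batch))
    if n - ddof <= 0:
        raise ValueError("not enough data to compute the variance")
    return m2 / float(n - ddof)


variance = streaming_variance_sketch([[2.0, 4.0, 4.0], [4.0, 5.0], [5.0, 7.0, 9.0]])
```

Merging per-batch deviations this way avoids the cancellation problems of the naive "mean of squares minus square of mean" formula, which is one reason such update formulas suit streaming settings.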
Acknowledgments
During the setup of this project we were heavily influenced and inspired by the works of Hynek Schlawack and in particular his attrs package and blog posts on testing and packaging and deploying to PyPI. Thank you for sharing your experiences and insights.