Compute statistics and regression in one pass
Project description
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
Description: RunStats: Computing Statistics and Regression in One Pass
=========================================================
`RunStats <http://www.grantjenks.com/docs/runstats/>`_ is an Apache2 licensed
Python module that computes statistics and regression in a single pass.
Long running systems often generate numbers summarizing performance. It could
be the latency of a response or the time between requests. It's often useful to
use these numbers in summary statistics like the arithmetic mean, minimum,
standard deviation, etc. When many values are generated, computing these
summaries can be computationally intensive. It may even be infeasible to keep
every recorded value. In such cases computing online statistics and online
regression is necessary.
In other cases, you may only have one opportunity to observe all the recorded
values. Python's generators work exactly this way. Traditional methods for
calculating the variance and other higher moments requires multiple passes over
the data. With generators, this is not possible and so computing statistics in
a single pass is necessary.
The Python RunStats module was designed for these cases by providing a pair of
classes for computing summary statistics and linear regression in a single
pass. Summary objects work on series which may be larger than memory or disk
space permit. They may also be efficiently combined together to create
aggregate measures.
Features
--------
- Pure-Python
- Fully Documented
- 100% Test Coverage
- Numerically Stable
- Statistics summary computes mean, variance, standard deviation, skewness,
kurtosis, minimum and maximum.
- Regression summary computes slope, intercept and correlation.
- Developed on Python 2.7
- Tested on CPython 2.6, 2.7, 3.2, 3.3, 3.4 and PyPy 2.5+, PyPy3 2.4+
Quickstart
----------
Installing RunStats is simple with
`pip <http://www.pip-installer.org/>`_::
$ pip install runstats
You can access documentation in the interpreter with Python's built-in help
function::
>>> from runstats import Statistics, Regression
>>> help(Statistics)
>>> help(Regression)
Tutorial
--------
The Python runstats module provides two types for computing running Statistics
and Regression. The Regression object leverages Statistics internally for its
calculations. Each can be initialized without arguments::
>>> from runstats import Statistics, Regression
>>> stats = Statistics()
>>> regr = Regression()
Statistics objects support three methods for modification. Use `push` to add
values to the summary, `clear` to reset the summary, and sum to combine
Statistics summaries::
>>> for num in range(10):
... stats.push(num)
>>> stats.mean()
4.5
>>> stats.maximum()
9
>>> stats += stats
>>> stats.mean()
4.5
>>> stats.variance()
8.68421052631579
>>> len(stats)
20
>>> stats.clear()
>>> len(stats)
0
>>> stats.minimum() is None
True
Use the Python built-in `len` for the number of pushed values. Unfortunately
the Python `min` and `max` built-ins may not be used for the minimum and
maximum as sequences are instead expected. There are instead `minimum` and
`maximum` methods which are provided for that purpose::
>>> import random
>>> random.seed(0)
>>> for __ in range(1000):
... stats.push(random.random())
>>> len(stats)
1000
>>> min(stats)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: iteration over non-sequence
>>> stats.minimum()
0.00024069652516689466
>>> stats.maximum()
0.9996851255769114
Statistics summaries provide five measures of a series: mean, variance,
standard deviation, skewness and kurtosis::
>>> stats = Statistics([1, 2, 5, 12, 5, 2, 1])
>>> stats.mean()
4.0
>>> stats.variance()
15.33333333333333
>>> stats.stddev()
3.915780041490243
>>> stats.skewness()
1.33122127314735
>>> stats.kurtosis()
0.5496219281663506
All internal calculations use Python's `float` type.
Like Statistics, the Regression type supports three methods for modification:
`push`, `clear` and sum::
>>> regr.clear()
>>> len(regr)
0
>>> for num in range(10):
... regr.push(num, num + 5)
>>> len(regr)
10
>>> regr.slope()
1.0
>>> more = Regression((num, num + 5) for num in range(10, 20))
>>> total = regr + more
>>> len(total)
20
>>> total.slope()
1.0
>>> total.intercept()
5.0
>>> total.correlation()
1.0
Regression summaries provide three measures of a series of pairs: slope,
intercept and correlation. Note that, as a regression, the points need not
exactly lie on a line::
>>> regr = Regression([(1.2, 1.9), (3, 5.1), (4.9, 8.1), (7, 11)])
>>> regr.slope()
1.5668320150154176
>>> regr.intercept()
0.21850113956294415
>>> regr.correlation()
0.9983810791694997
Both constructors accept an optional iterable that is consumed and pushed into
the summary. Note that you may pass a generator as an iterable and the
generator will be entirely consumed.
All internal calculations are based entirely on the C++ code by John Cook as
posted in a couple of articles:
* `Computing Skewness and Kurtosis in One Pass`_
* `Computing Linear Regression in One Pass`_
.. _`Computing Skewness and Kurtosis in One Pass`: http://www.johndcook.com/blog/skewness_kurtosis/
.. _`Computing Linear Regression in One Pass`: http://www.johndcook.com/blog/running_regression/
Reference and Indices
---------------------
* `RunStats Documentation`_
* `RunStats API Reference`_
* `RunStats at PyPI`_
* `RunStats at GitHub`_
* `RunStats Issue Tracker`_
.. _`RunStats Documentation`: http://www.grantjenks.com/docs/runstats/
.. _`RunStats API Reference`: http://www.grantjenks.com/docs/runstats/api.html
.. _`RunStats at PyPI`: https://pypi.python.org/pypi/runstats/
.. _`RunStats at GitHub`: https://github.com/grantjenks/python_runstats/
.. _`RunStats Issue Tracker`: https://github.com/grantjenks/python_runstats/issues/
License
-------
Copyright 2015 Grant Jenks
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Natural Language :: English
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Classifier: Programming Language :: Python :: 2.6
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.2
Classifier: Programming Language :: Python :: 3.3
Classifier: Programming Language :: Python :: 3.4
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
Description: RunStats: Computing Statistics and Regression in One Pass
=========================================================
`RunStats <http://www.grantjenks.com/docs/runstats/>`_ is an Apache2 licensed
Python module that computes statistics and regression in a single pass.
Long running systems often generate numbers summarizing performance. It could
be the latency of a response or the time between requests. It's often useful to
use these numbers in summary statistics like the arithmetic mean, minimum,
standard deviation, etc. When many values are generated, computing these
summaries can be computationally intensive. It may even be infeasible to keep
every recorded value. In such cases computing online statistics and online
regression is necessary.
In other cases, you may only have one opportunity to observe all the recorded
values. Python's generators work exactly this way. Traditional methods for
calculating the variance and other higher moments requires multiple passes over
the data. With generators, this is not possible and so computing statistics in
a single pass is necessary.
The Python RunStats module was designed for these cases by providing a pair of
classes for computing summary statistics and linear regression in a single
pass. Summary objects work on series which may be larger than memory or disk
space permit. They may also be efficiently combined together to create
aggregate measures.
Features
--------
- Pure-Python
- Fully Documented
- 100% Test Coverage
- Numerically Stable
- Statistics summary computes mean, variance, standard deviation, skewness,
kurtosis, minimum and maximum.
- Regression summary computes slope, intercept and correlation.
- Developed on Python 2.7
- Tested on CPython 2.6, 2.7, 3.2, 3.3, 3.4 and PyPy 2.5+, PyPy3 2.4+
Quickstart
----------
Installing RunStats is simple with
`pip <http://www.pip-installer.org/>`_::
$ pip install runstats
You can access documentation in the interpreter with Python's built-in help
function::
>>> from runstats import Statistics, Regression
>>> help(Statistics)
>>> help(Regression)
Tutorial
--------
The Python runstats module provides two types for computing running Statistics
and Regression. The Regression object leverages Statistics internally for its
calculations. Each can be initialized without arguments::
>>> from runstats import Statistics, Regression
>>> stats = Statistics()
>>> regr = Regression()
Statistics objects support three methods for modification. Use `push` to add
values to the summary, `clear` to reset the summary, and sum to combine
Statistics summaries::
>>> for num in range(10):
... stats.push(num)
>>> stats.mean()
4.5
>>> stats.maximum()
9
>>> stats += stats
>>> stats.mean()
4.5
>>> stats.variance()
8.68421052631579
>>> len(stats)
20
>>> stats.clear()
>>> len(stats)
0
>>> stats.minimum() is None
True
Use the Python built-in `len` for the number of pushed values. Unfortunately
the Python `min` and `max` built-ins may not be used for the minimum and
maximum as sequences are instead expected. There are instead `minimum` and
`maximum` methods which are provided for that purpose::
>>> import random
>>> random.seed(0)
>>> for __ in range(1000):
... stats.push(random.random())
>>> len(stats)
1000
>>> min(stats)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: iteration over non-sequence
>>> stats.minimum()
0.00024069652516689466
>>> stats.maximum()
0.9996851255769114
Statistics summaries provide five measures of a series: mean, variance,
standard deviation, skewness and kurtosis::
>>> stats = Statistics([1, 2, 5, 12, 5, 2, 1])
>>> stats.mean()
4.0
>>> stats.variance()
15.33333333333333
>>> stats.stddev()
3.915780041490243
>>> stats.skewness()
1.33122127314735
>>> stats.kurtosis()
0.5496219281663506
All internal calculations use Python's `float` type.
Like Statistics, the Regression type supports three methods for modification:
`push`, `clear` and sum::
>>> regr.clear()
>>> len(regr)
0
>>> for num in range(10):
... regr.push(num, num + 5)
>>> len(regr)
10
>>> regr.slope()
1.0
>>> more = Regression((num, num + 5) for num in range(10, 20))
>>> total = regr + more
>>> len(total)
20
>>> total.slope()
1.0
>>> total.intercept()
5.0
>>> total.correlation()
1.0
Regression summaries provide three measures of a series of pairs: slope,
intercept and correlation. Note that, as a regression, the points need not
exactly lie on a line::
>>> regr = Regression([(1.2, 1.9), (3, 5.1), (4.9, 8.1), (7, 11)])
>>> regr.slope()
1.5668320150154176
>>> regr.intercept()
0.21850113956294415
>>> regr.correlation()
0.9983810791694997
Both constructors accept an optional iterable that is consumed and pushed into
the summary. Note that you may pass a generator as an iterable and the
generator will be entirely consumed.
All internal calculations are based entirely on the C++ code by John Cook as
posted in a couple of articles:
* `Computing Skewness and Kurtosis in One Pass`_
* `Computing Linear Regression in One Pass`_
.. _`Computing Skewness and Kurtosis in One Pass`: http://www.johndcook.com/blog/skewness_kurtosis/
.. _`Computing Linear Regression in One Pass`: http://www.johndcook.com/blog/running_regression/
Reference and Indices
---------------------
* `RunStats Documentation`_
* `RunStats API Reference`_
* `RunStats at PyPI`_
* `RunStats at GitHub`_
* `RunStats Issue Tracker`_
.. _`RunStats Documentation`: http://www.grantjenks.com/docs/runstats/
.. _`RunStats API Reference`: http://www.grantjenks.com/docs/runstats/api.html
.. _`RunStats at PyPI`: https://pypi.python.org/pypi/runstats/
.. _`RunStats at GitHub`: https://github.com/grantjenks/python_runstats/
.. _`RunStats Issue Tracker`: https://github.com/grantjenks/python_runstats/issues/
License
-------
Copyright 2015 Grant Jenks
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Natural Language :: English
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Classifier: Programming Language :: Python :: 2.6
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.2
Classifier: Programming Language :: Python :: 3.3
Classifier: Programming Language :: Python :: 3.4
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
runstats-0.6.0.tar.gz
(114.2 kB
view details)
Built Distribution
File details
Details for the file runstats-0.6.0.tar.gz
.
File metadata
- Download URL: runstats-0.6.0.tar.gz
- Upload date:
- Size: 114.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 7ba3cb81597c1f7d094edc41f8eeade556378947ae42fbdff29ba3b65043f32f |
|
MD5 | bd0867f3e275a82c52304c6108a2fb3f |
|
BLAKE2b-256 | 2c49ba1dbe6f98bef7a8e5301d8e77da7dbfa5e69b76112eaa4d08d3b4505457 |
File details
Details for the file runstats-0.6.0-py2.py3-none-any.whl
.
File metadata
- Download URL: runstats-0.6.0-py2.py3-none-any.whl
- Upload date:
- Size: 9.5 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 08fb823097e1ebf8fd3931e876aedea5e8349416e5bdb5c8d4724f256db8c80f |
|
MD5 | 6341220da5c143b853f8563f517515b9 |
|
BLAKE2b-256 | 98430028523a69561fc28291de1a0b1bc705968c8c1004463f8f6a3873174db5 |