scikit-gof·PyPI

Variations on goodness of fit tests for SciPy.

These details have not been verified by PyPI

Project links

Homepage

Development Status
- 4 - Beta
Intended Audience
- Science/Research
License
- OSI Approved :: MIT License
Programming Language
- Python :: 2.7
- Python :: 3.5
Topic
- Scientific/Engineering :: Information Analysis

Project description

Provides variants of Kolmogorov-Smirnov, Cramer-von Mises and Anderson-Darling goodness of fit tests for fully specified continuous distributions.

Example

>>> from scipy.stats import norm, uniform
>>> from skgof import ks_test, cvm_test, ad_test

>>> ks_test((1, 2, 3), uniform(0, 4))
GofResult(statistic=0.25, pvalue=0.97...)

>>> cvm_test((1, 2, 3), uniform(0, 4))
GofResult(statistic=0.04..., pvalue=0.95...)

>>> data = norm(0, 1).rvs(random_state=1, size=100)
>>> ad_test(data, norm(0, 1))
GofResult(statistic=0.75..., pvalue=0.51...)
>>> ad_test(data, norm(.3, 1))
GofResult(statistic=3.52..., pvalue=0.01...)

Simple tests

Scikit-gof currently offers only three nonparametric tests, each comparing a sample with a reference probability distribution. These are:

ks_test(): Kolmogorov-Smirnov supremum statistic; almost the same as scipy.stats.kstest() with alternative='two-sided' but with (hopefully) somewhat more precise p-value calculation;
cvm_test(): Cramer-von Mises L2 statistic, with a rather crude estimation of the statistic distribution (but seemingly the best available);
ad_test(): Anderson-Darling statistic with a fair approximation of its distribution; unlike the composite scipy.stats.anderson() this one needs a fully specified hypothesized distribution.
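
To make the three statistics concrete, here is a minimal sketch of their textbook formulas in plain Python. It is not skgof's implementation; the function names ks_stat_sketch, cvm_stat_sketch and ad_stat_sketch are hypothetical. Each takes the reference CDF evaluated at the sorted sample.

```python
from math import log

def ks_stat_sketch(u):
    """Kolmogorov-Smirnov supremum distance for sorted CDF values u."""
    n = len(u)
    d_plus = max((i + 1) / n - v for i, v in enumerate(u))
    d_minus = max(v - i / n for i, v in enumerate(u))
    return max(d_plus, d_minus)

def cvm_stat_sketch(u):
    """Cramer-von Mises L2 statistic for sorted CDF values u."""
    n = len(u)
    return 1 / (12 * n) + sum(
        (v - (2 * i + 1) / (2 * n)) ** 2 for i, v in enumerate(u))

def ad_stat_sketch(u):
    """Anderson-Darling statistic for sorted CDF values u (all in (0, 1))."""
    n = len(u)
    total = sum((2 * i + 1) * (log(u[i]) + log(1 - u[n - 1 - i]))
                for i in range(n))
    return -n - total / n

# CDF of the uniform distribution on [0, 4] at the sorted sample (1, 2, 3).
u = [x / 4 for x in (1, 2, 3)]
ks_value = ks_stat_sketch(u)
cvm_value = cvm_stat_sketch(u)
ad_value = ad_stat_sketch(u)
```

For this sample the sketch gives a Kolmogorov-Smirnov statistic of 0.25 and a Cramer-von Mises statistic of about 0.0417, in line with the statistic values shown in the Example section.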

Simple test functions use a common interface, taking as the first argument the data (sample) to be compared and as the second argument a frozen scipy.stats distribution. They return a named tuple with two fields: statistic and pvalue.
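
The returned named tuple can be read by field name or unpacked like an ordinary tuple. The sketch below uses a stand-in type, GofResultSketch (a hypothetical name, not the library's class), filled with values truncated from the Example section, only to show both access styles.

```python
from collections import namedtuple

# Stand-in for the result type; skgof returns its own GofResult.
GofResultSketch = namedtuple('GofResultSketch', ('statistic', 'pvalue'))

# Illustrative values, truncated from the ks_test example output.
result = GofResultSketch(statistic=0.25, pvalue=0.97)

# Tuple unpacking and attribute access give the same numbers.
statistic, pvalue = result
reject = result.pvalue < .05
```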

For a simple example consider the hypothesis that the sample (.4, .1, .7) comes from the uniform distribution on [0, 1]:

if ks_test((.4, .1, .7), uniform(0, 1)).pvalue < .05:
    print("Hypothesis rejected at the 5% significance level.")

If your samples are very large and already sorted, pass assume_sorted=True to skip the redundant re-sorting.

Extending

Simple tests are composed of two phases: calculating the test statistic and determining how likely the resulting value is (under the hypothesis). New tests may be defined by providing a new statistic calculation routine or an alternative distribution for a statistic.

Functions calculating statistics are given evaluations of the reference cumulative distribution function on sorted data and are expected to return a single number. For a simple test, if the sample indeed comes from the hypothesized (continuous) distribution, the values passed to the function should be uniformly distributed over [0, 1].
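
The uniformity claim can be checked numerically. The following sketch, using only the standard library, draws an exponential sample (the rate of 2.0 and the seed are arbitrary choices for illustration), applies the matching CDF to the sorted data, and looks at what a statistic function would receive.

```python
import random
from math import exp

random.seed(0)
rate = 2.0  # arbitrary rate, chosen only for this illustration

# Sorted sample from the hypothesized exponential distribution.
sample = sorted(random.expovariate(rate) for _ in range(1000))

def expon_cdf(x):
    """CDF of the exponential distribution with the rate above."""
    return 1.0 - exp(-rate * x)

# These are the values a statistic function would be given.
u = [expon_cdf(x) for x in sample]

# Under the hypothesis they behave like sorted uniform draws on [0, 1],
# so their mean should be close to 1/2.
mean_u = sum(u) / len(u)
```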

Here is a simplistic example of what a statistic function might look like:

def ex_stat(data):
    # data: NumPy array of reference CDF values evaluated on the sorted sample
    return abs(data.sum() - data.size / 2)

Statistic functions for the provided tests, ks_stat(), cvm_stat(), and ad_stat(), can be imported from skgof.ecdfgof.

Statistic distributions should derive from rv_continuous and implement at least one of the abstract _cdf() or _pdf() methods (you might also consider directly coding _sf() for increased precision of results close to 1). For example:

from numpy import sqrt
from scipy.stats import norm, rv_continuous

class ex_unif_gen(rv_continuous):
    def _cdf(self, statistic, samples):
        return 1 - 2 * norm.cdf(-statistic, scale=sqrt(samples / 12))

ex_unif = ex_unif_gen(a=0, name='ex-unif', shapes='samples')
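
The _cdf() above treats the sum of n uniform values as approximately normal with mean n/2 and variance n/12, so the absolute deviation of the sum from n/2 is approximately half-normal. The sketch below compares that formula with a simulation in plain Python; the sample count, trial count, seed and evaluation point are arbitrary, and ex_unif_cdf is a hypothetical stand-alone re-implementation using math.erf rather than scipy.

```python
import random
from math import erf, sqrt

def ex_unif_cdf(statistic, samples):
    """1 - 2 * Phi(-statistic / sigma) with sigma = sqrt(samples / 12)."""
    sigma = sqrt(samples / 12)
    phi = 0.5 * (1 + erf(-statistic / (sigma * sqrt(2))))
    return 1 - 2 * phi

random.seed(1)
samples = 12    # values per simulated sample (makes sigma equal to 1)
trials = 20000  # number of simulated samples

# Simulate the example statistic |sum - n/2| under the hypothesis.
stats = [abs(sum(random.random() for _ in range(samples)) - samples / 2)
         for _ in range(trials)]

point = 1.0
empirical = sum(s <= point for s in stats) / trials
approx = ex_unif_cdf(point, samples)
```

The two numbers should agree only roughly, since the normal form is an approximation for finite sample sizes.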

The provided distributions live in separate modules, respectively ksdist, cvmdist, and addist.

Once you have a statistic calculation function and a statistic distribution the two parts can be combined using simple_test:

from functools import partial
from skgof.ecdfgof import simple_test

ex_test = partial(simple_test, stat=ex_stat, pdist=ex_unif)
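
As a rough picture of what such a combination does, the sketch below wires the two phases together in plain Python. It is an assumption about the general shape, not skgof's simple_test: the names two_phase_test, sum_stat and sum_sf and the cdf/sf parameters are hypothetical, and the p-value is taken as the survival function of the statistic.

```python
from functools import partial
from math import erf, sqrt

def two_phase_test(data, cdf, stat, sf, assume_sorted=False):
    """Phase 1: compute the statistic; phase 2: convert it to a p-value."""
    values = list(data) if assume_sorted else sorted(data)
    u = [cdf(x) for x in values]          # reference CDF on sorted data
    statistic = stat(u)
    return statistic, sf(statistic, len(u))

def sum_stat(u):
    """Plain-Python analogue of the example statistic."""
    return abs(sum(u) - len(u) / 2)

def sum_sf(statistic, samples):
    """Survival function 2 * Phi(-statistic / sigma) of the example distribution."""
    sigma = sqrt(samples / 12)
    return 1 + erf(-statistic / (sigma * sqrt(2)))

# Fix the statistic and its distribution, leaving data and cdf open.
sum_test = partial(two_phase_test, stat=sum_stat, sf=sum_sf)

def unit_uniform_cdf(x):
    """CDF of the uniform distribution on [0, 1]."""
    return min(max(x, 0.0), 1.0)

statistic, pvalue = sum_test((.4, .1, .7), unit_uniform_cdf)
```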

Exercise: The example test has a fundamental flaw. Can you point it out?

Installation

pip install scikit-gof

Requires recent versions of Python (> 3), NumPy (>= 1.10) and SciPy.

Please fix or point out any errors, inaccuracies or typos you notice.

Release history

- 0.1.3 (this version): Feb 15, 2017
- 0.1.2: Apr 10, 2016
- 0.1.1: Feb 10, 2016
- 0.0.2: Jan 21, 2016
- 0.0.1: Oct 29, 2015

Download files

Source Distribution

scikit-gof-0.1.3.tar.gz (10.3 kB)

Uploaded Feb 15, 2017 Source

File details

Details for the file scikit-gof-0.1.3.tar.gz.

File metadata

Download URL: scikit-gof-0.1.3.tar.gz
Upload date: Feb 15, 2017
Size: 10.3 kB
Tags: Source
Uploaded using Trusted Publishing? No

File hashes

Hashes for scikit-gof-0.1.3.tar.gz
Algorithm	Hash digest
SHA256	`092e3bcbc8736dd19793cf3bbaa9a1b1e0f8a4fd0a2f0f90351d12e37b54779e`
MD5	`af83ddafcfc81a41cbccb5bf49761b9c`
BLAKE2b-256	`e0092c2a5af0fe9901bed91ca862fe4d678099b41c34db0efd6036925c93ad9a`
