Skip to main content

Python package for probability density function fitting and hypothesis testing.

Project description

distfit

Python PyPI Version License

  • Python package for probability density fitting and hypothesis testing.
  • Probability density fitting is the fitting of a probability distribution to a series of data concerning the repeated measurement of a variable phenomenon. distfit scores each of the 89 different distributions for the fit wih the emperical distribution and return the best scoring distribution.

Four functions are available:

# To make the distribution fit with the input data
.fit()
# Compute probabilities using the fitted distribution
.proba_parametric()
# Compute probabilities in an emperical manner
.proba_emperical()
# Plot results
.plot()

See below for the exact working of the functions

Contents

Installation

  • Install distfit from PyPI (recommended). distfit is compatible with Python 3.6+ and runs on Linux, MacOS X and Windows.
  • It is distributed under the MIT license.

Requirements

pip install numpy pandas matplotlib

Quick Start

pip install distfit
  • Alternatively, install distfit from the GitHub source:
git clone https://github.com/erdogant/distfit.git
cd distfit
python setup.py install

Import distfit package

import distfit as dist

Generate some random data:

import numpy as np
data=np.random.normal(5, 8, [1000])

data looks like this:

array([[-12.65284521,  -3.81514715,  -4.53613236],
       [ 11.5865475 ,   2.42547023,   6.6395518 ],
       [  3.82076163,   6.65765319,   9.95795751],
       ...,
       [  3.65728268,   7.298237  ,  -4.25641318],
       [  7.51820943,  16.26147929,  -0.60033084],
       [  2.49165326,   3.97880574,   7.98986818]])

Example fitting best scoring distribution to input-data:

model = dist.fit(data)
dist.plot(model)

Output looks like this:

[DISTFIT] Checking for [norm] [SSE:0.000152]
[DISTFIT] Checking for [expon] [SSE:0.021767] 
[DISTFIT] Checking for [pareto] [SSE:0.054325] 
[DISTFIT] Checking for [dweibull] [SSE:0.000721]
[DISTFIT] Checking for [t] [SSE:0.000139]
[DISTFIT] Checking for [genextreme] [SSE:0.050649]
[DISTFIT] Checking for [gamma] [SSE:0.000152]
[DISTFIT] Checking for [lognorm] [SSE:0.000156]
[DISTFIT] Checking for [beta] [SSE:0.000152]
[DISTFIT] Checking for [uniform] [SSE:0.015671] 
[DISTFIT] Estimated distribution: t [loc:5.239912, scale:7.871518]

note that the best fit should be [normal], as this was also the input data. 
However, many other distributions can be very similar with specific loc/scale parameters. 
In this case, the t-distribution scored slightly better then normal. The normal distribution 
scored similar to gamma and beta which is not strange to see. 
If you dont understand why, do some homework first ;)

Example Compute probability whether values are of interest compared 95%CII of the data distribution:

expdata=[-20,-12,-8,0,1,2,3,5,10,20,30,35]
# Use fitted model
model_P = dist.proba_parametric(expdata, data, model=model)
# Make plot
dist.plot(model)

# Its also possible to do the distribution fit in the proba_ function:
model_P = dist.proba_parametric(expdata, data)

Citation

Please cite distfit in your publications if this is useful for your research. Here is an example BibTeX entry:

@misc{erdogant2019distfit,
  title={distfit},
  author={Erdogan Taskesen},
  year={2019},
  howpublished={\url{https://github.com/erdogant/distfit}},
}

Maintainers

Contribute

  • Contributions are welcome.

© Copyright

See LICENSE for details.

Project details


Release history Release notifications

This version

0.1.0

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Files for distfit, version 0.1.0
Filename, size File type Python version Upload date Hashes
Filename, size distfit-0.1.0-py3-none-any.whl (20.3 kB) File type Wheel Python version py3 Upload date Hashes View hashes
Filename, size distfit-0.1.0.tar.gz (33.4 kB) File type Source Python version None Upload date Hashes View hashes

Supported by

Elastic Elastic Search Pingdom Pingdom Monitoring Google Google BigQuery Sentry Sentry Error logging AWS AWS Cloud computing DataDog DataDog Monitoring Fastly Fastly CDN DigiCert DigiCert EV certificate StatusPage StatusPage Status page