Python package for probability density function fitting and hypothesis testing.
distfit
- Python package for probability density fitting and hypothesis testing.
- Probability density fitting is the fitting of a probability distribution to a series of data concerning the repeated measurement of a variable phenomenon. distfit scores how well each of 89 different distributions fits the empirical distribution and returns the best-scoring one.
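The scoring idea described above (fit each candidate, compare it with the empirical distribution, keep the best) can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not distfit's implementation: it assumes a 50-bin histogram as the empirical density, only three candidates (normal, exponential shifted to the sample minimum, uniform) fitted with closed-form estimates, and the sum of squared errors (SSE) as the score. The helper names `histogram_density`, `fit_candidates` and `best_fit` are hypothetical.

```python
import math
import random
import statistics


def histogram_density(data, bins=50):
    """Return bin centres and density-normalised heights (total area = 1)."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in data:
        # Clamp the maximum value into the last bin.
        idx = min(int((x - lo) / width), bins - 1)
        counts[idx] += 1
    n = len(data)
    centres = [lo + (i + 0.5) * width for i in range(bins)]
    heights = [c / (n * width) for c in counts]
    return centres, heights


def fit_candidates(data):
    """Hypothetical helper: fit a few candidates, return {name: pdf}."""
    mu = statistics.fmean(data)
    sigma = statistics.pstdev(data)
    lo, hi = min(data), max(data)
    norm = statistics.NormalDist(mu, sigma)
    scale = mu - lo  # exponential scale when shifted to start at the minimum

    def expon_pdf(x):
        return math.exp(-(x - lo) / scale) / scale if x >= lo else 0.0

    def uniform_pdf(x):
        return 1.0 / (hi - lo) if lo <= x <= hi else 0.0

    return {"norm": norm.pdf, "expon": expon_pdf, "uniform": uniform_pdf}


def best_fit(data, bins=50):
    """Score every candidate by SSE against the histogram; lowest wins."""
    centres, heights = histogram_density(data, bins)
    scores = {}
    for name, pdf in fit_candidates(data).items():
        scores[name] = sum((h - pdf(c)) ** 2 for c, h in zip(centres, heights))
    best = min(scores, key=scores.get)
    return best, scores


rng = random.Random(0)
data = [rng.gauss(5, 8) for _ in range(1000)]
best, scores = best_fit(data)
print(best, {name: round(sse, 6) for name, sse in scores.items()})
```

A full implementation would try many more distributions and use maximum-likelihood fitting for those without closed-form estimates; the SSE comparison step stays the same.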
Four functions are available:
# To make the distribution fit with the input data
.fit()
# Compute probabilities using the fitted distribution
.proba_parametric()
# Compute probabilities in an empirical manner
.proba_emperical()
# Plot results
.plot()
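The empirical route (`.proba_emperical()`) avoids assuming any distribution. One common way to do this is a percentile rank: compare each new value with the empirical CDF of the data and flag it when it falls in the outer tails. The sketch below is an assumption about the general technique, not necessarily what distfit does internally; the helper name `empirical_tail_flags` and the two-sided `alpha=0.05` default are hypothetical choices.

```python
import bisect
import random


def empirical_tail_flags(new_values, data, alpha=0.05):
    """Hypothetical helper: flag values in the outer alpha/2 tails of the empirical CDF."""
    ranked = sorted(data)
    n = len(ranked)
    results = []
    for x in new_values:
        # Empirical CDF: fraction of observations <= x.
        cdf = bisect.bisect_right(ranked, x) / n
        if cdf <= alpha / 2:
            flag = "down"
        elif cdf >= 1 - alpha / 2:
            flag = "up"
        else:
            flag = None
        results.append((x, cdf, flag))
    return results


# Usage on simulated data (mean 5, standard deviation 8).
rng = random.Random(0)
data = [rng.gauss(5, 8) for _ in range(1000)]
for x, cdf, flag in empirical_tail_flags([-20, 0, 5, 35], data):
    print(x, round(cdf, 3), flag)
```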
See below for how each function works.
Installation
- Install distfit from PyPI (recommended). distfit is compatible with Python 3.6+ and runs on Linux, MacOS X and Windows.
- It is distributed under the MIT license.
Requirements
pip install numpy pandas matplotlib
Quick Start
pip install distfit
- Alternatively, install distfit from the GitHub source:
git clone https://github.com/erdogant/distfit.git
cd distfit
python setup.py install
Import the distfit package:
import distfit as dist
Generate some random data:
import numpy as np
data=np.random.normal(5, 8, [1000])
The values are random, so your output will differ. A printed excerpt looks like this:
array([[-12.65284521, -3.81514715, -4.53613236],
[ 11.5865475 , 2.42547023, 6.6395518 ],
[ 3.82076163, 6.65765319, 9.95795751],
...,
[ 3.65728268, 7.298237 , -4.25641318],
[ 7.51820943, 16.26147929, -0.60033084],
[ 2.49165326, 3.97880574, 7.98986818]])
Example: fit the best-scoring distribution to the input data:
model = dist.fit(data)
dist.plot(model)
Output looks like this:
[DISTFIT] Checking for [norm] [SSE:0.000152]
[DISTFIT] Checking for [expon] [SSE:0.021767]
[DISTFIT] Checking for [pareto] [SSE:0.054325]
[DISTFIT] Checking for [dweibull] [SSE:0.000721]
[DISTFIT] Checking for [t] [SSE:0.000139]
[DISTFIT] Checking for [genextreme] [SSE:0.050649]
[DISTFIT] Checking for [gamma] [SSE:0.000152]
[DISTFIT] Checking for [lognorm] [SSE:0.000156]
[DISTFIT] Checking for [beta] [SSE:0.000152]
[DISTFIT] Checking for [uniform] [SSE:0.015671]
[DISTFIT] Estimated distribution: t [loc:5.239912, scale:7.871518]
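Note that the best fit should be [norm], as the input data was drawn from a normal distribution.
However, many other distributions can be very similar with specific loc/scale parameters.
In this case, the t-distribution scored slightly better than normal; with many degrees of freedom, the t-distribution is practically identical to the normal distribution.
The normal distribution scored similar to gamma and beta, which is not strange to see: a gamma distribution with a large shape parameter, and a beta distribution with large shape parameters, both approach a normal shape.
If you don't understand why, do some homework first ;)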
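The point that other distributions can closely mimic normal data can be checked numerically. The sketch below uses only the standard library and the parameters of the example data (mean 5, standard deviation 8). It builds a gamma distribution whose mean and standard deviation are matched to the normal, and a t-distribution with the same loc/scale, then reports the largest pdf difference from the normal on a grid. The helper names `gamma_pdf`, `t_pdf` and `max_gap` and the grid range are hypothetical choices for illustration.

```python
import math
from statistics import NormalDist

MU, SIGMA = 5.0, 8.0
normal = NormalDist(MU, SIGMA)


def gamma_pdf(x, a, loc, scale):
    """Gamma pdf with shape a, shifted by loc and stretched by scale."""
    z = (x - loc) / scale
    if z <= 0:
        return 0.0
    return math.exp((a - 1) * math.log(z) - z - math.lgamma(a)) / scale


def t_pdf(x, df, loc, scale):
    """Student's t pdf with df degrees of freedom, location and scale."""
    z = (x - loc) / scale
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(z * z / df)) / scale


def max_gap(pdf, lo=-30.0, hi=40.0, steps=701):
    """Largest absolute difference from the normal pdf on an even grid."""
    xs = (lo + i * (hi - lo) / (steps - 1) for i in range(steps))
    return max(abs(pdf(x) - normal.pdf(x)) for x in xs)


for a in (4, 40, 400):
    # Match gamma mean (loc + a*scale) and sd (sqrt(a)*scale) to the normal.
    scale = SIGMA / math.sqrt(a)
    loc = MU - a * scale
    print(f"gamma a={a}: max gap {max_gap(lambda x: gamma_pdf(x, a, loc, scale)):.5f}")

for df in (3, 30, 300):
    # Same loc/scale as the normal; only the degrees of freedom change.
    print(f"t df={df}: max gap {max_gap(lambda x: t_pdf(x, df, MU, SIGMA)):.5f}")
```

The gaps shrink as the gamma shape parameter and the t degrees of freedom grow, which is why their SSE scores can land so close to the normal's.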
Example: compute whether new values are of interest compared with the 95% confidence interval (CI) of the data distribution:
expdata=[-20,-12,-8,0,1,2,3,5,10,20,30,35]
# Use fitted model
model_P = dist.proba_parametric(expdata, data, model=model)
# Make plot
dist.plot(model)
# It's also possible to do the distribution fit inside the proba_ function:
model_P = dist.proba_parametric(expdata, data)
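The parametric check can be sketched with `statistics.NormalDist` from the standard library. This is a minimal sketch assuming the fitted distribution is normal with loc 5 and scale 8 (the parameters used to generate the example data) and a two-sided 95% interval. It is not distfit's implementation, which works with whichever distribution scored best; the helper name `parametric_tail_flags` is hypothetical.

```python
from statistics import NormalDist


def parametric_tail_flags(new_values, dist, alpha=0.05):
    """Hypothetical helper: flag values outside the central (1 - alpha) interval."""
    lower = dist.inv_cdf(alpha / 2)
    upper = dist.inv_cdf(1 - alpha / 2)
    results = []
    for x in new_values:
        p_lower = dist.cdf(x)  # P(X <= x) under the fitted distribution
        # Two-sided p-value: twice the smaller tail probability, capped at 1.
        p_two_sided = min(1.0, 2 * min(p_lower, 1 - p_lower))
        if x < lower:
            flag = "down"
        elif x > upper:
            flag = "up"
        else:
            flag = None
        results.append((x, round(p_two_sided, 4), flag))
    return lower, upper, results


fitted = NormalDist(mu=5, sigma=8)  # assumed fit; replace with your own estimates
expdata = [-20, -12, -8, 0, 1, 2, 3, 5, 10, 20, 30, 35]
lower, upper, results = parametric_tail_flags(expdata, fitted)
print(f"95% interval: [{lower:.2f}, {upper:.2f}]")  # → 95% interval: [-10.68, 20.68]
for x, p, flag in results:
    print(x, p, flag)
```

With these assumed parameters, -20 and -12 fall below the interval and 30 and 35 above it, while 20 stays just inside.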
Citation
Please cite distfit in your publications if this is useful for your research. Here is an example BibTeX entry:
@misc{erdogant2019distfit,
title={distfit},
author={Erdogan Taskesen},
year={2019},
howpublished={\url{https://github.com/erdogant/distfit}},
}
Maintainers
- Erdogan Taskesen, github: erdogant
Contribute
- Contributions are welcome.
© Copyright
See LICENSE for details.
Download files
Source Distribution
distfit-0.1.0.tar.gz (33.4 kB)
Built Distribution
distfit-0.1.0-py3-none-any.whl (20.3 kB)