Python package for probability density function fitting and hypothesis testing.
distfit
- Python package for probability density fitting and hypothesis testing.
- Probability density fitting is the fitting of a probability distribution to a series of data concerning the repeated measurement of a variable phenomenon.
- distfit scores the fit of up to 89 different distributions against the empirical distribution and returns the best-scoring distribution.
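The scoring idea can be sketched with SciPy alone. This is a minimal illustration under assumptions, not distfit's implementation. It fits a handful of `scipy.stats` distributions by maximum likelihood, compares each fitted PDF with a density-normalized histogram of the data at the bin centres, and ranks the candidates by the sum of squared errors (SSE). The candidate list, the bin count and the helper name `score_distributions` are hypothetical choices made for this example.

```python
import numpy as np
from scipy import stats

def score_distributions(data, names=("norm", "t", "gamma", "expon", "uniform"), bins=50):
    """Hypothetical helper: fit each scipy.stats distribution and rank by SSE (lowest first)."""
    data = np.asarray(data, dtype=float).ravel()
    # Empirical density: histogram normalized so that its total area is 1
    density, edges = np.histogram(data, bins=bins, density=True)
    centres = (edges[:-1] + edges[1:]) / 2
    results = []
    for name in names:
        dist = getattr(stats, name)
        params = dist.fit(data)  # maximum-likelihood estimates: shape(s), loc, scale
        pdf = dist.pdf(centres, *params)
        sse = float(np.sum((density - pdf) ** 2))
        results.append((name, sse, params))
    # The first entry is the best-scoring distribution
    return sorted(results, key=lambda r: r[1])

rng = np.random.default_rng(0)
X = rng.normal(0, 2, size=(100, 10))
for name, sse, params in score_distributions(X):
    print(f"{name:10s} SSE={sse:.5f}")
```

As in the distfit output further below, several flexible distributions may score close to the normal one; clearly mismatched shapes such as `expon` and `uniform` end up with a much larger SSE.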
Functionalities
The distfit library is built around a class to keep usage simple.
# Import library
from distfit import distfit
dist = distfit()        # Initialize; desired parameters can be specified here
dist.fit_transform(X)   # Fit distributions on the empirical data X
dist.predict(y)         # Compute the probability of the response values y
dist.plot()             # Plot the best-fitting distribution (y is included if a prediction was made)
Installation
- Install distfit from PyPI (recommended). distfit is compatible with Python 3.6+ and runs on Linux, MacOS X and Windows.
- It is distributed under the MIT license.
Install from PyPI
pip install distfit
Install directly from the GitHub source (beta version)
pip install git+https://github.com/erdogant/distfit
Install by cloning (beta version)
git clone https://github.com/erdogant/distfit.git
cd distfit
pip install -U .
Check version number
import distfit
print(distfit.__version__)
Examples
Import the distfit library
from distfit import distfit
Create some random data and model it using the default parameters:
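import numpy as np
X = np.random.normal(0, 2, [100,10])  # 100x10 samples from a normal distribution (mean 0, std 2)
y = [-8,-6,0,1,2,3,4,5,6]             # Values for which to compute probabilities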
Specify the distfit parameters. In this example nothing is specified, which means that all parameters are set to their defaults.
dist = distfit()
dist.fit_transform(X)
dist.plot()
# Prints to the screen:
# [distfit] >fit..
# [distfit] >transform..
# [distfit] >[norm ] [SSE: 0.0133619] [loc=-0.059 scale=2.031]
# [distfit] >[expon ] [SSE: 0.3911576] [loc=-6.213 scale=6.154]
# [distfit] >[pareto ] [SSE: 0.6755185] [loc=-7.965 scale=1.752]
# [distfit] >[dweibull ] [SSE: 0.0183543] [loc=-0.053 scale=1.726]
# [distfit] >[t ] [SSE: 0.0133619] [loc=-0.059 scale=2.031]
# [distfit] >[genextreme] [SSE: 0.0115116] [loc=-0.830 scale=1.964]
# [distfit] >[gamma ] [SSE: 0.0111372] [loc=-19.843 scale=0.209]
# [distfit] >[lognorm ] [SSE: 0.0111236] [loc=-29.689 scale=29.561]
# [distfit] >[beta ] [SSE: 0.0113012] [loc=-12.340 scale=41.781]
# [distfit] >[uniform ] [SSE: 0.2481737] [loc=-6.213 scale=12.281]
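Note that one would expect the normal distribution [norm] to fit best, as the input data were drawn from it. However, with specific loc/scale parameters many other distributions can be very similar: in the output above, lognorm, gamma, beta and genextreme even obtain a slightly lower SSE than norm. It is not unusual to see the gamma and beta distributions near the top, as these are the "barbapapas" among the distributions: they can take on many shapes. Let's print the summary of the fitted distributions with their sum of squared errors (SSE).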
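Why can other distributions score almost as well as the normal one? Some families contain the normal distribution as a limiting case. The sketch below uses plain SciPy, independent of distfit, and its parameter values are chosen only for illustration. It compares the normal PDF (mean 0, standard deviation 2) with a Student's t that has very many degrees of freedom, and with a gamma distribution whose large shape parameter and matching loc/scale give the same mean and standard deviation.

```python
import numpy as np
from scipy import stats

x = np.linspace(-8, 8, 401)
normal_pdf = stats.norm.pdf(x, loc=0, scale=2)

# Student's t approaches the normal as the degrees of freedom grow
t_pdf = stats.t.pdf(x, df=1e6, loc=0, scale=2)

# A gamma with shape a has mean loc + a*scale and std sqrt(a)*scale.
# Choose loc/scale so that mean = 0 and std = 2; its skewness 2/sqrt(a) is small for large a.
a = 10_000
scale = 2 / np.sqrt(a)
loc = -a * scale
gamma_pdf = stats.gamma.pdf(x, a, loc=loc, scale=scale)

t_diff = np.max(np.abs(t_pdf - normal_pdf))
gamma_diff = np.max(np.abs(gamma_pdf - normal_pdf))
print("max |t - norm|    :", t_diff)
print("max |gamma - norm|:", gamma_diff)
```

Both differences are small compared with the peak of the normal PDF (about 0.2), which is why the SSE ranking on a finite sample can favour a distribution other than the one that generated the data.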
dist.plot_summary()
After we have a fitted model, we can make predictions using the theoretical distribution. If we plot again after making predictions, the predictions are automatically included.
dist.predict(y)
dist.plot()
# Prints to the screen:
# [distfit] >predict..
# [distfit] >Multiple test correction..[fdr_bh]
The results of the prediction are stored in y_proba and y_pred:
# Show the predictions for y
print(dist.y_pred)
# ['down' 'down' 'none' 'none' 'none' 'none' 'up' 'up' 'up']
# Show the probabilities for y that belong with the predictions
print(dist.y_proba)
# [2.75338375e-05 2.74664877e-03 4.74739680e-01 3.28636879e-01 1.99195071e-01 1.06316132e-01 5.05914722e-02 2.18922761e-02 8.89349927e-03]
# All predicted information is also stored in a structured dataframe
print(dist.df)
# y y_proba y_pred P
# 0 -8 0.000028 down 0.000003
# 1 -6 0.002747 down 0.000610
# 2 0 0.474740 none 0.474740
# 3 1 0.328637 none 0.292122
# 4 2 0.199195 none 0.154929
# 5 3 0.106316 none 0.070877
# 6 4 0.050591 up 0.028106
# 7 5 0.021892 up 0.009730
# 8 6 0.008893 up 0.002964
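The prediction step can be approximated as follows. This is a sketch under stated assumptions, not distfit's code. It uses a normal fit instead of the best-scoring model and takes the smaller of the two tail probabilities as the P-value for each y. It then applies a hand-written Benjamini-Hochberg adjustment (the `fdr_bh` correction named in the log above) and labels a value 'down' or 'up' when its adjusted P-value is below alpha = 0.05. Whether labels should use the raw or the adjusted P-value is a design choice. In the table above, distfit's labels appear to follow the raw P column: y = 4 is 'up' although its y_proba is 0.0506. The helper names `bh_adjust` and `predict_tails` are hypothetical.

```python
import numpy as np
from scipy import stats

def bh_adjust(p):
    """Hypothetical helper: Benjamini-Hochberg adjusted p-values."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    # p_(i) * m / i for the sorted p-values
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working from the largest p-value downwards
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted

def predict_tails(data, y, alpha=0.05):
    """Hypothetical helper: label each y as 'down', 'up' or 'none' under a fitted normal."""
    loc, scale = stats.norm.fit(np.ravel(data))
    y = np.asarray(y, dtype=float)
    lower = stats.norm.cdf(y, loc, scale)  # P(X <= y)
    upper = stats.norm.sf(y, loc, scale)   # P(X > y)
    p = np.minimum(lower, upper)           # one-sided tail probability
    p_adj = bh_adjust(p)
    labels = np.where(p_adj < alpha, np.where(lower < upper, "down", "up"), "none")
    return p, p_adj, labels

rng = np.random.default_rng(0)
X = rng.normal(0, 2, size=(100, 10))
y = [-8, -6, 0, 1, 2, 3, 4, 5, 6]
p, p_adj, labels = predict_tails(X, y)
print(labels)
```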
Citation
Please cite distfit in your publications if this is useful for your research. Here is an example BibTeX entry:
@misc{erdogant2019distfit,
title={distfit},
author={Erdogan Taskesen},
year={2019},
howpublished={\url{https://github.com/erdogant/distfit}},
}
Maintainers
- Erdogan Taskesen, github: erdogant
Contribute
- Contributions are welcome.
Licence
See LICENSE for details.
Download files
File details
Details for the file distfit-1.0.0.tar.gz.
File metadata
- Download URL: distfit-1.0.0.tar.gz
- Upload date:
- Size: 14.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.3.post20200330 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.7.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | d298444266fa4225f2756f959179cabf775525c29b70386041d6f5cb62d50345
MD5 | 884e31f2a1ed01faa3449adce47be17c
BLAKE2b-256 | f2c8f6bc2e2deefb2385e4481e2f95944946d81228464211044ef4c44a5b02d9
File details
Details for the file distfit-1.0.0-py3-none-any.whl.
File metadata
- Download URL: distfit-1.0.0-py3-none-any.whl
- Upload date:
- Size: 13.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.3.post20200330 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.7.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | 8d4c6ad271bb55b3eed0105b5523095613b8826b7890bc51b38d66c8cd94c908
MD5 | 30eb38bc9c1546ed55b9e780407c0c14
BLAKE2b-256 | 82d2ce9a6413311314676b3303698c536e286e86a2fbc55132e50cb32e1ee23c