Python package for probability density function fitting and hypothesis testing.

## Project description

# distfit - Probability density fitting

### Background

`distfit` is a Python package for probability density fitting across 89 univariate distributions to non-censored data by residual sum of squares (RSS), and for hypothesis testing.
Probability density fitting is the fitting of a probability distribution to a series of data concerning the repeated measurement of a variable phenomenon. `distfit` scores each of the 89 distributions on its fit with the empirical distribution and returns the best-scoring distribution.
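To make the scoring idea concrete, here is a minimal sketch of an RSS comparison between a histogram of the data and two candidate densities. It uses only the Python standard library and is not `distfit`'s own code; the helper names `histogram_density` and `rss`, the bin count, and the two candidates are assumptions chosen for illustration.

```python
import random
from statistics import NormalDist


def histogram_density(data, bins):
    """Return bin centers and the normalized histogram (empirical density)."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in data:
        # The maximum value would fall one past the last bin, so clamp it.
        idx = min(int((x - lo) / width), bins - 1)
        counts[idx] += 1
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    density = [c / (len(data) * width) for c in counts]
    return centers, density


def rss(density, pdf_values):
    """Residual sum of squares between empirical and candidate densities."""
    return sum((d - p) ** 2 for d, p in zip(density, pdf_values))


random.seed(0)
data = [random.gauss(0, 2) for _ in range(1000)]
centers, density = histogram_density(data, bins=50)

# Candidate 1: a normal distribution with parameters estimated from the data.
normal = NormalDist.from_samples(data)
# Candidate 2: a uniform distribution over the observed range.
lo, hi = min(data), max(data)

candidates = {
    "norm": [normal.pdf(c) for c in centers],
    "uniform": [1.0 / (hi - lo)] * len(centers),
}
scores = {name: rss(density, values) for name, values in candidates.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

The candidate with the lowest RSS is the one whose density curve lies closest to the histogram, which is the sense in which a distribution "scores best" here.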

### Functionalities

The `distfit` library is built around classes to keep usage simple.

    # Import library
    from distfit import distfit

    dist = distfit()       # Specify desired parameters
    dist.fit_transform(X)  # Fit distributions on empirical data X
    dist.predict(y)        # Predict the probability of the response variables
    dist.plot()            # Plot the best fitted distribution (y is included if prediction is made)

### Installation

Install distfit from PyPI (recommended). distfit is compatible with Python 3.6+ and runs on Linux, MacOS X and Windows.

#### Install from PyPI

```
pip install distfit
```

#### Install directly from github source (beta version)

```
pip install git+https://github.com/erdogant/distfit#egg=master
```

#### Install by cloning (beta version)

```
git clone https://github.com/erdogant/distfit.git
cd distfit
pip install -U .
```

#### Check version number

    import distfit
    print(distfit.__version__)

### Examples

Import the `distfit` library:

    from distfit import distfit

Create some random data to model with the default parameters:

    import numpy as np
    X = np.random.normal(0, 2, [100,10])
    y = [-8,-6,0,1,2,3,4,5,6]

Specify the `distfit` parameters. In this example nothing is specified, which means that all parameters are set to their defaults.

    dist = distfit()
    dist.fit_transform(X)
    dist.plot()

    # Prints to screen:
    # [distfit] >fit..
    # [distfit] >transform..
    # [distfit] >[norm      ] [RSS: 0.0133619] [loc=-0.059 scale=2.031]
    # [distfit] >[expon     ] [RSS: 0.3911576] [loc=-6.213 scale=6.154]
    # [distfit] >[pareto    ] [RSS: 0.6755185] [loc=-7.965 scale=1.752]
    # [distfit] >[dweibull  ] [RSS: 0.0183543] [loc=-0.053 scale=1.726]
    # [distfit] >[t         ] [RSS: 0.0133619] [loc=-0.059 scale=2.031]
    # [distfit] >[genextreme] [RSS: 0.0115116] [loc=-0.830 scale=1.964]
    # [distfit] >[gamma     ] [RSS: 0.0111372] [loc=-19.843 scale=0.209]
    # [distfit] >[lognorm   ] [RSS: 0.0111236] [loc=-29.689 scale=29.561]
    # [distfit] >[beta      ] [RSS: 0.0113012] [loc=-12.340 scale=41.781]
    # [distfit] >[uniform   ] [RSS: 0.2481737] [loc=-6.213 scale=12.281]

Note that the best fit should be [norm], as the input data were drawn from a normal distribution. However, many other distributions can look very similar given specific loc/scale parameters. It is not unusual to see the gamma and beta distributions score well, as these are the "barbapapas" among the distributions: they can take on many shapes. Let's print the summary of detected distributions with the residual sum of squares.
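The point about flexible distributions can be checked numerically. The sketch below, using only the standard library, compares a shifted gamma density with a normal density of the same mean and standard deviation. The gamma parameters (`shape=100`, `scale=0.2`, and a `loc` that centers it at 0) are assumptions picked for illustration so that both curves have mean 0 and standard deviation 2; they are not values produced by `distfit`.

```python
import math
from statistics import NormalDist


def gamma_pdf(x, shape, loc, scale):
    """Density of a shifted gamma distribution, evaluated in log space."""
    t = x - loc
    if t <= 0:
        return 0.0
    log_pdf = ((shape - 1) * math.log(t) - t / scale
               - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_pdf)


# Assumed illustrative parameters: mean = loc + shape * scale = 0,
# standard deviation = sqrt(shape) * scale = 2.
shape, scale = 100.0, 0.2
loc = -shape * scale

normal = NormalDist(mu=0.0, sigma=2.0)
grid = [-8 + 0.1 * i for i in range(161)]  # points from -8 to 8

max_diff = max(abs(gamma_pdf(x, shape, loc, scale) - normal.pdf(x)) for x in grid)
peak = normal.pdf(0.0)
print(f"largest density gap: {max_diff:.4f} (normal peak: {peak:.4f})")
```

The largest gap between the two curves is small compared with the peak of the normal density, which is why a histogram of normal data can be matched almost as well by a gamma with a large shape parameter and a strongly negative `loc`.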

    # All scores of the tested distributions
    print(dist.summary)

    # Distribution parameters for best fit
    dist.model

    # Make plot
    dist.plot_summary()

After we have a fitted model, we can make some predictions using the theoretical distributions. After making some predictions, we can plot again but now the predictions are automatically included.

    dist.predict(y)
    dist.plot()

    # Prints to screen:
    # [distfit] >predict..
    # [distfit] >Multiple test correction..[fdr_bh]
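As a rough illustration of what "predicting with the theoretical distribution" can mean, the sketch below computes, for each value in `y`, the probability mass in the nearer tail of a fitted normal distribution. This is one plausible reading and an assumption, not `distfit`'s implementation; the `loc` and `scale` are taken from the `[norm]` line printed earlier, and the helper `tail_probability` is hypothetical. The numbers will not match the example output, which comes from the library's own best-fitting model.

```python
from statistics import NormalDist

# Assumed model: a normal distribution with the loc/scale printed for [norm].
model = NormalDist(mu=-0.059, sigma=2.031)
y = [-8, -6, 0, 1, 2, 3, 4, 5, 6]


def tail_probability(value, dist):
    """Probability mass in whichever tail of `dist` is nearer to `value`."""
    lower = dist.cdf(value)
    return min(lower, 1.0 - lower)


tail_probs = [tail_probability(v, model) for v in y]
for v, p in zip(y, tail_probs):
    print(f"{v:>3}  {p:.6f}")
```

Values far from the center of the distribution get small probabilities, while values near the center get probabilities close to 0.5.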

The results of the prediction are stored in `y_proba` and `y_pred`:

    # Show the predictions for y
    print(dist.y_pred)
    # ['down' 'down' 'none' 'none' 'none' 'none' 'up' 'up' 'up']

    # Show the probabilities for y that belong with the predictions
    print(dist.y_proba)
    # [2.75338375e-05 2.74664877e-03 4.74739680e-01 3.28636879e-01 1.99195071e-01 1.06316132e-01 5.05914722e-02 2.18922761e-02 8.89349927e-03]

    # All predicted information is also stored in a structured dataframe
    print(dist.results['df'])
    #    y   y_proba y_pred         P
    # 0 -8  0.000028   down  0.000003
    # 1 -6  0.002747   down  0.000610
    # 2  0  0.474740   none  0.474740
    # 3  1  0.328637   none  0.292122
    # 4  2  0.199195   none  0.154929
    # 5  3  0.106316   none  0.070877
    # 6  4  0.050591     up  0.028106
    # 7  5  0.021892     up  0.009730
    # 8  6  0.008893     up  0.002964
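The log line printed by `predict` mentions `fdr_bh`, which suggests a Benjamini-Hochberg false discovery rate correction. The sketch below is a plain-Python version of that procedure, not the library's code, and the function name `benjamini_hochberg` is chosen here for illustration. Applied to the rounded `P` column of the dataframe above, it gives values that agree with the `y_proba` column to within rounding, which suggests that `P` holds the raw probabilities and `y_proba` the corrected ones.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest to the smallest p-value so that the adjusted
    # values can be forced to be non-decreasing in the raw p-value.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Raw probabilities copied from the P column of the dataframe (rounded).
raw_p = [0.000003, 0.000610, 0.474740, 0.292122, 0.154929,
         0.070877, 0.028106, 0.009730, 0.002964]

adjusted = benjamini_hochberg(raw_p)
for p, q in zip(raw_p, adjusted):
    print(f"{p:.6f} -> {q:.6f}")
```

Each raw value is scaled by the number of tests divided by its rank among the sorted values, so the smallest probabilities are penalized the most.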

To test one specific distribution, such as the normal distribution:

    dist = distfit(distr='norm')
    dist.fit_transform(X)
    # [distfit] >fit..
    # [distfit] >transform..
    # [distfit] >[norm] [RSS: 0.0151267] [loc=0.103 scale=2.028]

    dist.plot()

### Citation

Please cite distfit in your publications if this is useful for your research. Here is an example BibTeX entry:

    @misc{erdogant2019distfit,
      title={distfit},
      author={Erdogan Taskesen},
      year={2019},
      howpublished={\url{https://github.com/erdogant/distfit}},
    }

### Maintainer

```
Erdogan Taskesen, github: [erdogant](https://github.com/erdogant)
Contributions are welcome.
```

## Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Filename | Size | File type | Python version |
---|---|---|---|
distfit-1.2.6-py3-none-any.whl | 21.6 kB | Wheel | py3 |
distfit-1.2.6.tar.gz | 23.1 kB | Source | None |