Python package for probability density function fitting and hypothesis testing.

distfit

  • Python package for probability density fitting and hypothesis testing.
  • Probability density fitting is the fitting of a probability distribution to a series of data concerning the repeated measurement of a variable phenomenon.
  • distfit scores each of 89 different distributions on its fit with the empirical distribution and returns the best-scoring distribution.
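The scoring idea described above can be sketched with SciPy alone. This is a minimal illustration, not distfit's actual implementation: the helper `score_distributions`, the bin count, and the three candidate distributions are assumptions made for the example. Each candidate is fitted to the data, its PDF is evaluated at the histogram bin centers, and the sum of squared errors (SSE) against the empirical density is recorded.

```python
import numpy as np
from scipy import stats


def score_distributions(data, candidates, bins=50):
    """Return {name: SSE} between each fitted PDF and the empirical density.

    Hypothetical helper for illustration; not part of distfit.
    """
    # Empirical density: a normalized histogram evaluated at bin centers.
    density, edges = np.histogram(data, bins=bins, density=True)
    centers = (edges[:-1] + edges[1:]) / 2.0

    scores = {}
    for name in candidates:
        dist = getattr(stats, name)       # e.g. stats.norm
        params = dist.fit(data)           # maximum-likelihood parameter estimates
        pdf = dist.pdf(centers, *params)  # theoretical density at the bin centers
        scores[name] = float(np.sum((density - pdf) ** 2))
    return scores


rng = np.random.default_rng(0)
data = rng.normal(0, 2, size=1000)

scores = score_distributions(data, ["norm", "expon", "uniform"])
best = min(scores, key=scores.get)  # lowest SSE is the best fit
print(best, scores)
```

The lowest SSE marks the candidate whose density most closely follows the histogram; with only a few bins or very little data, several candidates can end up with nearly identical scores.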

Functionalities

The distfit library exposes a class-based interface to keep usage simple.

# Import library
from distfit import distfit

dist = distfit()        # Specify desired parameters
dist.fit_transform(X)   # Fit distributions on empirical data X
dist.predict(y)         # Predict the probability of the response variables
dist.plot()             # Plot the best fitted distribution (y is included if prediction is made)

Installation

  • Install distfit from PyPI (recommended). distfit is compatible with Python 3.6+ and runs on Linux, MacOS X and Windows.
  • It is distributed under the MIT license.

Install from PyPI

pip install distfit

Install directly from GitHub source (beta version)

pip install git+https://github.com/erdogant/distfit#egg=master

Install by cloning (beta version)

git clone https://github.com/erdogant/distfit.git
cd distfit
pip install -U .

Check version number

import distfit
print(distfit.__version__)

Examples

Import distfit library

from distfit import distfit

Create some random data X to model with the default parameters, and some values y to predict later:

import numpy as np
X = np.random.normal(0, 2, [100,10])
y = [-8,-6,0,1,2,3,4,5,6]
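Because X is drawn at random, the fitted parameters and SSE values shown in the rest of this example will differ from run to run. A seeded NumPy generator makes the data repeatable. The sketch below is optional and the seed value is an arbitrary choice; it will not reproduce the exact numbers printed in this example.

```python
import numpy as np

# A seeded generator returns the same samples on every run.
rng = np.random.default_rng(seed=42)  # seed value chosen arbitrarily

# Same shape and parameters as the example: mean 0, standard deviation 2.
X = rng.normal(loc=0, scale=2, size=(100, 10))
y = [-8, -6, 0, 1, 2, 3, 4, 5, 6]

print(X.shape)
```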

Specify the distfit parameters. In this example nothing is specified, which means that all parameters are set to their defaults.

dist = distfit()
dist.fit_transform(X)
dist.plot()

# Prints to screen:
# [distfit] >fit..
# [distfit] >transform..
# [distfit] >[norm      ] [SSE: 0.0133619] [loc=-0.059 scale=2.031] 
# [distfit] >[expon     ] [SSE: 0.3911576] [loc=-6.213 scale=6.154] 
# [distfit] >[pareto    ] [SSE: 0.6755185] [loc=-7.965 scale=1.752] 
# [distfit] >[dweibull  ] [SSE: 0.0183543] [loc=-0.053 scale=1.726] 
# [distfit] >[t         ] [SSE: 0.0133619] [loc=-0.059 scale=2.031] 
# [distfit] >[genextreme] [SSE: 0.0115116] [loc=-0.830 scale=1.964] 
# [distfit] >[gamma     ] [SSE: 0.0111372] [loc=-19.843 scale=0.209] 
# [distfit] >[lognorm   ] [SSE: 0.0111236] [loc=-29.689 scale=29.561] 
# [distfit] >[beta      ] [SSE: 0.0113012] [loc=-12.340 scale=41.781] 
# [distfit] >[uniform   ] [SSE: 0.2481737] [loc=-6.213 scale=12.281] 

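Note that the best fit is expected to be [norm], since the input data was drawn from a normal distribution. However, many other distributions can look very similar given specific loc/scale parameters; in the output above, lognorm, gamma, beta and genextreme all score a slightly lower SSE than norm. It is not unusual to see the gamma and beta distributions score well, as these are the "Barbapapas" among the distributions: they can take on many shapes. Let's plot the summary of the detected distributions with their sum of squared errors (SSE) scores.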

dist.plot_summary()
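The same ranking can be read straight off the log output above. The snippet below simply copies the printed SSE values into a dictionary (it does not call distfit) and sorts them from best to worst.

```python
# SSE values copied from the log output above.
sse = {
    "norm": 0.0133619,
    "expon": 0.3911576,
    "pareto": 0.6755185,
    "dweibull": 0.0183543,
    "t": 0.0133619,
    "genextreme": 0.0115116,
    "gamma": 0.0111372,
    "lognorm": 0.0111236,
    "beta": 0.0113012,
    "uniform": 0.2481737,
}

# A lower SSE means a closer fit to the empirical distribution.
ranking = sorted(sse.items(), key=lambda item: item[1])
for name, score in ranking:
    print(f"{name:<10} {score:.7f}")
```

In this run lognorm ranks first and norm shares fifth place with t, which illustrates how close the top candidates are for normally distributed data.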

Once the model is fitted, we can make predictions for new values using the fitted theoretical distribution. If we plot again after predicting, the predictions are included automatically.

dist.predict(y)
dist.plot()
# Prints to screen:
# [distfit] >predict..
# [distfit] >Multiple test correction..[fdr_bh]

The results of the prediction are stored in y_proba and y_pred.

# Show the predictions for y
print(dist.y_pred)
# ['down' 'down' 'none' 'none' 'none' 'none' 'up' 'up' 'up']

# Show the probabilities for y that belong to the predictions
print(dist.y_proba)
# [2.75338375e-05 2.74664877e-03 4.74739680e-01 3.28636879e-01 1.99195071e-01 1.06316132e-01 5.05914722e-02 2.18922761e-02 8.89349927e-03]

# All predicted information is also stored in a structured dataframe
print(dist.df)
#    y   y_proba y_pred         P
# 0 -8  0.000028   down  0.000003
# 1 -6  0.002747   down  0.000610
# 2  0  0.474740   none  0.474740
# 3  1  0.328637   none  0.292122
# 4  2  0.199195   none  0.154929
# 5  3  0.106316   none  0.070877
# 6  4  0.050591     up  0.028106
# 7  5  0.021892     up  0.009730
# 8  6  0.008893     up  0.002964
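The log line `Multiple test correction..[fdr_bh]` suggests that y_proba is the Benjamini-Hochberg adjusted version of the raw P column. The sketch below is an independent NumPy implementation of that procedure, not distfit's own code; the function name `benjamini_hochberg` is chosen for this example. Applied to the P values from the table above, it reproduces the y_proba column up to the rounding of the printed values.

```python
import numpy as np


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)                        # indices from smallest to largest p
    scaled = p[order] * n / np.arange(1, n + 1)  # p * n / rank
    # An adjusted value may not exceed the adjusted value of any larger p,
    # so take a running minimum starting from the largest p.
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(adjusted_sorted, 0.0, 1.0)
    return adjusted


# Raw P values and y_proba copied from the dataframe above (rounded as printed).
P = np.array([0.000003, 0.000610, 0.474740, 0.292122, 0.154929,
              0.070877, 0.028106, 0.009730, 0.002964])
y_proba = np.array([0.000028, 0.002747, 0.474740, 0.328637, 0.199195,
                    0.106316, 0.050591, 0.021892, 0.008893])

adjusted = benjamini_hochberg(P)
print(np.round(adjusted, 6))
print(np.allclose(adjusted, y_proba, atol=1e-5))
```

For example, the second-smallest P value, 0.000610, is multiplied by 9 (the number of tests) and divided by its rank 2, giving about 0.002745, which matches the reported 0.002747 to within rounding.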

Citation

Please cite distfit in your publications if this is useful for your research. Here is an example BibTeX entry:

@misc{erdogant2019distfit,
  title={distfit},
  author={Erdogan Taskesen},
  year={2019},
  howpublished={\url{https://github.com/erdogant/distfit}},
}

Contribute

  • Contributions are welcome.

Licence

See LICENSE for details.
