Python package for probability density function fitting and hypothesis testing.

distfit

  • Python package for probability density fitting and hypothesis testing.
  • Probability density fitting is the fitting of a probability distribution to a series of data concerning the repeated measurement of a variable phenomenon.
  • distfit scores each of the 89 different distributions on its fit with the empirical distribution and returns the best-scoring distribution.
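The scoring idea in the last bullet can be sketched with SciPy alone. This is a minimal illustration, not distfit's actual implementation: it assumes SciPy is installed, that the score is the sum of squared errors (SSE) between the fitted density and a normalized histogram, and it uses a hypothetical helper `score_distributions` with an arbitrary short list of candidates and an arbitrary bin count.

```python
import numpy as np
from scipy import stats


def score_distributions(data, names=("norm", "expon", "gamma", "beta", "uniform"), bins=50):
    """Fit each named scipy.stats distribution and score it by SSE against the histogram density.

    Hypothetical helper for illustration; lower SSE means a closer fit.
    """
    data = np.asarray(data, dtype=float).ravel()
    # Empirical density: normalized histogram evaluated at the bin centers
    density, edges = np.histogram(data, bins=bins, density=True)
    centers = (edges[:-1] + edges[1:]) / 2
    scores = {}
    for name in names:
        candidate = getattr(stats, name)
        try:
            params = candidate.fit(data)            # maximum-likelihood fit
            pdf = candidate.pdf(centers, *params)   # fitted density at the bin centers
        except Exception:
            continue                                # skip candidates whose fit fails
        scores[name] = float(np.sum((density - pdf) ** 2))
    # Return the candidates ordered from best (lowest SSE) to worst
    return dict(sorted(scores.items(), key=lambda item: item[1]))


rng = np.random.default_rng(0)
sample = rng.beta(5, 8, size=2000)
scores = score_distributions(sample)
for name, sse in scores.items():
    print(f"{name:<10} SSE={sse:.4f}")
```

The first key of the returned dictionary is the best-scoring candidate among those tried. The absolute SSE values depend on the number of bins, so they are only comparable within one run.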

The following functions are available:

import distfit as dist
# To make the distribution fit with the input data
dist.fit()
# Compute probabilities using the fitted distribution
dist.proba_parametric()
# Compute probabilities in an empirical manner
dist.proba_emperical()
# Plot results
dist.plot()
# Plot summary
dist.plot_summary()

See below for the exact working of the functions.

Installation

  • Install distfit from PyPI (recommended). distfit is compatible with Python 3.6+ and runs on Linux, MacOS X and Windows.
  • It is distributed under the MIT license.

Requirements

pip install numpy pandas matplotlib

Quick Start

pip install distfit

Alternatively, install distfit from the GitHub source:

git clone https://github.com/erdogant/distfit.git
cd distfit
python setup.py install

Import distfit package

import distfit as dist

Generate some random data:

import numpy as np
X = np.random.beta(5, 8, [100,100])
# or 
# X = np.random.beta(5, 8, 1000)
# or anything else

# Print to screen
print(X)
# Prints a 100x100 array; samples from a beta distribution lie in the interval (0, 1).

Example: fit the best-scoring distribution to the input data:

model = dist.fit(X)
dist.plot(model)

# Output looks like this:
# [DISTFIT.fit] Fitting [norm      ] [SSE: 1.1641360] [loc=0.384 scale=0.128] 
# [DISTFIT.fit] Fitting [expon     ] [SSE: 82.9253587] [loc=0.037 scale=0.347] 
# [DISTFIT.fit] Fitting [pareto    ] [SSE: 100.6452574] [loc=-0.711 scale=0.749] 
# [DISTFIT.fit] Fitting [dweibull  ] [SSE: 3.0304725] [loc=0.376 scale=0.112] 
# [DISTFIT.fit] Fitting [t         ] [SSE: 1.1640207] [loc=0.384 scale=0.128] 
# [DISTFIT.fit] Fitting [genextreme] [SSE: 0.4763435] [loc=0.335 scale=0.123] 
# [DISTFIT.fit] Fitting [gamma     ] [SSE: 0.6668446] [loc=-0.514 scale=0.018] 
# [DISTFIT.fit] Fitting [lognorm   ] [SSE: 0.6960495] [loc=-1.046 scale=1.424] 
# [DISTFIT.fit] Fitting [beta      ] [SSE: 0.3419988] [loc=-0.009 scale=0.987] 
# [DISTFIT.fit] Fitting [uniform   ] [SSE: 56.8836516] [loc=0.037 scale=0.797] 

Note that the best fit should be [beta], since the input data was drawn from a beta distribution. However, many other distributions can look very similar given specific loc/scale parameters. In this case, the beta distribution scored best. It is, however, not unusual to see the gamma and beta distributions score well, as these are the "barbapapas" among the distributions: they can take on many shapes.

  • Summary of the SSE scores:
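As a small illustration, the SSE values printed in the fit log above can be ranked directly. The numbers below are copied from that log; the variable names are only for this sketch and are not part of distfit.

```python
# SSE values copied from the [DISTFIT.fit] log above
sse_scores = {
    "norm": 1.1641360,
    "expon": 82.9253587,
    "pareto": 100.6452574,
    "dweibull": 3.0304725,
    "t": 1.1640207,
    "genextreme": 0.4763435,
    "gamma": 0.6668446,
    "lognorm": 0.6960495,
    "beta": 0.3419988,
    "uniform": 56.8836516,
}

# Lower SSE means a closer fit, so sort ascending
ranking = sorted(sse_scores.items(), key=lambda item: item[1])
for rank, (name, sse) in enumerate(ranking, start=1):
    print(f"{rank:>2}. {name:<10} SSE={sse:.7f}")

best_name, best_sse = ranking[0]
```

Here `best_name` is the lowest-SSE distribution in the log, which matches the note above that the beta distribution scored best.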

Example: compute the probability that values are of interest relative to the 95% confidence interval (CII) of the data distribution:

This can be done with a pre-trained model or in a single run.

X = np.random.beta(5, 8, [100,100])
y = [-1,-0.8,-0.6,0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1,1.1,1.5]

# Fit model (manner 1)
model = dist.fit(X)
out = dist.proba_parametric(y, model=model)

# Fit model and predict (manner 2)
# Note that this is not practical in a loop with a fixed background, because the model is fitted again on every call
out = dist.proba_parametric(y, X)

# print probabilities
print(out['proba'])

#   data             P          Padj bound
#   -1.0  0.000000e+00  0.000000e+00  down
#   -0.8  0.000000e+00  0.000000e+00  down
#   -0.6  0.000000e+00  0.000000e+00  down
#    0.0  1.559231e-08  3.563956e-08  down
#    0.1  4.467564e-03  7.148102e-03  down
#    0.2  7.085374e-02  8.720461e-02  none
#    0.3  2.726085e-01  2.907824e-01  none
#    0.4  4.390847e-01  4.390847e-01  none
#    0.5  1.905598e-01  2.177826e-01  none
#    0.6  5.360688e-02  7.147584e-02  none
#    0.7  7.935965e-03  1.154322e-02    up
#    0.8  3.697836e-04  6.573931e-04    up
#    0.9  8.037999e-07  1.607600e-06    up
#    1.0  0.000000e+00  0.000000e+00    up
#    1.1  0.000000e+00  0.000000e+00    up
#    1.5  0.000000e+00  0.000000e+00    up

# Make plot
dist.plot(model)
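The Padj column in the output above is numerically consistent with a Benjamini-Hochberg false discovery rate correction of the P column. That is an inference from the printed numbers, not something the text states, so the following is only a sketch under that assumption. `benjamini_hochberg` is a hypothetical helper, not part of distfit.

```python
import numpy as np


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order (illustrative helper)."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)                              # indices that sort p ascending
    scaled = p[order] * n / np.arange(1, n + 1)        # p_(i) * n / i
    # Enforce monotonicity, working down from the largest p-value
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)
    return adjusted


# P and Padj columns copied from the printed table above
p_reported = [0, 0, 0, 1.559231e-08, 4.467564e-03, 7.085374e-02, 2.726085e-01,
              4.390847e-01, 1.905598e-01, 5.360688e-02, 7.935965e-03,
              3.697836e-04, 8.037999e-07, 0, 0, 0]
padj_reported = [0, 0, 0, 3.563956e-08, 7.148102e-03, 8.720461e-02, 2.907824e-01,
                 4.390847e-01, 2.177826e-01, 7.147584e-02, 1.154322e-02,
                 6.573931e-04, 1.607600e-06, 0, 0, 0]

padj = benjamini_hochberg(p_reported)
matches = np.allclose(padj, padj_reported, rtol=1e-5, atol=1e-12)
print("Matches reported Padj:", matches)
```

The adjustment multiplies each p-value by n divided by its rank, so the largest p-value (0.439 at data value 0.4) is unchanged, as in the table.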

Citation

Please cite distfit in your publications if this is useful for your research. Here is an example BibTeX entry:

@misc{erdogant2019distfit,
  title={distfit},
  author={Erdogan Taskesen},
  year={2019},
  howpublished={\url{https://github.com/erdogant/distfit}},
}

Maintainers

Contribute

  • Contributions are welcome.

Licence

See LICENSE for details.

Donation

This package is created and maintained in my free time. If this package is useful, feel free to use more of my packages. Sponsor here.
