Find the best probability distribution for your dataset

Phitter analyzes datasets and determines the best analytical probability distributions that represent them. It evaluates more than 80 continuous and discrete probability distributions, applies 3 goodness-of-fit tests, and provides interactive visualizations. For each selected distribution, a standard modeling guide is provided, along with spreadsheets that detail how to use the chosen distribution in data science, operations research, and artificial intelligence.

This repository contains the implementation of the Python library and the kernel of Phitter Web.

Installation

Requirements

python: >=3.9

PyPI

pip install phitter
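
Before installing, it may be useful to confirm that the interpreter satisfies the requirement listed above. The following is a minimal sketch; the helper name `meets_phitter_requirement` is hypothetical, and only the `>=3.9` bound comes from the Requirements section.

```python
import sys

# Minimum interpreter version stated in the Requirements section.
MIN_PYTHON = (3, 9)


def meets_phitter_requirement(version_info=sys.version_info) -> bool:
    """Return True when the (major, minor) version is at least 3.9."""
    return tuple(version_info[:2]) >= MIN_PYTHON


print("Python OK for phitter:", meets_phitter_requirement())
```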

Usage

Notebook Tutorials

Tutorial Notebooks
Fit Continuous Open In Colab
Fit Discrete Open In Colab
Fit Accelerate [Sample>100K] Open In Colab
Fit Specific Distribution Open In Colab
Working Distribution Open In Colab

General

import phitter

## Define your dataset
data: list[int | float] = [...]

## Make a continuous fit using Phitter
phi = phitter.PHITTER(data)
phi.fit()

Full continuous implementation

import phitter

## Define your dataset
data: list[int | float] = [...]

## Make a continuous fit using Phitter
phi = phitter.PHITTER(
    data=data,
    fit_type="continuous",
    num_bins=15,
    confidence_level=0.95,
    minimum_sse=1e-2,
    distributions_to_fit=["beta", "normal", "fatigue_life", "triangular"],
)
phi.fit(n_workers=6)
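
The `num_bins` and `minimum_sse` arguments suggest that candidates are compared with a histogram of the data through a sum of squared errors (SSE). The sketch below illustrates that idea with the standard library only. It is an assumption about the general technique, not Phitter's internal code, and the helper names `histogram_density` and `sse_against_pdf` are hypothetical.

```python
import random
from statistics import NormalDist, fmean, stdev


def histogram_density(data, num_bins):
    """Return (bin_centers, densities) for an equal-width histogram."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for x in data:
        # Clamp the maximum value into the last bin.
        idx = min(int((x - lo) / width), num_bins - 1)
        counts[idx] += 1
    n = len(data)
    centers = [lo + (i + 0.5) * width for i in range(num_bins)]
    densities = [c / (n * width) for c in counts]
    return centers, densities


def sse_against_pdf(centers, densities, pdf):
    """Sum of squared differences between histogram heights and a PDF."""
    return sum((d - pdf(c)) ** 2 for c, d in zip(centers, densities))


random.seed(0)
data = [random.gauss(50.0, 5.0) for _ in range(5000)]
centers, densities = histogram_density(data, num_bins=15)

# One candidate matched to the sample, one deliberately shifted.
good = NormalDist(fmean(data), stdev(data))
bad = NormalDist(fmean(data) + 10.0, stdev(data))

sse_good = sse_against_pdf(centers, densities, good.pdf)
sse_bad = sse_against_pdf(centers, densities, bad.pdf)
print(f"SSE, matched normal: {sse_good:.6f}")
print(f"SSE, shifted normal: {sse_bad:.6f}")
```

A lower SSE indicates a candidate whose density follows the histogram more closely, which is consistent with ranking distributions by SSE.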

Full discrete implementation

import phitter

## Define your dataset
data: list[int | float] = [...]

## Make a discrete fit using Phitter
phi = phitter.PHITTER(
    data=data,
    fit_type="discrete",
    confidence_level=0.95,
    minimum_sse=1e-2,
    distributions_to_fit=["binomial", "geometric"],
)
phi.fit(n_workers=2)
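
For discrete data, parameters can often be estimated in closed form. As a hedged illustration (not necessarily the estimator Phitter uses), the geometric distribution on the support 1, 2, 3, ... has the maximum-likelihood estimate p = 1 / mean. All function names below are hypothetical.

```python
import random
from statistics import fmean


def geometric_pmf(k: int, p: float) -> float:
    """P(X = k) for the number of trials until the first success, k >= 1."""
    return (1.0 - p) ** (k - 1) * p


def estimate_geometric_p(data) -> float:
    """Maximum-likelihood estimate of p: the reciprocal of the sample mean."""
    return 1.0 / fmean(data)


def sample_geometric(p: float, rng: random.Random) -> int:
    """Count Bernoulli(p) trials until the first success."""
    k = 1
    while rng.random() >= p:
        k += 1
    return k


rng = random.Random(1)
data = [sample_geometric(0.3, rng) for _ in range(20000)]
p_hat = estimate_geometric_p(data)
print(f"estimated p = {p_hat:.3f}")
```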

Phitter: properties and methods

import phitter

## Define your dataset
data: list[int | float] = [...]

## Make a fit using Phitter
phi = phitter.PHITTER(data)
phi.fit(n_workers=2)

## Global methods and properties (return types shown in comments; k=3 is an illustrative value)
phi.summarize(k=3)                  # -> pandas.DataFrame
phi.summarize_info(k=3)             # -> pandas.DataFrame
phi.best_distribution               # -> dict
phi.sorted_distributions_sse        # -> dict
phi.not_rejected_distributions      # -> dict
phi.df_sorted_distributions_sse     # -> pandas.DataFrame
phi.df_not_rejected_distributions   # -> pandas.DataFrame

## Specific distribution methods and properties ("beta" is an illustrative distribution id)
phi.get_parameters(id_distribution="beta")                # -> dict
phi.get_test_chi_square(id_distribution="beta")           # -> dict
phi.get_test_kolmmogorov_smirnov(id_distribution="beta")  # -> dict
phi.get_test_anderson_darling(id_distribution="beta")     # -> dict
phi.get_sse(id_distribution="beta")                       # -> float
phi.get_n_test_passed(id_distribution="beta")             # -> int
phi.get_n_test_null(id_distribution="beta")               # -> int
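
To give a sense of what a goodness-of-fit test measures, the sketch below computes the Kolmogorov-Smirnov statistic, the largest gap between the empirical CDF and a candidate CDF, using only the standard library. It does not reproduce the dictionaries returned by the methods above, whose keys are not documented here; `ks_statistic` is a hypothetical helper, and the critical value uses the usual large-sample approximation for a 5% significance level.

```python
import math
import random
from statistics import NormalDist


def ks_statistic(data, cdf) -> float:
    """Largest absolute gap between the empirical CDF and a candidate CDF."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


random.seed(7)
data = [random.gauss(0.0, 1.0) for _ in range(2000)]

d_matching = ks_statistic(data, NormalDist(0.0, 1.0).cdf)
d_shifted = ks_statistic(data, NormalDist(0.5, 1.0).cdf)

# Large-sample approximation of the critical value at the 5% level.
critical = 1.358 / math.sqrt(len(data))
print(f"D, matching normal: {d_matching:.4f}")
print(f"D, shifted normal:  {d_shifted:.4f}")
print(f"critical value:     {critical:.4f}")
```

A candidate is rejected when its statistic exceeds the critical value, which is what happens for the shifted normal here.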

Histogram Plot

import phitter
data: list[int | float] = [...]
phi = phitter.PHITTER(data)
phi.fit()

phi.plot_histogram()

Histogram PDF Distributions Plot

import phitter
data: list[int | float] = [...]
phi = phitter.PHITTER(data)
phi.fit()

phi.plot_histogram_distributions()

Histogram PDF Distribution Plot

import phitter
data: list[int | float] = [...]
phi = phitter.PHITTER(data)
phi.fit()

phi.plot_distribution("beta")

ECDF Plot

import phitter
data: list[int | float] = [...]
phi = phitter.PHITTER(data)
phi.fit()

phi.plot_ecdf()
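
For reference, the empirical cumulative distribution function (ECDF) behind such a plot can be computed as the fraction of observations less than or equal to a value. This is a minimal standard-library sketch; the `ecdf` helper is hypothetical and is not part of Phitter's API.

```python
from bisect import bisect_right


def ecdf(data):
    """Return a function F where F(x) is the share of observations <= x."""
    xs = sorted(data)
    n = len(xs)

    def F(x: float) -> float:
        # bisect_right counts how many sorted values are <= x.
        return bisect_right(xs, x) / n

    return F


F = ecdf([3.0, 1.0, 2.0, 2.0, 5.0])
print(F(0.5), F(2.0), F(4.0), F(5.0))  # -> 0.0 0.6 0.8 1.0
```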

ECDF Distribution Plot

import phitter
data: list[int | float] = [...]
phi = phitter.PHITTER(data)
phi.fit()

phi.plot_ecdf_distribution("beta")

QQ Plot

import phitter
data: list[int | float] = [...]
phi = phitter.PHITTER(data)
phi.fit()

phi.qq_plot("beta")
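
A QQ plot pairs each sorted observation with the theoretical quantile at a matching probability. The sketch below builds those pairs with the standard library. The plotting positions (i + 0.5) / n are one common convention and are an assumption here, since Phitter may use a different one; `qq_points` is a hypothetical helper.

```python
import random
from statistics import NormalDist


def qq_points(data, ppf):
    """Return (theoretical_quantile, sample_quantile) pairs."""
    xs = sorted(data)
    n = len(xs)
    # Plotting positions (i + 0.5) / n stay strictly inside (0, 1).
    return [(ppf((i + 0.5) / n), x) for i, x in enumerate(xs)]


random.seed(3)
dist = NormalDist(100.0, 15.0)
data = [random.gauss(100.0, 15.0) for _ in range(1000)]
points = qq_points(data, dist.inv_cdf)

# When the data follow the candidate distribution, points lie near y = x.
mean_abs_gap = sum(abs(t - s) for t, s in points) / len(points)
print(f"mean |theoretical - sample| = {mean_abs_gap:.3f}")
```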

QQ - Regression Plot

import phitter
data: list[int | float] = [...]
phi = phitter.PHITTER(data)
phi.fit()

phi.qq_plot_regression("beta")

Working with distributions: Methods and properties

import phitter

distribution = phitter.continuous.BETA({"alpha": 5, "beta": 3, "A": 200, "B": 1000})

## CDF, PDF, PPF, and PMF accept a float or numpy.ndarray. Discrete distributions expose PMF instead of PDF. Parameter notation is given in each distribution's documentation.
distribution.cdf(752) # -> 0.6242831129533498
distribution.pdf(388) # -> 0.0002342575686629883
distribution.ppf(0.623) # -> 751.5512889417921
distribution.sample(2) # -> [550.800114   514.85410326]

## STATS
distribution.mean # -> 700.0
distribution.variance # -> 16666.666666666668
distribution.standard_deviation # -> 129.09944487358058
distribution.skewness # -> -0.3098386676965934
distribution.kurtosis # -> 2.5854545454545454
distribution.median # -> 708.707130841534
distribution.mode # -> 733.3333333333333
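
The statistics listed above agree with the textbook closed-form expressions for a beta distribution rescaled to the interval [A, B]. The sketch below reproduces them with the standard library; `beta_4p_stats` is a hypothetical helper, and whether Phitter computes the values this way internally is an assumption. The kurtosis is the non-excess form, and the mode formula requires alpha > 1 and beta > 1.

```python
import math


def beta_4p_stats(alpha: float, beta: float, A: float, B: float) -> dict:
    """Closed-form moments of a beta distribution rescaled to [A, B]."""
    s = alpha + beta
    span = B - A
    variance = span**2 * alpha * beta / (s**2 * (s + 1))
    skewness = (
        2 * (beta - alpha) * math.sqrt(s + 1)
        / ((s + 2) * math.sqrt(alpha * beta))
    )
    # Non-excess kurtosis: 3 plus the excess kurtosis of the beta distribution.
    kurtosis = 3 + 6 * (
        (alpha - beta) ** 2 * (s + 1) - alpha * beta * (s + 2)
    ) / (alpha * beta * (s + 2) * (s + 3))
    return {
        "mean": A + span * alpha / s,
        "variance": variance,
        "standard_deviation": math.sqrt(variance),
        "skewness": skewness,
        "kurtosis": kurtosis,
        "mode": A + span * (alpha - 1) / (s - 2),  # valid for alpha, beta > 1
    }


stats = beta_4p_stats(alpha=5, beta=3, A=200, B=1000)
for name, value in stats.items():
    print(f"{name}: {value}")
```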

Continuous Distributions

1. PDF File Documentation Continuous Distributions

2. Resources Continuous Distributions

Distribution Phitter Playground Excel File Google Sheets Files
alpha ▶️phitter:alpha 📊alpha.xlsx 🌐gs:alpha
arcsine ▶️phitter:arcsine 📊arcsine.xlsx 🌐gs:arcsine
argus ▶️phitter:argus 📊argus.xlsx 🌐gs:argus
beta ▶️phitter:beta 📊beta.xlsx 🌐gs:beta
beta_prime ▶️phitter:beta_prime 📊beta_prime.xlsx 🌐gs:beta_prime
beta_prime_4p ▶️phitter:beta_prime_4p 📊beta_prime_4p.xlsx 🌐gs:beta_prime_4p
bradford ▶️phitter:bradford 📊bradford.xlsx 🌐gs:bradford
burr ▶️phitter:burr 📊burr.xlsx 🌐gs:burr
burr_4p ▶️phitter:burr_4p 📊burr_4p.xlsx 🌐gs:burr_4p
cauchy ▶️phitter:cauchy 📊cauchy.xlsx 🌐gs:cauchy
chi_square ▶️phitter:chi_square 📊chi_square.xlsx 🌐gs:chi_square
chi_square_3p ▶️phitter:chi_square_3p 📊chi_square_3p.xlsx 🌐gs:chi_square_3p
dagum ▶️phitter:dagum 📊dagum.xlsx 🌐gs:dagum
dagum_4p ▶️phitter:dagum_4p 📊dagum_4p.xlsx 🌐gs:dagum_4p
erlang ▶️phitter:erlang 📊erlang.xlsx 🌐gs:erlang
erlang_3p ▶️phitter:erlang_3p 📊erlang_3p.xlsx 🌐gs:erlang_3p
error_function ▶️phitter:error_function 📊error_function.xlsx 🌐gs:error_function
exponential ▶️phitter:exponential 📊exponential.xlsx 🌐gs:exponential
exponential_2p ▶️phitter:exponential_2p 📊exponential_2p.xlsx 🌐gs:exponential_2p
f ▶️phitter:f 📊f.xlsx 🌐gs:f
f_4p ▶️phitter:f_4p 📊f_4p.xlsx 🌐gs:f_4p
fatigue_life ▶️phitter:fatigue_life 📊fatigue_life.xlsx 🌐gs:fatigue_life
folded_normal ▶️phitter:folded_normal 📊folded_normal.xlsx 🌐gs:folded_normal
frechet ▶️phitter:frechet 📊frechet.xlsx 🌐gs:frechet
gamma ▶️phitter:gamma 📊gamma.xlsx 🌐gs:gamma
gamma_3p ▶️phitter:gamma_3p 📊gamma_3p.xlsx 🌐gs:gamma_3p
generalized_extreme_value ▶️phitter:gen_extreme_value 📊gen_extreme_value.xlsx 🌐gs:gen_extreme_value
generalized_gamma ▶️phitter:gen_gamma 📊gen_gamma.xlsx 🌐gs:gen_gamma
generalized_gamma_4p ▶️phitter:gen_gamma_4p 📊gen_gamma_4p.xlsx 🌐gs:gen_gamma_4p
generalized_logistic ▶️phitter:gen_logistic 📊gen_logistic.xlsx 🌐gs:gen_logistic
generalized_normal ▶️phitter:gen_normal 📊gen_normal.xlsx 🌐gs:gen_normal
generalized_pareto ▶️phitter:gen_pareto 📊gen_pareto.xlsx 🌐gs:gen_pareto
gibrat ▶️phitter:gibrat 📊gibrat.xlsx 🌐gs:gibrat
gumbel_left ▶️phitter:gumbel_left 📊gumbel_left.xlsx 🌐gs:gumbel_left
gumbel_right ▶️phitter:gumbel_right 📊gumbel_right.xlsx 🌐gs:gumbel_right
half_normal ▶️phitter:half_normal 📊half_normal.xlsx 🌐gs:half_normal
hyperbolic_secant ▶️phitter:hyperbolic_secant 📊hyperbolic_secant.xlsx 🌐gs:hyperbolic_secant
inverse_gamma ▶️phitter:inverse_gamma 📊inverse_gamma.xlsx 🌐gs:inverse_gamma
inverse_gamma_3p ▶️phitter:inverse_gamma_3p 📊inverse_gamma_3p.xlsx 🌐gs:inverse_gamma_3p
inverse_gaussian ▶️phitter:inverse_gaussian 📊inverse_gaussian.xlsx 🌐gs:inverse_gaussian
inverse_gaussian_3p ▶️phitter:inverse_gaussian_3p 📊inverse_gaussian_3p.xlsx 🌐gs:inverse_gaussian_3p
johnson_sb ▶️phitter:johnson_sb 📊johnson_sb.xlsx 🌐gs:johnson_sb
johnson_su ▶️phitter:johnson_su 📊johnson_su.xlsx 🌐gs:johnson_su
kumaraswamy ▶️phitter:kumaraswamy 📊kumaraswamy.xlsx 🌐gs:kumaraswamy
laplace ▶️phitter:laplace 📊laplace.xlsx 🌐gs:laplace
levy ▶️phitter:levy 📊levy.xlsx 🌐gs:levy
loggamma ▶️phitter:loggamma 📊loggamma.xlsx 🌐gs:loggamma
logistic ▶️phitter:logistic 📊logistic.xlsx 🌐gs:logistic
loglogistic ▶️phitter:loglogistic 📊loglogistic.xlsx 🌐gs:loglogistic
loglogistic_3p ▶️phitter:loglogistic_3p 📊loglogistic_3p.xlsx 🌐gs:loglogistic_3p
lognormal ▶️phitter:lognormal 📊lognormal.xlsx 🌐gs:lognormal
maxwell ▶️phitter:maxwell 📊maxwell.xlsx 🌐gs:maxwell
moyal ▶️phitter:moyal 📊moyal.xlsx 🌐gs:moyal
nakagami ▶️phitter:nakagami 📊nakagami.xlsx 🌐gs:nakagami
non_central_chi_square ▶️phitter:non_central_chi_square 📊non_central_chi_square.xlsx 🌐gs:non_central_chi_square
non_central_f ▶️phitter:non_central_f 📊non_central_f.xlsx 🌐gs:non_central_f
non_central_t_student ▶️phitter:non_central_t_student 📊non_central_t_student.xlsx 🌐gs:non_central_t_student
normal ▶️phitter:normal 📊normal.xlsx 🌐gs:normal
pareto_first_kind ▶️phitter:pareto_first_kind 📊pareto_first_kind.xlsx 🌐gs:pareto_first_kind
pareto_second_kind ▶️phitter:pareto_second_kind 📊pareto_second_kind.xlsx 🌐gs:pareto_second_kind
pert ▶️phitter:pert 📊pert.xlsx 🌐gs:pert
power_function ▶️phitter:power_function 📊power_function.xlsx 🌐gs:power_function
rayleigh ▶️phitter:rayleigh 📊rayleigh.xlsx 🌐gs:rayleigh
reciprocal ▶️phitter:reciprocal 📊reciprocal.xlsx 🌐gs:reciprocal
rice ▶️phitter:rice 📊rice.xlsx 🌐gs:rice
semicircular ▶️phitter:semicircular 📊semicircular.xlsx 🌐gs:semicircular
t_student ▶️phitter:t_student 📊t_student.xlsx 🌐gs:t_student
t_student_3p ▶️phitter:t_student_3p 📊t_student_3p.xlsx 🌐gs:t_student_3p
trapezoidal ▶️phitter:trapezoidal 📊trapezoidal.xlsx 🌐gs:trapezoidal
triangular ▶️phitter:triangular 📊triangular.xlsx 🌐gs:triangular
uniform ▶️phitter:uniform 📊uniform.xlsx 🌐gs:uniform
weibull ▶️phitter:weibull 📊weibull.xlsx 🌐gs:weibull
weibull_3p ▶️phitter:weibull_3p 📊weibull_3p.xlsx 🌐gs:weibull_3p

Discrete Distributions

1. PDF File Documentation Discrete Distributions

2. Resources Discrete Distributions

Distribution Phitter Playground Excel File Google Sheets Files
bernoulli ▶️phitter:bernoulli 📊bernoulli.xlsx 🌐gs:bernoulli
binomial ▶️phitter:binomial 📊binomial.xlsx 🌐gs:binomial
geometric ▶️phitter:geometric 📊geometric.xlsx 🌐gs:geometric
hypergeometric ▶️phitter:hypergeometric 📊hypergeometric.xlsx 🌐gs:hypergeometric
logarithmic ▶️phitter:logarithmic 📊logarithmic.xlsx 🌐gs:logarithmic
negative_binomial ▶️phitter:negative_binomial 📊negative_binomial.xlsx 🌐gs:negative_binomial
poisson ▶️phitter:poisson 📊poisson.xlsx 🌐gs:poisson
uniform ▶️phitter:uniform 📊uniform.xlsx 🌐gs:uniform

Benchmarks

Fit time continuous distributions

| Sample Size / Workers | 1 | 2 | 6 | 10 | 20 |
|---|---|---|---|---|---|
| 1K | 8.2981 | 7.1242 | 8.9667 | 9.9287 | 16.2246 |
| 10K | 20.8711 | 14.2647 | 10.5612 | 11.6004 | 17.8562 |
| 100K | 152.6296 | 97.2359 | 57.7310 | 51.6182 | 53.2313 |
| 500K | 914.9291 | 640.8153 | 370.0323 | 267.4597 | 257.7534 |
| 1M | 1580.8501 | 972.3985 | 573.5429 | 496.5569 | 425.7809 |
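
The fit times above show that adding workers pays off only for larger samples. The short script below, which copies the reported values verbatim, finds the fastest worker count for each sample size and the speedup relative to a single worker. The helper names are hypothetical, and times are kept in the table's own units.

```python
# Worker counts and fit times copied from the table above.
WORKERS = [1, 2, 6, 10, 20]
FIT_TIMES = {
    "1K": [8.2981, 7.1242, 8.9667, 9.9287, 16.2246],
    "10K": [20.8711, 14.2647, 10.5612, 11.6004, 17.8562],
    "100K": [152.6296, 97.2359, 57.7310, 51.6182, 53.2313],
    "500K": [914.9291, 640.8153, 370.0323, 267.4597, 257.7534],
    "1M": [1580.8501, 972.3985, 573.5429, 496.5569, 425.7809],
}


def fastest_worker_count(times):
    """Return the worker count with the lowest reported fit time."""
    best_index = min(range(len(times)), key=times.__getitem__)
    return WORKERS[best_index]


def speedup_over_single_worker(times):
    """Ratio of the single-worker time to the best reported time."""
    return times[0] / min(times)


best = {size: fastest_worker_count(t) for size, t in FIT_TIMES.items()}
speedup_1m = speedup_over_single_worker(FIT_TIMES["1M"])

for size, times in FIT_TIMES.items():
    print(f"{size}: fastest with {best[size]} workers, "
          f"speedup {speedup_over_single_worker(times):.2f}x")
```

For these measurements, 2 workers are fastest at 1K samples, while 20 workers are fastest at 500K and 1M samples.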

Fit time discrete distributions

| Sample Size / Workers | 1 | 2 | 4 |
|---|---|---|---|
| 1K | 0.1688 | 2.6402 | 2.8719 |
| 10K | 0.4462 | 2.4452 | 3.0471 |
| 100K | 4.5598 | 6.3246 | 7.5869 |
| 500K | 19.0172 | 21.8047 | 19.8420 |
| 1M | 39.8065 | 29.8360 | 30.2334 |

Estimation time parameters continuous distributions

Distribution / Sample Size 1K 10K 100K 500K 1M 10M
alpha 0.3345 0.4625 2.5933 18.3856 39.6533 362.2951
arcsine 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
argus 0.0559 0.2050 2.2472 13.3928 41.5198 362.2472
beta 0.1880 0.1790 0.1940 0.2110 0.1800 0.3134
beta_prime 0.1766 0.7506 7.6039 40.4264 85.0677 812.1323
beta_prime_4p 0.0720 0.3630 3.9478 20.2703 40.2709 413.5239
bradford 0.0110 0.0000 0.0000 0.0000 0.0000 0.0010
burr 0.0733 0.6931 5.5425 36.7684 79.8269 668.2016
burr_4p 0.1552 0.7981 8.4716 44.4549 87.7292 858.0035
cauchy 0.0090 0.0160 0.1581 1.1052 2.1090 21.5244
chi_square 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
chi_square_3p 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
dagum 0.3381 0.8278 9.6907 45.5855 98.6691 917.6713
dagum_4p 0.3646 1.3307 13.3437 70.9462 140.9371 1396.3368
erlang 0.0010 0.0000 0.0000 0.0000 0.0000 0.0000
erlang_3p 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
error_function 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
exponential 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
exponential_2p 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
f 0.0592 0.2948 2.6920 18.9458 29.9547 402.2248
fatigue_life 0.0352 0.1101 1.7085 9.0090 20.4702 186.9631
folded_normal 0.0020 0.0020 0.0020 0.0022 0.0033 0.0040
frechet 0.1313 0.4359 5.7031 39.4202 43.2469 671.3343
f_4p 0.3269 0.7517 0.6183 0.6037 0.5809 0.2073
gamma 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
gamma_3p 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
generalized_extreme_value 0.0833 0.2054 2.0337 10.3301 22.1340 243.3120
generalized_gamma 0.0298 0.0178 0.0227 0.0236 0.0170 0.0241
generalized_gamma_4p 0.0371 0.0116 0.0732 0.0725 0.0707 0.0730
generalized_logistic 0.1040 0.1073 0.1037 0.0819 0.0989 0.0836
generalized_normal 0.0154 0.0736 0.7367 2.4831 5.9752 55.2417
generalized_pareto 0.3189 0.8978 8.9370 51.3813 101.6832 1015.2933
gibrat 0.0328 0.0432 0.4287 2.7159 5.5721 54.1702
gumbel_left 0.0000 0.0000 0.0000 0.0000 0.0010 0.0010
gumbel_right 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
half_normal 0.0010 0.0000 0.0000 0.0010 0.0000 0.0000
hyperbolic_secant 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
inverse_gamma 0.0308 0.0632 0.7233 5.0127 10.7885 99.1316
inverse_gamma_3p 0.0787 0.1472 1.6513 11.1161 23.4587 227.6125
inverse_gaussian 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
inverse_gaussian_3p 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
johnson_sb 0.2966 0.7466 4.0707 40.2028 56.2130 728.2447
johnson_su 0.0070 0.0010 0.0010 0.0143 0.0010 0.0010
kumaraswamy 0.0164 0.0120 0.0130 0.0123 0.0125 0.0150
laplace 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
levy 0.0100 0.0314 0.2296 1.1365 2.7211 26.4966
loggamma 0.0085 0.0050 0.0050 0.0070 0.0062 0.0080
logistic 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
loglogistic 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
loglogistic_3p 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
lognormal 0.0000 0.0000 0.0000 0.0000 0.0010 0.0000
maxwell 0.0000 0.0000 0.0000 0.0000 0.0000 0.0010
moyal 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
nakagami 0.0000 0.0030 0.0213 0.1215 0.2649 2.2457
non_central_chi_square 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
non_central_f 0.0190 0.0182 0.0210 0.0192 0.0190 0.0200
non_central_t_student 0.0874 0.0822 0.0862 0.1314 0.2516 0.1781
normal 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
pareto_first_kind 0.0010 0.0030 0.0390 0.2494 0.5226 5.5246
pareto_second_kind 0.0643 0.1522 1.1722 10.9871 23.6534 201.1626
pert 0.0052 0.0030 0.0030 0.0040 0.0040 0.0092
power_function 0.0075 0.0040 0.0040 0.0030 0.0040 0.0040
rayleigh 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
reciprocal 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
rice 0.0182 0.0030 0.0040 0.0060 0.0030 0.0050
semicircular 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
trapezoidal 0.0083 0.0072 0.0073 0.0060 0.0070 0.0060
triangular 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
t_student 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
t_student_3p 0.3892 1.1860 11.2759 71.1156 143.1939 1409.8578
uniform 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
weibull 0.0010 0.0000 0.0000 0.0000 0.0010 0.0010
weibull_3p 0.0061 0.0040 0.0030 0.0040 0.0050 0.0050

Estimation time parameters discrete distributions

Distribution / Sample Size 1K 10K 100K 500K 1M 10M
bernoulli 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
binomial 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
geometric 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
hypergeometric 0.0773 0.0061 0.0030 0.0020 0.0030 0.0051
logarithmic 0.0210 0.0035 0.0171 0.0050 0.0030 0.0756
negative_binomial 0.0293 0.0000 0.0000 0.0000 0.0000 0.0000
poisson 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
uniform 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000

Contribution

If you would like to contribute to the Phitter project, please create a pull request with your proposed changes or enhancements. All contributions are welcome!
