distfit is a Python library for probability density fitting.
Key Features
| Feature | Description | Medium | Gumroad+Podcast |
|---|---|---|---|
| Parametric Fitting | Fit distributions on empirical data X. | Link | Link |
| Non-Parametric Fitting | Fit distributions on empirical data X using non-parametric approaches (quantile, percentiles). | - | - |
| Multivariate Fitting | Fit multivariate distributions on empirical data X that contains multiple columns. | - | - |
| Discrete Fitting | Fit distributions on empirical data X using binomial distribution. | - | - |
| Predict | Compute probabilities for response variables y. | - | - |
| Outlier Detection | Detect anomalies using fitted distributions. | Link | Link |
| Synthetic Data | Generate synthetic data. | Link | Link |
| Plots | Various plotting functionalities. | - | - |
Resources and Links
- Example Notebooks: Examples
- Medium Blogs: Medium
- Gumroad Blogs with podcast: Gumroad
- Documentation: Website
- Bug Reports and Feature Requests: GitHub Issues
Background
- For the parametric approach, the distfit library can determine the best fit across 89 theoretical distributions. To score the fit, one of the goodness-of-fit scoring statistics can be used, such as RSS/SSE, Wasserstein, Kolmogorov-Smirnov (KS), or Energy. After finding the best-fitted theoretical distribution, the loc, scale, and arg parameters are returned, such as the mean and standard deviation for the normal distribution.
- For the non-parametric approach, the distfit library contains two methods: the quantile method and the percentile method. Both methods assume that the data does not follow a specific probability distribution. The quantile method models the quantiles of the data, whereas the percentile method models the percentiles.
- If the dataset contains discrete values, the distfit library offers discrete fitting. The best fit is then derived using the binomial distribution.
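To make the RSS/SSE scoring idea concrete, here is a minimal sketch that uses only the Python standard library, not distfit itself. It compares a density histogram of the data with the PDF of two hand-fitted candidates (normal and uniform) and keeps the candidate with the lowest residual sum of squares. The helper names (`density_histogram`, `rss`) and the bin count are illustrative assumptions; distfit's internal binning and parameter estimation may differ.

```python
import random
from statistics import NormalDist, fmean, stdev


def density_histogram(data, bins=25):
    """Return bin centers and normalized densities (total area is 1)."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in data:
        # Clamp the maximum value into the last bin.
        idx = min(int((x - lo) / width), bins - 1)
        counts[idx] += 1
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    densities = [c / (len(data) * width) for c in counts]
    return centers, densities


def rss(centers, densities, pdf):
    """Residual sum of squares between empirical and theoretical density."""
    return sum((d - pdf(c)) ** 2 for c, d in zip(centers, densities))


# Reproducible test data: normal with mean 0 and standard deviation 2.
rng = random.Random(0)
X = [rng.gauss(0, 2) for _ in range(5000)]

# Candidate 1: normal distribution with loc=mean and scale=standard deviation.
normal = NormalDist(fmean(X), stdev(X))

# Candidate 2: uniform distribution over the observed range.
x_min, x_max = min(X), max(X)


def uniform_pdf(x):
    return 1.0 / (x_max - x_min) if x_min <= x <= x_max else 0.0


centers, densities = density_histogram(X)
scores = {
    "norm": rss(centers, densities, normal.pdf),
    "uniform": rss(centers, densities, uniform_pdf),
}
best = min(scores, key=scores.get)
print(best, scores)
```

A lower RSS means the candidate PDF follows the histogram more closely, which is why the candidate with the minimum score is selected.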
Installation
Install distfit from PyPI
pip install distfit
Install from GitHub source
pip install git+https://github.com/erdogant/distfit
Import Library
import distfit
print(distfit.__version__)
# Import library
from distfit import distfit
Examples
Example: Quick start to find best fit for your input data
# [distfit] >INFO> fit
# [distfit] >INFO> transform
# [distfit] >INFO> [norm ] [0.00 sec] [RSS: 0.00108326] [loc=-0.048 scale=1.997]
# [distfit] >INFO> [expon ] [0.00 sec] [RSS: 0.404237] [loc=-6.897 scale=6.849]
# [distfit] >INFO> [pareto ] [0.00 sec] [RSS: 0.404237] [loc=-536870918.897 scale=536870912.000]
# [distfit] >INFO> [dweibull ] [0.06 sec] [RSS: 0.0115552] [loc=-0.031 scale=1.722]
# [distfit] >INFO> [t ] [0.59 sec] [RSS: 0.00108349] [loc=-0.048 scale=1.997]
# [distfit] >INFO> [genextreme] [0.17 sec] [RSS: 0.00300806] [loc=-0.806 scale=1.979]
# [distfit] >INFO> [gamma ] [0.05 sec] [RSS: 0.00108459] [loc=-1862.903 scale=0.002]
# [distfit] >INFO> [lognorm ] [0.32 sec] [RSS: 0.00121597] [loc=-110.597 scale=110.530]
# [distfit] >INFO> [beta ] [0.10 sec] [RSS: 0.00105629] [loc=-16.364 scale=32.869]
# [distfit] >INFO> [uniform ] [0.00 sec] [RSS: 0.287339] [loc=-6.897 scale=14.437]
# [distfit] >INFO> [loggamma ] [0.12 sec] [RSS: 0.00109042] [loc=-370.746 scale=55.722]
# [distfit] >INFO> Compute confidence intervals [parametric]
# [distfit] >INFO> Compute significance for 9 samples.
# [distfit] >INFO> Multiple test correction method applied: [fdr_bh].
# [distfit] >INFO> Create PDF plot for the parametric method.
# [distfit] >INFO> Mark 5 significant regions
# [distfit] >INFO> Estimated distribution: beta [loc:-16.364265, scale:32.868811]
Example: Fit multivariate distributions
The distfit library provides multivariate distribution fitting that enables modeling complex dependencies between multiple variables using copula-based methods.
from distfit import distfit
# Initialize with multivariate mode
dfit = distfit(multivariate=True)
# Load example data
X = dfit.import_example(data='multi_normal')
# X = dfit.import_example(data='multi_t')
# Fit model
dfit.fit_transform(X)
# Access estimated correlation matrix (Gaussian copula)
print(dfit.model.corr)
# Evaluate joint density
results = dfit.evaluate_pdf(X)
print(results['score'])
print(results['copula_density'])
# Generate synthetic samples
Xnew = dfit.generate(n=10)
# Detect multivariate outliers
bool_outliers = dfit.predict_outliers(X)
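As background for the Gaussian copula mentioned above, the following standard-library sketch shows one common way to estimate a copula correlation: rank-transform each column, map the ranks to standard-normal scores, and take the Pearson correlation of those scores. This is an illustration of the general technique only; the function names are hypothetical and it is not claimed to be how distfit computes `dfit.model.corr`.

```python
import math
import random
from statistics import NormalDist, fmean


def normal_scores(column):
    """Map a column to standard-normal scores via its ranks."""
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    std_normal = NormalDist()
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        # rank / (n + 1) keeps the probability strictly inside (0, 1).
        scores[i] = std_normal.inv_cdf(rank / (n + 1))
    return scores


def pearson(a, b):
    """Plain Pearson correlation coefficient."""
    ma, mb = fmean(a), fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


rng = random.Random(1)
x1 = [rng.gauss(0, 1) for _ in range(2000)]
# A strictly monotone, non-linear transform of the first column.
x2_monotone = [math.exp(v) for v in x1]
# The same transform with independent noise added.
x2_noisy = [math.exp(v) + rng.gauss(0, 1) for v in x1]

z1 = normal_scores(x1)
rho_monotone = pearson(z1, normal_scores(x2_monotone))
rho_noisy = pearson(z1, normal_scores(x2_noisy))
print(rho_monotone, rho_noisy)
```

Because only ranks enter the computation, the estimate is unaffected by monotone transforms of the individual columns, which is what lets a copula separate the dependence structure from the marginal distributions.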
Example: Plot summary of the tested distributions
Once a model is fitted, we can make predictions using the theoretical distribution. Plotting again after predicting automatically includes the predictions.
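The log output earlier mentions computing significance for new samples with the `fdr_bh` multiple-test correction. The sketch below illustrates that idea with the standard library only: it assumes a normal model with the loc and scale reported for `norm` in the quick-start log, computes a one-sided tail probability for each new value, and applies a Benjamini-Hochberg adjustment. The values in `y`, the `alpha` threshold, and the labels are illustrative assumptions, and distfit's own definition of the probabilities may differ.

```python
from statistics import NormalDist

# Assumed fitted model: normal with loc=-0.048 and scale=1.997.
fitted = NormalDist(mu=-0.048, sigma=1.997)

# Hypothetical new observations to evaluate.
y = [-8, -6, 0, 1, 2, 3, 4, 5, 6]
alpha = 0.05


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest to enforce monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# One-sided tail probability: the smaller of the lower and upper tail.
pvalues = [min(fitted.cdf(v), 1.0 - fitted.cdf(v)) for v in y]
adjusted = benjamini_hochberg(pvalues)

labels = []
for value, p_adj in zip(y, adjusted):
    if p_adj > alpha:
        labels.append("none")
    elif value < fitted.mean:
        labels.append("down")
    else:
        labels.append("up")

for value, p_adj, label in zip(y, adjusted, labels):
    print(f"y={value:>3}  p_adj={p_adj:.5f}  {label}")
```

Values labeled `down` or `up` fall in the lower or upper tail of the assumed model after correction; `none` means the value is consistent with the fitted distribution at the chosen `alpha`.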
Example: Make predictions using the fitted distribution
Example: Test for one specific distribution
The full list of distributions is listed here: https://erdogant.github.io/distfit/pages/html/Parametric.html
Example: Test for multiple distributions
The full list of distributions is listed here: https://erdogant.github.io/distfit/pages/html/Parametric.html
Example: Fit discrete distribution
from scipy.stats import binom
# Generate random numbers
# Set parameters for the test-case
n = 8
p = 0.5
# Generate 10000 samples of the distribution of (n, p)
X = binom(n, p).rvs(10000)
print(X)
# [5 1 4 5 5 6 2 4 6 5 4 4 4 7 3 4 4 2 3 3 4 4 5 1 3 2 7 4 5 2 3 4 3 3 2 3 5
# 4 6 7 6 2 4 3 3 5 3 5 3 4 4 4 7 5 4 5 3 4 3 3 4 3 3 6 3 3 5 4 4 2 3 2 5 7
# 5 4 8 3 4 3 5 4 3 5 5 2 5 6 7 4 5 5 5 4 4 3 4 5 6 2...]
# Import distfit
from distfit import distfit
# Initialize for discrete distribution fitting
dfit = distfit(method='discrete')
# Run distfit to determine whether the parameters can be recovered from the data.
dfit.fit_transform(X)
# [distfit] >fit..
# [distfit] >transform..
# [distfit] >Fit using binomial distribution..
# [distfit] >[binomial] [SSE: 7.79] [n: 8] [p: 0.499959] [chi^2: 1.11]
# [distfit] >Compute confidence interval [discrete]
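To show what fitting a binomial by SSE can look like, here is a small standard-library sketch. It is not distfit's implementation: the function `fit_binomial`, the search range for `n`, and the choice of `p = mean / n` for each candidate `n` are assumptions made for illustration. To keep the result reproducible, the data are built with exact binomial(8, 0.5) proportions instead of random draws.

```python
from collections import Counter
from math import comb


def binom_pmf(k, n, p):
    """Binomial probability mass function P(K = k)."""
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def fit_binomial(data, max_extra_trials=5):
    """Pick (n, p) minimizing the SSE between observed and binomial frequencies."""
    total = len(data)
    mean = sum(data) / total
    counts = Counter(data)
    # n cannot be smaller than the largest observed value (and must be >= 1).
    n_start = max(max(data), 1)
    best = None
    for n in range(n_start, n_start + max_extra_trials + 1):
        p = mean / n  # maximum-likelihood estimate of p for a fixed n
        sse = sum(
            (counts.get(k, 0) / total - binom_pmf(k, n, p)) ** 2
            for k in range(n + 1)
        )
        if best is None or sse < best[2]:
            best = (n, p, sse)
    return best


# Deterministic data: value k appears comb(8, k) * 40 times (10240 values).
X = [k for k in range(9) for _ in range(comb(8, k) * 40)]

n_hat, p_hat, sse = fit_binomial(X)
print(n_hat, p_hat, sse)
```

With real, randomly sampled data the SSE will not be zero, and the selected `n` can be sensitive to sampling noise, so treat the search range as a tuning choice.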
Example: Make predictions on unseen data for discrete distribution
Example: Generate samples based on the fitted distribution
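As a conceptual illustration of sample generation, the sketch below draws synthetic data from a fitted distribution by inverse transform sampling: uniform random numbers are pushed through the distribution's quantile function. It uses only the standard library and assumes a normal model with the loc and scale reported for `norm` in the quick-start log; the helper `generate` here is a hypothetical stand-alone function, not the distfit method of the same name.

```python
import random
from statistics import NormalDist, fmean, stdev

# Assumed fitted model: normal with loc=-0.048 and scale=1.997.
fitted = NormalDist(mu=-0.048, sigma=1.997)


def generate(dist, n, seed=None):
    """Draw n samples from dist via inverse transform sampling."""
    rng = random.Random(seed)
    samples = []
    while len(samples) < n:
        u = rng.random()
        # inv_cdf is only defined on the open interval (0, 1).
        if u > 0.0:
            samples.append(dist.inv_cdf(u))
    return samples


Xnew = generate(fitted, n=20000, seed=42)
print(len(Xnew), fmean(Xnew), stdev(Xnew))
```

The same approach works for any distribution with an available quantile function, which is why fitted parametric models are convenient for generating synthetic data.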
Star history
Contributors
Thank the contributors!
Maintainer