Find the best probability distribution for your data
bestdist
bestdist is a Python package that helps you identify which probability distribution best fits your data using statistical tests and information criteria.
Features
- Automatic Distribution Fitting: Test multiple distributions at once
- Statistical Tests: Kolmogorov-Smirnov, Anderson-Darling, Chi-square
- Information Criteria: AIC and BIC for model selection
- Visualization: Built-in plotting for fit assessment
- Extensible: Easy to add custom distributions
- Pandas Integration: Works seamlessly with pandas DataFrames
- Type Hints: Full type annotation support
- Well Tested: Comprehensive test suite
Installation
From PyPI
pip install bestdist
From source
git clone https://github.com/Wilmar3752/pdist.git
cd pdist
pip install -e .
Development installation
pip install -e ".[dev]"
Quick Start
from bestdist import DistributionFitter
import numpy as np
# Your data (can be list, numpy array, or pandas Series)
data = np.random.gamma(2, 2, 1000)
# Create fitter and find best distribution
fitter = DistributionFitter(data)
results = fitter.fit()
# Get best distribution
best = fitter.get_best_distribution()
print(f"Best fit: {best['distribution']}")
print(f"Parameters: {best['parameters']}")
print(f"P-value: {best['p_value']:.4f}")
# View summary of all fits
print(fitter.summary())
# Visualize the best fit
fitter.plot_best_fit()
# Compare all distributions
fitter.compare_distributions()
Supported Distributions
Continuous Distributions
- Normal (Gaussian): Symmetric, bell-shaped
- Gamma: Skewed, positive values
- Beta: Bounded [0, 1], flexible shapes
- Weibull: Common in reliability engineering
Coming Soon
- Lognormal
- Exponential
- Uniform
- Student's t
- Chi-square
- Poisson (discrete)
- Binomial (discrete)
Advanced Usage
Custom Distribution List
from bestdist import DistributionFitter
from bestdist.distributions.continuous import Normal, Gamma, Beta
# Only fit specific distributions
fitter = DistributionFitter(
    data,
    distributions=[Normal, Gamma, Beta]
)
results = fitter.fit()
Selection Criteria
# Select best by different criteria
best_pvalue = fitter.get_best_distribution(criterion='p_value')
best_aic = fitter.get_best_distribution(criterion='aic')
best_bic = fitter.get_best_distribution(criterion='bic')
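To make the AIC and BIC criteria concrete, here is a minimal plain-Python sketch of the standard formulas, AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L, where k is the number of fitted parameters, n the sample size and L the maximised likelihood. It does not use bestdist and is not taken from its source. The helper names (normal_loglik, exponential_loglik, aic, bic) are hypothetical and the two candidate models are fitted with their closed-form maximum-likelihood estimates. Lower AIC or BIC indicates a better trade-off between fit and parameter count.

```python
import math
import random


def normal_loglik(data):
    """Maximised log-likelihood of a normal model (MLE mean and variance)."""
    n = len(data)
    mu = sum(data) / n
    var = sum((x - mu) ** 2 for x in data) / n
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)


def exponential_loglik(data):
    """Maximised log-likelihood of an exponential model (MLE rate = 1 / mean)."""
    n = len(data)
    mean = sum(data) / n
    return -n * (math.log(mean) + 1)


def aic(loglik, k):
    """Akaike information criterion for k fitted parameters."""
    return 2 * k - 2 * loglik


def bic(loglik, k, n):
    """Bayesian information criterion for k parameters and n observations."""
    return k * math.log(n) - 2 * loglik


# Synthetic exponential data with mean 2 (illustrative only).
rng = random.Random(0)
data = [rng.expovariate(0.5) for _ in range(1000)]
n = len(data)

# Each entry: (maximised log-likelihood, number of fitted parameters).
scores = {
    "normal": (normal_loglik(data), 2),
    "exponential": (exponential_loglik(data), 1),
}
for name, (ll, k) in scores.items():
    print(f"{name:12s} AIC={aic(ll, k):.1f}  BIC={bic(ll, k, n):.1f}")
```

On this sample the exponential model should score lower on both criteria, since it matches the generating process and uses one parameter fewer.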
Individual Distribution Usage
from bestdist.distributions.continuous import Normal
import numpy as np
# Generate data
data = np.random.normal(5, 2, 1000)
# Fit distribution
dist = Normal(data)
params = dist.fit()
print(f"Mean: {dist.mean:.2f}")
print(f"Std: {dist.std:.2f}")
# Test goodness of fit
ks_stat, p_value = dist.test_goodness_of_fit()
print(f"KS statistic: {ks_stat:.4f}, p-value: {p_value:.4f}")
# Generate samples
samples = dist.rvs(size=100, random_state=42)
# Evaluate PDF/CDF
x = np.linspace(0, 10, 100)
pdf_values = dist.pdf(x)
cdf_values = dist.cdf(x)
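For readers who want to see what a KS statistic measures, the following sketch computes the one-sample Kolmogorov-Smirnov distance by hand: the largest gap between the empirical CDF of the data and a candidate CDF. It uses only the standard library, with statistics.NormalDist standing in for a fitted distribution. The function name ks_statistic is hypothetical and nothing here reflects bestdist internals. One caveat worth keeping in mind: when the parameters are estimated from the same data being tested, standard KS p-values tend to be optimistic.

```python
import random
from statistics import NormalDist, fmean, pstdev


def ks_statistic(data, cdf):
    """One-sample Kolmogorov-Smirnov statistic: sup |ECDF(x) - cdf(x)|."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The ECDF jumps from i/n to (i+1)/n at x; check both sides of the jump.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


# Synthetic normal data, mean 5 and standard deviation 2 (illustrative only).
rng = random.Random(42)
sample = [rng.gauss(5, 2) for _ in range(1000)]

# A normal with parameters estimated from the sample, and a deliberately bad one.
fitted = NormalDist(fmean(sample), pstdev(sample))
wrong = NormalDist(0, 1)

print(f"KS vs fitted normal:   {ks_statistic(sample, fitted.cdf):.4f}")
print(f"KS vs standard normal: {ks_statistic(sample, wrong.cdf):.4f}")
```

A small statistic means the candidate CDF tracks the data closely; the mismatched standard normal should produce a much larger value.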
Working with Pandas
import pandas as pd
from bestdist import DistributionFitter
# Load data
df = pd.read_csv('data.csv')
# Fit distribution to a column
fitter = DistributionFitter(df['column_name'])
best = fitter.get_best_distribution()
# Get summary as DataFrame
summary_df = fitter.summary()
print(summary_df)
Custom Distributions
from bestdist.core.base import BaseDistribution
from scipy.stats import expon, rv_continuous
from typing import Tuple
class Exponential(BaseDistribution):
    """Custom exponential distribution."""

    def _get_scipy_dist(self) -> rv_continuous:
        return expon

    def _extract_params(self, fit_result: Tuple) -> dict:
        return {
            'loc': float(fit_result[0]),
            'scale': float(fit_result[1])
        }
# Use your custom distribution
fitter = DistributionFitter(data, distributions=[Exponential])
results = fitter.fit()
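When writing _extract_params for other distributions, it helps to know the ordering SciPy uses for the tuple returned by a continuous distribution's fit method: shape parameters first, then loc, then scale. The sketch below illustrates that ordering with a hypothetical helper, name_fit_result, which is not part of bestdist. It assumes fit_result is such a SciPy-style tuple, and the tuples shown are made-up numbers rather than real fit output.

```python
from typing import Dict, Sequence, Tuple


def name_fit_result(
    fit_result: Tuple[float, ...],
    shape_names: Sequence[str] = (),
) -> Dict[str, float]:
    """Label a SciPy-style fit tuple: shape parameters, then loc, then scale."""
    names = [*shape_names, "loc", "scale"]
    if len(fit_result) != len(names):
        raise ValueError(f"expected {len(names)} values, got {len(fit_result)}")
    return {name: float(value) for name, value in zip(names, fit_result)}


# Illustrative tuples (made-up numbers, not real fit output).
print(name_fit_result((0.0, 2.1)))              # expon-like: (loc, scale)
print(name_fit_result((1.9, 0.0, 2.1), ["a"]))  # gamma-like: (a, loc, scale)
```

A custom class for a distribution with shape parameters would index fit_result accordingly, for example position 0 for the gamma shape a, position 1 for loc and position 2 for scale.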
API Reference
DistributionFitter
Main class for fitting multiple distributions.
Parameters:
- data: Array-like data to fit
- distributions: List of distribution classes (default: all available)
- method: Goodness-of-fit test method ('ks', 'ad', 'chi2')
Methods:
- fit(verbose=True): Fit all distributions
- get_best_distribution(criterion='p_value'): Get best fit
- summary(top_n=None): Get summary DataFrame
- plot_best_fit(bins=30): Plot best fit distribution
- compare_distributions(): Compare all fits
BaseDistribution
Abstract base class for distributions.
Methods:
- fit(): Fit distribution to data
- test_goodness_of_fit(method='ks'): Perform GOF test
- pdf(x): Probability density function
- cdf(x): Cumulative distribution function
- ppf(q): Percent point function (inverse CDF)
- rvs(size, random_state): Generate random samples
- get_info(): Get distribution information
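The relationship between pdf, cdf and ppf can be illustrated with the standard library alone. In this sketch statistics.NormalDist stands in for a fitted distribution object (its inv_cdf method plays the role of ppf); it is an assumption-laden illustration of the concepts, not a demonstration of the bestdist classes.

```python
from statistics import NormalDist

# Stand-in for a fitted normal distribution (mean 5, standard deviation 2).
dist = NormalDist(mu=5, sigma=2)

q = 0.975
x = dist.inv_cdf(q)  # ppf: the value below which a fraction q of the mass lies
print(f"ppf({q}) = {x:.3f}")
print(f"cdf(ppf({q})) = {dist.cdf(x):.3f}")  # round-trips back to q

# The pdf is the slope of the cdf; a central difference approximates it.
h = 1e-5
slope = (dist.cdf(x + h) - dist.cdf(x - h)) / (2 * h)
print(f"pdf = {dist.pdf(x):.5f}, numerical slope of cdf = {slope:.5f}")
```

The round trip cdf(ppf(q)) = q is what makes ppf useful for quantiles and confidence bounds.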
Testing
Run the test suite:
# Run all tests
pytest
# Run with coverage
pytest --cov=pdist --cov-report=html
# Run specific test file
pytest tests/test_distributions/test_normal.py
Development
Setup Development Environment
# Clone repository
git clone https://github.com/yourusername/pdist.git
cd pdist
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install in development mode
pip install -e ".[dev]"
# Install pre-commit hooks
pre-commit install
Code Quality
# Format code
black src tests
# Sort imports
isort src tests
# Lint
flake8 src tests
# Type checking
mypy src
Project Structure
pdist/
├── src/pdist/
│   ├── __init__.py
│   ├── core/
│   │   ├── base.py          # Abstract base class
│   │   └── fitter.py        # Main fitter
│   ├── distributions/
│   │   └── continuous/
│   │       ├── normal.py
│   │       ├── gamma.py
│   │       ├── beta.py
│   │       └── weibull.py
│   └── utils/
│       ├── exceptions.py
│       └── types.py
├── tests/
│   ├── test_distributions/
│   ├── test_core/
│   └── conftest.py
├── pyproject.toml
└── README.md
Contributing
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Make your changes
- Run tests (pytest)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Citation
If you use this package in your research, please cite:
@software{bestdist2024,
author = {Sepulveda, Wilmar},
title = {bestdist: Find the best probability distribution for your data},
year = {2024},
url = {https://github.com/Wilmar3752/pdist}
}
Roadmap
- Add more distributions (lognormal, exponential, etc.)
- Support for discrete distributions
- Parallel fitting for large datasets
- GUI/Web interface
- Integration with scikit-learn
- Bayesian model selection
- Mixture distributions
Acknowledgments
- Built with scipy and numpy
- Inspired by the need for easy distribution fitting in data science workflows
Contact
- GitHub: @Wilmar3752
- Email: your.email@example.com
Made with ❤️ for the data science community