Intuitive syntax for complex statistical model fitting and parameter sensitivity analysis.
About
Pykelihood is a Python package for statistical analysis designed to give more flexibility to likelihood-based inference than is possible with scipy.stats. Distributions are designed from an Object Oriented Programming (OOP) point of view.
Main features include:
use any scipy.stats distribution, or make your own,
fit distributions of arbitrary complexity to your data,
add trends of different forms in the parameters of any distribution,
condition the log-likelihood with any form of penalty,
profile parameters with a penalised log-likelihood,
more to come…
Installation
Using pip
pip install pykelihood
From sources
git clone https://www.github.com/OpheliaMiralles/pykelihood
or
gh repo clone OpheliaMiralles/pykelihood
Usage
Basics
The most basic thing you can use pykelihood for is creating and manipulating distributions as objects.
>>> from pykelihood.distributions import Normal
>>> n = Normal(1, 2)
>>> n
Normal(loc=1.0, scale=2.0)
n is an object of type Normal. It has 2 parameters, loc and scale. They can be accessed like standard Python attributes:
>>> n.loc
1.0
Using the Normal object, you can calculate standard values using the same semantics as scipy.stats:
>>> n.pdf([0, 1, 2])
array([0.17603266, 0.19947114, 0.17603266])
>>> n.cdf([0, 1, 2])
array([0.30853754, 0.5       , 0.69146246])
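As a quick sanity check, the density and cumulative values above can be reproduced with the standard library's statistics.NormalDist. This is only a cross-check and does not use pykelihood; it assumes that loc and scale correspond to the mean and standard deviation, and the names reference, points, pdf_values and cdf_values are illustrative.

```python
from statistics import NormalDist

# Assumption: loc and scale map to the mean and standard deviation.
reference = NormalDist(mu=1.0, sigma=2.0)

points = [0, 1, 2]
pdf_values = [reference.pdf(x) for x in points]
cdf_values = [reference.cdf(x) for x in points]

# Print the values side by side for comparison with the arrays above.
for x, p, c in zip(points, pdf_values, cdf_values):
    print(f"x={x}: pdf={p:.8f}, cdf={c:.8f}")
```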
Or you can also generate random values according to this distribution:
>>> n.rvs(10)
array([ 3.31370986,  5.02699468, -0.3573229 ,  1.00460378, -3.26044871,
        1.86362711, -0.84192901,  0.81132182, -2.03266978,  1.48079944])
Fitting
Let’s generate a larger sample from our previous object:
>>> data = n.rvs(1000)
>>> data.mean()
1.025039359276458
>>> data.std()
1.9376460645596842
We can fit a Normal distribution to this data, which will return another Normal object:
>>> Normal.fit(data)
Normal(loc=1.0250822420920338, scale=1.9376400770300832)
As you can see, the values are slightly different from the moments in the data. This is because the fit method returns the maximum likelihood estimate (MLE) for the data, which is computed by numerical optimisation (using scipy.optimize).
We can also fix the value for some parameters if we know them:
>>> Normal.fit(data, loc=1)
Normal(loc=1.0, scale=1.9377929687500024)
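For a Normal distribution the maximum likelihood estimates have a closed form, which gives a handy reference when checking the output of the optimiser. The sketch below uses only the standard library and a hypothetical sample; the names sample, normal_scale_mle and negative_log_likelihood are illustrative and not part of pykelihood. It shows that the MLE of loc is the sample mean, that the MLE of scale is the standard deviation computed with ddof=0 (the numpy default), and that fixing loc at a known value can only increase the fitted scale.

```python
import math
import random
from statistics import fmean

random.seed(0)
# Hypothetical sample standing in for `data` (true loc=1, scale=2).
sample = [random.gauss(1.0, 2.0) for _ in range(1000)]


def normal_scale_mle(values, loc):
    """Closed-form MLE of the scale of a Normal whose loc is given."""
    return math.sqrt(fmean((v - loc) ** 2 for v in values))


def negative_log_likelihood(values, loc, scale):
    """Negative log-likelihood of a sample under Normal(loc, scale)."""
    n = len(values)
    squared = sum((v - loc) ** 2 for v in values)
    return n * math.log(scale) + 0.5 * n * math.log(2 * math.pi) + squared / (2 * scale**2)


free_loc = fmean(sample)                         # MLE of loc: the sample mean
free_scale = normal_scale_mle(sample, free_loc)  # MLE of scale: std with ddof=0
fixed_scale = normal_scale_mle(sample, 1.0)      # scale MLE when loc is fixed at 1

print(free_loc, free_scale, fixed_scale)
```

The two scale estimates are linked by fixed_scale² = free_scale² + (free_loc - 1)². With the figures shown above, sqrt(1.93765² + 0.02504²) ≈ 1.9378, which agrees with the scale returned by Normal.fit(data, loc=1).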
Trend fitting
One of the most powerful features of pykelihood is the ability to fit arbitrary distributions. For instance, say our data has a linear trend with very little Gaussian noise that we would like to capture:
>>> import numpy as np
>>> data = np.linspace(-1, 1, 365) + np.random.normal(0, 0.001, 365)
>>> data[:10]
array([-0.99802364, -0.99503679, -0.98900434, -0.98277981, -0.979487  ,
       -0.97393519, -0.96853445, -0.96149152, -0.95564004, -0.95054887])
If we try to fit this without a trend, the resulting distribution will miss out on most of the information:
>>> Normal.fit(data)
Normal(loc=-3.6462053656578005e-05, scale=0.5789668679237372)
Let’s fit a Normal distribution with a trend in the loc parameter:
>>> from pykelihood import kernels
>>> Normal.fit(data, loc=kernels.linear(np.arange(365)))
Normal(loc=linear(a=-1.0000458359290572, b=0.005494714384381866), scale=0.0010055323717468906)
kernels.linear(X) builds a linear model of the form a + bX, where a and b are parameters to be optimised and X is a covariate used to fit the data. If we assume the data are daily observations, the fit recovers the values we expected: -1 is the value on the first day, about 0.0055 is the daily increment (2 / 364 ≈ 0.0055, since 365 evenly spaced points span 364 intervals), and the noise has a standard deviation of about 0.001.
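When the noise is Gaussian with a constant scale, the maximum likelihood estimates of a and b coincide with the ordinary least squares solution, so the fitted values can be checked independently. The following sketch rebuilds a comparable data set with the standard library and solves the least squares problem by hand. It is not how pykelihood performs the fit, and the variable names are illustrative.

```python
import random

random.seed(0)
n_days = 365
days = list(range(n_days))
# 365 evenly spaced values from -1 to 1 (both endpoints included, as with
# np.linspace), plus Gaussian noise with standard deviation 0.001.
step = 2 / (n_days - 1)
observations = [-1 + step * d + random.gauss(0, 0.001) for d in days]

# Ordinary least squares for the model y = a + b * x.
mean_x = sum(days) / n_days
mean_y = sum(observations) / n_days
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, observations))
sxx = sum((x - mean_x) ** 2 for x in days)
b = sxy / sxx
a = mean_y - b * mean_x

# The MLE of the noise scale is the root mean square of the residuals.
residuals = [y - (a + b * x) for x, y in zip(days, observations)]
noise_std = (sum(r * r for r in residuals) / n_days) ** 0.5

print(f"a={a:.5f}, b={b:.7f}, expected b={step:.7f}, noise std={noise_std:.5f}")
```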
Fitting with penalties
Another useful feature of pykelihood is the ability to customize the log-likelihood function with penalties, conditioning methods, stability conditions, etc. Most statistics packages fit data using the standard negative log-likelihood function or, at best, a set of preselected models. To our knowledge, pykelihood is the only Python package that makes it easy to customize the log-likelihood function used to fit data.
>>> data = np.random.normal(0, 1, 1000)
>>> def lassolike_score(distribution, data):
...     return -np.sum(distribution.logpdf(data)) + 5 * np.abs(distribution.loc())
...
>>> std_fit = Normal.fit(data)
>>> cond_fit = Normal.fit(data, score=lassolike_score)
>>> std_fit.loc.value
-0.010891307380632494
>>> cond_fit.loc.value
-0.006210406541824357
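To see why the penalised fit pulls loc towards zero, it helps to look at a simplified version of the problem. The sketch below assumes the scale is held fixed at 1 (an assumption made here for illustration; the fit above optimises the scale as well), in which case minimising the negative log-likelihood plus an L1 penalty on loc has a closed-form solution: the sample mean shrunk towards zero by weight × scale² / n, known as soft-thresholding. The function names are hypothetical and only the standard library is used.

```python
import math
import random

random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(1000)]  # hypothetical data
penalty_weight = 5.0  # same weight as in lassolike_score
scale = 1.0           # assumption: scale held fixed rather than fitted


def penalised_score(loc):
    """Negative Normal log-likelihood (up to a constant) plus an L1 penalty on loc."""
    squared = sum((v - loc) ** 2 for v in sample)
    return squared / (2 * scale**2) + penalty_weight * abs(loc)


def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold, clipping at zero."""
    return math.copysign(max(abs(value) - threshold, 0.0), value)


sample_mean = sum(sample) / len(sample)  # unpenalised MLE of loc
shrinkage = penalty_weight * scale**2 / len(sample)
penalised_loc = soft_threshold(sample_mean, shrinkage)

print(sample_mean, penalised_loc)
```

With a weight of 5, unit scale and 1000 observations, the shrinkage is 5 / 1000 = 0.005, which is of the same order as the shift between std_fit and cond_fit above (about 0.0047).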
Parameter profiling
Likelihood-based inference relies on parameter estimation, so it is important to quantify the sensitivity of a chosen model to each of its parameters. The profiler module in pykelihood includes the Profiler class, which links a model to a set of observations and provides goodness-of-fit metrics and profiles for all parameters.
>>> from pykelihood.profiler import Profiler
>>> from pykelihood.distributions import GEV
>>> fitted_gev = GEV.fit(data, loc=kernels.linear(np.linspace(-1, 1, len(data))))
>>> ll = Profiler(fitted_gev, data, inference_confidence=0.99)  # level of confidence for tests
>>> ll.AIC  # the standard fit is without trend
{'AIC MLE': -359.73533182968777, 'AIC Standard MLE Fit': 623.9896838880583}
>>> ll.profiles.keys()
dict_keys(['loc_a', 'loc_b', 'scale', 'shape'])
>>> ll.profiles["shape"].head(5)
      loc_a     loc_b     scale     shape   likelihood
0 -0.000122  1.000812  0.002495 -0.866884  1815.022132
1 -0.000196  1.000795  0.001964 -0.662803  1882.043541
2 -0.000283  1.000477  0.001469 -0.458721  1954.283256
3 -0.000439  1.000012  0.000987 -0.254640  2009.740282
4 -0.000555  1.000016  0.000948 -0.050558  1992.812843
Confidence intervals can be computed for specified metrics:
>>> def metric(gev): return gev.loc()
>>> ll.confidence_interval(metric)
[-4.160287666875364, 4.7039931595123825]
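The idea behind a likelihood profile can be illustrated on a much simpler model. The sketch below is a textbook profile-likelihood interval for the scale of a Normal distribution, written with the standard library only. It is not a description of how Profiler is implemented, and every name in it is hypothetical. For each candidate scale the log-likelihood is maximised over the remaining parameter (loc), and the confidence interval keeps the candidates whose profile log-likelihood lies within half a chi-square quantile (1 degree of freedom) of the maximum.

```python
import math
import random
from statistics import NormalDist, fmean

random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(1000)]  # hypothetical data
n = len(sample)
loc_hat = fmean(sample)
scale_hat = math.sqrt(fmean((v - loc_hat) ** 2 for v in sample))


def profile_log_likelihood(scale):
    """Log-likelihood maximised over loc for a fixed scale.

    For a Normal, the maximising loc is the sample mean whatever the scale.
    """
    squared = sum((v - loc_hat) ** 2 for v in sample)
    return -n * math.log(scale) - 0.5 * n * math.log(2 * math.pi) - squared / (2 * scale**2)


confidence = 0.99
# Chi-square quantile with 1 degree of freedom, via the squared Normal quantile.
chi2_quantile = NormalDist().inv_cdf(0.5 + confidence / 2) ** 2
cutoff = profile_log_likelihood(scale_hat) - chi2_quantile / 2

# Evaluate the profile on a grid around the MLE and keep the values above the cutoff.
grid = [scale_hat * (0.8 + 0.4 * i / 400) for i in range(401)]
inside = [s for s in grid if profile_log_likelihood(s) >= cutoff]
interval = (min(inside), max(inside))

print(scale_hat, interval)
```

The grid resolution limits the precision of the interval bounds; a root finder on the difference between the profile and the cutoff would give sharper endpoints.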
Contributing
Poetry is used to manage pykelihood’s dependencies and build system. To install Poetry, you can refer to the installation instructions, but it boils down to running:
curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python
To configure your environment to work on pykelihood, run:
git clone https://www.github.com/OpheliaMiralles/pykelihood # or any other clone method
cd pykelihood
poetry install
This will create a virtual environment for the project and install the required dependencies. To activate the virtual environment, be sure to run poetry shell prior to executing any code.
We also use the pre-commit library which adds git hooks to the repository. These must be installed with:
pre-commit install
Some parts of the code base use the matplotlib and hawkeslib packages, but these are not currently required to run most of the code, including the tests.
Tests
Tests are run using pytest. To run all tests, navigate to the root folder or the tests folder and type pytest.