the package fits data to metalog distribution and generates samples, quantiles, densities and probabilities based on the fitted distribution.

Project description

Sergey Kim, Reidar Brumer Bratvold

Metalog Distribution

The metalog distributions constitute a new system of continuous univariate probability distributions designed for flexibility, simplicity, and ease/speed of use in practice. The system is comprised of unbounded, semi-bounded, and bounded distributions, each of which offers nearly unlimited shape flexibility compared to Pearson, Johnson, and other traditional systems of distributions.

The package requires the following packages: numpy, pandas, matplotlib and scipy (ver 1.3.1).

The following paper and website provide a full background of the metalog distribution.

Using the Package

This Python package was transfered from RMetalog package by Isaac J. Faber and therefore shares the same R-based structure.

The data used for demonstration are body length of salmon and were collected in 2008-2010:

import numpy as np
import pandas as pd

salmon = pd.read_csv("Chinook and forage fish lengths.csv")

# Filtered data for eelgrass vegetation and chinook salmon
salmon = salmon[(salmon['Vegetation'] == 'Eelgrass') & (salmon['Species'] == 'Chinook_salmon')]
salmon = np.array(salmon['Length'])

To import package with metalog distribution run the code:

from metalog import metalog

To fit the data to metalog distribution one should use function metalog.fit(). It has the following arguments:

• x: data.
• bounds: bounds of metalog distribution. Depending on boundedness argument can take zero, one or two values.
• boundedness: boundedness of metalog distribution. Can take values 'u' for unbounded, 'sl' for semi-bounded lower, 'su' for semi-bounded upper and 'b' for bounded on both sides.
• term_limit: maximum number of terms to specify the metalog distribution. Can take values from 3 to 30.
• term_lower_bound: the lowest number of terms to specify the metalog distribution. Must be greater or equal to 2 and less than term_limit. The argument is optional. Default value is 2.
• step_len: size of steps to summarize the distribution. The argument is optional. Default value is 0.01.
• probs: probabilities corresponding to data. The argument is optional. Default value is numpy.nan.
• fit_method: fit method 'OLS', 'LP' or 'any'. The argument is optional. Default value is 'any'.
• save_data: if True then data will be saved for future update. The argument is optional. Default values is False.

Fit metalog distribution to data and store the result to variable metalog_salmon. The distribution is bounded on both sides: from 0 to 200. Term limit is set to 10:

metalog_salmon = metalog.fit(x=salmon, boundedness='b', bounds=[0, 200], term_limit=10)

To get summary of distribution call the following function with only one argument m - the variable that stores fitted metalog distribution:

metalog.summary(m=metalog_salmon)

Output:

-----------------------------------------------
SUMMARY OF METALOG DISTRIBUTION OBJECT
-----------------------------------------------

PARAMETERS

Term Limit:  10
Term Lower Bound:  2
Boundedness:  b
Bounds (only used based on boundedness):  [0, 200]
Step Length for Distribution Summary:  0.01
Method Use for Fitting:  any
Number of Data Points Used:  138
Original Data Saved:  False

VALIDATION AND FIT METHOD

term valid method
2      2   yes    OLS
3      3   yes    OLS
4      4   yes    OLS
5      5   yes    OLS
6      6   yes    OLS
7      7   yes    OLS
8      8   yes    OLS
9      9   yes    OLS
10    10   yes    OLS

It’s possible to plot corresponding PDF and CDF of metalog distribution:

metalog.plot(m=metalog_salmon)

Output: To draw samples from distribution use metalog.r() function where n is number of samples and term specifies the terms of distribution to sample from:

metalog.r(m=metalog_salmon, n=5, term=10)

Output:

array([73.81897286, 86.74055734, 84.22509619, 83.80426247, 97.79800677])

To get densities based on quantiles type metalog.d() function where q is vector of quantiles:

metalog.d(m=metalog_salmon, q=[50, 110, 150], term=10)

Output:

array([0.00038265, 0.00712032, 0.00373991])

To calculate probabilities based on quantiles use metalog.p() function:

metalog.p(m=metalog_salmon, q=[50, 110, 150], term=10)

Output:

array([0.00275336, 0.82349578, 0.98686581])

Finally, to get quantiles from probabilites input metalog.q():

metalog.q(m=metalog_salmon, y=[0.00275336, 0.82349578, 0.98686581], term=10)

Output:

array([ 50.02583336, 109.99861143, 149.99737059])

Project details

This version 0.2.2