Package for performing Bayesian inference with insufficient and robust statistics.
Insufficient Gibbs Sampling
The InsufficientGibbs package provides tools for sampling from the posterior distribution of model parameters when only robust and insufficient statistics of the data are available. A comprehensive explanation of the underlying theory can be found in the associated paper on arXiv. The code used to generate all figures presented in the paper is located in the figures folder.
Table of contents
Abstract • Install • Tutorial • Examples • Add new distributions/models
Abstract
In some applied scenarios, the availability of complete data is restricted, often due to privacy concerns; only aggregated, robust and inefficient statistics derived from the data are made accessible. These robust statistics are not sufficient, but they demonstrate reduced sensitivity to outliers and offer enhanced data protection due to their higher breakdown point. We consider a parametric framework and propose a method to sample from the posterior distribution of parameters conditioned on various robust and inefficient statistics: specifically, the pairs (median, MAD) or (median, IQR), or a collection of quantiles. Our approach leverages a Gibbs sampler and simulates latent augmented data, which facilitates simulation from the posterior distribution of parameters belonging to specific families of distributions. A by-product of these samples from the joint posterior distribution of parameters and data given the observed statistics is that we can estimate Bayes factors based on observed statistics via bridge sampling. We validate and outline the limitations of the proposed methods through toy examples and an application to real-world income data.
Install
Install the latest release via pip:
pip install InsufficientGibbs
For the latest development version, clone the repository and install it via pip:
git clone https://github.com/antoineluciano/Insufficient-Gibbs-Sampling
cd Insufficient-Gibbs-Sampling/src/InsufficientGibbs
pip install .
Available distributions
- Normal (Normal)
- Cauchy (Cauchy)
- Laplace (Laplace)
- Gamma (Gamma) and its translated version (TranslatedGamma)
- LogNormal (LogNormal) and its translated version (TranslatedLogNormal)
- Weibull (Weibull) and its translated version (TranslatedWeibull)
- Generalized Pareto (GeneralizedPareto)
Add Model to the end of these names to use them as models (e.g., CauchyModel, TranslatedGammaModel).
Tutorial
Create prior distributions
For each parameter of your model, create a variable of type Distribution representing its associated prior distribution. You must specify its hyperparameters and, optionally, its name.
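For instance, a minimal sketch using the Normal and Gamma priors from the Cauchy example below (the hyperparameter values here are purely illustrative):

from InsufficientGibbs.Distribution import Normal, Gamma

mu = Normal(0, 10, name="mu")      # Normal(0, 10) prior on the location parameter
sigma = Gamma(2, 2, name="sigma")  # Gamma(2, 2) prior on the scale parameter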
Create the model
Create a variable of type Model, passing the previously defined prior distributions of its parameters as arguments.
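Continuing the sketch above, and assuming a NormalModel class that, like CauchyModel in the example below, takes the location and scale priors in that order:

from InsufficientGibbs.Models import NormalModel

model = NormalModel(mu, sigma)  # model with the priors defined above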
Sample from the posterior given insufficient statistics
To sample from the posterior of the model parameters given the observed statistics, you can use the three functions of the class Model:
- Gibbs_Quantile: the observed data consists of a sequence of quantiles.
- Gibbs_med_IQR: the observed data consists of the median and the interquartile range (IQR).
- Gibbs_med_MAD: the observed data consists of the median and the median absolute deviation (MAD).
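All three functions share the same leading arguments. A minimal sketch continuing from the model above; the interpretation of T and N as the number of Gibbs iterations and the size of the latent data sample is inferred from the paper and the Cauchy example below:

T, N = 1000, 100                              # number of Gibbs iterations, size of the latent data sample (assumed meaning)
chains = model.Gibbs_med_MAD(T, N, 0.0, 1.0)  # observed median = 0.0, observed MAD = 1.0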
Examples
Cauchy distribution with Normal and Gamma priors
from InsufficientGibbs.Distribution import Normal, Gamma
from InsufficientGibbs.Models import CauchyModel

# Create the prior distribution variables
mu = Normal(0, 10, name="x_0")
sigma = Gamma(2, 2, name="gamma")

# Create the model variable
model = CauchyModel(mu, sigma)

T, N = 10000, 100

# Quantile case
q1, med, q3 = -1, 0, 1
probs = [.25, .5, .75]
Cauchy_Q = model.Gibbs_Quantile(T, N, [q1, med, q3], probs)

# Median, IQR case
med, IQR = 0, 2
Cauchy_med_IQR = model.Gibbs_med_IQR(T, N, med, IQR)

# Median, MAD case
med, MAD = 0, 1
Cauchy_med_MAD = model.Gibbs_med_MAD(T, N, med, MAD)
You can display the chains by using the display_chains function.
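For instance (a hypothetical call; the exact signature of display_chains may differ, so check the package source):

model.display_chains(Cauchy_med_MAD)  # hypothetical: display the sampled parameter chains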
How to add new distributions and models
As outlined in our paper, the method applies to continuous distributions in general, since it only requires sampling from a truncated version of the distribution. Users can therefore integrate their own models by adding subclasses of the Distribution and Model classes.
For the Distribution class, a subclass should define an __init__ method that sets a _distribution attribute exposing the relevant distribution functions (e.g., pdf, cdf, ppf), as well as a domain method describing its support.
Similarly, for the Model class, a subclass should define an __init__ method, along with methods returning initial parameter values for the three studied scenarios (Init_theta_Quantile, Init_theta_med_MAD, and Init_theta_med_IQR).
To illustrate, we present examples of distributions. First, we showcase a distribution already implemented in SciPy (the Pareto distribution), followed by a custom implementation (the Pareto Type II distribution).
Example of the Pareto Distribution
Add the Pareto subclass of Distribution in the file Distribution.py:
from scipy.stats import pareto

class Pareto(Distribution):
    """
    Container for the Pareto distribution.

    Parameters
    ----------
    scale : float
        scale of the Pareto distribution
    shape : float
        shape of the Pareto distribution
    name : str
    """
    def __init__(self,
                 scale: float = 1,
                 shape: float = 1,
                 name: str = "") -> None:
        self.scale = PositiveContinuousVariable(scale, name='scale')
        self.shape = PositiveContinuousVariable(shape, name='shape')
        self.name = name
        self._distribution = pareto(b=self.shape.value, scale=self.scale.value)
        parameters_dict = {'scale': self.scale, 'shape': self.shape}
        super().__init__(parameters_dict, self.name)

    def domain(self) -> Tuple[float, float]:
        return (self.parameters_value["scale"], float('inf'))
Add the ParetoModel subclass of Model in the file Models.py:
class ParetoModel(Model):
    def __init__(self, scale: Distribution, shape: Distribution) -> None:
        if scale.name == "": scale.name = "scale"
        if shape.name == "": shape.name = "shape"
        if scale.name == shape.name:
            raise ValueError("parameters must have different names.")
        self.scale = scale
        self.shape = shape
        self.type_distribution = Pareto
        self.parameters_dict = {scale.name: scale, shape.name: shape}
        super().__init__(self.parameters_dict)
        self.distrib_name = "pareto"
        self.init_method = "naive"

    def domain(self) -> Tuple[float, float]:
        return (0, float('inf'))

    def Init_theta_Quantile(self, Q, P):
        scale = Q[0] - 1
        shape = (Q[-1] - Q[0]) / (Pareto(scale=scale)._distribution.ppf(P[-1])
                                  - Pareto(scale=scale)._distribution.ppf(P[0]))
        return {self.scale.name: scale, self.shape.name: shape}

    def Init_theta_med_MAD(self, med, MAD):
        scale = 1
        shape = 1.5
        return {self.scale.name: scale, self.shape.name: shape}

    def Init_theta_med_IQR(self, med, IQR):
        scale = 1
        shape = 1.5
        return {self.scale.name: scale, self.shape.name: shape}
Example of the Pareto Type II distribution
First, define a class pareto2 that plays the same role as a SciPy frozen distribution object (exposing pdf, cdf, ppf, logpdf, and rvs).
import numpy as np

class pareto2:
    # Pareto Type II distribution with location loc, scale, and shape parameters
    def __init__(self, loc=0, scale=1, shape=1):
        self.loc = loc
        self.scale = scale
        self.shape = shape

    def pdf(self, x):
        x = np.array(x)
        return np.where(x >= self.loc, (self.shape / self.scale) * (1 + (x - self.loc) / self.scale) ** (-self.shape - 1), 0)

    def cdf(self, x):
        x = np.array(x)
        return np.where(x >= self.loc, 1 - (1 + (x - self.loc) / self.scale) ** (-self.shape), 0)

    def ppf(self, x):
        x = np.array(x)
        return self.loc + self.scale * ((1 - x) ** (-1 / self.shape) - 1)

    def logpdf(self, x):
        x = np.array(x)
        return np.where(x >= self.loc, np.log(self.shape) - np.log(self.scale) - (self.shape + 1) * np.log(1 + (x - self.loc) / self.scale), -np.inf)

    def rvs(self, size):
        return self.ppf(np.random.random(size))
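A quick, illustrative sanity check of this class is to verify that ppf inverts cdf on simulated draws:

import numpy as np

d = pareto2(loc=0, scale=2, shape=3)
x = d.rvs(5)
assert np.allclose(d.ppf(d.cdf(x)), x)  # round-trip check of the quantile function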
Then, add the ParetoType2 subclass of Distribution, using the pareto2 class above, in the file Distribution.py:
class ParetoType2(Distribution):
    def __init__(self,
                 loc: float = 0,
                 scale: float = 1,
                 shape: float = 1,
                 name: str = "",
                 theta: list = []) -> None:
        if theta != []:
            loc, scale, shape = theta
        self.loc = ContinuousVariable(loc, name='loc')
        self.scale = PositiveContinuousVariable(scale, name='scale')
        self.shape = PositiveContinuousVariable(shape, name='shape')
        self.name = name
        self._distribution = pareto2(loc=self.loc.value, scale=self.scale.value, shape=self.shape.value)
        parameters_dict = {'loc': self.loc, 'scale': self.scale, 'shape': self.shape}
        super().__init__(parameters_dict, self.name)

    def domain(self) -> Tuple[float, float]:
        return (self.parameters_value["loc"], float('inf'))
Finally, add the ParetoType2Model subclass of Model in the file Models.py:
class ParetoType2Model(Model):
    def __init__(self, loc: Distribution, scale: Distribution, shape: Distribution) -> None:
        if loc.name == "": loc.name = "loc"
        if scale.name == "": scale.name = "scale"
        if shape.name == "": shape.name = "shape"
        if loc.name == scale.name or loc.name == shape.name or scale.name == shape.name:
            raise ValueError("parameters must have different names.")
        self.loc = loc
        self.scale = scale
        self.shape = shape
        self.type_distribution = ParetoType2
        self.parameters_dict = {loc.name: loc, scale.name: scale, shape.name: shape}
        super().__init__(self.parameters_dict)
        self.distrib_name = "pareto_type2"
        self.init_method = "naive"

    def domain(self) -> Tuple[float, float]:
        return (self.parameters_value["loc"], float('inf'))

    def Init_theta_Quantile(self, Q, P):
        loc = 2 * Q[0] - Q[1]
        shape = 2.5
        scale = (Q[-1] - Q[0]) / (ParetoType2(loc=loc, shape=shape)._distribution.ppf(P[-1])
                                  - ParetoType2(loc=loc, shape=shape)._distribution.ppf(P[0]))
        return {self.loc.name: loc, self.scale.name: scale, self.shape.name: shape}

    def Init_theta_med_MAD(self, med, MAD):
        loc = med - 2 * MAD
        scale = 1
        shape = 1.5
        return {self.loc.name: loc, self.scale.name: scale, self.shape.name: shape}

    def Init_theta_med_IQR(self, med, IQR):
        loc = med - IQR
        scale = 1
        shape = 1.5
        return {self.loc.name: loc, self.scale.name: scale, self.shape.name: shape}
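Once both classes are in place, the new model should plug into the same samplers as the built-in models. A minimal sketch, assuming the call pattern of the Cauchy example above; the prior choices and observed values here are purely illustrative:

from InsufficientGibbs.Distribution import Normal, Gamma

loc_prior = Normal(0, 10, name="loc")
scale_prior = Gamma(2, 2, name="scale")
shape_prior = Gamma(2, 2, name="shape")
model = ParetoType2Model(loc_prior, scale_prior, shape_prior)

T, N = 5000, 100
chains = model.Gibbs_med_MAD(T, N, 1.0, 0.5)  # observed median = 1.0, observed MAD = 0.5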
MIT License
Copyright (c) 2023 Antoine LUCIANO
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.