Empirical Information Bottleneck

EMBO - Empirical Bottleneck

A Python implementation of the Information Bottleneck analysis framework [Tishby, Pereira, Bialek 2001], especially geared towards the analysis of concrete, finite-size data sets.


embo requires Python 3, numpy and scipy.


To install the latest release, run:

pip install embo

(depending on your system, you may need to use pip3 instead of pip in the command above).


Running the tests requires setuptools. If embo is already installed on your system, look for the copy of the test script installed alongside the rest of the embo files and execute it. For example:

python /usr/lib/python3.X/site-packages/embo/

Alternatively, if you have downloaded the source, from within the root folder of the source distribution run:

python test

This should run through all tests specified in embo/test.


The Information Bottleneck

We refer to [Tishby, Pereira, Bialek 2001] for a general introduction to the Information Bottleneck. Briefly, if X and Y are two random variables, we are interested in finding another random variable M (called the "bottleneck" variable) that solves the following optimisation problem:

min_{p(m|x)}I(M:X) - β I(M:Y)

for any β>0, where M is constrained to be independent of Y conditional on X:

p(x,m,y) = p(x)p(m|x)p(y|x)

Intuitively, we want to find the stochastic mapping p(M|X) that extracts from X as much information about Y as possible while forgetting all irrelevant information. β is a free parameter that sets the relative importance of forgetting irrelevant information versus remembering useful information. Usually, one is interested in the curve described by I(M:X) and I(M:Y) at the solution of the bottleneck problem for a range of values of β. This curve gives the optimal tradeoff of compression and prediction, telling us what is the minimum amount of information one needs to know about X to be able to predict Y to a certain accuracy, or vice versa, what is the maximum accuracy one can have in predicting Y given a certain amount of information about X.
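
To make the objective concrete, here is a minimal sketch that evaluates I(M:X) - β I(M:Y) for a given encoder p(m|x) on a small, made-up joint distribution. The helper names mutual_information and ib_objective are hypothetical and are not part of embo; the sketch only evaluates the objective and does not perform the minimisation.

```python
import numpy as np

def mutual_information(p_ab):
    """Mutual information (in bits) of a joint distribution given as a 2D array."""
    p_a = p_ab.sum(axis=1, keepdims=True)   # marginal over columns, shape (|A|, 1)
    p_b = p_ab.sum(axis=0, keepdims=True)   # marginal over rows, shape (1, |B|)
    mask = p_ab > 0                         # skip zero-probability cells
    return float(np.sum(p_ab[mask] * np.log2(p_ab[mask] / (p_a @ p_b)[mask])))

def ib_objective(p_xy, p_m_given_x, beta):
    """Return I(M:X), I(M:Y) and I(M:X) - beta * I(M:Y) for an encoder p(m|x).

    p_xy has shape (|X|, |Y|); p_m_given_x has shape (|X|, |M|) with rows summing to 1.
    """
    p_x = p_xy.sum(axis=1)
    p_xm = p_x[:, None] * p_m_given_x       # p(x,m) = p(x) p(m|x)
    p_my = p_m_given_x.T @ p_xy             # p(m,y) = sum_x p(m|x) p(x,y)
    i_mx = mutual_information(p_xm)
    i_my = mutual_information(p_my)
    return i_mx, i_my, i_mx - beta * i_my

# Illustrative joint distribution of two correlated binary variables (made up)
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])

identity_encoder = np.eye(2)                         # M is a copy of X
constant_encoder = np.array([[1.0, 0.0],
                             [1.0, 0.0]])            # M ignores X entirely

print(ib_objective(p_xy, identity_encoder, beta=2.0))
print(ib_objective(p_xy, constant_encoder, beta=2.0))
```

The two encoders sit at the extremes of the tradeoff: copying X keeps all of I(X:Y) at the cost of I(M:X) = H(X), while the constant encoder compresses completely and retains no information about Y.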

Using embo

In embo, we assume that the true joint distribution of X and Y is not available, and that we only have a set of joint empirical observations. We also assume that X and Y both take on a finite number of discrete values. The main point of entry to the package is the EmpiricalBottleneck class. Its constructor takes as arguments an array of observations for X and an equally long array of observations for Y, together with other optional parameters (see the docstring for details). In the most basic use case, users can call the get_empirical_bottleneck method of an EmpiricalBottleneck object, which returns the optimal values of I(M:X) and I(M:Y) together with the corresponding values of β. The optimal tradeoff can then be visualised by plotting I(M:Y) against I(M:X).

For instance:

import numpy as np
from matplotlib import pyplot as plt
from embo import EmpiricalBottleneck

# data sequences
x = np.array([0,0,0,1,0,1,0,1,0,1])
y = np.array([0,1,0,1,0,1,0,1,0,1])

# compute the IB bound from the data
I_x,I_y,β = EmpiricalBottleneck(x,y).get_empirical_bottleneck()

# plot the optimal compression-prediction bound
plt.plot(I_x, I_y)
plt.show()
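
Because only paired observations are available, the joint distribution of X and Y has to be estimated from the data. The following sketch shows one straightforward way to do this, by normalised counting, using the same x and y sequences as above. The function empirical_joint is a hypothetical helper written for illustration; it is not claimed to be how embo builds its estimate internally.

```python
import numpy as np

def empirical_joint(x, y):
    """Estimate p(x, y) from paired discrete observations by normalised counting."""
    x = np.asarray(x).ravel()
    y = np.asarray(y).ravel()
    if x.shape != y.shape:
        raise ValueError("x and y must have the same length")
    # Map each observed value to an integer index into the table
    x_vals, x_idx = np.unique(x, return_inverse=True)
    y_vals, y_idx = np.unique(y, return_inverse=True)
    counts = np.zeros((x_vals.size, y_vals.size))
    np.add.at(counts, (x_idx, y_idx), 1)    # accumulate co-occurrence counts
    return counts / counts.sum()

x = np.array([0, 0, 0, 1, 0, 1, 0, 1, 0, 1])
y = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1])
p_xy = empirical_joint(x, y)
print(p_xy)
```

For these ten samples the pair (0,0) occurs five times, (0,1) once and (1,1) four times, so the estimated table is [[0.5, 0.1], [0.0, 0.4]].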

More examples

A simple example of usage with synthetic data is located at embo/examples/Basic-Example.ipynb. A more meaningful example is located at embo/examples/Markov-Chains.ipynb, where we compute the Information Bottleneck between the past and the future of time series generated from different Markov chains.
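
As a rough idea of what a past/future analysis involves, the sketch below simulates a two-state Markov chain with NumPy and builds paired arrays of past and future symbols. The transition matrix, the sequence length and the one-step window are illustrative assumptions, and simulate_markov_chain is a hypothetical helper; the notebook itself may be set up differently.

```python
import numpy as np

def simulate_markov_chain(transition, n_steps, rng):
    """Sample a state sequence; transition[i, j] = p(next = j | current = i)."""
    n_states = transition.shape[0]
    states = np.empty(n_steps, dtype=int)
    states[0] = rng.integers(n_states)      # uniform random initial state
    for t in range(1, n_steps):
        states[t] = rng.choice(n_states, p=transition[states[t - 1]])
    return states

rng = np.random.default_rng(0)
transition = np.array([[0.9, 0.1],
                       [0.2, 0.8]])         # made-up transition probabilities
series = simulate_markov_chain(transition, 1000, rng)

# One-step windows: each past symbol is paired with the symbol that follows it
past = series[:-1]
future = series[1:]
```

The past and future arrays have equal length and could then play the roles of x and y in the usage example above.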

Further details

For more details, please consult the docstrings for empirical_bottleneck and IB.


embo is maintained by Eugenio Piasini, Alexandre Filipowicz and Jonathan Levine.

Files for embo, version 1.0.2: embo-1.0.2.tar.gz (260.8 kB, source distribution)
