Skip to main content

Empirical Information Bottleneck

Project description

EMBO - Empirical Bottleneck

License PyPI version Build status

A Python implementation of the Information Bottleneck analysis framework [Tishby, Pereira, Bialek 2001], especially geared towards the analysis of concrete, finite-size data sets.

Requirements

embo requires Python 3, numpy and scipy.

Installation

To install the latest release, run:

pip install embo

(depending on your system, you may need to use pip3 instead of pip in the command above).

Testing

(requires setuptools). If embo is already installed on your system, look for the copy of the test_embo.py script installed alongside the rest of the embo files and execute it. For example:

python /usr/lib/python3.X/site-packages/embo/test_embo.py

Alternatively, if you have downloaded the source, from within the root folder of the source distribution run:

python setup.py test

This should run through all tests specified in embo/test.

Usage

The Information Bottleneck

We refer to [Tishby, Pereira, Bialek 2001] for a general introduction to the Information Bottleneck. Briefly, if X and Y are two random variables, we are interested in finding another random variable M (called the "bottleneck" variable) that solves the following optimisation problem:

min_{p(m|x)}I(M:X) - β I(M:Y)

for any β>0, and where M is constrained to be independent on Y conditional on X:

p(x,m,y) = p(x)p(m|x)p(y|x)

Intuitively, we want to find the stochastic mapping p(M|X) that extracts from X as much information about Y as possible while forgetting all irrelevant information. β is a free parameter that sets the relative importance of forgetting irrelevant information versus remembering useful information. Usually, one is interested in the curve described by I(M:X) and I(M:Y) at the solution of the bottleneck problem for a range of values of β. This curve gives the optimal tradeoff of compression and prediction, telling us what is the minimum amount of information one needs to know about X to be able to predict Y to a certain accuracy, or vice versa, what is the maximum accuracy one can have in predicting Y given a certain amount of information about X.

Using embo

In embo, we assume that the true joint distribution of X and Y is not available, and that we only have a set of joint empirical observations. We also assume that X and Y both take on a finite number of discrete values. In its most basic usage, the empirical_bottleneck function takes as arguments an array of observations for X and an (equally long) array of observations for Y, and it returns a set of β values and the optimal values of I(M:X) and I(M:Y) corresponding to those β. The optimal tradeoff can then be visualised by plotting I(M:Y) vs I(M:Y).

For instance:

import numpy as np
from matplotlib import pyplot as plt
from embo import empirical_bottleneck

# data sequences
x = np.array([0,0,0,1,0,1,0,1,0,1])
y = np.array([0,1,0,1,0,1,0,1,0,1])

# compute the IB bound from the data
I_x,I_y,_,_,_,_ = empirical_bottleneck(x,y)

# plot the optimal compression-prediction bound
plt.plot(I_x,I_y)

More examples

A simple example of usage with synthetic data is located at embo/examples/Basic-Example.ipynb. A more meaningful example is located at embo/examples/Markov-Chains.ipynb, where we compute the Information Bottleneck between the past and the future of time series generated from different Markov chains.

Authors

embo is maintained by Eugenio Piasini, Alexandre Filipowicz and Jonathan Levine.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Files for embo, version 0.3.3
Filename, size File type Python version Upload date Hashes
Filename, size embo-0.3.3-py3-none-any.whl (200.3 kB) File type Wheel Python version py3 Upload date Hashes View hashes
Filename, size embo-0.3.3.tar.gz (164.7 kB) File type Source Python version None Upload date Hashes View hashes

Supported by

Elastic Elastic Search Pingdom Pingdom Monitoring Google Google BigQuery Sentry Sentry Error logging AWS AWS Cloud computing DataDog DataDog Monitoring Fastly Fastly CDN DigiCert DigiCert EV certificate StatusPage StatusPage Status page