Empirical Information Bottleneck

EMBO - Empirical Bottleneck

A Python implementation of the Information Bottleneck analysis framework [Tishby, Pereira, Bialek 2001], especially geared towards the analysis of concrete, finite-size data sets.

Requirements

embo requires Python 3, numpy and scipy.

Installation

To install the latest release, run:

pip install embo

Depending on your system, you may need to use pip3 instead of pip in the command above.

Testing

Running the tests requires setuptools. If embo is already installed on your system, locate the copy of the test_embo.py script installed alongside the rest of the embo files and execute it. For example:

python /usr/lib/python3.X/site-packages/embo/test_embo.py

Alternatively, if you have downloaded the source, from within the root folder of the source distribution run:

python setup.py test

This should run through all tests specified in embo/test.

Usage

The Information Bottleneck

We refer to [Tishby, Pereira, Bialek 2001] for a general introduction to the Information Bottleneck. Briefly, if X and Y are two random variables, we are interested in finding another random variable M (called the "bottleneck" variable) that solves the following optimisation problem:

min_{p(m|x)}I(M:X) - β I(M:Y)

for any β>0, and where M is constrained to be independent of Y conditional on X:

p(x,m,y) = p(x)p(m|x)p(y|x)

Intuitively, we want to find the stochastic mapping p(M|X) that extracts from X as much information about Y as possible while forgetting all irrelevant information. β is a free parameter that sets the relative importance of forgetting irrelevant information versus remembering useful information. Usually, one is interested in the curve traced by I(M:X) and I(M:Y) at the solution of the bottleneck problem as β ranges over a set of values. This curve gives the optimal tradeoff between compression and prediction: it tells us the minimum amount of information one needs to retain about X to predict Y to a given accuracy or, conversely, the maximum accuracy achievable in predicting Y given a certain amount of information about X.
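To make the objective concrete, the following is a minimal sketch that evaluates I(M:X) - β I(M:Y) for a given encoder p(m|x), using the factorisation p(x,m,y) = p(x)p(m|x)p(y|x) stated above. It does not use embo and does not perform the minimisation. The helper names mutual_information and ib_objective are hypothetical, the toy joint distribution is chosen for illustration only, and information is measured in bits.

```python
import math

def mutual_information(joint):
    """I(A:B) in bits for a joint distribution stored as joint[a][b]."""
    p_a = [sum(row) for row in joint]
    p_b = [sum(col) for col in zip(*joint)]
    return sum(
        p * math.log2(p / (p_a[a] * p_b[b]))
        for a, row in enumerate(joint)
        for b, p in enumerate(row)
        if p > 0  # zero-probability cells contribute nothing
    )

def ib_objective(p_xy, p_m_given_x, beta):
    """Evaluate I(M:X) - beta * I(M:Y) for an encoder p_m_given_x[x][m]."""
    n_x, n_y, n_m = len(p_xy), len(p_xy[0]), len(p_m_given_x[0])
    p_x = [sum(row) for row in p_xy]
    # p(m,x) = p(x) p(m|x)
    p_mx = [[p_x[x] * p_m_given_x[x][m] for x in range(n_x)]
            for m in range(n_m)]
    # p(m,y) = sum_x p(m|x) p(x,y), which follows from the Markov constraint
    p_my = [[sum(p_m_given_x[x][m] * p_xy[x][y] for x in range(n_x))
             for y in range(n_y)]
            for m in range(n_m)]
    return mutual_information(p_mx) - beta * mutual_information(p_my)

# Toy joint distribution p(x,y) (illustrative assumption)
p_xy = [[0.5, 0.1],
        [0.0, 0.4]]

identity = [[1.0, 0.0], [0.0, 1.0]]  # M copies X: no compression
constant = [[1.0, 0.0], [1.0, 0.0]]  # M ignores X: full compression

for beta in (1.0, 2.0):
    print(beta,
          ib_objective(p_xy, identity, beta),
          ib_objective(p_xy, constant, beta))
```

For this toy distribution, the constant encoder scores lower (better) at β=1 and the identity encoder scores lower at β=2, which illustrates how β shifts the balance between forgetting and remembering.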

Using embo

In embo, we assume that the true joint distribution of X and Y is not available, and that we only have a set of joint empirical observations. We also assume that X and Y both take on a finite number of discrete values. In its most basic usage, the empirical_bottleneck function takes as arguments an array of observations for X and an (equally long) array of observations for Y, and it returns a set of β values and the optimal values of I(M:X) and I(M:Y) corresponding to those β. The optimal tradeoff can then be visualised by plotting I(M:Y) vs I(M:X).

For instance:

import numpy as np
from matplotlib import pyplot as plt
from embo import empirical_bottleneck

# data sequences
x = np.array([0,0,0,1,0,1,0,1,0,1])
y = np.array([0,1,0,1,0,1,0,1,0,1])

# compute the IB bound from the data
I_x,I_y,_ = empirical_bottleneck(x,y)

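# plot the optimal compression-prediction bound
# (horizontal axis: I(M:X), vertical axis: I(M:Y))
plt.plot(I_x,I_y)
plt.xlabel("I(M:X)")
plt.ylabel("I(M:Y)")
plt.show()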
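As a rough sanity check on such a curve, note that because M depends on Y only through X, I(M:Y) cannot exceed I(X:Y), and I(M:X) cannot exceed H(X). The sketch below computes plug-in estimates of both limits for the same data sequences, using only the standard library. The function names are hypothetical and not part of embo, and the values are in bits, which is an assumption that may not match the units or estimators embo uses internally.

```python
import math
from collections import Counter

def empirical_mutual_information(xs, ys):
    """Plug-in estimate of I(X:Y) in bits from paired discrete observations."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    count_x = Counter(xs)
    count_y = Counter(ys)
    # p(a,b) * log2( p(a,b) / (p(a) p(b)) ), written in terms of raw counts
    return sum(
        (c / n) * math.log2(c * n / (count_x[a] * count_y[b]))
        for (a, b), c in joint.items()
    )

def empirical_entropy(xs):
    """Plug-in estimate of H(X) in bits from discrete observations."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())

# same data sequences as in the example above
x = [0, 0, 0, 1, 0, 1, 0, 1, 0, 1]
y = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

i_xy = empirical_mutual_information(x, y)
h_x = empirical_entropy(x)
print(f"I(X:Y) = {i_xy:.3f} bits, H(X) = {h_x:.3f} bits")
```

With only ten samples, plug-in estimates like these are biased, so they are best read as approximate reference values rather than exact bounds.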

More examples

A simple example of usage with synthetic data is located at embo/examples/Basic-Example.ipynb. A more meaningful example is located at embo/examples/Markov-Chains.ipynb, where we compute the Information Bottleneck between the past and the future of time series generated from different Markov chains.
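To give a flavour of the past-future setup, here is a minimal sketch that samples a two-state Markov chain and pairs each symbol with its successor. The transition probabilities, the one-step window and the helper name sample_markov_chain are assumptions made for illustration; the notebook may use different chains and longer windows.

```python
import random

def sample_markov_chain(transition, n_steps, seed=0):
    """Sample a binary sequence; transition[s][1] is P(next=1 | current=s)."""
    rng = random.Random(seed)  # seeded for reproducibility
    state = 0
    seq = [state]
    for _ in range(n_steps - 1):
        state = 1 if rng.random() < transition[state][1] else 0
        seq.append(state)
    return seq

# hypothetical "sticky" chain: each state tends to persist
transition = [[0.9, 0.1],
              [0.2, 0.8]]
seq = sample_markov_chain(transition, 1000)

# X is the symbol at time t (the past), Y the symbol at time t+1 (the future)
past = seq[:-1]
future = seq[1:]
print(len(past), len(future))
```

The past and future sequences are equally long, so after conversion with np.array they could be passed as x and y to empirical_bottleneck in the same way as in the usage example.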

Further details

For more details, please consult the docstrings for empirical_bottleneck and IB.

Authors

embo is maintained by Eugenio Piasini, Alexandre Filipowicz and Jonathan Levine.
