Python package for information theory on discrete random variables.
dit is a Python package for information theory.
Introduction
Information theory is a powerful extension to probability and statistics, quantifying dependencies among arbitrary random variables in a way that is consistent and comparable across systems and scales. It was originally developed to quantify how quickly and reliably information could be transmitted across an arbitrary channel. The demands of modern, data-driven science have since co-opted and extended these quantities and methods into new multivariate settings where their interpretation and best practices are not yet established. For example, there are at least four reasonable multivariate generalizations of the mutual information, none of which inherits all the interpretations of the standard bivariate case. Which is best to use is context-dependent. dit implements a wide range of multivariate information measures so that practitioners can study how these measures behave and interact in a variety of contexts. We hope that having all these measures and techniques implemented in one place will allow the development of robust techniques for the automated quantification of dependencies within a system and a concrete interpretation of what those dependencies mean.
Citing
If you use dit in your research, please cite it as:
@article{dit,
Author = {James, R. G. and Ellison, C. J. and Crutchfield, J. P.},
Title = {{dit}: a {P}ython package for discrete information theory},
Journal = {The Journal of Open Source Software},
Volume = {3},
Number = {25},
Pages = {738},
Year = {2018},
Doi = {https://doi.org/10.21105/joss.00738}
}
Basic Information
Documentation
Downloads
https://anaconda.org/conda-forge/dit
Dependencies
Optional Dependencies
colorama: colored column heads in PID indicating failure modes
cython: faster sampling from distributions
hypothesis: random sampling of distributions
jax, jaxlib: JAX-based optimization backend with autodiff support
matplotlib, python-ternary: plotting of various information-theoretic expansions
numdifftools: numerical evaluation of gradients and hessians during optimization
pint: add units to informational values
pycddlib-standalone, pypoman: polytope vertex enumeration for convex-maximization-based measures
pytensor: PyTensor-based optimization backend with autodiff support
scikit-learn: faster nearest-neighbor lookups during entropy/mutual information estimation from samples
torch: PyTorch-based optimization backend with autodiff and GPU support
Install
The easiest way to install is:
pip install dit
If you want to install dit within a conda environment, you can simply do:
conda install -c conda-forge dit
For development, we recommend uv:
git clone https://github.com/dit/dit.git
cd dit
uv sync --extra dev
This installs dit in editable mode with all development dependencies (tests, docs, linting, type checking, and optional backends).
Testing
# Using uv (recommended)
uv run pytest
# Or with pip
pip install -e ".[test]"
pytest
Code and bug tracker
License
BSD 3-Clause, see LICENSE.txt for details.
Implemented Measures
dit implements the following information measures. Most are implemented in multivariate and conditional generality wherever such generalizations either exist in the literature or are relatively obvious. For example, though it is not in the literature, the multivariate conditional exact common information is implemented here.
Entropies
Mutual Informations
Divergences
Common Informations
Other Measures
Partial Information Decomposition
Secret Key Agreement Bounds
Quickstart
The basic usage of dit corresponds to creating distributions, modifying them if need be, and then computing properties of those distributions. First, we import:
>>> import dit
Suppose we have a really thick coin, one so thick that there is a reasonable chance of it landing on its edge. Here is how we might represent the coin in dit.
>>> d = dit.Distribution(['H', 'T', 'E'], [.4, .4, .2])
>>> print(d)
Class: Distribution
Alphabet: ('E', 'H', 'T') for all rvs
Base: linear
Outcome Class: str
Outcome Length: 1
RV Names: None
x p(x)
E 0.2
H 0.4
T 0.4
Calculate the probability of H and also of the combination H or T.
>>> d['H']
0.4
>>> d.event_probability(['H','T'])
0.8
Calculate the Shannon entropy and extropy of the joint distribution.
>>> dit.shannon.entropy(d)
1.5219280948873621
>>> dit.other.extropy(d)
1.1419011889093373
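As a cross-check that does not depend on dit, the two values above can be recomputed directly from the definitions, H = -sum p log2 p and J = -sum (1 - p) log2 (1 - p). This is a minimal sketch using only the standard library; the function names `shannon_entropy` and `extropy` are illustrative and are not dit's API.

```python
import math

# The thick coin from above: heads, tails, and edge.
pmf = {"H": 0.4, "T": 0.4, "E": 0.2}

def shannon_entropy(probs):
    """Shannon entropy in bits: -sum p * log2(p)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def extropy(probs):
    """Extropy in bits: -sum (1 - p) * log2(1 - p)."""
    return -sum((1 - p) * math.log2(1 - p) for p in probs if p < 1)

h = shannon_entropy(pmf.values())
j = extropy(pmf.values())
print(h, j)
```

Both results should agree with the dit output above up to floating-point rounding.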
Create a distribution where Z = xor(X, Y).
>>> import dit.example_dists
>>> d = dit.example_dists.Xor()
>>> d.set_rv_names(['X', 'Y', 'Z'])
>>> print(d)
Class: Distribution
Alphabet: ('0', '1') for all rvs
Base: linear
Outcome Class: str
Outcome Length: 3
RV Names: ('X', 'Y', 'Z')
x p(x)
000 0.25
011 0.25
101 0.25
110 0.25
Calculate the Shannon mutual informations I[X:Z], I[Y:Z], and I[X,Y:Z].
>>> dit.shannon.mutual_information(d, ['X'], ['Z'])
0.0
>>> dit.shannon.mutual_information(d, ['Y'], ['Z'])
0.0
>>> dit.shannon.mutual_information(d, ['X', 'Y'], ['Z'])
1.0
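The pattern above (each input alone tells nothing about Z, while both together determine it) follows from the identity I[A:B] = H[A] + H[B] - H[A,B]. The sketch below recomputes the three values from the joint pmf with the standard library only; the helpers `entropy` and `mutual_information` are hypothetical names for this illustration, not dit functions.

```python
import math

# Joint pmf over outcomes "xyz" with z = xor(x, y); X and Y are independent fair bits.
joint = {"000": 0.25, "011": 0.25, "101": 0.25, "110": 0.25}
positions = {"X": 0, "Y": 1, "Z": 2}

def entropy(pmf, rvs):
    """Entropy in bits of the marginal over the named variables."""
    marginal = {}
    for outcome, p in pmf.items():
        key = tuple(outcome[positions[rv]] for rv in rvs)
        marginal[key] = marginal.get(key, 0.0) + p
    return -sum(p * math.log2(p) for p in marginal.values() if p > 0)

def mutual_information(pmf, a, b):
    """I[A:B] = H[A] + H[B] - H[A,B]."""
    return entropy(pmf, a) + entropy(pmf, b) - entropy(pmf, a + b)

i_xz = mutual_information(joint, ["X"], ["Z"])
i_yz = mutual_information(joint, ["Y"], ["Z"])
i_xyz = mutual_information(joint, ["X", "Y"], ["Z"])
print(i_xz, i_yz, i_xyz)
```

Every pairwise marginal of this distribution is uniform over four outcomes, so each pairwise entropy is 2 bits and the pairwise mutual informations vanish.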
Calculate the marginal distribution P(X,Z). Then print its probabilities as fractions, showing the mask.
>>> d2 = d.marginal(['X', 'Z'])
>>> print(d2.to_string(show_mask=True, exact=True))
Class: Distribution
Alphabet: ('0', '1') for all rvs
Base: linear
Outcome Class: str
Outcome Length: 2 (mask: 3)
RV Names: ('X', 'Z')
x p(x)
0*0 1/4
0*1 1/4
1*0 1/4
1*1 1/4
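To see where the masked outcomes such as 0*0 come from, here is a small standard-library sketch of marginalization that sums out the dropped variable and writes * in its position. The function `marginal_with_mask` is a hypothetical helper for illustration and does not reflect how dit stores the mask internally.

```python
from fractions import Fraction

quarter = Fraction(1, 4)
joint = {"000": quarter, "011": quarter, "101": quarter, "110": quarter}

def marginal_with_mask(pmf, keep):
    """Sum out every position not in `keep`, marking dropped positions with '*'."""
    result = {}
    for outcome, p in pmf.items():
        masked = "".join(sym if i in keep else "*" for i, sym in enumerate(outcome))
        result[masked] = result.get(masked, Fraction(0)) + p
    return dict(sorted(result.items()))

# Keep X (position 0) and Z (position 2), dropping Y.
p_xz = marginal_with_mask(joint, keep={0, 2})
for outcome, p in p_xz.items():
    print(outcome, p)
```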
Convert the distribution probabilities to log (base 3.5) probabilities, and access its probability mass function.
>>> d2.set_base(3.5)
>>> d2.pmf
array([-1.10658951, -1.10658951, -1.10658951, -1.10658951])
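The stored values are simply the base-3.5 logarithms of the linear probabilities. A quick standard-library check, with variable names chosen for this sketch only:

```python
import math

base = 3.5
linear_pmf = [0.25, 0.25, 0.25, 0.25]

# Log probabilities in the chosen base: log_b(p) = ln(p) / ln(b).
log_pmf = [math.log(p, base) for p in linear_pmf]

# Exponentiating recovers the linear probabilities.
recovered = [base ** lp for lp in log_pmf]
print(log_pmf, recovered)
```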
Draw 5 random samples from this distribution.
>>> dit.math.prng.seed(1)
>>> d2.rand(5)
['01', '10', '00', '01', '00']
Source and Channel Coding
Beyond measures, dit builds explicit codes and ships a catalog of channels.
The dit.coding module constructs lossless source codes (Shannon, Fano, Shannon-Fano-Elias, Huffman, length-limited Huffman, Golomb/Rice, Tunstall, and the universal integer codes) and reports their code-theoretic properties (rate, redundancy, efficiency, the Kraft sum, and whether the code is prefix-free / uniquely decodable / optimal).
>>> from dit.coding import huffman
>>> d = dit.Distribution(['a', 'b', 'c', 'd', 'e'], [0.4, 0.2, 0.2, 0.1, 0.1])
>>> code = huffman(d)
>>> code.average_length()
2.2
>>> float(code.source_entropy())
2.1219280948873624
>>> code.is_optimal(), code.is_prefix_free()
(True, True)
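For readers who want to see where the average length of 2.2 comes from, the following is a minimal textbook Huffman construction using only the standard library. It is a sketch for illustration, not dit's implementation, and `huffman_lengths` is a hypothetical helper name. It also computes the Kraft sum and the redundancy mentioned above.

```python
import heapq
import math

def huffman_lengths(pmf):
    """Return {symbol: codeword length} for a binary Huffman code."""
    # Heap entries: (probability, tie-breaker, symbols in this subtree).
    heap = [(p, i, [sym]) for i, (sym, p) in enumerate(pmf.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(pmf, 0)
    counter = len(heap)
    while len(heap) > 1:
        # Merge the two least probable subtrees; every symbol inside
        # them moves one level deeper in the code tree.
        p1, _, syms1 = heapq.heappop(heap)
        p2, _, syms2 = heapq.heappop(heap)
        for sym in syms1 + syms2:
            lengths[sym] += 1
        heapq.heappush(heap, (p1 + p2, counter, syms1 + syms2))
        counter += 1
    return lengths

pmf = {"a": 0.4, "b": 0.2, "c": 0.2, "d": 0.1, "e": 0.1}
lengths = huffman_lengths(pmf)

average_length = sum(pmf[s] * lengths[s] for s in pmf)
entropy = -sum(p * math.log2(p) for p in pmf.values())
kraft_sum = sum(2.0 ** -n for n in lengths.values())
redundancy = average_length - entropy
print(lengths, average_length, entropy, kraft_sum, redundancy)
```

Different tie-breaking rules can yield different codeword lengths for individual symbols, but every Huffman code for this source has the same average length.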
It also builds binary (GF(2)) channel codes — linear block codes (repetition, parity-check, Hamming, Reed-Muller, Golay) as well as LDPC, polar, and convolutional codes — and evaluates them against a noisy channel supplied as a conditional Distribution p(Y|X).
>>> from dit.coding import hamming
>>> code = hamming(3)
>>> code.length, code.dimension, code.minimum_distance()
(7, 4, 3)
>>> bsc = dit.example_channels.binary_symmetric_channel(0.05)
>>> float(code.probability_of_error(bsc, method='exact'))
0.04438054218749993
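The block error figure above can be sanity-checked by hand. The Hamming(7, 4) code is a perfect single-error-correcting code, so, assuming a decoder that corrects every single-bit error (an assumption of this sketch, not a statement about how `method='exact'` works internally), a block is decoded incorrectly exactly when two or more of its seven bits are flipped. The function name below is illustrative.

```python
from math import comb

def block_error_probability(n, t, p):
    """Probability that more than t of n bits flip on a BSC with crossover p."""
    correctable = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(t + 1))
    return 1 - correctable

# Hamming(7, 4): block length 7, corrects t = 1 error, BSC crossover 0.05.
p_err = block_error_probability(n=7, t=1, p=0.05)
print(p_err)
```

Under this assumption the closed form 1 - (1 - p)^7 - 7p(1 - p)^6 agrees with the value reported above to the displayed precision.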
The dit.example_channels module is a catalog of canonical discrete memoryless channels (binary symmetric, binary erasure, Z-channel, q-ary symmetric/erasure, noisy typewriter, …). Each constructor returns a conditional Distribution p(Y|X) ready for dit.algorithms.channel_capacity or the coding layer above.
>>> from dit.algorithms import channel_capacity
>>> bec = dit.example_channels.binary_erasure_channel(0.25)
>>> float(channel_capacity(bec)[0])
0.75
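The capacity of a binary erasure channel with erasure probability e is 1 - e, which matches the 0.75 above. For channels without a closed form, capacity is commonly computed with the Blahut-Arimoto iteration. The sketch below is a plain standard-library version of that iteration, included for illustration only; it makes no claim about the algorithm `channel_capacity` uses, and the list-of-rows channel representation is an assumption of this example rather than dit's conditional Distribution.

```python
import math

def blahut_arimoto(channel, iterations=500):
    """Estimate the capacity in bits of a channel given rows channel[x][y] = p(y|x)."""
    n_in, n_out = len(channel), len(channel[0])
    p = [1.0 / n_in] * n_in  # start from the uniform input distribution

    def divergences(p_in):
        # q(y) = sum_x p(x) p(y|x) is the output distribution induced by p_in.
        q = [sum(p_in[x] * channel[x][y] for x in range(n_in)) for y in range(n_out)]
        # d[x] = D( p(.|x) || q ) in nats, skipping zero-probability transitions.
        return [
            sum(
                channel[x][y] * math.log(channel[x][y] / q[y])
                for y in range(n_out)
                if channel[x][y] > 0
            )
            for x in range(n_in)
        ]

    for _ in range(iterations):
        d = divergences(p)
        # Reweight each input by exp(d[x]) and renormalize.
        weights = [p[x] * math.exp(d[x]) for x in range(n_in)]
        total = sum(weights)
        p = [w / total for w in weights]

    d = divergences(p)
    capacity_nats = sum(p[x] * d[x] for x in range(n_in))
    return capacity_nats / math.log(2), p

eps = 0.25
# Rows are inputs 0 and 1; columns are outputs 0, erasure, 1.
bec = [[1 - eps, eps, 0.0], [0.0, eps, 1 - eps]]
capacity, optimal_input = blahut_arimoto(bec)
print(capacity, optimal_input)
```

For the erasure channel the uniform input is already optimal, so the iteration leaves it unchanged and returns a capacity of 0.75 bits up to rounding.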
Contributions & Help
If you’d like a feature added to dit, please file an issue. Or, better yet, open a pull request. Ideally, all code should be tested and documented, but please don’t let this be a barrier to contributing. We’ll work with you to ensure that all pull requests are in a mergeable state.
If you’d like to get in contact about anything, you can reach us through our Slack channel.