Information Theoretic Measures of Entropy and Divergence
Project description
Divergence
The Dissolution of Uncertainty — One Bit at a Time
Why Divergence?
In the summer of 1948, a 32-year-old mathematician at Bell Labs named Claude Shannon published a paper that quietly changed the world. "A Mathematical Theory of Communication" gave humanity something it had never had before: a rigorous, mathematical definition of information. Shannon showed that uncertainty could be measured, that communication had fundamental limits, and that a single quantity — entropy — sat at the heart of it all.
What followed was an explosion of ideas. In 1951, Solomon Kullback and Richard Leibler — working as cryptanalysts at the NSA — formalized the concept of relative entropy, measuring how one probability distribution diverges from another. In 1961, Alfréd Rényi generalized Shannon's entropy into a family parameterized by a single number, revealing that information has not one but infinitely many faces. In the decades since, mathematicians, statisticians, and computer scientists have built an extraordinary edifice on Shannon's foundation: f-divergences, optimal transport distances, kernel methods, score-based measures — each offering a different lens on the same fundamental question: how different are these two probability distributions?
Divergence puts this entire toolkit into your hands.
It is the most comprehensive information-theoretic package for Python — spanning Shannon measures, f-divergences, Rényi families, integral probability metrics, kNN estimators, score-based divergences, optimal transport, and Bayesian MCMC diagnostics in a single, unified API. Whether you are a Bayesian statistician checking MCMC convergence, a machine learning researcher comparing generative models, a neuroscientist measuring information flow between brain regions, or a student encountering entropy for the first time, Divergence provides the tools you need.
Who is this for?
-
Bayesian practitioners using PyMC, Stan, NumPyro, PyJAGS, or emcee — Divergence integrates directly with ArviZ to provide information-theoretic diagnostics that go far beyond R-hat: information gain, chain convergence testing, Bayesian surprise, uncertainty decomposition, and the only diagnostic that tests convergence to the correct target (kernel Stein discrepancy)
-
Machine learning researchers — compare distributions with energy distance, MMD, Wasserstein, or Sinkhorn divergence; run formal two-sample tests; measure feature dependence with mutual information and total correlation
-
Scientists and engineers — detect causal information flow with transfer entropy, measure multivariate dependence, quantify distributional shift in monitoring systems
-
Students and educators — six interactive notebooks with historical narrative, from Shannon's foundations through modern optimal transport, including an end-to-end Bayesian detective story using real hydrological data
Everything in one place
Divergence brings together measures that are otherwise scattered across different packages, subfields, and textbooks — Shannon measures, f-divergences, Rényi families, integral probability metrics, kNN estimators, score-based divergences, optimal transport, and Bayesian diagnostics — in a single, coherent API. Discrete and continuous, sample-based and density-based, with configurable units (nats/bits/hartleys), Numba-accelerated performance at scale, and seamless ArviZ integration for Bayesian workflows.
What You Can Compute
Shannon Measures — The Foundation
Claude Shannon (1948), Solomon Kullback & Richard Leibler (1951)
| Measure | Function | What it tells you |
|---|---|---|
| Entropy | entropy(sample) |
How much uncertainty a distribution carries |
| Cross Entropy | cross_entropy(p, q) |
The cost of encoding P using Q's code |
| KL Divergence | kl_divergence(p, q) |
Information lost when approximating P with Q |
| Jensen-Shannon | jensen_shannon_divergence(p, q) |
Symmetric, bounded distributional difference |
| Mutual Information | mutual_information(x, y) |
How much knowing X tells you about Y |
| Joint Entropy | joint_entropy(x, y) |
Total uncertainty in a pair of variables |
| Conditional Entropy | conditional_entropy(x, y) |
Remaining uncertainty after observing the other |
All support discrete=True/False and base=np.e (nats) / 2 (bits) / 10 (hartleys).
f-Divergences — The Unifying Family
Imre Csiszár (1963), Shun-ichi Amari (1985)
| Measure | Function | Properties |
|---|---|---|
| Total Variation | total_variation_distance(p, q) |
Symmetric, bounded [0, 1], true metric |
| Squared Hellinger | squared_hellinger_distance(p, q) |
Symmetric, bounded [0, 2], robust to outliers |
| Chi-Squared | chi_squared_divergence(p, q) |
Asymmetric, unbounded, classical goodness-of-fit |
| Jeffreys | jeffreys_divergence(p, q) |
Symmetric KL (sum of both directions) |
| Cressie-Read | cressie_read_divergence(p, q, lambda_param) |
Parameterized family unifying KL, chi², Hellinger |
| General f-divergence | f_divergence(p, q, f=...) |
Any convex generator function |
Rényi Family — The Alpha Telescope
Alfréd Rényi (1961)
| Measure | Function | Special cases |
|---|---|---|
| Rényi Entropy | renyi_entropy(x, alpha) |
α→0: Hartley, α→1: Shannon, α=2: collision, α→∞: min-entropy |
| Rényi Divergence | renyi_divergence(p, q, alpha) |
α→1: KL divergence, monotonically non-decreasing in α |
Integral Probability Metrics — Geometry on Distributions
Leonid Kantorovich (1942), Gábor Székely (2004), Arthur Gretton (2006)
| Measure | Function | Key advantage |
|---|---|---|
| Energy Distance | energy_distance(p, q) |
No hyperparameters, works in any dimension |
| Wasserstein | wasserstein_distance(p, q, p=1) |
True metric, interpretable units |
| Sliced Wasserstein | sliced_wasserstein_distance(p, q) |
Scales to high dimensions via random projections |
| MMD | maximum_mean_discrepancy(p, q) |
Kernel-based, consistent against all alternatives |
kNN Estimators — Density-Free Information Theory
Kozachenko & Leonenko (1987), Kraskov, Stögbauer & Grassberger (2004)
| Measure | Function | Key advantage |
|---|---|---|
| kNN Entropy | knn_entropy(x, k=5) |
Scales gracefully to high dimensions |
| kNN KL Divergence | knn_kl_divergence(p, q, k=5) |
No density estimation needed |
| KSG Mutual Information | ksg_mutual_information(x, y, k=5) |
Detects all dependence, linear and nonlinear |
Multivariate Dependence — Beyond Pairwise
Satosi Watanabe (1960), Marina Meilă (2003)
| Measure | Function | What it measures |
|---|---|---|
| Total Correlation | total_correlation(samples) |
Total redundancy among d ≥ 2 variables |
| Normalized MI | normalized_mutual_information(x, y) |
MI on a [0, 1] scale for comparison |
| Variation of Information | variation_of_information(x, y) |
True metric on partitions (triangle inequality) |
Causal and Temporal — The Arrow of Information
Thomas Schreiber (2000)
| Measure | Function | What it detects |
|---|---|---|
| Transfer Entropy | transfer_entropy(source, target) |
Directed information flow between time series |
Score-Based Measures — Slopes Instead of Heights
R. A. Fisher (1925), Qiang Liu, Jason Lee & Michael Jordan (2016), Jackson Gorham & Lester Mackey (2017)
| Measure | Function | Key advantage |
|---|---|---|
| Fisher Divergence | fisher_divergence(p, score_q) |
Compares score functions, no normalizing constant |
| Kernel Stein Discrepancy | kernel_stein_discrepancy(x, score) |
Goodness-of-fit without computing Z (RBF + IMQ kernels) |
Optimal Transport — The Cost of Rearrangement
Marco Cuturi (2013), Aude Genevay (2018)
| Measure | Function | Key advantage |
|---|---|---|
| Sinkhorn Divergence | sinkhorn_divergence(p, q) |
Fast, differentiable optimal transport |
Two-Sample Testing — Is the Difference Real?
Ronald Fisher (1930s), Arthur Gretton (2012)
| Function | What it does |
|---|---|
two_sample_test(p, q, method="mmd") |
Permutation test with calibrated p-values (MMD, energy, kNN methods) |
Bayesian MCMC Diagnostics — The ArviZ Companion
Dennis Lindley (1956), Andrew Gelman & Donald Rubin (1992)
| Function | What it answers |
|---|---|
information_gain(idata) |
How much did the data update our beliefs? |
chain_divergence(idata) |
Are chains sampling the same distribution? |
chain_ksd(idata, score_fn) |
Have chains converged to the correct target? |
chain_two_sample_test(idata) |
Formal p-values for chain homogeneity |
mixing_diagnostic(idata) |
Has each chain reached stationarity? |
bayesian_surprise(idata) |
Which observations are most unexpected? |
uncertainty_decomposition(idata) |
How much is noise vs. parameter uncertainty? |
prior_sensitivity(idata, ref) |
Does the conclusion depend on the prior? |
model_divergence(idata1, idata2) |
How different are two models' predictions? |
Works with PyMC, Stan, NumPyro, PyJAGS, emcee — any package that produces ArviZ InferenceData.
Performance
For large-scale computations (n ≥ 5,000), energy distance, MMD, and KSD automatically use Numba JIT-compiled kernels with O(1) memory and multicore parallelism — 10-18x faster than vectorized implementations and enabling n=50,000+ (which previously exhausted memory).
| Function | n=5K | n=20K | n=50K |
|---|---|---|---|
| Energy distance | 21ms | 276ms | 1.4s |
| MMD | 136ms | 1.9s | 12s |
| KSD | 114ms | 1.4s | 8.6s |
Installation
pip install divergence
For Bayesian diagnostics with ArviZ:
pip install "divergence[bayesian]"
Quick Start
import numpy as np
from divergence import entropy, kl_divergence, two_sample_test
rng = np.random.default_rng(42)
p = rng.normal(0, 1, 5000)
q = rng.normal(0.5, 1.2, 5000)
# How much uncertainty?
h = entropy(p)
# How different are these distributions?
kl = kl_divergence(p, q)
# Is the difference statistically significant?
result = two_sample_test(p, q, method="energy", n_permutations=500)
print(f"p-value: {result.p_value:.4f}")
Tutorials
Six interactive notebooks form a progressive learning path, with historical narrative and visualizations throughout:
| # | Notebook | Topics |
|---|---|---|
| 1 | Shannon's Foundations | Entropy, KL divergence, mutual information, the information-theoretic web |
| 2 | Beyond KL | f-divergences, Cressie-Read continuum, Rényi family |
| 3 | Distances & Testing | Wasserstein, energy, MMD, kNN estimators, permutation tests |
| 4 | Dependence & Causality | Total correlation, variation of information, transfer entropy |
| 5 | Scores & Transport | Fisher divergence, kernel Stein discrepancy, Sinkhorn |
| 6 | The Nile's Secret | End-to-end Bayesian inference with emcee — a detective story |
Documentation
Full API reference and rendered tutorials at michaelnowotny.github.io/divergence.
Development
git clone https://github.com/michaelnowotny/divergence.git
cd divergence
uv venv .venv --python 3.12 && source .venv/bin/activate
uv pip install -e ".[dev]"
make test # Run 345 tests
make lint # Ruff check + format
make docs-serve # Live documentation preview
References
- Shannon, C. E. (1948). "A Mathematical Theory of Communication." Bell System Technical Journal, 27(3), 379-423.
- Kullback, S. & Leibler, R. A. (1951). "On Information and Sufficiency." Annals of Mathematical Statistics, 22(1), 79-86.
- Rényi, A. (1961). "On Measures of Entropy and Information." Proc. 4th Berkeley Symposium, 1, 547-561.
- Csiszár, I. (1963). "Eine informationstheoretische Ungleichung und ihre Anwendung auf den Beweis der Ergodizitat von Markoffschen Ketten." Magyar Tud. Akad. Mat. Kutato Int. Kozl., 8, 85-108.
- Gretton, A. et al. (2012). "A Kernel Two-Sample Test." JMLR, 13, 723-773.
- Kraskov, A., Stögbauer, H. & Grassberger, P. (2004). "Estimating Mutual Information." Physical Review E, 69(6), 066138.
- Gorham, J. & Mackey, L. (2017). "Measuring Sample Quality with Kernels." ICML.
- Peyré, G. & Cuturi, M. (2019). Computational Optimal Transport. Foundations and Trends in Machine Learning.
- Cover, T. M. & Thomas, J. A. (2006). Elements of Information Theory, 2nd edition. Wiley.
License
MIT
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file divergence-1.6.0.tar.gz.
File metadata
- Download URL: divergence-1.6.0.tar.gz
- Upload date:
- Size: 2.3 MB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
22fa5e9190dc664110812b59abf788c212ef3c9c47b3c6dd80cfa2ed103f4dd9
|
|
| MD5 |
e26cf5c272d022d652d9ac2b680b3f66
|
|
| BLAKE2b-256 |
4f6b12a163f1872e6fe3101e7e234608080db696e4933577056d8d1c20a9082d
|
Provenance
The following attestation bundles were made for divergence-1.6.0.tar.gz:
Publisher:
release.yml on michaelnowotny/divergence
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
divergence-1.6.0.tar.gz -
Subject digest:
22fa5e9190dc664110812b59abf788c212ef3c9c47b3c6dd80cfa2ed103f4dd9 - Sigstore transparency entry: 1128291385
- Sigstore integration time:
-
Permalink:
michaelnowotny/divergence@4958d1d8f31fbd45e6395af385c194521fc37c6d -
Branch / Tag:
refs/tags/1.6.0 - Owner: https://github.com/michaelnowotny
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
release.yml@4958d1d8f31fbd45e6395af385c194521fc37c6d -
Trigger Event:
push
-
Statement type:
File details
Details for the file divergence-1.6.0-py3-none-any.whl.
File metadata
- Download URL: divergence-1.6.0-py3-none-any.whl
- Upload date:
- Size: 64.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
35e899bac034f135ae074992dd4a78f17bb35fb11c93011247bc383522156c14
|
|
| MD5 |
4b95fef06fd9ed3771cfa370726a6caa
|
|
| BLAKE2b-256 |
581b8a3695ed07e96011be908c6e2815629bbfd0502f01141a98135b8b5c07c4
|
Provenance
The following attestation bundles were made for divergence-1.6.0-py3-none-any.whl:
Publisher:
release.yml on michaelnowotny/divergence
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
divergence-1.6.0-py3-none-any.whl -
Subject digest:
35e899bac034f135ae074992dd4a78f17bb35fb11c93011247bc383522156c14 - Sigstore transparency entry: 1128291447
- Sigstore integration time:
-
Permalink:
michaelnowotny/divergence@4958d1d8f31fbd45e6395af385c194521fc37c6d -
Branch / Tag:
refs/tags/1.6.0 - Owner: https://github.com/michaelnowotny
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
release.yml@4958d1d8f31fbd45e6395af385c194521fc37c6d -
Trigger Event:
push
-
Statement type: