Skip to main content

Information Theoretic Measures of Entropy and Divergence

Project description

Divergence

Divergence

The Dissolution of Uncertainty — One Bit at a Time

Tests PyPI Python License: MIT Docs


Why Divergence?

In the summer of 1948, a 32-year-old mathematician at Bell Labs named Claude Shannon published a paper that quietly changed the world. "A Mathematical Theory of Communication" gave humanity something it had never had before: a rigorous, mathematical definition of information. Shannon showed that uncertainty could be measured, that communication had fundamental limits, and that a single quantity — entropy — sat at the heart of it all.

What followed was an explosion of ideas. In 1951, Solomon Kullback and Richard Leibler — working as cryptanalysts at the NSA — formalized the concept of relative entropy, measuring how one probability distribution diverges from another. In 1961, Alfréd Rényi generalized Shannon's entropy into a family parameterized by a single number, revealing that information has not one but infinitely many faces. In the decades since, mathematicians, statisticians, and computer scientists have built an extraordinary edifice on Shannon's foundation: f-divergences, optimal transport distances, kernel methods, score-based measures — each offering a different lens on the same fundamental question: how different are these two probability distributions?

Divergence puts this entire toolkit into your hands.

It is the most comprehensive information-theoretic package for Python — spanning Shannon measures, f-divergences, Rényi families, integral probability metrics, kNN estimators, score-based divergences, optimal transport, and Bayesian MCMC diagnostics in a single, unified API. Whether you are a Bayesian statistician checking MCMC convergence, a machine learning researcher comparing generative models, a neuroscientist measuring information flow between brain regions, or a student encountering entropy for the first time, Divergence provides the tools you need.

Who is this for?

  • Bayesian practitioners using PyMC, Stan, NumPyro, PyJAGS, or emcee — Divergence integrates directly with ArviZ to provide information-theoretic diagnostics that go far beyond R-hat: information gain, chain convergence testing, Bayesian surprise, uncertainty decomposition, and the only diagnostic that tests convergence to the correct target (kernel Stein discrepancy)

  • Machine learning researchers — compare distributions with energy distance, MMD, Wasserstein, or Sinkhorn divergence; run formal two-sample tests; measure feature dependence with mutual information and total correlation

  • Scientists and engineers — detect causal information flow with transfer entropy, measure multivariate dependence, quantify distributional shift in monitoring systems

  • Students and educators — six interactive notebooks with historical narrative, from Shannon's foundations through modern optimal transport, including an end-to-end Bayesian detective story using real hydrological data

Everything in one place

Divergence brings together measures that are otherwise scattered across different packages, subfields, and textbooks — Shannon measures, f-divergences, Rényi families, integral probability metrics, kNN estimators, score-based divergences, optimal transport, and Bayesian diagnostics — in a single, coherent API. Discrete and continuous, sample-based and density-based, with configurable units (nats/bits/hartleys), Numba-accelerated performance at scale, and seamless ArviZ integration for Bayesian workflows.


What You Can Compute

Shannon Measures — The Foundation

Claude Shannon (1948), Solomon Kullback & Richard Leibler (1951)

Measure Function What it tells you
Entropy entropy(sample) How much uncertainty a distribution carries
Cross Entropy cross_entropy(p, q) The cost of encoding P using Q's code
KL Divergence kl_divergence(p, q) Information lost when approximating P with Q
Jensen-Shannon jensen_shannon_divergence(p, q) Symmetric, bounded distributional difference
Mutual Information mutual_information(x, y) How much knowing X tells you about Y
Joint Entropy joint_entropy(x, y) Total uncertainty in a pair of variables
Conditional Entropy conditional_entropy(x, y) Remaining uncertainty after observing the other

All support discrete=True/False and base=np.e (nats) / 2 (bits) / 10 (hartleys).

f-Divergences — The Unifying Family

Imre Csiszár (1963), Shun-ichi Amari (1985)

Measure Function Properties
Total Variation total_variation_distance(p, q) Symmetric, bounded [0, 1], true metric
Squared Hellinger squared_hellinger_distance(p, q) Symmetric, bounded [0, 2], robust to outliers
Chi-Squared chi_squared_divergence(p, q) Asymmetric, unbounded, classical goodness-of-fit
Jeffreys jeffreys_divergence(p, q) Symmetric KL (sum of both directions)
Cressie-Read cressie_read_divergence(p, q, lambda_param) Parameterized family unifying KL, chi², Hellinger
General f-divergence f_divergence(p, q, f=...) Any convex generator function

Rényi Family — The Alpha Telescope

Alfréd Rényi (1961)

Measure Function Special cases
Rényi Entropy renyi_entropy(x, alpha) α→0: Hartley, α→1: Shannon, α=2: collision, α→∞: min-entropy
Rényi Divergence renyi_divergence(p, q, alpha) α→1: KL divergence, monotonically non-decreasing in α

Integral Probability Metrics — Geometry on Distributions

Leonid Kantorovich (1942), Gábor Székely (2004), Arthur Gretton (2006)

Measure Function Key advantage
Energy Distance energy_distance(p, q) No hyperparameters, works in any dimension
Wasserstein wasserstein_distance(p, q, p=1) True metric, interpretable units
Sliced Wasserstein sliced_wasserstein_distance(p, q) Scales to high dimensions via random projections
MMD maximum_mean_discrepancy(p, q) Kernel-based, consistent against all alternatives

kNN Estimators — Density-Free Information Theory

Kozachenko & Leonenko (1987), Kraskov, Stögbauer & Grassberger (2004)

Measure Function Key advantage
kNN Entropy knn_entropy(x, k=5) Scales gracefully to high dimensions
kNN KL Divergence knn_kl_divergence(p, q, k=5) No density estimation needed
KSG Mutual Information ksg_mutual_information(x, y, k=5) Detects all dependence, linear and nonlinear

Multivariate Dependence — Beyond Pairwise

Satosi Watanabe (1960), Marina Meilă (2003)

Measure Function What it measures
Total Correlation total_correlation(samples) Total redundancy among d ≥ 2 variables
Normalized MI normalized_mutual_information(x, y) MI on a [0, 1] scale for comparison
Variation of Information variation_of_information(x, y) True metric on partitions (triangle inequality)

Causal and Temporal — The Arrow of Information

Thomas Schreiber (2000)

Measure Function What it detects
Transfer Entropy transfer_entropy(source, target) Directed information flow between time series

Score-Based Measures — Slopes Instead of Heights

R. A. Fisher (1925), Qiang Liu, Jason Lee & Michael Jordan (2016), Jackson Gorham & Lester Mackey (2017)

Measure Function Key advantage
Fisher Divergence fisher_divergence(p, score_q) Compares score functions, no normalizing constant
Kernel Stein Discrepancy kernel_stein_discrepancy(x, score) Goodness-of-fit without computing Z (RBF + IMQ kernels)

Optimal Transport — The Cost of Rearrangement

Marco Cuturi (2013), Aude Genevay (2018)

Measure Function Key advantage
Sinkhorn Divergence sinkhorn_divergence(p, q) Fast, differentiable optimal transport

Two-Sample Testing — Is the Difference Real?

Ronald Fisher (1930s), Arthur Gretton (2012)

Function What it does
two_sample_test(p, q, method="mmd") Permutation test with calibrated p-values (MMD, energy, kNN methods)

Bayesian MCMC Diagnostics — The ArviZ Companion

Dennis Lindley (1956), Andrew Gelman & Donald Rubin (1992)

Function What it answers
information_gain(idata) How much did the data update our beliefs?
chain_divergence(idata) Are chains sampling the same distribution?
chain_ksd(idata, score_fn) Have chains converged to the correct target?
chain_two_sample_test(idata) Formal p-values for chain homogeneity
mixing_diagnostic(idata) Has each chain reached stationarity?
bayesian_surprise(idata) Which observations are most unexpected?
uncertainty_decomposition(idata) How much is noise vs. parameter uncertainty?
prior_sensitivity(idata, ref) Does the conclusion depend on the prior?
model_divergence(idata1, idata2) How different are two models' predictions?

Works with PyMC, Stan, NumPyro, PyJAGS, emcee — any package that produces ArviZ InferenceData.


Performance

For large-scale computations (n ≥ 5,000), energy distance, MMD, and KSD automatically use Numba JIT-compiled kernels with O(1) memory and multicore parallelism — 10-18x faster than vectorized implementations and enabling n=50,000+ (which previously exhausted memory).

Function n=5K n=20K n=50K
Energy distance 21ms 276ms 1.4s
MMD 136ms 1.9s 12s
KSD 114ms 1.4s 8.6s

Installation

pip install divergence

For Bayesian diagnostics with ArviZ:

pip install "divergence[bayesian]"

Quick Start

import numpy as np
from divergence import entropy, kl_divergence, two_sample_test

rng = np.random.default_rng(42)
p = rng.normal(0, 1, 5000)
q = rng.normal(0.5, 1.2, 5000)

# How much uncertainty?
h = entropy(p)

# How different are these distributions?
kl = kl_divergence(p, q)

# Is the difference statistically significant?
result = two_sample_test(p, q, method="energy", n_permutations=500)
print(f"p-value: {result.p_value:.4f}")

Tutorials

Six interactive notebooks form a progressive learning path, with historical narrative and visualizations throughout:

# Notebook Topics
1 Shannon's Foundations Entropy, KL divergence, mutual information, the information-theoretic web
2 Beyond KL f-divergences, Cressie-Read continuum, Rényi family
3 Distances & Testing Wasserstein, energy, MMD, kNN estimators, permutation tests
4 Dependence & Causality Total correlation, variation of information, transfer entropy
5 Scores & Transport Fisher divergence, kernel Stein discrepancy, Sinkhorn
6 The Nile's Secret End-to-end Bayesian inference with emcee — a detective story

Documentation

Full API reference and rendered tutorials at michaelnowotny.github.io/divergence.

Development

git clone https://github.com/michaelnowotny/divergence.git
cd divergence
uv venv .venv --python 3.12 && source .venv/bin/activate
uv pip install -e ".[dev]"

make test          # Run 345 tests
make lint          # Ruff check + format
make docs-serve    # Live documentation preview

References

  1. Shannon, C. E. (1948). "A Mathematical Theory of Communication." Bell System Technical Journal, 27(3), 379-423.
  2. Kullback, S. & Leibler, R. A. (1951). "On Information and Sufficiency." Annals of Mathematical Statistics, 22(1), 79-86.
  3. Rényi, A. (1961). "On Measures of Entropy and Information." Proc. 4th Berkeley Symposium, 1, 547-561.
  4. Csiszár, I. (1963). "Eine informationstheoretische Ungleichung und ihre Anwendung auf den Beweis der Ergodizitat von Markoffschen Ketten." Magyar Tud. Akad. Mat. Kutato Int. Kozl., 8, 85-108.
  5. Gretton, A. et al. (2012). "A Kernel Two-Sample Test." JMLR, 13, 723-773.
  6. Kraskov, A., Stögbauer, H. & Grassberger, P. (2004). "Estimating Mutual Information." Physical Review E, 69(6), 066138.
  7. Gorham, J. & Mackey, L. (2017). "Measuring Sample Quality with Kernels." ICML.
  8. Peyré, G. & Cuturi, M. (2019). Computational Optimal Transport. Foundations and Trends in Machine Learning.
  9. Cover, T. M. & Thomas, J. A. (2006). Elements of Information Theory, 2nd edition. Wiley.

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

divergence-1.7.0.tar.gz (6.9 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

divergence-1.7.0-py3-none-any.whl (69.5 kB view details)

Uploaded Python 3

File details

Details for the file divergence-1.7.0.tar.gz.

File metadata

  • Download URL: divergence-1.7.0.tar.gz
  • Upload date:
  • Size: 6.9 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for divergence-1.7.0.tar.gz
Algorithm Hash digest
SHA256 4cf594e6d32896008edf4b34e57c5e6f087fd252e257f4ce32aabd45b27d6681
MD5 89f9d8866f05c2961a76d2da2e2174bf
BLAKE2b-256 eee9591898baecef30fafbfd7863906e692eb68458456d673940c1fa300b49f6

See more details on using hashes here.

Provenance

The following attestation bundles were made for divergence-1.7.0.tar.gz:

Publisher: release.yml on michaelnowotny/divergence

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file divergence-1.7.0-py3-none-any.whl.

File metadata

  • Download URL: divergence-1.7.0-py3-none-any.whl
  • Upload date:
  • Size: 69.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for divergence-1.7.0-py3-none-any.whl
Algorithm Hash digest
SHA256 153ff3adc4c31fbb651c784ee711d2d947a6648bd96f2c02718831d3fa66a243
MD5 b480875108e2756446156de586272d8a
BLAKE2b-256 b7f22dcee26d50ee1d36161c8678b9ede583884f3137a721edb92d223bb4b8ea

See more details on using hashes here.

Provenance

The following attestation bundles were made for divergence-1.7.0-py3-none-any.whl:

Publisher: release.yml on michaelnowotny/divergence

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page