
NAM_Entropy

Reproducible research code for computing and analyzing entropy-based metrics and related experiments. Built with Python 3.12, Poetry, and Jupyter Notebooks for a clean, portable workflow.

Python 3.12 Poetry Tests License: MIT

Table of Contents

  • Overview
  • Installation
  • Project Structure
  • Soft Entropy Estimates
  • Metrics
  • References

Overview

NAM_Entropy provides utilities and experiments for entropy-related analysis. Typical use cases include:

  • Computing Shannon entropy, cross-entropy, KL-divergence, and related information-theoretic measures.
  • Assessing the efficacy of sentence-level and token-level embeddings in a model by computing their information, variation, regularity, and disentanglement.

Our target audience is:

  • Cognitive scientists who want an easy way to compute information-theoretic metrics for the information that a neural network model carries about tokens and sentences via its embeddings.
  • Machine learning researchers who want to use parts of our efficient implementation in their workflows to understand the entropy and mutual information in their models' latent-space representations.
  • Other researchers interested in applying entropy and information-theoretic measures to their work.

Installation

To install the package dependencies, use either Poetry or Conda after changing to the package directory (the one containing pyproject.toml and environment.yml).

Install with Poetry

poetry install

(This creates a poetry virtual environment: nam-entropy-<hash>.)

Install with Conda

conda env create -f environment.yml

(This creates the conda virtual environment: nam-entropy-conda.)

Project Structure

nam_entropy/
├─ pyproject.toml
├─ poetry.lock
├─ README.md
├─ LICENSE
├─ .gitignore
├─ Makefile                           # optional convenience commands
├─ .env.example                       # example environment variables
├─ notebooks/
│  ├─ A. Automatic entropy and information calculations.ipynb
│  ├─ B. Interactive entropy and information calculations.ipynb
│  └─ C. Programmatic entropy and information calculations.ipynb
├─ src/
│  └─ nam_entropy/
│     ├─ __init__.py
│     ├─ make_data.py
│     ├─ data_prep.py
│     ├─ bin_distribution_plots.py
│     ├─ integrated_distribution_2d_sampler.py
│     └─ h.py
└─ tests/

Soft Entropy Estimates

This package uses the soft entropy estimation methodology of Conklin (2025), Section 5.4, to provide a discrete, differentiable approximation to entropy computation. This approach enables gradient-based optimization while maintaining the interpretability of traditional entropy measures. The key insight is mapping continuous neural representations to probability distributions over learned bin centers, preserving both representational nuance and computational tractability.
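The continuous-to-discrete mapping can be sketched as follows. This is a minimal NumPy illustration, not the package's own implementation: the names `soft_bin_probs` and `shannon_entropy`, the softmax-over-distances assignment, and the `temperature` parameter are all choices made here for exposition.

```python
import numpy as np

def soft_bin_probs(z, centers, temperature=1.0):
    """Soft-assign each point to bin centers via a softmax over negative
    squared distances, then average over the dataset to obtain a single
    probability vector over the bins."""
    d2 = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(-1)  # (n, k) distances
    logits = -d2 / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)            # per-point soft assignment
    return p.mean(axis=0)                        # aggregate bin distribution

def shannon_entropy(p, eps=1e-12):
    """Shannon entropy (in nats) of a discrete distribution p."""
    p = np.clip(p, eps, 1.0)
    return float(-(p * np.log(p)).sum())

rng = np.random.default_rng(0)
z = rng.normal(size=(500, 8))                            # stand-in "representations"
centers = z[rng.choice(len(z), size=16, replace=False)]  # bins drawn from the data
h = shannon_entropy(soft_bin_probs(z, centers))
print(f"soft-binned entropy: {h:.3f} nats (max {np.log(16):.3f})")
```

Because every step (distances, softmax, averaging, entropy) is differentiable in both `z` and `centers`, gradients can flow through the entropy estimate, which is what makes gradient-based optimization possible.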

Metrics

1. Entropy

We compute the entropy $H(Z)$ of a given vector-valued distribution $Z$ by creating an associated soft-binned probability distribution $\mathrm{SoftBin}(Z)$ on a finite set of points randomly associated with the given dataset, and then computing the Shannon entropy $H(\mathrm{SoftBin}(Z))$ of this finite distribution.

For more details on Shannon entropy, see Wikipedia: Entropy or ScienceDirect: Shannon Entropy.
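As a concrete numeric check, the Shannon entropy of a discrete distribution can be computed directly. This is a plain-Python sketch for illustration; `shannon_entropy` here is not the package's API.

```python
import math

def shannon_entropy(probs):
    """H(p) = -sum_i p_i log2 p_i, in bits; zero-probability terms contribute 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(shannon_entropy([0.5, 0.5]))  # fair coin -> 1.0 bit
print(shannon_entropy([0.25] * 4))  # uniform over 4 outcomes -> 2.0 bits
```

Entropy is maximized by the uniform distribution and drops to zero for a deterministic one, which is why it serves as a measure of spread in the soft-binned representation.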

2. Conditional Entropy

We can also compute the conditional entropy of the given vector-valued distribution $Z$ with respect to a given discrete labelling of $Z$. We think of this labelling as a categorical random variable $L$, valued in the finite set of labels, that shares a common probability space with $Z$. We then compute the conditional entropy $H(Z|L)$ of $Z$ conditioned on (knowing) $L$ by the usual formula

$$H(Z|L) = \sum_{l \in L} p(l) H(Z|L=l)$$

where the entropy conditioned on the particular label value $L = l$ is given by

$$H(Z|L=l) = -\sum_{z \in Z} p(z|l) \log p(z|l),$$

and where

  • $Z$ := Population data distribution (continuous vector-valued RV),
  • $L$ := Sub-population label (categorical RV).

For more details on conditional entropy, see Wikipedia: Conditional Entropy or ScienceDirect: Conditional Entropy.
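The two formulas above can be checked on a small joint distribution. The sketch below (with an illustrative `conditional_entropy` helper, not part of the package) represents the joint as a dict `{(z, l): p(z, l)}` and applies $H(Z|L) = -\sum_{z,l} p(z,l)\log p(z|l)$, which is the same sum after expanding $p(l)\,p(z|l) = p(z,l)$.

```python
import math
from collections import defaultdict

def conditional_entropy(joint):
    """H(Z|L) in bits, from a joint distribution {(z, l): p(z, l)}."""
    p_l = defaultdict(float)                      # marginal p(l)
    for (z, l), p in joint.items():
        p_l[l] += p
    h = 0.0
    for (z, l), p in joint.items():
        if p > 0:
            h -= p * math.log2(p / p_l[l])        # p(z,l) * log p(z|l)
    return h

# L perfectly determines Z -> no remaining uncertainty
joint = {("a", 0): 0.5, ("b", 1): 0.5}
print(conditional_entropy(joint))   # -> 0.0

# Z independent of L, uniform over two values -> full 1 bit remains
joint2 = {("a", 0): 0.25, ("b", 0): 0.25, ("a", 1): 0.25, ("b", 1): 0.25}
print(conditional_entropy(joint2))  # -> 1.0
```

The two extremes bracket the general case: $0 \le H(Z|L) \le H(Z)$, with equality on the right exactly when $L$ tells us nothing about $Z$.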

3. Mutual Information

Mutual information quantifies how much information two random variables carry about each other. By definition, the mutual information $I(Z, L)$ of the two random variables $Z$ and $L$ is given by

$$I(Z, L) := H(Z) - H(Z|L),$$

where

  • $Z$ := Population data distribution (continuous vector-valued RV),
  • $L$ := Sub-population label (categorical RV).

For more details on mutual information, see Wikipedia: Mutual Information or ScienceDirect: Mutual Information.

References

Primary Theoretical Framework

Conklin, H. (2025). Information Structure in Mappings: An Approach to Learning, Representation, and Generalisation. PhD Thesis, University of Edinburgh. arXiv:2505.23960. https://arxiv.org/abs/2505.23960

For the mathematical foundations of soft entropy estimation used in this project, see Section 5.4: "Soft Entropy Estimation" (pp. 94-99), which provides the theoretical basis for the continuous-to-discrete entropy approximation methods implemented in our codebase.

Related Publications

Conklin, H., & Smith, K. (2024). Representations as Language: An Information-Theoretic Framework for Interpretability. International Meeting of the Cognitive Science Society. arXiv:2406.02449. https://arxiv.org/abs/2406.02449

Conklin, H., & Smith, K. (2024). Compositionality With Variation Reliably Emerges in Neural Networks. International Conference on Learning Representations (ICLR). https://openreview.net/forum?id=-Yzz6vlX7V-

Download files

Source Distribution

  • nam_entropy-0.1.0.tar.gz (42.2 kB)

Built Distribution

  • nam_entropy-0.1.0-py3-none-any.whl (42.7 kB)

File details

Details for the file nam_entropy-0.1.0.tar.gz.

File metadata

  • Download URL: nam_entropy-0.1.0.tar.gz
  • Size: 42.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.0.0 CPython/3.12.11 Linux/6.11.0-1018-azure

File hashes

Hashes for nam_entropy-0.1.0.tar.gz

  • SHA256: 9cf15fe6093b82c191c92781957b6f5c7cfe8b51d8ac216a98a3f658b47fda7a
  • MD5: b9ad59df53515090352209fe9468b38a
  • BLAKE2b-256: a1709b34659cdaaa7c47fabe7299e886b1b4e1ace575acd13cebe46f50af5573


File details

Details for the file nam_entropy-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: nam_entropy-0.1.0-py3-none-any.whl
  • Size: 42.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.0.0 CPython/3.12.11 Linux/6.11.0-1018-azure

File hashes

Hashes for nam_entropy-0.1.0-py3-none-any.whl

  • SHA256: 5c7505e197f03bff32a206fba068662ecad0aa5e98add18bb4be9cb75d716a09
  • MD5: a51e3549466debd3e250a0144ba2d525
  • BLAKE2b-256: 086ac521a3b208f8eed8dbad4c771ce971afee651c272d0f205a95e6321cddb6

