
Information-theory-related estimators


infopy-estimators


A comprehensive Python library for estimating mutual information (MI) and entropy in discrete, continuous, and mixed random variables. Built with a focus on performance and ease of use, infopy-estimators provides implementations of various information-theoretic estimators.

Features

  • Multiple MI Estimators: Support for discrete-discrete, continuous-discrete, continuous-continuous, and mixed variable types
  • Multidimensional Support: All estimators handle multidimensional random vectors (X, Y ∈ ℝⁿ)
  • Pointwise MI: Calculate mutual information per sample, not just averages
  • Conditional MI: Compute conditional mutual information I(X;Y|Z)
  • Automatic Estimator Selection: Let the library choose the best estimator based on your data types
  • Entropy Estimation: Dedicated entropy estimators for continuous and discrete variables
  • Type-Safe: Full type hints and mypy compliance

Installation

Using pip

pip install infopy-estimators

Using uv (recommended for development)

uv add infopy-estimators

From source

git clone https://github.com/jurrutiag/infopy.git
cd infopy
uv sync  # or pip install -e .

Quick Start

Basic Usage

import numpy as np
from infopy import get_mi_estimator

# Generate sample data
X = np.random.randn(1000, 2)  # Continuous 2D variable
Y = np.random.randn(1000, 1)  # Continuous 1D variable

# Automatically select appropriate estimator
estimator = get_mi_estimator(x_type="continuous", y_type="continuous")

# Estimate mutual information
mi = estimator.estimate(X, Y)
print(f"Mutual Information: {mi:.4f}")
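
A quick way to sanity-check the value printed above: for jointly Gaussian variables the true MI has a closed form, I(X;Y) = -0.5·ln(1 - ρ²), so you can compare the estimator's output against it (assuming, as is typical for Kraskov-style estimators, that results are reported in nats). This helper is plain NumPy and independent of infopy:

```python
import numpy as np

def bivariate_gaussian_mi(rho: float) -> float:
    """Closed-form MI (in nats) of a bivariate Gaussian with correlation rho."""
    return -0.5 * np.log(1.0 - rho**2)

# The independent X and Y above have true MI of 0; correlated Gaussians
# have MI growing with |rho|.
print(f"rho=0.0 -> {bivariate_gaussian_mi(0.0):.3f} nats")
print(f"rho=0.9 -> {bivariate_gaussian_mi(0.9):.3f} nats")
```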

Pointwise Mutual Information

# Get MI for each sample instead of average
estimator = get_mi_estimator(x_type="continuous", y_type="continuous", pointwise_suited=True)
pointwise_mi = estimator.estimate(X, Y, pointwise=True)
print(f"Pointwise MI shape: {pointwise_mi.shape}")  # (1000,)

Mixed Variable Types

# Continuous and discrete variables
X_continuous = np.random.randn(1000, 3)
Y_discrete = np.random.randint(0, 5, size=(1000, 1))

estimator = get_mi_estimator(x_type="continuous", y_type="discrete")
mi = estimator.estimate(X_continuous, Y_discrete)
print(f"Continuous-Discrete MI: {mi:.4f}")

Conditional Mutual Information

# Compute I(X;Y|Z)
X = np.random.randn(1000, 2)
Y = np.random.randn(1000, 2)
Z = np.random.randn(1000, 1)

estimator = get_mi_estimator(x_type="continuous", y_type="continuous")
cmi = estimator.estimate_conditional(X, Y, Z)
print(f"Conditional MI: {cmi:.4f}")
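
To build intuition for what I(X;Y|Z) measures, here is a small discrete example in plain NumPy (independent of infopy) using the identity I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(Z) - H(X,Y,Z). When Z is a common cause of both X and Y, conditioning on it removes the dependence:

```python
import numpy as np

def plugin_entropy(samples: np.ndarray) -> float:
    """Plug-in (maximum-likelihood) entropy in nats over rows of discrete samples."""
    _, counts = np.unique(samples, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())

rng = np.random.default_rng(0)
Z = rng.integers(0, 2, size=(5000, 1))
X = Z.copy()  # X is fully determined by Z
Y = Z.copy()  # so is Y

H = plugin_entropy
cmi = H(np.hstack([X, Z])) + H(np.hstack([Y, Z])) - H(Z) - H(np.hstack([X, Y, Z]))
print(f"I(X;Y|Z) = {cmi:.6f}")  # 0 here: Z screens off X from Y
```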

Entropy Estimation

from infopy import ContinuousEntropyEstimator, DiscreteEntropyEstimator

# Continuous entropy
X_continuous = np.random.randn(1000, 3)
cont_estimator = ContinuousEntropyEstimator()
entropy = cont_estimator.estimate(X_continuous)
print(f"Continuous Entropy: {entropy:.4f}")

# Discrete entropy
X_discrete = np.random.randint(0, 10, size=(1000, 2))
disc_estimator = DiscreteEntropyEstimator()
entropy = disc_estimator.estimate(X_discrete)
print(f"Discrete Entropy: {entropy:.4f}")
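
ContinuousEntropyEstimator is listed as using the Kozachenko-Leonenko k-NN method (see Available Estimators below). As a rough illustration of that technique, and not infopy's actual implementation, here is a minimal 1-D version in plain NumPy, checked against the known entropy of a standard normal, 0.5·ln(2πe) ≈ 1.419 nats:

```python
import numpy as np

GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def digamma_int(n: int) -> float:
    """Digamma at a positive integer: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    return float(-GAMMA + np.sum(1.0 / np.arange(1, n)))

def kl_entropy_1d(x: np.ndarray, k: int = 3) -> float:
    """Kozachenko-Leonenko differential entropy (nats) for 1-D samples."""
    n = len(x)
    dists = np.abs(x[:, None] - x[None, :])  # all pairwise distances
    dists.sort(axis=1)
    eps = dists[:, k]  # distance to the k-th nearest neighbour (column 0 is self)
    # c_1 = 2 is the volume (length) of the unit ball in one dimension
    return digamma_int(n) - digamma_int(k) + np.log(2.0) + float(np.mean(np.log(eps)))

rng = np.random.default_rng(0)
x = rng.standard_normal(2000)
true_h = 0.5 * np.log(2 * np.pi * np.e)  # exact entropy of N(0, 1)
print(f"estimate: {kl_entropy_1d(x):.3f}, true: {true_h:.3f}")
```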

Available Estimators

Mutual Information Estimators

Estimator                  Variable Types         Method                             Reference
DDMIEstimator              Discrete-Discrete      Maximum likelihood PMF estimation  -
CDMIRossEstimator          Continuous-Discrete    Ross method                        Ross (2014) [1]
CDMIEntropyBasedEstimator  Continuous-Discrete    Kozachenko-Leonenko entropy        -
CCMIEstimator              Continuous-Continuous  Kraskov estimator                  Kraskov et al. (2004) [2]
MixedMIEstimator           Mixed types            Gao estimator (experimental)       Gao et al. (2018) [3]

Entropy Estimators

Estimator                   Variable Type  Method
ContinuousEntropyEstimator  Continuous     Kozachenko-Leonenko k-NN
DiscreteEntropyEstimator    Discrete       Maximum likelihood
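
The "maximum likelihood" entries above refer to plug-in estimation: estimate the PMF from empirical frequencies and evaluate the information quantity directly. As a toy illustration (not infopy's code), discrete MI via I(X;Y) = H(X) + H(Y) - H(X,Y):

```python
import numpy as np

def plugin_discrete_mi(x: np.ndarray, y: np.ndarray) -> float:
    """Maximum-likelihood (plug-in) MI in nats for 1-D discrete samples."""
    def H(a: np.ndarray) -> float:
        _, counts = np.unique(a, axis=0, return_counts=True)
        p = counts / counts.sum()
        return float(-(p * np.log(p)).sum())
    return H(x[:, None]) + H(y[:, None]) - H(np.stack([x, y], axis=1))

x = np.array([0, 0, 1, 1, 0, 0, 1, 1])
y = x.copy()  # perfectly dependent, so I(X;Y) = H(X) = ln 2
print(f"{plugin_discrete_mi(x, y):.4f}")
```

Note that this naive estimator is biased upward for small samples; production estimators such as the ones in the table apply corrections or k-NN constructions.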

Automatic Estimator Selection

The get_mi_estimator() function automatically selects the appropriate estimator:

from infopy import get_mi_estimator

# For continuous-continuous variables
estimator = get_mi_estimator(x_type="continuous", y_type="continuous")

# For discrete-discrete variables
estimator = get_mi_estimator(x_type="discrete", y_type="discrete")

# For mixed types
estimator = get_mi_estimator(x_type="continuous", y_type="discrete")

# For pointwise MI calculation
estimator = get_mi_estimator(x_type="continuous", y_type="continuous", pointwise_suited=True)
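
Conceptually, this kind of selection is a dispatch table keyed on the type pair. The sketch below is purely illustrative (the class names come from the estimator table above; infopy's real selection logic, including how pointwise_suited affects the choice, is not documented here):

```python
# Hypothetical sketch of type-pair dispatch -- not infopy's actual code.
ESTIMATOR_TABLE = {
    ("continuous", "continuous"): "CCMIEstimator",
    ("discrete", "discrete"): "DDMIEstimator",
    ("continuous", "discrete"): "CDMIRossEstimator",
    ("discrete", "continuous"): "CDMIRossEstimator",
}

def pick_estimator_name(x_type: str, y_type: str) -> str:
    """Map a (x_type, y_type) pair to an estimator class name."""
    try:
        return ESTIMATOR_TABLE[(x_type, y_type)]
    except KeyError:
        raise ValueError(f"unsupported type pair: {x_type!r}, {y_type!r}") from None

print(pick_estimator_name("continuous", "discrete"))
```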

API Reference

BaseMIEstimator

All MI estimators inherit from BaseMIEstimator and implement:

from typing import Union

import numpy as np


class BaseMIEstimator:
    def estimate(self, X: np.ndarray, Y: np.ndarray, pointwise: bool = False) -> Union[float, np.ndarray]:
        """Estimate mutual information between X and Y."""
        ...

    def estimate_conditional(self, X: np.ndarray, Y: np.ndarray, Z: np.ndarray) -> float:
        """Estimate conditional mutual information I(X;Y|Z)."""
        ...

Parameters

  • X, Y: Input arrays of shape (n_samples, n_features)
  • pointwise: If True, return MI for each sample; if False, return average MI
  • Z: Conditioning variable for conditional MI

Returns

  • estimate(): Float (average MI) or array of shape (n_samples,) (pointwise MI)
  • estimate_conditional(): Float (conditional MI)
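
To show what conforming to this contract looks like, here is a toy estimator. The base class below is a stand-in for illustration, not infopy's own, and the Gaussian-correlation formula is only valid when (X, Y) are 1-D and jointly Gaussian:

```python
from typing import Union

import numpy as np

class BaseMIEstimator:  # stand-in base class, for illustration only
    def estimate(self, X, Y, pointwise=False):
        raise NotImplementedError

class GaussianCorrMIEstimator(BaseMIEstimator):
    """Toy estimator: assumes 1-D jointly Gaussian data, I = -0.5 * ln(1 - rho^2)."""

    def estimate(self, X: np.ndarray, Y: np.ndarray, pointwise: bool = False) -> Union[float, np.ndarray]:
        rho = np.corrcoef(X.ravel(), Y.ravel())[0, 1]
        mi = -0.5 * np.log(1.0 - rho**2)
        # pointwise=True returns one value per sample, matching the documented shapes
        return np.full(len(X), mi) if pointwise else float(mi)

rng = np.random.default_rng(1)
x = rng.standard_normal((1000, 1))
y = x + 0.5 * rng.standard_normal((1000, 1))  # true MI = -0.5*ln(0.2) ~ 0.805 nats
print(f"{GaussianCorrMIEstimator().estimate(x, y):.3f}")
```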

Advanced Usage

Custom k-NN Parameters

from infopy import CCMIEstimator

# Use custom k for k-NN estimation
estimator = CCMIEstimator(k=5)  # Default is k=3
mi = estimator.estimate(X, Y)

Handling High-Dimensional Data

# For high-dimensional continuous data
X = np.random.randn(1000, 50)  # 50-dimensional
Y = np.random.randn(1000, 30)  # 30-dimensional

# k-NN estimators become less reliable as dimension grows;
# increasing k trades some bias for stability
estimator = CCMIEstimator(k=10)
mi = estimator.estimate(X, Y)
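
Because k-NN estimators define neighbourhoods by distance, features on wildly different scales can dominate the result. Standardizing each column first is common practice (whether infopy rescales inputs internally is not stated here):

```python
import numpy as np

def standardize(a: np.ndarray) -> np.ndarray:
    """Scale each column to zero mean and unit variance."""
    return (a - a.mean(axis=0)) / a.std(axis=0)

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 50)) * np.linspace(1, 100, 50)  # mixed scales
Xs = standardize(X)
print(Xs.mean(axis=0).round(6).max(), Xs.std(axis=0).round(6).max())
```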

Development

Setup Development Environment

# Clone the repository
git clone https://github.com/jurrutiag/infopy.git
cd infopy

# Install with development dependencies
uv sync --extra testing

Running Tests

# Run all tests
uv run pytest

# Run with coverage
uv run pytest --cov=infopy

# Run specific test
uv run pytest tests/test_estimators.py::TestCCMIEstimator

Code Quality

# Format code
uv run ruff format

# Lint code
uv run ruff check

# Type checking
uv run mypy src/

Contributing

We welcome contributions! Please follow these guidelines:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes and add tests
  4. Ensure all tests pass and code is formatted
  5. Commit your changes (git commit -m 'Add amazing feature')
  6. Push to the branch (git push origin feature/amazing-feature)
  7. Open a Pull Request

Development Standards

  • Formatter: ruff format (100 char line length)
  • Linter: ruff with comprehensive rules
  • Type Checker: mypy with strict settings
  • Test Framework: pytest with coverage
  • Package Manager: uv

Citation

If you use this library in your research, please cite:

@software{infopy_estimators,
  author = {Urrutia, Juan},
  title = {infopy-estimators: Information Theory Estimators for Python},
  url = {https://github.com/jurrutiag/infopy},
  year = {2024}
}

References

  1. B. C. Ross, "Mutual Information between Discrete and Continuous Data Sets." PLoS ONE 9(2), 2014.
  2. A. Kraskov, H. Stögbauer, and P. Grassberger, "Estimating Mutual Information." Phys. Rev. E 69, 2004.
  3. W. Gao et al., "Estimating Mutual Information for Discrete-Continuous Mixtures." NeurIPS, 2018.

License

This project is licensed under the MIT License - see the LICENSE file for details.
