
Causal network discovery using optimal causation entropy


CausationEntropy

License: MIT | Python 3.8+

A Python library for discovering causal networks from time series data using Optimal Causation Entropy (oCSE).

Overview

CausationEntropy implements information-theoretic methods for causal discovery from multivariate time series. Its algorithms identify causal relationships while controlling for confounding variables and limiting false discoveries.

What it does

Given time series data, CausationEntropy finds which variables cause changes in other variables by:

  1. Predictive Testing: Testing whether knowing variable X at time t helps predict variable Y at time t+1
  2. Information Theory: Using conditional mutual information to measure predictive relationships
  3. Statistical Control: Using permutation-based significance tests to limit false discoveries
  4. Multiple Methods: Supporting various information estimators and discovery algorithms
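The predictive test in item 1 amounts to aligning each candidate predictor with a shifted copy of the target. The sketch below shows that alignment for every lag up to max_lag. `build_lagged_candidates` is a hypothetical helper, not part of the CausationEntropy API, and it assumes the series are plain Python lists keyed by variable name. With lag 1, the pair (x[t - 1], y[t]) is the same as the (X(t), Y(t+1)) pairing described above.

```python
def build_lagged_candidates(data, target, max_lag):
    """Align every (variable, lag) predictor with the target series (hypothetical helper).

    data: dict mapping variable name -> list of values, all of length T
    Returns (y, candidates) where y[k] is the target at time t = max_lag + k and
    candidates[(name, lag)][k] is the value of `name` at time t - lag.
    """
    if max_lag < 1:
        raise ValueError("max_lag must be at least 1")
    T = len(data[target])
    if any(len(series) != T for series in data.values()):
        raise ValueError("all series must have the same length")
    # Drop the first max_lag target values: they have no complete lag history.
    y = data[target][max_lag:]
    candidates = {}
    for name, series in data.items():
        for lag in range(1, max_lag + 1):
            # Slice so that element k pairs series[t - lag] with target[t].
            candidates[(name, lag)] = series[max_lag - lag:T - lag]
    return y, candidates


y, candidates = build_lagged_candidates(
    {"x": [1, 2, 3, 4, 5], "y": [10, 20, 30, 40, 50]}, target="y", max_lag=2
)
print(y)                     # [30, 40, 50]
print(candidates[("x", 1)])  # [2, 3, 4]
print(len(candidates))       # 4
```

In this representation each (variable, lag) pair is a separate candidate, so N variables and max_lag lags give N * max_lag candidates per target, including the target's own past.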

Installation

From PyPI (recommended)

pip install causationentropy

Development Installation

git clone https://github.com/Center-For-Complex-Systems-Science/causationentropy.git
cd causationentropy
pip install -e .

Quick Start

Basic Usage

import pandas as pd
from causationentropy import discover_network

# Load your time series data (variables as columns, time as rows)
data = pd.read_csv('data.csv')

# Discover causal network
network = discover_network(data, method='standard', max_lag=5)

Note: This implementation runs in O(N^2 T log T), where N is the number of variables and T is the length of the time series. Without further optimization it is computationally intensive, so expect long runtimes on larger problems. Considering more lags (a larger max_lag) also increases runtime. A later release is planned to add optimizations based on singular value decomposition and KD-Trees; these optimizations are not part of the original algorithm.
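As a rough check of the O(N^2 T log T) bound stated above, the illustrative function below compares two problem sizes. Constant factors cancel in the ratio, so it estimates relative rather than absolute runtime; it ignores max_lag and the choice of estimator, and asymptotic bounds are only indicative for small inputs, so treat the numbers as ballpark figures. `relative_cost` is a hypothetical name, not a library function.

```python
import math


def relative_cost(n_vars, n_samples, base_vars, base_samples):
    """Runtime ratio implied by O(N^2 * T * log T); constant factors cancel."""
    def cost(n, t):
        return n * n * t * math.log(t)
    return cost(n_vars, n_samples) / cost(base_vars, base_samples)


print(relative_cost(10, 1000, 5, 1000))           # doubling N: 4.0
print(round(relative_cost(5, 2000, 5, 1000), 2))  # doubling T: 2.2
```

Doubling the number of variables roughly quadruples the bound, while doubling the series length slightly more than doubles it because of the log T factor.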

Advanced Configuration

# Configure discovery parameters
network = discover_network(
    data,
    method='standard',          # 'standard', 'alternative', 'information_lasso', or 'lasso'
    information='gaussian',     # 'gaussian', 'knn', 'kde', 'geometric_knn', or 'poisson'
    max_lag=5,                  # Maximum time lag to consider
    alpha_forward=0.05,         # Forward selection significance
    alpha_backward=0.05,        # Backward elimination significance
    n_shuffles=200              # Permutation test iterations
)
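alpha_forward, alpha_backward, and n_shuffles control a permutation-based significance test. The sketch below shows one common form of such a test for a single candidate: shuffle X to break any dependence on Y, recompute the statistic n_shuffles times, and report the fraction of shuffled statistics at least as large as the observed one. It is an illustration, not the library's implementation: the Gaussian mutual-information statistic, the helper names, and the unconditional shuffle are assumptions, and a conditional test would also have to account for the conditioning set.

```python
import math
import random


def gaussian_mi(x, y):
    """Illustrative statistic (assumption): Gaussian mutual information -0.5*log(1 - r^2), in nats."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    # Clamp so a perfect correlation does not produce log(0).
    return -0.5 * math.log(1.0 - min(r * r, 1.0 - 1e-12))


def permutation_p_value(x, y, n_shuffles=200, seed=None):
    """Permutation p-value for the dependence of y on x (hypothetical helper).

    Shuffling x destroys any dependence on y while keeping x's marginal distribution.
    The +1 terms count the observed statistic as one draw from the null, so p is never 0.
    """
    rng = random.Random(seed)
    observed = gaussian_mi(x, y)
    shuffled = list(x)
    exceed = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if gaussian_mi(shuffled, y) >= observed:
            exceed += 1
    return (exceed + 1) / (n_shuffles + 1)


rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(300)]
y = [0.6 * a + rng.gauss(0, 1) for a in x]
p = permutation_p_value(x, y, n_shuffles=200, seed=2)
print(p, p < 0.05)  # keep the candidate if p is below alpha_forward
```

With n_shuffles=200 the smallest attainable p-value is 1/201, which is one reason the number of shuffles should be large enough for the chosen alpha.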

Synthetic Data Example

from causationentropy import discover_network
from causationentropy.datasets import synthetic

# Generate synthetic causal time series
data, true_network = synthetic.linear_stochastic_gaussian_process(
    n_variables=5,
    n_samples=1000,
    sparsity=0.3
)

# Discover network
discovered = discover_network(data)
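To check how well `discovered` recovers `true_network`, a common approach is to score the directed edges by precision and recall. The README does not state what types these objects are, so the helpers below are hypothetical and assume both can be expressed as nested-list adjacency matrices in which a nonzero entry at [i][j] means i causes j. If they turn out to be graph objects, any collection of (source, target) pairs works with `edge_scores`.

```python
def adjacency_to_edges(matrix):
    """Directed edges (i, j) for every nonzero entry.

    Assumed convention: a nonzero matrix[i][j] means variable i causes variable j.
    """
    return {(i, j) for i, row in enumerate(matrix) for j, v in enumerate(row) if v != 0}


def edge_scores(true_edges, found_edges):
    """Precision and recall of the discovered directed edges against the ground truth."""
    true_edges, found_edges = set(true_edges), set(found_edges)
    hits = len(true_edges & found_edges)
    precision = hits / len(found_edges) if found_edges else 0.0
    recall = hits / len(true_edges) if true_edges else 0.0
    return precision, recall


truth = adjacency_to_edges([[0, 1, 0], [0, 0, 1], [0, 0, 0]])  # 0->1, 1->2
found = adjacency_to_edges([[0, 1, 0], [0, 0, 0], [1, 0, 0]])  # 0->1, 2->0
print(edge_scores(truth, found))  # (0.5, 0.5)
```

Precision penalizes spurious edges and recall penalizes missed ones, so reporting both is more informative than a single accuracy figure on sparse networks.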

Key Features

  • Multiple Algorithms: Standard, alternative, information lasso, and lasso variants of oCSE
  • Flexible Information Estimators: Gaussian, k-NN, KDE, geometric k-NN, and Poisson methods
  • Statistical Rigor: Permutation-based significance testing, backed by comprehensive unit-test coverage
  • Synthetic Data: Built-in generators for testing and validation
  • Visualization: Network plotting and analysis tools
  • Performance: Optimized implementations with parallel processing support

Mathematical Foundation

The algorithm uses conditional mutual information to quantify causal relationships:

$$I(X; Y | Z) = H(X | Z) + H(Y | Z) - H(X, Y | Z)$$

Here H denotes (conditional) entropy. This measures how much information X carries about Y beyond what is already known from the conditioning set Z.

Causal Discovery Rule: Variable X causes Y if knowing X(t) significantly improves prediction of Y(t+1), even when controlling for all other relevant variables.
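For jointly Gaussian variables, the entropy of a k-dimensional block with covariance $\Sigma$ is $\tfrac{1}{2}\log\left((2\pi e)^k \det\Sigma\right)$. Writing each conditional entropy in the formula above as $H(A \mid Z) = H(A, Z) - H(Z)$ gives $I(X; Y \mid Z) = \tfrac{1}{2}\left[\log\det\Sigma_{XZ} + \log\det\Sigma_{YZ} - \log\det\Sigma_{Z} - \log\det\Sigma_{XYZ}\right]$, and the $2\pi e$ factors cancel. The sketch below estimates that quantity from samples and applies the discovery rule to a simulated pair in which X(t) drives Y(t+1). It is the textbook Gaussian estimator written in plain Python for illustration; whether `information='gaussian'` in the library computes exactly this is an assumption, and the helper names are hypothetical.

```python
import math
import random


def _covariance(columns):
    """Sample covariance matrix of equal-length columns (an empty list gives [])."""
    if not columns:
        return []
    n = len(columns[0])
    means = [sum(col) / n for col in columns]
    centered = [[v - m for v in col] for col, m in zip(columns, means)]
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in centered]
            for ci in centered]


def _log_det(matrix):
    """Log-determinant of a symmetric positive-definite matrix by Gaussian elimination.

    SPD matrices need no pivoting; the empty matrix has determinant 1 (log-det 0.0).
    """
    m = [row[:] for row in matrix]
    total = 0.0
    for col in range(len(m)):
        pivot = m[col][col]
        total += math.log(pivot)
        for r in range(col + 1, len(m)):
            factor = m[r][col] / pivot
            for c in range(col, len(m)):
                m[r][c] -= factor * m[col][c]
    return total


def gaussian_cmi(x, y, z_columns=()):
    """I(X;Y|Z) in nats for scalar X, Y and any number of conditioning columns Z."""
    z = list(z_columns)

    def log_det_of(cols):
        return _log_det(_covariance(cols))

    return 0.5 * (log_det_of([x] + z) + log_det_of([y] + z)
                  - log_det_of(z) - log_det_of([x, y] + z))


# Simulated data: X drives Y with a one-step lag, never the other way round.
rng = random.Random(0)
T = 2000
x, y = [0.0] * T, [0.0] * T
for t in range(T - 1):
    x[t + 1] = 0.5 * x[t] + rng.gauss(0, 1)
    y[t + 1] = 0.5 * y[t] + 0.8 * x[t] + rng.gauss(0, 1)

# I(X(t); Y(t+1) | Y(t)): clearly positive, X adds predictive information about Y.
forward = gaussian_cmi(x[:-1], y[1:], [y[:-1]])
# I(Y(t); X(t+1) | X(t)): close to zero, Y adds nothing once X's own past is known.
reverse = gaussian_cmi(y[:-1], x[1:], [x[:-1]])
print(round(forward, 3), round(reverse, 3))
```

Conditioning on each variable's own past is what separates the two directions here: X(t) and Y(t) are correlated, but only X(t) improves the prediction of the other variable's next value.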

The algorithm implements a two-phase approach:

  1. Forward Selection: Iteratively adds predictors that maximize conditional mutual information
  2. Backward Elimination: Removes predictors that lose significance when conditioned on others
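The two phases can be sketched as a generic procedure for one target variable. This is an illustration of the forward/backward structure, not the library's code: `ocse_select` is a hypothetical name, the CMI estimator is passed in as a function, and a fixed threshold stands in for the permutation test that alpha_forward and alpha_backward control. The toy CMI in the usage example is purely illustrative.

```python
def ocse_select(candidates, cmi, threshold):
    """Forward selection followed by backward elimination for a single target.

    candidates: hashable candidate predictors, e.g. (variable, lag) pairs
    cmi(c, S):  estimate of I(c; target | S) for candidate c and conditioning list S
    threshold:  hypothetical stand-in for the permutation significance test
    """
    remaining = list(candidates)
    selected = []

    # Forward selection: repeatedly add the candidate with the largest CMI given
    # everything selected so far; stop when the best one is not significant.
    while remaining:
        scores = {c: cmi(c, selected) for c in remaining}
        best = max(remaining, key=scores.__getitem__)  # ties go to the earlier candidate
        if scores[best] <= threshold:
            break
        selected.append(best)
        remaining.remove(best)

    # Backward elimination: drop any selected candidate that is no longer
    # significant once all other selected candidates are conditioned on.
    for c in list(selected):
        others = [s for s in selected if s != c]
        if cmi(c, others) <= threshold:
            selected.remove(c)
    return selected


# Toy usage: each candidate "carries" a set of information atoms about the target,
# and the toy CMI counts the atoms not already covered by the conditioning set.
ATOMS = {"a": {1, 2}, "b": {1, 3}, "c": {2, 4}, "d": set()}


def toy_cmi(c, conditioning):
    covered = set().union(*(ATOMS[s] for s in conditioning))
    return float(len(ATOMS[c] - covered))


print(ocse_select(["a", "b", "c", "d"], toy_cmi, threshold=0.5))  # ['b', 'c']
```

Forward selection picks a, then b, then c; backward elimination then removes a, because once b and c are known it adds no further information. This is the situation the second phase exists for: a predictor that looked best early on can become redundant after later additions.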

Documentation

📚 Read the full documentation on ReadTheDocs

  • API Reference: Complete function and class documentation
  • User Guide: Detailed tutorials and examples
  • Theory: Mathematical background and algorithms
  • Examples: Check the examples/ and notebooks/ directories
  • Research Papers: See the papers/ directory for theoretical foundations

Local Documentation

Build documentation locally:

cd docs/
make html
# Open docs/_build/html/index.html

Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

Citation

If you use this library in your research, please cite:

   @misc{slote2025causationentropy,
     author  = {Slote, Kevin and Fish, Jeremie and Bollt, Erik},
     title   = {CausationEntropy: A Python Library for Causal Discovery},
     url     = {https://github.com/Center-For-Complex-Systems-Science/causationentropy},
     doi     = {10.5281/zenodo.17047565}
   }

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

This work builds upon fundamental research in information theory, causal inference, and time series analysis. Special thanks to the open-source scientific Python community.

LLM Disclosure

Generative AI was used to help with docstrings, documentation, and unit tests.

Download files


Source Distribution

causationentropy-1.0.0.tar.gz (60.0 kB)

Built Distribution


causationentropy-1.0.0-py3-none-any.whl (66.7 kB, Python 3)

File details

Details for the file causationentropy-1.0.0.tar.gz.

File metadata

  • Download URL: causationentropy-1.0.0.tar.gz
  • Size: 60.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for causationentropy-1.0.0.tar.gz
  • SHA256: a08cbdfceedc9b7c2f1dba17d5b52a5ce4f00d2450c9f95c9b12989f3f0fe6ef
  • MD5: e49087679181de331cf49a4af8fe9183
  • BLAKE2b-256: 3061b2c273a73117b56303375808ab2cf90e4df59fd01cd954e5ab2b4e64ef8c


Provenance

The following attestation bundles were made for causationentropy-1.0.0.tar.gz:

Publisher: release.yml on Center-For-Complex-Systems-Science/causationentropy


File details

Details for the file causationentropy-1.0.0-py3-none-any.whl.


File hashes

Hashes for causationentropy-1.0.0-py3-none-any.whl
  • SHA256: d9b0aaf9e5adf880a87c07fe2953ebdcd6ba441d0fc6d45218494a4db078cdde
  • MD5: 35cd4d1deb992433b64f4da88a50a80e
  • BLAKE2b-256: 9130a960edbf98f296e86b98c2132974872b6607521bcf82617ebda4d3c16b40


Provenance

The following attestation bundles were made for causationentropy-1.0.0-py3-none-any.whl:

Publisher: release.yml on Center-For-Complex-Systems-Science/causationentropy

