Causal network discovery using optimal causation entropy

CausationEntropy

License: MIT. Requires Python 3.8+.

A Python library for discovering causal networks from time series data using Optimal Causation Entropy (oCSE).

Overview

CausationEntropy implements information-theoretic methods for causal discovery from multivariate time series. Its algorithms identify candidate causal relationships while conditioning on the other variables to control for confounding, and use permutation tests to limit false discoveries.

What it does

Given time series data, CausationEntropy finds which variables cause changes in other variables by:

  1. Predictive Testing: Testing if knowing variable X at time t helps predict variable Y at time t+1
  2. Information Theory: Using conditional mutual information to measure predictive relationships
  3. Statistical Control: Rigorous statistical testing to avoid false discoveries
  4. Multiple Methods: Supporting various information estimators and discovery algorithms
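The predictive-testing idea in step 1 can be illustrated with a small sketch. It is not the library's implementation: the helper `lagged_pairs` and the toy system are assumptions made for illustration, and plain correlation stands in for the conditional mutual information that the library actually uses.

```python
import numpy as np

def lagged_pairs(x, y, lag=1):
    """Return aligned arrays (x[t], y[t + lag]). Hypothetical helper."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if lag < 1 or lag >= len(x):
        raise ValueError("lag must satisfy 1 <= lag < len(x)")
    return x[:-lag], y[lag:]

rng = np.random.default_rng(0)
T = 2000
x = rng.normal(size=T)
y = np.zeros(T)
# Toy system: y at time t+1 depends on x at time t plus noise.
y[1:] = 0.8 * x[:-1] + 0.3 * rng.normal(size=T - 1)

# Does x(t) carry information about y(t+1)?
x_past, y_future = lagged_pairs(x, y, lag=1)
forward = np.corrcoef(x_past, y_future)[0, 1]

# Does y(t) carry information about x(t+1)? It should not in this toy system.
y_past, x_future = lagged_pairs(y, x, lag=1)
backward = np.corrcoef(y_past, x_future)[0, 1]

print(f"x(t) vs y(t+1): {forward:.2f}")
print(f"y(t) vs x(t+1): {backward:.2f}")
```

The asymmetry between the two directions is what a lagged predictive test looks for; correlation only captures linear dependence, which is one reason an information-theoretic measure is preferred.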

Installation

From PyPI (recommended)

pip install causationentropy

Development Installation

git clone https://github.com/Center-For-Complex-Systems-Science/causationentropy.git
cd causationentropy
pip install -e .
# Run the tests
python -m pytest causationentropy/tests/ --cov=causationentropy --cov-report=xml --cov-report=term-missing -v

Quick Start

See our Quick Start Colab notebook.

Basic Usage

Get the relationships as a data frame:

import pandas as pd
from causationentropy import discover_network
from causationentropy.graph import network_to_dataframe

# Load your time series data (variables as columns, time as rows)
data = pd.read_csv('data.csv')

# Discover causal network
network = discover_network(data, method='standard', max_lag=5)
df = network_to_dataframe(network)
df.head()

Plot the causal network:

import pandas as pd
from causationentropy import discover_network
from causationentropy.core.plotting import plot_causal_network

# Load your time series data (variables as columns, time as rows)
data = pd.read_csv('data.csv')

# Discover causal network
network = discover_network(data, method='standard', max_lag=5)
fig, ax = plot_causal_network(network, save_path="network.png")

Note: This implementation runs in O(N^2 T log T) time, where N is the number of variables and T is the length of the time series. Without optimizations the algorithm is computationally intensive, so expect long run times on large datasets. Optimizations based on singular value decomposition and KD-trees are planned for a later release; they are not part of the original algorithm. Increasing max_lag also increases run time.

Advanced Configuration

from causationentropy import discover_network

# Configure discovery parameters (data is the DataFrame loaded in Basic Usage)
network = discover_network(
    data,
    method='standard',          # 'standard', 'alternative', 'information_lasso', or 'lasso'
    information='gaussian',     # 'gaussian', 'knn', 'kde', 'geometric_knn', or 'poisson'
    max_lag=5,                  # Maximum time lag to consider
    alpha_forward=0.05,         # Forward selection significance
    alpha_backward=0.05,        # Backward elimination significance
    n_shuffles=200              # Permutation test iterations
)
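To give a feel for what a significance level such as alpha_forward and a shuffle count such as n_shuffles control, here is a generic permutation test in NumPy. It is a sketch under stated assumptions, not the library's code: the functions `gaussian_mi` and `permutation_p_value` are hypothetical, the statistic is mutual information under a Gaussian assumption, and the shuffling scheme is the simplest possible one.

```python
import numpy as np

def gaussian_mi(x, y):
    """Mutual information (nats) of two scalar series, assuming joint Gaussianity."""
    r = np.corrcoef(x, y)[0, 1]
    return -0.5 * np.log(1.0 - r**2)

def permutation_p_value(x, y, n_shuffles=200, seed=0):
    """Compare the observed statistic with values obtained after shuffling x."""
    rng = np.random.default_rng(seed)
    observed = gaussian_mi(x, y)
    null = np.array([gaussian_mi(rng.permutation(x), y) for _ in range(n_shuffles)])
    # The add-one correction keeps the p-value strictly positive.
    p_value = (1 + np.sum(null >= observed)) / (1 + n_shuffles)
    return observed, p_value

rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)
y_dep = 0.7 * x + rng.normal(size=n)   # depends on x
y_ind = rng.normal(size=n)             # independent of x

alpha = 0.05
mi_dep, p_dep = permutation_p_value(x, y_dep)
mi_ind, p_ind = permutation_p_value(x, y_ind)
print(f"dependent pair:   MI={mi_dep:.3f}, significant={p_dep <= alpha}")
print(f"independent pair: MI={mi_ind:.3f}, p={p_ind:.3f}")
```

With n_shuffles shuffles the smallest attainable p-value is 1 / (n_shuffles + 1), so very small significance levels need correspondingly more shuffles, at proportionally higher cost.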

Synthetic Data Example

from causationentropy.datasets import synthetic
from causationentropy import discover_network

# Generate synthetic causal time series
data, true_network = synthetic.linear_stochastic_gaussian_process(
    n_variables=5, 
    n_samples=1000, 
    sparsity=0.3
)

# Discover network
discovered = discover_network(data)
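When a ground-truth network is available, a discovered network can be scored by comparing edge sets. The sketch below is independent of the library: it assumes both networks have already been converted to sets of (source, target) pairs, and the function `edge_scores` and the edge names are hypothetical. How to extract edges from the objects returned above is not shown here.

```python
def edge_scores(true_edges, found_edges):
    """Precision and recall of a discovered directed edge set."""
    true_edges, found_edges = set(true_edges), set(found_edges)
    hits = len(true_edges & found_edges)
    precision = hits / len(found_edges) if found_edges else 0.0
    recall = hits / len(true_edges) if true_edges else 0.0
    return precision, recall

# Illustrative edge sets only; these are not output from the library.
true_edges = {("x1", "x2"), ("x2", "x3"), ("x1", "x4")}
found_edges = {("x1", "x2"), ("x2", "x3"), ("x3", "x4")}

precision, recall = edge_scores(true_edges, found_edges)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

Because edges are directed, ("x1", "x2") and ("x2", "x1") count as different edges.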

Key Features

  • Multiple Algorithms: Standard, alternative, information lasso, and lasso variants of oCSE
  • Flexible Information Estimators: Gaussian, k-NN, KDE, geometric k-NN, and Poisson methods
  • Statistical Rigor: Permutation-based significance testing with comprehensive test coverage
  • Synthetic Data: Built-in generators for testing and validation
  • Visualization: Network plotting and analysis tools

Mathematical Foundation

The algorithm uses conditional mutual information to quantify causal relationships:

$$I(X; Y | Z) = H(X | Z) + H(Y | Z) - H(X, Y | Z)$$

This measures how much variable X tells us about variable Y, beyond what we already know from conditioning set Z.
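For jointly Gaussian variables the entropies in this identity reduce to log-determinants of covariance matrices, and the constant terms cancel, giving I(X; Y | Z) = ½ [log det Σ_XZ + log det Σ_YZ − log det Σ_Z − log det Σ_XYZ]. The following NumPy sketch evaluates that expression on a toy chain X → Z → Y. The function names are hypothetical, and it is not necessarily how the library's 'gaussian' estimator is written.

```python
import numpy as np

def _logdet_cov(*blocks):
    """Log-determinant of the sample covariance of the stacked columns."""
    cols = [np.asarray(b, dtype=float) for b in blocks if b is not None]
    data = np.column_stack(cols)
    cov = np.atleast_2d(np.cov(data, rowvar=False))
    return np.linalg.slogdet(cov)[1]

def gaussian_cmi(x, y, z=None):
    """I(x; y | z) in nats, assuming the variables are jointly Gaussian."""
    h_z = _logdet_cov(z) if z is not None else 0.0
    return 0.5 * (_logdet_cov(x, z) + _logdet_cov(y, z)
                  - h_z - _logdet_cov(x, y, z))

rng = np.random.default_rng(1)
n = 5000
x = rng.normal(size=n)
z = x + 0.5 * rng.normal(size=n)   # z depends on x
y = z + 0.5 * rng.normal(size=n)   # y depends on x only through z

unconditional = gaussian_cmi(x, y)      # x is informative about y
conditional = gaussian_cmi(x, y, z)     # close to zero once z is known
print(f"I(X;Y)   = {unconditional:.3f} nats")
print(f"I(X;Y|Z) = {conditional:.4f} nats")
```

Conditioning on Z removes the apparent link between X and Y, which is the behavior the conditioning set is meant to provide.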

Causal Discovery Rule: Variable X causes Y if knowing X(t) significantly improves prediction of Y(t+1), even when controlling for all other relevant variables.

The algorithm implements a two-phase approach:

  1. Forward Selection: Iteratively adds predictors that maximize conditional mutual information
  2. Backward Elimination: Removes predictors that lose significance when conditioned on others
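The two phases can be sketched for a single target variable as follows. This is a simplified illustration, not the library's implementation: it assumes a Gaussian estimator, one lag, and a plain shuffle of the candidate series as the null model, and the names `gaussian_cmi`, `is_significant`, and `select_predictors` are hypothetical.

```python
import numpy as np

def gaussian_cmi(x, y, z=None):
    """I(x; y | z) in nats under a joint-Gaussian assumption."""
    def logdet(*cols):
        data = np.column_stack([c for c in cols if c is not None])
        cov = np.atleast_2d(np.cov(data, rowvar=False))
        return np.linalg.slogdet(cov)[1]
    h_z = logdet(z) if z is not None else 0.0
    return 0.5 * (logdet(x, z) + logdet(y, z) - h_z - logdet(x, y, z))

def is_significant(x, y, z, alpha, n_shuffles, rng):
    """Permutation test: shuffle x and compare against the observed value."""
    observed = gaussian_cmi(x, y, z)
    null = [gaussian_cmi(rng.permutation(x), y, z) for _ in range(n_shuffles)]
    p_value = (1 + sum(v >= observed for v in null)) / (1 + n_shuffles)
    return p_value <= alpha

def select_predictors(past, target, alpha=0.01, n_shuffles=200, seed=0):
    """Return column indices of `past` kept as predictors of `target`."""
    rng = np.random.default_rng(seed)
    n_vars = past.shape[1]
    cols = lambda idx: past[:, idx] if idx else None
    selected = []
    # Forward selection: add the candidate with the largest conditional
    # mutual information, as long as it passes the permutation test.
    while len(selected) < n_vars:
        rest = [j for j in range(n_vars) if j not in selected]
        z = cols(selected)
        scores = [gaussian_cmi(past[:, j], target, z) for j in rest]
        best = rest[int(np.argmax(scores))]
        if not is_significant(past[:, best], target, z, alpha, n_shuffles, rng):
            break
        selected.append(best)
    # Backward elimination: drop predictors that are no longer significant
    # when conditioned on the other selected predictors.
    for j in list(selected):
        others = [k for k in selected if k != j]
        if not is_significant(past[:, j], target, cols(others), alpha, n_shuffles, rng):
            selected.remove(j)
    return sorted(selected)

# Toy system: variable 0 at time t+1 is driven by variables 1 and 2 at time t.
rng = np.random.default_rng(42)
T = 1500
series = rng.normal(size=(T, 4))
series[1:, 0] = 0.6 * series[:-1, 1] - 0.5 * series[:-1, 2] + 0.5 * rng.normal(size=T - 1)

past, target = series[:-1], series[1:, 0]
selected = select_predictors(past, target)
print("selected predictors of variable 0:", selected)
```

Repeating the selection with each variable in turn as the target yields one set of incoming edges per variable, which together form a directed network. Because every test is statistical, an unrelated variable can occasionally be selected by chance at a rate governed by alpha.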

Documentation

📚 Read the full documentation on ReadTheDocs

  • API Reference: Complete function and class documentation
  • User Guide: Detailed tutorials and examples
  • Theory: Mathematical background and algorithms
  • Examples: Check the notebooks/ directory
  • Research Papers: See the theory glossary in the documentation

Local Documentation

Build documentation locally:

cd docs/
make html
# Open docs/_build/html/index.html

Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

Citation

If you use this library in your research, please cite:

   @misc{slote2025causationentropy,
     author  = {Slote, Kevin and Fish, Jeremie and Bollt, Erik},
     title   = {CausationEntropy: A Python Library for Causal Discovery},
     url     = {https://github.com/Center-For-Complex-Systems-Science/causationentropy},
     doi     = {10.5281/zenodo.17047565}
   }

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

This work builds upon fundamental research in information theory, causal inference, and time series analysis. Special thanks to the open-source scientific Python community.

LLM Disclosure

Generative AI was used to help with doc strings, documentation, and unit tests.
