
Causal Forests with Fixed Effects for Panel and Difference-in-Differences Settings

causalfe

Causal Forests with Fixed Effects in Python

Python 3.9+ | License: MIT

Overview

causalfe provides the first fully Pythonic implementation of Causal Forests with Fixed Effects (CFFE). It lets researchers and practitioners estimate heterogeneous treatment effects in panel and difference-in-differences settings while controlling for unit and time fixed effects.

This package is a Python implementation inspired by Kattenberg, Scheer, and Thiel (2023), who developed the CFFE methodology and released an R package. We built this Python version to make CFFE accessible to the broader Python econometrics community.

Key Features

  • Node-level FE residualization: Fixed effects are removed within each tree node, not globally
  • τ-heterogeneity splitting: Splits maximize treatment effect heterogeneity, not outcome variance
  • Honest estimation: Separate samples for tree structure and leaf estimation
  • Cluster-aware inference: Valid standard errors for panel data
  • Backward compatible: Reduces to standard causal forest when no fixed effects present

Installation

git clone https://github.com/haytug/causalfe.git
cd causalfe
pip install .

For development:

pip install -e ".[dev]"

For EconML comparison:

pip install -e ".[compare]"

Quick Start

from causalfe import CFFEForest

# Your panel data
# X: covariates (n, p)
# Y: outcome (n,)
# D: treatment (n,)
# unit: unit identifiers (n,)
# time: time identifiers (n,)

forest = CFFEForest(n_trees=100, max_depth=5, min_leaf=20)
forest.fit(X, Y, D, unit, time)

# Point estimates
tau_hat = forest.predict(X)

# With confidence intervals
tau_hat, ci_lower, ci_upper = forest.predict_interval(X, alpha=0.05)
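
The Quick Start leaves the input arrays undefined. As a rough illustration, the sketch below builds them with NumPy, assuming a long-format layout with one row per unit-period observation (the `(n,)` shapes above suggest this, but the layout is an assumption). The panel dimensions, the treatment rule, and the outcome equation are all made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_units, n_periods, n_covariates = 50, 4, 3

# One row per (unit, period) pair; units vary slowest (assumed layout).
unit = np.repeat(np.arange(n_units), n_periods)
time = np.tile(np.arange(n_periods), n_units)
n = unit.size

# Covariates: one row per observation.
X = rng.normal(size=(n, n_covariates))

# Hypothetical design: the first half of the units is treated from period 2 on.
treated_unit = unit < n_units // 2
D = (treated_unit & (time >= 2)).astype(float)

# Outcome = unit effect + time effect + constant treatment effect + noise.
alpha = rng.normal(size=n_units)[unit]
gamma = 0.5 * time
Y = alpha + gamma + 1.0 * D + rng.normal(scale=0.1, size=n)

# All five inputs share the same first dimension.
print(X.shape, Y.shape, D.shape, unit.shape, time.shape)
```

Arrays shaped like these could then be passed to `forest.fit(X, Y, D, unit, time)` as in the Quick Start.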

Example with Simulated Data

from causalfe import CFFEForest
from causalfe.simulations.did_dgp import dgp_did_heterogeneous
import numpy as np

# Generate heterogeneous DiD data
X, Y, D, unit, time, tau_true = dgp_did_heterogeneous(N=200, T=6)

# Fit CFFE
forest = CFFEForest(n_trees=100, max_depth=4, min_leaf=20)
forest.fit(X, Y, D, unit, time)
tau_hat = forest.predict(X)

# Evaluate
corr = np.corrcoef(tau_hat, tau_true)[0, 1]
print(f"Correlation with true τ: {corr:.3f}")  # ~0.9

Validation Results

| Simulation | Mean τ̂ | RMSE | Corr(τ̂, τ) |
|---|---|---|---|
| FE-only (τ=0) | ~0 | ~0.4 | N/A |
| Homogeneous (τ=2) | ~1.8 | ~0.4 | N/A |
| Heterogeneous DiD | varies | ~0.5 | 0.93 |
| Staggered Adoption | varies | ~0.6 | 0.88 |
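
The README does not say how the summary columns above were computed. The sketch below assumes the usual definitions (mean of the estimates, root mean squared error, and Pearson correlation); `evaluate_cate` is a hypothetical helper, not part of causalfe. It also shows why the correlation would be N/A when the true effect is constant: a constant vector has zero variance, so the correlation is undefined.

```python
import numpy as np


def evaluate_cate(tau_hat, tau_true):
    """Summarize CATE estimates against known true effects (simulation only)."""
    tau_hat = np.asarray(tau_hat, dtype=float)
    tau_true = np.asarray(tau_true, dtype=float)

    mean_tau_hat = float(tau_hat.mean())
    rmse = float(np.sqrt(np.mean((tau_hat - tau_true) ** 2)))

    # Pearson correlation is undefined if either vector is constant.
    hat_constant = tau_hat.max() == tau_hat.min()
    true_constant = tau_true.max() == tau_true.min()
    if hat_constant or true_constant:
        corr = None
    else:
        corr = float(np.corrcoef(tau_hat, tau_true)[0, 1])

    return {"mean_tau_hat": mean_tau_hat, "rmse": rmse, "corr": corr}


# Homogeneous case: the true effect is 2 everywhere, so corr is None.
homogeneous = evaluate_cate([1.8, 2.2], [2.0, 2.0])

# Heterogeneous case: estimates track the truth exactly.
heterogeneous = evaluate_cate([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
```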

Inference

Multiple variance estimation methods are available:

from causalfe import half_sample_variance, cluster_robust_variance

# Half-sample variance (fast, default)
tau_hat, var_hat = forest.predict_with_variance(X)

# Or use standalone functions
var_half = half_sample_variance(forest.trees, X)

# Cluster-robust variance for ATE
var_cluster = cluster_robust_variance(tau_hat, unit)
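
A point estimate and its variance can be turned into a confidence interval by hand. The sketch below assumes a normal approximation, which is the standard choice for forest-based estimators; whether `predict_interval` computes its bounds this way is not stated here, so treat `normal_interval` as a hypothetical stand-in.

```python
from statistics import NormalDist

import numpy as np


def normal_interval(tau_hat, var_hat, alpha=0.05):
    """Two-sided (1 - alpha) interval under a normal approximation."""
    tau_hat = np.asarray(tau_hat, dtype=float)
    # Guard against tiny negative variance estimates before taking the root.
    se = np.sqrt(np.clip(np.asarray(var_hat, dtype=float), 0.0, None))
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return tau_hat - z * se, tau_hat + z * se


# Example: estimate 1.0 with variance 0.25 (standard error 0.5).
ci_lower, ci_upper = normal_interval([1.0], [0.25], alpha=0.05)
```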

API Reference

CFFEForest

CFFEForest(
    n_trees=100,      # Number of trees
    max_depth=5,      # Maximum tree depth
    min_leaf=20,      # Minimum samples per leaf
    honest=True,      # Use honest estimation
    subsample_ratio=0.5,  # Fraction of units to subsample
    seed=None,        # Random seed
)

Methods:

  • fit(X, Y, D, unit, time): Fit the forest
  • predict(X): Predict CATEs
  • predict_with_variance(X, method="half_sample"): Predict with variance
  • predict_interval(X, alpha=0.05): Predict with confidence intervals
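
The `subsample_ratio` and `honest` parameters describe sampling at the unit level. The sketch below is one plausible reading of that, not the package's actual code: whole units are drawn without replacement so that a unit's observations stay together, and under honesty the drawn units are divided into a half used for choosing splits and a half used for leaf estimates. `subsample_units` and the even split are assumptions.

```python
import numpy as np


def subsample_units(unit, subsample_ratio=0.5, honest=True, seed=None):
    """Draw whole units and return row indices for splitting and estimation."""
    rng = np.random.default_rng(seed)
    unit = np.asarray(unit)
    unique_units = np.unique(unit)

    # Number of units to draw, bounded by what is available.
    n_draw = int(round(subsample_ratio * unique_units.size))
    n_draw = min(unique_units.size, max(2, n_draw))
    drawn = rng.choice(unique_units, size=n_draw, replace=False)

    if honest:
        # Disjoint sets of units for tree structure and leaf estimation.
        half = n_draw // 2
        split_units, est_units = drawn[:half], drawn[half:]
    else:
        split_units = est_units = drawn

    split_idx = np.flatnonzero(np.isin(unit, split_units))
    est_idx = np.flatnonzero(np.isin(unit, est_units))
    return split_idx, est_idx


# 10 units observed for 3 periods each; draw 60% of the units.
unit = np.repeat(np.arange(10), 3)
split_idx, est_idx = subsample_units(unit, subsample_ratio=0.6, seed=0)
```

Sampling by unit rather than by row keeps each unit's time series intact, which the within-node fixed-effect step relies on.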

Variance Functions

  • half_sample_variance(trees, X): Fast half-sample variance
  • jackknife_variance(trees, X): More stable jackknife variance
  • cluster_robust_variance(tau_hat, clusters): Cluster-robust variance
  • cluster_bootstrap_variance(...): Full cluster bootstrap
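
To make the cluster-robust idea concrete, the sketch below computes the textbook cluster-robust variance of a sample mean: deviations from the mean are summed within each cluster before squaring, so correlation inside a unit is not ignored. This is a generic formula with a G/(G-1) small-sample correction; whether `cluster_robust_variance` uses exactly this form is an assumption, and `cluster_variance_of_mean` is a hypothetical name.

```python
import numpy as np


def cluster_variance_of_mean(values, clusters):
    """Cluster-robust variance of mean(values), clustering on `clusters`."""
    values = np.asarray(values, dtype=float)
    _, codes = np.unique(np.asarray(clusters), return_inverse=True)
    n = values.size
    n_clusters = int(codes.max()) + 1
    if n_clusters < 2:
        raise ValueError("need at least two clusters")

    # Sum deviations from the overall mean within each cluster.
    resid = values - values.mean()
    cluster_sums = np.bincount(codes, weights=resid, minlength=n_clusters)

    correction = n_clusters / (n_clusters - 1)
    return float(correction * np.sum(cluster_sums ** 2) / n ** 2)


# Two units with two observations each; effects are identical within a unit.
tau_hat = np.array([1.0, 1.0, 3.0, 3.0])
unit = np.array([0, 0, 1, 1])
var_ate = cluster_variance_of_mean(tau_hat, unit)
```

In this toy case the result equals the variance of the mean of the two unit-level averages, which treats each unit as a single independent draw.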

Methodology

CFFE modifies the standard causal forest in three key ways:

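  1. Node-level FE orthogonalization: Within each node, Y and D are residualized on unit and time fixed effects estimated from that node's observations:

    • Ỹ = Y - α̂ᵢ - γ̂ₜ
    • D̃ = D - α̂ᴰᵢ - γ̂ᴰₜ
  2. τ-heterogeneity splitting: Splits maximize the following criterion, where nₗ and nᵣ are the child sample sizes, n = nₗ + nᵣ, and τ̂ₗ and τ̂ᵣ are the child effect estimates:

    • Δ(Sₗ, Sᵣ) = (nₗ·nᵣ/n²) · (τ̂ₗ - τ̂ᵣ)²
  3. IV-style leaf estimation: Within each leaf, the effect is the ratio of sums over the leaf's residualized observations:

    • τ̂ = Σ D̃Ỹ / Σ D̃²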

See docs/methods.md for full methodology.
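
The steps listed above can be sketched in a few lines of NumPy. This is an illustration under assumptions, not the package's code: fixed effects are removed by alternating unit and time demeaning (exact after one pass for a balanced panel, iterative otherwise), and the function names `residualize_two_way`, `leaf_tau`, and `split_gain` are hypothetical.

```python
import numpy as np


def group_means(values, codes):
    """Mean of `values` within each group, broadcast back to every row."""
    sums = np.bincount(codes, weights=values)
    counts = np.bincount(codes)
    return (sums / counts)[codes]


def residualize_two_way(values, unit, time, n_iter=200, tol=1e-10):
    """Remove additive unit and time effects by alternating demeaning."""
    _, u = np.unique(unit, return_inverse=True)
    _, t = np.unique(time, return_inverse=True)
    resid = np.asarray(values, dtype=float).copy()
    for _ in range(n_iter):
        before = resid.copy()
        resid -= group_means(resid, u)
        resid -= group_means(resid, t)
        if np.max(np.abs(resid - before)) < tol:
            break
    return resid


def leaf_tau(Y, D, unit, time):
    """Residual-on-residual estimate: sum(D~ * Y~) / sum(D~ ** 2)."""
    y_res = residualize_two_way(Y, unit, time)
    d_res = residualize_two_way(D, unit, time)
    denom = np.sum(d_res ** 2)
    if denom < 1e-12:
        return float("nan")  # no treatment variation left after removing FE
    return float(np.sum(d_res * y_res) / denom)


def split_gain(Y, D, unit, time, go_left):
    """(n_l * n_r / n^2) * (tau_l - tau_r)^2 for a boolean split mask."""
    n = len(Y)
    n_left = int(np.sum(go_left))
    n_right = n - n_left
    tau_l = leaf_tau(Y[go_left], D[go_left], unit[go_left], time[go_left])
    tau_r = leaf_tau(Y[~go_left], D[~go_left], unit[~go_left], time[~go_left])
    return (n_left * n_right / n ** 2) * (tau_l - tau_r) ** 2


# Noiseless toy panel: 8 units, 4 periods, even units treated from period 2.
unit = np.repeat(np.arange(8), 4)
time = np.tile(np.arange(4), 8)
D = ((unit % 2 == 0) & (time >= 2)).astype(float)
tau = np.where(unit < 4, 1.0, 3.0)      # effect differs between the two halves
Y = 0.5 * unit + 0.2 * time + tau * D   # unit effect + time effect + treatment

go_left = unit < 4
tau_left = leaf_tau(Y[go_left], D[go_left], unit[go_left], time[go_left])
gain = split_gain(Y, D, unit, time, go_left)
```

Because the toy outcome has no noise, the within-node estimate recovers each half's effect exactly, and the candidate split separating the two halves scores (16·16/32²)·(1 − 3)² = 1.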

Citation

If you use this package in your research, please cite the original CFFE paper:

@article{kattenberg2023causal,
  title={Causal Forests with Fixed Effects for Treatment Effect Heterogeneity in Difference-in-Differences},
  author={Kattenberg, Mark A.C. and Scheer, Bas J. and Thiel, Jurre H.},
  journal={CPB Discussion Paper},
  year={2023},
  institution={Netherlands Institute for Economic Policy Analysis (CPB)}
}

And optionally this Python implementation:

@software{causalfe,
  title = {causalfe: Causal Forests with Fixed Effects in Python},
  author = {Aytug, Harry},
  year = {2026},
  url = {https://github.com/haytug/causalfe}
}

References

  • Kattenberg, M.A.C., Scheer, B.J., & Thiel, J.H. (2023). Causal Forests with Fixed Effects for Treatment Effect Heterogeneity in Difference-in-Differences. CPB Discussion Paper. — The foundational paper for this implementation.
  • Athey, S., & Imbens, G. (2016). Recursive Partitioning for Heterogeneous Causal Effects. PNAS.
  • Wager, S., & Athey, S. (2018). Estimation and Inference of Heterogeneous Treatment Effects using Random Forests. JASA.

License

MIT License - see LICENSE for details.
