Causal Forests with Fixed Effects for Panel and Difference-in-Differences Settings

causalfe

Causal Forests with Fixed Effects in Python

Python 3.9+ | License: MIT

Overview

causalfe provides the first fully Pythonic implementation of Causal Forests with Fixed Effects (CFFE). It lets researchers and practitioners estimate heterogeneous treatment effects in panel and difference-in-differences settings while controlling for unit and time fixed effects.

This package is a Python implementation inspired by Kattenberg, Scheer, and Thiel (2023), who developed the CFFE methodology and released an R package. We built this Python version to make CFFE accessible to the broader Python econometrics community.

Key Features

  • Node-level FE residualization: Fixed effects are removed within each tree node, not globally
  • τ-heterogeneity splitting: Splits maximize treatment effect heterogeneity, not outcome variance
  • Honest estimation: Separate samples for tree structure and leaf estimation
  • Cluster-aware inference: Valid standard errors for panel data
  • Backward compatible: Reduces to a standard causal forest when no fixed effects are present

Installation

git clone https://github.com/haytug/causalfe.git
cd causalfe
pip install .

For development:

pip install -e ".[dev]"

For EconML comparison:

pip install -e ".[compare]"

Quick Start

from causalfe import CFFEForest

# Your panel data
# X: covariates (n, p)
# Y: outcome (n,)
# D: treatment (n,)
# unit: unit identifiers (n,)
# time: time identifiers (n,)

forest = CFFEForest(n_trees=100, max_depth=5, min_leaf=20)
forest.fit(X, Y, D, unit, time)

# Point estimates
tau_hat = forest.predict(X)

# With confidence intervals
tau_hat, ci_lower, ci_upper = forest.predict_interval(X, alpha=0.05)
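
The inputs above are in long format, with one row per unit-period observation. As a minimal sketch of what such arrays can look like, the snippet below builds a balanced toy panel with NumPy. The helper name `make_long_panel`, the treatment design, and all simulated values are illustrative assumptions and not part of causalfe.

```python
import numpy as np

def make_long_panel(n_units, n_periods, n_covariates, seed=0):
    """Build toy long-format panel arrays (hypothetical helper, not in causalfe)."""
    rng = np.random.default_rng(seed)
    n = n_units * n_periods
    # Unit ids repeat once per period; time ids cycle within each unit.
    unit = np.repeat(np.arange(n_units), n_periods)
    time = np.tile(np.arange(n_periods), n_units)
    X = rng.normal(size=(n, n_covariates))
    # Toy DiD design: the first half of the units is treated from the midpoint on.
    treated_unit = unit < n_units // 2
    post = time >= n_periods // 2
    D = (treated_unit & post).astype(float)
    # Outcome = unit effect + time effect + constant treatment effect + noise.
    alpha = rng.normal(size=n_units)[unit]
    gamma = rng.normal(size=n_periods)[time]
    Y = alpha + gamma + 2.0 * D + rng.normal(scale=0.5, size=n)
    return X, Y, D, unit, time

X, Y, D, unit, time = make_long_panel(n_units=50, n_periods=6, n_covariates=3)
print(X.shape, Y.shape, D.shape, unit.shape, time.shape)
```

Arrays with these shapes can then be passed to `forest.fit(X, Y, D, unit, time)` as in the snippet above.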

Example with Simulated Data

from causalfe import CFFEForest
from causalfe.simulations.did_dgp import dgp_did_heterogeneous
import numpy as np

# Generate heterogeneous DiD data
X, Y, D, unit, time, tau_true = dgp_did_heterogeneous(N=200, T=6)

# Fit CFFE
forest = CFFEForest(n_trees=100, max_depth=4, min_leaf=20)
forest.fit(X, Y, D, unit, time)
tau_hat = forest.predict(X)

# Evaluate
corr = np.corrcoef(tau_hat, tau_true)[0, 1]
print(f"Correlation with true τ: {corr:.3f}")  # ~0.9

Validation Results

Simulation | Mean τ̂ | RMSE | Corr(τ̂, τ)
FE-only (τ=0) | ~0 | ~0.4 | N/A
Homogeneous (τ=2) | ~1.8 | ~0.4 | N/A
Heterogeneous DiD | varies | ~0.5 | 0.93
Staggered Adoption | varies | ~0.6 | 0.88
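
The summary columns in the table can be computed from predictions and the known simulated effects. The following is a minimal sketch with NumPy. `cate_metrics` is a hypothetical helper, not a causalfe function, and it returns no correlation when either vector is constant, which corresponds to the N/A rows.

```python
import numpy as np

def cate_metrics(tau_hat, tau_true):
    """Mean prediction, RMSE, and correlation against known effects (illustrative)."""
    tau_hat = np.asarray(tau_hat, dtype=float)
    tau_true = np.asarray(tau_true, dtype=float)
    mean_hat = float(tau_hat.mean())
    rmse = float(np.sqrt(np.mean((tau_hat - tau_true) ** 2)))
    # Correlation is undefined when either vector has zero variance.
    if np.std(tau_true) == 0 or np.std(tau_hat) == 0:
        corr = None
    else:
        corr = float(np.corrcoef(tau_hat, tau_true)[0, 1])
    return {"mean": mean_hat, "rmse": rmse, "corr": corr}
```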

Inference

Multiple variance estimation methods are available:

from causalfe import half_sample_variance, cluster_robust_variance

# Half-sample variance (fast, default)
tau_hat, var_hat = forest.predict_with_variance(X)

# Or use standalone functions
var_half = half_sample_variance(forest.trees, X)

# Cluster-robust variance for ATE
var_cluster = cluster_robust_variance(tau_hat, unit)
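
A pointwise variance estimate can be turned into an interval with a normal approximation. The sketch below shows that standard construction. It is an assumption about how such intervals are commonly formed, not a description of what `predict_interval` does internally, and `normal_interval` is a hypothetical helper.

```python
from statistics import NormalDist

import numpy as np

def normal_interval(tau_hat, var_hat, alpha=0.05):
    """Two-sided (1 - alpha) interval under a normal approximation (illustrative)."""
    tau_hat = np.asarray(tau_hat, dtype=float)
    # Clip small negative variance estimates to zero before taking the root.
    se = np.sqrt(np.clip(np.asarray(var_hat, dtype=float), 0.0, None))
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return tau_hat - z * se, tau_hat + z * se
```

With the outputs of `forest.predict_with_variance(X)`, this would be called as `normal_interval(tau_hat, var_hat)`.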

API Reference

CFFEForest

CFFEForest(
    n_trees=100,      # Number of trees
    max_depth=5,      # Maximum tree depth
    min_leaf=20,      # Minimum samples per leaf
    honest=True,      # Use honest estimation
    subsample_ratio=0.5,  # Fraction of units to subsample
    seed=None,        # Random seed
)
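
Because `subsample_ratio` refers to units rather than rows, each tree presumably draws whole units so that a unit's full time series stays together, and honest estimation then divides the drawn units between tree construction and leaf estimation. The sketch below illustrates that idea under an assumed 50/50 honest split. `honest_unit_split` is a hypothetical helper, and the package's internals may differ.

```python
import numpy as np

def honest_unit_split(unit, subsample_ratio=0.5, seed=None):
    """Draw a subsample of units and split it into structure/estimation rows."""
    rng = np.random.default_rng(seed)
    unit = np.asarray(unit)
    ids = np.unique(unit)
    # Draw at least two units (when available) so both halves are non-empty.
    n_draw = min(len(ids), max(2, int(round(subsample_ratio * len(ids)))))
    drawn = rng.choice(ids, size=n_draw, replace=False)
    half = n_draw // 2
    split_ids, est_ids = drawn[:half], drawn[half:]
    # Return row indices; every row of a unit lands in the same half.
    split_rows = np.flatnonzero(np.isin(unit, split_ids))
    est_rows = np.flatnonzero(np.isin(unit, est_ids))
    return split_rows, est_rows
```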

Methods:

  • fit(X, Y, D, unit, time): Fit the forest
  • predict(X): Predict CATEs
  • predict_with_variance(X, method="half_sample"): Predict with variance
  • predict_interval(X, alpha=0.05): Predict with confidence intervals
  • get_params(deep=True): Get estimator parameters (scikit-learn compatible)
  • set_params(**params): Set estimator parameters (scikit-learn compatible)
  • score(X, Y, D, unit, time, tau_true=None): R² score for CATE predictions
  • clone(): Create an unfitted copy with same parameters

Scikit-learn Compatibility:

The CFFEForest class follows scikit-learn conventions:

# Informative string representation
>>> forest = CFFEForest(n_trees=100, max_depth=4)
>>> print(forest)
CFFEForest(n_trees=100, max_depth=4, min_leaf=20)
  Fitted: No

# Get/set parameters
>>> forest.get_params()
{'n_trees': 100, 'max_depth': 4, 'min_leaf': 20, ...}
>>> forest.set_params(n_trees=200)

# Score with known true effects (useful for simulations)
>>> score = forest.score(X, Y, D, unit, time, tau_true=tau_true)
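
When `tau_true` is supplied, an R² score on the effects would be the usual coefficient of determination between predicted and true effects. The sketch below works under that assumption. `cate_r2` is a hypothetical helper, and what `score` returns when `tau_true` is omitted is not covered here.

```python
import numpy as np

def cate_r2(tau_hat, tau_true):
    """Coefficient of determination of predicted vs. true effects (illustrative)."""
    tau_hat = np.asarray(tau_hat, dtype=float)
    tau_true = np.asarray(tau_true, dtype=float)
    ss_res = np.sum((tau_true - tau_hat) ** 2)
    ss_tot = np.sum((tau_true - tau_true.mean()) ** 2)
    # R² is undefined when the true effect is constant.
    if ss_tot == 0:
        return float("nan")
    return float(1.0 - ss_res / ss_tot)
```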

Variance Functions

  • half_sample_variance(trees, X): Fast half-sample variance
  • jackknife_variance(trees, X): More stable jackknife variance
  • cluster_robust_variance(tau_hat, clusters): Cluster-robust variance
  • cluster_bootstrap_variance(...): Full cluster bootstrap
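
For intuition, a cluster-robust variance for the average effect can be built from cluster-level sums of centered predictions. The sketch below is a textbook-style estimator written for illustration. `cluster_variance_of_mean` is a hypothetical name, and the formula need not match the one used by `cluster_robust_variance`.

```python
import numpy as np

def cluster_variance_of_mean(tau_hat, clusters):
    """Cluster-robust variance of mean(tau_hat) with a G/(G-1) correction."""
    tau_hat = np.asarray(tau_hat, dtype=float)
    _, codes = np.unique(np.asarray(clusters).ravel(), return_inverse=True)
    n = tau_hat.size
    n_clusters = int(codes.max()) + 1
    if n_clusters < 2:
        raise ValueError("need at least two clusters")
    # Sum centered predictions within each cluster, then square and add up.
    resid = tau_hat - tau_hat.mean()
    cluster_sums = np.bincount(codes, weights=resid, minlength=n_clusters)
    correction = n_clusters / (n_clusters - 1)
    return float(correction * np.sum(cluster_sums ** 2) / n ** 2)
```

With one observation per cluster this reduces to the usual sample variance divided by n.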

Methodology

CFFE modifies the standard causal forest in three key ways:

  1. Node-level FE orthogonalization: Within each node, we residualize Y and D:

    • Ỹ = Y - α̂ᵢ - γ̂ₜ
    • D̃ = D - α̂ᴰᵢ - γ̂ᴰₜ
  2. τ-heterogeneity splitting: Splits maximize:

    • Δ(Sₗ, Sᵣ) = (nₗ·nᵣ/n²) · (τ̂ₗ - τ̂ᵣ)²
  3. IV-style leaf estimation:

    • τ̂ = Σ D̃Ỹ / Σ D̃²
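
The steps above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the fixed effects are removed by alternating unit and time demeaning within the node. The function names `twoway_residualize`, `leaf_tau`, and `split_score` are hypothetical, and the package's own implementation may differ.

```python
import numpy as np

def twoway_residualize(v, unit, time, n_iter=50, tol=1e-10):
    """Remove additive unit and time effects by alternating demeaning."""
    r = np.asarray(v, dtype=float).copy()
    # Re-code ids so that only units/periods present in this node are counted.
    _, u = np.unique(np.asarray(unit), return_inverse=True)
    _, t = np.unique(np.asarray(time), return_inverse=True)
    cnt_u = np.bincount(u)
    cnt_t = np.bincount(t)
    for _ in range(n_iter):
        prev = r.copy()
        r -= (np.bincount(u, weights=r) / cnt_u)[u]  # subtract unit means
        r -= (np.bincount(t, weights=r) / cnt_t)[t]  # subtract time means
        if np.max(np.abs(r - prev)) < tol:
            break
    return r

def leaf_tau(Y, D, unit, time):
    """Leaf estimate: sum(D~ * Y~) / sum(D~ ** 2) on residualized data."""
    y_res = twoway_residualize(Y, unit, time)
    d_res = twoway_residualize(D, unit, time)
    denom = np.sum(d_res ** 2)
    # No residual treatment variation means the effect is not identified here.
    if denom < 1e-12:
        return float("nan")
    return float(np.sum(d_res * y_res) / denom)

def split_score(tau_left, tau_right, n_left, n_right):
    """Heterogeneity criterion: (n_l * n_r / n^2) * (tau_l - tau_r)^2."""
    n = n_left + n_right
    return (n_left * n_right / n ** 2) * (tau_left - tau_right) ** 2
```

In a tree, `leaf_tau` would be evaluated on each candidate child node and the split with the largest `split_score` kept.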

See docs/methods.md for full methodology.

Citation

If you use this package in your research, please cite:

@article{aytug2026causalfe,
  title={causalfe: Causal Forests with Fixed Effects in Python},
  author={Aytug, Harry},
  journal={arXiv preprint arXiv:2601.10555},
  year={2026},
  doi={10.48550/arXiv.2601.10555}
}

The CFFE methodology was originally developed by Kattenberg, Scheer, and Thiel (2023):

@article{kattenberg2023causal,
  title={Causal Forests with Fixed Effects for Treatment Effect Heterogeneity in Difference-in-Differences},
  author={Kattenberg, Mark A.C. and Scheer, Bas J. and Thiel, Jurre H.},
  journal={CPB Discussion Paper},
  year={2023},
  institution={Netherlands Institute for Economic Policy Analysis (CPB)}
}

Alternatively, to cite the software directly:

@software{causalfe,
  title={causalfe: Causal Forests with Fixed Effects in Python},
  author={Aytug, Harry},
  year={2026},
  url={https://github.com/haytug/causalfe}
}

References

  • Kattenberg, M.A.C., Scheer, B.J., & Thiel, J.H. (2023). Causal Forests with Fixed Effects for Treatment Effect Heterogeneity in Difference-in-Differences. CPB Discussion Paper. — The foundational paper for this implementation.
  • Athey, S., & Imbens, G. (2016). Recursive Partitioning for Heterogeneous Causal Effects. PNAS.
  • Wager, S., & Athey, S. (2018). Estimation and Inference of Heterogeneous Treatment Effects using Random Forests. JASA.

License

MIT License - see LICENSE for details.
