
Causal Forests with Fixed Effects for Panel and Difference-in-Differences Settings


causalfe

Causal Forests with Fixed Effects in Python

Python 3.9+ | License: MIT

Overview

causalfe is a Python implementation of Causal Forests with Fixed Effects (CFFE). It lets researchers and practitioners estimate heterogeneous treatment effects in panel and difference-in-differences settings while controlling for unit and time fixed effects.

This package is a Python implementation inspired by Kattenberg, Scheer, and Thiel (2023), who developed the CFFE methodology and released an R package. We built this Python version to make CFFE accessible to the broader Python econometrics community.

Key Features

  • Node-level FE residualization: Fixed effects are removed within each tree node, not globally
  • τ-heterogeneity splitting: Splits maximize treatment effect heterogeneity, not outcome variance
  • Honest estimation: Separate samples for tree structure and leaf estimation
  • Cluster-aware inference: Valid standard errors for panel data
  • Backward compatible: Reduces to a standard causal forest when no fixed effects are present

Installation

git clone https://github.com/haytug/causalfe.git
cd causalfe
pip install .

For development:

pip install -e ".[dev]"

For EconML comparison:

pip install -e ".[compare]"

Quick Start

from causalfe import CFFEForest

# Your panel data
# X: covariates (n, p)
# Y: outcome (n,)
# D: treatment (n,)
# unit: unit identifiers (n,)
# time: time identifiers (n,)

forest = CFFEForest(n_trees=100, max_depth=5, min_leaf=20)
forest.fit(X, Y, D, unit, time)

# Point estimates
tau_hat = forest.predict(X)

# With confidence intervals
tau_hat, ci_lower, ci_upper = forest.predict_interval(X, alpha=0.05)
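
The shape comments above suggest long-format inputs with one row per unit-period observation. As a minimal sketch, and assuming (this is not confirmed by the package documentation) that all five arrays must be row-aligned, a balanced panel can be laid out with NumPy as follows. The sizes and the data-generating process are made up for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, p = 50, 6, 3          # units, periods, covariates (illustrative sizes)
n = N * T                   # one row per unit-period observation

# Long-format identifiers: unit 0 for periods 0..T-1, then unit 1, and so on
unit = np.repeat(np.arange(N), T)
time = np.tile(np.arange(T), N)

# Time-invariant covariates, repeated for every period of a unit
X = np.repeat(rng.normal(size=(N, p)), T, axis=0)

# Simple DiD treatment: half of the units are treated from period T // 2 on
treated = np.arange(N) < N // 2
D = (treated[unit] & (time >= T // 2)).astype(float)

# Outcome with unit and time fixed effects plus a heterogeneous effect
alpha = rng.normal(size=N)
gamma = rng.normal(size=T)
tau = 1.0 + X[:, 0]
Y = alpha[unit] + gamma[time] + tau * D + rng.normal(scale=0.5, size=n)
```

Arrays built this way have the shapes listed in the Quick Start comments: X is (n, p) and the other four are (n,).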

Example with Simulated Data

from causalfe import CFFEForest
from causalfe.simulations.did_dgp import dgp_did_heterogeneous
import numpy as np

# Generate heterogeneous DiD data
X, Y, D, unit, time, tau_true = dgp_did_heterogeneous(N=200, T=6)

# Fit CFFE
forest = CFFEForest(n_trees=100, max_depth=4, min_leaf=20)
forest.fit(X, Y, D, unit, time)
tau_hat = forest.predict(X)

# Evaluate
corr = np.corrcoef(tau_hat, tau_true)[0, 1]
print(f"Correlation with true τ: {corr:.3f}")  # ~0.9

Validation Results

Simulation            Mean τ̂    RMSE    Corr(τ̂, τ)
FE-only (τ=0)         ~0        ~0.4    N/A
Homogeneous (τ=2)     ~1.8      ~0.4    N/A
Heterogeneous DiD     varies    ~0.5    0.93
Staggered Adoption    varies    ~0.6    0.88
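
For readers who want to reproduce metrics of this kind on their own simulations, here is a minimal sketch of how the mean estimate, RMSE, and correlation can be computed from predicted and true effects. The helper name `cate_metrics` is hypothetical and not part of causalfe. It also shows why the correlation is N/A when the true effect is constant: the correlation coefficient is undefined in that case.

```python
import numpy as np

def cate_metrics(tau_hat, tau_true):
    """Return mean estimate, RMSE, and correlation with the true effects."""
    tau_hat = np.asarray(tau_hat, dtype=float)
    tau_true = np.asarray(tau_true, dtype=float)
    rmse = float(np.sqrt(np.mean((tau_hat - tau_true) ** 2)))
    # Correlation is undefined when either series is constant (e.g. tau = 2 everywhere)
    if np.std(tau_true) == 0 or np.std(tau_hat) == 0:
        corr = float("nan")
    else:
        corr = float(np.corrcoef(tau_hat, tau_true)[0, 1])
    return {"mean": float(tau_hat.mean()), "rmse": rmse, "corr": corr}
```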

Inference

Multiple variance estimation methods are available:

from causalfe import half_sample_variance, cluster_robust_variance

# Half-sample variance (fast, default)
tau_hat, var_hat = forest.predict_with_variance(X)

# Or use standalone functions
var_half = half_sample_variance(forest.trees, X)

# Cluster-robust variance for ATE
var_cluster = cluster_robust_variance(tau_hat, unit)
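
To make the idea of a cluster-robust variance for the ATE concrete, the following sketch implements the textbook sandwich formula for the variance of a sample mean with clustering on units. It is an illustration under that assumption, not necessarily what `cluster_robust_variance` computes internally, and the function name `cluster_robust_ate_variance` is hypothetical.

```python
import numpy as np

def cluster_robust_ate_variance(tau_hat, clusters):
    """Textbook cluster-robust variance of mean(tau_hat), clustering on `clusters`."""
    tau_hat = np.asarray(tau_hat, dtype=float)
    n = tau_hat.shape[0]
    ate = tau_hat.mean()
    # Map cluster labels to 0..G-1 and sum deviations from the ATE within each cluster
    _, idx = np.unique(np.asarray(clusters), return_inverse=True)
    G = int(idx.max()) + 1
    if G < 2:
        raise ValueError("need at least two clusters")
    cluster_sums = np.bincount(idx, weights=tau_hat - ate, minlength=G)
    # Sandwich variance for the mean with a G / (G - 1) small-sample correction
    return float(G / (G - 1) * np.sum(cluster_sums ** 2) / n ** 2)
```

When every observation is its own cluster, this reduces to the usual sample variance divided by n.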

API Reference

CFFEForest

CFFEForest(
    n_trees=100,      # Number of trees
    max_depth=5,      # Maximum tree depth
    min_leaf=20,      # Minimum samples per leaf
    honest=True,      # Use honest estimation
    subsample_ratio=0.5,  # Fraction of units to subsample
    seed=None,        # Random seed
)
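
The `honest` and `subsample_ratio` parameters suggest that each tree works on a subsample of units that is then divided into a structure half and an estimation half. The sketch below shows one way this could be done at the unit level so that every unit keeps its full time series. The helper `honest_unit_split` is hypothetical, and the package's internal procedure may differ.

```python
import numpy as np

def honest_unit_split(unit, subsample_ratio=0.5, seed=None):
    """Return boolean row masks (split_rows, est_rows) for one tree."""
    rng = np.random.default_rng(seed)
    unit = np.asarray(unit)
    ids = np.unique(unit)
    # Subsample whole units so that every chosen unit keeps all of its periods
    n_sub = min(ids.size, max(2, int(round(subsample_ratio * ids.size))))
    chosen = rng.choice(ids, size=n_sub, replace=False)
    # Honest split: one half grows the tree, the other half estimates leaf effects
    half = n_sub // 2
    split_rows = np.isin(unit, chosen[:half])
    est_rows = np.isin(unit, chosen[half:])
    return split_rows, est_rows
```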

Methods:

  • fit(X, Y, D, unit, time): Fit the forest
  • predict(X): Predict CATEs
  • predict_with_variance(X, method="half_sample"): Predict with variance
  • predict_interval(X, alpha=0.05): Predict with confidence intervals
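
The relationship between `predict_with_variance` and `predict_interval` can be illustrated with a normal-approximation interval built from point estimates and variances. This is a sketch under the assumption that a Gaussian approximation is used. The helper `normal_interval` is hypothetical and is not the package's code.

```python
from statistics import NormalDist

import numpy as np

def normal_interval(tau_hat, var_hat, alpha=0.05):
    """Two-sided (1 - alpha) interval: tau_hat +/- z * sqrt(var_hat)."""
    tau_hat = np.asarray(tau_hat, dtype=float)
    se = np.sqrt(np.asarray(var_hat, dtype=float))
    z = NormalDist().inv_cdf(1 - alpha / 2)  # standard normal quantile
    return tau_hat - z * se, tau_hat + z * se
```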

Variance Functions

  • half_sample_variance(trees, X): Fast half-sample variance
  • jackknife_variance(trees, X): More stable jackknife variance
  • cluster_robust_variance(tau_hat, clusters): Cluster-robust variance
  • cluster_bootstrap_variance(...): Full cluster bootstrap

Methodology

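CFFE modifies the standard causal forest in three key ways:

  1. Node-level FE orthogonalization: Within each node, we residualize Y and D on unit and time fixed effects estimated from the observations in that node:

    • Ỹᵢₜ = Yᵢₜ - α̂ᵢ - γ̂ₜ
    • D̃ᵢₜ = Dᵢₜ - α̂ᴰᵢ - γ̂ᴰₜ
  2. τ-heterogeneity splitting: Splits maximize the following criterion, where nₗ and nᵣ are the child sample sizes, n = nₗ + nᵣ, and τ̂ₗ and τ̂ᵣ are the child treatment effect estimates:

    • Δ(Sₗ, Sᵣ) = (nₗ·nᵣ/n²) · (τ̂ₗ - τ̂ᵣ)²
  3. IV-style leaf estimation: Within each leaf, the effect is the ratio of sums over the leaf's observations:

    • τ̂ = Σ D̃ᵢₜỸᵢₜ / Σ D̃ᵢₜ²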
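
Steps 1 and 3 can be sketched in a few lines of NumPy. The example below removes additive unit and time effects by alternating demeaning (exact after one pass for a balanced panel, iterative otherwise) and then applies the ratio estimator to the residuals. The function names are hypothetical, and the choice of alternating demeaning is an assumption for illustration; the package may estimate the fixed effects differently.

```python
import numpy as np

def demean_two_way(v, unit, time, n_iter=50, tol=1e-10):
    """Residualize v on additive unit and time effects by alternating demeaning."""
    _, u = np.unique(np.asarray(unit), return_inverse=True)
    _, t = np.unique(np.asarray(time), return_inverse=True)
    r = np.asarray(v, dtype=float).copy()
    for _ in range(n_iter):
        before = r.copy()
        r -= (np.bincount(u, weights=r) / np.bincount(u))[u]  # subtract unit means
        r -= (np.bincount(t, weights=r) / np.bincount(t))[t]  # subtract time means
        if np.max(np.abs(r - before)) < tol:
            break
    return r

def leaf_tau(Y, D, unit, time):
    """tau-hat = sum(D_res * Y_res) / sum(D_res ** 2) on the rows of one node."""
    Y_res = demean_two_way(Y, unit, time)
    D_res = demean_two_way(D, unit, time)
    denom = np.sum(D_res ** 2)
    if denom < 1e-12:
        return float("nan")  # no residual treatment variation in this node
    return float(np.sum(D_res * Y_res) / denom)
```

In a noiseless panel with Y = αᵢ + γₜ + 2·D, `leaf_tau` recovers 2 up to floating-point error, because the residualization removes αᵢ and γₜ exactly.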

See docs/methods.md for full methodology.
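
The splitting criterion from step 2 can be sketched as a threshold search over a single covariate. In this illustration `estimate_tau` is a user-supplied callable that returns τ̂ for the rows selected by a boolean mask, for example a node-level estimator that residualizes within the child. Both function names are hypothetical and this is not the package's tree-growing code.

```python
import numpy as np

def split_score(tau_left, tau_right, n_left, n_right):
    """Delta(S_l, S_r) = (n_l * n_r / n^2) * (tau_l - tau_r)^2."""
    n = n_left + n_right
    return (n_left * n_right / n ** 2) * (tau_left - tau_right) ** 2

def best_split(x, estimate_tau, min_leaf=20):
    """Scan thresholds on one covariate and return (threshold, score)."""
    x = np.asarray(x, dtype=float)
    best_threshold, best_score = None, 0.0
    # Candidate thresholds: midpoints between consecutive distinct values
    values = np.unique(x)
    for c in (values[:-1] + values[1:]) / 2:
        left = x <= c
        n_left, n_right = int(left.sum()), int((~left).sum())
        if min(n_left, n_right) < min_leaf:
            continue  # respect the minimum leaf size
        score = split_score(estimate_tau(left), estimate_tau(~left), n_left, n_right)
        if np.isfinite(score) and score > best_score:
            best_threshold, best_score = float(c), float(score)
    return best_threshold, best_score
```

The nₗ·nᵣ/n² factor favors balanced splits, so a large effect difference isolated in a very small child scores lower than the same difference across two equally sized children.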

Citation

If you use this package in your research, please cite:

@article{aytug2026causalfe,
  title={causalfe: Causal Forests with Fixed Effects in Python},
  author={Aytug, Harry},
  journal={arXiv preprint arXiv:2601.10555},
  year={2026},
  doi={10.48550/arXiv.2601.10555}
}

The CFFE methodology was originally developed by Kattenberg, Scheer, and Thiel (2023):

@article{kattenberg2023causal,
  title={Causal Forests with Fixed Effects for Treatment Effect Heterogeneity in Difference-in-Differences},
  author={Kattenberg, Mark A.C. and Scheer, Bas J. and Thiel, Jurre H.},
  journal={CPB Discussion Paper},
  year={2023},
  institution={Netherlands Institute for Economic Policy Analysis (CPB)}
}

Alternatively, to cite the software directly:

@software{causalfe,
  title={causalfe: Causal Forests with Fixed Effects in Python},
  author={Aytug, Harry},
  year={2026},
  url={https://github.com/haytug/causalfe}
}

References

  • Kattenberg, M.A.C., Scheer, B.J., & Thiel, J.H. (2023). Causal Forests with Fixed Effects for Treatment Effect Heterogeneity in Difference-in-Differences. CPB Discussion Paper. — The foundational paper for this implementation.
  • Athey, S., & Imbens, G. (2016). Recursive Partitioning for Heterogeneous Causal Effects. PNAS.
  • Wager, S., & Athey, S. (2018). Estimation and Inference of Heterogeneous Treatment Effects using Random Forests. JASA.

License

MIT License - see LICENSE for details.
