Causal Forests with Fixed Effects for Panel and Difference-in-Differences Settings
causalfe
Causal Forests with Fixed Effects in Python
Overview
causalfe provides the first Python implementation of Causal Forests with Fixed Effects (CFFE). It lets researchers and practitioners estimate heterogeneous treatment effects in panel and difference-in-differences settings while controlling for unit and time fixed effects.
This package is a Python implementation inspired by Kattenberg, Scheer, and Thiel (2023), who developed the CFFE methodology and released an R package. We built this Python version to make CFFE accessible to the broader Python econometrics community.
Key Features
- Node-level FE residualization: Fixed effects are removed within each tree node, not globally
- τ-heterogeneity splitting: Splits maximize treatment effect heterogeneity, not outcome variance
- Honest estimation: Separate samples for tree structure and leaf estimation
- Cluster-aware inference: Valid standard errors for panel data
- Backward compatible: Reduces to standard causal forest when no fixed effects present
Installation
```shell
git clone https://github.com/haytug/causalfe.git
cd causalfe
pip install .
```
For development:
```shell
pip install -e ".[dev]"
```
For EconML comparison:
```shell
pip install -e ".[compare]"
```
Quick Start
```python
from causalfe import CFFEForest

# Your panel data
# X: covariates (n, p)
# Y: outcome (n,)
# D: treatment (n,)
# unit: unit identifiers (n,)
# time: time identifiers (n,)
forest = CFFEForest(n_trees=100, max_depth=5, min_leaf=20)
forest.fit(X, Y, D, unit, time)

# Point estimates
tau_hat = forest.predict(X)

# With confidence intervals
tau_hat, ci_lower, ci_upper = forest.predict_interval(X, alpha=0.05)
```
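The arrays above are assumed to be in long format, with one row per unit-period observation, so that X, Y, D, unit, and time share the same first dimension n. The following sketch builds such inputs for a small balanced toy panel in plain Python. The names n_units, n_periods, and first_treated_period, and the data-generating design itself, are illustrative assumptions and not part of causalfe.

```python
import random

rng = random.Random(0)
n_units, n_periods = 4, 3      # hypothetical toy panel size
first_treated_period = 2       # hypothetical: treatment starts in period 2

unit, time, D, X, Y = [], [], [], [], []
for i in range(n_units):
    x_i = rng.gauss(0.0, 1.0)  # one time-invariant covariate per unit
    treated_unit = i % 2 == 1  # odd-numbered units are eventually treated
    for t in range(n_periods):
        d_it = int(treated_unit and t >= first_treated_period)
        unit.append(i)
        time.append(t)
        D.append(d_it)
        X.append([x_i])        # one row per observation, p = 1 column
        # outcome = unit effect + time effect + treatment effect + noise
        Y.append(0.5 * i + 0.2 * t + 2.0 * d_it + rng.gauss(0.0, 0.1))

n = n_units * n_periods        # number of rows in every input
```

Lists like these can be converted with numpy.asarray before fitting, which gives X the (n, p) shape and the other inputs the (n,) shape described above.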
Example with Simulated Data
```python
from causalfe import CFFEForest
from causalfe.simulations.did_dgp import dgp_did_heterogeneous
import numpy as np

# Generate heterogeneous DiD data
X, Y, D, unit, time, tau_true = dgp_did_heterogeneous(N=200, T=6)

# Fit CFFE
forest = CFFEForest(n_trees=100, max_depth=4, min_leaf=20)
forest.fit(X, Y, D, unit, time)
tau_hat = forest.predict(X)

# Evaluate
corr = np.corrcoef(tau_hat, tau_true)[0, 1]
print(f"Correlation with true τ: {corr:.3f}")  # ~0.9
```
Validation Results
| Simulation | Mean τ̂ | RMSE | Corr(τ̂, τ) | Status |
|---|---|---|---|---|
| FE-only (τ=0) | ~0 | ~0.4 | N/A | ✓ |
| Homogeneous (τ=2) | ~1.8 | ~0.4 | N/A | ✓ |
| Heterogeneous DiD | varies | ~0.5 | 0.93 | ✓ |
| Staggered Adoption | varies | ~0.6 | 0.88 | ✓ |
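For readers who want to reproduce metrics of this kind in their own simulations, the sketch below computes RMSE and the Pearson correlation between estimated and true effects in plain Python. The helper names rmse and pearson_corr are hypothetical, and this is not necessarily how the table above was produced.

```python
from math import sqrt

def rmse(est, truth):
    """Root mean squared error between estimated and true effects."""
    return sqrt(sum((e - t) ** 2 for e, t in zip(est, truth)) / len(est))

def pearson_corr(a, b):
    """Pearson correlation; raises if either input has zero variance."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / sqrt(var_a * var_b)

# Toy values for illustration only (not package output)
tau_true = [1.0, 2.0, 3.0]
tau_hat = [1.5, 2.5, 3.5]
error = rmse(tau_hat, tau_true)
corr = pearson_corr(tau_hat, tau_true)
```

The correlation is undefined when the true effect is constant, which is consistent with the N/A entries in the FE-only and homogeneous rows.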
Inference
Multiple variance estimation methods are available:
```python
from causalfe import half_sample_variance, cluster_robust_variance

# Half-sample variance (fast, default)
tau_hat, var_hat = forest.predict_with_variance(X)

# Or use standalone functions
var_half = half_sample_variance(forest.trees, X)

# Cluster-robust variance for ATE
var_cluster = cluster_robust_variance(tau_hat, unit)
```
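A variance estimate can be turned into a confidence interval with a normal approximation. The sketch below shows that calculation in plain Python. Whether predict_interval uses exactly this construction is an assumption, and normal_interval is a hypothetical helper, not part of causalfe.

```python
from math import sqrt
from statistics import NormalDist

def normal_interval(tau_hat, var_hat, alpha=0.05):
    """Pointwise (1 - alpha) intervals: tau_hat +/- z * sqrt(var_hat)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    lower = [t - z * sqrt(v) for t, v in zip(tau_hat, var_hat)]
    upper = [t + z * sqrt(v) for t, v in zip(tau_hat, var_hat)]
    return lower, upper

# Toy inputs for illustration only (not package output)
ci_lower, ci_upper = normal_interval([1.0, 2.0], [0.25, 0.04])
```

These intervals are pointwise: each one covers a single prediction and they are not adjusted for simultaneous coverage across many rows.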
API Reference
CFFEForest
```python
CFFEForest(
    n_trees=100,          # Number of trees
    max_depth=5,          # Maximum tree depth
    min_leaf=20,          # Minimum samples per leaf
    honest=True,          # Use honest estimation
    subsample_ratio=0.5,  # Fraction of units to subsample
    seed=None,            # Random seed
)
```
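To make the honest and subsample_ratio options concrete, the sketch below draws a subsample of units and divides it into one part for choosing splits and another for leaf estimation. It assumes that both steps operate on whole units, so that all periods of a unit stay together. The function honest_unit_split is hypothetical and is not causalfe's internal routine.

```python
import random

def honest_unit_split(unit, subsample_ratio=0.5, seed=0):
    """Return row indices for the splitting and estimation samples."""
    rng = random.Random(seed)
    units = sorted(set(unit))
    # Number of units to draw, kept between 2 and the number available
    k = min(len(units), max(2, round(subsample_ratio * len(units))))
    sampled = rng.sample(units, k)
    half = k // 2
    split_units = set(sampled[:half])  # used to choose tree splits
    est_units = set(sampled[half:])    # used to estimate leaf effects
    split_rows = [i for i, u in enumerate(unit) if u in split_units]
    est_rows = [i for i, u in enumerate(unit) if u in est_units]
    return split_rows, est_rows

# Toy panel: 10 units observed for 3 periods each
unit_ids = [u for u in range(10) for _ in range(3)]
split_rows, est_rows = honest_unit_split(unit_ids, subsample_ratio=0.5, seed=1)
```

Sampling whole units rather than individual rows keeps every unit's time series intact, which unit fixed effects need in order to be estimated within a subsample.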
Methods:
- fit(X, Y, D, unit, time): Fit the forest
- predict(X): Predict CATEs
- predict_with_variance(X, method="half_sample"): Predict with variance
- predict_interval(X, alpha=0.05): Predict with confidence intervals
- get_params(deep=True): Get estimator parameters (scikit-learn compatible)
- set_params(**params): Set estimator parameters (scikit-learn compatible)
- score(X, Y, D, unit, time, tau_true=None): R² score for CATE predictions
- clone(): Create an unfitted copy with same parameters
Scikit-learn Compatibility:
The CFFEForest class follows scikit-learn conventions:
```python
# Informative string representation
>>> forest = CFFEForest(n_trees=100, max_depth=4)
>>> print(forest)
CFFEForest(n_trees=100, max_depth=4, min_leaf=20)
Fitted: No

# Get/set parameters
>>> forest.get_params()
{'n_trees': 100, 'max_depth': 4, 'min_leaf': 20, ...}
>>> forest.set_params(n_trees=200)

# Score with known true effects (useful for simulations)
>>> score = forest.score(X, Y, D, unit, time, tau_true=tau_true)
```
Variance Functions
- half_sample_variance(trees, X): Fast half-sample variance
- jackknife_variance(trees, X): More stable jackknife variance
- cluster_robust_variance(tau_hat, clusters): Cluster-robust variance
- cluster_bootstrap_variance(...): Full cluster bootstrap
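For intuition about the cluster-robust option, one common formula for the variance of an average effect sums deviations from the mean within each cluster before squaring them. The sketch below implements that textbook formula in plain Python with a G/(G-1) small-sample correction. The function name is hypothetical, and causalfe's cluster_robust_variance may differ in its details.

```python
from collections import defaultdict

def clustered_variance_of_mean(tau_hat, clusters):
    """Cluster-robust variance of the sample mean of tau_hat."""
    n = len(tau_hat)
    ate = sum(tau_hat) / n
    # Sum the deviations from the mean within each cluster
    cluster_sums = defaultdict(float)
    for t, g in zip(tau_hat, clusters):
        cluster_sums[g] += t - ate
    n_clusters = len(cluster_sums)
    if n_clusters < 2:
        raise ValueError("at least two clusters are required")
    sum_sq = sum(s * s for s in cluster_sums.values())
    return (n_clusters / (n_clusters - 1)) * sum_sq / n ** 2

# Toy inputs for illustration only: three clusters of two rows each
var_ate = clustered_variance_of_mean(
    [1.0, 1.0, 2.0, 2.0, 3.0, 3.0], [0, 0, 1, 1, 2, 2]
)
```

Because predictions for the same unit tend to move together, a formula that treats rows as independent would typically understate the variance in panel data.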
Methodology
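CFFE modifies the standard causal forest in three key ways:
- Node-level FE orthogonalization: Within each node, Y and D are residualized on unit and time fixed effects:
  - Ỹ = Y - α̂ᵢ - γ̂ₜ
  - D̃ = D - α̂ᴰᵢ - γ̂ᴰₜ
- τ-heterogeneity splitting: Candidate splits of a node into a left child Sₗ and a right child Sᵣ maximize:
  - Δ(Sₗ, Sᵣ) = (nₗ·nᵣ/n²) · (τ̂ₗ - τ̂ᵣ)²
- IV-style leaf estimation: Within each leaf, the effect is estimated from the residualized variables:
  - τ̂ = Σ D̃Ỹ / Σ D̃²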
See docs/methods.md for full methodology.
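The three steps can be sketched in plain Python as follows. This is a minimal illustration, not the package's implementation: the residualization uses alternating demeaning, a standard way to remove additive unit and time effects that is swapped in here as an assumption, and all function names are hypothetical.

```python
from collections import defaultdict

def demean_two_way(values, unit, time, n_iter=50):
    """Remove additive unit and time effects by alternating demeaning."""
    resid = list(values)
    for _ in range(n_iter):
        for ids in (unit, time):
            sums, counts = defaultdict(float), defaultdict(int)
            for r, g in zip(resid, ids):
                sums[g] += r
                counts[g] += 1
            resid = [r - sums[g] / counts[g] for r, g in zip(resid, ids)]
    return resid

def leaf_tau(y_res, d_res):
    """Leaf estimate: sum(D~ * Y~) / sum(D~ ** 2)."""
    denom = sum(d * d for d in d_res)
    if denom == 0:
        raise ValueError("no residual treatment variation in this leaf")
    return sum(d * y for d, y in zip(d_res, y_res)) / denom

def split_score(tau_left, n_left, tau_right, n_right):
    """Split criterion: (n_l * n_r / n**2) * (tau_l - tau_r) ** 2."""
    n = n_left + n_right
    return (n_left * n_right / n ** 2) * (tau_left - tau_right) ** 2

# Toy 2x2 difference-in-differences: unit 1 is treated in period 1 only.
# Outcomes follow Y = alpha_i + gamma_t + 2 * D with alpha = (1, 3)
# and gamma = (0, 0.5), so the true effect is 2.
unit = [0, 0, 1, 1]
time = [0, 1, 0, 1]
D = [0, 0, 0, 1]
Y = [1.0, 1.5, 3.0, 5.5]

tau = leaf_tau(demean_two_way(Y, unit, time), demean_two_way(D, unit, time))
score = split_score(tau_left=2.0, n_left=10, tau_right=0.0, n_right=30)
```

In this noise-free toy panel the residualized estimate recovers the effect of 2 used to generate the data. On a balanced panel the alternating demeaning settles after a single sweep, and the extra iterations only matter for unbalanced data.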
Citation
If you use this package in your research, please cite:
```bibtex
@article{aytug2026causalfe,
  title={causalfe: Causal Forests with Fixed Effects in Python},
  author={Aytug, Harry},
  journal={arXiv preprint arXiv:2601.10555},
  year={2026},
  doi={10.48550/arXiv.2601.10555}
}
```
The CFFE methodology was originally developed by Kattenberg, Scheer, and Thiel (2023):
```bibtex
@article{kattenberg2023causal,
  title={Causal Forests with Fixed Effects for Treatment Effect Heterogeneity in Difference-in-Differences},
  author={Kattenberg, Mark A.C. and Scheer, Bas J. and Thiel, Jurre H.},
  journal={CPB Discussion Paper},
  year={2023},
  institution={Netherlands Institute for Economic Policy Analysis (CPB)}
}
```
Alternatively, to cite the software directly:
```bibtex
@software{causalfe,
  title={causalfe: Causal Forests with Fixed Effects in Python},
  author={Aytug, Harry},
  year={2026},
  url={https://github.com/haytug/causalfe}
}
```
References
- Kattenberg, M.A.C., Scheer, B.J., & Thiel, J.H. (2023). Causal Forests with Fixed Effects for Treatment Effect Heterogeneity in Difference-in-Differences. CPB Discussion Paper. The foundational paper for this implementation.
- Athey, S., & Imbens, G. (2016). Recursive Partitioning for Heterogeneous Causal Effects. PNAS.
- Wager, S., & Athey, S. (2018). Estimation and Inference of Heterogeneous Treatment Effects using Random Forests. JASA.
License
MIT License - see LICENSE for details.