Fast counterfactual estimators for panel data — Python reimplementation of R fect

pyfector

Alpha (v0.1.x). Results have been checked against R fect on synthetic data, but edge cases may remain and APIs may change. Verify critical results independently. Issues and pull requests are welcome on GitHub.

Counterfactual estimators for panel data in Python. Port of the R fect package (Liu, Wang & Xu, 2024, AJPS), built on numpy, scipy, polars, and joblib, with optional CuPy GPU support.

Installation

pip install pyfector

From source:

git clone https://github.com/AlanHuang99/pyfector.git
cd pyfector
pip install -e ".[dev]"

Optional extras: pip install "pyfector[gpu]" (CuPy), pip install "pyfector[pandas]". The quotes keep shells such as zsh from expanding the square brackets.

Quick Start

import polars as pl
import pyfector

data = pl.read_parquet("panel_data.parquet")

result = pyfector.fect(
    data=data,
    Y="outcome",
    D="treatment",
    index=("unit_id", "year"),
    X=["gdp", "population"],
    method="ife",
    r=(0, 5),
    se=True,
    nboots=500,
    seed=42,
)

result.summary()
result.plot(kind="gap")
result.diagnose()

Estimators

method="fe" -- Two-way fixed effects

Fits unit and time fixed effects on untreated observations, then imputes counterfactual outcomes for treated observations. Assumes parallel trends.
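
The idea can be sketched with plain numpy. This is a minimal illustration, assuming a balanced panel stored as (T, N) arrays and no covariates; the function name fe_counterfactual is hypothetical and not part of pyfector's API, and the library's internals may differ.

```python
import numpy as np

def fe_counterfactual(Y, D):
    """Fit Y_it = alpha_i + xi_t on untreated cells and predict every cell."""
    T, N = Y.shape
    t_idx, i_idx = np.nonzero(D == 0)            # (time, unit) of control cells
    rows = np.arange(t_idx.size)
    X = np.zeros((t_idx.size, N + T))            # unit dummies, then time dummies
    X[rows, i_idx] = 1.0
    X[rows, N + t_idx] = 1.0
    # The dummies are collinear, so lstsq returns the minimum-norm solution;
    # fitted values are still unique when the control cells are connected.
    coef, *_ = np.linalg.lstsq(X, Y[t_idx, i_idx], rcond=None)
    alpha, xi = coef[:N], coef[N:]
    return xi[:, None] + alpha[None, :]          # (T, N) imputed Y(0)

# Noise-free toy panel: 3 of 8 units are treated from period 6, true effect 5.
rng = np.random.default_rng(0)
T, N = 10, 8
D = np.zeros((T, N), dtype=int)
D[6:, :3] = 1
Y = rng.normal(size=T)[:, None] + rng.normal(size=N)[None, :] + 5.0 * D

eff = Y - fe_counterfactual(Y, D)                # treatment effect matrix
att = eff[D == 1].mean()                         # average over treated cells
print(f"ATT = {att:.4f}")
```

Because the toy data contain no noise, the estimated ATT should match the true effect up to floating-point error.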

method="ife" -- Interactive fixed effects (Bai 2009, Xu 2017)

Adds latent factors with unit loadings on top of fixed effects. Number of factors r can be fixed or chosen by cross-validation over r=(r_min, r_max).
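
A rough sketch of how a factor model can be fit when treated cells are held out, using an EM-style loop: fill the treated cells with the current fit, take a rank-r truncated SVD, and repeat until the fit stabilises. It omits the additive fixed effects and covariates for brevity. The function name ife_counterfactual is hypothetical, the tol and max_iter defaults are borrowed from the fect() signature, and pyfector's actual algorithm and factor normalisation may differ.

```python
import numpy as np

def ife_counterfactual(Y, D, r, tol=1e-7, max_iter=5000):
    """EM-style rank-r fit that only trusts untreated cells (D == 0)."""
    obs = D == 0
    fit = np.full(Y.shape, Y[obs].mean())        # start from the control mean
    for _ in range(max_iter):
        filled = np.where(obs, Y, fit)           # E-step: fill treated cells
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        new_fit = (U[:, :r] * s[:r]) @ Vt[:r]    # M-step: best rank-r approximation
        change = np.linalg.norm(new_fit - fit) / max(np.linalg.norm(fit), 1e-12)
        fit = new_fit
        if change < tol:
            break
    factors = U[:, :r] * s[:r]                   # (T, r), one of many valid scalings
    loadings = Vt[:r].T                          # (N, r)
    return fit, factors, loadings

# Noise-free toy panel with two latent factors and a true effect of 3.
rng = np.random.default_rng(1)
T, N, r = 30, 40, 2
D = np.zeros((T, N), dtype=int)
D[20:, :10] = 1
Y = rng.normal(size=(T, r)) @ rng.normal(size=(N, r)).T + 3.0 * D

Y_ct, factors, loadings = ife_counterfactual(Y, D, r=2)
att = (Y - Y_ct)[D == 1].mean()
print(f"ATT = {att:.3f}")
```

Only the product of factors and loadings is identified, so the two matrices returned here can differ from another implementation's by an invertible r x r transformation while the fitted counterfactual stays the same.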

method="mc" -- Matrix completion (Athey et al. 2021)

Nuclear-norm-penalized matrix completion of the counterfactual matrix. Penalty lam can be fixed or selected by cross-validation.
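
Nuclear-norm penalisation is commonly solved by soft-thresholding singular values inside a fill-and-refit loop (the soft-impute scheme). The sketch below shows that generic technique, not pyfector's code: the name soft_impute is hypothetical, fixed effects are omitted, and lam is applied directly to the singular values, which may not match how pyfector scales its penalty.

```python
import numpy as np

def soft_impute(Y, D, lam, tol=1e-6, max_iter=1000):
    """Low-rank fit of untreated cells with singular values shrunk by lam."""
    obs = D == 0
    L = np.zeros(Y.shape)
    for _ in range(max_iter):
        filled = np.where(obs, Y, L)                   # keep data on control cells
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        L_new = (U * np.maximum(s - lam, 0.0)) @ Vt    # soft-threshold singular values
        change = np.linalg.norm(L_new - L) / max(np.linalg.norm(L), 1e-12)
        L = L_new
        if change < tol:
            break
    return L

# Toy panel: rank-2 signal, 10 of 40 units treated in the last 10 periods.
rng = np.random.default_rng(2)
T, N = 30, 40
D = np.zeros((T, N), dtype=int)
D[20:, :10] = 1
Y = rng.normal(size=(T, 2)) @ rng.normal(size=(N, 2)).T + 3.0 * D

L_small = soft_impute(Y, D, lam=0.5)
L_big = soft_impute(Y, D, lam=5.0)
nuc_small = np.linalg.norm(L_small, "nuc")
nuc_big = np.linalg.norm(L_big, "nuc")
print(f"nuclear norm: lam=0.5 -> {nuc_small:.1f}, lam=5.0 -> {nuc_big:.1f}")
```

A larger lam shrinks the fitted matrix harder, which is why the penalty is usually tuned by cross-validation rather than fixed by hand.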

method="cfe" -- Complex fixed effects

Unit/time-varying interactions with covariates Z and Q.

Cross-validation masks a fraction cv_prop of control cells and picks r (IFE) or lam (MC) by prediction error (mspe, gmspe, or mad).
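
The masking step and the three loss criteria can be sketched with the standard library. The helper names are hypothetical, and the formulas are assumptions about what the abbreviations mean (mean squared prediction error, geometric mean of squared prediction errors, and median absolute deviation); pyfector's exact definitions may differ.

```python
import math
import random
import statistics

def mask_control_cells(D, cv_prop, rng):
    """Pick a random fraction cv_prop of untreated (t, i) cells to hold out."""
    control = [(t, i) for t, row in enumerate(D) for i, d in enumerate(row) if d == 0]
    n_mask = max(1, round(cv_prop * len(control)))
    return set(rng.sample(control, n_mask))

def cv_loss(errors, criterion="mspe"):
    """Summarise held-out prediction errors with one of three criteria."""
    sq = [e * e for e in errors]
    if criterion == "mspe":
        return statistics.fmean(sq)
    if criterion == "gmspe":
        # Floor at a tiny value so an exact zero error does not break the log.
        return math.exp(statistics.fmean(math.log(max(s, 1e-300)) for s in sq))
    if criterion == "mad":
        med = statistics.median(errors)
        return statistics.median(abs(e - med) for e in errors)
    raise ValueError(f"unknown criterion: {criterion!r}")

# 6 periods x 5 units; the first two units are treated from period 4 onward.
D = [[1 if (t >= 4 and i < 2) else 0 for i in range(5)] for t in range(6)]
masked = mask_control_cells(D, cv_prop=0.1, rng=random.Random(42))
print(sorted(masked))

errors = [1.0, -1.0, 2.0]   # illustrative held-out prediction errors
for name in ("mspe", "gmspe", "mad"):
    print(name, cv_loss(errors, name))
```

In a full cross-validation loop, each candidate r or lam would be fit with the masked cells hidden, and the candidate with the smallest loss on those cells would be kept.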


Parameters

fect()

fect(
    data, Y, D, index,
    *, X=None, W=None, method="ife", force="two-way",
    r=0, lam=None, nlambda=10, CV=True, k=10, cv_prop=0.1, criterion="mspe",
    se=False, vartype="bootstrap", nboots=200, alpha=0.05,
    tol=1e-7, max_iter=5000, min_T0=1, normalize=False,
    device="cpu", n_jobs=1, seed=None,
) -> FectResult

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| data | pl.DataFrame or pd.DataFrame | required | Panel data in long format |
| Y | str | required | Outcome column |
| D | str | required | Treatment indicator (0/1) |
| index | (str, str) | required | (unit_id, time) column names |
| X | list[str] | None | Time-varying covariate columns |
| W | str | None | Observation weight column |
| method | str | "ife" | "fe", "ife", "mc", "cfe" |
| force | str | "two-way" | "none", "unit", "time", "two-way" |
| r | int or (int, int) | 0 | Number of factors; tuple triggers CV over range |
| lam | float | None | MC nuclear-norm penalty; None selects by CV |
| CV | bool | True | Cross-validate over r or lam |
| k | int | 10 | CV folds |
| cv_prop | float | 0.1 | Fraction of control cells masked per CV fold |
| criterion | str | "mspe" | CV loss: "mspe", "gmspe", "mad" |
| se | bool | False | Compute standard errors |
| vartype | str | "bootstrap" | "bootstrap" or "jackknife" |
| nboots | int | 200 | Bootstrap replications |
| tol | float | 1e-7 | EM convergence tolerance |
| max_iter | int | 5000 | Max EM iterations |
| min_T0 | int | 1 | Minimum pre-treatment periods required per unit |
| normalize | bool | False | Normalize outcome by standard deviation |
| device | str | "cpu" | "cpu" or "gpu" (requires pyfector[gpu]) |
| n_jobs | int | 1 | Parallel workers for CV / bootstrap |
| seed | int | None | Random seed |

Data format

Long-format panel with one row per unit per period.

| Column | Type | Description |
| --- | --- | --- |
| outcome | float | Outcome variable |
| unit id | int/str | Unit identifier |
| time | int | Time period |
| treatment | int | 0/1 treatment indicator |
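
To make the layout concrete, the sketch below pivots long-format rows into (T, N) outcome and treatment matrices, the orientation used by the matrix-valued results. It is an illustration in plain Python, not pyfector's loader: the function to_matrices is hypothetical, and the column names follow the Quick Start call.

```python
def to_matrices(rows, Y="outcome", D="treatment", index=("unit_id", "year")):
    """Pivot long-format rows into (T, N) lists; missing cells stay None."""
    unit_col, time_col = index
    units = sorted({row[unit_col] for row in rows})
    times = sorted({row[time_col] for row in rows})
    u_pos = {u: j for j, u in enumerate(units)}
    t_pos = {t: k for k, t in enumerate(times)}
    Y_mat = [[None] * len(units) for _ in times]
    D_mat = [[None] * len(units) for _ in times]
    for row in rows:
        t, i = t_pos[row[time_col]], u_pos[row[unit_col]]
        if Y_mat[t][i] is not None:
            raise ValueError(f"duplicate row for {row[unit_col]!r}, {row[time_col]!r}")
        if row[D] not in (0, 1):
            raise ValueError(f"treatment must be 0 or 1, got {row[D]!r}")
        Y_mat[t][i] = float(row[Y])
        D_mat[t][i] = int(row[D])
    return Y_mat, D_mat, units, times

# One row per unit per period.
rows = [
    {"unit_id": "a", "year": 2000, "outcome": 1.0, "treatment": 0},
    {"unit_id": "a", "year": 2001, "outcome": 2.5, "treatment": 1},
    {"unit_id": "b", "year": 2000, "outcome": 0.5, "treatment": 0},
    {"unit_id": "b", "year": 2001, "outcome": 0.7, "treatment": 0},
]
Y_mat, D_mat, units, times = to_matrices(rows)
print(units, times)
print(Y_mat)
print(D_mat)
```

Rows index time periods and columns index units, so D_mat[1][0] is the treatment status of unit "a" in 2001.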

Output

fect() returns a FectResult with these fields:

| Field | Type | Description |
| --- | --- | --- |
| att_avg | float | Overall ATT (weighted by cell counts) |
| att_avg_unit | float | Unit-averaged ATT |
| att_on | ndarray | Dynamic ATT by relative time |
| time_on | ndarray | Relative time indices |
| count_on | ndarray | Observation count per relative time |
| beta | ndarray | Covariate coefficients |
| Y_ct | ndarray | (T, N) counterfactual outcome matrix |
| eff | ndarray | (T, N) treatment effect matrix |
| factors | ndarray | (T, r) estimated factors (IFE) |
| loadings | ndarray | (N, r) estimated loadings (IFE) |
| sigma2 | float | Residual variance |
| r_cv | int | CV-selected r |
| lambda_cv | float | CV-selected lam |
| inference | InferenceResult | Bootstrap or jackknife results (if se=True) |
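
The difference between the cell-weighted and unit-averaged ATT, and the meaning of the relative-time fields, can be shown on a tiny effect matrix. This is one plausible reading of the field descriptions, not pyfector's code: summarize_effects is a hypothetical helper, and it assumes relative time 1 is a unit's first treated period (the convention used by R fect).

```python
from collections import defaultdict
from statistics import fmean

def summarize_effects(eff, D):
    """Aggregate a (T, N) effect matrix given a (T, N) 0/1 treatment matrix."""
    T, N = len(eff), len(eff[0])
    treated_cells, unit_means = [], []
    by_rel = defaultdict(list)
    for i in range(N):
        on = [t for t in range(T) if D[t][i] == 1]
        if not on:
            continue                              # never-treated unit
        unit_eff = [eff[t][i] for t in on]
        treated_cells.extend(unit_eff)
        unit_means.append(fmean(unit_eff))
        first = on[0]
        for t in range(T):
            by_rel[t - first + 1].append(eff[t][i])   # 1 = first treated period
    time_on = sorted(by_rel)
    return {
        "att_avg": fmean(treated_cells),          # every treated cell counts once
        "att_avg_unit": fmean(unit_means),        # every treated unit counts once
        "time_on": time_on,
        "att_on": [fmean(by_rel[s]) for s in time_on],
        "count_on": [len(by_rel[s]) for s in time_on],
    }

# Unit 0 is treated in periods 2-3, unit 1 in period 3, unit 2 never.
D = [[0, 0, 0], [0, 0, 0], [1, 0, 0], [1, 1, 0]]
eff = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [3.0, 5.0, 0.0]]
out = summarize_effects(eff, D)
print(out)
```

Here the cell-weighted ATT is (1 + 3 + 5) / 3 = 3.0, while the unit-averaged ATT is (2 + 5) / 2 = 3.5, because unit 0 contributes two treated cells and unit 1 only one.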

Methods

| Method | Description |
| --- | --- |
| summary() | Formatted text summary |
| plot(kind, **kwargs) | Matplotlib figure. Kinds: gap, status, factors, counterfactual |
| diagnose(...) | Diagnostic tests (see below) |

Inference

When se=True, result.inference carries bootstrap or jackknife SEs and CIs for the overall ATT and per-period effects, plus the full bootstrap distribution (att_avg_boot, att_on_boot).
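
Given a bootstrap distribution such as att_avg_boot, a standard error and a percentile interval can be derived as sketched below. The draws are simulated stand-ins rather than real pyfector output, the helper names are hypothetical, and the README does not state whether pyfector reports percentile or normal-approximation intervals.

```python
import math
import random
import statistics

def quantile(sorted_x, q):
    """Linear-interpolation quantile of an already sorted sequence."""
    pos = q * (len(sorted_x) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_x[lo] + (pos - lo) * (sorted_x[hi] - sorted_x[lo])

def bootstrap_summary(draws, alpha=0.05):
    """Return the bootstrap SE and the (1 - alpha) percentile interval."""
    draws = sorted(draws)
    se = statistics.stdev(draws)
    ci = (quantile(draws, alpha / 2), quantile(draws, 1 - alpha / 2))
    return se, ci

# Stand-in for result.inference.att_avg_boot: 500 simulated replicate ATTs.
rng = random.Random(42)
att_avg_boot = [rng.gauss(3.0, 0.02) for _ in range(500)]

se, (ci_lo, ci_hi) = bootstrap_summary(att_avg_boot, alpha=0.05)
print(f"SE = {se:.4f}, 95% CI = [{ci_lo:.4f}, {ci_hi:.4f}]")
```

Keeping the full distribution makes it possible to recompute intervals at a different alpha without rerunning the bootstrap.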

Diagnostics

diag = result.diagnose(
    f_threshold=0.5,
    tost_threshold=0.36,
    placebo_period=(-5, -1),
    loo=True,
)
diag.summary()

| Test | Output |
| --- | --- |
| Pre-trend F-test | f_stat, f_pval |
| Equivalence F-test | equiv_f_pval |
| TOST | tost_pvals |
| Placebo | placebo_att, placebo_pval |
| Carryover | carryover_att, carryover_pval |
| Leave-one-out | loo_max_change |
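
As background on the TOST row, a two one-sided test asks whether a pre-treatment estimate is provably inside an equivalence band, rather than merely not significantly different from zero. The sketch below uses a normal approximation and takes the bound in outcome units. The function name and the example estimates are hypothetical, and the README does not specify how pyfector scales tost_threshold or which reference distribution it uses.

```python
from statistics import NormalDist

def tost_pvalue(estimate, se, bound):
    """Two one-sided tests of H0: |effect| >= bound (normal approximation)."""
    z = NormalDist()
    p_lower = 1.0 - z.cdf((estimate + bound) / se)   # H0: effect <= -bound
    p_upper = z.cdf((estimate - bound) / se)         # H0: effect >= +bound
    return max(p_lower, p_upper)                     # both must reject

# Hypothetical pre-treatment ATT estimates and standard errors.
pre_att = [0.02, -0.05, 0.30]
pre_se = [0.10, 0.10, 0.10]
pvals = [tost_pvalue(a, s, bound=0.36) for a, s in zip(pre_att, pre_se)]
for a, p in zip(pre_att, pvals):
    print(f"estimate {a:+.2f}: TOST p-value {p:.4f}")
```

A small p-value supports equivalence: the estimate is far enough inside the band that both one-sided nulls are rejected. An estimate close to the bound, like the third one here, does not pass.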

Validation against R fect

Synthetic DGP, N=200, T=50. Point estimates:

| Scenario | pyfector | R fect | Difference |
| --- | --- | --- | --- |
| FE | 4.995583 | 4.995640 | -0.000057 |
| FE + X | 4.975683 | 4.975809 | -0.000126 |
| IFE r=2 | 3.010223 | 3.013046 | -0.002822 |
| IFE r=2 + X | 2.993155 | 2.996099 | -0.002944 |
| MC lambda=0.01 | 3.176671 | 3.176721 | -0.000050 |

FE and MC estimates differ from R fect by at most about 1.3e-4 in absolute value. The larger IFE differences (about 3e-3) reflect SVD rotation non-uniqueness.

Bootstrap SEs (500 reps):

| Scenario | pyfector | R fect | Ratio |
| --- | --- | --- | --- |
| FE | 0.020011 | 0.021291 | 0.94 |
| IFE r=2 | 0.017128 | 0.018382 | 0.93 |

Covariate coefficients agree to 6 decimal places. Comparison scripts are in benchmarks/.


Performance notes

Hot paths use vectorised numpy and a randomised truncated SVD for the interactive-FE update when r << min(T, N). Cross-validation and bootstrap replications parallelise over n_jobs via joblib. GPU execution is available through device="gpu" when CuPy is installed. See benchmarks/ for reproducible timing scripts.
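
For readers unfamiliar with randomised truncated SVD, the sketch below shows the generic range-finder approach: project the matrix onto a small random subspace, sharpen it with a few power iterations, and take an exact SVD of the much smaller projected matrix. It illustrates the technique in general. The function name, oversampling, and iteration counts are assumptions and are not taken from pyfector's source.

```python
import numpy as np

def randomized_svd(A, r, n_oversample=10, n_power_iter=2, rng=None):
    """Approximate the top-r SVD of A using a random range finder."""
    rng = np.random.default_rng(rng)
    k = min(r + n_oversample, min(A.shape))
    # Orthonormal basis for the range of A applied to random test vectors.
    Q = np.linalg.qr(A @ rng.normal(size=(A.shape[1], k)))[0]
    for _ in range(n_power_iter):                 # power iterations sharpen the basis
        Q = np.linalg.qr(A.T @ Q)[0]
        Q = np.linalg.qr(A @ Q)[0]
    B = Q.T @ A                                   # small (k, n) projected matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :r], s[:r], Vt[:r]

# Exactly rank-3 test matrix, so a rank-3 approximation should reproduce it.
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 3)) @ rng.normal(size=(3, 40))
U, s, Vt = randomized_svd(A, r=3, rng=1)
max_err = np.abs((U * s) @ Vt - A).max()
print(f"max reconstruction error: {max_err:.2e}")
```

The saving comes from running the dense SVD on a k-row matrix instead of the full one, which pays off when r is much smaller than min(T, N).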


Testing

uv run pytest tests/                    # full suite
uv run pytest tests/test_vs_r.py -v -s  # R validation (requires R + fect)


References

  • Liu, L., Wang, Y., & Xu, Y. (2024). "A Practical Guide to Counterfactual Estimators for Causal Inference with Time-Series Cross-Sectional Data." American Journal of Political Science, 68(1), 160-176.
  • Xu, Y. (2017). "Generalized Synthetic Control Method: Causal Inference with Interactive Fixed Effects Models." Political Analysis, 25(1), 57-76.
  • Athey, S., Bayati, M., Doudchenko, N., Imbens, G., & Khosravi, K. (2021). "Matrix Completion Methods for Causal Panel Data Models." Journal of the American Statistical Association, 116(536), 1716-1730.
  • Bai, J. (2009). "Panel Data Models with Interactive Fixed Effects." Econometrica, 77(4), 1229-1279.

License

MIT
