Front-door CATE/ATE estimation toolkit with debiased learners.

FD-CATE

Front-door CATE/ATE estimation toolkit with paper-parity defaults for debiased front-door learners.

This repository keeps the original research scripts (FDCATE.py, analyze_fars_2000_fd.py) and adds a conventional library-style interface (the fd_cate package) with a stable artifact contract.

Install

python -m pip install -U pip
python -m pip install fd-cate

The default nuisance learner is xgb (XGBoost). The nn learner is also supported; select it with nuisance_learner="nn".

One-Click Quickstart

fdcate demo --outdir ./fdcate-demo

This single command runs:

  • synthetic data generation
  • model fit + artifact contract write
  • optional quick benchmark (enabled by default)

Expected files:

  • ./fdcate-demo/synthetic.csv
  • ./fdcate-demo/fit_out/summary.txt
  • ./fdcate-demo/fit_out/results.json
  • ./fdcate-demo/fit_out/diagnostics.json
  • ./fdcate-demo/fit_out/effects.csv
  • ./fdcate-demo/fit_out/model.pkl
  • ./fdcate-demo/benchmark_quick.json (unless --run-benchmark false)
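To confirm that a demo run produced everything in the list above, a small check along these lines may help. It is only a sketch: the helper name missing_artifacts is hypothetical (it is not part of fd-cate), and the file names are copied from the list above.

```python
from pathlib import Path

# Relative paths copied from the "Expected files" list above.
EXPECTED_FILES = [
    "synthetic.csv",
    "fit_out/summary.txt",
    "fit_out/results.json",
    "fit_out/diagnostics.json",
    "fit_out/effects.csv",
    "fit_out/model.pkl",
    "benchmark_quick.json",  # absent when run with --run-benchmark false
]


def missing_artifacts(outdir, expect_benchmark=True):
    """Return the expected demo files that are not present under outdir.

    Hypothetical helper for illustration; not part of the fd-cate API.
    """
    root = Path(outdir)
    names = [
        name for name in EXPECTED_FILES
        if expect_benchmark or name != "benchmark_quick.json"
    ]
    return [name for name in names if not (root / name).is_file()]


if __name__ == "__main__":
    # An empty list means every expected file was found.
    print(missing_artifacts("./fdcate-demo"))
```

Pass expect_benchmark=False when the demo was run with --run-benchmark false.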

Quickstart (Python API)

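from fd_cate import FDCATE
from FDCATE import simulate_fd_data_md  # legacy research script (FDCATE.py)

# synthetic example: covariates C, treatment X, mediator Z, outcome Y
D = simulate_fd_data_md(n=500, d=10, seed=0)

est = FDCATE(method="fd-dr", nuisance_learner="xgb", random_state=0)
est.fit(D.C, D.Y, t=D.X, m=D.Z)

tau = est.effect(D.C)  # effect estimates evaluated at the covariates D.C
print(est.ate_)        # ATE estimate
print(est.summary())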

Quickstart (CLI)

# generate synthetic csv
fdcate synthetic --n 300 --d 8 --seed 42 --out synthetic.csv

# fit + write standard artifacts
fdcate fit \
  --data synthetic.csv \
  --outcome y --treat t --med m \
  --outdir out/

# diagnostics only
fdcate doctor \
  --data synthetic.csv \
  --outcome y --treat t --med m

Standard artifacts under out/:

  • summary.txt
  • results.json
  • diagnostics.json
  • effects.csv
  • model.pkl
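When the fit step is driven from a Python script rather than a shell, the same command can be assembled as an argument list. The sketch below uses only the flags shown above; the helper names build_fit_command and run_fit are hypothetical and not part of fd-cate.

```python
import subprocess


def build_fit_command(data, outcome, treat, med, outdir):
    """Assemble the `fdcate fit` argument list from the CLI quickstart."""
    return [
        "fdcate", "fit",
        "--data", str(data),
        "--outcome", outcome,
        "--treat", treat,
        "--med", med,
        "--outdir", str(outdir),
    ]


def run_fit(data, outcome="y", treat="t", med="m", outdir="out/"):
    """Run the fit step; raises CalledProcessError on a non-zero exit.

    Assumes the fdcate executable is on PATH.
    """
    cmd = build_fit_command(data, outcome, treat, med, outdir)
    return subprocess.run(cmd, check=True)
```

Passing a list instead of a single shell string avoids quoting problems when paths contain spaces.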

Benchmark (Quick Profile + Golden Regression)

fd-cate includes a deterministic quick benchmark profile for regression checks:

fdcate benchmark --n 120 --d 6 --seed 2026 --nuisance-learner xgb --out results/benchmark_quick.json

Multi-seed profile (recommended for robust comparisons):

fdcate benchmark \
  --profile multiseed \
  --n 120 --d 6 --seed 2026 --n-seeds 20 \
  --nuisance-learner xgb \
  --fd-r-g-solver direct \
  --fd-r-b-learner xgb \
  --out results/benchmark_multiseed.json

Output schema (fdcate.benchmark, schema_version=0) contains:

  • clean RMSE for fd-pi, fd-dr, fd-r
  • weak-overlap RMSE for fd-pi, fd-dr, fd-r
  • aggregate_mean_rmse across the two scenarios
  • with --profile multiseed: per_seed results + summary statistics (mean/std/min/max)
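For readers who want to reproduce the multiseed summary by hand, the statistics listed above can be computed from per-seed RMSE values as sketched below. The numbers are made up, the function names are hypothetical, and two points are assumptions rather than documented behaviour: that std is the population standard deviation, and that aggregate_mean_rmse is a plain average of the clean and weak-overlap RMSE.

```python
from statistics import mean, pstdev

# Hypothetical per-seed RMSE values for one method in one scenario.
per_seed_rmse = [0.25, 0.5, 0.75, 0.5]


def summarize(values):
    """Return mean/std/min/max; population std is an assumption here."""
    return {
        "mean": mean(values),
        "std": pstdev(values),
        "min": min(values),
        "max": max(values),
    }


def aggregate_mean_rmse(clean_rmse, weak_overlap_rmse):
    """Average one method's RMSE over the two scenarios (assumed definition)."""
    return mean([clean_rmse, weak_overlap_rmse])


if __name__ == "__main__":
    print(summarize(per_seed_rmse))
    print(aggregate_mean_rmse(0.25, 0.75))
```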

FD-R benchmarking knobs:

  • --fd-r-g-solver: direct or ratio
  • --fd-r-b-learner: xgb or nn
  • --no-fd-r-swap-average: disable swapped D1/D2 averaging

CI also runs a golden snapshot regression test:

  • tests/test_benchmark_golden.py
  • golden reference file: tests/benchmark_quick_reference.json
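The idea behind a golden snapshot check can be sketched as a tolerance-based comparison of two parsed JSON documents. This is an illustration of the general technique, not the contents of tests/test_benchmark_golden.py; the function name diff_against_golden and the tolerance are assumptions.

```python
import math


def _is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def diff_against_golden(current, golden, rel_tol=1e-6, path=""):
    """Return dotted paths where `current` departs from `golden`.

    Dicts are compared key by key, numbers with a relative tolerance,
    and everything else (strings, lists, booleans, None) by equality.
    """
    here = path or "<root>"
    if isinstance(golden, dict):
        if not isinstance(current, dict):
            return [here]
        problems = []
        for key in sorted(set(golden) | set(current)):
            child = f"{path}.{key}" if path else str(key)
            if key not in golden or key not in current:
                problems.append(child)  # key missing on one side
            else:
                problems.extend(
                    diff_against_golden(current[key], golden[key], rel_tol, child)
                )
        return problems
    if _is_number(golden):
        if _is_number(current) and math.isclose(current, golden, rel_tol=rel_tol):
            return []
        return [here]
    return [] if current == golden else [here]
```

In a test, the two arguments would come from json.load on a fresh benchmark output and on the golden reference file, and the assertion would be that the returned list is empty.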

Live Demo (Toy + Benchmark)

Primary path (CLI one-click):

fdcate demo --outdir /tmp/fdcate_live_demo

Secondary path (legacy helper script):

bash scripts/run_demo_quick.sh

The demo writes:

  • /tmp/fdcate_live_demo/fit_out/summary.txt
  • /tmp/fdcate_live_demo/fit_out/results.json
  • /tmp/fdcate_live_demo/fit_out/diagnostics.json
  • /tmp/fdcate_live_demo/fit_out/effects.csv
  • /tmp/fdcate_live_demo/fit_out/model.pkl
  • /tmp/fdcate_live_demo/benchmark_quick.json

Manual one-liners:

fdcate synthetic --n 120 --d 6 --seed 2026 --out /tmp/fdcate_live_demo/synthetic.csv
fdcate fit --data /tmp/fdcate_live_demo/synthetic.csv --outcome y --treat t --med m --method fd-dr --nuisance-learner xgb --outdir /tmp/fdcate_live_demo/fit_out
fdcate benchmark --n 60 --d 4 --seed 17 --nuisance-learner xgb --out /tmp/fdcate_live_demo/benchmark_quick.json

Example terminal output preview (fdcate demo --outdir /tmp/fdcate_live_demo):

[demo] output directory: /tmp/fdcate_live_demo
[demo] ATE=0.540874
[demo] generated files:
 - /tmp/fdcate_live_demo/synthetic.csv
 - /tmp/fdcate_live_demo/fit_out/summary.txt
 - /tmp/fdcate_live_demo/fit_out/results.json
 - /tmp/fdcate_live_demo/fit_out/diagnostics.json
 - /tmp/fdcate_live_demo/fit_out/effects.csv
 - /tmp/fdcate_live_demo/fit_out/model.pkl
 - /tmp/fdcate_live_demo/benchmark_quick.json
[demo] next: fdcate effect --model /tmp/fdcate_live_demo/fit_out/model.pkl --data /tmp/fdcate_live_demo/synthetic.csv --out /tmp/fdcate_live_demo/effects_from_model.csv

Final benchmark figures (FD-R full-noise setting):

  • Figure: FD-CATE n-sweep at rho=2, d=30 (FD-R full-noise)
  • Figure: FD-CATE rho-sweep at n=2000, d=30 (FD-R full-noise)

Model Compatibility Policy (model.pkl)

A model.pkl can be loaded only when the major.minor version of the installed package matches that of the package that saved it.

  • Example: model saved with 0.1.x can be loaded by 0.1.y.
  • Example: model saved with 0.1.x cannot be loaded by 0.2.x.
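The rule can be expressed as a comparison of the first two version components. The sketch below is illustrative only; can_load and major_minor are hypothetical names, and how fd-cate actually records and checks the saved version is not shown here.

```python
def major_minor(version):
    """Extract (major, minor) from a dotted version string such as "0.1.1".

    Assumes plain numeric components in the first two positions.
    """
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def can_load(saved_version, running_version):
    """True when both versions share the same major.minor prefix."""
    return major_minor(saved_version) == major_minor(running_version)


if __name__ == "__main__":
    print(can_load("0.1.0", "0.1.1"))  # same 0.1 series
    print(can_load("0.1.1", "0.2.0"))  # different minor version
```

For version strings with pre-release or local suffixes, a proper PEP 440 parser would be a safer choice than splitting on dots.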

Scope (v0.1)

Supported:

  • binary treatment T ∈ {0,1}
  • binary mediator M ∈ {0,1}
  • numeric covariates
  • continuous or binary outcome (both handled as regression targets)

Not supported:

  • non-binary T/M
  • automatic categorical encoding pipelines
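Because non-binary T/M and categorical columns are out of scope, it can be useful to validate and encode inputs before fitting. The helpers below are a plain-Python sketch under that assumption; check_binary and one_hot are hypothetical names, not fd-cate functions, and the library may perform its own validation.

```python
def check_binary(values, name):
    """Raise ValueError unless every entry equals 0 or 1."""
    bad = sorted({repr(v) for v in values if v not in (0, 1)})
    if bad:
        raise ValueError(f"{name} must be binary (0 or 1); found {bad}")


def one_hot(column, drop_first=True):
    """Encode a categorical column as 0/1 indicator columns.

    Returns a dict mapping "is_<level>" to a list of 0/1 values.
    With drop_first=True the first level (in sorted order) is the baseline.
    """
    levels = sorted(set(column))
    kept = levels[1:] if drop_first else levels
    return {
        f"is_{level}": [1 if v == level else 0 for v in column]
        for level in kept
    }


if __name__ == "__main__":
    check_binary([0, 1, 1, 0], "treatment")
    print(one_hot(["urban", "rural", "urban"]))
```

The indicator columns can then be added to the numeric covariates before calling fit.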

Legacy Reproduction Scripts

The original paper-focused scripts are preserved:

  • python FDCATE.py --help
  • python analyze_fars_2000_fd.py --help

Development

python -m pip install -e .[dev]
python -m pytest -q
python -m build

Tests marked slow are kept out of the fast PR gates and run nightly or manually. To run them:

python -m pytest -q -m "slow"

Release (v0.1.0)

bash scripts/release_preflight.sh

Detailed checklist: RELEASE_RUNBOOK.md

Troubleshooting

  1. fdcate: command not found
  • Re-open your shell after installation, or run with module form:
    • python -m fd_cate --help
  2. XGBoost import/runtime issue
  • Reinstall in a clean environment:
    • python -m pip install -U pip
    • python -m pip install --force-reinstall fd-cate
  3. Permission or write-path errors
  • Use a writable output directory explicitly:
    • fdcate demo --outdir /tmp/fdcate-demo
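For scripts that must work whether or not the fdcate entry point is on PATH, the fallback in the first item can be automated. This is a sketch; fdcate_command is a hypothetical helper name, and it relies only on the module form python -m fd_cate mentioned above.

```python
import shutil
import sys


def fdcate_command(executable="fdcate"):
    """Return an argv prefix for invoking the CLI.

    Uses the console script when it is on PATH, otherwise falls back to
    running the package as a module with the current interpreter.
    """
    found = shutil.which(executable)
    if found:
        return [found]
    return [sys.executable, "-m", "fd_cate"]


if __name__ == "__main__":
    # Append arguments such as ["--help"] before passing to subprocess.run.
    print(fdcate_command() + ["--help"])
```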
