Front-door CATE/ATE estimation toolkit with debiased learners.

FD-CATE

Front-door CATE/ATE estimation toolkit with paper-parity defaults for debiased front-door learners.

This repository keeps the original research scripts (FDCATE.py, analyze_fars_2000_fd.py) and adds a standardized library interface (fd_cate) with a stable artifact contract.

Install

python -m pip install -U pip
python -m pip install fd-cate

The default nuisance learner is xgb (XGBoost). The nn learner is also supported via nuisance_learner="nn".

One-Click Quickstart

fdcate demo --outdir ./fdcate-demo

This single command runs:

  • synthetic data generation
  • model fit + artifact contract write
  • optional quick benchmark (enabled by default)

Expected files:

  • ./fdcate-demo/synthetic.csv
  • ./fdcate-demo/fit_out/summary.txt
  • ./fdcate-demo/fit_out/results.json
  • ./fdcate-demo/fit_out/diagnostics.json
  • ./fdcate-demo/fit_out/effects.csv
  • ./fdcate-demo/fit_out/model.pkl
  • ./fdcate-demo/benchmark_quick.json (unless --run-benchmark false)
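To confirm that a demo run produced every file listed above, you can compare the output directory against that list. This is a minimal sketch. The helper names FIT_ARTIFACTS, expected_demo_files and missing_demo_files are hypothetical and are not part of fd_cate. The file layout is copied from the list above.

```python
from pathlib import Path

# Files the README lists under <outdir>/fit_out/ after `fdcate demo`.
FIT_ARTIFACTS = ["summary.txt", "results.json", "diagnostics.json", "effects.csv", "model.pkl"]


def expected_demo_files(outdir, run_benchmark=True):
    """Return every path the demo is documented to write under outdir."""
    root = Path(outdir)
    files = [root / "synthetic.csv"]
    files += [root / "fit_out" / name for name in FIT_ARTIFACTS]
    if run_benchmark:  # not written when the demo runs with --run-benchmark false
        files.append(root / "benchmark_quick.json")
    return files


def missing_demo_files(outdir, run_benchmark=True):
    """List documented demo outputs that do not exist (an empty list means complete)."""
    return [str(p) for p in expected_demo_files(outdir, run_benchmark) if not p.is_file()]
```

For example, after running fdcate demo --outdir ./fdcate-demo, missing_demo_files("./fdcate-demo") should return an empty list.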

Quickstart (Python API)

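from fd_cate import FDCATE
from FDCATE import simulate_fd_data_md  # helper defined in the original research script FDCATE.py

# synthetic example
D = simulate_fd_data_md(n=500, d=10, seed=0)

est = FDCATE(method="fd-dr", nuisance_learner="xgb", random_state=0)
# covariates D.C, outcome D.Y, treatment t=D.X, mediator m=D.Z
est.fit(D.C, D.Y, t=D.X, m=D.Z)

tau = est.effect(D.C)  # per-row CATE estimates at covariates D.C
print(est.ate_)        # ATE estimate
print(est.summary())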
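As background on the quantity behind est.ate_, the classical front-door adjustment for binary T and M without covariates identifies E[Y | do(T=t)] = sum over m of P(M=m | T=t) * sum over t' of E[Y | M=m, T=t'] * P(T=t'). The sketch below plugs sample frequencies into that formula on a hypothetical toy simulation with a hidden confounder U. It is not the package's debiased fd-dr learner and it ignores covariates. The functions simulate_front_door and front_door_ate are illustrative names only.

```python
import random


def simulate_front_door(n, seed=0):
    """Toy data: hidden U confounds T and Y; T affects Y only through M.

    True ATE = 2 * (P(M=1|T=1) - P(M=1|T=0)) = 2 * (0.8 - 0.2) = 1.2.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        u = int(rng.random() < 0.5)            # unobserved confounder
        t = int(rng.random() < 0.2 + 0.6 * u)  # treatment depends on U
        m = int(rng.random() < 0.2 + 0.6 * t)  # mediator depends only on T
        y = 2.0 * m + 1.5 * u + rng.gauss(0.0, 1.0)
        rows.append((t, m, y))                 # U is dropped: it is not observed
    return rows


def front_door_ate(rows):
    """Plug-in front-door ATE for binary T and M (assumes every (t, m) cell is non-empty)."""
    n = len(rows)
    p_t = {t: sum(1 for r in rows if r[0] == t) / n for t in (0, 1)}

    def p_m_given_t(m, t):
        in_t = [r for r in rows if r[0] == t]
        return sum(1 for r in in_t if r[1] == m) / len(in_t)

    def mean_y(m, t):
        ys = [r[2] for r in rows if r[0] == t and r[1] == m]
        return sum(ys) / len(ys)

    def mean_do(t):
        # sum_m P(m | t) * sum_t' E[Y | m, t'] * P(t')
        return sum(
            p_m_given_t(m, t) * sum(mean_y(m, tp) * p_t[tp] for tp in (0, 1))
            for m in (0, 1)
        )

    return mean_do(1) - mean_do(0)


rows = simulate_front_door(20_000, seed=0)
print(round(front_door_ate(rows), 3))  # close to the true value 1.2
```

In this simulation the naive difference in means is biased upward by the hidden confounder, to roughly 2.1. The front-door plug-in recovers roughly 1.2.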

Quickstart (CLI)

# generate synthetic csv
fdcate synthetic --n 300 --d 8 --seed 42 --out synthetic.csv

# fit + write standard artifacts
fdcate fit \
  --data synthetic.csv \
  --outcome y --treat t --med m \
  --outdir out/

# diagnostics only
fdcate doctor \
  --data synthetic.csv \
  --outcome y --treat t --med m

Standard artifacts under out/:

  • summary.txt
  • results.json
  • diagnostics.json
  • effects.csv
  • model.pkl
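You can also drive the same quickstart from a Python script. The sketch below builds argument lists using only flags that appear in this README and runs them with subprocess. The helper names fit_command, doctor_command and run are hypothetical. The sketch assumes fdcate is on PATH. The y/t/m defaults mirror the synthetic CSV columns used above. The fd-dr/xgb defaults are this helper's own choice and are always passed explicitly.

```python
import subprocess


def fit_command(data, outdir, outcome="y", treat="t", med="m",
                method="fd-dr", nuisance_learner="xgb"):
    """Build the `fdcate fit` argument list (flags as shown in this README)."""
    return [
        "fdcate", "fit",
        "--data", str(data),
        "--outcome", outcome, "--treat", treat, "--med", med,
        "--method", method,
        "--nuisance-learner", nuisance_learner,
        "--outdir", str(outdir),
    ]


def doctor_command(data, outcome="y", treat="t", med="m"):
    """Build the diagnostics-only `fdcate doctor` argument list."""
    return ["fdcate", "doctor", "--data", str(data),
            "--outcome", outcome, "--treat", treat, "--med", med]


def run(cmd):
    """Run one step; raises subprocess.CalledProcessError on a non-zero exit."""
    return subprocess.run(cmd, check=True)
```

For example, run(doctor_command("synthetic.csv")) followed by run(fit_command("synthetic.csv", "out/")) mirrors the shell commands above. Because check=True, execution stops at the first failing step.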

Benchmark (Quick Profile + Golden Regression)

fd-cate now includes a deterministic quick benchmark profile for regression checks.

fdcate benchmark --n 120 --d 6 --seed 2026 --nuisance-learner xgb --out results/benchmark_quick.json

Multi-seed profile (recommended for robust comparisons):

fdcate benchmark \
  --profile multiseed \
  --n 120 --d 6 --seed 2026 --n-seeds 20 \
  --nuisance-learner xgb \
  --fd-r-g-solver direct \
  --fd-r-b-learner xgb \
  --out results/benchmark_multiseed.json

Output schema (fdcate.benchmark, schema_version=0) contains:

  • clean RMSE for fd-pi, fd-dr, fd-r
  • weak-overlap RMSE for fd-pi, fd-dr, fd-r
  • aggregate_mean_rmse across the two scenarios
  • with --profile multiseed: per_seed results + summary statistics (mean/std/min/max)
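For sanity checks, the schema's quantities can be recomputed from raw numbers. In this sketch, RMSE is the usual root-mean-squared error of estimated versus true effects. Two choices here are assumptions, because the README does not pin down either definition:

  • aggregate_mean_rmse is treated as the per-method mean of the clean and weak-overlap RMSEs.
  • std is the sample standard deviation.

All function names are hypothetical.

```python
import math
import statistics


def rmse(estimates, truth):
    """Root-mean-squared error between estimated and true effects."""
    if len(estimates) != len(truth) or not estimates:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimates, truth)) / len(truth))


def aggregate_mean_rmse(clean, weak_overlap):
    """Per-method mean of clean and weak-overlap RMSE (assumed definition)."""
    return {method: (clean[method] + weak_overlap[method]) / 2 for method in clean}


def seed_summary(values):
    """mean/std/min/max across per-seed values (std = sample std, an assumption)."""
    return {
        "mean": statistics.fmean(values),
        "std": statistics.stdev(values) if len(values) > 1 else 0.0,
        "min": min(values),
        "max": max(values),
    }
```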

FD-R benchmarking knobs:

  • --fd-r-g-solver: direct or ratio
  • --fd-r-b-learner: xgb or nn
  • --no-fd-r-swap-average: disable swapped D1/D2 averaging
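The swap-averaging knob suggests a common two-fold cross-fitting pattern:

  • estimate with D1 and D2 in one pair of roles
  • swap the roles
  • average the two estimates

The sketch below shows that generic pattern, assuming D1/D2 are the two halves of a sample split. It is not taken from the FD-R implementation. Both swap_averaged_estimate and the estimate callable are hypothetical.

```python
def swap_averaged_estimate(estimate, d1, d2, average_swapped=True):
    """Combine estimates from both role assignments of a two-way split.

    estimate(fit_part, eval_part) is a hypothetical callable that fits nuisance
    models on fit_part and returns a scalar estimate computed on eval_part.
    """
    first = estimate(d1, d2)  # fit on D1, evaluate on D2
    if not average_swapped:  # loosely analogous to --no-fd-r-swap-average
        return first
    second = estimate(d2, d1)  # roles swapped: fit on D2, evaluate on D1
    return (first + second) / 2.0
```

Averaging both directions uses every observation for evaluation once, which typically reduces the variance that comes from a single arbitrary split.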

CI also runs a golden snapshot regression test:

  • tests/test_benchmark_golden.py
  • golden reference file: tests/benchmark_quick_reference.json
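A golden snapshot test of this kind typically compares a fresh benchmark result with a stored reference and allows small floating-point tolerances. The sketch below is an illustrative comparison routine, not the contents of tests/test_benchmark_golden.py. The function names, the tolerances, and the assumption that the reference is plain JSON are all hypothetical. Keys that appear only in the new result are ignored.

```python
import json
import math


def _is_number(x):
    return isinstance(x, (int, float)) and not isinstance(x, bool)


def compare_to_golden(result, reference, rel_tol=1e-6, abs_tol=1e-9, path="$"):
    """Return human-readable mismatches between a result and a golden reference."""
    if isinstance(reference, dict):
        if not isinstance(result, dict):
            return [f"{path}: expected an object"]
        problems = []
        for key, ref_val in reference.items():
            if key not in result:
                problems.append(f"{path}.{key}: missing")
            else:
                problems += compare_to_golden(result[key], ref_val, rel_tol, abs_tol, f"{path}.{key}")
        return problems
    if isinstance(reference, list):
        if not isinstance(result, list) or len(result) != len(reference):
            return [f"{path}: expected a list of length {len(reference)}"]
        problems = []
        for i, (got, ref_val) in enumerate(zip(result, reference)):
            problems += compare_to_golden(got, ref_val, rel_tol, abs_tol, f"{path}[{i}]")
        return problems
    if _is_number(reference):
        # Numbers match within tolerance to absorb tiny floating-point drift.
        if _is_number(result) and math.isclose(result, reference, rel_tol=rel_tol, abs_tol=abs_tol):
            return []
        return [f"{path}: {result!r} != {reference!r}"]
    # Strings, booleans and None must match exactly.
    return [] if result == reference else [f"{path}: {result!r} != {reference!r}"]


def check_against_file(result, reference_path):
    """Load a JSON golden file and compare result against it."""
    with open(reference_path, encoding="utf-8") as fh:
        return compare_to_golden(result, json.load(fh))
```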

Live Demo (Toy + Benchmark)

Primary path (CLI one-click):

fdcate demo --outdir /tmp/fdcate_live_demo

Secondary path (legacy helper script):

bash scripts/run_demo_quick.sh

The demo writes:

  • /tmp/fdcate_live_demo/synthetic.csv
  • /tmp/fdcate_live_demo/fit_out/summary.txt
  • /tmp/fdcate_live_demo/fit_out/results.json
  • /tmp/fdcate_live_demo/fit_out/diagnostics.json
  • /tmp/fdcate_live_demo/fit_out/effects.csv
  • /tmp/fdcate_live_demo/fit_out/model.pkl
  • /tmp/fdcate_live_demo/benchmark_quick.json

Manual one-liners:

fdcate synthetic --n 120 --d 6 --seed 2026 --out /tmp/fdcate_live_demo/synthetic.csv
fdcate fit --data /tmp/fdcate_live_demo/synthetic.csv --outcome y --treat t --med m --method fd-dr --nuisance-learner xgb --outdir /tmp/fdcate_live_demo/fit_out
fdcate benchmark --n 60 --d 4 --seed 17 --nuisance-learner xgb --out /tmp/fdcate_live_demo/benchmark_quick.json

Example terminal output preview (fdcate demo --outdir /tmp/fdcate_live_demo):

[demo] output directory: /tmp/fdcate_live_demo
[demo] ATE=0.540874
[demo] generated files:
 - /tmp/fdcate_live_demo/synthetic.csv
 - /tmp/fdcate_live_demo/fit_out/summary.txt
 - /tmp/fdcate_live_demo/fit_out/results.json
 - /tmp/fdcate_live_demo/fit_out/diagnostics.json
 - /tmp/fdcate_live_demo/fit_out/effects.csv
 - /tmp/fdcate_live_demo/fit_out/model.pkl
 - /tmp/fdcate_live_demo/benchmark_quick.json
[demo] next: fdcate effect --model /tmp/fdcate_live_demo/fit_out/model.pkl --data /tmp/fdcate_live_demo/synthetic.csv --out /tmp/fdcate_live_demo/effects_from_model.csv

Final benchmark figures (FD-R full-noise setting):

  • Figure: FD-CATE n-sweep at rho=2, d=30
  • Figure: FD-CATE rho-sweep at n=2000, d=30

Model Compatibility Policy (model.pkl)

Loading model.pkl is allowed only when the major.minor version of the package that saved it matches the installed version.

  • Example: model saved with 0.1.x can be loaded by 0.1.y.
  • Example: model saved with 0.1.x cannot be loaded by 0.2.x.
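The rule reduces to comparing the first two version components. This sketch assumes plain numeric versions such as 0.1.3. It does not show how fd_cate records the saving version inside model.pkl. Both major_minor and can_load are hypothetical names.

```python
def major_minor(version):
    """Extract (major, minor) from a plain numeric version such as '0.1.3'."""
    parts = version.split(".")
    if len(parts) < 2:
        raise ValueError(f"not a major.minor[.patch] version: {version!r}")
    return int(parts[0]), int(parts[1])


def can_load(saved_with, running):
    """True when a model saved under saved_with may be loaded by version running."""
    return major_minor(saved_with) == major_minor(running)
```

You can read the installed version with importlib.metadata.version("fd-cate") on Python 3.8+ and pass it as running.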

Scope (v0.1)

Supported:

  • binary treatment T ∈ {0,1}
  • binary mediator M ∈ {0,1}
  • numeric covariates
  • continuous or binary outcome (both fit as regression targets)

Not supported:

  • non-binary T/M
  • automatic categorical encoding pipelines
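v0.1 supports only binary T/M and numeric covariates, and it does no categorical encoding for you. A pre-flight check can therefore catch out-of-scope data before fitting. This is a minimal sketch under two assumptions:

  • Every column other than the treatment and mediator (that is, the outcome and all covariates) must parse as a number.
  • Column names default to the y/t/m convention used above.

The function check_scope is hypothetical. Categorical covariates must be encoded beforehand, for example as one-hot numeric columns.

```python
import csv


def check_scope(csv_path, outcome="y", treat="t", med="m"):
    """Return v0.1 scope problems found in a CSV file (an empty list means OK)."""
    problems = []
    with open(csv_path, newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh)
        columns = reader.fieldnames or []
        for col in (outcome, treat, med):
            if col not in columns:
                problems.append(f"missing column {col!r}")
        if problems:
            return problems
        for lineno, row in enumerate(reader, start=2):  # line 1 is the header
            for col in columns:
                raw = (row.get(col) or "").strip()
                if col in (treat, med):
                    # Binary treatment and mediator only: values must be 0 or 1.
                    if raw not in {"0", "1", "0.0", "1.0"}:
                        problems.append(f"line {lineno}: {col}={raw!r} is not binary")
                    continue
                # Outcome and covariates (assumed: every other column) must be numeric.
                try:
                    float(raw)
                except ValueError:
                    problems.append(f"line {lineno}: {col}={raw!r} is not numeric")
    return problems
```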

Legacy Reproduction Scripts

The original paper-focused scripts are preserved:

  • python FDCATE.py --help
  • python analyze_fars_2000_fd.py --help

Development

python -m pip install -e ".[dev]"
python -m pytest -q
python -m build

Slow tests run nightly or on demand and are kept out of the fast PR gates. Run them with:

python -m pytest -q -m "slow"

Release (v0.1.0)

bash scripts/release_preflight.sh

Detailed checklist: RELEASE_RUNBOOK.md

Troubleshooting

  1. fdcate: command not found
  • Re-open your shell after installation, or use the module form:
    • python -m fd_cate --help
  2. XGBoost import/runtime issue
  • Reinstall in a clean environment:
    • python -m pip install -U pip
    • python -m pip install --force-reinstall fd-cate
  3. Permission or write-path errors
  • Pass a writable output directory explicitly:
    • fdcate demo --outdir /tmp/fdcate-demo
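Scripts that call the CLI can apply the first troubleshooting entry automatically. They can prefer the fdcate console script and fall back to python -m fd_cate when fdcate is not on PATH. This is a minimal sketch, and fdcate_launcher is a hypothetical helper name.

```python
import shutil
import sys


def fdcate_launcher():
    """Prefer the `fdcate` console script; fall back to `python -m fd_cate`."""
    exe = shutil.which("fdcate")
    if exe is not None:
        return [exe]
    # Use the current interpreter so the same environment's fd_cate is found.
    return [sys.executable, "-m", "fd_cate"]
```

For example, subprocess.run(fdcate_launcher() + ["--help"], check=True) works whether or not the console script is on PATH.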

Download files

Source Distribution

fd_cate-0.1.0.tar.gz (41.0 kB)

Built Distribution

fd_cate-0.1.0-py3-none-any.whl (39.7 kB)

File details

Details for the file fd_cate-0.1.0.tar.gz.

File metadata

  • Download URL: fd_cate-0.1.0.tar.gz
  • Size: 41.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for fd_cate-0.1.0.tar.gz
  • SHA256: 4cc5d5caeba46822e8e970b22e88135fd695c8f7d09e6607fb894744b9bcadd8
  • MD5: e465c0329bc75a20bbea78bc589c0c47
  • BLAKE2b-256: 3697e1b6d0a552955ad08fe0d62b59c4b2ac8a5e0f9b376ef25f3731b704972d

Provenance

The following attestation bundles were made for fd_cate-0.1.0.tar.gz:

Publisher: release.yml on yonghanjung/FD-CATE

File details

Details for the file fd_cate-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: fd_cate-0.1.0-py3-none-any.whl
  • Size: 39.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for fd_cate-0.1.0-py3-none-any.whl
  • SHA256: b8473cd2a662b3ea5f99a2d530d6891e8b9bb44bcbca611c496116fde707d725
  • MD5: b6e742341837ae55ce889e4721c780b1
  • BLAKE2b-256: 42091c576a6f63a6ee2d9e90db4a2d0ad0610db3b864ed7b9b358d11e5f4d999

Provenance

The following attestation bundles were made for fd_cate-0.1.0-py3-none-any.whl:

Publisher: release.yml on yonghanjung/FD-CATE
