Shared factor-model + analysis-pipeline framework with first-class jellycell integration
factor-factory
A domain-agnostic factor-model + analysis-pipeline framework with a Protocol-based pluggable engine pattern, first-class jellycell integration, and the only production-grade Python implementations of Synthetic Difference-in-Differences (Arkhangelsky et al. 2021, AER) and the Four-way Mediation Decomposition (VanderWeele 2014, Epidemiology).
Authored by Blaise Albis-Burdige.
The same Panel shape hosts NYC-civic data, finance event studies, clinical trials, agronomic dose-response, chemistry assays, climate anomaly studies, education-intervention evaluations, energy-meter data, marketing A/B tests, macroeconomic country panels, ecological biodiversity surveys, and social-network diffusion cascades. Add a new domain by writing extractors; add a new method by writing a ~150-LOC engine adapter that fits the Protocol.
Install
# Default install — tidy layer + diagnostics + jellycell (no engines)
pip install factor-factory
# With specific engine families
pip install factor-factory[did,survival,event-study]
# Everything currently shipping
pip install factor-factory[all]
Supports Python 3.12+. Dependency manager of choice is uv; pip works the same way.
Quick start
Scaffold a showcase, run it, render tearsheets:
python -m factor_factory scaffold my-showcase
cd my-showcase
python notebooks/01_load.py
The scaffolded notebook builds a synthetic panel, runs a TWFE DiD via factor_factory.engines.did.estimate, saves a parallel-trends figure, and regenerates all five canonical manuscripts (METHODOLOGY.md, DIAGNOSTICS_CHECKLIST.md, FINDINGS.md, MANUSCRIPT.md, AUDIT.md).
The canonical pattern inside a notebook:
from datetime import date
from factor_factory.tidy import Panel, TreatmentEvent
from factor_factory.engines.did import estimate as did_estimate
panel = Panel.from_records(
    records,
    dimension="community_district",
    freq="ME",
    treatment_events=(
        TreatmentEvent(
            name="rat_pilot",
            treated_units=("MN-01", "MN-02"),
            treatment_date=date(2024, 6, 1),
            dimension="community_district",
        ),
    ),
    outcome_col="complaint_count",
)
# Multi-engine DiD in one call — TWFE + Callaway-Sant'Anna side-by-side
results = did_estimate(panel, methods=("twfe", "cs"), cluster="unit_id")
print(results.summary_table())
See the getting-started guide for cross-domain examples (finance event study, multi-arm RCT, agronomic dose-response, chemistry IC₅₀ assay, etc.).
Architecture
raw records
    ↓ tidying       factor_factory.tidy         Panel + TreatmentEvent + Provenance + RecordView
tidied panel
    ↓ diagnostics   factor_factory.diagnostics  SMD, parallel-trends, residuals, balance
diagnostic-annotated panel
    ↓ modeling      factor_factory.engines      17 engine families (see below)
modeling results
    ↓ reporting     factor_factory.jellycell    5 tearsheet renderers + scaffold CLI
                    factor_factory.reporting    Quarto (.qmd) alternative
Every engine family follows the same shape: a frozen Result dataclass + an Engine Protocol + a registry-backed estimate() dispatcher. Adapters wrap external packages (linearmodels, differences, lifelines, rdrobust, pysyncon, econml, DoubleML, ruptures, sktime, pyfixest, esda, tick, ndlib, …) or roll their own math when no canonical package exists.
See the design contracts for the data contract and the reference architecture page for the full engine-family contract.
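The engine-family shape described above (frozen Result dataclass, Engine Protocol, registry-backed `estimate()` dispatcher) can be illustrated with a minimal, standard-library-only sketch. Every name here (`MeanDiffResult`, `Engine`, `register`, `MeanDiffEngine`, and this toy `estimate`) is hypothetical and chosen for illustration; none of it is the actual factor-factory API, and the real contracts live in the reference pages.

```python
from dataclasses import dataclass
from typing import Callable, Protocol, Sequence


@dataclass(frozen=True)
class MeanDiffResult:
    """Immutable result record (hypothetical, not the library's Result class)."""
    method: str
    estimate: float
    n_treated: int
    n_control: int


class Engine(Protocol):
    """Structural interface an adapter must satisfy (hypothetical)."""
    name: str

    def fit(self, treated: Sequence[float], control: Sequence[float]) -> MeanDiffResult: ...


# Registry mapping a method name to a zero-argument engine factory.
_REGISTRY: dict[str, Callable[[], Engine]] = {}


def register(name: str):
    """Class decorator that adds an engine factory to the registry."""
    def decorator(factory):
        _REGISTRY[name] = factory
        return factory
    return decorator


@register("mean_diff")
class MeanDiffEngine:
    """Toy adapter: difference in group means."""
    name = "mean_diff"

    def fit(self, treated, control):
        est = sum(treated) / len(treated) - sum(control) / len(control)
        return MeanDiffResult(self.name, est, len(treated), len(control))


def estimate(treated, control, methods=("mean_diff",)):
    """Dispatch to each requested engine and collect results by method name."""
    results = {}
    for method in methods:
        if method not in _REGISTRY:
            raise KeyError(f"unknown method {method!r}; registered: {sorted(_REGISTRY)}")
        results[method] = _REGISTRY[method]().fit(treated, control)
    return results


results = estimate([3.0, 5.0], [1.0, 2.0])
print(results["mean_diff"])
```

Because `Engine` is a `Protocol`, an adapter only has to match the method signatures; it does not need to inherit from a base class, which is what keeps a new adapter small.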
Shipping engine families
Install the extras you need; an adapter whose extra is not installed fails fast with an ImportError that points at the right pip install factor-factory[<family>].
| Family | Adapters | Extra | Canonical citation |
|---|---|---|---|
| DiD | twfe, callaway_santanna, sun_abraham, borusyak_jaravel_spiess | [did] | Goodman-Bacon 2021 / Callaway-Sant'Anna 2021 / Sun-Abraham 2021 / Borusyak et al. 2024 |
| Survival | kaplan_meier, cox_ph (+ strata=) | [survival] | Kaplan-Meier 1958 / Cox 1972 |
| Event Study | market_adjusted, fama_french (FF3/FF5/Carhart-4) | [event-study] | MacKinlay 1997 / Fama-French 1993 & 2015 |
| Synthetic DiD | sdid (jackknife + placebo inference) | (built-in) | Arkhangelsky et al. 2021 (AER); Python-ecosystem gap closed |
| Mediation | four_way (CDE / INTref / INTmed / PIE + sensitivity()) | (built-in) | VanderWeele 2014 (Epidemiology); Python-ecosystem gap closed |
| RDD | rd_robust (sharp + fuzzy) | [rdd] | Calonico-Cattaneo-Titiunik 2014 |
| SCM | pysyncon, augmented, matrix_completion | [scm] | Abadie et al. 2010 / Ben-Michael et al. 2021 / Athey et al. 2021 |
| Heterogeneous TE | causal_forest, bcf | [het-te] | Wager-Athey 2018 / Hahn-Murray-Carvalho 2020 |
| DoubleML | plr | [dml] | Chernozhukov et al. 2018 |
| Changepoint | ruptures (Pelt / BinSeg / Window) | [changepoint] | Truong-Oudre-Vayatis 2020 |
| STL | sktime_stl | [stl] | Cleveland et al. 1990 |
| Panel regression | pyfixest (HDFE) | [panel-reg] | Correia 2016 |
| Spatial | morans_i | [spatial] | Moran 1950 / Anselin 1995 |
| Inequality | theil_t (+ between/within decomposition) | (built-in) | Theil 1967 |
| Reporting bias | latent_em (two-class EM) | (built-in) | Dempster-Laird-Rubin 1977 |
| Hawkes | tick | [hawkes] | Hawkes 1971 / Bacry et al. 2013 |
| Climate | mann_kendall (+ Sen's slope) | (built-in) | Mann 1945 / Kendall 1948 |
| Diffusion | ndlib_sir | [diffusion] | - |
17 engine families / 30+ adapters. Use the /engine-status Claude Code slash-command (or inspect factor_factory.engines.<family>.registry) to see the live state.
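The fail-fast behaviour for missing extras mentioned above can be sketched with a small lazy-import helper. This is only an illustration of the pattern, assuming the error message should name the extra to install; `require_extra` is a hypothetical helper name, not a function exported by factor-factory.

```python
import importlib
from types import ModuleType


def require_extra(module_name: str, extra: str, package: str = "factor-factory") -> ModuleType:
    """Import an optional dependency or raise an ImportError naming the extra.

    Hypothetical helper: the module is imported only when an adapter is
    actually used, so the default install stays lightweight.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name!r} is required for this adapter. "
            f"Install it with: pip install {package}[{extra}]"
        ) from exc


# A module that is present imports normally.
json_module = require_extra("json", extra="did")

# A missing module produces an actionable message instead of a bare traceback.
try:
    require_extra("no_such_module_for_demo", extra="survival")
except ImportError as err:
    print(err)
```

Raising at call time rather than at package import time means users only pay for the engine families they actually run.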
Python-ecosystem gaps closed
Two methods that the Python ecosystem lacked entirely: each has a canonical R package but no maintained Python equivalent. Both ship as first-class engines, with the canonical paper and reference R package linked in the engine docstring.
engines.sdid — Synthetic Difference-in-Differences
Arkhangelsky, D., Athey, S., Hirshberg, D. A., Imbens, G. W., & Wager, S. (2021). Synthetic Difference-in-Differences. American Economic Review, 111(12), 4088–4118. doi:10.1257/aer.20190159
Reference R implementation: synthdid
SDID combines unit weights (synthetic-control style) with time weights and a weighted DiD estimator. It's a 2021 advance that addresses the parallel-trends fragility of vanilla DiD when units have heterogeneous trends. The R synthdid package is canonical; partial Python ports like pysdid are lightly maintained.
Our adapter:
- Solves the unit- and time-weights QPs via scipy.optimize.minimize (SLSQP), no cvxpy dependency
- Uses the regularization ζ = (N_tr · T_post)^(1/4) · σ̂ from AER §3.3
- Computes the closed-form weighted-DiD ATT for binary block treatment
- Supports both jackknife (AER §3.4 default) and placebo inference (preferred for single-treated-unit panels)
- Returns unit weights + time weights so analysts can interrogate the synthetic control
Validated against a known-ATT=4.0 fixture: recovers ATT=4.535 (SE=0.245).
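Two of the steps listed above, the regularization scale ζ and the closed-form weighted-DiD ATT, can be sketched in plain Python once the unit and time weights are given. This is a simplified illustration under stated assumptions, not the adapter's code: the function names are hypothetical, the weights are supplied by hand rather than solved for, post-treatment periods are averaged uniformly, and σ̂ is taken as the population standard deviation of first differences of control outcomes over the pre-period.

```python
import math
from typing import Sequence

Matrix = Sequence[Sequence[float]]  # rows = units, columns = time periods


def noise_scale(y_control_pre: Matrix) -> float:
    """Std. dev. of first differences of control outcomes in the pre-period."""
    diffs = [row[t + 1] - row[t] for row in y_control_pre for t in range(len(row) - 1)]
    mean = sum(diffs) / len(diffs)
    return math.sqrt(sum((d - mean) ** 2 for d in diffs) / len(diffs))


def zeta(n_treated: int, t_post: int, sigma_hat: float) -> float:
    """Regularization scale: (N_tr * T_post)^(1/4) * sigma_hat."""
    return (n_treated * t_post) ** 0.25 * sigma_hat


def weighted_did(
    y_control: Matrix,
    y_treated: Matrix,
    t_pre: int,
    unit_weights: Sequence[float],
    time_weights: Sequence[float],
) -> float:
    """Weighted double difference for a binary block treatment.

    unit_weights (one per control unit) and time_weights (one per pre-period)
    are each assumed to be non-negative and to sum to 1.
    """
    def gap(row: Sequence[float]) -> float:
        post = row[t_pre:]
        post_mean = sum(post) / len(post)
        pre_weighted = sum(w * y for w, y in zip(time_weights, row[:t_pre]))
        return post_mean - pre_weighted

    treated_gap = sum(gap(row) for row in y_treated) / len(y_treated)
    control_gap = sum(w * gap(row) for w, row in zip(unit_weights, y_control))
    return treated_gap - control_gap


# Toy panel: unit effect + time effect, plus an effect of 4.0 after treatment.
time_effects = [0.0, 0.5, 1.5, 2.0]          # two pre-periods, two post-periods
controls = [[a + b for b in time_effects] for a in (1.0, 2.0)]
treated = [[3.0 + b + (4.0 if t >= 2 else 0.0) for t, b in enumerate(time_effects)]]

att = weighted_did(controls, treated, t_pre=2,
                   unit_weights=[0.3, 0.7], time_weights=[0.25, 0.75])
print(round(att, 6))
```

With purely additive unit and time effects, any weights that sum to 1 recover the effect exactly; the weights matter when control units or pre-periods differ in how well they track the treated units.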
engines.mediation.FourWayMediationEngine — VanderWeele's four-way decomposition
VanderWeele, T. J. (2014). A unification of mediation and interaction: A four-way decomposition. Epidemiology, 25(5), 749–761. doi:10.1097/EDE.0000000000000121
Reference R implementation: CMAverse
Decomposes a treatment's total effect into:
- CDE (Controlled Direct Effect)
- INTref (Reference Interaction)
- INTmed (Mediated Interaction)
- PIE (Pure Indirect Effect)
statsmodels.stats.mediation only provides the simpler Imai-Keele-Tingley two-component decomposition (NDE / NIE). The mediation package on PyPI is stale. Our adapter ports the linear-linear case from the Epidemiology paper directly with bootstrap inference (1000 resamples by default), and adds an unobserved-confounding sensitivity analysis (.sensitivity(rho_range, n_points)) ported from CMAverse's rho-test.
Validated against a fixture with known components; the adapter recovers all four within 1 SE:
| Component | True | Estimated | SE |
|---|---|---|---|
| CDE | 2.00 | 2.004 | 0.087 |
| PIE | 1.50 | 1.514 | 0.070 |
| INTmed | 0.45 | 0.397 | 0.085 |
| INTref | 0.15 | 0.137 | 0.030 |
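For the linear-linear case, the four components have closed forms in the regression coefficients. The sketch below assumes an outcome model E[Y | a, m] = θ0 + θ1·a + θ2·m + θ3·a·m and a mediator model E[M | a] = β0 + β1·a, with covariates omitted for brevity. The function and class names are hypothetical, and the coefficient values are illustrative ones chosen so that the components match the True column above; they are not taken from the package's fixture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FourWay:
    """The four components of the total effect (hypothetical container)."""
    cde: float
    int_ref: float
    int_med: float
    pie: float

    @property
    def total(self) -> float:
        return self.cde + self.int_ref + self.int_med + self.pie


def four_way_linear(
    theta1: float, theta2: float, theta3: float,
    beta0: float, beta1: float,
    a: float = 1.0, a_star: float = 0.0, m_star: float = 0.0,
) -> FourWay:
    """Closed-form four-way decomposition for linear outcome and mediator models.

    a / a_star are the treatment levels being compared; m_star is the
    reference mediator level used for the controlled direct effect.
    """
    delta = a - a_star
    return FourWay(
        cde=(theta1 + theta3 * m_star) * delta,
        int_ref=theta3 * (beta0 + beta1 * a_star - m_star) * delta,
        int_med=theta3 * beta1 * delta ** 2,
        pie=(theta2 * beta1 + theta3 * beta1 * a_star) * delta,
    )


# Illustrative coefficients (an assumption, see the note above).
parts = four_way_linear(theta1=2.0, theta2=1.0, theta3=0.3, beta0=0.5, beta1=1.5)
print(parts, parts.total)
```

The four components sum to the total effect, which gives a quick internal consistency check on any fitted decomposition.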
Domain coverage
Cross-domain conformance fixtures exercise the Panel contract across data shapes from NYC-civic to chemistry. See the supported-domains page for the full matrix.
| Domain | Fixture | Engines that fit |
|---|---|---|
| NYC-civic / public policy | staggered_did_panel | DiD (twfe, cs, sa, bjs, sdid) |
| Finance event study | finance_event_study_panel | DiD twfe, Event Study (market_adjusted, fama_french) |
| Population health (longitudinal) | rct_longitudinal_panel | DiD per-arm |
| Population health (survival) | survival_oncology_panel | Survival (kaplan_meier, cox_ph, stratified) |
| Population health (mediation) | mediation_panel | Mediation four_way |
| Agriculture / dose-response | agronomic_dose_response_panel | DiD twfe (continuous treatment) |
| Chemistry / pharmacology | chem_assay_panel | Analyst-fit dose-response |
| Climate anomaly | climate_anomaly_panel | DiD, Climate (mann_kendall) |
| Education / value-added | education_value_added_panel | DiD, Mediation |
| Energy / utilities | energy_consumption_panel | DiD, STL |
| Marketing / A-B testing | marketing_uplift_panel | Per-arm TWFE, Mediation, Het-TE (causal_forest) |
| Macroeconomics | macroeconomic_country_panel | DiD, SDID, Panel regression (HDFE) |
| Ecology / conservation | ecology_biodiversity_panel | DiD, Spatial (morans_i) |
| Network / social diffusion | network_diffusion_panel | Diffusion (ndlib_sir) |
| Multi-state policy block | sdid_block_treatment_panel | DiD twfe, SDID (the headline use-case) |
| Test-score cutoff | rdd_sharp_cutoff_panel | RDD rd_robust |
| Single treated state | scm_single_treated_state_panel | SCM (augmented, matrix_completion) |
GWAS / biobank-scale genetics is deliberately out of scope — scale, file formats, and inference shape all mismatch. Use hail, pysnptools, PLINK 2.0, or BOLT-LMM instead. Full rationale on the supported-domains page.
Documentation
Full docs at factor-factory.readthedocs.io (Sphinx + Furo + autodoc2).
| Page | Purpose |
|---|---|
| Getting started | Install, scaffold, build a Panel, run estimators, render manuscripts |
| Cookbook | Per-adapter worked examples (DiD, Survival, Event Study, SDID, Mediation, RDD, SCM) |
| Supported domains | Domain matrix + extension patterns + GWAS-exclusion rationale |
| Design contracts | The canonical Panel data-shape contract |
| Jellycell integration | Cell conventions + tearsheet renderers |
| Reference / architecture | 6-layer pipeline + dependency order |
| Reference / contracts | Locked Panel / Engine Protocol / Tearsheet JSON snapshots |
| Reference / piggyback-map | Which upstream packages each adapter wraps |
| Migration v0 → v1 | Upgrade guide for downstream adopters |
| Contributing | Dev setup + contract ceremony + PR checklist |
Contributing
PRs welcome — especially new engine families. Factor-factory is an adapter-first framework: before writing engine math from scratch, consult the piggyback map. See CONTRIBUTING.md for the full workflow.
Claude Code users get slash-commands for common operations:
/engine-status # 17-family status report
/add-engine <family> # scaffold a new engine family end-to-end
/contract-check # audit a diff against the three contract invariants
/bump [patch|minor|major] # bump version + roll CHANGELOG
/release-check # preflight before a tag push
Citing
If you use factor-factory in academic work, please cite:
- The engine-specific canonical paper(s) — each adapter's docstring carries the DOI + reference R-package URL.
- This software record — via CITATION.cff (Zenodo-compatible).
License
MIT. See LICENSE. Same license as sibling random-walks packages (jellycell, nyc311, nyc-geo-toolkit).