
Shared factor-model + analysis-pipeline framework with first-class jellycell integration


factor-factory


A domain-agnostic factor-model + analysis-pipeline framework with a Protocol-based pluggable engine pattern, first-class jellycell integration, and the only production-grade Python implementations of Synthetic Difference-in-Differences (Arkhangelsky et al. 2021, AER) and the Four-way Mediation Decomposition (VanderWeele 2014, Epidemiology).

The same Panel shape hosts NYC-civic data, finance event studies, clinical trials, agronomic dose-response, chemistry assays, climate anomaly studies, education-intervention evaluations, energy-meter data, marketing A/B tests, macroeconomic country panels, ecological biodiversity surveys, and social-network diffusion cascades. Add a new domain by writing extractors; add a new method by writing a ~150-LOC engine adapter that fits the Protocol.


Install

# Default install — tidy layer + diagnostics + jellycell (no engines)
pip install factor-factory

# With specific engine families (quoted so shells such as zsh do not glob the brackets)
pip install "factor-factory[did,survival,event-study]"

# Everything currently shipping
pip install "factor-factory[all]"

Supports Python 3.12+. Dependency manager of choice is uv; pip works the same way.

Quick start

Scaffold a showcase, run it, render tearsheets:

python -m factor_factory scaffold my-showcase
cd my-showcase
python notebooks/01_load.py

The scaffolded notebook builds a synthetic panel, runs a TWFE DiD via factor_factory.engines.did.estimate, saves a parallel-trends figure, and regenerates all five canonical manuscripts (METHODOLOGY.md, DIAGNOSTICS_CHECKLIST.md, FINDINGS.md, MANUSCRIPT.md, AUDIT.md).

The canonical pattern inside a notebook:

from datetime import date
from factor_factory.tidy import Panel, TreatmentEvent
from factor_factory.engines.did import estimate as did_estimate

panel = Panel.from_records(
    records,
    dimension="community_district",
    freq="ME",
    treatment_events=(TreatmentEvent(
        name="rat_pilot",
        treated_units=("MN-01", "MN-02"),
        treatment_date=date(2024, 6, 1),
        dimension="community_district",
    ),),
    outcome_col="complaint_count",
)

# Multi-engine DiD in one call — TWFE + Callaway-Sant'Anna side-by-side
results = did_estimate(panel, methods=("twfe", "cs"), cluster="unit_id")
print(results.summary_table())

See the getting-started guide for cross-domain examples (finance event study, multi-arm RCT, agronomic dose-response, chemistry IC₅₀ assay, etc.).


Architecture

raw records
  ↓ tidying              factor_factory.tidy         Panel + TreatmentEvent + Provenance + RecordView
tidied panel
  ↓ diagnostics          factor_factory.diagnostics  SMD, parallel-trends, residuals, balance
diagnostic-annotated panel
  ↓ modeling             factor_factory.engines      17 engine families (see below)
modeling results
  ↓ reporting            factor_factory.jellycell    5 tearsheet renderers + scaffold CLI
                         factor_factory.reporting    Quarto (.qmd) alternative

Every engine family follows the same shape: a frozen Result dataclass + an Engine Protocol + a registry-backed estimate() dispatcher. Adapters wrap external packages (linearmodels, differences, lifelines, rdrobust, pysyncon, econml, DoubleML, ruptures, sktime, pyfixest, esda, tick, ndlib, …) or roll their own math when no canonical package exists.
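The three-part shape described above can be sketched in a few lines. This is an illustration of the pattern only, not factor-factory's actual classes: `MeanDiffResult`, `MeanDiffEngine`, `REGISTRY`, and this toy `estimate()` signature are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Protocol, Sequence


@dataclass(frozen=True)
class MeanDiffResult:
    """Immutable result record (hypothetical fields)."""

    method: str
    estimate: float
    n_obs: int


class Engine(Protocol):
    """Structural interface: any object with a matching fit() qualifies."""

    name: str

    def fit(
        self, treated: Sequence[float], control: Sequence[float]
    ) -> MeanDiffResult: ...


class MeanDiffEngine:
    """Toy adapter: difference in group means."""

    name = "mean_diff"

    def fit(self, treated, control):
        est = sum(treated) / len(treated) - sum(control) / len(control)
        return MeanDiffResult(self.name, est, len(treated) + len(control))


# Registry maps a method name to a zero-argument engine factory.
REGISTRY: dict[str, Callable[[], Engine]] = {"mean_diff": MeanDiffEngine}


def estimate(treated, control, methods=("mean_diff",)):
    """Dispatch each requested method through the registry."""
    results = []
    for method in methods:
        if method not in REGISTRY:
            raise KeyError(
                f"unknown method {method!r}; registered: {sorted(REGISTRY)}"
            )
        results.append(REGISTRY[method]().fit(treated, control))
    return tuple(results)


results = estimate([5.0, 7.0], [2.0, 4.0])
print(results[0])
```

Because `Engine` is a `Protocol`, the adapter does not inherit from it; matching the method signature is enough. Adding a method then amounts to registering another factory under a new key.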

See the design contracts for the data contract and the reference architecture page for the full engine-family contract.


Shipping engine families

Install the extras you need; an adapter whose extra is not installed fails fast with an ImportError that names the matching pip install factor-factory[<family>] command.
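One common way to get this fail-fast behaviour is a small import guard that each adapter calls before touching its backend. The sketch below assumes a helper named `require_extra`, which is a hypothetical name for illustration and not part of the documented API.

```python
import importlib


def require_extra(module_name: str, extra: str):
    """Import module_name, or raise an ImportError naming the extra to install."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name!r} is required for this adapter; "
            f'install it with: pip install "factor-factory[{extra}]"'
        ) from exc


# An installed module is returned unchanged.
json_module = require_extra("json", "did")

# A missing module produces the actionable message instead of a bare failure.
try:
    require_extra("surely_not_installed_module_xyz", "survival")
except ImportError as exc:
    message = str(exc)
    print(message)
```

Chaining with `from exc` keeps the original traceback, so the underlying import failure is still visible when debugging.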

| Family | Adapters | Extra | Canonical citation |
|---|---|---|---|
| DiD | twfe, callaway_santanna, sun_abraham, borusyak_jaravel_spiess | [did] | Goodman-Bacon 2021 / Callaway-Sant'Anna 2021 / Sun-Abraham 2021 / Borusyak et al. 2024 |
| Survival | kaplan_meier, cox_ph (+ strata=) | [survival] | Kaplan-Meier 1958 / Cox 1972 |
| Event Study | market_adjusted, fama_french (FF3/FF5/Carhart-4) | [event-study] | MacKinlay 1997 / Fama-French 1993 & 2015 |
| Synthetic DiD | sdid (jackknife + placebo inference) | (built-in) | Arkhangelsky et al. 2021 (AER); Python-ecosystem gap closed |
| Mediation | four_way (CDE / INTref / INTmed / PIE + sensitivity()) | (built-in) | VanderWeele 2014 (Epidemiology); Python-ecosystem gap closed |
| RDD | rd_robust (sharp + fuzzy) | [rdd] | Calonico-Cattaneo-Titiunik 2014 |
| SCM | pysyncon, augmented, matrix_completion | [scm] | Abadie et al. 2010 / Ben-Michael et al. 2021 / Athey et al. 2021 |
| Heterogeneous TE | causal_forest, bcf | [het-te] | Wager-Athey 2018 / Hahn-Murray-Carvalho 2020 |
| DoubleML | plr | [dml] | Chernozhukov et al. 2018 |
| Changepoint | ruptures (Pelt / BinSeg / Window) | [changepoint] | Truong-Oudre-Vayatis 2020 |
| STL | sktime_stl | [stl] | Cleveland et al. 1990 |
| Panel regression | pyfixest (HDFE) | [panel-reg] | Correia 2016 |
| Spatial | morans_i | [spatial] | Moran 1950 / Anselin 1995 |
| Inequality | theil_t (+ between/within decomposition) | (built-in) | Theil 1967 |
| Reporting bias | latent_em (two-class EM) | (built-in) | Dempster-Laird-Rubin 1977 |
| Hawkes | tick | [hawkes] | Hawkes 1971 / Bacry et al. 2013 |
| Climate | mann_kendall (+ Sen's slope) | (built-in) | Mann 1945 / Kendall 1948 |
| Diffusion | ndlib_sir | [diffusion] | |

17 engine families / 30+ adapters. Use the /engine-status Claude Code slash-command (or inspect factor_factory.engines.<family>.registry) to see the live state.


Python-ecosystem gaps closed

These two methods had canonical R packages but no maintained Python equivalent. Both ship as first-class engines, with the canonical paper and the reference R package linked in the engine docstring.

engines.sdid — Synthetic Difference-in-Differences

Arkhangelsky, D., Athey, S., Hirshberg, D. A., Imbens, G. W., & Wager, S. (2021). Synthetic Difference-in-Differences. American Economic Review, 111(12), 4088–4118. doi:10.1257/aer.20190159

Reference R implementation: synthdid

SDID combines unit weights (synthetic-control style) with time weights and a weighted DiD estimator. It's a 2021 advance that addresses the parallel-trends fragility of vanilla DiD when units have heterogeneous trends. The R synthdid package is canonical; partial Python ports like pysdid are lightly maintained.

Our adapter:

  • Solves the unit- and time-weights QPs via scipy.optimize.minimize (SLSQP), no cvxpy dependency
  • Uses the regularization ζ = (N_tr · T_post)^(1/4) · σ̂ from AER §3.3
  • Computes the closed-form weighted-DiD ATT for binary block treatment
  • Supports both jackknife (AER §3.4 default) and placebo inference (preferred for single-treated-unit panels)
  • Returns unit weights + time weights so analysts can interrogate the synthetic control
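The steps listed above can be sketched for a single block-treatment panel as follows. This is a minimal illustration under stated assumptions, not the adapter's code: the function names `simplex_weights` and `sdid_att`, the row/column layout of `Y`, and the very small ridge penalty on the time weights are all choices made for the example, and inference (jackknife or placebo) is omitted.

```python
import numpy as np
from scipy.optimize import minimize


def simplex_weights(A, b, penalty):
    """Minimise ||w0 + A @ w - b||^2 + penalty * ||w||^2 with w on the simplex."""
    n = A.shape[1]

    def objective(x):
        resid = x[0] + A @ x[1:] - b  # x[0] is a free intercept
        return resid @ resid + penalty * (x[1:] @ x[1:])

    start = np.concatenate(([0.0], np.full(n, 1.0 / n)))
    fit = minimize(
        objective,
        start,
        method="SLSQP",
        bounds=[(None, None)] + [(0.0, 1.0)] * n,
        constraints=[{"type": "eq", "fun": lambda x: x[1:].sum() - 1.0}],
    )
    w = np.clip(fit.x[1:], 0.0, None)
    return w / w.sum()  # renormalise away solver tolerance


def sdid_att(Y, n_treated, t_post):
    """Y is units x periods; the last n_treated rows are treated,
    the last t_post columns are post-treatment (assumed layout)."""
    co, tr = Y[:-n_treated], Y[-n_treated:]
    t_pre = Y.shape[1] - t_post
    co_pre, co_post = co[:, :t_pre], co[:, t_pre:]
    tr_pre, tr_post = tr[:, :t_pre], tr[:, t_pre:]

    # sigma: spread of first differences among controls before treatment
    sigma = np.diff(co_pre, axis=1).std()
    zeta = (n_treated * t_post) ** 0.25 * sigma

    # Unit weights match control pre-trends to the treated pre-trend.
    omega = simplex_weights(co_pre.T, tr_pre.mean(axis=0), zeta**2 * t_pre)
    # Time weights match control pre-periods to the control post-period mean.
    lam = simplex_weights(
        co_pre, co_post.mean(axis=1), (1e-6 * sigma) ** 2 * len(co)
    )

    # Weighted double difference.
    treated_change = tr_post.mean() - tr_pre.mean(axis=0) @ lam
    control_change = omega @ co_post.mean(axis=1) - omega @ co_pre @ lam
    return treated_change - control_change, omega, lam


# Noise-free additive fixture (hypothetical): unit effect + period effect + 4.0
alpha = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 3.5])  # last unit is treated
beta = np.array([0.0, 0.5, 0.2, 0.9, 1.1, 1.6])   # last two periods are post
Y = alpha[:, None] + beta[None, :]
Y[-1, -2:] += 4.0

att, omega, lam = sdid_att(Y, n_treated=1, t_post=2)
```

In this noise-free additive fixture the double difference equals the true effect for any weights on the simplex, so the example exercises the ATT formula rather than the quality of the weights. With noisy or non-additive data the choice of `omega` and `lam` is what drives the estimate.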

Validated against a fixture with a known ATT of 4.0: the adapter estimates ATT = 4.535 (SE = 0.245), about 2.2 standard errors above the true value.

engines.mediation.FourWayMediationEngine — VanderWeele's four-way decomposition

VanderWeele, T. J. (2014). A unification of mediation and interaction: A four-way decomposition. Epidemiology, 25(5), 749–761. doi:10.1097/EDE.0000000000000121

Reference R implementation: CMAverse

Decomposes a treatment's total effect into:

  • CDE (Controlled Direct Effect)
  • INTref (Reference Interaction)
  • INTmed (Mediated Interaction)
  • PIE (Pure Indirect Effect)

statsmodels.stats.mediation only provides the simpler Imai-Keele-Tingley two-component decomposition (NDE / NIE). The mediation package on PyPI is stale. Our adapter ports the linear-linear case from the Epidemiology paper directly with bootstrap inference (1000 resamples by default), and adds an unobserved-confounding sensitivity analysis (.sensitivity(rho_range, n_points)) ported from CMAverse's rho-test.

Validates against a fixture with known components — recovers all four within 1 SE:

| Component | True | Estimated | SE |
|---|---|---|---|
| CDE | 2.00 | 2.004 | 0.087 |
| PIE | 1.50 | 1.514 | 0.070 |
| INTmed | 0.45 | 0.397 | 0.085 |
| INTref | 0.15 | 0.137 | 0.030 |
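For the linear-linear case, the four components have closed forms in the coefficients of an outcome model Y = θ0 + θ1·a + θ2·m + θ3·a·m and a mediator model M = β0 + β1·a, following VanderWeele (2014). The sketch below evaluates those formulas; the function name `four_way_linear` and the coefficient values are hypothetical, picked only so the output matches the "True" column above. They are not the fixture's actual parameters, and bootstrap inference is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FourWay:
    cde: float
    int_ref: float
    int_med: float
    pie: float

    @property
    def total(self) -> float:
        """Total effect is the sum of the four components."""
        return self.cde + self.int_ref + self.int_med + self.pie


def four_way_linear(
    theta1, theta2, theta3, beta0, beta1,
    a=1.0, a_star=0.0, m_star=0.0, beta2_c=0.0,
):
    """Closed-form components for a change in treatment from a_star to a.

    m_star is the reference mediator level for the CDE; beta2_c is the
    covariate contribution to the mediator mean (0.0 with no covariates).
    """
    d = a - a_star
    return FourWay(
        cde=(theta1 + theta3 * m_star) * d,
        int_ref=theta3 * (beta0 + beta1 * a_star + beta2_c - m_star) * d,
        int_med=theta3 * beta1 * d * d,
        pie=(theta2 * beta1 + theta3 * beta1 * a_star) * d,
    )


# Hypothetical coefficients chosen to reproduce the "True" column.
parts = four_way_linear(theta1=2.0, theta2=1.0, theta3=0.3, beta0=0.5, beta1=1.5)
print(parts, round(parts.total, 6))
```

With these inputs the total effect is θ1 + θ2·β1 + θ3·β0 + θ3·β1, which is the same quantity obtained by differencing the expected outcome at a = 1 and a = 0 directly.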

Domain coverage

Cross-domain conformance fixtures exercise the Panel contract across data shapes from NYC-civic to chemistry. See the supported-domains page for the full matrix.

| Domain | Fixture | Engines that fit |
|---|---|---|
| NYC-civic / public policy | staggered_did_panel | DiD (twfe, cs, sa, bjs, sdid) |
| Finance event study | finance_event_study_panel | DiD twfe, Event Study (market_adjusted, fama_french) |
| Population health: longitudinal | rct_longitudinal_panel | DiD per-arm |
| Population health: survival | survival_oncology_panel | Survival (kaplan_meier, cox_ph, stratified) |
| Population health: mediation | mediation_panel | Mediation four_way |
| Agriculture / dose-response | agronomic_dose_response_panel | DiD twfe (continuous treatment) |
| Chemistry / pharmacology | chem_assay_panel | Analyst-fit dose-response |
| Climate anomaly | climate_anomaly_panel | DiD, Climate (mann_kendall) |
| Education / value-added | education_value_added_panel | DiD, Mediation |
| Energy / utilities | energy_consumption_panel | DiD, STL |
| Marketing / A-B testing | marketing_uplift_panel | Per-arm TWFE, Mediation, Het-TE (causal_forest) |
| Macroeconomics | macroeconomic_country_panel | DiD, SDID, Panel regression (HDFE) |
| Ecology / conservation | ecology_biodiversity_panel | DiD, Spatial (morans_i) |
| Network / social diffusion | network_diffusion_panel | Diffusion (ndlib_sir) |
| Multi-state policy block | sdid_block_treatment_panel | DiD twfe, SDID (the headline use-case) |
| Test-score cutoff | rdd_sharp_cutoff_panel | RDD rd_robust |
| Single treated state | scm_single_treated_state_panel | SCM (augmented, matrix_completion) |

GWAS / biobank-scale genetics is deliberately out of scope — scale, file formats, and inference shape all mismatch. Use hail, pysnptools, PLINK 2.0, or BOLT-LMM instead. Full rationale on the supported-domains page.


Documentation

Full docs at factor-factory.readthedocs.io (Sphinx + Furo + autodoc2).

| Page | Purpose |
|---|---|
| Getting started | Install, scaffold, build a Panel, run estimators, render manuscripts |
| Cookbook | Per-adapter worked examples (DiD, Survival, Event Study, SDID, Mediation, RDD, SCM) |
| Supported domains | Domain matrix + extension patterns + GWAS-exclusion rationale |
| Design contracts | The canonical Panel data-shape contract |
| Jellycell integration | Cell conventions + tearsheet renderers |
| Reference / architecture | 6-layer pipeline + dependency order |
| Reference / contracts | Locked Panel / Engine Protocol / Tearsheet JSON snapshots |
| Reference / piggyback-map | Which upstream packages each adapter wraps |
| Migration v0 → v1 | Upgrade guide for downstream adopters |
| Contributing | Dev setup + contract ceremony + PR checklist |

Contributing

PRs welcome — especially new engine families. Factor-factory is an adapter-first framework: before writing engine math from scratch, consult the piggyback map. See CONTRIBUTING.md for the full workflow.

Claude Code users get slash-commands for common operations:

/engine-status    # 17-family status report
/add-engine <family>   # scaffold a new engine family end-to-end
/contract-check   # audit a diff against the three contract invariants
/bump [patch|minor|major]   # bump version + roll CHANGELOG
/release-check    # preflight before a tag push

Citing

If you use factor-factory in academic work, please cite:

  • The engine-specific canonical paper(s) — each adapter's docstring carries the DOI + reference R-package URL.
  • This software record — via CITATION.cff (Zenodo-compatible).

License

MIT. See LICENSE. Same license as sibling random-walks packages (jellycell, nyc311, nyc-geo-toolkit).
