StataFlow: A Python econometrics toolkit aligned with Stata 17
StataFlow
StataFlow (stataflow) is a Python econometrics toolkit that reproduces Stata 17 estimation results with high precision. It provides both a Stata-compatible command layer (for researchers migrating from Stata) and a native Python estimator layer (for advanced users who want direct control).
What you can do today
- Run Stata-style commands in Python: regress, reghdfe, ivregress 2sls, logit, ppmlhdfe, did_imputation, csdid, rdrobust, and more.
- Obtain coefficients, standard errors, t/z-statistics, p-values, and confidence intervals that are field-level verified against Stata 17.
- Work with high-dimensional fixed effects (HDFE), IV/2SLS, binary/count models, and DID/event-study estimators.
- Use Stata-style factor-variable syntax (i.group##c.post, c.x1#c.x2, x1##x2) and space-separated absorb strings directly in wrapper commands. Bare variables inside #/## are treated as continuous, matching common Stata usage.
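To make the factor-variable rule above concrete, here is a minimal sketch of how a `##` term can be expanded into main effects plus interactions, with bare variables given a `c.` prefix as this README describes. The function name `expand_factor_term` is hypothetical and is not part of StataFlow; the sketch also does not handle terms that mix `#` and `##`.

```python
from itertools import combinations


def _with_prefix(var):
    # Per the rule stated in this README, a bare name inside #/## is continuous.
    return var if var[:2] in ("i.", "c.") else "c." + var


def expand_factor_term(term):
    """Expand one factor-variable term into a list of model terms (illustrative only)."""
    if "##" in term:
        parts = [_with_prefix(p) for p in term.split("##")]
        expanded = []
        # a##b means: every main effect, then every interaction of the parts.
        for size in range(1, len(parts) + 1):
            for combo in combinations(parts, size):
                expanded.append("#".join(combo))
        return expanded
    if "#" in term:
        # a#b means: the interaction only, without main effects.
        return ["#".join(_with_prefix(p) for p in term.split("#"))]
    # A plain variable is passed through unchanged.
    return [term]


print(expand_factor_term("i.group##c.post"))
print(expand_factor_term("x1##x2"))
```

The first call lists `i.group`, `c.post`, and their interaction; the second shows bare names picking up the `c.` prefix.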
What is not yet supported
- Multi-way clustering: regress supports two-way clustering (Cameron-Gelbach-Miller 2011); all other commands currently use single-cluster robust inference only.
- Direct post-estimation on wrapper returns: the compat.stata wrappers return ResultSchema result objects. predict and margins are available on the core estimator layer only.
- Full command surfaces for community commands: reghdfe, ivreghdfe, ppmlhdfe, did_imputation, eventstudyinteract, csdid, and rdrobust are implemented as verified high-frequency subsets, not complete Stata command reproductions. Unsupported options are explicitly rejected rather than silently ignored.
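For readers unfamiliar with the two-way clustering mentioned above, the Cameron-Gelbach-Miller variance adds the two one-way cluster "meats" and subtracts the one for the intersection of the clusters. The NumPy sketch below illustrates that identity for OLS. It is not StataFlow's code, the function names are hypothetical, and it omits all finite-sample corrections, so its numbers should not be expected to match Stata output.

```python
import numpy as np


def cluster_meat(X, resid, groups):
    """Sum over clusters of (X_g' u_g)(X_g' u_g)'."""
    scores = X * resid[:, None]
    _, inverse = np.unique(np.asarray(groups), return_inverse=True)
    sums = np.zeros((inverse.max() + 1, X.shape[1]))
    np.add.at(sums, inverse, scores)  # accumulate score rows by cluster
    return sums.T @ sums


def twoway_cluster_vcov(X, y, g1, g2):
    """OLS coefficients and an uncorrected two-way cluster-robust covariance."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    # Intersection clusters: one label per (g1, g2) pair.
    pair = np.array([f"{a}|{b}" for a, b in zip(g1, g2)])
    meat = (
        cluster_meat(X, resid, g1)
        + cluster_meat(X, resid, g2)
        - cluster_meat(X, resid, pair)
    )
    return beta, XtX_inv @ meat @ XtX_inv
```

A quick sanity check on the formula: if both cluster variables are identical, the intersection equals either one, and the result collapses to the ordinary one-way clustered covariance.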
Completeness legend
- Stable: synthetic + real-data dual-run verified; core API is unlikely to change.
- Alpha: high-frequency paths are implemented and verified, but the command surface is still a subset of the full Stata community command.
- Alpha - Partial: a verifiable implementation exists, but large functional areas are still missing (e.g., fuzzy RD for rdrobust, weights beyond aweight).
See the Command Support Matrix for the per-command detailed status.
Installation
pip install StataFlow
Requirements: Python 3.10+, NumPy, pandas, SciPy.
For development (editable install from source):
git clone https://github.com/ZhenHaoFu810/StataFlow.git
cd StataFlow
pip install -e .
Quick start
Stata-compatible command layer (recommended)
All compat.stata wrappers return a ResultSchema object with coefficients, standard errors, and fit statistics. They do not expose .predict() or .margins() directly—use the core estimator layer below for post-estimation.
import pandas as pd
from stataflow.compat.stata import regress, reghdfe, ivregress_2sls, logit

# df is assumed to be a pandas DataFrame that already holds the columns used below.

# OLS with robust standard errors
result = regress(df, y="wage", x=["edu", "exper"], vce="robust")

# High-dimensional fixed effects (reghdfe)
result = reghdfe(
    df, y="wage", x=["edu", "exper"],
    absorb="firm_id year_id", vce="cluster", cluster="industry"
)

# Factor-variable syntax in HDFE
result = reghdfe(
    df, y="wage", x=["i.industry##c.post"], absorb="firm_id year_id"
)

# 2SLS
result = ivregress_2sls(
    df, y="lwage", x_exog=["edu"], x_endog=["exper"],
    instruments=["age", "kidslt6"], vce="robust"
)

# Logit
result = logit(df, y="inlf", x=["nwifeinc", "educ", "exper"])
For runnable examples, see the examples/ directory:
- examples/demo_regress.py
- examples/demo_reghdfe.py
- examples/demo_ppmlhdfe.py
- examples/demo_ivregress_2sls.py
Native Python estimator layer (advanced)
from stataflow import OLS, FixedEffectsOLS, AbsorbingOLS, Logit, IV2SLS
model = OLS(data=df, y="wage", x=["edu", "exper"])
result = model.fit(vce="robust")
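As background for the `vce="robust"` option used above: Stata's `vce(robust)` for `regress` corresponds to the HC1 sandwich estimator, which scales the heteroskedasticity-robust covariance by n/(n-k). The NumPy sketch below shows that computation on its own. The function name `ols_hc1` is hypothetical, and whether StataFlow computes it this way internally is an assumption.

```python
import numpy as np


def ols_hc1(X, y):
    """OLS coefficients with HC1 robust standard errors (illustrative sketch)."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    # Meat of the sandwich: sum_i u_i^2 * x_i x_i'
    meat = (X * resid[:, None] ** 2).T @ X
    # HC1 applies the n / (n - k) degrees-of-freedom scaling.
    vcov = n / (n - k) * XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(vcov))


# Usage on simulated data (the data-generating values are arbitrary).
rng = np.random.default_rng(42)
x = rng.normal(size=200)
X_demo = np.column_stack([np.ones(200), x])
y_demo = 1.0 + 2.0 * x + rng.normal(size=200) * (1.0 + np.abs(x))
beta_demo, se_demo = ols_hc1(X_demo, y_demo)
```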
Supported commands
| Command | Python entry | Core capabilities |
|---|---|---|
| regress | stataflow.compat.stata.regress | OLS, robust, cluster, aweight |
| xtreg, fe | stataflow.compat.stata.xtreg_fe | Fixed effects (within), cluster |
| areg | stataflow.compat.stata.areg | Single absorb variable FE |
| reghdfe | stataflow.compat.stata.reghdfe | 1+ group HDFE, cluster, singleton drop |
| ivregress 2sls | stataflow.compat.stata.ivregress_2sls | 2SLS, robust, cluster |
| ivreghdfe | stataflow.compat.stata.ivreghdfe | IV + 1+ group HDFE, cluster |
| logit | stataflow.compat.stata.logit | MLE, robust, cluster |
| probit | stataflow.compat.stata.probit | MLE, robust, cluster |
| poisson | stataflow.compat.stata.poisson | MLE, robust, cluster |
| ppmlhdfe | stataflow.compat.stata.ppmlhdfe | PPML + 1+ group HDFE |
| did_imputation | stataflow.compat.stata.did_imputation | BJS DID imputation |
| eventstudyinteract | stataflow.compat.stata.eventstudyinteract | Sun & Abraham IW estimator |
| csdid | stataflow.compat.stata.csdid | Callaway-Sant'Anna DID (method="reg" only) |
| rdrobust | stataflow.compat.stata.rdrobust | Sharp RD local polynomial (bwselect="mserd", covs) |
Full details: docs/command-support-matrix/README.md
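The "singleton drop" listed for reghdfe refers to removing observations that are the only member of a fixed-effect group. Because dropping a row in one dimension can create a new singleton in another, the step is repeated until nothing changes. The sketch below shows that iteration in NumPy; `drop_singletons` is a hypothetical helper, not StataFlow's implementation.

```python
import numpy as np


def drop_singletons(*group_ids):
    """Return a boolean keep-mask after iteratively removing singleton groups."""
    n = len(group_ids[0])
    keep = np.ones(n, dtype=bool)
    changed = True
    while changed:
        changed = False
        for ids in group_ids:
            ids = np.asarray(ids)
            # Count group sizes among the rows that are still kept.
            _, inverse, counts = np.unique(
                ids[keep], return_inverse=True, return_counts=True
            )
            singleton = counts[inverse] == 1
            if singleton.any():
                kept_rows = np.flatnonzero(keep)
                keep[kept_rows[singleton]] = False
                changed = True
    return keep


# Toy panel: dropping firm 3 makes year 3 a singleton, which then isolates firm 4.
firm = [1, 1, 2, 2, 3, 4, 4]
year = [1, 2, 1, 2, 3, 3, 1]
mask = drop_singletons(firm, year)
```

In this toy example only the first four rows survive, which shows why a single pass over each dimension is not enough.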
Validation philosophy
Every public command is validated with two lines of evidence:
- Synthetic / controlled cases — formula, degrees of freedom, sample screening, edge cases.
- Real public datasets — field-level comparison against Stata 17 on openly available economic/financial data.
A command is considered "done" only when both lines pass and the source-to-Python mapping is documented. We do not accept "statistical equivalence" without explicit mathematical or source-code justification.
Public evidence and results are available in research/results/validation/.
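A field-level comparison of the kind described above can be pictured as checking every reported number against its Stata counterpart within a tolerance. The sketch below assumes results are available as nested dictionaries (field, then term); that layout, the function name `compare_fields`, the tolerances, and the example numbers are all illustrative assumptions, not the project's actual test harness or real results.

```python
import math


def compare_fields(python_fields, stata_fields, rel_tol=1e-8, abs_tol=1e-12):
    """Return (field, term, python_value, stata_value) for every mismatch."""
    mismatches = []
    for field, stata_terms in stata_fields.items():
        python_terms = python_fields.get(field, {})
        for term, expected in stata_terms.items():
            actual = python_terms.get(term)
            # A missing value counts as a mismatch, as does any value outside tolerance.
            if actual is None or not math.isclose(
                actual, expected, rel_tol=rel_tol, abs_tol=abs_tol
            ):
                mismatches.append((field, term, actual, expected))
    return mismatches


# Made-up numbers purely for illustration.
stata_ref = {"coef": {"edu": 0.5, "_cons": 1.25}, "se": {"edu": 0.1, "_cons": 0.3}}
py_out = {"coef": {"edu": 0.5 + 1e-12, "_cons": 1.25}, "se": {"edu": 0.1, "_cons": 0.31}}
report = compare_fields(py_out, stata_ref)
print(report)  # only the se of _cons is flagged
```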
Running tests
# Unit and integration tests (fast)
pytest tests/ -v --ignore=tests/golden/
# Golden dual-run tests (require Stata 17)
pytest tests/golden/ -v
Project structure
- src/stataflow/estimators/: Core Python estimators (OLS, AbsorbingOLS, Logit, PPMLHDFE, DIDImputation, etc.)
- src/stataflow/compat/stata/: Stata command wrappers (regress(), reghdfe(), ivregress_2sls(), etc.)
- docs/command-support-matrix/: Per-command support matrices
- examples/: Runnable demonstration scripts
- tests/: Unit and integration tests
Default target version
Stata 17
Documentation
Governance
- Codex — project goals, architecture, review gates, and statistical-dispute arbitration.
- Claude Code — implementation, testing, and evidence backfill.