StataFlow: A Python econometrics toolkit aligned with Stata 17

StataFlow

StataFlow (stataflow) is a Python econometrics toolkit that reproduces Stata 17 estimation results with high precision. It provides both a Stata-compatible command layer (for researchers migrating from Stata) and a native Python estimator layer (for advanced users who want direct control).

What you can do today

  • Run Stata-style commands in Python: regress, reghdfe, ivregress 2sls, logit, ppmlhdfe, did_imputation, csdid, rdrobust, and more.
  • Obtain coefficients, standard errors, t/z-statistics, p-values, and confidence intervals that are field-level verified against Stata 17.
  • Work with high-dimensional fixed effects (HDFE), IV/2SLS, binary/count models, and DID/event-study estimators.
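  • Use Stata-style factor-variable syntax (i.group##c.post, c.x1#c.x2, x1##x2) and space-separated absorb strings directly in wrapper commands. Bare variables inside # / ## are treated as continuous. Stata itself treats unprefixed variables in an interaction as categorical (a#b is read as i.a#i.b), so write the i. prefix explicitly when a factor variable is intended.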
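
As a rough illustration of the expansion rule in the last bullet (not StataFlow's actual parser), the sketch below expands a single term: ## becomes main effects plus interactions, and bare names inside # / ## are tagged as continuous. The function name expand_term is hypothetical, and prefixes other than i. and c. (for example ib2.) or terms that mix # and ## are not handled.

```python
from itertools import combinations


def expand_term(term: str) -> list[str]:
    """Expand one Stata-style term into the list of terms it stands for."""

    def tag(var: str) -> str:
        # Bare names get an explicit c. prefix, per the rule described above.
        return var if var.startswith(("i.", "c.")) else "c." + var

    if "##" in term:
        parts = [tag(p) for p in term.split("##")]
        # Full factorial: main effects first, then higher-order interactions.
        return [
            "#".join(combo)
            for size in range(1, len(parts) + 1)
            for combo in combinations(parts, size)
        ]
    if "#" in term:
        # Interaction only, no main effects.
        return ["#".join(tag(p) for p in term.split("#"))]
    return [term]  # plain regressor, left untouched


print(expand_term("i.group##c.post"))  # ['i.group', 'c.post', 'i.group#c.post']
print(expand_term("x1##x2"))           # ['c.x1', 'c.x2', 'c.x1#c.x2']
```

Each expanded term would still have to be turned into design-matrix columns (dummies for i. variables, products for interactions); that step is left out here.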

What is not yet supported

  • Multi-way clustering — only single-cluster robust inference is available.
  • Direct post-estimation on wrapper returns — the compat.stata wrappers return ResultSchema result objects. predict and margins are available on the core estimator layer only.
  • Full command surfaces for community commands: reghdfe, ivreghdfe, ppmlhdfe, did_imputation, eventstudyinteract, csdid, and rdrobust are implemented as verified high-frequency subsets, not complete Stata command reproductions. Unsupported options are explicitly rejected rather than silently ignored.
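
The "rejected rather than silently ignored" policy can be pictured with a small sketch. Everything here is hypothetical: the SUPPORTED table, the validate_options helper, and the choice of NotImplementedError are assumptions for illustration, not StataFlow's internals.

```python
# Hypothetical whitelist of options per command (illustrative subset only).
SUPPORTED = {
    "reghdfe": {"absorb", "vce", "cluster"},
}


def validate_options(command: str, options: dict) -> dict:
    """Return the options unchanged, or raise if any are not whitelisted."""
    unknown = sorted(set(options) - SUPPORTED[command])
    if unknown:
        # Fail loudly so a dropped option can never change results unnoticed.
        raise NotImplementedError(
            f"{command}: unsupported option(s): {', '.join(unknown)}"
        )
    return dict(options)


validate_options("reghdfe", {"absorb": "firm_id year_id"})  # passes
try:
    validate_options("reghdfe", {"absorb": "firm_id", "groupvar": "g"})
except NotImplementedError as exc:
    print(exc)  # reghdfe: unsupported option(s): groupvar
```

Failing early matters more in a replication tool than in most libraries: an option that is quietly dropped would yield numbers that look valid but no longer match the Stata run being reproduced.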

Completeness legend

  • Stable — synthetic + real-data dual-run verified; core API is unlikely to change.
  • Alpha — high-frequency paths are implemented and verified, but the command surface is still a subset of the full Stata community command.
  • Alpha — Partial — a verifiable implementation exists, but large functional areas are still missing (e.g., fuzzy RD for rdrobust, multi-way clustering).

See the Command Support Matrix for the per-command detailed status.


Installation

pip install StataFlow

Requirements: Python 3.10+, NumPy, pandas, SciPy.

For development (editable install from source):

git clone https://github.com/ZhenHaoFu810/StataFlow.git
cd StataFlow
pip install -e .

Quick start

Stata-compatible command layer (recommended)

All compat.stata wrappers return a ResultSchema object with coefficients, standard errors, and fit statistics. They do not expose .predict() or .margins() directly—use the core estimator layer below for post-estimation.

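import pandas as pd
from stataflow.compat.stata import regress, reghdfe, ivregress_2sls, logit

# Each call below assumes df is a pandas DataFrame that already contains
# the referenced columns; the snippets use different example datasets.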

# OLS with robust standard errors
result = regress(df, y="wage", x=["edu", "exper"], vce="robust")

# High-dimensional fixed effects (reghdfe)
result = reghdfe(
    df, y="wage", x=["edu", "exper"],
    absorb="firm_id year_id", vce="cluster", cluster="industry"
)

# Factor-variable syntax in HDFE
result = reghdfe(
    df, y="wage", x=["i.industry##c.post"], absorb="firm_id year_id"
)

# 2SLS
result = ivregress_2sls(
    df, y="lwage", x_exog=["edu"], x_endog=["exper"],
    instruments=["age", "kidslt6"], vce="robust"
)

# Logit
result = logit(df, y="inlf", x=["nwifeinc", "educ", "exper"])
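
For readers who want to see what the 2SLS call above estimates, here is a minimal NumPy sketch of the textbook two-stage point estimate. It is not StataFlow's implementation; the function name and argument layout are assumptions chosen to mirror x_exog, x_endog, and instruments, and the caller supplies the constant column.

```python
import numpy as np


def two_stage_least_squares(y, x_exog, x_endog, instruments):
    """Textbook 2SLS point estimates (no standard errors)."""
    Z = np.column_stack([x_exog, instruments])  # all exogenous variables
    X = np.column_stack([x_exog, x_endog])      # structural regressors
    # First stage: project every column of X onto the instrument set Z.
    first_stage, *_ = np.linalg.lstsq(Z, X, rcond=None)
    X_hat = Z @ first_stage
    # Second stage: regress y on the fitted regressors.
    beta, *_ = np.linalg.lstsq(X_hat, y, rcond=None)
    return beta


# Deterministic toy data in which y is an exact linear function of x.
z1 = np.arange(8.0)
z2 = z1 ** 2
const = np.ones(8)
x = 0.5 * z1 + 0.1 * z2 + np.array([1, -1, 1, -1, 1, -1, 1, -1.0])
y = 1.0 + 2.0 * x
beta = two_stage_least_squares(y, const, x, np.column_stack([z1, z2]))
print(np.round(beta, 6))  # intercept and slope, approximately 1 and 2
```

Valid standard errors require residuals computed with the original regressors rather than the first-stage fitted values, which this sketch omits.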

For runnable examples, see the examples/ directory.

Native Python estimator layer (advanced)

from stataflow import OLS, FixedEffectsOLS, AbsorbingOLS, Logit, IV2SLS

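# df is assumed to be a pandas DataFrame with wage, edu, and exper columns.
model = OLS(data=df, y="wage", x=["edu", "exper"])
result = model.fit(vce="robust")  # heteroskedasticity-robust standard errors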
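
In Stata, regress with vce(robust) reports the HC1 sandwich variance, which scales the heteroskedasticity-robust estimate by n/(n-k). The sketch below computes that quantity directly with NumPy. It illustrates the formula only; that StataFlow's OLS class arrives at it the same way is an assumption, and the function name ols_hc1 is hypothetical.

```python
import numpy as np


def ols_hc1(X, y):
    """OLS coefficients with HC1 (Stata-style robust) standard errors."""
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    # "Meat" of the sandwich: sum over i of e_i^2 * x_i x_i'.
    meat = (X * resid[:, None] ** 2).T @ X
    vcov = n / (n - k) * xtx_inv @ meat @ xtx_inv
    return beta, np.sqrt(np.diag(vcov))


# Intercept-only check: the HC1 variance reduces to s^2 / n.
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X = np.ones((5, 1))
beta, se = ols_hc1(X, y)
print(beta, se)  # mean of y, and sqrt(0.5) as its standard error
```

The explicit inverse keeps the formula readable; a production implementation would more likely use a QR or Cholesky factorization.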

Supported commands

Command             Python entry                               Core capabilities
regress             stataflow.compat.stata.regress             OLS, robust, cluster, aweight
xtreg, fe           stataflow.compat.stata.xtreg_fe            Fixed effects (within), cluster
areg                stataflow.compat.stata.areg                Single absorb variable FE
reghdfe             stataflow.compat.stata.reghdfe             1-2 group HDFE, cluster, singleton drop
ivregress 2sls      stataflow.compat.stata.ivregress_2sls      2SLS, robust, cluster
ivreghdfe           stataflow.compat.stata.ivreghdfe           IV + 1-2 group HDFE, cluster
logit               stataflow.compat.stata.logit               MLE, robust, cluster
probit              stataflow.compat.stata.probit              MLE, robust, cluster
poisson             stataflow.compat.stata.poisson             MLE, robust, cluster
ppmlhdfe            stataflow.compat.stata.ppmlhdfe            PPML + 1-2 group HDFE
did_imputation      stataflow.compat.stata.did_imputation      BJS DID imputation
eventstudyinteract  stataflow.compat.stata.eventstudyinteract  Sun & Abraham IW estimator
csdid               stataflow.compat.stata.csdid               Callaway-Sant'Anna DID (method="reg" only)
rdrobust            stataflow.compat.stata.rdrobust            Sharp RD local polynomial (bwselect="mserd", covs)

Full details: docs/command-support-matrix/README.md
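
Two entries in the reghdfe row, "1-2 group HDFE" and "singleton drop", are easier to picture with code. The NumPy sketch below iteratively drops observations that are alone in a fixed-effect group and then absorbs two sets of fixed effects by alternating group demeaning. It is a conceptual illustration under stated assumptions (integer group codes starting at 0, hypothetical function names), not StataFlow's or reghdfe's actual algorithm.

```python
import numpy as np


def singleton_mask(g1, g2):
    """Rows kept after repeatedly dropping singleton groups in g1 or g2."""
    keep = np.ones(len(g1), dtype=bool)
    changed = True
    while changed:  # dropping one row can create new singletons elsewhere
        changed = False
        for g in (g1, g2):
            counts = np.bincount(g[keep], minlength=g.max() + 1)
            singles = keep & (counts[g] == 1)
            if singles.any():
                keep &= ~singles
                changed = True
    return keep


def demean_by(v, codes):
    """Subtract the group mean of v within each group code."""
    sums = np.bincount(codes, weights=v)
    counts = np.bincount(codes)
    return v - (sums / np.maximum(counts, 1))[codes]


def absorb_two_way(v, g1, g2, tol=1e-10, max_iter=1000):
    """Remove two sets of fixed effects by alternating projections."""
    v = np.asarray(v, dtype=float)
    for _ in range(max_iter):
        new = demean_by(demean_by(v, g1), g2)
        if np.max(np.abs(new - v)) < tol:
            return new
        v = new
    return v


firm = np.array([0, 0, 1, 1, 2])
year = np.array([0, 0, 0, 1, 1])
print(singleton_mask(firm, year))  # only the first two rows survive
```

After both the outcome and the regressors have been passed through absorb_two_way, an ordinary regression on the residualized variables gives the same slope coefficients as including the full set of dummies (the Frisch-Waugh-Lovell result).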


Validation philosophy

Every public command is validated with two lines of evidence:

  1. Synthetic / controlled cases — formula, degrees of freedom, sample screening, edge cases.
  2. Real public datasets — field-level comparison against Stata 17 on openly available economic/financial data.

A command is considered "done" only when both lines pass and the source-to-Python mapping is documented. We do not accept "statistical equivalence" without explicit mathematical or source-code justification.

Public evidence and results are available in research/results/validation/.
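
A field-level comparison of the kind described above might look like the following sketch. The helper name, the field names, and the tolerances are all assumptions made for illustration (the project does not state its tolerances here), and the numbers are made up rather than real estimation output.

```python
import math


def compare_fields(python_fields, stata_fields, rel_tol=1e-8, abs_tol=1e-10):
    """Return (field, python_value, stata_value) for every disagreement."""
    mismatches = []
    for name, expected in stata_fields.items():
        actual = python_fields.get(name)
        if actual is None or not math.isclose(
            actual, expected, rel_tol=rel_tol, abs_tol=abs_tol
        ):
            mismatches.append((name, actual, expected))
    return mismatches


# Made-up values standing in for one coefficient and its standard error.
stata_run = {"b_edu": 0.0912345678, "se_edu": 0.0123456789}
python_run = {"b_edu": 0.0912345678, "se_edu": 0.0123456999}
print(compare_fields(python_run, stata_run))  # reports se_edu only
```

Checking each reported field separately, instead of a single summary distance, makes it obvious which quantity (coefficient, standard error, p-value, and so on) drifted when a test fails.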

Running tests

# Unit and integration tests (fast)
pytest tests/ -v --ignore=tests/golden/

# Golden dual-run tests (require Stata 17)
pytest tests/golden/ -v
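
Because the golden tests need a local Stata 17, a test suite typically has to detect whether Stata is present before running them. The helper below is one hedged way to do that with the standard library. The STATA_EXE environment variable and the function name are hypothetical, and the candidate executable names are common console binary names on Linux and macOS, not something this project documents.

```python
import os
import shutil


def find_stata(candidates=("stata-mp", "stata-se", "stata")):
    """Return a path to a Stata executable, or None if none is found."""
    override = os.environ.get("STATA_EXE")  # hypothetical override variable
    if override:
        return override if os.path.isfile(override) else None
    for name in candidates:
        path = shutil.which(name)  # searches the directories on PATH
        if path:
            return path
    return None


print(find_stata() or "Stata not found; golden tests would be skipped")
```

A conftest.py could pass find_stata() is None as the condition of pytest.mark.skipif so that the golden directory is skipped cleanly on machines without Stata.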

Project structure

  • src/stataflow/estimators/ — Core Python estimators (OLS, AbsorbingOLS, Logit, PPMLHDFE, DIDImputation, etc.)
  • src/stataflow/compat/stata/ — Stata command wrappers (regress(), reghdfe(), ivregress_2sls(), etc.)
  • docs/command-support-matrix/ — Per-command support matrices
  • examples/ — Runnable demonstration scripts
  • tests/ — Unit and integration tests

Default target version

Stata 17


Documentation


Governance

  • Codex — project goals, architecture, review gates, and statistical-dispute arbitration.
  • Claude Code — implementation, testing, and evidence backfill.

Download files

  • Source distribution: stataflow-0.1.4.tar.gz (75.7 kB)
  • Built distribution: stataflow-0.1.4-py3-none-any.whl (79.1 kB, Python 3)
