StataFlow: A Python econometrics toolkit aligned with Stata 17
StataFlow
A Python econometrics toolkit that reproduces Stata 17 results with high precision.
from stataflow.compat.stata import reghdfe
result = reghdfe(df, y="lwage", x=["exper", "edu"],
                 absorb="firm_id year_id", vce="cluster", cluster="firm_id")
result.display()
Features
- 14 Stata commands in Python: regress, reghdfe, ivregress 2sls, ivreghdfe, logit, probit, poisson, ppmlhdfe, did_imputation, eventstudyinteract, csdid, rdrobust, areg, xtreg_fe
- Stata-style regression table: result.display() produces formatted output matching Stata's layout
- High-dimensional fixed effects: MAP iterative absorption handles 10K+ FE levels without memory issues; individual slope absorption (absorb(firm_id##c.time))
- Driscoll-Kraay panel HAC: time-series-robust standard errors with Bartlett kernel autocorrelation correction
- Instrumental variables: 2SLS, GMM2S, and LIML estimators with weak-instrument diagnostics (Kleibergen-Paap F + Stock-Yogo critical values)
- Binary, count, and PPML models: Logit, Probit, Poisson, and PPML with HDFE
- Causal inference: DID (BJS imputation, Sun-Abraham, Callaway-Sant'Anna) with doubly-robust methods; Regression Discontinuity with 11 bandwidth selectors
- Stata-compatible syntax: factor variables (i.group##c.post), analytic weights, multiple FEs
- Validated against Stata 17: every public capability has field-level Python-Stata dual-run evidence
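For readers unfamiliar with the term, MAP (method of alternating projections) absorbs fixed effects by repeatedly demeaning a variable within each fixed-effect dimension until nothing changes. The following is a minimal pure-Python sketch of that idea on a toy unbalanced panel. It is not StataFlow's implementation; the function names `demean_by` and `absorb`, the tolerance, and the data are all assumptions made for illustration.

```python
from collections import defaultdict


def demean_by(values, groups):
    """Subtract each group's mean from its members (one projection step)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def absorb(values, factors, tol=1e-10, max_iter=1000):
    """Alternate group-demeaning over all factors until a full sweep changes nothing."""
    current = list(values)
    for _ in range(max_iter):
        previous = current
        for groups in factors:
            current = demean_by(current, groups)
        if max(abs(a - b) for a, b in zip(current, previous)) < tol:
            break
    return current


# Toy unbalanced panel (illustrative values only): 3 firms observed over 3 years.
firm = ["a", "a", "a", "b", "b", "c", "c"]
year = [1, 2, 3, 1, 2, 2, 3]
wage = [2.0, 2.5, 3.1, 1.8, 2.2, 4.0, 4.3]

residual = absorb(wage, [firm, year])
print([round(r, 4) for r in residual])
```

Only per-group sums and counts are stored, never a dummy-variable matrix, which is why this style of absorption scales to many fixed-effect levels. Regressing the absorbed outcome on absorbed regressors then gives the same slope coefficients as including the full set of dummies.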
Installation
pip install StataFlow
Python 3.10+ required. Dependencies: NumPy, pandas, SciPy.
Quick Start
Stata-compatible API
import pandas as pd
from stataflow.compat.stata import regress, reghdfe, logit, ivregress_2sls, ppmlhdfe
# OLS with robust standard errors
result = regress(df, y="wage", x=["edu", "exper"], vce="robust")
result.display()
# High-dimensional fixed effects
result = reghdfe(
    df, y="wage", x=["edu", "exper"],
    absorb="firm_id year_id", vce="cluster", cluster="industry"
)
# Logit
result = logit(df, y="inlf", x=["nwifeinc", "educ", "exper"])
result.display()
# 2SLS (instrumental variables) with robust standard errors
result = ivregress_2sls(
    df, y="lwage", x_exog=["edu"], x_endog=["exper"],
    instruments=["age", "kidslt6"], vce="robust"
)
# PPML with HDFE
result = ppmlhdfe(
    df, y="trade", x=["lndist", "contig", "fta"],
    absorb=["exporter", "importer", "year"], vce="cluster", cluster="exporter"
)
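For intuition about what a 2SLS call estimates, here is a minimal pure-Python sketch of two-stage least squares with one endogenous regressor and one instrument. It does not use StataFlow; the helper names `ols_fit` and `two_stage_least_squares` and the toy data are assumptions for illustration. The data are constructed so that the instrument `z` is exactly uncorrelated with the confounder `u`, while the regressor `x` is not.

```python
from statistics import fmean


def ols_fit(y, x):
    """Return (intercept, slope) from a simple least-squares regression of y on x."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def two_stage_least_squares(y, x, z):
    """Stage 1: project x on z. Stage 2: regress y on the fitted values."""
    a, b = ols_fit(x, z)
    x_hat = [a + b * zi for zi in z]
    return ols_fit(y, x_hat)


# Illustrative data: true slope is 2, u is an unobserved confounder.
z = [1, 2, 3, 4, 5, 6]
u = [1, -1, 0, 0, -1, 1]
x = [zi + ui for zi, ui in zip(z, u)]
y = [1 + 2 * xi + 3 * ui for xi, ui in zip(x, u)]

_, ols_slope = ols_fit(y, x)
_, iv_slope = two_stage_least_squares(y, x, z)
print(f"OLS slope: {ols_slope:.3f}  2SLS slope: {iv_slope:.3f}")
```

In this constructed example OLS overstates the slope because `x` moves with `u`, while the two-stage estimate recovers the true value. Standard errors from a naive second stage would be wrong, which is one reason to rely on a dedicated estimator in practice.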
Native Python API
from stataflow import OLS, AbsorbingOLS, Logit
model = OLS(data=df, y="wage", x=["edu", "exper"])
result = model.fit(vce="robust")
result.display()
Using results
# Stata-style table
result.display()
result.display(show_ci=True) # with confidence intervals
# Programmatic access
for c in result.coefficients:
    print(f"{c.name}: b={c.beta:.6f}, se={c.std_err:.6f}, t={c.t_stat:.2f}")
print(f"R² = {result.fit.r2:.4f}, N = {result.sample.nobs}")
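The per-coefficient fields shown above are enough to build derived quantities yourself. The sketch below computes confidence intervals from `beta` and `std_err` using a normal approximation. It runs on stand-in objects rather than a real StataFlow result: the `SimpleNamespace` entries and their numbers are made up for illustration, and `normal_ci` is a hypothetical helper, not part of the package.

```python
from statistics import NormalDist
from types import SimpleNamespace


def normal_ci(beta, std_err, level=0.95):
    """Two-sided confidence interval using the standard normal quantile."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return beta - z * std_err, beta + z * std_err


# Stand-ins mimicking the name / beta / std_err attributes (values are made up).
coefficients = [
    SimpleNamespace(name="edu", beta=0.10, std_err=0.02),
    SimpleNamespace(name="exper", beta=0.03, std_err=0.01),
]

for c in coefficients:
    lo, hi = normal_ci(c.beta, c.std_err)
    print(f"{c.name}: b={c.beta:.4f}, 95% CI=[{lo:.4f}, {hi:.4f}]")
```

Note that Stata reports t-based intervals for linear models, so in small samples these normal-approximation bounds will be slightly narrower than the ones printed by `result.display(show_ci=True)`.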
Supported Models
| Family | Available via | Estimators & VCE |
|---|---|---|
| Linear | regress, areg, xtreg_fe, reghdfe | OLS with ols / robust (HC1) / cluster (1-way, 2-way) / dkraay (panel HAC) |
| IV | ivregress_2sls, ivreghdfe | 2SLS, GMM2S, LIML (with Fuller), first-stage diagnostics, weak-IV tests |
| Binary / Count | logit, probit, poisson | MLE with ols / robust / cluster |
| PPML + HDFE | ppmlhdfe | IRLS with ols / robust / cluster, separation detection, eform |
| DID | did_imputation, csdid, eventstudyinteract | BJS imputation, Callaway-Sant'Anna (reg + DR), Sun-Abraham IW |
| RDD | rdrobust | Sharp / Fuzzy RD, 11 MSE+CER bandwidth selectors, cluster/nncluster VCE |
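The dkraay option refers to a panel HAC estimator built on Bartlett kernel weights. As a rough illustration of the kernel itself, the sketch below computes Bartlett weights and a Newey-West style long-run variance for a single scalar series. Driscoll-Kraay applies the same weighting to cross-sectional averages of the moment conditions, which is not shown here. The function names and the lag choice are assumptions, and this is not StataFlow's code.

```python
def bartlett_weights(max_lag):
    """Weights w_j = 1 - j / (max_lag + 1) for j = 0..max_lag."""
    return [1 - j / (max_lag + 1) for j in range(max_lag + 1)]


def long_run_variance(series, max_lag):
    """Bartlett-weighted sum of sample autocovariances (Newey-West form)."""
    n = len(series)
    mean = sum(series) / n
    dev = [s - mean for s in series]
    weights = bartlett_weights(max_lag)
    lrv = sum(d * d for d in dev) / n  # lag-0 autocovariance
    for j in range(1, max_lag + 1):
        gamma_j = sum(dev[t] * dev[t - j] for t in range(j, n)) / n
        lrv += 2 * weights[j] * gamma_j
    return lrv


print(bartlett_weights(3))
print(long_run_variance([1, -1, 1, -1, 1, -1], max_lag=2))
```

The linearly declining weights keep the estimated variance non-negative, even for a strongly negatively autocorrelated series like the one above. With `max_lag=0` the function reduces to the ordinary (population-normalized) variance.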
Documentation
- User Guide: full tutorial and concept guide (Chinese version: User Manual)
- Cookbook: copy-pasteable recipes for common tasks (Chinese version: Chinese Cookbook)
- Examples — runnable demo scripts
Running Tests
# Unit and integration tests
pytest tests/ -v --ignore=tests/golden/
# Golden dual-run tests (require local Stata 17)
pytest tests/golden/ -v
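To make the idea of a field-level dual-run check concrete, here is a hypothetical sketch of how Python output might be compared against stored Stata output. The helper `compare_fields`, the field names, the tolerances, and the numbers are all assumptions for illustration; the project's actual golden tests may be organized quite differently.

```python
import math


def compare_fields(python_result, stata_result, rel_tol=1e-8, abs_tol=1e-10):
    """Return (field, python_value, stata_value) for every field that disagrees."""
    mismatches = []
    for field, expected in stata_result.items():
        actual = python_result.get(field)
        if actual is None or not math.isclose(
            actual, expected, rel_tol=rel_tol, abs_tol=abs_tol
        ):
            mismatches.append((field, actual, expected))
    return mismatches


# Placeholder numbers, not real estimation output.
stata_fields = {"b_edu": 0.1070, "se_edu": 0.0141, "N": 753}
python_fields = {"b_edu": 0.1070, "se_edu": 0.0142, "N": 753}

for field, actual, expected in compare_fields(python_fields, stata_fields):
    print(f"{field}: python={actual} stata={expected}")
```

Reporting every mismatching field, rather than stopping at the first one, makes it easier to see whether a discrepancy is isolated (for example, one standard error) or systematic.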
License
This project is licensed under the MIT License. See LICENSE for details.