StataFlow: A Python econometrics toolkit aligned with Stata 17

A Python econometrics toolkit that reproduces Stata 17 results with high precision.


from stataflow.compat.stata import reghdfe

result = reghdfe(df, y="lwage", x=["exper", "edu"],
                 absorb="firm_id year_id", vce="cluster", cluster="firm_id")
result.display()

Features

  • 14 Stata commands in Python: regress, reghdfe, ivregress 2sls, ivreghdfe, logit, probit, poisson, ppmlhdfe, did_imputation, eventstudyinteract, csdid, rdrobust, areg, xtreg_fe
  • Stata-style regression table: result.display() produces a formatted output matching Stata's layout
  • High-dimensional fixed effects: iterative absorption via the method of alternating projections (MAP) handles 10K+ FE levels without memory issues; heterogeneous slopes can also be absorbed (absorb(firm_id##c.time))
  • Driscoll-Kraay panel HAC: time-series-robust standard errors with Bartlett kernel autocorrelation correction
  • Instrumental variables: 2SLS, GMM2S, and LIML estimators with weak-instrument diagnostics (Kleibergen-Paap F + Stock-Yogo critical values)
  • Binary, count, and PPML models: Logit, Probit, Poisson, and PPML with HDFE
  • Causal inference: DID (BJS imputation, Sun-Abraham, Callaway-Sant'Anna) with doubly-robust methods; Regression Discontinuity with 11 bandwidth selectors
  • Stata-compatible syntax: factor variables (i.group##c.post), analytic weights, multiple FEs
  • Validated against Stata 17: every public capability is backed by dual-run evidence that compares Python and Stata output field by field
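
The alternating-projections idea behind fixed-effect absorption can be illustrated without any library. The sketch below is not StataFlow's implementation: `demean_by` and `absorb_two_way` are hypothetical helper names, and the toy data is made up. It subtracts group means for one fixed effect, then the other, and repeats until the values stop changing, so no dummy-variable matrix is ever built.

```python
from collections import defaultdict


def demean_by(values, groups):
    """Subtract each group's mean from its members (one projection step)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for v, g in zip(values, groups):
        totals[g] += v
        counts[g] += 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]


def absorb_two_way(values, fe1, fe2, tol=1e-10, max_iter=1000):
    """Alternate the two projections until the largest change falls below tol."""
    current = list(values)
    for _ in range(max_iter):
        updated = demean_by(demean_by(current, fe1), fe2)
        change = max(abs(a - b) for a, b in zip(updated, current))
        current = updated
        if change < tol:
            break
    return current


# Hypothetical unbalanced panel: six observations, three firms, three years.
firm = ["a", "a", "a", "b", "b", "c"]
year = [1, 2, 3, 1, 2, 3]
wage = [10.0, 12.0, 15.0, 8.0, 9.0, 20.0]

resid = absorb_two_way(wage, firm, year)
```

After convergence the residualised variable has (numerically) zero mean within every firm and every year. Applying the same transformation to the outcome and each regressor before running OLS is the usual way such absorbed regressions are computed.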

Installation

pip install StataFlow

Python 3.10+ required. Dependencies: NumPy, pandas, SciPy.

Quick Start

Stata-compatible API

import pandas as pd
from stataflow.compat.stata import regress, reghdfe, logit, ivregress_2sls, ppmlhdfe

# df is assumed to be a pandas DataFrame containing the columns referenced below

# OLS with robust standard errors
result = regress(df, y="wage", x=["edu", "exper"], vce="robust")
result.display()

# High-dimensional fixed effects
result = reghdfe(
    df, y="wage", x=["edu", "exper"],
    absorb="firm_id year_id", vce="cluster", cluster="industry"
)

# Logit
result = logit(df, y="inlf", x=["nwifeinc", "educ", "exper"])
result.display()

# 2SLS with robust standard errors
result = ivregress_2sls(
    df, y="lwage", x_exog=["edu"], x_endog=["exper"],
    instruments=["age", "kidslt6"], vce="robust"
)

# PPML with HDFE
result = ppmlhdfe(
    df, y="trade", x=["lndist", "contig", "fta"],
    absorb=["exporter", "importer", "year"], vce="cluster", cluster="exporter"
)
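
To make the instrumental-variables call above more concrete, here is a minimal sketch of the estimator in its simplest form: one endogenous regressor, one instrument, no other covariates. In that just-identified case 2SLS reduces to a ratio of covariances. The function names and the toy data are assumptions made for illustration and are not part of StataFlow.

```python
def covariance(a, b):
    """Sample covariance of two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cross = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    return cross / (len(a) - 1)


def iv_estimate(y, x, z):
    """Just-identified IV slope: cov(z, y) / cov(z, x)."""
    return covariance(z, y) / covariance(z, x)


# Hypothetical data: the error u is uncorrelated with the instrument z
# but enters x directly, so x is endogenous. The true slope is 2.
z = [1.0, 2.0, 3.0, 4.0]
u = [1.0, -1.0, -1.0, 1.0]
x = [zi + ui for zi, ui in zip(z, u)]
y = [2.0 * xi + ui for xi, ui in zip(x, u)]

iv_slope = iv_estimate(y, x, z)                   # recovers the true slope
ols_slope = covariance(x, y) / covariance(x, x)   # biased upward by cov(x, u)
```

With several instruments or additional exogenous regressors the ratio form no longer applies and the estimator needs the full first-stage projection, which is what a 2SLS routine handles.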

Native Python API

from stataflow import OLS, AbsorbingOLS, Logit

model = OLS(data=df, y="wage", x=["edu", "exper"])
result = model.fit(vce="robust")
result.display()

Using results

# Stata-style table
result.display()
result.display(show_ci=True)  # with confidence intervals

# Programmatic access
for c in result.coefficients:
    print(f"{c.name}: b={c.beta:.6f}, se={c.std_err:.6f}, t={c.t_stat:.2f}")

print(f"R² = {result.fit.r2:.4f}, N = {result.sample.nobs}")
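
The coefficient loop above extends naturally to collecting rows for export. The sketch below assumes only the attribute names shown in that loop (`name`, `beta`, `std_err`); `Coef` is a hypothetical stand-in for whatever object `result.coefficients` yields, `coef_rows` is an illustrative helper, and the numbers are invented. The interval uses a normal approximation, so it can differ slightly from the t-based intervals Stata reports for linear models.

```python
from collections import namedtuple
from statistics import NormalDist

# Hypothetical stand-in for the items of result.coefficients.
Coef = namedtuple("Coef", ["name", "beta", "std_err"])


def coef_rows(coefficients, level=0.95):
    """Return one dict per coefficient with a normal-approximation CI."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    rows = []
    for c in coefficients:
        half_width = z * c.std_err
        rows.append({
            "name": c.name,
            "beta": c.beta,
            "std_err": c.std_err,
            "ci_low": c.beta - half_width,
            "ci_high": c.beta + half_width,
        })
    return rows


# Invented values, for illustration only.
rows = coef_rows([Coef("edu", 0.10, 0.02), Coef("exper", 0.03, 0.01)])
```

A list of dicts like this can be passed directly to `pandas.DataFrame(rows)` for further processing.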

Supported Models

| Family | Available via | Estimators & VCE |
|---|---|---|
| Linear | regress, areg, xtreg_fe, reghdfe | OLS with ols / robust (HC1) / cluster (1-way, 2-way) / dkraay (panel HAC) |
| IV | ivregress_2sls, ivreghdfe | 2SLS, GMM2S, LIML (with Fuller), first-stage diagnostics, weak-IV tests |
| Binary / Count | logit, probit, poisson | MLE with ols / robust / cluster |
| PPML + HDFE | ppmlhdfe | IRLS with ols / robust / cluster, separation detection, eform |
| DID | did_imputation, csdid, eventstudyinteract | BJS imputation, Callaway-Sant'Anna (reg + DR), Sun-Abraham IW |
| RDD | rdrobust | Sharp / Fuzzy RD, 11 MSE+CER bandwidth selectors, cluster/nncluster VCE |
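
The dkraay option relies on a Bartlett kernel, which down-weights autocovariances linearly as the lag grows. The following is a minimal scalar sketch of that weighting, not StataFlow's code: both function names are hypothetical, and a Driscoll-Kraay estimator applies the same weights to the time series of cross-sectional moment sums rather than to a single series.

```python
def bartlett_weights(max_lag):
    """Bartlett weights 1 - lag / (max_lag + 1) for lags 1..max_lag."""
    return [1.0 - lag / (max_lag + 1) for lag in range(1, max_lag + 1)]


def hac_long_run_variance(series, max_lag):
    """Bartlett-weighted long-run variance of a scalar time series."""
    n = len(series)
    mean = sum(series) / n
    dev = [s - mean for s in series]
    # Lag-0 term: the ordinary (population) variance.
    total = sum(d * d for d in dev) / n
    for lag, weight in zip(range(1, max_lag + 1), bartlett_weights(max_lag)):
        # Autocovariance at this lag, counted twice for symmetry.
        gamma = sum(dev[t] * dev[t - lag] for t in range(lag, n)) / n
        total += 2.0 * weight * gamma
    return total


weights = bartlett_weights(3)
no_lag_variance = hac_long_run_variance([1.0, 2.0, 3.0, 4.0], max_lag=0)
```

With `max_lag=0` the function collapses to the plain variance, which is a quick way to check the weighting logic before adding lags.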

Running Tests

# Unit and integration tests
pytest tests/ -v --ignore=tests/golden/

# Golden dual-run tests (require local Stata 17)
pytest tests/golden/ -v

License

This project is licensed under the MIT License. See LICENSE for details.
