Stata2Python: A Python econometrics toolkit aligned with Stata 17

Stata2Python

Stata2Python (statapy) is a Python econometrics toolkit that reproduces Stata 17 estimation results with high precision. It provides both a Stata-compatible command layer (for researchers migrating from Stata) and a native Python estimator layer (for advanced users who want direct control).

What you can do today

  • Run Stata-style commands in Python: regress, reghdfe, ivregress 2sls, logit, ppmlhdfe, did_imputation, csdid, rdrobust, and more.
  • Obtain coefficients, standard errors, t/z-statistics, p-values, and confidence intervals that are field-level verified against Stata 17.
  • Work with high-dimensional fixed effects (HDFE), IV/2SLS, binary/count models, and DID/event-study estimators.
  • Use Stata-style factor-variable syntax (i.group##c.post, c.x1#c.x2, x1##x2) and space-separated absorb strings directly in wrapper commands. Bare variables inside # / ## are treated as continuous, matching common Stata usage.
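
To make the factor-variable convention above concrete, here is a minimal sketch of how a `##` term could be expanded into main effects and interactions. It is not the statapy parser: `expand_term` and `_normalize` are hypothetical names, and the only rule taken from this README is that bare variables inside `#` / `##` are treated as continuous (`c.`).

```python
from itertools import combinations


def _normalize(var: str) -> str:
    """Prefix a bare variable with c., following the convention stated above."""
    return var if var.startswith(("i.", "c.")) else f"c.{var}"


def expand_term(term: str) -> list[str]:
    """Expand one factor-variable term into main effects and interactions.

    Hypothetical helper for illustration; mixed forms such as a##b#c are
    rejected instead of guessed at.
    """
    if "##" in term:
        raw_parts = term.split("##")
        if any("#" in part for part in raw_parts):
            raise ValueError(f"mixed # and ## is not handled in this sketch: {term}")
        parts = [_normalize(part) for part in raw_parts]
        # a##b stands for a, b, and a#b; with k operands this is every
        # non-empty subset of the operands, in order of increasing size.
        return [
            "#".join(combo)
            for size in range(1, len(parts) + 1)
            for combo in combinations(parts, size)
        ]
    if "#" in term:
        # A single # is the interaction only, without main effects.
        return ["#".join(_normalize(part) for part in term.split("#"))]
    return [term]


# A space-separated absorb string splits into one name per fixed effect.
absorb_vars = "firm_id year_id".split()
```

Under these assumptions, `expand_term("i.group##c.post")` yields the two main effects followed by their interaction, while `expand_term("c.x1#c.x2")` yields the interaction alone.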

What is not yet supported

  • Multi-way clustering — only single-cluster robust inference is available.
  • Direct post-estimation on wrapper returns — the compat.stata wrappers return ResultSchema result objects. predict and margins are available on the core estimator layer only.
  • Full command surfaces for community commands: reghdfe, ivreghdfe, ppmlhdfe, did_imputation, eventstudyinteract, csdid, and rdrobust are implemented as verified high-frequency subsets, not complete Stata command reproductions. Unsupported options are explicitly rejected rather than silently ignored.
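
The "rejected rather than silently ignored" behaviour can be pictured with a small sketch. This is an assumption about the pattern, not the wrappers' actual code: `check_options`, the option names, and the choice of `NotImplementedError` are all hypothetical.

```python
def check_options(command: str, supported: set[str], **options) -> dict:
    """Return the options unchanged, or fail loudly if any is unsupported.

    Hypothetical helper: the real wrappers may use a different exception
    type and a different message format.
    """
    unknown = sorted(set(options) - supported)
    if unknown:
        raise NotImplementedError(
            f"{command}: unsupported option(s): {', '.join(unknown)}"
        )
    return options


# Illustrative whitelist only; see the Command Support Matrix for real coverage.
REGHDFE_OPTIONS = {"absorb", "vce", "cluster"}
accepted = check_options("reghdfe", REGHDFE_OPTIONS, absorb="firm_id year_id")
```

Failing at call time means a migrated do-file cannot quietly produce estimates that differ from Stata because an option was dropped.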

Completeness legend

  • Stable — synthetic + real-data dual-run verified; core API is unlikely to change.
  • Alpha — high-frequency paths are implemented and verified, but the command surface is still a subset of the full Stata community command.
  • Alpha — Partial — a verifiable implementation exists, but large functional areas are still missing (e.g., fuzzy RD for rdrobust, multi-way clustering).

See the Command Support Matrix for the per-command detailed status. For the public Stata-vs-Python evidence book, see Validation Overview.


Installation

pip install -e .

Requirements: Python 3.10+, NumPy, pandas, SciPy.


Quick start

Stata-compatible command layer (recommended)

All compat.stata wrappers return a ResultSchema object with coefficients, standard errors, and fit statistics. They do not expose .predict() or .margins() directly—use the core estimator layer below for post-estimation.

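import pandas as pd
from statapy.compat.stata import regress, reghdfe, ivregress_2sls, logit

# Each call below assumes df is a pandas DataFrame that already contains
# the columns it names; the calls are independent examples.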

# OLS with robust standard errors
result = regress(df, y="wage", x=["edu", "exper"], vce="robust")

# High-dimensional fixed effects (reghdfe)
result = reghdfe(
    df, y="wage", x=["edu", "exper"],
    absorb="firm_id year_id", vce="cluster", cluster="industry"
)

# Factor-variable syntax in HDFE
result = reghdfe(
    df, y="wage", x=["i.industry##c.post"], absorb="firm_id year_id"
)

# 2SLS
result = ivregress_2sls(
    df, y="lwage", x_exog=["edu"], x_endog=["exper"],
    instruments=["age", "kidslt6"], vce="robust"
)

# Logit
result = logit(df, y="inlf", x=["nwifeinc", "educ", "exper"])

For runnable examples, see the examples/ directory.

Native Python estimator layer (advanced)

from statapy import OLS, FixedEffectsOLS, AbsorbingOLS, Logit, IV2SLS

model = OLS(data=df, y="wage", x=["edu", "exper"])
result = model.fit(vce="robust")
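
For readers new to Stata's conventions, the sketch below shows what `vce="robust"` corresponds to in Stata's `regress`: the HC1 sandwich estimator, which scales the heteroskedasticity-robust variance by n/(n-k). It uses only the standard library and a single regressor plus a constant so the formula stays visible; `ols_hc1` is a hypothetical name and this is not how statapy computes it internally.

```python
import math


def ols_hc1(x: list[float], y: list[float]) -> tuple[float, float, float]:
    """Simple regression y = a + b*x with an HC1 standard error for b."""
    n = len(x)
    k = 2  # estimated parameters: intercept and slope
    xbar = sum(x) / n
    ybar = sum(y) / n
    dx = [xi - xbar for xi in x]
    sxx = sum(d * d for d in dx)

    slope = sum(d * (yi - ybar) for d, yi in zip(dx, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

    # Sandwich "meat" for the slope: sum of (demeaned x * residual)^2.
    meat = sum((d * e) ** 2 for d, e in zip(dx, resid))
    # HC1 applies the n/(n-k) small-sample factor to the HC0 variance.
    se_slope = math.sqrt(n / (n - k) * meat) / sxx
    return slope, intercept, se_slope


slope, intercept, se_slope = ols_hc1(
    [1.0, 2.0, 3.0, 4.0, 5.0], [2.1, 3.9, 6.2, 7.8, 10.1]
)
```

The same structure generalises to several regressors with matrix algebra, which is where NumPy would normally take over.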

Supported commands

| Command | Python entry | Core capabilities |
|---|---|---|
| regress | statapy.compat.stata.regress | OLS, robust, cluster, aweight |
| xtreg, fe | statapy.compat.stata.xtreg_fe | Fixed effects (within), cluster |
| areg | statapy.compat.stata.areg | Single absorb variable FE |
| reghdfe | statapy.compat.stata.reghdfe | 1-2 group HDFE, cluster, singleton drop |
| ivregress 2sls | statapy.compat.stata.ivregress_2sls | 2SLS, robust, cluster |
| ivreghdfe | statapy.compat.stata.ivreghdfe | IV + 1-2 group HDFE, cluster |
| logit | statapy.compat.stata.logit | MLE, robust, cluster |
| probit | statapy.compat.stata.probit | MLE, robust, cluster |
| poisson | statapy.compat.stata.poisson | MLE, robust, cluster |
| ppmlhdfe | statapy.compat.stata.ppmlhdfe | PPML + 1-2 group HDFE |
| did_imputation | statapy.compat.stata.did_imputation | BJS DID imputation |
| eventstudyinteract | statapy.compat.stata.eventstudyinteract | Sun & Abraham IW estimator |
| csdid | statapy.compat.stata.csdid | Callaway-Sant'Anna DID (method="reg" only) |
| rdrobust | statapy.compat.stata.rdrobust | Sharp RD local polynomial (bwselect="mserd", covs) |

Full details: docs/command-support-matrix/README.md
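
The "singleton drop" listed for reghdfe refers to removing observations that are alone in a fixed-effect group, repeated until none remain, because removing one observation can turn another into a singleton. The following is a rough stdlib sketch of that idea, not the statapy implementation; `drop_singletons` and the row layout are hypothetical.

```python
from collections import Counter


def drop_singletons(rows: list[dict], absorb: list[str]) -> list[int]:
    """Return the indices of rows that survive iterative singleton removal."""
    keep = list(range(len(rows)))
    while True:
        dropped = False
        for var in absorb:
            # Count group sizes among the rows that are still kept.
            counts = Counter(rows[i][var] for i in keep)
            survivors = [i for i in keep if counts[rows[i][var]] > 1]
            if len(survivors) < len(keep):
                keep, dropped = survivors, True
        if not dropped:
            return keep


rows = [
    {"firm_id": "A", "year_id": 1},
    {"firm_id": "A", "year_id": 2},
    {"firm_id": "B", "year_id": 1},  # only observation for firm B
    {"firm_id": "C", "year_id": 3},
    {"firm_id": "C", "year_id": 3},
    {"firm_id": "A", "year_id": 2},
]
kept = drop_singletons(rows, "firm_id year_id".split())
```

In this toy data the firm B row goes first, which leaves the first row as the only observation in year 1, so it is dropped as well. That cascade is why the loop runs until a full pass removes nothing.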


Validation philosophy

Every public command is validated with two lines of evidence:

  1. Synthetic / controlled cases — formula, degrees of freedom, sample screening, edge cases.
  2. Real public datasets — field-level comparison against Stata 17 on openly available economic/financial data.

A command is considered "done" only when both lines pass and the source-to-Python mapping is documented. We do not accept "statistical equivalence" without explicit mathematical or source-code justification.
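
A field-level comparison of the kind described above might look like the sketch below. The field names, the nested-dict layout, and the tolerances are assumptions made for illustration; this README does not state the thresholds the project actually uses.

```python
import math


def compare_fields(
    python_result: dict[str, dict[str, float]],
    stata_result: dict[str, dict[str, float]],
    rel_tol: float = 1e-8,   # assumed tolerance, not the project's threshold
    abs_tol: float = 1e-10,  # guards comparisons of values near zero
) -> list[tuple]:
    """List every (term, field, actual, expected) that fails to match Stata."""
    mismatches = []
    for term, stata_fields in stata_result.items():
        for field, expected in stata_fields.items():
            actual = python_result.get(term, {}).get(field)
            if actual is None or not math.isclose(
                actual, expected, rel_tol=rel_tol, abs_tol=abs_tol
            ):
                mismatches.append((term, field, actual, expected))
    return mismatches


# Made-up numbers, purely to show the shape of the inputs.
stata_ref = {"edu": {"coef": 0.5, "se": 0.1}}
python_out = {"edu": {"coef": 0.5, "se": 0.1000001}}
report = compare_fields(python_out, stata_ref)
```

Iterating over the Stata reference rather than the Python output means a field that Python fails to report is flagged as a mismatch instead of being skipped.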

Running tests

# Unit and integration tests (fast)
pytest tests/ -v --ignore=tests/golden/

# Golden dual-run tests (require Stata 17)
pytest tests/golden/ -v

Project structure

  • src/statapy/estimators/ — Core Python estimators (OLS, AbsorbingOLS, Logit, PPMLHDFE, DIDImputation, etc.)
  • src/statapy/compat/stata/ — Stata command wrappers (regress(), reghdfe(), ivregress_2sls(), etc.)
  • docs/research/ — Source-to-Python mapping documents for community commands
  • docs/command-support-matrix/ — Per-command support matrices
  • tests/golden/ — Stata-Python dual-run tests
  • research/vendor/stata_community/ — Local mirrors of open-source Stata community packages (for research only)

Default target version

Stata 17


Documentation


Governance

  • Codex — project goals, architecture, review gates, and statistical-dispute arbitration.
  • Claude Code — implementation, testing, and evidence backfill.
