
Fast Stata-Python bridge — independent drop-in replacement for pystata.

Project description

pystata-x

Independent drop-in replacement for StataCorp's pystata. Provides a fast stata_setup initialiser and a command-execution path that, in the benchmarks below, delivers a ~17,000–20,000× speedup on short single commands (~13× on a multi-line do-file) and ~11× faster cold Stata initialisation.

Quick Start

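```python
import sys
# Only needed when running from a source checkout rather than an installed package
sys.path.insert(0, "path/to/pystata-x/src")

from pystata_x.stata_setup import config
config("/Applications/StataMP", "mp", splash=False)

# Use our fast execution:
from pystata_x._core import execute
output, rc = execute("display 1+1")  # captured output text and Stata return code
print(output)  # "2"
```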

Or use the vendor-compatible API:

```python
from pystata_x._core import run
run("sysuse auto, clear")  # prints output, raises SystemError on error
```
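When scripting several commands, the (output, rc) pair returned by execute() can be checked after each call. The helper below is a minimal sketch and not part of pystata-x: run_checked is a hypothetical name, and it assumes (as the Quick Start implies) that execute() returns the captured output string plus a Stata return code where 0 means success. The execute parameter exists only so the helper can be exercised without Stata installed.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# Signature assumed from the Quick Start: execute(cmd) -> (output, rc)
ExecuteFn = Callable[[str], Tuple[str, int]]


def run_checked(commands: Iterable[str], execute: Optional[ExecuteFn] = None) -> List[str]:
    """Run commands in order and stop at the first non-zero return code.

    Hypothetical helper, not part of pystata-x.
    """
    if execute is None:
        # Import lazily so the helper can be tested with a stand-in executor.
        from pystata_x._core import execute  # type: ignore[no-redef]

    outputs = []
    for cmd in commands:
        output, rc = execute(cmd)
        if rc != 0:
            # Mirror run()'s documented behaviour of raising SystemError on error.
            raise SystemError(f"Stata returned r({rc}) for {cmd!r}:\n{output}")
        outputs.append(output)
    return outputs
```

For example, run_checked(["sysuse auto, clear", "summarize price"]) would return one output string per command, or raise SystemError at the first failing command.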

Why the polling thread is the bottleneck

The original pystata.stata.run() calls RedirectOutput from pystata.core.stout, which creates a RepeatTimer thread that polls Stata's output buffer every 15 ms:

  1. A background thread is created and started.
  2. Every 15 ms it calls StataSO_getOutput() to fetch and display output.
  3. After the command finishes, a "#return;0" sentinel appears in the output; the thread then exits and is joined.
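The polling pattern can be illustrated without Stata. The sketch below is a simplified stand-in, not StataCorp's code: FakeStataOutput is a hypothetical buffer, and a plain threading.Thread running a sleep loop stands in for RepeatTimer. Only the 15 ms interval and the "#return;0" sentinel come from the description above.

```python
import threading
import time

POLL_INTERVAL = 0.015      # 15 ms, as described above
SENTINEL = "#return;0"     # end-of-command marker


class FakeStataOutput:
    """Hypothetical thread-safe stand-in for Stata's output buffer."""

    def __init__(self):
        self._lock = threading.Lock()
        self._chunks = []

    def write(self, text):
        with self._lock:
            self._chunks.append(text)

    def get_output(self):
        # Return and clear everything buffered so far.
        with self._lock:
            text = "".join(self._chunks)
            self._chunks.clear()
            return text


def run_with_polling(buffer, command_fn):
    """Run command_fn while a background thread polls the buffer every 15 ms."""
    collected = []
    done = threading.Event()

    def poll():
        while not done.is_set():
            time.sleep(POLL_INTERVAL)
            chunk = buffer.get_output()
            if SENTINEL in chunk:
                collected.append(chunk.replace(SENTINEL, ""))
                done.set()          # sentinel seen: stop polling
            elif chunk:
                collected.append(chunk)

    poller = threading.Thread(target=poll, daemon=True)
    poller.start()
    try:
        command_fn(buffer)          # the "command" writes its output
    finally:
        buffer.write(SENTINEL)      # signal completion
        poller.join()               # wait for the poller to see the sentinel
    return "".join(collected)
```

Even when the command itself is instantaneous, the caller waits for at least one 15 ms sleep plus thread start-up and join, which is the fixed per-call cost this section describes.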

This design exists to support Jupyter notebook interactivity — users see output streaming in as commands execute, like a live terminal. The polling sleep (15 ms) plus thread lifecycle overhead adds ~40 ms of Python overhead on every run() call:

pystata.stata.run()  →  ~40 ms total
   ├─ thread create   ~1 ms
   ├─ poll cycles     ~37 ms (2–3 × 15 ms sleeps)
   ├─ thread join     ~1 ms
   └─ work overhead   ~1 ms

For headless / CLI / AI-agent use cases (e.g., stata-agent), output is captured programmatically after the command finishes — no streaming to a terminal or notebook is needed. The polling thread is pure overhead.

pystata-x skips the thread entirely and calls StataSO_Execute() directly, then drains the output buffer once after execution.
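By contrast, a single-drain path runs the command synchronously and reads the buffer once. This sketch uses a hypothetical in-memory buffer (FakeStataOutput, defined here so the block stands alone) and an illustrative run_direct function; pystata-x itself calls StataSO_Execute() through the Stata shared library rather than a Python callable.

```python
import time


class FakeStataOutput:
    """Hypothetical stand-in for Stata's output buffer (single-threaded use)."""

    def __init__(self):
        self._chunks = []

    def write(self, text):
        self._chunks.append(text)

    def get_output(self):
        # Return and clear everything buffered so far.
        text = "".join(self._chunks)
        self._chunks.clear()
        return text


def run_direct(buffer, command_fn):
    """Execute synchronously, then drain the buffer exactly once."""
    command_fn(buffer)           # stands in for a direct StataSO_Execute() call
    return buffer.get_output()   # single drain after execution: no thread, no sleep


buf = FakeStataOutput()
start = time.perf_counter()
out = run_direct(buf, lambda b: b.write("2\n"))
elapsed_ms = (time.perf_counter() - start) * 1000
print(repr(out), f"{elapsed_ms:.4f} ms")
```

With no thread and no sleep, the only cost left is the command itself plus one buffer read.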

Benchmark Results

Measured on macOS (StataSE, Apple Silicon M4) using benchmarks/run_benchmarks.py. Each test runs in a fresh subprocess (Stata initialised once per test) with warm-up iterations before timing. Times are the mean of multiple iterations measured via time.perf_counter().
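The methodology above (warm-up iterations, then the mean of timed iterations using time.perf_counter()) can be sketched as follows. This is not the contents of benchmarks/run_benchmarks.py; mean_time_ms and its default iteration counts are hypothetical.

```python
import statistics
import time
from typing import Callable


def mean_time_ms(fn: Callable[[], object], warmup: int = 5, iterations: int = 50) -> float:
    """Return the mean wall-clock time of fn() in milliseconds.

    Hypothetical helper; the warm-up and iteration counts are illustrative.
    """
    if iterations < 1:
        raise ValueError("iterations must be >= 1")
    for _ in range(warmup):
        fn()                      # warm-up calls: results discarded, not timed
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples)
```

For instance, mean_time_ms(lambda: execute("display 1+1")) would time the fast path from the Quick Start once Stata has been initialised.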

Command execution

| Test | Original pystata | pystata-x | Speedup |
|---|---|---|---|
| Single command (display 1+1) | ~40.6 ms | ~0.002 ms | ~19,000× |
| Single command + echo | ~40.7 ms | ~0.002 ms | ~17,000× |
| Single command (quietly) | ~40.4 ms | ~0.002 ms | ~20,000× |
| Multi-line (4 commands, do-file) | ~41.9 ms | ~3.2 ms | ~13× |
| Raw StataSO_Execute (no wrapper) | ~0.002 ms | ~0.002 ms | 1× (baseline) |

Cold initialisation

| Method | Time | Speedup |
|---|---|---|
| Original stata_setup.config() (→ pystata) | ~1.50 s | 1× (baseline) |
| Optimised pystata_x._config.init() | ~0.13 s | ~11× |
| Optimised pystata_x.stata_setup.config() | ~0.13 s | ~11× |
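Timing cold initialisation requires a fresh interpreter for every sample; the benchmarks above use one subprocess per test. A minimal sketch, assuming the code being timed is passed as Python source: time_cold_start is a hypothetical helper, and the example times a standard-library import rather than Stata so it runs anywhere.

```python
import json
import subprocess
import sys


def time_cold_start(setup_code: str) -> float:
    """Run setup_code in a fresh Python process and return its duration in seconds.

    The timer runs inside the child, so interpreter start-up is excluded.
    """
    child = (
        "import json, time\n"
        "t0 = time.perf_counter()\n"
        f"{setup_code}\n"
        "print(json.dumps(time.perf_counter() - t0))\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", child],
        capture_output=True, text=True, check=True,
    )
    # The last stdout line holds the elapsed time; earlier lines may be splash output.
    return json.loads(result.stdout.strip().splitlines()[-1])


print(f"{time_cold_start('import decimal'):.4f} s")
```

To time pystata-x itself, setup_code would be the import and config(...) call from the Quick Start.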

Why cold init is faster

The original pystata.config.init() does several expensive things that pystata_x's init skips:

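| Step | Original | pystata-x |
|---|---|---|
| IPython/Jupyter probe | ~100 ms (imports IPython, checks for kernel) | Skipped |
| Preference-file I/O | ~50 ms (reads profile.ini from disk) | Skipped |
| Python 2 compat setup | ~30 ms (try/except on every str() conversion) | Removed |
| stata_setup wrapper overhead | ~50 ms (filesystem checks, extra imports) | Inlined |
| Total | ~1.50 s | ~0.13 s |

The itemised steps sum to about 230 ms, so they account for only part of the ~1.37 s difference between the totals.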

Project Structure

src/pystata_x/
├── __init__.py              # Package entry point
├── _config.py               # Fast Stata initialisation (no IPython/py2 compat)
├── _core.py                 # Fast command execution (direct StataSO_Execute)
└── stata_setup.py           # Drop-in replacement for PyPI `stata-setup`
benchmarks/
├── run_benchmarks.py        # Comprehensive benchmark runner
└── history/                 # Benchmark result history

Cross-platform

Shared-library discovery in _config.py supports macOS, Linux, and Windows:

| Platform | Library name | Search path |
|---|---|---|
| macOS | libstata-{be,se,mp}.dylib | Stata{BE,SE,MP}.app/Contents/MacOS/ |
| Linux | libstata-{be,se,mp}.so | {st_path}/ |
| Windows | libstata-{be,se,mp}.dll | {st_path}/ |
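The table translates into a small lookup. The function below sketches how such discovery might work and is not a copy of _config.py: stata_library_path is a hypothetical name, and it assumes the macOS app bundles are named StataBE.app, StataSE.app and StataMP.app inside st_path.

```python
import platform
from pathlib import PurePosixPath, PureWindowsPath
from typing import Optional

EDITIONS = {"be", "se", "mp"}


def stata_library_path(st_path: str, edition: str, system: Optional[str] = None) -> str:
    """Return the expected Stata shared-library path for a platform.

    Hypothetical sketch following the table: libstata-<edition>.<ext>.
    """
    edition = edition.lower()
    if edition not in EDITIONS:
        raise ValueError(f"unknown edition {edition!r}; expected one of {sorted(EDITIONS)}")
    system = system or platform.system()   # "Darwin", "Linux" or "Windows"

    if system == "Darwin":
        # Assumed bundle layout: <st_path>/Stata<EDITION>.app/Contents/MacOS/
        app = f"Stata{edition.upper()}.app"
        return str(PurePosixPath(st_path, app, "Contents", "MacOS", f"libstata-{edition}.dylib"))
    if system == "Linux":
        return str(PurePosixPath(st_path, f"libstata-{edition}.so"))
    if system == "Windows":
        return str(PureWindowsPath(st_path, f"libstata-{edition}.dll"))
    raise OSError(f"unsupported platform: {system}")
```

Passing system explicitly makes the lookup testable on any host; pure path classes are used so that no filesystem access happens until the caller checks whether the file exists.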

Licence

  • Our modules (_config.py, _core.py, stata_setup.py, __init__.py, and all files under benchmarks/) are original work, released under the GNU Affero General Public License v3.0.
  • The PyPI stata-setup package (v0.1.3, StataCorp LLC) is Apache 2.0 licensed; our stata_setup.py provides the same public API with a completely rewritten implementation under AGPL-3.0.



Download files


Source Distribution

pystata_x-0.1.0.tar.gz (36.4 kB)

Uploaded Source

Built Distribution


pystata_x-0.1.0-py3-none-any.whl (25.3 kB)

Uploaded Python 3

File details

Details for the file pystata_x-0.1.0.tar.gz.

File metadata

  • Download URL: pystata_x-0.1.0.tar.gz
  • Upload date:
  • Size: 36.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for pystata_x-0.1.0.tar.gz
Algorithm Hash digest
SHA256 1251085dde0551cdedf1b8df05c7af88adfce784426fc13d26d7a6de944d58a9
MD5 e29227d124c7d24a7dfb86ef8e250eb3
BLAKE2b-256 a28bb10c87dcf221f3cb51c88c3f90156e74789a596edcc4fafba02945981359


Provenance

The following attestation bundles were made for pystata_x-0.1.0.tar.gz:

Publisher: publish.yml on tmonk/pystata-x


File details

Details for the file pystata_x-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: pystata_x-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 25.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for pystata_x-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 38fee5e87cf98d63dfc23f49375b267d44557d335a2fefb3c678b286b2d4d847
MD5 149cabdb47e1aed822cf453461d5b9b5
BLAKE2b-256 b5c6115120038e132ee95bd9e10c9e1613e14f4c31248ae2ef885fb7508bdc0c


Provenance

The following attestation bundles were made for pystata_x-0.1.0-py3-none-any.whl:

Publisher: publish.yml on tmonk/pystata-x

