Fast Stata-Python bridge — independent drop-in replacement for pystata.

pystata-x

Independent drop-in replacement for StataCorp's pystata. It provides a fast stata_setup initialiser and a command execution path that, in the benchmarks below, run short single commands roughly 17,000–20,000× faster, multi-line do-file blocks ~13× faster, and cold Stata initialisation ~11× faster.

Quick Start

import sys
sys.path.insert(0, "path/to/pystata-x/src")

from pystata_x.stata_setup import config
config("/Applications/StataMP", "mp", splash=False)

# Use our fast execution:
from pystata_x._core import execute
output, rc = execute("display 1+1")
print(output)  # "2"

Or use the vendor-compatible API:

from pystata_x._core import run
run("sysuse auto, clear")  # prints output, raises SystemError on error
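
Because execute() is shown returning an (output, rc) pair while run() raises, a caller may want a thin wrapper that turns a non-zero return code into an exception. The sketch below is hypothetical: `run_checked` and `StataCommandError` are illustrative names, not part of pystata-x, and the executor is passed in as an argument so the example runs without Stata (a stub stands in for `execute`). It assumes rc == 0 means success, which is the usual Stata convention.

```python
from typing import Callable, Tuple


class StataCommandError(RuntimeError):
    """Hypothetical exception carrying Stata's return code and output."""

    def __init__(self, command: str, rc: int, output: str):
        super().__init__(f"{command!r} failed with rc={rc}")
        self.rc = rc
        self.output = output


def run_checked(execute: Callable[[str], Tuple[str, int]], command: str) -> str:
    """Run `command` via `execute` and return its output, raising on rc != 0."""
    output, rc = execute(command)
    if rc != 0:
        raise StataCommandError(command, rc, output)
    return output


def stub_execute(command: str) -> Tuple[str, int]:
    """Stand-in for pystata_x._core.execute so the sketch runs without Stata."""
    if command == "display 1+1":
        return "2", 0
    # 199 is Stata's return code for an unrecognized command.
    return "command not recognised", 199
```

With a real installation, `run_checked(execute, "display 1+1")` would take the imported `execute` in place of the stub.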

Why the polling thread is the bottleneck

The original pystata.stata.run() calls RedirectOutput from pystata.core.stout, which creates a RepeatTimer thread that polls Stata's output buffer every 15 ms:

  1. A background thread is created and started.
  2. Every 15 ms it calls StataSO_getOutput() to fetch and display output.
  3. After the command finishes, a "#return;0" sentinel appears; the thread then exits and is joined.

This design exists to support Jupyter notebook interactivity — users see output streaming in as commands execute, like a live terminal. The polling sleep (15 ms) plus thread lifecycle overhead adds ~40 ms of Python overhead on every run() call:

pystata.stata.run()  →  ~40 ms total
   ├─ thread create   ~1 ms
   ├─ poll cycles     up to ~45 ms (up to 3 × 15 ms)
   ├─ thread join     ~1 ms
   └─ work overhead   ~1 ms

For headless / CLI / AI-agent use cases (e.g., stata-agent), output is captured programmatically after the command finishes — no streaming to a terminal or notebook is needed. The polling thread is pure overhead.

pystata-x skips the thread entirely and calls StataSO_Execute() directly, then drains the output buffer once after execution.
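
The difference between the two styles can be sketched with a toy simulation. Everything here is an assumption made for illustration: `FakeStata` is a hypothetical stand-in for the shared library (it is not the real StataSO_* interface), and only the 15 ms interval and the "#return;0" sentinel are taken from the description above.

```python
import threading
import time

POLL_INTERVAL = 0.015      # 15 ms, the polling interval quoted above
SENTINEL = "#return;0"     # end-of-command marker quoted above


class FakeStata:
    """Hypothetical stand-in for the Stata shared library (not the real API)."""

    def __init__(self):
        self._chunks = []
        self._lock = threading.Lock()

    def execute(self, command):
        # Pretend the command ran instantly and left output plus the sentinel.
        with self._lock:
            self._chunks += [f"ran: {command}\n", SENTINEL]

    def get_output(self):
        # Return and clear whatever is currently buffered.
        with self._lock:
            text = "".join(self._chunks)
            self._chunks.clear()
        return text


def run_with_polling(stata, command):
    """Polling style: a helper thread checks the buffer every POLL_INTERVAL."""
    collected = []
    finished = threading.Event()

    def poll():
        # wait() returns False on timeout, so the body runs once per interval.
        while not finished.wait(POLL_INTERVAL):
            text = stata.get_output()
            if SENTINEL in text:
                text = text.replace(SENTINEL, "")
                finished.set()
            collected.append(text)

    poller = threading.Thread(target=poll)
    poller.start()
    stata.execute(command)
    poller.join()
    return "".join(collected)


def run_direct(stata, command):
    """Direct style: execute, then drain the buffer exactly once."""
    stata.execute(command)
    return stata.get_output().replace(SENTINEL, "")


def elapsed_ms(fn, *args):
    """Call fn(*args) and return (result, wall-clock milliseconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, (time.perf_counter() - start) * 1000.0
```

In this toy, the polling version cannot return in less than one interval even though the command itself is instantaneous, while the direct version returns as soon as the buffer is drained. The real overhead figures are the ones in the benchmark tables, not anything this simulation measures.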

Benchmark Results

Measured on macOS (StataSE, Apple Silicon M4) using benchmarks/run_benchmarks.py. Each test runs in a fresh subprocess (Stata initialised once per test) with warm-up iterations before timing. Times are the mean of multiple iterations measured via time.perf_counter().
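
The timing approach described above (untimed warm-up runs, then a mean over timed iterations using time.perf_counter()) can be sketched as follows. This is not the code in benchmarks/run_benchmarks.py; `bench` is a hypothetical helper, and the warm-up and iteration counts are placeholder assumptions, since the actual values are not stated here.

```python
import time
from statistics import mean


def bench(fn, warmup=3, iterations=20):
    """Return the mean wall-clock time of fn() in milliseconds."""
    for _ in range(warmup):        # warm-up runs are executed but not timed
        fn()
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return mean(samples)
```

A call such as `bench(lambda: execute("display 1+1"))` would then give a per-call mean comparable in spirit to the figures below.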

Command execution

| Test | Original pystata | pystata-x | Speedup |
|---|---|---|---|
| Single command (display 1+1) | ~40.6 ms | ~0.002 ms | ~19,000× |
| Single command + echo | ~40.7 ms | ~0.002 ms | ~17,000× |
| Single command (quietly) | ~40.4 ms | ~0.002 ms | ~20,000× |
| Multi-line (4 commands, do-file) | ~41.9 ms | ~3.2 ms | ~13× |
| Raw StataSO_Execute (no wrapper) | ~0.002 ms | ~0.002 ms | 1× (baseline) |
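
One way to read these numbers: if the original path adds a roughly fixed ~40 ms per call on top of the work itself, the speedup is (overhead + work) / work, which is enormous for microsecond commands and shrinks as the command itself gets slower. The sketch below is a back-of-envelope model under that additive assumption, not a measurement; `FIXED_OVERHEAD_MS` is taken from the ~40 ms figure quoted earlier and `modelled_speedup` is a hypothetical helper.

```python
FIXED_OVERHEAD_MS = 40.0  # assumed per-call overhead of the polling path


def modelled_speedup(work_ms, overhead_ms=FIXED_OVERHEAD_MS):
    """Speedup from removing a fixed overhead: (overhead + work) / work."""
    return (overhead_ms + work_ms) / work_ms


# Print the modelled speedup for a range of per-command work times.
for work_ms in (0.002, 3.2, 100.0, 1000.0):
    print(f"work {work_ms:>8} ms -> ~{modelled_speedup(work_ms):,.1f}x")
```

Under this model a 3.2 ms workload gives about 13.5×, close to the ~13× measured for the multi-line case, while a one-second command gains only about 4%. The headline single-command ratios are therefore specific to very short commands.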

Cold initialisation

| Method | Time | Speedup |
|---|---|---|
| Original stata_setup.config() (→ pystata) | ~1.50 s | (baseline) |
| Optimised pystata_x._config.init() | ~0.13 s | ~11× |
| Optimised pystata_x.stata_setup.config() | ~0.13 s | ~11× |

Why cold init is faster

The original pystata.config.init() does several expensive things that pystata_x's init skips:

| Step | Original | pystata-x |
|---|---|---|
| IPython/Jupyter probe | ~100 ms (imports IPython, checks for kernel) | Skipped |
| Preference-file I/O | ~50 ms (reads profile.ini from disk) | Skipped |
| Python 2 compat setup | ~30 ms (try/except on every str() conversion) | Removed |
| stata_setup wrapper overhead | ~50 ms (filesystem checks, extra imports) | Inlined |
| Total | ~1.50 s | ~0.13 s |
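
Per-step costs like these are the kind of figure one can check by timing an import in isolation. The following is a minimal sketch that assumes nothing about pystata-x internals: `time_import` is a hypothetical helper, and the small stdlib module `colorsys` is used only so the example runs anywhere (substitute `IPython` if it is installed). CPython's `-X importtime` flag gives a per-module breakdown of the same information.

```python
import importlib
import sys
import time


def time_import(module_name):
    """Return the seconds taken to import module_name.

    Already-imported modules are served from the sys.modules cache, so the
    top-level entry is dropped first to force a real import. Submodules may
    still be cached, so treat the result as a lower bound; a fresh
    interpreter gives the true cold figure.
    """
    sys.modules.pop(module_name, None)
    start = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - start
```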

Project Structure

src/pystata_x/
├── __init__.py              # Package entry point
├── _config.py               # Fast Stata initialisation (no IPython/py2 compat)
├── _core.py                 # Fast command execution (direct StataSO_Execute)
└── stata_setup.py           # Drop-in replacement for PyPI `stata-setup`
benchmarks/
├── run_benchmarks.py        # Comprehensive benchmark runner
└── history/                 # Benchmark result history

Cross-platform

Shared-library discovery in _config.py supports macOS, Linux, and Windows:

| Platform | Library name | Search path |
|---|---|---|
| macOS | libstata-{be,se,mp}.dylib | Stata{BE,SE,MP}.app/Contents/MacOS/ |
| Linux | libstata-{be,se,mp}.so | {st_path}/ |
| Windows | libstata-{be,se,mp}.dll | {st_path}/ |
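
The lookup in this table can be sketched as a small path-building function. This is not the code in _config.py: `candidate_library_path` is a hypothetical helper that only mirrors the names and locations listed above, and it assumes the macOS app bundles are named StataBE.app, StataSE.app and StataMP.app and sit directly under the given Stata directory.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Assumed macOS bundle names, one per edition.
MAC_APP_BUNDLES = {"be": "StataBE.app", "se": "StataSE.app", "mp": "StataMP.app"}


def candidate_library_path(st_path, edition, system):
    """Build the expected shared-library path for one platform.

    `system` uses platform.system() values: "Darwin", "Linux" or "Windows".
    """
    if edition not in MAC_APP_BUNDLES:
        raise ValueError(f"unknown edition: {edition!r}")
    if system == "Darwin":
        return str(PurePosixPath(st_path) / MAC_APP_BUNDLES[edition]
                   / "Contents" / "MacOS" / f"libstata-{edition}.dylib")
    if system == "Linux":
        return str(PurePosixPath(st_path) / f"libstata-{edition}.so")
    if system == "Windows":
        return str(PureWindowsPath(st_path) / f"libstata-{edition}.dll")
    raise ValueError(f"unsupported platform: {system!r}")
```

Passing `platform.system()` as the third argument selects the row for the current machine; taking it as a parameter keeps the function testable on any platform.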

Licence

  • Our modules (_config.py, _core.py, stata_setup.py, __init__.py, and all files under benchmarks/) are original work, released under the GNU Affero General Public License v3.0.
  • The PyPI stata-setup package (v0.1.3, StataCorp LLC) is Apache 2.0 licenced — our stata_setup.py provides the same public API with a completely rewritten implementation under AGPL-3.0.
