Fast Stata-Python bridge — independent drop-in replacement for pystata.
pystata-x
Independent drop-in replacement for StataCorp's pystata. Provides a
fast stata_setup initialiser and a command execution path that is
roughly 17,000–20,000× faster on single short commands (~13× on a
multi-line do-file) and ~11× faster on cold Stata initialisation.
Quick Start
import sys
sys.path.insert(0, "path/to/pystata-x/src")
from pystata_x.stata_setup import config
config("/Applications/StataMP", "mp", splash=False)
# Use our fast execution:
from pystata_x._core import execute
output, rc = execute("display 1+1")
print(output) # "2"
Or use the vendor-compatible API:
from pystata_x._core import run
run("sysuse auto, clear") # prints output, raises SystemError on error
Why the polling thread is the bottleneck
The original pystata.stata.run() calls RedirectOutput from
pystata.core.stout, which creates a RepeatTimer thread that polls
Stata's output buffer every 15 ms:
- A background thread is created and started.
- Every 15 ms it calls StataSO_getOutput() to fetch and display output.
- After the command finishes, a "#return;0" sentinel appears, and the thread exits and is joined.
This design exists to support Jupyter notebook interactivity — users see
output streaming in as commands execute, like a live terminal. The polling
sleep (15 ms) plus thread lifecycle overhead adds ~40 ms of Python
overhead on every run() call:
pystata.stata.run() → ~40 ms total
├─ thread create ~1 ms
├─ 3× poll cycle ~45 ms (3 × 15 ms)
├─ thread join ~1 ms
└─ work overhead ~1 ms
For headless / CLI / AI-agent use cases (e.g., stata-agent), output is
captured programmatically after the command finishes — no streaming to a
terminal or notebook is needed. The polling thread is pure overhead.
pystata-x skips the thread entirely and calls StataSO_Execute() directly,
then drains the output buffer once after execution.
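The difference between the two strategies can be illustrated without Stata installed. The sketch below is a simulation only: FakeEngine, run_with_polling, and run_direct are hypothetical names standing in for the shared library and the two call paths, and none of them is part of pystata or pystata-x. It assumes a buffer that is written by execute() and drained by get_output(), with a "#return;" sentinel marking the end of a command.

```python
import threading
import time


class FakeEngine:
    """Hypothetical stand-in for the Stata shared library (illustration only)."""

    def __init__(self):
        self._buffer = []
        self._lock = threading.Lock()

    def execute(self, command):
        # Pretend to run a command: write one output line, then a sentinel.
        with self._lock:
            self._buffer.append(f"ran: {command}")
            self._buffer.append("#return;0")
        return 0

    def get_output(self):
        # Return and clear everything written since the last call.
        with self._lock:
            chunk, self._buffer = self._buffer, []
        return chunk


def run_with_polling(engine, command, interval=0.015):
    """Polling style: a background thread wakes every `interval` seconds."""
    lines = []
    done = threading.Event()

    def poll():
        while not done.is_set():
            time.sleep(interval)
            for line in engine.get_output():
                if line.startswith("#return;"):
                    done.set()  # sentinel seen, stop after this drain
                else:
                    lines.append(line)

    worker = threading.Thread(target=poll)
    worker.start()
    rc = engine.execute(command)
    worker.join()  # waits for at least one full poll interval
    return "\n".join(lines), rc


def run_direct(engine, command):
    """Direct style: execute, then drain the buffer once."""
    rc = engine.execute(command)
    lines = [ln for ln in engine.get_output() if not ln.startswith("#return;")]
    return "\n".join(lines), rc


def timed(fn, *args):
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


engine = FakeEngine()
(polled, _), t_poll = timed(run_with_polling, engine, "display 1+1")
(direct, _), t_direct = timed(run_direct, engine, "display 1+1")
print(f"polling: {t_poll * 1e3:.2f} ms, direct: {t_direct * 1e3:.3f} ms")
```

Both paths return the same text; the polling path simply cannot return before its first sleep has elapsed, which is the fixed cost described above.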
Benchmark Results
Measured on macOS (StataSE, Apple Silicon M4) using
benchmarks/run_benchmarks.py. Each test runs in a fresh subprocess
(Stata initialised once per test) with warm-up iterations before timing.
Times are the mean of multiple iterations measured via
time.perf_counter().
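A minimal sketch of that timing method follows. It is not the contents of benchmarks/run_benchmarks.py; the benchmark and speedup helpers, the warm-up count, and the iteration count are all assumptions chosen for illustration, and the subprocess isolation described above is not reproduced here.

```python
import statistics
import time


def benchmark(fn, warmup=5, iterations=100):
    """Return the mean wall-clock time of fn() in milliseconds."""
    for _ in range(warmup):
        fn()  # warm-up runs are executed but not timed
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples) * 1e3


def speedup(baseline_ms, optimised_ms):
    """Ratio of the two mean times, as reported in a Speedup column."""
    return baseline_ms / optimised_ms


# Stand-in workloads: a 2 ms sleep versus a small pure-Python computation.
slow_ms = benchmark(lambda: time.sleep(0.002), warmup=1, iterations=5)
fast_ms = benchmark(lambda: sum(range(1000)), warmup=1, iterations=5)
print(f"slow: {slow_ms:.3f} ms, fast: {fast_ms:.4f} ms, "
      f"speedup ~{speedup(slow_ms, fast_ms):,.0f}x")
```

Because the fast path is measured in microseconds, small absolute changes in its mean move the ratio a lot, so the speedup figures are best read as orders of magnitude.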
Command execution
| Test | Original pystata | pystata-x | Speedup |
|---|---|---|---|
| Single command (display 1+1) | ~40.6 ms | ~0.002 ms | ~19,000× |
| Single command + echo | ~40.7 ms | ~0.002 ms | ~17,000× |
| Single command (quietly) | ~40.4 ms | ~0.002 ms | ~20,000× |
| Multi-line (4 commands, do-file) | ~41.9 ms | ~3.2 ms | ~13× |
| Raw StataSO_Execute (no wrapper) | ~0.002 ms | ~0.002 ms | 1× (baseline) |
Cold initialisation
| Method | Time | Speedup |
|---|---|---|
| Original stata_setup.config() (→ pystata) | ~1.50 s | 1× |
| Optimised pystata_x._config.init() | ~0.13 s | ~11× |
| Optimised pystata_x.stata_setup.config() | ~0.13 s | ~11× |
Why cold init is faster
The original pystata.config.init() does several expensive things that pystata_x's
init skips:
| Step | Original | pystata-x |
|---|---|---|
| IPython/Jupyter probe | ~100 ms (imports IPython, checks for kernel) | Skipped |
| Preference-file I/O | ~50 ms (reads profile.ini from disk) | Skipped |
| Python 2 compat setup | ~30 ms (try/except on every str() conversion) | Removed |
| stata_setup wrapper overhead | ~50 ms (filesystem checks, extra imports) | Inlined |
| Total | ~1.50 s | ~0.13 s |
Project Structure
src/pystata_x/
├── __init__.py # Package entry point
├── _config.py # Fast Stata initialisation (no IPython/py2 compat)
├── _core.py # Fast command execution (direct StataSO_Execute)
└── stata_setup.py # Drop-in replacement for PyPI `stata-setup`
benchmarks/
├── run_benchmarks.py # Comprehensive benchmark runner
└── history/ # Benchmark result history
Cross-platform
Shared-library discovery in _config.py supports macOS, Linux, and Windows:
| Platform | Library name | Search path |
|---|---|---|
| macOS | libstata-{be,se,mp}.dylib | Stata{BE,SE,MP}.app/Contents/MacOS/ |
| Linux | libstata-{be,se,mp}.so | {st_path}/ |
| Windows | libstata-{be,se,mp}.dll | {st_path}/ |
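The lookup in the table can be sketched as a small path builder. This is an illustration of the mapping, not the code in _config.py: the function name library_path is hypothetical, and the macOS branch assumes the app bundle is named Stata<EDITION>.app (for example StataMP.app).

```python
import sys
from pathlib import Path

EDITIONS = ("be", "se", "mp")


def library_path(st_path, edition, platform=sys.platform):
    """Build the expected shared-library path for one platform (sketch)."""
    if edition not in EDITIONS:
        raise ValueError(f"unknown edition: {edition!r}")
    root = Path(st_path)
    if platform == "darwin":
        # Assumption: the bundle is Stata<EDITION>.app, e.g. StataMP.app.
        bundle = root / f"Stata{edition.upper()}.app"
        return bundle / "Contents" / "MacOS" / f"libstata-{edition}.dylib"
    if platform.startswith("win"):
        return root / f"libstata-{edition}.dll"
    # Everything else is treated as Linux-like.
    return root / f"libstata-{edition}.so"


print(library_path("/usr/local/stata", "se", "linux"))
```

A caller would typically check the returned path with Path.is_file() before handing it to a loader, so that a wrong st_path fails with a clear message instead of a loader error.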
Licence
- Our modules (_config.py, _core.py, stata_setup.py, __init__.py, and all files under benchmarks/) are original work, released under the GNU Affero General Public License v3.0.
- The PyPI stata-setup package (v0.1.3, StataCorp LLC) is Apache 2.0 licenced; our stata_setup.py provides the same public API with a completely rewritten implementation under AGPL-3.0.