MCP Server + Skill for Stata: execute commands, inspect data, and generate high-quality Stata code with AI

Stata AI Fusion

MCP Server + Skill Knowledge Base + VS Code Extension for Stata

Let AI directly execute Stata code, generate publication-quality analysis, and provide a complete IDE experience.

PyPI | License: MIT | Python 3.11+ | VS Code Marketplace


Why Stata AI Fusion?

Stata is one of the most widely used statistical packages in economics, political science, epidemiology, and biostatistics. Yet while R and Python users have enjoyed deep AI integration for years, Stata has remained isolated from the AI-assisted coding revolution.

stata-ai-fusion bridges that gap. It gives AI assistants (Claude, Cursor, GitHub Copilot, and others) the ability to start a real Stata session, run commands, inspect data, extract estimation results, and capture graphs -- all through the open Model Context Protocol (MCP).

The project ships as three complementary components so every workflow is covered:

| Component | What it does | Who it's for |
|---|---|---|
| MCP Server | 10 tools that let any MCP-compatible AI execute Stata | Claude Desktop, Claude Code, Cursor users |
| Skill Knowledge Base | 5,653 lines of Stata expertise the AI can consult | Claude.ai Project / Skill users |
| VS Code Extension | Syntax highlighting, snippets, run-in-terminal | Anyone writing .do files in VS Code or Cursor |

Architecture

The data flow is straightforward:

  1. AI Assistant sends a tool call (e.g. run_command) via MCP.
  2. MCP Server dispatches the request to the Session Manager, which maintains one or more persistent, interactive Stata processes.
  3. Stata executes the command; the server captures output, strips SMCL markup, detects errors, and auto-exports any new graphs.
  4. The cleaned result (text + optional base64 image) flows back to the AI, which interprets it and responds to the user.
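
Step 3 can be pictured with a minimal sketch. This is not the project's actual code: `clean_output` is a hypothetical helper, and it assumes a simplified view of SMCL in which every directive is a `{...}` tag and an error ends with a line such as `r(111);`. Real SMCL is richer than this (line-drawing directives, nested tags), so treat it as an illustration of the idea only.

```python
import re
from typing import Optional, Tuple

# Assumption: every SMCL directive is a brace-delimited tag with no nesting.
_SMCL_TAG = re.compile(r"\{[^{}]*\}")
# Stata reports failures with a return code on its own line, e.g. "r(111);".
_ERROR_CODE = re.compile(r"^r\((\d+)\);\s*$", re.MULTILINE)


def clean_output(raw: str) -> Tuple[str, Optional[int]]:
    """Return (plain_text, error_code) for a chunk of raw Stata output."""
    text = _SMCL_TAG.sub("", raw)          # drop the markup, keep the text
    match = _ERROR_CODE.search(text)       # look for a trailing return code
    code = int(match.group(1)) if match else None
    return text.strip(), code


raw = "{com}. regress price mpgg\n{err}variable mpgg not found\n{txt}r(111);\n"
text, code = clean_output(raw)
print(code)  # 111
```

Returning the error code separately from the text lets the caller decide whether to surface a failure to the AI or simply pass the output through.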

Quick Start

Claude Code (recommended)

# Register the MCP server in one command
claude mcp add stata-ai-fusion -- uvx --from stata-ai-fusion stata-ai-fusion

# Verify
claude mcp list

Then try:

> Load the auto dataset in Stata and regress price on mpg and weight with robust SE

Claude Desktop

Edit your config file:

  • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
  • Windows: %APPDATA%\Claude\claude_desktop_config.json

{
  "mcpServers": {
    "stata": {
      "command": "uvx",
      "args": ["--from", "stata-ai-fusion", "stata-ai-fusion"]
    }
  }
}

Restart Claude Desktop. The Stata tools will appear in the tool list.
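
If the config file already lists other servers, the "stata" entry has to be merged in rather than pasted over them. The sketch below shows one way to do that; `add_stata_server` is a hypothetical helper (not shipped with the project), and the demo runs against a throwaway file rather than your real config.

```python
import json
import tempfile
from pathlib import Path

# The entry shown in the config example above.
STATA_ENTRY = {
    "command": "uvx",
    "args": ["--from", "stata-ai-fusion", "stata-ai-fusion"],
}


def add_stata_server(config_path: Path, key: str = "mcpServers") -> dict:
    """Add the "stata" entry to an MCP config file, keeping existing servers."""
    text = config_path.read_text(encoding="utf-8") if config_path.exists() else ""
    config = json.loads(text) if text.strip() else {}
    config.setdefault(key, {})["stata"] = STATA_ENTRY
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Demo against a temporary file that already contains another server.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "claude_desktop_config.json"
    path.write_text('{"mcpServers": {"other": {"command": "echo"}}}', encoding="utf-8")
    merged = add_stata_server(path)
    print(sorted(merged["mcpServers"]))  # ['other', 'stata']
```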

Cursor / VS Code (MCP)

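Create .cursor/mcp.json (Cursor) or .vscode/mcp.json (VS Code) in your project root. The example uses the "servers" key that VS Code expects; Cursor's .cursor/mcp.json uses "mcpServers" as the top-level key instead, as in the Claude Desktop config above: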

{
  "servers": {
    "stata": {
      "command": "uvx",
      "args": ["--from", "stata-ai-fusion", "stata-ai-fusion"]
    }
  }
}

Claude.ai (Skill Only)

This mode provides code-generation guidance only (no live Stata execution).

  1. Download stata-ai-fusion-skill.zip from the Releases page.
  2. Go to Claude.ai > Project > Project Knowledge > Upload.
  3. Upload the zip file.

The AI will now reference the 5,653-line knowledge base when writing Stata code for you.

VS Code Extension

# Option 1: VS Code Marketplace
# Search "Stata AI Fusion" in the Extensions panel

# Option 2: From GitHub Release
code --install-extension stata-ai-fusion-0.1.2.vsix

# Option 3: Cursor
cursor --install-extension stata-ai-fusion-0.1.2.vsix

Features

MCP Server -- 10 tools for AI-driven analysis

The server exposes 10 MCP tools, each callable by any MCP-compatible AI assistant. The full list is in the MCP Tools Reference.

Conversation Example

User: "Analyze the determinants of car prices in the auto dataset."

AI calls: run_command("sysuse auto, clear")
AI calls: inspect_data()                          -> 74 obs, 12 variables
AI calls: run_command("regress price mpg weight foreign, robust")
AI calls: get_results("e", "N r2 F")              -> N=74, R²=0.52, F=29.1
AI calls: run_command("scatter price mpg || lfit price mpg")
AI calls: export_graph(format="png")               -> [base64 image]

AI: "The regression shows that each additional mile per gallon is associated
     with a $49.50 decrease in price, controlling for weight and origin..."

Skill Knowledge Base -- 5,653 lines of Stata expertise

The knowledge base uses a Progressive Disclosure architecture:

  • SKILL.md (486 lines) serves as the entry-point router.
  • 14 reference files cover specific domains; the AI loads them on demand.
  • The AI never reads all 5,653 lines at once -- it fetches only what the current task requires.
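
In practice the routing is done by the AI reading SKILL.md, not by code, but the on-demand idea can be sketched as a lookup. The file names below come from the reference table in this README; the keyword lists and the `pick_references` function are hypothetical and only illustrate the pattern.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical keyword map: reference file -> terms that suggest it is needed.
REFERENCE_TOPICS: Dict[str, Tuple[str, ...]] = {
    "causal-inference.md": ("did", "rdd", "synthetic control", "event study"),
    "survival-analysis.md": ("stset", "stcox", "streg"),
    "tables-export.md": ("esttab", "putdocx", "latex"),
}


def pick_references(task: str) -> List[str]:
    """Return only the reference files whose keywords appear in the task."""
    lowered = task.lower()
    return [
        name
        for name, keywords in REFERENCE_TOPICS.items()
        # Word boundaries avoid false hits such as "did" inside "candidate".
        if any(re.search(r"\b" + re.escape(k) + r"\b", lowered) for k in keywords)
    ]


print(pick_references("Run an event study and export it with esttab"))
# ['causal-inference.md', 'tables-export.md']
```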

VS Code Extension -- complete Stata IDE

| Feature | Shortcut | Description |
|---|---|---|
| Run Selection | Cmd+Shift+Enter | Execute selected Stata code in the terminal |
| Run File | Cmd+Shift+D | Execute the entire .do file |
| Syntax Highlighting | -- | 25 grammar scopes covering commands, functions, macros |
| Code Snippets | Tab | 30 snippets (reg, merge, foreach, esttab, ...) |
| Graph Preview | -- | View Stata graphs inside VS Code |
| Auto MCP Config | -- | Auto-generate .vscode/mcp.json for Cursor/VS Code |

MCP Tools Reference

| Tool | Description | Example call or return value |
|---|---|---|
| run_command | Execute Stata code and return output | run_command(code="regress price mpg weight, robust") |
| run_do_file | Run an entire .do file | run_do_file(path="/path/to/analysis.do") |
| inspect_data | Describe the current dataset in memory | Returns obs count, variable names, types, labels |
| codebook | Generate codebook for specific variables | codebook(variables="price mpg foreign") |
| get_results | Extract stored results (r/e/c class) | get_results(result_class="e", keys="N r2") |
| export_graph | Export current graph as PNG/SVG/PDF | Returns base64-encoded image data |
| search_log | Search through the Stata session log | search_log(query="error", regex=true) |
| install_package | Install SSC or user-written packages | install_package(package="reghdfe") |
| list_sessions | List all active Stata sessions | Returns session IDs, types, alive status |
| close_session | Close a specific Stata session | close_session(session_id="default") |
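
On the wire, MCP is JSON-RPC 2.0 and a tool invocation is a `tools/call` request carrying the tool name and its arguments. An MCP client normally builds this message for you; the sketch below only shows roughly what a `run_command` call looks like. `tool_call` is a hypothetical helper, and the argument name `code` is taken from the table above.

```python
import json


def tool_call(request_id: int, name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps(
        {
            "jsonrpc": "2.0",
            "id": request_id,
            "method": "tools/call",
            "params": {"name": name, "arguments": arguments},
        }
    )


message = tool_call(1, "run_command", {"code": "regress price mpg weight, robust"})
print(message)
```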

Skill Knowledge Base

| Reference | Lines | Coverage |
|---|---|---|
| syntax-core.md | 564 | Commands, data types, operators, macros |
| data-management.md | 481 | merge, reshape, append, collapse, encode |
| econometrics.md | 412 | OLS, IV, panel data, GMM, quantile regression |
| causal-inference.md | 433 | DiD, RDD, synthetic control, IPW, event study |
| survival-analysis.md | 332 | stset, stcox, streg, competing risks, KM curves |
| clinical-data.md | 497 | MIMIC-IV, ICD-9/10, KDIGO, Sepsis-3, LOS |
| graphics.md | 463 | twoway, graph options, schemes, export |
| tables-export.md | 348 | esttab, putdocx, collect, LaTeX/Word output |
| error-codes.md | 349 | Common Stata errors with causes and fixes |
| defensive-coding.md | 389 | assert, capture, confirm, isid, tempfiles |
| mata.md | 532 | Mata programming, matrices, optimization |
| packages/reghdfe.md | 127 | High-dimensional fixed effects regression |
| packages/coefplot.md | 133 | Coefficient and event-study plots |
| packages/gtools.md | 107 | Fast data operations (gcollapse, gegen) |
| Subtotal (14 references) | 5,167 | |
| Total including SKILL.md (486 lines) | 5,653 | |

Configuration

| Variable | Default | Description |
|---|---|---|
| STATA_PATH | Auto-detect | Full path to the Stata executable |
| MCP_STATA_LOGLEVEL | INFO | Logging level (DEBUG / INFO / WARNING) |
| MCP_STATA_TEMP | System temp | Base directory for session temporary files |
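
As a rough sketch of how these variables and defaults fit together, the function below reads them from an environment mapping. `load_settings` and the shape of the returned dict are assumptions made for illustration; the server's real configuration code may differ.

```python
import logging
import os
import tempfile
from typing import Mapping, Optional

# The three levels listed in the table above.
_LEVELS = {"DEBUG": logging.DEBUG, "INFO": logging.INFO, "WARNING": logging.WARNING}


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read the three configuration variables, applying the documented defaults."""
    env = os.environ if env is None else env
    level_name = env.get("MCP_STATA_LOGLEVEL", "INFO").upper()
    return {
        # None means "fall back to auto-detection".
        "stata_path": env.get("STATA_PATH"),
        # Unknown level names fall back to INFO (an assumption of this sketch).
        "log_level": _LEVELS.get(level_name, logging.INFO),
        "temp_dir": env.get("MCP_STATA_TEMP", tempfile.gettempdir()),
    }


print(load_settings({"MCP_STATA_LOGLEVEL": "debug"})["log_level"])  # 10
```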

Stata Auto-Discovery

The server automatically detects your Stata installation using a three-tier strategy:

  1. Environment variable -- STATA_PATH takes highest priority.
  2. Standard paths --
    • macOS: /Applications/Stata*/, /Applications/StataNow/
    • Linux: /usr/local/stata*/, /usr/local/bin/
    • Windows: C:\Program Files\Stata*\
  3. System PATH -- which stata-mp, which stata-se, which stata
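
The three tiers can be sketched as a simple fall-through. This is an illustration rather than the contents of stata_discovery.py: `find_stata` is a hypothetical function, and the glob patterns passed to it are assumed to point at candidate executables (the list above only names the directories that are searched).

```python
import glob
import os
import shutil
from typing import Iterable, Mapping, Optional


def find_stata(
    env: Optional[Mapping[str, str]] = None,
    patterns: Iterable[str] = (),
    names: Iterable[str] = ("stata-mp", "stata-se", "stata"),
) -> Optional[str]:
    """Return a Stata executable path, or None if every tier fails."""
    env = os.environ if env is None else env

    # Tier 1: an explicit STATA_PATH always wins.
    explicit = env.get("STATA_PATH")
    if explicit:
        return explicit

    # Tier 2: glob the standard install locations (patterns are assumptions).
    for pattern in patterns:
        hits = sorted(glob.glob(pattern))
        if hits:
            return hits[-1]  # lexicographically last match

    # Tier 3: fall back to whatever is on the system PATH.
    for name in names:
        found = shutil.which(name)
        if found:
            return found
    return None


print(find_stata(env={"STATA_PATH": "/opt/stata/stata-mp"}))  # /opt/stata/stata-mp
```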

Supported editions: MP, SE, IC, BE (Stata 17, 18, 19 and StataNow).

If auto-detection fails, set the environment variable explicitly:

export STATA_PATH="/Applications/Stata/StataMP.app/Contents/MacOS/stata-mp"

Multi-Session Support

The server supports multiple concurrent Stata sessions with complete data isolation:

  • Each session maintains its own dataset, variables, and estimation results.
  • Sessions persist between tool calls -- no need to reload data after every command.
  • A default session is created automatically; create named sessions for parallel workflows.
  • All sessions are cleaned up gracefully on server shutdown.

AI calls: run_command(code="sysuse auto, clear", session_id="session_A")
AI calls: run_command(code="sysuse nlsw88, clear", session_id="session_B")
# session_A has 74 obs (auto), session_B has 2,246 obs (nlsw88)
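
The isolation model amounts to a registry keyed by session ID. The toy sketch below uses an in-memory stand-in instead of real Stata processes; `FakeSession` and `SessionManager` are hypothetical names and do not describe the project's stata_session.py.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FakeSession:
    """Stand-in for one Stata process; it only tracks the loaded dataset."""
    dataset: Optional[str] = None
    history: List[str] = field(default_factory=list)


class SessionManager:
    def __init__(self) -> None:
        self._sessions: Dict[str, FakeSession] = {}

    def run(self, code: str, session_id: str = "default") -> FakeSession:
        # Sessions are created on first use and persist between calls.
        session = self._sessions.setdefault(session_id, FakeSession())
        session.history.append(code)
        if code.startswith("sysuse "):
            session.dataset = code.split()[1].rstrip(",")
        return session

    def list_sessions(self) -> List[str]:
        return sorted(self._sessions)

    def close(self, session_id: str) -> bool:
        return self._sessions.pop(session_id, None) is not None


manager = SessionManager()
manager.run("sysuse auto, clear", session_id="session_A")
manager.run("sysuse nlsw88, clear", session_id="session_B")
print(manager.run("describe", session_id="session_A").dataset)  # auto
```

Because each session ID maps to its own state, loading nlsw88 in session_B leaves the dataset in session_A untouched.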

Development

# Clone and set up
git clone https://github.com/SexyERIC0723/stata-ai-fusion.git
cd stata-ai-fusion
uv sync

# Run unit tests (no Stata required)
uv run pytest tests/test_discovery.py -v

# Run integration tests (requires Stata)
uv run pytest tests/test_integration.py -v

# Build Python package
uv build

# Build VS Code extension
cd vscode-extension && npm install && npm run build

Testing

| Test Suite | Count | Requires Stata |
|---|---|---|
| test_discovery.py | 39 | No |
| test_integration.py | 46 | Yes |
| Total | 85 | |

All 85 tests pass on Stata MP 19 (macOS arm64).


Project Structure

stata-ai-fusion/
├── src/stata_ai_fusion/
│   ├── __main__.py          # CLI entry point
│   ├── server.py            # MCP server + resource registration
│   ├── stata_discovery.py   # Auto-detect Stata installation
│   ├── stata_session.py     # Interactive & batch session manager
│   ├── graph_cache.py       # Graph capture and base64 encoding
│   ├── result_extractor.py  # r()/e()/c() result extraction
│   └── tools/               # 10 MCP tool implementations
├── skill/
│   ├── SKILL.md             # Main skill routing document (486 lines)
│   └── references/          # 14 reference documents (5,167 lines)
├── vscode-extension/
│   ├── src/                 # TypeScript extension source (5 files)
│   ├── syntaxes/            # TextMate grammar
│   └── snippets/            # 30 code snippets
├── tests/                   # 85 tests (39 unit + 46 integration)
├── assets/                  # Icon, architecture diagrams
└── pyproject.toml

Contributing

Contributions are welcome! Here are some ways to help:

  • Bug reports: Open an issue describing the problem, your Stata version, and OS.
  • New Skill references: Add a .md file to skill/references/ covering a Stata topic.
  • New MCP tools: Implement a tool in src/stata_ai_fusion/tools/ and register it.
  • VS Code improvements: Expand syntax grammar or add snippets.

Please run uv run pytest tests/ -v before submitting a PR.


License

MIT -- see LICENSE for details.
