MCP Server + Skill for Stata: execute commands, inspect data, and generate high-quality Stata code with AI
Stata AI Fusion
MCP Server + Skill Knowledge Base + VS Code Extension for Stata
Let AI directly execute Stata code, generate publication-quality analysis, and provide a complete IDE experience.
Quick Start • Features • MCP Tools • Skill Knowledge • VS Code Extension • Chinese Documentation
Why Stata AI Fusion?
Stata is one of the most widely used statistical packages in economics, political science, epidemiology, and biostatistics. Yet while R and Python users have enjoyed deep AI integration for years, Stata has remained isolated from the AI-assisted coding revolution.
stata-ai-fusion bridges that gap. It gives AI assistants (Claude, Cursor, GitHub Copilot, and others) the ability to start a real Stata session, run commands, inspect data, extract estimation results, and capture graphs -- all through the open Model Context Protocol (MCP).
The project ships as three complementary components so every workflow is covered:
| Component | What it does | Who it's for |
|---|---|---|
| MCP Server | 10 tools that let any MCP-compatible AI execute Stata | Claude Desktop, Claude Code, Cursor users |
| Skill Knowledge Base | 5,653 lines of Stata expertise the AI can consult | Claude.ai Project / Skill users |
| VS Code Extension | Syntax highlighting, snippets, run-in-terminal | Anyone writing .do files in VS Code or Cursor |
Architecture
The data flow is straightforward:
- AI Assistant sends a tool call (e.g. run_command) via MCP.
- MCP Server dispatches the request to the Session Manager, which maintains one or more persistent, interactive Stata processes.
- Stata executes the command; the server captures output, strips SMCL markup, detects errors, and auto-exports any new graphs.
- The cleaned result (text + optional base64 image) flows back to the AI, which interprets it and responds to the user.
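The output-cleaning step can be illustrated with a small sketch. It is not the server's actual implementation: the function names are hypothetical, and it handles only a simplified subset of SMCL (plain directives such as {txt}, the {directive:text} form, {hline N}, and {c |}). It also assumes that a failed command ends with a return-code line of the form r(111);.

```python
import re

# Assumption: a failed Stata command ends with a return-code line such as "r(111);".
_ERROR_CODE = re.compile(r"^r\((\d+)\);\s*$", re.MULTILINE)
# Innermost SMCL directive on a single line, e.g. {txt}, {hline 13}, {bf:text}.
_SMCL_TAG = re.compile(r"\{[^{}\n]*\}")


def _replace_tag(match: re.Match) -> str:
    """Map one SMCL directive to its plain-text equivalent."""
    tag = match.group(0)[1:-1].strip()
    if tag.startswith("hline"):
        parts = tag.split()
        width = int(parts[1]) if len(parts) > 1 and parts[1].isdigit() else 1
        return "-" * width
    if tag == "c |":
        return "|"
    if ":" in tag:
        # {directive:text} keeps its text, so {search r(111), local:r(111);}
        # becomes "r(111);".
        return tag.split(":", 1)[1]
    return ""  # formatting-only directives such as {txt}, {res}, {err}


def strip_smcl(raw: str) -> str:
    """Remove SMCL markup, repeating the pass so nested directives unwrap."""
    previous = None
    while previous != raw:
        previous = raw
        raw = _SMCL_TAG.sub(_replace_tag, raw)
    return raw


def detect_error(clean: str):
    """Return the Stata return code found in cleaned output, or None."""
    match = _ERROR_CODE.search(clean)
    return int(match.group(1)) if match else None
```

A caller would run strip_smcl on the captured output first and then pass the result to detect_error. Echoed Stata code that contains a balanced pair of braces on one line would also be stripped, so a production cleaner needs a stricter grammar than this sketch.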
Quick Start
Claude Code (recommended)
```bash
# Register the MCP server in one command
claude mcp add stata-ai-fusion -- uvx --from stata-ai-fusion stata-ai-fusion

# Verify
claude mcp list
```
Then try:
> Load the auto dataset in Stata and regress price on mpg and weight with robust SE
Claude Desktop
Edit your config file:
- macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
- Windows: %APPDATA%\Claude\claude_desktop_config.json
```json
{
  "mcpServers": {
    "stata": {
      "command": "uvx",
      "args": ["--from", "stata-ai-fusion", "stata-ai-fusion"]
    }
  }
}
```
Restart Claude Desktop. The Stata tools will appear in the tool list.
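If the config file already lists other MCP servers, the Stata entry has to be merged into the existing mcpServers object rather than pasted over it. The following is a minimal sketch of that merge; the helper names add_stata_server and update_config_file are hypothetical and not part of this project, and the sketch assumes the file, if present, contains valid JSON.

```python
import json
from pathlib import Path

# The server entry from the config shown above, as a Python dict.
STATA_SERVER = {
    "command": "uvx",
    "args": ["--from", "stata-ai-fusion", "stata-ai-fusion"],
}


def add_stata_server(config: dict, name: str = "stata") -> dict:
    """Return a copy of config with the Stata entry added under mcpServers."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))  # keep entries already present
    servers[name] = STATA_SERVER
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path) -> None:
    """Merge the entry into the JSON file at path, creating the file if absent."""
    existing = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    updated = add_stata_server(existing)
    path.write_text(json.dumps(updated, indent=2), encoding="utf-8")
```

Calling update_config_file with the path for your platform leaves any other servers and settings in the file untouched.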
Cursor / VS Code (MCP)
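Create .cursor/mcp.json (Cursor) or .vscode/mcp.json (VS Code) in your project root. The example uses the servers key that VS Code reads; Cursor's .cursor/mcp.json uses mcpServers as the top-level key instead.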
```json
{
  "servers": {
    "stata": {
      "command": "uvx",
      "args": ["--from", "stata-ai-fusion", "stata-ai-fusion"]
    }
  }
}
```
Claude.ai (Skill Only)
This mode provides code-generation guidance only (no live Stata execution).
- Download stata-ai-fusion-skill.zip from the Releases page.
- Go to Claude.ai > Project > Project Knowledge > Upload.
- Upload the zip file.
The AI will now reference the 5,653-line knowledge base when writing Stata code for you.
VS Code Extension
```bash
# Option 1: VS Code Marketplace
# Search "Stata AI Fusion" in the Extensions panel

# Option 2: From GitHub Release
code --install-extension stata-ai-fusion-0.1.2.vsix

# Option 3: Cursor
cursor --install-extension stata-ai-fusion-0.1.2.vsix
```
Features
MCP Server -- 10 tools for AI-driven analysis
The server exposes 10 MCP tools. Each tool can be called by any MCP-compatible AI assistant.
Conversation Example
```text
User: "Analyze the determinants of car prices in the auto dataset."

AI calls: run_command("sysuse auto, clear")
AI calls: inspect_data()                       -> 74 obs, 12 variables
AI calls: run_command("regress price mpg weight foreign, robust")
AI calls: get_results("e", "N r2 F")           -> N=74, R²=0.52, F=29.1
AI calls: run_command("scatter price mpg || lfit price mpg")
AI calls: export_graph(format="png")           -> [base64 image]

AI: "The regression shows that each additional mile per gallon is associated
     with a $49.50 decrease in price, controlling for weight and origin..."
```
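The base64 image in the last tool call can be turned back into a file on the client side. The sketch below assumes the payload arrives as a plain base64 string; the exact shape of the export_graph response is not documented here, and save_png is a hypothetical helper rather than part of this project.

```python
import base64
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # the first eight bytes of every PNG file


def save_png(b64_data: str, out_path: Path) -> int:
    """Decode a base64 PNG payload, write it to out_path, return its size in bytes."""
    raw = base64.b64decode(b64_data, validate=True)
    if not raw.startswith(PNG_SIGNATURE):
        raise ValueError("decoded payload is not a PNG image")
    out_path.write_bytes(raw)
    return len(raw)
```

Checking the signature before writing catches the case where the payload is an error message or a different format such as SVG or PDF.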
Skill Knowledge Base -- 5,653 lines of Stata expertise
The knowledge base uses a Progressive Disclosure architecture:
- SKILL.md (486 lines) serves as the entry-point router.
- 14 reference files cover specific domains; the AI loads them on demand.
- The AI never reads all 5,653 lines at once -- it fetches only what the current task requires.
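The on-demand loading idea can be sketched in code. In the real skill the AI itself decides which reference to open after reading SKILL.md, so the keyword map below is purely an illustrative assumption; only the file names come from this README, and select_references and load_references are hypothetical helpers.

```python
from pathlib import Path

# Hypothetical keyword map. The file names match the reference files listed in
# this README; the keywords are made up for illustration.
ROUTES = {
    "causal-inference.md": ("difference-in-differences", "synthetic control", "event study"),
    "survival-analysis.md": ("stset", "stcox", "survival", "competing risks"),
    "graphics.md": ("twoway", "graph", "scheme"),
    "packages/reghdfe.md": ("reghdfe", "fixed effects"),
}


def select_references(task: str) -> list[str]:
    """Return the reference files whose keywords appear in the task description."""
    text = task.lower()
    return [
        name
        for name, keywords in ROUTES.items()
        if any(keyword in text for keyword in keywords)
    ]


def load_references(task: str, root: Path) -> dict[str, str]:
    """Read only the selected files from root, leaving the rest on disk."""
    return {
        name: (root / name).read_text(encoding="utf-8")
        for name in select_references(task)
    }
```

For a task such as "Event study plot with reghdfe", only two of the reference files would be read, which is the point of the design: context is spent on the relevant domain rather than on the whole knowledge base.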
VS Code Extension -- complete Stata IDE
| Feature | Shortcut | Description |
|---|---|---|
| Run Selection | Cmd+Shift+Enter | Execute selected Stata code in the terminal |
| Run File | Cmd+Shift+D | Execute the entire .do file |
| Syntax Highlighting | -- | 25 grammar scopes covering commands, functions, macros |
| Code Snippets | Tab | 30 snippets (reg, merge, foreach, esttab, ...) |
| Graph Preview | -- | View Stata graphs inside VS Code |
| Auto MCP Config | -- | Auto-generate .vscode/mcp.json for Cursor/VS Code |
MCP Tools Reference
| Tool | Description | Example |
|---|---|---|
| run_command | Execute Stata code and return output | run_command(code="regress price mpg weight, robust") |
| run_do_file | Run an entire .do file | run_do_file(path="/path/to/analysis.do") |
| inspect_data | Describe the current dataset in memory | Returns obs count, variable names, types, labels |
| codebook | Generate codebook for specific variables | codebook(variables="price mpg foreign") |
| get_results | Extract stored results (r/e/c class) | get_results(result_class="e", keys="N r2") |
| export_graph | Export current graph as PNG/SVG/PDF | Returns base64-encoded image data |
| search_log | Search through the Stata session log | search_log(query="error", regex=true) |
| install_package | Install SSC or user-written packages | install_package(package="reghdfe") |
| list_sessions | List all active Stata sessions | Returns session IDs, types, alive status |
| close_session | Close a specific Stata session | close_session(session_id="default") |
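On the wire, each call in the table becomes an MCP tools/call request, which is a JSON-RPC 2.0 message. The sketch below builds one such message by hand to show its shape. The tool and argument names are taken from the table, while build_tool_call is a hypothetical helper; in practice an MCP client library constructs and sends these messages for you.

```python
import json


def build_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


if __name__ == "__main__":
    # Example: the run_command row from the table.
    print(build_tool_call(1, "run_command", {"code": "regress price mpg weight, robust"}))
```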
Skill Knowledge Base
| Reference | Lines | Coverage |
|---|---|---|
| syntax-core.md | 564 | Commands, data types, operators, macros |
| data-management.md | 481 | merge, reshape, append, collapse, encode |
| econometrics.md | 412 | OLS, IV, panel data, GMM, quantile regression |
| causal-inference.md | 433 | DiD, RDD, synthetic control, IPW, event study |
| survival-analysis.md | 332 | stset, stcox, streg, competing risks, KM curves |
| clinical-data.md | 497 | MIMIC-IV, ICD-9/10, KDIGO, Sepsis-3, LOS |
| graphics.md | 463 | twoway, graph options, schemes, export |
| tables-export.md | 348 | esttab, putdocx, collect, LaTeX/Word output |
| error-codes.md | 349 | Common Stata errors with causes and fixes |
| defensive-coding.md | 389 | assert, capture, confirm, isid, tempfiles |
| mata.md | 532 | Mata programming, matrices, optimization |
| packages/reghdfe.md | 127 | High-dimensional fixed effects regression |
| packages/coefplot.md | 133 | Coefficient and event-study plots |
| packages/gtools.md | 107 | Fast data operations (gcollapse, gegen) |
| References subtotal | 5,167 | 14 files |
| SKILL.md | 486 | Entry-point router |
| Total | 5,653 | |
Configuration
| Variable | Default | Description |
|---|---|---|
| STATA_PATH | Auto-detect | Full path to the Stata executable |
| MCP_STATA_LOGLEVEL | INFO | Logging level (DEBUG / INFO / WARNING) |
| MCP_STATA_TEMP | System temp | Base directory for session temporary files |
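As an illustration of how these three variables and their defaults fit together, here is a minimal sketch of a settings loader. The Settings class and load_settings function are hypothetical and are not taken from the project's source; only the variable names and defaults come from the table.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    stata_path: Optional[str]  # None means "fall back to auto-detection"
    log_level: str
    temp_dir: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the three variables from env, applying the documented defaults."""
    return Settings(
        stata_path=env.get("STATA_PATH") or None,
        log_level=env.get("MCP_STATA_LOGLEVEL", "INFO").upper(),
        temp_dir=env.get("MCP_STATA_TEMP") or tempfile.gettempdir(),
    )
```

Passing the environment in as a mapping keeps the function easy to test without touching the real process environment.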
Stata Auto-Discovery
The server automatically detects your Stata installation using a three-tier strategy:
- Environment variable -- STATA_PATH takes highest priority.
- Standard paths --
  - macOS: /Applications/Stata*/, /Applications/StataNow/
  - Linux: /usr/local/stata*/, /usr/local/bin/
  - Windows: C:\Program Files\Stata*\
- System PATH -- which stata-mp, which stata-se, which stata
Supported editions: MP, SE, IC, BE (Stata 17, 18, 19 and StataNow).
If auto-detection fails, set the environment variable explicitly:
```bash
export STATA_PATH="/Applications/Stata/StataMP.app/Contents/MacOS/stata-mp"
```
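The three-tier lookup described in this section can be sketched as follows. This is an assumption-laden illustration, not the code in stata_discovery.py: find_stata is a hypothetical function, only the Linux patterns are included, and the STATA_PATH override is returned without checking that the file exists.

```python
import glob
import os
import shutil

# Binary names from the System PATH tier, in the order they are tried.
BINARIES = ("stata-mp", "stata-se", "stata")
# Illustrative Linux patterns only; a fuller search would add macOS and Windows.
PATTERNS = ("/usr/local/stata*/", "/usr/local/bin/")


def find_stata(env=os.environ, patterns=PATTERNS, expand=glob.glob,
               is_file=os.path.isfile, which=shutil.which):
    """Return a path to a Stata executable, or None if nothing is found."""
    # Tier 1: an explicit STATA_PATH always wins.
    override = env.get("STATA_PATH")
    if override:
        return override
    # Tier 2: look for known binary names inside the standard directories.
    for pattern in patterns:
        for directory in sorted(expand(pattern)):
            for name in BINARIES:
                candidate = os.path.join(directory, name)
                if is_file(candidate):
                    return candidate
    # Tier 3: fall back to whatever is on the system PATH.
    for name in BINARIES:
        found = which(name)
        if found:
            return found
    return None
```

The filesystem and PATH lookups are passed in as parameters so the tier ordering can be tested on a machine without Stata installed.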
Multi-Session Support
The server supports multiple concurrent Stata sessions with complete data isolation:
- Each session maintains its own dataset, variables, and estimation results.
- Sessions persist between tool calls -- no need to reload data after every command.
- A default session is created automatically; create named sessions for parallel workflows.
- All sessions are cleaned up gracefully on server shutdown.
```text
AI calls: run_command(code="sysuse auto, clear", session_id="session_A")
AI calls: run_command(code="sysuse nlsw88, clear", session_id="session_B")
# session_A has 74 obs (auto), session_B has 2,246 obs (nlsw88)
```
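The bookkeeping behind this behaviour can be sketched with a small registry keyed by session ID. This is a simplified stand-in under stated assumptions: the real session manager drives actual Stata processes, whereas the hypothetical Session class below only records the commands it was sent, and none of these names are taken from the project's source.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Stand-in for one Stata process; it only records the commands it receives."""
    session_id: str
    history: list = field(default_factory=list)
    alive: bool = True


class SessionManager:
    def __init__(self):
        self._sessions = {}

    def get(self, session_id: str = "default") -> Session:
        """Return the named session, creating it on first use."""
        if session_id not in self._sessions:
            self._sessions[session_id] = Session(session_id)
        return self._sessions[session_id]

    def run(self, code: str, session_id: str = "default") -> Session:
        """Route a command to one session; other sessions are left untouched."""
        session = self.get(session_id)
        session.history.append(code)
        return session

    def list_ids(self) -> list:
        return sorted(self._sessions)

    def close(self, session_id: str) -> bool:
        """Close one session; return False if it did not exist."""
        session = self._sessions.pop(session_id, None)
        if session is None:
            return False
        session.alive = False
        return True

    def close_all(self) -> None:
        """Close every session, as would happen on server shutdown."""
        for session_id in list(self._sessions):
            self.close(session_id)
```

Because each command is routed by session_id, state written in session_A never appears in session_B, and an omitted session_id falls through to the default session.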
Development
```bash
# Clone and set up
git clone https://github.com/SexyERIC0723/stata-ai-fusion.git
cd stata-ai-fusion
uv sync

# Run unit tests (no Stata required)
uv run pytest tests/test_discovery.py -v

# Run integration tests (requires Stata)
uv run pytest tests/test_integration.py -v

# Build Python package
uv build

# Build VS Code extension
cd vscode-extension && npm install && npm run build
```
Testing
| Test Suite | Count | Requires Stata |
|---|---|---|
| test_discovery.py | 39 | No |
| test_integration.py | 46 | Yes |
| Total | 85 | |
All 85 tests pass on Stata MP 19 (macOS arm64).
Project Structure
```text
stata-ai-fusion/
├── src/stata_ai_fusion/
│   ├── __main__.py           # CLI entry point
│   ├── server.py             # MCP server + resource registration
│   ├── stata_discovery.py    # Auto-detect Stata installation
│   ├── stata_session.py      # Interactive & batch session manager
│   ├── graph_cache.py        # Graph capture and base64 encoding
│   ├── result_extractor.py   # r()/e()/c() result extraction
│   └── tools/                # 10 MCP tool implementations
├── skill/
│   ├── SKILL.md              # Main skill routing document (486 lines)
│   └── references/           # 14 reference documents (5,167 lines)
├── vscode-extension/
│   ├── src/                  # TypeScript extension source (5 files)
│   ├── syntaxes/             # TextMate grammar
│   └── snippets/             # 30 code snippets
├── tests/                    # 85 tests (39 unit + 46 integration)
├── assets/                   # Icon, architecture diagrams
└── pyproject.toml
```
Contributing
Contributions are welcome! Here are some ways to help:
- Bug reports: Open an issue describing the problem, your Stata version, and OS.
- New Skill references: Add a
.mdfile toskill/references/covering a Stata topic. - New MCP tools: Implement a tool in
src/stata_ai_fusion/tools/and register it. - VS Code improvements: Expand syntax grammar or add snippets.
Please run uv run pytest tests/ -v before submitting a PR.
License
MIT -- see LICENSE for details.
Acknowledgments
- Stata by StataCorp
- Model Context Protocol by Anthropic
PyPI • VS Code Marketplace • Releases • Chinese Documentation