MCP server that exposes Excel files as SQL-queryable tables via DuckDB
Project description
mcp-server-excel-sql
MCP server exposing Excel files as SQL-queryable DuckDB views with thread-safe concurrent access.
Features
- SQL queries on Excel via DuckDB views
- RAW (all_varchar) or ASSISTED (sheet_overrides) modes
- Multi-row headers, type hints, unpivot, column renames
- Auto-refresh on file changes (--watch)
- Thread-safe concurrent access (HTTP/SSE transports)
- Isolated query timeouts (per-connection interrupt)
Installation
pipx install mcp-server-excel-sql
Usage
CLI
# STDIO mode (single-threaded)
mcp-excel --path /data/excel --watch --overrides config.yaml
# HTTP mode (concurrent, ~10 users)
mcp-excel --path /data/excel --transport streamable-http --port 8000
Claude Desktop
Add to ~/Library/Application Support/Claude/claude_desktop_config.json:
{
"mcpServers": {
"excel": {
"command": "mcp-excel",
"args": ["--path", "/Users/your-username/data/excel"]
}
}
}
With overrides:
{
"mcpServers": {
"excel": {
"command": "mcp-excel",
"args": [
"--path", "/path/to/excel/files",
"--overrides", "/path/to/config.yaml"
]
}
}
}
MCP Tools
tool_load_dir - Load Excel directory into views tool_query - Execute SELECT (read-only, timeout/limit enforced) tool_list_tables - List views with metadata tool_get_schema - DESCRIBE table tool_refresh - Rescan filesystem (incremental or full)
Table Naming
Format: <alias>.<filepath>.<sheet> (dot-separated, lowercase, sanitized)
Sanitization:
- Spaces →
_ - Special chars → removed
- Allowed:
[a-z0-9_$]
Alias: Auto-generated from directory name
Examples:
/data/sales/Q1-2024.xlsx → "sales.q12024.summary"
/reports/P&L (Final).xlsx → "reports.plfinal.sheet1"
IMPORTANT: Dots require quoted identifiers in SQL:
SELECT * FROM "sales.q12024.summary" -- correct
SELECT * FROM sales.q12024.summary -- fails (Catalog Error)
System Views
<alias>.__files - File metadata (path, sheet_count, total_rows, mtime)
<alias>.__tables - Table metadata (table_name, file, sheet, mode, est_rows)
Query: SELECT * FROM "sales.__files"
Modes
RAW: read_xlsx(..., all_varchar=true, header=false)
ASSISTED: Apply per-sheet overrides
sales.xlsx:
sheet_overrides:
Summary:
skip_rows: 3
skip_footer: 2
header_rows: 2
drop_regex: "^Total:"
column_renames:
"col_0": "region"
type_hints:
amount: "DECIMAL(10,2)"
date: "DATE"
unpivot:
id_vars: ["Region"]
value_vars: ["Jan", "Feb"]
var_name: "Month"
value_name: "Sales"
Concurrency & Thread Safety
Design
STDIO mode: Single shared connection (single-threaded) HTTP/SSE mode: Isolated connections per request (concurrent)
Supports ~10-20 concurrent users with <1ms overhead per request.
Thread Safety Features
- Isolated Connections - Each HTTP request gets its own DuckDB connection
- Protected Catalog - RLock guards all metadata dictionary access
- Per-Connection Timeouts - Query interrupts don't affect other users
- Deadlock-Free - Sequential lock acquisition (never nested)
Critical Fixes
Problem 1: Timeout Interference
- Before: One user's timeout killed all concurrent queries
- After: Isolated connections ensure timeouts only affect their own query
Problem 2: Catalog Race
- Before: Concurrent refresh could corrupt table metadata
- After: RLock protects all catalog reads/writes
Problem 3: Dictionary Mutation
- Before: Python dict operations not atomic (GIL can be released)
- After: All dict mutations locked to prevent corruption
Performance
- STDIO mode: 0% overhead (no changes to single-threaded path)
- HTTP mode: ~0.6ms per request (connection + lock overhead)
- Tested: 20 concurrent workers, 100+ operations
Testing
73/73 tests pass including:
- 6 concurrency tests (parallel queries, timeout isolation, refresh safety)
- 4 stress tests (100+ concurrent operations, memory leak detection)
Security
- Read-only: BEGIN TRANSACTION READ ONLY (DuckDB-enforced, blocks all write operations)
- Path-confined: Root path validation, no traversal
- Timeout: threading.Timer → conn.interrupt() (isolated per connection)
- Row limit: fetchmany(max_rows + 1)
Examples
Finance examples for Kopitiam Kita Sdn Bhd, a Malaysian coffeehouse chain:
# Generate example files
python examples/create_finance_examples.py
# Load and query
mcp-excel --path examples --alias finance --overrides examples/finance_overrides.yaml
# Query examples
SELECT SUM(COALESCE(debit, 0)) as total_debits
FROM "finance.general_ledger.entries";
SELECT region, SUM(revenue) as total_revenue
FROM "finance.revenue_by_segment.revenue"
GROUP BY region ORDER BY total_revenue DESC;
Includes:
- 10 Excel files with 3,100+ financial records (MYR currency)
- General Ledger, Financial Statements, AR Aging, Revenue Analysis
- Budget Variance, Invoices, Trial Balance, Cash Flow Forecast
- Complete prompt chain sequences in
examples/README.md
See examples/README.md for detailed usage and SQL query patterns.
Development
pip install -e ".[dev]"
pytest --cov=mcp_excel tests/
python -m build
Tests: 73 passing (11 unit + 47 integration + 10 regression + 6 concurrency + 4 stress) Coverage: Comprehensive test coverage including race condition scenarios
Architecture Notes
Transport Modes
STDIO (default):
- Single shared in-memory DuckDB connection
- Single-threaded (no concurrency concerns)
- Ideal for: Claude Desktop, local CLI usage
HTTP/SSE (--transport streamable-http):
- Persistent DuckDB file (temp, auto-cleanup)
- Isolated connection per request
- RLock-protected shared state (catalog, load_configs)
- Ideal for: Multiple concurrent users, web APIs
State Management
Global State (thread-safe via locks):
catalog- Table metadata dict (protected by_catalog_lock)load_configs- Directory configurations (protected by_load_configs_lock)registry- Name collision tracking (internal lock)
Per-Request State (isolated):
- DuckDB connection (HTTP mode only)
- Query timeout timer
- Transaction state
Scale Limits
- Current design: Optimized for ~10 concurrent users
- Tested with: 20 concurrent workers
- Upgrade path: If >20 users needed, consider connection pooling
License
MIT
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file mcp_server_excel_sql-0.2.0.tar.gz.
File metadata
- Download URL: mcp_server_excel_sql-0.2.0.tar.gz
- Upload date:
- Size: 247.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
ec8120f19bdf31c64551e566eff9635c870421ffbf86b9ab223055fe7b7bfa1d
|
|
| MD5 |
817f5107f9f047eeeeea6a22669842b9
|
|
| BLAKE2b-256 |
1999a05980270a793db702a5f6349a3dff6ce2cb9212ca357ddcedb0ede13cf3
|
File details
Details for the file mcp_server_excel_sql-0.2.0-py3-none-any.whl.
File metadata
- Download URL: mcp_server_excel_sql-0.2.0-py3-none-any.whl
- Upload date:
- Size: 16.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
e8224047b9412f301b6065bf2d3c2d8ca8ee955c6786470096c2bc79729ae247
|
|
| MD5 |
19019cdfbdc12ff8d707c76dcf438412
|
|
| BLAKE2b-256 |
558a13c15017279dfcc978038d200238a8abc293040875737be9caf4bc6013a2
|