LocalData MCP Server
A powerful, secure MCP server for local databases, spreadsheets, and structured data files with advanced security features and large dataset handling.
✨ Features
🗄️ Multi-Database Support
- SQL Databases: PostgreSQL, MySQL, SQLite
- Document Databases: MongoDB
- Spreadsheets: Excel (.xlsx/.xls), LibreOffice Calc (.ods) with multi-sheet support
- Structured Files: CSV, TSV, JSON, YAML, TOML, XML, INI
- Analytical Formats: Parquet, Feather, Arrow
🔒 Advanced Security
- Path Security: Restricts file access to the current working directory and its subdirectories
- SQL Injection Prevention: Parameterized queries and safe table identifiers
- Connection Limits: Maximum 10 concurrent database connections
- Input Validation: Comprehensive validation and sanitization
📊 Large Dataset Handling
- Query Buffering: Automatic buffering for results with 100+ rows
- Large File Support: 100MB+ files automatically use temporary SQLite storage
- Chunk Retrieval: Paginated access to large result sets
- Auto-Cleanup: 10-minute expiry with file modification detection
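The large-file path described above can be approximated with the standard library alone: stream the CSV into a temporary SQLite file and run queries there instead of holding the whole dataset in memory. This is a minimal sketch of the general technique, not LocalData's actual implementation; the `load_csv_to_sqlite` helper name is illustrative.

```python
import csv
import sqlite3
import tempfile

def load_csv_to_sqlite(csv_path: str) -> sqlite3.Connection:
    """Stream a CSV into a temporary SQLite file so it can be queried
    without loading the whole dataset into memory."""
    tmp = tempfile.NamedTemporaryFile(suffix=".sqlite", delete=False)
    tmp.close()
    conn = sqlite3.connect(tmp.name)
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        cols = ", ".join(f'"{c}"' for c in header)
        placeholders = ", ".join("?" for _ in header)
        # Untyped columns: SQLite stores the raw CSV strings as-is.
        conn.execute(f"CREATE TABLE data ({cols})")
        conn.executemany(f"INSERT INTO data VALUES ({placeholders})", reader)
    conn.commit()
    return conn
```

Because rows are inserted straight from the CSV iterator, memory use stays flat regardless of file size.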
🛠️ Developer Experience
- Comprehensive Tools: 14 database operation tools
- Error Handling: Detailed, actionable error messages
- Thread Safety: Concurrent operation support
- Backward Compatible: All existing APIs preserved
🚀 Quick Start
Installation
# Using pip
pip install localdata-mcp
# Using uv (recommended)
uv tool install localdata-mcp
# Development installation
git clone https://github.com/ChrisGVE/localdata-mcp.git
cd localdata-mcp
pip install -e .
Configuration
Add to your MCP client configuration:
{
  "mcpServers": {
    "localdata": {
      "command": "localdata-mcp",
      "env": {}
    }
  }
}
Usage Examples
Connect to Databases
# PostgreSQL
connect_database("analytics", "postgresql", "postgresql://user:pass@localhost/db")
# SQLite
connect_database("local", "sqlite", "./data.sqlite")
# CSV Files
connect_database("csvdata", "csv", "./data.csv")
# JSON Files
connect_database("config", "json", "./config.json")
# Excel Spreadsheets (all sheets)
connect_database("sales", "xlsx", "./sales_data.xlsx")
# Excel with specific sheet
connect_database("q1data", "xlsx", "./quarterly.xlsx?sheet=Q1_Sales")
# LibreOffice Calc
connect_database("budget", "ods", "./budget_2024.ods")
# Tab-separated values
connect_database("exports", "tsv", "./export_data.tsv")
# XML structured data
connect_database("config_xml", "xml", "./config.xml")
# INI configuration files
connect_database("settings", "ini", "./app.ini")
# Analytical formats
connect_database("analytics", "parquet", "./data.parquet")
connect_database("features", "feather", "./features.feather")
connect_database("vectors", "arrow", "./vectors.arrow")
Query Data
# Execute queries with automatic result formatting
execute_query("analytics", "SELECT * FROM users LIMIT 50")
# Large result sets use buffering automatically
execute_query_json("analytics", "SELECT * FROM large_table")
Handle Large Results
# Get chunked results for large datasets
get_query_chunk("analytics_1640995200_a1b2", 101, "100")
# Check buffer status
get_buffered_query_info("analytics_1640995200_a1b2")
# Manual cleanup
clear_query_buffer("analytics_1640995200_a1b2")
🔧 Available Tools
| Tool | Description | Use Case |
|---|---|---|
| connect_database | Connect to databases/files | Initial setup |
| disconnect_database | Close connections | Cleanup |
| list_databases | Show active connections | Status check |
| execute_query | Run SQL (markdown output) | Small results |
| execute_query_json | Run SQL (JSON output) | Large results |
| describe_database | Show schema/structure | Exploration |
| describe_table | Show table details | Analysis |
| get_table_sample | Preview table data | Quick look |
| get_table_sample_json | Preview (JSON format) | Development |
| find_table | Locate tables by name | Navigation |
| read_text_file | Read structured files | File access |
| get_query_chunk | Paginated result access | Large data |
| get_buffered_query_info | Buffer status info | Monitoring |
| clear_query_buffer | Manual buffer cleanup | Management |
📋 Supported Data Sources
SQL Databases
- PostgreSQL: Full support with connection pooling
- MySQL: Complete MySQL/MariaDB compatibility
- SQLite: Local file and in-memory databases
Document Databases
- MongoDB: Collection queries and aggregation
Structured Files
Spreadsheet Formats
- Excel (.xlsx, .xls): Full multi-sheet support with automatic table creation
- LibreOffice Calc (.ods): Complete ODS support with sheet handling
- Multi-sheet handling: Each sheet becomes a separate queryable table
Text-Based Formats
- CSV: Large file automatic SQLite conversion
- TSV: Tab-separated values with the same features as CSV
- JSON: Nested structure flattening
- YAML: Configuration file support
- TOML: Settings and config files
- XML: Structured XML document parsing
- INI: Configuration file format support
Analytical Formats
- Parquet: High-performance columnar data format
- Feather: Fast binary format for data interchange
- Arrow: In-memory columnar format support
🛡️ Security Features
Path Security
# ✅ Allowed - current directory and subdirectories
"./data/users.csv"
"data/config.json"
"subdir/file.yaml"
# ❌ Blocked - parent directory access
"../etc/passwd"
"../../sensitive.db"
"/etc/hosts"
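The allowed/blocked behavior above boils down to resolving the requested path and checking that it stays under the working directory. A minimal sketch, assuming a hypothetical `is_path_allowed` helper; LocalData's real check may differ in detail.

```python
from pathlib import Path

def is_path_allowed(user_path, base=None):
    """Return True only if user_path resolves to the base directory
    (default: current working directory) or somewhere beneath it."""
    base_dir = Path(base or Path.cwd()).resolve()
    # Joining an absolute user_path replaces base_dir entirely,
    # so absolute paths like /etc/hosts are caught by the check below.
    target = (base_dir / user_path).resolve()
    return target == base_dir or base_dir in target.parents
```

Resolving before comparing is what defeats `../` traversal: `./data/../../etc/passwd` collapses to a path outside the base and fails the ancestor check.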
SQL Injection Prevention
# ✅ Safe - parameterized queries
describe_table("mydb", "users") # Validates table name
# ❌ Blocked - malicious input
describe_table("mydb", "users; DROP TABLE users; --")
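One way to get the validation shown above is to whitelist plain SQL identifiers before a table name is ever interpolated into a statement. A sketch of the idea; `safe_table_name` is an illustrative name, not LocalData's actual function.

```python
import re

# Plain SQL identifier: letter or underscore, then letters/digits/underscores.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

def safe_table_name(name: str) -> str:
    """Accept only plain identifiers; reject anything that could smuggle
    in extra statements, quotes, or comment markers."""
    if not _IDENTIFIER.match(name):
        raise ValueError(f"invalid table name: {name!r}")
    return name
```

Values, as opposed to identifiers, should still go through the driver's parameterized-query mechanism rather than string formatting.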
Resource Limits
- Connection Limit: Maximum 10 concurrent connections
- File Size Threshold: 100MB triggers temporary storage
- Query Buffering: Automatic for 100+ row results
- Auto-Cleanup: Buffers expire after 10 minutes
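A cap like the 10-connection limit can be enforced with a small thread-safe registry that refuses new names once full. This is a hedged sketch of the pattern, not LocalData's internals; the class and method names are hypothetical.

```python
import threading

MAX_CONNECTIONS = 10

class ConnectionRegistry:
    """Track named connections and refuse new ones past a fixed cap."""

    def __init__(self, limit: int = MAX_CONNECTIONS):
        self._limit = limit
        self._conns = {}
        self._lock = threading.Lock()

    def add(self, name, conn):
        with self._lock:
            # Re-registering an existing name does not count against the cap.
            if name not in self._conns and len(self._conns) >= self._limit:
                raise RuntimeError(f"connection limit ({self._limit}) reached")
            self._conns[name] = conn

    def remove(self, name):
        with self._lock:
            self._conns.pop(name, None)
```

Holding the lock around both the length check and the insert is what makes the limit reliable under concurrent callers.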
📊 Performance & Scalability
Large File Handling
- Files over 100MB automatically use temporary SQLite storage
- Memory-efficient streaming for large datasets
- Automatic cleanup of temporary files
Query Optimization
- Results with 100+ rows automatically use buffering system
- Chunk-based retrieval for large datasets
- File modification detection for cache invalidation
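The file-modification detection mentioned above can be as simple as recording the source file's mtime when a buffer is created and comparing it on each access. A sketch under that assumption; the `BufferEntry` name is illustrative.

```python
import os

class BufferEntry:
    """Cached query result that invalidates itself when the source file changes."""

    def __init__(self, source_path, rows):
        self.source_path = source_path
        self.rows = rows
        self.mtime = os.path.getmtime(source_path)

    def is_stale(self) -> bool:
        try:
            return os.path.getmtime(self.source_path) != self.mtime
        except OSError:
            return True  # source file deleted or unreadable: treat as stale
```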
Concurrency
- Thread-safe connection management
- Concurrent query execution support
- Resource pooling and limits
🧪 Testing & Quality
✅ 100% Test Coverage
- 100+ comprehensive test cases
- Security vulnerability testing
- Performance benchmarking
- Edge case validation
🔒 Security Validated
- Path traversal prevention
- SQL injection protection
- Resource exhaustion testing
- Malicious input handling
⚡ Performance Tested
- Large file processing
- Concurrent connection handling
- Memory usage optimization
- Query response times
🔄 API Compatibility
All existing MCP tool signatures remain 100% backward compatible. New functionality is additive only:
- ✅ All original tools work unchanged
- ✅ Enhanced responses with additional metadata
- ✅ New buffering tools for large datasets
- ✅ Improved error messages and validation
📖 Examples
Basic Database Operations
# Connect to SQLite
connect_database("sales", "sqlite", "./sales.db")
# Explore structure
describe_database("sales")
describe_table("sales", "orders")
# Query data
execute_query("sales", "SELECT product, SUM(amount) FROM orders GROUP BY product")
Large Dataset Processing
# Connect to large CSV
connect_database("bigdata", "csv", "./million_records.csv")
# Query returns buffer info for large results
result = execute_query_json("bigdata", "SELECT * FROM data WHERE category = 'A'")
# Access results in chunks
chunk = get_query_chunk("bigdata_1640995200_a1b2", 1, "1000")
Multi-Database Analysis
# Connect multiple sources
connect_database("postgres", "postgresql", "postgresql://localhost/prod")
connect_database("config", "yaml", "./config.yaml")
connect_database("logs", "json", "./logs.json")
# Query across sources (in application logic)
user_data = execute_query("postgres", "SELECT * FROM users")
config = read_text_file("./config.yaml", "yaml")
Multi-Sheet Spreadsheet Handling
LocalData MCP Server provides comprehensive support for multi-sheet spreadsheets (Excel and LibreOffice Calc):
Automatic Multi-Sheet Processing
# Connect to Excel file - all sheets become separate tables
connect_database("workbook", "xlsx", "./financial_data.xlsx")
# Query specific sheet (table names are sanitized sheet names)
execute_query("workbook", "SELECT * FROM Q1_Sales")
execute_query("workbook", "SELECT * FROM Q2_Budget")
execute_query("workbook", "SELECT * FROM Annual_Summary")
Single Sheet Selection
# Connect to specific sheet only using ?sheet=SheetName syntax
connect_database("q1only", "xlsx", "./financial_data.xlsx?sheet=Q1 Sales")
# The data is available as the default table
execute_query("q1only", "SELECT * FROM data")
Sheet Name Sanitization
Sheet names are automatically sanitized for SQL compatibility:
| Original Sheet Name | SQL Table Name |
|---|---|
| "Q1 Sales" | Q1_Sales |
| "2024-Budget" | _2024_Budget |
| "Summary & Notes" | Summary__Notes |
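The mapping in the table above is consistent with a rule like "drop disallowed characters, turn spaces and hyphens into underscores, and prefix names that start with a digit". The sketch below is an approximation inferred from those three examples, not LocalData's actual sanitization code.

```python
import re

def sanitize_sheet_name(name: str) -> str:
    """Approximate the sheet-name sanitization shown in the table above."""
    # Drop characters that are neither alphanumeric, space, hyphen, nor underscore.
    cleaned = re.sub(r"[^0-9A-Za-z \-_]", "", name)
    # Spaces and hyphens become underscores.
    cleaned = re.sub(r"[ \-]", "_", cleaned)
    # SQL identifiers cannot start with a digit.
    if cleaned and cleaned[0].isdigit():
        cleaned = "_" + cleaned
    return cleaned
```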
Discovering Available Sheets
# Connect to multi-sheet workbook
connect_database("workbook", "xlsx", "./data.xlsx")
# List all available tables (sheets)
describe_database("workbook")
# Get sample data from specific sheet
get_table_sample("workbook", "Sheet1")
🚧 Roadmap
Completed (v1.1.0)
- Spreadsheet Formats: Excel (.xlsx/.xls), LibreOffice Calc (.ods) with full multi-sheet support
- Enhanced File Formats: XML, INI, TSV support
- Analytical Formats: Parquet, Feather, Arrow support
Planned Features
- Caching Layer: Configurable query result caching
- Connection Pooling: Advanced connection management
- Streaming APIs: Real-time data processing
- Monitoring Tools: Connection and performance metrics
- Export Capabilities: Query results to various formats
🛠️ Troubleshooting
Spreadsheet Format Issues
Large Excel Files
# For files over 100MB, temporary SQLite storage is used automatically
connect_database("largefile", "xlsx", "./large_workbook.xlsx")
# Monitor processing with describe_database
describe_database("largefile") # Shows processing status
Sheet Name Conflicts
# If sheet names conflict after sanitization, use specific sheet selection
connect_database("specific", "xlsx", "./workbook.xlsx?sheet=Sheet1")
# Or connect the whole workbook and check the sanitized names
connect_database("workbook", "xlsx", "./workbook.xlsx")
describe_database("workbook") # Lists all table names
Format Detection
# Ensure correct file extension for proper format detection
connect_database("data", "xlsx", "./file.xlsx") # ✅ Correct
connect_database("data", "xlsx", "./file.xls") # ⚠️ May cause issues
# Use explicit format specification
connect_database("data", "xls", "./old_format.xls") # ✅ Better
Multi-Sheet Selection Issues
# Sheet names with special characters need URL encoding
connect_database("data", "xlsx", "./file.xlsx?sheet=Q1%20Sales") # For "Q1 Sales"
# Or use the sanitized table name after connecting all sheets
connect_database("workbook", "xlsx", "./file.xlsx")
execute_query("workbook", "SELECT * FROM Q1_Sales") # Use sanitized name
Performance Optimization
# For better performance with large spreadsheets:
# 1. Use specific sheet selection when possible
connect_database("q1", "xlsx", "./large.xlsx?sheet=Q1_Data")
# 2. Use LIMIT clauses for large datasets
execute_query("data", "SELECT * FROM large_sheet LIMIT 1000")
# 3. Consider converting to Parquet for repeated analysis
# (Manual conversion outside of LocalData MCP recommended for very large files)
General File Issues
Path Security Errors
# ✅ Allowed paths (current directory and subdirectories)
connect_database("data", "csv", "./data/file.csv")
connect_database("data", "csv", "subfolder/file.csv")
# ❌ Blocked paths (parent directories)
connect_database("data", "csv", "../data/file.csv") # Security error
Connection Limits
# Maximum 10 concurrent connections
# Use disconnect_database() to free up connections when done
disconnect_database("old_connection")
🤝 Contributing
Contributions welcome! Please read our Contributing Guidelines for details.
Development Setup
git clone https://github.com/ChrisGVE/localdata-mcp.git
cd localdata-mcp
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -e ".[dev]"
pytest
📄 License
MIT License - see the LICENSE file for details.
🔗 Links
- GitHub: localdata-mcp
- PyPI: localdata-mcp
- MCP Protocol: Model Context Protocol
- FastMCP: FastMCP Framework
📚 Additional Resources
- FAQ: Common questions and troubleshooting
- Troubleshooting Guide: Comprehensive problem resolution
- Advanced Examples: Production-ready usage patterns
- Blog Post: Technical deep dive and use cases
🤔 Need Help?
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: Available in GitHub profile
- Community: Join MCP community forums
🏷️ Tags
mcp model-context-protocol database postgresql mysql sqlite mongodb spreadsheet excel xlsx ods csv tsv json yaml toml xml ini parquet feather arrow ai machine-learning data-integration python security performance
Made with ❤️ for the MCP Community