MCP server for databases, spreadsheets, structured files, and directed graphs.

LocalData MCP Server

License: MIT. Python 3.10+.

LocalData MCP gives LLM agents access to local and remote data (databases, files, graphs, and structured documents) along with a data science toolkit for analysis and modeling. It exposes 60 MCP tools across 13 database types and 20+ file formats, with memory-bounded streaming so agents can work on large datasets without exceeding available RAM.

Quick Start

# Install permanently
uv tool install localdata-mcp

# Or run directly without installing
uvx localdata-mcp

First-run note: Data science dependencies (scipy, scikit-learn, statsmodels, geopandas) total around 200 MB and are downloaded on first use. Subsequent starts reuse the cache. If your MCP client times out on the first launch, reconnect — the next start will be immediate.

Add to your MCP client configuration:

{
  "mcpServers": {
    "localdata": {
      "command": "localdata-mcp"
    }
  }
}

For uvx (no permanent install):

{
  "mcpServers": {
    "localdata": {
      "command": "uvx",
      "args": ["localdata-mcp"]
    }
  }
}

Then connect to any supported source and start querying:

connect_database("sales", "postgresql", "postgresql://user:pass@localhost/db")
execute_query("sales", "SELECT product, SUM(amount) FROM orders GROUP BY product")

connect_database("data", "csv", "./records.csv")
analyze_hypothesis_test("data", "SELECT amount, region FROM data", column="amount", group_column="region")

Feature Overview

Core Database (8 tools)

Connect, query, and inspect databases and files. All queries execute within configurable memory limits (default 2 GB) with automatic chunked streaming for large result sets.

| Tool | Description |
| --- | --- |
| connect_database | Open a connection to any supported database or file |
| disconnect_database | Close a connection |
| list_databases | List active connections |
| execute_query | Run SQL with streaming, chunking, and preflight mode |
| describe_database | Show schema and table list |
| describe_table | Column types, indexes, row count |
| find_table | Locate a table across all active connections |
| analyze_query_preview | Estimate query cost before execution |

Streaming and Memory (9 tools)

| Tool | Description |
| --- | --- |
| next_chunk | Retrieve the next chunk of a streamed result |
| request_data_chunk | Fetch a specific chunk by row range |
| request_multiple_chunks | Batch-fetch multiple chunks in one call |
| manage_memory_bounds | View and configure memory limits |
| get_streaming_status | Check active streams and buffer usage |
| clear_streaming_buffer | Free memory from a specific buffer |
| get_query_metadata | Rich metadata for a completed query |
| cancel_query_operation | Cancel a running or buffered query |
| get_data_quality_report | Column statistics, null rates, and quality metrics |
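
The chunked retrieval behind these tools can be pictured with a small, generic sketch. This is not LocalData MCP's internal implementation; it only illustrates the technique of pulling a result set in fixed-size chunks so that at most one chunk is held in memory at a time. The `stream_chunks` helper and the sample table are assumptions made for illustration, and the chunk size of 500 mirrors the documented `LOCALDATA_CHUNK_SIZE` default.

```python
import sqlite3
from typing import Iterator


def stream_chunks(cursor: sqlite3.Cursor, chunk_size: int = 500) -> Iterator[list]:
    """Yield rows in fixed-size chunks instead of loading the full result."""
    while True:
        rows = cursor.fetchmany(chunk_size)
        if not rows:
            break
        yield rows


# Hypothetical sample data: an in-memory table with 1,200 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(i, i * 1.5) for i in range(1200)],
)

cursor = conn.execute("SELECT id, amount FROM orders ORDER BY id")
chunk_sizes = [len(chunk) for chunk in stream_chunks(cursor)]
print(chunk_sizes)  # [500, 500, 200]
conn.close()
```

A consumer that processes each chunk and discards it keeps peak memory proportional to the chunk size rather than to the full result.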

Tree / Structured Data (10 tools)

Read and edit TOML, JSON, and YAML files as trees. Supports full CRUD with auto-creation of ancestor nodes and round-trip export to any supported format.

| Tool | Description |
| --- | --- |
| get_node / get_children | Navigate the tree |
| set_node / delete_node | Create or remove nodes |
| get_value / set_value / delete_key | Read and write properties |
| list_keys | List key-value pairs at a node |
| move_node | Relocate a node within the tree |
| export_structured | Export as TOML, JSON, or YAML |
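
To make "auto-creation of ancestor nodes" concrete, here is a minimal sketch on a plain nested dictionary. It is an illustration of the idea only: the `set_value_at` helper, the dotted path syntax, and the sample data are assumptions, not the signatures or path format of the actual tree tools.

```python
from typing import Any


def set_value_at(tree: dict, path: str, key: str, value: Any) -> None:
    """Set key=value on the node at a dotted path, creating missing ancestors."""
    node = tree
    for part in path.split("."):
        child = node.setdefault(part, {})  # create the ancestor if absent
        if not isinstance(child, dict):
            raise TypeError(f"{part!r} holds a value, not a node")
        node = child
    node[key] = value


# Hypothetical document: only "server" exists before the write.
config: dict = {"server": {"host": "localhost"}}
set_value_at(config, "server.tls.certs", "path", "/etc/certs/app.pem")
print(config)
# {'server': {'host': 'localhost', 'tls': {'certs': {'path': '/etc/certs/app.pem'}}}}
```

The write succeeds even though neither `tls` nor `certs` existed beforehand, and existing siblings such as `host` are left untouched.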

Graph (14 tools)

Work with DOT, GML, GraphML, and Mermaid files as directed multigraphs. Supports full CRUD on nodes and edges, shortest-path and all-paths queries, structural statistics, and multi-format export.

| Tool | Description |
| --- | --- |
| get_node_graph / get_neighbors / get_edges | Navigate the graph |
| set_node_graph / delete_node_graph | Create or remove nodes |
| add_edge / remove_edge | Manage edges |
| get_value_graph / set_value_graph / delete_key_graph / list_keys_graph | Node properties |
| find_path | Shortest or all paths between two nodes |
| get_graph_stats | Node/edge counts, density, DAG validation |
| export_graph | Export as DOT, GML, GraphML, or Mermaid |
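
The two graph queries named above, shortest path and DAG validation, can be sketched with the standard library alone. This is a generic illustration using breadth-first search and Kahn's algorithm, not the server's own code; the edge list, the helper names, and the choice of algorithms are all assumptions.

```python
from collections import deque

# Hypothetical directed multigraph: parallel edges (a -> b twice) are allowed.
edges = [
    ("a", "b", "calls"), ("a", "b", "imports"), ("b", "c", "calls"),
    ("a", "d", "calls"), ("d", "c", "calls"), ("c", "e", "calls"),
]


def build_adjacency(edge_list):
    """Collapse labeled edges into node -> ordered list of distinct successors."""
    adj = {}
    for src, dst, _label in edge_list:
        adj.setdefault(src, [])
        adj.setdefault(dst, [])
        if dst not in adj[src]:
            adj[src].append(dst)
    return adj


def shortest_path(adj, start, goal):
    """Breadth-first search; returns a fewest-hops path or None."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in adj.get(node, []):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None


def is_dag(adj):
    """Kahn's algorithm: acyclic iff every node can be removed in topological order."""
    indegree = {n: 0 for n in adj}
    for targets in adj.values():
        for t in targets:
            indegree[t] += 1
    ready = deque(n for n, d in indegree.items() if d == 0)
    removed = 0
    while ready:
        node = ready.popleft()
        removed += 1
        for t in adj[node]:
            indegree[t] -= 1
            if indegree[t] == 0:
                ready.append(t)
    return removed == len(adj)


adj = build_adjacency(edges)
print(shortest_path(adj, "a", "e"))  # ['a', 'b', 'c', 'e']
print(is_dag(adj))                   # True
```

Adding an edge from `e` back to `a` would introduce a cycle, and `is_dag` would then return `False`.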

Search and Transform (2 tools)

| Tool | Description |
| --- | --- |
| search_data | Regex search across query results |
| transform_data | Apply column transformations to result sets |

Schema and Audit (3 tools)

| Tool | Description |
| --- | --- |
| export_schema | Export full schema as JSON |
| get_query_log | Recent query execution history |
| get_error_log | Recent error log |

System (2 tools)

| Tool | Description |
| --- | --- |
| check_compatibility | Verify API backward compatibility |
| get_metrics | Server performance and resource metrics |

Data Science (12 tools)

Run statistical analysis, modeling, and pattern detection directly on query results from any connected source.

| Tool | Domain |
| --- | --- |
| analyze_hypothesis_test | Statistical Analysis |
| analyze_anova | Statistical Analysis |
| analyze_effect_sizes | Statistical Analysis |
| analyze_regression | Regression and Modeling |
| evaluate_model_performance | Regression and Modeling |
| analyze_clusters | Pattern Recognition |
| detect_anomalies | Pattern Recognition |
| reduce_dimensions | Pattern Recognition |
| analyze_time_series | Time Series |
| forecast_time_series | Time Series |
| analyze_rfm | Business Intelligence |
| analyze_ab_test | Business Intelligence |

Supported Data Sources

Databases

| Type | Engines |
| --- | --- |
| SQL | SQLite, PostgreSQL, MySQL, DuckDB |
| SQL (enterprise) | Oracle, MS SQL Server (pip install localdata-mcp[enterprise]) |
| Document | MongoDB, CouchDB (pip install localdata-mcp[modern-databases]) |
| Key-value | Redis (pip install localdata-mcp[modern-databases]) |
| Search | Elasticsearch (pip install localdata-mcp[modern-databases]) |
| Time series | InfluxDB (pip install localdata-mcp[modern-databases]) |
| Graph | Neo4j (pip install localdata-mcp[modern-databases]) |
| RDF / SPARQL | Turtle (.ttl), N-Triples (.nt), remote SPARQL endpoints |

File Formats

| Category | Formats |
| --- | --- |
| Tabular | CSV, TSV |
| Structured | JSON, JSONL, YAML, TOML, XML, INI |
| Spreadsheet | Excel (.xlsx, .xls), LibreOffice Calc (.ods), Apple Numbers (.numbers) |
| Analytical | Parquet, Feather, Arrow, HDF5 |
| Graph | DOT (Graphviz), GML, GraphML, Mermaid |
| RDF | Turtle (.ttl), N-Triples (.nt) |

Multi-sheet spreadsheets are fully supported: each sheet becomes a separately queryable table. Connect to a specific sheet with ?sheet=SheetName in the path.
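
As a rough picture of the `?sheet=SheetName` suffix, the sketch below separates a file path from its sheet selector using the standard URL parser. How the server itself parses the path is not documented here, so `split_sheet_path` and the sample file name are purely illustrative assumptions.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlsplit


def split_sheet_path(path: str) -> Tuple[str, Optional[str]]:
    """Split 'file.xlsx?sheet=Name' into (file path, sheet name or None)."""
    parts = urlsplit(path)
    sheets = parse_qs(parts.query).get("sheet")
    return parts.path, (sheets[0] if sheets else None)


print(split_sheet_path("./budget.xlsx?sheet=Q1"))  # ('./budget.xlsx', 'Q1')
print(split_sheet_path("./budget.xlsx"))           # ('./budget.xlsx', None)
```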

Data Science Domains

Statistical Analysis — t-tests, chi-squared, Mann-Whitney, Kruskal-Wallis, and related hypothesis tests; one-way ANOVA with post-hoc tests; Cohen's d, eta-squared, and other effect size measures.
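
For readers unfamiliar with effect sizes, the following sketch computes Cohen's d by hand using the pooled sample standard deviation. It shows the textbook formula only and makes no claim about how `analyze_effect_sizes` computes or reports it; the function name and the two sample groups are made up for illustration.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical measurements for two groups.
north = [12.0, 14.0, 16.0, 18.0, 20.0]
south = [10.0, 12.0, 14.0, 16.0, 18.0]
d = cohens_d(north, south)
print(round(d, 3))  # 0.632
```

Here both groups have a sample variance of 10 and their means differ by 2, so d = 2 / sqrt(10), which is conventionally read as a medium effect.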

Regression and Modeling — linear, polynomial, logistic, ridge, lasso, and elastic net regression; model evaluation with R², RMSE, MAE, and classification metrics; automated feature selection.
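
The regression metrics listed above follow standard definitions, which the sketch below spells out in plain Python. It is a reference illustration, not the output format of `evaluate_model_performance`; the function name, dictionary keys, and sample values are assumptions.

```python
from math import sqrt


def regression_metrics(y_true, y_pred):
    """Return R², RMSE, and MAE for paired actual and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "r2": 1 - ss_res / ss_tot,
        "rmse": sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
    }


# Hypothetical observations and model predictions.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 7.0, 9.0]
metrics = regression_metrics(actual, predicted)
print({name: round(value, 3) for name, value in metrics.items()})
# {'r2': 0.975, 'rmse': 0.354, 'mae': 0.25}
```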

Pattern Recognition — K-means, DBSCAN, and hierarchical clustering; anomaly detection via isolation forest, LOF, and one-class SVM; dimensionality reduction with PCA, t-SNE, and UMAP.

Time Series — decomposition, stationarity testing, autocorrelation analysis; ARIMA, SARIMA, and ETS forecasting; change point detection; multivariate analysis with VAR, Granger causality, and cointegration tests.

Business Intelligence — A/B test statistical analysis; RFM customer segmentation; cohort analysis, CLV modeling, and funnel analysis.
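
RFM stands for recency, frequency, and monetary value. The sketch below derives the three raw values per customer from a list of orders; scoring and segment assignment, which a full RFM analysis adds on top, are left out. The data, the `rfm_table` helper, and the output keys are illustrative assumptions, not the behavior of `analyze_rfm`.

```python
from datetime import date

# Hypothetical order rows: (customer, order date, amount).
orders = [
    ("alice", date(2024, 1, 5), 120.0),
    ("alice", date(2024, 3, 20), 80.0),
    ("bob", date(2023, 11, 2), 40.0),
    ("carol", date(2024, 3, 28), 300.0),
    ("carol", date(2024, 2, 14), 150.0),
    ("carol", date(2024, 1, 30), 50.0),
]


def rfm_table(rows, as_of):
    """Days since last order, order count, and total spend per customer."""
    table = {}
    for customer, day, amount in rows:
        entry = table.setdefault(
            customer, {"last": day, "frequency": 0, "monetary": 0.0}
        )
        entry["last"] = max(entry["last"], day)
        entry["frequency"] += 1
        entry["monetary"] += amount
    return {
        customer: {
            "recency_days": (as_of - entry["last"]).days,
            "frequency": entry["frequency"],
            "monetary": entry["monetary"],
        }
        for customer, entry in table.items()
    }


rfm = rfm_table(orders, as_of=date(2024, 4, 1))
print(rfm["alice"])  # {'recency_days': 12, 'frequency': 2, 'monetary': 200.0}
print(rfm["carol"])  # {'recency_days': 4, 'frequency': 3, 'monetary': 500.0}
```

In this sample, carol is the most recent, most frequent, and highest-spending customer, while bob has a single order placed 151 days before the reference date.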

Geospatial — distance and coordinate calculations, spatial joins, interpolation, and network analysis.

Optimization — linear programming, constrained optimization, assignment problems, and network optimization.

Sampling and Estimation — bootstrap confidence intervals, Bayesian estimation, Monte Carlo simulation, and stratified sampling.
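
As an illustration of the first technique in this list, here is a minimal percentile bootstrap for a confidence interval. It is a generic sketch under stated assumptions (2,000 resamples, a fixed seed for repeatability, made-up sample values) and does not describe the parameters or defaults of any LocalData MCP tool.

```python
import random
from statistics import mean


def bootstrap_ci(sample, stat=mean, n_resamples=2000, confidence=0.95, seed=0):
    """Percentile bootstrap: resample with replacement, then take interval quantiles."""
    rng = random.Random(seed)  # fixed seed so repeated calls give the same interval
    estimates = sorted(
        stat(rng.choices(sample, k=len(sample))) for _ in range(n_resamples)
    )
    alpha = (1 - confidence) / 2
    lower = estimates[int(alpha * n_resamples)]
    upper = estimates[int((1 - alpha) * n_resamples) - 1]
    return lower, upper


# Hypothetical measurements.
sample = [4.1, 5.3, 4.8, 6.0, 5.5, 4.9, 5.1, 6.2, 4.4, 5.7]
low, high = bootstrap_ci(sample)
print(f"mean={mean(sample):.2f}, 95% CI=({low:.2f}, {high:.2f})")
```

Passing a different `stat`, such as `statistics.median`, gives an interval for that statistic without changing the resampling logic.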

Architecture

  • Intention-driven interface — tools accept semantic parameters ("find strong correlations") rather than requiring statistical procedure names or threshold values
  • Progressive disclosure — simple calls return high-level insights with sensible defaults; advanced parameters are available when needed
  • Streaming-first execution — all operations are designed for chunked processing; tools automatically switch strategies based on data size, keeping memory usage within configured bounds
  • Composition metadata — every tool result includes metadata that downstream tools can use directly, enabling chained analysis without manual wiring

Configuration

LocalData MCP uses environment variables for optional settings. The defaults work for most cases.

| Variable | Default | Description |
| --- | --- | --- |
| LOCALDATA_MEMORY_LIMIT_MB | 2048 | Maximum memory per query result (MB) |
| LOCALDATA_MAX_CONNECTIONS | 10 | Maximum concurrent database connections |
| LOCALDATA_CHUNK_SIZE | 500 | Default rows per streaming chunk |
| LOCALDATA_BUFFER_TTL | 600 | Streaming buffer expiry in seconds |
| LOCALDATA_WORKING_DIR | process cwd | Root directory for file access (file paths are restricted to this tree) |

Set them under "env" in the server entry of your MCP client configuration, or in a .env file in the working directory.
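
A client configuration with overrides might look like the output of the short script below, which builds the entry as a dictionary and prints it as JSON. The specific values (a 1 GB limit, 1,000-row chunks, and the `/srv/data` directory) are illustrative assumptions; only the variable names and the "env" key come from this document.

```python
import json

config = {
    "mcpServers": {
        "localdata": {
            "command": "localdata-mcp",
            # Environment variable values are strings; these overrides are examples only.
            "env": {
                "LOCALDATA_MEMORY_LIMIT_MB": "1024",
                "LOCALDATA_CHUNK_SIZE": "1000",
                "LOCALDATA_WORKING_DIR": "/srv/data",  # hypothetical path
            },
        }
    }
}

print(json.dumps(config, indent=2))
```

Variables left out of "env" keep the defaults from the table above.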

Development

git clone https://github.com/ChrisGVE/localdata-mcp.git
cd localdata-mcp
uv sync --all-extras
uv run pytest

The test suite includes 1,600+ unit tests, 234+ integration tests, and 62 enterprise-scale tests across 7 database types with 100K rows each.

Contributing

Contributions are welcome. Please read CONTRIBUTING.md before submitting a pull request.

License

MIT — see LICENSE for details.
