Data science MCP server with an Apache Arrow-based cache for efficient data handling

Project description

Arrow Cache MCP

A data science toolkit and MCP server that uses Apache Arrow for efficient memory management and data processing.

Features

High-Performance Data Handling: Efficient memory management using Apache Arrow
Intelligent Memory Management: Automatic partitioning and compression of large datasets
Multiple File Format Support: Load datasets from CSV, Parquet, Arrow, Feather, JSON, Excel, and more
SQL Query Capabilities: Advanced SQL analytics with DuckDB integration
Visualization Support: Create plots and charts from your datasets
AI-Powered Data Analysis: Optional Claude integration for natural language queries
Memory Overflow Protection: Automatic spilling to disk when memory limits are reached

Components

Resources

The server exposes cached datasets as resources:

Custom arrowcache:// URI scheme for accessing datasets
Each dataset resource includes metadata describing its size, shape, and structure
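The README names only the arrowcache:// scheme. The sketch below assumes the dataset name follows the scheme directly (arrowcache://<dataset_name>), which may not match the URIs the server actually emits, and shows how a client could validate such a URI and wrap it in an MCP resources/read request. The helper names are hypothetical.

```python
from urllib.parse import urlparse


def parse_arrowcache_uri(uri: str) -> str:
    """Return the dataset name from a URI assumed to look like arrowcache://<dataset_name>."""
    parsed = urlparse(uri)
    if parsed.scheme != "arrowcache":
        raise ValueError(f"not an arrowcache:// URI: {uri!r}")
    # Assumed layout: the dataset name sits in the authority (netloc) part of the URI.
    name = parsed.netloc or parsed.path.lstrip("/")
    if not name:
        raise ValueError(f"missing dataset name in {uri!r}")
    return name


def resource_read_request(uri: str, request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 resources/read request, as defined by the MCP specification."""
    parse_arrowcache_uri(uri)  # reject malformed URIs before sending anything
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "resources/read",
        "params": {"uri": uri},
    }


# "yellow_taxi" is a hypothetical dataset name.
print(parse_arrowcache_uri("arrowcache://yellow_taxi"))
print(resource_read_request("arrowcache://yellow_taxi"))
```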

Tools

The server implements the following tools:

run_sql_query: Execute SQL queries against cached datasets
- Reference a cached dataset as _cache_<dataset_name> in the FROM clause
load_dataset: Load a dataset from a file or URL into the cache
- Supports many common file formats
get_dataset_sample: Get a sample of rows from a dataset
- Useful for previewing large datasets
get_dataset_info: Get detailed information about a dataset
- Schema, row counts, column statistics, etc.
remove_dataset: Remove a dataset from the cache
create_plot: Create visualizations from datasets
- Support for various plot types (line, bar, scatter, etc.)
get_memory_usage: Get detailed memory usage statistics
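MCP clients invoke these tools with a JSON-RPC 2.0 tools/call request. Claude Desktop and the MCP Inspector build these messages for you; the sketch below only shows roughly what a run_sql_query call could look like on the wire. The argument name "query" and the dataset name yellow_taxi are assumptions, not documented parameters of this server.

```python
import json


def tool_call_request(tool: str, arguments: dict, request_id: int = 1) -> str:
    """Serialize an MCP tools/call request as a single line of JSON.

    The stdio transport is newline-delimited, so the message must not
    contain embedded newlines; json.dumps without indent guarantees that.
    """
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical: a dataset loaded as "yellow_taxi" is queried through the _cache_ prefix.
sql = "SELECT COUNT(*) AS trips FROM _cache_yellow_taxi"
line = tool_call_request("run_sql_query", {"query": sql})
print(line)
```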

Installation

pip install arrow-cache-mcp

With optional dependencies:

# For geospatial data support
pip install "arrow-cache-mcp[geospatial]"

# For enhanced visualization
pip install "arrow-cache-mcp[viz]"

# For all features
pip install "arrow-cache-mcp[geospatial,viz]"

Configuration

Environment Variables

ARROW_CACHE_MEMORY_LIMIT: Maximum memory usage in bytes (default: 4GB)
ARROW_CACHE_SPILL_DIRECTORY: Directory for spilling data to disk when memory limit is reached
ARROW_CACHE_SPILL_TO_DISK: Whether to allow spilling to disk (true/false)
ANTHROPIC_API_KEY: API key for Claude integration (optional)
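A minimal sketch of how these variables could be read, assuming "4GB" means 4 GiB (4 × 1024³ = 4,294,967,296 bytes) and that spilling is enabled unless ARROW_CACHE_SPILL_TO_DISK is set to false; neither assumption is confirmed above. CacheSettings and load_settings are hypothetical names, not part of the package.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumption: "4GB" in the docs means 4 GiB.
DEFAULT_MEMORY_LIMIT = 4 * 1024**3


@dataclass
class CacheSettings:
    memory_limit: int               # bytes
    spill_directory: Optional[str]  # None when not configured
    spill_to_disk: bool


def load_settings(env: Mapping[str, str] = os.environ) -> CacheSettings:
    """Read the cache settings from environment variables (hypothetical helper)."""
    raw_limit = env.get("ARROW_CACHE_MEMORY_LIMIT")
    memory_limit = int(raw_limit) if raw_limit else DEFAULT_MEMORY_LIMIT
    # Assumed default: spilling stays on unless explicitly set to "false".
    spill_to_disk = env.get("ARROW_CACHE_SPILL_TO_DISK", "true").strip().lower() == "true"
    return CacheSettings(
        memory_limit=memory_limit,
        spill_directory=env.get("ARROW_CACHE_SPILL_DIRECTORY"),
        spill_to_disk=spill_to_disk,
    )


# Example: an 8 GiB limit is 8 * 1024**3 = 8589934592 bytes.
print(load_settings({"ARROW_CACHE_MEMORY_LIMIT": "8589934592"}))
```

Passing a plain dict instead of os.environ keeps the helper easy to test.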

Quickstart

Configure in Claude Desktop

Add the server to the Claude Desktop configuration file:

On macOS: ~/Library/Application\ Support/Claude/claude_desktop_config.json
On Windows: %APPDATA%/Claude/claude_desktop_config.json

Development/Unpublished Servers Configuration

"mcpServers": {
  "arrow-cache-mcp": {
    "command": "uv",
    "args": [
      "--directory",
      "/PATH/TO/arrow-cache-mcp",
      "run",
      "arrow-cache-mcp"
    ]
  }
}

Published Servers Configuration

"mcpServers": {
  "arrow-cache-mcp": {
    "command": "uvx",
    "args": [
      "arrow-cache-mcp"
    ]
  }
}
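If the configuration file already lists other servers, the published-server entry has to be merged into the existing "mcpServers" object rather than replacing the file. A minimal sketch of that merge, using the macOS and Windows paths listed above; the helper names are hypothetical and other platforms are deliberately not handled.

```python
import json
import os
import sys
from pathlib import Path


def claude_config_path() -> Path:
    """Return the Claude Desktop config location for macOS or Windows."""
    if sys.platform == "darwin":
        return Path.home() / "Library" / "Application Support" / "Claude" / "claude_desktop_config.json"
    if sys.platform == "win32":
        return Path(os.environ["APPDATA"]) / "Claude" / "claude_desktop_config.json"
    raise RuntimeError("only the macOS and Windows locations are documented")


def add_server(config: dict, name: str = "arrow-cache-mcp") -> dict:
    """Add the published-server entry while keeping any servers already configured."""
    servers = config.setdefault("mcpServers", {})
    servers[name] = {"command": "uvx", "args": ["arrow-cache-mcp"]}
    return config


def install(path: Path) -> None:
    """Read the existing config (if any), merge in the entry, and write it back."""
    config = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(add_server(config), indent=2) + "\n")
```

Calling install(claude_config_path()) writes the entry; restart Claude Desktop afterwards so it rereads the file.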

Example Usage

Load a dataset:

I'd like to load the NYC Yellow Taxi dataset from January 2023

Query the data:

How many taxi trips were there per day of the week?

Create a visualization:

Create a bar chart showing average fare amount by day of week
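Behind the second and third prompts, the assistant ends up issuing SQL against the cached table through the _cache_<dataset_name> convention. The sketch below shows the shape of such queries on a three-row toy table; the table name, the toy rows, and the use of Python's built-in sqlite3 in place of DuckDB are all assumptions made so the example runs with the standard library only. The column names follow the public NYC TLC yellow taxi schema. In DuckDB the day-of-week step would more naturally use a function such as dayname().

```python
import sqlite3

# sqlite3 stands in for DuckDB here; the table follows the _cache_<dataset_name> naming.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE _cache_yellow_taxi (tpep_pickup_datetime TEXT, fare_amount REAL)")
con.executemany(
    "INSERT INTO _cache_yellow_taxi VALUES (?, ?)",
    [
        ("2023-01-01 08:00:00", 12.5),  # toy row, Sunday
        ("2023-01-01 09:30:00", 7.0),   # toy row, Sunday
        ("2023-01-02 10:00:00", 20.0),  # toy row, Monday
    ],
)

# Trips per day of week; strftime('%w') numbers days from 0 (Sunday) to 6 (Saturday).
trips_per_day = con.execute(
    """
    SELECT strftime('%w', tpep_pickup_datetime) AS day_of_week, COUNT(*) AS trips
    FROM _cache_yellow_taxi
    GROUP BY day_of_week
    ORDER BY day_of_week
    """
).fetchall()
print(trips_per_day)  # [('0', 2), ('1', 1)]

# Average fare per day of week: the data a bar chart of the third prompt would plot.
avg_fare = con.execute(
    """
    SELECT strftime('%w', tpep_pickup_datetime) AS day_of_week, AVG(fare_amount) AS avg_fare
    FROM _cache_yellow_taxi
    GROUP BY day_of_week
    ORDER BY day_of_week
    """
).fetchall()
print(avg_fare)  # [('0', 9.75), ('1', 20.0)]
```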

Development

Building and Publishing

To prepare the package for distribution:

Sync dependencies and update the lockfile:

uv sync

Build package distributions:

uv build

This will create source and wheel distributions in the dist/ directory.

Publish to PyPI:

uv publish

Note: uv publish needs PyPI credentials, for example an API token passed with --token or set in the UV_PUBLISH_TOKEN environment variable.

Debugging

Since MCP servers run over stdio, debugging can be challenging. For the best debugging experience, we recommend using the MCP Inspector.

You can launch the MCP Inspector via npm with this command:

npx @modelcontextprotocol/inspector uv --directory /PATH/TO/arrow-cache-mcp run arrow-cache-mcp

Upon launching, the Inspector will display a URL that you can access in your browser to begin debugging.
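One common stdio pitfall is worth guarding against while debugging: the stdio transport uses stdout for protocol messages, so stray print() output from server code can corrupt the JSON-RPC stream, and the MCP documentation recommends logging to stderr instead. A minimal sketch using the standard logging module; the logger name is hypothetical.

```python
import logging
import sys


def configure_logging(level: int = logging.DEBUG) -> logging.Logger:
    """Route debug output to stderr so stdout carries only MCP protocol messages."""
    logger = logging.getLogger("arrow_cache_mcp_debug")  # hypothetical logger name
    logger.setLevel(level)
    if not logger.handlers:  # avoid duplicate handlers if called more than once
        handler = logging.StreamHandler(sys.stderr)
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger


log = configure_logging()
log.debug("server starting")  # written to stderr, not into the stdout message stream
```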

Architecture

Arrow Cache MCP builds on these key components:

Arrow Cache: A memory-managed caching system for Arrow tables
DuckDB: A high-performance analytical database
PyArrow: Python bindings for Apache Arrow
MCP Protocol: Model Context Protocol for AI agent integration

The system combines automatic partitioning, spilling to disk, and compression so that it can work with datasets larger than the configured memory limit while reducing the risk of out-of-memory errors.
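To make the spilling idea concrete, here is a toy model: an LRU cache that writes its least recently used entries to a spill directory once the in-memory total exceeds a byte limit, and reloads them on access. This is a hypothetical illustration of the general technique with plain bytes, not Arrow Cache's actual algorithm, which works on Arrow tables and also partitions and compresses data.

```python
import os
import tempfile
from collections import OrderedDict


class SpillingCache:
    """Toy LRU cache that spills the least recently used entries to disk over a byte limit."""

    def __init__(self, memory_limit: int, spill_directory: str):
        self.memory_limit = memory_limit
        self.spill_directory = spill_directory
        self._memory = OrderedDict()  # name -> bytes, oldest first
        self._spilled = {}            # name -> path of the spilled file

    def memory_usage(self) -> int:
        return sum(len(data) for data in self._memory.values())

    def put(self, name: str, data: bytes) -> None:
        stale = self._spilled.pop(name, None)
        if stale is not None:
            os.remove(stale)  # discard an outdated on-disk copy
        self._memory[name] = data
        self._memory.move_to_end(name)  # newest entry is most recently used
        self._enforce_limit()

    def get(self, name: str) -> bytes:
        if name in self._memory:
            self._memory.move_to_end(name)
            return self._memory[name]
        path = self._spilled.pop(name)  # raises KeyError for unknown names
        with open(path, "rb") as f:
            data = f.read()
        os.remove(path)
        self.put(name, data)  # reloading may spill other entries
        return data

    def _enforce_limit(self) -> None:
        # Spill oldest entries first, but always keep the newest one in memory.
        while self.memory_usage() > self.memory_limit and len(self._memory) > 1:
            name, data = self._memory.popitem(last=False)
            path = os.path.join(self.spill_directory, f"{name}.bin")
            with open(path, "wb") as f:
                f.write(data)
            self._spilled[name] = path


with tempfile.TemporaryDirectory() as spill_dir:
    cache = SpillingCache(memory_limit=10, spill_directory=spill_dir)
    cache.put("a", b"12345")
    cache.put("b", b"67890")
    cache.put("c", b"xyz")       # 13 bytes > 10, so "a" is spilled to disk
    print(cache.memory_usage())  # 8
    print(cache.get("a"))        # b'12345', reloaded from disk; "b" is spilled instead
```

Evicting in least-recently-used order keeps recently queried datasets in memory, which is a common choice for spilling caches.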

Project details

Release history

0.1.1: Apr 16, 2025
0.1: Apr 10, 2025 (this version)
Download files

Source Distribution

arrow_cache_mcp-0.1.tar.gz (40.3 kB)

Uploaded Apr 10, 2025 Source

Built Distribution

arrow_cache_mcp-0.1-py3-none-any.whl (32.3 kB)

Uploaded Apr 10, 2025 Python 3

File details

Details for the file arrow_cache_mcp-0.1.tar.gz.

File metadata

Download URL: arrow_cache_mcp-0.1.tar.gz
Upload date: Apr 10, 2025
Size: 40.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.10.14

File hashes

Hashes for arrow_cache_mcp-0.1.tar.gz
Algorithm	Hash digest
SHA256	`b5b979fe5ede125f7f8fd793058bbbeccbbe0d94cf17674913bc88b12d864291`
MD5	`7c4132d69f84fc57c41e562a31006b00`
BLAKE2b-256	`e8cba4bfde1bec4541c6a71bec4cf33ff61c05b15de91b0a78d4173b22788998`

File details

Details for the file arrow_cache_mcp-0.1-py3-none-any.whl.

File metadata

Download URL: arrow_cache_mcp-0.1-py3-none-any.whl
Upload date: Apr 10, 2025
Size: 32.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.10.14

File hashes

Hashes for arrow_cache_mcp-0.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`794ba8aaaecc8ec25d136e1d6bbd634ca0cd716e28f5fde32cd7470dda105d3c`
MD5	`8ebb02bc26bc937ba493f21ab7f40938`
BLAKE2b-256	`d0c84e8571e98e1ad24157cb697c9633616ce833c5e8c45250f7eedbfb31c11d`
