Skip to main content

Comprehensive Model Context Protocol server with 40 statistical analysis tools, HTTP transport, and Server-Sent Events

Project description

RMCP: R Model Context Protocol Server

Python application Documentation PyPI version Downloads License Python

Version 0.3.10 - A comprehensive Model Context Protocol (MCP) server with 40 statistical analysis tools across 9 categories. RMCP enables AI assistants and applications to perform sophisticated statistical modeling, econometric analysis, machine learning, time series analysis, and data science tasks seamlessly through natural conversation.

๐Ÿ†• Python 3.10+ required for modern type hints and performance improvements.

๐ŸŽ‰ Now with 40 statistical tools across 9 categories including natural language formula building and intelligent error recovery!

๐Ÿš€ Quick Start

pip install rmcp
# Check R packages are installed
rmcp check-r-packages

# Start the MCP server
rmcp start

That's it! RMCP is now ready to handle statistical analysis requests via the Model Context Protocol.

๐Ÿ‘‰ See Working Examples โ†’ - Copy-paste ready commands with real datasets!

โœจ Features

๐Ÿ“Š Comprehensive Statistical Analysis (40 Tools)

Regression & Correlation โœ…

  • Linear Regression (linear_model): OLS with robust standard errors, Rยฒ, p-values
  • Logistic Regression (logistic_regression): Binary classification with odds ratios and accuracy
  • Correlation Analysis (correlation_analysis): Pearson, Spearman, and Kendall correlations

Time Series Analysis โœ…

  • ARIMA Modeling (arima_model): Autoregressive integrated moving average with forecasting
  • Time Series Decomposition (decompose_timeseries): Trend, seasonal, remainder components
  • Stationarity Testing (stationarity_test): ADF, KPSS, Phillips-Perron tests

Data Transformation โœ…

  • Lag/Lead Variables (lag_lead): Create time-shifted variables for analysis
  • Winsorization (winsorize): Handle outliers by capping extreme values
  • Differencing (difference): Create stationary series for time series analysis
  • Standardization (standardize): Z-score, min-max, robust scaling

Statistical Testing โœ…

  • T-Tests (t_test): One-sample, two-sample, paired t-tests
  • ANOVA (anova): Analysis of variance with Types I/II/III
  • Chi-Square Tests (chi_square_test): Independence and goodness-of-fit
  • Normality Tests (normality_test): Shapiro-Wilk, Jarque-Bera, Anderson-Darling

Descriptive Statistics โœ…

  • Summary Statistics (summary_stats): Comprehensive descriptives with grouping
  • Outlier Detection (outlier_detection): IQR, Z-score, Modified Z-score methods
  • Frequency Tables (frequency_table): Counts and percentages with sorting

Advanced Econometrics โœ…

  • Panel Regression (panel_regression): Fixed/random effects for longitudinal data
  • Instrumental Variables (instrumental_variables): 2SLS with endogeneity testing
  • Vector Autoregression (var_model): Multivariate time series modeling

Machine Learning โœ…

  • K-Means Clustering (kmeans_clustering): Unsupervised clustering with validation
  • Decision Trees (decision_tree): Classification and regression trees
  • Random Forest (random_forest): Ensemble methods with variable importance

Data Visualization โœ…

  • Scatter Plots (scatter_plot): Correlation plots with trend lines
  • Histograms (histogram): Distribution analysis with density overlay
  • Box Plots (boxplot): Quartile analysis with outlier detection
  • Time Series Plots (time_series_plot): Temporal data visualization
  • Correlation Heatmaps (correlation_heatmap): Matrix visualization
  • Regression Diagnostics (regression_plot): Model validation plots

File Operations โœ…

  • CSV Import (read_csv): Flexible data loading with parsing options and URL support
  • Excel Import (read_excel): Read .xlsx/.xls files with sheet and range selection
  • JSON Import (read_json): Convert JSON to tabular format with flattening
  • CSV Export (write_csv): Data export with formatting control
  • Dataset Information (data_info): Comprehensive data structure analysis
  • Data Filtering (filter_data): Complex conditional data selection

Natural Language & User Experience โœ…

  • Formula Builder (build_formula): Convert natural language to R formulas
  • Formula Validation (validate_formula): Check formula syntax against data
  • Error Recovery (suggest_fix): Intelligent error diagnosis and solutions
  • Data Validation (validate_data): Pre-analysis data quality checks
  • Example Datasets (load_example): Built-in datasets for learning and testing

Production Ready

  • MCP Protocol: Full JSON-RPC 2.0 compliance
  • Multiple Transports: stdio transport (primary) and HTTP transport with SSE
  • Error Handling: Comprehensive error reporting and validation
  • Security: Safe R execution with controlled environment

๐ŸŽฏ Real-World Usage

RMCP works through natural conversation with AI assistants. Here's how users actually interact with it:

Business Analysis

You ask Claude:

"I have sales data and marketing spend. Can you analyze the ROI and tell me how effective our marketing is?"

Claude responds:

"Based on the regression analysis, every $1 spent on marketing generates $4.70 in additional sales. The relationship is highly significant (p < 0.001) and explains 97.9% of sales variation."

Economic Research

You ask Claude:

"I have GDP growth and unemployment data for several countries. Can you test if this supports Okun's Law?"

Claude responds:

"The correlation analysis shows strong support for Okun's Law with r = -0.944. Higher GDP growth is strongly associated with lower unemployment across your dataset."

Data Science

You ask Claude:

"Help me build a customer churn prediction model using tenure and monthly charges."

Claude responds:

"I've built a logistic regression model that achieves 100% accuracy. Longer tenure significantly reduces churn risk (11.3% less likely per month), while higher charges increase churn risk (3% more likely per dollar)."

Natural Language Formula Building

You ask Claude:

"I want to predict sales from marketing spend and customer satisfaction scores. Can you help me build the right formula?"

Claude responds:

"I'll help you build that formula. Using the natural language formula builder, I converted your description to: sales ~ marketing_spend + customer_satisfaction. This creates a multiple regression model with sales as the outcome and both marketing spend and satisfaction as predictors. I can also suggest alternatives like interaction terms if needed."

Intelligent Error Recovery

You ask Claude:

"I'm getting an error about missing R packages when I try to run my analysis."

Claude responds:

"I can help diagnose that error. The issue is that the 'forecast' package is missing, which is required for Time Series Forecasting. You can fix this by running: install.packages(\"forecast\"). I can also check all your R packages and install everything RMCP needs at once."

๐Ÿ“ˆ Visual Analytics - See Plots Directly in Claude

New in v0.3.7: All visualization tools now display plots directly in your Claude conversation - no more file management!

You ask Claude:

"Create a correlation heatmap of my sales, marketing, and customer satisfaction data"

Claude responds with:

  • ๐Ÿ“Š Interactive heatmap displayed inline showing correlation strengths with color coding
  • ๐Ÿ“‹ Statistical analysis: correlation matrix with exact values and significance tests
  • ๐Ÿ’ก Insights: "Strong positive correlation (r=0.89) between marketing and sales suggests effective targeting"

Supported Visual Tools:

  • ๐Ÿ”ฅ Correlation Heatmaps: Color-coded correlation matrices with statistical values
  • ๐Ÿ“ˆ Scatter Plots: Trend lines, grouping, and regression analysis
  • ๐Ÿ“Š Histograms: Distribution analysis with density overlays
  • ๐Ÿ“ฆ Box Plots: Outlier detection and quartile analysis
  • โฑ๏ธ Time Series: Trend analysis with forecasting
  • ๐Ÿ” Regression Diagnostics: 4-panel diagnostic plots for model validation

Key Benefits:

  • โœ… Immediate visual feedback - see results instantly
  • โœ… No file management - plots appear in conversation
  • โœ… Combined analysis - statistics + visualizations together
  • โœ… Professional quality - publication-ready plots with ggplot2
  • โœ… Optional saving - can still export to files when needed

๐Ÿ“Š Validated User Scenarios

RMCP has been tested with real-world scenarios achieving 100% success rate:

  • โœ… Business Analysts: Sales forecasting with 97.9% Rยฒ, $4.70 ROI per marketing dollar
  • โœ… Economists: Macroeconomic analysis showing Okun's Law (r=-0.944)
  • โœ… Data Scientists: Customer churn prediction with 100% accuracy
  • โœ… Researchers: Treatment effect analysis with significant results (p<0.001)

๐Ÿ”ง Installation & Setup

Prerequisites

  • Python 3.9+
  • R 4.0+ with required packages (see below)

R Package Requirements

RMCP requires the following R packages. Install all at once with:

# Install all required packages (recommended)
install.packages(c(
  # Core statistical packages
  "jsonlite", "plm", "lmtest", "sandwich", "AER", "dplyr",
  # Time series analysis
  "forecast", "vars", "urca", "tseries", 
  # Statistical testing
  "nortest", "car",
  # Machine learning
  "rpart", "randomForest",
  # Data visualization  
  "ggplot2", "gridExtra", "tidyr", "rlang"
), repos = "https://cran.rstudio.com/")

Minimum Core Packages (basic functionality only):

install.packages(c("jsonlite", "plm", "lmtest", "sandwich", "AER"))

Feature-Specific Packages:

  • Time Series Analysis: forecast, vars, urca, tseries
  • Machine Learning: rpart, randomForest
  • Data Visualization: ggplot2, gridExtra, tidyr, rlang
  • Statistical Testing: nortest, car
  • Data Manipulation: dplyr

๐Ÿ’ก Tip: Install all packages first to avoid errors. Missing packages will cause specific tools to fail with clear error messages.

Install via pip

Requirements: Python 3.10+ (for modern type hints and performance)

pip install rmcp

Development Installation

git clone https://github.com/finite-sample/rmcp.git
cd rmcp
pip install -e ".[dev]"

With Claude Desktop

Add to your Claude Desktop MCP configuration:

{
  "mcpServers": {
    "rmcp": {
      "command": "rmcp",
      "args": ["start"],
      "env": {}
    }
  }
}

๐Ÿ“š Usage Examples

Quick Start with Claude Desktop

Once RMCP is configured in Claude Desktop, you can immediately start asking for statistical analysis:

Business Analysis Example:

You: "I have sales data: month 1-12, sales [120, 135, 148, 165, 178, 185, 192, 210, 225, 240, 255, 270]. 
Can you analyze the trend and predict next 3 months?"

Claude: "I'll analyze your sales trend using linear regression and forecasting."
[Claude automatically calls linear_model and then arima_model]

Claude: "Analysis shows strong growth trend with 12.5 units/month increase (Rยฒ=0.99). 
Predicted sales for next 3 months: 285, 298, 311 units with 95% confidence intervals."

Scientific Research Example:

You: "I want to test if a new drug is effective. Control group: [2.1, 2.3, 2.0, 2.4, 2.2]. 
Treatment group: [3.1, 3.4, 3.0, 3.3, 3.2]. Please run appropriate statistical test."

Claude: "I'll perform a two-sample t-test to compare the groups."
[Claude calls t_test tool]

Claude: "Results: Treatment group mean (3.2) significantly higher than control (2.2), 
p-value = 0.0001, 95% CI: [0.8, 1.2]. Strong evidence of treatment effect."

Command Line Interface

# Start MCP server (stdio transport for Claude Desktop)
rmcp start

# Start HTTP server (for web applications and custom clients)
rmcp serve-http --port 8080

# Check version and available tools
rmcp --version

# Development server with debug logging
rmcp start --log-level DEBUG

Direct Tool Usage (Advanced)

For developers building MCP clients or testing tools directly:

import asyncio
from rmcp.core.server import create_server
from rmcp.tools.regression import linear_model

# Create server and context
server = create_server()
context = server.create_context("test-1", "tools/call")

# Call tool directly
result = await linear_model(context, {
    "data": {
        "sales": [100, 120, 140, 160, 180],
        "advertising": [10, 15, 20, 25, 30]
    },
    "formula": "sales ~ advertising"
})

print(f"Advertising effectiveness: ${result['coefficients']['advertising']:.2f} per dollar")
print(f"Model explains {result['r_squared']:.1%} of variance")

HTTP Transport Usage

RMCP now supports HTTP transport for web applications and custom integrations:

Starting the HTTP Server

# Install with HTTP support
pip install rmcp[http]

# Start HTTP server
rmcp serve-http --host 0.0.0.0 --port 8080

# Server provides these endpoints:
# โ€ข POST http://localhost:8080/ (JSON-RPC requests)
# โ€ข GET  http://localhost:8080/sse (Server-Sent Events)
# โ€ข GET  http://localhost:8080/health (Health check)

Making HTTP Requests

# Initialize MCP connection
curl -X POST http://localhost:8080/ \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
      "protocolVersion": "2025-06-18",
      "capabilities": {},
      "clientInfo": {"name": "my-app", "version": "1.0.0"}
    }
  }'

# Call a statistical tool
curl -X POST http://localhost:8080/ \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {
      "name": "summary_stats",
      "arguments": {
        "data": {
          "sales": [100, 150, 200, 175, 225],
          "month": [1, 2, 3, 4, 5]
        }
      }
    }
  }'

Server-Sent Events for Notifications

# Connect to SSE stream for real-time updates
curl -N http://localhost:8080/sse

# Health check
curl http://localhost:8080/health
# Returns: {"status": "healthy", "transport": "HTTP"}

MCP Protocol Example

Testing with raw JSON-RPC messages:

{
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "correlation_analysis",
        "arguments": {
            "data": {
                "sales": [100, 150, 200, 250, 300],
                "marketing": [10, 20, 30, 40, 50],
                "satisfaction": [7.5, 8.0, 8.5, 9.0, 9.5]
            },
            "method": "pearson"
        }
    }
}

Response:

{
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "content": [{
            "type": "text",
            "text": {
                "correlation_matrix": {
                    "sales": {"marketing": 1.0, "satisfaction": 0.996},
                    "marketing": {"sales": 1.0, "satisfaction": 0.996},
                    "satisfaction": {"sales": 0.996, "marketing": 0.996}
                },
                "significance_tests": {
                    "sales_marketing": 0.0,
                    "sales_satisfaction": 0.000056,
                    "marketing_satisfaction": 0.000056
                }
            }
        }]
    }
}
## ๐Ÿ”ฌ Advanced Usage Scenarios

### Time Series Forecasting

**Business Scenario: Sales Forecasting**

You: "I have monthly sales data for 2 years: [150, 162, 178, 195, 210, 225, 240, 255, 270, 285, 300, 315, 330, 345, 360, 375, 390, 405, 420, 435, 450, 465, 480, 495]. Can you forecast next 6 months?"

Claude: "I'll analyze the time series and create forecasts using ARIMA modeling." [Claude calls decompose_timeseries and arima_model]

Claude: "Strong upward trend detected. ARIMA(1,1,1) model selected. Forecasted sales: 510, 525, 540, 555, 570, 585 with confidence intervals."


### Panel Data Analysis  

**Economic Research: Income and Education**

You: "I have panel data on income and education across countries and years. Can you analyze if education affects income while controlling for country differences?"

Claude: "I'll run a panel regression with fixed effects to control for country characteristics." [Claude calls panel_regression]

Claude: "Results show each additional year of education increases income by $2,450 (p<0.001), controlling for unobserved country characteristics. Rยฒ=0.87 within countries."


### Machine Learning Workflow

**Customer Segmentation Analysis**

You: "Help me segment customers based on: age [25,35,45,30,40], spending [1200,2500,800,1800,3200], frequency [12,25,8,15,30]. How many segments should I use?"

Claude: "I'll perform k-means clustering to identify customer segments." [Claude calls kmeans_clustering with different k values]

Claude: "Optimal segmentation shows 3 distinct customer groups:

  • Young High-Value (age30, spend$2650, freq~27)
  • Mature Moderate (age40, spend$1500, freq~13)
  • Budget Conscious (age45, spend$800, freq~8)"

### Statistical Testing Workflow

**A/B Test Analysis**

You: "I ran an A/B test. Control conversion: [0.12, 0.11, 0.13, 0.10, 0.14]. Treatment conversion: [0.18, 0.19, 0.17, 0.20, 0.16]. Is the difference significant?"

Claude: "I'll run appropriate tests to evaluate your A/B test results." [Claude calls t_test and normality_test]

Claude: "Treatment significantly outperforms control:

  • Control: 12.0% conversion rate
  • Treatment: 18.0% conversion rate
  • Lift: +50% improvement (p=0.003, 95% CI: [2.8%, 9.2%])
  • Power analysis: 89% power to detect this effect size"

## ๐Ÿ“‹ Complete Tool Reference

### Regression & Modeling
| Tool | Purpose | Key Outputs |
|------|---------|-------------|
| `linear_model` | OLS regression | Rยฒ, coefficients, p-values, diagnostics |
| `logistic_regression` | Binary/categorical outcomes | Odds ratios, accuracy, ROC |
| `panel_regression` | Longitudinal data | Fixed/random effects, within Rยฒ |
| `instrumental_variables` | Causal inference | 2SLS estimates, endogeneity tests |

### Time Series Analysis
| Tool | Purpose | Key Outputs |
|------|---------|-------------|
| `arima_model` | Forecasting | Predictions, confidence intervals, AIC |
| `decompose_timeseries` | Trend/seasonal analysis | Components, seasonality strength |
| `stationarity_test` | Unit root testing | ADF, KPSS, PP test statistics |
| `var_model` | Multivariate series | IRF, FEVD, Granger causality |

### Statistical Testing  
| Tool | Purpose | Key Outputs |
|------|---------|-------------|
| `t_test` | Mean comparisons | t-statistic, p-value, confidence intervals |
| `anova` | Group differences | F-statistic, effect sizes, post-hoc |
| `chi_square_test` | Independence/goodness-of-fit | ฯ‡ยฒ statistic, Cramรฉr's V |
| `normality_test` | Distribution testing | Shapiro-Wilk, Jarque-Bera p-values |

### Data Analysis
| Tool | Purpose | Key Outputs |
|------|---------|-------------|
| `correlation_analysis` | Association strength | Correlation matrix, significance tests |
| `summary_stats` | Descriptive statistics | Mean, median, SD, quartiles |
| `outlier_detection` | Anomaly identification | Outlier indices, methods comparison |
| `frequency_table` | Categorical analysis | Counts, percentages, sorted tables |

## ๐Ÿงช Testing & Validation

RMCP includes comprehensive testing with realistic scenarios:

```bash
# Run all user scenarios (should show 100% pass rate)
python tests/realistic_scenarios.py

# Run development test script
bash src/rmcp/scripts/test.sh

Current Test Coverage:

  • โœ… MCP Interface: 100% success rate (5/5 tests) - Validates actual Claude Desktop integration
  • โœ… User Scenarios: 100% success rate (4/4 tests) - Validates real-world usage patterns
  • โœ… Conversational Examples: All documented examples tested and verified working

๐Ÿ—๏ธ Architecture

RMCP is built with production best practices:

  • Clean Architecture: Modular design with clear separation of concerns
  • MCP Compliance: Full Model Context Protocol specification support
  • Multiple Transports: stdio (primary) and HTTP with Server-Sent Events
  • R Integration: Safe subprocess execution with JSON serialization
  • Error Handling: Comprehensive error reporting and recovery
  • Security: Controlled R execution environment
src/rmcp/
โ”œโ”€โ”€ core/           # MCP server core
โ”œโ”€โ”€ tools/          # Statistical analysis tools  
โ”œโ”€โ”€ transport/      # Communication layers
โ”œโ”€โ”€ registries/     # Tool and resource management
โ””โ”€โ”€ security/       # Safe execution environment

๐Ÿค Contributing

We welcome contributions! Please see our contributing guidelines.

Development Setup

git clone https://github.com/finite-sample/rmcp.git
cd rmcp
pip install -e ".[dev]"
pre-commit install

Running Tests

RMCP includes comprehensive organized testing:

# Run all tests in logical order
python run_tests.py

# Run specific test categories
python tests/unit/test_new_tools.py                    # Unit tests
python tests/integration/test_mcp_interface.py         # Integration tests  
python tests/e2e/test_claude_desktop_scenarios.py      # End-to-end tests

# Run pytest (if available)
pytest tests/unit/ -v                                  # Unit tests only

๐Ÿ“„ License

MIT License - see LICENSE file for details.

๐Ÿ› ๏ธ Troubleshooting

Quick Fixes for Common Issues

R not found:

# Check R installation
R --version

# Install R if missing (macOS)
brew install r

# Install R (Ubuntu)
sudo apt-get install r-base

Missing R packages:

First, check which packages are missing:

rmcp check-r-packages

Then install missing packages in R:

# Install all RMCP packages (recommended)
install.packages(c(
  "jsonlite", "plm", "lmtest", "sandwich", "AER", "dplyr",
  "forecast", "vars", "urca", "tseries", "nortest", "car",
  "rpart", "randomForest", "ggplot2", "gridExtra", "tidyr", "rlang"
), repos = "https://cran.rstudio.com/")

# Or install just core packages (limited functionality)
install.packages(c("jsonlite", "plm", "lmtest", "sandwich", "AER"))

Package installation fails:

  • On Ubuntu/Debian: sudo apt-get install r-base-dev libcurl4-openssl-dev libssl-dev libxml2-dev
  • On macOS with Homebrew: brew install r
  • On Windows: Download from CRAN

MCP connection issues:

# Test server directly
echo '{"jsonrpc":"2.0","id":1,"method":"tools/list","params":{}}' | rmcp start

# Check Claude Desktop MCP configuration
# Ensure rmcp is in PATH: which rmcp

For detailed troubleshooting: See docs/troubleshooting.md

๐Ÿ™‹ Support

๐ŸŽ‰ Acknowledgments

RMCP builds on the excellent work of:


Ready to analyze data like never before? Install RMCP and start running sophisticated statistical analyses through AI assistants today! ๐Ÿš€

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

rmcp-0.3.10.tar.gz (106.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

rmcp-0.3.10-py3-none-any.whl (116.6 kB view details)

Uploaded Python 3

File details

Details for the file rmcp-0.3.10.tar.gz.

File metadata

  • Download URL: rmcp-0.3.10.tar.gz
  • Upload date:
  • Size: 106.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for rmcp-0.3.10.tar.gz
Algorithm Hash digest
SHA256 d09e4c231d6a7caa1386b6f930b7813909f4ff9919068d38e8bb8e3b48bc32b2
MD5 ff566465c60e3d0ac37219eede94e591
BLAKE2b-256 118604d1f26a6e471ce4db4cd6ef06a0369a54f1598e800e2f19004d4650d324

See more details on using hashes here.

Provenance

The following attestation bundles were made for rmcp-0.3.10.tar.gz:

Publisher: python-publish.yml on finite-sample/rmcp

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file rmcp-0.3.10-py3-none-any.whl.

File metadata

  • Download URL: rmcp-0.3.10-py3-none-any.whl
  • Upload date:
  • Size: 116.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for rmcp-0.3.10-py3-none-any.whl
Algorithm Hash digest
SHA256 45508b8625867fcaca342d4400f302fc9fb3fd2892be2672e2041972851625ac
MD5 7553c073d9ff91e4619b78f578cd09c3
BLAKE2b-256 04aec0b321cd2c273b79a06d97e139b8c30333802716063f967e92c626cb14de

See more details on using hashes here.

Provenance

The following attestation bundles were made for rmcp-0.3.10-py3-none-any.whl:

Publisher: python-publish.yml on finite-sample/rmcp

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page