
MCP server for data engineering and business intelligence operations


Engineer Your Data

A Model Context Protocol (MCP) server designed specifically for data engineers and business intelligence professionals. It adds AI-assisted data engineering capabilities to your data pipelines and BI workflows, and it runs locally: the core tools need no internet connection, and only the API integration tools reach out to the network.

Why Engineer Your Data?

Built from the ground up for data engineering teams and BI analysts who need:

  • Pipeline Development - Build and test ETL/ELT transformations
  • Data Quality Assurance - Profile and validate data sources
  • Business Intelligence - Create analytics models and dashboard visualizations
  • Local Control - Keep sensitive data on-premises with no cloud dependencies

🚀 Quick Start

New to Engineer Your Data? Start with these 5 essential operations:

  1. Check Data Quality: "Generate a data quality report for my sales.csv file"
  2. Find Issues: "Check for null values in the customer_data.csv"
  3. Transform Data: "Filter the orders.csv for rows where status is 'completed'"
  4. Visualize: "Create a bar chart showing sales by region from revenue.csv"
  5. Summarize: "Give me a statistical summary of the dataset"

These five operations cover the most common day-to-day data engineering tasks. Explore the full capabilities below!

Core Capabilities

🚀 File Operations:

  • read_file - Read data files from local filesystem
  • write_file - Write processed data to files
  • list_files - Browse and discover data files
  • file_info - Get metadata about data files

📊 Data Validation & Quality:

  • validate_schema - Validate data against expected schemas
  • check_nulls - Analyze null values and missing data patterns
  • data_quality_report - Comprehensive data quality assessment
  • detect_duplicates - Find duplicate records with configurable matching
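
To make the null and duplicate checks concrete, here is a minimal sketch that uses only the Python standard library. It is not the server's implementation: the function names `count_nulls` and `find_duplicates`, the set of values treated as missing, and the sample rows are all assumptions made for illustration.

```python
from collections import Counter

# Values treated as "missing" in this sketch (an assumption, not the server's rule).
NULL_MARKERS = {None, "", "NA", "null"}


def count_nulls(rows):
    """Return {column: number of missing values} for a list of dict rows."""
    counts = Counter()
    for row in rows:
        for column, value in row.items():
            if value in NULL_MARKERS:
                counts[column] += 1
    return dict(counts)


def find_duplicates(rows, key_columns):
    """Return rows whose key_columns values were already seen earlier."""
    seen = set()
    duplicates = []
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        if key in seen:
            duplicates.append(row)
        else:
            seen.add(key)
    return duplicates


# Hypothetical sample data standing in for a parsed customer_data.csv.
rows = [
    {"customer_id": "1", "email": "a@example.com"},
    {"customer_id": "2", "email": ""},
    {"customer_id": "1", "email": "a@example.com"},
]
null_report = count_nulls(rows)
duplicate_rows = find_duplicates(rows, ["customer_id"])
```

Passing the key columns explicitly mirrors the "configurable matching" idea: the same rows can count as duplicates on `customer_id` alone but not on a wider key.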

🔄 Data Transformation:

  • filter_data - Filter datasets based on conditions
  • aggregate_data - Group and aggregate data with statistical functions
  • join_data - Join multiple datasets with flexible join types
  • pivot_data - Reshape data from long to wide format
  • clean_data - Clean and standardize data values
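
The four core transformations above can be sketched in plain Python. This is only an illustration of what filter, aggregate, join, and pivot mean; the server's actual tools and their parameters may differ, and the `orders` and `customers` records are made-up sample data.

```python
from collections import defaultdict

# Hypothetical sample data.
orders = [
    {"order_id": 1, "customer_id": "c1", "status": "completed", "amount": 120.0},
    {"order_id": 2, "customer_id": "c2", "status": "pending", "amount": 80.0},
    {"order_id": 3, "customer_id": "c1", "status": "completed", "amount": 50.0},
]
customers = [
    {"customer_id": "c1", "region": "US"},
    {"customer_id": "c2", "region": "EU"},
]

# Filter: keep only completed orders.
completed = [o for o in orders if o["status"] == "completed"]

# Aggregate: total amount per customer.
totals = defaultdict(float)
for o in completed:
    totals[o["customer_id"]] += o["amount"]

# Join (inner): attach each customer's region via a lookup on customer_id.
region_by_customer = {c["customer_id"]: c["region"] for c in customers}
joined = [
    {**o, "region": region_by_customer[o["customer_id"]]}
    for o in completed
    if o["customer_id"] in region_by_customer
]

# Pivot (long to wide): one entry per customer, one column per status.
wide = defaultdict(dict)
for o in orders:
    by_status = wide[o["customer_id"]]
    by_status[o["status"]] = by_status.get(o["status"], 0.0) + o["amount"]
```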

📈 Visualization & Analysis:

  • create_chart - Generate bar, pie, line, scatter, histogram, box, and heatmap charts
  • data_summary - Create comprehensive dataset summaries with statistics
  • export_visualization - Export charts and data to JSON, CSV, HTML, Markdown
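
As a rough picture of what a statistical summary and a Markdown export involve, the sketch below computes a few descriptive statistics with the standard library and renders them as a Markdown table. The helper names `summarize` and `to_markdown` and the `revenue` values are hypothetical; the output of the real `data_summary` and `export_visualization` tools may look different.

```python
import statistics


def summarize(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }


def to_markdown(summary):
    """Render a summary dict as a two-column Markdown table."""
    lines = ["| Statistic | Value |", "| --- | --- |"]
    lines += [f"| {name} | {value} |" for name, value in summary.items()]
    return "\n".join(lines)


revenue = [100, 250, 175, 300]  # hypothetical sample values
summary = summarize(revenue)
report = to_markdown(summary)
```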

🌐 API Integration:

  • fetch_api_data - Retrieve data from REST APIs
  • monitor_api - Monitor API endpoints for health and performance
  • batch_api_calls - Execute multiple API calls efficiently
  • api_auth - Manage API authentication
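
One common way to run many API calls efficiently is to issue them concurrently while capping how many are in flight at once. The sketch below shows that pattern with `asyncio`. It is an assumption about the general technique, not a description of how `batch_api_calls` works internally. `fake_fetch` is a stand-in that simulates a request instead of touching the network, and the URLs are placeholders.

```python
import asyncio


async def fake_fetch(url):
    """Stand-in for an HTTP request; a real client call would go here."""
    await asyncio.sleep(0.01)  # simulate network latency
    return {"url": url, "status": 200}


async def batch_fetch(urls, max_concurrency=2):
    """Fetch all urls concurrently, with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(url):
        async with semaphore:
            return await fake_fetch(url)

    # gather returns results in the same order as the input urls.
    return await asyncio.gather(*(bounded(u) for u in urls))


urls = [f"https://api.example.com/items/{i}" for i in range(5)]
results = asyncio.run(batch_fetch(urls))
```

The semaphore keeps the batch from overwhelming the remote API, which matters more than raw speed when the endpoint enforces rate limits.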

🔧 Utilities:

  • chain_tools - Execute multiple tools in sequence
  • analyze_schema - Analyze and understand data schemas
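
Conceptually, chaining tools means feeding the output of one step into the next. The sketch below illustrates that idea with ordinary functions. The `chain` helper and the two step functions are hypothetical and say nothing about the real `chain_tools` parameters.

```python
def chain(steps, data):
    """Run each (name, function) step in order, feeding results forward."""
    log = []
    for name, step in steps:
        data = step(data)
        log.append(name)
    return data, log


def drop_empty(rows):
    """Remove rows with a missing amount."""
    return [r for r in rows if r["amount"] is not None]


def keep_large(rows):
    """Keep rows with an amount of at least 100."""
    return [r for r in rows if r["amount"] >= 100]


rows = [{"amount": 150}, {"amount": None}, {"amount": 40}]
result, executed = chain(
    [("drop_empty", drop_empty), ("keep_large", keep_large)], rows
)
```

Order matters here: running `keep_large` first would fail on the row whose amount is `None`.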

Quick Start for Data Teams

Installation

# Option 1: Install from PyPI (recommended)
pip install engineer-your-data

# Option 2: Install from source
git clone https://github.com/eghuzefa/engineer-your-data-mcp.git
cd engineer-your-data-mcp
pip install -e .

Configure for Your Data Environment

For PyPI Installation: Add to your Claude Desktop configuration (claude_desktop_config.json):

{
  "mcpServers": {
    "engineer-your-data": {
      "command": "python",
      "args": ["-m", "src.server"],
      "env": {
        "WORKSPACE_PATH": "/path/to/your/data/workspace"
      }
    }
  }
}

For Source Installation: Add to your Claude Desktop configuration (claude_desktop_config.json):

{
  "mcpServers": {
    "engineer-your-data": {
      "command": "python",
      "args": ["/path/to/engineer-your-data-mcp/src/server.py"],
      "env": {
        "WORKSPACE_PATH": "/path/to/your/data/workspace"
      }
    }
  }
}
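
If your claude_desktop_config.json already lists other servers, the new entry has to be merged in rather than pasted over the file. The following sketch shows one way to do that on the parsed JSON. The `add_server` helper and the `other-server` entry are hypothetical, and reading and writing the actual file is left out because its location depends on your platform.

```python
import json


def add_server(config, name, command, args, workspace):
    """Return a copy of config with one MCP server entry added or replaced."""
    servers = dict(config.get("mcpServers", {}))
    servers[name] = {
        "command": command,
        "args": list(args),
        "env": {"WORKSPACE_PATH": workspace},
    }
    return {**config, "mcpServers": servers}


# Hypothetical existing configuration with one unrelated server.
existing = {"mcpServers": {"other-server": {"command": "node", "args": ["index.js"]}}}

updated = add_server(
    existing,
    name="engineer-your-data",
    command="python",
    args=["-m", "src.server"],
    workspace="/path/to/your/data/workspace",
)
config_text = json.dumps(updated, indent=2)
```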

Data Engineering Examples

Data Quality Analysis:

"Check the customer data for null values and duplicates"
"Generate a comprehensive data quality report for the sales dataset"
"Validate this CSV file against our customer schema"

Data Transformation:

"Filter the orders data for customers in the US region"
"Aggregate sales data by month and calculate total revenue"
"Join customer data with order data on customer_id"
"Pivot the sales data to show products as columns"

Visualization & Reporting:

"Create a bar chart showing revenue by department"
"Generate a summary of the dataset with key statistics"
"Export the sales analysis as an HTML report"

API Data Integration:

"Fetch customer data from the CRM API"
"Monitor the data pipeline API for health status"
"Authenticate with the analytics API using OAuth"

Architecture for Data Teams

Claude Desktop → MCP Protocol → Engineer Your Data → Local Python Environment
                                        ↓
                    pandas + numpy + requests + matplotlib
                                        ↓
                         Local Files + APIs + Data Sources
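
At the "MCP Protocol" step in the diagram, the client sends JSON-RPC 2.0 messages, and a tool invocation uses the `tools/call` method. The sketch below builds such a request. The argument name `file_path` is a guess for illustration only; the real argument names come from each tool's published input schema.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 request that asks an MCP server to run a tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# "file_path" is a hypothetical argument name; check the tool's schema for the real one.
request = build_tool_call(1, "check_nulls", {"file_path": "customer_data.csv"})
wire_message = json.dumps(request)
```

In normal use Claude Desktop builds these messages for you; the sketch is only meant to show what travels between the client and the server.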

Testing & Quality

  • 161 comprehensive tests with 100% pass rate
  • Async/await support for high-performance operations
  • Error handling with detailed logging and debugging
  • Type safety with proper schema validation

# Run all tests
python -m pytest

# Run with coverage
python -m pytest --cov=src

# Run specific tool tests
python -m pytest tests/tools/test_visualization.py

Available Tools (20 Total)

File Operations (4 tools)

Tool                  Description
read_file             Read and parse data files (CSV, JSON, etc.)
write_file            Write data to files with format options
list_files            Directory browsing and file discovery
file_info             File metadata and basic statistics

Data Validation (4 tools)

Tool                  Description
validate_schema       Schema validation with custom rules
check_nulls           Null value analysis and patterns
data_quality_report   Comprehensive quality assessment
detect_duplicates     Duplicate detection with flexible matching

Data Transformation (5 tools)

Tool                  Description
filter_data           Advanced filtering with conditions
aggregate_data        Grouping and statistical aggregation
join_data             Multi-dataset joins (inner, outer, left, right)
pivot_data            Data reshaping and pivoting
clean_data            Data cleaning and standardization

Visualization (3 tools)

Tool                  Description
create_chart          7 chart types with customization
data_summary          Statistical summaries and insights
export_visualization  Multi-format export capabilities

API Integration (4 tools)

Tool                  Description
fetch_api_data        REST API data retrieval
monitor_api           API health monitoring
batch_api_calls       Efficient bulk API operations
api_auth              Authentication management

Data Engineering Best Practices

  • Sandboxed Execution - Safe environment for testing transformations
  • Local Data Control - Keep sensitive data on your infrastructure
  • Comprehensive Testing - All tools thoroughly tested and validated
  • Enterprise Security - No external API calls for core functionality
  • Performance Optimized - Async operations and efficient data processing
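
The README does not say how the workspace sandbox is enforced. One common approach, shown here purely as an assumption, is to resolve every requested path against the configured WORKSPACE_PATH and reject anything that lands outside it. The helper name `resolve_in_workspace` and the example paths are hypothetical.

```python
from pathlib import Path


def resolve_in_workspace(workspace, user_path):
    """Resolve user_path under workspace, rejecting paths that escape it."""
    root = Path(workspace).resolve()
    candidate = (root / user_path).resolve()
    # Accept the workspace itself or anything nested below it.
    if candidate != root and root not in candidate.parents:
        raise ValueError(f"{user_path!r} is outside the workspace")
    return candidate


workspace = "/tmp/data-workspace"  # hypothetical workspace location
inside = resolve_in_workspace(workspace, "sales/2024.csv")

try:
    resolve_in_workspace(workspace, "../secrets.txt")
    escaped = True
except ValueError:
    escaped = False
```

Resolving before comparing is the important part: it collapses `..` segments, so a path that merely starts with the workspace prefix cannot climb back out.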

Integration with Your Stack

Works seamlessly alongside:

  • dbt - Use for complex transformation logic development
  • Airflow/Prefect - Incorporate into existing workflow orchestration
  • Jupyter/Notebooks - Prototype and iterate on data transformations
  • BI Tools - Generate data and visualizations for Tableau, Power BI, etc.
  • APIs - Integrate with REST APIs and microservices

Contributing

Data engineers and BI professionals welcome! Please read our contributing guidelines and submit PRs for new data connectors, transformations, or BI features.

MCP Registry

This server is available in the official Model Context Protocol Registry.

License

MIT License - see LICENSE file for details.
