MIMIC-IV + MCP + Models: Local MIMIC-IV querying with LLMs via Model Context Protocol

M4: Many Medical Datasets ↔ MCP ↔ Models 🏥🤖

Query tabular PhysioNet medical data using natural language through MCP clients

Transform medical data analysis with AI! Ask questions about MIMIC-IV and other PhysioNet datasets in plain English and get answers without writing SQL. Choose between local data (free) and the full cloud dataset (BigQuery).

🫶 M4 is built upon M3.
Please acknowledge the original authors and cite their work.

💡 How It Works

M4 acts as a bridge between your AI Client (like Claude Desktop, Cursor, or LibreChat) and your medical data.

  1. You ask a question in your chat interface: "How many patients in the ICU have high blood pressure?"
  2. M4 securely translates this into a database query.
  3. M4 runs the query on your local or cloud data.
  4. The LLM explains the results to you in plain English.

No SQL knowledge required.
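
The four steps above can be illustrated with a toy stand-in. The sketch below uses Python's built-in sqlite3 in place of DuckDB or BigQuery, a made-up icustays table with invented rows, and a hard-coded SQL string in place of the query an LLM would generate. None of these names are taken from M4's own code.

```python
import sqlite3

# Toy in-memory database standing in for the real backend (assumption:
# M4 itself uses DuckDB or BigQuery, not SQLite).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE icustays (subject_id INTEGER, los REAL)")
conn.executemany(
    "INSERT INTO icustays VALUES (?, ?)",
    [(1, 2.5), (2, 0.8), (3, 4.1)],  # invented rows, not real patient data
)

# Step 1: the user's question, as typed into the chat client.
question = "How many ICU stays lasted longer than one day?"

# Step 2: the SQL an LLM might produce for it (hard-coded here).
sql = "SELECT COUNT(*) FROM icustays WHERE los > 1.0"

# Step 3: run the query against the data.
(count,) = conn.execute(sql).fetchone()

# Step 4: the client turns the raw result into a plain-English answer.
answer = f"{count} ICU stays lasted longer than one day."
print(answer)
```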

Features

  • 🔍 Natural Language Queries: Ask questions about your medical data in plain English
  • 🏠 Modular Datasets: Support for any tabular PhysioNet dataset (MIMIC-IV, etc.)
  • 📂 Local DuckDB + Parquet: Fast local queries using Parquet files with DuckDB views
  • ☁️ BigQuery Support: Access full MIMIC-IV dataset on Google Cloud
  • 🔒 Enterprise Security: OAuth2 authentication with JWT tokens and rate limiting
  • 🛡️ SQL Injection Protection: Read-only queries with comprehensive validation
  • 🧩 Extensible Architecture: Easily add new custom datasets via configuration or CLI
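
To make the "read-only queries" idea concrete, here is a minimal sketch of a validator that accepts a single SELECT or WITH statement and rejects everything else. The function name and the keyword deny-list are hypothetical and deliberately conservative; this is not M4's actual validation logic, which may differ.

```python
import re

# Keywords that modify data or schema. A hypothetical deny-list, not M4's own.
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|attach|copy|pragma)\b",
    re.IGNORECASE,
)


def is_read_only(sql: str) -> bool:
    """Return True if `sql` looks like a single SELECT/WITH statement."""
    stripped = sql.strip().rstrip(";").strip()
    if not stripped or ";" in stripped:
        return False  # empty input, or more than one statement
    first_word = stripped.split(None, 1)[0].lower()
    if first_word not in ("select", "with"):
        return False
    return FORBIDDEN.search(stripped) is None


print(is_read_only("SELECT * FROM patients"))
print(is_read_only("SELECT 1; DROP TABLE patients"))
```

A check like this errs on the side of rejecting input: a query with a semicolon inside a string literal, or a column literally named "update", would be refused even though it is harmless.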

🚀 Quick Start

Prerequisites

You need an MCP-compatible client to use M4. Popular options include Claude Desktop, Cursor, and LibreChat.

1. Install uv (Required)

We use uvx to run the MCP server efficiently.

macOS and Linux:

curl -LsSf https://astral.sh/uv/install.sh | sh

Windows (PowerShell):

powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

2. Choose Your Data Source

Select Option A (Local) or Option B (Cloud).

Option A: Local Dataset (Free & Fast)

Best for development, testing, and offline use.

  1. Create project directory:

    mkdir m4 && cd m4
    
  2. Initialize Dataset:

    We will use MIMIC-IV as an example.

    For Demo (Auto-download ~16MB):

    uv init && uv add m4-mcp
    uv run m4 init mimic-iv-demo
    

    For Full Data (Requires Manual Download): Download CSVs from PhysioNet first and place them in m4_data/raw_files.

    uv init && uv add m4-mcp
    uv run m4 init mimic-iv-full
    

    This can take 5-15 minutes depending on your machine.

  3. Configure Your Client:

    For Claude Desktop (Shortcut):

    uv run m4 config claude --quick
    

    For Other Clients (Cursor, LibreChat, etc.):

    uv run m4 config --quick
    

    This generates the configuration JSON you need to paste into your client's settings.
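
If your client already has other MCP servers configured, pasting by hand means merging the new entry into the existing mcpServers object rather than overwriting it. A minimal sketch of that merge, assuming the generated JSON has the {"mcpServers": {"m4": ...}} shape used in the Docker section of this README. The function name and the "other-server" entry are made up for illustration.

```python
import json


def merge_mcp_config(existing: dict, generated: dict) -> dict:
    """Return a copy of `existing` with the servers from `generated` added."""
    merged = dict(existing)
    servers = dict(existing.get("mcpServers", {}))
    # Entries from `generated` replace existing entries with the same name.
    servers.update(generated.get("mcpServers", {}))
    merged["mcpServers"] = servers
    return merged


# Settings the client already has (a made-up example).
existing = {"mcpServers": {"other-server": {"command": "other-cmd", "args": []}}}

# JSON of the kind M4 generates; this entry is copied from the Docker section.
generated = json.loads("""
{
  "mcpServers": {
    "m4": {
      "command": "docker",
      "args": ["exec", "-i", "m4-server", "python", "-m", "m4.mcp_server"]
    }
  }
}
""")

merged = merge_mcp_config(existing, generated)
print(json.dumps(merged, indent=2))
```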

Option B: BigQuery (Full Cloud Dataset)

Best for researchers with Google Cloud access.

  1. Authenticate with Google:

    gcloud auth application-default login
    
  2. Configure Client:

    uv run m4 config --backend bigquery --project_id BIGQUERY_PROJECT_ID
    

    This also generates the configuration JSON you need to paste into your client's settings.
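
Before restarting the client, it can help to confirm that the login step actually left credentials on disk. The sketch below checks the common default locations for Application Default Credentials: a key file named by the GOOGLE_APPLICATION_CREDENTIALS environment variable, then the file written by gcloud auth application-default login. The function name is hypothetical, and custom gcloud config directories are not covered.

```python
import os
from pathlib import Path


def find_adc_file(env=None, home=None):
    """Return the path of an Application Default Credentials file, or None."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)

    # An explicit key file set via the environment takes precedence.
    explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit and Path(explicit).is_file():
        return Path(explicit)

    # Otherwise look for the file written by
    # `gcloud auth application-default login`.
    if os.name == "nt" and env.get("APPDATA"):
        base = Path(env["APPDATA"]) / "gcloud"
    else:
        base = home / ".config" / "gcloud"
    candidate = base / "application_default_credentials.json"
    return candidate if candidate.is_file() else None


found = find_adc_file()
print(found if found else "No credentials file found in the default locations")
```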

3. Start Asking Questions!

Restart your MCP client and try:

  • "What tools do you have for MIMIC-IV data?"
  • "Show me patient demographics from the ICU"
  • "What is the race distribution in admissions?"

🔄 Managing Datasets

Switch between available datasets instantly:

# Switch to full dataset
m4 use mimic-iv-full

# Switch back to demo
m4 use mimic-iv-demo

# Check status
m4 status

Backend Comparison

Feature     | DuckDB (Demo) | DuckDB (Full)    | BigQuery (Full)
Cost        | Free          | Free             | BigQuery usage fees
Setup       | Zero config   | Manual Download  | GCP credentials required
Credentials | Not required  | PhysioNet        | PhysioNet
Data Size   | 100 patients  | 365k patients    | 365k patients
Speed       | Fast (local)  | Fast (local)     | Network latency
Use Case    | Learning      | Research (local) | Research, production

➕ Adding Custom Datasets

M4 is designed to be modular. You can add support for any tabular dataset on PhysioNet easily. Let's take eICU as an example:

JSON Definition Method

  1. Create a definition file: m4_data/datasets/eicu.json

    {
      "name": "eicu",
      "description": "eICU Collaborative Research Database",
      "file_listing_url": "https://physionet.org/files/eicu-crd/2.0/",
      "subdirectories_to_scan": [],
      "primary_verification_table": "eicu_crd_patient",
      "tags": ["clinical", "eicu"],
      "requires_authentication": true,
      "bigquery_project_id": "physionet-data",
      "bigquery_dataset_ids": ["eicu_crd"]
    }
    
  2. Initialize it:

    m4 init eicu --src /path/to/raw/csvs
    

    M4 will convert CSVs to Parquet and create DuckDB views automatically.
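
A typo in the definition file is easier to catch before running m4 init. The sketch below checks a definition against the keys and value types that appear in the eicu.json example above. Treating every one of those keys as required is an assumption, not M4's documented schema, and check_definition is a hypothetical helper.

```python
import json

# Keys and types taken from the eicu.json example. Treating all of them as
# required is an assumption, not M4's documented schema.
EXPECTED_KEYS = {
    "name": str,
    "description": str,
    "file_listing_url": str,
    "subdirectories_to_scan": list,
    "primary_verification_table": str,
    "tags": list,
    "requires_authentication": bool,
    "bigquery_project_id": str,
    "bigquery_dataset_ids": list,
}


def check_definition(text: str) -> list:
    """Return a list of problems found in a dataset definition (empty if none)."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top-level value should be a JSON object"]
    problems = []
    for key, expected_type in EXPECTED_KEYS.items():
        if key not in data:
            problems.append(f"missing key: {key}")
        elif not isinstance(data[key], expected_type):
            problems.append(f"{key} should be {expected_type.__name__}")
    return problems


# A deliberately incomplete definition, to show the kind of report produced.
for problem in check_definition('{"name": "eicu", "tags": "clinical"}'):
    print(problem)
```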


Alternative Installation Methods

Already have Docker or prefer pip?

🐳 Docker

DuckDB (Local):

git clone https://github.com/hannesill/m4.git && cd m4
docker build -t m4:lite --target lite .
docker run -d --name m4-server m4:lite tail -f /dev/null

BigQuery:

git clone https://github.com/hannesill/m4.git && cd m4
docker build -t m4:bigquery --target bigquery .
docker run -d --name m4-server \
  -e M4_BACKEND=bigquery \
  -e M4_PROJECT_ID=your-project-id \
  -v $HOME/.config/gcloud:/root/.config/gcloud:ro \
  m4:bigquery tail -f /dev/null

MCP config (same for both):

{
  "mcpServers": {
    "m4": {
      "command": "docker",
      "args": ["exec", "-i", "m4-server", "python", "-m", "m4.mcp_server"]
    }
  }
}

pip Install

pip install m4-mcp
m4 config --quick

Local Development

For contributors:

  1. Clone & Install (using uv):

    git clone https://github.com/hannesill/m4.git
    cd m4
    uv venv
    uv sync
    
  2. MCP Config:

    {
      "mcpServers": {
        "m4": {
          "command": "/absolute/path/to/m4/.venv/bin/python",
          "args": ["-m", "m4.mcp_server"],
          "cwd": "/absolute/path/to/m4",
          "env": { "M4_BACKEND": "duckdb" }
        }
      }
    }
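
Because the config needs absolute paths, it can be less error-prone to generate it from the checkout location than to type it. A minimal sketch that reproduces the JSON structure above for a given repository directory; local_dev_config is a hypothetical helper, and the Windows interpreter path follows the usual virtualenv layout rather than anything stated in this README.

```python
import json
import os
from pathlib import Path


def local_dev_config(repo_dir: str, backend: str = "duckdb") -> dict:
    """Build the MCP config shown above for a local checkout of M4."""
    repo = Path(repo_dir).resolve()
    # Virtual environments place the interpreter differently on Windows.
    if os.name == "nt":
        python = repo / ".venv" / "Scripts" / "python.exe"
    else:
        python = repo / ".venv" / "bin" / "python"
    return {
        "mcpServers": {
            "m4": {
                "command": str(python),
                "args": ["-m", "m4.mcp_server"],
                "cwd": str(repo),
                "env": {"M4_BACKEND": backend},
            }
        }
    }


# Run from inside the cloned repository.
print(json.dumps(local_dev_config("."), indent=2))
```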
    

🔧 Advanced Configuration

Interactive Config Generator:

m4 config

OAuth2 Authentication: For secure production deployments:

m4 config claude --enable-oauth2 \
  --oauth2-issuer https://your-auth-provider.com \
  --oauth2-audience m4-api

See docs/OAUTH2_AUTHENTICATION.md for details.


🛠️ Available MCP Tools

  • get_database_schema: List all available tables
  • get_table_info: Get column info and sample data
  • execute_mimic_query: Execute SQL SELECT queries
  • get_icu_stays: ICU stay info & length of stay
  • get_lab_results: Laboratory test results
  • get_race_distribution: Patient race statistics
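
Under the hood, an MCP client invokes these tools by sending JSON-RPC 2.0 requests with the method tools/call. Your client builds these messages for you; the sketch below only shows their general shape. The argument name sql_query and the example query are assumptions for illustration, so check the tool's actual input schema (as reported by the server) before relying on them.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests carry a unique id


def tool_call(name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return json.dumps(
        {
            "jsonrpc": "2.0",
            "id": next(_request_ids),
            "method": "tools/call",
            "params": {"name": name, "arguments": arguments},
        }
    )


# `sql_query` is a hypothetical argument name, not confirmed by this README.
request = tool_call(
    "execute_mimic_query",
    {"sql_query": "SELECT COUNT(*) FROM patients"},
)
print(request)
```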

Example Prompts

Demographics:

  • What is the race distribution in MIMIC-IV admissions?
  • Show me patient demographics for ICU stays

Clinical Data:

  • Find lab results for patient X
  • What lab tests are most commonly ordered?

Exploration:

  • What tables are available in the database?

Troubleshooting

  • "Parquet not found": Rerun m4 init <dataset_name>.
  • MCP client not starting: Check logs (Claude Desktop: Help → View Logs).
  • BigQuery Access Denied: Run gcloud auth application-default login and verify project ID.

Contributing & Citation

For Developers

We welcome contributions!

  1. Setup: Follow the "Local Development" steps above.
  2. Test: Run uv run pre-commit run --all-files to ensure everything is working and linted.
  3. Submit: Open a Pull Request with your changes.

Citation:

M4 is built upon M3. Please cite the original work:

@article{attrach2025conversational,
  title={Conversational LLMs Simplify Secure Clinical Data Access, Understanding, and Analysis},
  author={Attrach, Rafi Al and Moreira, Pedro and Fani, Rajna and Umeton, Renato and Celi, Leo Anthony},
  journal={arXiv preprint arXiv:2507.01053},
  year={2025}
}

You can also use the "Cite this repository" button at the top of the GitHub page for other formats.

Related Projects

M4 has been forked and adapted by the community:

  • MCPStack-MIMIC - Integrates M4 with other MCP servers (Jupyter, sklearn, etc.)

Built with ❤️ for the medical AI community

Need help? Open an issue on GitHub or check our troubleshooting guide above.
