MIMIC-IV + MCP + Models: Local MIMIC-IV querying with LLMs via Model Context Protocol
M4: Many Medical Datasets ↔ MCP ↔ Models 🏥🤖
Query tabular PhysioNet medical data using natural language through MCP clients
Transform medical data analysis with AI! Ask questions about MIMIC-IV and other PhysioNet datasets in plain English and get instant insights. Choose between a local dataset (free) and the full cloud dataset (BigQuery).
🫶 M4 is built upon M3.
Please acknowledge the original authors and cite their work.
💡 How It Works
M4 acts as a bridge between your AI Client (like Claude Desktop, Cursor, or LibreChat) and your medical data.
- You ask a question in your chat interface: "How many patients in the ICU have high blood pressure?"
- M4 securely translates this into a database query.
- M4 runs the query on your local or cloud data.
- The LLM explains the results to you in plain English.
No SQL knowledge required.
Features
- 🔍 Natural Language Queries: Ask questions about your medical data in plain English
- 🏠 Modular Datasets: Support for any tabular PhysioNet dataset (MIMIC-IV, etc.)
- 📂 Local DuckDB + Parquet: Fast local queries using Parquet files with DuckDB views
- ☁️ BigQuery Support: Access full MIMIC-IV dataset on Google Cloud
- 🔒 Enterprise Security: OAuth2 authentication with JWT tokens and rate limiting
- 🛡️ SQL Injection Protection: Read-only queries with comprehensive validation
- 🧩 Extensible Architecture: Easily add new custom datasets via configuration or CLI
🚀 Quick Start
Prerequisites
You need an MCP-compatible client to use M4. Popular options include Claude Desktop, Cursor, and LibreChat.
1. Install uv (Required)
We use uvx to run the MCP server efficiently.
macOS and Linux:
curl -LsSf https://astral.sh/uv/install.sh | sh
Windows (PowerShell):
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
2. Choose Your Data Source
Select Option A (Local) or Option B (Cloud).
Option A: Local Dataset (Free & Fast)
Best for development, testing, and offline use.
- Create project directory:
mkdir m4 && cd m4
- Initialize Dataset:
We will use MIMIC-IV as an example.
For Demo (Auto-download ~16MB):
uv init && uv add m4-mcp
uv run m4 init mimic-iv-demo
For Full Data (Requires Manual Download): Download CSVs from PhysioNet first and place them in m4_data/raw_files.
uv init && uv add m4-mcp
uv run m4 init mimic-iv-full
This can take 5-15 minutes depending on your machine.
- Configure Your Client:
For Claude Desktop (Shortcut):
uv run m4 config claude --quick
For Other Clients (Cursor, LibreChat, etc.):
uv run m4 config --quick
This generates the configuration JSON you need to paste into your client's settings.
Option B: BigQuery (Full Cloud Dataset)
Best for researchers with Google Cloud access.
- Authenticate with Google:
gcloud auth application-default login
- Configure Client:
uv run m4 config --backend bigquery --project_id BIGQUERY_PROJECT_ID
This also generates the configuration JSON you need to paste into your client's settings.
3. Start Asking Questions!
Restart your MCP client and try:
- "What tools do you have for MIMIC-IV data?"
- "Show me patient demographics from the ICU"
- "What is the race distribution in admissions?"
🔄 Managing Datasets
Switch between available datasets instantly:
# Switch to full dataset
m4 use mimic-iv-full
# Switch back to demo
m4 use mimic-iv-demo
# Check status
m4 status
Backend Comparison
| Feature | DuckDB (Demo) | DuckDB (Full) | BigQuery (Full) |
|---|---|---|---|
| Cost | Free | Free | BigQuery usage fees |
| Setup | Zero config | Manual Download | GCP credentials required |
| Credentials | Not required | PhysioNet | PhysioNet |
| Data Size | 100 patients | 365k patients | 365k patients |
| Speed | Fast (local) | Fast (local) | Network latency |
| Use Case | Learning | Research (local) | Research, production |
➕ Adding Custom Datasets
M4 is designed to be modular. You can add support for any tabular dataset on PhysioNet easily. Let's take eICU as an example:
JSON Definition Method
- Create a definition file at m4_data/datasets/eicu.json:
{
  "name": "eicu",
  "description": "eICU Collaborative Research Database",
  "file_listing_url": "https://physionet.org/files/eicu-crd/2.0/",
  "subdirectories_to_scan": [],
  "primary_verification_table": "eicu_crd_patient",
  "tags": ["clinical", "eicu"],
  "requires_authentication": true,
  "bigquery_project_id": "physionet-data",
  "bigquery_dataset_ids": ["eicu_crd"]
}
- Initialize it:
m4 init eicu --src /path/to/raw/csvs
M4 will convert CSVs to Parquet and create DuckDB views automatically.
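As a rough picture of that conversion step, each CSV can be handled with two DuckDB statements: one that writes a Parquet file and one that creates a view over it. The sketch below only builds the SQL strings. The helper name `build_conversion_sql` and the example paths are assumptions for illustration, and M4's actual internals may differ.

```python
from pathlib import Path


def build_conversion_sql(csv_path: str, parquet_dir: str) -> tuple[str, str]:
    """Return (copy_stmt, view_stmt) for one CSV file.

    Hypothetical helper for illustration; not M4's real conversion code.
    """
    # Derive a table name from the file name: "patient.csv.gz" -> "patient".
    table = Path(csv_path).name.split(".")[0]
    parquet_path = (Path(parquet_dir) / f"{table}.parquet").as_posix()
    # Step 1: read the CSV with type inference and write it out as Parquet.
    copy_stmt = (
        f"COPY (SELECT * FROM read_csv_auto('{csv_path}')) "
        f"TO '{parquet_path}' (FORMAT PARQUET)"
    )
    # Step 2: expose the Parquet file as a queryable view.
    view_stmt = (
        f'CREATE OR REPLACE VIEW "{table}" AS '
        f"SELECT * FROM read_parquet('{parquet_path}')"
    )
    return copy_stmt, view_stmt


copy_stmt, view_stmt = build_conversion_sql("raw/patient.csv.gz", "parquet")
print(copy_stmt)
print(view_stmt)
```

With the `duckdb` Python package installed, each statement could be passed to `duckdb.connect("m4.duckdb").execute(...)`; the database file name here is also just an example.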
Alternative Installation Methods
Already have Docker or prefer pip?
🐳 Docker
DuckDB (Local):
git clone https://github.com/hannesill/m4.git && cd m4
docker build -t m4:lite --target lite .
docker run -d --name m4-server m4:lite tail -f /dev/null
BigQuery:
git clone https://github.com/hannesill/m4.git && cd m4
docker build -t m4:bigquery --target bigquery .
docker run -d --name m4-server \
  -e M4_BACKEND=bigquery \
  -e M4_PROJECT_ID=your-project-id \
  -v $HOME/.config/gcloud:/root/.config/gcloud:ro \
  m4:bigquery tail -f /dev/null
MCP config (same for both):
{
"mcpServers": {
"m4": {
"command": "docker",
"args": ["exec", "-i", "m4-server", "python", "-m", "m4.mcp_server"]
}
}
}
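If you script your setup, the same entry can be generated rather than pasted by hand. This is a minimal sketch that reproduces the JSON above; `docker_mcp_config` is a hypothetical helper name, and where the output goes depends on your MCP client.

```python
import json


def docker_mcp_config(container: str = "m4-server") -> dict:
    """Build the Docker-based MCP client entry shown above.

    Hypothetical helper; the container name must match `docker run --name`.
    """
    return {
        "mcpServers": {
            "m4": {
                "command": "docker",
                "args": ["exec", "-i", container, "python", "-m", "m4.mcp_server"],
            }
        }
    }


# Print the JSON so it can be copied into the client's settings file.
print(json.dumps(docker_mcp_config(), indent=2))
```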
pip Install
pip install m4-mcp
m4 config --quick
Local Development
For contributors:
- Clone & Install (using uv):
git clone https://github.com/hannesill/m4.git
cd m4
uv venv
uv sync
- MCP Config:
{
  "mcpServers": {
    "m4": {
      "command": "/absolute/path/to/m4/.venv/bin/python",
      "args": ["-m", "m4.mcp_server"],
      "cwd": "/absolute/path/to/m4",
      "env": { "M4_BACKEND": "duckdb" }
    }
  }
}
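The `M4_BACKEND` variable in the config above (and `M4_PROJECT_ID` in the Docker example) selects the backend through the environment. The sketch below shows one way such variables could be read and validated. The function name, the `duckdb` default, and the error messages are assumptions, not M4's actual code.

```python
import os


def load_backend_settings(env=None) -> dict:
    """Hypothetical reader for the M4_BACKEND / M4_PROJECT_ID variables."""
    env = os.environ if env is None else env
    # Assumption: fall back to the local DuckDB backend when nothing is set.
    backend = env.get("M4_BACKEND", "duckdb").lower()
    if backend not in {"duckdb", "bigquery"}:
        raise ValueError(f"Unsupported backend: {backend!r}")
    settings = {"backend": backend}
    if backend == "bigquery":
        # BigQuery needs a billing project, so fail early if it is missing.
        project_id = env.get("M4_PROJECT_ID")
        if not project_id:
            raise ValueError("M4_PROJECT_ID must be set for the bigquery backend")
        settings["project_id"] = project_id
    return settings


print(load_backend_settings({"M4_BACKEND": "duckdb"}))
```

Passing a plain dict instead of `os.environ` keeps the function easy to test.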
🔧 Advanced Configuration
Interactive Config Generator:
m4 config
OAuth2 Authentication: For secure production deployments:
m4 config claude --enable-oauth2 \
--oauth2-issuer https://your-auth-provider.com \
--oauth2-audience m4-api
See docs/OAUTH2_AUTHENTICATION.md for details.
🛠️ Available MCP Tools
- get_database_schema: List all available tables
- get_table_info: Get column info and sample data
- execute_mimic_query: Execute SQL SELECT queries
- get_icu_stays: ICU stay info & length of stay
- get_lab_results: Laboratory test results
- get_race_distribution: Patient race statistics
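Since execute_mimic_query only accepts SELECT queries, it may help to see what a read-only check can look like. The sketch below is illustrative only and is not M4's validation logic: `is_read_only` and the keyword list are assumptions, and the check is deliberately conservative, so it can reject some harmless queries (for example, one with a semicolon inside a string literal).

```python
import re

# Keywords that modify data or schema; a read-only tool should refuse them.
WRITE_KEYWORDS = {
    "INSERT", "UPDATE", "DELETE", "DROP", "ALTER", "CREATE",
    "TRUNCATE", "ATTACH", "COPY", "PRAGMA",
}


def is_read_only(sql: str) -> bool:
    """Illustrative check only; not M4's actual validation logic."""
    stripped = sql.strip().rstrip(";").strip()
    # Reject empty input and anything that looks like multiple statements.
    if not stripped or ";" in stripped:
        return False
    # Split into identifier-like tokens so "update_time" is not read as UPDATE.
    tokens = re.findall(r"[A-Z_][A-Z0-9_]*", stripped.upper())
    if not tokens or tokens[0] not in {"SELECT", "WITH"}:
        return False
    return not any(tok in WRITE_KEYWORDS for tok in tokens)


print(is_read_only("SELECT COUNT(*) FROM icustays"))
print(is_read_only("SELECT 1; DROP TABLE patients"))
```

A keyword filter like this is a first line of defense at best; opening the database in read-only mode is a stronger guarantee.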
Example Prompts
Demographics:
- What is the race distribution in MIMIC-IV admissions?
- Show me patient demographics for ICU stays
Clinical Data:
- Find lab results for patient X
- What lab tests are most commonly ordered?
Exploration:
- What tables are available in the database?
Troubleshooting
- "Parquet not found": Rerun m4 init <dataset_name>.
- MCP client not starting: Check logs (Claude Desktop: Help → View Logs).
- BigQuery Access Denied: Run gcloud auth application-default login and verify project ID.
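For the BigQuery case, a quick way to confirm that gcloud auth application-default login has run is to look for the credentials file it writes. On Linux and macOS this is `~/.config/gcloud/application_default_credentials.json`, the same directory the Docker example mounts. The helper below is a hypothetical diagnostic, not part of M4, and it does not cover Windows.

```python
from pathlib import Path
from typing import Optional


def find_adc_file(home: Optional[Path] = None) -> Optional[Path]:
    """Return the default ADC credentials file on Linux/macOS, if present.

    Hypothetical diagnostic helper; not part of the m4 CLI.
    """
    home = Path.home() if home is None else home
    candidate = home / ".config" / "gcloud" / "application_default_credentials.json"
    return candidate if candidate.is_file() else None


adc = find_adc_file()
print("ADC file found" if adc else "No ADC file; run the gcloud login command")
```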
Contributing & Citation
For Developers
We welcome contributions!
- Setup: Follow the "Local Development" steps above.
- Test: Run uv run pre-commit run --all-files to ensure everything is working and linted.
- Submit: Open a Pull Request with your changes.
Citation:
M4 is built upon M3. Please cite the original work:
@article{attrach2025conversational,
title={Conversational LLMs Simplify Secure Clinical Data Access, Understanding, and Analysis},
author={Attrach, Rafi Al and Moreira, Pedro and Fani, Rajna and Umeton, Renato and Celi, Leo Anthony},
journal={arXiv preprint arXiv:2507.01053},
year={2025}
}
You can also use the "Cite this repository" button at the top of the GitHub page for other formats.
Related Projects
M4 has been forked and adapted by the community:
- MCPStack-MIMIC - Integrates M4 with other MCP servers (Jupyter, sklearn, etc.)
Built with ❤️ for the medical AI community
Need help? Open an issue on GitHub or check our troubleshooting guide above.