MIMIC-IV + MCP + Models: Local MIMIC-IV querying with LLMs via Model Context Protocol

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

pedrojfm rafiaa rajna

These details have not been verified by PyPI

Project description

M3: MIMIC-IV + MCP + Models 🏥🤖

Query MIMIC-IV medical data using natural language through MCP clients

Transform medical data analysis with AI! Ask questions about MIMIC-IV data in plain English and get instant insights. Choose between local demo data (free), the full dataset run locally, or the full dataset in the cloud (BigQuery).

Features

🔍 Natural Language Queries: Ask questions about MIMIC-IV data in plain English
🏠 Local DuckDB + Parquet: Fast local queries for demo and full dataset using Parquet files with DuckDB views
☁️ BigQuery Support: Access full MIMIC-IV dataset on Google Cloud
🔒 Enterprise Security: OAuth2 authentication with JWT tokens and rate limiting
🛡️ SQL Injection Protection: Read-only queries with comprehensive validation
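
To give a feel for what "read-only queries with validation" can mean in practice, here is a minimal sketch of a conservative SQL check. It is not M3's actual validator: the function name `is_read_only` and the keyword blocklist are assumptions made for illustration, and a real implementation would likely be stricter.

```python
import re

# Hypothetical blocklist of write/DDL keywords; M3's real rules may differ.
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|attach|copy|pragma)\b",
    re.IGNORECASE,
)


def is_read_only(sql: str) -> bool:
    """Return True only for a single SELECT/WITH statement without write keywords."""
    statement = sql.strip().rstrip(";").strip()
    if not statement or ";" in statement:
        return False  # empty input, or several statements chained together
    if not re.match(r"(?i)(select|with)\b", statement):
        return False  # must start with SELECT or WITH
    return FORBIDDEN.search(statement) is None


print(is_read_only("SELECT race, COUNT(*) FROM admissions GROUP BY race"))  # True
print(is_read_only("SELECT 1; DROP TABLE admissions"))  # False
```

The check is deliberately conservative: a harmless query that merely mentions a blocked word inside a string literal is rejected too, which is usually the safer failure mode for clinical data.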

🚀 Quick Start

📺 Prefer video tutorials? Check out step-by-step video guides covering setup, PhysioNet configuration, and more.

Install uv (required for `uvx`)

We use uvx to run the MCP server. Install uv from the official installer, then verify with uv --version.

macOS and Linux:

curl -LsSf https://astral.sh/uv/install.sh | sh

Windows (PowerShell):

powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

Verify installation:

uv --version

BigQuery Setup (Optional - Full Dataset)

Skip this section if you are using a local DuckDB database.

Install Google Cloud SDK:
- macOS: brew install google-cloud-sdk
- Windows/Linux: https://cloud.google.com/sdk/docs/install
Authenticate:
```
gcloud auth application-default login
```
Opens your browser - choose the Google account with BigQuery access to MIMIC-IV.

M3 Initialization

Supported clients: Claude Desktop, Cursor, Goose, and more.

DuckDB (Demo or Full Dataset)

To create an m3 directory and navigate into it, run:

mkdir m3 && cd m3

If you want to use the full dataset, download it manually from PhysioNet and place it into m3/m3_data/raw. To use the demo dataset, continue and run:

uv init && uv add m3-mcp && \
uv run m3 init DATASET_NAME && uv run m3 config --quick

Replace DATASET_NAME with mimic-iv-demo or mimic-iv-full and copy & paste the output of this command into your client config JSON file.

Demo dataset (16MB raw download size) downloads automatically on first query.

Full dataset (10.6GB raw download size) needs to be downloaded manually.

BigQuery (Full Dataset)

Requires GCP credentials and PhysioNet access.

Paste this into your client config JSON file:

{
  "mcpServers": {
    "m3": {
      "command": "uvx",
      "args": ["m3-mcp"],
      "env": {
        "M3_BACKEND": "bigquery",
        "M3_PROJECT_ID": "your-project-id"
      }
    }
  }
}

Replace your-project-id with your Google Cloud project ID.
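
If your client config file already lists other MCP servers, the m3 entry has to be merged in rather than pasted over the whole file. The sketch below shows one way to do that merge in Python. The helper name `add_m3_server` and the `other` server are hypothetical; only the m3 entry itself comes from the config above.

```python
import json


def add_m3_server(config: dict, project_id: str) -> dict:
    """Return a copy of a client config with the BigQuery-backed m3 entry added."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["m3"] = {
        "command": "uvx",
        "args": ["m3-mcp"],
        "env": {"M3_BACKEND": "bigquery", "M3_PROJECT_ID": project_id},
    }
    merged["mcpServers"] = servers
    return merged


# Hypothetical existing config that already defines another server.
existing = {"mcpServers": {"other": {"command": "other-server"}}}
print(json.dumps(add_m3_server(existing, "your-project-id"), indent=2))
```

The function copies the dictionaries it touches, so the original config object is left unchanged and the existing server entry is preserved.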

That's it! Restart your MCP client and ask:

"What tools do you have for MIMIC-IV data?"
"Show me patient demographics from the ICU"
"What is the race distribution in admissions?"

Backend Comparison

Feature	DuckDB (Demo)	DuckDB (Full)	BigQuery (Full)
Cost	Free	Free	BigQuery usage fees
Setup	Zero config	Manual Download	GCP credentials required
Data Size	100 patients, 275 admissions	365k patients, 546k admissions	365k patients, 546k admissions
Speed	Fast (local)	Fast (local)	Network latency
Use Case	Learning, development	Research (local)	Research, production

Alternative Installation Methods

Already have Docker or prefer pip? Here are other ways to run m3:

🐳 Docker (No Python Required)

DuckDB (Local):

git clone https://github.com/rafiattrach/m3.git && cd m3
docker build -t m3:lite --target lite .
docker run -d --name m3-server m3:lite tail -f /dev/null

BigQuery:

git clone https://github.com/rafiattrach/m3.git && cd m3
docker build -t m3:bigquery --target bigquery .
docker run -d --name m3-server \
  -e M3_BACKEND=bigquery \
  -e M3_PROJECT_ID=your-project-id \
  -v $HOME/.config/gcloud:/root/.config/gcloud:ro \
  m3:bigquery tail -f /dev/null

MCP config (same for both):

{
  "mcpServers": {
    "m3": {
      "command": "docker",
      "args": ["exec", "-i", "m3-server", "python", "-m", "m3.mcp_server"]
    }
  }
}

Stop: docker stop m3-server && docker rm m3-server

pip Install + CLI Tools

pip install m3-mcp

💡 CLI commands: Run m3 --help to see all available options.

Useful CLI commands:

m3 init mimic-iv-demo - Download demo database
m3 config - Generate MCP configuration interactively
m3 config claude --backend bigquery --project-id YOUR_PROJECT_ID - Quick BigQuery setup

Example MCP config:

{
  "mcpServers": {
    "m3": {
      "command": "m3-mcp-server",
      "env": {
        "M3_BACKEND": "duckdb"
      }
    }
  }
}

Local Development

For contributors:

git clone https://github.com/rafiattrach/m3.git && cd m3
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install -e ".[dev]"
pre-commit install

MCP config:

{
  "mcpServers": {
    "m3": {
      "command": "/path/to/m3/.venv/bin/python",
      "args": ["-m", "m3.mcp_server"],
      "cwd": "/path/to/m3",
      "env": {
        "M3_BACKEND": "duckdb"
      }
    }
  }
}

Using `uv` (Recommended)

This assumes you already have uv installed.

Step 1: Clone and Navigate

# Clone the repository
git clone https://github.com/rafiattrach/m3.git
cd m3

Step 2: Create UV Virtual Environment

# Create virtual environment
uv venv

Step 3: Install M3

uv sync
# Prefix all subsequent commands with `uv run` so that they use the `uv` virtual environment

🗄️ Database Configuration

After installation, choose your data source:

Option A: Local Demo (DuckDB + Parquet)

Perfect for learning and development - completely free!

Initialize demo dataset:
```
m3 init mimic-iv-demo
```

Setup MCP Client:

m3 config

Alternative: For Claude Desktop specifically:

m3 config claude --backend duckdb --db-path /Users/you/path/to/m3_data/databases/mimic_iv_demo.duckdb

Restart your MCP client and ask:
- "What tools do you have for MIMIC-IV data?"
- "Show me patient demographics from the ICU"

Option B: Local Full Dataset (DuckDB + Parquet)

Run the entire MIMIC-IV dataset locally with DuckDB views over Parquet.
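
For readers unfamiliar with the "views over Parquet" idea: the data stays in Parquet files, and the DuckDB database only stores view definitions that read those files on demand. The sketch below generates such view statements. The file layout and the `<schema>_<table>` view naming are assumptions for illustration and may not match what `m3 init` creates; `read_parquet` itself is a standard DuckDB table function.

```python
from pathlib import PurePosixPath


def view_statements(parquet_files: list[str]) -> list[str]:
    """Build one CREATE VIEW statement per Parquet file.

    Assumed naming: <schema>_<table>, e.g. hosp/admissions.parquet -> hosp_admissions.
    """
    statements = []
    for name in parquet_files:
        path = PurePosixPath(name)
        view = f"{path.parent.name}_{path.stem}"
        statements.append(
            f"CREATE OR REPLACE VIEW {view} AS "
            f"SELECT * FROM read_parquet('{path}');"
        )
    return statements


# Hypothetical file layout, for illustration only.
files = ["parquet/hosp/admissions.parquet", "parquet/icu/icustays.parquet"]
for sql in view_statements(files):
    print(sql)
```

Because a view holds no data of its own, the database file stays small and queries only read the Parquet columns they need.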

Acquire CSVs (requires PhysioNet credentials):
- Download the official MIMIC-IV CSVs from PhysioNet and place them under:
  - /Users/you/path/to/m3/m3_data/raw_files/mimic-iv-full/hosp/
  - /Users/you/path/to/m3/m3_data/raw_files/mimic-iv-full/icu/
- Note: m3 init's auto-download function currently only supports the demo dataset. Use your browser or wget to obtain the full dataset.

Initialize full dataset:

m3 init mimic-iv-full

Initialization may take up to 30 minutes, depending on your system (for example, about 10 minutes on a MacBook Pro M3).

Performance knobs (optional):

export M3_CONVERT_MAX_WORKERS=6   # number of parallel files (default=4)
export M3_DUCKDB_MEM=4GB          # DuckDB memory limit per worker (default=3GB)
export M3_DUCKDB_THREADS=4        # DuckDB threads per worker (default=2)

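Check your system specifications before raising these values, especially available memory: the DuckDB memory limit applies per worker, so total usage grows with the number of workers.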
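
A quick back-of-the-envelope check can help before changing these settings. The sketch below assumes the worst case is simply workers times the per-worker limit, which follows from the comments above but ignores other memory the conversion may use. The helper `peak_memory_gb` is hypothetical and only understands values written like `4GB`.

```python
def peak_memory_gb(workers: int, mem_limit: str) -> float:
    """Estimate worst-case memory as workers x per-worker DuckDB limit, in GB."""
    text = mem_limit.strip().upper()
    if not text.endswith("GB"):
        raise ValueError("this sketch only understands values such as '4GB'")
    return workers * float(text[:-2])


print(peak_memory_gb(4, "3GB"))  # defaults listed above -> 12.0
print(peak_memory_gb(6, "4GB"))  # tuned example above -> 24.0
```

If the estimate is close to or above your installed RAM, lowering `M3_CONVERT_MAX_WORKERS` is likely the simplest fix.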

Select dataset and verify:
```
m3 use full # optional: m3 init already sets the active dataset to full
m3 status
```
- m3 status prints the active dataset, the local DB path, whether Parquet files are present, quick row counts, and the total Parquet size.

Configure MCP client (uses the full local DB):

m3 config
# or
m3 config claude --backend duckdb --db-path /Users/you/path/to/m3/m3_data/databases/mimic_iv_full.duckdb

Option C: BigQuery (Full Dataset)

For researchers needing complete MIMIC-IV data

Prerequisites

Google Cloud account and project with billing enabled
Access to MIMIC-IV on BigQuery (requires PhysioNet credentialing)

Setup Steps

Install Google Cloud CLI:

macOS (with Homebrew):
```
brew install google-cloud-sdk
```
Windows: Download from https://cloud.google.com/sdk/docs/install

Linux:
```
curl https://sdk.cloud.google.com | bash
```
Authenticate:
```
gcloud auth application-default login
```
This will open your browser - choose the Google account that has access to your BigQuery project with MIMIC-IV data.

Setup MCP Client for BigQuery:

m3 config

Alternative: For Claude Desktop specifically:

m3 config claude --backend bigquery --project-id YOUR_PROJECT_ID

Test BigQuery Access - Restart your MCP client and ask:

Use the get_race_distribution function to show me the top 5 races in MIMIC-IV admissions.

🔧 Advanced Configuration

Need to configure other MCP clients or customize settings? Use these commands:

Interactive Configuration (Universal)

m3 config

Generates configuration for any MCP client with step-by-step guidance.

Quick Configuration Examples

# Quick universal config with defaults
m3 config --quick

# Universal config with custom DuckDB database
m3 config --quick --backend duckdb --db-path /path/to/database.duckdb

# Save config to file for other MCP clients
m3 config --output my_config.json

OAuth2 Authentication (Optional)

For production deployments requiring secure access to medical data:

# Enable OAuth2 with Claude Desktop
m3 config claude --enable-oauth2 \
  --oauth2-issuer https://your-auth-provider.com \
  --oauth2-audience m3-api \
  --oauth2-scopes "read:mimic-data"

# Or configure interactively
m3 config  # Choose OAuth2 option during setup

Supported OAuth2 Providers:

Auth0, Google Identity Platform, Microsoft Azure AD, Keycloak
Any OAuth2/OpenID Connect compliant provider

Key Benefits:

🔒 JWT Token Validation: Industry-standard security
🎯 Scope-based Access: Fine-grained permissions
🛡️ Rate Limiting: Abuse protection
📊 Audit Logging: Security monitoring

📖 Complete OAuth2 Setup Guide: See docs/OAUTH2_AUTHENTICATION.md for detailed configuration, troubleshooting, and production deployment guidelines.
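
To illustrate what token validation and scope-based access involve, here is a sketch that inspects the claims of a JWT. It decodes the payload WITHOUT verifying the signature, so it is only suitable for debugging, never for authorization decisions. The function names are hypothetical, and the space-separated `scope` claim is an assumption: some providers use a different claim name.

```python
import base64
import json
import time


def decode_claims(token: str) -> dict:
    """Decode the payload of a JWT WITHOUT verifying its signature."""
    payload = token.removeprefix("Bearer ").split(".")[1]
    padded = payload + "=" * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(padded))


def check_claims(claims: dict, audience: str, required_scope: str) -> list[str]:
    """Return a list of problems; an empty list means the claims look acceptable."""
    problems = []
    if claims.get("exp", 0) <= time.time():
        problems.append("token expired")
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    if audience not in audiences:
        problems.append("wrong audience")
    if required_scope not in claims.get("scope", "").split():
        problems.append("missing scope")
    return problems


# Made-up claims matching the audience and scope used in the example above.
claims = {"exp": time.time() + 300, "aud": "m3-api", "scope": "read:mimic-data"}
print(check_claims(claims, "m3-api", "read:mimic-data"))  # []
```

A production validator would additionally verify the signature against the issuer's published keys and check the `iss` claim.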

🛠️ Available MCP Tools

When your MCP client processes questions, it uses these tools automatically:

get_database_schema: List all available tables
get_table_info: Get column info and sample data for a table
execute_mimic_query: Execute SQL SELECT queries
get_icu_stays: ICU stay information and length of stay data
get_lab_results: Laboratory test results
get_race_distribution: Patient race distribution
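
Under the hood, a tool such as get_race_distribution presumably comes down to a grouped count over admissions. The sketch below shows that kind of query on a tiny made-up table, using Python's built-in sqlite3 as a stand-in for DuckDB or BigQuery. The table contents are invented, and the exact SQL that M3 runs is not shown in this README.

```python
import sqlite3

# In-memory toy table; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE admissions (hadm_id INTEGER, race TEXT)")
conn.executemany(
    "INSERT INTO admissions VALUES (?, ?)",
    [(1, "WHITE"), (2, "WHITE"), (3, "ASIAN"), (4, "UNKNOWN"), (5, "WHITE")],
)


def race_distribution(connection, limit=5):
    """Count admissions per race, most frequent first."""
    query = """
        SELECT race, COUNT(*) AS n
        FROM admissions
        GROUP BY race
        ORDER BY n DESC, race
        LIMIT ?
    """
    return connection.execute(query, (limit,)).fetchall()


print(race_distribution(conn))  # [('WHITE', 3), ('ASIAN', 1), ('UNKNOWN', 1)]
```

The same SELECT could be passed to execute_mimic_query once you know the real table name from get_database_schema.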

Example Prompts

Try asking your MCP client these questions:

Demographics & Statistics:

Prompt: What is the race distribution in MIMIC-IV admissions?
Prompt: Show me patient demographics for ICU stays
Prompt: How many total admissions are in the database?

Clinical Data:

Prompt: Find lab results for patient X
Prompt: What lab tests are most commonly ordered?
Prompt: Show me recent ICU admissions

Data Exploration:

Prompt: What tables are available in the database?
Prompt: What tools do you have for MIMIC-IV data?

🎩 Pro Tips

Want to pre-approve the use of all tools in Claude Desktop? Send the prompt below, then select Always Allow.
- Prompt: Can you please call all your tools in a logical sequence?

🔍 Troubleshooting

Common Issues

Local "Parquet not found" or view errors: Rerun the m3 init command for your chosen dataset.

MCP client server not starting:

Check your MCP client logs (for Claude Desktop: Help → View Logs)
Verify configuration file location and format
Restart your MCP client completely
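
A malformed config file is a common reason for the server not starting. The sketch below checks a config against the shape used by the examples in this README. The helper name `config_problems` is hypothetical, and your client may enforce additional rules that this check does not know about.

```python
import json


def config_problems(text: str) -> list[str]:
    """Return human-readable problems found in an MCP client config."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}"]
    servers = config.get("mcpServers") if isinstance(config, dict) else None
    if not isinstance(servers, dict) or not isinstance(servers.get("m3"), dict):
        return ["missing mcpServers.m3 object"]
    server = servers["m3"]
    problems = []
    if not isinstance(server.get("command"), str):
        problems.append("m3.command must be a string")
    if not isinstance(server.get("args", []), list):
        problems.append("m3.args must be a list")
    env = server.get("env", {})
    if not isinstance(env, dict) or not all(isinstance(v, str) for v in env.values()):
        problems.append("m3.env must map names to strings")
    return problems


good = '{"mcpServers": {"m3": {"command": "uvx", "args": ["m3-mcp"]}}}'
print(config_problems(good))  # []
print(config_problems('{"mcpServers": {}}'))  # ['missing mcpServers.m3 object']
```

Trailing commas and unquoted keys are the usual culprits; both are reported as invalid JSON with a line and column number.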

OAuth2 Authentication Issues

"Missing OAuth2 access token" errors:

# Set your access token
export M3_OAUTH2_TOKEN="Bearer your-access-token-here"

"OAuth2 authentication failed" errors:

Verify your token hasn't expired
Check that required scopes are included in your token
Ensure your OAuth2 provider configuration is correct

Rate limit exceeded:

Wait for the rate limit window to reset
Contact your administrator to adjust limits if needed
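
To see why waiting helps, consider a fixed-window limiter: it counts requests within a time window and resets the counter when the window ends. The sketch below is purely illustrative. M3's actual rate-limiting algorithm and limits are not described in this README, and the class name and parameters are hypothetical.

```python
import time


class FixedWindowLimiter:
    """Allow at most `limit` calls per `window` seconds (illustrative only)."""

    def __init__(self, limit: int, window: float, clock=time.monotonic):
        self.limit, self.window, self.clock = limit, window, clock
        self.window_start = clock()
        self.count = 0

    def allow(self) -> bool:
        now = self.clock()
        if now - self.window_start >= self.window:
            self.window_start, self.count = now, 0  # new window: counter resets
        if self.count >= self.limit:
            return False
        self.count += 1
        return True

    def retry_after(self) -> float:
        """Seconds until the current window ends."""
        return max(0.0, self.window - (self.clock() - self.window_start))


limiter = FixedWindowLimiter(limit=2, window=60)
print([limiter.allow() for _ in range(3)])  # [True, True, False]
```

Once `retry_after()` seconds have passed, the next call starts a fresh window and is allowed again.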

🔧 OAuth2 Troubleshooting: See OAUTH2_AUTHENTICATION.md for detailed OAuth2 troubleshooting and configuration guides.

BigQuery Issues

"Access Denied" errors:

Ensure you have MIMIC-IV access on PhysioNet
Verify your Google Cloud project has BigQuery API enabled
Check that you're authenticated: gcloud auth list

"Dataset not found" errors:

Confirm your project ID is correct
Ensure you have access to physionet-data project

Authentication issues:

# Re-authenticate
gcloud auth application-default login

# Check current authentication
gcloud auth list
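
If re-authenticating does not help, it can be useful to confirm that an Application Default Credentials file actually exists where Google client libraries look for it. The sketch below checks the `GOOGLE_APPLICATION_CREDENTIALS` variable first and then the default gcloud location. The helper name `adc_file` is hypothetical, and the sketch ignores less common overrides such as a custom gcloud config directory.

```python
import os
from pathlib import Path


def adc_file(env=os.environ, home=None):
    """Return the Application Default Credentials file that would be used, or None."""
    explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        path = Path(explicit)
        return path if path.is_file() else None
    if os.name == "nt":
        base = Path(env.get("APPDATA", "")) / "gcloud"  # Windows location
    else:
        base = Path(home or Path.home()) / ".config" / "gcloud"  # macOS and Linux
    path = base / "application_default_credentials.json"
    return path if path.is_file() else None


print(adc_file() or "no ADC file found: run gcloud auth application-default login")
```

When running in Docker, remember that this lookup happens inside the container, which is why the Docker example above mounts the host's gcloud directory.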

For Developers

See "Local Development" section above for setup instructions.

Running Tests

pytest  # All tests (includes OAuth2 and BigQuery mocks)
pytest tests/test_mcp_server.py -v  # MCP server tests
pytest tests/test_oauth2_auth.py -v  # OAuth2 authentication tests

Test BigQuery Locally

# Set environment variables
export M3_BACKEND=bigquery
export M3_PROJECT_ID=your-project-id
export GOOGLE_CLOUD_PROJECT=your-project-id

# Optional: Test with OAuth2 authentication
export M3_OAUTH2_ENABLED=true
export M3_OAUTH2_ISSUER_URL=https://your-provider.com
export M3_OAUTH2_AUDIENCE=m3-api
export M3_OAUTH2_TOKEN="Bearer your-test-token"

# Test MCP server
m3-mcp-server

Roadmap

🏠 Complete Local Full Dataset: Complete the support for mimic-iv-full (Download CLI)
🔧 Advanced Tools: More specialized medical data functions
📊 Visualization: Built-in plotting and charting tools
🔐 Enhanced Security: Role-based access control, audit logging
🌐 Multi-tenant Support: Organization-level data isolation

🐳 Kubernetes Deployment

Deploy M3 on Kubernetes using Docker images with pre-loaded MIMIC-IV demo database:

# Build and push Docker image
make all  # Will prompt for Docker registry/username

# Or specify registry directly
make all DOCKER_REGISTRY=your-username DOCKER=podman

The container uses StreamableHTTP transport on port 3000 with path /sse. Configure your MCP client to connect to the service endpoint (e.g., http://m3.kagent.svc.cluster.local:3000/sse for intra-cluster access).

Helm charts for deploying M3 are available in a separate repository.

🤝 Contributing

We welcome contributions! Please:

Fork the repository
Create a feature branch
Add tests for new functionality
Submit a pull request

Citation

If you use M3 in your research, please cite:

@article{attrach2025conversational,
  title={Conversational LLMs Simplify Secure Clinical Data Access, Understanding, and Analysis},
  author={Attrach, Rafi Al and Moreira, Pedro and Fani, Rajna and Umeton, Renato and Celi, Leo Anthony},
  journal={arXiv preprint arXiv:2507.01053},
  year={2025}
}

You can also use the "Cite this repository" button at the top of the GitHub page for other formats.

Related Projects

M3 has been forked and adapted by the community:

MCPStack-MIMIC - Integrates M3 with other MCP servers (Jupyter, sklearn, etc.)

Built with ❤️ for the medical AI community

Need help? Open an issue on GitHub or check our troubleshooting guide above.

Release history

Version	Release date
0.4.0 (this version)	Dec 19, 2025
0.3.0	Nov 4, 2025
0.2.0	Jul 11, 2025
0.1.5	Jul 3, 2025
0.1.4	Jun 8, 2025
0.1.3	Jun 8, 2025
0.1.2	Jun 8, 2025
0.1.1	Jun 7, 2025
0.1.0	Jun 7, 2025

Download files

Source Distribution

m3_mcp-0.4.0.tar.gz (51.0 kB)

Uploaded Dec 19, 2025 Source

Built Distribution

m3_mcp-0.4.0-py3-none-any.whl (41.4 kB)

Uploaded Dec 19, 2025 Python 3

File details

Details for the file m3_mcp-0.4.0.tar.gz.

File metadata

Download URL: m3_mcp-0.4.0.tar.gz
Upload date: Dec 19, 2025
Size: 51.0 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for m3_mcp-0.4.0.tar.gz
Algorithm	Hash digest
SHA256	`4cfa9f7cb876c2fa8bf3f502fa40246e4c54fcccabd71adabd66081f648b37c0`
MD5	`33958639d474cf2ac70ccb8d405937c5`
BLAKE2b-256	`534fc182f79dc3fb0ca1bea175cf3b7b91135167f8081d2819a7b07bb7d38816`

See more details on using hashes here.
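
To check a downloaded file against the SHA256 digest listed above, something like the following sketch can be used. It assumes the archive was saved to the current directory under its original name; the helper name `sha256_of` is hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute the SHA256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Digest copied from the table above.
EXPECTED = "4cfa9f7cb876c2fa8bf3f502fa40246e4c54fcccabd71adabd66081f648b37c0"
archive = Path("m3_mcp-0.4.0.tar.gz")  # assumed download location

if archive.exists():
    print("OK" if sha256_of(archive) == EXPECTED else "MISMATCH")
else:
    print(f"{archive} not found; download it first")
```

The same function works for the wheel; swap in its file name and the SHA256 value from its own hash table.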

Provenance

The following attestation bundles were made for m3_mcp-0.4.0.tar.gz:

Publisher: publish.yaml on rafiattrach/m3

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: m3_mcp-0.4.0.tar.gz
- Subject digest: 4cfa9f7cb876c2fa8bf3f502fa40246e4c54fcccabd71adabd66081f648b37c0
- Sigstore transparency entry: 772164726
- Sigstore integration time: Dec 19, 2025
Source repository:
- Permalink: rafiattrach/m3@5a6cc34e0fd4a9b761ca2884c0ee8d547d6b3f5e
- Branch / Tag: refs/tags/v0.4.0
- Owner: https://github.com/rafiattrach
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yaml@5a6cc34e0fd4a9b761ca2884c0ee8d547d6b3f5e
- Trigger Event: release

File details

Details for the file m3_mcp-0.4.0-py3-none-any.whl.

File metadata

Download URL: m3_mcp-0.4.0-py3-none-any.whl
Upload date: Dec 19, 2025
Size: 41.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for m3_mcp-0.4.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7f76e398238d50c0275bfbf90e9958ed054be1e6e5592004ab2cae105609af51`
MD5	`a353fa15b5fb90944ef002702296d63b`
BLAKE2b-256	`1725804ac38c681a4d2876100b187695f537f7c61a95ed0b6042f39562c423a6`

See more details on using hashes here.

Provenance

The following attestation bundles were made for m3_mcp-0.4.0-py3-none-any.whl:

Publisher: publish.yaml on rafiattrach/m3

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: m3_mcp-0.4.0-py3-none-any.whl
- Subject digest: 7f76e398238d50c0275bfbf90e9958ed054be1e6e5592004ab2cae105609af51
- Sigstore transparency entry: 772164733
- Sigstore integration time: Dec 19, 2025
Source repository:
- Permalink: rafiattrach/m3@5a6cc34e0fd4a9b761ca2884c0ee8d547d6b3f5e
- Branch / Tag: refs/tags/v0.4.0
- Owner: https://github.com/rafiattrach
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yaml@5a6cc34e0fd4a9b761ca2884c0ee8d547d6b3f5e
- Trigger Event: release
