MCP server for datacenter GPU liquid cooling thermal analysis

Python 3.10+

thermal-mcp-server

An NVL72 rack dissipates 120 kW through liquid cooling. Choosing the wrong flow rate, coolant, or manifold configuration means thermal throttling (lost compute revenue) or overprovisioned cooling (wasted capex). thermal-mcp-server quantifies the flow-rate and coolant tradeoffs for a single cold plate using a first-principles thermal resistance model exposed as an MCP server.

GPU Thermal Specs

| Chip | TDP (cold plate sizing) | Tj Design Ceiling | Source |
| --- | --- | --- | --- |
| NVIDIA H100 SXM | 700 W | 83°C (throttle onset) | NVIDIA H100 Datasheet |
| NVIDIA B200 (NVL72) | 1,200 W | ~75°C (not NVIDIA-published) | NVIDIA GB200 NVL72, SemiAnalysis |
| NVIDIA B200 (HGX standalone) | 1,000 W | Not published | Lenovo ThinkSystem HGX B200 Product Guide |
| AMD MI300X | 750 W | Not published | AMD MI300X Data Sheet (PDF) |
| Intel Gaudi 3 OAM | 900 W (air) / 1,200 W (liquid) | Not published | Intel Gaudi 3 Product Brief (PDF) |

Note: TDP values are for cold plate thermal sizing, not electrical nameplate. The 1,200 W figure for the B200 in the GB200 NVL72 reflects per-GPU heat dissipation in the liquid-cooled configuration. It is not the rack's 120 kW divided across 72 GPUs (about 1,667 W each), which would fold in CPUs, NVSwitches, NICs, and VRM losses.

Demo

Single H100 SXM cold plate analysis — 700 W, water, 10 LPM, 35°C inlet:

from thermal_mcp_server.physics import analyze
from thermal_mcp_server.schemas import AnalyzeColdplateInput

result = analyze(AnalyzeColdplateInput(
    heat_load_w=700, flow_rate_lpm=10, inlet_temp_c=35.0, coolant="water"
))

Output:

{
  "coolant": "water",
  "regime": "turbulent",
  "reynolds": 4667.6,
  "nusselt": 41.11,
  "heat_transfer_coeff_w_m2k": 24667.2,
  "pressure_drop_pa": 26503.0,
  "pump_power_w": 8.83,
  "coolant_rise_c": 1.01,
  "junction_temp_c": 80.69,
  "resistances_k_per_w": {
    "junction_to_case": 0.04,
    "tim": 0.02,
    "base_conduction": 0.00052,
    "convection": 0.00403,
    "total": 0.06455
  },
  "warnings": []
}

At 10 LPM of water with a 35°C inlet, the H100 junction runs at 80.7°C, leaving 2.3°C of margin below the 83°C throttle point. This is a tight operating point. Convection accounts for only about 6% of the total resistance (0.00403 of 0.06455 K/W), so raising the flow rate adds little margin; lowering the inlet temperature to 25°C adds roughly 10°C.
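The reported values can be cross-checked by hand. The sketch below uses only numbers printed in the output above and rests on two assumptions inferred from those numbers, not taken from the source code: that junction temperature is referenced to the mean coolant temperature (inlet plus half the coolant rise), and that pump power is hydraulic power divided by a pump efficiency of 0.5.

```python
# Cross-check of the demo output using only values printed above.
heat_load_w = 700.0
inlet_temp_c = 35.0
flow_rate_lpm = 10.0
coolant_rise_c = 1.01
pressure_drop_pa = 26503.0

resistances_k_per_w = {
    "junction_to_case": 0.04,
    "tim": 0.02,
    "base_conduction": 0.00052,
    "convection": 0.00403,
}

# Series network: the resistances simply add.
r_total = sum(resistances_k_per_w.values())

# ASSUMPTION (inferred): T_j = T_inlet + dT_coolant / 2 + Q * R_total
junction_temp_c = inlet_temp_c + coolant_rise_c / 2 + heat_load_w * r_total

# ASSUMPTION (inferred): pump efficiency of 0.5
PUMP_EFFICIENCY = 0.5
flow_m3_per_s = flow_rate_lpm / 1000.0 / 60.0
pump_power_w = pressure_drop_pa * flow_m3_per_s / PUMP_EFFICIENCY

convection_share = resistances_k_per_w["convection"] / r_total

print(f"R_total          = {r_total:.5f} K/W")
print(f"T_junction       = {junction_temp_c:.2f} C")
print(f"Pump power       = {pump_power_w:.2f} W")
print(f"Convection share = {convection_share:.1%}")
```

Under these assumptions the script reproduces the 80.69°C junction temperature and 8.83 W pump power from the output, which suggests the assumptions match the model, though the source remains the authority.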

How It Works

The physics engine models a single cold plate as a 1D thermal resistance network: junction-to-case (R_jc), thermal interface material (R_tim), copper base conduction, and forced convection to the coolant. Convective heat transfer uses the Dittus-Boelter correlation for turbulent flow and a constant Nu = 4.36 for laminar flow, with linear blending in the transition regime (Re 2300–4000). Pressure drop uses Darcy-Weisbach with Blasius friction factor, also blended through the transition regime. All assumptions — pump efficiency, channel geometry, property values — are documented inline in the source.

flowchart LR
    A["Input\nchip power, flow rate,\ncoolant, geometry"] --> B["Physics Engine\nDittus-Boelter, Darcy-Weisbach,\nR_total network"]
    B --> C["Output\nT_junction, ΔP,\nthermal margin, pump power"]
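The correlations named above can be sketched in a few lines. This is an illustrative sketch, not the package's implementation: the function names are hypothetical, the laminar friction factor 64/Re is an assumption (the text names only Blasius), and the blend is assumed to weight the laminar and turbulent values linearly in Re between 2300 and 4000. The Prandtl exponent of 0.4 is the standard Dittus-Boelter value for a fluid being heated.

```python
RE_LAMINAR_MAX = 2300.0
RE_TURBULENT_MIN = 4000.0
NU_LAMINAR = 4.36  # constant Nusselt number used for laminar flow


def _blend(re: float, laminar: float, turbulent: float) -> float:
    """Linear blend across the transition regime (assumed weighting in Re)."""
    if re <= RE_LAMINAR_MAX:
        return laminar
    if re >= RE_TURBULENT_MIN:
        return turbulent
    w = (re - RE_LAMINAR_MAX) / (RE_TURBULENT_MIN - RE_LAMINAR_MAX)
    return (1.0 - w) * laminar + w * turbulent


def nusselt(re: float, pr: float) -> float:
    """Dittus-Boelter for turbulent flow, constant Nu for laminar flow."""
    nu_turbulent = 0.023 * re**0.8 * pr**0.4
    return _blend(re, NU_LAMINAR, nu_turbulent)


def friction_factor(re: float) -> float:
    """Blasius for turbulent flow; 64/Re for laminar flow (assumption)."""
    f_laminar = 64.0 / re
    f_turbulent = 0.316 * re**-0.25
    return _blend(re, f_laminar, f_turbulent)


def pressure_drop_pa(re: float, length_m: float, d_h_m: float,
                     density_kg_m3: float, velocity_m_s: float) -> float:
    """Darcy-Weisbach: dP = f * (L / D_h) * (rho * v^2 / 2)."""
    dynamic_pressure = 0.5 * density_kg_m3 * velocity_m_s**2
    return friction_factor(re) * (length_m / d_h_m) * dynamic_pressure


if __name__ == "__main__":
    for re in (1000.0, 3000.0, 5000.0):
        print(re, round(nusselt(re, pr=6.0), 2), round(friction_factor(re), 4))
```

The heat transfer coefficient then follows from h = Nu · k / D_h, and the convective resistance from 1 / (h · A), where the channel geometry and coolant properties are the ones documented in the source.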

Quick Start

Install from PyPI:

python -m venv thermal-venv
source thermal-venv/bin/activate
pip install thermal-mcp-server

Configure in your MCP client (e.g., Claude Desktop claude_desktop_config.json):

{
  "mcpServers": {
    "thermal": {
      "command": "/absolute/path/to/thermal-venv/bin/python",
      "args": ["-m", "thermal_mcp_server"]
    }
  }
}

Important: Use the absolute path to your venv's Python binary. Claude Desktop does not inherit your shell's PATH, so bare python will fail with "No such file or directory."
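One way to avoid typing the path by hand is to let the venv's own interpreter report it. The snippet below is a small convenience sketch, not part of the package: run it with the activated venv's Python and it prints a config block with `command` filled in from `sys.executable`. The server key `thermal` and the `args` are copied from the example above.

```python
import json
import sys

# sys.executable is the path of the interpreter running this script,
# so run it with the venv's Python to get the venv's binary.
config = {
    "mcpServers": {
        "thermal": {
            "command": sys.executable,
            "args": ["-m", "thermal_mcp_server"],
        }
    }
}

print(json.dumps(config, indent=2))
```

Paste the printed block into claude_desktop_config.json, merging it with any existing `mcpServers` entries.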

Install from source (for development):

git clone https://github.com/riccardovietri/thermal-mcp-server.git
cd thermal-mcp-server
python -m venv venv && source venv/bin/activate
pip install -e .

See the MCP documentation for client setup details.

Usage with Claude

Once configured, ask Claude natural-language questions about liquid cooling:

"I have 8 H100 SXM GPUs at 700 W each with water cooling at 8 LPM per cold plate and 25°C inlet. What's the junction temperature and am I within thermal margin?"

Claude calls analyze_coldplate and interprets the result:

"At 8 LPM with 25°C inlet water, each H100 runs at 70.9°C junction — 12.1°C below the 83°C throttle onset. Convective resistance (0.004 K/W) is small relative to the package resistances (R_jc + R_tim = 0.06 K/W), so increasing flow rate has diminishing returns. You have room to reduce flow to ~5.5 LPM before hitting margin, which would cut pump power roughly in half."

This works in Claude Desktop, Claude.ai with MCP, or any MCP-compatible client.

Tools

  • analyze_coldplate — Single-point thermal and hydraulic analysis. Takes heat load, flow rate, inlet temperature, coolant type, and geometry. Returns junction temperature, thermal resistances, pressure drop, and pump power.

  • compare_coolants — Runs analyze_coldplate for water and 50/50 glycol under identical conditions. Returns side-by-side junction temperature, pressure drop, and pump power for each coolant.

  • optimize_flow_rate — Binary search for the minimum flow rate that keeps junction temperature at or below a target. Returns the minimum flow rate and the full thermal analysis at that operating point.

See docs/mcp.md for full input/output schemas.
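The binary search behind optimize_flow_rate can be illustrated with a minimal sketch. Everything here is hypothetical: `min_flow_for_target`, its bounds and tolerance, and the toy temperature model are stand-ins chosen for illustration, not the package's code. The search relies on junction temperature decreasing monotonically as flow rate increases.

```python
from typing import Callable


def min_flow_for_target(
    junction_temp_c: Callable[[float], float],
    target_c: float,
    lo_lpm: float = 0.5,
    hi_lpm: float = 20.0,
    tol_lpm: float = 0.01,
) -> float:
    """Smallest flow rate (within tol_lpm) with junction temp <= target_c.

    Assumes junction_temp_c decreases monotonically with flow rate.
    """
    if junction_temp_c(hi_lpm) > target_c:
        raise ValueError("target not reachable even at the maximum flow rate")
    if junction_temp_c(lo_lpm) <= target_c:
        return lo_lpm  # already within target at the minimum flow rate

    # Invariant: T(lo) > target and T(hi) <= target.
    while hi_lpm - lo_lpm > tol_lpm:
        mid = 0.5 * (lo_lpm + hi_lpm)
        if junction_temp_c(mid) <= target_c:
            hi_lpm = mid
        else:
            lo_lpm = mid
    return hi_lpm


def toy_junction_temp_c(flow_lpm: float) -> float:
    """Toy stand-in model: fixed package resistance plus a convective
    term that shrinks with flow. Not the package's physics."""
    return 25.0 + 700.0 * (0.0605 + 0.02 / flow_lpm**0.8)


flow = min_flow_for_target(toy_junction_temp_c, target_c=75.0)
print(f"minimum flow: {flow:.2f} LPM, T_j = {toy_junction_temp_c(flow):.2f} C")
```

Returning the upper bound of the final bracket keeps the result on the safe side of the target; returning the midpoint could overshoot the temperature limit by a fraction of the tolerance.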

Scope

This tool models steady-state, single-cold-plate performance. Rack-level manifold modeling and transient thermal response are on the roadmap.

Roadmap:

  • Rack-level series/parallel manifold model (NVL72 validation target: 80 LPM, 120 kW, 1.5 bar max ΔP)
  • Transient thermal response (power-on ramp, workload spikes)
  • Coolant cost-performance comparison (water vs. glycol vs. engineered fluids)
  • Flow maldistribution sensitivity analysis
