MCP server for LLM inference performance prediction — Traction Layer AI
Inference Predictor MCP Server
MCP server for Inference Predictor by Traction Layer AI. It predicts LLM inference performance, including time to first token (TTFT), throughput, and cost, for a given hardware × model × runtime configuration, directly from Claude.
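This page does not say how the predictor turns throughput into cost. As an illustration only, assume cost is quoted per generated token on a fully utilized instance billed by the hour. Under that assumption, cost is the hourly price divided by tokens per hour. For example, a hypothetical $3.60/hour instance sustaining 1,000 tokens/s produces 3.6 million tokens per hour, which works out to about $1.00 per million tokens.

```python
def cost_per_million_tokens(hourly_price_usd: float, tokens_per_second: float) -> float:
    """USD per 1M generated tokens, assuming 100% utilization and hourly billing.

    Illustrative formula only; this is not the predictor's own cost model.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return hourly_price_usd / tokens_per_hour * 1_000_000


if __name__ == "__main__":
    # Hypothetical inputs matching the worked example above.
    print(f"${cost_per_million_tokens(3.60, 1000):.2f} per 1M tokens")
```

Real deployments rarely run at full utilization, so treat this as a lower bound on cost per token.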
Installation
```shell
pip install inference-predictor-mcp
```
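The Claude Desktop configuration starts the server with `python -m inference_predictor_mcp`, so the package must be importable from that exact interpreter. Below is a minimal standard-library check. The helper name `is_installed` is hypothetical. Run it with the same `python` that the config uses:

```python
import importlib.util


def is_installed(module_name: str = "inference_predictor_mcp") -> bool:
    """Return True if `module_name` can be found by the current interpreter."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("installed" if is_installed() else "not installed")
```

If it prints `not installed`, install the package into that interpreter, or set `command` in the config to the full path of the interpreter you installed into.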
Claude Desktop Configuration
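Add the following to your Claude Desktop config file. On macOS the file is `~/Library/Application Support/Claude/claude_desktop_config.json`; on Windows it is `%APPDATA%\Claude\claude_desktop_config.json`. The `command` must resolve to the Python interpreter in which the package is installed: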
```json
{
  "mcpServers": {
    "inference-predictor": {
      "command": "python",
      "args": ["-m", "inference_predictor_mcp"]
    }
  }
}
```
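If the config file already lists other MCP servers, the entry must be merged into the existing `mcpServers` object rather than replacing it. The sketch below performs that merge. The function names are hypothetical, and the path is the macOS one given above:

```python
import json
from pathlib import Path

# macOS location from the instructions above; other platforms keep the file elsewhere.
CONFIG_PATH = Path.home() / "Library" / "Application Support" / "Claude" / "claude_desktop_config.json"


def with_inference_predictor(config: dict) -> dict:
    """Return a copy of `config` with the inference-predictor server added.

    Servers already listed under "mcpServers" are kept; an existing
    inference-predictor entry is replaced.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["inference-predictor"] = {
        "command": "python",
        "args": ["-m", "inference_predictor_mcp"],
    }
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path = CONFIG_PATH) -> None:
    """Merge the entry into the config file on disk, creating the file if needed."""
    config = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(with_inference_predictor(config), indent=2))
```

Claude Desktop reads this file at startup, so restart it after editing.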
To enable the Pro tools (compare, optimize, batch sweep, visualize, and SKU listing), pass your Pro API key in an `env` block:
```json
{
  "mcpServers": {
    "inference-predictor": {
      "command": "python",
      "args": ["-m", "inference_predictor_mcp"],
      "env": {
        "KPG_API_KEY": "your-api-key-here"
      }
    }
  }
}
```
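The free and Pro configurations differ only in the `env` block. The sketch below generates either variant and reads the key from the current shell, so the key is not hard-coded in a script. The helper name `server_entry` is hypothetical:

```python
import json
import os
from typing import Optional


def server_entry(api_key: Optional[str] = None) -> dict:
    """Build the inference-predictor entry; add KPG_API_KEY only when a key is given."""
    entry = {"command": "python", "args": ["-m", "inference_predictor_mcp"]}
    if api_key:
        entry["env"] = {"KPG_API_KEY": api_key}
    return entry


if __name__ == "__main__":
    # Without KPG_API_KEY set in the shell, this prints the free-tier config.
    config = {"mcpServers": {"inference-predictor": server_entry(os.environ.get("KPG_API_KEY"))}}
    print(json.dumps(config, indent=2))
```

The printed JSON contains the key in plain text, so treat the resulting config file as a secret.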
Available Tools
Free Tier (no API key required)
| Tool | Description |
|---|---|
| `predict_performance` | Predict TTFT, throughput, and cost for a single hardware config |
| `check_hardware_compatibility` | Check which GPUs can fit a model's weights |
| `explain_model` | Generate an educational architecture explainer |
| `list_models` | List all 18 registered models with their parameters |
| `health_check` | Check API health and version |
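`check_hardware_compatibility` reports which GPUs can hold a model's weights, but this page does not document its exact method. A common back-of-the-envelope version, assumed here for illustration, multiplies the parameter count by the bytes per parameter of the weight format and compares the result with GPU memory. For example, a 70B-parameter model in FP16 needs about 140 GB for its weights. That is too much for a single 80 GB GPU but fits across two. The names below are hypothetical:

```python
# Approximate bytes per parameter by weight format. Assumption: quantized formats
# are counted without their scale/zero-point overhead.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "bf16": 2.0, "fp8": 1.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, dtype: str = "fp16") -> float:
    """Weights-only footprint in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def weights_fit(num_params: float, gpu_memory_gb: float, num_gpus: int = 1, dtype: str = "fp16") -> bool:
    """True if the weights alone fit in the combined memory of `num_gpus` GPUs.

    Ignores KV cache, activations, and runtime overhead, so a True result is
    necessary but not sufficient for actually serving the model.
    """
    return weight_memory_gb(num_params, dtype) <= gpu_memory_gb * num_gpus
```

Summing memory across GPUs assumes the weights can be split evenly, for example with tensor parallelism.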
Pro Tier (API key required)
| Tool | Description |
|---|---|
| `compare_configs` | Compare vLLM, SGLang, and TensorRT-LLM side by side |
| `find_optimal_hardware` | Search for the cheapest or fastest hardware config |
| `batch_size_sweep` | Sweep batch sizes to find optimal throughput |
| `visualize_kpg` | Generate an interactive Kernel Pipeline Graph |
| `list_hardware_skus` | List AWS GPU instance SKUs with pricing |
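Claude issues these tool calls for you. Under the hood, an MCP client invokes a tool by sending a JSON-RPC 2.0 `tools/call` request that carries the tool `name` and an `arguments` object. The sketch below builds such a request. The argument names in the example (`model`, `hardware`) are hypothetical, because this page does not document each tool's input schema. A client can discover the real schemas with a `tools/list` request.

```python
import json
from typing import Any, Dict


def tools_call_request(request_id: int, tool_name: str, arguments: Dict[str, Any]) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


if __name__ == "__main__":
    # Hypothetical arguments: check the real schema via `tools/list` first.
    print(tools_call_request(1, "compare_configs", {"model": "<model-id>", "hardware": "<gpu-sku>"}))
```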
Get a Pro API Key
Visit predictor.tractionlayer.ai to obtain a Pro API key.
Environment Variables
| Variable | Default | Description |
|---|---|---|
| `KPG_API_BASE_URL` | Production API Gateway | Override for self-hosted deployments |
| `KPG_API_KEY` | (none) | Pro tier API key |
| `KPG_TIMEOUT` | `30` | HTTP timeout in seconds |
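The sketch below shows how a client might resolve these variables and apply the documented defaults. It is not the package's own code, and the class and function names are hypothetical. The production gateway URL is not given on this page, so `None` stands in for "use the built-in default". Parsing `KPG_TIMEOUT` as a float is also an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class PredictorSettings:
    base_url: Optional[str]  # None: use the built-in production gateway
    api_key: Optional[str]   # None: free tier only
    timeout_s: float


def load_settings(env: Mapping[str, str] = os.environ) -> PredictorSettings:
    """Read the KPG_* variables, applying the defaults from the table above."""
    raw_timeout = env.get("KPG_TIMEOUT", "30")
    try:
        timeout_s = float(raw_timeout)
    except ValueError:
        raise ValueError(f"KPG_TIMEOUT must be a number of seconds, got {raw_timeout!r}") from None
    return PredictorSettings(
        base_url=env.get("KPG_API_BASE_URL") or None,
        api_key=env.get("KPG_API_KEY") or None,
        timeout_s=timeout_s,
    )
```

When running under Claude Desktop, set these variables in the same `env` block that holds `KPG_API_KEY`.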
Development
If you work on both this package and the main KPG_Predictor repo in the same virtualenv, you may hit a starlette version conflict between the MCP SDK (>=1.0.0) and FastAPI, which requires starlette <0.49.0. To resolve it, install this package first, then explicitly pin starlette:

```shell
pip install -e mcp/
pip install "starlette<0.49.0,>=0.40.0"
```
End users who only run `pip install inference-predictor-mcp` are not affected.
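To confirm that the pin took effect in a shared virtualenv, you can run a small standard-library check. This is a sketch, and the function names are hypothetical. Its version comparison is deliberately simplified rather than full PEP 440. For robust comparisons, the `packaging` library's `Version` and `SpecifierSet` are the usual choice.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Tuple


def release_tuple(v: str, width: int = 3) -> Tuple[int, ...]:
    """Turn '0.45.3' into (0, 45, 3).

    Simplified on purpose: suffixes such as 'rc1' or '.post1' are ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", v)
    if match is None:
        raise ValueError(f"unrecognized version string: {v!r}")
    nums = [int(part) for part in match.group(0).split(".")]
    nums += [0] * (width - len(nums))  # pad so '0.40' compares equal to '0.40.0'
    return tuple(nums)


def within_pin(v: str, low: str = "0.40.0", high: str = "0.49.0") -> bool:
    """True if low <= v < high, i.e. the range of the pin 'starlette<0.49.0,>=0.40.0'."""
    return release_tuple(low) <= release_tuple(v) < release_tuple(high)


if __name__ == "__main__":
    try:
        installed = version("starlette")
    except PackageNotFoundError:
        print("starlette is not installed")
    else:
        status = "inside" if within_pin(installed) else "outside"
        print(f"starlette {installed} is {status} the pinned range")
```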
Documentation
Full Inference Predictor documentation: docs/cli.md
Download files
File details
Details for the file inference_predictor_mcp-0.1.0.tar.gz.
File metadata
- Download URL: inference_predictor_mcp-0.1.0.tar.gz
- Upload date:
- Size: 6.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `323da3b0d3c37618e4d14bef9dbd71f91a141759487a7710f499a63bcc3da359` |
| MD5 | `e2012220b8ee01fb0b22f4fd092c8072` |
| BLAKE2b-256 | `6df3473392c0c644d713b47a809ba222f1111236b82b19bba591f53dcb4f1b20` |
File details
Details for the file inference_predictor_mcp-0.1.0-py3-none-any.whl.
File metadata
- Download URL: inference_predictor_mcp-0.1.0-py3-none-any.whl
- Upload date:
- Size: 7.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `7436d889fabc20dcaaed64703d7fa2ef82a6ccd6dcfa95b86517139eaa99e7f8` |
| MD5 | `8b47d4732d98f9c667850ff656fcf657` |
| BLAKE2b-256 | `82405e866f8681d8def6dfbb0681e12f6851de22d1c552f3ad3403b737240861` |
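The sketch below checks a downloaded file against the published SHA256 digests using only the standard library. The helper names are hypothetical. pip can also enforce digests itself with `--require-hashes` and a requirements file.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the file details for version 0.1.0.
EXPECTED_SHA256 = {
    "inference_predictor_mcp-0.1.0.tar.gz": "323da3b0d3c37618e4d14bef9dbd71f91a141759487a7710f499a63bcc3da359",
    "inference_predictor_mcp-0.1.0-py3-none-any.whl": "7436d889fabc20dcaaed64703d7fa2ef82a6ccd6dcfa95b86517139eaa99e7f8",
}


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large files are not loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path) -> bool:
    """True if the file's SHA256 matches the published digest for its file name."""
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no published digest for {path.name}")
    return sha256_of(path) == expected
```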