Scoring engine for MCP server quality assessment

mcp-scoring-engine

A standalone scoring engine for evaluating the quality of Model Context Protocol (MCP) servers. Pure Python, no framework dependencies — just dataclasses, scoring logic, and network probes.

Used in production by MCP Scoreboard to grade thousands of MCP servers.

Installation

pip install mcp-scoring-engine

Quick Start

Score a server from its GitHub repo (static analysis)

from mcp_scoring_engine import ServerInfo, analyze_repo, compute_score

server = ServerInfo(
    name="my-mcp-server",
    description="A tool server for doing useful things",
    repo_url="https://github.com/owner/my-mcp-server",
)

static = analyze_repo(server.repo_url)
result = compute_score(server, static_result=static)

print(result.composite_score)  # 0–100
print(result.grade)            # "A+", "B", "D", etc. (letter grades apply to "full" scores only)
print(result.score_type)       # "partial" (1 tier) or "full" (2+ tiers)

Probe a running server

from mcp_scoring_engine import (
    ServerInfo, analyze_repo, probe_server, deep_probe_server, compute_score
)

# Fast health check (~10s) — connection, initialize, ping
fast = probe_server("https://my-server.example.com/mcp")
print(fast.is_reachable, fast.connection_ms)

# Deep protocol probe (~30s) — schema validation, error handling, fuzz testing
deep = deep_probe_server("https://my-server.example.com/mcp")
print(deep.tools_count, deep.schema_valid, deep.fuzz_score)

# Score with the probe results
server = ServerInfo(name="my-server", description="...", repo_url="...")
static = analyze_repo(server.repo_url)
result = compute_score(server, static_result=static, deep_probe=deep)
print(result.grade)

Probe a stdio server

from mcp_scoring_engine import probe_server_stdio, deep_probe_server_stdio

fast = probe_server_stdio(["npx", "-y", "@modelcontextprotocol/server-memory"])
deep = deep_probe_server_stdio(["python", "-m", "my_mcp_server"])

Classify a server

from mcp_scoring_engine import classify_server, ServerInfo

server = ServerInfo(
    name="stripe-mcp",
    description="MCP server for Stripe payment processing",
    repo_url="https://github.com/stripe/stripe-mcp",
)

category, targets = classify_server(server)
print(category)  # "finance"
print(targets)   # ["Stripe"]

Detect red flags

from mcp_scoring_engine import detect_flags, ServerInfo

server = ServerInfo(
    name="sketchy-server",
    description="A MCP server",
    repo_url="",
    remote_endpoint_url="http://localhost:3000/mcp",
)

flags = detect_flags(server)
for flag in flags:
    print(f"[{flag.severity}] {flag.label}: {flag.description}")
    # [critical] No Source Code: No repository URL or source link provided
    # [warning] Staging Artifact: Endpoint URL contains localhost or staging reference
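
To make the "Staging Artifact" flag above more concrete, here is a minimal sketch of how such a check could work. It is not the engine's implementation: the function name `looks_like_staging`, the host list, and the substring rule are all assumptions made for illustration.

```python
from urllib.parse import urlparse

# Hypothetical marker lists; the engine's real rules are not documented here.
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}
STAGING_HINT = "staging"


def looks_like_staging(endpoint_url: str) -> bool:
    """Return True if the endpoint host is local or mentions 'staging'."""
    # hostname is None for empty or malformed URLs, so fall back to "".
    host = (urlparse(endpoint_url).hostname or "").lower()
    if host in LOCAL_HOSTS:
        return True
    return STAGING_HINT in host


print(looks_like_staging("http://localhost:3000/mcp"))
print(looks_like_staging("https://my-server.example.com/mcp"))
```

Parsing the URL first, rather than searching the raw string, avoids flagging a path segment that merely happens to contain the word "localhost".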

Architecture

The engine evaluates servers across three data tiers:

| Tier | Source | What it measures |
| --- | --- | --- |
| Tier 1: Static Analysis | GitHub repo | Schema completeness, description quality, documentation, maintenance pulse, dependency health, license clarity, version hygiene |
| Tier 2: Protocol Probe | Live server | Connection health, tool schema validation, error handling, fuzz resilience, auth discovery |
| Tier 3: Reliability | Rolling window | Uptime percentage, p50/p95 latency |
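
The Tier 3 metrics can be illustrated with a short sketch that derives uptime percentage and p50/p95 latency from a window of health checks. The `(is_up, latency_ms)` sample shape, the function names, and the nearest-rank percentile method are assumptions for this example; the engine may collect and aggregate its window differently.

```python
import math


def percentile(sorted_values, pct):
    """Nearest-rank percentile of a non-empty ascending list."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def summarize_window(checks):
    """Summarize a window of (is_up, latency_ms) samples.

    Latency is only counted for successful checks.
    """
    if not checks:
        raise ValueError("empty window")
    latencies = sorted(ms for is_up, ms in checks if is_up)
    return {
        "uptime_pct": 100 * len(latencies) / len(checks),
        "p50_ms": percentile(latencies, 50) if latencies else None,
        "p95_ms": percentile(latencies, 95) if latencies else None,
    }


window = [(True, 100), (True, 200), (True, 300), (False, 0)]
print(summarize_window(window))
```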

The composite score is a weighted blend of five categories:

| Category | Weight |
| --- | --- |
| Schema & Docs | 25% |
| Protocol Compliance | 20% |
| Reliability | 20% |
| Maintenance | 15% |
| Security | 20% |
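
As a worked example of the weighted blend, the sketch below combines five category scores using the weights from the table. The dictionary keys and the function `weighted_composite` are hypothetical, and the sketch requires all five categories; how the engine handles a category with no data is not described here.

```python
# Weights in percent, taken from the category table above.
WEIGHTS = {
    "schema_docs": 25,
    "protocol_compliance": 20,
    "reliability": 20,
    "maintenance": 15,
    "security": 20,
}


def weighted_composite(category_scores: dict) -> float:
    """Blend 0-100 category scores into a single 0-100 composite."""
    missing = WEIGHTS.keys() - category_scores.keys()
    if missing:
        raise ValueError(f"missing category scores: {sorted(missing)}")
    total = sum(WEIGHTS[name] * category_scores[name] for name in WEIGHTS)
    return total / 100


scores = {
    "schema_docs": 80,
    "protocol_compliance": 90,
    "reliability": 70,
    "maintenance": 60,
    "security": 100,
}
print(weighted_composite(scores))  # (2000 + 1800 + 1400 + 900 + 2000) / 100 = 81.0
```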

Score types:

  • partial — Only 1 data tier available. Numeric score but no letter grade.
  • full — 2+ data tiers. Graded A+ through F.
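
The rule above can be sketched as a small function that counts the available data tiers. The function name and boolean parameters are hypothetical, and returning `None` when no tier is available is an assumption; the source does not say what the engine does in that case.

```python
from typing import Optional


def score_type(has_static: bool, has_probe: bool, has_reliability: bool) -> Optional[str]:
    """Map the number of available data tiers to a score type."""
    tiers = sum([has_static, has_probe, has_reliability])
    if tiers == 0:
        return None  # assumption: nothing to score
    return "full" if tiers >= 2 else "partial"


print(score_type(True, False, False))  # static analysis only
print(score_type(True, True, False))   # static analysis plus protocol probe
```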

API Reference

Core

| Function | Description |
| --- | --- |
| compute_score(server, static_result?, deep_probe?, reliability?) | Compute weighted composite score → ScoreResult |
| score_to_grade(score) | Convert 0–100 → letter grade (A+, A, B, C, D, F) |
| classify_server(server) | Categorize a server → (category, target_platforms) |
| detect_flags(server, context?) | Detect red flags → list[Flag] |
| generate_badges(server, static_result?, deep_probe?, reliability?, flags?) | Generate display badges → dict |

Probes

| Function | Description |
| --- | --- |
| probe_server(url) | Fast health check over HTTP → FastProbeResult |
| probe_server_stdio(command) | Fast health check over stdio → FastProbeResult |
| deep_probe_server(url) | Full protocol probe over HTTP → DeepProbeResult |
| deep_probe_server_stdio(command) | Full protocol probe over stdio → DeepProbeResult |
| analyze_repo(repo_url) | Static analysis of GitHub repo → StaticAnalysis |
| compute_reliability_score(data) | Score from uptime + latency → int |

Types

All inputs and outputs are plain dataclasses:

  • ServerInfo — Input server metadata (name, description, repo_url, etc.)
  • ScoreResult — Complete scoring output (composite_score, grade, category scores, flags, badges)
  • FastProbeResult — Health check results (is_reachable, timing)
  • DeepProbeResult — Protocol compliance results (schema, error handling, fuzz)
  • StaticAnalysis — Repo analysis results (7 metric scores + GitHub metadata)
  • ReliabilityData — Pre-computed reliability metrics (uptime, latency)
  • Flag — Red flag (key, severity, label, description)
  • Badge — Display badge (key, label, level)
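
To show what a plain dataclass looks like in practice, here is a stand-in modeled on the Flag fields listed above. The class name `FlagSketch`, the `str` field types, the `frozen=True` setting, and the key value `"no_source"` are assumptions; the package's actual definition may differ.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class FlagSketch:
    """Illustrative stand-in for a red-flag record (not the package's class)."""
    key: str
    severity: str
    label: str
    description: str


flag = FlagSketch(
    key="no_source",  # hypothetical key value
    severity="critical",
    label="No Source Code",
    description="No repository URL or source link provided",
)

# Same formatting as the "Detect red flags" example.
line = f"[{flag.severity}] {flag.label}: {flag.description}"
print(line)

# Plain dataclasses convert to dicts, which makes JSON export straightforward.
print(asdict(flag))
```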

License

MIT
