Scoring engine for MCP server quality assessment

mcp-scoring-engine

A standalone scoring engine for evaluating the quality of Model Context Protocol (MCP) servers. Pure Python, no framework dependencies — just dataclasses, scoring logic, and network probes.

Used in production by MCP Scoreboard to grade thousands of MCP servers.

Installation

pip install mcp-scoring-engine

Quick Start

Score a server from its GitHub repo (static analysis)

from mcp_scoring_engine import ServerInfo, analyze_repo, compute_score

server = ServerInfo(
    name="my-mcp-server",
    description="A tool server for doing useful things",
    repo_url="https://github.com/owner/my-mcp-server",
)

static = analyze_repo(server.repo_url)
result = compute_score(server, static_result=static)

print(result.composite_score)  # 0–100
print(result.grade)            # letter grade ("A+", "B", "D", ...) for full scores; none for partial
print(result.score_type)       # "partial" here (1 tier); "full" needs 2+ tiers

Probe a running server

from mcp_scoring_engine import (
    ServerInfo, analyze_repo, probe_server, deep_probe_server, compute_score
)

# Fast health check (~10s) — connection, initialize, ping
fast = probe_server("https://my-server.example.com/mcp")
print(fast.is_reachable, fast.connection_ms)

# Deep protocol probe (~30s) — schema validation, error handling, fuzz testing
deep = deep_probe_server("https://my-server.example.com/mcp")
print(deep.tools_count, deep.schema_valid, deep.fuzz_score)

# Score with the probe results
server = ServerInfo(name="my-server", description="...", repo_url="...")
static = analyze_repo(server.repo_url)
result = compute_score(server, static_result=static, deep_probe=deep)
print(result.grade)

Probe a stdio server

from mcp_scoring_engine import probe_server_stdio, deep_probe_server_stdio

fast = probe_server_stdio(["npx", "-y", "@modelcontextprotocol/server-memory"])
deep = deep_probe_server_stdio(["python", "-m", "my_mcp_server"])
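
The stdio probes launch the server as a subprocess and talk to it over stdin/stdout. As a rough illustration of that handshake (not the engine's actual implementation), the sketch below sends one newline-delimited JSON-RPC `initialize` request and times the reply. The names `build_initialize_request`, `handshake_stdio`, and `STUB_SERVER`, the client info, and the protocol version string are assumptions made for this example.

```python
import json
import subprocess
import sys
import time

# Hypothetical stand-in server for trying the sketch offline:
# it answers a single request with an empty result, then exits.
STUB_SERVER = [
    sys.executable,
    "-c",
    "import sys, json; req = json.loads(sys.stdin.readline()); "
    "print(json.dumps({'jsonrpc': '2.0', 'id': req['id'], 'result': {}}))",
]


def build_initialize_request(request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 `initialize` request (field values are illustrative)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": "2024-11-05",  # example version string, an assumption
            "capabilities": {},
            "clientInfo": {"name": "probe-sketch", "version": "0.0.0"},
        },
    }


def handshake_stdio(command, timeout=10.0):
    """Spawn `command`, send one initialize request, return (response, elapsed_ms).

    Simplification: stdin is closed after the request, so this relies on the
    server exiting at end of input. The timing includes process startup.
    """
    payload = json.dumps(build_initialize_request()) + "\n"
    start = time.perf_counter()
    proc = subprocess.run(
        command, input=payload, capture_output=True, text=True, timeout=timeout
    )
    elapsed_ms = (time.perf_counter() - start) * 1000
    lines = proc.stdout.splitlines()
    if not lines:
        raise RuntimeError("server produced no output")
    return json.loads(lines[0]), elapsed_ms
```

Calling `handshake_stdio(STUB_SERVER)` exercises the round trip without a real MCP server; a real probe would also keep the process alive for follow-up requests such as ping.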

Classify a server

from mcp_scoring_engine import classify_server, ServerInfo

server = ServerInfo(
    name="stripe-mcp",
    description="MCP server for Stripe payment processing",
    repo_url="https://github.com/stripe/stripe-mcp",
)

category, targets = classify_server(server)
print(category)  # "finance"
print(targets)   # ["Stripe"]

Detect red flags

from mcp_scoring_engine import detect_flags, ServerInfo

server = ServerInfo(
    name="sketchy-server",
    description="A MCP server",
    repo_url="",
    remote_endpoint_url="http://localhost:3000/mcp",
)

flags = detect_flags(server)
for flag in flags:
    print(f"[{flag.severity}] {flag.label}: {flag.description}")
    # [critical] No Source Code: No repository URL or source link provided
    # [warning] Staging Artifact: Endpoint URL contains localhost or staging reference
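
The two flags in the sample output suggest simple rule-based checks over the server metadata. Here is a minimal sketch of how such rules might look, using stand-in dataclasses rather than the package's own types. The flag keys, the hint list, and the rule logic are assumptions; only the severity, label, and description strings come from the output above.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class SketchServer:
    """Stand-in for ServerInfo with only the fields the rules need."""
    name: str
    repo_url: str = ""
    remote_endpoint_url: str = ""


@dataclass
class SketchFlag:
    """Stand-in for Flag; field names follow the Types list."""
    key: str
    severity: str
    label: str
    description: str


# Hypothetical substrings that mark a non-production endpoint host.
STAGING_HINTS = ("localhost", "127.0.0.1", "staging")


def sketch_detect_flags(server: SketchServer) -> list:
    flags = []
    # Rule 1: no repository URL means the source cannot be inspected.
    if not server.repo_url.strip():
        flags.append(SketchFlag(
            "no_source", "critical", "No Source Code",
            "No repository URL or source link provided",
        ))
    # Rule 2: the endpoint host looks like a local or staging address.
    host = (urlparse(server.remote_endpoint_url).hostname or "").lower()
    if any(hint in host for hint in STAGING_HINTS):
        flags.append(SketchFlag(
            "staging_artifact", "warning", "Staging Artifact",
            "Endpoint URL contains localhost or staging reference",
        ))
    return flags
```

Keeping each rule as an independent check makes it easy to add or drop rules without touching the others.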

Architecture

The engine evaluates servers across three data tiers:

| Tier | Source | What it measures |
|---|---|---|
| Tier 1: Static Analysis | GitHub repo | Schema completeness, description quality, documentation, maintenance pulse, dependency health, license clarity, version hygiene |
| Tier 2: Protocol Probe | Live server | Connection health, tool schema validation, error handling, fuzz resilience, auth discovery |
| Tier 3: Reliability | Rolling window | Uptime percentage, p50/p95 latency |
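
Tier 3 is derived from a rolling window of probe observations. To illustrate the arithmetic only, the sketch below computes an uptime percentage and p50/p95 latency from such a window. The engine's window size and percentile method are not documented here, so `ProbeSample`, `summarize_window`, and the choice of the "inclusive" quantile method are assumptions.

```python
from dataclasses import dataclass
from statistics import quantiles


@dataclass
class ProbeSample:
    """One hypothetical observation in the rolling window."""
    reachable: bool
    latency_ms: float


def summarize_window(samples: list) -> dict:
    if not samples:
        raise ValueError("window is empty")
    up = [s for s in samples if s.reachable]
    uptime_pct = 100.0 * len(up) / len(samples)
    # Latency percentiles only consider successful probes.
    latencies = [s.latency_ms for s in up]
    if len(latencies) < 2:
        # quantiles() needs at least two data points.
        p50 = p95 = latencies[0] if latencies else None
    else:
        cuts = quantiles(latencies, n=100, method="inclusive")
        p50, p95 = cuts[49], cuts[94]  # 50th and 95th percentile cut points
    return {"uptime_pct": uptime_pct, "p50_ms": p50, "p95_ms": p95}
```

A summary like this is the kind of pre-computed input a reliability score could be built from.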

The composite score is a weighted blend of five categories:

| Category | Weight |
|---|---|
| Schema & Docs | 25% |
| Protocol Compliance | 20% |
| Reliability | 20% |
| Maintenance | 15% |
| Security | 20% |
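
To make the blend concrete, here is a minimal sketch of a weighted composite using the weights from the table. How the engine treats categories with no data is not stated above; renormalizing over the categories that are present is an assumption, and the dictionary keys and the function name `blend` are hypothetical.

```python
from typing import Optional

# Weights in percentage points, taken from the table above.
WEIGHTS = {
    "schema_docs": 25,
    "protocol": 20,
    "reliability": 20,
    "maintenance": 15,
    "security": 20,
}


def blend(category_scores: dict) -> Optional[int]:
    """Weighted average of 0-100 category scores; None values are skipped."""
    available = {k: v for k, v in category_scores.items() if v is not None}
    total_weight = sum(WEIGHTS[k] for k in available)
    if total_weight == 0:
        return None
    weighted = sum(WEIGHTS[k] * v for k, v in available.items())
    # Assumption: renormalize so missing categories do not drag the score down.
    return round(weighted / total_weight)
```

With all five categories scored 80, 90, 70, 60, and 100 in table order, the blend is (25·80 + 20·90 + 20·70 + 15·60 + 20·100) / 100 = 81.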

Score types:

  • partial — Only 1 data tier available. Numeric score but no letter grade.
  • full — 2+ data tiers. Graded A+ through F.
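
The rule above can be stated as a small function. This is a sketch only: the argument names mirror the optional parameters of `compute_score`, but the function name is hypothetical, the engine may count tiers differently, and raising on zero tiers is an assumption.

```python
def sketch_score_type(static_result=None, deep_probe=None, reliability=None) -> str:
    """Return "partial" for one available tier and "full" for two or more."""
    tiers = sum(x is not None for x in (static_result, deep_probe, reliability))
    if tiers == 0:
        raise ValueError("at least one data tier is required")
    return "full" if tiers >= 2 else "partial"
```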

API Reference

Core

| Function | Description |
|---|---|
| compute_score(server, static_result?, deep_probe?, reliability?) | Compute weighted composite score → ScoreResult |
| score_to_grade(score) | Convert 0–100 → letter grade (A+, A, B, C, D, F) |
| classify_server(server) | Categorize a server → (category, target_platforms) |
| detect_flags(server, context?) | Detect red flags → list[Flag] |
| generate_badges(server, static_result?, deep_probe?, reliability?, flags?) | Generate display badges → dict |

Probes

| Function | Description |
|---|---|
| probe_server(url) | Fast health check over HTTP → FastProbeResult |
| probe_server_stdio(command) | Fast health check over stdio → FastProbeResult |
| deep_probe_server(url) | Full protocol probe over HTTP → DeepProbeResult |
| deep_probe_server_stdio(command) | Full protocol probe over stdio → DeepProbeResult |
| analyze_repo(repo_url) | Static analysis of GitHub repo → StaticAnalysis |
| compute_reliability_score(data) | Score from uptime + latency → int |

Types

All inputs and outputs are plain dataclasses:

  • ServerInfo — Input server metadata (name, description, repo_url, etc.)
  • ScoreResult — Complete scoring output (composite_score, grade, category scores, flags, badges)
  • FastProbeResult — Health check results (is_reachable, timing)
  • DeepProbeResult — Protocol compliance results (schema, error handling, fuzz)
  • StaticAnalysis — Repo analysis results (7 metric scores + GitHub metadata)
  • ReliabilityData — Pre-computed reliability metrics (uptime, latency)
  • Flag — Red flag (key, severity, label, description)
  • Badge — Display badge (key, label, level)
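
Because these types are plain dataclasses, they can be serialized with the standard library alone. The sketch below uses a stand-in `SketchBadge` whose field names follow the list above. The field types (all strings) and the sample values are assumptions, not the package's definitions.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class SketchBadge:
    """Stand-in for Badge; types and values are illustrative."""
    key: str
    label: str
    level: str


badge = SketchBadge(key="docs", label="Well Documented", level="gold")

# asdict() converts the dataclass (recursively) into plain dicts and lists,
# which json.dumps can encode without a custom encoder.
payload = json.dumps(asdict(badge), indent=2)
print(payload)
```

The same pattern should apply to any result object that holds only JSON-friendly field values.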

License

MIT
