Skip to main content

Scoring engine for MCP server quality assessment

Project description

mcp-scoring-engine

A standalone scoring engine for evaluating the quality of Model Context Protocol (MCP) servers. Pure Python, no framework dependencies — just dataclasses, scoring logic, and network probes.

Used in production by MCP Scoreboard to grade thousands of MCP servers.

Installation

pip install mcp-scoring-engine

Quick Start

Score a server from its GitHub repo (static analysis)

from mcp_scoring_engine import ServerInfo, analyze_repo, compute_score

server = ServerInfo(
    name="my-mcp-server",
    description="A tool server for doing useful things",
    repo_url="https://github.com/owner/my-mcp-server",
)

static = analyze_repo(server.repo_url)
result = compute_score(server, static_result=static)

print(result.composite_score)  # 0–100
print(result.grade)            # "A+", "B", "D", etc.
print(result.score_type)       # "partial" (1 tier) or "full" (2+ tiers)

Probe a running server

from mcp_scoring_engine import (
    ServerInfo, probe_server, deep_probe_server, compute_score
)

# Fast health check (~10s) — connection, initialize, ping
fast = probe_server("https://my-server.example.com/mcp")
print(fast.is_reachable, fast.connection_ms)

# Deep protocol probe (~30s) — schema validation, error handling, fuzz testing
deep = deep_probe_server("https://my-server.example.com/mcp")
print(deep.tools_count, deep.schema_valid, deep.fuzz_score)

# Score with the probe results
server = ServerInfo(name="my-server", description="...", repo_url="...")
static = analyze_repo(server.repo_url)
result = compute_score(server, static_result=static, deep_probe=deep)
print(result.grade)

Probe a stdio server

from mcp_scoring_engine import probe_server_stdio, deep_probe_server_stdio

fast = probe_server_stdio(["npx", "-y", "@modelcontextprotocol/server-memory"])
deep = deep_probe_server_stdio(["python", "-m", "my_mcp_server"])

Classify a server

from mcp_scoring_engine import classify_server, ServerInfo

server = ServerInfo(
    name="stripe-mcp",
    description="MCP server for Stripe payment processing",
    repo_url="https://github.com/stripe/stripe-mcp",
)

category, targets = classify_server(server)
print(category)  # "finance"
print(targets)   # ["Stripe"]

Detect red flags

from mcp_scoring_engine import detect_flags, ServerInfo

server = ServerInfo(
    name="sketchy-server",
    description="A MCP server",
    repo_url="",
    remote_endpoint_url="http://localhost:3000/mcp",
)

flags = detect_flags(server)
for flag in flags:
    print(f"[{flag.severity}] {flag.label}: {flag.description}")
    # [critical] No Source Code: No repository URL or source link provided
    # [warning] Staging Artifact: Endpoint URL contains localhost or staging reference

Architecture

The engine evaluates servers across three data tiers:

Tier Source What it measures
Tier 1 — Static Analysis GitHub repo Schema completeness, description quality, documentation, maintenance pulse, dependency health, license clarity, version hygiene
Tier 2 — Protocol Probe Live server Connection health, tool schema validation, error handling, fuzz resilience, auth discovery
Tier 3 — Reliability Rolling window Uptime percentage, p50/p95 latency

The composite score is a weighted blend of five categories:

Category Weight
Schema & Docs 25%
Protocol Compliance 20%
Reliability 20%
Maintenance 15%
Security 20%

Score types:

  • partial — Only 1 data tier available. Numeric score but no letter grade.
  • full — 2+ data tiers. Graded A+ through F.

API Reference

Core

Function Description
compute_score(server, static_result?, deep_probe?, reliability?) Compute weighted composite score → ScoreResult
score_to_grade(score) Convert 0–100 → letter grade (A+, A, B, C, D, F)
classify_server(server) Categorize a server → (category, target_platforms)
detect_flags(server, context?) Detect red flags → list[Flag]
generate_badges(server, static_result?, deep_probe?, reliability?, flags?) Generate display badges → dict

Probes

Function Description
probe_server(url) Fast health check over HTTP → FastProbeResult
probe_server_stdio(command) Fast health check over stdio → FastProbeResult
deep_probe_server(url) Full protocol probe over HTTP → DeepProbeResult
deep_probe_server_stdio(command) Full protocol probe over stdio → DeepProbeResult
analyze_repo(repo_url) Static analysis of GitHub repo → StaticAnalysis
compute_reliability_score(data) Score from uptime + latency → int

Types

All inputs and outputs are plain dataclasses:

  • ServerInfo — Input server metadata (name, description, repo_url, etc.)
  • ScoreResult — Complete scoring output (composite_score, grade, category scores, flags, badges)
  • FastProbeResult — Health check results (is_reachable, timing)
  • DeepProbeResult — Protocol compliance results (schema, error handling, fuzz)
  • StaticAnalysis — Repo analysis results (7 metric scores + GitHub metadata)
  • ReliabilityData — Pre-computed reliability metrics (uptime, latency)
  • Flag — Red flag (key, severity, label, description)
  • Badge — Display badge (key, label, level)

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

mcp_scoring_engine-0.1.2.tar.gz (31.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

mcp_scoring_engine-0.1.2-py3-none-any.whl (30.7 kB view details)

Uploaded Python 3

File details

Details for the file mcp_scoring_engine-0.1.2.tar.gz.

File metadata

  • Download URL: mcp_scoring_engine-0.1.2.tar.gz
  • Upload date:
  • Size: 31.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for mcp_scoring_engine-0.1.2.tar.gz
Algorithm Hash digest
SHA256 8d3e68ac733df09495009221fb94b11a4491a7fddbc285aded12dd501b050f72
MD5 5ee9e05f94435987f0074ef3f8ef97d5
BLAKE2b-256 6a287a024f5d0d7a0ba5750fdf5d9586fca113bdadd955cd4f5e708faf891c02

See more details on using hashes here.

File details

Details for the file mcp_scoring_engine-0.1.2-py3-none-any.whl.

File metadata

File hashes

Hashes for mcp_scoring_engine-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 66d05f3793cf286cda23aa8afc746767bd9a768e555d6da14a17b04f296b791e
MD5 bb31489dfea0cf9b7f86ffa20bdc7cf5
BLAKE2b-256 a313d796a56a866e8b7dab841b6f3f24e60d5afd64e847ba63c6b4803d342b50

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page