
Scoring engine for MCP server quality assessment


mcp-scoring-engine

A standalone scoring engine for evaluating the quality of Model Context Protocol (MCP) servers. Pure Python, no framework dependencies — just dataclasses, scoring logic, and network probes.

Used in production by MCP Scoreboard to grade thousands of MCP servers.

Installation

pip install mcp-scoring-engine

Quick Start

Score a server from its GitHub repo (static analysis)

from mcp_scoring_engine import ServerInfo, analyze_repo, compute_score

server = ServerInfo(
    name="my-mcp-server",
    description="A tool server for doing useful things",
    repo_url="https://github.com/owner/my-mcp-server",
)

static = analyze_repo(server.repo_url)
result = compute_score(server, static_result=static)

print(result.composite_score)  # 0–100
print(result.grade)            # "A+", "B", "D", etc.
print(result.score_type)       # "partial" (1 tier) or "full" (2+ tiers)

Probe a running server

from mcp_scoring_engine import (
    ServerInfo, probe_server, deep_probe_server, compute_score
)

# Fast health check (~10s) — connection, initialize, ping
fast = probe_server("https://my-server.example.com/mcp")
print(fast.is_reachable, fast.connection_ms)

# Deep protocol probe (~30s) — schema validation, error handling, fuzz testing
deep = deep_probe_server("https://my-server.example.com/mcp")
print(deep.tools_count, deep.schema_valid, deep.fuzz_score)

# Score with the probe results
server = ServerInfo(name="my-server", description="...", repo_url="...")
static = analyze_repo(server.repo_url)
result = compute_score(server, static_result=static, deep_probe=deep)
print(result.grade)

Probe a stdio server

from mcp_scoring_engine import probe_server_stdio, deep_probe_server_stdio

fast = probe_server_stdio(["npx", "-y", "@modelcontextprotocol/server-memory"])
deep = deep_probe_server_stdio(["python", "-m", "my_mcp_server"])

Classify a server

from mcp_scoring_engine import classify_server, ServerInfo

server = ServerInfo(
    name="stripe-mcp",
    description="MCP server for Stripe payment processing",
    repo_url="https://github.com/stripe/stripe-mcp",
)

category, targets = classify_server(server)
print(category)  # "finance"
print(targets)   # ["Stripe"]
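The library's classification rules aren't documented here, but a keyword-matching sketch conveys the general shape. The category list and keywords below are illustrative assumptions, not the library's actual taxonomy:

```python
# Hypothetical keyword-based classifier -- illustrative only, not the
# logic inside classify_server().
CATEGORY_KEYWORDS = {
    "finance": ["payment", "stripe", "invoice", "billing"],
    "devtools": ["github", "ci", "deploy"],
}

def sketch_classify(name: str, description: str) -> str:
    """Return the first category whose keywords appear in the server text."""
    text = f"{name} {description}".lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(kw in text for kw in keywords):
            return category
    return "other"

category = sketch_classify(
    "stripe-mcp", "MCP server for Stripe payment processing"
)
```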

Detect entry points for stdio servers

from mcp_scoring_engine import detect_entry_point, make_github_file_reader

# With a GitHubPublicClient (from your own GitHub API code)
file_reader = make_github_file_reader(client)
tree = client.get_tree()

result = detect_entry_point(tree, file_reader)
# {"language": "python", "run_cmd": ["python", "-m", "my_server"],
#  "install_cmd": "uv pip install -e .",
#  "source": "pyproject.toml [project.scripts]", "confidence": "high"}

Entry point detection parses build metadata to infer how to run an MCP server:

  • Python: pyproject.toml scripts, setup.cfg/setup.py console_scripts, __main__.py
  • Node: package.json bin field, scripts.start, main field

When called via analyze_repo(), detection piggybacks on the already-fetched file tree at zero extra API cost. The result is stored in StaticAnalysis.details["entry_point"].

Detect red flags

from mcp_scoring_engine import detect_flags, ServerInfo

server = ServerInfo(
    name="sketchy-server",
    description="A MCP server",
    repo_url="",
    remote_endpoint_url="http://localhost:3000/mcp",
)

flags = detect_flags(server)
for flag in flags:
    print(f"[{flag.severity}] {flag.label}: {flag.description}")
    # [critical] No Source Code: No repository URL or source link provided
    # [warning] Staging Artifact: Endpoint URL contains localhost or staging reference
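For intuition, the two flags above can be reproduced in a few lines of plain Python. The `Flag` stand-in and rule set here are illustrative only, not the library's implementation, and the real `detect_flags()` applies a much broader rule set:

```python
# Illustrative sketch of the two checks shown above.
from dataclasses import dataclass

@dataclass
class Flag:
    severity: str
    label: str
    description: str

def sketch_detect_flags(repo_url: str, endpoint_url: str) -> list[Flag]:
    """Apply two sample rules: missing source, staging/localhost endpoint."""
    flags = []
    if not repo_url:
        flags.append(Flag("critical", "No Source Code",
                          "No repository URL or source link provided"))
    if any(m in endpoint_url for m in ("localhost", "127.0.0.1", "staging")):
        flags.append(Flag("warning", "Staging Artifact",
                          "Endpoint URL contains localhost or staging reference"))
    return flags

flags = sketch_detect_flags("", "http://localhost:3000/mcp")
```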

Architecture

The engine evaluates servers across three data tiers:

Tier | Source | What it measures
Tier 1 — Static Analysis | GitHub repo | Schema completeness, description quality, documentation, maintenance pulse, dependency health, license clarity, version hygiene
Tier 2 — Protocol Probe | Live server | Connection health, tool schema validation, error handling, fuzz resilience, auth discovery
Tier 3 — Reliability | Rolling window | Uptime percentage, p50/p95 latency

The composite score is a weighted blend of five categories:

Category | Weight
Schema & Docs | 25%
Protocol Compliance | 20%
Reliability | 20%
Maintenance | 15%
Security | 20%

Score types:

  • partial — Only 1 data tier available. Numeric score but no letter grade.
  • full — 2+ data tiers. Graded A+ through F.
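The blend itself is simple arithmetic. Here is a sketch using the documented category weights; the letter-grade cutoffs below are assumptions for illustration, and the real `score_to_grade()` thresholds may differ:

```python
# Weighted blend of the five documented categories.
WEIGHTS = {
    "schema_docs": 0.25,   # Schema & Docs
    "protocol": 0.20,      # Protocol Compliance
    "reliability": 0.20,   # Reliability
    "maintenance": 0.15,   # Maintenance
    "security": 0.20,      # Security
}

def composite(scores: dict[str, float]) -> float:
    """Weighted blend of the five category scores (each 0-100)."""
    return sum(scores[k] * w for k, w in WEIGHTS.items())

def to_grade(score: float) -> str:
    """Map 0-100 to a letter grade (cutoffs assumed, not the library's)."""
    for cutoff, letter in [(97, "A+"), (90, "A"), (80, "B"),
                           (70, "C"), (60, "D")]:
        if score >= cutoff:
            return letter
    return "F"

scores = {"schema_docs": 90, "protocol": 80, "reliability": 100,
          "maintenance": 70, "security": 85}
total = composite(scores)  # 22.5 + 16 + 20 + 10.5 + 17 = 86.0
```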

API Reference

Core

Function | Description
compute_score(server, static_result?, deep_probe?, reliability?) | Compute weighted composite score → ScoreResult
score_to_grade(score) | Convert 0–100 → letter grade (A+, A, B, C, D, F)
classify_server(server) | Categorize a server → (category, target_platforms)
detect_flags(server, context?) | Detect red flags → list[Flag]
generate_badges(server, static_result?, deep_probe?, reliability?, flags?) | Generate display badges → dict

Probes

Function | Description
probe_server(url) | Fast health check over HTTP → FastProbeResult
probe_server_stdio(command) | Fast health check over stdio → FastProbeResult
deep_probe_server(url) | Full protocol probe over HTTP → DeepProbeResult
deep_probe_server_stdio(command) | Full protocol probe over stdio → DeepProbeResult
analyze_repo(repo_url) | Static analysis of GitHub repo → StaticAnalysis
detect_entry_point(file_tree, file_reader) | Detect how to run a server from repo metadata → dict | None
make_github_file_reader(client) | Create a file_reader callable from a GitHub API client → Callable
compute_reliability_score(data) | Score from uptime + latency → int
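The formula inside compute_reliability_score isn't documented here; purely for illustration, a plausible sketch that blends uptime with a p95-latency penalty might look like this:

```python
# Hypothetical reliability formula -- an assumption, not the library's math.
def sketch_reliability_score(uptime_pct: float, p95_ms: float) -> int:
    """Blend uptime (dominant) with a latency penalty, clamped to 0-100."""
    latency_factor = max(0.0, 1.0 - p95_ms / 5000.0)  # hits 0 at a 5s p95
    score = 0.8 * uptime_pct + 20.0 * latency_factor
    return max(0, min(100, round(score)))

score = sketch_reliability_score(uptime_pct=99.9, p95_ms=250.0)
```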

Types

All inputs and outputs are plain dataclasses:

  • ServerInfo — Input server metadata (name, description, repo_url, etc.)
  • ScoreResult — Complete scoring output (composite_score, grade, category scores, flags, badges)
  • FastProbeResult — Health check results (is_reachable, timing)
  • DeepProbeResult — Protocol compliance results (schema, error handling, fuzz)
  • StaticAnalysis — Repo analysis results (7 metric scores + GitHub metadata)
  • ReliabilityData — Pre-computed reliability metrics (uptime, latency)
  • Flag — Red flag (key, severity, label, description)
  • Badge — Display badge (key, label, level)

License

MIT
