CLI for Claude Code marketplace health — routing evals, coverage checks, and semantic collision detection
claude-marketplace-evaluator
cme is a CLI for Claude Code marketplace health. It validates that skills route correctly and detects semantic collisions between skills — without running expensive LLM-based eval suites.
Marketplace health is not about optimizing skill descriptions. It is about catching structural problems early: missing evals, broken routing, and overlapping skills that confuse Claude's router. cme runs fast, fits in CI, and fails loud.
Installation
Zero-install with uvx (recommended for CI):
uvx --from claude-marketplace-evaluator cme --help
Or install globally:
pip install claude-marketplace-evaluator
Commands
cme routing
Three-step pipeline that generates routing tests, checks coverage, and runs evals:
- Generate: reads evals/evals.json from each skill directory and produces routing test YAML
- Coverage check: verifies every skill has an evals file (fails if below threshold)
- Routing eval runner: sends each prompt through the Claude Agent SDK and checks that Claude routes to the expected skill
cme routing --plugins-dir plugins/
| Flag | Default | Description |
|---|---|---|
| --plugins-dir | plugins/ | Path to the plugins directory |
| --coverage-threshold | 100 | Minimum eval coverage percentage. Fails if any skill lacks evals |
| --threshold | 95 | Minimum routing pass rate percentage. Set to 0 to skip the eval runner |
| -j / --workers | 4 | Parallel workers for the eval runner |
| --timeout | 30 | Per-test timeout in seconds |
| --max-retries | 1 | Max retries on rate limit errors (exponential backoff) |
Exit codes: 0 = all checks pass, 1 = coverage or routing threshold not met.
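The coverage step can be pictured as a plain filesystem walk. The following is a minimal sketch, not cme's actual implementation: the function names are hypothetical, and it assumes the plugin layout described in this README (plugins/<plugin-name>/skills/<skill-name>/evals/evals.json).

```python
from pathlib import Path


def eval_coverage(plugins_dir: Path) -> tuple[float, list[Path]]:
    """Return (coverage percentage, skill directories missing evals/evals.json)."""
    # Assumed layout: <plugins_dir>/<plugin>/skills/<skill>/evals/evals.json
    skills = sorted(p for p in plugins_dir.glob("*/skills/*") if p.is_dir())
    missing = [s for s in skills if not (s / "evals" / "evals.json").is_file()]
    if not skills:
        # Assumption: a marketplace with no skills counts as fully covered.
        return 100.0, []
    covered = len(skills) - len(missing)
    return 100.0 * covered / len(skills), missing


def coverage_exit_code(plugins_dir: Path, threshold: float = 100.0) -> int:
    """Mirror the documented exit codes: 0 on pass, 1 when below the threshold."""
    percent, _ = eval_coverage(plugins_dir)
    return 0 if percent >= threshold else 1
```

No LLM call is involved at this stage, which is why a check of this kind is cheap enough to run on every PR.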
cme overlap
Detects semantic collisions between skills across a marketplace. Two skills collide when their descriptions or trigger queries are similar enough to confuse Claude's routing. cme overlap uses an LLM to analyze all skill pairs and writes a JSON report of colliding pairs, each tagged with a severity of high, medium, or low.
cme overlap --plugins-dir plugins/ --output overlap-report.json
| Flag | Default | Description |
|---|---|---|
| --plugins-dir | plugins/ | Path to the plugins directory |
| --output | overlap-report.json | Output path for the JSON collision report |
| --model | claude-sonnet-4-5 | Model for analysis (overrides ANTHROPIC_MODEL env var) |
Exit codes: 0 = no collisions, 1 = collisions detected.
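Because every pair of skills is a candidate collision, the number of comparisons grows quadratically with marketplace size. The sketch below only illustrates that pair count; how cme actually groups pairs into LLM requests is not described here, and the helper names are hypothetical.

```python
from itertools import combinations


def skill_pairs(skills: list[str]) -> list[tuple[str, str]]:
    """Every unordered pair of distinct skills, each pair listed once."""
    return list(combinations(sorted(skills), 2))


def pair_count(n: int) -> int:
    """Number of unordered pairs among n skills: n * (n - 1) / 2."""
    return n * (n - 1) // 2
```

For example, a marketplace of six skills has 15 candidate pairs, while one of sixty has 1770.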
The output report structure:
{
  "timestamp": "2026-04-17T00:00:00+00:00",
  "model_used": "claude-sonnet-4-5",
  "total_skills_analyzed": 6,
  "total_collisions": 1,
  "collisions": [
    {
      "skill_a": "plugins/my-plugin/skills/create-pr",
      "skill_b": "plugins/my-plugin/skills/submit-pr",
      "overlapping_triggers": ["open a pull request"],
      "description_excerpts": ["Both skills handle PR creation workflows"],
      "severity": "high"
    }
  ]
}
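A report in this shape is easy to post-process locally. The following sketch, with hypothetical helper names, counts collisions per severity and lists the pairs at chosen severity levels; it relies only on the fields shown in the example report.

```python
import json
from collections import Counter


def summarize_report(report_json: str) -> dict[str, int]:
    """Count collisions per severity in an overlap report."""
    report = json.loads(report_json)
    counts = Counter(c["severity"] for c in report.get("collisions", []))
    # Always report all three documented severities, even when a count is zero.
    return {level: counts.get(level, 0) for level in ("high", "medium", "low")}


def pairs_at(report_json: str, levels: tuple[str, ...] = ("high",)) -> list[tuple[str, str]]:
    """Return (skill_a, skill_b) for collisions whose severity is in `levels`."""
    report = json.loads(report_json)
    return [
        (c["skill_a"], c["skill_b"])
        for c in report.get("collisions", [])
        if c["severity"] in levels
    ]
```

A script like this could, for instance, print only the high-severity pairs when triaging a large report by hand.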
Plugin Layout
cme expects this directory structure:
plugins/
  <plugin-name>/
    skills/
      <skill-name>/
        SKILL.md
        evals/
          evals.json
Each evals.json is a JSON array of routing test entries:
[
  { "query": "Run the test suite for this project", "should_trigger": true },
  { "query": "Can you execute the unit tests?", "should_trigger": true },
  { "query": "Open a pull request for this branch", "should_trigger": false }
]
| Field | Type | Description |
|---|---|---|
| query | string | A user prompt to test routing against |
| should_trigger | boolean | true = this prompt should route to this skill, false = it should not |
Only should_trigger: true entries are used to generate routing test cases. Include should_trigger: false entries to document negative cases (used by overlap detection for trigger context).
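Before committing an evals.json, it can help to validate it against the two documented fields. The following is a minimal sketch with a hypothetical function name; it checks the entry shape and returns only the positive queries, which are the ones that become routing test cases. cme's own validation may be stricter or more lenient.

```python
import json


def load_positive_queries(evals_json: str) -> list[str]:
    """Validate evals.json content and return queries with should_trigger == true."""
    entries = json.loads(evals_json)
    if not isinstance(entries, list):
        raise ValueError("evals.json must be a JSON array")
    positives = []
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict):
            raise ValueError(f"entry {i}: expected a JSON object")
        query = entry.get("query")
        trigger = entry.get("should_trigger")
        if not isinstance(query, str) or not isinstance(trigger, bool):
            raise ValueError(
                f"entry {i}: expected string 'query' and boolean 'should_trigger'"
            )
        if trigger:
            positives.append(query)
    return positives
```

Applied to the example array above, this returns the two test-suite queries and drops the pull-request one.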
Authentication
cme does not manage credentials. It passes through environment variables to the Claude Agent SDK. Configure one of these auth modes:
Claude subscription (OAuth)
For users with a Claude Pro/Team/Enterprise subscription:
claude setup-token # generates the token
export CLAUDE_CODE_OAUTH_TOKEN="your-token"
cme routing --plugins-dir plugins/
Direct API key
For direct Anthropic API access:
export ANTHROPIC_API_KEY="sk-ant-..."
cme routing --plugins-dir plugins/
Databricks AI Gateway
For routing through Databricks AI Gateway, map your workspace secrets to the standard Anthropic SDK env vars:
export ANTHROPIC_AUTH_TOKEN="<DATABRICKS_SP_TOKEN>" # service principal PAT
export ANTHROPIC_BASE_URL="<DATABRICKS_AI_GATEWAY_URL>" # AI Gateway endpoint URL
export ANTHROPIC_MODEL="<DATABRICKS_AI_GATEWAY_MODEL>" # endpoint model name
export ANTHROPIC_CUSTOM_HEADERS="x-databricks-use-coding-agent-mode: true"
export CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS="1"
export CLAUDE_CODE_ENABLE_FINE_GRAINED_TOOL_STREAMING=""
cme routing --plugins-dir plugins/
In GitHub Actions, these map directly from repository secrets:
env:
  ANTHROPIC_AUTH_TOKEN: ${{ secrets.DATABRICKS_SP_TOKEN }}
  ANTHROPIC_BASE_URL: ${{ secrets.DATABRICKS_AI_GATEWAY_URL }}
  ANTHROPIC_MODEL: ${{ secrets.DATABRICKS_AI_GATEWAY_MODEL }}
  ANTHROPIC_CUSTOM_HEADERS: "x-databricks-use-coding-agent-mode: true"
  CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS: "1"
  CLAUDE_CODE_ENABLE_FINE_GRAINED_TOOL_STREAMING: ""
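Since cme leaves credentials to the environment, a small preflight check can report which auth mode appears to be configured before a run starts. This is a sketch under assumptions: the mode labels are illustrative, treating ANTHROPIC_AUTH_TOKEN plus ANTHROPIC_BASE_URL as the minimum for the gateway mode is a guess, and cme itself performs no such check.

```python
import os
from collections.abc import Mapping

# Variable names come from the auth modes above; the labels are illustrative.
AUTH_MODES = {
    "oauth": ("CLAUDE_CODE_OAUTH_TOKEN",),
    "api_key": ("ANTHROPIC_API_KEY",),
    # Assumption: token and base URL are the minimum needed for the gateway mode.
    "gateway": ("ANTHROPIC_AUTH_TOKEN", "ANTHROPIC_BASE_URL"),
}


def configured_auth_modes(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the modes whose required variables are all set and non-empty."""
    return [
        mode
        for mode, names in AUTH_MODES.items()
        if all(env.get(name) for name in names)
    ]
```

An empty result means no credentials are visible to the process; more than one entry suggests leftover variables from another mode that may be worth unsetting.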
CI/CD Integration
GitHub Actions workflow
This is a production workflow from claude-marketplace-builder that runs both cme routing and cme overlap on every PR that touches plugin files:
name: CME Checks
on:
  pull_request:
    paths:
      - "plugins/**"
      - "evals/**"
  workflow_dispatch:
jobs:
  coverage:
    runs-on: ubuntu-latest
    environment: cicd
    timeout-minutes: 30
    steps:
      - uses: actions/checkout@v4
      - uses: astral-sh/setup-uv@v4
      - name: Check eval coverage and routing
        env:
          ANTHROPIC_AUTH_TOKEN: ${{ secrets.DATABRICKS_SP_TOKEN }}
          ANTHROPIC_BASE_URL: ${{ secrets.DATABRICKS_AI_GATEWAY_URL }}
          ANTHROPIC_MODEL: ${{ secrets.DATABRICKS_AI_GATEWAY_MODEL }}
          ANTHROPIC_CUSTOM_HEADERS: "x-databricks-use-coding-agent-mode: true"
          CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS: "1"
        run: uvx --from claude-marketplace-evaluator cme routing --plugins-dir plugins/ --coverage-threshold 100 --threshold 95 --timeout 180
  overlap:
    runs-on: ubuntu-latest
    environment: cicd
    steps:
      - uses: actions/checkout@v4
      - uses: astral-sh/setup-uv@v4
      - name: Check skill overlap
        env:
          ANTHROPIC_AUTH_TOKEN: ${{ secrets.DATABRICKS_SP_TOKEN }}
          ANTHROPIC_BASE_URL: ${{ secrets.DATABRICKS_AI_GATEWAY_URL }}
          ANTHROPIC_MODEL: ${{ secrets.DATABRICKS_AI_GATEWAY_MODEL }}
          ANTHROPIC_CUSTOM_HEADERS: "x-databricks-use-coding-agent-mode: true"
          CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS: "1"
        run: |
          set +e
          uvx --from claude-marketplace-evaluator cme overlap --plugins-dir plugins/ --output overlap-report.json
          EXIT_CODE=$?
          if [ -f overlap-report.json ]; then
            echo "## Overlap Report" >> "$GITHUB_STEP_SUMMARY"
            echo '```json' >> "$GITHUB_STEP_SUMMARY"
            cat overlap-report.json >> "$GITHUB_STEP_SUMMARY"
            echo '```' >> "$GITHUB_STEP_SUMMARY"
            cat overlap-report.json
          fi
          exit $EXIT_CODE
Posting overlap results as a PR comment
Extend the overlap job to post a formatted collision table as a PR comment using actions/github-script:
  overlap:
    runs-on: ubuntu-latest
    environment: cicd
    steps:
      - uses: actions/checkout@v4
      - uses: astral-sh/setup-uv@v4
      - name: Check skill overlap
        id: overlap
        env:
          ANTHROPIC_AUTH_TOKEN: ${{ secrets.DATABRICKS_SP_TOKEN }}
          ANTHROPIC_BASE_URL: ${{ secrets.DATABRICKS_AI_GATEWAY_URL }}
          ANTHROPIC_MODEL: ${{ secrets.DATABRICKS_AI_GATEWAY_MODEL }}
          ANTHROPIC_CUSTOM_HEADERS: "x-databricks-use-coding-agent-mode: true"
          CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS: "1"
        run: |
          set +e
          uvx --from claude-marketplace-evaluator cme overlap --plugins-dir plugins/ --output overlap-report.json
          echo "exit_code=$?" >> "$GITHUB_OUTPUT"
          if [ -f overlap-report.json ]; then
            echo "## Overlap Report" >> "$GITHUB_STEP_SUMMARY"
            echo '```json' >> "$GITHUB_STEP_SUMMARY"
            cat overlap-report.json >> "$GITHUB_STEP_SUMMARY"
            echo '```' >> "$GITHUB_STEP_SUMMARY"
          fi
      - name: Comment on PR with overlap results
        if: always() && github.event_name == 'pull_request'
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            const path = 'overlap-report.json';
            if (!fs.existsSync(path)) return;
            const report = JSON.parse(fs.readFileSync(path, 'utf8'));
            const collisions = report.collisions || [];
            let body = '## Skill Overlap Report\n\n';
            body += `**Skills analyzed:** ${report.total_skills_analyzed}\n`;
            body += `**Collisions found:** ${report.total_collisions}\n\n`;
            if (collisions.length === 0) {
              body += '✅ No semantic collisions detected.\n';
            } else {
              body += '| Severity | Skill A | Skill B | Overlapping Triggers |\n';
              body += '|----------|---------|---------|---------------------|\n';
              for (const c of collisions) {
                const triggers = c.overlapping_triggers.join(', ');
                body += `| ${c.severity.toUpperCase()} | \`${c.skill_a}\` | \`${c.skill_b}\` | ${triggers} |\n`;
              }
              body += '\nResolve collisions before merging. Rename skills, narrow descriptions, or deduplicate functionality.\n';
            }
            // Delete previous cme comments to avoid spam
            const { data: comments } = await github.rest.issues.listComments({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: context.issue.number,
            });
            for (const comment of comments) {
              if (comment.body.startsWith('## Skill Overlap Report')) {
                await github.rest.issues.deleteComment({
                  owner: context.repo.owner,
                  repo: context.repo.repo,
                  comment_id: comment.id,
                });
              }
            }
            await github.rest.issues.createComment({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: context.issue.number,
              body,
            });
      - name: Fail on collisions
        if: steps.overlap.outputs.exit_code != '0'
        run: exit 1
Local Usage
Run against a local plugins directory:
# Coverage check only (skip routing evals)
cme routing --plugins-dir ./plugins --threshold 0
# Full routing pipeline
cme routing --plugins-dir ./plugins --timeout 60
# Overlap detection
cme overlap --plugins-dir ./plugins
# Increase parallelism for large marketplaces
cme routing --plugins-dir ./plugins -j 8 --timeout 120
# Debug mode (verbose Agent SDK logging)
CME_DEBUG=1 cme routing --plugins-dir ./plugins
Two-Tier Eval Strategy
cme is designed as the fast, structural first tier of a two-tier evaluation approach:
Tier 1: cme (fast, cheap, structural)
- Runs in seconds to minutes
- Coverage checks require zero LLM calls
- Routing evals use one short Agent SDK call per test case
- Catches missing evals, broken routing, and skill collisions
- Runs on every PR in CI
Tier 2: Full LLM eval runners (deep, expensive)
- Runs multi-turn conversations testing skill behavior end-to-end
- Validates output quality, not just routing correctness
- Costs significantly more in tokens and time
- Runs on release branches or nightly schedules
cme answers "did Claude pick the right skill?" — it does not answer "did the skill produce a good result?" Use tier 1 to gate PRs cheaply, then run tier 2 for deeper validation on release candidates.
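The tier 1 question reduces to an equality check per test case plus a pass-rate threshold. The sketch below illustrates that idea only: the RoutingResult type and both functions are hypothetical, and obtaining routed_skill from the Claude Agent SDK is out of scope here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RoutingResult:
    query: str
    expected_skill: str
    routed_skill: str | None  # None when no skill was selected


def pass_rate(results: list[RoutingResult]) -> float:
    """Percentage of test cases where the routed skill matches the expected one."""
    if not results:
        # Assumption: with nothing to test, treat the rate as 100%.
        return 100.0
    passed = sum(r.routed_skill == r.expected_skill for r in results)
    return 100.0 * passed / len(results)


def meets_threshold(results: list[RoutingResult], threshold: float = 95.0) -> bool:
    """True when the pass rate reaches the threshold (default mirrors --threshold)."""
    return pass_rate(results) >= threshold
```

Nothing in this check inspects what the skill produced, which is the gap tier 2 is meant to cover.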
Development
uv sync
pre-commit install
make check # lint + format + typecheck
make test # pytest