CLI for Claude Code marketplace health — routing evals, coverage checks, and semantic collision detection

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

IceRhymers

These details have not been verified by PyPI

Development Status
- 3 - Alpha
Intended Audience
- Developers
License
- OSI Approved :: Apache Software License
Programming Language

Project description

claude-marketplace-evaluator

cme is a CLI for Claude Code marketplace health. It validates that skills route correctly and detects semantic collisions between skills — without running expensive LLM-based eval suites.

Marketplace health is not about optimizing skill descriptions. It is about catching structural problems early: missing evals, broken routing, and overlapping skills that confuse Claude's router. cme runs fast, fits in CI, and fails loud.

Installation

Zero-install with uvx (recommended for CI):

uvx --from claude-marketplace-evaluator cme --help

Or install globally:

pip install claude-marketplace-evaluator

Commands

`cme routing`

Three-step pipeline that generates routing tests, checks coverage, and runs evals:

1. Generate: reads evals/evals.json from each skill directory and produces routing test YAML
2. Coverage check: verifies that every skill has an evals file (fails if coverage is below the threshold)
3. Routing eval runner: sends each prompt through the Claude Agent SDK and checks that Claude routes to the expected skill

cme routing --plugins-dir plugins/

| Flag | Default | Description |
|------|---------|-------------|
| `--plugins-dir` | `plugins/` | Path to the plugins directory |
| `--coverage-threshold` | `100` | Minimum eval coverage percentage. At the default of `100`, fails if any skill lacks evals |
| `--threshold` | `95` | Minimum routing pass rate percentage. Set to `0` to skip the eval runner |
| `-j` / `--workers` | `4` | Parallel workers for the eval runner |
| `--timeout` | `30` | Per-test timeout in seconds |
| `--max-retries` | `1` | Max retries on rate limit errors (exponential backoff) |

Exit codes: 0 = all checks pass, 1 = coverage or routing threshold not met.
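The two thresholds and the exit codes can be read as a simple gate. The sketch below is only an illustration of that reading, not cme's actual implementation: the function name `routing_exit_code` and its parameters are hypothetical, and it assumes a check passes when the measured percentage is greater than or equal to its threshold.

```python
def routing_exit_code(
    coverage_pct: float,
    pass_rate_pct: float,
    coverage_threshold: float = 100.0,
    threshold: float = 95.0,
) -> int:
    """Hypothetical helper: 0 when both gates pass, 1 otherwise."""
    # Gate 1: eval coverage must reach the coverage threshold.
    if coverage_pct < coverage_threshold:
        return 1
    # A routing threshold of 0 skips the eval runner, so the pass rate is ignored.
    if threshold == 0:
        return 0
    # Gate 2: routing pass rate must reach the routing threshold.
    return 0 if pass_rate_pct >= threshold else 1


print(routing_exit_code(coverage_pct=100.0, pass_rate_pct=96.0))
print(routing_exit_code(coverage_pct=100.0, pass_rate_pct=94.0))
```

Under these assumptions the first call passes both gates and the second fails the routing gate.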

`cme overlap`

Detects semantic collisions between skills across a marketplace. Two skills collide when their descriptions or trigger queries are similar enough to confuse Claude's routing. cme overlap uses an LLM to analyze all skill pairs and produces a JSON report of collision pairs, each tagged with a severity of high, medium, or low.

cme overlap --plugins-dir plugins/ --output overlap-report.json

| Flag | Default | Description |
|------|---------|-------------|
| `--plugins-dir` | `plugins/` | Path to the plugins directory |
| `--output` | `overlap-report.json` | Output path for the JSON collision report |
| `--model` | `claude-sonnet-4-5` | Model for analysis (overrides `ANTHROPIC_MODEL` env var) |

Exit codes: 0 = no collisions, 1 = collisions detected.
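Because every pair of skills is compared, the amount of analysis grows quadratically with the number of skills. The sketch below only enumerates the unordered pairs to show that growth. The helper name `skill_pairs` is hypothetical, and whether cme sends one LLM request per pair or batches them is not stated here.

```python
from itertools import combinations


def skill_pairs(skills):
    """Return every unordered pair of distinct skills (hypothetical helper)."""
    return list(combinations(sorted(skills), 2))


# Illustrative skill names, not taken from a real marketplace.
skills = ["create-pr", "submit-pr", "run-tests", "lint"]
for a, b in skill_pairs(skills):
    print(a, "<->", b)
```

For n skills this yields n * (n - 1) / 2 pairs, so a marketplace with 6 skills has 15 pairs to consider.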

The output report structure:

{
  "timestamp": "2026-04-17T00:00:00+00:00",
  "model_used": "claude-sonnet-4-5",
  "total_skills_analyzed": 6,
  "total_collisions": 1,
  "collisions": [
    {
      "skill_a": "plugins/my-plugin/skills/create-pr",
      "skill_b": "plugins/my-plugin/skills/submit-pr",
      "overlapping_triggers": ["open a pull request"],
      "description_excerpts": ["Both skills handle PR creation workflows"],
      "severity": "high"
    }
  ]
}
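A report with this structure is straightforward to post-process. The following is a minimal sketch that relies only on the fields shown above. The helper names `load_collisions` and `severity_counts` are hypothetical and not part of cme.

```python
import json
from collections import Counter
from pathlib import Path

# Sort order for the three severities named in the report format.
SEVERITY_ORDER = {"high": 0, "medium": 1, "low": 2}


def load_collisions(report_path):
    """Read the report and return its collisions, most severe first."""
    report = json.loads(Path(report_path).read_text(encoding="utf-8"))
    collisions = report.get("collisions", [])
    # Unknown severities sort last rather than raising.
    return sorted(
        collisions,
        key=lambda c: SEVERITY_ORDER.get(c["severity"], len(SEVERITY_ORDER)),
    )


def severity_counts(collisions):
    """Count collisions per severity level."""
    return Counter(c["severity"] for c in collisions)
```

For example, `severity_counts(load_collisions("overlap-report.json"))` gives a per-severity tally that a script could use to fail only on high-severity collisions.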

Plugin Layout

cme expects this directory structure:

plugins/
  <plugin-name>/
    skills/
      <skill-name>/
        SKILL.md
        evals/
          evals.json

Each evals.json is a JSON array of routing test entries:

[
  { "query": "Run the test suite for this project", "should_trigger": true },
  { "query": "Can you execute the unit tests?", "should_trigger": true },
  { "query": "Open a pull request for this branch", "should_trigger": false }
]

| Field | Type | Description |
|-------|------|-------------|
| `query` | string | A user prompt to test routing against |
| `should_trigger` | boolean | `true` = this prompt should route to this skill, `false` = it should not |

Only should_trigger: true entries are used to generate routing test cases. Include should_trigger: false entries to document negative cases (used by overlap detection for trigger context).
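The layout and the evals.json format above are enough to approximate the coverage check by hand. This is a sketch under stated assumptions, not cme's own code: it assumes a skill is any directory under `<plugin>/skills/` that contains SKILL.md, and the helper names `find_skills`, `coverage_percent`, and `positive_queries` are hypothetical.

```python
import json
from pathlib import Path


def find_skills(plugins_dir):
    """Return skill directories, identified here by the presence of SKILL.md."""
    return sorted(p.parent for p in Path(plugins_dir).glob("*/skills/*/SKILL.md"))


def coverage_percent(plugins_dir):
    """Percentage of skills that have an evals/evals.json file."""
    skills = find_skills(plugins_dir)
    if not skills:
        # Assumption: an empty marketplace counts as fully covered.
        return 100.0
    covered = sum(1 for s in skills if (s / "evals" / "evals.json").is_file())
    return 100.0 * covered / len(skills)


def positive_queries(skill_dir):
    """Queries marked should_trigger: true, the ones used for routing tests."""
    path = Path(skill_dir) / "evals" / "evals.json"
    entries = json.loads(path.read_text(encoding="utf-8"))
    return [e["query"] for e in entries if e.get("should_trigger") is True]
```

Calling `coverage_percent("plugins/")` on a tree where one of two skills lacks evals.json would give 50.0, which is below the default coverage threshold of 100.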

Authentication

cme does not manage credentials. It passes through environment variables to the Claude Agent SDK. Configure one of these auth modes:

Claude subscription (OAuth)

For users with a Claude Pro/Team/Enterprise subscription:

claude setup-token              # generates the token
export CLAUDE_CODE_OAUTH_TOKEN="your-token"
cme routing --plugins-dir plugins/

Direct API key

For direct Anthropic API access:

export ANTHROPIC_API_KEY="sk-ant-..."
cme routing --plugins-dir plugins/

Databricks AI Gateway

For routing through Databricks AI Gateway, map your workspace secrets to the standard Anthropic SDK env vars:

export ANTHROPIC_AUTH_TOKEN="<DATABRICKS_SP_TOKEN>"           # service principal PAT
export ANTHROPIC_BASE_URL="<DATABRICKS_AI_GATEWAY_URL>"       # AI Gateway endpoint URL
export ANTHROPIC_MODEL="<DATABRICKS_AI_GATEWAY_MODEL>"        # endpoint model name
export ANTHROPIC_CUSTOM_HEADERS="x-databricks-use-coding-agent-mode: true"
export CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS="1"
export CLAUDE_CODE_ENABLE_FINE_GRAINED_TOOL_STREAMING=""
cme routing --plugins-dir plugins/

In GitHub Actions, these map directly from repository secrets:

env:
  ANTHROPIC_AUTH_TOKEN: ${{ secrets.DATABRICKS_SP_TOKEN }}
  ANTHROPIC_BASE_URL: ${{ secrets.DATABRICKS_AI_GATEWAY_URL }}
  ANTHROPIC_MODEL: ${{ secrets.DATABRICKS_AI_GATEWAY_MODEL }}
  ANTHROPIC_CUSTOM_HEADERS: "x-databricks-use-coding-agent-mode: true"
  CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS: "1"
  CLAUDE_CODE_ENABLE_FINE_GRAINED_TOOL_STREAMING: ""
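Since cme only passes environment variables through, a small preflight check can catch a missing credential before a run starts. The sketch below looks only at the variable names listed in this section. The function `detect_auth_mode` is hypothetical, and the order in which it checks the modes is an assumption, not a description of how the Claude Agent SDK resolves credentials.

```python
import os


def detect_auth_mode(env=None):
    """Guess which documented auth mode is configured (hypothetical helper)."""
    env = os.environ if env is None else env
    # Gateway mode needs both a token and a base URL.
    if env.get("ANTHROPIC_AUTH_TOKEN") and env.get("ANTHROPIC_BASE_URL"):
        return "gateway"
    if env.get("ANTHROPIC_API_KEY"):
        return "api-key"
    if env.get("CLAUDE_CODE_OAUTH_TOKEN"):
        return "oauth"
    return None


if __name__ == "__main__":
    mode = detect_auth_mode()
    print(mode or "no auth mode configured")
```

Running this before `cme routing` in a local shell gives a quick signal about which of the three modes the current environment would use under these assumptions.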

CI/CD Integration

GitHub Actions workflow

This is a production workflow from claude-marketplace-builder that runs both cme routing and cme overlap on every PR that touches plugin files:

name: CME Checks

on:
  pull_request:
    paths:
      - "plugins/**"
      - "evals/**"
  workflow_dispatch:

jobs:
  coverage:
    runs-on: ubuntu-latest
    environment: cicd
    timeout-minutes: 30
    steps:
      - uses: actions/checkout@v4
      - uses: astral-sh/setup-uv@v4
      - name: Check eval coverage and routing
        env:
          ANTHROPIC_AUTH_TOKEN: ${{ secrets.DATABRICKS_SP_TOKEN }}
          ANTHROPIC_BASE_URL: ${{ secrets.DATABRICKS_AI_GATEWAY_URL }}
          ANTHROPIC_MODEL: ${{ secrets.DATABRICKS_AI_GATEWAY_MODEL }}
          ANTHROPIC_CUSTOM_HEADERS: "x-databricks-use-coding-agent-mode: true"
          CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS: "1"
        run: uvx --from claude-marketplace-evaluator cme routing --plugins-dir plugins/ --coverage-threshold 100 --threshold 95 --timeout 180

  overlap:
    runs-on: ubuntu-latest
    environment: cicd
    steps:
      - uses: actions/checkout@v4
      - uses: astral-sh/setup-uv@v4
      - name: Check skill overlap
        env:
          ANTHROPIC_AUTH_TOKEN: ${{ secrets.DATABRICKS_SP_TOKEN }}
          ANTHROPIC_BASE_URL: ${{ secrets.DATABRICKS_AI_GATEWAY_URL }}
          ANTHROPIC_MODEL: ${{ secrets.DATABRICKS_AI_GATEWAY_MODEL }}
          ANTHROPIC_CUSTOM_HEADERS: "x-databricks-use-coding-agent-mode: true"
          CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS: "1"
        run: |
          set +e
          uvx --from claude-marketplace-evaluator cme overlap --plugins-dir plugins/ --output overlap-report.json
          EXIT_CODE=$?
          if [ -f overlap-report.json ]; then
            echo "## Overlap Report" >> "$GITHUB_STEP_SUMMARY"
            echo '```json' >> "$GITHUB_STEP_SUMMARY"
            cat overlap-report.json >> "$GITHUB_STEP_SUMMARY"
            echo '```' >> "$GITHUB_STEP_SUMMARY"
            cat overlap-report.json
          fi
          exit $EXIT_CODE

Posting overlap results as a PR comment

Extend the overlap job to post a formatted collision table as a PR comment using actions/github-script:

  overlap:
    runs-on: ubuntu-latest
    environment: cicd
    steps:
      - uses: actions/checkout@v4
      - uses: astral-sh/setup-uv@v4
      - name: Check skill overlap
        id: overlap
        env:
          ANTHROPIC_AUTH_TOKEN: ${{ secrets.DATABRICKS_SP_TOKEN }}
          ANTHROPIC_BASE_URL: ${{ secrets.DATABRICKS_AI_GATEWAY_URL }}
          ANTHROPIC_MODEL: ${{ secrets.DATABRICKS_AI_GATEWAY_MODEL }}
          ANTHROPIC_CUSTOM_HEADERS: "x-databricks-use-coding-agent-mode: true"
          CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS: "1"
        run: |
          set +e
          uvx --from claude-marketplace-evaluator cme overlap --plugins-dir plugins/ --output overlap-report.json
          echo "exit_code=$?" >> "$GITHUB_OUTPUT"
          if [ -f overlap-report.json ]; then
            echo "## Overlap Report" >> "$GITHUB_STEP_SUMMARY"
            echo '```json' >> "$GITHUB_STEP_SUMMARY"
            cat overlap-report.json >> "$GITHUB_STEP_SUMMARY"
            echo '```' >> "$GITHUB_STEP_SUMMARY"
          fi

      - name: Comment on PR with overlap results
        if: always() && github.event_name == 'pull_request'
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            const path = 'overlap-report.json';
            if (!fs.existsSync(path)) return;

            const report = JSON.parse(fs.readFileSync(path, 'utf8'));
            const collisions = report.collisions || [];

            let body = '## Skill Overlap Report\n\n';
            body += `**Skills analyzed:** ${report.total_skills_analyzed}\n`;
            body += `**Collisions found:** ${report.total_collisions}\n\n`;

            if (collisions.length === 0) {
              body += '✅ No semantic collisions detected.\n';
            } else {
              body += '| Severity | Skill A | Skill B | Overlapping Triggers |\n';
              body += '|----------|---------|---------|---------------------|\n';
              for (const c of collisions) {
                const triggers = c.overlapping_triggers.join(', ');
                body += `| ${c.severity.toUpperCase()} | \`${c.skill_a}\` | \`${c.skill_b}\` | ${triggers} |\n`;
              }
              body += '\nResolve collisions before merging. Rename skills, narrow descriptions, or deduplicate functionality.\n';
            }

            // Delete previous cme comments to avoid spam.
            // Paginate so older comments beyond the first page are also found.
            const comments = await github.paginate(github.rest.issues.listComments, {
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: context.issue.number,
            });
            for (const comment of comments) {
              if ((comment.body || '').startsWith('## Skill Overlap Report')) {
                await github.rest.issues.deleteComment({
                  owner: context.repo.owner,
                  repo: context.repo.repo,
                  comment_id: comment.id,
                });
              }
            }

            await github.rest.issues.createComment({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: context.issue.number,
              body,
            });

      - name: Fail on collisions
        if: steps.overlap.outputs.exit_code != '0'
        run: exit 1
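To preview the comment locally before wiring it into CI, the same table can be rendered in Python. This is a rough port of the script above rather than part of cme. The function name `render_overlap_comment` is hypothetical, and it relies only on the report fields documented earlier.

```python
def render_overlap_comment(report):
    """Build the markdown body for a parsed overlap report (hypothetical helper)."""
    collisions = report.get("collisions") or []
    lines = [
        "## Skill Overlap Report",
        "",
        f"**Skills analyzed:** {report['total_skills_analyzed']}",
        f"**Collisions found:** {report['total_collisions']}",
        "",
    ]
    if not collisions:
        lines.append("No semantic collisions detected.")
    else:
        lines.append("| Severity | Skill A | Skill B | Overlapping Triggers |")
        lines.append("|----------|---------|---------|----------------------|")
        for c in collisions:
            triggers = ", ".join(c["overlapping_triggers"])
            lines.append(
                f"| {c['severity'].upper()} | `{c['skill_a']}` "
                f"| `{c['skill_b']}` | {triggers} |"
            )
    return "\n".join(lines) + "\n"
```

Passing the parsed contents of overlap-report.json to this function and printing the result shows roughly what the PR comment will contain.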

Local Usage

Run against a local plugins directory:

# Coverage check only (skip routing evals)
cme routing --plugins-dir ./plugins --threshold 0

# Full routing pipeline
cme routing --plugins-dir ./plugins --timeout 60

# Overlap detection
cme overlap --plugins-dir ./plugins

# Increase parallelism for large marketplaces
cme routing --plugins-dir ./plugins -j 8 --timeout 120

# Debug mode (verbose Agent SDK logging)
CME_DEBUG=1 cme routing --plugins-dir ./plugins

Two-Tier Eval Strategy

cme is designed as the fast, structural first tier of a two-tier evaluation approach:

Tier 1: cme (fast, cheap, structural)

- Runs in seconds to minutes
- Coverage checks require zero LLM calls
- Routing evals use one short Agent SDK call per test case
- Catches missing evals, broken routing, and skill collisions
- Runs on every PR in CI

Tier 2: Full LLM eval runners (deep, expensive)

- Runs multi-turn conversations testing skill behavior end-to-end
- Validates output quality, not just routing correctness
- Costs significantly more in tokens and time
- Runs on release branches or nightly schedules

cme answers "did Claude pick the right skill?" — it does not answer "did the skill produce a good result?" Use tier 1 to gate PRs cheaply, then run tier 2 for deeper validation on release candidates.

Development

uv sync
pre-commit install
make check   # lint + format + typecheck
make test    # pytest

Release history

- 0.3.0 (Apr 21, 2026)
- 0.2.1 (Apr 17, 2026)
- 0.2.0 (Apr 17, 2026)
- 0.1.2 (Apr 17, 2026), this version
- 0.1.1 (Apr 17, 2026)
- 0.1.0 (Apr 17, 2026)

Download files

Source Distribution

claude_marketplace_evaluator-0.1.2.tar.gz (94.4 kB view details)

Uploaded Apr 17, 2026 Source

Built Distribution

claude_marketplace_evaluator-0.1.2-py3-none-any.whl (17.5 kB view details)

Uploaded Apr 17, 2026 Python 3

File details

Details for the file claude_marketplace_evaluator-0.1.2.tar.gz.

File metadata

Download URL: claude_marketplace_evaluator-0.1.2.tar.gz
Upload date: Apr 17, 2026
Size: 94.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for claude_marketplace_evaluator-0.1.2.tar.gz
Algorithm	Hash digest
SHA256	`27b1b5f6bb4755f474d08bc7e752e0ed620b1e24479e6741a5a293dbd544a916`
MD5	`6d37b5f35e1c98acee585fbebfe91f0e`
BLAKE2b-256	`81ab376e9ba18a35c0d9b5a43929a73fc2972e25e57b592f62757ca8a959189b`

Provenance

The following attestation bundles were made for claude_marketplace_evaluator-0.1.2.tar.gz:

Publisher: release.yml on IceRhymers/claude-marketplace-evaluator

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: claude_marketplace_evaluator-0.1.2.tar.gz
- Subject digest: 27b1b5f6bb4755f474d08bc7e752e0ed620b1e24479e6741a5a293dbd544a916
- Sigstore transparency entry: 1331569349
- Sigstore integration time: Apr 17, 2026
Source repository:
- Permalink: IceRhymers/claude-marketplace-evaluator@b0f52cc926fba078925134defc65676002fcb742
- Branch / Tag: refs/tags/v0.1.2
- Owner: https://github.com/IceRhymers
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@b0f52cc926fba078925134defc65676002fcb742
- Trigger Event: push

File details

Details for the file claude_marketplace_evaluator-0.1.2-py3-none-any.whl.

File metadata

Download URL: claude_marketplace_evaluator-0.1.2-py3-none-any.whl
Upload date: Apr 17, 2026
Size: 17.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for claude_marketplace_evaluator-0.1.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5f6c9ca0410a138f1abcdae774c7ada3ea67a1f08058702d5aba324ac4a1e379`
MD5	`d847222384a40b3babc9c15c383bb2c0`
BLAKE2b-256	`e3067dd749f4f4bd7242c2a47faa9c96156f648d5d9920602442161ef1408344`

Provenance

The following attestation bundles were made for claude_marketplace_evaluator-0.1.2-py3-none-any.whl:

Publisher: release.yml on IceRhymers/claude-marketplace-evaluator

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: claude_marketplace_evaluator-0.1.2-py3-none-any.whl
- Subject digest: 5f6c9ca0410a138f1abcdae774c7ada3ea67a1f08058702d5aba324ac4a1e379
- Sigstore transparency entry: 1331569519
- Sigstore integration time: Apr 17, 2026
Source repository:
- Permalink: IceRhymers/claude-marketplace-evaluator@b0f52cc926fba078925134defc65676002fcb742
- Branch / Tag: refs/tags/v0.1.2
- Owner: https://github.com/IceRhymers
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@b0f52cc926fba078925134defc65676002fcb742
- Trigger Event: push
