CLI tool for managing MLPerf endpoint submissions

MLCommons Endpoints Submission Tools

A Python package with two tools for managing MLPerf Endpoints benchmark submissions:

  • endpoints-submission-cli — registers benchmark runs, assembles submission packages, runs compliance checks, and opens GitHub pull requests via the PRISM API.
  • submission-checker — validates a submission folder against the §9.1 automated compliance rules before or after upload.

Installation

With pip:

pip install endpoints-submission-cli

From source (editable):

pip install -e ".[dev]"

With uv:

uv sync --extra dev

endpoints-submission-cli

Requirements

  • Python 3.10 or later
  • gh CLI — required for creating, updating, and withdrawing submissions

Authentication

Every command requires a PRISM API token in mlc_… format. Set it in the PRISM_USER_API_TOKEN environment variable, or pass --token on an individual command to override it:

# Persistent (add to shell profile)
export PRISM_USER_API_TOKEN=mlc_your_token_here

# Per-command override
endpoints-submission-cli runs list --token mlc_your_token_here

Submission commands that create or update GitHub pull requests also require an authenticated gh CLI session:

gh auth login

Configuration

Environment variable     Default                                            Description
PRISM_USER_API_TOKEN     (none)                                             API key. Required unless --token is passed.
MLPERF_SUBMISSION_REPO   MLCommons-Systems/test-endpoints-submission-repo   Target GitHub repository for submission PRs (owner/repo).

Add to your shell profile for a persistent setup:

export PRISM_USER_API_TOKEN=mlc_your_token_here
export MLPERF_SUBMISSION_REPO=MLCommons-Systems/endpoints-submission-repo
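
The lookup order described above (a --token value overrides PRISM_USER_API_TOKEN, and MLPERF_SUBMISSION_REPO falls back to its default) can be sketched in Python. This is an illustration only, not the CLI's actual code: resolve_config and Config are hypothetical names, and the mlc_ prefix and owner/repo checks are assumptions drawn from the formats stated in this section.

```python
import os
from typing import Mapping, NamedTuple, Optional

# Default taken from the configuration table above.
DEFAULT_REPO = "MLCommons-Systems/test-endpoints-submission-repo"


class Config(NamedTuple):
    token: str
    repo: str


def resolve_config(cli_token: Optional[str] = None,
                   env: Optional[Mapping[str, str]] = None) -> Config:
    """Hypothetical helper: resolve the token and target repo as the table describes."""
    env = os.environ if env is None else env
    # A per-command --token value wins over the environment variable.
    token = cli_token or env.get("PRISM_USER_API_TOKEN")
    if not token:
        raise ValueError("Set PRISM_USER_API_TOKEN or pass --token")
    if not token.startswith("mlc_"):  # assumed format check
        raise ValueError("Token is expected to start with 'mlc_'")
    repo = env.get("MLPERF_SUBMISSION_REPO") or DEFAULT_REPO
    if repo.count("/") != 1:  # assumed owner/repo validation
        raise ValueError("MLPERF_SUBMISSION_REPO must be in owner/repo form")
    return Config(token=token, repo=repo)
```

Passing env explicitly, for example resolve_config(env={"PRISM_USER_API_TOKEN": "mlc_example"}), keeps the helper easy to exercise without touching the real environment.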

Quick start

# 1. Verify connectivity
endpoints-submission-cli runs list

# 2. Register a benchmark run from a local result folder
endpoints-submission-cli runs create --path /results/llama3_h100_c4
# → Run created: d5d9873e-5eca-4f8d-a487-4be1cb8b440c
RUN_ID=d5d9873e-5eca-4f8d-a487-4be1cb8b440c

# 3. Create a submission (assembles, checks, uploads, opens PR)
endpoints-submission-cli submissions create \
  --division standardized \
  --availability available \
  --run-ids $RUN_ID
# → Submission created: a1b2c3d4-…
# → PR: https://github.com/MLCommons-Systems/…/pull/42
SUB_ID=a1b2c3d4-e5f6-7890-abcd-ef1234567890

# 4. Add another run later
endpoints-submission-cli submissions add-run \
  --submission-id $SUB_ID \
  --run-id <new-run-id>

# 5. Withdraw if needed
endpoints-submission-cli submissions withdraw --submission-id $SUB_ID
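
When scripting the steps above, the run and submission IDs have to be captured from the CLI output rather than pasted by hand. The following Python sketch assumes the output contains lines like the "Run created: <uuid>" and "Submission created: <uuid>" comments shown in the quick start; the real output format may differ, and extract_id is a hypothetical helper, not part of the package.

```python
import re
from typing import Optional

# Matches a canonical 8-4-4-4-12 hexadecimal UUID.
UUID_RE = re.compile(
    r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b",
    re.IGNORECASE,
)


def extract_id(cli_output: str, label: str = "Run created:") -> Optional[str]:
    """Return the first UUID on a line containing `label`, or None if absent."""
    for line in cli_output.splitlines():
        if label in line:
            match = UUID_RE.search(line)
            if match:
                return match.group(0)
    return None
```

The captured text would typically come from subprocess.run(..., capture_output=True, text=True).stdout; use label="Submission created:" for the submissions create step.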

Command reference

endpoints-submission-cli
├── runs
│   ├── list        List all runs
│   ├── create      Register a run from a local folder
│   ├── get         Fetch run details
│   ├── delete      Delete a run and its archive
│   ├── pin         Pin a run (prevent expiry)
│   └── unpin       Restore normal expiry
└── submissions
    ├── list        List all submissions
    ├── create      Create a submission from runs (full pipeline)
    ├── get         Fetch submission details
    ├── update      Update run list or metadata
    ├── withdraw    Withdraw a submission
    ├── add-run     Add a run to an existing submission
    └── remove-run  Remove a run from a submission

Use --help on any command for full flag details:

endpoints-submission-cli submissions create --help

submission-checker

CLI tool for validating MLPerf Endpoints submissions against the §9.1 automated compliance checks.

Usage

Check a submission

submission-checker check /path/to/submission

The tool expects the submission root to contain systems/ and pareto/ subdirectories as specified in §8.1.

Options:

Flag                      Description
--strict                  Treat warnings as errors (exit 1 on any warning)
--quiet / -q              Suppress INFO-level passing checks
--output FILE / -o FILE   Write full results as JSON to FILE

Exit codes: 0 = all checks passed, 1 = one or more errors (or warnings with --strict).
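
For CI pipelines, the flags and exit codes above can be wrapped in a small Python helper. This is a sketch under assumptions: build_check_command and describe_exit_code are hypothetical names, and placing the flags after the path is assumed to be accepted by the CLI.

```python
from typing import List, Optional


def build_check_command(submission_dir: str, strict: bool = False,
                        quiet: bool = False,
                        output: Optional[str] = None) -> List[str]:
    """Build an argument list for `submission-checker check` from the documented flags."""
    cmd = ["submission-checker", "check", submission_dir]
    if strict:
        cmd.append("--strict")
    if quiet:
        cmd.append("--quiet")
    if output is not None:
        cmd += ["--output", output]
    return cmd


def describe_exit_code(code: int, strict: bool = False) -> str:
    """Map the documented exit codes to a short status string."""
    if code == 0:
        return "passed"
    if code == 1:
        # With --strict, warnings also produce exit code 1.
        return "errors or warnings" if strict else "errors"
    return f"unexpected exit code {code}"
```

The list returned by build_check_command can be passed directly to subprocess.run, and the resulting returncode to describe_exit_code.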

Show region boundaries

submission-checker regions --max-concurrency 1024

Prints the concurrency ranges for each region given a declared Maximum Supported Concurrency M (§5.5).

Required files in the submission structure

<org>/
├── systems/
│   └── <system_desc_id>.json         # §8.2 — hardware + software description
└── pareto/
    └── <system_desc_id>/
        └── <benchmark_model>/
            ├── points/
            │   └── point_<N>.yaml    # §8.3 — one config per measurement point
            ├── results/
            │   └── point_<N>/
            │       ├── mlperf_endpoints_log_summary.json
            │       └── mlperf_endpoints_log_detail.json
            └── accuracy/
                ├── accuracy.txt
                └── accuracy_result.json
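
Before running the full checker, the layout above can be sanity-checked with a few lines of Python. This sketch only mirrors the tree shown here and is not the checker's implementation; missing_layout_entries is a hypothetical helper, and it assumes each systems/<system_desc_id>.json file has a matching pareto/<system_desc_id>/ directory.

```python
from pathlib import Path
from typing import List

RESULT_LOGS = ("mlperf_endpoints_log_summary.json", "mlperf_endpoints_log_detail.json")
ACCURACY_FILES = ("accuracy.txt", "accuracy_result.json")


def missing_layout_entries(org_root: Path) -> List[str]:
    """Return relative paths from the documented layout that are absent under org_root."""
    missing = [f"{top}/" for top in ("systems", "pareto") if not (org_root / top).is_dir()]
    if missing:
        return missing
    for system_json in sorted((org_root / "systems").glob("*.json")):
        system_dir = org_root / "pareto" / system_json.stem
        if not system_dir.is_dir():
            missing.append(f"pareto/{system_json.stem}/")
            continue
        for model_dir in sorted(p for p in system_dir.iterdir() if p.is_dir()):
            rel = model_dir.relative_to(org_root).as_posix()
            points_dir = model_dir / "points"
            configs = sorted(points_dir.glob("point_*.yaml")) if points_dir.is_dir() else []
            if not configs:
                missing.append(f"{rel}/points/point_<N>.yaml")
            # Each point config needs a results/point_<N>/ folder with both logs.
            for cfg in configs:
                for log in RESULT_LOGS:
                    if not (model_dir / "results" / cfg.stem / log).is_file():
                        missing.append(f"{rel}/results/{cfg.stem}/{log}")
            for name in ACCURACY_FILES:
                if not (model_dir / "accuracy" / name).is_file():
                    missing.append(f"{rel}/accuracy/{name}")
    return missing
```

An empty list means every file in the tree above was found; it says nothing about whether the file contents are valid, which is what submission-checker verifies.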

What gets checked

Rule                              Spec    Description
path-exists                       §1      Submission root directory exists
required-dir                      §1      systems/ and pareto/ present
system-description-present        §1      At least one *.json file found in systems/
system-description-valid          §1      systems/*.json parses against schema
src-dir                           §1      src/ present for Standardized submissions
pareto-dir-exists                 §1      pareto/<system_id>/ directory exists
benchmark-model-dir               §1      At least one benchmark-model directory in pareto/<system_id>/
pareto-subdir                     §1      points/, results/, accuracy/ present
measurement-points-present        §1      At least one point_*.yaml found
point-config-valid                §1      YAML parses against PointConfig schema
point-filename-concurrency        §1      Filename concurrency matches declared value
result-file-present               §1      Result summary log exists for each point config
result-detail-present             §1      Result detail log exists for each point config
result-file-valid                 §1      Result summary log parses against PointSummary schema
point-count                       §2, §8  7–32 measurement points
point-cap                         §2, §8  Point count does not exceed 32
low-latency-coverage              §3      At least one point in Low Latency region
low-throughput-coverage           §4      At least one point in Low Throughput region
med-throughput-coverage           §5      At least one point in Medium Throughput region
high-throughput-coverage          §6      At least one point in High Throughput region
max-concurrency-declared          §7      max_supported_concurrency field present
region-computation                §7      M > 32 (required for region formula)
concurrency-in-range              §9      Concurrency within region bounds (incl. 10% margin)
load-pattern                      §10     load_pattern is concurrency with a positive concurrency level
point-duration                    §11     Point meets per-region minimum duration
min-query-count                   §12     n_samples_completed meets dataset-specific minimum (§6.4)
streaming-config                  §13     stream_all_chunks is True
metric-consistency-duration       §14     duration_ns > 0
metric-consistency-accounting     §14     completed + failed == issued
metric-consistency-output-tokens  §14     total_output_tokens ≥ 0
metric-consistency-system-tps     §9.1    Stored system_tps consistent with derived value
metric-consistency-tps-per-user   §9.1    Stored tps_per_user consistent with system_tps / concurrency
accuracy-file                     §15     accuracy.txt and accuracy_result.json present
accuracy-valid                    §15     accuracy_result.json parses correctly
accuracy-consistency              §15     passed flag consistent with score >= quality_target
accuracy-gate                     §15     Score ≥ quality target
config-consistency-dataset        §16     All points use the same dataset
config-consistency-model          §16     Directory name matches benchmark_model
region-declared                   §8.3    Declared region field (if present) is valid and matches computed region
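
To make the metric-consistency rules concrete, here is a rough Python sketch of how such checks could be expressed. It is not the checker's code. The keys completed, failed, and issued are hypothetical field names, the 1% relative tolerance is an assumption, and deriving system_tps as total_output_tokens divided by the duration in seconds is also an assumption; only the tps_per_user relation (system_tps / concurrency) is stated in the table.

```python
import math
from typing import Dict, List


def check_metric_consistency(summary: Dict[str, float], concurrency: int,
                             rel_tol: float = 0.01) -> List[str]:
    """Return the names of metric-consistency rules that the summary violates."""
    failed_rules = []
    if summary["duration_ns"] <= 0:
        failed_rules.append("metric-consistency-duration")
    if summary["completed"] + summary["failed"] != summary["issued"]:
        failed_rules.append("metric-consistency-accounting")
    if summary["total_output_tokens"] < 0:
        failed_rules.append("metric-consistency-output-tokens")
    if summary["duration_ns"] > 0:
        # Assumed derivation: output tokens per second over the whole run.
        derived_tps = summary["total_output_tokens"] / (summary["duration_ns"] / 1e9)
        if not math.isclose(summary["system_tps"], derived_tps, rel_tol=rel_tol):
            failed_rules.append("metric-consistency-system-tps")
    # Relation from the table: tps_per_user == system_tps / concurrency.
    expected_per_user = summary["system_tps"] / concurrency
    if not math.isclose(summary["tps_per_user"], expected_per_user, rel_tol=rel_tol):
        failed_rules.append("metric-consistency-tps-per-user")
    return failed_rules
```

The function assumes a positive concurrency, which the load-pattern rule already requires.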

Programmatic API

from pathlib import Path

from submission_checker import SubmissionChecker, Report

checker = SubmissionChecker(Path("/submissions/acme_corp"))
report = checker.run()

if report.passed:
    print("All checks passed")
else:
    for result in report.errors:
        print(f"[{result.rule}] {result.message}")

The Report object also exposes report.warnings and serialises cleanly via report.model_dump_json().
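
For logs or CI comments, the errors can be condensed into a per-rule summary. The sketch below relies only on the attributes mentioned above (passed, errors, warnings, and each result's rule) and assumes errors and warnings behave like lists. format_summary is a hypothetical helper, and the SimpleNamespace objects are stand-ins used here in place of a real Report.

```python
from collections import Counter
from types import SimpleNamespace


def format_summary(report) -> str:
    """Summarise a report-like object as a status line plus error counts per rule."""
    status = "PASS" if report.passed else "FAIL"
    lines = [f"{status}: {len(report.errors)} error(s), {len(report.warnings)} warning(s)"]
    counts = Counter(result.rule for result in report.errors)
    for rule, count in sorted(counts.items()):
        lines.append(f"  {rule}: {count}")
    return "\n".join(lines)


# Stand-in data for illustration; a real Report would come from checker.run().
fake_report = SimpleNamespace(
    passed=False,
    errors=[
        SimpleNamespace(rule="point-count", message="example message"),
        SimpleNamespace(rule="accuracy-gate", message="example message"),
    ],
    warnings=[],
)
print(format_summary(fake_report))
```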


Development

uv run pytest                          # run all tests
uv run pytest --no-cov -x             # fast fail on first error
uv run ruff check src/ tests/          # lint
uv run ruff format src/ tests/         # auto-format
