# hypnex-bench

Public eval + leaderboard for the Morpheus AI inference network. The off-chain implementation of MRC 76 (Agent Performance Benchmarking).
```shell
pip install hypnex-bench
```
## What it does
Runs a small, reproducible probe set against every LLM on the Morpheus network, collects per-model pass-rates, latencies (p50/p95), and token counts, and renders a markdown leaderboard. Designed to run nightly so the data flywheel compounds.
Default suites (~19 probes total; a full run across all live LLMs typically costs under $0.20 of MOR):

| Suite | Probes | What it tests |
|---|---|---|
| coding | 6 | HumanEval-style: the model writes a Python function, which we exec and assert on |
| math | 8 | GSM8K-style word problems with deterministic numeric answers |
| json | 5 | Strict JSON adherence: does the model produce parseable, schema-matching JSON? |
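A GSM8K-style probe can be graded deterministically because the expected answer is a single number. A minimal sketch of such a grader (illustrative only — the function name, tolerance, and "last number wins" convention are assumptions, not hypnex-bench internals):

```python
import re

def grade_numeric(completion: str, expected: float, tol: float = 1e-6) -> bool:
    """Pass if the last number in the model's completion matches the expected answer."""
    # Strip thousands separators, then find every integer or decimal in the text.
    # Models often restate intermediate figures, so only the final number is
    # treated as the answer.
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion.replace(",", ""))
    if not numbers:
        return False
    return abs(float(numbers[-1]) - expected) <= tol

print(grade_numeric("16 - 3 - 4 = 9 eggs, so 9 * 2 = 18 dollars. Answer: 18", 18))  # True
```

This is what "deterministic numeric answers" buys you: no LLM judge, no rubric, just a regex and a comparison.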
## Quickstart

```shell
# 1. List available LLMs (no key needed, public registry)
hypnex-bench models

# 2. Run all suites against the default LLM set (key required, costs MOR)
HYPNEX_API_KEY=mor_xxx hypnex-bench run

# 3. Render the leaderboard from data/latest.json
hypnex-bench leaderboard
```
## Programmatic

```python
from hypnex_bench import BenchRunner, all_suites, to_markdown

runner = BenchRunner(api_key="mor_...")
results = runner.run(["mistral-31-24b", "glm-5"], all_suites())
print(to_markdown(results))
```
## CLI reference

```text
hypnex-bench models              # list active LLMs

hypnex-bench run [options]
  --models a,b,c                 # comma-separated list (default: all live LLMs)
  --limit N                      # only the first N models (when --models is omitted)
  --suite SUITE                  # all | coding | math | json | a,b
  --output DIR                   # output dir (default: ./data)
  --api-key KEY                  # override HYPNEX_API_KEY
  --base-url URL                 # override https://api.mor.org/api/v1

hypnex-bench leaderboard [options]
  --input DIR                    # dir containing latest.json (default: ./data)
  --output FILE                  # write to a file (default: stdout)
```
## Output

```text
data/
  run-20260507T031502Z.jsonl   # one full run, append-only
  run-20260508T031455Z.jsonl
  ...
  latest.json                  # snapshot of the most recent run
```

latest.json is what the leaderboard renderer (and any future static-site generator) consumes.
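The exact schema of latest.json isn't documented here, so any consumer sketch has to assume one. The snippet below assumes a minimal shape — a list of per-model records with `model`, `passed`, and `total` fields — purely for illustration; check the real file before relying on these names:

```python
def leaderboard_rows(results: list[dict]) -> list[tuple[str, float]]:
    """Sort models by overall pass-rate, descending."""
    rows = [(r["model"], r["passed"] / r["total"]) for r in results if r["total"]]
    return sorted(rows, key=lambda row: row[1], reverse=True)

# In real use: results = json.load(open("data/latest.json"))
# Hypothetical records in the assumed shape:
sample = [
    {"model": "mistral-31-24b", "passed": 14, "total": 19},
    {"model": "glm-5", "passed": 16, "total": 19},
]
for model, rate in leaderboard_rows(sample):
    print(f"{model}: {rate:.0%}")
# glm-5: 84%
# mistral-31-24b: 74%
```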
## Why not just use HumanEval / GSM8K / MMLU directly?

Those benchmarks have leaked into model training data. The probes here are small-set, slightly rephrased variations chosen to be:

- **cheap**: a full run costs cents, not dollars;
- **language-canonical**: Python only for coding; ASCII text and ASCII numerals for math;
- **verifiable without an LLM grader**: deterministic evaluators that exec or regex-match.

For canonical leaderboard claims, swap these probe sets for the official suites; the runner architecture stays the same.
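For the coding suite, "exec and assert" grading can be as simple as running the model-written function in a scratch namespace and checking known input/output pairs. A minimal sketch — names and signature are illustrative, not hypnex-bench internals, and a real runner should sandbox the `exec` call since it runs untrusted model output:

```python
def grade_coding(candidate_src: str, entry_point: str,
                 checks: list[tuple[tuple, object]]) -> bool:
    """Exec the model-written source, then assert it against known cases."""
    namespace: dict = {}
    try:
        exec(candidate_src, namespace)  # WARNING: untrusted code; sandbox in production
        fn = namespace[entry_point]
        return all(fn(*args) == expected for args, expected in checks)
    except Exception:
        # Syntax errors, a missing function, or a wrong answer all count as a fail.
        return False

src = "def add(a, b):\n    return a + b\n"
print(grade_coding(src, "add", [((1, 2), 3), ((-1, 1), 0)]))  # True
```

Because the verdict comes from `exec` plus equality checks, it is fully deterministic — no second model in the loop.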
## Tests

```shell
pip install -e ".[dev]"
pytest   # 17 pure-Python evaluator tests, no API key needed
```
## Status & affiliation

Hypnex Labs draft of MRC 76. Not affiliated with the Morpheus AI Foundation. Suite definitions are MIT-licensed; PRs adding probes are welcome.
## License

MIT
## File details

### hypnex_bench-0.1.0.tar.gz
- Download URL: hypnex_bench-0.1.0.tar.gz
- Upload date:
- Size: 51.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes:

| Algorithm | Hash digest |
|---|---|
| SHA256 | f6caf0c386a5d2cefe465032c5a8e87e49078dceaa8015211e0927fe232ed323 |
| MD5 | e799e0ec7c4c58049951746b7c90148a |
| BLAKE2b-256 | e55917814ee1fafe4cddf31ef0b0e2b23402bcd27e0f094e58b54241c049d550 |
### hypnex_bench-0.1.0-py3-none-any.whl
- Download URL: hypnex_bench-0.1.0-py3-none-any.whl
- Upload date:
- Size: 18.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes:

| Algorithm | Hash digest |
|---|---|
| SHA256 | 9afac1913fc8f63d135bcb434473477ff4bad2454b3f3bb5cff372abb777b529 |
| MD5 | 4e745802cdc9ddbebb68ff3919e20e73 |
| BLAKE2b-256 | dc763b1d7f3a2106e5a9db036a408a5c31946b9f2a03b3377d7022d86adb535b |