separates precious LLMs from base LLMs. works with any OpenAI- or Anthropic-compatible API

cupel

separates precious LLMs from base LLMs

score local and cloud LLMs with custom prompts and a configurable judge

[screenshot: cupel leaderboard, models ranked by score]

a `cupel` is the small dish used in a fire assay to separate precious metal from base metal

install

curl -fsSL https://cupel.run/install | bash

or

pip install cupel

the UI is bundled in the package

quick start

cupel

opens a browser at localhost:8042

ships with example data (8 models scored by Claude Opus 4.6 on 8 prompts) — the dashboard is populated on first launch

LLM-assisted authoring — describe what you want to test, an LLM drafts the prompt and 0–3 rubric
local + cloud — oMLX, Ollama, LM Studio, SGLang, OpenRouter, Anthropic, OpenAI
configurable judge — any model can score responses on a 0–3 rubric with reasoning
thinking model support — separates <think> blocks from answers, only judges the response
multi-turn + tool calling — multi-step conversations with injected tool results
speed tracking — tok/s and response times per model
auto-discovery — probes known ports for local inference servers

leaderboard

leaderboard with score vs. speed, overall accuracy, and per-category breakdowns

[screenshots: score vs. speed scatter plot with leaderboard; overall accuracy as a horizontal bar chart of all models; category fingerprint as a radar chart comparing models across categories]

score models

select models from discovered providers, filter by prompt category, choose a judge model, and start the run

progress updates via SSE as each prompt completes
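
A minimal sketch of reading such a stream from Python, assuming the standard `text/event-stream` framing (`data:` lines, with a blank line ending each event) and JSON payloads. The function name `iter_sse_events` and the payload fields used in the example (`prompt_id`, `score`) are hypothetical; the actual event shape cupel emits is not described here.

```python
import json
from typing import Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one decoded JSON payload per server-sent event.

    Only `data:` fields are handled; comment lines (starting with ':')
    and other fields are skipped.
    """
    data_parts: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # a blank line terminates the current event
            if data_parts:
                yield json.loads("\n".join(data_parts))
                data_parts = []
            continue
        if line.startswith("data:"):
            value = line[len("data:"):]
            # the SSE format drops a single leading space after the colon
            data_parts.append(value[1:] if value.startswith(" ") else value)
```

Any iterable of text lines works as input, so the same function can be fed from an HTTP client's line iterator or from a recorded stream.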

author prompts

describe what to test, select a category and difficulty — an LLM generates the title, prompt text, and 0–3 rubric. edit before saving

[screenshots: authoring a prompt; authored prompt with rubric]

results

each run is saved as JSON with the model, judge, timestamp, and per-prompt scores with judge reasoning
results can be sorted, tagged, muted and expanded to inspect individual evaluations

judge

set a default judge in UI settings or in config.yml:

judge:
  model: claude-opus-4-6

scores are 0–3:

score	meaning
3	correct and insightful
2	correct but shallow
1	partially correct
0	wrong or hallucinated
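
One way to summarize a set of 0–3 scores is as a share of the maximum attainable total. The sketch below is only an illustration: `percent_of_max` is a hypothetical helper, and whether cupel derives its "overall accuracy" this way is an assumption.

```python
MAX_SCORE = 3  # top of the 0-3 rubric


def percent_of_max(scores: list[int]) -> float:
    """Return the summed scores as a percentage of the best possible total."""
    if not scores:
        return 0.0
    for score in scores:
        if score not in (0, 1, 2, 3):
            raise ValueError(f"score outside the 0-3 rubric: {score!r}")
    return 100.0 * sum(scores) / (MAX_SCORE * len(scores))
```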

prompt format

{
  "id": 14,
  "category": "math_estimation",
  "title": "Model Memory from Quantization",
  "prompt": "A model has 70B parameters. Estimate memory for FP16, 8-bit, and 4-bit.",
  "rubric": {
    "3": "FP16: ~140GB, 8-bit: ~70GB, 4-bit: ~35GB. Shows the math.",
    "2": "Correct for 2 of 3, or all correct but no explanation.",
    "1": "Gets the direction right but wrong numbers.",
    "0": "Wrong math or doesn't understand quantization."
  }
}
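
A small sanity check can catch malformed entries before a run. This is a sketch under assumptions: `validate_prompt` is a hypothetical helper, and the set of required fields (an integer `id`, a `title`, exactly one of `prompt` or `turns`, and rubric levels "0" through "3") is inferred from the examples in this README rather than from a published schema.

```python
RUBRIC_LEVELS = ("0", "1", "2", "3")


def validate_prompt(entry: dict) -> list[str]:
    """Return a list of problems; an empty list means the entry looks well-formed."""
    problems = []
    if not isinstance(entry.get("id"), int):
        problems.append("id must be an integer")
    if not isinstance(entry.get("title"), str) or not entry.get("title"):
        problems.append("title must be a non-empty string")
    # single-turn entries use "prompt", multi-turn entries use "turns"
    if ("prompt" in entry) == ("turns" in entry):
        problems.append("exactly one of 'prompt' or 'turns' is expected")
    rubric = entry.get("rubric")
    if not isinstance(rubric, dict):
        problems.append("rubric must be an object")
    else:
        missing = [level for level in RUBRIC_LEVELS if level not in rubric]
        if missing:
            problems.append("rubric is missing levels: " + ", ".join(missing))
    return problems
```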

multi-turn prompts

for tool calling and conversations, use `turns` instead of `prompt`:

{
  "id": 21,
  "title": "Tool Calling — School Status Check",
  "turns": [
    {
      "messages": [
        {"role": "system", "content": "You have tools: get_grades(name), ..."},
        {"role": "user", "content": "How are both kids doing?"}
      ]
    },
    {
      "inject_after": [
        {"role": "user", "content": "Tool results: get_grades(\"phoebe\") => ..."}
      ]
    }
  ],
  "rubric": { "3": "Emits correct tool calls, synthesizes results...", "..." : "..." }
}
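
To make the turn structure concrete, here is a sketch of how such an entry could be replayed against a model. It assumes the model is called once per turn and that `inject_after` messages are appended after the previous assistant reply; cupel's actual replay logic may differ. `run_turns` and the `ask` callback are hypothetical names.

```python
from typing import Callable

Message = dict  # {"role": ..., "content": ...}


def run_turns(turns: list[dict], ask: Callable[[list[Message]], str]) -> list[Message]:
    """Replay `turns` against `ask` and return the full transcript."""
    transcript: list[Message] = []
    for turn in turns:
        # a turn contributes either fresh messages or injected tool results
        transcript.extend(turn.get("messages", []))
        transcript.extend(turn.get("inject_after", []))
        # `ask` gets a copy of the conversation so far and returns the reply text
        reply = ask(list(transcript))
        transcript.append({"role": "assistant", "content": reply})
    return transcript
```

Passing the model call in as a function keeps the sketch independent of any particular provider and makes it easy to exercise with a stub.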

thinking models

cupel handles <think> blocks automatically — separates thinking from the answer, only judges the response:

thinking: null   # model default (recommended)
thinking: 0      # disable
thinking: 4096   # explicit budget
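
The separation step can be pictured with a short regex-based sketch. `split_thinking` is a hypothetical helper, not cupel's own code, and it assumes well-formed, non-nested `<think>...</think>` pairs; an unclosed tag is left in the answer untouched.

```python
import re

# non-greedy match so several separate blocks are captured individually
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(text: str) -> tuple[str, str]:
    """Return (thinking, answer); only `answer` would be shown to the judge."""
    thinking = "\n".join(part.strip() for part in THINK_BLOCK.findall(text))
    answer = THINK_BLOCK.sub("", text).strip()
    return thinking, answer
```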

providers

cloud providers can be added from presets (Anthropic, OpenRouter, OpenAI) or as custom endpoints. the settings page fetches model lists from a provider's API (includes per-token pricing for OpenRouter), validates API keys, and tests connections

cupel auto-discovers local servers on known ports:

port	server
8000	oMLX / vLLM
11434	Ollama
1234	LM Studio
30000	SGLang
8080	llama.cpp
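
Port probing of this kind can be sketched with plain TCP connection attempts. The port table is taken from above; the function names, the default host, and the timeout are assumptions. A successful connect only shows that something is listening, so a real probe would likely follow up with an HTTP request to confirm which server it is.

```python
import socket

KNOWN_PORTS = {
    8000: "oMLX / vLLM",
    11434: "Ollama",
    1234: "LM Studio",
    30000: "SGLang",
    8080: "llama.cpp",
}


def is_port_open(host: str, port: int, timeout: float = 0.25) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # refused, timed out, or unreachable
        return False


def discover(host: str = "127.0.0.1", ports: dict = KNOWN_PORTS) -> dict:
    """Return the subset of `ports` that currently accepts connections."""
    return {port: name for port, name in ports.items() if is_port_open(host, port)}
```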

API keys

each provider gets its own env var. put them in .env or ~/.cupel/.env:

OMLX_API_KEY=4242
ANTHROPIC_API_KEY=sk-ant-...
OPENROUTER_API_KEY=sk-or-...
OPENAI_API_KEY=sk-proj-...
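
For reference, a file in this `KEY=value` shape can be read with a few lines of Python. `parse_env` is a hypothetical helper covering only the simple cases (comments, blank lines, optional quotes); how cupel itself loads the files, and which of the two locations takes precedence, is not stated here.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=value lines, ignoring blank lines and '#' comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # split on the first '=' only, so values may contain '='
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values
```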

or configure in config.yml:

providers:
  - name: openrouter
    api_url: https://openrouter.ai/api/v1/chat/completions
    api_key_env: OPENROUTER_API_KEY
    models: [google/gemini-2.5-pro, deepseek/deepseek-r1]

  - name: anthropic
    api_url: https://api.anthropic.com/v1/messages
    api_key_env: ANTHROPIC_API_KEY
    models: [claude-opus-4-6, claude-sonnet-4-6]
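
The `api_key_env` indirection keeps secrets out of the config file: the entry names an environment variable, and the key is looked up at request time. The sketch below shows that lookup together with the header conventions of the two API styles (Anthropic's Messages API takes `x-api-key` plus an `anthropic-version` header; OpenAI-compatible endpoints take a bearer token). `auth_headers` is a hypothetical helper, and choosing the style by inspecting the URL is an assumption made for illustration.

```python
import os
from typing import Mapping


def auth_headers(provider: dict, env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Build request headers for a provider entry shaped like the config above."""
    key = env.get(provider["api_key_env"], "")
    if not key:
        raise KeyError(f"environment variable {provider['api_key_env']} is not set")
    if "api.anthropic.com" in provider["api_url"]:
        return {
            "x-api-key": key,
            "anthropic-version": "2023-06-01",
            "content-type": "application/json",
        }
    # OpenAI-compatible endpoints (OpenRouter, OpenAI, most local servers)
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```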

CLI

cupel                                  # open dashboard
cupel run                              # collect responses
cupel run --models "Qwen3.5-27B-8bit"  # specific model
cupel run --prompts 18-22              # specific prompts
cupel judge eval-results/*.json        # score with judge
cupel judge eval-results/*.json --judge-model gemma-4-26b-a4b-it-4bit
cupel init                             # create config.yml + eval-set

development

git clone https://github.com/tolitius/cupel.git && cd cupel
pip install -e .
uvicorn cupel.server:app --reload --port 8042

vanilla JS frontend (Preact + HTM from CDN). no build step.

license

Distributed under the Apache 2.0 License.

Release history

version	date
0.1.82	May 16, 2026
0.1.81	May 16, 2026
0.1.80	May 16, 2026 (this version)
0.1.79	May 16, 2026
0.1.77	May 15, 2026
0.1.76	May 15, 2026
0.1.75	May 15, 2026
0.1.74	May 15, 2026
0.1.73	May 14, 2026
0.1.72	May 14, 2026
0.1.71	May 14, 2026
0.1.70	May 13, 2026
0.1.69	Apr 24, 2026
0.1.68	Apr 12, 2026
0.1.67	Apr 12, 2026
0.1.66	Apr 10, 2026
0.1.65	Apr 9, 2026
0.1.64	Apr 8, 2026

Download files

Source Distribution

cupel-0.1.80.tar.gz (5.3 MB)

Uploaded May 16, 2026 Source

Built Distribution

cupel-0.1.80-py3-none-any.whl (222.3 kB)

Uploaded May 16, 2026 Python 3

File details

Details for the file cupel-0.1.80.tar.gz.

File metadata

Download URL: cupel-0.1.80.tar.gz
Upload date: May 16, 2026
Size: 5.3 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.6

File hashes

Hashes for cupel-0.1.80.tar.gz
Algorithm	Hash digest
SHA256	`cbe81b8b7f1c310db3f6dc3b823195884a9fdb130e4ace6c09a483ed8c75ae3c`
MD5	`fe60b03b69afe463f7909f07e62f95c4`
BLAKE2b-256	`4f833d215bff40431565e32194c9e88f077ee2be78e489686ab1ee6a88c6d6db`

File details

Details for the file cupel-0.1.80-py3-none-any.whl.

File metadata

Download URL: cupel-0.1.80-py3-none-any.whl
Upload date: May 16, 2026
Size: 222.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.6

File hashes

Hashes for cupel-0.1.80-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f4ead0defe5f20427acfd0d8deba802aeecffe3e600e7d86cf91678f220fcf8e`
MD5	`dbcf0908704cf9a736f4e41afbd3bf92`
BLAKE2b-256	`16ca2db874476e1e9dbb017654760670a890dd68779bc0417281905d350df151`
