Find the cheapest LLM that gets your task 100% right

SkillEval

SkillEval is a CLI tool that automates LLM evaluation for deterministic tasks. It runs your task across multiple models in parallel, compares outputs against expected results, and recommends the most cost-effective option.
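
The selection rule in that description (keep only models that match the expected output every time, then take the cheapest) can be sketched in a few lines of Python. This is an illustrative sketch, not SkillEval's implementation: the `ModelResult` record, its field names, the model names, and the costs are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelResult:
    """Hypothetical per-model summary; not SkillEval's real data model."""
    model: str
    passed: int          # trials whose output matched the expected result
    total: int           # trials attempted
    cost_per_run: float  # assumed average cost of one run


def cheapest_perfect(results: list[ModelResult]) -> Optional[ModelResult]:
    """Return the lowest-cost model that passed every trial, or None."""
    perfect = [r for r in results if r.total > 0 and r.passed == r.total]
    return min(perfect, key=lambda r: r.cost_per_run, default=None)


# Made-up numbers for illustration only.
results = [
    ModelResult("model-a", passed=10, total=10, cost_per_run=0.004),
    ModelResult("model-b", passed=9, total=10, cost_per_run=0.001),
    ModelResult("model-c", passed=10, total=10, cost_per_run=0.002),
]
best = cheapest_perfect(results)
print(best.model if best else "no model passed every trial")
```

Here `model-b` is the cheapest overall but is excluded because it missed one trial, so `model-c` is selected.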

Quick Start

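# Install from a source checkout (the published release can be installed with: pip install skilleval)
pip install -e .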

# Set at least one provider API key
export DASHSCOPE_API_KEY="sk-..."   # Qwen (Alibaba DashScope)

# Create a task folder
skilleval init my-task

# Add your input files, expected output, and skill prompt
# Then run the evaluation
skilleval run my-task/

# Machine-readable output
skilleval run my-task/ --json | jq '.recommendation'
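
For scripts that prefer Python over jq, the same field can be read with the standard library. Only the top-level `recommendation` key is taken from the command above; the helper name and the rest of the sample payload are assumptions made for illustration, and the real schema may differ.

```python
import json


def extract_recommendation(raw: str):
    """Return the top-level 'recommendation' field of a JSON document, or None."""
    payload = json.loads(raw)
    if not isinstance(payload, dict):
        return None
    return payload.get("recommendation")


# Sample payload for illustration only; not actual SkillEval output.
sample = '{"recommendation": "example-model", "results": []}'
print(extract_recommendation(sample))
```

In a pipeline, pass `sys.stdin.read()` to the function instead of the sample string.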

Supported Providers

Provider	Platform	Env Variable
Qwen	Alibaba Cloud / DashScope	`DASHSCOPE_API_KEY`
GLM	Zhipu AI / BigModel	`ZHIPU_API_KEY`
MiniMax	MiniMax	`MINIMAX_API_KEY`
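
A small pre-flight check can confirm which of these keys are set before starting a run. The sketch below uses only the provider names and environment variables from the table; the `configured_providers` helper is hypothetical and not part of SkillEval.

```python
import os

# Provider names and environment variables from the table above.
PROVIDER_ENV_VARS = {
    "Qwen": "DASHSCOPE_API_KEY",
    "GLM": "ZHIPU_API_KEY",
    "MiniMax": "MINIMAX_API_KEY",
}


def configured_providers(env=None):
    """List providers whose API key variable is set to a non-empty value."""
    env = os.environ if env is None else env
    return [name for name, var in PROVIDER_ENV_VARS.items() if env.get(var)]


# Pass a dict to test without touching the real environment.
print(configured_providers({"DASHSCOPE_API_KEY": "sk-example"}))
```

Calling `configured_providers()` with no argument inspects the current process environment.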

Evaluation Modes

Mode 1 (run) — You write the prompt (skill), SkillEval tests it across models.
Mode 2 (matrix) — One model writes the prompt, another executes it. Tests all creator x executor combinations.
Mode 3 (chain) — A meta-skill guides prompt creation, then another model executes it. Full pipeline evaluation.
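
To get a feel for how quickly matrix mode grows, the creator and executor combinations can be enumerated as a Cartesian product. This is a sketch with placeholder model names; how SkillEval actually builds and schedules the pairs (for example, whether a model may execute its own prompt) is an assumption here.

```python
from itertools import product

# Placeholder model names; substitute entries from your own catalog.
creators = ["model-a", "model-b"]
executors = ["model-a", "model-b", "model-c"]

# Every (creator, executor) pair, assuming self-pairs are allowed.
pairs = list(product(creators, executors))

print(len(pairs))
for creator, executor in pairs:
    print(f"{creator} writes the prompt, {executor} runs it")
```

The number of runs is the number of creators times the number of executors, so cost scales multiplicatively as models are added.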

Additional Features

Ad-hoc endpoints — Use any OpenAI-compatible API without editing the catalog: --endpoint, --api-key, --model-name.
Skill linting (lint) — Validate Claude Code skill structure (frontmatter, phases, references, code blocks).
Skill testing (skill-test) — Test a skill's core prompt logic against expected outputs.
Run comparison (compare) — Diff two runs to detect improvements or regressions.
HTML reports (report --html) — Generate self-contained HTML reports for sharing.
JSON output (--json) — Machine-readable JSON on run, matrix, chain, catalog, and report commands for piping into other tools.
Verbose logging (-v / -vv) — -v for INFO, -vv for DEBUG. Logs go to stderr so they don't interfere with --json output.
Auto-confirm (--yes / -y) — Skip the confirmation prompt on chain (replaces the old --confirm flag).
Config validation — Warns on unknown keys in config.yaml and validates comparator names at load time.
Circuit breaker — Automatically skips a provider after 5 consecutive failures, avoiding wasted time and cost.
Ctrl+C handling — Saves partial results on interrupt so you never lose a half-finished run.
Friendly errors — No raw tracebacks by default; use -vv to see full stack traces when debugging.
Progress bar — Now shows elapsed time and ETA alongside the completion percentage.
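
The circuit breaker behavior listed above can be illustrated with a simple counter per provider. This is a minimal sketch, not SkillEval's code: the class and method names are hypothetical, and resetting the streak on a success is an assumption based on the word "consecutive".

```python
class CircuitBreaker:
    """Skip a provider after `threshold` consecutive failures (sketch)."""

    def __init__(self, threshold: int = 5):
        self.threshold = threshold
        self.failures: dict[str, int] = {}

    def is_open(self, provider: str) -> bool:
        """True when the provider should be skipped."""
        return self.failures.get(provider, 0) >= self.threshold

    def record(self, provider: str, ok: bool) -> None:
        """A success resets the streak (assumed); a failure extends it."""
        if ok:
            self.failures[provider] = 0
        else:
            self.failures[provider] = self.failures.get(provider, 0) + 1


breaker = CircuitBreaker()
for _ in range(5):
    breaker.record("provider-x", ok=False)
print(breaker.is_open("provider-x"))  # True
```

A caller would check `is_open` before each request and skip the provider when it returns True.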

Documentation

See the User Manual (also available in Chinese) for detailed setup instructions, configuration options, comparator reference, and walkthroughs.

Development

pip install -e ".[dev,docs]"
pytest
ruff check src/ tests/

See CONTRIBUTING.md (also available in Chinese) for full contributor guidelines.

License

MIT

This version

0.1.0

Mar 3, 2026

Download files

Source Distribution

skilleval-0.1.0.tar.gz (109.0 kB view details)

Uploaded Mar 3, 2026 Source

Built Distribution

skilleval-0.1.0-py3-none-any.whl (53.1 kB view details)

Uploaded Mar 3, 2026 Python 3

File details

Details for the file skilleval-0.1.0.tar.gz.

File metadata

Download URL: skilleval-0.1.0.tar.gz
Upload date: Mar 3, 2026
Size: 109.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.13

File hashes

Hashes for skilleval-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`ba48757b19300bb1da97b21845e1eb409aada23f528015ed53ae92765f428ca2`
MD5	`00eb39b64b43aac573205bd6091c7a92`
BLAKE2b-256	`0e4b23f530483b5ec9898f4db32ddfe894d61d37ec6d5d0bb519bb2e75ed117b`

File details

Details for the file skilleval-0.1.0-py3-none-any.whl.

File metadata

Download URL: skilleval-0.1.0-py3-none-any.whl
Upload date: Mar 3, 2026
Size: 53.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.13

File hashes

Hashes for skilleval-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`3d7188d498e8ef4190554ac062c0448f0b0585110aaca5a20742cf2260b0b20b`
MD5	`01a21de3e63b62b5af02a249fae61f5f`
BLAKE2b-256	`2cce0fc7f373760b319a0707e208de8326e799e2348750b6cd3a7954ec698dfe`
