pondus

Opinionated AI model benchmark aggregator.

crates.io License: MIT

What it does

Aggregates AI model benchmark data from eight trusted sources into a unified JSON schema. Designed for AI agents (Claude Code, etc.) to consume programmatically. Results are cached for 24 hours by default (see ttl_hours under Configuration) to avoid rate limiting.

Sources

Source               Type                  Data
Artificial Analysis  agent-browser scrape  Intelligence index, speed, pricing
LM Arena (LMSYS)     Community JSON        ELO ratings from human preferences
SWE-bench            GitHub JSON           Code generation resolve rates
SWE-rebench          agent-browser scrape  Code generation resolve rates (rebench variant)
Aider                GitHub YAML           Polyglot coding benchmark pass rates
LiveBench            HuggingFace API       Multi-domain benchmark scores
Terminal-Bench       HuggingFace YAML      Terminal/CLI task completion
SEAL                 agent-browser scrape  Scale AI multi-benchmark evaluations

Note: Sources marked "agent-browser scrape" require the agent-browser CLI; all other sources work out of the box. LiveBench data depends on the upstream HuggingFace dataset, which may lag behind other sources.

Installation

cargo install pondus

Usage

pondus rank                     # rank all models (default command)
pondus                          # same as `pondus rank`
pondus rank --top 10            # top 10 only
pondus check claude-opus-4.6    # check one model across all sources
pondus compare gpt-5.2 claude-opus-4.6  # head-to-head comparison
pondus sources                  # show source status
pondus refresh                  # clear cache and re-fetch

Global Flags

Flag                 Description
--format json|table  Output format (JSON is the default)
--refresh            Bypass cache for this run

Configuration

Config location: ~/.config/pondus/config.toml

[cache]
ttl_hours = 24

[alias]
path = "models.toml"  # relative to config dir, or absolute path

[sources.artificial_analysis]
api_key = "your-key"  # optional, for AA source

[sources.agent_browser]
path = "agent-browser"  # path to agent-browser CLI
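
The interaction between cache.ttl_hours and the --refresh flag can be sketched as below. This is a minimal illustration, not pondus's actual cache code: the function names is_fresh and should_fetch are hypothetical, and treating a cache timestamp that lies in the future as stale is an assumption.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical helper: a cache entry is fresh while its age is below the TTL.
fn is_fresh(fetched_at: SystemTime, now: SystemTime, ttl_hours: u64) -> bool {
    let ttl = Duration::from_secs(ttl_hours * 3600);
    match now.duration_since(fetched_at) {
        Ok(age) => age < ttl,
        // Assumption: a timestamp later than `now` (clock skew) counts as stale.
        Err(_) => false,
    }
}

/// Hypothetical helper: decide whether a source has to be fetched again.
/// `refresh` stands in for the --refresh flag, which bypasses the cache.
fn should_fetch(
    cached_at: Option<SystemTime>,
    now: SystemTime,
    ttl_hours: u64,
    refresh: bool,
) -> bool {
    if refresh {
        return true;
    }
    match cached_at {
        Some(t) => !is_fresh(t, now, ttl_hours),
        None => true, // nothing cached yet
    }
}
```

With ttl_hours = 24, an entry fetched one hour ago is reused, while an entry fetched two days ago, a missing entry, or any run with --refresh triggers a fetch.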

Model Aliases

Different benchmarks use different naming conventions. models.toml maps canonical model names to source-specific variants:

[claude-opus-4_6]
canonical = "claude-opus-4.6"
aliases = [
  "Claude Opus 4.6",
  "claude-opus-4-6",
  "anthropic/claude-opus-4.6",
  "Opus 4.6",
]

When you run pondus check opus-4.6, pondus resolves the alias to the canonical name and matches it across all sources. Prefix matching also works automatically: gemini-2.5-pro-preview-06-05 matches gemini-2.5-pro because the remaining suffix starts with -. PRs adding new models are welcome.
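
The matching rule described above can be sketched in a few lines of Rust. This is an illustration under stated assumptions, not the code pondus ships: matches_canonical and resolve are hypothetical names, the alias table is passed in as a plain slice instead of being loaded from models.toml, and comparisons are exact (no case folding or whitespace normalization).

```rust
/// Hypothetical helper: true when `source_name` refers to the model `canonical`.
/// Either the names are equal, or `canonical` is a prefix and the leftover
/// suffix starts with '-'.
fn matches_canonical(source_name: &str, canonical: &str) -> bool {
    match source_name.strip_prefix(canonical) {
        Some(rest) => rest.is_empty() || rest.starts_with('-'),
        None => false,
    }
}

/// Hypothetical helper: look `query` up in an alias table of
/// (canonical name, aliases) pairs, then fall back to prefix matching.
fn resolve<'a>(query: &str, table: &'a [(&'a str, &'a [&'a str])]) -> Option<&'a str> {
    // First pass: exact match on the canonical name or any listed alias.
    for (canonical, aliases) in table {
        if *canonical == query || aliases.iter().any(|a| *a == query) {
            return Some(*canonical);
        }
    }
    // Second pass: prefix rule, e.g. a dated preview name.
    table
        .iter()
        .map(|(canonical, _)| *canonical)
        .find(|canonical| matches_canonical(query, canonical))
}
```

The '-' check is what keeps a name such as gemini-2.5-product from being treated as a variant of gemini-2.5-pro.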

Output Format

Default JSON output:

{
  "timestamp": "2026-02-27T10:30:00Z",
  "query": { "query_type": "rank" },
  "sources": [
    {
      "source": "arena",
      "status": "ok",
      "scores": [
        { "model": "gpt-5.2", "rank": 1, "metrics": { "elo": 1350 } }
      ]
    }
  ]
}
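
For consumers written in Rust, the sample above could be mirrored by types along these lines. This is a sketch, not the crate's own definitions: the struct names, the numeric types, and the best_score helper are assumptions, rank 1 is assumed to be the best rank, and "ok" is the only status value shown in this README. Deserialization (for example with a JSON library) is left out so the snippet stays within the standard library.

```rust
use std::collections::BTreeMap;

/// One model's entry within a source (field names taken from the sample JSON).
#[derive(Debug)]
struct Score {
    model: String,
    rank: u32, // assumption: ranks are small positive integers
    metrics: BTreeMap<String, f64>, // assumption: metric values are numeric
}

/// One element of the top-level "sources" array.
#[derive(Debug)]
struct SourceResult {
    source: String,
    status: String,
    scores: Vec<Score>,
}

/// Hypothetical helper: the best-ranked score of a source that reported "ok".
fn best_score(result: &SourceResult) -> Option<&Score> {
    if result.status != "ok" {
        return None;
    }
    result.scores.iter().min_by_key(|s| s.rank)
}

/// Builds the "arena" entry from the sample output by hand.
fn sample_arena() -> SourceResult {
    let mut metrics = BTreeMap::new();
    metrics.insert("elo".to_string(), 1350.0);
    SourceResult {
        source: "arena".to_string(),
        status: "ok".to_string(),
        scores: vec![Score { model: "gpt-5.2".to_string(), rank: 1, metrics }],
    }
}
```

Calling best_score(&sample_arena()) yields the gpt-5.2 entry with its elo metric; a source whose status is anything other than "ok" yields None.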

Contributing

  • Add a model: Add an entry to models.toml with canonical name and known aliases
  • Add a source: Implement the Source trait in src/sources/
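
To give a feel for what adding a source involves, here is a purely illustrative sketch. The real Source trait lives in src/sources/ and is not reproduced in this README, so every signature below is an assumption: the method names name and fetch, the Score type, the String error type, and the StaticSource example are all hypothetical. Check the trait definition in the repository before writing an implementation.

```rust
/// HYPOTHETICAL score type; the real crate's type may differ.
#[derive(Debug, Clone, PartialEq)]
struct Score {
    model: String,
    rank: u32,
}

/// HYPOTHETICAL shape of a source trait: a name plus a fetch step.
trait Source {
    /// Identifier reported in the output, e.g. "arena".
    fn name(&self) -> &str;
    /// Fetch and parse the upstream data into scores.
    fn fetch(&self) -> Result<Vec<Score>, String>;
}

/// A toy source that returns fixed data instead of calling an upstream API.
struct StaticSource;

impl Source for StaticSource {
    fn name(&self) -> &str {
        "static-example"
    }

    fn fetch(&self) -> Result<Vec<Score>, String> {
        Ok(vec![Score { model: "example-model".to_string(), rank: 1 }])
    }
}

/// Runs every source and keeps each result next to the source name, so one
/// failing source does not hide the others.
fn collect_all(sources: &[Box<dyn Source>]) -> Vec<(String, Result<Vec<Score>, String>)> {
    sources
        .iter()
        .map(|s| (s.name().to_string(), s.fetch()))
        .collect()
}
```

Keeping per-source results separate mirrors the per-source status field in the JSON output, where each source reports its own outcome.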

PRs welcome.

License

MIT

Sister Tools

Part of a family of AI-augmented CLI tools:

  • lustro — AI news aggregator
  • consilium — Multi-model deliberation
  • hexis — Meta-cognitive framework
