MCP server wrapping the Artificial Analysis API for LLM and multimodal model data queries

aa-mcp

MCP server wrapping the Artificial Analysis public API. Enables AI agents to query LLM and multimodal model benchmarks, pricing, speed data, and track model updates via structured diffs.

The PyPI package is aa-mcp; it installs both aa-mcp and aa-mcp-server console commands.

Requirements

  • Python 3.10+
  • uv (for installation and running)
  • An Artificial Analysis API key (free to obtain)

Installation & Running

Run from PyPI with uvx

The package is published on PyPI, so uvx can fetch and run it directly:

export ARTIFICIAL_ANALYSIS_API_KEY="aa_your_key_here"
uvx aa-mcp

Run directly from a local checkout with uvx

# Set your API key
export ARTIFICIAL_ANALYSIS_API_KEY="aa_your_key_here"

# Run the MCP server from a local path (stdio transport)
uvx --from /path/to/aa-mcp-server aa-mcp-server

Run from source (development)

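# Install dependencies into the project environment
cd aa-mcp-server
uv sync

# Start the MCP server (requires ARTIFICIAL_ANALYSIS_API_KEY to be set)
uv run aa-mcp-server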

Run with uvx from a relative path (from the directory containing the checkout)

uvx --from ./aa-mcp-server aa-mcp-server

Environment Variables

| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| ARTIFICIAL_ANALYSIS_API_KEY | Yes | - | Your AA API key |
| AA_MCP_SNAPSHOT_DIR | No | ~/.local/share/aa-mcp/snapshots/ | Directory for update snapshots |
| AA_MCP_LOG_LEVEL | No | INFO | Log level (DEBUG, INFO, WARNING, ERROR) |
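
As an illustration of how these variables and defaults could be resolved at startup, here is a minimal sketch. The `Settings` class and `load_settings` function are hypothetical names chosen for this example and are not taken from the aa-mcp source; only the variable names, defaults, and log levels come from the table above.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional

DEFAULT_SNAPSHOT_DIR = "~/.local/share/aa-mcp/snapshots/"
VALID_LOG_LEVELS = ("DEBUG", "INFO", "WARNING", "ERROR")


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the three documented variables."""
    api_key: str
    snapshot_dir: Path
    log_level: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env

    # The API key is the only required variable.
    api_key = env.get("ARTIFICIAL_ANALYSIS_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("ARTIFICIAL_ANALYSIS_API_KEY is required")

    # Optional variables fall back to the documented defaults.
    snapshot_dir = Path(env.get("AA_MCP_SNAPSHOT_DIR", DEFAULT_SNAPSHOT_DIR)).expanduser()
    log_level = env.get("AA_MCP_LOG_LEVEL", "INFO").upper()
    if log_level not in VALID_LOG_LEVELS:
        raise ValueError(f"AA_MCP_LOG_LEVEL must be one of {VALID_LOG_LEVELS}")

    return Settings(api_key=api_key, snapshot_dir=snapshot_dir, log_level=log_level)
```

Passing a plain dict instead of relying on `os.environ` keeps the sketch easy to test.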

Official API Coverage

This server wraps the current free Artificial Analysis API endpoints documented at https://artificialanalysis.ai/api-reference:

| Artificial Analysis endpoint | MCP tool |
| --- | --- |
| GET /api/v2/data/llms/models | aa_list_llms, aa_get_model, aa_compare_models, aa_list_recent_updates, aa_healthcheck |
| GET /api/v2/data/media/text-to-image | aa_list_media_models(modality="text-to-image") |
| GET /api/v2/data/media/image-editing | aa_list_media_models(modality="image-editing") |
| GET /api/v2/data/media/text-to-speech | aa_list_media_models(modality="text-to-speech") |
| GET /api/v2/data/media/text-to-video | aa_list_media_models(modality="text-to-video") |
| GET /api/v2/data/media/image-to-video | aa_list_media_models(modality="image-to-video") |
| POST /api/v2/critpt/evaluate | aa_evaluate_critpt |
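
The media rows above follow a single pattern: the modality name is the last path segment. A minimal sketch of that mapping is shown below; the function name `media_endpoint_path` is an assumption made for this example, and the actual server may build its URLs differently.

```python
# The five modalities listed in the coverage table.
SUPPORTED_MODALITIES = (
    "text-to-image",
    "image-editing",
    "text-to-speech",
    "text-to-video",
    "image-to-video",
)


def media_endpoint_path(modality: str) -> str:
    """Return the documented endpoint path for a media modality."""
    if modality not in SUPPORTED_MODALITIES:
        raise ValueError(
            f"unsupported modality {modality!r}; expected one of {SUPPORTED_MODALITIES}"
        )
    return f"/api/v2/data/media/{modality}"
```

Rejecting unknown modalities before any request is made avoids spending a rate-limited call on a path that does not exist.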

MCP Tools

aa_list_llms

List LLM models with filtering and sorting.

  • Filters: creator, name, slug (substring match)
  • Sort by: intelligence (default), price, speed, ttft, coding, math
  • limit: Max results (default 20)
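
The filter, sort, and limit behaviour described above can be sketched roughly as follows. This is an illustration, not the server's code: the flat record keys (`creator`, `name`, `slug`, and one key per sort option), the case-insensitive matching, and the sort directions (lower is better for `price` and `ttft`) are all assumptions.

```python
from typing import Any, Dict, List, Optional

# Assumed direction per sort key: True means a higher value is better.
HIGHER_IS_BETTER = {
    "intelligence": True,
    "price": False,
    "speed": True,
    "ttft": False,
    "coding": True,
    "math": True,
}


def list_llms(
    models: List[Dict[str, Any]],
    creator: Optional[str] = None,
    name: Optional[str] = None,
    slug: Optional[str] = None,
    sort_by: str = "intelligence",
    limit: int = 20,
) -> List[Dict[str, Any]]:
    """Filter by substring, sort best-first, and truncate to `limit`."""
    if sort_by not in HIGHER_IS_BETTER:
        raise ValueError(f"unknown sort key: {sort_by!r}")

    def matches(model: Dict[str, Any]) -> bool:
        # Every supplied filter must appear as a substring (case-insensitive).
        for field, needle in (("creator", creator), ("name", name), ("slug", slug)):
            if needle and needle.lower() not in str(model.get(field, "")).lower():
                return False
        return True

    selected = [m for m in models if matches(m)]
    sign = -1.0 if HIGHER_IS_BETTER[sort_by] else 1.0

    def sort_key(model: Dict[str, Any]):
        value = model.get(sort_by)
        # Models without a value for the sort key go last.
        return (value is None, sign * value if value is not None else 0.0)

    selected.sort(key=sort_key)
    return selected[:limit]
```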

aa_get_model

Get full details for a single model by id, slug, or name.

  • Returns a list of candidates when more than one model matches
  • Supports partial/fuzzy matching
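
One way to picture the "single match or candidates" behaviour is the sketch below. It tries an exact match on id, slug, or name first and falls back to substring matching; the real matching strategy (including whatever "fuzzy" means in the server) is not documented here, so treat `resolve_model` and its return shape as hypothetical.

```python
from typing import Any, Dict, List

MATCH_FIELDS = ("id", "slug", "name")


def resolve_model(models: List[Dict[str, Any]], query: str) -> Dict[str, Any]:
    """Return {"match": model_or_None, "candidates": [...]} for a query."""
    q = query.strip().lower()

    # Pass 1: exact (case-insensitive) match on any identifier field.
    exact = [
        m for m in models
        if any(str(m.get(f, "")).lower() == q for f in MATCH_FIELDS)
    ]
    if len(exact) == 1:
        return {"match": exact[0], "candidates": []}

    # Pass 2: substring match, unless pass 1 was already ambiguous.
    partial = exact or [
        m for m in models
        if any(q in str(m.get(f, "")).lower() for f in MATCH_FIELDS)
    ]
    if len(partial) == 1:
        return {"match": partial[0], "candidates": []}

    # Zero or several hits: let the caller choose from the candidates.
    return {"match": None, "candidates": partial}
```

Preferring exact matches means a query such as a full slug is not drowned out by longer slugs that merely contain it.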

aa_compare_models

Side-by-side comparison of 2+ models.

  • Compares: intelligence, coding, math, pricing, speed, latency
  • Returns rankings across all metrics
  • Input: list of identifiers (ids, slugs, or names)
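
To illustrate what "rankings across all metrics" might look like, the sketch below orders models best-first for each metric. The metric keys mirror the list above, but the flat record layout, the choice of which metrics are lower-is-better (`pricing`, `latency`), and the function name `rank_models` are assumptions.

```python
from typing import Any, Dict, List

# Assumed metric directions: True means a higher value ranks better.
HIGHER_IS_BETTER = {
    "intelligence": True,
    "coding": True,
    "math": True,
    "speed": True,
    "pricing": False,
    "latency": False,
}


def rank_models(models: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Return {metric: [slug, ...]} ordered best-first.

    Models with no value for a metric are left out of that metric's ranking.
    """
    rankings: Dict[str, List[str]] = {}
    for metric, higher in HIGHER_IS_BETTER.items():
        scored = [m for m in models if m.get(metric) is not None]
        scored.sort(key=lambda m: m[metric], reverse=higher)
        rankings[metric] = [m["slug"] for m in scored]
    return rankings
```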

aa_list_recent_updates

Detect changes since the last local snapshot.

  • New models: present in current data but not in snapshot
  • Removed models: present in snapshot but gone from current data
  • Changed models: field-level diffs for pricing, speed, intelligence scores, etc.
  • First run creates a baseline snapshot
  • Float changes below 0.01 threshold are ignored (noise filtering)
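
The three change categories and the noise threshold can be sketched as a plain dictionary diff. This is a minimal illustration that assumes each snapshot is a mapping from model id to a flat dict of tracked fields and that the 0.01 threshold is an absolute difference; the function name `diff_snapshots` and the output shape are hypothetical.

```python
from typing import Any, Dict

FLOAT_THRESHOLD = 0.01  # threshold from the description above, assumed absolute


def _is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def diff_snapshots(
    old: Dict[str, Dict[str, Any]],
    new: Dict[str, Dict[str, Any]],
    threshold: float = FLOAT_THRESHOLD,
) -> Dict[str, Any]:
    """Compare two snapshots keyed by model id."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())

    changed: Dict[str, Dict[str, Any]] = {}
    for model_id in sorted(old.keys() & new.keys()):
        before, after = old[model_id], new[model_id]
        field_diffs = {}
        for field in sorted(before.keys() | after.keys()):
            a, b = before.get(field), after.get(field)
            if _is_number(a) and _is_number(b):
                # Numeric noise below the threshold is not reported.
                if abs(b - a) < threshold:
                    continue
            elif a == b:
                continue
            field_diffs[field] = {"old": a, "new": b}
        if field_diffs:
            changed[model_id] = field_diffs

    return {"new": added, "removed": removed, "changed": changed}
```

With made-up values, a price moving from 0.50 to 0.505 would be ignored, while a speed moving from 10.0 to 12.0 would appear under `changed`.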

aa_list_media_models

Query multimodal / media model rankings.

  • Modalities: text-to-image, image-editing, text-to-speech, text-to-video, image-to-video
  • top_n: Limit results (default 10)
  • include_categories: Per-category Elo breakdown where the upstream endpoint supports it

aa_evaluate_critpt

Submit a complete CritPt benchmark batch to the official evaluation endpoint.

  • Requires submissions for the full public CritPt problem set
  • Validates required fields before sending: problem_id, generated_code, model, generation_config
  • Optional batch_metadata object is passed through to Artificial Analysis
  • The upstream endpoint is rate-limited separately and may take substantial time to complete
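
The pre-send validation step can be pictured as a check that each submission carries the four required fields. The sketch below is an assumption about how such a check might look (the function name and error wording are made up); it checks object shape only and, as noted under Known Limitations, cannot tell whether the full problem set is present.

```python
from typing import Any, List

REQUIRED_FIELDS = ("problem_id", "generated_code", "model", "generation_config")


def validate_submissions(submissions: Any) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid shape."""
    if not isinstance(submissions, list) or not submissions:
        return ["submissions must be a non-empty list"]

    errors: List[str] = []
    for index, item in enumerate(submissions):
        if not isinstance(item, dict):
            errors.append(f"submissions[{index}] must be an object")
            continue
        missing = [field for field in REQUIRED_FIELDS if field not in item]
        if missing:
            errors.append(f"submissions[{index}] is missing: {', '.join(missing)}")
    return errors
```

Collecting every problem in one pass, rather than failing on the first, lets a caller fix a whole batch before retrying a slow, rate-limited endpoint.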

aa_healthcheck

Verify API key and upstream connectivity.

  • Returns masked key preview, model count, rate limit info
  • Reports specific error types (auth, rate limit, server error)
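
As a rough illustration of the two behaviours above, the sketch below masks a key for display and maps an HTTP status code to an error category. The preview format, the category labels, and both function names are assumptions; the status-code groupings follow common HTTP conventions rather than anything stated by the upstream API.

```python
def mask_key(key: str, visible: int = 4) -> str:
    """Show only the first and last `visible` characters of a key."""
    if len(key) <= visible * 2:
        # Too short to reveal anything safely.
        return "*" * len(key)
    return f"{key[:visible]}...{key[-visible:]}"


def classify_status(status_code: int) -> str:
    """Map an HTTP status code to a coarse error category."""
    if status_code in (401, 403):
        return "auth"
    if status_code == 429:
        return "rate_limit"
    if 500 <= status_code <= 599:
        return "server_error"
    return "ok" if 200 <= status_code <= 299 else "unexpected"
```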

Snapshot / Update Tracking

The aa_list_recent_updates tool uses a local JSON snapshot mechanism:

  1. First call: Fetches all LLM models, saves a normalized snapshot to disk, reports "baseline created"
  2. Subsequent calls: Fetches fresh data, diffs against the latest snapshot, reports changes
  3. Snapshot location: ~/.local/share/aa-mcp/snapshots/llm_models_YYYYMMDDTHHMMSSZ.json
  4. Noise filtering: Float fields use a 0.01 threshold to avoid reporting insignificant fluctuations
  5. Tracked fields: name, slug, creator, all evaluation scores, all pricing fields, speed/latency
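
The filename pattern above sorts chronologically when sorted as text, which makes "latest snapshot" a simple lookup. The sketch below shows one way to write and locate snapshots under that naming scheme; the helper names are hypothetical and the JSON layout is an assumption.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict, Optional


def snapshot_filename(now: Optional[datetime] = None) -> str:
    """Build llm_models_YYYYMMDDTHHMMSSZ.json from a timezone-aware datetime."""
    moment = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return f"llm_models_{moment.strftime('%Y%m%dT%H%M%SZ')}.json"


def save_snapshot(
    directory: Path, models: Dict[str, Any], now: Optional[datetime] = None
) -> Path:
    """Write a snapshot file and return its path."""
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / snapshot_filename(now)
    # sort_keys keeps the file stable across runs with identical data.
    path.write_text(json.dumps(models, sort_keys=True, indent=2), encoding="utf-8")
    return path


def latest_snapshot(directory: Path) -> Optional[Path]:
    """Return the newest snapshot, or None when no baseline exists yet."""
    candidates = sorted(directory.glob("llm_models_*.json"))
    return candidates[-1] if candidates else None
```

When `latest_snapshot` returns `None`, a caller would treat the run as the first call and create the baseline.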

opencode Integration

Add to your opencode.json:

{
  "mcp": {
    "servers": {
      "artificial-analysis": {
        "command": "uvx",
        "args": ["--from", "/path/to/aa-mcp-server", "aa-mcp-server"],
        "env": {
          "ARTIFICIAL_ANALYSIS_API_KEY": "aa_your_key_here"
        }
      }
    }
  }
}

For PyPI and local-checkout MCP client examples, see docs/mcp-client-config.md.

Example Usage (via MCP client)

# List top 5 most intelligent LLMs
aa_list_llms(sort_by="intelligence", limit=5)

# Get details on Claude 3.5 Sonnet
aa_get_model("claude-3-5-sonnet")

# Compare GPT-4o vs Claude 3.5 Sonnet vs Gemini 1.5 Pro
aa_compare_models(["gpt-4o", "claude-3-5-sonnet", "gemini-1.5-pro"])

# Check for recent model changes
aa_list_recent_updates()

# Top 5 text-to-image models
aa_list_media_models(modality="text-to-image", top_n=5)

# Submit CritPt benchmark results
aa_evaluate_critpt(
  submissions=[
    {
      "problem_id": "Challenge_1_main",
      "generated_code": "def solution(): return 42",
      "model": "example-model",
      "generation_config": {"temperature": 0}
    }
  ],
  batch_metadata={"run_id": "local-test"}
)

# Verify API connectivity
aa_healthcheck()

Development Checks

uv sync --dev
uv run pytest
uv run ruff check .
uv build
uv run twine check dist/*

Known Limitations

  • Free API tier: rate-limited to 1,000 requests per day
  • No explicit "updated_at" field: Update detection relies on snapshot diffs, not API metadata
  • Snapshots cover LLM data only: Media model snapshot tracking is not yet implemented
  • CritPt completeness: The upstream evaluation API requires submissions for the full public problem set; this server validates object shape but cannot verify set completeness locally
  • No pagination: The free API returns all models in a single response; no cursor/offset support
  • Snapshot storage: Local filesystem only; no cloud sync

Attribution

Data from Artificial Analysis. Attribution required per their terms.
