Python SDK for ModelScout - LLM Benchmarking and Evaluation

ModelScout Python SDK

Find the best LLM for your product. Run benchmarks across multiple models on your own data to see which one performs best on quality, cost, and latency.

Installation

pip install modelscout-sdk

Quick Start

from modelscout import Benchmark

# Set MODELSCOUT_API_KEY in your environment, or pass api_key="ms_..."
# Models are selected at checkout and locked to your purchase
results = Benchmark().run(
    purchased_benchmark_id="trial",  # free trial, or "pb_..." from dashboard
    prompts=["Write a SQL query to find active users", "Explain quantum computing"],
)

print(results.best_model_for("quality"))  # Best quality model
print(results.best_model_for("cost"))     # Cheapest model
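The key lookup described in the comment above can be made explicit before constructing a Benchmark. The following is a minimal sketch, not part of the SDK: `resolve_api_key` is a hypothetical helper, and the `ms_` prefix check is an assumption inferred from the `"ms_..."` placeholder.

```python
import os
from typing import Optional


def resolve_api_key(explicit_key: Optional[str] = None) -> str:
    """Return the explicit key if given, else MODELSCOUT_API_KEY from the environment."""
    key = (explicit_key or os.environ.get("MODELSCOUT_API_KEY", "")).strip()
    if not key:
        raise RuntimeError(
            "No API key found: set MODELSCOUT_API_KEY or pass a key explicitly."
        )
    # Assumption: keys start with "ms_", as the placeholder in the Quick Start suggests.
    if not key.startswith("ms_"):
        raise ValueError("Key does not look like a ModelScout key (expected 'ms_' prefix).")
    return key
```

The returned value could then be passed as `Benchmark(api_key=resolve_api_key())`, which fails early with a clear message instead of at request time.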

Features

Benchmarking

Compare LLMs side-by-side on your evaluation data. Get quality scores, cost analysis, latency metrics, and statistical significance.
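To give a sense of what statistical significance means when two models are scored on the same prompts, here is a small standalone sketch of an exact paired sign-flip permutation test. It is an illustration only: the method ModelScout actually uses is not documented here, and the function name and scores are made up.

```python
from itertools import product
from statistics import mean


def paired_permutation_p_value(scores_a, scores_b):
    """Two-sided exact sign-flip test on per-prompt score differences."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")
    if len(scores_a) > 20:
        raise ValueError("exact enumeration is only practical for small samples")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(mean(diffs))
    # Under the null hypothesis the sign of each difference is arbitrary,
    # so enumerate all 2**n sign assignments and count the ones that are
    # at least as extreme as the observed mean difference.
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        if abs(mean(s * d for s, d in zip(signs, diffs))) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical per-prompt quality scores for two models on six prompts
model_a = [8, 9, 7, 9, 8, 9]
model_b = [6, 7, 6, 7, 7, 6]
p_value = paired_permutation_p_value(model_a, model_b)
```

A small p-value suggests the quality gap is unlikely to come from prompt-to-prompt noise alone. With only a handful of prompts the test has little power, which is one reason to benchmark on more than a few samples.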

Data Generation

Need synthetic test data? Generate evaluation datasets from the dashboard — describe your use case and get representative prompts in minutes.

Dataset Upload

Upload your own evaluation data:

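from modelscout import Benchmark

# Assumes upload_dataset is called on a Benchmark instance, as in the Quick Start
benchmark = Benchmark()

# Each sample is a dict with an "input" prompt
dataset_id = benchmark.upload_dataset(
    name="My Test Data",
    samples=[
        {"input": "What is machine learning?"},
        {"input": "Explain neural networks"},
    ],
)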
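If your prompts live in a plain list of strings, a small helper can shape them into the `{"input": ...}` records shown above. This is a sketch with a hypothetical helper name. Skipping blank and duplicate prompts is a design choice made here, not an SDK requirement.

```python
def build_samples(prompts):
    """Turn raw prompt strings into [{"input": ...}] records, skipping blanks and duplicates."""
    samples = []
    seen = set()
    for prompt in prompts:
        text = prompt.strip()
        if not text or text in seen:
            continue
        seen.add(text)
        samples.append({"input": text})
    return samples


samples = build_samples([
    "What is machine learning?",
    "   ",
    "Explain neural networks",
    "What is machine learning?",
])
```

The resulting list can be passed as the `samples` argument of `upload_dataset`.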

Agentic Evaluation

Test tool-calling capabilities with multi-turn evaluation (SDK-only):

from modelscout import Benchmark, AgenticConfig, ToolDefinition

def my_search_function(query: str) -> str:
    return f"Results for: {query}"

config = AgenticConfig(tools=[
    ToolDefinition(
        name="search",
        description="Search the web",
        parameters={"type": "object", "properties": {"query": {"type": "string"}}, "required": ["query"]},
        implementation=my_search_function,
    )
])

# Models are locked to your purchase — selected at checkout
results = Benchmark().run(
    name="Agent Eval",
    purchased_benchmark_id="pb_...",
    prompts=["Find information about quantum computing"],
    agentic_config=config,
)
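Before spending a purchased benchmark on an agentic run, it can help to exercise your tool implementations locally. The sketch below routes a tool call (a tool name plus JSON-encoded arguments) to the matching Python function after checking the required parameters. It uses a plain dict as a stand-in for ToolDefinition and does not describe how ModelScout invokes tools internally. `TOOLS` and `dispatch_tool_call` are hypothetical names.

```python
import json


def my_search_function(query: str) -> str:
    return f"Results for: {query}"


# Plain-dict stand-in for the ToolDefinition in the example above
TOOLS = {
    "search": {
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
        "implementation": my_search_function,
    }
}


def dispatch_tool_call(name: str, arguments_json: str) -> str:
    """Validate a tool call and run the matching implementation."""
    tool = TOOLS.get(name)
    if tool is None:
        return f"Error: unknown tool {name!r}"
    try:
        args = json.loads(arguments_json)
    except json.JSONDecodeError as exc:
        return f"Error: invalid JSON arguments ({exc.msg})"
    if not isinstance(args, dict):
        return "Error: arguments must be a JSON object"
    schema = tool["parameters"]
    missing = [p for p in schema.get("required", []) if p not in args]
    if missing:
        return f"Error: missing required parameters {missing}"
    unexpected = [k for k in args if k not in schema.get("properties", {})]
    if unexpected:
        return f"Error: unexpected parameters {unexpected}"
    return tool["implementation"](**args)


result = dispatch_tool_call("search", '{"query": "quantum computing"}')
```

Returning error strings instead of raising keeps the loop going in a multi-turn setting, since a model can often recover when it sees what went wrong with its call.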

Supported Models

24 models across 8 providers:

Provider    Models
OpenAI      gpt-5.4, gpt-5.4-mini, gpt-5.4-nano, gpt-5-mini, gpt-5-nano
Anthropic   claude-opus-4-6, claude-sonnet-4-6, claude-haiku-4-5-20251001
Google      gemini-3.1-pro, gemini-3-flash, gemini-3.1-flash-lite, gemini-2.5-flash-lite
DeepSeek    deepseek-v3.2, deepseek-v3.2-speciale, deepseek-r1
Qwen        qwen3.5-397b-a17b, qwen3.5-flash-02-23, qwen3-235b-a22b
Meta        llama-4-maverick, llama-4-scout
Mistral     mistral-large-2512, mistral-small-2603
xAI         grok-4, grok-4.1-fast

Pricing

Free trial: Every new organization gets one free benchmark (10 samples, 2 standard models).

Pay-as-you-go: Purchase benchmarks from the dashboard. The price depends on the selected models, sample count, and judge tier. Prices start at $4.99.

Documentation

Full documentation: modelscout.co/docs/sdk


License

Proprietary. See LICENSE for details.
