Python SDK for ModelScout - LLM Benchmarking and Evaluation

ModelScout Python SDK

Find the best LLM for your product. Run benchmarks across multiple models on your own data to see which one performs best on quality, cost, and latency.

Installation

pip install modelscout-sdk

Quick Start

from modelscout import Benchmark, ModelConfig

# Set MODELSCOUT_API_KEY in your environment, or pass api_key="ms_..."
results = Benchmark().run(
    purchased_benchmark_id="trial",  # free trial, or "pb_..." from dashboard purchase
    prompts=["Write a SQL query to find active users", "Explain quantum computing"],
    models=[
        ModelConfig(provider="openai", model="gpt-5-mini"),
        ModelConfig(provider="anthropic", model="claude-haiku-4-5-20251001"),
        ModelConfig(provider="deepseek", model="deepseek-v3.2"),
    ],
)

print(results.best_model_for("quality"))  # Best quality model
print(results.best_model_for("cost"))     # Cheapest model
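The comment in the Quick Start mentions two ways to supply the key: the MODELSCOUT_API_KEY environment variable or an explicit api_key value. A minimal sketch of resolving the key up front, so a missing key fails before any benchmark is started, could look like the following. The helper name resolve_api_key is hypothetical, and the precedence (explicit value first, then environment) is an assumption rather than documented SDK behavior.

```python
import os


def resolve_api_key(explicit_key=None, env=None):
    """Return the key to use, preferring an explicit value over the environment.

    Hypothetical helper, not part of the SDK. The precedence order is an assumption.
    """
    env = os.environ if env is None else env
    key = explicit_key or env.get("MODELSCOUT_API_KEY")
    if not key:
        raise RuntimeError(
            "No API key found: set MODELSCOUT_API_KEY or pass api_key explicitly"
        )
    return key
```

The returned value can then be passed wherever the SDK accepts api_key.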

Features

Benchmarking

Compare LLMs side-by-side on your evaluation data. Get quality scores, cost analysis, latency metrics, and statistical significance.
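To make the comparison concrete: "best" means the highest value for quality and the lowest value for cost and latency. The sketch below illustrates that selection rule on plain dictionaries. It does not show the SDK's actual result format, the helper pick_best is hypothetical, and the numbers are made-up placeholders rather than benchmark results.

```python
# Direction of "better" per metric: quality is maximized, cost and latency are minimized.
HIGHER_IS_BETTER = {"quality": True, "cost": False, "latency": False}


def pick_best(rows, metric):
    """Return the name of the model with the best value for `metric`."""
    if metric not in HIGHER_IS_BETTER:
        raise ValueError(f"unknown metric: {metric}")
    choose = max if HIGHER_IS_BETTER[metric] else min
    return choose(rows, key=lambda row: row[metric])["model"]


# Placeholder per-model summaries (illustrative values only).
rows = [
    {"model": "model-a", "quality": 0.82, "cost": 0.40, "latency": 1.9},
    {"model": "model-b", "quality": 0.77, "cost": 0.05, "latency": 0.8},
]
```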

Data Generation

Need synthetic test data? Generate evaluation datasets from the dashboard — describe your use case and get representative prompts in minutes.

Dataset Upload

Upload your own evaluation data:

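from modelscout import Benchmark

benchmark = Benchmark()  # uses MODELSCOUT_API_KEY from the environment

dataset_id = benchmark.upload_dataset(
    name="My Test Data",
    samples=[
        {"input": "What is machine learning?"},
        {"input": "Explain neural networks"},
    ],
)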
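If the evaluation data lives in a file, it has to be turned into the list of {"input": ...} dictionaries shown above. Here is a minimal sketch for a JSONL file with one object per line. The helper load_samples is hypothetical, and because "input" is the only sample field shown here, any other keys are dropped.

```python
import json
from pathlib import Path


def load_samples(path):
    """Read a JSONL file into a list of {"input": ...} dicts, skipping blank lines."""
    samples = []
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        record = json.loads(line)
        text = record.get("input") if isinstance(record, dict) else None
        if not isinstance(text, str) or not text.strip():
            raise ValueError(
                f"line {line_no}: expected an object with a non-empty 'input' string"
            )
        # Keep only the field that the upload example above uses.
        samples.append({"input": text})
    return samples
```

The result can be passed as the samples argument of upload_dataset.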

Agentic Evaluation

Test tool-calling capabilities with multi-turn evaluation (SDK-only):

from modelscout import Benchmark, ModelConfig, AgenticConfig, ToolDefinition

def my_search_function(query: str) -> str:
    return f"Results for: {query}"

config = AgenticConfig(tools=[
    ToolDefinition(
        name="search",
        description="Search the web",
        parameters={"type": "object", "properties": {"query": {"type": "string"}}, "required": ["query"]},
        implementation=my_search_function,
    )
])

results = Benchmark().run(
    name="Agent Eval",
    purchased_benchmark_id="pb_...",
    prompts=["Find information about quantum computing"],
    models=[
        ModelConfig(provider="openai", model="gpt-5-mini"),
        ModelConfig(provider="anthropic", model="claude-haiku-4-5-20251001"),
    ],
    agentic_config=config,
)
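Before spending a purchased benchmark on an agentic run, it can help to exercise a tool locally with the kind of arguments a model might send. The sketch below checks arguments against a flat parameters schema like the one above and then calls the implementation. The helper call_tool is hypothetical, it covers only required keys and basic types rather than full JSON Schema, and it assumes the implementation is called with keyword arguments matching the schema, which is not documented here.

```python
def call_tool(parameters, implementation, arguments):
    """Partially validate `arguments` against a flat object schema, then call the tool."""
    type_map = {"string": str, "integer": int, "number": (int, float), "boolean": bool}
    properties = parameters.get("properties", {})
    for name in parameters.get("required", []):
        if name not in arguments:
            raise ValueError(f"missing required argument: {name}")
    for name, value in arguments.items():
        if name not in properties:
            raise ValueError(f"unexpected argument: {name}")
        expected = type_map.get(properties[name].get("type"))
        if expected is not None and not isinstance(value, expected):
            raise ValueError(f"argument {name!r} should be of type {properties[name]['type']}")
    return implementation(**arguments)


def my_search_function(query: str) -> str:
    return f"Results for: {query}"


search_parameters = {
    "type": "object",
    "properties": {"query": {"type": "string"}},
    "required": ["query"],
}

print(call_tool(search_parameters, my_search_function, {"query": "quantum computing"}))
```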

Supported Models

24 models across 8 providers:

Provider    Models
OpenAI      gpt-5.4, gpt-5.4-mini, gpt-5.4-nano, gpt-5-mini, gpt-5-nano
Anthropic   claude-opus-4-6, claude-sonnet-4-6, claude-haiku-4-5-20251001
Google      gemini-3.1-pro, gemini-3-flash, gemini-3.1-flash-lite, gemini-2.5-flash-lite
DeepSeek    deepseek-v3.2, deepseek-v3.2-speciale, deepseek-r1
Qwen        qwen3.5-397b-a17b, qwen3.5-flash-02-23, qwen3-235b-a22b
Meta        llama-4-maverick, llama-4-scout
Mistral     mistral-large-2512, mistral-small-2603
xAI         grok-4, grok-4.1-fast
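A typo in a provider or model name is easy to make, so it can be worth checking each pair locally before starting a run. The sketch below copies the table above into a dictionary and adds a hypothetical is_supported helper. The lowercase keys "openai", "anthropic", and "deepseek" match the Quick Start; the other provider keys are assumed to follow the same pattern.

```python
SUPPORTED_MODELS = {
    "openai": ["gpt-5.4", "gpt-5.4-mini", "gpt-5.4-nano", "gpt-5-mini", "gpt-5-nano"],
    "anthropic": ["claude-opus-4-6", "claude-sonnet-4-6", "claude-haiku-4-5-20251001"],
    # Provider keys below are assumed lowercase forms of the table's provider names.
    "google": ["gemini-3.1-pro", "gemini-3-flash", "gemini-3.1-flash-lite", "gemini-2.5-flash-lite"],
    "deepseek": ["deepseek-v3.2", "deepseek-v3.2-speciale", "deepseek-r1"],
    "qwen": ["qwen3.5-397b-a17b", "qwen3.5-flash-02-23", "qwen3-235b-a22b"],
    "meta": ["llama-4-maverick", "llama-4-scout"],
    "mistral": ["mistral-large-2512", "mistral-small-2603"],
    "xai": ["grok-4", "grok-4.1-fast"],
}


def is_supported(provider, model):
    """Return True if the (provider, model) pair appears in the table above."""
    return model in SUPPORTED_MODELS.get(provider, [])
```

This list is a snapshot of the table and will go stale as models are added or retired.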

Pricing

Free trial: Every new organization gets one free benchmark (10 samples, 2 standard models).

Pay-as-you-go: Purchase benchmarks from the dashboard. The price depends on the selected models, sample count, and judge tier, and starts at $4.99.
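The free-trial limits (10 samples, 2 standard models) can be checked locally before a run. The sketch below uses a hypothetical helper, trial_problems, and assumes each prompt counts as one sample. It does not check which models count as "standard", and whether the service rejects or trims a larger request is not stated here.

```python
TRIAL_MAX_SAMPLES = 10  # from the free-trial description above
TRIAL_MAX_MODELS = 2


def trial_problems(prompts, models):
    """Return a list of reasons a request would exceed the free-trial limits."""
    problems = []
    if len(prompts) > TRIAL_MAX_SAMPLES:
        problems.append(f"{len(prompts)} prompts exceeds the limit of {TRIAL_MAX_SAMPLES} samples")
    if len(models) > TRIAL_MAX_MODELS:
        problems.append(f"{len(models)} models exceeds the limit of {TRIAL_MAX_MODELS} models")
    return problems
```

An empty list means the request fits within the stated limits.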

Documentation

Full documentation: modelscout.co/docs/sdk


License

Proprietary. See LICENSE for details.
