
Python SDK for ModelScout - LLM Benchmarking and Evaluation


ModelScout Python SDK

Find the best LLM for your product. Run benchmarks across multiple models on your own data to see which performs best for quality, cost, and latency.

Installation

pip install modelscout-sdk

Quick Start

from modelscout import Benchmark, ModelConfig

# Set MODELSCOUT_API_KEY in your environment, or pass api_key="ms_..."
results = Benchmark().run(
    pack="trial",  # "trial" uses the free trial; otherwise pass a "pb_..." pack from a dashboard purchase
    prompts=["Write a SQL query to find active users", "Explain quantum computing"],
    models=[
        # The free trial includes 2 standard models and 10 samples
        ModelConfig(provider="openai", model="gpt-5-mini"),
        ModelConfig(provider="anthropic", model="claude-haiku-4-5-20251001"),
    ],
)

print(results.best_model_for("quality"))  # Best quality model
print(results.best_model_for("cost"))     # Cheapest model
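Conceptually, "best model for" a metric means the model that ranks first on that one metric: highest quality, lowest cost, or lowest latency. The sketch below illustrates that selection locally on hypothetical per-model summaries. The `ModelSummary` type, its fields, and this standalone `best_model_for` function are assumptions for illustration, not the SDK's actual result schema or implementation.

```python
from dataclasses import dataclass
from operator import attrgetter


@dataclass(frozen=True)
class ModelSummary:
    """Hypothetical per-model aggregate; not the SDK's real result type."""
    model: str
    quality: float    # mean judge score, higher is better
    cost_usd: float   # total spend for the run, lower is better
    latency_s: float  # mean response time, lower is better


# For each metric: which attribute to read and whether higher values win.
METRICS = {
    "quality": ("quality", True),
    "cost": ("cost_usd", False),
    "latency": ("latency_s", False),
}


def best_model_for(summaries: list[ModelSummary], metric: str) -> str:
    """Return the name of the model that ranks first on `metric`."""
    if metric not in METRICS:
        raise ValueError(f"unknown metric: {metric!r}")
    if not summaries:
        raise ValueError("no models to compare")
    attr, higher_is_better = METRICS[metric]
    pick = max if higher_is_better else min
    # max/min return the first of several tied items, so ties go to list order.
    return pick(summaries, key=attrgetter(attr)).model
```

Ranking on one metric at a time keeps each answer easy to interpret; trading quality against cost is left to the reader rather than folded into a single weighted score.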

Features

Benchmarking

Compare LLMs side-by-side on your evaluation data. Get quality scores, cost analysis, latency metrics, and statistical significance.
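This page does not say which significance test ModelScout runs. As one illustration of what "statistical significance" means when two models answer the same prompts, the sketch below uses a paired permutation (sign-flip) test on per-sample quality scores: it estimates how often randomly swapping which model produced each pair of scores yields a mean difference at least as large as the observed one. The function name and its inputs are hypothetical.

```python
import random


def paired_permutation_test(
    scores_a: list[float],
    scores_b: list[float],
    n_permutations: int = 10_000,
    seed: int = 0,
) -> float:
    """Two-sided p-value for the mean per-sample difference between two models.

    Position i holds both models' scores on the same prompt. Flipping the sign
    of a paired difference is equivalent to swapping which model produced
    which score; the p-value is the share of random flips whose mean
    difference is at least as extreme as the observed one.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two equal-length, non-empty score lists")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    observed = abs(sum(diffs)) / n
    rng = random.Random(seed)  # fixed seed makes the estimate reproducible
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        permuted = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(permuted) / n >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the Monte Carlo estimate strictly above zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)
```

A small p-value suggests the quality gap is unlikely to be an artifact of which prompts happened to be sampled; raising `n_permutations` reduces the noise in the estimate.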

Data Generation

Need synthetic test data? Generate evaluation datasets from the dashboard — describe your use case and get representative prompts in minutes.

Dataset Upload

Upload your own evaluation data:

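from modelscout import Benchmark

benchmark = Benchmark()  # uses MODELSCOUT_API_KEY from the environment

dataset_id = benchmark.upload_dataset(
    name="My Test Data",
    samples=[
        {"input": "What is machine learning?"},
        {"input": "Explain neural networks"},
    ],
)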
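If your evaluation prompts live in a file, you can build the `samples` list locally before calling `upload_dataset`. The sketch below assumes a JSON Lines file with one object per line and an `input` field; only the `{"input": ...}` shape comes from the example above, while the file layout and the `load_samples` helper are hypothetical. It keeps just the `input` key, since that is the only field the example shows.

```python
import json


def load_samples(path: str) -> list[dict[str, str]]:
    """Read a JSON Lines file into the [{"input": ...}, ...] shape used above.

    Blank lines are skipped. A line without a non-empty string "input" field
    raises ValueError naming the line number.
    """
    samples = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue
            record = json.loads(line)
            text = record.get("input") if isinstance(record, dict) else None
            if not isinstance(text, str) or not text.strip():
                raise ValueError(f"line {lineno}: missing non-empty 'input' string")
            samples.append({"input": text})
    return samples
```

The result can then be passed as `samples=load_samples("prompts.jsonl")` in the `upload_dataset` call, where `prompts.jsonl` stands in for your own file.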

Agentic Evaluation

Test tool-calling capabilities with multi-turn evaluation (available only through the SDK):

from modelscout import Benchmark, ModelConfig, AgenticConfig, ToolDefinition

# Stub tool implementation; replace with a real search call
def my_search_function(query: str) -> str:
    return f"Results for: {query}"

config = AgenticConfig(tools=[
    ToolDefinition(
        name="search",
        description="Search the web",
        # JSON Schema describing the tool's arguments
        parameters={
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
        implementation=my_search_function,
    )
])

results = Benchmark().run(
    name="Agent Eval",
    pack="pb_...",
    prompts=["Find information about quantum computing"],
    models=[
        ModelConfig(provider="openai", model="gpt-5-mini"),
        ModelConfig(provider="anthropic", model="claude-haiku-4-5-20251001"),
    ],
    agentic_config=config,
)
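In a multi-turn agentic run, each tool call a model emits has to be routed to the matching `implementation` and checked against its `parameters` schema. How ModelScout does this internally is not documented here. The sketch below is a hypothetical local dispatcher: it uses a stand-in `Tool` dataclass (not the SDK's `ToolDefinition`) with the same fields, checks required arguments and a few basic JSON Schema types, then calls the Python function. It can be handy for unit-testing your tool implementations on their own.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Python types accepted for a few common JSON Schema "type" values.
_JSON_TYPES: dict[str, tuple[type, ...]] = {
    "string": (str,),
    "integer": (int,),
    "number": (int, float),
    "boolean": (bool,),
    "object": (dict,),
    "array": (list,),
}


@dataclass
class Tool:
    """Hypothetical stand-in mirroring the ToolDefinition fields shown above."""
    name: str
    description: str
    parameters: dict[str, Any]
    implementation: Callable[..., str]


def dispatch(tools: list[Tool], name: str, arguments: dict[str, Any]) -> str:
    """Validate one tool call against its schema, then run the implementation."""
    by_name = {t.name: t for t in tools}
    if name not in by_name:
        raise KeyError(f"model called unknown tool {name!r}")
    tool = by_name[name]
    schema = tool.parameters
    missing = [p for p in schema.get("required", []) if p not in arguments]
    if missing:
        raise ValueError(f"{name}: missing required argument(s) {missing}")
    for arg, value in arguments.items():
        spec = schema.get("properties", {}).get(arg)
        # Undeclared arguments would reach the function as unexpected kwargs.
        if spec is None:
            raise ValueError(f"{name}: unexpected argument {arg!r}")
        expected = _JSON_TYPES.get(spec.get("type"))
        # bool subclasses int in Python, so reject it unless "boolean" is expected.
        wrong_bool = isinstance(value, bool) and spec.get("type") != "boolean"
        if expected and (wrong_bool or not isinstance(value, expected)):
            raise TypeError(f"{name}: {arg!r} should be of type {spec['type']}")
    return tool.implementation(**arguments)
```

Raising distinct exception types for unknown tools, missing arguments, and wrong types makes it easy to tell a malformed call from a bug in the tool itself.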

Supported Models

24 models across 8 providers:

OpenAI: gpt-5.4, gpt-5.4-mini, gpt-5.4-nano, gpt-5-mini, gpt-5-nano
Anthropic: claude-opus-4-6, claude-sonnet-4-6, claude-haiku-4-5-20251001
Google: gemini-3.1-pro, gemini-3-flash, gemini-3.1-flash-lite, gemini-2.5-flash-lite
DeepSeek: deepseek-v3.2, deepseek-v3.2-speciale, deepseek-r1
Qwen: qwen3.5-397b-a17b, qwen3.5-flash-02-23, qwen3-235b-a22b
Meta: llama-4-maverick, llama-4-scout
Mistral: mistral-large-2512, mistral-small-2603
xAI: grok-4, grok-4.1-fast

Pricing

Free trial: Every new organization gets one free benchmark (10 samples, 2 standard models).

Pay-as-you-go: Purchase benchmarks from the dashboard. The price depends on the selected models, sample count, and judge tier, and starts at $4.99.

Documentation

Full documentation: modelscout.co/docs/sdk


License

Proprietary. See LICENSE for details.



Download files


Source Distribution

modelscout_sdk-0.1.0.tar.gz (59.6 kB)

Uploaded Source

Built Distribution


modelscout_sdk-0.1.0-py3-none-any.whl (59.2 kB)

Uploaded Python 3

File details

Details for the file modelscout_sdk-0.1.0.tar.gz.

File metadata

  • Download URL: modelscout_sdk-0.1.0.tar.gz
  • Upload date:
  • Size: 59.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for modelscout_sdk-0.1.0.tar.gz
Algorithm Hash digest
SHA256 a937ae46e2fbdf675ab1284ad43cc190a5cfb9f5d82ad2e5ea5e17e0d7249b9c
MD5 91a42b2bed93e291c782dc5c1dfadffe
BLAKE2b-256 4f61d7ee5ee676fc92ea70ae8c9a505edfd80a08c5423c446275531f1dcb99fb
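To check that a downloaded file matches the published digests, compare its SHA256 against the listed value. The sketch below uses only Python's standard `hashlib`; the helper names are illustrative. (pip can also enforce this itself: adding `--hash=sha256:...` to a requirement line in a requirements file turns on its hash-checking mode.)

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large files never need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: str, expected_hex: str) -> bool:
    """True if the file's SHA256 matches `expected_hex` (whitespace and case ignored)."""
    return sha256_of(path) == expected_hex.strip().lower()


# SHA256 digest listed for modelscout_sdk-0.1.0.tar.gz.
SDIST_SHA256 = "a937ae46e2fbdf675ab1284ad43cc190a5cfb9f5d82ad2e5ea5e17e0d7249b9c"
```

For example, `verify_sha256("modelscout_sdk-0.1.0.tar.gz", SDIST_SHA256)` should return `True` for an intact download. On Python 3.11 and later, `hashlib.file_digest` offers the same chunked reading in one call.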


File details

Details for the file modelscout_sdk-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: modelscout_sdk-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 59.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for modelscout_sdk-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 bf2ed3d09ba933dc16c8158b96c958e5235845906bec51bfd1db69ad4c39303f
MD5 77061134fd361d88136d320c7592fa9b
BLAKE2b-256 9d5fe4f4336838e3c9a062fe3c213c1e434427f36107f54a53b572e5f7448d4d

