Python SDK for ModelScout - LLM Benchmarking and Evaluation
ModelScout Python SDK
Find the best LLM for your product. Run benchmarks across multiple models on your own data to see which performs best for quality, cost, and latency.
Installation
pip install modelscout-sdk
Quick Start
from modelscout import Benchmark
# Set MODELSCOUT_API_KEY in your environment, or pass api_key="ms_..."
# Models are selected at checkout and locked to your purchase
results = Benchmark().run(
    purchased_benchmark_id="trial",  # free trial, or "pb_..." from dashboard
    prompts=["Write a SQL query to find active users", "Explain quantum computing"],
)
print(results.best_model_for("quality")) # Best quality model
print(results.best_model_for("cost")) # Cheapest model
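The Quick Start comment mentions two ways to supply the key: the MODELSCOUT_API_KEY environment variable or an explicit api_key argument. A minimal sketch of a fail-fast check you could run before starting a benchmark is shown here. The helper name `resolve_api_key` is hypothetical and not part of the SDK, and the precedence (explicit argument first, then environment) is an assumption rather than documented SDK behavior.

```python
import os


def resolve_api_key(explicit_key=None, env=None):
    """Return the key to use, preferring an explicit value over the environment.

    Hypothetical helper: the precedence here is an assumption, not SDK behavior.
    """
    env = os.environ if env is None else env
    key = explicit_key or env.get("MODELSCOUT_API_KEY")
    if not key:
        # Fail before any benchmark is started, with an actionable message.
        raise RuntimeError('Set MODELSCOUT_API_KEY or pass api_key="ms_..."')
    return key
```

Calling `resolve_api_key()` at the top of a script surfaces a missing key immediately instead of partway through a run.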
Features
Benchmarking
Compare LLMs side-by-side on your evaluation data. Get quality scores, cost analysis, latency metrics, and statistical significance.
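To make the idea of a side-by-side comparison concrete, here is a small standalone sketch that averages per-sample measurements and picks a winner per metric. It does not use the SDK: the `runs` layout, the metric names, and the `pick_best` helper are all hypothetical, and the numbers are made up for illustration. The only point it shows is that quality is maximized while cost and latency are minimized.

```python
from statistics import mean

# Hypothetical per-sample measurements (illustrative numbers, not real results).
runs = {
    "model-a": {
        "quality": [0.9, 0.8, 0.7],
        "cost_usd": [0.004, 0.004, 0.004],
        "latency_s": [1.2, 1.0, 1.4],
    },
    "model-b": {
        "quality": [0.6, 0.7, 0.65],
        "cost_usd": [0.001, 0.001, 0.001],
        "latency_s": [0.5, 0.6, 0.7],
    },
}

# Direction of "better" for each metric.
HIGHER_IS_BETTER = {"quality": True, "cost_usd": False, "latency_s": False}


def summarize(runs):
    """Average every metric for every model."""
    return {
        model: {metric: mean(values) for metric, values in metrics.items()}
        for model, metrics in runs.items()
    }


def pick_best(summary, metric):
    """Return the model with the best average for the given metric."""
    choose = max if HIGHER_IS_BETTER[metric] else min
    return choose(summary, key=lambda model: summary[model][metric])


summary = summarize(runs)
for metric in HIGHER_IS_BETTER:
    print(metric, "->", pick_best(summary, metric))
```

Averages alone do not tell you whether a difference is meaningful on a small sample, which is why significance information matters alongside the raw scores.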
Data Generation
Need synthetic test data? Generate evaluation datasets from the dashboard — describe your use case and get representative prompts in minutes.
Dataset Upload
Upload your own evaluation data:
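from modelscout import Benchmark

# Assumes upload_dataset is called on a Benchmark instance
benchmark = Benchmark()
dataset_id = benchmark.upload_dataset(
    name="My Test Data",
    samples=[
        {"input": "What is machine learning?"},
        {"input": "Explain neural networks"},
    ],
)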
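If your evaluation prompts live in a JSON Lines file, they need to be reshaped into the list of {"input": ...} dictionaries shown above. The following is a minimal sketch of that conversion using only the standard library. The function name `samples_from_jsonl` and the source field name `prompt` are assumptions for illustration; only the {"input": ...} sample shape comes from the example above.

```python
import json


def samples_from_jsonl(text, field="prompt"):
    """Convert JSON Lines text into a list of {"input": ...} samples.

    `field` names the key holding the prompt in each record (assumed here).
    """
    samples = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        value = record.get(field)
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"line {line_no}: missing or empty {field!r}")
        samples.append({"input": value})
    return samples


raw = '{"prompt": "What is machine learning?"}\n\n{"prompt": "Explain neural networks"}\n'
samples = samples_from_jsonl(raw)
```

The resulting `samples` list can then be passed as the `samples` argument in the upload call above.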
Agentic Evaluation
Test tool-calling capabilities with multi-turn evaluation (SDK-only):
from modelscout import Benchmark, AgenticConfig, ToolDefinition
def my_search_function(query: str) -> str:
    return f"Results for: {query}"

config = AgenticConfig(tools=[
    ToolDefinition(
        name="search",
        description="Search the web",
        parameters={"type": "object", "properties": {"query": {"type": "string"}}, "required": ["query"]},
        implementation=my_search_function,
    )
])
# Models are locked to your purchase — selected at checkout
results = Benchmark().run(
    name="Agent Eval",
    purchased_benchmark_id="pb_...",
    prompts=["Find information about quantum computing"],
    agentic_config=config,
)
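Before spending a purchased benchmark on an agentic run, it can help to check locally that a tool's implementation accepts the arguments its parameter schema describes. The sketch below does this without the SDK. The `call_tool` helper is hypothetical, it covers only required fields and basic JSON types, and it is not how ModelScout itself dispatches tool calls.

```python
def my_search_function(query: str) -> str:
    return f"Results for: {query}"


# Same schema as the "search" tool defined above.
SEARCH_SCHEMA = {
    "type": "object",
    "properties": {"query": {"type": "string"}},
    "required": ["query"],
}

# Basic JSON Schema type names mapped to Python types (deliberately incomplete).
JSON_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def call_tool(implementation, schema, arguments):
    """Check arguments against a simple schema, then call the implementation."""
    missing = [name for name in schema.get("required", []) if name not in arguments]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    for name, value in arguments.items():
        spec = schema.get("properties", {}).get(name)
        if spec is None:
            raise ValueError(f"unexpected argument: {name!r}")
        expected = JSON_TYPES.get(spec.get("type"))
        if expected is not None and not isinstance(value, expected):
            raise TypeError(f"argument {name!r} should be of type {spec['type']}")
    return implementation(**arguments)


result = call_tool(my_search_function, SEARCH_SCHEMA, {"query": "quantum computing"})
```

A mismatch between the schema and the function signature shows up here as an exception rather than as a failed tool call in the middle of an evaluation.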
Supported Models
24 models across 8 providers:
| Provider | Models |
|---|---|
| OpenAI | gpt-5.4, gpt-5.4-mini, gpt-5.4-nano, gpt-5-mini, gpt-5-nano |
| Anthropic | claude-opus-4-6, claude-sonnet-4-6, claude-haiku-4-5-20251001 |
| Google | gemini-3.1-pro, gemini-3-flash, gemini-3.1-flash-lite, gemini-2.5-flash-lite |
| DeepSeek | deepseek-v3.2, deepseek-v3.2-speciale, deepseek-r1 |
| Qwen | qwen3.5-397b-a17b, qwen3.5-flash-02-23, qwen3-235b-a22b |
| Meta | llama-4-maverick, llama-4-scout |
| Mistral | mistral-large-2512, mistral-small-2603 |
| xAI | grok-4, grok-4.1-fast |
Pricing
Free trial: Every new organization gets one free benchmark (10 samples, 2 standard models).
Pay-as-you-go: Purchase benchmarks from the dashboard. Price depends on selected models, sample count, and judge tier. Starting from $4.99.
Documentation
Full documentation: modelscout.co/docs/sdk
License
Proprietary. See LICENSE for details.