Find the right LLM models for your AI agents — automatic model selection with accuracy/cost/latency tradeoffs
AgentOpt
Find the right LLM models for your AI agents.
A simple model swap can cut your agent's costs by 10–100x without sacrificing performance.
Why AgentOpt
Choosing models for your agent is surprisingly hard. Which family? Small or big? Thinking or non-thinking? And different steps may need different models. The combinatorial space explodes fast: an agent with 3 steps and 8 candidate models per step already has 8^3 = 512 combinations to evaluate.
AgentOpt solves this automatically. Give it your agent and a small evaluation dataset, and it will efficiently search the model combination space to present you with the Pareto curve of performance/cost/latency tradeoffs — so you can make an informed choice.
AgentOpt works with almost any agent implementation and requires minimal wrappers to your existing agents.
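To make the idea of a Pareto curve concrete, here is a minimal sketch of how a set of evaluated combinations can be reduced to its non-dominated members. The `ComboResult` type, the function names, and the numbers are illustrative assumptions, not part of AgentOpt's API or its benchmark results.

```python
from typing import NamedTuple


class ComboResult(NamedTuple):
    """Hypothetical record for one evaluated model combination."""
    name: str
    accuracy: float  # higher is better
    cost: float      # lower is better (USD)
    latency: float   # lower is better (seconds)


def dominates(a: ComboResult, b: ComboResult) -> bool:
    """True if a is no worse than b on every axis and strictly better on one."""
    no_worse = (a.accuracy >= b.accuracy
                and a.cost <= b.cost
                and a.latency <= b.latency)
    strictly_better = (a.accuracy > b.accuracy
                       or a.cost < b.cost
                       or a.latency < b.latency)
    return no_worse and strictly_better


def pareto_front(results):
    """Keep only the combinations that no other combination dominates."""
    return [r for r in results if not any(dominates(o, r) for o in results)]


# Made-up numbers, for illustration only.
results = [
    ComboResult("combo_a", accuracy=0.90, cost=3.00, latency=4.0),
    ComboResult("combo_b", accuracy=0.88, cost=0.20, latency=1.5),
    ComboResult("combo_c", accuracy=0.85, cost=1.50, latency=3.0),
]
front = pareto_front(results)
print([r.name for r in front])
```

Here `combo_c` drops out because `combo_b` is more accurate, cheaper, and faster, while `combo_a` and `combo_b` both stay: neither is better on every axis, so choosing between them is the tradeoff left to you.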
Use Cases
Comparable accuracy at a 21–118x lower cost, just by picking the right model combination:
| Benchmark | Expensive Combo | Acc | Cost | Budget Combo | Acc | Cost | Savings |
|---|---|---|---|---|---|---|---|
| BFCL | Opus | 72% | $60.78 | Qwen3 Next | 71% | $1.87 | 32x |
| HotpotQA | Opus + Opus | ~73% | $2.71 | Qwen3 Next + gpt-oss-120b | 71.3% | $0.13 | 21x |
| MathQA | Opus + Opus | ~98.5% | $5.89 | Ministral + C3 Haiku | 94.0% | $0.05 | 118x |
Read more in our blog post.
Installation
pip install agentopt
Quick Start
Say you have an agent with two LLM steps (a planner and a solver) and you want to find the best model for each:
from agentopt import ModelSelector

selector = ModelSelector(
    agent=MyAgent,
    models={
        "planner": ["gpt-4o", "gpt-4o-mini", "gpt-4.1-nano"],  # 3 options
        "solver": ["gpt-4o", "gpt-4o-mini", "gpt-4.1-nano"],  # 3 options
    },  # → 3 × 3 = 9 combinations to evaluate
    eval_fn=eval_fn,
    dataset=dataset,
    method="brute_force",  # or "auto" for smarter selection algorithms
)
results = selector.select_best(parallel=True, max_concurrent=50)
results.print_summary()
Output:
Model Selection Results
----------------------------------------------------------------------------
Rank Model Accuracy Latency Price
----------------------------------------------------------------------------
>>> 1 planner=gpt-4.1-nano + solver=gpt-4.1-nano 100.00% 0.85s $0.000420
2 planner=gpt-4o-mini + solver=gpt-4o-mini 100.00% 1.20s $0.002372
3 planner=gpt-4o + solver=gpt-4o 100.00% 2.70s $0.014355
...
Conceptually, this is what happens under the hood:
for combo in all_combinations(models):  # e.g. {"planner": "gpt-4o", "solver": "gpt-4o-mini"}
    agent = MyAgent(combo)  # build agent with this model combo
    for input_data, expected in dataset:
        actual = agent.run(input_data)  # run on each datapoint
        score = eval_fn(expected, actual)  # score the output
# rank combos by quality score, latency & cost
But AgentOpt does this efficiently with smart algorithms, parallelization, cost & latency tracking, and caching. With method="auto" (the default), it eliminates clearly worse combinations after just a few datapoints — finding the best model combination with far fewer API calls.
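For readers who want to run the conceptual loop end to end, the following is a small self-contained sketch of the brute-force case. It uses a hypothetical `EchoAgent` that makes no LLM calls, and the `brute_force` helper is an illustration only: it does not track cost, cache responses, or prune combinations early, and it is not AgentOpt's implementation.

```python
import itertools
import time


class EchoAgent:
    """Hypothetical stand-in agent: no LLM calls, just a fixed rule."""

    def __init__(self, models):
        self.models = models

    def run(self, input_data):
        # Pretend that only the "big" solver produces the right answer.
        return input_data.upper() if self.models["solver"] == "big" else "?"


def eval_fn(expected, actual):
    return 1.0 if expected == actual else 0.0


def brute_force(agent_cls, models, eval_fn, dataset):
    """Evaluate every combination; return rows sorted by mean score."""
    steps = list(models)
    rows = []
    # One candidate per step, over all steps: the Cartesian product.
    for choice in itertools.product(*(models[s] for s in steps)):
        combo = dict(zip(steps, choice))
        agent = agent_cls(combo)
        start = time.perf_counter()
        scores = [eval_fn(expected, agent.run(x)) for x, expected in dataset]
        seconds_per_item = (time.perf_counter() - start) / len(dataset)
        rows.append((combo, sum(scores) / len(scores), seconds_per_item))
    rows.sort(key=lambda row: -row[1])  # highest mean score first
    return rows


models = {"planner": ["small", "big"], "solver": ["small", "big"]}
dataset = [("a", "A"), ("b", "B")]
ranking = brute_force(EchoAgent, models, eval_fn, dataset)
for combo, score, seconds in ranking:
    print(combo, f"{score:.0%}", f"{seconds:.6f}s")
```

With 2 candidates for each of 2 steps, the loop evaluates 2 × 2 = 4 combinations; the work grows multiplicatively with every extra step or candidate, which is why early pruning matters for larger search spaces.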
You just provide four things:
Agent — wrap your agent into a class with __init__(self, models) and run(self, input_data):
- __init__(self, models): receives a model configuration and builds your agent. models is a dict that maps each step you want to optimize to a specific model, e.g. {"planner": "gpt-4o-mini", "solver": "gpt-4o"}.
- run(self, input_data): runs your agent on a single datapoint and returns the output.
from openai import OpenAI

class MyAgent:
    def __init__(self, models):
        self.client = OpenAI()
        self.planner_model = models["planner"]
        self.solver_model = models["solver"]

    def run(self, input_data):
        plan = self.client.chat.completions.create(
            model=self.planner_model,
            messages=[{"role": "user", "content": f"Plan: {input_data}"}],
        ).choices[0].message.content
        answer = self.client.chat.completions.create(
            model=self.solver_model,
            messages=[
                {"role": "system", "content": f"Follow this plan:\n{plan}"},
                {"role": "user", "content": input_data},
            ],
        ).choices[0].message.content
        return answer
Dataset — a list of (input_data, expected_output) pairs:
dataset = [
    ("What is the capital of France?", "Paris"),
    ("What is 2 + 2?", "4"),
    ("What color is the sky?", "blue"),
    # We recommend at least 100 samples for production decisions,
    # but even 10-20 samples can surface clear winners during development.
]
Eval function — compares the agent output against the expected answer, returns a score:
def eval_fn(expected, actual):
    return 1.0 if expected.lower() in str(actual).lower() else 0.0
LLM-as-judge is also supported — just call your judge LLM inside eval_fn.
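As a rough sketch of what an LLM-as-judge eval_fn could look like, the example below wraps a judge callable in a function with the same (expected, actual) signature. The names `make_judge_eval_fn`, `ask_judge`, `JUDGE_PROMPT`, and `fake_judge` are hypothetical; in practice `ask_judge` would send the prompt to your judge model, and here an offline stand-in is used so the snippet runs without network access.

```python
from typing import Callable

# Hypothetical prompt template; adapt the wording to your task.
JUDGE_PROMPT = (
    "Expected answer: {expected}\n"
    "Agent answer: {actual}\n"
    "Reply with exactly YES if the agent answer is correct, otherwise NO."
)


def make_judge_eval_fn(ask_judge: Callable[[str], str]):
    """Build an eval_fn(expected, actual) that defers to a judge callable."""

    def eval_fn(expected, actual):
        prompt = JUDGE_PROMPT.format(expected=expected, actual=actual)
        verdict = ask_judge(prompt)
        return 1.0 if verdict.strip().upper().startswith("YES") else 0.0

    return eval_fn


def fake_judge(prompt: str) -> str:
    """Offline stand-in for a real judge LLM call (assumption for the demo)."""
    lines = prompt.split("\n")
    expected = lines[0][len("Expected answer: "):]
    actual = lines[1][len("Agent answer: "):]
    return "YES" if expected.lower() in actual.lower() else "NO"


eval_fn = make_judge_eval_fn(fake_judge)
print(eval_fn("Paris", "The capital is Paris."))
print(eval_fn("4", "five"))
```

Passing the judge in as a callable keeps the scoring logic testable and lets you swap judge models without touching the rest of the setup.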
Models — a dict mapping each step name to a list of candidate models to try. AgentOpt picks one from each list, constructs the agent, and evaluates it.
Framework Compatibility
AgentOpt works with any LLM framework that uses httpx under the hood. The table below lists examples for a few popular frameworks, but any custom implementation whose LLM calls go through httpx should work as well:
| Framework | Status | Example |
|---|---|---|
| OpenAI Agents SDK | Supported | openai_sdk_example.py |
| LangChain / LangGraph | Supported | langchain_example.py, langgraph_example.py |
| CrewAI | Supported | crewai_example.py |
| LlamaIndex | Supported | llamaindex_example.py |
| AG2 | Supported | ag2_example.py |
| OpenAI-Compatible API SDK | Supported | custom_agent_example.py |
Selection Algorithms
AgentOpt includes a rich set of selection algorithms. Advanced users may get significant speedups by choosing the right method for their use case. See the documentation and advanced_selection_example.py for details.
| method= | Best for | How it works |
|---|---|---|
| "auto" (default) | General use | Automatically picks the best approach |
| "brute_force" | Small search spaces | Evaluates all combinations |
| "random" | Quick exploration | Samples a random fraction |
| "hill_climbing" | Topology-aware search | Greedy search using model quality/speed rankings |
| "arm_elimination" | Early pruning | Eliminates statistically dominated combinations |
| "epsilon_lucb" | Best-arm identification | Stops when LUCB confidence gap is within user epsilon |
| "threshold" | Thresholding objectives | Classifies combinations above/below user threshold |
| "lm_proposal" | LLM-guided search | Uses a proposer LLM to shortlist promising combinations |
| "bayesian" | Expensive evaluations | GP-based optimization (requires pip install "agentopt[bayesian]") |
selector = ModelSelector(
    agent=MyAgent, models=models, eval_fn=eval_fn, dataset=dataset,
    method="epsilon_lucb",
    epsilon=0.5,
)
results = selector.select_best(parallel=True)
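To give a feel for what "eliminates statistically dominated combinations" can mean, here is a generic textbook-style successive-elimination sketch based on Hoeffding confidence bounds. It is not AgentOpt's implementation: the function name, the `pull` callback, and the `delta` parameter are assumptions for illustration, an "arm" stands for one model combination, and one "pull" stands for scoring that combination on one more datapoint with a score in [0, 1].

```python
import math
import random


def successive_elimination(arms, pull, max_rounds, delta=0.05):
    """Drop arms whose upper confidence bound falls below the best lower bound.

    arms: list of arm identifiers; pull(arm) returns a score in [0, 1].
    Every surviving arm is pulled once per round, so all share the same count.
    """
    alive = list(arms)
    totals = {arm: 0.0 for arm in arms}
    for t in range(1, max_rounds + 1):
        for arm in alive:
            totals[arm] += pull(arm)
        # Hoeffding radius with a union bound over all arms and rounds.
        radius = math.sqrt(math.log(4 * len(arms) * t * t / delta) / (2 * t))
        means = {arm: totals[arm] / t for arm in alive}
        best_lower = max(means.values()) - radius
        alive = [arm for arm in alive if means[arm] + radius >= best_lower]
        if len(alive) == 1:
            break
    return alive


# Simulated combinations with made-up success probabilities.
rng = random.Random(0)
true_accuracy = {"combo_a": 0.9, "combo_b": 0.6, "combo_c": 0.3}


def simulated_pull(arm):
    return 1.0 if rng.random() < true_accuracy[arm] else 0.0


survivors = successive_elimination(list(true_accuracy), simulated_pull, max_rounds=500)
print(survivors)
```

The point of the design is that clearly worse combinations stop consuming evaluations once their confidence interval separates from the leader's, so the evaluation budget concentrates on the close contenders.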
How It Works
AgentOpt intercepts LLM calls at the httpx transport layer, a chokepoint shared by the LLM SDKs that build on httpx. No proxy server or framework adapters are required.
your_agent(input)
  └── framework internals (LangChain, CrewAI, etc.)
        └── httpx.Client.send() ← intercepted here
              └── LLM API (OpenAI, Anthropic, etc.)
For each model combination, AgentOpt:
- Instantiates your agent class with the candidate models
- Calls run() on every datapoint in your evaluation set
- Tracks token usage, latency, and cost automatically
- Scores the output using your evaluation function
- Reports the Pareto-optimal combinations
Response caching (in-memory + SQLite on disk) is enabled by default — identical LLM calls are never repeated, making iterative experimentation fast and cheap.
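The two-tier caching idea can be sketched with the standard library alone. The `ResponseCache` class below is a hypothetical illustration, not AgentOpt's cache: it keys each request by a hash of its canonical JSON form, checks an in-memory dict first, then a SQLite table, and only calls the (here faked) LLM on a miss.

```python
import hashlib
import json
import sqlite3


class ResponseCache:
    """Hypothetical in-memory + SQLite cache keyed by request content."""

    def __init__(self, path=":memory:"):
        self.memory = {}
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS responses (key TEXT PRIMARY KEY, body TEXT)"
        )

    @staticmethod
    def key_for(request: dict) -> str:
        # Sorting keys makes logically identical requests hash identically.
        canonical = json.dumps(request, sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def get_or_call(self, request, call):
        key = self.key_for(request)
        if key in self.memory:  # fastest path
            return self.memory[key]
        row = self.db.execute(
            "SELECT body FROM responses WHERE key = ?", (key,)
        ).fetchone()
        if row is not None:  # found on disk
            body = json.loads(row[0])
        else:  # miss: make the real call and persist the result
            body = call(request)
            self.db.execute(
                "INSERT INTO responses (key, body) VALUES (?, ?)",
                (key, json.dumps(body)),
            )
            self.db.commit()
        self.memory[key] = body
        return body


calls = []


def fake_llm(request):
    """Stand-in for a real LLM call; records how often it is invoked."""
    calls.append(request)
    return {"text": "Paris"}


cache = ResponseCache()
req = {"model": "gpt-4o-mini",
       "messages": [{"role": "user", "content": "Capital of France?"}]}
first = cache.get_or_call(req, fake_llm)
second = cache.get_or_call(req, fake_llm)
print(first == second, len(calls))
```

Because the key depends on the full request, including the model name, the same prompt sent to a different model is treated as a separate entry.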
Results API
results = selector.select_best()
results.print_summary() # formatted table
best = results.get_best() # ModelResult with highest accuracy
combo = results.get_best_combo() # {"planner": "gpt-4o", "solver": "gpt-4o-mini"}
results.to_csv("results.csv") # export all results
results.export_config("config.yaml") # export best combo as YAML
Advanced Usage
Custom model pricing — define pricing for self-hosted or custom models:
selector = ModelSelector(
    ...,
    model_prices={
        "my-custom-model": {"input_price": 2.50, "output_price": 10.00},
    },
)
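As an illustration of how such prices could translate into a per-call cost, the sketch below assumes that input_price and output_price are USD per one million tokens. The source does not state the unit, so treat that as an assumption; `call_cost` is a hypothetical helper, not an AgentOpt function.

```python
def call_cost(prices, input_tokens, output_tokens, per_tokens=1_000_000):
    """Cost of one call, assuming prices are quoted per `per_tokens` tokens."""
    total = (input_tokens * prices["input_price"]
             + output_tokens * prices["output_price"])
    return total / per_tokens


model_prices = {
    "my-custom-model": {"input_price": 2.50, "output_price": 10.00},
}
cost = call_cost(model_prices["my-custom-model"],
                 input_tokens=1_200, output_tokens=300)
print(f"${cost:.6f}")
```

If your provider quotes prices per thousand tokens instead, pass per_tokens=1_000 to the helper.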
Custom cache directory — LLM response caching is enabled by default (.agentopt_cache/). To customize:
from agentopt import LLMTracker
tracker = LLMTracker(cache_dir="./my_cache")
selector = ModelSelector(..., tracker=tracker)
results = selector.select_best() # cache flushed automatically
Using prebuilt LLM instances — pass framework-specific LLM objects instead of model name strings:
from langchain_openai import ChatOpenAI
selector = ModelSelector(
    agent=MyAgent,
    models={
        "planner": [ChatOpenAI(model="gpt-4o"), ChatOpenAI(model="gpt-4o-mini")],
        "solver": [ChatOpenAI(model="gpt-4o"), ChatOpenAI(model="gpt-4o-mini")],
    },
    eval_fn=eval_fn,
    dataset=dataset,
)
Documentation
Full documentation at agentoptimizer.github.io/agentopt — including detailed guides on the Results API, response caching, and custom model pricing.
License
Apache 2.0