ToolPicker - hybrid lexical + semantic tool selection for LLM agents with many tools.

ToolPicker

Hybrid lexical + semantic tool selection for LLM agents with too many tools to fit in context. Three-stage router (BM25 + embeddings + optional intent classifier), Reciprocal Rank Fusion, token-budget packing.

Docs: ashwinugale.github.io/toolpicker · Issues: GitHub


Why

LLM agents have a practical tool-count ceiling. Past 15-20 tools in the schema, accuracy drops: the model gets confused about which tool to use, hallucinates parameters, and takes longer paths. Past 50 tools, performance collapses. Carrying every tool schema also burns prompt tokens linearly in the number of tools, while the value is sparse: most tools are irrelevant to most queries.

The fix is to route: pick the K tools most relevant to the current query and only show those. Naive semantic search over tool descriptions handles some queries and fails on others (lexical-heavy queries like "get the order for BAN 989678111" miss semantic matches if no tool description uses the word "BAN"). Hybrid retrieval — BM25 + embeddings — fixes that, the same way modern document RAG does.

ToolPicker is the library that does this end to end, with a budget-aware packer, an optional intent classifier, and a reproducible eval harness.
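
To make the fusion step concrete, here is a minimal sketch of Reciprocal Rank Fusion over two ranked lists of tool names. It is an illustration, not ToolPicker's internal code: the function name `reciprocal_rank_fusion`, the constant `k=60` (a commonly used default for RRF), and the `weights` argument are all assumptions made for this example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60, weights=None):
    """Fuse several ranked lists of tool names into one ranking.

    Hypothetical helper for illustration only. Each list contributes
    weight / (k + rank) to every tool it contains (rank starts at 1).
    """
    if weights is None:
        weights = [1.0] * len(rankings)  # uniform-weight RRF

    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, name in enumerate(ranking, start=1):
            scores[name] += weight / (k + rank)

    # Highest fused score first; tool name breaks ties deterministically.
    return sorted(scores, key=lambda name: (-scores[name], name))


# Toy rankings (made up): the lexical and semantic retrievers disagree.
bm25_ranking = ["lookup_order", "get_weather", "send_email"]
semantic_ranking = ["send_email", "lookup_order", "get_weather"]

fused = reciprocal_rank_fusion([bm25_ranking, semantic_ranking])
print(fused)
```

Because RRF uses only ranks, BM25 scores and cosine similarities never need to be put on a common scale. The trade-off is that a confident retriever and an uncertain one count equally unless the weights say otherwise.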


Install

pip install toolpicker                    # core, zero deps
pip install "toolpicker[openai]"          # add real semantic retrieval
pip install "toolpicker[openai,openapi]"  # parse OpenAPI specs as tool sources
pip install "toolpicker[openai,mcp]"      # introspect MCP servers
pip install "toolpicker[openai,tokens]"   # accurate token-budget packing via tiktoken

Quickstart

from toolpicker import FunctionSchemaSource, OpenAIEmbeddings, ToolPicker

tools = [
    {"name": "get_weather", "description": "Get weather for a city.",
     "parameters": {"type": "object", "properties": {"city": {"type": "string"}}}},
    {"name": "send_email", "description": "Send an email.",
     "parameters": {"type": "object", "properties": {"to": {"type": "string"}}}},
    # ... 48 more
]

picker = ToolPicker(FunctionSchemaSource(tools), embedder=OpenAIEmbeddings())
selected = picker.select("send a message to bob about the demo", k=5, token_budget=2000)
# selected = [Tool(name='send_email', ...), ...]  -- ready to hand to the LLM

Read the quickstart for the full walkthrough including the intent classifier and token-budget packer.
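
The `k` and `token_budget` arguments above suggest a packing step after ranking. A greedy version of that idea might look like the sketch below. This is an assumption about the general technique, not a description of ToolPicker's packer: `pack_within_budget` and `estimate_tokens` are hypothetical names, and the 4-characters-per-token estimate is a crude stand-in for a real tokenizer such as tiktoken.

```python
import json


def estimate_tokens(schema):
    # Crude assumption: about 4 characters of serialized JSON per token.
    return max(1, len(json.dumps(schema)) // 4)


def pack_within_budget(ranked_tools, k, token_budget, count_tokens=estimate_tokens):
    """Greedily keep up to k tools, in rank order, without exceeding the budget.

    Hypothetical helper for illustration only.
    """
    selected = []
    used = 0
    for tool in ranked_tools:
        if len(selected) == k:
            break
        cost = count_tokens(tool)
        if used + cost > token_budget:
            continue  # too large to fit; a cheaper tool further down may still fit
        selected.append(tool)
        used += cost
    return selected


# Toy ranked list with explicit per-tool costs (made-up numbers).
ranked = [
    {"name": "send_email", "cost": 50},
    {"name": "lookup_order", "cost": 80},
    {"name": "get_weather", "cost": 30},
]
packed = pack_within_budget(ranked, k=3, token_budget=100,
                            count_tokens=lambda tool: tool["cost"])
print([tool["name"] for tool in packed])
```

Skipping an oversized tool instead of stopping at it keeps the budget well used, at the cost of occasionally preferring a lower-ranked tool over a higher-ranked one.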


Headline numbers (v0.6)

Five-strategy comparison on a 200-case in-repo synthetic corpus, plus a three-strategy comparison (no intent strategies) on a 500-case Gorilla slice, both with OpenAI text-embedding-3-small:

Synthetic (200 cases, 25 tools):

strategy               p@1    p@3    mrr
bm25-only              0.645  0.760  0.701
semantic-only          0.885  0.970  0.926
hybrid-rrf             0.800  0.960  0.879
intent-only            0.715  0.925  0.819
bm25+semantic+intent   0.845  0.965  0.908

Gorilla (500 cases, 1726 tools):

strategy        p@1    p@3    mrr
bm25-only       0.062  0.122  0.098
semantic-only   0.102  0.186  0.147
hybrid-rrf      0.088  0.168  0.132

Honest read: on these corpora under uniform-weight RRF, pure semantic beats every hybrid. Intent narrows the gap (synthetic: 0.800 → 0.845 p@1) but doesn't close it. The library exposes all five strategies and weight knobs so you can find what works for your distribution. Reproducer:

uv run python -m evals.compare --benchmark synthetic --embedder openai --output out/compare.json

More on the concepts and eval harness pages.
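
For readers unfamiliar with the column headers, the sketch below shows one common way to compute these metrics when each case has a single expected tool. It assumes p@k means "the expected tool appears in the top k" (the reported p@3 values above 1/3 suggest this reading) and that mrr is the mean reciprocal rank. The function names and the toy cases are made up for illustration; the eval harness itself may define things differently.

```python
def hit_at_k(ranked, expected, k):
    # 1.0 if the expected tool appears among the first k results, else 0.0.
    return 1.0 if expected in ranked[:k] else 0.0


def reciprocal_rank(ranked, expected):
    # 1 / rank of the expected tool (rank starts at 1), or 0.0 if it is absent.
    for rank, name in enumerate(ranked, start=1):
        if name == expected:
            return 1.0 / rank
    return 0.0


def summarize(cases):
    """cases: list of (ranked_tool_names, expected_tool_name) pairs."""
    n = len(cases)
    return {
        "p@1": sum(hit_at_k(ranked, expected, 1) for ranked, expected in cases) / n,
        "p@3": sum(hit_at_k(ranked, expected, 3) for ranked, expected in cases) / n,
        "mrr": sum(reciprocal_rank(ranked, expected) for ranked, expected in cases) / n,
    }


# Four toy cases: expected tool at rank 1, rank 2, missing, and rank 4.
cases = [
    (["send_email", "get_weather"], "send_email"),
    (["get_weather", "send_email"], "send_email"),
    (["get_weather", "lookup_order"], "send_email"),
    (["get_weather", "lookup_order", "list_files", "send_email"], "send_email"),
]
metrics = summarize(cases)
print(metrics)
```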


What ToolPicker is not

  • Not a tool runner. Returns tools; you call them.
  • Not an agent framework. Plugs into LangChain, LlamaIndex, raw OpenAI, Claude Agent SDK — anything that takes a list[function_schema].
  • Not a vector database. The semantic half stores embeddings in-process; the sweet spot is under ~10k tools. If you have 100k tools, you want a vector DB.
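
To illustrate what "in-process" semantic retrieval means at this scale, here is a minimal brute-force cosine-similarity search over a dictionary of tool embeddings. It is a sketch under stated assumptions, not ToolPicker's implementation: the function names are hypothetical and the 2-dimensional vectors are toy values standing in for real embeddings.

```python
import math


def cosine(a, b):
    # Cosine similarity of two equal-length vectors; 0.0 if either has zero norm.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_by_cosine(query_vec, tool_vecs, k):
    """Score every tool against the query and return the k best names."""
    ranked = sorted(
        tool_vecs.items(),
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Toy 2-d "embeddings" (made up); real ones have hundreds of dimensions or more.
tool_vecs = {
    "get_weather": [1.0, 0.0],
    "send_email": [0.0, 1.0],
    "lookup_order": [0.7, 0.7],
}
nearest = top_k_by_cosine([0.1, 0.9], tool_vecs, k=2)
print(nearest)
```

A linear scan like this costs one similarity computation per tool per query. That is cheap for thousands of tools but gets expensive as the count grows, which is the regime where an approximate-nearest-neighbour index or a vector database pays off.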

Documentation

Full docs at ashwinugale.github.io/toolpicker:

  • Quickstart — install, declare tools, route a query.
  • Concepts — BM25, semantic, intent, RRF, token packing.
  • Sources - FunctionSchemaSource, OpenAPISource, MCPSource, MergedSource.
  • Eval harness — reproduce the headline numbers, run on ToolBench / Gorilla.
  • API reference — autogenerated.

License

MIT. See LICENSE.
