Production-grade tool routing for AI agents

CLIbrary

Production-grade tool routing for AI agents.

License: MIT Python


The Problem

AI agents spend a large share of their tokens on tool selection. When an agent has 100+ available tools, common approaches put every tool's documentation into the context window before the LLM can make a decision:

  • Token waste: 50,000+ tokens per call just for tool descriptions
  • High latency: 2–5 seconds before any actual work happens
  • Poor accuracy: LLMs get "lost in the middle" with too many options
  • Unbounded cost: Linear cost growth as tool count increases

When you scale beyond 50 tools, current systems break down.


How CLIbrary Works

CLIbrary uses a two-stage retrieval architecture to route agent intents to the right tool in ~36ms using only ~150 tokens.

Traditional approach:
  Agent intent → Stuff all 500 tools into LLM → LLM picks
  ~50,000 tokens, ~3 seconds, ~70% accuracy

CLIbrary approach:
  Agent intent → CLIbrary router → Single tool + params
  ~150 tokens, ~36ms, ~93% top-3 accuracy

Architecture

Agent intent
    │
    ▼
Embed (multilingual-e5-base, ~10ms)
    │
    ▼
Stage 1: FAISS cli_index → top-3 candidates (~5ms)
    │
    ▼
MaxSim re-rank  (combined = 0.7×mean_sim + 0.3×max_sim)
    │
    ├── low confidence + small gap → Clarify (return top-3 to LLM)
    │
    └── high confidence
            │
            ▼
        Stage 2: example_index → best matching example
            │
            ├── sim ≥ 0.85 → Path A: template fill (no LLM, ~80% of calls)
            │
            └── sim < 0.85 → Path B: LLM param extraction (~20% of calls)
                    │
                    ▼
              Tool call JSON output
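
The re-rank step in the diagram can be sketched in plain Python. This is a minimal illustration, not the project's implementation: it assumes mean_sim and max_sim are the mean and maximum cosine similarity between the query and a CLI's trigger vectors, and the min_conf and min_gap thresholds are hypothetical values chosen only for the example (the diagram does not state them).

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def combined_score(query_vec, trigger_vecs, w_mean=0.7, w_max=0.3):
    """combined = 0.7 * mean_sim + 0.3 * max_sim, as in the diagram."""
    sims = [cosine(query_vec, t) for t in trigger_vecs]
    return w_mean * (sum(sims) / len(sims)) + w_max * max(sims)

def rerank(query_vec, candidates, min_conf=0.6, min_gap=0.05):
    """Re-rank Stage 1 candidates and decide between routing and clarifying.

    candidates maps a CLI name to its list of trigger vectors.
    min_conf and min_gap are hypothetical thresholds, not project values.
    """
    ranked = sorted(
        ((combined_score(query_vec, vecs), name) for name, vecs in candidates.items()),
        reverse=True,
    )
    best = ranked[0][0]
    gap = best - ranked[1][0] if len(ranked) > 1 else best
    if best < min_conf and gap < min_gap:
        # Low confidence and a small gap: hand the candidates back to the LLM.
        return {"action": "clarify", "candidates": [name for _, name in ranked]}
    return {"action": "route", "cli": ranked[0][1], "confidence": round(best, 3)}
```

Weighting the mean more heavily than the max rewards CLIs whose triggers agree with the query as a group, while the max term still lets a single very close trigger lift a candidate.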

Quick Start

pip install clibrary-hub

from clibrary import CLIbrary

router = CLIbrary()
# Query (Traditional Chinese): "Help me look up last week's total sales"
result = router.route("幫我查上週銷售總額")

# {
#   "action": "route",
#   "cli": "sql-runner",
#   "params": {"query": "SELECT SUM(amount) FROM orders WHERE ...", "output_format": "json"},
#   "confidence": 0.94,
#   "source": "A",
#   "latency_ms": 36
# }
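
A caller will usually branch on the returned dictionary. The sketch below assumes the low-confidence branch comes back with "action": "clarify" and a "candidates" list; only the "route" shape is shown above, so those two names are hypothetical. The execute and ask_llm callables are supplied by the caller and are not part of the library.

```python
def handle_result(result, execute, ask_llm):
    """Dispatch on a router result.

    execute(cli, params) runs the selected tool.
    ask_llm(candidates) lets the LLM choose among candidates.
    The "clarify" action and "candidates" key are assumed, not documented.
    """
    if result.get("action") == "route":
        return execute(result["cli"], result.get("params", {}))
    # Anything else is treated as the clarify branch.
    return ask_llm(result.get("candidates", []))
```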

Performance

Evaluated on 2,050 queries across 500 CLIs:

Metric                            Result
Top-1 accuracy                    82.3%
Top-3 accuracy                    92.5%
Path A hit rate (no LLM needed)   93.6%
Median latency                    36ms
Token usage                       ~150

Compared to traditional "stuff all tools into LLM" approach:

Metric                  CLIbrary   Traditional
Tokens                  ~150       ~50,000
Latency                 36ms       2–5s
Accuracy (100+ tools)   82–92%     60–75%

Manifest Format

Each CLI tool has a manifest.json describing its purpose, inputs, and examples:

{
  "name": "sql-runner",
  "version": "1.0.0",
  "category": "data",
  "description": "Execute SQL queries against a database",
  "intent_triggers": [
    "query a database",
    "run a SQL statement",
    "查資料庫",
    "跑 SQL"
  ],
  "input_schema": {
    "type": "object",
    "properties": {
      "query": {"type": "string", "description": "SQL query to execute"},
      "output_format": {"type": "string", "enum": ["json", "csv", "table"], "default": "json"}
    },
    "required": ["query"]
  },
  "examples": [
    {
      "query": "查上週銷售總額",
      "params": {"query": "SELECT SUM(amount) FROM orders WHERE created_at > NOW() - INTERVAL 7 DAY", "output_format": "json"}
    }
  ],
  "tags": ["sql", "database", "query"]
}
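
Before a routed call is executed, its params can be checked against the manifest's input_schema. The helper below is a hypothetical sketch (apply_schema is not a CLIbrary function) that handles only the three features used in the manifest above: default, required, and enum. A full JSON Schema validator would cover far more.

```python
def apply_schema(params, input_schema):
    """Fill defaults, then check required and enum constraints.

    Covers only the subset of JSON Schema used in the example manifest.
    """
    props = input_schema.get("properties", {})
    filled = dict(params)
    # Fill in declared defaults for any parameter the caller omitted.
    for name, spec in props.items():
        if name not in filled and "default" in spec:
            filled[name] = spec["default"]
    missing = [r for r in input_schema.get("required", []) if r not in filled]
    if missing:
        raise ValueError(f"missing required params: {missing}")
    # Reject values outside a declared enum.
    for name, value in filled.items():
        allowed = props.get(name, {}).get("enum")
        if allowed is not None and value not in allowed:
            raise ValueError(f"{name}={value!r} not in {allowed}")
    return filled

# The input_schema from the sql-runner manifest above.
SQL_RUNNER_SCHEMA = {
    "type": "object",
    "properties": {
        "query": {"type": "string", "description": "SQL query to execute"},
        "output_format": {"type": "string", "enum": ["json", "csv", "table"], "default": "json"},
    },
    "required": ["query"],
}
```

For example, apply_schema({"query": "SELECT 1"}, SQL_RUNNER_SCHEMA) fills in output_format from its default, while a call without query raises ValueError.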

Three FAISS Indices

Index           Vectors   Content                               Usage
cli_index       500       Mean-pooled intent_triggers per CLI   Stage 1 candidate retrieval
trigger_index   ~3,500    Individual trigger vectors            MaxSim re-ranking
example_index   ~1,500    Example query vectors                 Stage 2 template matching
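
To make the cli_index row concrete, here is a small sketch of mean-pooling trigger vectors into one vector per CLI and retrieving the top-k candidates. It uses a brute-force inner-product search in plain Python as a stand-in for FAISS, and toy 2-dimensional vectors instead of real multilingual-e5-base embeddings. The function names are illustrative assumptions, not the project's API.

```python
import math

def normalize(v):
    """L2-normalize a vector so inner product equals cosine similarity."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)

def mean_pool(vectors):
    """Average a CLI's trigger vectors into a single normalized vector."""
    dim = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return normalize(mean)

def build_cli_index(trigger_vecs_by_cli):
    """One mean-pooled vector per CLI (the role cli_index plays in Stage 1)."""
    return {name: mean_pool(vecs) for name, vecs in trigger_vecs_by_cli.items()}

def top_k(query_vec, cli_index, k=3):
    """Brute-force top-k by inner product; FAISS would replace this loop."""
    q = normalize(query_vec)
    scored = [
        (sum(a * b for a, b in zip(q, vec)), name)
        for name, vec in cli_index.items()
    ]
    return [name for _, name in sorted(scored, reverse=True)[:k]]
```

Pooling keeps the Stage 1 index at one vector per CLI (500 here), so the coarse search stays small. The individual trigger vectors are only consulted for the handful of candidates that survive it.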

Project Structure

clibrary/
├── src/clibrary/
│   ├── router.py          # Core routing logic
│   ├── manifest.py        # Manifest loader
│   └── matchers/          # Embedding, keyword, LLM matchers
├── poc/
│   ├── router.py          # POC implementation
│   ├── build_index.py     # FAISS index builder
│   ├── bench.py           # Benchmark runner
│   └── eval/              # Evaluation datasets
├── tests/
└── README.md

Why It Works

  1. Embedding routing beats LLM selection at scale — geometric distance is not affected by "lost in the middle" attention dilution.
  2. Two-stage design improves accuracy: broad candidate retrieval in Stage 1, precise example matching in Stage 2.
  3. Example-based caching eliminates LLM calls for ~80% of queries.
  4. Multilingual embeddings (multilingual-e5-base) handle mixed Chinese/English queries natively.
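
The example-based shortcut in point 3 can be sketched as follows. This is a simplified illustration under stated assumptions: Path A here just reuses the matched example's params, which may be less than the project's template fill does, and extract_with_llm is a hypothetical caller-supplied function. The 0.85 threshold comes from the architecture diagram.

```python
import copy

def stage2(scored_examples, extract_with_llm, threshold=0.85):
    """Choose Path A (reuse an example) or Path B (LLM extraction).

    scored_examples is a list of (similarity, example) pairs for the chosen
    CLI, where each example is a dict with "query" and "params" keys.
    extract_with_llm(example) is a hypothetical fallback hook.
    """
    sim, example = max(scored_examples, key=lambda pair: pair[0])
    if sim >= threshold:
        # Path A: close enough to a known example, so no LLM call is made.
        return {"params": copy.deepcopy(example["params"]), "source": "A"}
    # Path B: the best example is only a hint, so let the LLM extract params.
    return {"params": extract_with_llm(example), "source": "B"}
```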

Comparison

CLIbrary LangChain Tool Retrieval Function Calling MCP
Scales to 500+ tools ⚠️ ⚠️
No LLM for routing ✅ (80%)
Manifest standard
Multilingual ⚠️ ⚠️ ⚠️

CLIbrary complements MCP: CLIbrary handles routing, MCP handles protocol.


Status

  • ✅ Manifest schema v1.0
  • ✅ 500 CLI manifests (8 categories)
  • ✅ Reference POC implementation
  • ✅ Evaluation dataset (2,050 queries)
  • 🚧 pip package (in progress)

License

MIT License — free to use, including for commercial purposes.


