
Tool-call parsing and normalization for LLM dialects

Tooletta

Tool-call normalization for LLM dialects.

Tooletta is a small translation layer between model-specific tool-call formats and a canonical ToolCall schema. It targets an annoying problem in model training, distillation, evals, and agent infrastructure: OpenAI, Qwen/Hermes, Mistral, Kimi, DeepSeek, and other model families emit different text for the same semantic tool call.

The center of gravity is Tinker-style training workflows. Tooletta is standalone, but its API is meant to stay easy to slot next to Tinker-compatible renderer, dataset-prep, and distillation code.

teacher trace / model output -> dialect parser -> ToolCall IR -> target renderer -> student format

Why

Tool calls are semantically simple and syntactically annoying. One model may emit <tool_call>{...}</tool_call>, another may use [TOOL_CALLS]..., and a third may expose OpenAI-compatible JSON objects, yet training code still needs one reliable representation.
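To make the mismatch concrete, here is a minimal standard-library sketch that reads two of the surface forms mentioned above into the same (name, arguments) pair. It does not use Tooletta: the helper names parse_hermes_like and parse_mistral_like are hypothetical, and the Mistral-style payload shape is an assumption made for illustration.

```python
import json
import re

# Hypothetical helpers for illustration only; these are not Tooletta's API.
_TOOL_CALL_BLOCK = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def parse_hermes_like(text: str) -> list[tuple[str, dict]]:
    """Extract (name, arguments) pairs from <tool_call>{...}</tool_call> blocks."""
    calls = []
    for payload in _TOOL_CALL_BLOCK.findall(text):
        obj = json.loads(payload)
        calls.append((obj["name"], obj["arguments"]))
    return calls


def parse_mistral_like(text: str) -> list[tuple[str, dict]]:
    """Extract (name, arguments) pairs from a [TOOL_CALLS] marker plus JSON.

    Assumption: the payload after the marker is a JSON object or a list of objects.
    """
    _, found, rest = text.partition("[TOOL_CALLS]")
    if not found:
        return []
    payload = json.loads(rest.strip())
    if isinstance(payload, dict):
        payload = [payload]
    return [(obj["name"], obj["arguments"]) for obj in payload]


hermes_text = '<tool_call>{"name":"search","arguments":{"query":"python"}}</tool_call>'
mistral_text = '[TOOL_CALLS][{"name":"search","arguments":{"query":"python"}}]'

# Both surface forms describe the same semantic call.
assert parse_hermes_like(hermes_text) == parse_mistral_like(mistral_text)
```

Two parsers for one call is already tedious; every additional model family adds another pair of parse and render functions, which is the motivation for a single intermediate representation.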

Tooletta's bet is simple: parse supported dialects at the boundary, keep a tiny canonical IR in the middle, and render only when you need model-specific text again.
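The "tiny IR in the middle" idea can be sketched without Tooletta at all. The SketchToolCall dataclass and the two render functions below are hypothetical stand-ins, not the library's classes; the output shapes follow the Hermes-style and simplified Kimi-style text shown elsewhere in this README.

```python
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class SketchToolCall:
    """Hypothetical canonical record: a tool name plus JSON-like arguments."""

    name: str
    arguments: dict[str, Any] = field(default_factory=dict)


def _compact(obj: Any) -> str:
    # Compact JSON: no spaces after separators.
    return json.dumps(obj, separators=(",", ":"))


def render_hermes_like(calls: list[SketchToolCall]) -> str:
    """Render each call as one <tool_call>{...}</tool_call> block."""
    return "\n".join(
        "<tool_call>" + _compact({"name": c.name, "arguments": c.arguments}) + "</tool_call>"
        for c in calls
    )


def render_kimi_like(calls: list[SketchToolCall]) -> str:
    """Render each call as a '## Calling: name' line followed by JSON arguments."""
    return "\n".join(f"## Calling: {c.name}\n{_compact(c.arguments)}" for c in calls)


call = SketchToolCall(name="search", arguments={"query": "tool calling"})
print(render_hermes_like([call]))
print(render_kimi_like([call]))
```

Because renderers only ever see the canonical record, adding a target format means writing one new render function rather than one converter per source format.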

That shape is directly inspired by Tinker's cookbook renderers: parse model-specific responses into canonical messages/tool calls, then render the canonical structure into the target model's format. Tooletta starts smaller on purpose: string-level tool-call parsing and rendering first, with tokenizer-aware training helpers considered only after the IR and dialect contract are solid.

Tinker Compatibility

Tinker compatibility is a first-class design target. Tooletta should make it boring to normalize tool calls before they enter a Tinker-style renderer or dataset builder, and to translate tool calls from one model-family dialect into the target format a student model expects.

The package does not depend on Tinker today and does not claim full Tinker renderer parity yet. The near-term goal is to keep the canonical ToolCall contract and parse/render APIs aligned with that style of workflow while the richer training helpers earn their tests.

The current boundary is intentionally boring: ToolCall.to_openai() returns the OpenAI/Tinker-style function-call object, and ToolCall.from_openai(...) accepts both plain mappings and object-style calls with .function.name / .function.arguments.
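As a rough picture of what accepting "both plain mappings and object-style calls" involves, the sketch below reads a name and arguments from either shape. The functions read_openai_style_call and write_openai_style_call are hypothetical and are not Tooletta's implementation; the exact fields Tooletta emits may differ. The one detail taken from the OpenAI format itself is that arguments travel as a JSON-encoded string.

```python
import json
from collections.abc import Mapping
from types import SimpleNamespace
from typing import Any


def read_openai_style_call(call: Any) -> tuple[str, dict[str, Any]]:
    """Hypothetical helper: return (name, arguments) from a mapping or an object."""
    function = call["function"] if isinstance(call, Mapping) else call.function
    if isinstance(function, Mapping):
        name, raw_args = function["name"], function["arguments"]
    else:
        name, raw_args = function.name, function.arguments
    # OpenAI-style payloads carry arguments as a JSON-encoded string;
    # tolerate an already-decoded mapping as well.
    arguments = json.loads(raw_args) if isinstance(raw_args, str) else dict(raw_args)
    return name, arguments


def write_openai_style_call(name: str, arguments: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: build an OpenAI-style function tool-call mapping."""
    return {
        "type": "function",
        "function": {"name": name, "arguments": json.dumps(arguments)},
    }


as_mapping = write_openai_style_call("search", {"query": "python"})
as_object = SimpleNamespace(
    function=SimpleNamespace(name="search", arguments='{"query": "python"}')
)

# Both input shapes normalize to the same pair.
assert read_openai_style_call(as_mapping) == read_openai_style_call(as_object)
```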

Scope

Tooletta currently handles:

  • parsing supported tool-call dialects into ToolCall objects
  • rendering ToolCall objects into supported target dialects
  • best-effort dialect auto-detection
  • custom dialect registration
  • a stdin/stdout CLI for normalization and format conversion

Tooletta does not yet handle:

  • tokenizer-specific token boundaries
  • loss-mask generation
  • full chat-template rendering
  • streaming parser state
  • full training or distillation pipeline orchestration
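Custom dialect registration, listed among the supported features above, usually comes down to a name-to-implementation lookup. The sketch below shows one possible shape for such a registry; SketchDialect, register_dialect, get_dialect, and the toy "paren-call" format are all hypothetical and say nothing about Tooletta's actual registration API.

```python
import json
from dataclasses import dataclass
from typing import Callable

Call = tuple[str, dict]  # (name, arguments)


@dataclass(frozen=True)
class SketchDialect:
    """Hypothetical dialect record: a name, optional aliases, and two callables."""

    name: str
    parse: Callable[[str], list[Call]]
    render: Callable[[list[Call]], str]
    aliases: tuple[str, ...] = ()


_REGISTRY: dict[str, SketchDialect] = {}


def register_dialect(dialect: SketchDialect) -> None:
    """Register a dialect under its name and aliases, rejecting collisions."""
    keys = (dialect.name, *dialect.aliases)
    taken = [key for key in keys if key in _REGISTRY]
    if taken:
        raise ValueError(f"dialect name already registered: {taken[0]}")
    for key in keys:
        _REGISTRY[key] = dialect


def get_dialect(name: str) -> SketchDialect:
    return _REGISTRY[name]


def parse_paren(text: str) -> list[Call]:
    """Parse a toy format with one call per line: name({...json arguments...})."""
    calls = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, _, rest = line.partition("(")
        calls.append((name, json.loads(rest[:-1])))  # drop the trailing ")"
    return calls


def render_paren(calls: list[Call]) -> str:
    return "\n".join(f"{name}({json.dumps(args)})" for name, args in calls)


register_dialect(SketchDialect("paren-call", parse_paren, render_paren, aliases=("parens",)))

dialect = get_dialect("parens")
calls = dialect.parse('search({"query": "python"})')
print(dialect.render(calls))
```

Rejecting name collisions at registration time keeps explicit lookups by name unambiguous, which matters more for custom dialects than for auto-detected ones.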

Install

After the first release:

uv add tooletta

For local development:

uv sync

Quick Start

from tooletta import parse_tool_calls, render_tool_calls

text = '<tool_call>{"name":"search","arguments":{"query":"tool calling"}}</tool_call>'

calls = parse_tool_calls(text, dialect="hermes")
print(calls[0].name)
print(calls[0].arguments)

print(render_tool_calls(calls, dialect="kimi"))

The final print emits the Kimi-style rendering:

## Calling: search
{"query":"tool calling"}

CLI

Normalize a Qwen/Hermes-style tool call to canonical JSON:

printf '<tool_call>{"name":"search","arguments":{"query":"python"}}</tool_call>' \
  | uv run tooletta --from hermes

Render to another dialect:

printf '<tool_call>{"name":"search","arguments":{"query":"python"}}</tool_call>' \
  | uv run tooletta --from hermes --to kimi

Built-In Dialects

Each entry lists the dialect name, its aliases, the shape it accepts, and the shape it renders.

canonical (aliases: json, tooletta)
  • Accepted shape: {"name": "...", "arguments": {...}} or a list of those objects
  • Rendered shape: compact JSON list of canonical tool-call objects

openai (aliases: oai)
  • Accepted shape: OpenAI/Tinker-style tool_calls and legacy function_call, including object-style .function.name / .function.arguments calls
  • Rendered shape: compact JSON list of OpenAI-compatible function tool-call objects

hermes (aliases: qwen, nous, nous-hermes)
  • Accepted shape: <tool_call>{...}</tool_call> blocks
  • Rendered shape: <tool_call> blocks with canonical {name, arguments} payloads

mistral (aliases: none)
  • Accepted shape: [TOOL_CALLS] followed by an embedded JSON object or list payload
  • Rendered shape: [TOOL_CALLS] plus a compact JSON list

deepseek (aliases: deepseek-v3)
  • Accepted shape: DeepSeek-style <|tool▁calls▁begin|>...<|tool▁calls▁end|> blocks
  • Rendered shape: the same DeepSeek tool-call block style

kimi (aliases: moonshot)
  • Accepted shape: simplified ## Calling: name blocks followed by JSON arguments
  • Rendered shape: ## Calling: name blocks

kimi-k2 (aliases: kimi_k2, moonshot-k2)
  • Accepted shape: Kimi K2 `< tool_calls_section_begin

parse_tool_calls(..., dialect="auto") uses a stable built-in dialect order. Custom dialects can be parsed and rendered explicitly by name; auto-detection is best reserved for unambiguous built-ins.
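A stable detection order matters because some texts could plausibly match more than one format. The sketch below illustrates first-match-wins detection over a fixed list; the DETECTORS order and the substring checks are assumptions chosen for illustration and do not reflect Tooletta's actual order or heuristics.

```python
from typing import Callable, Optional

# Order matters: the first matching detector wins, so results are deterministic.
# Both the order and the marker checks here are illustrative assumptions.
DETECTORS: list[tuple[str, Callable[[str], bool]]] = [
    ("hermes", lambda text: "<tool_call>" in text),
    ("mistral", lambda text: "[TOOL_CALLS]" in text),
    ("kimi", lambda text: "## Calling:" in text),
]


def detect_dialect(text: str) -> Optional[str]:
    """Return the name of the first dialect whose detector matches, else None."""
    for name, matches in DETECTORS:
        if matches(text):
            return name
    return None


print(detect_dialect('<tool_call>{"name":"search","arguments":{}}</tool_call>'))
print(detect_dialect("plain assistant text with no tool call"))
```

Passing an explicit dialect name sidesteps this ambiguity entirely, which is why explicit names are the safer choice for custom formats.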

Development

uv run ruff check .
uv run ruff format --check .
uv run mypy src tests
uv run pytest
uv run pre-commit run --all-files

Run the live Tinker smoke test only when you intentionally want a real API call:

TOOLETTA_RUN_LIVE_TINKER=1 \
  uv run --env-file .env --with 'tinker>=0.9.0' pytest tests/test_live_tinker_smoke.py -q

By default it uses meta-llama/Llama-3.2-1B, rank 1, and a single training datum built from Tooletta-rendered tool-call text.

Status

Pre-alpha. The first goal is a fast, dependency-free runtime core that nails the IR, parser behavior, renderer behavior, and dialect plugin surface before growing a larger model-format zoo or training-specific APIs.
