
llmstitch

A provider-agnostic LLM toolkit with tool calling, skills, and parallel execution.

Stitch together Anthropic, OpenAI, Gemini, Groq, and OpenRouter behind one Agent loop. Define tools with a decorator, compose behaviors as skills, and execute tool calls concurrently — all with a tiny, typed core.

Install

pip install llmstitch[anthropic]       # just the Anthropic SDK
pip install llmstitch[openai]          # just the OpenAI SDK
pip install llmstitch[gemini]          # just the Gemini SDK
pip install llmstitch[groq]            # just the Groq SDK
pip install llmstitch[openrouter]      # OpenRouter (reuses the openai SDK)
pip install llmstitch[all]             # all five

The bare pip install llmstitch has zero runtime dependencies — provider SDKs are opt-in extras.

30-second example

import asyncio
from llmstitch import Agent, tool
from llmstitch.providers.anthropic import AnthropicAdapter

@tool
def get_weather(city: str) -> str:
    """Return a canned weather report for the given city."""
    return f"{city}: 72°F and sunny"

agent = Agent(
    provider=AnthropicAdapter(),
    model="claude-opus-4-7",
    system="You are a helpful weather assistant.",
)
agent.tools.register(get_weather)

messages = asyncio.run(agent.run("What's the weather in Tokyo?"))
print(messages[-1].content)
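
Because every adapter implements the same interface, swapping providers is a one-line change. A minimal sketch, assuming OpenAIAdapter is importable from llmstitch.providers.openai (by analogy with the Anthropic import) and using a placeholder model name:

from llmstitch.providers.openai import OpenAIAdapter  # import path assumed by analogy

agent = Agent(
    provider=OpenAIAdapter(),  # was AnthropicAdapter()
    model="gpt-4o",            # placeholder; use your vendor's current model name
    system="You are a helpful weather assistant.",
)
agent.tools.register(get_weather)  # same tool as above, unchanged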

Features

  • Provider-agnostic — swap AnthropicAdapter for OpenAIAdapter, GeminiAdapter, GroqAdapter, or OpenRouterAdapter without touching your agent code.
  • Typed @tool decorator — JSON Schema generated from type hints (Optional, Literal, defaults, async); see the sketch after this list.
  • Parallel tool execution — when a model returns multiple tool calls in one turn, they run concurrently.
  • StreamingAgent.run_stream() yields provider-neutral events (TextDelta, ToolUseStart / Delta / Stop, MessageStop, terminal StreamDone) and handles tool execution between turns.
  • Retries — opt in with a RetryPolicy; exponential backoff with jitter, honors Retry-After headers, uses each adapter's own transient-error classes.
  • Token counting — Agent.count_tokens(prompt) via native provider endpoints (Anthropic, Gemini).
  • Usage and cost — agent.usage (a UsageTally) accumulates tokens, turns, API calls, and retries across a run; agent.cost() prices it against a Pricing rate card in USD.
  • Skills — bundle a system prompt with a set of tools; compose with .extend().
  • PEP 561 typed — ships with py.typed, fully checked under mypy --strict.
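
A hedged sketch of the typed decorator in action. The function below is hypothetical, but the Literal choices, the Optional field, and the defaults are exactly the constructs the bullet above says feed the generated JSON Schema:

from typing import Literal, Optional
from llmstitch import tool

@tool
def convert_temp(
    value: float,
    unit: Literal["C", "F"] = "F",
    label: Optional[str] = None,
) -> str:
    """Convert a temperature reading to Celsius.

    The Literal choices, the Optional field, and the defaults all flow
    into the schema the model sees.
    """
    celsius = value if unit == "C" else (value - 32) * 5 / 9
    prefix = f"{label}: " if label else ""
    return f"{prefix}{celsius:.1f}°C"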

Streaming example

import asyncio
from llmstitch import Agent, TextDelta, StreamDone
from llmstitch.providers.anthropic import AnthropicAdapter

async def main() -> None:
    agent = Agent(provider=AnthropicAdapter(), model="claude-opus-4-7")
    async for event in agent.run_stream("Tell me a haiku about streams."):
        if isinstance(event, TextDelta):
            print(event.text, end="", flush=True)
        elif isinstance(event, StreamDone):
            print(f"\n[stop_reason={event.response.stop_reason}]")

asyncio.run(main())
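
The loop above watches only text. run_stream also emits the tool events named in the features list; a sketch of handling them, assuming ToolUseStart is exported from llmstitch alongside TextDelta and that it carries a tool_name attribute (both are assumptions, not documented API):

import asyncio
from llmstitch import Agent, TextDelta, ToolUseStart, StreamDone, tool
from llmstitch.providers.anthropic import AnthropicAdapter

@tool
def get_weather(city: str) -> str:
    """Return a canned weather report (same tool as the 30-second example)."""
    return f"{city}: 72°F and sunny"

async def main() -> None:
    agent = Agent(provider=AnthropicAdapter(), model="claude-opus-4-7")
    agent.tools.register(get_weather)
    async for event in agent.run_stream("What's the weather in Tokyo?"):
        if isinstance(event, TextDelta):
            print(event.text, end="", flush=True)
        elif isinstance(event, ToolUseStart):
            # Attribute name below is a guess; inspect the event for the real fields.
            print(f"\n[tool call: {event.tool_name}]")
        elif isinstance(event, StreamDone):
            print(f"\n[stop_reason={event.response.stop_reason}]")

asyncio.run(main())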

Retries

from llmstitch import Agent, RetryPolicy
from llmstitch.providers.anthropic import AnthropicAdapter

agent = Agent(
    provider=AnthropicAdapter(),
    model="claude-opus-4-7",
    retry_policy=RetryPolicy(
        max_attempts=3,
        retry_on=AnthropicAdapter.default_retryable(),
    ),
)

Transient errors (rate limits, timeouts, connection drops, 5xx) are retried with exponential backoff + jitter; Retry-After headers raise the delay floor. Non-retryable exceptions pass through unchanged. Retries cover Agent.run (non-streaming) — run_stream is not retried in v0.1.3 because deltas may already have been yielded to the caller.
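
examples/retries.py also demonstrates an on_retry observability hook. A sketch of wiring one up; the callback signature shown here is an assumption, so check that example for the real one:

from llmstitch import Agent, RetryPolicy
from llmstitch.providers.anthropic import AnthropicAdapter

def log_retry(attempt: int, exc: Exception, delay: float) -> None:
    # Signature is an assumption; see examples/retries.py for the actual hook.
    print(f"retry {attempt} after {exc!r}; sleeping {delay:.1f}s")

agent = Agent(
    provider=AnthropicAdapter(),
    model="claude-opus-4-7",
    retry_policy=RetryPolicy(
        max_attempts=5,
        retry_on=AnthropicAdapter.default_retryable(),
        on_retry=log_retry,  # observability hook from examples/retries.py
    ),
)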

Token counting

count = await agent.count_tokens("How many tokens is this?")
print(count.input_tokens)

Available natively on AnthropicAdapter and GeminiAdapter. Other adapters raise NotImplementedError — llmstitch doesn't estimate with third-party tokenizers, since the counts can disagree with the provider's own.
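
Because unsupported adapters raise NotImplementedError rather than silently estimating, graceful fallback (as examples/token_counting.py does) is a plain try/except:

try:
    count = await agent.count_tokens("How many tokens is this?")
    print(count.input_tokens)
except NotImplementedError:
    # This adapter has no native counting endpoint; skip instead of guessing.
    print("native token counting unavailable for this provider")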

Usage and cost

from llmstitch import Agent, Pricing
from llmstitch.providers.anthropic import AnthropicAdapter

agent = Agent(
    provider=AnthropicAdapter(),
    model="claude-opus-4-7",
    pricing=Pricing(input_per_mtok=15.00, output_per_mtok=75.00),  # paste from vendor rate card
)

await agent.run("Summarize the Iliad in three sentences.")

print(agent.usage)         # UsageTally(input_tokens=..., output_tokens=..., turns=1, api_calls=1, retries=0)
print(agent.cost().total)  # USD

agent.usage accumulates across every run / run_stream on that agent — tokens (fed by adapters that report usage), turns (model responses folded in), api_calls (provider invocations), and retries (from the retry policy). Call agent.usage.reset() to zero the counters between logical sessions, or usage.cost(other_pricing) directly to price the same tally against a different rate card. The default Pricing(1.00, 2.00) is a placeholder — pass real vendor rates for accurate costs.
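
Both calls are one-liners; the rates below are placeholders, not real vendor prices:

from llmstitch import Pricing

# Re-price the accumulated tally against a different rate card, without re-running.
discounted = Pricing(input_per_mtok=7.50, output_per_mtok=37.50)  # placeholder rates
print(agent.usage.cost(discounted).total)

agent.usage.reset()  # zero the counters before the next logical session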

More examples

The examples/ directory has runnable scripts for:

  • basic.py — minimal agent with one tool.
  • skills_demo.py — composing two Skills with .extend().
  • streaming.py — Agent.run_stream with rich event handling.
  • providers_gallery.py — the same agent against every provider.
  • parallel_tools.py — parallel tool execution with order-preserving results.
  • async_and_timeout.py — async tools, per-call timeout, captured-exception semantics.
  • retries.py — RetryPolicy with backoff, jitter, and an on_retry observability hook.
  • token_counting.py — Agent.count_tokens on Anthropic + Gemini, with graceful fallback on adapters that don't support native counting.

Status

Alpha. MCP support and structured-output helpers are on the roadmap. See CHANGELOG.md for release history and ARCHITECTURE.md for a walkthrough of how the library is put together.

License

MIT
