
arcllm

The arc connecting you to every LLM

Minimal dependencies. Maximum performance. One unified API.



Why ArcLLM?

ArcLLM ships a single unified, OpenAI-compatible surface across every major LLM provider with a tightly curated runtime footprint:

  • 4 runtime deps: httpx[http2], aiohttp, msgspec, orjson — all chosen for raw speed.
  • OpenAI-compatible API so existing client code keeps working.
  • Sync + async, streaming, tools, structured output, vision, embeddings in one library.
  • Built-in cost + capability tracking for every supported model.

Built for developers who want speed, simplicity, and reliability when working with LLMs.

Installation

pip install arcllm-sdk

Quick Start

import arcllm

# Simple completion
response = arcllm.completion(
    model="gpt-5.4-mini",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)

Streaming

stream = arcllm.completion(
    model="gpt-5.4-mini",
    messages=[{"role": "user", "content": "Write a haiku about coding"}],
    stream=True
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Async

import asyncio

async def main():
    response = await arcllm.acompletion(
        model="anthropic/claude-sonnet-4-5",
        messages=[{"role": "user", "content": "Explain quantum computing"}]
    )
    print(response.choices[0].message.content)

asyncio.run(main())

Different providers

# OpenAI
arcllm.completion(model="gpt-4o", messages=messages)

# Anthropic
arcllm.completion(model="anthropic/claude-sonnet-4-5", messages=messages)

# Google Gemini
arcllm.completion(model="gemini/gemini-2.5-pro", messages=messages)

# Groq (ultra-fast inference)
arcllm.completion(model="groq/llama-3.3-70b-versatile", messages=messages)

# Together AI / Fireworks (open-weight flagships: Llama 4, Qwen 3, DeepSeek, Kimi, GLM, MiniMax)
arcllm.completion(model="together_ai/meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8", messages=messages)
arcllm.completion(model="fireworks_ai/accounts/fireworks/models/deepseek-v4-pro", messages=messages)

# Local with Ollama
arcllm.completion(model="ollama/llama3.3", messages=messages)

Supported providers

28 providers, plus a custom OpenAI-compatible endpoint adapter, grouped by surface below. The provider prefix you pass in arcllm.completion(model=...) is shown in the Prefix column.

First-party APIs

Provider Prefix Highlights
OpenAI openai/ GPT-5 family, GPT-4.1, GPT-4o, o-series reasoning, embeddings
Anthropic anthropic/ Claude Opus 4.7, Sonnet 4.6, Haiku 4.5 (extended thinking)
Google Gemini gemini/ Gemini 2.5 / 3.x with thinking config
Mistral mistral/ Mistral Large/Medium/Small, Codestral, Pixtral, embeddings
Cohere cohere/ Command A/R+/R, Aya Vision, Embed v4, Rerank v3.5
DeepSeek deepseek/ DeepSeek V4 Flash + Pro (chat + reasoner)
xAI xai/ Grok-4 / 4.1 / 4.20 / 4.3 family + Grok-3 (legacy)
Perplexity perplexity/ Sonar, Sonar Pro, Sonar Reasoning, Deep Research
Groq groq/ Llama 3/4, GPT-OSS, Qwen 3 (LPU low-latency)
Together AI together_ai/ Llama 4, Qwen 3, DeepSeek V4, Kimi, GLM, MiniMax
Fireworks AI fireworks_ai/ DeepSeek V4 Pro, Kimi K2, GLM 5.1, Llama, Qwen
Cerebras cerebras/ Llama 3.x, Qwen 3, GPT-OSS on CS-3 wafer-scale
SambaNova sambanova/ Llama 3.x / Llama 4, DeepSeek, MiniMax on RDU
DeepInfra deepinfra/ Full open-weights catalog: Llama, Qwen, DeepSeek, Phi, Gemma, Kimi
AI21 ai21/ Jamba 1.5 Large + Mini
Nebius AI nebius/ Llama 3.x, Qwen 2.5/3, DeepSeek R1/V3, Mistral, Nemotron
OVHcloud ovhcloud/ Llama 3.x, DeepSeek R1, Mistral, Qwen 3 — European GPU cloud
Z.AI (GLM) zai/ GLM-4.5 / 4.6 / 5 family by Zhipu AI (incl. vision + reasoning)
Moonshot AI moonshot/ Kimi K2.5 / K2.6 / K2-thinking (long-context, multimodal)

Cloud platforms

Provider Prefix Highlights
Azure azure/ OpenAI Service deployments + AI Foundry (Phi, Llama, Cohere, Mistral)
AWS Bedrock bedrock/ Anthropic, OpenAI GPT-OSS, Llama, Mistral, Cohere, Nova, Titan, AI21
Google Vertex vertex_ai/ Gemini + Anthropic Claude + Mistral + Llama on Vertex
Databricks databricks/ Llama, Claude, Gemini, GPT-5 on Foundation Model APIs
IBM watsonx watsonx/ Granite, Llama, Mistral on IBM Cloud (auto IAM-token exchange)
NVIDIA NIM nvidia_nim/ Llama, Nemotron, Mixtral, Phi on build.nvidia.com

Gateways, local & custom

Provider Prefix Highlights
OpenRouter openrouter/ Unified gateway over 300+ upstream models
HuggingFace huggingface/ Hub Inference + Inference Endpoints (chat-completions API)
Ollama ollama/ Local: Llama, Qwen, Gemma, DeepSeek-R1, Phi (no API key)
Custom custom/ Any user-supplied OpenAI-compatible HTTP endpoint
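Routing is driven entirely by that prefix: the provider name comes off the first `/`, and bare model names go to a default provider. A minimal sketch of the dispatch (split_model and the OpenAI default are illustrative assumptions, not arcllm internals):

```python
def split_model(model: str, default_provider: str = "openai"):
    """Illustration only: split "provider/model" strings; bare names
    fall back to a default provider (assumed here to be OpenAI)."""
    provider, sep, name = model.partition("/")
    return (provider, name) if sep else (default_provider, model)

print(split_model("anthropic/claude-sonnet-4-5"))  # ('anthropic', 'claude-sonnet-4-5')
print(split_model("gpt-4o"))                       # ('openai', 'gpt-4o')
```

Note that only the first `/` matters, so multi-segment model paths like `together_ai/meta-llama/Llama-4-...` keep their remainder intact.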

Authentication

Every provider reads its key from a documented env var. You can also pass api_key= per-call to override.

Provider Env var(s) Notes
OpenAI OPENAI_API_KEY
Anthropic ANTHROPIC_API_KEY
Gemini GEMINI_API_KEY AI Studio key
Mistral MISTRAL_API_KEY
Cohere COHERE_API_KEY v2 endpoints
DeepSeek DEEPSEEK_API_KEY direct API (api.deepseek.com)
xAI XAI_API_KEY
Perplexity PERPLEXITY_API_KEY
Groq GROQ_API_KEY
Together AI TOGETHER_API_KEY
Fireworks AI FIREWORKS_API_KEY
Cerebras CEREBRAS_API_KEY
SambaNova SAMBANOVA_API_KEY
DeepInfra DEEPINFRA_API_KEY
AI21 AI21_API_KEY Jamba family
Nebius AI NEBIUS_API_KEY
OVHcloud OVHCLOUD_API_KEY European AI Endpoints
Z.AI (GLM) ZAI_API_KEY
Moonshot AI MOONSHOT_API_KEY clamp temperature to [0, 1]; multimodal arrays only on Kimi vision/video models
Azure AZURE_OPENAI_API_KEY + api_base + api_version per call
AWS Bedrock AWS_ACCESS_KEY_ID + AWS_SECRET_ACCESS_KEY SigV4-signed; honors AWS_REGION_NAME / AWS_SESSION_TOKEN
Vertex AI OAuth (gcloud ADC) falls back to GOOGLE_APPLICATION_CREDENTIALS
Databricks DATABRICKS_TOKEN + DATABRICKS_HOST
IBM watsonx WATSONX_API_KEY raw IBM Cloud key (auto-exchanged for IAM JWT) or pre-exchanged JWT. Plus WATSONX_URL + WATSONX_PROJECT_ID
NVIDIA NIM NVIDIA_NIM_API_KEY
OpenRouter OPENROUTER_API_KEY optional OPENROUTER_REFERER + OPENROUTER_APP_NAME for app attribution
HuggingFace HUGGINGFACE_API_KEY works against router or custom Inference Endpoint URL
Ollama none uses local OLLAMA_API_BASE (default http://localhost:11434)
Custom user-supplied pass api_base= plus optional api_key= / extra_headers={...}

Features

🛠️ Tool Calling

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name"}
            },
            "required": ["location"]
        }
    }
}]

response = arcllm.completion(
    model="gpt-5.4-mini",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools
)

if response.choices[0].message.tool_calls:
    for tool_call in response.choices[0].message.tool_calls:
        print(f"Call: {tool_call.function.name}({tool_call.function.arguments})")

📋 Structured Output

response = arcllm.completion(
    model="gpt-5.4-mini",
    messages=[{"role": "user", "content": "Generate a user profile"}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "user_profile",
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "integer"},
                    "interests": {"type": "array", "items": {"type": "string"}}
                },
                "required": ["name", "age"]
            }
        }
    }
)

🖼️ Vision

response = arcllm.completion(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
        ]
    }]
)

📄 PDF input (Anthropic, Gemini)

response = arcllm.completion(
    model="anthropic/claude-haiku-4-5",
    messages=[{
        "role": "user",
        "content": [
            {"type": "input_file", "file": {
                "data": pdf_base64, "media_type": "application/pdf"
            }},
            {"type": "text", "text": "Summarise this document"},
        ],
    }],
    max_tokens=512,
)

🧠 Reasoning models (thinking budget + reasoning effort)

# OpenAI o-series + GPT-5 hybrid: reasoning_effort
arcllm.completion(
    model="openai/o4-mini",
    messages=[{"role": "user", "content": "What is 7*8?"}],
    reasoning_effort="medium",
    max_completion_tokens=64,
)
# (passing temperature= here is dropped automatically with a warning —
#  o4-mini rejects temperature, and the capability table knows it)

# Anthropic Claude with extended thinking
arcllm.completion(
    model="anthropic/claude-opus-4-7",
    messages=[{"role": "user", "content": "Solve this hard problem"}],
    thinking_budget=2048,
    max_tokens=4096,
)

# Gemini 2.5+ with thinking config
arcllm.completion(
    model="gemini/gemini-2.5-pro",
    messages=[{"role": "user", "content": "Solve"}],
    thinking_budget=1024,
    include_thoughts=True,
)

🔎 Citations from grounded providers

# Perplexity Sonar — search is implicit
response = arcllm.completion(
    model="perplexity/sonar-pro",
    messages=[{"role": "user", "content": "Latest news on small models?"}],
)
for c in response.choices[0].message.citations or []:
    print(f"{c.title or '(no title)'}: {c.url}")

# Anthropic + Gemini grounded responses populate the same field, sourced
# from `web_search_tool_result` blocks / `groundingMetadata` respectively.

🛡️ Built-in provider tools (pass-through)

# Anthropic web search + code execution
arcllm.completion(
    model="anthropic/claude-sonnet-4-5",
    messages=[{"role": "user", "content": "Research arcllm and run a quick demo"}],
    tools=[
        {"type": "web_search_20250305", "name": "web_search"},
        {"type": "code_execution_20250825", "name": "code_execution"},
    ],
    max_tokens=1024,
)

# Gemini Google Search grounding
arcllm.completion(
    model="gemini/gemini-2.5-pro",
    messages=[{"role": "user", "content": "What happened in AI yesterday?"}],
    tools=[{"google_search": {}}],
)

📊 Embeddings

response = arcllm.embedding(
    model="text-embedding-3-small",
    input=["Hello world", "Goodbye world"]
)
print(f"Dimensions: {len(response.data[0].embedding)}")

🔁 Reranking

response = arcllm.rerank(
    model="cohere/rerank-v3.5",
    query="Who created the Python programming language?",
    documents=[
        "Linus Torvalds created the Linux kernel in 1991.",
        "Guido van Rossum created the Python programming language in 1991.",
        "Dennis Ritchie designed the C programming language at Bell Labs.",
    ],
    top_n=2,
)
for r in response.results:
    print(f"#{r.index}  score={r.relevance_score:.3f}  {r.document}")

arcllm.arerank(...) is the async equivalent. Cohere is the supported rerank provider; other adapters raise UnsupportedModelError when called through this surface.

🖼️ Image generation

# DALL-E 3 / gpt-image-1
img = arcllm.image_generation(
    model="openai/dall-e-3",
    prompt="a teal arc connecting two glowing endpoints, vector art",
    size="1024x1024",
    quality="standard",
)
print(img.data[0].url)

# Variation + edit (multipart) follow the same OpenAI shape
arcllm.image_variation(model="openai/dall-e-2", image=open("orig.png", "rb").read())
arcllm.image_edit(
    model="openai/gpt-image-1",
    image=open("orig.png", "rb").read(),
    mask=open("mask.png", "rb").read(),
    prompt="replace the sky with a starfield",
)

aimage_generation, aimage_variation, aimage_edit are async equivalents.

🔢 Token counting

n = arcllm.token_counter(
    model="gpt-4o",
    messages=[{"role": "user", "content": "How many tokens?"}],
)

Without extras it falls back to a chars / 4 heuristic and warns once. For exact counts on OpenAI-family models install with the tokenize extra:

pip install "arcllm-sdk[tokenize]"   # pulls in tiktoken
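The fallback heuristic is simple enough to sketch (illustration only; arcllm.token_counter is the real entry point, and the tokenize extra gives exact counts):

```python
def approx_token_count(messages) -> int:
    # The chars / 4 fallback described above: rough but dependency-free.
    chars = sum(len(m.get("content", "")) for m in messages)
    return max(1, chars // 4)

print(approx_token_count([{"role": "user", "content": "How many tokens?"}]))  # 4
```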

💰 Cost Tracking

response = arcllm.completion(model="gpt-4o", messages=messages)

# Calculate cost
cost = arcllm.completion_cost(response)
print(f"Cost: ${cost:.6f}")

# Or get per-token pricing
input_cost, output_cost = arcllm.cost_per_token(
    model="gpt-4o",
    prompt_tokens=1000,
    completion_tokens=500
)
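Under the hood this is per-million-token arithmetic against the bundled pricing table. A worked example (the dollar figures below are assumptions for illustration, not arcllm's shipped prices):

```python
# Assumed prices, USD per 1M tokens (illustration only)
INPUT_PER_M, OUTPUT_PER_M = 2.50, 10.00

prompt_tokens, completion_tokens = 1000, 500
input_cost = prompt_tokens / 1_000_000 * INPUT_PER_M
output_cost = completion_tokens / 1_000_000 * OUTPUT_PER_M
print(f"${input_cost + output_cost:.6f}")  # $0.007500
```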

🔍 Model capabilities

Pure-Python lookups against the bundled capability + pricing tables. No network calls.

# Boolean predicates
arcllm.supports_vision("gpt-4o")                          # True
arcllm.supports_pdf_input("claude-sonnet-4-5-20250929")   # True
arcllm.supports_tools("gemini-2.5-pro")                   # True
arcllm.supports_structured_output("gpt-4o")               # True
arcllm.supports_function_calling("openai/o4-mini")        # True (alias of supports_tools)

# Numbers + records
arcllm.get_max_tokens("gpt-4o")           # 16384
arcllm.get_model_pricing("gpt-4o")        # ModelPricing(input_cost_per_million=2.5, ...)
arcllm.get_model_info("gpt-4o")           # full dict (capabilities + pricing)

# Which OpenAI request params does this model accept?
arcllm.get_supported_openai_params("openai/o4-mini")
# -> ['messages', 'max_completion_tokens', 'reasoning_effort', 'tools', ...]
# (drops 'temperature' / 'top_p' / 'stop' for reasoning models that reject them)

Error Handling

from arcllm import (
    ArcLLMError,
    AuthenticationError,
    RateLimitError,
    TimeoutError,
)

try:
    response = arcllm.completion(model="gpt-4o", messages=messages)
except AuthenticationError:
    print("Check your API key")
except RateLimitError as e:
    print(f"Rate limited. Retry after {e.retry_after}s")
except TimeoutError:
    print("Request timed out")
except ArcLLMError as e:
    print(f"Error: {e.message}")
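Since RateLimitError exposes retry_after, an app-level backoff loop is straightforward. A generic sketch (arcllm already retries internally via max_retries=; with_retries here is a hypothetical helper, not part of the library):

```python
import time

def with_retries(fn, retryable_exc=Exception, max_attempts=3, base_delay=1.0):
    # Hypothetical helper: call fn(), backing off exponentially when a
    # retryable exception (e.g. arcllm.RateLimitError) is raised.
    for attempt in range(max_attempts):
        try:
            return fn()
        except retryable_exc:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)
```

In practice you would pass retryable_exc=RateLimitError and sleep for e.retry_after instead of a fixed base delay.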

Configuration

# Per-request configuration
response = arcllm.completion(
    model="gpt-4o",
    messages=messages,
    api_key="sk-...",           # Override API key
    api_base="https://...",     # Custom endpoint
    timeout=120.0,              # Request timeout
    max_retries=5,              # Retry count
)

# Azure OpenAI
response = arcllm.completion(
    model="azure/my-deployment",
    messages=messages,
    api_base="https://myresource.openai.azure.com",
    api_version="2024-10-21",
)


Maintained by

Dynamiq AI. Issues and pull requests welcome.

Why "Arc"?

An arc is the shortest path between two points. ArcLLM is the shortest path between your code and any LLM provider—minimal, direct, efficient.

License

Apache 2.0 - see LICENSE


Built with ❤️ for developers who value simplicity
