
arcllm

The arc connecting you to every LLM

Minimal dependencies. Maximum performance. One unified API.



Why ArcLLM?

ArcLLM ships a single unified, OpenAI-compatible surface across every major LLM provider with a tightly curated runtime footprint:

  • 4 runtime deps: httpx[http2], aiohttp, msgspec, orjson — all chosen for raw speed.
  • OpenAI-compatible API so existing client code keeps working.
  • Sync + async, streaming, tools, structured output, vision, embeddings in one library.
  • Built-in cost + capability tracking for every supported model.

Built for developers who want speed, simplicity, and reliability when working with LLMs.

Installation

pip install arcllm-sdk

Quick Start

import arcllm

# Simple completion
response = arcllm.completion(
    model="gpt-5.4-mini",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)

Streaming

stream = arcllm.completion(
    model="gpt-5.4-mini",
    messages=[{"role": "user", "content": "Write a haiku about coding"}],
    stream=True
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Async

import asyncio

async def main():
    response = await arcllm.acompletion(
        model="anthropic/claude-sonnet-4-5",
        messages=[{"role": "user", "content": "Explain quantum computing"}]
    )
    print(response.choices[0].message.content)

asyncio.run(main())

Different providers

# OpenAI
arcllm.completion(model="gpt-4o", messages=messages)

# Anthropic
arcllm.completion(model="anthropic/claude-sonnet-4-5", messages=messages)

# Google Gemini
arcllm.completion(model="gemini/gemini-2.5-pro", messages=messages)

# Groq (ultra-fast inference)
arcllm.completion(model="groq/llama-3.3-70b-versatile", messages=messages)

# Together AI / Fireworks (open-weight flagships: Llama 4, Qwen 3, DeepSeek, Kimi, GLM, MiniMax)
arcllm.completion(model="together_ai/meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8", messages=messages)
arcllm.completion(model="fireworks_ai/accounts/fireworks/models/deepseek-v4-pro", messages=messages)

# Local with Ollama
arcllm.completion(model="ollama/llama3.3", messages=messages)

Supported providers

28 providers, plus a custom OpenAI-compatible endpoint adapter, grouped by surface below. The provider prefix you pass in arcllm.completion(model=...) is shown in the Prefix column.

First-party APIs

Provider Prefix Highlights
OpenAI openai/ GPT-5 family, GPT-4.1, GPT-4o, o-series reasoning, embeddings
Anthropic anthropic/ Claude Opus 4.7, Sonnet 4.6, Haiku 4.5 (extended thinking)
Google Gemini gemini/ Gemini 2.5 / 3.x with thinking config
Mistral mistral/ Mistral Large/Medium/Small, Codestral, Pixtral, embeddings
Cohere cohere/ Command A/R+/R, Aya Vision, Embed v4, Rerank v3.5
DeepSeek deepseek/ DeepSeek V4 Flash + Pro (chat + reasoner)
xAI xai/ Grok-4 / 4.1 / 4.20 / 4.3 family + Grok-3 (legacy)
Perplexity perplexity/ Sonar, Sonar Pro, Sonar Reasoning, Deep Research
Groq groq/ Llama 3/4, GPT-OSS, Qwen 3 (LPU low-latency)
Together AI together_ai/ Llama 4, Qwen 3, DeepSeek V4, Kimi, GLM, MiniMax
Fireworks AI fireworks_ai/ DeepSeek V4 Pro, Kimi K2, GLM 5.1, Llama, Qwen
Cerebras cerebras/ Llama 3.x, Qwen 3, GPT-OSS on CS-3 wafer-scale
SambaNova sambanova/ Llama 3.x / Llama 4, DeepSeek, MiniMax on RDU
DeepInfra deepinfra/ Full open-weights catalog: Llama, Qwen, DeepSeek, Phi, Gemma, Kimi
AI21 ai21/ Jamba 1.5 Large + Mini
Nebius AI nebius/ Llama 3.x, Qwen 2.5/3, DeepSeek R1/V3, Mistral, Nemotron
OVHcloud ovhcloud/ Llama 3.x, DeepSeek R1, Mistral, Qwen 3 — European GPU cloud
Z.AI (GLM) zai/ GLM-4.5 / 4.6 / 5 family by Zhipu AI (incl. vision + reasoning)
Moonshot AI moonshot/ Kimi K2.5 / K2.6 / K2-thinking (long-context, multimodal)

Cloud platforms

Provider Prefix Highlights
Azure azure/ OpenAI Service deployments + AI Foundry (Phi, Llama, Cohere, Mistral)
AWS Bedrock bedrock/ Anthropic, OpenAI GPT-OSS, Llama, Mistral, Cohere, Nova, Titan, AI21
Google Vertex vertex_ai/ Gemini + Anthropic Claude + Mistral + Llama on Vertex
Databricks databricks/ Llama, Claude, Gemini, GPT-5 on Foundation Model APIs
IBM watsonx watsonx/ Granite, Llama, Mistral on IBM Cloud (auto IAM-token exchange)
NVIDIA NIM nvidia_nim/ Llama, Nemotron, Mixtral, Phi on build.nvidia.com

Gateways, local & custom

Provider Prefix Highlights
OpenRouter openrouter/ Unified gateway over 300+ upstream models
HuggingFace huggingface/ Hub Inference + Inference Endpoints (chat-completions API)
Ollama ollama/ Local: Llama, Qwen, Gemma, DeepSeek-R1, Phi (no API key)
Custom custom/ Any user-supplied OpenAI-compatible HTTP endpoint
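Routing is driven entirely by that prefix: the provider name comes off the first `/`, and bare model names go to a default provider. A minimal sketch of the dispatch (split_model and the OpenAI default are illustrative assumptions, not arcllm internals):

```python
def split_model(model: str, default_provider: str = "openai"):
    """Illustration only: split "provider/model" strings; bare names
    fall back to a default provider (assumed here to be OpenAI)."""
    provider, sep, name = model.partition("/")
    return (provider, name) if sep else (default_provider, model)

print(split_model("anthropic/claude-sonnet-4-5"))  # ('anthropic', 'claude-sonnet-4-5')
print(split_model("gpt-4o"))                       # ('openai', 'gpt-4o')
```

Note that only the first `/` matters, so multi-segment model paths like `together_ai/meta-llama/Llama-4-...` keep their remainder intact.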

Authentication

Every provider reads its key from a documented env var. You can also pass api_key= per-call to override.

Provider Env var(s) Notes
OpenAI OPENAI_API_KEY
Anthropic ANTHROPIC_API_KEY
Gemini GEMINI_API_KEY AI Studio key
Mistral MISTRAL_API_KEY
Cohere COHERE_API_KEY v2 endpoints
DeepSeek DEEPSEEK_API_KEY direct API (api.deepseek.com)
xAI XAI_API_KEY
Perplexity PERPLEXITY_API_KEY
Groq GROQ_API_KEY
Together AI TOGETHER_API_KEY
Fireworks AI FIREWORKS_API_KEY
Cerebras CEREBRAS_API_KEY
SambaNova SAMBANOVA_API_KEY
DeepInfra DEEPINFRA_API_KEY
AI21 AI21_API_KEY Jamba family
Nebius AI NEBIUS_API_KEY
OVHcloud OVHCLOUD_API_KEY European AI Endpoints
Z.AI (GLM) ZAI_API_KEY
Moonshot AI MOONSHOT_API_KEY clamp temperature to [0, 1]; multimodal arrays only on Kimi vision/video models
Azure AZURE_OPENAI_API_KEY + api_base + api_version per call
AWS Bedrock AWS_ACCESS_KEY_ID + AWS_SECRET_ACCESS_KEY SigV4-signed; honors AWS_REGION_NAME / AWS_SESSION_TOKEN
Vertex AI OAuth (gcloud ADC) falls back to GOOGLE_APPLICATION_CREDENTIALS
Databricks DATABRICKS_TOKEN + DATABRICKS_HOST
IBM watsonx WATSONX_API_KEY raw IBM Cloud key (auto-exchanged for IAM JWT) or pre-exchanged JWT. Plus WATSONX_URL + WATSONX_PROJECT_ID
NVIDIA NIM NVIDIA_NIM_API_KEY
OpenRouter OPENROUTER_API_KEY optional OPENROUTER_REFERER + OPENROUTER_APP_NAME for app attribution
HuggingFace HUGGINGFACE_API_KEY works against router or custom Inference Endpoint URL
Ollama none uses local OLLAMA_API_BASE (default http://localhost:11434)
Custom user-supplied pass api_base= plus optional api_key= / extra_headers={...}

Features

🛠️ Tool Calling

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name"}
            },
            "required": ["location"]
        }
    }
}]

response = arcllm.completion(
    model="gpt-5.4-mini",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools
)

if response.choices[0].message.tool_calls:
    for tool_call in response.choices[0].message.tool_calls:
        print(f"Call: {tool_call.function.name}({tool_call.function.arguments})")

📋 Structured Output

response = arcllm.completion(
    model="gpt-5.4-mini",
    messages=[{"role": "user", "content": "Generate a user profile"}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "user_profile",
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "integer"},
                    "interests": {"type": "array", "items": {"type": "string"}}
                },
                "required": ["name", "age"]
            }
        }
    }
)

🖼️ Vision

response = arcllm.completion(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
        ]
    }]
)

📄 PDF input (Anthropic, Gemini)

response = arcllm.completion(
    model="anthropic/claude-haiku-4-5",
    messages=[{
        "role": "user",
        "content": [
            {"type": "input_file", "file": {
                "data": pdf_base64, "media_type": "application/pdf"
            }},
            {"type": "text", "text": "Summarise this document"},
        ],
    }],
    max_tokens=512,
)

🧠 Reasoning models (thinking budget + reasoning effort)

# OpenAI o-series + GPT-5 hybrid: reasoning_effort
arcllm.completion(
    model="openai/o4-mini",
    messages=[{"role": "user", "content": "What is 7*8?"}],
    reasoning_effort="medium",
    max_completion_tokens=64,
)
# (passing temperature= here is dropped automatically with a warning —
#  o4-mini rejects temperature, and the capability table knows it)

# Anthropic Claude with extended thinking
arcllm.completion(
    model="anthropic/claude-opus-4-7",
    messages=[{"role": "user", "content": "Solve this hard problem"}],
    thinking_budget=2048,
    max_tokens=4096,
)

# Gemini 2.5+ with thinking config
arcllm.completion(
    model="gemini/gemini-2.5-pro",
    messages=[{"role": "user", "content": "Solve"}],
    thinking_budget=1024,
    include_thoughts=True,
)

🔎 Citations from grounded providers

# Perplexity Sonar — search is implicit
response = arcllm.completion(
    model="perplexity/sonar-pro",
    messages=[{"role": "user", "content": "Latest news on small models?"}],
)
for c in response.choices[0].message.citations or []:
    print(f"{c.title or '(no title)'}: {c.url}")

# Anthropic + Gemini grounded responses populate the same field, sourced
# from `web_search_tool_result` blocks / `groundingMetadata` respectively.

🛡️ Built-in provider tools (pass-through)

# Anthropic web search + code execution
arcllm.completion(
    model="anthropic/claude-sonnet-4-5",
    messages=[{"role": "user", "content": "Research arcllm and run a quick demo"}],
    tools=[
        {"type": "web_search_20250305", "name": "web_search"},
        {"type": "code_execution_20250825", "name": "code_execution"},
    ],
    max_tokens=1024,
)

# Gemini Google Search grounding
arcllm.completion(
    model="gemini/gemini-2.5-pro",
    messages=[{"role": "user", "content": "What happened in AI yesterday?"}],
    tools=[{"google_search": {}}],
)

📊 Embeddings

response = arcllm.embedding(
    model="text-embedding-3-small",
    input=["Hello world", "Goodbye world"]
)
print(f"Dimensions: {len(response.data[0].embedding)}")

🔁 Reranking

response = arcllm.rerank(
    model="cohere/rerank-v3.5",
    query="Who created the Python programming language?",
    documents=[
        "Linus Torvalds created the Linux kernel in 1991.",
        "Guido van Rossum created the Python programming language in 1991.",
        "Dennis Ritchie designed the C programming language at Bell Labs.",
    ],
    top_n=2,
)
for r in response.results:
    print(f"#{r.index}  score={r.relevance_score:.3f}  {r.document}")

arcllm.arerank(...) is the async equivalent. Cohere is the supported rerank provider; other adapters raise UnsupportedModelError when called through this surface.

🖼️ Image generation

# DALL-E 3 / gpt-image-1
img = arcllm.image_generation(
    model="openai/dall-e-3",
    prompt="a teal arc connecting two glowing endpoints, vector art",
    size="1024x1024",
    quality="standard",
)
print(img.data[0].url)

# Variation + edit (multipart) follow the same OpenAI shape
arcllm.image_variation(model="openai/dall-e-2", image=open("orig.png", "rb").read())
arcllm.image_edit(
    model="openai/gpt-image-1",
    image=open("orig.png", "rb").read(),
    mask=open("mask.png", "rb").read(),
    prompt="replace the sky with a starfield",
)

aimage_generation, aimage_variation, aimage_edit are async equivalents.

🔢 Token counting

n = arcllm.token_counter(
    model="gpt-4o",
    messages=[{"role": "user", "content": "How many tokens?"}],
)

Without extras it falls back to a chars / 4 heuristic and warns once. For exact counts on OpenAI-family models install with the tokenize extra:

pip install "arcllm-sdk[tokenize]"   # pulls in tiktoken
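The fallback heuristic is simple enough to sketch (illustration only; arcllm.token_counter is the real entry point, and the tokenize extra gives exact counts):

```python
def approx_token_count(messages) -> int:
    # The chars / 4 fallback described above: rough but dependency-free.
    chars = sum(len(m.get("content", "")) for m in messages)
    return max(1, chars // 4)

print(approx_token_count([{"role": "user", "content": "How many tokens?"}]))  # 4
```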

💰 Cost Tracking

response = arcllm.completion(model="gpt-4o", messages=messages)

# Calculate cost
cost = arcllm.completion_cost(response)
print(f"Cost: ${cost:.6f}")

# Or get per-token pricing
input_cost, output_cost = arcllm.cost_per_token(
    model="gpt-4o",
    prompt_tokens=1000,
    completion_tokens=500
)
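Under the hood this is per-million-token arithmetic against the bundled pricing table. A worked example (the dollar figures below are assumptions for illustration, not arcllm's shipped prices):

```python
# Assumed prices, USD per 1M tokens (illustration only)
INPUT_PER_M, OUTPUT_PER_M = 2.50, 10.00

prompt_tokens, completion_tokens = 1000, 500
input_cost = prompt_tokens / 1_000_000 * INPUT_PER_M
output_cost = completion_tokens / 1_000_000 * OUTPUT_PER_M
print(f"${input_cost + output_cost:.6f}")  # $0.007500
```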

🔍 Model capabilities

Pure-Python lookups against the bundled capability + pricing tables. No network calls.

# Boolean predicates
arcllm.supports_vision("gpt-4o")                          # True
arcllm.supports_pdf_input("claude-sonnet-4-5-20250929")   # True
arcllm.supports_tools("gemini-2.5-pro")                   # True
arcllm.supports_structured_output("gpt-4o")               # True
arcllm.supports_function_calling("openai/o4-mini")        # True (alias of supports_tools)

# Numbers + records
arcllm.get_max_tokens("gpt-4o")           # 16384
arcllm.get_model_pricing("gpt-4o")        # ModelPricing(input_cost_per_million=2.5, ...)
arcllm.get_model_info("gpt-4o")           # full dict (capabilities + pricing)

# Which OpenAI request params does this model accept?
arcllm.get_supported_openai_params("openai/o4-mini")
# -> ['messages', 'max_completion_tokens', 'reasoning_effort', 'tools', ...]
# (drops 'temperature' / 'top_p' / 'stop' for reasoning models that reject them)

Error Handling

from arcllm import (
    ArcLLMError,
    AuthenticationError,
    RateLimitError,
    TimeoutError,
)

try:
    response = arcllm.completion(model="gpt-4o", messages=messages)
except AuthenticationError:
    print("Check your API key")
except RateLimitError as e:
    print(f"Rate limited. Retry after {e.retry_after}s")
except TimeoutError:
    print("Request timed out")
except ArcLLMError as e:
    print(f"Error: {e.message}")
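Since RateLimitError exposes retry_after, an app-level backoff loop is straightforward. A generic sketch (arcllm already retries internally via max_retries=; with_retries here is a hypothetical helper, not part of the library):

```python
import time

def with_retries(fn, retryable_exc=Exception, max_attempts=3, base_delay=1.0):
    # Hypothetical helper: call fn(), backing off exponentially when a
    # retryable exception (e.g. arcllm.RateLimitError) is raised.
    for attempt in range(max_attempts):
        try:
            return fn()
        except retryable_exc:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)
```

In practice you would pass retryable_exc=RateLimitError and sleep for e.retry_after instead of a fixed base delay.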

Configuration

# Per-request configuration
response = arcllm.completion(
    model="gpt-4o",
    messages=messages,
    api_key="sk-...",           # Override API key
    api_base="https://...",     # Custom endpoint
    timeout=120.0,              # Request timeout
    max_retries=5,              # Retry count
)

# Azure OpenAI
response = arcllm.completion(
    model="azure/my-deployment",
    messages=messages,
    api_base="https://myresource.openai.azure.com",
    api_version="2024-10-21",
)


Maintained by

Dynamiq AI. Issues and pull requests welcome.

Why "Arc"?

An arc is the shortest path between two points. ArcLLM is the shortest path between your code and any LLM provider—minimal, direct, efficient.

License

Apache 2.0 - see LICENSE


Built with ❤️ for developers who value simplicity
