Skip to main content

Plug-and-play LLM connector via YAML config

Project description

llmgate

License: MIT Python 3.10+ GitHub stars

PyPI version Downloads

Plug-and-play LLM connector via YAML config. One interface, 21 providers, zero bloat.

Why llmgate?

You've probably seen LiteLLM. It's great — if you want a proxy server, Redis, PostgreSQL, a dashboard, and 50+ transitive dependencies. If you just want to call an LLM from Python without installing a framework, there's nothing lightweight out there.

llmgate is the opposite: pip install llmgt pulls in exactly two dependencies (httpx + pyyaml). Drop a YAML file in your project, set your API key, and call any model. No proxy server, no database, no SDK lock-in — just a Python library that reads a config and makes HTTP calls. Swap providers by changing one line in your YAML.

llmgate LiteLLM
Install size ~2 MB ~200 MB+
Dependencies 2 (httpx, pyyaml) 50+
Architecture Library (import it) Proxy server
Provider swap Change 1 line in YAML Change code
Latency overhead ~0 (direct HTTP) Proxy hop + DB logging

Note: The PyPI package is llmgt (pip install llmgt), but the import is llmgate.

Install

pip install llmgt

Optional extras:

pip install llmgt[aws]    # AWS Bedrock (boto3)
pip install llmgt[gcp]    # Google Vertex AI (google-auth)
pip install llmgt[dev]    # pytest + dev tools

Quickstart

  1. Create llmgate.yaml in your project:
provider: anthropic
model: claude-sonnet-4-20250514
api_key: ${ANTHROPIC_API_KEY}
temperature: 0.7
max_tokens: 1024
  1. Use it:
from llmgate import LLMGate

gate = LLMGate()
response = gate.chat("Explain transformers in one sentence")
print(response.text)
print(response.tokens_used)

# Streaming
for chunk in gate.stream("Write a haiku"):
    print(chunk, end="", flush=True)

System Prompts & Multi-Turn

For simple prompts use chat(). For system prompts or conversation history, use chat_messages() with the full messages list:

response = gate.chat_messages([
    {"role": "system", "content": "You are a helpful coding assistant. Be concise."},
    {"role": "user", "content": "What's a closure?"},
])
print(response.text)

Multi-Turn Conversations

Build up conversation history and pass it in:

messages = [
    {"role": "system", "content": "You are a math tutor."},
    {"role": "user", "content": "What's a derivative?"},
]

response = gate.chat_messages(messages)
print(response.text)

# Continue the conversation
messages.append({"role": "assistant", "content": response.text})
messages.append({"role": "user", "content": "Can you give me an example?"})

response = gate.chat_messages(messages)
print(response.text)

Streaming works with full message lists too:

for chunk in gate.stream_messages(messages):
    print(chunk, end="", flush=True)

Multi-Profile Config

active_profile: smart

defaults:
  temperature: 0.7
  max_tokens: 1024

profiles:
  smart:
    provider: anthropic
    model: claude-sonnet-4-20250514
    api_key: ${ANTHROPIC_API_KEY}

  fast:
    provider: groq
    model: llama-3.1-8b-instant
    api_key: ${GROQ_API_KEY}

  cheap:
    provider: deepseek
    model: deepseek-chat
    api_key: ${DEEPSEEK_API_KEY}

  local:
    provider: ollama
    model: llama3.2

Hot-swap profiles at runtime:

gate = LLMGate()                          # uses "smart" profile
gate.switch("fast")                       # swap to Groq
response = gate.chat("Hello", temperature=0.2)  # call-time overrides

Loading API Keys from .env

llmgate resolves ${ENV_VAR} from os.environ. To load keys from a .env file, use python-dotenv:

pip install python-dotenv
from dotenv import load_dotenv
load_dotenv()  # loads .env into os.environ

from llmgate import LLMGate
gate = LLMGate()  # now ${ANTHROPIC_API_KEY} etc. will resolve

Or use a .env file:

ANTHROPIC_API_KEY=sk-ant-...
GROQ_API_KEY=gsk_...
OPENAI_API_KEY=sk-...

See .env.example in the repo for all supported variables.

Environment Variable Interpolation

Any string value in the YAML can use ${ENV_VAR} syntax — not just api_key:

api_key: ${MY_API_KEY}
base_url: ${CUSTOM_ENDPOINT}
nested:
  deep:
    value: ${SOME_SECRET}

Variables are resolved from os.environ at load time. Missing vars resolve to empty string.

Supported Providers

Provider Example Models Env Var Streaming Notes
openai gpt-4o, gpt-4-turbo OPENAI_API_KEY
anthropic claude-sonnet-4-20250514, claude-opus-4-20250514 ANTHROPIC_API_KEY
gemini gemini-1.5-pro, gemini-1.5-flash GEMINI_API_KEY
cohere command-r-plus, command-r COHERE_API_KEY
groq llama-3.1-8b-instant, mixtral-8x7b GROQ_API_KEY OpenAI-compatible
mistral mistral-large, mistral-small MISTRAL_API_KEY OpenAI-compatible
openrouter meta-llama/llama-3.1-70b-instruct OPENROUTER_API_KEY OpenAI-compatible
together meta-llama/Llama-3-70b-chat-hf TOGETHER_API_KEY OpenAI-compatible
fireworks accounts/fireworks/models/llama-v3-70b FIREWORKS_API_KEY OpenAI-compatible
perplexity llama-3.1-sonar-large-128k PERPLEXITY_API_KEY OpenAI-compatible
deepseek deepseek-chat, deepseek-coder DEEPSEEK_API_KEY OpenAI-compatible
xai grok-2, grok-beta XAI_API_KEY OpenAI-compatible
ai21 jamba-1.5-large, jamba-1.5-mini AI21_API_KEY OpenAI-compatible
azure_openai gpt-4o (via deployment) AZURE_OPENAI_API_KEY See Azure setup
bedrock anthropic.claude-3, amazon.titan AWS credentials See Bedrock setup
vertexai gemini-1.5-pro (via Vertex) GCP ADC See Vertex setup
huggingface mistralai/Mixtral-8x7B-Instruct-v0.1 HUGGINGFACE_API_KEY Auto-detects chat models
replicate meta/llama-2-70b-chat REPLICATE_API_KEY Polling-based
nlpcloud chatdolphin, finetuned-llama-3 NLPCLOUD_API_KEY
ollama llama3.2, mistral, codellama none Local
lmstudio any GGUF model none Local, OpenAI-compatible

Providers marked ❌ for streaming will return the full response as a single chunk when you call stream().

Error Handling

llmgate raises standard exceptions you can catch:

import httpx
from llmgate import LLMGate

gate = LLMGate()

try:
    response = gate.chat("Hello")
except FileNotFoundError:
    # llmgate.yaml not found
    print("Create a llmgate.yaml config file first")
except ValueError as e:
    # Bad config: unknown provider, missing profile, missing 'provider' field
    print(f"Config error: {e}")
except httpx.HTTPStatusError as e:
    # API returned an error (401 unauthorized, 429 rate limited, 500 server error, etc.)
    print(f"API error {e.response.status_code}: {e.response.text}")
except httpx.ConnectError:
    # Can't reach the API (network issue, wrong base_url, Ollama not running)
    print("Connection failed — check your network or base_url")
except httpx.TimeoutException:
    # Request took longer than 60 seconds
    print("Request timed out")
except ImportError as e:
    # Missing optional dependency (boto3 for Bedrock, google-auth for Vertex)
    print(f"Missing dependency: {e}")

All API errors come through as httpx.HTTPStatusError with the full response body available at e.response.text — useful for debugging rate limits, auth issues, or quota problems.

Azure OpenAI Setup

profiles:
  azure:
    provider: azure_openai
    model: gpt-4o
    resource_name: my-azure-resource
    deployment_name: my-gpt4o-deployment
    api_version: "2024-02-01"
    api_key: ${AZURE_OPENAI_API_KEY}

AWS Bedrock Setup

pip install llmgt[aws]
profiles:
  aws:
    provider: bedrock
    model: anthropic.claude-3-sonnet-20240229-v1:0  # or amazon.titan-*, meta.*
    region: us-east-1

Requires AWS credentials configured via ~/.aws/credentials, env vars, or IAM role. Supports Anthropic Claude, Amazon Titan, and Meta Llama model families on Bedrock — detected automatically by model ID prefix.

Google Vertex AI Setup

pip install llmgt[gcp]
profiles:
  gcp:
    provider: vertexai
    model: gemini-1.5-pro
    project_id: my-gcp-project
    region: us-central1

Uses Google Application Default Credentials. Run gcloud auth application-default login or set GOOGLE_APPLICATION_CREDENTIALS.

LLMResponse

response = gate.chat("Hello")
response.text           # str — the generated text
response.model          # str — model name
response.provider       # str — provider name
response.tokens_used    # int | None — total tokens
response.finish_reason  # str | None — stop reason
response.raw            # dict — full API response

Async Support

Not yet — llmgate v0.1 is sync-only (httpx sync client). Async via httpx.AsyncClient is planned for v0.2. If this is blocking you, open an issue.

Contributing

git clone https://github.com/kesiee/llmgate.git
cd llmgate
pip install -e ".[dev]"
pytest

The codebase is intentionally simple. Provider files live in llmgate/providers/. OpenAI-compatible providers inherit from OpenAIProvider and only override BASE_URL + headers. Custom providers implement send() and stream() directly.

To add a new provider:

  1. Create llmgate/providers/yourprovider.py — inherit from BaseProvider (or OpenAIProvider if compatible)
  2. Add it to PROVIDER_REGISTRY in llmgate/gate.py
  3. Add a test and update this README

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

llmgt-0.1.0.tar.gz (21.1 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

llmgt-0.1.0-py3-none-any.whl (27.1 kB view details)

Uploaded Python 3

File details

Details for the file llmgt-0.1.0.tar.gz.

File metadata

  • Download URL: llmgt-0.1.0.tar.gz
  • Upload date:
  • Size: 21.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for llmgt-0.1.0.tar.gz
Algorithm Hash digest
SHA256 d6eeb62e96f2f6256686657bab3c5c48f3df11bc1c2d609d078700fe478b06ca
MD5 26b4f1ead505be15679f81de57c39b71
BLAKE2b-256 dd5d3951cf0b8d7c50d793d34496329c56940c9bee386f5fa4478273fbb3d867

See more details on using hashes here.

File details

Details for the file llmgt-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: llmgt-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 27.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for llmgt-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 be78a4dc09254391ae0d323e04d17514f22d79ba1834d4f1691251ddccf465ff
MD5 77accf9c6fcf09460762ab48607d1591
BLAKE2b-256 b079b6cb0e8f98cc936d9f8139bec02faad2cef2702da52118b73c61c17ecef2

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page