Plug-and-play LLM connector via YAML config

llmgate

Plug-and-play LLM connector via YAML config. One interface, 21 providers, zero bloat.

Why llmgate?

You've probably seen LiteLLM. It's great — if you want a proxy server, Redis, PostgreSQL, a dashboard, and 50+ transitive dependencies. If you just want to call an LLM from Python without installing a framework, there's nothing lightweight out there.

llmgate is the opposite: pip install llmgt pulls in exactly two dependencies (httpx + pyyaml). Drop a YAML file in your project, set your API key, and call any model. No proxy server, no database, no SDK lock-in — just a Python library that reads a config and makes HTTP calls. Swap providers by changing one line in your YAML.

	llmgate	LiteLLM
Install size	~2 MB	~200 MB+
Dependencies	2 (`httpx`, `pyyaml`)	50+
Architecture	Library (import it)	Proxy server
Provider swap	Change 1 line in YAML	Change code
Latency overhead	~0 (direct HTTP)	Proxy hop + DB logging

Note: The PyPI package is llmgt (pip install llmgt), but the import is llmgate.

Install

pip install llmgt

Optional extras:

pip install llmgt[aws]    # AWS Bedrock (boto3)
pip install llmgt[gcp]    # Google Vertex AI (google-auth)
pip install llmgt[dev]    # pytest + dev tools

Quickstart

Create llmgate.yaml in your project:

provider: anthropic
model: claude-sonnet-4-20250514
api_key: ${ANTHROPIC_API_KEY}
temperature: 0.7
max_tokens: 1024

Use it:

from llmgate import LLMGate

gate = LLMGate()
response = gate.chat("Explain transformers in one sentence")
print(response.text)
print(response.tokens_used)

# Streaming
for chunk in gate.stream("Write a haiku"):
    print(chunk, end="", flush=True)

System Prompts & Multi-Turn

For simple prompts use chat(). For system prompts or conversation history, use chat_messages() with the full messages list:

response = gate.chat_messages([
    {"role": "system", "content": "You are a helpful coding assistant. Be concise."},
    {"role": "user", "content": "What's a closure?"},
])
print(response.text)

Multi-Turn Conversations

Build up conversation history and pass it in:

messages = [
    {"role": "system", "content": "You are a math tutor."},
    {"role": "user", "content": "What's a derivative?"},
]

response = gate.chat_messages(messages)
print(response.text)

# Continue the conversation
messages.append({"role": "assistant", "content": response.text})
messages.append({"role": "user", "content": "Can you give me an example?"})

response = gate.chat_messages(messages)
print(response.text)

Streaming works with full message lists too:

for chunk in gate.stream_messages(messages):
    print(chunk, end="", flush=True)
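The append-and-resend pattern above can be wrapped in a small helper. The sketch below is illustrative only: `Conversation`, `ask`, and `fake_send` are hypothetical names, not part of llmgate. It assumes you pass in any callable that takes the message list and returns the reply text, for example `lambda msgs: gate.chat_messages(msgs).text`.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


class Conversation:
    """Hypothetical helper that keeps the message history for you."""

    def __init__(self, send: Callable[[List[Message]], str], system: str = "") -> None:
        # `send` is any callable that maps a message list to reply text.
        self._send = send
        self.messages: List[Message] = []
        if system:
            self.messages.append({"role": "system", "content": system})

    def ask(self, content: str) -> str:
        # Record the user turn, call the model, then record the assistant turn.
        self.messages.append({"role": "user", "content": content})
        reply = self._send(self.messages)
        self.messages.append({"role": "assistant", "content": reply})
        return reply


def fake_send(messages: List[Message]) -> str:
    # Stand-in for a real model call so the sketch runs offline.
    user_turns = sum(m["role"] == "user" for m in messages)
    return f"reply #{user_turns}"


chat = Conversation(fake_send, system="You are a math tutor.")
chat.ask("What's a derivative?")
chat.ask("Can you give me an example?")
print([m["role"] for m in chat.messages])
```

Keeping the history in one object means every call sends the full context, which is what chat_messages() expects.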

Multi-Profile Config

active_profile: smart

defaults:
  temperature: 0.7
  max_tokens: 1024

profiles:
  smart:
    provider: anthropic
    model: claude-sonnet-4-20250514
    api_key: ${ANTHROPIC_API_KEY}

  fast:
    provider: groq
    model: llama-3.1-8b-instant
    api_key: ${GROQ_API_KEY}

  cheap:
    provider: deepseek
    model: deepseek-chat
    api_key: ${DEEPSEEK_API_KEY}

  local:
    provider: ollama
    model: llama3.2

Hot-swap profiles at runtime:

gate = LLMGate()                          # uses "smart" profile
gate.switch("fast")                       # swap to Groq
response = gate.chat("Hello", temperature=0.2)  # call-time overrides
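To make the layering concrete, here is a minimal sketch of how settings could be merged. It assumes a precedence of defaults, then the selected profile, then call-time overrides; that order is an assumption inferred from the example above, and `resolve_settings` is a hypothetical function, not llmgate's API.

```python
from typing import Any, Dict, Optional

# Plain-dict mirror of the YAML above (API keys omitted for brevity).
config: Dict[str, Any] = {
    "active_profile": "smart",
    "defaults": {"temperature": 0.7, "max_tokens": 1024},
    "profiles": {
        "smart": {"provider": "anthropic", "model": "claude-sonnet-4-20250514"},
        "fast": {"provider": "groq", "model": "llama-3.1-8b-instant"},
    },
}


def resolve_settings(
    config: Dict[str, Any], profile: Optional[str] = None, **overrides: Any
) -> Dict[str, Any]:
    """Merge defaults, one profile, and call-time overrides (later wins)."""
    name = profile or config["active_profile"]
    profiles = config.get("profiles", {})
    if name not in profiles:
        raise ValueError(f"unknown profile: {name!r}")
    return {**config.get("defaults", {}), **profiles[name], **overrides}


print(resolve_settings(config))
print(resolve_settings(config, "fast", temperature=0.2))
```

With this ordering, a profile only needs to list what differs from the defaults, and a single call can still tweak one parameter without touching the file.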

Loading API Keys from .env

llmgate resolves ${ENV_VAR} from os.environ. To load keys from a .env file, use python-dotenv:

pip install python-dotenv

from dotenv import load_dotenv
load_dotenv()  # loads .env into os.environ

from llmgate import LLMGate
gate = LLMGate()  # now ${ANTHROPIC_API_KEY} etc. will resolve

A matching .env file might look like this:

ANTHROPIC_API_KEY=sk-ant-...
GROQ_API_KEY=gsk_...
OPENAI_API_KEY=sk-...

See .env.example in the repo for all supported variables.
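Since keys are read from os.environ, it can help to check that every placeholder in your config has a value before making a call. The following is a sketch, not part of llmgate: `unset_variables` is a hypothetical helper that scans the config text for `${NAME}` placeholders and reports the ones that are unset or empty.

```python
import os
import re
from typing import List, Mapping

# Matches ${NAME} where NAME looks like a conventional environment variable.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def unset_variables(config_text: str, env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the placeholder names in config_text that have no value in env."""
    names = sorted(set(PLACEHOLDER.findall(config_text)))
    return [name for name in names if not env.get(name)]


example_config = """
profiles:
  smart:
    api_key: ${ANTHROPIC_API_KEY}
  fast:
    api_key: ${GROQ_API_KEY}
"""

# Pass an explicit mapping here so the result does not depend on your shell.
print(unset_variables(example_config, env={"GROQ_API_KEY": "gsk_example"}))
```

In a real project you would read llmgate.yaml from disk, call `unset_variables(text)` after `load_dotenv()`, and stop early if the list is not empty.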

Environment Variable Interpolation

Any string value in the YAML can use ${ENV_VAR} syntax — not just api_key:

api_key: ${MY_API_KEY}
base_url: ${CUSTOM_ENDPOINT}
nested:
  deep:
    value: ${SOME_SECRET}

Variables are resolved from os.environ at load time. Missing variables resolve to an empty string.
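The behavior described above can be sketched in a few lines. This is an illustration of the technique under stated assumptions, not llmgate's actual code: `interpolate` is a hypothetical name, and the environment is passed in as a plain mapping so the example is reproducible.

```python
import re
from typing import Any, Mapping

_PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def interpolate(value: Any, env: Mapping[str, str]) -> Any:
    """Recursively replace ${NAME} in every string; missing names become ''."""
    if isinstance(value, str):
        return _PLACEHOLDER.sub(lambda m: env.get(m.group(1), ""), value)
    if isinstance(value, dict):
        return {key: interpolate(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [interpolate(item, env) for item in value]
    return value  # numbers, booleans, None are left untouched


loaded = {
    "api_key": "${MY_API_KEY}",
    "base_url": "${CUSTOM_ENDPOINT}",
    "temperature": 0.7,
    "nested": {"deep": {"value": "${SOME_SECRET}"}},
}

env = {"MY_API_KEY": "key-123", "SOME_SECRET": "s3cret"}  # CUSTOM_ENDPOINT unset
print(interpolate(loaded, env))
```

Note that the unset `CUSTOM_ENDPOINT` silently becomes an empty string, which is why an unset key tends to surface later as an API or connection error rather than at load time.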

Supported Providers

Provider	Example Models	Env Var	Streaming	Notes
`openai`	gpt-4o, gpt-4-turbo	`OPENAI_API_KEY`	✅
`anthropic`	claude-sonnet-4-20250514, claude-opus-4-20250514	`ANTHROPIC_API_KEY`	✅
`gemini`	gemini-1.5-pro, gemini-1.5-flash	`GEMINI_API_KEY`	✅
`cohere`	command-r-plus, command-r	`COHERE_API_KEY`	✅
`groq`	llama-3.1-8b-instant, mixtral-8x7b	`GROQ_API_KEY`	✅	OpenAI-compatible
`mistral`	mistral-large, mistral-small	`MISTRAL_API_KEY`	✅	OpenAI-compatible
`openrouter`	meta-llama/llama-3.1-70b-instruct	`OPENROUTER_API_KEY`	✅	OpenAI-compatible
`together`	meta-llama/Llama-3-70b-chat-hf	`TOGETHER_API_KEY`	✅	OpenAI-compatible
`fireworks`	accounts/fireworks/models/llama-v3-70b	`FIREWORKS_API_KEY`	✅	OpenAI-compatible
`perplexity`	llama-3.1-sonar-large-128k	`PERPLEXITY_API_KEY`	✅	OpenAI-compatible
`deepseek`	deepseek-chat, deepseek-coder	`DEEPSEEK_API_KEY`	✅	OpenAI-compatible
`xai`	grok-2, grok-beta	`XAI_API_KEY`	✅	OpenAI-compatible
`ai21`	jamba-1.5-large, jamba-1.5-mini	`AI21_API_KEY`	✅	OpenAI-compatible
`azure_openai`	gpt-4o (via deployment)	`AZURE_OPENAI_API_KEY`	✅	See Azure setup
`bedrock`	anthropic.claude-3, amazon.titan	AWS credentials	❌	See Bedrock setup
`vertexai`	gemini-1.5-pro (via Vertex)	GCP ADC	✅	See Vertex setup
`huggingface`	mistralai/Mixtral-8x7B-Instruct-v0.1	`HUGGINGFACE_API_KEY`	❌	Auto-detects chat models
`replicate`	meta/llama-2-70b-chat	`REPLICATE_API_KEY`	❌	Polling-based
`nlpcloud`	chatdolphin, finetuned-llama-3	`NLPCLOUD_API_KEY`	❌
`ollama`	llama3.2, mistral, codellama	none	✅	Local
`lmstudio`	any GGUF model	none	✅	Local, OpenAI-compatible

Providers marked ❌ for streaming will return the full response as a single chunk when you call stream().
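The single-chunk fallback can be pictured as a generator that either forwards real chunks or yields the whole response once. This is a sketch of the idea only; `stream_or_fallback` and its parameters are hypothetical and do not describe llmgate's internals.

```python
from typing import Callable, Iterator, Optional


def stream_or_fallback(
    prompt: str,
    send: Callable[[str], str],
    stream: Optional[Callable[[str], Iterator[str]]] = None,
) -> Iterator[str]:
    """Yield chunks from `stream` if available, else the full text as one chunk."""
    if stream is not None:
        yield from stream(prompt)
    else:
        yield send(prompt)


def fake_send(prompt: str) -> str:
    # Stand-in for a blocking, non-streaming provider call.
    return "full response"


def fake_stream(prompt: str) -> Iterator[str]:
    # Stand-in for a provider that streams tokens.
    yield from ["full ", "response"]


print(list(stream_or_fallback("hi", fake_send)))
print(list(stream_or_fallback("hi", fake_send, fake_stream)))
```

Either way, caller code such as `"".join(gate.stream(...))` or a print loop works unchanged; only the perceived latency differs.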

Error Handling

llmgate raises standard exceptions you can catch:

import httpx
from llmgate import LLMGate

try:
    # Construct inside the try block so errors raised while loading the config are caught too
    gate = LLMGate()
    response = gate.chat("Hello")
except FileNotFoundError:
    # llmgate.yaml not found
    print("Create a llmgate.yaml config file first")
except ValueError as e:
    # Bad config: unknown provider, missing profile, missing 'provider' field
    print(f"Config error: {e}")
except httpx.HTTPStatusError as e:
    # API returned an error (401 unauthorized, 429 rate limited, 500 server error, etc.)
    print(f"API error {e.response.status_code}: {e.response.text}")
except httpx.ConnectError:
    # Can't reach the API (network issue, wrong base_url, Ollama not running)
    print("Connection failed — check your network or base_url")
except httpx.TimeoutException:
    # Request took longer than 60 seconds
    print("Request timed out")
except ImportError as e:
    # Missing optional dependency (boto3 for Bedrock, google-auth for Vertex)
    print(f"Missing dependency: {e}")

All API errors come through as httpx.HTTPStatusError with the full response body available at e.response.text — useful for debugging rate limits, auth issues, or quota problems.
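Because status codes are exposed on the exception, transient failures such as 429 or 5xx can be retried with backoff. The sketch below uses only the standard library; `call_with_retry`, `RETRYABLE_STATUS`, and `FakeHTTPError` are hypothetical names, and the choice of retryable codes is an assumption you may want to adjust.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

# Assumed set of transient statuses; tune for your provider.
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


class FakeHTTPError(Exception):
    """Stand-in for httpx.HTTPStatusError so the sketch runs offline."""

    def __init__(self, status: int) -> None:
        super().__init__(f"HTTP {status}")
        self.status = status


def call_with_retry(
    fn: Callable[[], T],
    status_of: Callable[[Exception], Optional[int]],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying with exponential backoff on retryable statuses."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as exc:
            is_last = attempt == attempts - 1
            if status_of(exc) not in RETRYABLE_STATUS or is_last:
                raise  # not transient, or out of attempts
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


calls = {"n": 0}


def flaky() -> str:
    # Fails twice with 429, then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise FakeHTTPError(429)
    return "ok"


delays = []  # record delays instead of actually sleeping
result = call_with_retry(flaky, lambda e: getattr(e, "status", None), sleep=delays.append)
print(result, delays)
```

With llmgate you would pass something like `lambda: gate.chat("Hello")` as `fn`, and a `status_of` that returns `e.response.status_code` when `e` is an `httpx.HTTPStatusError` and `None` otherwise.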

Azure OpenAI Setup

profiles:
  azure:
    provider: azure_openai
    model: gpt-4o
    resource_name: my-azure-resource
    deployment_name: my-gpt4o-deployment
    api_version: "2024-02-01"
    api_key: ${AZURE_OPENAI_API_KEY}
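For orientation, the three Azure-specific fields map onto the endpoint URL of the Azure OpenAI REST API. The sketch below shows that mapping; whether llmgate builds the URL exactly this way is an assumption, and `azure_chat_url` is a hypothetical helper.

```python
from urllib.parse import quote, urlencode


def azure_chat_url(resource_name: str, deployment_name: str, api_version: str) -> str:
    """Build the chat completions URL for an Azure OpenAI deployment."""
    base = f"https://{resource_name}.openai.azure.com"
    path = f"/openai/deployments/{quote(deployment_name, safe='')}/chat/completions"
    query = urlencode({"api-version": api_version})
    return f"{base}{path}?{query}"


print(azure_chat_url("my-azure-resource", "my-gpt4o-deployment", "2024-02-01"))
```

Note that Azure routes by deployment name rather than model name, so `deployment_name` is the field that determines which model answers.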

AWS Bedrock Setup

pip install llmgt[aws]

profiles:
  aws:
    provider: bedrock
    model: anthropic.claude-3-sonnet-20240229-v1:0  # or amazon.titan-*, meta.*
    region: us-east-1

Requires AWS credentials configured via ~/.aws/credentials, env vars, or IAM role. Supports Anthropic Claude, Amazon Titan, and Meta Llama model families on Bedrock — detected automatically by model ID prefix.
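Prefix-based detection matters because each model family on Bedrock expects a different request body. A minimal sketch of such a dispatch follows; the prefix table and the `bedrock_family` name are assumptions for illustration, not llmgate's code.

```python
# Assumed prefix table, based on the model families named above.
FAMILY_PREFIXES = {
    "anthropic.": "anthropic",
    "amazon.titan": "titan",
    "meta.": "llama",
}


def bedrock_family(model_id: str) -> str:
    """Return the model family for a Bedrock model ID, by prefix."""
    for prefix, family in FAMILY_PREFIXES.items():
        if model_id.startswith(prefix):
            return family
    raise ValueError(f"unsupported Bedrock model: {model_id!r}")


print(bedrock_family("anthropic.claude-3-sonnet-20240229-v1:0"))
print(bedrock_family("amazon.titan-text-express-v1"))
```

An unrecognized prefix raises ValueError here, matching the way config problems are reported elsewhere in this README.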

Google Vertex AI Setup

pip install llmgt[gcp]

profiles:
  gcp:
    provider: vertexai
    model: gemini-1.5-pro
    project_id: my-gcp-project
    region: us-central1

Uses Google Application Default Credentials. Run gcloud auth application-default login or set GOOGLE_APPLICATION_CREDENTIALS.
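As a rough guide to what `project_id` and `region` control, here is how they appear in the Vertex AI REST endpoint for Gemini models. This is a sketch under the assumption that llmgate targets the `generateContent` and `streamGenerateContent` methods; `vertex_generate_url` is a hypothetical helper.

```python
def vertex_generate_url(project_id: str, region: str, model: str, stream: bool = False) -> str:
    """Build the Vertex AI endpoint URL for a Google-published model."""
    method = "streamGenerateContent" if stream else "generateContent"
    return (
        f"https://{region}-aiplatform.googleapis.com/v1"
        f"/projects/{project_id}/locations/{region}"
        f"/publishers/google/models/{model}:{method}"
    )


print(vertex_generate_url("my-gcp-project", "us-central1", "gemini-1.5-pro"))
```

The region appears twice, once in the hostname and once in the path, so a typo there usually shows up as a connection or 404 error rather than an auth error.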

LLMResponse

response = gate.chat("Hello")
response.text           # str — the generated text
response.model          # str — model name
response.provider       # str — provider name
response.tokens_used    # int | None — total tokens
response.finish_reason  # str | None — stop reason
response.raw            # dict — full API response
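The field list above suggests a simple record type. The dataclass below is a stand-in that mirrors those fields for illustration (`LLMResponseSketch` and `describe` are hypothetical names; the real class may differ). It also shows one way to handle the two fields that can be None.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class LLMResponseSketch:
    """Stand-in mirroring the documented LLMResponse fields."""

    text: str
    model: str
    provider: str
    tokens_used: Optional[int] = None
    finish_reason: Optional[str] = None
    raw: Dict[str, Any] = field(default_factory=dict)


def describe(response: LLMResponseSketch) -> str:
    """One-line summary that tolerates missing token counts and stop reasons."""
    tokens = "unknown" if response.tokens_used is None else str(response.tokens_used)
    reason = response.finish_reason or "unknown"
    return f"{response.provider}/{response.model}: {tokens} tokens, stopped by {reason}"


full = LLMResponseSketch("Hi!", "llama3.2", "ollama", tokens_used=12, finish_reason="stop")
partial = LLMResponseSketch("Hi!", "llama3.2", "ollama")
print(describe(full))
print(describe(partial))
```

Checking `tokens_used is None` rather than truthiness keeps a legitimate count of 0 distinct from a provider that reports no usage at all.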

Async Support

Not yet — llmgate v0.1 is sync-only (httpx sync client). Async via httpx.AsyncClient is planned for v0.2. If this is blocking you, open an issue.

Contributing

git clone https://github.com/kesiee/llmgate.git
cd llmgate
pip install -e ".[dev]"
pytest

The codebase is intentionally simple. Provider files live in llmgate/providers/. OpenAI-compatible providers inherit from OpenAIProvider and only override BASE_URL + headers. Custom providers implement send() and stream() directly.

To add a new provider:

1. Create llmgate/providers/yourprovider.py, inheriting from BaseProvider (or OpenAIProvider if compatible)
2. Add it to PROVIDER_REGISTRY in llmgate/gate.py
3. Add a test and update this README
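The steps above can be sketched as a base class plus a registry lookup. The names `BaseProvider`, `PROVIDER_REGISTRY`, `send()`, and `stream()` come from this README, but the signatures, the `EchoProvider` class, and the `build_provider` function are assumptions made for illustration; check the repository for the real interfaces.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterator, List, Type

Message = Dict[str, str]


class BaseProvider(ABC):
    """Assumed shape of the provider interface."""

    def __init__(self, config: Dict[str, Any]) -> None:
        self.config = config

    @abstractmethod
    def send(self, messages: List[Message]) -> str:
        """Return the full reply text."""

    @abstractmethod
    def stream(self, messages: List[Message]) -> Iterator[str]:
        """Yield the reply in chunks."""


class EchoProvider(BaseProvider):
    """Toy provider that echoes the last user message; no network needed."""

    def send(self, messages: List[Message]) -> str:
        return messages[-1]["content"]

    def stream(self, messages: List[Message]) -> Iterator[str]:
        yield from self.send(messages).split(" ")


PROVIDER_REGISTRY: Dict[str, Type[BaseProvider]] = {"echo": EchoProvider}


def build_provider(config: Dict[str, Any]) -> BaseProvider:
    """Look up the provider class by name and instantiate it."""
    name = config.get("provider")
    if name is None:
        raise ValueError("missing 'provider' field")
    if name not in PROVIDER_REGISTRY:
        raise ValueError(f"unknown provider: {name!r}")
    return PROVIDER_REGISTRY[name](config)


provider = build_provider({"provider": "echo"})
print(provider.send([{"role": "user", "content": "hello there"}]))
```

A registry keyed by the YAML `provider` string is what makes swapping providers a one-line config change: the calling code never names a provider class directly.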

License

MIT

Release history

0.2.0 (this version): Apr 9, 2026
0.1.0: Apr 9, 2026

File details

Details for the file llmgt-0.2.0.tar.gz.

File metadata

Download URL: llmgt-0.2.0.tar.gz
Upload date: Apr 9, 2026
Size: 31.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for llmgt-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`d53ca52dbd1b6f496dfc492f0084dc6edd814359ccee0f4562da447525d52675`
MD5	`945eadf4d639678fefa9fec69c7ec913`
BLAKE2b-256	`801e449d2c8513055b431cb94e0f0ac9586e4acfe88a3aa340d76ec2b5a5acf8`

File details

Details for the file llmgt-0.2.0-py3-none-any.whl.

File metadata

Download URL: llmgt-0.2.0-py3-none-any.whl
Upload date: Apr 9, 2026
Size: 35.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for llmgt-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ba889a21577e2c37881644d6de768b4cc607b17939d141a2f255c6c45a323e29`
MD5	`dc47e36fa1ffababfbaaf4e9a4693303`
BLAKE2b-256	`e8f16399824f4fa01c2c040095aa02c25725dac833a27aa9985cfdfb0d4fcf58`
