LLM Katan
One tiny model, every LLM API. A lightweight server that exposes real provider API formats (OpenAI, Anthropic, Vertex AI, AWS Bedrock, Azure OpenAI) backed by a single local model or an echo backend. Built for testing AI gateways, API translation layers, and multi-provider routing without burning API keys or cloud credits.
Katan means "small" in Hebrew.
Features
- Multi-Provider — OpenAI, Anthropic, Vertex AI, AWS Bedrock (all 8 model families), Azure OpenAI
- Real Inference — runs actual tiny models (Qwen3-0.6B) via HuggingFace transformers or vLLM
- Echo Mode — instant startup, no model download, no GPU, no torch dependency
- Auth Validation — each provider requires its native auth header
- Streaming — all providers support SSE streaming in their native format
- Live Dashboard — real-time WebSocket-powered view of every request/response at /dashboard
- Prometheus Metrics — request counts, token usage, latency at /metrics
- 192 Tests — extensive coverage for every provider, format, and edge case
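Every provider streams over SSE in its native format; on the client side, an OpenAI-format stream can be decoded with standard-library Python alone. A sketch (the chunk fields follow OpenAI's public chat-completions streaming schema; the helper name is illustrative, not part of llm-katan):

```python
import json

def parse_sse_chunks(raw: str):
    """Collect the JSON payloads from an OpenAI-style SSE response body."""
    events = []
    for line in raw.splitlines():
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # OpenAI's end-of-stream sentinel
            break
        events.append(json.loads(data))
    return events

# Two content deltas followed by the終 sentinel, as the server would emit them:
stream = (
    'data: {"choices":[{"delta":{"content":"Hel"}}]}\n\n'
    'data: {"choices":[{"delta":{"content":"lo"}}]}\n\n'
    "data: [DONE]\n\n"
)
chunks = parse_sse_chunks(stream)
text = "".join(c["choices"][0]["delta"]["content"] for c in chunks)
```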
Quick Start
pip install llm-katan
# Echo mode (instant, no dependencies)
llm-katan --model my-test-model --backend echo --providers openai,anthropic,vertexai,bedrock,azure_openai
# Real model (needs torch + transformers)
llm-katan --model Qwen/Qwen3-0.6B --providers openai,anthropic,vertexai,bedrock,azure_openai
Then open http://localhost:8000/dashboard to watch requests flow through in real time.
How It Works
The server does not proxy to real providers. Each provider is a formatting layer around the same backend:
Request (any provider format)
|
Provider (openai / anthropic / vertexai / bedrock / azure_openai)
- Parses provider-specific request
- Extracts: messages, max_tokens, temperature
|
Backend (echo or real model)
- Generates text (or echoes request metadata)
|
Provider (same one)
- Formats response in provider's native format
- Returns to client
No translation chain, no SDK calls, no cloud API costs.
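The flow above can be sketched in a few lines: one backend, with each provider reduced to a parse-and-format wrapper around it. The names here are illustrative, not llm-katan's actual classes; the response fields follow the public OpenAI and Anthropic schemas:

```python
import time

def echo_backend(messages, max_tokens):
    # Echo backend: return the last user message instead of running a model.
    # max_tokens is accepted but unused in echo mode.
    return messages[-1]["content"]

def serve_openai(body):
    # OpenAI provider: same backend, OpenAI chat.completion response shape.
    text = echo_backend(body["messages"], body.get("max_tokens", 512))
    return {
        "object": "chat.completion",
        "created": int(time.time()),
        "model": body["model"],
        "choices": [{"index": 0,
                     "message": {"role": "assistant", "content": text},
                     "finish_reason": "stop"}],
    }

def serve_anthropic(body):
    # Anthropic provider: same backend, Messages API response shape.
    text = echo_backend(body["messages"], body.get("max_tokens", 512))
    return {
        "type": "message",
        "role": "assistant",
        "model": body["model"],
        "content": [{"type": "text", "text": text}],
        "stop_reason": "end_turn",
    }

req = {"model": "m", "messages": [{"role": "user", "content": "Hello"}]}
openai_resp = serve_openai(req)
anthropic_resp = serve_anthropic(dict(req, max_tokens=100))
```

The same request reaches the same backend; only the envelope differs.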
Supported Providers
OpenAI (--providers openai)
POST /v1/chat/completions — Auth: Authorization: Bearer <key>
GET /v1/models
Anthropic (--providers anthropic)
POST /v1/messages — Auth: x-api-key: <key>
Vertex AI / Gemini (--providers vertexai)
POST /v1beta/models/{model}:generateContent — Auth: Authorization: Bearer <token>
POST /v1beta/models/{model}:streamGenerateContent
AWS Bedrock (--providers bedrock)
POST /model/{modelId}/converse — Auth: Authorization: AWS4-HMAC-SHA256 <sig>
POST /model/{modelId}/converse-stream
POST /model/{modelId}/invoke — auto-detects model family:
| Family | Model ID Prefix | Request Format |
|---|---|---|
| Anthropic Claude | anthropic.* | messages[], max_tokens, system |
| Amazon Nova | amazon.nova* | messages[].content[].text, inferenceConfig |
| Amazon Titan | amazon.titan* | inputText, textGenerationConfig |
| Meta Llama | meta.llama* | prompt, max_gen_len |
| Cohere Command | cohere.* | message, chat_history[] |
| Mistral | mistral.* | prompt, max_tokens |
| DeepSeek | deepseek.* | prompt, max_tokens |
| AI21 Jamba | ai21.* | messages[] (OpenAI-like) |
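The auto-detection in the table reduces to a prefix match on the model ID. A sketch (function and family names here are illustrative; llm-katan's internals may differ):

```python
# Map a Bedrock model ID prefix to its request-format family.
FAMILY_PREFIXES = [
    ("anthropic.", "claude"),
    ("amazon.nova", "nova"),
    ("amazon.titan", "titan"),
    ("meta.llama", "llama"),
    ("cohere.", "cohere"),
    ("mistral.", "mistral"),
    ("deepseek.", "deepseek"),
    ("ai21.", "jamba"),
]

def detect_family(model_id: str) -> str:
    """Return the request-format family for a Bedrock invoke model ID."""
    for prefix, family in FAMILY_PREFIXES:
        if model_id.startswith(prefix):
            return family
    raise ValueError(f"unknown Bedrock model id: {model_id}")
```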
Azure OpenAI (--providers azure_openai)
POST /openai/deployments/{id}/chat/completions — Auth: api-key: <key>
Shared endpoints (no auth)
GET / — server info
GET /health — health check
GET /metrics — Prometheus metrics
GET /dashboard — live request/response dashboard
GET /docs — Swagger UI
Example Requests
# OpenAI
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Authorization: Bearer test-key" \
-H "Content-Type: application/json" \
-d '{"model":"gpt-4o","messages":[{"role":"user","content":"Hello"}]}'
# Anthropic
curl -X POST http://localhost:8000/v1/messages \
-H "x-api-key: test-key" \
-H "anthropic-version: 2023-06-01" \
-H "Content-Type: application/json" \
-d '{"model":"claude-sonnet","max_tokens":100,"messages":[{"role":"user","content":"Hello"}]}'
# Vertex AI
curl -X POST http://localhost:8000/v1beta/models/gemini-pro:generateContent \
-H "Authorization: Bearer test-token" \
-H "Content-Type: application/json" \
-d '{"contents":[{"role":"user","parts":[{"text":"Hello"}]}]}'
# Bedrock Converse
curl -X POST http://localhost:8000/model/anthropic.claude-v2/converse \
-H "Authorization: AWS4-HMAC-SHA256 Credential=test" \
-H "Content-Type: application/json" \
-d '{"messages":[{"role":"user","content":[{"text":"Hello"}]}]}'
# Azure OpenAI
curl -X POST "http://localhost:8000/openai/deployments/gpt-4/chat/completions?api-version=2024-10-21" \
-H "api-key: test-key" \
-H "Content-Type: application/json" \
-d '{"messages":[{"role":"user","content":"Hello"}]}'
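The five curl bodies above differ only in shape. A small helper (purely illustrative, not part of llm-katan; field names mirror each provider's public API) makes the mapping from one plain user message explicit:

```python
def make_request(provider: str, text: str) -> dict:
    """Build the provider-native request body for a single user message."""
    if provider == "openai":      # Azure OpenAI uses the same body, minus "model"
        return {"model": "gpt-4o",
                "messages": [{"role": "user", "content": text}]}
    if provider == "anthropic":   # max_tokens is required by the Messages API
        return {"model": "claude-sonnet", "max_tokens": 100,
                "messages": [{"role": "user", "content": text}]}
    if provider == "vertexai":    # Gemini generateContent format
        return {"contents": [{"role": "user", "parts": [{"text": text}]}]}
    if provider == "bedrock":     # Converse API format
        return {"messages": [{"role": "user", "content": [{"text": text}]}]}
    raise ValueError(f"unknown provider: {provider}")
```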
CLI Options
llm-katan [OPTIONS]
Required:
-m, --model TEXT Model name (or any string in echo mode)
Optional:
-b, --backend [transformers|vllm|echo] Backend (default: transformers)
--providers TEXT Comma-separated providers (default: openai)
-p, --port INTEGER Port (default: 8000)
-n, --served-model-name TEXT Model name in API responses
--max-tokens INTEGER Max tokens (default: 512)
-t, --temperature FLOAT Temperature (default: 0.7)
-d, --device [auto|cpu|cuda] Device (default: auto)
--quantize/--no-quantize CPU int8 quantization (default: enabled)
--max-concurrent INTEGER Concurrent requests (default: 1)
--log-level [debug|info|warning|error] Log level (default: info)
Development
git clone https://github.com/yossiovadia/llm-katan.git
cd llm-katan
pip install -e ".[dev]"
pytest tests/ -v
License
Apache-2.0
Created by Yossi Ovadia