One tiny model, every LLM API. Drop-in test server for OpenAI, Anthropic, Bedrock, and Vertex.

Project description

LLM Katan

One tiny model, every LLM API. A lightweight server that exposes real provider API formats (OpenAI, Anthropic, Vertex AI, AWS Bedrock, Azure OpenAI) backed by a single local model or an echo backend. Built for testing AI gateways, API translation layers, and multi-provider routing without consuming real API quota or cloud credits.

Katan means "small" in Hebrew.

Features

  • Multi-Provider — OpenAI, Anthropic, Vertex AI, AWS Bedrock (all 8 model families), Azure OpenAI
  • Real Inference — runs actual tiny models (Qwen3-0.6B) via HuggingFace transformers or vLLM
  • Echo Mode — instant startup, no model download, no GPU, no torch dependency
  • Auth Validation — each provider requires its native auth header
  • Streaming — all providers support SSE streaming in their native format
  • Live Dashboard — real-time WebSocket-powered view of every request/response at /dashboard
  • Prometheus Metrics — request counts, token usage, latency at /metrics
  • 192 Tests — extensive coverage for every provider, format, and edge case
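Streaming responses use Server-Sent Events. As a hedged illustration for the OpenAI provider only, the sketch below assumes llm-katan emits OpenAI's public chunk shape (`data: {...}` lines carrying `choices[].delta.content`, ending with `data: [DONE]`). `iter_openai_sse_text` is a hypothetical helper; the Anthropic, Vertex AI, and Bedrock streams use different event formats that it does not handle.

```python
import json
from typing import Iterable, Iterator


def iter_openai_sse_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text deltas from OpenAI-style chat-completion SSE lines.

    Assumes each event is a single `data: {...}` line and that the
    stream terminates with `data: [DONE]`.
    """
    for raw in lines:
        line = raw.strip()
        # Skip blank event separators, `:` comments, and other SSE fields.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end of stream; ignore anything after it
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            # The first chunk usually carries only the role, no content.
            text = choice.get("delta", {}).get("content")
            if text:
                yield text
```

Joining the yielded pieces reconstructs the full reply, which makes it easy to compare a streamed response against the non-streamed one in a test.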

Quick Start

pip install llm-katan

# Echo mode (instant, no dependencies)
llm-katan --model my-test-model --backend echo --providers openai,anthropic,vertexai,bedrock,azure_openai

# Real model (needs torch + transformers)
llm-katan --model Qwen/Qwen3-0.6B --providers openai,anthropic,vertexai,bedrock,azure_openai

Then open http://localhost:8000/dashboard to watch requests flow through in real time.

How It Works

The server does not proxy to real providers. Each provider is a formatting layer around the same backend:

Request (any provider format)
       |
Provider (openai / anthropic / vertexai / bedrock / azure_openai)
  - Parses provider-specific request
  - Extracts: messages, max_tokens, temperature
       |
Backend (echo or real model)
  - Generates text (or echoes request metadata)
       |
Provider (same one)
  - Formats response in provider's native format
  - Returns to client

No translation chain, no SDK calls, no cloud API costs.
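The flow above can be sketched in a few lines of Python. This is not llm-katan's source: `GenRequest`, `EchoBackend`, `OpenAIProvider`, `AnthropicProvider`, and `handle` are hypothetical names, and the echo text format is invented. The point it illustrates is that one provider object both parses the request and renders the reply, with a shared backend in between.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Protocol


@dataclass
class GenRequest:
    """Provider-neutral fields every provider extracts (per the diagram)."""
    messages: List[Dict[str, str]]
    max_tokens: int = 512
    temperature: float = 0.7


class Backend(Protocol):
    def generate(self, req: GenRequest) -> str: ...


class EchoBackend:
    """Hypothetical echo backend: reflects the request instead of generating."""

    def generate(self, req: GenRequest) -> str:
        last = req.messages[-1]["content"] if req.messages else ""
        return f"echo: {last} (max_tokens={req.max_tokens})"


class OpenAIProvider:
    def parse(self, body: Dict[str, Any]) -> GenRequest:
        return GenRequest(
            messages=body["messages"],
            max_tokens=body.get("max_tokens", 512),
            temperature=body.get("temperature", 0.7),
        )

    def render(self, text: str, model: str) -> Dict[str, Any]:
        return {
            "object": "chat.completion",
            "model": model,
            "choices": [{
                "index": 0,
                "message": {"role": "assistant", "content": text},
                "finish_reason": "stop",
            }],
        }


class AnthropicProvider:
    def parse(self, body: Dict[str, Any]) -> GenRequest:
        messages = []
        for m in body["messages"]:
            content = m["content"]
            if not isinstance(content, str):  # list of content blocks
                content = "".join(b.get("text", "") for b in content)
            messages.append({"role": m["role"], "content": content})
        return GenRequest(messages, body.get("max_tokens", 512),
                          body.get("temperature", 0.7))

    def render(self, text: str, model: str) -> Dict[str, Any]:
        return {
            "type": "message",
            "role": "assistant",
            "model": model,
            "content": [{"type": "text", "text": text}],
            "stop_reason": "end_turn",
        }


def handle(provider: Any, backend: Backend, body: Dict[str, Any], model: str) -> Dict[str, Any]:
    """Parse -> generate -> render, using the same provider on both ends."""
    req = provider.parse(body)
    return provider.render(backend.generate(req), model)
```

Swapping `EchoBackend` for a real model backend changes only the generated text; the request and response shapes stay fixed by the provider.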

Supported Providers

OpenAI (--providers openai)

  • POST /v1/chat/completions — Auth: Authorization: Bearer <key>
  • GET /v1/models

Anthropic (--providers anthropic)

  • POST /v1/messages — Auth: x-api-key: <key>

Vertex AI / Gemini (--providers vertexai)

  • POST /v1beta/models/{model}:generateContent — Auth: Authorization: Bearer <token>
  • POST /v1beta/models/{model}:streamGenerateContent

AWS Bedrock (--providers bedrock)

  • POST /model/{modelId}/converse — Auth: Authorization: AWS4-HMAC-SHA256 <sig>
  • POST /model/{modelId}/converse-stream
  • POST /model/{modelId}/invoke — auto-detects model family:
    Family            Model ID prefix   Request format
    ----------------  ---------------   --------------
    Anthropic Claude  anthropic.*       messages[], max_tokens, system
    Amazon Nova       amazon.nova*      messages[].content[].text, inferenceConfig
    Amazon Titan      amazon.titan*     inputText, textGenerationConfig
    Meta Llama        meta.llama*       prompt, max_gen_len
    Cohere Command    cohere.*          message, chat_history[]
    Mistral           mistral.*         prompt, max_tokens
    DeepSeek          deepseek.*        prompt, max_tokens
    AI21 Jamba        ai21.*            messages[] (OpenAI-like)
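Auto-detection by model ID prefix can be sketched as a simple table lookup. The prefixes come from the family table above; `detect_family` is a hypothetical helper, not llm-katan's actual code, and it returns `None` for IDs outside those eight prefixes.

```python
from typing import Optional

# (prefix, family) pairs copied from the invoke family table.
# None of these prefixes overlap, so lookup order does not matter.
FAMILY_PREFIXES = [
    ("anthropic.", "Anthropic Claude"),
    ("amazon.nova", "Amazon Nova"),
    ("amazon.titan", "Amazon Titan"),
    ("meta.llama", "Meta Llama"),
    ("cohere.", "Cohere Command"),
    ("mistral.", "Mistral"),
    ("deepseek.", "DeepSeek"),
    ("ai21.", "AI21 Jamba"),
]


def detect_family(model_id: str) -> Optional[str]:
    """Return the model family for a Bedrock model ID, or None if unknown."""
    for prefix, family in FAMILY_PREFIXES:
        if model_id.startswith(prefix):
            return family
    return None
```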

Azure OpenAI (--providers azure_openai)

  • POST /openai/deployments/{id}/chat/completions — Auth: api-key: <key>

Shared endpoints (no auth)

  • GET / — server info
  • GET /health — health check
  • GET /metrics — Prometheus metrics
  • GET /dashboard — live request/response dashboard
  • GET /docs — Swagger UI

Example Requests

# OpenAI
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Authorization: Bearer test-key" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"Hello"}]}'

# Anthropic
curl -X POST http://localhost:8000/v1/messages \
  -H "x-api-key: test-key" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{"model":"claude-sonnet","max_tokens":100,"messages":[{"role":"user","content":"Hello"}]}'

# Vertex AI
curl -X POST http://localhost:8000/v1beta/models/gemini-pro:generateContent \
  -H "Authorization: Bearer test-token" \
  -H "Content-Type: application/json" \
  -d '{"contents":[{"role":"user","parts":[{"text":"Hello"}]}]}'

# Bedrock Converse
curl -X POST http://localhost:8000/model/anthropic.claude-v2/converse \
  -H "Authorization: AWS4-HMAC-SHA256 Credential=test" \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":[{"text":"Hello"}]}]}'

# Azure OpenAI
curl -X POST "http://localhost:8000/openai/deployments/gpt-4/chat/completions?api-version=2024-10-21" \
  -H "api-key: test-key" \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Hello"}]}'
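For test code that calls the server from Python without a provider SDK, a standard-library sketch might look like this. `build_chat_request`, `send`, and `extract_text` are hypothetical helpers. `extract_text` assumes the server mirrors each provider's public non-streaming response shape (OpenAI/Azure `choices`, Anthropic `content` blocks, Gemini `candidates`, Bedrock Converse `output.message`).

```python
import json
import urllib.request
from typing import Any, Dict


def build_chat_request(base_url: str = "http://localhost:8000",
                       api_key: str = "test-key",
                       prompt: str = "Hello",
                       model: str = "gpt-4o") -> urllib.request.Request:
    """Build (but do not send) the OpenAI request from the curl example."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 30.0) -> Dict[str, Any]:
    """Send the request; needs a running llm-katan server."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def extract_text(body: Dict[str, Any]) -> str:
    """Pull the assistant text out of a non-streaming response body."""
    if "choices" in body:  # OpenAI and Azure OpenAI
        return body["choices"][0]["message"]["content"]
    if body.get("type") == "message":  # Anthropic Messages
        return "".join(b.get("text", "") for b in body["content"]
                       if b.get("type") == "text")
    if "candidates" in body:  # Vertex AI / Gemini generateContent
        parts = body["candidates"][0]["content"]["parts"]
        return "".join(p.get("text", "") for p in parts)
    if "output" in body:  # Bedrock Converse
        blocks = body["output"]["message"]["content"]
        return "".join(b.get("text", "") for b in blocks)
    raise ValueError("unrecognized response shape")
```

Against a running server, `extract_text(send(build_chat_request()))` returns the reply text. A single extractor like this is handy when the same assertion has to run against every provider a gateway routes to.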

CLI Options

llm-katan [OPTIONS]

Required:
  -m, --model TEXT              Model name (or any string in echo mode)

Optional:
  -b, --backend [transformers|vllm|echo]  Backend (default: transformers)
  --providers TEXT              Comma-separated providers (default: openai)
  -p, --port INTEGER            Port (default: 8000)
  -n, --served-model-name TEXT  Model name in API responses
  --max-tokens INTEGER          Max tokens (default: 512)
  -t, --temperature FLOAT       Temperature (default: 0.7)
  -d, --device [auto|cpu|cuda]  Device (default: auto)
  --quantize/--no-quantize      CPU int8 quantization (default: enabled)
  --max-concurrent INTEGER      Concurrent requests (default: 1)
  --log-level [debug|info|warning|error]  Log level (default: INFO)
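When a test harness starts the server itself, the documented options can be assembled into an argument list, and the harness can then poll `/health` until it answers. `katan_argv` and `wait_until_healthy` are hypothetical helpers. Treating an HTTP 200 from `/health` as "ready" is an assumption about that endpoint.

```python
import time
import urllib.request
from typing import Callable, List, Sequence


def katan_argv(model: str, backend: str = "echo",
               providers: Sequence[str] = ("openai",),
               port: int = 8000) -> List[str]:
    """Build an llm-katan command line from the documented options."""
    return ["llm-katan", "--model", model, "--backend", backend,
            "--providers", ",".join(providers), "--port", str(port)]


def http_ok(url: str) -> bool:
    """True if GET url returns 200; connection errors and timeouts count as not ready."""
    try:
        with urllib.request.urlopen(url, timeout=1.0) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError, and socket timeouts are all OSError
        return False


def wait_until_healthy(port: int, timeout: float = 30.0,
                       probe: Callable[[str], bool] = http_ok,
                       interval: float = 0.2) -> bool:
    """Poll /health until the probe succeeds or the timeout expires."""
    url = f"http://localhost:{port}/health"
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if probe(url):
            return True
        time.sleep(interval)
    return False
```

In a pytest fixture, one might run `subprocess.Popen(katan_argv("my-test-model", providers=["openai", "anthropic"]))`, then `wait_until_healthy(8000)`, and terminate the process after the tests. Echo mode keeps this fast because no model has to load.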

Development

git clone https://github.com/yossiovadia/llm-katan.git
cd llm-katan
pip install -e ".[dev]"
pytest tests/ -v

License

Apache-2.0


Created by Yossi Ovadia

Download files


Source Distribution

llm_katan-0.12.1.tar.gz (54.2 kB)

Built Distribution


llm_katan-0.12.1-py3-none-any.whl (71.1 kB)

File details

Details for the file llm_katan-0.12.1.tar.gz.

File metadata

  • Download URL: llm_katan-0.12.1.tar.gz
  • Size: 54.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.13

File hashes

Hashes for llm_katan-0.12.1.tar.gz
Algorithm    Hash digest
SHA256       13f2e651fdeea2c4a9b8d49eb41d4c21b42de4be0673b9d04e2350e6ff078c1d
MD5          d2d27307b9c1c41036f560a53c9bc9ab
BLAKE2b-256  752940f6833a68a93904e0af9f4cfe379782968600004a1dcf1d88f6670aec8c


File details

Details for the file llm_katan-0.12.1-py3-none-any.whl.

File metadata

  • Download URL: llm_katan-0.12.1-py3-none-any.whl
  • Size: 71.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.13

File hashes

Hashes for llm_katan-0.12.1-py3-none-any.whl
Algorithm    Hash digest
SHA256       becde406cc2d222cca0a8cbb9c8f3f2cd8eea081d26c3e17b9bc3538f60a5ecd
MD5          1fbbf8fceb8ce656a1263fdc81fca1c2
BLAKE2b-256  a2bb143c3fe093a1755da5ebee31aa3c2bb77ac69c68a2d3c6c114158d3d5505

