One tiny model, every LLM API. Drop-in test server for OpenAI, Anthropic, Bedrock, and Vertex.

LLM Katan

One tiny model, every LLM API. A lightweight server that exposes real provider API formats (OpenAI, Anthropic, Vertex AI, AWS Bedrock, Azure OpenAI) backed by a single local model or an echo backend. Built for testing AI gateways, API translation layers, and multi-provider routing without burning API keys or cloud credits.

Katan means "small" in Hebrew.

Features

Multi-Provider — OpenAI, Anthropic, Vertex AI, AWS Bedrock (8 model families via InvokeModel), Azure OpenAI
Real Inference — runs actual tiny models (Qwen3-0.6B) via Hugging Face transformers or vLLM
Echo Mode — instant startup, no model download, no GPU, no torch dependency
Auth Validation — each provider requires its native auth header
Streaming — all providers support SSE streaming in their native format
Live Dashboard — real-time WebSocket-powered view of every request/response at /dashboard
Prometheus Metrics — request counts, token usage, latency at /metrics
192 Tests — covering every provider and request/response format, including edge cases
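Because each provider streams Server-Sent Events in its own native shape, a test client usually needs a small parser. The following is a minimal sketch, assuming the server follows the public OpenAI wire format (`data:` lines ending with `data: [DONE]`) and the public Anthropic one (`event:` plus `data:` pairs, with text in `content_block_delta` events). `parse_sse` and `collect_text` are hypothetical helpers, not part of llm-katan.

```python
import json
from typing import Iterable, Iterator, List, Optional, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[Optional[str], str]]:
    """Yield (event, data) pairs from raw SSE lines; a blank line ends each event."""
    event: Optional[str] = None
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            if data:
                yield event, "\n".join(data)
            event, data = None, []
        elif line.startswith(":"):
            continue  # comment / keep-alive line
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            data.append(value[1:] if value.startswith(" ") else value)
    if data:  # flush a last event that had no trailing blank line
        yield event, "\n".join(data)


def collect_text(lines: Iterable[str], provider: str) -> str:
    """Concatenate streamed text deltas for the 'openai' or 'anthropic' format."""
    parts: List[str] = []
    for event, data in parse_sse(lines):
        if provider == "openai":
            if data == "[DONE]":
                break
            for choice in json.loads(data).get("choices", []):
                # The first chunk often carries only the role, so content may be absent.
                parts.append(choice.get("delta", {}).get("content") or "")
        elif provider == "anthropic":
            if event == "content_block_delta":
                delta = json.loads(data)["delta"]
                if delta.get("type") == "text_delta":
                    parts.append(delta["text"])
        else:
            raise ValueError(f"unsupported provider: {provider}")
    return "".join(parts)
```

Feeding it the response body line by line (for example from `resp.read().decode().splitlines()`) lets a test compare the streamed text against the non-streaming response.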

Quick Start

pip install llm-katan

# Echo mode (instant startup, no model download, no torch)
llm-katan --model my-test-model --backend echo --providers openai,anthropic,vertexai,bedrock,azure_openai

# Real model (needs torch + transformers)
llm-katan --model Qwen/Qwen3-0.6B --providers openai,anthropic,vertexai,bedrock,azure_openai

Then open http://localhost:8000/dashboard to watch requests flow through in real time.

How It Works

The server does not proxy to real providers. Each provider is a formatting layer around the same backend:

Request (any provider format)
       |
Provider (openai / anthropic / vertexai / bedrock / azure_openai)
  - Parses provider-specific request
  - Extracts: messages, max_tokens, temperature
       |
Backend (echo or real model)
  - Generates text (or echoes request metadata)
       |
Provider (same one)
  - Formats response in provider's native format
  - Returns to client

No translation chain, no SDK calls, no cloud API costs.
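The parse, generate, format loop above fits in a few lines of Python. This is an illustrative sketch, not llm-katan's source: `GenerationRequest`, `EchoBackend`, `parse_openai`, `format_openai`, `handle_openai`, and the echo string are all hypothetical, and the 512 / 0.7 defaults are borrowed from the defaults listed under CLI Options.

```python
import time
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class GenerationRequest:
    """Provider-neutral fields extracted from any incoming request."""
    messages: List[Dict[str, str]]
    max_tokens: int = 512
    temperature: float = 0.7


class EchoBackend:
    """Hypothetical echo backend: reports request metadata instead of generating text."""

    def generate(self, req: GenerationRequest) -> str:
        last = req.messages[-1]["content"] if req.messages else ""
        return f"[echo] messages={len(req.messages)} max_tokens={req.max_tokens} last={last!r}"


def parse_openai(body: dict) -> GenerationRequest:
    """Provider layer, inbound: OpenAI chat body -> neutral request."""
    return GenerationRequest(
        messages=body["messages"],
        max_tokens=body.get("max_tokens", 512),
        temperature=body.get("temperature", 0.7),
    )


def format_openai(text: str, model: str) -> dict:
    """Provider layer, outbound: neutral text -> OpenAI chat.completion shape."""
    return {
        "id": "chatcmpl-test",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": model,
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": text},
            "finish_reason": "stop",
        }],
    }


def handle_openai(body: dict, backend: EchoBackend) -> dict:
    # The same provider parses the request and formats the response.
    req = parse_openai(body)
    return format_openai(backend.generate(req), body.get("model", "unknown"))
```

Supporting another API in this design means adding one more parse/format pair around the same `generate` call, which is how a single backend can answer in every provider's format.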

Supported Providers

OpenAI (--providers openai)

POST /v1/chat/completions — Auth: Authorization: Bearer <key>
GET /v1/models

Anthropic (--providers anthropic)

POST /v1/messages — Auth: x-api-key: <key>
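A gateway that routes OpenAI-style chats to this endpoint has to reshape the body: the Anthropic Messages API takes the system prompt as a top-level `system` field and requires `max_tokens`. A minimal sketch of that conversion, following the public Messages API shape; `openai_to_anthropic` is a hypothetical helper, not part of llm-katan.

```python
from typing import Any, Dict, List


def openai_to_anthropic(model: str, messages: List[Dict[str, str]],
                        max_tokens: int = 512) -> Dict[str, Any]:
    """Convert OpenAI chat messages to an Anthropic /v1/messages request body."""
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    body: Dict[str, Any] = {
        "model": model,
        "max_tokens": max_tokens,  # required by the Messages API
        "messages": [
            {"role": m["role"], "content": m["content"]}
            for m in messages
            if m["role"] in ("user", "assistant")
        ],
    }
    if system_parts:
        # System turns move out of the message list into a single top-level field.
        body["system"] = "\n\n".join(system_parts)
    return body
```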

Vertex AI / Gemini (--providers vertexai)

POST /v1beta/models/{model}:generateContent — Auth: Authorization: Bearer <token>
POST /v1beta/models/{model}:streamGenerateContent
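The Gemini-style body uses `contents` with `parts` instead of `messages`, calls the assistant role `model`, and carries sampling settings in `generationConfig`. A minimal sketch of building it from OpenAI-style messages, following the public Gemini `generateContent` request shape; `openai_to_gemini` is hypothetical, and whether llm-katan reads `systemInstruction` and `generationConfig` is an assumption.

```python
from typing import Any, Dict, List


def openai_to_gemini(messages: List[Dict[str, str]], max_tokens: int = 512,
                     temperature: float = 0.7) -> Dict[str, Any]:
    """Build a generateContent body from OpenAI-style chat messages."""
    role_map = {"user": "user", "assistant": "model"}
    body: Dict[str, Any] = {
        "contents": [
            {"role": role_map[m["role"]], "parts": [{"text": m["content"]}]}
            for m in messages
            if m["role"] in role_map
        ],
        "generationConfig": {"maxOutputTokens": max_tokens, "temperature": temperature},
    }
    system = [m["content"] for m in messages if m["role"] == "system"]
    if system:
        # System prompts are not turns in this format; they go in systemInstruction.
        body["systemInstruction"] = {"parts": [{"text": t} for t in system]}
    return body
```

The result can be POSTed to `/v1beta/models/{model}:generateContent` with an `Authorization: Bearer` header.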

AWS Bedrock (--providers bedrock)

POST /model/{modelId}/converse — Auth: Authorization: AWS4-HMAC-SHA256 <sig>
POST /model/{modelId}/converse-stream
POST /model/{modelId}/invoke — auto-detects the model family from the model ID prefix:

Family	Model ID Prefix	Request Format
Anthropic Claude	`anthropic.*`	`messages[]`, `max_tokens`, `system`
Amazon Nova	`amazon.nova*`	`messages[].content[].text`, `inferenceConfig`
Amazon Titan	`amazon.titan*`	`inputText`, `textGenerationConfig`
Meta Llama	`meta.llama*`	`prompt`, `max_gen_len`
Cohere Command	`cohere.*`	`message`, `chat_history[]`
Mistral	`mistral.*`	`prompt`, `max_tokens`
DeepSeek	`deepseek.*`	`prompt`, `max_tokens`
AI21 Jamba	`ai21.*`	`messages[]` (OpenAI-like)
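Prefix-based detection like the table above can be sketched as an ordered prefix lookup plus a per-family extractor for the prompt text. This mirrors the table, not llm-katan's source; `detect_family`, `extract_prompt`, and the short family names are hypothetical. Cross-region inference-profile IDs such as `us.anthropic...` would need their geographic prefix stripped before this lookup.

```python
from typing import Any, Dict

# (model ID prefix, family) pairs, checked in order.
FAMILY_PREFIXES = [
    ("anthropic.", "anthropic"),
    ("amazon.nova", "nova"),
    ("amazon.titan", "titan"),
    ("meta.llama", "llama"),
    ("cohere.", "cohere"),
    ("mistral.", "mistral"),
    ("deepseek.", "deepseek"),
    ("ai21.", "ai21"),
]


def detect_family(model_id: str) -> str:
    for prefix, family in FAMILY_PREFIXES:
        if model_id.startswith(prefix):
            return family
    raise ValueError(f"unknown model family for {model_id!r}")


def _text(content: Any) -> str:
    """Message content may be a plain string or a list of {'text': ...} blocks."""
    if isinstance(content, str):
        return content
    return "".join(block.get("text", "") for block in content)


def extract_prompt(model_id: str, body: Dict[str, Any]) -> str:
    """Return the last user turn (chat families) or the raw prompt (completion families)."""
    family = detect_family(model_id)
    if family in ("anthropic", "nova", "ai21"):
        users = [m for m in body["messages"] if m["role"] == "user"]
        return _text(users[-1]["content"]) if users else ""
    if family == "titan":
        return body["inputText"]
    if family == "cohere":
        return body["message"]
    # llama, mistral, deepseek: a single completion-style prompt string
    return body["prompt"]
```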

Azure OpenAI (--providers azure_openai)

POST /openai/deployments/{id}/chat/completions — Auth: api-key: <key>

Shared endpoints (no auth)

GET / — server info
GET /health — health check
GET /metrics — Prometheus metrics
GET /dashboard — live request/response dashboard
GET /docs — Swagger UI
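Since `/metrics` serves Prometheus metrics, a test can scrape it and assert on counters without running Prometheus itself. A minimal parser sketch for the text exposition format, limited to simple samples (no escaped quotes, braces, or commas inside label values). `parse_metrics` is hypothetical, and the metric name `llm_katan_requests_total` used in the test data is invented for illustration, so check the real names at `/metrics`.

```python
import re
from typing import Dict, Tuple

# name{labels} value  (an optional trailing timestamp is ignored)
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')

Key = Tuple[str, Tuple[Tuple[str, str], ...]]


def parse_metrics(text: str) -> Dict[Key, float]:
    """Map (metric name, sorted label pairs) -> sample value."""
    samples: Dict[Key, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        m = _SAMPLE.match(line)
        if not m:
            continue
        name, labels, value = m.groups()
        pairs = tuple(sorted(_LABEL.findall(labels or "")))
        samples[(name, pairs)] = float(value)
    return samples
```

In a live test the input would come from `urllib.request.urlopen("http://localhost:8000/metrics").read().decode()`.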

Example Requests

# OpenAI
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Authorization: Bearer test-key" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"Hello"}]}'

# Anthropic
curl -X POST http://localhost:8000/v1/messages \
  -H "x-api-key: test-key" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{"model":"claude-sonnet","max_tokens":100,"messages":[{"role":"user","content":"Hello"}]}'

# Vertex AI
curl -X POST http://localhost:8000/v1beta/models/gemini-pro:generateContent \
  -H "Authorization: Bearer test-token" \
  -H "Content-Type: application/json" \
  -d '{"contents":[{"role":"user","parts":[{"text":"Hello"}]}]}'

# Bedrock Converse
curl -X POST http://localhost:8000/model/anthropic.claude-v2/converse \
  -H "Authorization: AWS4-HMAC-SHA256 Credential=test" \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":[{"text":"Hello"}]}]}'

# Azure OpenAI
curl -X POST "http://localhost:8000/openai/deployments/gpt-4/chat/completions?api-version=2024-10-21" \
  -H "api-key: test-key" \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Hello"}]}'
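The same five calls can be made from Python's standard library. A minimal sketch: `build_request`, `send`, and the `AUTH` table are hypothetical helpers that encode the auth headers listed under Supported Providers; the paths, placeholder keys, and model names are copied from the curl examples, and Bedrock gets the same placeholder `AWS4-HMAC-SHA256` header rather than a real SigV4 signature. Building a `Request` does not send it, so `build_request` works without a running server.

```python
import json
import urllib.request
from typing import Any, Dict

BASE = "http://localhost:8000"

# provider -> (path template, auth header name, auth header value template)
AUTH = {
    "openai": ("/v1/chat/completions", "Authorization", "Bearer {key}"),
    "anthropic": ("/v1/messages", "x-api-key", "{key}"),
    "vertexai": ("/v1beta/models/{model}:generateContent", "Authorization", "Bearer {key}"),
    "bedrock": ("/model/{model}/converse", "Authorization", "AWS4-HMAC-SHA256 Credential={key}"),
    "azure_openai": ("/openai/deployments/{model}/chat/completions?api-version=2024-10-21",
                     "api-key", "{key}"),
}


def build_request(provider: str, model: str, body: Dict[str, Any],
                  key: str = "test-key") -> urllib.request.Request:
    """Build (but do not send) a POST in the given provider's format."""
    path, header, value = AUTH[provider]
    headers = {"Content-Type": "application/json", header: value.format(key=key)}
    if provider == "anthropic":
        headers["anthropic-version"] = "2023-06-01"
    return urllib.request.Request(
        BASE + path.format(model=model),
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def send(req: urllib.request.Request) -> Dict[str, Any]:
    """POST the request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Note that `urllib` normalizes header names with `str.capitalize()`, so `x-api-key` is stored (and looked up with `get_header`) as `X-api-key`; HTTP header names are case-insensitive, so the server sees the same header.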

CLI Options

llm-katan [OPTIONS]

Required:
  -m, --model TEXT              Model name (or any string in echo mode)

Optional:
  -b, --backend [transformers|vllm|echo]  Backend (default: transformers)
  --providers TEXT              Comma-separated providers (default: openai)
  -p, --port INTEGER            Port (default: 8000)
  -n, --served-model-name TEXT  Model name in API responses
  --max-tokens INTEGER          Max tokens (default: 512)
  -t, --temperature FLOAT       Temperature (default: 0.7)
  -d, --device [auto|cpu|cuda]  Device (default: auto)
  --quantize/--no-quantize      CPU int8 quantization (default: enabled)
  --max-concurrent INTEGER      Concurrent requests (default: 1)
  --log-level [debug|info|warning|error]  Log level (default: INFO)
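For gateway integration tests it can be convenient to start llm-katan from the test process. A minimal sketch, assuming the `llm-katan` console script is on `PATH`; it uses only options listed above plus the `/health` endpoint. `katan_command` and `start_katan` are hypothetical helpers, and treating an HTTP 200 from `/health` as "ready" is an assumption.

```python
import subprocess
import time
import urllib.request
from typing import List, Sequence


def katan_command(model: str = "my-test-model", backend: str = "echo",
                  providers: Sequence[str] = ("openai",), port: int = 8000) -> List[str]:
    """Build an argv list using only documented CLI options."""
    return [
        "llm-katan",
        "--model", model,
        "--backend", backend,
        "--providers", ",".join(providers),
        "--port", str(port),
    ]


def start_katan(argv: List[str], port: int, timeout: float = 30.0) -> subprocess.Popen:
    """Start the server and poll GET /health until it answers 200 or the timeout expires."""
    proc = subprocess.Popen(argv)
    deadline = time.monotonic() + timeout
    url = f"http://localhost:{port}/health"
    while time.monotonic() < deadline:
        if proc.poll() is not None:
            raise RuntimeError(f"llm-katan exited early with code {proc.returncode}")
        try:
            with urllib.request.urlopen(url, timeout=1) as resp:
                if resp.status == 200:
                    return proc
        except OSError:
            pass  # URLError, refused connection, or timeout: not ready yet
        time.sleep(0.2)
    proc.terminate()
    proc.wait(timeout=5)
    raise TimeoutError(f"llm-katan did not become healthy within {timeout}s")
```

A test fixture might call `proc = start_katan(katan_command(providers=["openai", "anthropic"], port=8001), port=8001)` and finish with `proc.terminate(); proc.wait()`. Echo mode keeps startup fast because no model is downloaded.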

Development

git clone https://github.com/yossiovadia/llm-katan.git
cd llm-katan
pip install -e ".[dev]"
pytest tests/ -v

License

Apache-2.0

Created by Yossi Ovadia

Contributors

Noy Itzikowitz

Release history

Version	Release date
0.14.0	Apr 21, 2026
0.13.0	Apr 18, 2026
0.12.3	Apr 18, 2026
0.12.2	Apr 18, 2026
0.12.1	Apr 18, 2026
0.12.0	Apr 18, 2026
0.11.0	Apr 13, 2026
0.10.0	Apr 10, 2026
0.9.0	Mar 26, 2026
0.8.2	Mar 26, 2026
0.8.1	Mar 26, 2026
0.8.0	Mar 26, 2026
0.7.3	Mar 20, 2026
0.7.2	Mar 20, 2026
0.7.1	Mar 20, 2026
0.7.0	Mar 20, 2026
0.6.0	Mar 19, 2026
0.5.2	Mar 19, 2026
0.5.1	Mar 19, 2026
0.5.0	Mar 19, 2026
0.4.0	Mar 19, 2026
0.3.2	Mar 19, 2026
0.3.1	Mar 19, 2026
0.3.0	Mar 19, 2026
0.2.1	Mar 19, 2026
0.2.0	Mar 19, 2026
0.1.10	Oct 29, 2025
0.1.9	Oct 6, 2025
0.1.8	Sep 26, 2025
0.1.7	Sep 26, 2025
0.1.6	Sep 26, 2025
0.1.5	Sep 26, 2025
0.1.4	Sep 26, 2025
0.1.3	Sep 26, 2025
0.1.2	Sep 26, 2025
0.1.1	Sep 26, 2025
0.1.0	Sep 26, 2025

Download files

Source Distribution

llm_katan-0.13.0.tar.gz (55.3 kB)

Uploaded Apr 18, 2026 Source

Built Distribution

llm_katan-0.13.0-py3-none-any.whl (72.0 kB)

Uploaded Apr 18, 2026 Python 3

File details

Details for the file llm_katan-0.13.0.tar.gz.

File metadata

Download URL: llm_katan-0.13.0.tar.gz
Upload date: Apr 18, 2026
Size: 55.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.13

File hashes

Hashes for llm_katan-0.13.0.tar.gz
Algorithm	Hash digest
SHA256	`ea5fbbdf5d6414ef9585ee91722af312277012be75376dfcb219a66a0dd2fcdb`
MD5	`7bc9ba652b1935193863204261b4e8cb`
BLAKE2b-256	`bd2ac966f2e6b19f3a35036ad7caccf7e8a93bb9733a1ff396ddc635489a986d`


File details

Details for the file llm_katan-0.13.0-py3-none-any.whl.

File metadata

Download URL: llm_katan-0.13.0-py3-none-any.whl
Upload date: Apr 18, 2026
Size: 72.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.13

File hashes

Hashes for llm_katan-0.13.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`be776fa8fca77bdeb5c1dfb7a6f49738d3b6928505b791f8d1ce341019bb0772`
MD5	`868a06bd787d703121ee53ab71cc70f7`
BLAKE2b-256	`5948886133bb7e980773995a36d259ff3aa16711b8acc586824984c329d39196`
