Expose OpenAI-compatible /v1/models aggregation plus /stats host metrics behind Traefik.
Routheon
Lightweight API_KEY router compatible with the OpenAI protocol.
Route multiple llama.cpp model servers through a single API endpoint using Traefik.
Overview
Routheon acts as a reverse proxy that exposes one unified API endpoint (OpenAI compatible) and routes incoming API
requests to different llama.cpp model servers based on the provided API key.
This enables per-user or per-model access control while keeping the architecture simple.
Use Cases
- Run multiple llama.cpp servers behind a single API
- Provide per-user or per-model access via API keys
- Simplify client integration using an OpenAI-compatible endpoint
Optional Routheon Server
routheon-server is a lightweight companion process that exposes two helper endpoints:
- /v1/models: Aggregates every reachable llama.cpp backend into one OpenAI-compatible list
- /stats: Shows basic host metrics (CPU, RAM, uptime) for the machine running the aggregator
The core router works without this service, but enabling it gives you instant visibility into which models are online and the metrics of the host that serves them.
Architecture
                    ┌────────────────────────────┐
                    │       Traefik Router       │
                    │    (port 8080, /v1/...)    │
                    └─────────────┬──────────────┘
                                  │
          ┌───────────────────────┼──────────────────────────┐
          │                       │                          │
          ▼                       ▼                          ▼
┌──────────────────┐    ┌──────────────────┐    ┌────────────────────────┐
│  llama-server-1  │    │  llama-server-2  │    │    routheon-server     │
│  TinyLlama_Chat  │    │   mistral-tiny   │    │ (optional, port 9080)  │
│    (API_KEY-1)   │    │    (API_KEY-2)   │    │  /v1/models + /stats   │
└──────────────────┘    └──────────────────┘    │ aggregate + host info  │
          ▲                       ▲             └────────────────────────┘
          │                       │                          ▲
          │                       │                          │
    with API_KEY: /v1/chat/completions, ...                  │
                                                             │
                                                without API_KEY: /v1/models
The diagram above illustrates how Routheon routes incoming requests.
All traffic with an API key is forwarded by Traefik to the corresponding llama.cpp backend.
Requests to /v1/models or /stats without an API key are handled by the optional routheon-server,
which aggregates model metadata from all reachable backends and reports statistics for the host it runs on.
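The routing decision described above can be modelled in a few lines. This is a hypothetical sketch only: in Routheon the matching is done by Traefik from the mapping files, and the key table, the backend ports 8011/8012 and the `route()` function are illustrative assumptions. It also assumes that a request with an unknown key to /v1/models or /stats falls through to routheon-server; the actual Traefik rule priorities may behave differently.

```python
from typing import Mapping, Optional

# Hypothetical key table: bearer token -> llama-server base URL (ports are made up).
KEY_TO_BACKEND = {
    "API_KEY-1": "http://127.0.0.1:8011",  # llama-server-1 (TinyLlama_Chat)
    "API_KEY-2": "http://127.0.0.1:8012",  # llama-server-2 (mistral-tiny)
}
ROUTHEON_SERVER = "http://127.0.0.1:9080"  # optional aggregator
AGGREGATOR_PATHS = {"/v1/models", "/stats"}


def route(path: str, headers: Mapping[str, str]) -> Optional[str]:
    """Return the upstream URL for a request, or None if no route matches."""
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    auth = lowered.get("authorization", "")
    token = auth[len("Bearer "):] if auth.startswith("Bearer ") else None
    if token in KEY_TO_BACKEND:
        # A known key wins: every /v1/... call goes to that key's llama-server.
        return KEY_TO_BACKEND[token]
    if path in AGGREGATOR_PATHS:
        # No (known) key: only the routheon-server endpoints are served.
        return ROUTHEON_SERVER
    return None
```

For example, `route("/v1/models", {})` points at the aggregator, while the same path with `Authorization: Bearer API_KEY-1` points at llama-server-1 only.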
Routheon Demo
The Routheon demo stack sets up an API_KEY router using Traefik and includes two llama.cpp servers with small models.
Prerequisites
- Docker Compose
- Less than 1 GB of free disk space (see Clean-up after the Demo)
Demo Setup
Clone the Repository
git clone https://github.com/Wuodan/routheon.git
cd routheon
Run the Docker Compose File
docker compose up -d
Wait for all llama-server services to be healthy. The models must be downloaded before the services are fully
operational.
To check status, run:
docker compose ps
Demo Requests
Use the following curl commands to exercise the setup with API_KEY-1 and API_KEY-2.
For API_KEY-1 and routing to llama-server-1 (model=TinyLlama_Chat):
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer API_KEY-1" \
  -d '{
    "model": "TinyLlama_Chat",
    "messages": [
      {"role": "system", "content": "You are a concise assistant."},
      {"role": "user", "content": "Write a one-line Python function that prints hello."}
    ]
  }'
For API_KEY-2 and routing to llama-server-2 (model=mistral-tiny):
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer API_KEY-2" \
  -d '{
    "model": "mistral-tiny",
    "messages": [
      {"role": "system", "content": "You are a concise assistant."},
      {"role": "user", "content": "Write a one-line Python function that prints hello."}
    ]
  }'
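The same request can be sent from Python. The sketch below uses only the standard library; `build_chat_request` and `chat` are hypothetical helper names, and the response parsing assumes the usual OpenAI chat-completion shape (`choices[0].message.content`). Because only the Authorization header decides which llama-server answers, any OpenAI-compatible client pointed at the Traefik port should behave the same way.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8080"  # Traefik entry point used by the demo


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request for the given API key."""
    body = {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a concise assistant."},
            {"role": "user", "content": prompt},
        ],
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Traefik selects the llama-server from this header.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def chat(api_key: str, model: str, prompt: str, timeout: float = 60.0) -> str:
    """Send the request and return the assistant's reply text."""
    req = build_chat_request(api_key, model, prompt)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        reply = json.load(resp)
    return reply["choices"][0]["message"]["content"]
```

With the demo stack running, `print(chat("API_KEY-2", "mistral-tiny", "Write a one-line Python function that prints hello."))` should be routed to llama-server-2.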
Demo: Routheon Server /v1/models
In this demo stack, routheon-server is already enabled.
You can inspect which llama.cpp backends are up using /v1/models.
Without API key: See all active models
Returns all models from all reachable llama.cpp servers:
curl http://127.0.0.1:8080/v1/models
With API key: See the models of one llama-server
Returns only the models available behind that specific key / backend:
curl http://127.0.0.1:8080/v1/models \
-H "Authorization: Bearer API_KEY-1"
Supports inactive llama-servers
If you stop one of the llama-servers, /v1/models lists only the model of the server that is still running:
docker compose stop llama-server-2
curl http://127.0.0.1:8080/v1/models
docker compose start llama-server-2
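Conceptually, the aggregation behaves like the sketch below: query each backend's /v1/models, skip the backends that fail, and concatenate their `data` arrays into one OpenAI-style list. This is an illustrative assumption, not routheon-server's actual source; the function names are hypothetical, and the 2-second timeout only mirrors the documented `--mapping-timeout` default.

```python
import json
import urllib.request
from typing import Callable, Dict, Iterable, List

Fetcher = Callable[[str, float], dict]


def fetch_json(url: str, timeout: float) -> dict:
    """GET a URL and decode its JSON body; most network errors surface as OSError."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def aggregate_models(backends: Iterable[str], fetch: Fetcher = fetch_json,
                     timeout: float = 2.0) -> Dict[str, object]:
    """Merge /v1/models from every reachable backend into one OpenAI-style list."""
    merged: List[dict] = []
    for base_url in backends:
        try:
            payload = fetch(f"{base_url}/v1/models", timeout)
        except (OSError, ValueError):
            # Unreachable or non-JSON backend (e.g. a stopped llama-server): skip it.
            continue
        merged.extend(payload.get("data", []))
    return {"object": "list", "data": merged}
```

Injecting `fetch` keeps the merge logic testable without running any llama-server.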
Demo: Routheon Server /stats
routheon-server also serves /stats, which returns CPU, memory, disk, network, and uptime information for the host
running the aggregator.
In the demo, /stats is configured to expose only a subset of the available information.
See Configure /stats output below.
curl http://127.0.0.1:8080/stats | jq
Use it to monitor resource pressure before launching additional llama.cpp servers.
Clean-up after the Demo
The models are stored in a Docker volume. When you are done with the demo, delete images and the volume with:
# docker clean-up
docker compose down
docker image rm traefik:latest ghcr.io/ggml-org/llama.cpp:server python:slim routheon_routheon-server:demo
# archived during demo as it contains API_KEY-2
mv traefik/mappings/llama-server-3.yml{.bak.*,} 2>/dev/null
# created during demo
[ -f traefik/mappings/llama-server-2.yml ] && \
sudo rm traefik/mappings/llama-server-2.yml
Remove the volume with the models:
docker volume rm routheon_llama_cpp
Routheon in Production
This describes a bare-metal setup without Docker. Both Traefik and llama.cpp run on the same machine.
Prerequisites
- Install Traefik
- Install llama.cpp to have one or several instances of llama-server with dedicated ports
- Python: if you want the Optional Routheon Server
Installation
Traefik Config: traefik.yml
- Copy traefik/traefik.yml to /etc/traefik/traefik.yml and create the mappings folder:
  sudo mkdir -p /etc/traefik
  sudo curl -LO --output-dir /etc/traefik https://raw.githubusercontent.com/Wuodan/routheon/refs/heads/main/traefik/traefik.yml
  sudo mkdir -p /etc/traefik/mappings
- Adapt the port to your needs.
- Add logging (accessLog) and other Traefik settings as needed.
Traefik Config: Map API_KEY to llama.cpp instance
Here you have 2 choices:
- Dynamic Auto-Mapping: Let mappings be created/updated when llama-server starts
- Manual Mapping: Manage the mapping files manually
Dynamic Auto-Mapping with create-mapping.sh
Change your system daemon for llama-server so that it calls create-mapping.sh before starting llama-server.
Download script:
sudo curl -LO --output-dir /usr/local/bin https://raw.githubusercontent.com/Wuodan/routheon/refs/heads/main/traefik/create-mapping/create-mapping.sh
sudo chmod +x /usr/local/bin/create-mapping.sh
Example: Chain create-mapping.sh and llama-server:
/usr/local/bin/create-mapping.sh \
  --port 8011 \
  --service TinyLlama_Chat \
  --api_key 'my-api-key' && \
\
exec \
  llama-server \
    --hf-repo TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF:Q2_K \
    --port 8011 \
    --alias TinyLlama_Chat
This uses the default mapping folder and host; use --mappings and --host to set them.
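For orientation, a mapping file is a small Traefik dynamic-configuration file read by the file provider. The sketch below renders one in Python under explicit assumptions: Traefik v3 rule syntax (`Header(...)`; Traefik v2 spells it `Headers(...)`), a router and a service both named after `--service`, and a single server URL. It is not the output of create-mapping.sh; `render_mapping` and `write_mapping` are hypothetical helpers, so compare with the repository's llama-server-1.yml before relying on the structure.

```python
from pathlib import Path

# Hypothetical template for one Traefik v3 file-provider mapping.
# The real files written by create-mapping.sh may use other names or rules.
TEMPLATE = """\
http:
  routers:
    {service}:
      rule: "PathPrefix(`/v1`) && Header(`Authorization`, `Bearer {api_key}`)"
      service: {service}
  services:
    {service}:
      loadBalancer:
        servers:
          - url: "http://{host}:{port}"
"""


def render_mapping(service: str, api_key: str, port: int, host: str = "127.0.0.1") -> str:
    """Return YAML text that routes one API key to one llama-server."""
    # Assumes the key is a plain token without quotes or backticks.
    return TEMPLATE.format(service=service, api_key=api_key, host=host, port=port)


def write_mapping(mappings_dir: Path, service: str, api_key: str, port: int) -> Path:
    """Write <service>.yml into the mappings folder watched by Traefik."""
    target = mappings_dir / f"{service}.yml"
    target.write_text(render_mapping(service, api_key, port), encoding="utf-8")
    return target
```

Calling `render_mapping("TinyLlama_Chat", "my-api-key", 8011)` mirrors the chained command above.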
Manual Mapping
- Create mappings in /etc/traefik/mappings/
- For each llama.cpp instance, create a my-server.yml file like llama-server-1.yml
  - The url must be http://127.0.0.1:<LLAMA_PORT>
  - Replace API_KEY-1 with your own API key for each llama-server
Traefik Service
- Configure a system daemon for Traefik depending on your OS
- The path of the mappings folder can be changed in traefik.yml
- If you choose another path for traefik.yml, use the traefik --configFile <PATH> parameter
llama.cpp
Run llama-server without the --host parameter (so it defaults to 127.0.0.1) to prevent direct remote access to its
port.
Ready to Use
- All instances of llama.cpp can now be accessed remotely via a single common port
- Access to each instance is controlled by API_KEY
Optional Routheon Server
The routheon-server companion service collects the /v1/models information from all configured
targets and provides both /v1/models and /stats endpoints back to Traefik.
It is optional: Routheon works normally without it.
Enable it only if you want /v1/models to aggregate all active model servers or to expose /stats.
The routheon-server aggregates the /v1/models output from all reachable llama.cpp servers and returns an OpenAI-compatible
response, as if one server were providing multiple models.
Installation
- Copy traefik/mappings/routheon-server.yml to /etc/traefik/mappings/ (same path as the other mappings).
  In routheon-server.yml, change the URL to http://127.0.0.1:9080:
  sudo mkdir -p /etc/traefik/mappings/
  cd /etc/traefik/mappings/
  sudo curl -LO https://raw.githubusercontent.com/Wuodan/routheon/refs/heads/main/traefik/mappings/routheon-server.yml
  sudo sed -i.bak 's#http://routheon-server:#http://127.0.0.1:#' routheon-server.yml
  sudo rm -f routheon-server.yml.bak
- Install routheon-server into a virtual environment (either from PyPI or from the cloned repo):
  python3 -m venv ~/.routheon/venv
  ~/.routheon/venv/bin/pip install routheon-server
  For local development builds, run ~/.routheon/venv/bin/pip install . from the repository root instead.
- Set up a system daemon depending on your OS to run the installed console script.
  The daemon should run this command (service managers usually need an absolute path, so expand ~ to the home directory used above):
  ~/.routheon/venv/bin/routheon-server
Customize
The defaults of routheon-server suit the described setup.
If your setup is different, adapt the command with the following arguments:
- --mappings: Directory containing Traefik mapping files (default: /etc/traefik/mappings)
- --host: Host to bind the HTTP server to. Use 127.0.0.1 (default) so that only local processes, such as Traefik, can reach it
- --port: Port to listen on (default: 9080). Ensure this matches the URL in the routheon-server.yml file
- --skip-mapping: YAML filenames to skip (regex patterns, default: ["routheon-server.yml"])
  - routheon-server.yml: The mapping file for the routheon-server itself must be in that list
  - Add patterns for other mapping files you want to exclude from the aggregation
- --mapping-timeout: Timeout in seconds for requests to each mapping (default: 2)
- --stats-config-file: Path to a YAML file that hides selected /stats sections or fields
- --log-level: Logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL; default: WARNING)
- ROUTHEON_VERSION build arg: When building the Docker image (e.g. via docker compose build), override this arg to stamp the desired package version (docker compose build --build-arg ROUTHEON_VERSION=0.4.0 routheon-server). The demo stack falls back to 0.0.0.dev0, which is a valid development version for local use.
Example:
~/.routheon/venv/bin/routheon-server \
--mappings /etc/traefik/mappings \
--host 127.0.0.1 \
--port 9080
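The effect of --skip-mapping can be illustrated with a hypothetical filter. How routheon-server applies the patterns is an assumption here: the sketch only considers .yml/.yaml files and uses `re.fullmatch`, so a plain filename such as the default routheon-server.yml behaves like an exact match (its "." still matches any character). `select_mappings` is an illustrative name, not part of the package.

```python
import re
from typing import Iterable, List

# Documented default of --skip-mapping; "." in a regex matches any character.
DEFAULT_SKIP = ("routheon-server.yml",)


def select_mappings(filenames: Iterable[str],
                    skip_patterns: Iterable[str] = DEFAULT_SKIP) -> List[str]:
    """Return the mapping files that would be queried for /v1/models."""
    compiled = [re.compile(pattern) for pattern in skip_patterns]
    return [
        name
        for name in filenames
        # Assumption: only YAML files count, and a pattern must match the whole name.
        if name.endswith((".yml", ".yaml"))
        and not any(rx.fullmatch(name) for rx in compiled)
    ]
```

When passing your own list, keep a pattern for routheon-server.yml in it, as required above; otherwise the aggregator would query itself.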
Configure /stats output
To see the full output of /stats run:
~/.routheon/venv/bin/routheon-server
and read the output in a second terminal with
curl http://127.0.0.1:9080/stats | jq
Limit /stats output
If /stats exposes information you do not want to share, create a YAML configuration file:
# ~/.routheon/stats-config.yml
enabled_sections:
  - system
  - cpu
  - memory
enabled_fields:
  memory:
    - available
    - percent
Start the service with:
~/.routheon/venv/bin/routheon-server \
--mappings /etc/traefik/mappings \
--stats-config-file ~/.routheon/stats-config.yml
When a config file is provided, only sections listed in enabled_sections are exposed.
Within each section, enabled_fields narrows the dictionary to the listed keys.
Omit enabled_sections to keep all sections but still restrict individual fields.
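The three rules above combine as in the following sketch, which takes the already-parsed YAML values as plain Python objects (for example from PyYAML's `yaml.safe_load`). `filter_stats` is a hypothetical name and this is not routheon-server's code; the real section and field names are whatever /stats reports on your host.

```python
from typing import Any, Dict, List, Mapping, Optional


def filter_stats(stats: Mapping[str, Any],
                 enabled_sections: Optional[List[str]] = None,
                 enabled_fields: Optional[Mapping[str, List[str]]] = None) -> Dict[str, Any]:
    """Apply enabled_sections / enabled_fields to a /stats payload."""
    fields = enabled_fields or {}
    result: Dict[str, Any] = {}
    for section, value in stats.items():
        # Without enabled_sections, every section is kept.
        if enabled_sections is not None and section not in enabled_sections:
            continue
        if section in fields and isinstance(value, dict):
            # Narrow this section's dictionary to the listed keys.
            value = {key: val for key, val in value.items() if key in fields[section]}
        result[section] = value
    return result
```

With the example config, a disk section would be dropped and memory would be reduced to available and percent.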
Releasing routheon-server
The Python helper package now derives its version from Git tags via hatch-vcs, so there's no manual edit to
pyproject.toml when cutting a release. To publish a new version:
- Ensure main already contains the desired commits, then create a tag that matches the version you intend to push, e.g. git tag v0.2.0.
- Push the tag (git push origin v0.2.0). The GitHub Actions workflow runs lint/tests across supported Python versions and, if the push is a tag, builds and uploads to PyPI using that semantic version.
- (Optional) Draft a GitHub Release pointing at the same tag for humans to discover the changes.
Because PyPI treats releases as immutable, bump the tag (e.g. v0.2.1) for any follow-up fixes instead of trying to
replace an existing version.
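Computing the follow-up tag for a fix release is easy to script. The helper below is a hypothetical convenience, not part of the repository; it assumes tags of the form vMAJOR.MINOR.PATCH, as in the examples above, and only ever moves the version forward.

```python
import re

TAG_RE = re.compile(r"v(\d+)\.(\d+)\.(\d+)")


def next_patch_tag(tag: str) -> str:
    """Return the follow-up tag for a fix release, e.g. v0.2.0 -> v0.2.1."""
    match = TAG_RE.fullmatch(tag)
    if match is None:
        raise ValueError(f"expected a tag like v1.2.3, got {tag!r}")
    major, minor, patch = (int(group) for group in match.groups())
    return f"v{major}.{minor}.{patch + 1}"
```

The result is what you would pass to git tag and git push origin.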
License & Status
License: Apache License 2.0
Status: Experimental / Proof of Concept
File details
Details for the file routheon_server-1.0.2.tar.gz.
File metadata
- Download URL: routheon_server-1.0.2.tar.gz
- Upload date:
- Size: 25.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e228bb083aa900c5acffcab546f17327ed06b4af4a58eafe16b3eba878a88322 |
| MD5 | d2dbe0fd1342df35833b000a8de1faeb |
| BLAKE2b-256 | 3bbd082223e200d5c6dea613ad5359c0420c317717211a89b9806918afbecd12 |
File details
Details for the file routheon_server-1.0.2-py3-none-any.whl.
File metadata
- Download URL: routheon_server-1.0.2-py3-none-any.whl
- Upload date:
- Size: 19.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 164215cd849b3dfb31438e2d968bd18a7e5d3b7b278e3393d2b870986b64c592 |
| MD5 | e5cbf8e3655d3f93b31db5baf22f8446 |
| BLAKE2b-256 | 90c93e56393909fb67b851cdf6e0398c12a87470e96be5ed6cb3f2208f7f2139 |