Expose OpenAI-compatible /v1/models aggregation plus /stats host metrics behind Traefik.

Routheon

Lightweight API_KEY router compatible with the OpenAI protocol.
Route multiple llama.cpp model servers through a single API endpoint using Traefik.

Overview

Routheon acts as a reverse proxy that exposes one unified API endpoint (OpenAI compatible) and routes incoming API requests to different llama.cpp model servers based on the provided API key.

This enables per-user or per-model access control while keeping the architecture simple.

Use Cases

Run multiple llama.cpp servers behind a single API
Provide per-user or per-model access via API keys
Simplify client integration using an OpenAI-compatible endpoint

Optional Routheon Server

routheon-server is a lightweight companion process that exposes two helper endpoints:

/v1/models: Aggregates every reachable llama.cpp backend into one OpenAI-compatible list
/stats: Shows basic host metrics (CPU, RAM, uptime) for the machine running the aggregator

The core router works without this service, but enabling it gives you instant visibility into which models are online and the metrics of the host that serves them.

Architecture

                   ┌────────────────────────────┐
                   │        Traefik Router      │
                   │   (port 8080, /v1/...)     │
                   └──────────────┬─────────────┘
                                  │
        ┌─────────────────────────┼─────────────────────────────┐
        │                         │                             │
        ▼                         ▼                             ▼
┌──────────────────┐     ┌──────────────────┐     ┌────────────────────────┐
│ llama-server-1   │     │ llama-server-2   │     │    routheon-server     │
│ TinyLlama_Chat   │     │ mistral-tiny     │     │ (optional, port 9080)  │
│ (API_KEY-1)      │     │ (API_KEY-2)      │     │ /v1/models + /stats    │
└──────────────────┘     └──────────────────┘     │ aggregate + host info  │
        ▲                         ▲               └────────────────────────┘
        │                         │                             ▲
        │                         │                             │
     with API_KEY: /v1/chat/completions, ...                    |
                                                                |
                                                  without API_KEY: /v1/models

The diagram above illustrates how Routheon routes incoming requests.
All traffic with an API key is forwarded by Traefik to the corresponding llama.cpp backend.
Requests to /v1/models or /stats without an API key are handled by the optional routheon-server,
which aggregates model metadata and host statistics across all reachable backends.
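
The routing decision described above can be pictured as a lookup from bearer token to backend. The following is a conceptual Python sketch only: in Routheon the matching is done by Traefik rules, not by Python code, and the backend ports and the `pick_backend` helper are hypothetical names used for illustration.

```python
from typing import Mapping, Optional

# Hypothetical key-to-backend table; the real mappings live in Traefik mapping files.
BACKENDS = {
    "API_KEY-1": "http://127.0.0.1:8011",  # llama-server-1 (port is illustrative)
    "API_KEY-2": "http://127.0.0.1:8012",  # llama-server-2 (port is illustrative)
}
AGGREGATOR = "http://127.0.0.1:9080"  # optional routheon-server


def pick_backend(path: str, headers: Mapping[str, str]) -> Optional[str]:
    """Return the upstream URL for a request, or None if nothing matches."""
    auth = headers.get("Authorization", "")
    if auth.startswith("Bearer "):
        # Requests carrying a known key go to the backend mapped to that key.
        return BACKENDS.get(auth[len("Bearer "):])
    if path in ("/v1/models", "/stats"):
        # Requests without a key to the helper endpoints go to routheon-server.
        return AGGREGATOR
    return None
```

In this sketch, an unknown key or a request without a key to any other path matches nothing and yields `None`.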

Routheon Demo

The Routheon demo stack sets up an API_KEY router using Traefik and includes two llama.cpp servers with small models.

Prerequisites

Docker Compose
Less than 1 GB of free disk space (see Clean-up after the Demo)

Demo Setup

Clone the Repository

git clone https://github.com/Wuodan/routheon.git
cd routheon

Run the Docker Compose File

docker compose up -d

Wait for all llama-server services to be healthy. The models must be downloaded before the services are fully operational.

To check status, run:

docker compose ps

Demo Requests

Use the following curl commands to exercise the setup with API_KEY-1 and API_KEY-2.

For API_KEY-1 and routing to llama-server-1 (model=TinyLlama_Chat):

curl http://127.0.0.1:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer API_KEY-1" \
-d '{
   "model": "TinyLlama_Chat",
   "messages": [
      {"role": "system", "content": "You are a concise assistant."},
      {"role": "user", "content": "Write a one-line Python function that prints hello."}
   ]
 }'

For API_KEY-2 and routing to llama-server-2 (model=mistral-tiny):

curl http://127.0.0.1:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer API_KEY-2" \
-d '{
   "model": "mistral-tiny",
   "messages": [
      {"role": "system", "content": "You are a concise assistant."},
      {"role": "user", "content": "Write a one-line Python function that prints hello."}
   ]
 }'
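
The same request can be sent from Python using only the standard library. This is a minimal sketch that mirrors the curl commands above; the helper names `build_chat_request` and `send` are hypothetical, and the response handling assumes the usual OpenAI chat-completion shape (`choices[0].message.content`).

```python
import json
import urllib.request


def build_chat_request(base_url: str, api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for /v1/chat/completions with a bearer API key."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a concise assistant."},
            {"role": "user", "content": prompt},
        ],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the demo stack running, `send(build_chat_request("http://127.0.0.1:8080", "API_KEY-1", "TinyLlama_Chat", "Say hello."))` should reach llama-server-1; swapping in `API_KEY-2` and `mistral-tiny` targets llama-server-2.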

Demo: Routheon Server `/v1/models`

In this demo stack, routheon-server is already enabled.
You can inspect which llama.cpp backends are up using /v1/models.

Without API key: See all active models

Returns all models from all reachable llama.cpp servers:

curl http://127.0.0.1:8080/v1/models

With API key: See model of one llama-server

Returns only the models available behind that specific key / backend:

curl http://127.0.0.1:8080/v1/models \
-H "Authorization: Bearer API_KEY-1"

Supports inactive llama-server

You can stop one of the llama-servers and the routheon-server endpoint will show only one model:

docker compose stop llama-server-2
curl http://127.0.0.1:8080/v1/models
docker compose start llama-server-2
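
To compare the model list before and after stopping a backend, the response can be reduced to its model IDs. The sketch below assumes the standard OpenAI list shape (`{"object": "list", "data": [{"id": ...}]}`); the sample bodies are illustrative, not captured output, and `model_ids` is a hypothetical helper.

```python
import json
from typing import List


def model_ids(response_text: str) -> List[str]:
    """Return the model IDs from an OpenAI-style /v1/models response body."""
    body = json.loads(response_text)
    return [entry["id"] for entry in body.get("data", [])]


# Illustrative bodies only; real responses carry more fields per model.
both_up = '{"object": "list", "data": [{"id": "TinyLlama_Chat"}, {"id": "mistral-tiny"}]}'
one_down = '{"object": "list", "data": [{"id": "TinyLlama_Chat"}]}'

# Models that disappeared after a backend was stopped.
missing = sorted(set(model_ids(both_up)) - set(model_ids(one_down)))
```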

Demo: Routheon Server `/stats`

routheon-server also serves /stats, which returns CPU, memory, disk, network, and uptime information for the host running the aggregator.

In this demo, /stats is configured to expose only a subset of the available information.
See Configure /stats output below.

curl http://127.0.0.1:8080/stats | jq

Use it to monitor resource pressure before launching additional llama.cpp servers.
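
A simple pre-launch check could look like the following sketch. It assumes the /stats JSON has a top-level `memory` object with a `percent` field, which is inferred from the stats configuration example later in this document rather than from a documented schema; the 80 % threshold and the `has_memory_headroom` helper are hypothetical.

```python
from typing import Any, Mapping


def has_memory_headroom(stats: Mapping[str, Any], max_percent: float = 80.0) -> bool:
    """Return True if reported memory usage is below the given threshold."""
    memory = stats.get("memory", {})
    percent = memory.get("percent")
    if percent is None:
        # The field may be hidden by the stats configuration: be conservative.
        return False
    return float(percent) < max_percent
```

Feed it the parsed JSON from `curl http://127.0.0.1:8080/stats` before starting another llama-server.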

Clean-up after the Demo

The models are stored in a Docker volume. When you are done with the demo, stop the stack, remove the images, and restore the mapping files with:

# docker clean-up
docker compose down
docker image rm traefik:latest ghcr.io/ggml-org/llama.cpp:server python:slim routheon_routheon-server:demo

# archived during demo as it contains API_KEY-2
mv traefik/mappings/llama-server-3.yml{.bak.*,} 2>/dev/null

# created during demo
[ -f traefik/mappings/llama-server-2.yml ] && \
  sudo rm traefik/mappings/llama-server-2.yml

Remove the volume with the models:

docker volume rm routheon_llama_cpp

Routheon in Production

This describes a bare-metal setup without Docker. Both traefik and llama.cpp run on the same computer.

Prerequisites

Install Traefik
Install llama.cpp and run one or more llama-server instances, each on its own port
Python, if you want to run the optional routheon-server

Installation

Traefik Config: `traefik.yml`

Copy traefik/traefik.yml to /etc/traefik/traefik.yml

sudo mkdir -p /etc/traefik
sudo curl -LO --output-dir /etc/traefik https://raw.githubusercontent.com/Wuodan/routheon/refs/heads/main/traefik/traefik.yml
sudo mkdir -p /etc/traefik/mappings

Adapt the port to your needs.
Add logging (accessLog) and other Traefik settings as needed.

Traefik Config: Map API_KEY to llama.cpp instance

You have two options:

Dynamic Auto-Mapping: Let mappings be created/updated when llama-server starts
Manual Mapping: Manage the mapping files manually

Dynamic Auto-Mapping with `create-mapping.sh`

Change your system daemon for llama-server to also call create-mapping.sh.

Download script:

sudo curl -LO --output-dir /usr/local/bin https://raw.githubusercontent.com/Wuodan/routheon/refs/heads/main/traefik/create-mapping/create-mapping.sh
sudo chmod +x /usr/local/bin/create-mapping.sh

Example: Chain create-mapping.sh and llama-server:

/usr/local/bin/create-mapping.sh \
  --port 8011 \
  --service TinyLlama_Chat \
  --api_key 'my-api-key' && \
\
exec \
  llama-server \
      --hf-repo TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF:Q2_K \
      --port 8011 \
      --alias TinyLlama_Chat

This uses the defaults for the mappings folder and the host. Use --mappings and --host to override them.

Manual Mapping

Create mappings in /etc/traefik/mappings/
For each llama.cpp instance, create a my-server.yml file like llama-server-1.yml
- The url must be http://127.0.0.1:<LLAMA_PORT>
- Replace API_KEY-1 with your own API key for each llama-server
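
For orientation, a mapping file of this kind could be generated as in the sketch below. The llama-server-1.yml file in the repository is the authoritative template; this sketch only assumes the Traefik file-provider layout (`http.routers` and `http.services`) and the Traefik v3 `Header` matcher, and the `render_mapping` helper is hypothetical.

```python
def render_mapping(name: str, api_key: str, port: int) -> str:
    """Render a Traefik dynamic-configuration file for one llama-server.

    Assumption: requests are matched on the exact Authorization header value.
    The repository's own mapping files may use a different rule.
    """
    return (
        "http:\n"
        "  routers:\n"
        f"    {name}:\n"
        f"      rule: \"Header(`Authorization`, `Bearer {api_key}`)\"\n"
        f"      service: {name}\n"
        "  services:\n"
        f"    {name}:\n"
        "      loadBalancer:\n"
        "        servers:\n"
        f"          - url: \"http://127.0.0.1:{port}\"\n"
    )
```

For example, `render_mapping("my-server", "my-api-key", 8011)` produces text that could be saved as /etc/traefik/mappings/my-server.yml after comparing it with the repository template.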

Traefik Service

Configure a system daemon for Traefik depending on your OS
The path of the mappings folder can be changed in traefik.yml
If you choose another path for traefik.yml, use the traefik --configFile <PATH> parameter

llama.cpp

Run llama-server without the --host parameter (so it defaults to 127.0.0.1) to prevent direct remote access to its port.

Ready to Use

All instances of llama.cpp can now be accessed remotely via a single common port
Access to each instance is controlled by API_KEY

Optional Routheon Server

The routheon-server companion service queries /v1/models on every configured llama.cpp target and serves two endpoints behind Traefik: /v1/models and /stats.

It is optional: Routheon works normally without it. Enable it only if you want /v1/models to aggregate all active model servers or if you want to expose /stats.

The aggregated /v1/models response is OpenAI-compatible and lists the models of all reachable llama.cpp servers as if a single server were providing them.
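
The aggregation idea can be sketched as a merge of several OpenAI-style list responses. This is an illustration under stated assumptions, not the routheon-server implementation: unreachable backends are represented as `None`, duplicate model IDs are dropped (an assumption), and `aggregate_models` is a hypothetical name.

```python
from typing import Any, Dict, Iterable, List, Optional


def aggregate_models(responses: Iterable[Optional[Dict[str, Any]]]) -> Dict[str, Any]:
    """Merge per-backend /v1/models bodies into one OpenAI-style list."""
    merged: List[Dict[str, Any]] = []
    seen = set()
    for body in responses:
        if body is None:
            # Backend was unreachable or timed out: skip it.
            continue
        for entry in body.get("data", []):
            model_id = entry.get("id")
            if model_id in seen:
                continue  # keep the first occurrence of each model ID
            seen.add(model_id)
            merged.append(entry)
    return {"object": "list", "data": merged}
```

Skipping `None` entries is what makes a stopped llama-server simply disappear from the combined list instead of causing an error.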

Installation

Copy traefik/mappings/routheon-server.yml to /etc/traefik/mappings/ (same path as other mappings).
In routheon-server.yml, change the URL to http://127.0.0.1:9080.

sudo mkdir -p /etc/traefik/mappings/
cd /etc/traefik/mappings/
sudo curl -LO https://raw.githubusercontent.com/Wuodan/routheon/refs/heads/main/traefik/mappings/routheon-server.yml
sudo sed -i.bak 's#http://routheon-server:#http://127.0.0.1:#' routheon-server.yml
sudo rm -f routheon-server.yml.bak

Install routheon-server into a virtual environment (either from PyPI or from the cloned repo):
```
python3 -m venv ~/.routheon/venv
~/.routheon/venv/bin/pip install routheon-server
```
For local development builds, run ~/.routheon/venv/bin/pip install . from the repository root instead.
Set up a system daemon depending on your OS to run the installed console script.

The daemon should run this command:
```
~/.routheon/venv/bin/routheon-server
```

Customize

The defaults of routheon-server suit the described setup.
If your setup is different, then adapt the command with the following arguments:

--mappings: Directory containing Traefik mapping files (default: /etc/traefik/mappings)
--host: Host to bind the HTTP server to. The default, 127.0.0.1, accepts connections only from the local machine, such as a Traefik instance on the same host
--port: Port to listen on (default: 9080). Ensure this matches the URL in the routheon-server.yml file
--skip-mapping: YAML filenames to skip (regex patterns, default: ["routheon-server.yml"])
- routheon-server.yml: The mapping file for the routheon-server itself must be in that list
- Add patterns for other mapping files you want to exclude from the aggregation
--mapping-timeout: Timeout in seconds for requests to each mapping (default: 2)
--stats-config-file: Path to a YAML file that hides selected /stats sections or fields
--log-level: Logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL; default: WARNING)
ROUTHEON_VERSION build arg: when building the Docker image (e.g., via docker compose build), override this arg to stamp the desired package version (docker compose build --build-arg ROUTHEON_VERSION=0.4.0 routheon-server). The demo stack falls back to 0.0.0.dev0, which is a valid development version for local use.

Example:

~/.routheon/venv/bin/routheon-server \
  --mappings /etc/traefik/mappings \
  --host 127.0.0.1 \
  --port 9080
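
How --skip-mapping patterns narrow the set of mapping files can be sketched as follows. The exact matching semantics of routheon-server are not stated here, so this sketch assumes `re.search` against the file name and a `.yml`/`.yaml` extension filter; `select_mappings` is a hypothetical helper.

```python
import re
from typing import Iterable, List, Sequence


def select_mappings(
    filenames: Iterable[str],
    skip_patterns: Sequence[str] = ("routheon-server.yml",),
) -> List[str]:
    """Return the YAML mapping files that match none of the skip patterns."""
    compiled = [re.compile(pattern) for pattern in skip_patterns]
    selected = []
    for name in filenames:
        if not name.endswith((".yml", ".yaml")):
            continue  # assumption: only YAML files are treated as mappings
        if any(rx.search(name) for rx in compiled):
            continue  # excluded from aggregation
        selected.append(name)
    return selected
```

Because the patterns are regular expressions, a pattern such as `^test-` would exclude every mapping file whose name starts with `test-`.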

Configure `/stats` output

To see the full output of /stats run:

~/.routheon/venv/bin/routheon-server

and read the output in a second terminal with

curl http://127.0.0.1:9080/stats | jq

Limit `/stats` output

If /stats exposes information you do not want to share, create a YAML configuration file:

# ~/.routheon/stats-config.yml
enabled_sections:
  - system
  - cpu
  - memory
enabled_fields:
  memory:
    - available
    - percent

Start the service with:

~/.routheon/venv/bin/routheon-server \
  --mappings /etc/traefik/mappings \
  --stats-config-file ~/.routheon/stats-config.yml

When a config file is provided, only sections listed in enabled_sections are exposed.
Within each section, enabled_fields narrows the dictionary to the listed keys.
Omit enabled_sections to keep all sections but still restrict individual fields.
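
The filtering rules above can be expressed as a small function. This is a sketch of the described behavior, not the actual routheon-server code; `filter_stats` and the field names in the usage note are illustrative.

```python
from typing import Any, Dict


def filter_stats(stats: Dict[str, Any], config: Dict[str, Any]) -> Dict[str, Any]:
    """Apply enabled_sections and enabled_fields to a /stats dictionary."""
    sections = config.get("enabled_sections")  # None means: keep all sections
    fields = config.get("enabled_fields", {})
    result: Dict[str, Any] = {}
    for name, value in stats.items():
        if sections is not None and name not in sections:
            continue  # section not listed: hide it entirely
        allowed = fields.get(name)
        if allowed is not None and isinstance(value, dict):
            # Narrow the section to the listed keys only.
            value = {key: val for key, val in value.items() if key in allowed}
        result[name] = value
    return result
```

With the configuration shown above, a `memory` section containing `total`, `available`, and `percent` would be reduced to `available` and `percent`, and a `disk` section would be dropped.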

Releasing `routheon-server`

The Python helper package now derives its version from Git tags via hatch-vcs, so there's no manual edit to pyproject.toml when cutting a release. To publish a new version:

Ensure main already contains the desired commits, then create a tag that matches the version you intend to push, e.g. git tag v0.2.0.
Push the tag (git push origin v0.2.0). The GitHub Actions workflow runs lint/tests across supported Python versions and, if the push is a tag, builds and uploads to PyPI using that semantic version.
(Optional) Draft a GitHub Release pointing at the same tag for humans to discover the changes.

Because PyPI treats releases as immutable, bump the tag (e.g. v0.2.1) for any follow-up fixes instead of trying to replace an existing version.

License & Status

License: Apache License 2.0
Status: Experimental / Proof of Concept

Release history

1.0.7: Nov 17, 2025
1.0.6: Nov 16, 2025
1.0.5: Nov 7, 2025
1.0.4: Nov 7, 2025
1.0.3: Nov 7, 2025
1.0.2: Nov 7, 2025 (this version)
1.0.1: Nov 7, 2025
1.0.0: Nov 6, 2025

File details

Details for the file routheon_server-1.0.2.tar.gz.

File metadata

Download URL: routheon_server-1.0.2.tar.gz
Upload date: Nov 7, 2025
Size: 25.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for routheon_server-1.0.2.tar.gz
Algorithm	Hash digest
SHA256	`e228bb083aa900c5acffcab546f17327ed06b4af4a58eafe16b3eba878a88322`
MD5	`d2dbe0fd1342df35833b000a8de1faeb`
BLAKE2b-256	`3bbd082223e200d5c6dea613ad5359c0420c317717211a89b9806918afbecd12`

File details

Details for the file routheon_server-1.0.2-py3-none-any.whl.

File metadata

Download URL: routheon_server-1.0.2-py3-none-any.whl
Upload date: Nov 7, 2025
Size: 19.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for routheon_server-1.0.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`164215cd849b3dfb31438e2d968bd18a7e5d3b7b278e3393d2b870986b64c592`
MD5	`e5cbf8e3655d3f93b31db5baf22f8446`
BLAKE2b-256	`90c93e56393909fb67b851cdf6e0398c12a87470e96be5ed6cb3f2208f7f2139`
