Local-first LLM gateway with token usage analytics for AI developers
🚀 Flow-LLM-Router: The Ultimate "Token Saver"
👉 To drastically reduce your AI Agent costs, you need to optimize both Token Price and Token Usage.
We provide the complete 2-step solution:
- ⚡️ The most cost-effective LLM platform: the FlowAPI Platform.
- 🐈 Flow-LLM-Router: your local AI control plane for saving tokens.
🔥 Core Features of the flow-llm-router
- 📊 Automated Token Analytics Dashboard
- 🧠 Smart API Routing (Rules & Classifiers)
- 🧰 Dynamic Skill Loading (Eliminate the "Token Tax")
- 🔒 Secure, Local API Key Management
At A Glance
- Why Flow-LLM-Router
- Who It Is For
- Use Cases
- Why Pair It With FlowAPI
- Key Capabilities
- Quick Start
- Documentation
- FAQ
Why Flow-LLM-Router
- OpenAI-compatible by default: point existing SDKs to http://host:7798/v1 and keep most client code unchanged.
- Local-first observability: request logs, token usage, latency, and routing metadata stay in local SQLite.
- Encrypted provider vault: provider API keys are stored encrypted at rest and unlocked only when needed.
- Multi-provider routing: route across OpenAI, Anthropic, Gemini, DeepSeek, Qwen, Groq, custom OpenAI-compatible endpoints, and more.
- Operator-friendly dashboard: manage providers, models, caller tokens, logs, analytics, router settings, and integration snippets from one UI.
- Optional smart router: use fast rule-based complexity scoring or RouteLLM-powered classifier routing.
Who It Is For
- teams building AI agents, copilots, or workflow automation on top of OpenAI-style SDKs
- developers who want local logs and routing visibility without adopting a hosted observability layer
- operators who need one stable endpoint in front of multiple model providers
- builders who want to combine lower-cost upstream pricing with local request optimization
Use Cases
- AI agents and copilots: centralize routing and provider auth behind one OpenAI-compatible endpoint
- Multi-model workflows: send easy tasks to cheap models and reserve premium models for high-value reasoning
- Private internal tooling: keep prompts, logs, and credentials inside your own environment
- Cost debugging: identify which models, prompts, and traffic patterns are silently increasing your bill
- Gateway standardization: give multiple applications one stable base URL even when upstream providers differ
What It Ships
| Area | What FlowGate provides |
|---|---|
| Proxy | POST /v1/chat/completions, streaming chat completions, POST /v1/embeddings, and GET /v1/models. |
| Dashboard | Analytics, request logs, provider management, model catalog, router configuration, caller token management, and integration help. |
| Security | Encrypted provider key storage, master-password unlock flow, caller token access control, IP allowlisting, and log redaction. |
| Routing | Pass-through mode, rule-based complexity routing, and optional RouteLLM classifier routing with graceful fallback. |
| Catalog | Sync model lists from provider /models endpoints and reuse them in the UI and router configuration. |
| Packaging | Python package, FastAPI app, Typer CLI, and a statically exported Next.js dashboard served by the API process. |
Architecture
Your App / Agent / SDK
|
| OpenAI-compatible requests
v
FlowGate Proxy ------------------------------+
| |
| auth + routing + logging |
v |
LiteLLM forwarding layer |
| |
+--> Provider vault (encrypted keys) |
+--> Smart router service |
+--> SQLite logs + model catalog |
+--> Static dashboard UI |
|
v
OpenAI / Anthropic / Gemini / DeepSeek / Qwen / custom OpenAI-compatible backends
Why Not Direct SDK Calls
| Approach | What you miss |
|---|---|
| Call OpenAI or Anthropic directly | No unified routing, no local gateway, no provider abstraction, no centralized logs |
| Use a generic proxy only | Basic forwarding is not enough if you also want routing policy, encrypted key management, and operator-friendly analytics |
| Use FlowGate | Keep one OpenAI-compatible endpoint while adding routing, logs, local security controls, and provider portability |
Why Pair It With FlowAPI
FlowGate and FlowAPI solve different layers of the cost stack:
| Layer | What optimizes it |
|---|---|
| Token unit price | FlowAPI.net as the upstream endpoint |
| Token usage efficiency | FlowGate's routing, request visibility, and local control plane |
If you are serious about cost control, you usually want both.
Where The Savings Come From
FlowGate is not magic. It improves cost structure through a few concrete levers:
| Cost lever | What FlowGate changes | Why it matters |
|---|---|---|
| Model selection | Routes lower-complexity work to cheaper models | Prevents expensive models from handling trivial tasks |
| Operational visibility | Exposes local logs, model usage, latency, and routing data | Makes waste visible so teams can actually fix it |
| Provider abstraction | Lets you point one app surface at different upstream providers | Makes it easier to optimize for economics without rewriting client code |
| Prompt discipline foundation | Adds a local control layer for future prompt and skills optimization | Reduces the chance that token inefficiency stays hidden in agent stacks |
Key Capabilities
1. Drop-in OpenAI compatibility
FlowGate exposes an OpenAI-style API surface so existing clients can usually switch by changing only the base URL and token source.
Supported endpoints today:
- POST /v1/chat/completions
- POST /v1/embeddings
- GET /v1/models
Streaming chat completions are forwarded as SSE.
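As a minimal sketch, streaming works through the standard OpenAI SDK by passing stream=True; the base URL and placeholder token below assume the default local setup from the Quick Start:
from openai import OpenAI

# Default local FlowGate endpoint (see Quick Start); the token is a placeholder.
client = OpenAI(
    base_url="http://127.0.0.1:7798/v1",
    api_key="fgt_your_caller_token_or_dummy",
)

# stream=True makes FlowGate forward the upstream SSE stream chunk by chunk.
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Stream a short haiku"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)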
2. Local observability that is actually usable
Every proxied request can be logged with:
- requested model and routed model
- provider
- prompt, response, and error status
- token usage
- latency
- smart-router score and tier
- session metadata
The dashboard surfaces this through overview metrics, timelines, provider/model breakdowns, and per-request log detail views.
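For illustration only, a single logged request could look roughly like this; the field names below are assumptions based on the list above, not FlowGate's actual SQLite schema:
# Hypothetical log record illustrating the fields listed above.
# Every field name here is an assumption, not FlowGate's real schema.
log_entry = {
    "requested_model": "gpt-4o",
    "routed_model": "gpt-4o-mini",
    "provider": "openai",
    "prompt": "[redacted]",
    "response": "[redacted]",
    "error": None,
    "prompt_tokens": 412,
    "completion_tokens": 96,
    "latency_ms": 830,
    "router_score": 0.21,
    "router_tier": "SIMPLE",
    "session": {"id": "sess_example"},
}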
3. Built-in security controls
FlowGate separates caller access from provider credentials:
- Caller tokens control who is allowed to use the proxy.
- Provider keys are stored encrypted in SQLite.
- Master-password unlock protects the provider vault.
- IP allowlisting limits where requests may come from.
- Log redaction helps prevent accidental credential leakage in persisted logs.
If no caller tokens exist yet, proxy access remains open for easier local setup. Once tokens are created, valid caller tokens become mandatory.
4. Smart routing without leaving the box
FlowGate supports three routing modes:
- off: pass through the requested model unchanged
- complexity: local rule-based routing using a 7-dimension prompt complexity scorer
- classifier: RouteLLM-based routing, mapped back into FlowGate's four-tier model layout
Both routing strategies (complexity and classifier) use the same tier mapping:
- SIMPLE
- MEDIUM
- COMPLEX
- REASONING
If RouteLLM is unavailable or fails at runtime, FlowGate falls back to rule-based routing instead of breaking requests.
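To make the rule-based mode concrete, here is a minimal sketch of complexity scoring mapped onto the four tiers; the signals, weights, and thresholds are invented for illustration and are not FlowGate's actual 7-dimension scorer:
# Illustrative sketch of rule-based complexity routing.
# Signals, weights, and thresholds are assumptions, not FlowGate's real scorer.

def score_prompt(prompt: str) -> float:
    """Blend a few cheap lexical signals into a 0..1 complexity score."""
    length = min(len(prompt) / 4000, 1.0)        # longer prompts score higher
    has_code = 1.0 if "```" in prompt else 0.0   # code blocks hint at complexity
    reasoning = 1.0 if any(k in prompt.lower() for k in ("prove", "derive", "step by step")) else 0.0
    questions = min(prompt.count("?") / 5, 1.0)  # many sub-questions add weight
    return 0.4 * length + 0.2 * has_code + 0.3 * reasoning + 0.1 * questions

def route(prompt: str) -> str:
    """Map the score onto FlowGate's four-tier layout."""
    s = score_prompt(prompt)
    if s < 0.25:
        return "SIMPLE"
    if s < 0.50:
        return "MEDIUM"
    if s < 0.75:
        return "COMPLEX"
    return "REASONING"

print(route("What is 2 + 2?"))  # -> SIMPLE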
5. Analytics for finding your token assassins
FlowGate includes a built-in local dashboard so you can inspect:
- which models are used most often
- which providers consume the most tokens
- which requests are slow, error-prone, or over-routed
- how routing tiers are distributed over time
Cost optimization gets much easier once the waste is visible.
6. Skills-ready foundation
FlowGate includes optional skills-related configuration and package extras for teams exploring retrieval- and tool-oriented prompt optimization.
In the current repository, this is best understood as a foundation for prompt-efficiency work rather than a fully productized dynamic skill-routing system.
Install
Base install
pip install flow-llm-router
Optional extras
# RouteLLM-based classifier routing
pip install 'flow-llm-router[classifier]'
# ChromaDB-backed skills support
pip install 'flow-llm-router[skills]'
Quick Start
1. Start FlowGate
flow-router start
Default endpoints:
- Dashboard: http://127.0.0.1:7798
- Proxy base URL: http://127.0.0.1:7798/v1
- OpenAPI docs: http://127.0.0.1:7798/docs
To load a custom config file:
export FLOWGATE_CONFIG=/path/to/flowgate.yaml
flow-router start
2. Open the dashboard
Go to http://127.0.0.1:7798 and:
- set up or unlock the vault
- add one or more provider keys
- optionally sync provider models
- optionally create caller tokens
- optionally configure the smart router
3. Point your SDK at FlowGate
from openai import OpenAI

# Any OpenAI-compatible client works; only the base URL and API key change.
client = OpenAI(
    base_url="http://127.0.0.1:7798/v1",
    api_key="fgt_your_caller_token_or_dummy",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello from FlowGate"}],
)

print(response.choices[0].message.content)
If you have not created any caller tokens yet, FlowGate accepts requests without token enforcement. In production, create caller tokens and restrict access explicitly.
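The embeddings endpoint works the same way. A minimal sketch reusing the client above (the embedding model name is just an example, not a FlowGate default):
embedding = client.embeddings.create(
    model="text-embedding-3-small",  # example model name
    input="FlowGate proxies this to the configured provider",
)
print(len(embedding.data[0].embedding))  # dimensionality of the returned vector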
Configuration
Copy the example file and adjust it for your environment:
cp flowgate.yaml.example flowgate.yaml
Main configuration areas:
- server: bind host and port
- smart_router: routing strategy and tier mappings
- skills: optional retrieval support
- database: SQLite database path
- logging: prompt/response logging and secret redaction
- security: vault, auth token TTL, persisted master key path, and IP allowlisting
Example smart-router configuration:
smart_router:
enabled: true
strategy: complexity # complexity | classifier | off
tiers:
SIMPLE: gpt-4o-mini
MEDIUM: gpt-4o
COMPLEX: claude-sonnet
REASONING: o1-preview
Environment references such as ${OPENAI_API_KEY} are supported in YAML values.
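For example, a one-line sketch (the path key name is an assumption; see flowgate.yaml.example for the actual schema):
# Illustrative only: the key name below is an assumption.
database:
  path: ${FLOWGATE_DB_PATH}  # expanded from the environment when the config loads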
CLI
| Command | Purpose |
|---|---|
| flow-router start | Start the FastAPI server and static dashboard |
| flow-router add-key | Interactively add a provider key to the encrypted vault |
| flow-router version | Print the installed FlowGate version |
Use flow-router --help for all flags and options.
Development
git clone <your-repo-url>
cd flow-llm-router
python -m venv .venv
source .venv/bin/activate
pip install -e '.[dev,classifier]'
pytest -q
Frontend workflow:
cd frontend
npm ci
npm run build
To build the static dashboard and copy it into the package:
bash scripts/build_frontend.sh
Documentation
| Document | Focus |
|---|---|
| docs/DESIGN.md | Design entry point and guide to the current documentation structure |
| docs/ARCHITECTURE.md | System layout, runtime components, request flow, and storage model |
| docs/API_AND_OPERATIONS.md | API surface, auth flow, provider onboarding, and operational notes |
| docs/SMART_ROUTER.md | Smart-router strategies, config schema, API payloads, and fallback behavior |
| docs/DEVELOPMENT.md | Local development, frontend build flow, testing, and contribution notes |
| docs/TESTING.md | Backend test coverage, smoke-test entry points, and verification guidance |
FAQ
Is FlowGate a hosted proxy?
No. This repository is the local-first gateway layer you run yourself.
Does FlowGate replace LiteLLM?
No. FlowGate uses LiteLLM as the forwarding layer and adds local routing, security, and observability on top.
Does FlowGate require me to rewrite my OpenAI SDK integration?
Usually no. In most cases you only change the base URL and, if enabled, the caller token.
Can I use FlowGate without FlowAPI.net?
Yes. FlowGate works independently with direct provider APIs and custom OpenAI-compatible endpoints. FlowAPI is an optional upstream pairing for better token pricing.
Does FlowGate already implement full dynamic top-k skill injection?
Not as a finished production feature in the current repository. The project includes skills-related configuration and extension hooks, but the README positions this today as an optimization direction and foundation rather than a fully shipped headline workflow.
Where do my logs and keys live?
Request metadata and catalog data live in local SQLite. Provider API keys are stored encrypted and unlocked only when needed.
Project Status
FlowGate is currently alpha and focused on shipping a tight local gateway experience for AI developers and small teams.
Current strengths:
- OpenAI-compatible local proxying
- encrypted provider credential handling
- integrated analytics and logs
- configurable smart routing
- model catalog sync
- local dashboard operations
Current gaps:
- no formal multi-node deployment story yet
- no official container or Helm distribution in this repository yet
- no guaranteed backward-compatibility promise across early releases
Roadmap
- OpenAI-compatible local proxy
- Encrypted provider vault
- Model catalog sync
- Smart router with rule-based and classifier modes
- Local analytics dashboard
- Richer cost attribution across agents and workflows
- Better router evaluation and decision explainability
- Production-ready packaging and deployment examples
- Stronger admin auth and multi-user operations
- More polished benchmarking, demos, and screenshots
Roadmap Direction
The current codebase suggests a clear next path for the project:
- broaden provider ergonomics for custom OpenAI-compatible gateways
- deepen router evaluation and cost/quality observability
- improve deployment and packaging workflows
- harden authentication and admin-session handling
- expand documentation and operator guidance
Contributing
Pull requests are welcome.
Good contribution areas:
- provider integrations and OpenAI-compatible endpoint handling
- routing strategy improvements and classifier evaluation
- analytics and dashboard polish
- packaging, deployment, and operator workflows
- documentation, examples, and benchmark material
If you are extending behavior, keep the docs aligned with the implementation and prefer changes that remain inspectable and local-first.
License
MIT