SLA/QoS-aware reverse proxy for ML inference workloads (batching, routing, latency metrics).

mlproxy-py

mlproxy-py is a minimal ML inference reverse proxy with QoS-aware routing.

Designed for LLM / ML inference workloads where routing decisions should be based on latency, SLA targets, backend health, queue depth, and batching potential.

Features

Reverse proxy for JSON inference requests
Backends grouped into model pools
SLA-aware routing (chooses the backend with the lowest score, computed from latency and active request count)
Optional micro-batching (collect requests for N ms)
Concurrent health checks with connection pooling
Prometheus metrics (request count, latency, backend latency)
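To make the micro-batching feature above concrete, here is a minimal sketch of collecting requests for a short window before handing them to a backend in one call. It is not mlproxy-py's actual BatchQueue: the class name MicroBatcher, the handler callable, and the window_ms and max_size parameters are all hypothetical and chosen only for illustration.

```python
import asyncio


class MicroBatcher:
    """Hypothetical micro-batcher: gathers payloads for up to `window_ms`
    milliseconds (or until `max_size` items), then calls `handler` once."""

    def __init__(self, handler, window_ms=10, max_size=8):
        self.handler = handler           # async callable: list of payloads -> list of results
        self.window = window_ms / 1000   # collection window in seconds
        self.max_size = max_size
        self.queue = asyncio.Queue()

    async def submit(self, payload):
        # Each caller gets a future that resolves when its batch has been processed.
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((payload, fut))
        return await fut

    async def run(self):
        loop = asyncio.get_running_loop()
        while True:
            batch = [await self.queue.get()]        # block until the first item arrives
            deadline = loop.time() + self.window
            while len(batch) < self.max_size:
                remaining = deadline - loop.time()
                if remaining <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self.queue.get(), remaining))
                except asyncio.TimeoutError:
                    break                           # window elapsed, flush what we have
            try:
                results = await self.handler([p for p, _ in batch])
            except Exception as exc:
                # Propagate a handler failure to every caller in this batch.
                for _, fut in batch:
                    if not fut.done():
                        fut.set_exception(exc)
                continue
            for (_, fut), result in zip(batch, results):
                if not fut.done():
                    fut.set_result(result)


async def demo():
    sizes = []

    async def handler(payloads):
        sizes.append(len(payloads))                 # record how many items each batch held
        return [p.upper() for p in payloads]

    batcher = MicroBatcher(handler, window_ms=50, max_size=4)
    worker = asyncio.create_task(batcher.run())
    results = await asyncio.gather(*(batcher.submit(s) for s in ["a", "b", "c"]))
    worker.cancel()
    try:
        await worker
    except asyncio.CancelledError:
        pass
    return results, sizes
```

Running asyncio.run(demo()) sends three payloads through the batcher. The trade-off in this design is that every request may wait up to the full window, so the window has to stay small relative to the latency target.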

Quickstart

Install

pip install mlproxy-py

Run proxy

mlproxy run -c examples/config.yml

Send request

curl -X POST http://localhost:7000/infer/modelA \
  -H "Content-Type: application/json" \
  -d '{"text":"hello"}'
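The same request can be issued from Python. The sketch below uses only the standard library and mirrors the curl call above. The helper names build_infer_request and infer are hypothetical, and decoding the response as JSON is an assumption, since the response format is not documented here.

```python
import json
import urllib.request


def build_infer_request(model, payload, base_url="http://localhost:7000"):
    """Build (but do not send) a POST to /infer/{model} with a JSON body."""
    return urllib.request.Request(
        url=f"{base_url}/infer/{model}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def infer(model, payload, timeout=10.0):
    """Send the request to a running proxy and decode the reply.

    Assumes the proxy answers with a JSON body.
    """
    request = build_infer_request(model, payload)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the proxy running, infer("modelA", {"text": "hello"}) corresponds to the curl command above.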

Architecture

Client ──POST /infer/{model}──► FastAPI
                                    │
                          ┌─────────▼──────────┐
                          │  ModelRouter       │
                          │  choose_backend()  │
                          │  (score = latency  │
                          │   + active_req*5)  │
                          └─────────┬──────────┘
                                    │ backend URL
                          ┌─────────▼──────────┐
                          │  forward_json()    │
                          │  (httpx conn pool) │
                          └─────────┬──────────┘
                                    ▼
                            Backend ML server

       ┌──────────────────┐    ┌──────────────────┐
       │  BatchQueue      │    │  Healthcheck     │
       │  (optional per   │    │  (concurrent,    │
       │   model pool)    │    │   per-backend)   │
       └──────────────────┘    └──────────────────┘
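The scoring rule shown in the diagram (score = latency + active_req*5) can be sketched as follows. Only the formula comes from the diagram. The Backend dataclass, its field names, the millisecond unit, and the filtering of unhealthy backends are assumptions made for this illustration and may differ from the package's ModelRouter.

```python
from dataclasses import dataclass
from typing import List, Optional

ACTIVE_REQUEST_PENALTY = 5  # weight per in-flight request, taken from the diagram


@dataclass
class Backend:
    """Hypothetical view of a backend's routing state."""
    url: str
    latency_ms: float       # last observed latency (unit is an assumption)
    active_requests: int    # requests currently in flight
    healthy: bool = True


def score(backend: Backend) -> float:
    # Lower is better: latency plus a penalty for each active request.
    return backend.latency_ms + backend.active_requests * ACTIVE_REQUEST_PENALTY


def choose_backend(backends: List[Backend]) -> Optional[Backend]:
    # Assumption: unhealthy backends are skipped before scoring.
    candidates = [b for b in backends if b.healthy]
    if not candidates:
        return None
    return min(candidates, key=score)
```

For example, a backend at 20 ms with 4 active requests scores 40, while one at 30 ms with 1 active request scores 35, so the second is chosen even though its raw latency is higher.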

Config

See examples/config.yml.

Changelog

0.1.1

Lifespan pattern: Migrated from deprecated @app.on_event("startup") to FastAPI lifespan context manager.
Graceful shutdown: Batch workers and healthcheck loop are properly cancelled on shutdown.
Connection pooling: Shared httpx.AsyncClient singletons for proxy and healthcheck (previously a new client was created for every request and every check).
Concurrent health checks: Backends are checked in parallel via asyncio.gather (previously they were checked sequentially).
Logging: Added structured logging throughout; --log-level CLI option.
Exception handling fixes: Every except Exception block now re-raises asyncio.CancelledError.
Deprecated API fixes: Replaced asyncio.get_event_loop() with asyncio.get_running_loop() in batching module.
Build system: Migrated from setuptools to hatchling. Added classifiers, keywords, optional dev/test deps, ruff/pytest config.
Tests: Expanded from 1 test to 15+ tests covering config, router, batching, proxy, healthcheck, and backends.
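Two of the 0.1.1 changes, parallel health checks via asyncio.gather and re-raising asyncio.CancelledError, can be illustrated together. This is a sketch under stated assumptions rather than the package's code: check_one, check_all, and the probe callable are hypothetical names, and fake_probe stands in for a real HTTP call made through a shared httpx.AsyncClient.

```python
import asyncio


async def check_one(url, probe, timeout=2.0):
    """Return True if `probe(url)` succeeds within `timeout` seconds."""
    try:
        return bool(await asyncio.wait_for(probe(url), timeout))
    except asyncio.CancelledError:
        raise            # never swallow cancellation, so shutdown stays prompt
    except Exception:
        return False     # timeouts and connection errors mark the backend unhealthy


async def check_all(urls, probe):
    """Check every backend concurrently and map each URL to its health status."""
    results = await asyncio.gather(*(check_one(u, probe) for u in urls))
    return dict(zip(urls, results))


async def fake_probe(url):
    # Stand-in for a real HTTP health request (hypothetical).
    await asyncio.sleep(0.01)
    if "down" in url:
        raise ConnectionError("unreachable")
    return True
```

Because the checks run concurrently, the total time is bounded by the slowest single check rather than the sum of all checks. Calling asyncio.run(check_all(["http://a", "http://down"], fake_probe)) marks the first URL healthy and the second unhealthy.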

0.1.0

Initial release: JSON inference proxy, model pools, SLA-aware routing, micro-batching, health checks, Prometheus metrics.

License

MIT

Project details

This version

0.1.1

May 11, 2026

Download files

Source Distribution

mlproxy_py-0.1.1.tar.gz (10.9 kB)

Uploaded May 11, 2026 Source

Built Distribution

mlproxy_py-0.1.1-py3-none-any.whl (10.8 kB)

Uploaded May 11, 2026 Python 3

File details

Details for the file mlproxy_py-0.1.1.tar.gz.

File metadata

Download URL: mlproxy_py-0.1.1.tar.gz
Upload date: May 11, 2026
Size: 10.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for mlproxy_py-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`f7e46458f8e6784aeb9b6eca66a18917142d482e42c724449583622146544a75`
MD5	`f6437bbca98d219dcafae27d3ae52b01`
BLAKE2b-256	`6075eab5d2b9c807832afa51a197fd6c60347efae7c50233ec265b20d45ebb99`

File details

Details for the file mlproxy_py-0.1.1-py3-none-any.whl.

File metadata

Download URL: mlproxy_py-0.1.1-py3-none-any.whl
Upload date: May 11, 2026
Size: 10.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for mlproxy_py-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`1462ec15c8c6a32b52055dd2001b6d529d43eb7e4325c38d2a2d6b1f574d11fa`
MD5	`941cb5a9772c79e0656b552e2b1d334e`
BLAKE2b-256	`65838f1246c756fedacb0dccd34e62f4c92d9b2b8ad10ca5a77ba0abce8db6f6`
