Robotics-aware inference orchestration on top of Ray Serve

Inferential

Multi-client inference orchestration on top of Ray Serve.

Inferential sits between your clients and your ML models. It receives observations over ZMQ, schedules inference requests using cadence-aware priority scoring, dispatches them to Ray Serve, and streams results back to the originating client, all with sub-millisecond transport overhead. It is built for any scenario where multiple clients need concurrent access to shared models: robotics fleets, game agents, IoT devices, and real-time ML pipelines.
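To make "cadence-aware priority scoring" concrete, here is a minimal sketch of one way such a score could be computed. It is not Inferential's actual scheduler: `CadenceTracker`, `priority_score`, the smoothing factor, and the weighting are all hypothetical names and values chosen for illustration. The idea is to keep an exponential moving average (EMA) of each client's request interval and treat a client as more urgent the closer it is to its next expected request.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CadenceTracker:
    """Tracks one client's request interval with an EMA (hypothetical helper)."""

    alpha: float = 0.2  # EMA smoothing factor; an assumed value, not a library default
    ema_interval_s: Optional[float] = None
    last_seen_s: Optional[float] = None

    def record(self, now_s: float) -> None:
        """Fold the gap since the previous request into the EMA."""
        if self.last_seen_s is not None:
            gap = now_s - self.last_seen_s
            if self.ema_interval_s is None:
                self.ema_interval_s = gap  # first observed gap seeds the average
            else:
                self.ema_interval_s = (
                    self.alpha * gap + (1.0 - self.alpha) * self.ema_interval_s
                )
        self.last_seen_s = now_s

    def urgency(self, now_s: float) -> float:
        """Fraction of the expected interval already elapsed, clamped to [0, 1]."""
        if self.ema_interval_s is None or self.last_seen_s is None:
            return 0.0  # not enough history to predict anything
        if self.ema_interval_s <= 0.0:
            return 0.0
        elapsed = now_s - self.last_seen_s
        return max(0.0, min(1.0, elapsed / self.ema_interval_s))


def priority_score(
    tracker: CadenceTracker,
    now_s: float,
    client_urgency: float,
    w_cadence: float = 0.5,
) -> float:
    """Blend learned cadence urgency with the urgency the client reported."""
    return w_cadence * tracker.urgency(now_s) + (1.0 - w_cadence) * client_urgency
```

A scheduler built this way would dispatch the pending request with the highest score first. The blend weight is a design choice: a higher `w_cadence` trusts the learned pattern more than the client's self-reported urgency.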

[Figure: Inferential data flow]

Features

  • ZMQ transport — ROUTER/DEALER sockets with automatic reconnection and zero-copy tensor payloads
  • Pluggable schedulers — Deadline-aware (default), batch-optimized, priority-tiered, round-robin
  • Cadence learning — EMA-based tracking of each client's request pattern to predict urgency
  • Protobuf wire protocol — Typed tensor metadata (dtype, shape, encoding) with binary payload
  • Queue management — Request TTL, drop-oldest overflow policy, dispatch retry
  • In-memory metrics — Ring-buffer storage with label filtering and percentile stats (p50/p95/p99)
  • Lightweight client SDK — No Ray dependency; just pyzmq, protobuf, and numpy
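As an illustration of the queue management feature listed above, the following sketch shows how request TTL and a drop-oldest overflow policy can work together. It is an assumption-laden example rather than Inferential's internal queue: `BoundedRequestQueue`, `QueuedRequest`, and the counter names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Any, Deque, Optional


@dataclass
class QueuedRequest:
    payload: Any
    enqueued_at_s: float  # timestamp used for the TTL check


class BoundedRequestQueue:
    """FIFO queue with a size cap (drop-oldest) and a per-request TTL."""

    def __init__(self, max_size: int, ttl_s: float) -> None:
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self._items: Deque[QueuedRequest] = deque()
        self._max_size = max_size
        self._ttl_s = ttl_s
        self.dropped = 0  # requests evicted because the queue was full
        self.expired = 0  # requests discarded because they outlived the TTL

    def push(self, payload: Any, now_s: float) -> None:
        """Enqueue a request, evicting the oldest one if the queue is full."""
        if len(self._items) >= self._max_size:
            self._items.popleft()
            self.dropped += 1
        self._items.append(QueuedRequest(payload, now_s))

    def pop(self, now_s: float) -> Optional[Any]:
        """Return the oldest request that is still within its TTL, or None."""
        while self._items:
            request = self._items.popleft()
            if now_s - request.enqueued_at_s <= self._ttl_s:
                return request.payload
            self.expired += 1  # stale: skip it and try the next one
        return None

    def __len__(self) -> int:
        return len(self._items)
```

Dropping the oldest entry favors fresh observations, which suits control loops where a stale observation is worth less than a new one. Checking the TTL lazily at pop time avoids needing a background sweeper.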

Install

# Client SDK only
pip install inferential

# Server with Ray Serve (quotes keep shells such as zsh from expanding the brackets)
pip install "inferential[server]"

# Development
pip install "inferential[dev]"

Quick Start

Server

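import asyncio
from inferential import Server

# Bind the ZMQ endpoint and declare which models this server exposes.
server = Server(bind="tcp://*:5555", models=["policy-v2"])

# Register a callback that is invoked for each recorded metric sample.
@server.on_metric
def log(name, value, labels):
    if name == "inference_latency_ms":
        print(f"Client {labels.get('client')}: {value:.1f}ms")

# Run the server's event loop until it is stopped.
asyncio.run(server.run())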
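The callback above receives individual samples. For readers curious how the ring-buffer storage with label filtering and percentile stats mentioned under Features might look, here is a small sketch. It is not the library's metrics store: `MetricRing` and its methods are hypothetical, and the nearest-rank percentile method is an assumed choice.

```python
import math
from collections import deque
from typing import Deque, Dict, Optional, Tuple


class MetricRing:
    """Fixed-capacity sample store; the oldest samples are overwritten first."""

    def __init__(self, capacity: int = 1024) -> None:
        # deque(maxlen=...) discards from the left once capacity is reached.
        self._samples: Deque[Tuple[float, Dict[str, str]]] = deque(maxlen=capacity)

    def record(self, value: float, labels: Optional[Dict[str, str]] = None) -> None:
        """Store one sample together with a copy of its labels."""
        self._samples.append((float(value), dict(labels or {})))

    def percentile(self, q: float, **label_filter: str) -> Optional[float]:
        """Nearest-rank percentile over samples whose labels match the filter."""
        values = sorted(
            value
            for value, labels in self._samples
            if all(labels.get(key) == want for key, want in label_filter.items())
        )
        if not values:
            return None
        rank = max(1, math.ceil(q * len(values) / 100))  # 1-based rank
        return values[rank - 1]
```

For example, after recording latencies labeled with `client`, calling `ring.percentile(95, client="agent-01")` would return the p95 for that client only, computed over whatever samples are still in the buffer.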

Client

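import numpy as np
from inferential import Connection

# Connect to the server and identify this client.
conn = Connection(server="tcp://localhost:5555", client_id="agent-01", client_type="sensor")
# Get a handle to a served model, with a per-request latency budget.
model = conn.model("policy-v2", latency_budget_ms=30.0)

# Send one observation; tensors are passed as keyword arguments.
state = np.random.randn(7).astype(np.float32)
model.observe(urgency=0.8, state=state)

# Wait up to 50 ms for the result and skip processing if none is returned.
result = model.get_result(timeout_ms=50)
if result is not None:
    actions = result["actions"]  # np.ndarray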

Documentation

  • Quick Start — Install, run server + client, get your first result
  • Architecture — System design, wire protocol, schedulers, queue management, metrics, configuration
  • Examples — Multi-client demos, metric callbacks, extending with custom models and schedulers
  • Contributing — Commit conventions, branching, code style, pre-commit hooks, releases

Development

# Generate protobuf code
make proto

# Run tests
make test

# Lint
make lint

License

Apache-2.0
