Robotics-aware inference orchestration on top of Ray Serve

Project description

Inferential

Multi-client inference orchestration on top of Ray Serve.

Inferential sits between your clients and your ML models. It receives observations over ZMQ, schedules inference requests using cadence-aware priority scoring, dispatches to Ray Serve, and streams results back — all with sub-millisecond transport overhead. Built for any scenario where multiple clients need concurrent access to shared models: robotics fleets, game agents, IoT devices, real-time ML pipelines.

Features

  • ZMQ transport — ROUTER/DEALER sockets with automatic reconnection and zero-copy tensor payloads
  • Pluggable schedulers — Deadline-aware (default), batch-optimized, priority-tiered, round-robin
  • Cadence learning — EMA-based tracking of each client's request pattern to predict urgency (see the sketch after this list)
  • Protobuf wire protocol — Typed tensor metadata (dtype, shape, encoding) with binary payload
  • Queue management — Request TTL, drop-oldest overflow policy, dispatch retry
  • In-memory metrics — Ring-buffer storage with label filtering and percentile stats (p50/p95/p99)
  • Lightweight client SDK — No Ray dependency; just pyzmq, protobuf, and numpy
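
To make the cadence-learning bullet above concrete, here is a minimal sketch of how an EMA cadence estimate can feed a priority score. This illustrates the idea only, not Inferential's actual internals; CadenceTracker, priority_score, and the 0.5/0.5 blend are invented for the example.

import time

class CadenceTracker:
    """Estimates a client's inter-request interval with an exponential
    moving average (EMA)."""

    def __init__(self, alpha: float = 0.2):
        self.alpha = alpha        # EMA smoothing factor
        self.ema_interval = 0.0   # estimated seconds between requests
        self.last_seen = None     # timestamp of the previous request

    def observe(self, now: float) -> None:
        if self.last_seen is not None:
            interval = now - self.last_seen
            if self.ema_interval:
                self.ema_interval = (
                    self.alpha * interval + (1 - self.alpha) * self.ema_interval
                )
            else:
                self.ema_interval = interval
        self.last_seen = now

def priority_score(tracker: CadenceTracker, urgency: float, now: float) -> float:
    """Higher score = dispatch sooner. Blends the client's declared urgency
    with how overdue the client is relative to its learned cadence."""
    if not tracker.ema_interval or tracker.last_seen is None:
        return urgency  # no cadence estimate yet; rank by urgency alone
    overdue = (now - tracker.last_seen) / tracker.ema_interval
    return 0.5 * urgency + 0.5 * min(overdue, 1.0)

# Example: observe two requests ~50 ms apart, then score a queued request.
tracker = CadenceTracker()
tracker.observe(time.time())
time.sleep(0.05)
tracker.observe(time.time())
print(priority_score(tracker, urgency=0.8, now=time.time()))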

Install

# Client SDK only
pip install inferential

# Server with Ray Serve
pip install inferential[server]

# Development
pip install inferential[dev]

Quick Start

Server

import asyncio
from inferential import Server

server = Server(bind="tcp://*:5555", models=["policy-v2"])

@server.on_metric
def log(name, value, labels):
    if name == "inference_latency_ms":
        print(f"Client {labels.get('client')}: {value:.1f}ms")

asyncio.run(server.run())

Client

import numpy as np
from inferential import Connection

conn = Connection(server="tcp://localhost:5555", client_id="agent-01", client_type="sensor")
model = conn.model("policy-v2", latency_budget_ms=30.0)

state = np.random.randn(7).astype(np.float32)
model.observe(urgency=0.8, state=state)

result = model.get_result(timeout_ms=50)
if result is not None:
    actions = result["actions"]  # np.ndarray
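
Continuing from the snippet above, a client typically runs observe/get_result inside a control loop: submit the freshest observation each tick and act on whatever result is ready, skipping ticks where inference has not returned yet. A minimal sketch; read_sensors and apply_actions are placeholders for your own code, not part of Inferential.

import time

def read_sensors():
    # placeholder: replace with your own sensor read
    return np.random.randn(7).astype(np.float32)

def apply_actions(actions):
    # placeholder: replace with your own actuator code
    print(actions)

# ~20 Hz loop: always send the newest state, consume results when ready.
while True:
    model.observe(urgency=0.5, state=read_sensors())
    result = model.get_result(timeout_ms=10)
    if result is not None:
        apply_actions(result["actions"])
    time.sleep(0.05)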

Documentation

  • Quick Start — Install, run server + client, get your first result
  • Architecture — System design, wire protocol, schedulers, queue management, metrics, configuration
  • Examples — Multi-client demos, metric callbacks, extending with custom models and schedulers (a toy scheduler sketch follows this list)
  • Contributing — Commit conventions, branching, code style, pre-commit hooks, releases
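
For a feel of what a custom scheduler might look like, here is a toy priority-tiered example. The pick_next interface and the PendingRequest type below are assumptions made for illustration; the real extension points are documented in Architecture and Examples.

from dataclasses import dataclass

@dataclass
class PendingRequest:
    # hypothetical stand-in for a queued inference request
    client_id: str
    tier: str = "normal"

class TieredScheduler:
    """Toy scheduler: always drain higher tiers before lower ones.
    The pick_next(pending) interface is assumed for this sketch."""

    def __init__(self, tiers=("high", "normal", "low")):
        self.tiers = tiers

    def pick_next(self, pending):
        for tier in self.tiers:
            for req in pending:
                if req.tier == tier:
                    return req
        return None

queue = [PendingRequest("cam-01", tier="low"), PendingRequest("arm-02", tier="high")]
assert TieredScheduler().pick_next(queue).client_id == "arm-02"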

Development

# Generate protobuf code
make proto

# Run tests
make test

# Lint
make lint

License

Apache-2.0

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

inferential-0.2.0.tar.gz (212.1 kB)

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

inferential-0.2.0-py3-none-any.whl (37.0 kB)

File details

Details for the file inferential-0.2.0.tar.gz.

File metadata

  • Download URL: inferential-0.2.0.tar.gz
  • Upload date:
  • Size: 212.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for inferential-0.2.0.tar.gz
Algorithm Hash digest
SHA256 807f7691b348ffcac7fb35be51c2bc50b362898a0e13a38f5f28c5f322e0ec94
MD5 3aca58c283eeb0352cf7b681f4b1c02c
BLAKE2b-256 3472fa7b3648487ada0361c824e83342a9690543b4e8fa46c51dc1a61f63b8bd

See more details on using hashes here.

File details

Details for the file inferential-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: inferential-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 37.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for inferential-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 c114920052902f8c3ccaba26d45a10b02ab304bfce5f6561f00a5404db7a8355
MD5 cb7889f78469abc83afc459b86c89cb2
BLAKE2b-256 3c229c2434e0871d813e7b1647cfcc7ed1edf4736778567dc1c2cfa594bf8dd0

See more details on using hashes here.
