
Robotics-aware inference orchestration on top of Ray Serve

Project description

Inferential

Multi-client inference orchestration on top of Ray Serve.

Inferential sits between your clients and your ML models. It receives observations over ZMQ, schedules inference requests using cadence-aware priority scoring, dispatches to Ray Serve, and streams results back — all with sub-millisecond transport overhead. Built for any scenario where multiple clients need concurrent access to shared models: robotics fleets, game agents, IoT devices, real-time ML pipelines.
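The wire protocol pairs typed tensor metadata (dtype, shape) with a raw binary payload. As a rough illustration of that layout — not the library's actual protobuf schema — here is a stdlib-only sketch that packs a dtype/shape header in front of the tensor bytes:

```python
import struct

# Illustrative framing only: the real wire protocol uses protobuf metadata
# followed by a binary payload. This sketch mimics the same idea with a
# length-prefixed "dtype|shape" header so the layout is concrete.

def pack_tensor(dtype: str, shape: tuple, payload: bytes) -> bytes:
    header = dtype.encode() + b"|" + ",".join(map(str, shape)).encode()
    return struct.pack("!I", len(header)) + header + payload

def unpack_tensor(frame: bytes):
    (hlen,) = struct.unpack_from("!I", frame)
    dtype, shape_s = frame[4:4 + hlen].decode().split("|")
    shape = tuple(int(d) for d in shape_s.split(",")) if shape_s else ()
    return dtype, shape, frame[4 + hlen:]

frame = pack_tensor("float32", (7,), bytes(28))  # 7 elements * 4 bytes
assert unpack_tensor(frame) == ("float32", (7,), bytes(28))
```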

(Diagram: Inferential data flow)

Features

  • ZMQ transport — ROUTER/DEALER sockets with automatic reconnection and zero-copy tensor payloads
  • Pluggable schedulers — Deadline-aware (default), batch-optimized, priority-tiered, round-robin
  • Cadence learning — EMA-based tracking of each client's request pattern to predict urgency
  • Protobuf wire protocol — Typed tensor metadata (dtype, shape, encoding) with binary payload
  • Queue management — Request TTL, drop-oldest overflow policy, dispatch retry
  • In-memory metrics — Ring-buffer storage with label filtering and percentile stats (p50/p95/p99)
  • Lightweight client SDK — No Ray dependency; just pyzmq, protobuf, and numpy
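The cadence-learning feature can be pictured as an EMA over inter-request intervals: the less slack a client has before its next expected request, the more urgent its pending work. The sketch below is hypothetical — CadenceTracker and its scoring formula are illustrative, not the library's implementation:

```python
import time

class CadenceTracker:
    """Illustrative EMA cadence tracker (not the library's actual class)."""

    def __init__(self, alpha: float = 0.2):
        self.alpha = alpha
        self.ema_interval_ms = None  # learned average gap between requests
        self.last_seen = None

    def record(self, now=None):
        """Fold the latest inter-request interval into the EMA."""
        now = time.monotonic() if now is None else now
        if self.last_seen is not None:
            interval_ms = (now - self.last_seen) * 1000.0
            if self.ema_interval_ms is None:
                self.ema_interval_ms = interval_ms
            else:
                self.ema_interval_ms = (
                    self.alpha * interval_ms
                    + (1 - self.alpha) * self.ema_interval_ms
                )
        self.last_seen = now

    def priority(self, latency_budget_ms: float, now=None) -> float:
        """Score in [0, 1]: higher means less slack before the next tick."""
        if self.ema_interval_ms is None or self.last_seen is None:
            return 0.5  # no history yet: neutral priority
        now = time.monotonic() if now is None else now
        elapsed_ms = (now - self.last_seen) * 1000.0
        slack_ms = self.ema_interval_ms - elapsed_ms - latency_budget_ms
        return max(0.0, min(1.0, 1.0 - slack_ms / self.ema_interval_ms))
```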

Install

# Client SDK only
pip install inferential

# Server with Ray Serve (quote extras so shells like zsh don't glob the brackets)
pip install "inferential[server]"

# Development
pip install "inferential[dev]"

Quick Start

Server

import asyncio
from inferential import Server

server = Server(bind="tcp://*:5555", models=["policy-v2"])

@server.on_metric
def log(name, value, labels):
    if name == "inference_latency_ms":
        print(f"Client {labels.get('client')}: {value:.1f}ms")

asyncio.run(server.run())
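The on_metric callback fires once per sample; the in-memory metrics feature aggregates such samples into p50/p95/p99 over a ring buffer. A stdlib-only sketch of that idea (RingBufferMetrics is hypothetical, not part of the SDK):

```python
from collections import deque

class RingBufferMetrics:
    """Hypothetical aggregator: keeps the most recent `capacity` samples
    per metric name and reports nearest-rank percentiles over them."""

    def __init__(self, capacity: int = 1024):
        self.capacity = capacity
        self.series = {}

    def record(self, name: str, value: float) -> None:
        # deque(maxlen=...) drops the oldest sample once full (ring buffer)
        buf = self.series.setdefault(name, deque(maxlen=self.capacity))
        buf.append(value)

    def percentile(self, name: str, q: float) -> float:
        # Nearest-rank percentile, q in [0, 100]
        samples = sorted(self.series[name])
        idx = min(len(samples) - 1, round(q / 100 * (len(samples) - 1)))
        return samples[idx]

m = RingBufferMetrics(capacity=512)
for v in [12.0, 15.0, 9.0, 30.0, 11.0]:
    m.record("inference_latency_ms", v)
assert m.percentile("inference_latency_ms", 50) == 12.0
assert m.percentile("inference_latency_ms", 95) == 30.0
```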

Client

import numpy as np
from inferential import Connection

conn = Connection(server="tcp://localhost:5555", client_id="agent-01", client_type="sensor")
model = conn.model("policy-v2", latency_budget_ms=30.0)

state = np.random.randn(7).astype(np.float32)
model.observe(urgency=0.8, state=state)

result = model.get_result(timeout_ms=50)
if result is not None:
    actions = result["actions"]  # np.ndarray

Documentation

  • Quick Start — Install, run server + client, get your first result
  • Architecture — System design, wire protocol, schedulers, queue management, metrics, configuration
  • Examples — Multi-client demos, metric callbacks, extending with custom models and schedulers
  • Contributing — Commit conventions, branching, code style, pre-commit hooks, releases

Development

# Generate protobuf code
make proto

# Run tests
make test

# Lint
make lint

License

Apache-2.0

Download files

Download the file for your platform.

Source Distribution

inferential-0.3.0.tar.gz (17.4 MB)


Built Distribution


inferential-0.3.0-py3-none-any.whl (37.1 kB)


File details

Details for the file inferential-0.3.0.tar.gz.

File metadata

  • Download URL: inferential-0.3.0.tar.gz
  • Upload date:
  • Size: 17.4 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for inferential-0.3.0.tar.gz:

  • SHA256: 0957bfbc5263f19ce1d25089bb703e73cc4ec92691ed3c59bbd7d3f892fa4ae4
  • MD5: bc7714d2901dcc5b6b398174daad0dd3
  • BLAKE2b-256: c5dedd5a6a771f2715797013aaa0fe664b97c11c44ff2dcc4cadad2c68920d8e

See more details on using hashes here.

File details

Details for the file inferential-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: inferential-0.3.0-py3-none-any.whl
  • Upload date:
  • Size: 37.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for inferential-0.3.0-py3-none-any.whl:

  • SHA256: 55a52e3a3aba1f286fa9aff8dbd4095b1fed8fff0aad459239429d80cdf19a12
  • MD5: ea056520b80a40d2ef202d2dd6ee2255
  • BLAKE2b-256: 154060b5e9ae3ef3cddcdb3cec3b79e022c3cd7d7e81dfe8be6060fbfb28bd6e

