
Inferential

Multi-client inference orchestration on top of Ray Serve.

Inferential sits between your clients and your ML models. It receives observations over ZMQ, schedules inference requests using cadence-aware priority scoring, dispatches to Ray Serve, and streams results back — all with sub-millisecond transport overhead. Built for any scenario where multiple clients need concurrent access to shared models: robotics fleets, game agents, IoT devices, real-time ML pipelines.
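Inferential's actual scoring internals aren't shown here, but the idea of "cadence-aware priority scoring" can be sketched: track each client's inter-arrival times with an EMA, then rank queued requests by client-supplied urgency scaled by remaining deadline slack. The class and function names below are illustrative, not Inferential's API.

```python
class CadenceTracker:
    """Illustrative sketch: EMA of a client's inter-arrival times,
    usable to predict when its next request is 'due'."""

    def __init__(self, alpha: float = 0.2):
        self.alpha = alpha          # EMA smoothing factor
        self.ema_interval = None    # smoothed inter-arrival time (s)
        self.last_seen = None       # timestamp of the previous request

    def observe(self, now: float) -> None:
        if self.last_seen is not None:
            interval = now - self.last_seen
            if self.ema_interval is None:
                self.ema_interval = interval
            else:
                self.ema_interval = (
                    self.alpha * interval
                    + (1 - self.alpha) * self.ema_interval
                )
        self.last_seen = now


def priority_score(urgency: float, deadline_ms: float, waited_ms: float) -> float:
    """Toy scoring rule: urgency grows in weight as a request
    burns through its latency budget."""
    slack = max(deadline_ms - waited_ms, 1e-3)  # time left before the deadline
    return urgency / slack                      # higher score = dispatch sooner
```

A scheduler built this way dispatches requests whose deadlines are nearly exhausted before requests with ample slack, even at equal urgency.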

(Diagram: Inferential data flow)

Features

  • ZMQ transport — ROUTER/DEALER sockets with automatic reconnection and zero-copy tensor payloads
  • Pluggable schedulers — Deadline-aware (default), batch-optimized, priority-tiered, round-robin
  • Cadence learning — EMA-based tracking of each client's request pattern to predict urgency
  • Protobuf wire protocol — Typed tensor metadata (dtype, shape, encoding) with binary payload
  • Queue management — Request TTL, drop-oldest overflow policy, dispatch retry
  • In-memory metrics — Ring-buffer storage with label filtering and percentile stats (p50/p95/p99)
  • Lightweight client SDK — No Ray dependency; just pyzmq, protobuf, and numpy
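The queue-management behavior listed above (request TTL plus a drop-oldest overflow policy) can be sketched in a few lines; this is an illustrative stand-in, not Inferential's internal queue class.

```python
import time
from collections import deque


class RequestQueue:
    """Sketch of a bounded queue with drop-oldest overflow and per-request TTL."""

    def __init__(self, maxlen: int = 64, ttl_s: float = 0.05):
        self.queue = deque()   # (enqueue_time, payload), oldest first
        self.maxlen = maxlen
        self.ttl_s = ttl_s

    def put(self, payload, now=None) -> None:
        now = time.monotonic() if now is None else now
        if len(self.queue) >= self.maxlen:
            self.queue.popleft()            # drop-oldest overflow policy
        self.queue.append((now, payload))

    def get(self, now=None):
        now = time.monotonic() if now is None else now
        while self.queue:
            enqueued, payload = self.queue.popleft()
            if now - enqueued <= self.ttl_s:
                return payload              # still fresh: dispatch it
            # expired: discard and keep scanning
        return None
```

Dropping the oldest request on overflow suits observation streams: a stale sensor reading is worth less than the one that just arrived.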

Install

# Client SDK only
pip install inferential

# Server with Ray Serve
pip install inferential[server]

# Development
pip install inferential[dev]

Quick Start

Server

import asyncio
from inferential import Server

server = Server(bind="tcp://*:5555", models=["policy-v2"])

@server.on_metric
def log(name, value, labels):
    if name == "inference_latency_ms":
        print(f"Client {labels.get('client')}: {value:.1f}ms")

asyncio.run(server.run())
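The `on_metric` callback above receives raw samples; the in-memory metrics feature (ring-buffer storage with p50/p95/p99 stats) can be approximated with a bounded deque and nearest-rank percentiles. This is a sketch of the concept, not Inferential's metrics implementation.

```python
from collections import deque


class MetricRing:
    """Sketch: fixed-capacity sample window with percentile summaries."""

    def __init__(self, capacity: int = 1024):
        self.samples = deque(maxlen=capacity)  # oldest samples evicted first

    def record(self, value: float) -> None:
        self.samples.append(value)

    def percentile(self, p: float) -> float:
        """Nearest-rank percentile over the retained window."""
        ordered = sorted(self.samples)
        if not ordered:
            return 0.0
        idx = min(int(p / 100 * len(ordered)), len(ordered) - 1)
        return ordered[idx]
```

A ring buffer keeps memory bounded regardless of uptime, at the cost of summarizing only the most recent window of samples.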

Client

import numpy as np
from inferential import Connection

conn = Connection(server="tcp://localhost:5555", client_id="agent-01", client_type="sensor")
model = conn.model("policy-v2", latency_budget_ms=30.0)

state = np.random.randn(7).astype(np.float32)
model.observe(urgency=0.8, state=state)

result = model.get_result(timeout_ms=50)
if result is not None:
    actions = result["actions"]  # np.ndarray

Documentation

  • Quick Start — Install, run server + client, get your first result
  • Architecture — System design, wire protocol, schedulers, queue management, metrics, configuration
  • Examples — Multi-client demos, metric callbacks, extending with custom models and schedulers
  • Contributing — Commit conventions, branching, code style, pre-commit hooks, releases

Development

# Generate protobuf code
make proto

# Run tests
make test

# Lint
make lint

License

Apache-2.0

Download files

Source Distribution

inferential-0.3.2.tar.gz (17.4 MB)

  • Uploaded via: twine/6.1.0 CPython/3.13.7
  • Uploaded using Trusted Publishing? No
  • SHA256: 997bbbba8be4ccc44da522c636f647135d636b7ea7f84ea9a1d2643b9503d7d6
  • MD5: 5ee6abbc6140d5d05b09f66b51818ee9
  • BLAKE2b-256: b8d18569cb395ae9d19c2d6af280d14232c8ae8084fb7ccdd388b32fed26026f

Built Distribution

inferential-0.3.2-py3-none-any.whl (37.1 kB)

  • Uploaded via: twine/6.1.0 CPython/3.13.7
  • Uploaded using Trusted Publishing? No
  • SHA256: ab295717689095537b41ae56d6b3179ecaec4dc6b964070bab33688ee21b5f32
  • MD5: 1cf46acbcb3ea4655063b7c884a4b513
  • BLAKE2b-256: b8675b23f1bad7c33ba8dc7ca9f925041af0ae34d53e925f9bb6b0e811b51f95
