
Robotics-aware inference orchestration on top of Ray Serve


Inferential Python SDK

Python client and server SDK for Inferential inference orchestration. The Python package includes both the client SDK (for sending observations and receiving results) and the server (Ray Serve-based scheduling and dispatch).

Install

# Client SDK only (pyzmq, protobuf, numpy)
pip install inferential

# Server with Ray Serve (quote the extra for shells like zsh)
pip install "inferential[server]"

# Development
pip install "inferential[dev]"

Quick Start

See the full Quick Start guide for step-by-step setup.

Server

import asyncio
import numpy as np
from ray import serve
from inferential import Server

@serve.deployment
class MockPolicy:
    def infer(self, obs: dict) -> dict:
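        # Size the action vector from the first 1-D array in the observation (default: 7).
        dim = 7
        for v in obs.values():
            if isinstance(v, np.ndarray) and v.ndim == 1:
                dim = v.shape[0]
                break
        return {"actions": np.random.randn(dim).astype(np.float32)}

# Deploy the policy as a Ray Serve application named "policy-v2".
serve.run(MockPolicy.bind(), name="policy-v2")

# Accept client connections on port 5555 and serve the "policy-v2" model.
server = Server(bind="tcp://*:5555", models=["policy-v2"])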

@server.on_metric
def log(name, value, labels):
    if name == "inference_latency_ms":
        print(f"Client {labels.get('client')}: {value:.1f}ms")

asyncio.run(server.run())
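The on_metric hook receives (name, value, labels) for each metric, as the log example above shows. A minimal sketch, assuming callbacks are plain callables invoked with those three arguments: a handler that collects inference_latency_ms per client and reports nearest-rank percentiles. LatencyTracker and its percentile method are hypothetical helpers, not part of the inferential API.

```python
import math
from collections import defaultdict


class LatencyTracker:
    """Hypothetical helper: collects inference_latency_ms samples per client."""

    def __init__(self) -> None:
        self.samples: dict[str, list[float]] = defaultdict(list)

    def __call__(self, name: str, value: float, labels: dict) -> None:
        # Same (name, value, labels) signature as the log() callback above.
        if name == "inference_latency_ms":
            self.samples[labels.get("client", "unknown")].append(float(value))

    def percentile(self, client: str, q: float) -> float:
        # Nearest-rank percentile (q in 0..100) over the recorded samples.
        data = sorted(self.samples.get(client, []))
        if not data:
            raise ValueError(f"no samples for client {client!r}")
        rank = max(1, math.ceil(q / 100 * len(data)))
        return data[rank - 1]
```

Since the decorator form is sugar for a call, registering an instance with server.on_metric(tracker) should behave the same way, assuming on_metric accepts any callable.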

Client (sync)

import numpy as np
from inferential import Connection

conn = Connection(server="tcp://localhost:5555", client_id="agent-01", client_type="franka")
model = conn.model("policy-v2", latency_budget_ms=30.0)

state = np.random.randn(7).astype(np.float32)
model.observe(urgency=0.8, state=state)

result = model.get_result(timeout_ms=50)
if result is not None:
    actions = result["actions"]  # np.ndarray

conn.close()
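Because get_result returns None on timeout, a fixed-rate control loop needs a fallback for ticks where no result arrives in time. A minimal sketch, assuming the model handle behaves as in the example above: hold the last received action on a timeout. run_control_loop, read_state, apply_action, and the 20 Hz default period are hypothetical; the model argument is duck-typed so the loop can be exercised with a stub.

```python
import time
from typing import Callable, Optional

import numpy as np


def run_control_loop(
    model,  # a Model from conn.model(...), or any object with observe/get_result
    read_state: Callable[[], np.ndarray],
    apply_action: Callable[[np.ndarray], None],
    steps: int,
    period_s: float = 0.05,  # hypothetical 20 Hz control rate
    timeout_ms: int = 30,
) -> int:
    """Hypothetical loop: run `steps` ticks, return how many fell back to the last action."""
    last_action: Optional[np.ndarray] = None
    fallbacks = 0
    for i in range(steps):
        tick_start = time.monotonic()
        model.observe(urgency=0.8, steps_remaining=steps - i, state=read_state())
        result = model.get_result(timeout_ms=timeout_ms)
        if result is not None:
            last_action = result["actions"]
        else:
            fallbacks += 1  # timed out: reuse the previous action, if any
        if last_action is not None:
            apply_action(last_action)
        # Sleep off whatever remains of the control period.
        time.sleep(max(0.0, period_s - (time.monotonic() - tick_start)))
    return fallbacks
```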

Client (async)

import asyncio
import numpy as np
from inferential import AsyncConnection

async def main():
    async with AsyncConnection(server="tcp://localhost:5555", client_id="agent-01") as conn:
        model = conn.model("policy-v2", latency_budget_ms=30.0)

        state = np.random.randn(7).astype(np.float32)
        await model.observe(urgency=0.8, state=state)

        result = await model.get_result(timeout_ms=50)
        if result is not None:
            actions = result["actions"]  # np.ndarray

asyncio.run(main())

API Reference

Connection(server, client_id, client_type, reconnect_ivl_ms=100, reconnect_max_ms=5000)

Creates a ZMQ DEALER connection to the server. The server address may be given with or without the tcp:// prefix.
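Two details of Connection lend themselves to a sketch. Accepting addresses with or without the tcp:// prefix amounts to a small normalization step. The reconnect_ivl_ms and reconnect_max_ms parameters read like libzmq's ZMQ_RECONNECT_IVL and ZMQ_RECONNECT_IVL_MAX options; assuming they map onto those, libzmq doubles the retry interval after each failed attempt until it reaches the maximum (plus a random jitter, omitted here). normalize_endpoint, reconnect_schedule, and open_dealer are hypothetical helpers, not the SDK's code.

```python
def normalize_endpoint(server: str) -> str:
    """Hypothetical: accept 'host:port' or 'tcp://host:port'.

    Assumption: addresses that already carry a scheme are passed through unchanged.
    """
    return server if "://" in server else f"tcp://{server}"


def reconnect_schedule(ivl_ms: int, max_ms: int, attempts: int) -> list[int]:
    """Nominal libzmq-style retry delays, ignoring libzmq's random jitter."""
    delays, current = [], ivl_ms
    for _ in range(attempts):
        delays.append(current)
        if max_ms > ivl_ms:  # libzmq only backs off when the max exceeds the base interval
            current = min(current * 2, max_ms)
    return delays


def open_dealer(server: str, reconnect_ivl_ms: int = 100, reconnect_max_ms: int = 5000):
    """Hypothetical: a DEALER socket configured with the assumed option mapping."""
    import zmq  # pyzmq, already a client dependency

    sock = zmq.Context.instance().socket(zmq.DEALER)
    sock.setsockopt(zmq.RECONNECT_IVL, reconnect_ivl_ms)
    sock.setsockopt(zmq.RECONNECT_IVL_MAX, reconnect_max_ms)
    sock.connect(normalize_endpoint(server))
    return sock
```

With the documented defaults (100 and 5000), the nominal delays grow 100, 200, 400, 800, 1600, 3200 ms and then stay at 5000 ms.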

AsyncConnection(server, client_id, client_type, ...)

Async variant built on zmq.asyncio.Context. Can be used as an async context manager (async with) for automatic cleanup.
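The async with support follows Python's asynchronous context manager protocol: __aenter__ returns the connection and __aexit__ closes it, including when the body raises. A minimal sketch of that contract, using a hypothetical FakeAsyncConnection as a stand-in for the real class (no sockets are opened):

```python
import asyncio


class FakeAsyncConnection:
    """Hypothetical stand-in illustrating the cleanup contract, not the SDK class."""

    def __init__(self, server: str, client_id: str) -> None:
        self.server = server
        self.client_id = client_id
        self.closed = False

    def close(self) -> None:
        # The real class would close its zmq.asyncio socket here.
        self.closed = True

    async def __aenter__(self) -> "FakeAsyncConnection":
        return self

    async def __aexit__(self, exc_type, exc, tb) -> bool:
        self.close()  # runs on normal exit and when the body raises
        return False  # do not suppress the exception


async def demo() -> FakeAsyncConnection:
    conn = FakeAsyncConnection("tcp://localhost:5555", "agent-01")
    try:
        async with conn:
            raise RuntimeError("simulated failure inside the block")
    except RuntimeError:
        pass
    return conn
```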

conn.model(model_id, latency_budget_ms=50.0, priority=1) → Model / AsyncModel

Creates a handle to a specific model on the server: a Model from Connection, an AsyncModel from AsyncConnection.

model.observe(urgency=0.0, steps_remaining=None, **kwargs)

Sends an observation to the server. Arguments are handled as follows:

  • np.ndarray values → serialized as tensors (dtype/shape preserved)
  • str values → passed as metadata key-value pairs
  • urgency (float, 0.0–1.0) → scheduling priority hint
  • steps_remaining (int) → number of steps remaining in the current trajectory

For example:
model.observe(
    urgency=0.5,
    steps_remaining=120,
    state_vector=np.zeros(7, dtype=np.float32),
    image=np.zeros((3, 224, 224), dtype=np.uint8),
    prompt="describe the scene",  # → metadata
)
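The dispatch rules above can be sketched as a pure function that splits keyword arguments into tensor and metadata groups. split_observation and restore_tensor are hypothetical; the real SDK serializes into protobuf, and its handling of values that are neither arrays nor strings (and of out-of-range urgency) is not documented here, so this sketch simply rejects them.

```python
import numpy as np


def split_observation(urgency: float = 0.0, steps_remaining=None, **kwargs) -> dict:
    """Hypothetical: mirror observe()'s dispatch rules without any networking."""
    if not 0.0 <= urgency <= 1.0:  # assumption: reject rather than clamp
        raise ValueError("urgency must be in [0.0, 1.0]")
    tensors, metadata = {}, {}
    for key, value in kwargs.items():
        if isinstance(value, np.ndarray):
            # Keep dtype and shape next to the raw C-order bytes.
            tensors[key] = {"dtype": value.dtype.str, "shape": value.shape, "data": value.tobytes()}
        elif isinstance(value, str):
            metadata[key] = value
        else:  # assumption: other types are unsupported
            raise TypeError(f"{key}: unsupported type {type(value).__name__}")
    return {"urgency": urgency, "steps_remaining": steps_remaining,
            "tensors": tensors, "metadata": metadata}


def restore_tensor(entry: dict) -> np.ndarray:
    """Rebuild an array from the dtype/shape/bytes triple produced above."""
    return np.frombuffer(entry["data"], dtype=np.dtype(entry["dtype"])).reshape(entry["shape"])
```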

model.get_result(timeout_ms=100) → dict | None

Waits up to timeout_ms for a response. Returns a dict mapping tensor keys to NumPy arrays, or None if the timeout expires. The dict also contains response_id, model_id, inference_latency_ms, and any metadata returned by the server.
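Because the returned dict mixes tensors with bookkeeping fields, callers often separate the two. A minimal sketch, assuming the tensor outputs are exactly the np.ndarray values and that inference_latency_ms is a number in milliseconds comparable to the latency_budget_ms passed to conn.model; unpack_result is a hypothetical helper.

```python
from typing import Optional

import numpy as np


def unpack_result(result: Optional[dict], latency_budget_ms: float):
    """Hypothetical: split a get_result() dict into (tensors, info, within_budget)."""
    if result is None:  # get_result timed out
        return {}, {}, False
    tensors = {k: v for k, v in result.items() if isinstance(v, np.ndarray)}
    info = {k: v for k, v in result.items() if not isinstance(v, np.ndarray)}
    latency = info.get("inference_latency_ms")
    within_budget = latency is not None and float(latency) <= latency_budget_ms
    return tensors, info, within_budget
```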

conn.close()

Closes the ZMQ socket. Called automatically by AsyncConnection.__aexit__.

Server Configuration

See Architecture for full details on schedulers, queue management, metrics, and configuration schema.

Documentation

  • Quick Start — Install, run server + client, get your first result
  • Architecture — System design, wire protocol, schedulers, metrics
  • Examples — Multi-language client demos, server extensions
  • Contributing — Commit conventions, branching, code style


Download files

Download the file for your platform.

Source Distribution

inferential-0.9.0.tar.gz (201.6 kB)

Uploaded Source

Built Distribution


inferential-0.9.0-py3-none-any.whl (32.9 kB)

Uploaded Python 3

File details

Details for the file inferential-0.9.0.tar.gz.

File metadata

  • Download URL: inferential-0.9.0.tar.gz
  • Upload date:
  • Size: 201.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for inferential-0.9.0.tar.gz
Algorithm Hash digest
SHA256 3d86d4e942418227c34a9d4720b0e9ba57fd9eebea44d5aa39bafbb89cbad36d
MD5 76c79da5ea580b03be1d0061f6bd6e5e
BLAKE2b-256 5bd44b1aeb48b1ad4fd5cb98f429c8f4c9476af7df7b3eabcabf2ca53e319b60


File details

Details for the file inferential-0.9.0-py3-none-any.whl.

File metadata

  • Download URL: inferential-0.9.0-py3-none-any.whl
  • Upload date:
  • Size: 32.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for inferential-0.9.0-py3-none-any.whl
Algorithm Hash digest
SHA256 f755030e1efa2803cf1837f87e9e233263ed74f06cd5d20bb4d113a5b1805ef5
MD5 64ceac3842fce165f5124e49025afabf
BLAKE2b-256 0c34bd48e5bb21cd73890a3d10554fdd88f0c9ac113271d6cab146fb15abee0b

