Robotics-aware inference orchestration on top of Ray Serve
Inferential Python SDK
Python client and server SDK for Inferential inference orchestration. The Python package includes both the client SDK (for sending observations and receiving results) and the server (Ray Serve-based scheduling and dispatch).
Install
```shell
# Client SDK only (pyzmq, protobuf, numpy)
pip install inferential
# Server with Ray Serve (quoted so shells such as zsh do not expand the brackets)
pip install "inferential[server]"
# Development
pip install "inferential[dev]"
```
Quick Start
See the full Quick Start guide for step-by-step setup.
Server
```python
import asyncio

import numpy as np
from ray import serve

from inferential import Server


@serve.deployment
class MockPolicy:
    def infer(self, obs: dict) -> dict:
        # Match the action size to the first 1-D array in the observation;
        # fall back to 7 when there is none.
        dim = 7
        for v in obs.values():
            if isinstance(v, np.ndarray) and v.ndim == 1:
                dim = v.shape[0]
                break
        return {"actions": np.random.randn(dim).astype(np.float32)}


# Deploy the policy under the same name that is passed to the server below.
serve.run(MockPolicy.bind(), name="policy-v2")
server = Server(bind="tcp://*:5555", models=["policy-v2"])


@server.on_metric
def log(name, value, labels):
    if name == "inference_latency_ms":
        print(f"Client {labels.get('client')}: {value:.1f}ms")


asyncio.run(server.run())
```
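The `on_metric` callback above prints each latency sample as it arrives. As a minimal sketch of aggregating those samples per client instead, the helper below reuses the same `(name, value, labels)` signature. `LatencyStats` and the `queue_depth` metric name are hypothetical and not part of the SDK; only `inference_latency_ms` and the `client` label appear in the example above.

```python
from collections import defaultdict
from statistics import mean


class LatencyStats:
    """Hypothetical helper: collects inference latencies per client."""

    def __init__(self):
        self._samples = defaultdict(list)

    def record(self, name, value, labels):
        # Same (name, value, labels) signature as the on_metric callback.
        if name != "inference_latency_ms":
            return
        client = labels.get("client", "unknown")
        self._samples[client].append(float(value))

    def summary(self):
        # Per-client sample count, mean, and maximum latency in milliseconds.
        return {
            client: {"count": len(v), "mean_ms": mean(v), "max_ms": max(v)}
            for client, v in self._samples.items()
        }


stats = LatencyStats()
stats.record("inference_latency_ms", 12.0, {"client": "agent-01"})
stats.record("inference_latency_ms", 18.0, {"client": "agent-01"})
stats.record("queue_depth", 3, {"client": "agent-01"})  # hypothetical metric, ignored
print(stats.summary())
```

If `on_metric` behaves like a plain decorator, as its use in the Quick Start suggests, registering the helper would presumably be `server.on_metric(stats.record)`; that call form is an assumption.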
Client (sync)
```python
import numpy as np
from inferential import Connection

conn = Connection(server="tcp://localhost:5555", client_id="agent-01", client_type="franka")
model = conn.model("policy-v2", latency_budget_ms=30.0)

state = np.random.randn(7).astype(np.float32)
model.observe(urgency=0.8, state=state)

result = model.get_result(timeout_ms=50)  # None on timeout
if result is not None:
    actions = result["actions"]  # np.ndarray

conn.close()
```
Client (async)
```python
import asyncio

import numpy as np
from inferential import AsyncConnection


async def main():
    # The context manager closes the connection on exit.
    async with AsyncConnection(server="tcp://localhost:5555", client_id="agent-01") as conn:
        model = conn.model("policy-v2", latency_budget_ms=30.0)

        state = np.random.randn(7).astype(np.float32)
        await model.observe(urgency=0.8, state=state)

        result = await model.get_result(timeout_ms=50)  # None on timeout
        if result is not None:
            actions = result["actions"]  # np.ndarray


asyncio.run(main())
```
API Reference
Connection(server, client_id, client_type, reconnect_ivl_ms=100, reconnect_max_ms=5000)
Creates a ZMQ DEALER connection to the server. The server address may be given with or without the tcp:// prefix.
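As an illustration of accepting the address in either form, here is a minimal sketch of how such normalization could work. `normalize_address` is a hypothetical helper written for this example; the SDK's own handling is not shown in this document and may differ.

```python
def normalize_address(server: str, default_scheme: str = "tcp") -> str:
    """Return a ZMQ-style endpoint, adding a scheme when none is given."""
    server = server.strip()
    if "://" in server:
        # A scheme is already present; leave the address untouched.
        return server
    return f"{default_scheme}://{server}"


print(normalize_address("localhost:5555"))        # tcp://localhost:5555
print(normalize_address("tcp://localhost:5555"))  # tcp://localhost:5555
```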
AsyncConnection(server, client_id, client_type, ...)
Async variant using zmq.asyncio.Context. Supports async with for automatic cleanup.
conn.model(model_id, latency_budget_ms=50.0, priority=1) → Model / AsyncModel
Creates a handle to a specific model on the server.
model.observe(urgency=0.0, steps_remaining=None, **kwargs)
Sends an observation to the server. Keyword arguments are automatically dispatched:
- np.ndarray values → serialized as tensors (dtype/shape preserved)
- str values → passed as metadata key-value pairs
- urgency (float, 0.0–1.0) → scheduling priority hint
- steps_remaining (int) → remaining steps in trajectory
```python
model.observe(
    urgency=0.5,
    steps_remaining=120,
    state_vector=np.zeros(7, dtype=np.float32),
    image=np.zeros((3, 224, 224), dtype=np.uint8),
    prompt="describe the scene",  # → metadata
)
```
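To make the dispatch rules listed above concrete, the sketch below routes keyword arguments by type and shows one way dtype and shape can survive a round trip through raw bytes. `split_observation` and `restore_tensor` are hypothetical helpers, the dict layout is invented for illustration, and raising `TypeError` for other value types is an assumption. The SDK's actual wire format is not described here and may differ.

```python
import numpy as np


def split_observation(**kwargs):
    """Hypothetical helper: route kwargs by type into tensors and metadata."""
    tensors, metadata = {}, {}
    for key, value in kwargs.items():
        if isinstance(value, np.ndarray):
            # Keep dtype and shape next to the raw bytes so the array
            # can be rebuilt exactly on the receiving side.
            tensors[key] = {
                "dtype": str(value.dtype),
                "shape": value.shape,
                "data": np.ascontiguousarray(value).tobytes(),
            }
        elif isinstance(value, str):
            metadata[key] = value
        else:
            # Assumption: anything else is rejected in this sketch.
            raise TypeError(f"unsupported type for {key!r}: {type(value).__name__}")
    return tensors, metadata


def restore_tensor(entry):
    """Rebuild an array from an entry produced by split_observation."""
    flat = np.frombuffer(entry["data"], dtype=entry["dtype"])
    return flat.reshape(entry["shape"])


tensors, metadata = split_observation(
    state_vector=np.arange(7, dtype=np.float32),
    prompt="describe the scene",
)
restored = restore_tensor(tensors["state_vector"])
print(restored.dtype, restored.shape, metadata)
```

Because `urgency` and `steps_remaining` are named parameters of `observe`, they would not reach a `**kwargs` router like this one.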
model.get_result(timeout_ms=100) → dict | None
Waits for a response. Returns None on timeout; otherwise returns a dict mapping tensor keys to NumPy arrays. The dict also includes response_id, model_id, inference_latency_ms, and any metadata from the server.
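Since a timeout returns None rather than raising, callers may want to poll in short slices up to an overall deadline. The following is a minimal sketch of that pattern. `wait_for_result` and `FakeModel` are hypothetical; `FakeModel` only mimics the `get_result(timeout_ms=...)` call documented above so the example runs without a server.

```python
import time


def wait_for_result(model, total_timeout_ms=200, poll_ms=50):
    """Hypothetical helper: poll get_result until a result arrives or time runs out."""
    deadline = time.monotonic() + total_timeout_ms / 1000.0
    while True:
        remaining_ms = (deadline - time.monotonic()) * 1000.0
        if remaining_ms <= 0:
            return None
        # Never wait past the overall deadline; use at least 1 ms per poll.
        slice_ms = max(1, int(min(poll_ms, remaining_ms)))
        result = model.get_result(timeout_ms=slice_ms)
        if result is not None:
            return result


class FakeModel:
    """Stand-in for a model handle so the sketch runs without a server."""

    def __init__(self, misses):
        self.misses = misses

    def get_result(self, timeout_ms=100):
        if self.misses > 0:
            self.misses -= 1
            return None  # simulate a timeout
        return {"actions": [0.0] * 7, "model_id": "policy-v2"}


result = wait_for_result(FakeModel(misses=2))
print(result["model_id"] if result else "timed out")
```

In a control loop, the None branch is where a fallback such as repeating the previous action would go.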
conn.close()
Closes the ZMQ socket. Called automatically by AsyncConnection.__aexit__.
Server Configuration
See Architecture for full details on schedulers, queue management, metrics, and configuration schema.
Documentation
- Quick Start — Install, run server + client, get your first result
- Architecture — System design, wire protocol, schedulers, metrics
- Examples — Multi-language client demos, server extensions
- Contributing — Commit conventions, branching, code style
Download files
File details
Details for the file inferential-0.6.0.tar.gz.
File metadata
- Download URL: inferential-0.6.0.tar.gz
- Upload date:
- Size: 201.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bde5cb7e981391e391e77c7fb525023046aa4694debf1bacefd4a036ad4f3b5f |
| MD5 | 70f66efb50e89ef432f1bc2505548207 |
| BLAKE2b-256 | a7e58d593ec048de9ca0700f2e08a16408c1cc9507dd99c80405e26581b173d0 |
File details
Details for the file inferential-0.6.0-py3-none-any.whl.
File metadata
- Download URL: inferential-0.6.0-py3-none-any.whl
- Upload date:
- Size: 32.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 16b2dddff2b80c510706c9bf3dc848216f20ad9e2b6aff13beae182ed31ac3d3 |
| MD5 | be50795bda86ef619305e9d30178b204 |
| BLAKE2b-256 | 94e15e73c99f0230da7cea731a31120f190cbb9fbb6686558124fff58791e793 |