Robotics-aware inference orchestration on top of Ray Serve

Inferential Python SDK

Python client and server SDK for Inferential inference orchestration. The package contains both the client SDK, which sends observations and receives results, and the server, which handles scheduling and dispatch on Ray Serve.

Install

# Client SDK only (pyzmq, protobuf, numpy)
pip install inferential

# Server with Ray Serve (quotes keep shells such as zsh from expanding the brackets)
pip install "inferential[server]"

# Development
pip install "inferential[dev]"

Quick Start

See the full Quick Start guide for step-by-step setup.

Server

import asyncio
import numpy as np
from ray import serve
from inferential import Server

@serve.deployment
class MockPolicy:
    def infer(self, obs: dict) -> dict:
        dim = 7
        for v in obs.values():
            if isinstance(v, np.ndarray) and v.ndim == 1:
                dim = v.shape[0]
                break
        return {"actions": np.random.randn(dim).astype(np.float32)}

serve.run(MockPolicy.bind(), name="policy-v2")

server = Server(bind="tcp://*:5555", models=["policy-v2"])

@server.on_metric
def log(name, value, labels):
    if name == "inference_latency_ms":
        print(f"Client {labels.get('client')}: {value:.1f}ms")

asyncio.run(server.run())
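
The `on_metric` callback above receives a metric name, a value, and a labels dict. As a minimal sketch of what a handler with that signature could do, the snippet below aggregates latency per client. `LatencyTracker`, its `summary` method, and the `queue_depth` metric name are hypothetical and not part of Inferential; only `inference_latency_ms` and the `client` label come from the example above.

```python
from collections import defaultdict
from statistics import mean


class LatencyTracker:
    """Hypothetical helper: collects inference latencies per client."""

    def __init__(self):
        self._samples = defaultdict(list)

    def __call__(self, name, value, labels):
        # Same (name, value, labels) signature as the callback above.
        if name != "inference_latency_ms":
            return
        client = labels.get("client", "unknown")
        self._samples[client].append(float(value))

    def summary(self):
        # Mean, worst-case latency, and sample count for each client seen so far.
        return {
            client: {"mean_ms": mean(vals), "max_ms": max(vals), "count": len(vals)}
            for client, vals in self._samples.items()
        }


tracker = LatencyTracker()
tracker("inference_latency_ms", 12.0, {"client": "agent-01"})
tracker("inference_latency_ms", 18.0, {"client": "agent-01"})
tracker("queue_depth", 3, {"client": "agent-01"})  # made-up metric name, ignored
print(tracker.summary())
```

Whether `server.on_metric` accepts a callable object such as `tracker` rather than a plain function is not confirmed here; wrapping the call in a small decorated function would sidestep that question.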

Client (sync)

import numpy as np
from inferential import Connection

conn = Connection(server="tcp://localhost:5555", client_id="agent-01", client_type="franka")
model = conn.model("policy-v2", latency_budget_ms=30.0)

state = np.random.randn(7).astype(np.float32)
model.observe(urgency=0.8, state=state)

result = model.get_result(timeout_ms=50)
if result is not None:
    actions = result["actions"]  # np.ndarray

conn.close()

Client (async)

import asyncio
import numpy as np
from inferential import AsyncConnection

async def main():
    async with AsyncConnection(server="tcp://localhost:5555", client_id="agent-01", client_type="franka") as conn:
        model = conn.model("policy-v2", latency_budget_ms=30.0)

        state = np.random.randn(7).astype(np.float32)
        await model.observe(urgency=0.8, state=state)

        result = await model.get_result(timeout_ms=50)
        if result is not None:
            actions = result["actions"]  # np.ndarray

asyncio.run(main())

API Reference

`Connection(server, client_id, client_type, reconnect_ivl_ms=100, reconnect_max_ms=5000)`

Creates a ZMQ DEALER connection to the server. The server address may be given with or without the tcp:// prefix.
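
Accepting an address with or without the scheme implies some normalization step before the socket connects. The sketch below shows one way that could work; `normalize_endpoint` is a hypothetical name, and the SDK's actual handling may differ.

```python
def normalize_endpoint(server: str, default_scheme: str = "tcp") -> str:
    """Hypothetical helper: add a scheme when the address has none."""
    server = server.strip()
    if "://" in server:
        # Already has a scheme (tcp://, ipc://, ...): leave it untouched.
        return server
    return f"{default_scheme}://{server}"


print(normalize_endpoint("localhost:5555"))
print(normalize_endpoint("tcp://localhost:5555"))
```

Both calls yield the same endpoint string, which is the behavior the description above suggests.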

`AsyncConnection(server, client_id, client_type, ...)`

Async variant built on `zmq.asyncio.Context`. Supports `async with` for automatic cleanup.

`conn.model(model_id, latency_budget_ms=50.0, priority=1) → Model / AsyncModel`

Creates a handle to a specific model on the server.

`model.observe(urgency=0.0, steps_remaining=None, **kwargs)`

Sends an observation to the server. Arguments are dispatched by name and type:

- np.ndarray values → serialized as tensors (dtype/shape preserved)
- str values → passed as metadata key-value pairs
- urgency (float, 0.0–1.0) → scheduling priority hint
- steps_remaining (int) → remaining steps in trajectory

model.observe(
    urgency=0.5,
    steps_remaining=120,
    state_vector=np.zeros(7, dtype=np.float32),
    image=np.zeros((3, 224, 224), dtype=np.uint8),
    prompt="describe the scene",  # → metadata
)
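
The type-based dispatch described above can be pictured as a simple split of keyword arguments into tensors and metadata. This is only an illustrative sketch: `split_observation` is a hypothetical helper, and rejecting other types with `TypeError` is an assumption, since the source does not say how the SDK treats them.

```python
import numpy as np


def split_observation(**kwargs):
    """Hypothetical helper mirroring the dispatch rules listed above."""
    tensors, metadata = {}, {}
    for key, value in kwargs.items():
        if isinstance(value, np.ndarray):
            tensors[key] = value  # dtype and shape travel with the array
        elif isinstance(value, str):
            metadata[key] = value
        else:
            # Assumption: unsupported types are rejected; the real SDK may differ.
            raise TypeError(f"unsupported type for {key!r}: {type(value).__name__}")
    return tensors, metadata


tensors, metadata = split_observation(
    state_vector=np.zeros(7, dtype=np.float32),
    prompt="describe the scene",
)
print({k: (str(v.dtype), v.shape) for k, v in tensors.items()})
print(metadata)
```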

`model.get_result(timeout_ms=100) → dict | None`

Waits up to timeout_ms for a response. Returns a dict mapping tensor keys to NumPy arrays, or None on timeout. The dict also includes response_id, model_id, inference_latency_ms, and any metadata from the server.
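
Because a timeout is signaled by `None` rather than an exception, callers need to decide what to do when no result arrives. The sketch below retries a fixed number of times. `poll_result` and `FakeModel` are hypothetical; `FakeModel` stands in for a real model handle so the snippet runs without a server, and it returns a plain list where the SDK would return a NumPy array.

```python
class FakeModel:
    """Stand-in for a model handle; returns None until a result 'arrives'."""

    def __init__(self, misses):
        self._misses = misses

    def get_result(self, timeout_ms=100):
        if self._misses > 0:
            self._misses -= 1
            return None  # simulated timeout
        # Illustrative payload only; real results hold NumPy arrays.
        return {"actions": [0.0] * 7, "response_id": 1, "inference_latency_ms": 12.5}


def poll_result(model, attempts=3, timeout_ms=50):
    """Hypothetical helper: retry get_result until a dict arrives or attempts run out."""
    for _ in range(attempts):
        result = model.get_result(timeout_ms=timeout_ms)
        if result is not None:
            return result
    return None


print(poll_result(FakeModel(misses=2)))
print(poll_result(FakeModel(misses=5)))
```

In a real control loop, the fallback on `None` might be to repeat the last action or hold position; that choice depends on the robot and is outside what the SDK description covers.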

`conn.close()`

Closes the ZMQ socket. Called automatically by AsyncConnection.__aexit__.
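
For the sync client, which the text above does not describe as a context manager, one way to guarantee `close()` runs even when the control loop raises is the standard library's `contextlib.closing`. The sketch uses a hypothetical `FakeConnection` stand-in so it runs without a server; the same pattern should apply to any object exposing `close()`.

```python
from contextlib import closing


class FakeConnection:
    """Stand-in exposing the same close() method as the connection described above."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


conn = FakeConnection()
try:
    with closing(conn):
        raise RuntimeError("simulated failure in the control loop")
except RuntimeError:
    pass

print(conn.closed)
```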

Server Configuration

See Architecture for full details on schedulers, queue management, metrics, and configuration schema.

Documentation

- Quick Start: install, run server + client, get your first result
- Architecture: system design, wire protocol, schedulers, metrics
- Examples: multi-language client demos, server extensions
- Contributing: commit conventions, branching, code style

Release history

Version	Released
1.4.0	Mar 25, 2026
1.3.2	Mar 19, 2026
1.3.1	Mar 19, 2026
1.3.0	Mar 19, 2026
1.2.1	Mar 17, 2026
1.2.0 (this version)	Mar 17, 2026
1.0.1	Mar 14, 2026
0.9.1	Mar 14, 2026
0.9.0	Mar 14, 2026
0.8.0	Mar 14, 2026
0.7.0	Mar 14, 2026
0.6.0	Mar 14, 2026
0.5.0	Mar 14, 2026
0.4.0	Mar 14, 2026
0.3.3	Mar 12, 2026
0.3.2	Mar 12, 2026
0.3.1	Mar 12, 2026
0.3.0	Mar 12, 2026
0.2.0	Mar 11, 2026
0.1.0	Mar 11, 2026

Download files

Source Distribution

inferential-1.2.0.tar.gz (202.1 kB)

Uploaded Mar 17, 2026 Source

Built Distribution

inferential-1.2.0-py3-none-any.whl (33.3 kB)

Uploaded Mar 17, 2026 Python 3

File details

Details for the file inferential-1.2.0.tar.gz.

File metadata

Download URL: inferential-1.2.0.tar.gz
Upload date: Mar 17, 2026
Size: 202.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for inferential-1.2.0.tar.gz
Algorithm	Hash digest
SHA256	`33b5ea1db1bb3531c8bae1e77ba7af9dcaeb1fd53f6294f09bebffd0f16d60bd`
MD5	`3fae8e83e0c1a1dbd06ee785dde9d473`
BLAKE2b-256	`bdf6e33f8a5bc54eea0dbfddaa1b8b6b746f0585ff3ecce9bd94c81f1d684f34`

File details

Details for the file inferential-1.2.0-py3-none-any.whl.

File metadata

Download URL: inferential-1.2.0-py3-none-any.whl
Upload date: Mar 17, 2026
Size: 33.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for inferential-1.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`24472102045f0059cb574620bf60bd519b6f920f7242aef86de3701aba9679c0`
MD5	`a2ed96217ba1875c65297890ffd60737`
BLAKE2b-256	`5d06dcd58c3497a1ab33034c8bae10de515e8e9622f61d8b418732fe29620e9e`
