Official Python WebSocket client for Triton Client Manager.
tcm-client SDK (WebSocket) — v2.0.0-GOLDEN ready
This SDK is the supported client for the Triton Client Manager WebSocket API.
It provides a typed, ergonomic interface for:
- authentication (auth)
- operational queries (info.*)
- management jobs (management.*)
- inference requests (inference)
Connection
- WebSocket endpoint: ws://<host>:<port>/ws
- Health endpoints: GET /health (liveness), GET /ready (readiness)
GET /ready may return 503 with a sanitized payload if core dependencies are not healthy (or if the probe itself fails). In that case, use error_id to correlate with server logs:
{
"status": "not_ready",
"reason": "readiness_probe_failed",
"detail": "internal_error",
"error_id": "..."
}
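For example, a client-side readiness gate can poll GET /ready and surface error_id on 503. A minimal sketch using only the standard library; wait_until_ready is an illustrative helper, not part of the SDK:

import json
import time
import urllib.error
import urllib.request


def wait_until_ready(base_url: str, timeout_s: float = 60.0, interval_s: float = 2.0) -> None:
    """Poll GET /ready until it returns 200; log error_id from 503 payloads."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/ready") as resp:
                if resp.status == 200:
                    return
        except urllib.error.HTTPError as exc:
            if exc.code == 503:
                body = json.loads(exc.read() or b"{}")
                # error_id lets you correlate this failure with server logs
                print("not ready:", body.get("reason"), "error_id:", body.get("error_id"))
            else:
                raise
        except urllib.error.URLError:
            pass  # manager not up yet (e.g. mid-restart); keep polling
        time.sleep(interval_s)
    raise TimeoutError("manager did not become ready within the timeout")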
Message envelope (wire contract)
All messages share the same top-level envelope:
{
"uuid": "client-uuid",
"type": "auth|info|management|inference",
"payload": {}
}
The server may also emit:
type="error"for system-level conditions (including shutdown)
Inference request payload (wire contract)
For type="inference", the payload must include:
- vm_ip (string) — required (routing target)
- container_id (string) — required (routing target)
- model_name (string) — required
- request.inputs (list) — required
Tensor inputs (JSON path)
The manager accepts two equivalent shapes in payload.request.inputs[*]:
- SDK-friendly: {name, shape, datatype, data}
- Manager/internal: {name, dims, type, value}
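Putting the contracts together, a complete inference envelope with the SDK-friendly input shape might look like this (routing values are illustrative):

inference_msg = {
    "uuid": "client-uuid",
    "type": "inference",
    "payload": {
        "vm_ip": "10.0.0.12",          # routing target (illustrative)
        "container_id": "triton-abc",  # routing target (illustrative)
        "model_name": "my-model",
        "request": {
            "inputs": [
                {   # SDK-friendly shape; equivalently {name, dims, type, value}
                    "name": "INPUT__0",
                    "shape": [1, 4],
                    "datatype": "FP32",
                    "data": [0.1, 0.2, 0.3, 0.4],
                }
            ]
        },
    },
}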
SHM inputs (zero‑copy metadata)
For SHM, each input item is an SHMReference dict:
{name, shm_key, offset, byte_size, shape, dtype}
Notes:
- SHM is currently supported only for HTTP inference.
- SHM is rejected for gRPC streaming requests.
Error handling model (v2.0.0-GOLDEN)
The manager has two main error shapes you must handle:
A) System-level errors (type="error")
These represent conditions where the manager cannot or will not process work.
SYSTEM_SHUTDOWN
During shutdown draining (SIGTERM / deployment restarts), the manager explicitly NACKs queued/in-flight work:
{
"type": "error",
"payload": {
"code": "SYSTEM_SHUTDOWN",
"message": "Manager is shutting down"
}
}
Client guidance
- Treat as a stop-the-world signal: do not retry immediately.
- Close the socket and reconnect with backoff.
- Resume work only after GET /ready returns ready again.
Operational detail:
- The manager enforces a hard 2.0s SIGTERM deadline for draining (best-effort). Plan for NACKs under deploy restarts.
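Put together, a reconnect loop might look like the sketch below; SystemShutdown and connect_and_work are illustrative names, and wait_until_ready is the readiness helper sketched earlier:

import asyncio
import random


class SystemShutdown(Exception):
    """Raised by your receive loop on a SYSTEM_SHUTDOWN error (illustrative)."""


async def run_with_reconnect(base_url: str, connect_and_work) -> None:
    """Reconnect with exponential backoff, gating each attempt on readiness."""
    backoff_s = 1.0
    while True:
        wait_until_ready(base_url)    # readiness helper above (blocking; fine for a sketch)
        try:
            await connect_and_work()  # your session: auth + inference loop
            backoff_s = 1.0           # clean exit: reset backoff
        except SystemShutdown:
            pass                      # stop-the-world signal: no immediate retry
        await asyncio.sleep(backoff_s + random.uniform(0, backoff_s / 2))
        backoff_s = min(backoff_s * 2, 30.0)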
B) Inference job failures (type="inference", payload.status="FAILED")
Inference responses always come back as:
{
"type": "inference",
"uuid": "client-uuid",
"payload": {
"status": "COMPLETED|FAILED",
"model_name": "my-model",
"data": {}
}
}
When status="FAILED", payload.data may be either:
- A typed Triton-facing error object (recommended contract):
{
"code": "TRITON_TIMEOUT",
"message": "[TritonThread] TRITON_TIMEOUT: model='my-model' retriable=True reason=Timeout",
"retriable": true,
"retry_after_seconds": 2
}
- A string for validation/contract errors (missing fields, unknown container, etc.):
"Missing required field 'vm_ip'"
Client guidance
- If payload.data is an object:
  - Use code + retriable to implement retry policy (do not parse message).
  - TRITON_TIMEOUT is retriable: retry with exponential backoff + jitter.
  - If retry_after_seconds is present, respect it.
- If payload.data is a string:
  - Treat as a client-side contract error (fix request formation).
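That policy, expressed as code (a sketch; only the payload fields shown above are assumed, and maybe_retry is an illustrative helper):

import asyncio
import random


async def maybe_retry(payload: dict, attempt: int, max_attempts: int = 5) -> bool:
    """Inspect a FAILED inference payload; sleep and return True if a retry is warranted."""
    data = payload.get("data")
    if isinstance(data, str):
        # Contract/validation error, e.g. "Missing required field 'vm_ip'"
        raise ValueError(f"fix request formation: {data}")
    if not data.get("retriable") or attempt >= max_attempts:
        return False  # fatal (retriable=false) or retry budget exhausted
    # Respect the server's hint when present, else exponential backoff + jitter
    delay = data.get("retry_after_seconds") or min(2 ** attempt, 30)
    await asyncio.sleep(delay + random.uniform(0, 1))
    return True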
Admission Control (413 Payload Too Large)
If the manager is configured with a payload budget (e.g. TCM_MAX_REQUEST_PAYLOAD_MB > 0), requests whose estimated decoded payload exceeds the limit fail fast with an error reason containing "413 Payload Too Large".
Example failure reason:
{
"code": "TRITON_INFERENCE_FAILED",
"message": "[TritonThread] TRITON_INFERENCE_FAILED: model='my-model' retriable=False reason=413 Payload Too Large: estimated_bytes=... limit_bytes=...",
"retriable": false
}
Client guidance
- This is not retriable as-is. Reduce tensor dimensions / datatype size.
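You can estimate the decoded size client-side before sending. A sketch; the manager's own estimate may differ in detail, and the dtype table here is abbreviated:

import math

DTYPE_BYTES = {"FP16": 2, "FP32": 4, "FP64": 8, "INT8": 1, "INT32": 4, "INT64": 8}


def estimated_bytes(inputs: list[dict]) -> int:
    """Rough decoded-tensor size for SDK-friendly JSON inputs."""
    return sum(math.prod(t["shape"]) * DTYPE_BYTES[t["datatype"]] for t in inputs)


# Usage: skip or shrink requests that would trip the budget
# if estimated_bytes(payload["request"]["inputs"]) > limit_bytes: ...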
Zero‑Copy Shared Memory (POSIX System SHM)
For large tensors, the recommended v2.0.0-GOLDEN path is to avoid sending tensor bytes over WebSocket JSON and instead:
- Write the tensor into POSIX shared memory (e.g. /dev/shm)
- Send an SHMReference object as the inference input payload (metadata only)
Capability negotiation
During auth, clients may request SHM support:
{
"uuid": "client-uuid",
"type": "auth",
"payload": {
"capability": ["json", "shm"]
}
}
- If the environment supports SHM, the manager replies with auth.ok and payload.capability including "shm".
- If the client does not send capability, the manager replies with the legacy shape {"type":"auth.ok"} (no payload) to avoid breaking older clients.
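In code, negotiation might look like this sketch, reusing the send_envelope helper from the envelope section (supports_shm is an illustrative name):

async def supports_shm(ws, client_uuid: str) -> bool:
    """Request SHM during auth; fall back to JSON tensors when absent."""
    reply = await send_envelope(ws, client_uuid, "auth", {"capability": ["json", "shm"]})
    capability = (reply.get("payload") or {}).get("capability", [])
    return "shm" in capability  # legacy servers reply {"type": "auth.ok"} with no payload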
SHMReference shape
Send SHM inputs in payload.request.inputs:
{
"name": "INPUT__0",
"shm_key": "/tcm_demo_input0",
"offset": 0,
"byte_size": 602112,
"shape": [1, 3, 224, 224],
"dtype": "FP32"
}
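Producing such a reference from a NumPy tensor might look like the sketch below, using multiprocessing.shared_memory (which is backed by /dev/shm on Linux); make_shm_input and the key name are illustrative:

from multiprocessing import shared_memory

import numpy as np


def make_shm_input(name: str, shm_key: str, array: np.ndarray, dtype: str) -> dict:
    """Copy a tensor into POSIX SHM and return its SHMReference dict."""
    # name="tcm_demo_input0" appears as /dev/shm/tcm_demo_input0
    shm = shared_memory.SharedMemory(name=shm_key.lstrip("/"), create=True, size=array.nbytes)
    view = np.ndarray(array.shape, dtype=array.dtype, buffer=shm.buf)
    view[:] = array  # only this copy touches tensor bytes; the wire carries metadata
    # NOTE: keep a handle to shm and unlink it once the inference completes
    return {
        "name": name,
        "shm_key": shm_key,
        "offset": 0,
        "byte_size": array.nbytes,
        "shape": list(array.shape),
        "dtype": dtype,  # Triton datatype string, e.g. "FP32"
    }


tensor = np.zeros((1, 3, 224, 224), dtype=np.float32)  # 602112 bytes
shm_input = make_shm_input("INPUT__0", "/tcm_demo_input0", tensor, "FP32")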
SHM error codes
- TRITON_SHM_UNAVAILABLE (fatal): SHM not supported or shm key missing/inaccessible.
- TRITON_SHM_REGISTRATION_FAILED (fatal): SHM registration failed on the Triton side.
Recommended retry policy (high-level)
- System errors (SYSTEM_SHUTDOWN): reconnect with backoff; wait for readiness.
- Retriable Triton errors (retriable=true): TRITON_TIMEOUT, TRITON_NETWORK, TRITON_OVERLOADED, TRITON_CIRCUIT_OPEN. Retry with exponential backoff + jitter; cap max attempts.
- Fatal Triton errors (retriable=false): do not retry; fix the request or intervene operationally (model/shape/config).
Install
python -m pip install --upgrade pip
python -m pip install tcm-client
Minimal usage example (Python)
import asyncio

from tcm_client import AuthContext, TcmWebSocketClient


async def main() -> None:
    uri = "ws://127.0.0.1:8000/ws"
    ctx = AuthContext(
        uuid="client-1",
        token="opaque-or-jwt-token",
        sub="user-123",
        tenant_id="tenant-abc",
        roles=["inference"],
    )
    async with TcmWebSocketClient(uri, ctx) as client:
        await client.auth()
        # Example: call your inference helper (depends on SDK surface)
        # resp = await client.infer_http(...)
        # if resp.status == "FAILED": handle as described above


if __name__ == "__main__":
    asyncio.run(main())
CLI
tcm-client-cli --uri "ws://127.0.0.1:8000/ws" queue-stats