
Official Python WebSocket client for Triton Client Manager.


tcm-client SDK (WebSocket) — v2.0.0-GOLDEN ready

This SDK is the supported client for the Triton Client Manager WebSocket API.

It provides a typed, ergonomic interface for:

  • authentication (auth)
  • operational queries (info.*)
  • management jobs (management.*)
  • inference requests (inference)

Connection

  • WebSocket endpoint: ws://<host>:<port>/ws
  • Health endpoints:
    • GET /health (liveness)
    • GET /ready (readiness)

GET /ready may return 503 with a sanitized payload if core dependencies are not healthy (or if the probe itself fails). In that case, use error_id to correlate server logs:

{
  "status": "not_ready",
  "reason": "readiness_probe_failed",
  "detail": "internal_error",
  "error_id": "..."
}
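Clients can gate reconnection on readiness by polling `GET /ready`. A minimal sketch using only the standard library (the base URL and timing defaults are illustrative):

```python
import json
import time
import urllib.error
import urllib.request


def wait_until_ready(base_url: str, timeout_s: float = 60.0, interval_s: float = 2.0) -> bool:
    """Poll GET /ready until it returns 200, or give up after timeout_s."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/ready") as resp:
                if resp.status == 200:
                    return True
        except urllib.error.HTTPError as exc:
            if exc.code == 503:
                # Sanitized not-ready payload; surface error_id for log correlation.
                body = json.loads(exc.read() or b"{}")
                print("not ready, error_id:", body.get("error_id"))
            else:
                raise
        except urllib.error.URLError:
            pass  # manager not reachable yet
        time.sleep(interval_s)
    return False
```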

Message envelope (wire contract)

All messages share the same top-level envelope:

{
  "uuid": "client-uuid",
  "type": "auth|info|management|inference",
  "payload": {}
}

The server may also emit:

  • type="error" for system-level conditions (including shutdown)
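A client can enforce the envelope shape with a small helper; a sketch (the `make_envelope` function is illustrative, not part of the SDK surface):

```python
import json

# The four client-sent message types from the wire contract.
VALID_TYPES = {"auth", "info", "management", "inference"}


def make_envelope(uuid: str, msg_type: str, payload: dict) -> str:
    """Serialize a message in the shared top-level envelope."""
    if msg_type not in VALID_TYPES:
        raise ValueError(f"unknown message type: {msg_type!r}")
    return json.dumps({"uuid": uuid, "type": msg_type, "payload": payload})
```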

Inference request payload (wire contract)

For type="inference", the payload must include:

  • vm_ip (string) — required (routing target)
  • container_id (string) — required (routing target)
  • model_name (string) — required
  • request.inputs (list) — required
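Putting the required fields together, a payload can be assembled like this (a sketch; the helper name is illustrative):

```python
def build_inference_payload(
    vm_ip: str, container_id: str, model_name: str, inputs: list[dict]
) -> dict:
    """Assemble an inference payload with all required routing and model fields."""
    if not inputs:
        raise ValueError("request.inputs must be a non-empty list")
    return {
        "vm_ip": vm_ip,
        "container_id": container_id,
        "model_name": model_name,
        "request": {"inputs": inputs},
    }
```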

Tensor inputs (JSON path)

The manager accepts two equivalent shapes in payload.request.inputs[*]:

  • SDK-friendly: {name, shape, datatype, data}
  • Manager/internal: {name, dims, type, value}
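Because the two shapes are equivalent, a client can normalize to the manager/internal keys with a simple field mapping; a sketch:

```python
def to_manager_shape(sdk_input: dict) -> dict:
    """Convert an SDK-friendly input {name, shape, datatype, data}
    to the manager/internal shape {name, dims, type, value}."""
    return {
        "name": sdk_input["name"],
        "dims": sdk_input["shape"],
        "type": sdk_input["datatype"],
        "value": sdk_input["data"],
    }
```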

SHM inputs (zero‑copy metadata)

For SHM, each input item is an SHMReference dict:

  • {name, shm_key, offset, byte_size, shape, dtype}

Notes:

  • SHM is currently supported only for HTTP inference.
  • SHM is rejected for gRPC streaming requests.

Error handling model (v2.0.0-GOLDEN)

The manager has two main error shapes you must handle:

A) System-level errors (type="error")

These represent conditions where the manager cannot or will not process work.

SYSTEM_SHUTDOWN

During shutdown draining (SIGTERM / deployment restarts), the manager explicitly NACKs queued/in-flight work:

{
  "type": "error",
  "payload": {
    "code": "SYSTEM_SHUTDOWN",
    "message": "Manager is shutting down"
  }
}

Client guidance

  • Treat as a stop-the-world signal: do not retry immediately.
  • Close the socket and reconnect with backoff.
  • Resume work only after GET /ready returns ready again.

Operational detail:

  • The manager enforces a hard 2.0 s drain deadline after SIGTERM (draining is best-effort). Plan for NACKed work during deploy restarts.
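The guidance above can be sketched as a reconnect loop with exponential backoff and jitter (the `connect` and `ready_check` callables are assumed to be supplied by the application):

```python
import random
import time


def reconnect_with_backoff(connect, ready_check, max_attempts: int = 6):
    """After SYSTEM_SHUTDOWN: back off, wait for readiness, then reconnect."""
    delay = 1.0
    for _attempt in range(max_attempts):
        time.sleep(delay + random.uniform(0, delay))  # full jitter
        if ready_check():  # e.g. GET /ready returned 200
            return connect()
        delay = min(delay * 2, 30.0)  # cap the backoff
    raise RuntimeError("manager did not become ready in time")
```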

B) Inference job failures (type="inference", payload.status="FAILED")

Inference responses always come back as:

{
  "type": "inference",
  "uuid": "client-uuid",
  "payload": {
    "status": "COMPLETED|FAILED",
    "model_name": "my-model",
    "data": {}
  }
}

When status="FAILED", payload.data may be either:

  1. A typed Triton-facing error object (recommended contract):
{
  "code": "TRITON_TIMEOUT",
  "message": "[TritonThread] TRITON_TIMEOUT: model='my-model' retriable=True reason=Timeout",
  "retriable": true,
  "retry_after_seconds": 2
}
  2. A string for validation/contract errors (missing fields, unknown container, etc.):
"Missing required field 'vm_ip'"

Client guidance

  • If payload.data is an object:
    • Use code + retriable to implement retry policy (do not parse message).
    • TRITON_TIMEOUT is retriable: retry with exponential backoff + jitter.
    • If retry_after_seconds is present, respect it.
  • If payload.data is a string:
    • Treat as a client-side contract error (fix request formation).
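That branching can be expressed directly; a sketch using only the `code`, `retriable`, and `retry_after_seconds` fields from the contract above:

```python
def classify_failure(data):
    """Return ('retry', delay) for retriable typed errors, ('fatal', None) for
    non-retriable ones, and ('contract_error', None) for string payloads."""
    if isinstance(data, str):
        return ("contract_error", None)  # fix request formation; do not retry
    if data.get("retriable"):
        # Respect retry_after_seconds when present; caller picks a default otherwise.
        return ("retry", data.get("retry_after_seconds"))
    return ("fatal", None)
```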

Admission Control (413 Payload Too Large)

If the manager is configured with a payload budget (e.g. TCM_MAX_REQUEST_PAYLOAD_MB>0), requests that exceed the estimated decoded payload limit fail fast with an error reason containing:

413 Payload Too Large

Example failure reason:

{
  "code": "TRITON_INFERENCE_FAILED",
  "message": "[TritonThread] TRITON_INFERENCE_FAILED: model='my-model' retriable=False reason=413 Payload Too Large: estimated_bytes=... limit_bytes=...",
  "retriable": false
}

Client guidance

  • This is not retriable as-is. Reduce tensor dimensions / datatype size.
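Clients can estimate the decoded size up front and shrink the request before sending. A rough sketch; the per-datatype width table is an assumption covering common Triton dtypes:

```python
# Bytes per element for common Triton datatypes (illustrative subset).
DTYPE_BYTES = {"FP16": 2, "FP32": 4, "FP64": 8, "INT8": 1, "INT32": 4, "INT64": 8}


def estimated_tensor_bytes(shape: list[int], datatype: str) -> int:
    """Approximate the decoded byte size of one dense tensor input."""
    n = 1
    for dim in shape:
        n *= dim
    return n * DTYPE_BYTES[datatype]
```

For example, a 1x3x224x224 FP32 tensor decodes to 602,112 bytes; compare that against the configured limit before submitting.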

Zero‑Copy Shared Memory (POSIX System SHM)

For large tensors, the recommended v2.0.0-GOLDEN path is to avoid sending tensor bytes over WebSocket JSON and instead:

  • Write the tensor into POSIX shared memory (e.g. /dev/shm)
  • Send an SHMReference object as the inference input payload (metadata only)

Capability negotiation

During auth, clients may request SHM support:

{
  "uuid": "client-uuid",
  "type": "auth",
  "payload": {
    "capability": ["json", "shm"]
  }
}
  • If the environment supports SHM, the manager replies with auth.ok and payload.capability including "shm".
  • If the client does not send capability, the manager replies with the legacy shape {"type":"auth.ok"} (no payload) to avoid breaking older clients.

SHMReference shape

Send SHM inputs in payload.request.inputs:

{
  "name": "INPUT__0",
  "shm_key": "/tcm_demo_input0",
  "offset": 0,
  "byte_size": 602112,
  "shape": [1, 3, 224, 224],
  "dtype": "FP32"
}
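Writing a tensor into POSIX shared memory and producing that reference can be sketched with the standard library's `multiprocessing.shared_memory` (assumptions: FP32 little-endian packing, and that the manager-visible `shm_key` is the segment name with a leading slash):

```python
import struct
from multiprocessing import shared_memory


def write_shm_input(name: str, key: str, values: list[float], shape: list[int]) -> dict:
    """Write FP32 values into a POSIX shared-memory segment and return
    the SHMReference metadata dict to send in payload.request.inputs."""
    data = struct.pack(f"<{len(values)}f", *values)
    shm = shared_memory.SharedMemory(name=key.lstrip("/"), create=True, size=len(data))
    shm.buf[: len(data)] = data
    shm.close()  # keep the segment alive; unlink only after the response arrives
    return {
        "name": name,
        "shm_key": key,
        "offset": 0,
        "byte_size": len(data),
        "shape": shape,
        "dtype": "FP32",
    }
```

Remember to `unlink()` the segment once the inference response has been received, or the memory leaks until reboot.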

SHM error codes

  • TRITON_SHM_UNAVAILABLE (fatal): SHM not supported or shm key missing/inaccessible.
  • TRITON_SHM_REGISTRATION_FAILED (fatal): SHM registration failed on the Triton side.

Recommended retry policy (high-level)

  • System errors
    • SYSTEM_SHUTDOWN: reconnect with backoff; wait for readiness
  • Retriable Triton errors (retriable=true)
    • TRITON_TIMEOUT, TRITON_NETWORK, TRITON_OVERLOADED, TRITON_CIRCUIT_OPEN
    • retry with exponential backoff + jitter; cap max attempts
  • Fatal Triton errors (retriable=false)
    • do not retry; fix request or intervene operationally (model/shape/config)
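The policy above condenses into a single lookup; a sketch (the code sets come straight from the bullets):

```python
# Retriable Triton error codes per the policy above.
RETRIABLE_CODES = {"TRITON_TIMEOUT", "TRITON_NETWORK", "TRITON_OVERLOADED", "TRITON_CIRCUIT_OPEN"}


def retry_action(code: str) -> str:
    """Map an error code to the high-level action from the retry policy."""
    if code == "SYSTEM_SHUTDOWN":
        return "reconnect_after_ready"
    if code in RETRIABLE_CODES:
        return "retry_with_backoff"
    return "do_not_retry"  # fatal: fix request or intervene operationally
```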

Install

python -m pip install --upgrade pip
python -m pip install tcm-client

Minimal usage example (Python)

import asyncio

from tcm_client import AuthContext, TcmWebSocketClient


async def main() -> None:
    uri = "ws://127.0.0.1:8000/ws"

    ctx = AuthContext(
        uuid="client-1",
        token="opaque-or-jwt-token",
        sub="user-123",
        tenant_id="tenant-abc",
        roles=["inference"],
    )

    async with TcmWebSocketClient(uri, ctx) as client:
        await client.auth()

        # Example: call your inference helper (depends on SDK surface)
        # resp = await client.infer_http(...)
        # if resp.status == "FAILED": handle as described above


if __name__ == "__main__":
    asyncio.run(main())

CLI

tcm-client-cli --uri "ws://127.0.0.1:8000/ws" queue-stats


