
Official Python WebSocket client for Triton Client Manager.


tcm-client SDK (WebSocket) — v2.0.0-GOLDEN ready

This SDK is the supported client for the Triton Client Manager WebSocket API.

It provides a typed, ergonomic interface for:

  • authentication (auth)
  • operational queries (info.*)
  • management jobs (management.*)
  • inference requests (inference)

Connection

  • WebSocket endpoint: ws://<host>:<port>/ws
  • Health endpoints:
    • GET /health (liveness)
    • GET /ready (readiness)

GET /ready may return 503 with a sanitized payload if core dependencies are not healthy (or if the probe itself fails). In that case, use error_id to correlate server logs:

{
  "status": "not_ready",
  "reason": "readiness_probe_failed",
  "detail": "internal_error",
  "error_id": "..."
}
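Clients can gate reconnection on readiness by polling `GET /ready`. A minimal sketch using only the standard library (the base URL and timing defaults are illustrative):

```python
import json
import time
import urllib.error
import urllib.request


def wait_until_ready(base_url: str, timeout_s: float = 60.0, interval_s: float = 2.0) -> bool:
    """Poll GET /ready until it returns 200, or give up after timeout_s."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/ready") as resp:
                if resp.status == 200:
                    return True
        except urllib.error.HTTPError as exc:
            if exc.code == 503:
                # Sanitized not-ready payload; surface error_id for log correlation.
                body = json.loads(exc.read() or b"{}")
                print("not ready, error_id:", body.get("error_id"))
            else:
                raise
        except urllib.error.URLError:
            pass  # manager not reachable yet
        time.sleep(interval_s)
    return False
```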

Message envelope (wire contract)

All messages share the same top-level envelope:

{
  "uuid": "client-uuid",
  "type": "auth|info|management|inference",
  "payload": {}
}

The server may also emit:

  • type="error" for system-level conditions (including shutdown)
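A client can enforce the envelope shape with a small helper; a sketch (the `make_envelope` function is illustrative, not part of the SDK surface):

```python
import json

# The four client-sent message types from the wire contract.
VALID_TYPES = {"auth", "info", "management", "inference"}


def make_envelope(uuid: str, msg_type: str, payload: dict) -> str:
    """Serialize a message in the shared top-level envelope."""
    if msg_type not in VALID_TYPES:
        raise ValueError(f"unknown message type: {msg_type!r}")
    return json.dumps({"uuid": uuid, "type": msg_type, "payload": payload})
```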

Inference request payload (wire contract)

For type="inference", the payload must include:

  • vm_ip (string) — required (routing target)
  • container_id (string) — required (routing target)
  • model_name (string) — required
  • request.inputs (list) — required
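Putting the required fields together, a payload can be assembled like this (a sketch; the helper name is illustrative):

```python
def build_inference_payload(
    vm_ip: str, container_id: str, model_name: str, inputs: list[dict]
) -> dict:
    """Assemble an inference payload with all required routing and model fields."""
    if not inputs:
        raise ValueError("request.inputs must be a non-empty list")
    return {
        "vm_ip": vm_ip,
        "container_id": container_id,
        "model_name": model_name,
        "request": {"inputs": inputs},
    }
```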

Tensor inputs (JSON path)

The manager accepts two equivalent shapes in payload.request.inputs[*]:

  • SDK-friendly: {name, shape, datatype, data}
  • Manager/internal: {name, dims, type, value}
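Because the two shapes are equivalent, a client can normalize to the manager/internal keys with a simple field mapping; a sketch:

```python
def to_manager_shape(sdk_input: dict) -> dict:
    """Convert an SDK-friendly input {name, shape, datatype, data}
    to the manager/internal shape {name, dims, type, value}."""
    return {
        "name": sdk_input["name"],
        "dims": sdk_input["shape"],
        "type": sdk_input["datatype"],
        "value": sdk_input["data"],
    }
```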

SHM inputs (zero‑copy metadata)

For SHM, each input item is an SHMReference dict:

  • {name, shm_key, offset, byte_size, shape, dtype}

Notes:

  • SHM is currently supported only for HTTP inference.
  • SHM is rejected for gRPC streaming requests.

Error handling model (v2.0.0-GOLDEN)

The manager has two main error shapes you must handle:

A) System-level errors (type="error")

These represent conditions where the manager cannot or will not process work.

SYSTEM_SHUTDOWN

During shutdown draining (SIGTERM / deployment restarts), the manager explicitly NACKs queued/in-flight work:

{
  "type": "error",
  "payload": {
    "code": "SYSTEM_SHUTDOWN",
    "message": "Manager is shutting down"
  }
}

Client guidance

  • Treat as a stop-the-world signal: do not retry immediately.
  • Close the socket and reconnect with backoff.
  • Resume work only after GET /ready returns ready again.

Operational detail:

  • The manager enforces a hard 2.0 s drain deadline after SIGTERM (draining is best-effort). Plan for NACKed work during deploy restarts.
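The guidance above can be sketched as a reconnect loop with exponential backoff and jitter (the `connect` and `ready_check` callables are assumed to be supplied by the application):

```python
import random
import time


def reconnect_with_backoff(connect, ready_check, max_attempts: int = 6):
    """After SYSTEM_SHUTDOWN: back off, wait for readiness, then reconnect."""
    delay = 1.0
    for _attempt in range(max_attempts):
        time.sleep(delay + random.uniform(0, delay))  # full jitter
        if ready_check():  # e.g. GET /ready returned 200
            return connect()
        delay = min(delay * 2, 30.0)  # cap the backoff
    raise RuntimeError("manager did not become ready in time")
```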

B) Inference job failures (type="inference", payload.status="FAILED")

Inference responses always come back as:

{
  "type": "inference",
  "uuid": "client-uuid",
  "payload": {
    "status": "COMPLETED|FAILED",
    "model_name": "my-model",
    "data": {}
  }
}

When status="FAILED", payload.data may be either:

  1. A typed Triton-facing error object (recommended contract):
{
  "code": "TRITON_TIMEOUT",
  "message": "[TritonThread] TRITON_TIMEOUT: model='my-model' retriable=True reason=Timeout",
  "retriable": true,
  "retry_after_seconds": 2
}
  2. A string for validation/contract errors (missing fields, unknown container, etc.):
"Missing required field 'vm_ip'"

Client guidance

  • If payload.data is an object:
    • Use code + retriable to implement retry policy (do not parse message).
    • TRITON_TIMEOUT is retriable: retry with exponential backoff + jitter.
    • If retry_after_seconds is present, respect it.
  • If payload.data is a string:
    • Treat as a client-side contract error (fix request formation).
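That branching can be expressed directly; a sketch using only the `code`, `retriable`, and `retry_after_seconds` fields from the contract above:

```python
def classify_failure(data):
    """Return ('retry', delay) for retriable typed errors, ('fatal', None) for
    non-retriable ones, and ('contract_error', None) for string payloads."""
    if isinstance(data, str):
        return ("contract_error", None)  # fix request formation; do not retry
    if data.get("retriable"):
        # Respect retry_after_seconds when present; caller picks a default otherwise.
        return ("retry", data.get("retry_after_seconds"))
    return ("fatal", None)
```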

Admission Control (413 Payload Too Large)

If the manager is configured with a payload budget (e.g. TCM_MAX_REQUEST_PAYLOAD_MB>0), requests that exceed the estimated decoded payload limit fail fast with an error reason containing:

413 Payload Too Large

Example failure reason:

{
  "code": "TRITON_INFERENCE_FAILED",
  "message": "[TritonThread] TRITON_INFERENCE_FAILED: model='my-model' retriable=False reason=413 Payload Too Large: estimated_bytes=... limit_bytes=...",
  "retriable": false
}

Client guidance

  • This is not retriable as-is. Reduce tensor dimensions / datatype size.
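Clients can estimate the decoded size up front and shrink the request before sending. A rough sketch; the per-datatype width table is an assumption covering common Triton dtypes:

```python
# Bytes per element for common Triton datatypes (illustrative subset).
DTYPE_BYTES = {"FP16": 2, "FP32": 4, "FP64": 8, "INT8": 1, "INT32": 4, "INT64": 8}


def estimated_tensor_bytes(shape: list[int], datatype: str) -> int:
    """Approximate the decoded byte size of one dense tensor input."""
    n = 1
    for dim in shape:
        n *= dim
    return n * DTYPE_BYTES[datatype]
```

For example, a 1x3x224x224 FP32 tensor decodes to 602,112 bytes; compare that against the configured limit before submitting.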

Zero‑Copy Shared Memory (POSIX System SHM)

For large tensors, the recommended v2.0.0-GOLDEN path is to avoid sending tensor bytes over WebSocket JSON and instead:

  • Write the tensor into POSIX shared memory (e.g. /dev/shm)
  • Send an SHMReference object as the inference input payload (metadata only)

Capability negotiation

During auth, clients may request SHM support:

{
  "uuid": "client-uuid",
  "type": "auth",
  "payload": {
    "capability": ["json", "shm"]
  }
}
  • If the environment supports SHM, the manager replies with auth.ok and payload.capability including "shm".
  • If the client does not send capability, the manager replies with the legacy shape {"type":"auth.ok"} (no payload) to avoid breaking older clients.

SHMReference shape

Send SHM inputs in payload.request.inputs:

{
  "name": "INPUT__0",
  "shm_key": "/tcm_demo_input0",
  "offset": 0,
  "byte_size": 602112,
  "shape": [1, 3, 224, 224],
  "dtype": "FP32"
}
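Writing a tensor into POSIX shared memory and producing that reference can be sketched with the standard library's `multiprocessing.shared_memory` (assumptions: FP32 little-endian packing, and that the manager-visible `shm_key` is the segment name with a leading slash):

```python
import struct
from multiprocessing import shared_memory


def write_shm_input(name: str, key: str, values: list[float], shape: list[int]) -> dict:
    """Write FP32 values into a POSIX shared-memory segment and return
    the SHMReference metadata dict to send in payload.request.inputs."""
    data = struct.pack(f"<{len(values)}f", *values)
    shm = shared_memory.SharedMemory(name=key.lstrip("/"), create=True, size=len(data))
    shm.buf[: len(data)] = data
    shm.close()  # keep the segment alive; unlink only after the response arrives
    return {
        "name": name,
        "shm_key": key,
        "offset": 0,
        "byte_size": len(data),
        "shape": shape,
        "dtype": "FP32",
    }
```

Remember to `unlink()` the segment once the inference response has been received, or the memory leaks until reboot.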

SHM error codes

  • TRITON_SHM_UNAVAILABLE (fatal): SHM not supported or shm key missing/inaccessible.
  • TRITON_SHM_REGISTRATION_FAILED (fatal): SHM registration failed on the Triton side.

Recommended retry policy (high-level)

  • System errors
    • SYSTEM_SHUTDOWN: reconnect with backoff; wait for readiness
  • Retriable Triton errors (retriable=true)
    • TRITON_TIMEOUT, TRITON_NETWORK, TRITON_OVERLOADED, TRITON_CIRCUIT_OPEN
    • retry with exponential backoff + jitter; cap max attempts
  • Fatal Triton errors (retriable=false)
    • do not retry; fix request or intervene operationally (model/shape/config)
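The policy above condenses into a single lookup; a sketch (the code sets come straight from the bullets):

```python
# Retriable Triton error codes per the policy above.
RETRIABLE_CODES = {"TRITON_TIMEOUT", "TRITON_NETWORK", "TRITON_OVERLOADED", "TRITON_CIRCUIT_OPEN"}


def retry_action(code: str) -> str:
    """Map an error code to the high-level action from the retry policy."""
    if code == "SYSTEM_SHUTDOWN":
        return "reconnect_after_ready"
    if code in RETRIABLE_CODES:
        return "retry_with_backoff"
    return "do_not_retry"  # fatal: fix request or intervene operationally
```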

Install

python -m pip install --upgrade pip
python -m pip install tcm-client

Minimal usage example (Python)

import asyncio

from tcm_client import AuthContext, TcmWebSocketClient


async def main() -> None:
    uri = "ws://127.0.0.1:8000/ws"

    ctx = AuthContext(
        uuid="client-1",
        token="opaque-or-jwt-token",
        sub="user-123",
        tenant_id="tenant-abc",
        roles=["inference"],
    )

    async with TcmWebSocketClient(uri, ctx) as client:
        await client.auth()

        # Example: call your inference helper (depends on SDK surface)
        # resp = await client.infer_http(...)
        # if resp.status == "FAILED": handle as described above


if __name__ == "__main__":
    asyncio.run(main())

CLI

tcm-client-cli --uri "ws://127.0.0.1:8000/ws" queue-stats


