Rust-backed performance metrics and request tracing

llm-runtime-metrics (Python)

Python bindings for request metrics and Prometheus export.

Install from PyPI as llm-runtime-metrics.

Import in Python as:

import llm_runtime_metrics

The package root exposes the supported public API, which centers on the request-metrics workflow.

Add LLM Metrics To An Existing Prometheus Server

from prometheus_client import CollectorRegistry, start_http_server
from llm_runtime_metrics import (
    REQUEST_FEATURE_IMAGE,
    REQUEST_FEATURE_TOOLS,
    RequestMetricsCollector,
    RequestMetricsFactory,
)

# Reuse your existing registry if you already have one.
registry = CollectorRegistry()

factory = RequestMetricsFactory(
    request_log_enabled=False,
    metric_prefix="llm_runtime",
    metrics_window_seconds=60.0,
    metrics_quantiles=[0.5, 0.9, 0.99],
)

# Registers a custom collector that pulls fresh samples from `factory` at scrape time.
RequestMetricsCollector(
    factory,
    base_labels={"service": "text-generation", "engine": "vllm"},
    registry=registry,
)

# If your app already exposes /metrics, wire this into that server instead.
start_http_server(8000, registry=registry)


# Example lifecycle hooks in your inference code:
def on_request_start(prompt_token_ids: list[int]):
    features = REQUEST_FEATURE_TOOLS | REQUEST_FEATURE_IMAGE
    return factory.new_request(prompt_token_ids, features=features)


def on_stream_step(req_metrics, full_output_token_ids: list[int], cached_tokens: int | None):
    # Use `is_diff=False` when passing cumulative token ids.
    req_metrics.record_tokens(full_output_token_ids, cached_tokens=cached_tokens, is_diff=False)


def on_request_success(req_metrics):
    req_metrics.success()


def on_request_cancel(req_metrics):
    req_metrics.cancel()
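
The hooks above can be tied together by a small driver that guarantees every request ends in exactly one terminal call. The following is a minimal sketch, not part of the package: `drive_request` and `RequestMetricsLike` are hypothetical names, and reporting an interrupted stream through `cancel()` is an assumption, since only `success()` and `cancel()` are shown here.

```python
from typing import Iterable, Optional, Protocol


class RequestMetricsLike(Protocol):
    """Structural type covering only the three methods used in the hooks above."""

    def record_tokens(self, token_ids, cached_tokens=None, is_diff=False): ...
    def success(self): ...
    def cancel(self): ...


def drive_request(
    req_metrics: RequestMetricsLike,
    steps: Iterable[list[int]],
    cached_tokens: Optional[int] = None,
) -> list[int]:
    """Record each stream step, then close the request exactly once.

    `steps` yields the cumulative output token ids at each step,
    which is why `is_diff=False` is passed.
    """
    latest: list[int] = []
    try:
        for full_output_token_ids in steps:
            req_metrics.record_tokens(
                full_output_token_ids, cached_tokens=cached_tokens, is_diff=False
            )
            latest = full_output_token_ids
    except BaseException:
        # Assumption: an interrupted or failed stream is reported as a cancel.
        req_metrics.cancel()
        raise
    req_metrics.success()
    return latest
```

In real code, `req_metrics` would be the object returned by `factory.new_request(...)` and `steps` would be the engine's output stream.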

The exported request latency distributions include request_metrics_ttft_ms (time to first token), request_metrics_time_to_fifth_token_ms, request_metrics_first_to_fifth_token_itl_ms, and request_metrics_itl_ms (inter-token latency). request_metrics_time_to_fifth_token_ms is the elapsed time from request start until record_tokens observes the fifth output token. request_metrics_first_to_fifth_token_itl_ms is the elapsed time between observing the first and fifth output tokens.
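
To make these definitions concrete, here is a small standalone sketch that derives the three early-token values from observation timestamps. It only illustrates the arithmetic described above and is not the package's implementation; `early_token_latencies_ms` and its arguments are hypothetical names.

```python
from typing import Optional, Sequence


def early_token_latencies_ms(
    request_start_s: float, token_seen_s: Sequence[float]
) -> dict[str, Optional[float]]:
    """Compute early-token latencies in milliseconds.

    token_seen_s[i] is the time (in seconds) at which output token i + 1
    was first observed. A value is None when the request has not yet
    produced enough tokens for it.
    """

    def seen_at(n: int) -> Optional[float]:
        return token_seen_s[n - 1] if len(token_seen_s) >= n else None

    def elapsed_ms(start: Optional[float], end: Optional[float]) -> Optional[float]:
        if start is None or end is None:
            return None
        return (end - start) * 1000.0

    first, fifth = seen_at(1), seen_at(5)
    return {
        "ttft_ms": elapsed_ms(request_start_s, first),
        "time_to_fifth_token_ms": elapsed_ms(request_start_s, fifth),
        "first_to_fifth_token_itl_ms": elapsed_ms(first, fifth),
    }
```

Because tokens are timestamped when they are observed, several tokens reported in a single step would share one timestamp in this sketch.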

The inflight gauges also track early-decode pressure, meaning requests that are between their first and fifth output tokens: current_inflight_first_to_fifth_token_requests, mean_inflight_first_to_fifth_token_requests, current_inflight_first_to_fifth_token_input_tokens, and mean_inflight_first_to_fifth_token_input_tokens. The input-token gauges count prompt tokens attached to requests that have emitted at least one output token but have not yet emitted their fifth.
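
The membership rule for these gauges can be sketched in a few lines. This is an illustration of the counting rule only, under the assumption that each in-flight request knows its prompt length and how many output tokens it has emitted; `InflightRequest` and `first_to_fifth_inflight` are hypothetical names, and the mean_ variants are not modeled.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class InflightRequest:
    prompt_tokens: int       # number of prompt (input) tokens
    output_tokens_seen: int  # output tokens observed so far


def first_to_fifth_inflight(requests: Iterable[InflightRequest]) -> tuple[int, int]:
    """Return (request count, summed prompt tokens) for requests that have
    emitted at least one output token but not yet their fifth."""
    window = [r for r in requests if 1 <= r.output_tokens_seen < 5]
    return len(window), sum(r.prompt_tokens for r in window)
```

A request still waiting for its first token, or one already past its fifth, contributes to neither value.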

Available request feature bits:

  • REQUEST_FEATURE_NONE
  • REQUEST_FEATURE_XGRAMMAR
  • REQUEST_FEATURE_TOOLS
  • REQUEST_FEATURE_IMAGE
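
The example earlier combines feature bits with `|`, so they behave as bit flags. The sketch below shows how such flags combine and how a single bit can be tested. The numeric values of the real constants are not documented here, so the `Feature` enum uses made-up stand-in values and `has_feature` is a hypothetical helper.

```python
from enum import IntFlag


class Feature(IntFlag):
    # Stand-in values for illustration only; the real REQUEST_FEATURE_*
    # constants may use different numbers.
    NONE = 0
    XGRAMMAR = 1
    TOOLS = 2
    IMAGE = 4


def has_feature(features: int, flag: int) -> bool:
    """True when `flag` is set in the combined `features` value."""
    return bool(features & flag)


features = Feature.TOOLS | Feature.IMAGE
print(has_feature(features, Feature.TOOLS))     # True
print(has_feature(features, Feature.XGRAMMAR))  # False
```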

If you need plain text output instead of a collector, call:

text = factory.prometheus_strfmt({"service": "text-generation", "engine": "vllm"})
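
If you serve that text yourself, one option is a small standard-library endpoint. The following is a minimal sketch that assumes `prometheus_strfmt` returns a `str` in the Prometheus text exposition format; `make_metrics_handler` and `serve_metrics` are hypothetical helpers, not part of the package.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Callable

# Content type used by the Prometheus text exposition format.
PROMETHEUS_TEXT_CONTENT_TYPE = "text/plain; version=0.0.4; charset=utf-8"


def make_metrics_handler(render: Callable[[], str]):
    """Build a handler class that answers GET /metrics with render()'s text."""

    class MetricsHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/metrics":
                self.send_error(404)
                return
            body = render().encode("utf-8")  # rendered fresh on every scrape
            self.send_response(200)
            self.send_header("Content-Type", PROMETHEUS_TEXT_CONTENT_TYPE)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # keep scrape requests out of stderr

    return MetricsHandler


def serve_metrics(render: Callable[[], str], host: str = "127.0.0.1", port: int = 0):
    """Start the endpoint on a daemon thread and return the server object.

    port=0 asks the OS for a free port; read it from server.server_address[1].
    """
    server = ThreadingHTTPServer((host, port), make_metrics_handler(render))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

With the factory from the example above, the call would look like `serve_metrics(lambda: factory.prometheus_strfmt({"service": "text-generation", "engine": "vllm"}), port=8000)`.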

Download files

Source Distribution

  • llm_runtime_metrics-0.0.13.tar.gz (86.3 kB): Source

Built Distributions

  • llm_runtime_metrics-0.0.13-cp314-cp314t-musllinux_1_2_x86_64.whl (7.1 MB): CPython 3.14t, musllinux: musl 1.2+ x86-64
  • llm_runtime_metrics-0.0.13-cp314-cp314t-musllinux_1_2_i686.whl (7.1 MB): CPython 3.14t, musllinux: musl 1.2+ i686
  • llm_runtime_metrics-0.0.13-cp314-cp314t-musllinux_1_2_armv7l.whl (6.8 MB): CPython 3.14t, musllinux: musl 1.2+ ARMv7l
  • llm_runtime_metrics-0.0.13-cp314-cp314t-musllinux_1_2_aarch64.whl (6.9 MB): CPython 3.14t, musllinux: musl 1.2+ ARM64
  • llm_runtime_metrics-0.0.13-cp314-cp314t-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (6.8 MB): CPython 3.14t, manylinux: glibc 2.17+ x86-64
  • llm_runtime_metrics-0.0.13-cp314-cp314t-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl (8.1 MB): CPython 3.14t, manylinux: glibc 2.17+ ppc64le
  • llm_runtime_metrics-0.0.13-cp314-cp314t-manylinux_2_17_i686.manylinux2014_i686.whl (7.3 MB): CPython 3.14t, manylinux: glibc 2.17+ i686
  • llm_runtime_metrics-0.0.13-cp314-cp314t-macosx_11_0_arm64.whl (6.0 MB): CPython 3.14t, macOS 11.0+ ARM64
  • llm_runtime_metrics-0.0.13-cp314-cp314t-macosx_10_12_x86_64.whl (6.3 MB): CPython 3.14t, macOS 10.12+ x86-64
  • llm_runtime_metrics-0.0.13-cp310-abi3-win_amd64.whl (5.6 MB): CPython 3.10+, Windows x86-64
  • llm_runtime_metrics-0.0.13-cp310-abi3-musllinux_1_2_x86_64.whl (7.1 MB): CPython 3.10+, musllinux: musl 1.2+ x86-64
  • llm_runtime_metrics-0.0.13-cp310-abi3-musllinux_1_2_i686.whl (7.1 MB): CPython 3.10+, musllinux: musl 1.2+ i686
  • llm_runtime_metrics-0.0.13-cp310-abi3-musllinux_1_2_armv7l.whl (6.8 MB): CPython 3.10+, musllinux: musl 1.2+ ARMv7l
  • llm_runtime_metrics-0.0.13-cp310-abi3-musllinux_1_2_aarch64.whl (6.9 MB): CPython 3.10+, musllinux: musl 1.2+ ARM64
  • llm_runtime_metrics-0.0.13-cp310-abi3-manylinux_2_28_ppc64le.whl (8.1 MB): CPython 3.10+, manylinux: glibc 2.28+ ppc64le
  • llm_runtime_metrics-0.0.13-cp310-abi3-manylinux_2_28_armv7l.whl (6.6 MB): CPython 3.10+, manylinux: glibc 2.28+ ARMv7l
  • llm_runtime_metrics-0.0.13-cp310-abi3-manylinux_2_28_aarch64.whl (6.7 MB): CPython 3.10+, manylinux: glibc 2.28+ ARM64
  • llm_runtime_metrics-0.0.13-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (6.8 MB): CPython 3.10+, manylinux: glibc 2.17+ x86-64
  • llm_runtime_metrics-0.0.13-cp310-abi3-manylinux_2_17_i686.manylinux2014_i686.whl (7.3 MB): CPython 3.10+, manylinux: glibc 2.17+ i686
  • llm_runtime_metrics-0.0.13-cp310-abi3-macosx_11_0_arm64.whl (6.0 MB): CPython 3.10+, macOS 11.0+ ARM64
  • llm_runtime_metrics-0.0.13-cp310-abi3-macosx_10_12_x86_64.whl (6.3 MB): CPython 3.10+, macOS 10.12+ x86-64

File details

Details for the file llm_runtime_metrics-0.0.13.tar.gz.

File metadata

  • Download URL: llm_runtime_metrics-0.0.13.tar.gz
  • Upload date:
  • Size: 86.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: maturin/1.14.0

File hashes

Hashes for llm_runtime_metrics-0.0.13.tar.gz
Algorithm Hash digest
SHA256 6a2662d168f8a943c71457314f3a909ff8b4c9fa242271f7bd98bfcaee31302d
MD5 e3a42b7ce47003b5582c8d1c5e7980da
BLAKE2b-256 d30cda0d33c36f7661650450e578981913097019d29b5c6e59b2c11f383b1ce0

