
Profile vLLM inference under RL-style rollout workloads.


hotpath

Profiler for LLM inference. Kernel timing, request lifecycle tracing, and disaggregation analysis for vLLM and SGLang.

What it does

Profile a live vLLM or SGLang endpoint by replaying real traffic from a prompt file: capture CUDA kernel timing, Prometheus server metrics, and per-request latency breakdowns.

Analyze the results: prefill vs decode phase breakdown, KV cache efficiency, prefix sharing patterns, queue depth over time, TTFT and decode-per-token distributions.

Advise on disaggregation: an analytical M/G/1 queueing model estimates whether splitting prefill and decode onto separate GPU pools improves throughput. If recommended, hotpath generates ready-to-use deployment configs for vLLM, llm-d, and Dynamo.
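The per-request metrics listed above (TTFT and decode time per token) reduce to simple arithmetic on request timestamps. The sketch below is a minimal illustration, not hotpath's implementation: the `RequestTiming` record and its field names are hypothetical, and it uses the nearest-rank percentile, which is only one of several common definitions.

```python
import math
from dataclasses import dataclass
from typing import Optional

@dataclass
class RequestTiming:
    """Hypothetical per-request record; all times in seconds."""
    sent: float         # client sent the request
    first_token: float  # first output token arrived
    done: float         # last output token arrived
    output_tokens: int  # number of generated tokens

def ttft(r: RequestTiming) -> float:
    """Time to first token."""
    return r.first_token - r.sent

def decode_per_token(r: RequestTiming) -> Optional[float]:
    """Mean time per output token after the first; None if there is no decode phase."""
    if r.output_tokens <= 1:
        return None
    return (r.done - r.first_token) / (r.output_tokens - 1)

def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile for p in (0, 100]."""
    if not values:
        raise ValueError("percentile of an empty list")
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

def summarize(requests: list[RequestTiming]) -> dict[str, float]:
    """p50/p99 of TTFT and decode-per-token over a non-empty list of requests."""
    ttfts = [ttft(r) for r in requests]
    tpots = [t for t in (decode_per_token(r) for r in requests) if t is not None]
    summary = {"ttft_p50": percentile(ttfts, 50), "ttft_p99": percentile(ttfts, 99)}
    if tpots:
        summary["decode_per_token_p50"] = percentile(tpots, 50)
        summary["decode_per_token_p99"] = percentile(tpots, 99)
    return summary

if __name__ == "__main__":
    # Made-up timings, only to show the call pattern.
    runs = [
        RequestTiming(sent=0.0, first_token=0.12, done=2.67, output_tokens=256),
        RequestTiming(sent=0.5, first_token=0.71, done=4.70, output_tokens=400),
        RequestTiming(sent=1.0, first_token=1.09, done=1.09, output_tokens=1),
    ]
    print(summarize(runs))
```

Single-token responses are excluded from the decode statistics rather than counted as zero, so they do not pull the decode percentiles down.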

Install

pip install hotpath

Quick start

Profile a live vLLM server:

hotpath serve-profile \
  --endpoint http://localhost:8000 \
  --traffic prompts.jsonl \
  --concurrency 4 \
  --duration 300 \
  --output .hotpath/run

View results:

hotpath serve-report .hotpath/run/serve_profile.db

Generate disaggregation deployment configs:

hotpath disagg-config .hotpath/run/serve_profile.db --format all

For full server-side timing (queue wait, prefill, decode phases), start vLLM with debug logging and pass the log file:

VLLM_LOGGING_LEVEL=DEBUG vllm serve <model> >vllm.log 2>&1 &

hotpath serve-profile \
  --endpoint http://localhost:8000 \
  --traffic prompts.jsonl \
  --server-log vllm.log \
  --concurrency 4 \
  --duration 300

For kernel-level GPU phase breakdown, add --nsys:

hotpath serve-profile --endpoint http://localhost:8000 --traffic prompts.jsonl --nsys

Traffic file format

JSONL, one request per line:

{"prompt": "Explain KV cache eviction policy.", "max_tokens": 256}
{"prompt": "Write a Python retry decorator with exponential backoff.", "max_tokens": 400}

ShareGPT format is also accepted.
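A traffic file in the format above can be produced with the standard library alone. This sketch writes one JSON object per line using the `prompt` and `max_tokens` fields from the examples; the output path and prompt list are placeholders, and the helper names are hypothetical.

```python
import json
from pathlib import Path

def write_traffic(path: Path, requests: list[tuple[str, int]]) -> None:
    """Write (prompt, max_tokens) pairs as JSONL, one request per line."""
    with path.open("w", encoding="utf-8") as f:
        for prompt, max_tokens in requests:
            record = {"prompt": prompt, "max_tokens": max_tokens}
            # ensure_ascii=False keeps non-ASCII prompts readable in the file.
            f.write(json.dumps(record, ensure_ascii=False) + "\n")

def read_traffic(path: Path) -> list[dict]:
    """Parse a JSONL traffic file, skipping blank lines."""
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

if __name__ == "__main__":
    out = Path("prompts.jsonl")
    write_traffic(out, [
        ("Explain KV cache eviction policy.", 256),
        ("Write a Python retry decorator with exponential backoff.", 400),
    ])
    print(read_traffic(out))
```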

Commands

Command          Description
serve-profile    Profile a live vLLM/SGLang server with traffic replay
serve-report     Print a serving analysis report
disagg-config    Generate deployment configs for disaggregated serving
profile          Profile GPU kernels under RL-style rollout workloads
report           View a saved kernel profile
diff             Compare two kernel profiles
bench            Benchmark individual GPU kernel implementations
export           Export profile data to JSON, CSV, or OTLP
doctor           Check the local profiling environment
lock-clocks      Lock GPU clocks for reproducible measurements

System requirements

  • Linux
  • NVIDIA GPU with CUDA driver
  • nsys (for kernel profiling; not required for serving analysis)
  • vLLM or SGLang (for serving analysis)

Build from source

cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build --parallel
ctest --test-dir build --output-on-failure

Install from source:

python3 -m venv .venv && . .venv/bin/activate
pip install .

Requirements: CMake 3.28+, C++20 compiler, SQLite3.

How it works

hotpath is a single C++ binary with no runtime dependencies beyond SQLite3.

Data is collected from three sources:

  1. Kernel traces -- nsys captures GPU kernel execution. hotpath parses the output, categorizes kernels (GEMM, attention, MoE, etc.), and assigns each kernel to the prefill or decode phase by correlating its timestamps with server events.

  2. Server metrics -- Prometheus metrics from vLLM or SGLang /metrics endpoints are polled at 1 Hz. Batch size, queue depth, KV cache utilization, and preemption counts are tracked over the profiling window.

  3. Request lifecycle -- vLLM debug logs are parsed to extract per-request timestamps: arrival, queue wait, prefill start, decode start, completion. These are stored as structured traces and can be exported as OpenTelemetry spans.
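The kernel categorization and phase assignment described in step 1 can be sketched as a name-based classifier followed by an interval-overlap test against server-side phase windows (such as those recoverable from the request lifecycle in step 3). The keyword rules, the `Kernel` and `PhaseWindow` types, and the "largest overlap wins" rule are illustrative assumptions, not hotpath's actual heuristics.

```python
from dataclasses import dataclass

# Hypothetical keyword rules, checked in order; real kernel names differ
# across CUDA libraries, attention backends, and versions.
CATEGORY_RULES = [
    ("attention", ("flash", "fmha", "attn", "attention")),
    ("moe", ("moe", "expert")),
    ("gemm", ("gemm", "cutlass", "cublas", "matmul")),
]

@dataclass
class Kernel:
    name: str
    start_ns: int
    end_ns: int

@dataclass
class PhaseWindow:
    """A server-side interval known to be prefill or decode work."""
    phase: str  # "prefill" or "decode"
    start_ns: int
    end_ns: int

def categorize(name: str) -> str:
    """Return the first category whose keywords appear in the kernel name."""
    lowered = name.lower()
    for category, keywords in CATEGORY_RULES:
        if any(k in lowered for k in keywords):
            return category
    return "other"

def overlap_ns(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Length of the intersection of [a_start, a_end) and [b_start, b_end)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))

def classify_phase(kernel: Kernel, windows: list[PhaseWindow]) -> str:
    """Pick the phase whose windows overlap the kernel the longest."""
    totals: dict[str, int] = {}
    for w in windows:
        ov = overlap_ns(kernel.start_ns, kernel.end_ns, w.start_ns, w.end_ns)
        if ov > 0:
            totals[w.phase] = totals.get(w.phase, 0) + ov
    if not totals:
        return "unknown"
    return max(totals, key=lambda phase: totals[phase])

if __name__ == "__main__":
    windows = [PhaseWindow("prefill", 0, 1_000), PhaseWindow("decode", 1_000, 5_000)]
    k = Kernel("flash_fwd_splitkv_kernel", 800, 1_500)
    print(categorize(k.name), classify_phase(k, windows))  # attention decode
```

With chunked prefill, a single engine step can mix both phases, so a production classifier would likely need a "mixed" label rather than forcing a single winner.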

The disaggregation advisor uses a simplified M/G/1 queueing model to estimate whether splitting prefill and decode onto separate GPU pools would improve throughput. It searches over prefill-to-decode (P:D) GPU ratios and accounts for KV cache transfer overhead, producing a concrete recommendation with an estimated throughput improvement.
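To make the queueing argument concrete, here is a toy version built on the Pollaczek-Khinchine formula for the mean queueing delay of an M/G/1 queue, W_q = λ·E[S²] / (2(1 − ρ)) with ρ = λ·E[S]. It compares mean request latency for a colocated layout against every P:D split of the same GPU count. This is an assumption-laden sketch, not hotpath's model: each GPU is a single server with no batching, traffic is split evenly, the two disaggregated stages are treated as independent M/G/1 queues, and KV transfer is a fixed delay that occupies neither pool. The input numbers are made up.

```python
import math
from dataclasses import dataclass
from typing import Optional

@dataclass
class Service:
    mean: float           # E[S] in seconds
    second_moment: float  # E[S^2] in seconds^2

def mg1_wait(arrival_rate: float, s: Service) -> float:
    """Mean time in queue (Pollaczek-Khinchine); inf when utilization >= 1."""
    rho = arrival_rate * s.mean
    if rho >= 1.0:
        return math.inf
    return arrival_rate * s.second_moment / (2.0 * (1.0 - rho))

def colocated_latency(lam: float, gpus: int, prefill: Service, decode: Service) -> float:
    """Each GPU runs prefill then decode for an equal share of the traffic."""
    # Prefill and decode times assumed independent:
    # E[(P + D)^2] = E[P^2] + 2 E[P] E[D] + E[D^2].
    both = Service(
        prefill.mean + decode.mean,
        prefill.second_moment + 2 * prefill.mean * decode.mean + decode.second_moment,
    )
    return mg1_wait(lam / gpus, both) + both.mean

def disagg_latency(lam: float, p: int, d: int, prefill: Service, decode: Service,
                   kv_transfer: float) -> float:
    """Prefill pool and decode pool as two independent M/G/1 stages."""
    # Approximation: decode arrivals are prefill departures, not exactly Poisson.
    return (mg1_wait(lam / p, prefill) + prefill.mean + kv_transfer
            + mg1_wait(lam / d, decode) + decode.mean)

def best_split(lam: float, gpus: int, prefill: Service, decode: Service,
               kv_transfer: float) -> Optional[tuple[int, int, float]]:
    """Try every P:D split of `gpus` (needs gpus >= 2); return (p, d, latency)."""
    best = None
    for p in range(1, gpus):
        d = gpus - p
        t = disagg_latency(lam, p, d, prefill, decode, kv_transfer)
        if best is None or t < best[2]:
            best = (p, d, t)
    return best

if __name__ == "__main__":
    # Illustrative inputs only, not measured values.
    prefill = Service(mean=0.05, second_moment=0.005)
    decode = Service(mean=0.40, second_moment=0.20)
    lam, gpus = 15.0, 8
    print("colocated:", colocated_latency(lam, gpus, prefill, decode))
    print("best split (p, d, latency):", best_split(lam, gpus, prefill, decode, 0.01))
```

With these inputs the colocated layout comes out ahead. That is a property of the toy: a single-server queue per GPU has no notion of continuous batching or of prefill stalling in-flight decodes, the effects commonly cited as the motivation for disaggregation. A more faithful model would fold those effects into the service-time distributions of each pool.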

All data is stored in SQLite databases for offline analysis and comparison across runs.
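Because the profiles are ordinary SQLite files, they can be inspected without hotpath. This sketch lists every table with its row count using only Python's `sqlite3` module; it makes no assumption about hotpath's schema, and the function name is hypothetical. Saved as a script, it could be run as `python inspect_db.py .hotpath/run/serve_profile.db` (script name hypothetical).

```python
import sqlite3
import sys
from pathlib import Path

def table_row_counts(db_path: str) -> dict[str, int]:
    """Return {table name: row count} for every table in a SQLite file."""
    # Read-only URI connection, so inspecting a profile never modifies it.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        counts = {}
        for name in names:
            quoted = '"' + name.replace('"', '""') + '"'  # safe identifier quoting
            counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        return counts
    finally:
        conn.close()

if __name__ == "__main__" and len(sys.argv) > 1:
    for table, rows in table_row_counts(sys.argv[1]).items():
        print(f"{table}\t{rows}")
```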

Release notes

See CHANGELOG.md.



Download files


Source Distribution

hotpath-0.2.0.tar.gz (207.4 kB)

Built Distributions


hotpath-0.2.0-cp312-cp312-manylinux_2_28_x86_64.whl (1.8 MB): CPython 3.12, manylinux glibc 2.28+, x86-64

hotpath-0.2.0-cp311-cp311-manylinux_2_28_x86_64.whl (1.8 MB): CPython 3.11, manylinux glibc 2.28+, x86-64

hotpath-0.2.0-cp310-cp310-manylinux_2_28_x86_64.whl (1.8 MB): CPython 3.10, manylinux glibc 2.28+, x86-64

File details

Details for the file hotpath-0.2.0.tar.gz.

File metadata

  • Download URL: hotpath-0.2.0.tar.gz
  • Size: 207.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for hotpath-0.2.0.tar.gz
Algorithm Hash digest
SHA256 37f7dae481bef070c4411df324e5443733ed92f861df1ffa282141120d7c9315
MD5 2bb0b60c9cda0f7a2e15c61d0634724e
BLAKE2b-256 4e44bb054d3e079e1270ec9045b9d684001f952b771bdd3e8bebbc7f92026652


Provenance

The following attestation bundles were made for hotpath-0.2.0.tar.gz:

Publisher: release.yml on alityb/hotpath

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file hotpath-0.2.0-cp312-cp312-manylinux_2_28_x86_64.whl.


File hashes

Hashes for hotpath-0.2.0-cp312-cp312-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 47da4a215fc45256dcf822950e541c8245a41c356c552cab2a835d58b24e61e4
MD5 f99d9f254b36d3e595b77615d0df70c6
BLAKE2b-256 56c7b35b6ad0091237a235c551d032e3bf5a0ffd45b0f00f9bdb97ab30b96d6b


Provenance

The following attestation bundles were made for hotpath-0.2.0-cp312-cp312-manylinux_2_28_x86_64.whl:

Publisher: release.yml on alityb/hotpath


File details

Details for the file hotpath-0.2.0-cp311-cp311-manylinux_2_28_x86_64.whl.


File hashes

Hashes for hotpath-0.2.0-cp311-cp311-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 e13b1e926b790ca5e6a4726f19b8cd9610af6e43c7c0b68aed98cfcff5ba3750
MD5 c9a86ddd6428fe2e4c0c790f2c654365
BLAKE2b-256 2f62067f0fefcbd1466400393c505ffab65a6638ef9bd6dff569b17c0beb7e90


Provenance

The following attestation bundles were made for hotpath-0.2.0-cp311-cp311-manylinux_2_28_x86_64.whl:

Publisher: release.yml on alityb/hotpath


File details

Details for the file hotpath-0.2.0-cp310-cp310-manylinux_2_28_x86_64.whl.


File hashes

Hashes for hotpath-0.2.0-cp310-cp310-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 db5d1d76e4bd418a4fed7bceb92030aff2a84573d6cbbed414355bdf45796ffd
MD5 07264bffbbad2a5369b09778681ab623
BLAKE2b-256 be3bcabfbc83fcc2b5440f7d03adc3ce8661a0b0aff63088520d38b8a2cb627a


Provenance

The following attestation bundles were made for hotpath-0.2.0-cp310-cp310-manylinux_2_28_x86_64.whl:

Publisher: release.yml on alityb/hotpath

