hotpath
Profiler for LLM inference under RL-style rollout workloads: kernel timing, request lifecycle tracing, and disaggregation analysis for vLLM and SGLang.
What it does
Profile a live vLLM or SGLang endpoint with real traffic: capture CUDA kernel timing, Prometheus server metrics, and per-request latency breakdowns.
Analyze the results: prefill vs decode phase breakdown, KV cache efficiency, prefix sharing patterns, queue depth over time, TTFT and decode-per-token distributions.
Advise on disaggregation: an analytical M/G/1 queueing model estimates whether splitting prefill and decode onto separate GPU pools improves throughput. If recommended, hotpath generates ready-to-use deployment configs for vLLM, llm-d, and Dynamo.
Install
pip install hotpath
Quick start
Profile a live vLLM server:
hotpath serve-profile \
--endpoint http://localhost:8000 \
--traffic prompts.jsonl \
--concurrency 4 \
--duration 300 \
--output .hotpath/run
View results:
hotpath serve-report .hotpath/run/serve_profile.db
Generate disaggregation deployment configs:
hotpath disagg-config .hotpath/run/serve_profile.db --format all
For full server-side timing (queue wait, prefill, decode phases), start vLLM with debug logging and pass the log file:
VLLM_LOGGING_LEVEL=DEBUG vllm serve <model> 2>vllm.log &
hotpath serve-profile \
--endpoint http://localhost:8000 \
--traffic prompts.jsonl \
--server-log vllm.log \
--concurrency 4 \
--duration 300
For kernel-level GPU phase breakdown, add --nsys:
hotpath serve-profile --endpoint http://localhost:8000 --traffic prompts.jsonl --nsys
Traffic file format
JSONL, one request per line:
{"prompt": "Explain KV cache eviction policy.", "max_tokens": 256}
{"prompt": "Write a Python retry decorator with exponential backoff.", "max_tokens": 400}
ShareGPT format is also accepted.
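A traffic file can also be generated programmatically. A minimal sketch that writes the JSONL format shown above, using only the two fields from the examples (prompt, max_tokens):

```python
import json

# Requests taken from the format examples above; add your own as needed.
requests = [
    {"prompt": "Explain KV cache eviction policy.", "max_tokens": 256},
    {"prompt": "Write a Python retry decorator with exponential backoff.", "max_tokens": 400},
]

# One JSON object per line, as hotpath expects.
with open("prompts.jsonl", "w") as f:
    for req in requests:
        f.write(json.dumps(req) + "\n")
```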
Commands
| Command | Description |
|---|---|
| serve-profile | Profile a live vLLM/SGLang server with traffic replay |
| serve-report | Print a serving analysis report |
| disagg-config | Generate deployment configs for disaggregated serving |
| profile | GPU kernel profiling under RL-style rollout workloads |
| report | View a saved kernel profile |
| diff | Compare two kernel profiles |
| bench | Benchmark individual GPU kernel implementations |
| export | Export profile data to JSON, CSV, or OTLP |
| doctor | Check local profiling environment |
| lock-clocks | Lock GPU clocks for reproducible measurements |
System requirements
- Linux
- NVIDIA GPU with CUDA driver
- nsys (for kernel profiling; not required for serving analysis)
- vLLM or SGLang (for serving analysis)
Build from source
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build --parallel
ctest --test-dir build --output-on-failure
Install from source:
python3 -m venv .venv && . .venv/bin/activate
pip install .
Requirements: CMake 3.28+, C++20 compiler, SQLite3.
How it works
hotpath is a single C++ binary with no runtime dependencies beyond SQLite3.
Data is collected from three sources:
- Kernel traces -- nsys captures GPU kernel execution. hotpath parses the output, categorizes kernels (GEMM, attention, MoE, etc.), and classifies them as prefill or decode phase by timing correlation with server events.
- Server metrics -- Prometheus metrics from vLLM or SGLang /metrics endpoints are polled at 1 Hz. Batch size, queue depth, KV cache utilization, and preemption counts are tracked over the profiling window.
- Request lifecycle -- vLLM debug logs are parsed to extract per-request timestamps: arrival, queue wait, prefill start, decode start, completion. These are stored as structured traces and can be exported as OpenTelemetry spans.
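The timing-correlation idea above can be sketched as follows. This is illustrative only, not hotpath's implementation; the phase-window representation and matching rule are assumptions:

```python
def classify_kernel(kernel_start_ns, phases):
    """Assign a GPU kernel to a request phase by timestamp.

    kernel_start_ns -- kernel launch time from the trace (ns)
    phases          -- list of (phase_name, start_ns, end_ns) windows
                       derived from request lifecycle events

    Returns the name of the phase whose window contains the kernel
    start, or "unknown" if no window matches.
    """
    for name, start, end in phases:
        if start <= kernel_start_ns < end:
            return name
    return "unknown"
```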
The disaggregation advisor uses a simplified M/G/1 queueing model to estimate whether splitting prefill and decode onto separate GPU pools would improve throughput. It searches over P:D ratios and accounts for KV transfer overhead to produce a concrete recommendation with estimated throughput improvement.
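For intuition on the queueing model, the textbook Pollaczek-Khinchine formula gives the mean queueing delay of an M/G/1 queue. A minimal sketch (this is the standard formula, not hotpath's actual advisor):

```python
def mg1_wait(lam, mean_s, second_moment_s):
    """Mean wait in queue for an M/G/1 queue (Pollaczek-Khinchine).

    lam             -- Poisson arrival rate (requests/s)
    mean_s          -- mean service time E[S] (s)
    second_moment_s -- second moment of service time E[S^2] (s^2)
    """
    rho = lam * mean_s  # server utilization
    assert rho < 1, "queue is unstable (rho >= 1)"
    return lam * second_moment_s / (2.0 * (1.0 - rho))

# Example: exponentially distributed service with mean 50 ms
# (so E[S^2] = 2 * 0.05**2) at 15 req/s gives rho = 0.75 and a
# mean queueing delay of 0.15 s.
```

Splitting prefill and decode onto separate pools changes both the service-time distribution and the utilization of each queue, which is why such a model can predict a win (or a loss, once KV transfer overhead is added back in).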
All data is stored in SQLite databases for offline analysis and comparison across runs.
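Because the output is ordinary SQLite, the databases can be inspected with any SQLite client. A small helper that lists whatever tables a profile contains, without assuming anything about hotpath's schema:

```python
import sqlite3

def list_tables(db_path):
    """Return the table names in a SQLite database, e.g. a hotpath
    serve_profile.db, queried via the standard sqlite_master catalog."""
    con = sqlite3.connect(db_path)
    try:
        return [name for (name,) in con.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
    finally:
        con.close()
```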
Release notes
See CHANGELOG.md.