Profile vLLM inference under RL-style rollout workloads.
hotpath
Profiler for LLM inference.
hotpath profiles live vLLM and SGLang servers, analyzes request and GPU behavior, and recommends when to split prefill and decode into separate deployments (disaggregated serving).
What it does
- Profile a live endpoint with real traffic
- Analyze queueing, prefill, decode, cache, and batching
- Recommend disaggregation and generate deployment configs
Install
```bash
uv tool install hotpath
```
Quick start
Profile a live vLLM server:
```bash
hotpath serve-profile \
  --endpoint http://localhost:8000 \
  --traffic prompts.jsonl \
  --concurrency 4 \
  --duration 60 \
  --output .hotpath/run
```
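The `--concurrency` and `--duration` flags suggest a fixed number of in-flight requests over a fixed time window. As an illustration only, here is a minimal closed-loop driver, assuming each of N workers starts its next request as soon as its previous one completes; hotpath's own client may schedule traffic differently, and `closed_loop`, `SendFn`, and `fake_send` are hypothetical names.

```python
import asyncio
import time
from typing import Awaitable, Callable

# Hypothetical signature for "send one request and wait for the full response".
# Not part of hotpath; a real driver would issue HTTP requests to the endpoint.
SendFn = Callable[[dict], Awaitable[None]]


async def closed_loop(prompts: list[dict], send: SendFn,
                      concurrency: int, duration_s: float) -> int:
    """Keep up to `concurrency` requests in flight until `duration_s` elapses.

    Each worker cycles through `prompts`, starting its next request as soon as
    the previous one finishes. Requests started before the deadline are allowed
    to complete. Returns the number of completed requests.
    """
    if not prompts or concurrency < 1:
        return 0
    deadline = time.monotonic() + duration_s
    completed = 0
    next_idx = 0

    async def worker() -> None:
        nonlocal completed, next_idx
        while time.monotonic() < deadline:
            request = prompts[next_idx % len(prompts)]
            next_idx += 1  # no lock needed: asyncio tasks share one thread
            await send(request)
            completed += 1

    await asyncio.gather(*(worker() for _ in range(concurrency)))
    return completed


if __name__ == "__main__":
    async def fake_send(request: dict) -> None:
        await asyncio.sleep(0.05)  # stand-in for a 50 ms round trip

    traffic = [{"prompt": "Explain KV cache eviction policy.", "max_tokens": 256}]
    done = asyncio.run(closed_loop(traffic, fake_send, concurrency=4, duration_s=1.0))
    print(f"completed {done} requests")
```

A closed loop keeps offered load tied to server speed: if responses slow down, fewer requests are sent, which is the behavior a fixed-concurrency benchmark is usually after.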
View the report:
```bash
hotpath serve-report .hotpath/run/serve_profile.db
```
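hotpath stores results in SQLite, so the `serve_profile.db` file can also be opened directly. The schema is not documented here, so this sketch makes no assumptions about it: the hypothetical `describe_db` helper only lists each table and its row count, using Python's standard `sqlite3` module with the file opened read-only.

```python
import sqlite3
import sys
from pathlib import Path


def describe_db(path: str) -> dict[str, int]:
    """Return {table_name: row_count} for each user table in a SQLite file.

    Hypothetical helper, not part of hotpath; it does not assume any schema.
    """
    # Read-only URI: fails on a missing file instead of creating an empty one.
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT GLOB 'sqlite_*' ORDER BY name")]
        counts = {}
        for name in names:
            quoted = '"' + name.replace('"', '""') + '"'  # quote the identifier
            counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        return counts
    finally:
        conn.close()


if __name__ == "__main__":
    db = sys.argv[1] if len(sys.argv) > 1 else ".hotpath/run/serve_profile.db"
    if Path(db).exists():
        for table, rows in describe_db(db).items():
            print(f"{table}: {rows} rows")
    else:
        print(f"no database at {db}")
```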
Generate deployment configs:
```bash
hotpath disagg-config .hotpath/run/serve_profile.db --format all
```
If you want server-side request timing, start vLLM with debug logs and pass the log file:
```bash
VLLM_LOGGING_LEVEL=DEBUG vllm serve <model> 2>vllm.log &

hotpath serve-profile \
  --endpoint http://localhost:8000 \
  --traffic prompts.jsonl \
  --server-log vllm.log \
  --concurrency 4 \
  --duration 60
```
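Server-side timestamps let each request's latency be split into phases. The sketch below assumes hypothetical per-request timestamps (`arrival`, `scheduled`, `first_token`, `finished`); these field names are illustrative and are not hotpath's schema or vLLM's log format. With chunked prefill, `first_token - scheduled` also absorbs time spent waiting between chunks, so treat the prefill figure as an approximation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RequestTiming:
    # Hypothetical timestamps in seconds; not hotpath's or vLLM's field names.
    arrival: float      # request received by the server
    scheduled: float    # first placed into a running batch
    first_token: float  # first output token produced
    finished: float     # last output token produced
    output_tokens: int


@dataclass
class PhaseBreakdown:
    queue: float
    prefill: float
    decode: float
    ttft: float            # time to first token, measured from arrival
    tpot: Optional[float]  # mean time per output token after the first


def breakdown(t: RequestTiming) -> PhaseBreakdown:
    decode = t.finished - t.first_token
    # TPOT is undefined for single-token outputs: there is no decode interval.
    tpot = decode / (t.output_tokens - 1) if t.output_tokens > 1 else None
    return PhaseBreakdown(
        queue=t.scheduled - t.arrival,
        prefill=t.first_token - t.scheduled,
        decode=decode,
        ttft=t.first_token - t.arrival,
        tpot=tpot,
    )


if __name__ == "__main__":
    print(breakdown(RequestTiming(0.0, 0.25, 0.5, 2.5, output_tokens=5)))
    # PhaseBreakdown(queue=0.25, prefill=0.25, decode=2.0, ttft=0.5, tpot=0.5)
```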
If you want kernel-level GPU traces, add --nsys:
```bash
hotpath serve-profile \
  --endpoint http://localhost:8000 \
  --traffic prompts.jsonl \
  --nsys
```
Traffic format
JSONL, one request per line:
```json
{"prompt": "Explain KV cache eviction policy.", "max_tokens": 256}
{"prompt": "Write a Python retry decorator with exponential backoff.", "max_tokens": 400}
```
ShareGPT format is also supported.
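A traffic file in the JSONL format can be generated with a few lines of Python. `write_traffic` and `read_traffic` are hypothetical helpers, not part of hotpath; they emit only the `prompt` and `max_tokens` fields shown above.

```python
import json
from pathlib import Path


def write_traffic(path: str, requests: list[tuple[str, int]]) -> int:
    """Write (prompt, max_tokens) pairs as JSONL: one JSON object per line.

    Hypothetical helper; validates everything before writing so a bad entry
    never leaves a half-written file.
    """
    for _, max_tokens in requests:
        if not isinstance(max_tokens, int) or max_tokens <= 0:
            raise ValueError(f"max_tokens must be a positive int, got {max_tokens!r}")
    with Path(path).open("w", encoding="utf-8") as f:
        for prompt, max_tokens in requests:
            f.write(json.dumps({"prompt": prompt, "max_tokens": max_tokens}) + "\n")
    return len(requests)


def read_traffic(path: str) -> list[dict]:
    """Parse a JSONL traffic file, skipping blank lines."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


if __name__ == "__main__":
    write_traffic("prompts.jsonl", [
        ("Explain KV cache eviction policy.", 256),
        ("Write a Python retry decorator with exponential backoff.", 400),
    ])
    print(read_traffic("prompts.jsonl"))
```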
Commands
| Command | Description |
|---|---|
| `serve-profile` | Profile a live vLLM or SGLang server |
| `serve-report` | Print a serving analysis report |
| `disagg-config` | Generate deployment configs for disaggregated serving |
| `profile` | Run GPU kernel profiling under RL-style traffic |
| `report` | View a saved kernel profile |
| `diff` | Compare two kernel profiles |
| `bench` | Benchmark individual GPU kernel implementations |
| `export` | Export profile data to JSON, CSV, or OTLP |
| `doctor` | Check local profiling environment |
| `lock-clocks` | Lock GPU clocks for reproducible measurements |
System requirements
- Linux
- NVIDIA GPU with CUDA driver
- `nsys` for kernel profiling
- vLLM or SGLang for serving analysis
Build from source
```bash
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build --parallel
ctest --test-dir build --output-on-failure
```
Install from source:
```bash
uv tool install .
```
Requirements: CMake 3.28+, C++20 compiler, SQLite3.
How it works
hotpath stores results in SQLite and combines three data sources:
- Kernel traces from `nsys`
- Server metrics from the `/metrics` endpoint
- Request lifecycle timing from client traces and vLLM debug logs
The report turns those signals into latency breakdowns, cache analysis, prefix-sharing analysis, and a disaggregation recommendation.
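To give a sense of what a prefix-sharing measurement can look like, here is a toy metric, not hotpath's method: the fraction of prompt tokens covered by a prefix already seen in an earlier prompt, computed with a trie. It splits on whitespace as a stand-in for the model's tokenizer and ignores cache capacity, eviction, and KV-block granularity, so it is an upper bound on what a prefix cache could reuse.

```python
def prefix_sharing_ratio(prompts: list[str]) -> float:
    """Fraction of prompt tokens covered by a prefix seen in an earlier prompt.

    Illustrative metric only; not hotpath's prefix-sharing analysis.
    """
    root: dict = {}  # trie: token -> child node
    shared = total = 0
    for prompt in prompts:
        tokens = prompt.split()  # whitespace split stands in for a real tokenizer
        total += len(tokens)
        node, matching = root, True
        for tok in tokens:
            if matching and tok in node:
                shared += 1       # still inside a previously seen prefix
            else:
                matching = False  # first divergence: the rest of this prompt is new
            node = node.setdefault(tok, {})  # walk or extend so later prompts can match
    return shared / total if total else 0.0


if __name__ == "__main__":
    system = "You are a helpful assistant."
    prompts = [f"{system} Summarize this log.", f"{system} Explain this stack trace."]
    print(f"{prefix_sharing_ratio(prompts):.2f}")  # prints 0.29 (5 of 17 tokens)
```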
Release notes
See CHANGELOG.md.
Download files
Source Distribution
File details
Details for the file hotpath-0.2.8.tar.gz.
File metadata
- Download URL: hotpath-0.2.8.tar.gz
- Upload date:
- Size: 216.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `fd17a605649a69322c80cc95d603057a5d8d250e575c678dfd3759f31cb2589f` |
| MD5 | `16ba4409f11e70cea2024a895f6c6013` |
| BLAKE2b-256 | `8c2a6eb9cf319b15c6267e7811d4d637ab128eb0d34f7bde3135f84cf2642184` |
Provenance
The following attestation bundles were made for hotpath-0.2.8.tar.gz:
Publisher: release.yml on alityb/hotpath
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: hotpath-0.2.8.tar.gz
  - Subject digest: fd17a605649a69322c80cc95d603057a5d8d250e575c678dfd3759f31cb2589f
- Sigstore transparency entry: 1239498523
- Sigstore integration time:
- Permalink: alityb/hotpath@e0cee529f710b9b3576ae114312c6e49d82d15d7
- Branch / Tag: refs/tags/v0.2.8
- Owner: https://github.com/alityb
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@e0cee529f710b9b3576ae114312c6e49d82d15d7
- Trigger Event: release
Statement type: