Profile vLLM inference under RL-style rollout workloads.

hotpath

Profiler for LLM inference.

hotpath profiles live vLLM and SGLang servers, analyzes request and GPU behavior, and recommends when to split prefill and decode into a disaggregated deployment.

What it does

  • Profile a live endpoint with real traffic
  • Analyze queueing, prefill, decode, cache, and batching
  • Recommend disaggregation and generate deployment configs

Install

uv tool install hotpath

Quick start

Profile a live vLLM server:

hotpath serve-profile \
  --endpoint http://localhost:8000 \
  --traffic prompts.jsonl \
  --concurrency 4 \
  --duration 60 \
  --output .hotpath/run

View the report:

hotpath serve-report .hotpath/run/serve_profile.db

Generate deployment configs:

hotpath disagg-config .hotpath/run/serve_profile.db --format all

If you want server-side request timing, start vLLM with debug logging, redirect its stderr to a file, and pass that file with --server-log:

VLLM_LOGGING_LEVEL=DEBUG vllm serve <model> 2>vllm.log &

hotpath serve-profile \
  --endpoint http://localhost:8000 \
  --traffic prompts.jsonl \
  --server-log vllm.log \
  --concurrency 4 \
  --duration 60

If you want kernel-level GPU traces, add --nsys:

hotpath serve-profile \
  --endpoint http://localhost:8000 \
  --traffic prompts.jsonl \
  --nsys

Traffic format

JSONL, one request per line:

{"prompt": "Explain KV cache eviction policy.", "max_tokens": 256}
{"prompt": "Write a Python retry decorator with exponential backoff.", "max_tokens": 400}

ShareGPT format is also supported.
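
For the JSONL form, a traffic file can be produced and sanity-checked with a few lines of Python. This is a minimal sketch, not part of hotpath: the helper names to_jsonl and parse_jsonl are hypothetical, and it assumes only the two fields shown in the example above (prompt and max_tokens). Whether hotpath accepts further fields is not covered here.

```python
import json


def to_jsonl(requests):
    """Serialize request dicts to JSONL: one JSON object per line."""
    lines = []
    for req in requests:
        # Emit only the two fields shown in the traffic example.
        record = {"prompt": str(req["prompt"]), "max_tokens": int(req["max_tokens"])}
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines) + "\n"


def parse_jsonl(text):
    """Parse JSONL text back into dicts and reject malformed records."""
    records = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        if (
            not isinstance(record, dict)
            or not isinstance(record.get("prompt"), str)
            or not isinstance(record.get("max_tokens"), int)
        ):
            raise ValueError(
                f"line {lineno}: expected string 'prompt' and integer 'max_tokens'"
            )
        records.append(record)
    return records


if __name__ == "__main__":
    sample = [
        {"prompt": "Explain KV cache eviction policy.", "max_tokens": 256},
        {"prompt": "Write a Python retry decorator with exponential backoff.", "max_tokens": 400},
    ]
    # prompts.jsonl is the file name used in the quick start.
    with open("prompts.jsonl", "w", encoding="utf-8") as f:
        f.write(to_jsonl(sample))
```

Because json.dumps escapes newlines inside a prompt, multi-line prompts still occupy a single line of the file.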

Commands

Command        Description
serve-profile  Profile a live vLLM or SGLang server
serve-report   Print a serving analysis report
disagg-config  Generate deployment configs for disaggregated serving
profile        Run GPU kernel profiling under RL-style traffic
report         View a saved kernel profile
diff           Compare two kernel profiles
bench          Benchmark individual GPU kernel implementations
export         Export profile data to JSON, CSV, or OTLP
doctor         Check local profiling environment
lock-clocks    Lock GPU clocks for reproducible measurements

System requirements

  • Linux
  • NVIDIA GPU with CUDA driver
  • nsys (the NVIDIA Nsight Systems CLI) for kernel profiling
  • vLLM or SGLang for serving analysis

Build from source

cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build --parallel
ctest --test-dir build --output-on-failure

Install from source:

uv tool install .

Building from source requires CMake 3.28+, a C++20 compiler, and SQLite3.

How it works

hotpath stores results in SQLite and combines three data sources:

  1. Kernel traces from nsys
  2. Server metrics from the server's /metrics endpoint
  3. Request lifecycle timing from client traces and vLLM debug logs
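
To make the second source concrete: a /metrics endpoint of this kind typically returns Prometheus text exposition format, which is easy to read by hand. The sketch below is not hotpath's implementation. The function name parse_metrics is hypothetical, the URL assumes the quick-start endpoint, and the parser ignores timestamps and does not split labels.

```python
import urllib.request


def parse_metrics(text):
    """Map each series (metric name plus label set) to its float value."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        if "}" in line:
            # Label values may contain spaces, so cut at the closing brace.
            cut = line.rfind("}") + 1
            series, rest = line[:cut], line[cut:]
        else:
            series, _, rest = line.partition(" ")
        fields = rest.split()
        if not fields:
            continue  # no value on this line
        samples[series] = float(fields[0])  # an optional timestamp may follow
    return samples


if __name__ == "__main__":
    # Assumed URL: the quick-start endpoint plus /metrics.
    with urllib.request.urlopen("http://localhost:8000/metrics", timeout=5) as resp:
        body = resp.read().decode("utf-8")
    for series, value in sorted(parse_metrics(body).items()):
        print(f"{value:>14.3f}  {series}")
```

Sampling this endpoint before and after a run is a quick way to confirm the server exposes metrics at all before starting a longer profile.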

The report turns those signals into latency breakdowns, cache analysis, prefix-sharing analysis, and a disaggregation recommendation.
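
Because the results live in an ordinary SQLite file, they can also be inspected without hotpath. The sketch below makes no assumption about hotpath's schema: it only lists whatever tables the file contains and counts their rows. The function name table_row_counts is hypothetical, and the path is the one from the quick start.

```python
import sqlite3


def table_row_counts(conn):
    """Return {table_name: row_count} for every user table in the database."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
            "ORDER BY name"
        )
    ]
    counts = {}
    for name in names:
        # Quote the identifier; table names cannot be bound as parameters.
        quoted = '"' + name.replace('"', '""') + '"'
        counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
    return counts


if __name__ == "__main__":
    # Open read-only so inspection cannot modify the profile.
    conn = sqlite3.connect("file:.hotpath/run/serve_profile.db?mode=ro", uri=True)
    try:
        for name, count in table_row_counts(conn).items():
            print(f"{name}: {count} rows")
    finally:
        conn.close()
```

For structured output, the export command listed above is the supported route; direct queries like this are best kept to ad hoc inspection.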

Release notes

See CHANGELOG.md.

Download files

Source Distribution

hotpath-0.3.9.tar.gz (558.5 kB)

File details

Details for the file hotpath-0.3.9.tar.gz.

File metadata

  • Download URL: hotpath-0.3.9.tar.gz
  • Size: 558.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for hotpath-0.3.9.tar.gz:

Algorithm    Hash digest
SHA256       f9b391b4b8a8a2e8aa0ebc62f3ed5c867073c1bb3538790bccd0f97202fc0c5d
MD5          fa3ad835f4cd82722c3d6785a39e0db1
BLAKE2b-256  b8007a2d5be5ae34dd4fa025d8944f15f3245a475c82d5dd3a6a7d62d18ee6c7

Provenance

The following attestation bundles were made for hotpath-0.3.9.tar.gz:

Publisher: release.yml on alityb/hotpath
