Profile vLLM inference under RL-style rollout workloads.

Maintainers

alityb

Project description

hotpath

Profiler for LLM inference.

hotpath profiles live vLLM and SGLang servers, analyzes request and GPU behavior, and recommends when to split prefill and decode.

What it does

- Profile a live endpoint with real traffic
- Analyze queueing, prefill, decode, cache, and batching
- Recommend disaggregation and generate deployment configs

Install

uv tool install hotpath

Quick start

Profile a live vLLM server:

hotpath serve-profile \
  --endpoint http://localhost:8000 \
  --traffic prompts.jsonl \
  --concurrency 4 \
  --duration 60 \
  --output .hotpath/run

View the report:

hotpath serve-report .hotpath/run/serve_profile.db
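
The "How it works" section says hotpath stores results in SQLite, so the profile database can presumably also be opened with any SQLite client. The sketch below makes no assumption about hotpath's schema, which is not documented here: it only discovers whatever tables exist and counts their rows. The helper name `list_tables` is hypothetical and not part of hotpath.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return {table_name: row_count} for every user table in a SQLite file."""
    # Open read-only so inspection cannot modify a saved profile.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
            # Skip SQLite's own bookkeeping tables.
            if not row[0].startswith("sqlite_")
        ]
        counts = {}
        for name in names:
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        return counts
    finally:
        conn.close()
```

Calling `list_tables(".hotpath/run/serve_profile.db")` after a profiling run would show which tables the run produced, which is a reasonable starting point before writing ad hoc queries.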

Generate deployment configs:

hotpath disagg-config .hotpath/run/serve_profile.db --format all

If you want server-side request timing, start vLLM with debug logs and pass the log file:

VLLM_LOGGING_LEVEL=DEBUG vllm serve <model> 2>vllm.log &

hotpath serve-profile \
  --endpoint http://localhost:8000 \
  --traffic prompts.jsonl \
  --server-log vllm.log \
  --concurrency 4 \
  --duration 60

If you want kernel-level GPU traces, add --nsys:

hotpath serve-profile \
  --endpoint http://localhost:8000 \
  --traffic prompts.jsonl \
  --nsys

Traffic format

JSONL, one request per line:

{"prompt": "Explain KV cache eviction policy.", "max_tokens": 256}
{"prompt": "Write a Python retry decorator with exponential backoff.", "max_tokens": 400}

ShareGPT format is also supported.
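
A traffic file in the JSONL shape above can be generated from a script. This is a minimal sketch that uses only the two fields shown in the example (`prompt` and `max_tokens`); the function name `write_traffic` and its validation rules are illustrative assumptions, not part of hotpath.

```python
import json
from pathlib import Path


def write_traffic(path, requests):
    """Write (prompt, max_tokens) pairs as JSONL, one request per line.

    Returns the number of requests written.
    """
    lines = []
    for prompt, max_tokens in requests:
        # Assumed sanity checks; hotpath's own validation is not documented here.
        if not isinstance(prompt, str) or not prompt:
            raise ValueError("prompt must be a non-empty string")
        if not isinstance(max_tokens, int) or max_tokens <= 0:
            raise ValueError("max_tokens must be a positive integer")
        lines.append(
            json.dumps({"prompt": prompt, "max_tokens": max_tokens}, ensure_ascii=False)
        )
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)
```

For example, `write_traffic("prompts.jsonl", [("Explain KV cache eviction policy.", 256)])` produces a one-line file that can be passed to `--traffic`.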

Commands

Command	Description
`serve-profile`	Profile a live vLLM or SGLang server
`serve-report`	Print a serving analysis report
`disagg-config`	Generate deployment configs for disaggregated serving
`profile`	Run GPU kernel profiling under RL-style traffic
`report`	View a saved kernel profile
`diff`	Compare two kernel profiles
`bench`	Benchmark individual GPU kernel implementations
`export`	Export profile data to JSON, CSV, or OTLP
`doctor`	Check local profiling environment
`lock-clocks`	Lock GPU clocks for reproducible measurements

System requirements

- Linux
- NVIDIA GPU with CUDA driver
- nsys for kernel profiling
- vLLM or SGLang for serving analysis

Build from source

cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build --parallel
ctest --test-dir build --output-on-failure

Install from source:

uv tool install .

Requirements: CMake 3.28+, C++20 compiler, SQLite3.

How it works

hotpath stores results in SQLite and combines three data sources:

- Kernel traces from nsys
- Server metrics from /metrics
- Request lifecycle timing from client traces and vLLM debug logs

The report turns those signals into latency breakdowns, cache analysis, prefix-sharing analysis, and a disaggregation recommendation.
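
To get a feel for the second data source, the `/metrics` endpoint can be read by hand. The sketch below assumes the endpoint returns the Prometheus text exposition format, which is the usual convention for a `/metrics` path. It is not hotpath's own collector, the function names are hypothetical, and the parser is deliberately simple (it keeps the label set as part of the series name and ignores timestamps).

```python
from urllib.request import urlopen


def parse_metrics(text):
    """Parse Prometheus-style text into a list of (series, value) pairs."""
    samples = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        if "}" in line:
            # Labelled sample: everything up to the last "}" is the series.
            head, _, rest = line.rpartition("}")
            series = head + "}"
        else:
            parts = line.split(None, 1)
            series = parts[0]
            rest = parts[1] if len(parts) > 1 else ""
        fields = rest.split()
        if not fields:
            continue
        # The first field is the value; an optional timestamp may follow.
        samples.append((series, float(fields[0])))
    return samples


def fetch_metrics(endpoint, timeout=5.0):
    """Fetch and parse <endpoint>/metrics, e.g. http://localhost:8000."""
    with urlopen(endpoint.rstrip("/") + "/metrics", timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

Running `fetch_metrics("http://localhost:8000")` against a live server lists every exposed series, which helps when checking what a serving engine reports before profiling it.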

Release notes

See CHANGELOG.md.

Release history

Version	Release date
0.3.9	Apr 7, 2026
0.3.8	Apr 6, 2026 (this version)
0.3.3	Apr 6, 2026
0.3.2	Apr 6, 2026
0.3.1	Apr 6, 2026
0.3.0	Apr 6, 2026
0.2.9	Apr 5, 2026
0.2.8	Apr 5, 2026
0.2.7	Apr 5, 2026
0.2.6	Apr 5, 2026
0.2.5	Apr 5, 2026
0.2.4	Apr 5, 2026
0.2.3	Apr 5, 2026
0.2.2	Apr 5, 2026
0.2.0	Apr 5, 2026

Download files

Source Distribution

hotpath-0.3.8.tar.gz (558.5 kB)

Uploaded Apr 6, 2026 Source

File details

Details for the file hotpath-0.3.8.tar.gz.

File metadata

Download URL: hotpath-0.3.8.tar.gz
Upload date: Apr 6, 2026
Size: 558.5 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for hotpath-0.3.8.tar.gz
Algorithm	Hash digest
SHA256	`52079d95a04c70f001415f0d9cb839cae3f7f1f5fe7841a0dcc0a378fa972a6b`
MD5	`0b315a3a0e96d798ca120d71e05291ca`
BLAKE2b-256	`e89ff0c1ab67f07201cd3cc78d43cc72b39e3dfb413aa39b82d759b6caad3ee6`
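
A downloaded archive can be checked against the SHA256 digest listed above. This is a minimal sketch using Python's standard `hashlib`; the helper names `sha256_of` and `verify` are illustrative, and the expected digest is copied from the table above.

```python
import hashlib
import hmac

# SHA256 digest published for hotpath-0.3.8.tar.gz (copied from the table above).
EXPECTED_SHA256 = "52079d95a04c70f001415f0d9cb839cae3f7f1f5fe7841a0dcc0a378fa972a6b"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest matches the expected hex string."""
    return hmac.compare_digest(sha256_of(path), expected.strip().lower())
```

`verify("hotpath-0.3.8.tar.gz")` returns `True` only when the local file matches the published digest; a `False` result means the file should not be installed.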

Provenance

The following attestation bundles were made for hotpath-0.3.8.tar.gz:

Publisher: release.yml on alityb/hotpath

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: hotpath-0.3.8.tar.gz
- Subject digest: 52079d95a04c70f001415f0d9cb839cae3f7f1f5fe7841a0dcc0a378fa972a6b
- Sigstore transparency entry: 1242498735
- Sigstore integration time: Apr 6, 2026
Source repository:
- Permalink: alityb/hotpath@ae6f4758c1712ab26ad3c93917c06dfa42f8806c
- Branch / Tag: refs/tags/v0.3.8
- Owner: https://github.com/alityb
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@ae6f4758c1712ab26ad3c93917c06dfa42f8806c
- Trigger Event: release
