AI-powered analysis for NVIDIA Nsight Systems profiles: web viewer, timeline, kernel navigator, NVTX hierarchy
nsys-ai
AI-powered analysis for NVIDIA Nsight Systems profiles
Navigate GPU kernel timelines, diagnose performance bottlenecks with AI, and explore NVTX hierarchies from your browser or terminal.
Mission: Build an intelligent agent that truly understands GPU performance from first principles. An agent that can identify pipeline bubbles, calculate MFU, assess arithmetic intensity, and diagnose the root causes that cost millions of dollars in GPU hours, turning months of expert debugging into minutes.
Install
pip install nsys-ai
That's it. No system dependencies, no CUDA required. Just Python 3.10+.
Web UI First (Default)
nsys-ai is web-first. The default command opens the timeline UI in your browser.
# Default: open web timeline UI
nsys-ai my_training.nsys-rep
# Explicit command (same web UI)
nsys-ai timeline-web my_training.nsys-rep
Use TUI/CLI modes when you specifically want terminal workflows.
What It Does
nsys-ai reads .nsys-rep or .sqlite profile exports from NVIDIA Nsight Systems and gives you a web-first workflow plus terminal and export tools:
- Web Timeline: Multi-GPU browser viewer with progressive rendering
- Timeline TUI: Perfetto-style horizontal timeline in your terminal
- Tree TUI: Interactive NVTX hierarchy browser with kernel details
- HTML Export: Exportable interactive visualizations for sharing
Quick Start
1. Get a profile
# Option A: Profile your own PyTorch training
nsys profile -o my_training python train.py
# → produces my_training.nsys-rep (or .sqlite via --export sqlite)
# Option B: Download an example profile
cd examples/example-20-megatron-distca
python download_data.py
# → downloads output/megatron_distca.nsys-rep
2. Explore it
# Start here: one command opens the web timeline in your browser
nsys-ai my_training.nsys-rep
# Or explicitly:
nsys-ai timeline-web my_training.nsys-rep
# Then use overview/summaries as needed
nsys-ai info my_training.nsys-rep
# GPU kernel summary
nsys-ai summary my_training.nsys-rep --gpu 0
Prefer a terminal? nsys-ai also has full TUI support:
nsys-ai timeline my_training.nsys-rep --gpu 0 --trim 39 42  # horizontal timeline
nsys-ai tui my_training.nsys-rep --gpu 0 --trim 39 42       # tree browser
3. Export & share
# Perfetto JSON (open in ui.perfetto.dev)
nsys-ai export my_training.sqlite -o traces/
# Interactive HTML viewer
nsys-ai viewer my_training.sqlite --gpu 0 --trim 39 42 -o report.html
# Flat CSV/JSON for scripting
nsys-ai export-csv my_training.sqlite --gpu 0 --trim 39 42 -o kernels.csv
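For orientation, the Perfetto UI also loads the Chrome Trace Event JSON format. The sketch below shows how kernel intervals could map to "complete" events (`"ph": "X"`) in that format. It is not nsys-ai's exporter: the `KernelRow` type and the `kernels_to_trace_events` helper are hypothetical, and timestamps are assumed to be in nanoseconds.

```python
import json
from dataclasses import dataclass


@dataclass
class KernelRow:
    """Hypothetical record for one GPU kernel execution."""
    name: str
    start_ns: int
    end_ns: int
    device_id: int
    stream_id: int


def kernels_to_trace_events(rows):
    """Map kernel rows to Chrome Trace Event 'complete' events.

    The trace format expects ts/dur in microseconds, so the
    (assumed) nanosecond timestamps are divided by 1000.
    """
    events = []
    for r in rows:
        events.append({
            "name": r.name,
            "ph": "X",                               # complete event: ts + dur
            "ts": r.start_ns / 1000.0,
            "dur": (r.end_ns - r.start_ns) / 1000.0,
            "pid": r.device_id,                      # one "process" per GPU
            "tid": r.stream_id,                      # one "thread" per stream
        })
    return {"traceEvents": events}


rows = [
    KernelRow("gemm", 1_000, 5_000, device_id=0, stream_id=7),
    KernelRow("allreduce", 6_000, 9_000, device_id=0, stream_id=8),
]
trace = kernels_to_trace_events(rows)
trace_json = json.dumps(trace)
```

Grouping by device as `pid` and stream as `tid` is one possible layout; it gives one track per CUDA stream under each GPU when the file is opened in a trace viewer.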
Web Timeline
The web timeline is a browser-based multi-GPU viewer with progressive rendering, so no --trim is required. This is the default view when you run nsys-ai <profile>.
# Just give it a profile; it opens in your browser
nsys-ai my_training.nsys-rep
# Or explicitly with GPU selection:
nsys-ai timeline-web my_training.nsys-rep --gpu 0 1 2 3
Features
- Multi-GPU stacked view: all GPUs shown simultaneously with color-coded separators
- Progressive rendering: pre-builds full NVTX tree at startup, then serves tiles instantly (~1ms per tile)
- NVTX hierarchy: layered bars (L0-L5) showing annotation nesting per GPU
- AI chat sidebar: press A to ask questions about the profile
- Kernel search: press / to search by kernel name
Navigation
| Input | Action |
|---|---|
| Swipe left/right | Pan through time |
| Swipe up/down | Scroll through GPU streams |
| Pinch | Zoom in / out |
| Shift+scroll | Zoom in / out |
| h l or ← → | Pan left / right |
| j k or ↑ ↓ | Select stream |
| + - | Zoom in / out |
| f or 0 | Fit all (full time range) |
| Tab | Next kernel |
| / | Search kernels |
| n | Toggle NVTX |
| a | AI Chat |
| ? | Help overlay |
Timeline TUI
Prefer working in the terminal? The timeline TUI is a Perfetto-style horizontal viewer with per-stream kernel visualization, NVTX hierarchy bars, and a time-cursor navigation model.
Navigation
| Key | Action |
|---|---|
| ← → | Pan through time |
| Shift+←/→ | Page pan (1/4 viewport) |
| ↑ ↓ | Select stream |
| Tab | Snap to next kernel |
| + - | Zoom in / out |
| a | Toggle absolute ↔ relative time |
Analysis
| Key | Action |
|---|---|
| / | Filter kernels by name |
| m | Set minimum duration threshold |
| d | Toggle demangled kernel names |
| C | Open config panel |
| h | Full help overlay |
Bookmarks
| Key | Action |
|---|---|
| B | Save bookmark (with kernel + NVTX context) |
| ' | Bookmark list; press 1-9 to jump |
| , . | Cycle through bookmarks |
| ` | Jump back to previous position |
| [ ] | Set range start / end |
Config Panel (C)
Tweak settings live with ↑/↓ to select and ←/→ to adjust:
- Selected stream rows (1-6)
- Other stream rows (1-4)
- Time tick density (2-20)
- NVTX depth levels (0-8)
- Min kernel duration filter
Claude Code plugin
nsys-ai ships as a Claude Code plugin: one slash command
/nsys-ai turns a profile into a root-cause + fix + annotated timeline. See
docs/claude-plugin-quickstart.md to install and
docs/claude-plugin.md for the full 9-mode reference.
Documentation
The docs/ directory includes comprehensive guides for Nsight Systems profiling:
| Guide | Topic |
|---|---|
| CLI Reference | Full nsys command reference |
| SQLite Schema | Database tables & relationships |
| NVTX Annotations | Adding markers to your code |
| CUDA Trace | GPU kernel tracing |
| NCCL Tracing | Multi-GPU collective analysis |
| Python/PyTorch | Profiling PyTorch workloads |
| Containers | Profiling inside Docker/Slurm |
| Focused Profiling | Targeted profiling strategies |
| CUTracer Instruction Analysis | Instruction-level drill-down for top kernels |
Interactive SQLite Schema Explorer
The docs/sqlite-explorer/ directory contains an interactive HTML tool for exploring the Nsight SQLite schema: tables, foreign keys, example queries, and key concepts. Open docs/sqlite-explorer/index.html in a browser:
- Browse all Nsight SQLite tables with column types
- See foreign key relationships visualized
- Copy-paste ready SQL query examples
- Cross-highlighted concept explanations
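As a small illustration of the kind of query the explorer documents, the sketch below ranks kernels by total GPU time using Python's built-in sqlite3 module. It builds a tiny in-memory database so it runs anywhere. The table and column names (`CUPTI_ACTIVITY_KIND_KERNEL`, `StringIds`, `shortName`, `start`, `end`, `deviceId`) are assumed to match a typical Nsight Systems SQLite export; check them against your own file, since the schema can differ between nsys versions.

```python
import sqlite3

# Toy stand-in for an exported profile; a real one would be opened with
# sqlite3.connect("my_training.sqlite") instead.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE StringIds (id INTEGER PRIMARY KEY, value TEXT);
CREATE TABLE CUPTI_ACTIVITY_KIND_KERNEL (
    start INTEGER, "end" INTEGER,
    deviceId INTEGER, streamId INTEGER, shortName INTEGER
);
INSERT INTO StringIds VALUES (1, 'gemm_kernel'), (2, 'ncclAllReduce');
INSERT INTO CUPTI_ACTIVITY_KIND_KERNEL VALUES
    (0,    4000,  0, 7, 1),
    (5000, 6000,  0, 7, 2),
    (7000, 12000, 0, 7, 1);
""")

# Kernel names are stored as integer ids that resolve through StringIds.
# "end" is quoted because END is an SQL keyword.
TOP_KERNELS_SQL = """
SELECT s.value                AS name,
       COUNT(*)               AS calls,
       SUM(k."end" - k.start) AS total_ns
FROM CUPTI_ACTIVITY_KIND_KERNEL AS k
JOIN StringIds AS s ON s.id = k.shortName
WHERE k.deviceId = ?
GROUP BY s.value
ORDER BY total_ns DESC
LIMIT ?
"""

top = conn.execute(TOP_KERNELS_SQL, (0, 10)).fetchall()
for name, calls, total_ns in top:
    print(f"{name:20s} calls={calls:3d} total={total_ns / 1e6:.3f} ms")
```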
Profile Diff
Compare two profiles side by side and spot regressions and improvements with a single command.
# Compare before and after a code change
nsys-ai diff before.sqlite after.sqlite
# Open interactive side-by-side web comparison
nsys-ai diff-web before.sqlite after.sqlite
# Focus on a specific GPU
nsys-ai diff before.sqlite after.sqlite --gpu 0
# Compare a specific time window
nsys-ai diff before.sqlite after.sqlite --trim 39 42
# Export as markdown (for GitHub issues)
nsys-ai diff before.sqlite after.sqlite --format markdown -o diff.md
# JSON output for scripting
nsys-ai diff before.sqlite after.sqlite --format json --no-ai
The report shows:
- Top regressions: kernels that got slower (by Δ time, %, or total)
- Top improvements: kernels that got faster
- New / removed kernels: workload changes across runs
- NVTX region diff: wall-time delta for annotated regions
- Overlap diff: compute/NCCL overlap and idle gap changes
- Per-GPU breakdown: when no --gpu is specified, shows every device
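To make the regression, improvement, and new/removed categories concrete, here is a minimal sketch of a per-kernel diff over two name-to-total-time mappings. It is an illustration only, not nsys-ai's implementation; `diff_kernel_totals` and the sample numbers are hypothetical.

```python
def percent_change(old_ns, new_ns):
    """Relative change in percent; infinite if the old total was zero."""
    if old_ns == 0:
        return float("inf")
    return 100.0 * (new_ns - old_ns) / old_ns


def diff_kernel_totals(before, after, limit=15):
    """Compare {kernel_name: total_ns} maps from two profiles."""
    common = before.keys() & after.keys()
    deltas = [
        (name, after[name] - before[name], percent_change(before[name], after[name]))
        for name in common
    ]
    # Largest slowdowns first, then largest speedups first.
    regressions = sorted((d for d in deltas if d[1] > 0), key=lambda d: -d[1])
    improvements = sorted((d for d in deltas if d[1] < 0), key=lambda d: d[1])
    return {
        "regressions": regressions[:limit],
        "improvements": improvements[:limit],
        "new": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
    }


before = {"gemm": 9_000, "layernorm": 2_000, "old_copy": 500}
after = {"gemm": 12_000, "layernorm": 1_500, "fused_attn": 3_000}
report = diff_kernel_totals(before, after)
```

Sorting by absolute delta corresponds to the idea behind a `delta` ordering; sorting on the third tuple element instead would give a percent-based ordering.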
Options:
| Flag | Default | Description |
|---|---|---|
| --gpu N | all GPUs | Focus on a specific device |
| --trim START END | - | Compare only this time window (seconds) |
| --format | terminal | terminal, markdown, or json |
| -o / --output | stdout | Write output to file |
| --limit N | 15 | Top regressions/improvements to show |
| --sort | delta | delta, percent, or total |
| --no-ai | - | Skip AI narration (numeric diff only) |
All Commands
| Command | Description |
|---|---|
| info | Profile metadata & GPU hardware |
| summary | Top kernels, stream breakdown, auto-commentary |
| overlap | Compute / NCCL overlap analysis |
| nccl | NCCL collective breakdown by type |
| iters | Auto-detect training iterations |
| tree | NVTX hierarchy as text |
| diff | Before/after profile comparison (CLI) |
| diff-web | Side-by-side comparison web viewer |
| tui | Interactive tree TUI |
| timeline | Interactive timeline TUI |
| timeline-web | Web-based multi-GPU timeline (progressive rendering) |
| cutracer | Instruction-level drill-down (check, install, plan, run, analyze) |
| search | Search kernels / NVTX by name |
| export | Perfetto JSON traces |
| export-csv | Flat CSV for spreadsheets |
| export-json | Flat JSON for scripting |
| viewer | Interactive HTML report |
| markdown | NVTX hierarchy as markdown |
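One way to think about compute/NCCL overlap is as interval arithmetic: how much of the time spent in communication kernels is hidden behind compute kernels running concurrently. The sketch below is a minimal illustration under that assumption, with each kernel given as a `(start_ns, end_ns)` pair; `merge_intervals` and `overlap_ns` are hypothetical helpers, not nsys-ai functions.

```python
def merge_intervals(intervals):
    """Collapse overlapping or touching intervals into disjoint ones."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def overlap_ns(a, b):
    """Total time covered by both interval sets (two-pointer sweep)."""
    a, b = merge_intervals(a), merge_intervals(b)
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if hi > lo:
            total += hi - lo
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


compute = [(0, 100), (90, 200), (300, 400)]   # compute kernels
nccl = [(150, 350)]                           # one long collective
hidden = overlap_ns(compute, nccl)
nccl_total = sum(end - start for start, end in merge_intervals(nccl))
hidden_fraction = hidden / nccl_total
```

In this toy example, half of the collective's duration runs while compute kernels are active; the other half is exposed communication time.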
Skills (Analysis Building Blocks)
nsys-ai ships with built-in analysis skills, self-contained analysis units that work without any LLM:
# List all available skills
nsys-ai skill list
# Run a specific skill
nsys-ai skill run top_kernels profile.sqlite
nsys-ai skill run gpu_idle_gaps profile.sqlite
nsys-ai skill run nccl_breakdown profile.sqlite
| Skill | What it does |
|---|---|
| top_kernels | Heaviest GPU kernels by total time |
| memory_transfers | H2D/D2H/D2D transfer breakdown |
| nvtx_kernel_map | NVTX annotation → kernel mapping |
| gpu_idle_gaps | Pipeline bubbles between kernels |
| nccl_breakdown | NCCL collective operation summary |
| kernel_launch_overhead | CPU→GPU dispatch latency |
| thread_utilization | CPU thread bottleneck detection |
| schema_inspect | Database tables and columns |
| module_loading | JIT compilation & module loading stalls |
| gc_impact | GC & memory allocation stalls |
| pipeline_bubble_metrics | True GPU idle percentage per device |
| cutracer_analysis | Correlate CUTracer instruction mix with nsys/NVTX context |
Skills are extensible: add your own by creating a Python file that exports a SKILL constant.
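To illustrate the kind of measurement behind a skill such as gpu_idle_gaps, the sketch below finds idle gaps between kernels on one device and derives an idle percentage. It is a standalone illustration that deliberately does not use the nsys-ai skill interface (the shape of the SKILL constant is not shown here); `find_idle_gaps` and the `min_gap_ns` threshold are hypothetical.

```python
def find_idle_gaps(kernels, min_gap_ns=1_000):
    """Return (gap_start, gap_end) pairs where no kernel is running.

    `kernels` is an iterable of (start_ns, end_ns) pairs for one device.
    Gaps shorter than `min_gap_ns` are ignored.
    """
    gaps = []
    busy_until = None
    for start, end in sorted(kernels):
        if busy_until is not None and start - busy_until >= min_gap_ns:
            gaps.append((busy_until, start))
        # Track the latest end seen so overlapping kernels don't create fake gaps.
        busy_until = end if busy_until is None else max(busy_until, end)
    return gaps


kernels = [(0, 5_000), (4_000, 6_000), (10_000, 12_000), (12_200, 13_000)]
gaps = find_idle_gaps(kernels)

# Idle share of the profiled span, first kernel start to last kernel end.
span_ns = max(end for _, end in kernels) - min(start for start, _ in kernels)
idle_pct = 100.0 * sum(end - start for start, end in gaps) / span_ns
```

Here the 200 ns pause before the last kernel falls under the threshold and is not reported, while the 4 µs hole between 6,000 and 10,000 ns is.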
AI Agent
The agent is a CUDA ML systems expert that runs skills automatically and diagnoses problems:
# Full auto-analysis
nsys-ai agent analyze profile.sqlite
# Ask a specific question
nsys-ai agent ask profile.sqlite "why are there bubbles in the pipeline?"
nsys-ai agent ask profile.sqlite "is NCCL overlapping with compute?"
With pip install nsys-ai[agent], the agent can use an LLM to synthesize natural language analysis from skill results.
Install Tiers
pip install nsys-ai # Core: CLI + TUI + skills (rich + textual)
pip install nsys-ai[agent] # + LLM-backed agent analysis (requires anthropic)
pip install nsys-ai[cutracer] # + CUTracer instruction-level workflow
pip install nsys-ai[all] # Everything
AI Analysis (Optional)
nsys-ai includes an optional AI module that can analyze your profiles:
pip install nsys-ai[ai]
- Auto-commentary on kernel distributions and performance patterns
- NVTX annotation suggestions for un-annotated code regions
- Performance bottleneck detection with actionable recommendations
- Framework fingerprinting that statically identifies distributed stacks (vLLM, Megatron-LM, DeepSpeed) and cluster networking hardware (Mellanox, Broadcom), so that AI recommendations match the context of your setup
Development
git clone https://github.com/GindaChen/nsys-ai.git
cd nsys-ai
pip install -e '.[dev]'
pytest tests/ -v
License
MIT. See LICENSE.