AI-powered analysis for NVIDIA Nsight Systems profiles: web viewer, timeline, kernel navigator, NVTX hierarchy

🔬 nsys-ai

AI-powered analysis for NVIDIA Nsight Systems profiles

Navigate GPU kernel timelines, diagnose performance bottlenecks with AI, and explore NVTX hierarchies from your browser or terminal.

Mission: Build an intelligent agent that truly understands GPU performance from first principles. An agent that can identify pipeline bubbles, calculate MFU, assess arithmetic intensity, and diagnose the root causes that cost millions of dollars in GPU hours, turning months of expert debugging into minutes.


⚡ Install

pip install nsys-ai

That's it. No system dependencies, no CUDA required. Just Python 3.10+.


๐ŸŒ Web UI First (Default)

nsys-ai is web-first. The default command opens the timeline UI in your browser.

# Default: open web timeline UI
nsys-ai my_training.nsys-rep

# Explicit command (same web UI)
nsys-ai timeline-web my_training.nsys-rep

Use TUI/CLI modes when you specifically want terminal workflows.


🎯 What It Does

nsys-ai reads .nsys-rep or .sqlite profile exports from NVIDIA Nsight Systems and gives you a web-first workflow plus terminal and export tools:

๐ŸŒ Web Timeline

Multi-GPU browser viewer with progressive rendering

๐Ÿ–ฅ๏ธ Timeline TUI

Perfetto-style horizontal timeline in your terminal

๐ŸŒฒ Tree TUI

Interactive NVTX hierarchy browser with kernel details

๐Ÿ“„ HTML Export

Exportable interactive visualizations for sharing

Browser-based viewer:
โ€ข Multi-GPU stacked streams
โ€ข NVTX hierarchy bars
โ€ข Pinch-to-zoom, trackpad pan
โ€ข AI chat sidebar

S21 โ–ˆโ–ˆโ–ˆโ–ˆโ–‘โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–‘โ–ˆโ–ˆโ–ˆ
S56 โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–‘โ–‘โ–‘โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆ
S60 โ–‘โ–‘โ–‘โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–‘โ–‘โ–‘โ–‘โ–‘โ–ˆโ–ˆ
    |         โ”‚
    39.1s   39.5s
โ–ผ Iteration (324ms)
  โ–ผ forward (180ms)
    โ–ผ Attention (89ms)
      โ–  flash_fwd  26ms
      โ–  flash_bwd  63ms

Interactive HTML exports:
โ€ข NVTX stack viewer
โ€ข SQLite schema explorer
โ€ข Perfetto JSON traces


🚀 Quick Start

1. Get a profile

# Option A: Profile your own PyTorch training
nsys profile -o my_training python train.py
# → produces my_training.nsys-rep  (or .sqlite via --export sqlite)

# Option B: Download an example profile
cd examples/example-20-megatron-distca
python download_data.py
# → downloads output/megatron_distca.nsys-rep

2. Explore it

# Start here: one command opens the web timeline in your browser
nsys-ai my_training.nsys-rep

# Or explicitly:
nsys-ai timeline-web my_training.nsys-rep

# Then use overview/summaries as needed
nsys-ai info my_training.nsys-rep

# GPU kernel summary
nsys-ai summary my_training.nsys-rep --gpu 0

Prefer a terminal? nsys-ai also has full TUI support:

nsys-ai timeline my_training.nsys-rep --gpu 0 --trim 39 42  # horizontal timeline
nsys-ai tui my_training.nsys-rep --gpu 0 --trim 39 42       # tree browser

3. Export & share

# Perfetto JSON (open in ui.perfetto.dev)
nsys-ai export my_training.sqlite -o traces/

# Interactive HTML viewer
nsys-ai viewer my_training.sqlite --gpu 0 --trim 39 42 -o report.html

# Flat CSV/JSON for scripting
nsys-ai export-csv my_training.sqlite --gpu 0 --trim 39 42 -o kernels.csv
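
As a rough illustration of what a Perfetto-compatible JSON trace looks like, the sketch below converts a few kernel records into Chrome Trace Event "complete" events (`ph: "X"`), which ui.perfetto.dev can load. The record fields (`start_ns`, `end_ns`, `gpu`, `stream`), the function name, and the GPU-to-`pid` / stream-to-`tid` mapping are assumptions made for this example and are not taken from the nsys-ai exporter.

```python
import json


def kernels_to_trace_events(kernels):
    """Convert kernel records (nanosecond timestamps) into trace events."""
    events = []
    for k in kernels:
        events.append({
            "name": k["name"],
            "ph": "X",                                      # complete event: ts + dur
            "ts": k["start_ns"] / 1000.0,                   # trace format uses microseconds
            "dur": (k["end_ns"] - k["start_ns"]) / 1000.0,
            "pid": k["gpu"],                                # assumption: one "process" per GPU
            "tid": k["stream"],                             # assumption: one "thread" per stream
        })
    return {"traceEvents": events}


# Hypothetical sample data, loosely modelled on the tree preview above.
sample = [
    {"name": "flash_fwd", "start_ns": 1_000_000, "end_ns": 27_000_000, "gpu": 0, "stream": 21},
    {"name": "flash_bwd", "start_ns": 30_000_000, "end_ns": 93_000_000, "gpu": 0, "stream": 21},
]

trace = kernels_to_trace_events(sample)
print(json.dumps(trace, indent=2))
```

Saving the printed JSON to a file and opening it in ui.perfetto.dev should show one track per stream under each GPU.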

๐ŸŒ Web Timeline

The web timeline is a browser-based multi-GPU viewer with progressive rendering โ€” no --trim required. This is the default view when you run nsys-ai <profile>.

# Just give it a profile โ€” opens in your browser
nsys-ai my_training.nsys-rep

# Or explicitly with GPU selection:
nsys-ai timeline-web my_training.nsys-rep --gpu 0 1 2 3

Features

  • Multi-GPU stacked view: all GPUs shown simultaneously with color-coded separators
  • Progressive rendering: pre-builds the full NVTX tree at startup, then serves tiles instantly (~1 ms per tile)
  • NVTX hierarchy: layered bars (L0–L5) showing annotation nesting per GPU
  • AI chat sidebar: press A to ask questions about the profile
  • Kernel search: press / to search by kernel name

Navigation

| Input | Action |
| --- | --- |
| Swipe left/right | Pan through time |
| Swipe up/down | Scroll through GPU streams |
| Pinch | Zoom in / out |
| Shift+scroll | Zoom in / out |
| h l or ← → | Pan left / right |
| j k or ↑ ↓ | Select stream |
| + - | Zoom in / out |
| f or 0 | Fit all (full time range) |
| Tab | Next kernel |
| / | Search kernels |
| n | Toggle NVTX |
| a | AI Chat |
| ? | Help overlay |

โŒจ๏ธ Timeline TUI

Prefer working in the terminal? The timeline TUI is a Perfetto-style horizontal viewer with per-stream kernel visualization, NVTX hierarchy bars, and a time-cursor navigation model.

Navigation

| Key | Action |
| --- | --- |
| ← → | Pan through time |
| Shift+←/→ | Page pan (1/4 viewport) |
| ↑ ↓ | Select stream |
| Tab | Snap to next kernel |
| + - | Zoom in / out |
| a | Toggle absolute ↔ relative time |

Analysis

| Key | Action |
| --- | --- |
| / | Filter kernels by name |
| m | Set minimum duration threshold |
| d | Toggle demangled kernel names |
| C | Open config panel |
| h | Full help overlay |

Bookmarks

| Key | Action |
| --- | --- |
| B | Save bookmark (with kernel + NVTX context) |
| ' | Bookmark list (press 1-9 to jump) |
| , . | Cycle through bookmarks |
| ` | Jump back to previous position |
| [ ] | Set range start / end |

Config Panel (C)

Tweak settings live with ↑/↓ to select and ←/→ to adjust:

  • Selected stream rows (1-6)
  • Other stream rows (1-4)
  • Time tick density (2-20)
  • NVTX depth levels (0-8)
  • Min kernel duration filter

🤖 Claude Code plugin

nsys-ai ships as a Claude Code plugin: one slash command /nsys-ai turns a profile into a root-cause + fix + annotated timeline. See docs/claude-plugin-quickstart.md to install and docs/claude-plugin.md for the full 9-mode reference.


📚 Documentation

The docs/ directory includes comprehensive guides for Nsight Systems profiling:

| Guide | Topic |
| --- | --- |
| CLI Reference | Full nsys command reference |
| SQLite Schema | Database tables & relationships |
| NVTX Annotations | Adding markers to your code |
| CUDA Trace | GPU kernel tracing |
| NCCL Tracing | Multi-GPU collective analysis |
| Python/PyTorch | Profiling PyTorch workloads |
| Containers | Profiling inside Docker/Slurm |
| Focused Profiling | Targeted profiling strategies |
| CUTracer Instruction Analysis | Instruction-level drill-down for top kernels |

๐Ÿ” Interactive SQLite Schema Explorer

The docs/sqlite-explorer/ contains an interactive HTML tool for exploring the Nsight SQLite schema โ€” tables, foreign keys, example queries, and key concepts. Open docs/sqlite-explorer/index.html in a browser:

  • Browse all Nsight SQLite tables with column types
  • See foreign key relationships visualized
  • Copy-paste ready SQL query examples
  • Cross-highlighted concept explanations
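
For a feel of the kind of query the schema supports, here is a minimal sketch that ranks kernels by total GPU time using Python's built-in sqlite3 module. It builds a tiny in-memory database so it runs without a real profile. The table and column names (`CUPTI_ACTIVITY_KIND_KERNEL`, `StringIds`, `shortName`, `deviceId`) follow the layout commonly seen in Nsight Systems SQLite exports, but treat them as assumptions and check them against your own export, since the schema can differ between Nsight versions. The sample rows are made up.

```python
import sqlite3

# Tiny stand-in for an exported profile; a real file would be opened with
# sqlite3.connect("my_training.sqlite") instead.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE StringIds (id INTEGER PRIMARY KEY, value TEXT);
CREATE TABLE CUPTI_ACTIVITY_KIND_KERNEL (
    start INTEGER, "end" INTEGER, deviceId INTEGER,
    streamId INTEGER, shortName INTEGER
);
INSERT INTO StringIds VALUES (1, 'flash_fwd'), (2, 'flash_bwd');
INSERT INTO CUPTI_ACTIVITY_KIND_KERNEL VALUES
    (0,         26000000,  0, 21, 1),
    (30000000,  93000000,  0, 21, 2),
    (100000000, 126000000, 0, 21, 1);
""")

# Kernel names are stored as integer ids, so join against StringIds.
# Timestamps are in nanoseconds.
TOP_KERNELS_SQL = """
SELECT s.value AS name,
       COUNT(*) AS calls,
       SUM(k."end" - k.start) AS total_ns
FROM CUPTI_ACTIVITY_KIND_KERNEL AS k
JOIN StringIds AS s ON s.id = k.shortName
WHERE k.deviceId = ?
GROUP BY s.value
ORDER BY total_ns DESC
LIMIT ?
"""

rows = conn.execute(TOP_KERNELS_SQL, (0, 10)).fetchall()
for name, calls, total_ns in rows:
    print(f"{name:12s} calls={calls} total={total_ns / 1e6:.1f} ms")
```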

🔀 Profile Diff

Compare two profiles side by side and spot regressions and improvements with a single command.

# Compare before and after a code change
nsys-ai diff before.sqlite after.sqlite

# Open interactive side-by-side web comparison
nsys-ai diff-web before.sqlite after.sqlite

# Focus on a specific GPU
nsys-ai diff before.sqlite after.sqlite --gpu 0

# Compare a specific time window
nsys-ai diff before.sqlite after.sqlite --trim 39 42

# Export as markdown (for GitHub issues)
nsys-ai diff before.sqlite after.sqlite --format markdown -o diff.md

# JSON output for scripting
nsys-ai diff before.sqlite after.sqlite --format json --no-ai

The report shows:

  • Top regressions: kernels that got slower (by Δ time, %, or total)
  • Top improvements: kernels that got faster
  • New / removed kernels: workload changes across runs
  • NVTX region diff: wall-time delta for annotated regions
  • Overlap diff: compute/NCCL overlap and idle gap changes
  • Per-GPU breakdown: when no --gpu is specified, shows every device

Options:

| Flag | Default | Description |
| --- | --- | --- |
| --gpu N | all GPUs | Focus on a specific device |
| --trim START END | - | Compare only this time window (seconds) |
| --format | terminal | terminal, markdown, or json |
| -o / --output | stdout | Write output to file |
| --limit N | 15 | Top regressions/improvements to show |
| --sort | delta | delta, percent, or total |
| --no-ai | - | Skip AI narration (numeric diff only) |
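
To make the numeric part of the report concrete, the following is a minimal sketch of a per-kernel diff: it totals time per kernel name in each profile, computes the delta and percent change, flags new and removed kernels, and ranks the result by one of three keys that mirror the --sort choices. The function names, the record shape (name, duration in ns), and the exact ranking rules are assumptions for illustration. The actual nsys-ai implementation may differ.

```python
from collections import defaultdict


def total_by_kernel(kernels):
    """Sum durations (ns) per kernel name from (name, duration_ns) pairs."""
    totals = defaultdict(int)
    for name, dur_ns in kernels:
        totals[name] += dur_ns
    return dict(totals)


def diff_kernels(before, after, sort="delta", limit=15):
    """Return per-kernel diff rows, largest change first."""
    b, a = total_by_kernel(before), total_by_kernel(after)
    rows = []
    for name in sorted(set(b) | set(a)):
        tb, ta = b.get(name, 0), a.get(name, 0)
        delta = ta - tb
        # Percent change is undefined for kernels absent from the baseline.
        pct = (delta / tb * 100.0) if tb else None
        status = "new" if name not in b else "removed" if name not in a else "common"
        rows.append({"name": name, "before_ns": tb, "after_ns": ta,
                     "delta_ns": delta, "percent": pct, "status": status})
    sort_keys = {
        "delta": lambda r: abs(r["delta_ns"]),
        "percent": lambda r: abs(r["percent"]) if r["percent"] is not None else float("inf"),
        "total": lambda r: r["after_ns"],
    }
    rows.sort(key=sort_keys[sort], reverse=True)
    return rows[:limit]


# Hypothetical sample data.
before = [("gemm", 100), ("flash_fwd", 50), ("old_kernel", 10)]
after = [("gemm", 150), ("flash_fwd", 40), ("new_kernel", 5)]

rows = diff_kernels(before, after, sort="delta")
regressions = [r for r in rows if r["delta_ns"] > 0]
improvements = [r for r in rows if r["delta_ns"] < 0]
for r in rows:
    print(r["name"], r["delta_ns"], r["percent"], r["status"])
```

Splitting the ranked rows by the sign of the delta gives the regressions and improvements lists. New kernels have no percent change because there is no baseline to divide by.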

๐Ÿ› ๏ธ All Commands

Command Description
info Profile metadata & GPU hardware
summary Top kernels, stream breakdown, auto-commentary
overlap Compute / NCCL overlap analysis
nccl NCCL collective breakdown by type
iters Auto-detect training iterations
tree NVTX hierarchy as text
diff Before/after profile comparison (CLI)
diff-web Side-by-side comparison web viewer
tui Interactive tree TUI
timeline Interactive timeline TUI
timeline-web Web-based multi-GPU timeline (progressive rendering)
cutracer Instruction-level drill-down (check, install, plan, run, analyze)
search Search kernels / NVTX by name
export Perfetto JSON traces
export-csv Flat CSV for spreadsheets
export-json Flat JSON for scripting
viewer Interactive HTML report
markdown NVTX hierarchy as markdown

🧩 Skills (Analysis Building Blocks)

nsys-ai ships with built-in analysis skills, self-contained analysis units that work without any LLM:

# List all available skills
nsys-ai skill list

# Run a specific skill
nsys-ai skill run top_kernels profile.sqlite
nsys-ai skill run gpu_idle_gaps profile.sqlite
nsys-ai skill run nccl_breakdown profile.sqlite

| Skill | What it does |
| --- | --- |
| top_kernels | Heaviest GPU kernels by total time |
| memory_transfers | H2D/D2H/D2D transfer breakdown |
| nvtx_kernel_map | NVTX annotation → kernel mapping |
| gpu_idle_gaps | Pipeline bubbles between kernels |
| nccl_breakdown | NCCL collective operation summary |
| kernel_launch_overhead | CPU→GPU dispatch latency |
| thread_utilization | CPU thread bottleneck detection |
| schema_inspect | Database tables and columns |
| module_loading | JIT compilation & module loading stalls |
| gc_impact | GC & memory allocation stalls |
| pipeline_bubble_metrics | True GPU idle percentage per device |
| cutracer_analysis | Correlate CUTracer instruction mix with nsys/NVTX context |

Skills are extensible: add your own by creating a Python file that exports a SKILL constant.
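
To illustrate the idea behind idle-gap and bubble metrics such as gpu_idle_gaps and pipeline_bubble_metrics, the sketch below merges overlapping kernel intervals from all streams of one GPU into busy spans, reports the gaps between them, and derives an idle percentage over the profiled span. It is a standalone illustration under stated assumptions, not the skills' actual code: the function names and the `min_gap_ns` threshold are hypothetical, and it does not use the SKILL plugin interface.

```python
def merge_intervals(intervals):
    """Merge overlapping (start, end) intervals into sorted busy spans."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous span: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def idle_gaps(intervals, min_gap_ns=0):
    """Return (gap_start, gap_end) pairs where no kernel is running."""
    busy = merge_intervals(intervals)
    gaps = []
    for (_, prev_end), (next_start, _) in zip(busy, busy[1:]):
        if next_start - prev_end > min_gap_ns:
            gaps.append((prev_end, next_start))
    return gaps


def idle_percent(intervals):
    """Idle time as a percentage of the span from first start to last end."""
    busy = merge_intervals(intervals)
    if not busy:
        return 0.0
    span = busy[-1][1] - busy[0][0]
    busy_ns = sum(end - start for start, end in busy)
    return 100.0 * (span - busy_ns) / span if span else 0.0


# Hypothetical kernel intervals (ns) from two streams on one GPU.
kernels = [(0, 40), (30, 60), (80, 100)]

gaps = idle_gaps(kernels)
pct = idle_percent(kernels)
print(gaps, pct)
```

Merging across streams first matters: a gap on one stream is not a bubble if another stream on the same GPU is busy during it.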


🤖 AI Agent

The agent is a CUDA ML systems expert that runs skills automatically and diagnoses problems:

# Full auto-analysis
nsys-ai agent analyze profile.sqlite

# Ask a specific question
nsys-ai agent ask profile.sqlite "why are there bubbles in the pipeline?"
nsys-ai agent ask profile.sqlite "is NCCL overlapping with compute?"

With pip install 'nsys-ai[agent]', the agent can use an LLM to synthesize natural-language analysis from skill results.


📦 Install Tiers

# Quote the extras so shells such as zsh do not expand the brackets
pip install nsys-ai              # Core: CLI + TUI + skills (rich + textual)
pip install 'nsys-ai[agent]'     # + LLM-backed agent analysis (requires anthropic)
pip install 'nsys-ai[cutracer]'  # + CUTracer instruction-level workflow
pip install 'nsys-ai[all]'       # Everything

🤖 AI Analysis (Optional)

nsys-ai includes an optional AI module that can analyze your profiles:

pip install 'nsys-ai[ai]'

  • Auto-commentary on kernel distributions and performance patterns
  • NVTX annotation suggestions for un-annotated code regions
  • Performance bottleneck detection with actionable recommendations
  • Framework fingerprinting that statically identifies distributed stacks (vLLM, Megatron-LM, DeepSpeed) and cluster networking hardware (Mellanox, Broadcom), so AI recommendations match the deployment context

๐Ÿง‘โ€๐Ÿ’ป Development

git clone https://github.com/GindaChen/nsys-ai.git
cd nsys-ai
pip install -e '.[dev]'
pytest tests/ -v

📄 License

MIT. See LICENSE.

Built for GPU performance engineers.
