topdown-profiler
CPU Top-Down Microarchitecture Analysis (TMA) collector for Intel and ARM Neoverse, with MCP server, label-based querying, and pluggable SQL backends.
Wraps pmu-tools/toplev on Intel or perf stat --topdown on ARM to collect, store, and query CPU performance data — like Polar Signals but for hardware performance counters.
What is Top-Down Microarchitecture Analysis?
TMA classifies every CPU pipeline slot into four categories that sum to 100%:
Pipeline Slots (100%)
├── Frontend_Bound 15.2% ███████ Instruction supply problems
├── Bad_Speculation 10.1% █████ Branch mispredictions, machine clears
├── Backend_Bound 44.6% ██████████████ Data supply / execution bottlenecks
│ ├── Memory_Bound 30.2% ███████████ Cache misses, DRAM latency
│ │ ├── L1_Bound 5.1% ██
│ │ ├── L3_Bound 12.4% ██████
│ │ └── DRAM_Bound 8.3% ████
│ └── Core_Bound 14.4% ███████ Port contention, dividers
└── Retiring 30.1% ███████████ Useful work (higher = better)
This tool collects that data, stores it with labels (branch, test name, topology, etc.), and lets you query it from the CLI or via AI assistants through MCP.
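The four top-level categories fall out of a handful of core PMU counters. A minimal sketch of the standard Level-1 TMA formulas, assuming four issue slots per cycle as on most Intel big cores (the parameter names map to counters such as CPU_CLK_UNHALTED.THREAD, UOPS_ISSUED.ANY, UOPS_RETIRED.RETIRE_SLOTS, INT_MISC.RECOVERY_CYCLES, and IDQ_UOPS_NOT_DELIVERED.CORE; this is an illustration, not the tool's code):

```python
def tma_level1(cycles, uops_issued, uops_retired_slots,
               recovery_cycles, fe_undelivered_slots, width=4):
    """Classify pipeline slots into the four top-level TMA categories.

    width: issue slots per cycle (4 on most Intel big cores).
    Returns fractions of total slots that sum to 1.0.
    """
    slots = width * cycles
    frontend_bound = fe_undelivered_slots / slots
    # Issued-but-not-retired uops plus recovery bubbles = wasted speculation.
    bad_speculation = (uops_issued - uops_retired_slots
                       + width * recovery_cycles) / slots
    retiring = uops_retired_slots / slots
    # Backend_Bound is the remainder: the frontend delivered uops,
    # but the backend could not accept them.
    backend_bound = 1.0 - frontend_bound - bad_speculation - retiring
    return {"Frontend_Bound": frontend_bound,
            "Bad_Speculation": bad_speculation,
            "Backend_Bound": backend_bound,
            "Retiring": retiring}
```

Levels 2 and 3 subdivide these buckets with additional counters, which is what `--level 3` collects.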
Install
pip install topdown-profiler
# Or from source
git clone https://github.com/redis-performance/topdown-profiler.git
cd topdown-profiler
poetry install
Prerequisites
- Linux with perf tools installed
- Intel CPU (Sandy Bridge or newer) or ARM Neoverse (Graviton3/4)
- pmu-tools installed via pip install pmu-tools (Intel only)
- perf_event_paranoid <= 1 (or run as root)
# Check permissions
cat /proc/sys/kernel/perf_event_paranoid
# If > 1, fix with:
sudo sysctl kernel.perf_event_paranoid=1
ARM Neoverse Prerequisites
- Linux kernel 5.15+ with ARM PMU perf support
- perf tools installed (apt install linux-tools-$(uname -r) or yum install perf)
- perf_event_paranoid <= 1 (same as Intel)
- No pmu-tools required; uses perf stat --topdown directly
- L1 topdown metrics only (Frontend_Bound, Backend_Bound, Bad_Speculation, Retiring)
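On ARM the collector has to pull the four L1 metrics out of perf's machine-readable output. A rough sketch of parsing `perf stat -j` (JSON-lines) output; the sample records in the test and the exact field names (`metric-value`, `metric-unit`) are assumptions, since perf's JSON layout varies by version:

```python
import json

def parse_topdown_json(lines):
    """Extract L1 topdown percentages from perf stat JSON-lines output.

    Assumes one JSON object per line; metric lines carry
    "metric-value" / "metric-unit" fields. Treat as a sketch, not a
    contract on perf's output format.
    """
    wanted = {"retiring", "frontend bound", "backend bound", "bad speculation"}
    metrics = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        rec = json.loads(line)
        unit = rec.get("metric-unit", "").lower()
        for name in wanted:
            if name in unit:
                # "backend bound" -> "Backend_Bound" to match TMA naming.
                metrics[name.title().replace(" ", "_")] = float(rec["metric-value"])
    return metrics
```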
Quick Start
Collect
Profile a process by name (not PID) with benchmark labels:
topdown collect --process redis-server --level 3 --duration 30s \
--label git_branch=unstable \
--label git_hash=abc123 \
--label test_name=set-get-100 \
--label topology=oss-standalone \
--label client_tool=memtier \
--label build_variant=release
Query
# What are the bottlenecks for this branch?
topdown query --label git_branch=unstable --bottlenecks
# VTune-style pipeline funnel (where do 100% of slots go?)
topdown query --funnel --label git_branch=unstable --label test_name=set-get-100
# Which benchmarks are DRAM-bound above 15%?
topdown query --bottleneck DRAM_Bound --min-pct 15
# Full TMA tree for a specific run
topdown query --run-id <id> --tree
Compare
# Compare two runs by ID
topdown compare <run-id-a> <run-id-b>
# Compare release vs debug by labels
topdown compare --label-a build_variant=release --label-b build_variant=debug
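Under the hood a comparison reduces to diffing two metric maps. A simplified sketch; the 0.5-point noise threshold and the rule that only a rise in Retiring counts as an improvement are assumptions for illustration, not the tool's exact logic:

```python
def diff_runs(run_a, run_b, threshold=0.5):
    """Split metric deltas into regressions and improvements.

    run_a / run_b: {metric_name: percentage}. For most TMA metrics a
    higher value is worse; Retiring (useful work) is the exception.
    """
    regressions, improvements = {}, {}
    for name in run_a.keys() & run_b.keys():
        delta = run_b[name] - run_a[name]
        if abs(delta) < threshold:
            continue  # ignore noise-level changes
        higher_is_better = name.endswith("Retiring")
        got_better = (delta > 0) == higher_is_better
        (improvements if got_better else regressions)[name] = delta
    return regressions, improvements
```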
Explain
Every TMA metric has built-in descriptions, typical causes, and tuning hints:
topdown explain DRAM_Bound
╭──────────────── Description ────────────────╮
│ Backend_Bound.Memory_Bound.DRAM_Bound │
│ │
│ Stalls caused by loads missing all cache │
│ levels and going to main memory (DRAM). │
│ Latency is typically 60-120ns (local) or │
│ 150-300ns (remote NUMA). │
╰─────────────────────────────────────────────╯
╭──────────────── Typical Causes ─────────────╮
│ - Working set exceeding LLC capacity │
│ - Random access to large hash tables │
│ - Pointer-chasing with poor locality │
│ - NUMA remote memory accesses │
╰─────────────────────────────────────────────╯
╭──────────────── Tuning Hints ───────────────╮
│ - Use numactl --membind to keep data │
│ local │
│ - Configure THP for large Redis instances │
│ - Pin io-threads to same NUMA node │
│ - Drill into MEM_Bandwidth vs │
│ MEM_Latency │
╰─────────────────────────────────────────────╯
Microarchitecture Analysis Example
Here is a real-world example analyzing redis-server under a memtier benchmark:
# 1. Start your benchmark
memtier_benchmark -s 127.0.0.1 -p 6379 --test-time=60 --threads=4 --clients=50 &
# 2. Collect Level 3 TMA data while the benchmark runs
topdown collect --process redis-server --level 3 --duration 30s \
--label git_branch=unstable \
--label git_hash=a1b2c3d \
--label test_name=set-get-50-50 \
--label topology=oss-standalone \
--label client_tool=memtier \
--label build_variant=release \
--label compiler=gcc-13
# Output:
# Found 1 PID(s) for 'redis-server': [12345]
# Collecting level 3 data for 30s...
# Done. Run ID: 7f3a2b1c-...
# Samples: 2340 | Duration: 30.2s
# Labels: 18 (7 user-supplied)
# 3. View the pipeline funnel — where are CPU cycles going?
topdown query --funnel --label test_name=set-get-50-50
# Pipeline Slots Funnel (100% total)
# Useful work (Retiring): 31.2%
# Wasted: 68.8%
#
# Frontend_Bound 12.3% █████ ✗
# Fetch_Latency 8.1% ███ ✗
# ICache_Misses 3.2% █ ✗
# Branch_Resteers 3.8% █ ✗
# Fetch_Bandwidth 4.2% █ ✗
# Bad_Speculation 8.5% ███ ✗
# Branch_Mispredicts 6.2% ██ ✗
# Backend_Bound 48.0% ███████████████████ ✗
# Memory_Bound 32.1% ████████████ ✗
# L1_Bound 5.3% ██ ✗
# L3_Bound 12.8% █████ ✗
# DRAM_Bound 8.7% ███ ✗
# Store_Bound 3.1% █ ✗
# Core_Bound 15.9% ██████ ✗
# Ports_Utilization 13.2% █████ ✗
# Retiring 31.2% ████████████ ✓
# 4. The workload is Backend_Bound (48%) → Memory_Bound (32%) → L3_Bound (12.8%)
# Let's understand what L3_Bound means:
topdown explain L3_Bound
# 5. Collect again after tuning (e.g., enabling io-threads)
topdown collect --process redis-server --level 3 --duration 30s \
--label git_branch=unstable \
--label test_name=set-get-50-50 \
--label build_variant=release-io-threads-4
# 6. Compare the two configurations
topdown compare \
--label-a build_variant=release \
--label-b build_variant=release-io-threads-4 \
--process redis-server
# Comparison: 7f3a2b1c vs 9e4d5f6a
#
# Regressions (1):
# ↑ Frontend_Bound: 12.3% -> 14.1% (+1.8%)
# Improvements (3):
# ↓ Backend_Bound.Memory_Bound.L3_Bound: 12.8% -> 7.2% (-5.6%)
# ↓ Backend_Bound.Core_Bound: 15.9% -> 11.3% (-4.6%)
# ↑ Retiring: 31.2% -> 38.5% (+7.3%) ← more useful work!
# 7. Which of your benchmarks are DRAM-bound?
topdown query --bottleneck DRAM_Bound --min-pct 10
# Runs where DRAM_Bound >= 10%:
# RUN ID | VALUE | PROCESS | LABELS
# 7f3a2b1c | 18.7% | redis-server | test_name=hset-hget, topology=oss-cluster
# 3c8d9e2f | 12.1% | redis-server | test_name=zadd-zrange, topology=oss-standalone
Labels
Every run is tagged with auto-detected system labels plus user-supplied benchmark labels:
Auto-detected (zero config)
arch, kernel_version, node, cpu, pmu_name, platform, comm, pid, collector, tma_level, pmu_tools_version (Intel) / perf_version (ARM)
User-supplied (via --label key=value)
git_branch, git_hash, build_variant, compiler, test_name, client_tool, topology, dataset_name, tested_commands, tested_groups, github_org, github_repo, role, coordinator_version, thread_name
All labels are stored as JSON and queryable:
topdown list --label git_branch=unstable --label topology=oss-standalone
topdown query --label compiler=gcc-13 --bottlenecks
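Because labels live in a JSON column, any SQL client can filter on them directly. A sketch against an assumed schema (the table and column names here are illustrative, not the tool's actual schema; `json_extract` is built into modern SQLite):

```python
import json
import sqlite3

# Illustrative schema: one row per run, labels stored as a JSON blob.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE runs (run_id TEXT, process TEXT, labels TEXT)")
conn.execute(
    "INSERT INTO runs VALUES (?, ?, ?)",
    ("7f3a2b1c", "redis-server",
     json.dumps({"git_branch": "unstable", "topology": "oss-standalone"})),
)

# json_extract pulls individual label keys out of the JSON blob,
# so label filters become ordinary WHERE clauses.
rows = conn.execute(
    "SELECT run_id FROM runs "
    "WHERE json_extract(labels, '$.git_branch') = ?",
    ("unstable",),
).fetchall()
print(rows)  # [('7f3a2b1c',)]
```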
Agent Mode (Continuous Collection)
Run as a daemon that collects periodically:
# Foreground
topdown agent --process redis-server --level 2 --every 5m --duration 30s
# Install as systemd service
sudo topdown install-service --process redis-server --level 2 --every 5m
# Preview the unit file without installing
topdown install-service --process redis-server --preview
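The agent is essentially a collect-then-sleep loop. A stripped-down sketch of the idea; the real daemon adds error handling and signal handling, and the command here is parameterized so the example is not tied to the actual binary:

```python
import subprocess
import time

def agent_loop(cmd, every_s, iterations):
    """Run `cmd` once per interval, `iterations` times.

    cmd: the collection command as an argv list, e.g.
    ["topdown", "collect", "--process", "redis-server", "--duration", "30s"].
    """
    results = []
    for i in range(iterations):
        started = time.monotonic()
        results.append(subprocess.run(cmd, capture_output=True, text=True))
        # Sleep out the remainder of the interval (skip after the last run)
        # so collection time does not drift the schedule.
        if i + 1 < iterations:
            elapsed = time.monotonic() - started
            time.sleep(max(0.0, every_s - elapsed))
    return results
```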
MCP Server (AI-Assisted Querying)
The MCP server lets Claude (or any MCP client) query your profiling data:
# Start MCP server (stdio for Claude Code/Desktop)
topdown mcp-serve
# HTTP transport for remote access
topdown mcp-serve --transport http --port 8000
Claude Code / Claude Desktop config
Add to .mcp.json in your project or ~/.claude/settings.json:
{
"mcpServers": {
"topdown": {
"command": "topdown",
"args": ["mcp-serve"]
}
}
}
Then ask Claude:
- "What's the top bottleneck for redis-server on branch unstable?"
- "Show me the pipeline funnel for test set-get-100"
- "Which benchmarks are DRAM-bound above 15%?"
- "Compare release vs debug builds for redis-server"
- "Explain what L3_Bound means and how to fix it"
MCP Tools
| Tool | Description |
|---|---|
| collect_topdown | Run a TMA collection for a process |
| query_bottlenecks | Find ranked CPU bottlenecks |
| query_by_bottleneck | Find runs matching a specific bottleneck |
| get_funnel | VTune-style pipeline slot funnel |
| compare_runs | Compare two runs by ID |
| compare_by_labels | Compare runs by label sets |
| explain_metric | Explain a TMA metric with tuning hints |
| list_profiling_runs | List recent runs |
Storage Backends
SQLite (default)
Zero configuration, stored at ~/.topdown/data.db:
topdown collect --process redis-server --level 2 --duration 30s
PostgreSQL
export TOPDOWN_BACKEND=postgresql
export TOPDOWN_DSN="postgresql://user:pass@host:5432/topdown"
topdown collect --process redis-server --level 2 --duration 30s
Environment Variables
| Variable | Description | Default |
|---|---|---|
| TOPDOWN_BACKEND | Storage backend (sqlite or postgresql) | sqlite |
| TOPDOWN_DSN | PostgreSQL connection string | — |
| TOPDOWN_DB_PATH | SQLite database path | ~/.topdown/data.db |
| TOPDOWN_TOPLEV_PATH | Path to toplev.py (Intel only) | toplev.py |
| TOPDOWN_PMU_TOOLS_DIR | pmu-tools directory (Intel only) | — |
| TOPDOWN_COLLECTOR | Collector backend: toplev (Intel), perf_stat (ARM), or auto-detect | auto |
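Resolving these variables is plain environment lookup with the defaults above. A sketch of how a client script might mirror the configuration (the precedence shown, explicit env over defaults, is the obvious reading of the table, not verified against the source):

```python
import os
from pathlib import Path

def load_config(env=os.environ):
    """Collect TOPDOWN_* settings, falling back to documented defaults."""
    backend = env.get("TOPDOWN_BACKEND", "sqlite")
    cfg = {
        "backend": backend,
        "db_path": env.get("TOPDOWN_DB_PATH",
                           str(Path.home() / ".topdown" / "data.db")),
        "toplev_path": env.get("TOPDOWN_TOPLEV_PATH", "toplev.py"),
        "collector": env.get("TOPDOWN_COLLECTOR", "auto"),
    }
    if backend == "postgresql":
        # The DSN has no default; it is required for the PostgreSQL backend.
        cfg["dsn"] = env["TOPDOWN_DSN"]
    return cfg
```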
Knowledge Base
120+ TMA metrics with descriptions, causes, and tuning hints covering Intel Skylake through Panther Lake and ARM Neoverse L1:
topdown explain Frontend_Bound.Fetch_Latency.ICache_Misses
topdown explain Branch_Mispredicts
topdown explain Ports_Utilization
CLI Reference
topdown collect Collect TMA data for a process
topdown list List recent profiling runs
topdown query Query stored data (--bottlenecks, --tree, --funnel, --bottleneck)
topdown compare Compare two runs (by ID or labels)
topdown explain Explain a TMA metric
topdown agent Continuous collection daemon
topdown install-service Install systemd service
topdown mcp-serve Start MCP server
topdown version Show version
Development
git clone https://github.com/redis-performance/topdown-profiler.git
cd topdown-profiler
poetry install
make test
License
MIT