
topdown-profiler

CPU Top-Down Microarchitecture Analysis (TMA) collector for Intel and ARM Neoverse, with MCP server, label-based querying, and pluggable SQL backends.

Wraps pmu-tools/toplev on Intel or perf stat --topdown on ARM to collect, store, and query CPU performance data — like Polar Signals but for hardware performance counters.


What is Top-Down Microarchitecture Analysis?

TMA classifies every CPU pipeline slot into four categories that sum to 100%:

Pipeline Slots (100%)
├── Frontend_Bound    15.2%  ███████         Instruction supply problems
├── Bad_Speculation   10.1%  █████           Branch mispredictions, machine clears
├── Backend_Bound     44.6%  ██████████████  Data supply / execution bottlenecks
│   ├── Memory_Bound  30.2%  ███████████     Cache misses, DRAM latency
│   │   ├── L1_Bound   5.1%  ██
│   │   ├── L3_Bound  12.4%  ██████
│   │   └── DRAM_Bound 8.3%  ████
│   └── Core_Bound    14.4%  ███████         Port contention, dividers
└── Retiring          30.1%  ███████████     Useful work (higher = better)

This tool collects that data, stores it with labels (branch, test name, topology, etc.), and lets you query it from the CLI or via AI assistants through MCP.
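At Level 1, the four buckets are derived from a handful of raw PMU counters. The sketch below shows the canonical Level-1 formulas (from Yasin's original TMA paper), assuming a 4-wide Intel core; the counter values are made up for illustration, and this is not code from topdown-profiler itself:

```python
# Canonical Level-1 TMA classification for a 4-wide Intel core.
# Counter arguments correspond to Intel events such as
# CPU_CLK_UNHALTED.THREAD, UOPS_ISSUED.ANY, UOPS_RETIRED.RETIRE_SLOTS,
# IDQ_UOPS_NOT_DELIVERED.CORE, and INT_MISC.RECOVERY_CYCLES.

def tma_level1(clk_cycles, uops_issued, uops_retired,
               fe_undelivered_slots, recovery_cycles, width=4):
    """Classify pipeline slots into the four top-level TMA buckets."""
    slots = width * clk_cycles
    frontend_bound = fe_undelivered_slots / slots
    bad_speculation = (uops_issued - uops_retired
                       + width * recovery_cycles) / slots
    retiring = uops_retired / slots
    # Backend_Bound is whatever the other three buckets don't account for.
    backend_bound = 1.0 - frontend_bound - bad_speculation - retiring
    return {
        "Frontend_Bound": frontend_bound,
        "Bad_Speculation": bad_speculation,
        "Backend_Bound": backend_bound,
        "Retiring": retiring,
    }

# Illustrative counter values, not a real measurement.
buckets = tma_level1(clk_cycles=1_000_000,
                     uops_issued=1_400_000,
                     uops_retired=1_200_000,
                     fe_undelivered_slots=600_000,
                     recovery_cycles=40_000)
# The four buckets always sum to 100% of pipeline slots.
assert abs(sum(buckets.values()) - 1.0) < 1e-9
```

On Intel, toplev evaluates exactly this kind of formula tree (down to Level 3) from `perf` counter groups; on ARM Neoverse only the Level-1 split is available.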

Install

pip install topdown-profiler

# Or from source
git clone https://github.com/redis-performance/topdown-profiler.git
cd topdown-profiler
poetry install

Prerequisites

  • Linux with perf tools installed
  • Intel CPU (Sandy Bridge or newer) or ARM Neoverse (Graviton3/4)
  • pmu-tools installed (pip install pmu-tools) — Intel only
  • perf_event_paranoid <= 1 (or run as root)

# Check permissions
cat /proc/sys/kernel/perf_event_paranoid
# If > 1, fix with:
sudo sysctl kernel.perf_event_paranoid=1

ARM Neoverse Prerequisites

  • Linux kernel 5.15+ with ARM PMU perf support
  • perf tools installed (apt install linux-tools-$(uname -r) or yum install perf)
  • perf_event_paranoid <= 1 (same as Intel)
  • No pmu-tools required — uses perf stat --topdown directly
  • L1 topdown metrics only (Frontend_Bound, Backend_Bound, Bad_Speculation, Retiring)

Quick Start

Collect

Profile a process by name (not PID) with benchmark labels:

topdown collect --process redis-server --level 3 --duration 30s \
  --label git_branch=unstable \
  --label git_hash=abc123 \
  --label test_name=set-get-100 \
  --label topology=oss-standalone \
  --label client_tool=memtier \
  --label build_variant=release

Query

# What are the bottlenecks for this branch?
topdown query --label git_branch=unstable --bottlenecks

# VTune-style pipeline funnel (where do 100% of slots go?)
topdown query --funnel --label git_branch=unstable --label test_name=set-get-100

# Which benchmarks are DRAM-bound above 15%?
topdown query --bottleneck DRAM_Bound --min-pct 15

# Full TMA tree for a specific run
topdown query --run-id <id> --tree

Compare

# Compare two runs by ID
topdown compare <run-id-a> <run-id-b>

# Compare release vs debug by labels
topdown compare --label-a build_variant=release --label-b build_variant=debug

Explain

Every TMA metric has built-in descriptions, typical causes, and tuning hints:

topdown explain DRAM_Bound
╭──────────────── Description ────────────────╮
│ Backend_Bound.Memory_Bound.DRAM_Bound       │
│                                             │
│ Stalls caused by loads missing all cache    │
│ levels and going to main memory (DRAM).     │
│ Latency is typically 60-120ns (local) or    │
│ 150-300ns (remote NUMA).                    │
╰─────────────────────────────────────────────╯
╭──────────────── Typical Causes ─────────────╮
│   - Working set exceeding LLC capacity      │
│   - Random access to large hash tables      │
│   - Pointer-chasing with poor locality      │
│   - NUMA remote memory accesses             │
╰─────────────────────────────────────────────╯
╭──────────────── Tuning Hints ───────────────╮
│   - Use numactl --membind to keep data      │
│     local                                   │
│   - Configure THP for large Redis instances │
│   - Pin io-threads to same NUMA node        │
│   - Drill into MEM_Bandwidth vs             │
│     MEM_Latency                             │
╰─────────────────────────────────────────────╯

Microarchitecture Analysis Example

Here is a real-world example analyzing redis-server under a memtier benchmark:

# 1. Start your benchmark
memtier_benchmark -s 127.0.0.1 -p 6379 --test-time=60 --threads=4 --clients=50 &

# 2. Collect Level 3 TMA data while the benchmark runs
topdown collect --process redis-server --level 3 --duration 30s \
  --label git_branch=unstable \
  --label git_hash=a1b2c3d \
  --label test_name=set-get-50-50 \
  --label topology=oss-standalone \
  --label client_tool=memtier \
  --label build_variant=release \
  --label compiler=gcc-13

# Output:
# Found 1 PID(s) for 'redis-server': [12345]
# Collecting level 3 data for 30s...
# Done. Run ID: 7f3a2b1c-...
#   Samples: 2340 | Duration: 30.2s
#   Labels: 18 (7 user-supplied)

# 3. View the pipeline funnel — where are CPU cycles going?
topdown query --funnel --label test_name=set-get-50-50

# Pipeline Slots Funnel (100% total)
#   Useful work (Retiring): 31.2%
#   Wasted:                 68.8%
#
#   Frontend_Bound              12.3%  █████ ✗
#     Fetch_Latency              8.1%  ███ ✗
#       ICache_Misses            3.2%  █ ✗
#       Branch_Resteers          3.8%  █ ✗
#     Fetch_Bandwidth            4.2%  █ ✗
#   Bad_Speculation              8.5%  ███ ✗
#     Branch_Mispredicts         6.2%  ██ ✗
#   Backend_Bound               48.0%  ███████████████████ ✗
#     Memory_Bound              32.1%  ████████████ ✗
#       L1_Bound                 5.3%  ██ ✗
#       L3_Bound                12.8%  █████ ✗
#       DRAM_Bound               8.7%  ███ ✗
#       Store_Bound              3.1%  █ ✗
#     Core_Bound                15.9%  ██████ ✗
#       Ports_Utilization       13.2%  █████ ✗
#   Retiring                    31.2%  ████████████ ✓

# 4. The workload is Backend_Bound (48%) → Memory_Bound (32%) → L3_Bound (12.8%)
#    Let's understand what L3_Bound means:
topdown explain L3_Bound

# 5. Collect again after tuning (e.g., enabling io-threads)
topdown collect --process redis-server --level 3 --duration 30s \
  --label git_branch=unstable \
  --label test_name=set-get-50-50 \
  --label build_variant=release-io-threads-4

# 6. Compare the two configurations
topdown compare \
  --label-a build_variant=release \
  --label-b build_variant=release-io-threads-4 \
  --process redis-server

# Comparison: 7f3a2b1c vs 9e4d5f6a
#
# Regressions (1):
#   ↑ Frontend_Bound: 12.3% -> 14.1% (+1.8%)
# Improvements (3):
#   ↓ Backend_Bound.Memory_Bound.L3_Bound: 12.8% -> 7.2% (-5.6%)
#   ↓ Backend_Bound.Core_Bound: 15.9% -> 11.3% (-4.6%)
#   ↑ Retiring: 31.2% -> 38.5% (+7.3%)   ← more useful work!

# 7. Which of your benchmarks are DRAM-bound?
topdown query --bottleneck DRAM_Bound --min-pct 10

# Runs where DRAM_Bound >= 10%:
#   RUN ID       | VALUE  | PROCESS       | LABELS
#   7f3a2b1c     | 18.7%  | redis-server  | test_name=hset-hget, topology=oss-cluster
#   3c8d9e2f     | 12.1%  | redis-server  | test_name=zadd-zrange, topology=oss-standalone
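The regression/improvement split in step 6 hinges on one asymmetry: for every TMA bucket except Retiring, a higher value is worse. A minimal sketch of that classification logic (illustrative metric values copied from the comparison above, not the tool's actual implementation):

```python
# Split metric deltas into regressions and improvements.
# Retiring measures useful work, so for it higher is better;
# every other bucket measures wasted slots.
HIGHER_IS_BETTER = {"Retiring"}

def classify(before, after, threshold=0.5):
    """Return (regressions, improvements) as (metric, delta-pp) lists."""
    regressions, improvements = [], []
    for metric, a in before.items():
        delta = after[metric] - a
        if abs(delta) < threshold:  # ignore sub-0.5pp noise
            continue
        leaf = metric.rsplit(".", 1)[-1]
        got_worse = (delta < 0) if leaf in HIGHER_IS_BETTER else (delta > 0)
        (regressions if got_worse else improvements).append((metric, delta))
    return regressions, improvements

before = {"Frontend_Bound": 12.3,
          "Backend_Bound.Memory_Bound.L3_Bound": 12.8,
          "Backend_Bound.Core_Bound": 15.9,
          "Retiring": 31.2}
after = {"Frontend_Bound": 14.1,
         "Backend_Bound.Memory_Bound.L3_Bound": 7.2,
         "Backend_Bound.Core_Bound": 11.3,
         "Retiring": 38.5}
regs, imps = classify(before, after)  # 1 regression, 3 improvements
```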

Labels

Every run is tagged with auto-detected system labels plus user-supplied benchmark labels:

Auto-detected (zero config)

arch, kernel_version, node, cpu, pmu_name, platform, comm, pid, collector, tma_level, pmu_tools_version (Intel) / perf_version (ARM)

User-supplied (via --label key=value)

git_branch, git_hash, build_variant, compiler, test_name, client_tool, topology, dataset_name, tested_commands, tested_groups, github_org, github_repo, role, coordinator_version, thread_name

All labels are stored as JSON and queryable:

topdown list --label git_branch=unstable --label topology=oss-standalone
topdown query --label compiler=gcc-13 --bottlenecks
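Because labels are stored as JSON, the same filtering can be done with plain SQL. The snippet below builds a stand-in `runs` table in memory (the real schema is internal to topdown-profiler, so table and column names here are hypothetical) to show the json_extract() pattern:

```python
# Hypothetical schema: demonstrates filtering JSON-encoded labels
# with SQLite's built-in json_extract(); not the tool's real tables.
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE runs (run_id TEXT, labels TEXT)")
db.executemany(
    "INSERT INTO runs VALUES (?, ?)",
    [
        ("7f3a2b1c", json.dumps({"git_branch": "unstable",
                                 "topology": "oss-standalone"})),
        ("9e4d5f6a", json.dumps({"git_branch": "7.4",
                                 "topology": "oss-cluster"})),
    ],
)
rows = db.execute(
    "SELECT run_id FROM runs WHERE json_extract(labels, '$.git_branch') = ?",
    ("unstable",),
).fetchall()
print(rows)  # [('7f3a2b1c',)]
```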

Agent Mode (Continuous Collection)

Run as a daemon that collects periodically:

# Foreground
topdown agent --process redis-server --level 2 --every 5m --duration 30s

# Install as systemd service
sudo topdown install-service --process redis-server --level 2 --every 5m

# Preview the unit file without installing
topdown install-service --process redis-server --preview
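The generated unit is roughly of this shape; this is a hedged sketch of a typical systemd service for the agent (paths, unit name, and hardening options are assumptions, so use `--preview` to see the real output):

```ini
# Hypothetical sketch; see `topdown install-service --preview` for
# the actual unit file.
[Unit]
Description=topdown-profiler continuous TMA collection
After=network.target

[Service]
ExecStart=/usr/local/bin/topdown agent --process redis-server --level 2 --every 5m --duration 30s
Restart=on-failure

[Install]
WantedBy=multi-user.target
```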

MCP Server (AI-Assisted Querying)

The MCP server lets Claude (or any MCP client) query your profiling data:

# Start MCP server (stdio for Claude Code/Desktop)
topdown mcp-serve

# HTTP transport for remote access
topdown mcp-serve --transport http --port 8000

Claude Code / Claude Desktop config

Add to .mcp.json in your project or ~/.claude/settings.json:

{
  "mcpServers": {
    "topdown": {
      "command": "topdown",
      "args": ["mcp-serve"]
    }
  }
}
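For the HTTP transport, MCP clients that support remote servers can point at a URL instead of a command. The shape below follows Claude Code's HTTP server config; the `/mcp` endpoint path is an assumption, so check your client's documentation:

```json
{
  "mcpServers": {
    "topdown": {
      "type": "http",
      "url": "http://localhost:8000/mcp"
    }
  }
}
```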

Then ask Claude:

  • "What's the top bottleneck for redis-server on branch unstable?"
  • "Show me the pipeline funnel for test set-get-100"
  • "Which benchmarks are DRAM-bound above 15%?"
  • "Compare release vs debug builds for redis-server"
  • "Explain what L3_Bound means and how to fix it"

MCP Tools

Tool                 Description
collect_topdown      Run a TMA collection for a process
query_bottlenecks    Find ranked CPU bottlenecks
query_by_bottleneck  Find runs matching a specific bottleneck
get_funnel           VTune-style pipeline slot funnel
compare_runs         Compare two runs by ID
compare_by_labels    Compare runs by label sets
explain_metric       Explain a TMA metric with tuning hints
list_profiling_runs  List recent runs

Storage Backends

SQLite (default)

Zero configuration, stored at ~/.topdown/data.db:

topdown collect --process redis-server --level 2 --duration 30s

PostgreSQL

export TOPDOWN_BACKEND=postgresql
export TOPDOWN_DSN="postgresql://user:pass@host:5432/topdown"
topdown collect --process redis-server --level 2 --duration 30s

Environment Variables

Variable               Description                                                         Default
TOPDOWN_BACKEND        Storage backend (sqlite or postgresql)                              sqlite
TOPDOWN_DSN            PostgreSQL connection string
TOPDOWN_DB_PATH        SQLite database path                                                ~/.topdown/data.db
TOPDOWN_TOPLEV_PATH    Path to toplev.py (Intel only)                                      toplev.py
TOPDOWN_PMU_TOOLS_DIR  pmu-tools directory (Intel only)
TOPDOWN_COLLECTOR      Collector backend: toplev (Intel), perf_stat (ARM), or auto-detect  auto

Knowledge Base

120+ TMA metrics with descriptions, causes, and tuning hints, covering Intel Skylake through Panther Lake and the ARM Neoverse Level-1 topdown metrics:

topdown explain Frontend_Bound.Fetch_Latency.ICache_Misses
topdown explain Branch_Mispredicts
topdown explain Ports_Utilization

CLI Reference

topdown collect         Collect TMA data for a process
topdown list            List recent profiling runs
topdown query           Query stored data (--bottlenecks, --tree, --funnel, --bottleneck)
topdown compare         Compare two runs (by ID or labels)
topdown explain         Explain a TMA metric
topdown agent           Continuous collection daemon
topdown install-service Install systemd service
topdown mcp-serve       Start MCP server
topdown version         Show version

Development

git clone https://github.com/redis-performance/topdown-profiler.git
cd topdown-profiler
poetry install
make test

License

MIT
