tpu-info CLI

tpu-info is a simple CLI tool for detecting Cloud TPU devices and reading runtime metrics from libtpu, including memory usage and duty cycle. It supports both a static, one-time snapshot and a live streaming mode to monitor metrics continuously.

Note: to access libtpu utilization metrics, you must have a workload running with a supported ML framework, such as JAX or PyTorch/XLA. See the Usage section for more information.


What's New in Version 0.10.0

🚀 New Features

  • Adds inbound_buffer_transfer_latency metric: Tracks the latency of inbound buffer transfers
  • Adds host_compute_latency metric: Tracks the host compute latency

Installing

Install the latest release using pip:

pip install tpu-info

Alternatively, install tpu-info from source:

pip install git+https://github.com/AI-Hypercomputer/cloud-accelerator-diagnostics/#subdirectory=tpu_info

Usage

To view current TPU utilization data, tpu-info requires a running TPU workload with a supported ML framework[^1], such as JAX or PyTorch/XLA. For example:

# JAX
>>> import jax
>>> jax.device_count()
4
# Create a tensor on the TPU
>>> t = jax.numpy.ones((300, 300))

# PyTorch/XLA
>>> import torch
>>> import torch_xla
>>> t = torch.randn((300, 300), device=torch_xla.device())

Then, on the same machine, you can run the tpu-info command in your terminal.
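
If the workload has already finished by the time you run tpu-info, the utilization numbers will have dropped back to idle. One way to keep the device busy long enough to watch the metrics is a simple compute loop; a minimal JAX sketch (the matrix size and iteration count are illustrative, scale them up on real hardware):

```python
import jax.numpy as jnp

# Keep the accelerator busy with repeated matmuls so that
# duty cycle and HBM usage have something to report.
x = jnp.ones((1024, 1024))
for _ in range(10):
    x = (x @ x.T) / 1024.0   # ones @ ones.T == 1024.0, so x stays all ones
    x.block_until_ready()    # force execution before the next step
print(x.shape)  # (1024, 1024)
```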

Static Mode

Run the following command for a one-time snapshot of the current metrics.

$ tpu-info
Libtpu version: 0.0.19.dev20250721+nightly
Accelerator type: v6e

TPU Chips

| Chip        | Type         | Devices | PID     |
|-------------|--------------|---------|---------|
| /dev/vfio/0 | TPU v6e chip | 1       | 1469584 |
| /dev/vfio/1 | TPU v6e chip | 1       | 1469584 |
| /dev/vfio/2 | TPU v6e chip | 1       | 1469584 |
| /dev/vfio/3 | TPU v6e chip | 1       | 1469584 |

TPU Runtime Utilization

| Chip | HBM Usage (GiB)       | Duty cycle |
|------|-----------------------|------------|
| 0    | 18.45 GiB / 31.25 GiB |    100.00% |
| 1    | 10.40 GiB / 31.25 GiB |    100.00% |
| 2    | 10.40 GiB / 31.25 GiB |    100.00% |
| 3    | 10.40 GiB / 31.25 GiB |    100.00% |

TensorCore Utilization

| Core ID | TensorCore Utilization |
|---------|------------------------|
| 0       | 13.60%                 |
| 1       | 14.81%                 |
| 2       | 14.36%                 |
| 3       | 13.60%                 |

TPU Buffer Transfer Latency

| Buffer Size  | P50          | P90          | P95          | P999         |
|--------------|--------------|--------------|--------------|--------------|
| 8MB+         | 108978.82 us | 164849.81 us | 177366.42 us | 212419.07 us |
| 4MB+         | 21739.38 us  | 38126.84 us  | 42110.12 us  | 55474.21 us  |

TPU Inbound Buffer Transfer Latency

| Buffer Size  | P50          | P90          | P95          | P999         |
|--------------|--------------|--------------|--------------|--------------|
| 8MB+         | 18945.59 us  | 34461.46 us  | 39652.74 us  | 56051.94 us  |
| 4MB+         | 4829.09 us   | 8594.43 us   | 10236.53 us  | 17754.86 us  |

TPU Host Compute Latency

| Buffer Size  | P50          | P90          | P95          | P999         |
|--------------|--------------|--------------|--------------|--------------|
| 8MB+         | 998.17 us    | 3605.34 us   | 6292.10 us   | 11608.01 us  |
| 4MB+         | 678.33 us    | 2611.93 us   | 5258.30 us   | 11083.23 us  |

TPU gRPC TCP Minimum RTT

| P50      | P90      | P95      | P999     |
|----------|----------|----------|----------|
| 35.99 us | 52.15 us | 53.83 us | 55.51 us |

TPU gRPC TCP Delivery Rate

| P50           | P90           | P95           | P999          |
|---------------|---------------|---------------|---------------|
| 12305.96 Mbps | 18367.10 Mbps | 24872.11 Mbps | 44841.55 Mbps |

Streaming Mode

You can run tpu-info in a streaming mode to periodically refresh and display the utilization statistics.

# Refresh stats every 2 seconds
$ tpu-info --streaming --rate 2
Refresh rate: 2.0s
Last update: 2025-07-24 11:00:59 UTC
Libtpu version: 0.0.19.dev20250721+nightly
Accelerator type: v6e

TPU Chips

| Chip         | Type         | Devices | PID    |
|--------------|--------------|---------|--------|
| /dev/vfio/0  | TPU v6e chip | 1       | 1022   |
| /dev/vfio/1  | TPU v6e chip | 1       | 1022   |
| /dev/vfio/2  | TPU v6e chip | 1       | 1022   |
| /dev/vfio/3  | TPU v6e chip | 1       | 1022   |

TPU Runtime Utilization

| Chip   | HBM Usage (GiB)          | Duty cycle |
|--------|--------------------------|------------|
| 8      | 17.26 GiB / 31.25 GiB    |    100.00% |
| 9      |  9.26 GiB / 31.25 GiB    |    100.00% |
| 12     |  9.26 GiB / 31.25 GiB    |    100.00% |
| 13     |  9.26 GiB / 31.25 GiB    |    100.00% |

TensorCore Utilization

| Core ID | TensorCore Utilization |
|---------|------------------------|
| 0       | 15.17%                 |
| 1       | 14.62%                 |
| 2       | 14.68%                 |
| 3       | 15.14%                 |

TPU Buffer Transfer Latency

| Buffer Size  | P50          | P90          | P95          | P999         |
|--------------|--------------|--------------|--------------|--------------|
| 8MB+         | 18264.03 us  | 33263.06 us  | 35990.98 us  | 53997.32 us  |

TPU Inbound Buffer Transfer Latency

| Buffer Size  | P50          | P90          | P95          | P999         |
|--------------|--------------|--------------|--------------|--------------|
| 8MB+         | 18316.95 us  | 32857.03 us  | 36501.59 us  | 58854.54 us  |

TPU Host Compute Latency

| Buffer Size  | P50          | P90          | P95          | P999         |
|--------------|--------------|--------------|--------------|--------------|
| 8MB+         | 678.33 us    | 2611.93 us   | 5258.30 us   | 11083.23 us  |

TPU gRPC TCP Minimum RTT

| P50      | P90      | P95      | P999     |
|----------|----------|----------|----------|
| 35.99 us | 52.15 us | 53.83 us | 55.51 us |

TPU gRPC TCP Delivery Rate

| P50           | P90           | P95           | P999          |
|---------------|---------------|---------------|---------------|
| 12305.96 Mbps | 18367.10 Mbps | 24872.11 Mbps | 44841.55 Mbps |
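
Streaming mode redraws the tables in place in the terminal. If you want a persistent record instead, the static snapshot can be captured on a timer; a small Python sketch around the CLI (the function name, interval, and log path are our own choices, not part of tpu-info):

```python
import datetime
import subprocess
import time

def log_snapshots(command=("tpu-info",), interval_s=10.0, count=3,
                  path="tpu_metrics.log"):
    """Append `count` timestamped snapshots of `command`'s stdout to `path`."""
    with open(path, "a") as out:
        for i in range(count):
            stamp = datetime.datetime.now(datetime.timezone.utc).isoformat()
            result = subprocess.run(command, capture_output=True, text=True)
            out.write(f"=== {stamp} ===\n{result.stdout}\n")
            out.flush()
            if i + 1 < count:
                time.sleep(interval_s)

# e.g. log_snapshots()  # three tpu-info snapshots, 10 s apart
```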

Version

To check the installed tpu-info version, the libtpu version, and the accelerator type of the TPU chip, use the --version or -v flag.

Compatible Environment:
$ tpu-info --version
- tpu-info version: 0.8.0
- libtpu version: 0.0.18
- accelerator type: v6e
Incompatible Environment (Python 3.12+):
$ tpu-info --version
- tpu-info version: 0.8.0
- libtpu version: N/A (incompatible environment)
- accelerator type: N/A (incompatible environment)

Process

You can use the --process or -p flag to display information about the processes currently running on the TPU.

$ tpu-info --process
TPU Process Info

| Chip        | PID    | Process Name |
|-------------|--------|--------------|
| /dev/vfio/0 | 799657 | python3      |
| /dev/vfio/1 | 799657 | python3      |
| /dev/vfio/2 | 799657 | python3      |
| /dev/vfio/3 | 799657 | python3      |
| /dev/vfio/4 | 799657 | python3      |
| /dev/vfio/5 | 799657 | python3      |
| /dev/vfio/6 | 799657 | python3      |
| /dev/vfio/7 | 799657 | python3      |

List Metrics

You can use the --list_metrics flag to list all supported metrics that can be passed to the --metric flag.

$ tpu-info --list_metrics
╭─ Supported Metrics ───────────────╮
│ buffer_transfer_latency           │
│ collective_e2e_latency            │
│ core_state                        │
│ device_to_host_transfer_latency   │
│ duty_cycle_percent                │
│ grpc_tcp_delivery_rate            │
│ grpc_tcp_min_rtt                  │
│ hbm_usage                         │
│ hlo_exec_timing                   │
│ hlo_queue_size                    │
│ host_compute_latency              │
│ host_to_device_transfer_latency   │
│ inbound_buffer_transfer_latency   │
│ queued_programs                   │
│ sequencer_state                   │
│ sequencer_state_detailed          │
│ tensorcore_utilization            │
╰───────────────────────────────────╯

Metric

You can use the --metric flag to display specific metrics. To request more than one metric, repeat the flag:

$ tpu-info --metric duty_cycle_percent --metric hbm_usage
TPU Duty Cycle

| Core ID | Duty Cycle (%) |
|---------|----------------|
| 0       | 100.00%        |
| 1       | 100.00%        |
| 2       | 100.00%        |
| 3       | 100.00%        |
| 4       | 100.00%        |
| 5       | 100.00%        |
| 6       | 100.00%        |
| 7       | 100.00%        |

TPU HBM Usage

| Chip   | HBM Usage (GiB)       |
|--------|-----------------------|
| 0      | 29.50 GiB / 31.25 GiB |
| 1      | 21.50 GiB / 31.25 GiB |
| 2      | 21.50 GiB / 31.25 GiB |
| 3      | 21.50 GiB / 31.25 GiB |
| 4      | 21.50 GiB / 31.25 GiB |
| 5      | 21.50 GiB / 31.25 GiB |
| 6      | 21.50 GiB / 31.25 GiB |
| 7      | 21.50 GiB / 31.25 GiB |
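
If you need these numbers in a script, the table output can be scraped; a fragile sketch that assumes the HBM table layout shown above (the function name is illustrative, not part of tpu-info):

```python
import re

def parse_hbm_table(text: str) -> dict:
    """Map chip ID -> (used_gib, total_gib) parsed from a tpu-info HBM table."""
    rows = {}
    pattern = r"\|\s*(\d+)\s*\|\s*([\d.]+) GiB / ([\d.]+) GiB"
    for m in re.finditer(pattern, text):
        rows[int(m.group(1))] = (float(m.group(2)), float(m.group(3)))
    return rows

sample = """
| Chip   | HBM Usage (GiB)       |
|--------|-----------------------|
| 0      | 29.50 GiB / 31.25 GiB |
| 1      | 21.50 GiB / 31.25 GiB |
"""
print(parse_hbm_table(sample))  # {0: (29.5, 31.25), 1: (21.5, 31.25)}
```

Note that the table format is not a stable interface and may change between releases.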

[^1]: Releases from before 2024 may not be compatible.

Download files

Source Distribution

No source distribution files are available for this release.

Built Distribution

tpu_info-0.10.0-py3-none-any.whl (37.0 kB)

File metadata

  • Download URL: tpu_info-0.10.0-py3-none-any.whl
  • Size: 37.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.9

File hashes

Hashes for tpu_info-0.10.0-py3-none-any.whl

| Algorithm   | Hash digest                                                      |
|-------------|------------------------------------------------------------------|
| SHA256      | 25215c1ec11d26772f648198901648b6c293ffc30665e5e76a1772c594893ba1 |
| MD5         | 4b298fdb9adb38edeeebc74251462ffd                                 |
| BLAKE2b-256 | 9913cdc4281728cadc16b4d9b9f367981d59845b0bbecafd046c6a777ad091c4 |
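
To check a downloaded wheel against the digests above, hash it locally; a minimal Python sketch (the helper name is our own):

```python
import hashlib

def sha256_of(path: str) -> str:
    """Hex SHA256 digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()

# e.g. compare against the SHA256 digest listed above:
# sha256_of("tpu_info-0.10.0-py3-none-any.whl") == "25215c1e..."
```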

