NVIDIA GPU administration and configuration tools for managing Confidential Computing modes and debugging

NVIDIA GPU Admin Tools

This utility handles various configuration tasks, including setting the Confidential Computing (CC) mode of supported GPUs, as well as some debug and test tasks. It is designed to be run as a privileged python3 command.

Supported CC modes are:

  • on
    • All supported GPU security features are enabled (e.g., bus encryption is on and performance counters are disabled)
  • devtools
    • All supported GPU security features are enabled, but the restrictions that prevent DevTools profiling and debugging are lifted
  • off
    • The GPU operates in its default mode; no supplementary confidential computing features are enabled

Most Commonly Used Examples

Query the CC mode of all GPUs in the system

sudo python3 ./nvidia_gpu_tools.py --devices gpus --query-cc-mode

Query the CC mode of the first 4 GPUs in the system

sudo python3 ./nvidia_gpu_tools.py --devices gpus[0:4] --query-cc-mode

Enable CC mode on all GPUs

sudo python3 ./nvidia_gpu_tools.py --devices gpus --set-cc-mode=on --reset-after-cc-mode-switch

Disable CC mode on a specific GPU in the system

sudo python3 ./nvidia_gpu_tools.py --devices 45:00.0 --set-cc-mode=off --reset-after-cc-mode-switch
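
When the CC mode commands above are driven from a provisioning script, it can help to assemble the argument list in one place and reject unknown modes before anything privileged runs. The following is a minimal sketch, not part of the tool: the helper name `build_set_cc_mode_cmd`, its parameters, and the dry-run printing are assumptions made for illustration. Only the flags shown in the examples above are used.

```python
import subprocess

# Modes accepted by --set-cc-mode according to the tool's help output.
CC_MODES = ("off", "on", "devtools")


def build_set_cc_mode_cmd(devices, mode, script="./nvidia_gpu_tools.py", reset=True):
    """Return the argv list for a CC mode switch (hypothetical helper)."""
    if mode not in CC_MODES:
        raise ValueError(f"unsupported CC mode: {mode!r}")
    cmd = ["sudo", "python3", script, "--devices", devices, f"--set-cc-mode={mode}"]
    if reset:
        # The new mode only takes effect after a GPU reset.
        cmd.append("--reset-after-cc-mode-switch")
    return cmd


def run(cmd, dry_run=True):
    """Print the command, and execute it only when dry_run is False."""
    print(" ".join(cmd))
    if dry_run:
        return None
    return subprocess.run(cmd, check=True)


# Dry run: shows the command without touching any GPU.
run(build_set_cc_mode_cmd("gpus", "on"))
```

Passing the arguments as a list avoids shell parsing altogether, and the dry-run default makes it harder to reset a GPU by accident.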

Generic debug dump from GPU

sudo python3 ./nvidia_gpu_tools.py --gpu-bdf=45:00.0 --debug-dump --log debug

Debug dump of NVLINK state

sudo python3 ./nvidia_gpu_tools.py --gpu-bdf=45:00.0 --nvlink-debug-dump --log debug
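
The examples above pass selectors such as `gpus[0:4]` directly on the command line. Square brackets are glob characters in most shells, so quoting the selector is a reasonable precaution when building a shell command string. The sketch below composes selector strings and quotes them with `shlex.quote`. The helper names `gpu_selector` and `join_selectors` are hypothetical, and the tool's exact index semantics are not assumed beyond the forms shown in this document.

```python
import shlex


def gpu_selector(start=None, stop=None):
    """Build a 'gpus', 'gpus[n]' or 'gpus[n:m]' specifier (hypothetical helper)."""
    if start is None:
        return "gpus"
    if stop is None:
        return f"gpus[{start}]"
    return f"gpus[{start}:{stop}]"


def join_selectors(*specs):
    """--devices accepts multiple comma-separated specifiers."""
    return ",".join(specs)


selector = join_selectors(gpu_selector(0, 4), "45:00.0")
# shlex.quote adds single quotes only when the string needs them.
print("sudo python3 ./nvidia_gpu_tools.py --devices "
      + shlex.quote(selector) + " --query-cc-mode")
```

Quoting is unnecessary when the arguments are passed as a list to `subprocess.run`, since no shell is involved in that case.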

Usage

sudo python3 nvidia_gpu_tools.py --help

NVIDIA GPU Tools version v2025.03.26o
Command line arguments: ['nvidia_gpu_tools.py', '--help']
usage: nvidia_gpu_tools.py [-h] [--devices DEVICES] [--gpu GPU]
                         [--gpu-bdf GPU_BDF] [--gpu-name GPU_NAME]
                         [--no-gpu]
                         [--log {debug,info,warning,error,critical}]
                         [--mmio-access-type {devmem,sysfs}]
                         [--recover-broken-gpu]
                         [--set-next-sbr-to-fundamental-reset]
                         [--reset-with-sbr] [--reset-with-flr]
                         [--reset-with-os] [--remove-from-os]
                         [--sysfs-bind SYSFS_BIND] [--sysfs-unbind]
                         [--query-ecc-state] [--query-cc-mode]
                         [--query-cc-settings] [--query-ppcie-mode]
                         [--query-ppcie-settings] [--query-prc-knobs]
                         [--set-cc-mode {off,on,devtools}]
                         [--reset-after-cc-mode-switch]
                         [--test-cc-mode-switch]
                         [--reset-after-ppcie-mode-switch]
                         [--set-ppcie-mode {off,on}]
                         [--test-ppcie-mode-switch]
                         [--set-bar0-firewall-mode {off,on}]
                         [--query-bar0-firewall-mode]
                         [--query-l4-serial-number] [--query-module-name]
                         [--clear-memory] [--debug-dump]
                         [--nvlink-debug-dump]
                         [--knobs-reset-to-defaults-list]
                         [--knobs-reset-to-defaults KNOBS_RESET_TO_DEFAULTS [KNOBS_RESET_TO_DEFAULTS ...]]
                         [--knobs-reset-to-defaults-assume-no-pending-changes]
                         [--knobs-reset-to-defaults-test] [--noop]
                         [--force-ecc-on-after-reset] [--test-ecc-toggle]
                         [--query-mig-mode] [--force-mig-off-after-reset]
                         [--test-mig-toggle]
                         [--block-nvlink BLOCK_NVLINK [BLOCK_NVLINK ...]]
                         [--block-all-nvlinks] [--test-nvlink-blocking]
                         [--dma-test] [--test-pcie-p2p]
                         [--read-sysmem-pa READ_SYSMEM_PA]
                         [--write-sysmem-pa WRITE_SYSMEM_PA WRITE_SYSMEM_PA]
                         [--read-config-space READ_CONFIG_SPACE]
                         [--write-config-space WRITE_CONFIG_SPACE WRITE_CONFIG_SPACE]
                         [--read-bar0 READ_BAR0]
                         [--write-bar0 WRITE_BAR0 WRITE_BAR0]
                         [--read-bar1 READ_BAR1]
                         [--write-bar1 WRITE_BAR1 WRITE_BAR1]
                         [--ignore-nvidia-driver]
                         {} ...

positional arguments:
{}

options:
-h, --help            show this help message and exit
--devices DEVICES     Generic device selector supporting multiple comma-separated specifiers:
                      - 'gpus' - Find all NVIDIA GPUs
                      - 'gpus[n]' - Find nth NVIDIA GPU
                      - 'gpus[n:m]' - Find NVIDIA GPUs from index n to m
                      - 'nvswitches' - Find all NVIDIA NVSwitches
                      - 'nvswitches[n]' - Find nth NVIDIA NVSwitch
                      - 'vendor:device' - Find devices matching 4-digit hex vendor:device ID
                      - 'domain:bus:device.function' - Find device at specific BDF address
--gpu GPU
--gpu-bdf GPU_BDF     Select a single GPU by providing a substring of the
                      BDF, e.g. '01:00'.
--gpu-name GPU_NAME   Select a single GPU by providing a substring of the
                      GPU name, e.g. 'T4'. If multiple GPUs match, the first
                      one will be used.
--no-gpu              Do not use any of the GPUs; commands requiring one
                      will not work.
--log {debug,info,warning,error,critical}
--mmio-access-type {devmem,sysfs}
                      On Linux, specify whether to do MMIO through /dev/mem
                      or /sys/bus/pci/devices/.../resourceN
--recover-broken-gpu  Attempt recovering a broken GPU (unresponsive config
                      space or MMIO) by performing an SBR. If the GPU is
                      broken from the beginning and hence correct config
                      space wasn't saved then re-enumerate it in the OS by
                      sysfs remove/rescan to restore BARs etc.
--set-next-sbr-to-fundamental-reset
                      Configure the GPU to make the next SBR same as
                      fundamental reset. After the SBR this setting resets
                      back to False. Supported on H100 only.
--reset-with-sbr      Reset the GPU with SBR and restore its config space
                      settings, before any other actions
--reset-with-flr      Reset the GPU with FLR and restore its config space
                      settings, before any other actions
--reset-with-os       Reset with OS through /sys/.../reset
--remove-from-os      Remove from OS through /sys/.../remove
--sysfs-bind SYSFS_BIND
                      Bind devices to the specified driver
--sysfs-unbind        Unbind devices from the current driver
--query-ecc-state     Query the ECC state of the GPU
--query-cc-mode       Query the current Confidential Computing (CC) mode of
                      the GPU.
--query-cc-settings   Query the Confidential Computing (CC) settings of the
                      GPU. This prints the lower level setting knobs that
                      will take effect upon GPU reset.
--query-ppcie-mode    Query the current Protected PCIe (PPCIe) mode of the
                      GPU or switch.
--query-ppcie-settings
                      Query the Protected PCIe (PPCIe) settings of the GPU
                      or switch. This prints the lower level setting knobs
                      that will take effect upon GPU or switch reset.
--query-prc-knobs     Query all the Product Reconfiguration (PRC) knobs.
--set-cc-mode {off,on,devtools}
                      Configure Confidential Computing (CC) mode. The
                      choices are off (disabled), on (enabled) or devtools
                      (enabled in DevTools mode). The GPU needs to be reset
                      to make the selected mode active. See
                      --reset-after-cc-mode-switch for one way of doing it.
--reset-after-cc-mode-switch
                      Reset the GPU after switching CC mode such that it is
                      activated immediately.
--test-cc-mode-switch
                      Test switching CC modes.
--reset-after-ppcie-mode-switch
                      Reset the GPU or switch after switching PPCIe mode
                      such that it is activated immediately.
--set-ppcie-mode {off,on}
                      Configure Protected PCIe (PPCIe) mode. The choices are
                      off (disabled) or on (enabled). The GPU or switch
                      needs to be reset to make the selected mode active.
                      See --reset-after-ppcie-mode-switch for one way of
                      doing it.
--test-ppcie-mode-switch
                      Test switching PPCIe mode.
--set-bar0-firewall-mode {off,on}
                      Configure BAR0 firewall mode. The choices are off
                      (disabled) or on (enabled).
--query-bar0-firewall-mode
                      Query the current BAR0 firewall mode of the GPU.
                      Blackwell+ only.
--query-l4-serial-number
                      Query the L4 certificate serial number without the
                      MSB. The MSB could be either 0x41 or 0x40 based on the
                      RoT returning the certificate chain.
--query-module-name   Query the module name (aka physical ID and module ID).
                      Supported only on H100 SXM and NVSwitch_gen3
--clear-memory        Clear the contents of the GPU memory. Supported on
                      Pascal+ GPUs. Assumes the GPU has been reset with SBR
                      prior to this operation and can be combined with
                      --reset-with-sbr if not.
--debug-dump          Dump various state from the device for debug
--nvlink-debug-dump   Dump NVLINK debug state.
--knobs-reset-to-defaults-list
                      Show the supported knobs and their default state
--knobs-reset-to-defaults KNOBS_RESET_TO_DEFAULTS [KNOBS_RESET_TO_DEFAULTS ...]
                      Set various device configuration knobs to defaults.
                      Supported on Turing+ GPUs and NvSwitch_gen3. See
                      --knobs-reset-to-defaults-list for the list of
                      supported knobs and their defaults on a specific
                      device. The option can be specified multiple times to
                      list specific knobs or 'all' can be used to indicate
                      all supported ones should be reset.
--knobs-reset-to-defaults-assume-no-pending-changes
                      Indicate that the device was reset after last time any
                      knobs were modified. This allows the reset to defaults
                      to be slightly optimized by querying the current state
--knobs-reset-to-defaults-test
                      Test knob setting and resetting
--noop                An empty option that can be used to separate nargs=+
                      options from positional arguments
--force-ecc-on-after-reset
                      Force ECC to be enabled after a subsequent GPU reset
--test-ecc-toggle     Test toggling ECC mode.
--query-mig-mode      Query whether MIG mode is enabled.
--force-mig-off-after-reset
                      Force MIG mode to be disabled after a subsequent GPU
                      reset
--test-mig-toggle     Test toggling MIG mode.
--block-nvlink BLOCK_NVLINK [BLOCK_NVLINK ...]
                      Block the specified NVLinks. NVLinks will be blocked
                      until a subsequent GPU reset (SBR on A100, FLR or SBR
                      on Hopper GPUs [based on OOB configuration], FLR or
                      SBR on Blackwell and later). Supported on A100 and
                      later GPUs that have NVLinks.
--block-all-nvlinks   Block all NVLinks. See --block-nvlink for more
                      details.
--test-nvlink-blocking
                      Test blocking NVLinks.
--dma-test            Check that GPUs are able to perform DMA to all/most of
                      available system memory.
--test-pcie-p2p       Check that all GPUs are able to perform DMA to each
                      other.
--read-sysmem-pa READ_SYSMEM_PA
                      Use GPU's DMA to read 32-bits from the specified
                      sysmem physical address
--write-sysmem-pa WRITE_SYSMEM_PA WRITE_SYSMEM_PA
                      Use GPU's DMA to write specified 32-bits to the
                      specified sysmem physical address
--read-config-space READ_CONFIG_SPACE
                      Read 32-bits from device's config space at specified
                      offset
--write-config-space WRITE_CONFIG_SPACE WRITE_CONFIG_SPACE
                      Write 32-bit to device's config space at specified
                      offset
--read-bar0 READ_BAR0
                      Read 32-bits from GPU BAR0 at specified offset
--write-bar0 WRITE_BAR0 WRITE_BAR0
                      Write 32-bit to GPU BAR0 at specified offset
--read-bar1 READ_BAR1
                      Read 32-bits from GPU BAR1 at specified offset
--write-bar1 WRITE_BAR1 WRITE_BAR1
                      Write 32-bit to GPU BAR1 at specified offset
--ignore-nvidia-driver
                      Do not treat nvidia driver appearing to be loaded as an
                      error
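
The `--noop` entry in the help text above exists because an option declared with `nargs=+` keeps consuming values until the next option appears, which can swallow a positional argument that follows it. The sketch below reproduces that behavior with a small stand-in `argparse` parser. It is not the tool's real parser: the positional `command` and the helper names are assumptions made for illustration.

```python
import argparse


def make_parser():
    # Hypothetical stand-in parser that mirrors only the relevant shape.
    parser = argparse.ArgumentParser(prog="demo")
    parser.add_argument("--block-nvlink", nargs="+")
    parser.add_argument("--noop", action="store_true")
    parser.add_argument("command")
    return parser


def try_parse(argv):
    """Return the parsed namespace, or None if argparse rejects argv."""
    try:
        return make_parser().parse_args(argv)
    except SystemExit:
        return None


# --noop ends the value list, so 'status' reaches the positional argument.
with_noop = try_parse(["--block-nvlink", "1", "2", "--noop", "status"])

# Without a separator, 'status' is consumed as a third --block-nvlink value,
# the positional stays unfilled, and argparse reports an error.
without_noop = try_parse(["--block-nvlink", "1", "2", "status"])
```

Placing the `nargs=+` option last on the command line is another way to sidestep the ambiguity.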

Download files

Source Distribution

nvidia_gpu_admin_tools-2025.11.21.tar.gz (101.9 kB)

Uploaded Source

Built Distribution

nvidia_gpu_admin_tools-2025.11.21-py3-none-any.whl (164.9 kB)

Uploaded Python 3

File details

Details for the file nvidia_gpu_admin_tools-2025.11.21.tar.gz.

File hashes

Hashes for nvidia_gpu_admin_tools-2025.11.21.tar.gz
Algorithm Hash digest
SHA256 0c187736ad2e6bf399a99eef5f3ff9e566390f90fc6f65e8a0cebf5bff573b69
MD5 95ab5b348fc863f9e38c6fbec0cf453b
BLAKE2b-256 da5134a3d113cb06a91b21a6c012c082cb0060e8c94f7eb53dbd015412730928

File details

Details for the file nvidia_gpu_admin_tools-2025.11.21-py3-none-any.whl.

File hashes

Hashes for nvidia_gpu_admin_tools-2025.11.21-py3-none-any.whl
Algorithm Hash digest
SHA256 3be8c4515dea919fb1c411bc2ad04ae3ef71c43a5b8112f569d87bc2c9b078de
MD5 ceb6af1e02ebd871276b0666ebf1d99f
BLAKE2b-256 9c25f75a6a0210392a1d22b27c8ae8b776e15d64eb04dd3236a284ed0f8a98ee
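
A downloaded file can be checked against the SHA256 digests listed above before it is installed. The following is a minimal sketch using Python's standard `hashlib`. The function names and the local file path in the usage comment are assumptions, and the expected digest is copied from the wheel's hash table above.

```python
import hashlib

# SHA256 digest of the wheel, as listed in the hash table above.
EXPECTED_WHEEL_SHA256 = (
    "3be8c4515dea919fb1c411bc2ad04ae3ef71c43a5b8112f569d87bc2c9b078de"
)


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path, expected_hex):
    """Return True when the file's SHA256 matches the expected hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Usage, assuming the wheel was downloaded to the current directory:
# ok = verify_sha256("nvidia_gpu_admin_tools-2025.11.21-py3-none-any.whl",
#                    EXPECTED_WHEEL_SHA256)
```

pip can perform an equivalent check itself when requirements are pinned with `--hash` entries and installed with `--require-hashes`.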