Protected PCIE Verifier
- Overview
- Architecture diagram
- Getting Started
- Prerequisites
- Installation
- Usage
- Troubleshooting
- License
Overview
In a multi-GPU confidential computing (CC) setup, NVLink interconnects and NVSwitches carry GPU-to-GPU data traffic. Because NVLink interconnects and NVSwitches are outside the trust boundary, they must not have access to plain-text data. All data that flows over NVLink must be encrypted before transfer and decrypted at the destination GPU. On the GPU, encryption and decryption are performed by the GPU copy engine (CE).
Bouncing data through a CE adds constraints and latency to the data path, which may reduce performance for some workloads. To minimize the performance impact, NVIDIA's Protected PCIe (PPCIe) mode adjusts the security model to trust NVLink data, enabling plain-text traffic without CEs while preserving a Confidential Virtual Machine.
Note: There are only two supported GPU usage configurations:
- All GPUs are in CC mode. Each GPU can be assigned to one Confidential VM. In this scenario, use the CC verifier.
- All GPUs are in PPCIe mode. All GPUs must be assigned to a single Confidential VM. In this scenario, use the PPCIE verifier.
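The rule in the note can be written as a small check. This is a minimal sketch only: the mode labels `"CC"` and `"PPCIE"`, the returned strings, and the function name `pick_verifier` are hypothetical and are not part of the nvtrust tooling.

```python
from typing import Iterable


def pick_verifier(gpu_modes: Iterable[str]) -> str:
    """Return which verifier applies for a set of per-GPU modes.

    The mode labels and return values are illustrative assumptions.
    """
    modes = {m.upper() for m in gpu_modes}
    if not modes:
        raise ValueError("no GPUs reported")
    if modes == {"CC"}:
        return "cc-verifier"
    if modes == {"PPCIE"}:
        return "ppcie-verifier"
    # Any mix of modes falls outside the two supported configurations.
    raise ValueError(f"unsupported mix of GPU modes: {sorted(modes)}")
```

A mixed list such as seven GPUs in one mode and one in the other raises `ValueError`, mirroring the statement that only the two uniform configurations are supported.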
High-Level Architecture Diagram
The PPCIE verifier is a tool designed to verify the security of the multi-GPU system by attesting to the integrity of its GPUs and NVSwitches. The attestation SDK is used to gather evidence for each device, with further attestation performed either locally or remotely, as specified by the user when running the PPCIE Verifier tool.
After collecting attestation results for each device, the PPCIE verifier validates these results against a policy file to confirm that all claims are legitimate. Following the attestation process, the tool conducts a final topology check to verify that the devices are securely connected in the expected configuration. The final attestation results are then presented to the user, detailing the checks performed.
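To illustrate what validating results against a policy can look like, here is a minimal sketch. It assumes a flat policy that maps claim names to expected values; the actual policy file format used by the Attestation SDK is not described here, and the claim name `cert_chain_valid` and both function names are hypothetical.

```python
from typing import Any, Dict, List


def failed_claims(claims: Dict[str, Any], policy: Dict[str, Any]) -> List[str]:
    """Return the policy entries that the device's claims do not satisfy.

    A claim that is missing from `claims` counts as a failure.
    """
    return [name for name, expected in policy.items()
            if claims.get(name) != expected]


def all_devices_pass(results: Dict[str, Dict[str, Any]],
                     policy: Dict[str, Any]) -> bool:
    """True only if every device's claims satisfy every policy entry."""
    return all(not failed_claims(claims, policy)
               for claims in results.values())
```

For example, with a hypothetical policy `{"cert_chain_valid": True}`, a device that reports `{"cert_chain_valid": False}` causes `all_devices_pass` to return `False`.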
Detailed Architecture Flow
- The PPCIE Verifier tool is initiated by the user, who specifies the attestation mode for both GPUs and NvSwitches.
- The system components are enumerated (number of GPUs and NvSwitches).
- Pre-checks are performed on each GPU to ensure it is configured for confidential computing.
- Pre-checks are performed on each NvSwitch to ensure it is configured for confidential computing.
- The required GPU evidence for attestation is collected from the Attestation SDK for each GPU.
- Once the evidence is collected, the PPCIE Verifier tool initiates attestation verification based on the mode specified by the user.
- GPU attestation is initiated by the Attestation SDK: the local-gpu-verifier is used for local attestation, while NRAS (NVIDIA's Remote Attestation Service) is used for remote attestation.
- The Attestation SDK provides GPU attestation results to the PPCIE Verifier.
- If the GPU attestation is successful, the PPCIE Verifier proceeds to collect evidence for the NvSwitches from the Attestation SDK.
- Once all NvSwitch evidence is collected, attestation is initiated by the PPCIE Verifier.
- NvSwitch attestation is performed by the Attestation SDK: the local-switch-verifier is used for local attestation, while NRAS is used for remote attestation.
- The Attestation SDK provides NvSwitch attestation results to the PPCIE Verifier.
- If the NvSwitch attestation is successful, the PPCIE Verifier performs a topology check to ensure the devices are securely connected in the expected configuration.
- The PPCIE Verifier determines the overall results and updates the status for each check it performs.
- The GPU ready state is set.
- The final attestation results are presented to the user, detailing the checks performed and the status of each device in the system.
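The ordering of the steps above can be sketched as a simple gated pipeline. This is not the tool's implementation: `Status` and `run_flow` are hypothetical names, each step is passed in as a plain callable, and stopping at the first failure is an assumption modelled on the "if ... is successful, ... proceeds" wording of the flow.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Status:
    """Records the outcome of each check that was actually run."""
    checks: Dict[str, bool] = field(default_factory=dict)

    @property
    def ok(self) -> bool:
        # Overall success requires at least one check and no failures.
        return bool(self.checks) and all(self.checks.values())


def run_flow(steps: List[Tuple[str, Callable[[], bool]]]) -> Status:
    """Run (name, step) pairs in order, stopping at the first failure."""
    status = Status()
    for name, step in steps:
        passed = bool(step())
        status.checks[name] = passed
        if not passed:
            break  # later steps depend on this one succeeding
    return status
```

A caller would supply steps in the documented order, for example `gpu_prechecks`, `switch_prechecks`, `gpu_attestation`, `switch_attestation`, `topology_check`, and `set_gpu_ready_state`, and then report `status.checks` to the user.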
Getting started
Prerequisites
HGX system with 8 GPUs and 4 switches assigned to a single tenant
Python >= 3.8
git installed
NVIDIA GPU driver installed
NVIDIA Switch driver installed
NVIDIA Fabric Manager installed
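Some of these prerequisites can be checked from Python before installing. The sketch below is an assumption-laden convenience, not part of the tool: it treats the presence of `nvidia-smi` on `PATH` as a rough proxy for an installed GPU driver, and it does not check the Switch driver, Fabric Manager, or the hardware configuration. The function name `check_prerequisites` is hypothetical.

```python
import shutil
import sys
from typing import Dict


def check_prerequisites() -> Dict[str, bool]:
    """Report which of the easily testable prerequisites appear to be met."""
    return {
        "python>=3.8": sys.version_info >= (3, 8),
        "git": shutil.which("git") is not None,
        # nvidia-smi ships with the GPU driver; finding it on PATH is only
        # a hint that the driver is installed, not proof that it works.
        "nvidia-smi": shutil.which("nvidia-smi") is not None,
    }


if __name__ == "__main__":
    for name, present in check_prerequisites().items():
        print(f"{name}: {'found' if present else 'missing'}")
```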
Installation/Dependencies
PPCIE Verifier has the following dependencies:
- nv-attestation-sdk (Attestation SDK)
- nv-local-gpu-verifier (Local GPU Verifier)
- nv-switch-verifier (Local Switch Verifier). Note: this is a module inside the Attestation SDK and does not require separate installation.
Installation Instructions:
Please elevate to Root User Privileges before installing the packages: (Note: This is necessary to set the GPU ready state)
sudo -i
Method 1: Using installer script
1. git clone https://github.com/NVIDIA/nvtrust.git
2. cd nvtrust/guest_tools/ppcie-verifier/install
3. source ppcie-installer.sh (This installs the required dependencies)
Method 2: Using PyPI (Requires python virtual environment creation)
1. python3 -m venv venv
2. source venv/bin/activate
3. pip3 install nv-ppcie-verifier (This automatically installs nv-attestation-sdk, nv-local-gpu-verifier and nv-switch-verifier)
Usage
python3 -m ppcie.verifier.verification --gpu-attestation-mode=LOCAL --switch-attestation-mode=LOCAL (Example arguments provided)
Options
Option | Description | Value Options |
---|---|---|
--gpu-attestation-mode | Type of GPU Attestation | LOCAL, REMOTE |
--switch-attestation-mode | Type of NVSwitch Attestation | LOCAL, REMOTE |
--log | Configure log level | DEBUG, INFO, WARNING, ERROR, TRACE, CRITICAL |
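When the verifier is launched from another Python program, the options above can be validated before the command is run. This is a minimal sketch: `build_command` is a hypothetical helper, and passing the log level as `--log=<LEVEL>` is an assumption based on the `=` form used by the other two options in the usage line.

```python
import sys
from typing import List, Optional

ATTESTATION_MODES = ("LOCAL", "REMOTE")
LOG_LEVELS = ("DEBUG", "INFO", "WARNING", "ERROR", "TRACE", "CRITICAL")


def build_command(gpu_mode: str, switch_mode: str,
                  log: Optional[str] = None) -> List[str]:
    """Build the argument list for invoking the verifier module."""
    for label, mode in (("gpu", gpu_mode), ("switch", switch_mode)):
        if mode not in ATTESTATION_MODES:
            raise ValueError(f"invalid {label} attestation mode: {mode!r}")
    if log is not None and log not in LOG_LEVELS:
        raise ValueError(f"invalid log level: {log!r}")
    cmd = [
        sys.executable, "-m", "ppcie.verifier.verification",
        f"--gpu-attestation-mode={gpu_mode}",
        f"--switch-attestation-mode={switch_mode}",
    ]
    if log is not None:
        cmd.append(f"--log={log}")  # assumed syntax, see the note above
    return cmd
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` from a root session inside the environment where the package is installed.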
Troubleshooting
Below are some of the common issues that have been encountered:
Installation Issues:
- ModuleNotFoundError: No module named 'nv_attestation_sdk' while installing the packages using the installer script (ppcie-installer.sh)
Solution: Delete the venv that was created and install the packages using the script again.
- Warnings and installation issues similar to the following while installing the package:
WARNING: Ignoring invalid distribution ~v-attestation-sdk <site-package-directory>
Solution: Execute the following command to clean up packages that were not installed properly, and then retry the installation:
rm -rf $(ls -l <site-packages-directory> | grep '~' | awk '{print $9}')
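Before deleting anything, it can help to list the leftover entries first. The sketch below is a hypothetical helper, not part of the package: it returns the entries in a given site-packages directory whose names start with `~`, which is narrower than the `grep '~'` above (that matches `~` anywhere in the name).

```python
from pathlib import Path
from typing import List, Union


def find_invalid_distributions(site_packages: Union[str, Path]) -> List[Path]:
    """List entries whose names start with '~' in the given directory."""
    return sorted(p for p in Path(site_packages).iterdir()
                  if p.name.startswith("~"))
```

After reviewing the returned paths, each one can be removed with `shutil.rmtree(path)` for directories or `path.unlink()` for files.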
Configuration Issues
- The nvmlInit call timed out.
or
Error in Initializing NVML library. Please install the drivers again and re-try
Solution: Reinstall the NVIDIA GPU driver and Fabric Manager.
- NSCQWarning: NSCQ_RC_WARNING_RDT_INIT_FAILURE
Solution: Install the version of the NVIDIA Switch driver that is compatible with the GPU driver.
License
The license for this repository is Apache v2 except where otherwise noted.