diff --git a/gpu-operator/life-cycle-policy.rst b/gpu-operator/life-cycle-policy.rst
index 97d88a653..080bafd1c 100644
--- a/gpu-operator/life-cycle-policy.rst
+++ b/gpu-operator/life-cycle-policy.rst
@@ -95,106 +95,68 @@ The following table shows the operands and default operand versions that corresp
 When post-release testing confirms support for newer versions of operands, these updates are identified as *recommended updates* to a GPU Operator version.
 Refer to :ref:`Upgrading the GPU Operator` for more information.
 
-   .. list-table::
-      :header-rows: 1
-      :align: center
-
-      * - Release
-        - | NVIDIA
-          | GPU
-          | Driver
-        - | NVIDIA Driver
-          | Manager for K8s
-        - | NVIDIA
-          | Container
-          | Toolkit
-        - | NVIDIA Kubernetes
-          | Device Plugin
-        - DCGM Exporter
-        - | Node Feature
-          | Discovery
-        - | NVIDIA GPU Feature
-          | Discovery for Kubernetes
-        - | NVIDIA MIG Manager
-          | for Kubernetes
-        - DCGM
-        - | Validator for
-          | NVIDIA GPU Operator
-        - | NVIDIA KubeVirt
-          | GPU Device Plugin
-        - | NVIDIA vGPU
-          | Device Manager
-        - NVIDIA GDS Driver
-        - | NVIDIA Kata Manager
-          | for Kubernetes
-        - | NVIDIA Confidential
-          | Computing Manager
-          | for Kubernetes
-
-      * - v23.9.0
-        - | `535.129.03 `_ (recommended),
-          | `535.104.12 `_ (default),
-          | `525.147.05 `_,
-          | `470.223.02 `_
-        - `v0.6.4 `_
-        - `1.14.3 `_
-        - `0.14.2 `_
-        - `3.2.6-3.1.9 `_
-        - v0.14.2
-        - `0.8.2 `_
-        - `0.5.5 `_
-        - `3.2.6-1 `_,
-        - v23.9.0
-        - `v1.2.3 `_
-        - v0.2.4
-        - `2.16.1 `_
-        - v0.1.2
-        - v0.1.1
-
-      * - v23.6.1
-        - | `535.129.03 `_ (recommended),
-          | `535.104.05 `_ (default),
-          | `525.147.05 `_,
-          | `470.223.02 `_
-        - `v0.6.2 `_
-        - `1.13.4 `_
-        - `0.14.1 `_
-        - `3.1.8-3.1.5 `_
-        - v0.13.1
-        - `0.8.1 `_
-        - `0.5.3 `_
-        - | `3.1.8-1 `_ (default),
-        - v23.6.1
-        - `v1.2.2 `_
-        - v0.2.3
-        - `2.16.1 `_
-        - v0.1.0
-        - v0.1.0
-
-      * - v23.6.0
-        - | `535.129.03 `_ (recommended),
-          | `535.86.10 `_ (default),
-          | `525.147.05 `_,
-          | `470.223.02 `_
-        - `v0.6.2 `_
-        - `1.13.4 `_
-        - `0.14.1 `_
-        - `3.1.8-3.1.5 `_
-        - v0.13.1
-        - `0.8.1 `_
-        - `0.5.3 `_
-        - | `3.1.8-1 `_ (default),
-        - v23.6.0
-        - `v1.2.2 `_
-        - v0.2.3
-        - `2.16.1 `_
-        - v0.1.0
-        - v0.1.0
-
-   .. note::
-
-      - Driver version could be different with NVIDIA vGPU, as it depends on the driver
-        version downloaded from the `NVIDIA vGPU Software Portal `_.
-      - The GPU Operator is supported on all active NVIDIA datacenter production drivers.
-        Refer to `Supported Drivers and CUDA Toolkit Versions `_
-        for more information.
+.. list-table::
+   :header-rows: 1
+
+   * - Component
+     - Version
+
+   * - NVIDIA GPU Operator
+     - v23.9.1
+
+   * - NVIDIA GPU Driver
+     - | `535.129.03 `_ (default),
+       | `525.147.05 `_,
+       | `470.223.02 `_,
+
+   * - NVIDIA Driver Manager for K8s
+     - `v0.6.5 `_
+
+   * - NVIDIA Container Toolkit
+     - `1.14.3 `_
+
+   * - NVIDIA Kubernetes Device Plugin
+     - `0.14.3 `_
+
+   * - DCGM Exporter
+     - `3.3.0-3.2.0 `_
+
+   * - Node Feature Discovery
+     - v0.14.2
+
+   * - | NVIDIA GPU Feature Discovery
+       | for Kubernetes
+     - `0.8.2 `_
+
+   * - NVIDIA MIG Manager for Kubernetes
+     - `0.5.5 `_
+
+   * - DCGM
+     - `3.3.0-1 `_
+
+   * - Validator for NVIDIA GPU Operator
+     - v23.9.1
+
+   * - NVIDIA KubeVirt GPU Device Plugin
+     - `v1.2.4 `_
+
+   * - NVIDIA vGPU Device Manager
+     - v0.2.4
+
+   * - NVIDIA GDS Driver
+     - `2.17.5 `_
+
+   * - NVIDIA Kata Manager for Kubernetes
+     - v0.1.2
+
+   * - | NVIDIA Confidential Computing
+       | Manager for Kubernetes
+     - v0.1.1
+
+.. note::
+
+   - Driver version could be different with NVIDIA vGPU, as it depends on the driver
+     version downloaded from the `NVIDIA vGPU Software Portal `_.
+   - The GPU Operator is supported on all active NVIDIA datacenter production drivers.
+     Refer to `Supported Drivers and CUDA Toolkit Versions `_
+     for more information.
diff --git a/gpu-operator/platform-support.rst b/gpu-operator/platform-support.rst
index c1bc9e927..e4115992f 100644
--- a/gpu-operator/platform-support.rst
+++ b/gpu-operator/platform-support.rst
@@ -34,14 +34,27 @@ Platform Support
 
 .. include:: life-cycle-policy.rst
 
-Supported NVIDIA GPUs and Systems
----------------------------------
+.. _supported-nvidia-gpus-and-systems:
+
+Supported NVIDIA Data Center GPUs and Systems
+---------------------------------------------
 
 The following NVIDIA data center GPUs are supported on x86 based platforms:
 
 .. tab-set::
 
-   .. tab-item:: Data Center A, H and L-series Products
+   .. tab-item:: GH-series Products
+
+      .. list-table::
+         :header-rows: 1
+
+         * - Product
+           - Architecture
+
+         * - NVIDIA GH200
+           - NVIDIA Grace Hopper
+
+   .. tab-item:: A, H and L-series Products
 
       +-------------------------+---------------------------+
       | Product                 | Architecture              |
@@ -90,7 +103,7 @@ The following NVIDIA data center GPUs are supported on x86 based platforms:
          * Hopper (H100) GPU is only supported on x86 servers.
          * The GPU Operator supports DGX A100 with DGX OS 5.1+ and Red Hat OpenShift using Red Hat Core OS. For installation instructions, see :ref:`here ` for DGX OS 5.1+ and :ref:`here ` for Red Hat OpenShift.
 
-   .. tab-item:: Data Center D,T and V-series Products
+   .. tab-item:: D,T and V-series Products
 
       +-----------------------+------------------------+
       | Product               | Architecture           |
@@ -106,7 +119,7 @@ The following NVIDIA data center GPUs are supported on x86 based platforms:
       | NVIDIA P4             | Pascal                 |
       +-----------------------+------------------------+
 
-   .. tab-item:: Data Center RTX / T-series Products
+   .. tab-item:: RTX / T-series Products
 
       +-------------------------+------------------------+
       | Product                 | Architecture           |
@@ -244,29 +257,21 @@ The GPU Operator has been validated in the following scenarios:
          | MicroK8s
 
    * - Ubuntu 20.04 LTS
-     - 1.25---1.28
+     - 1.22---1.28
      -
      - 7.0 U3c, 8.0 U2
-     - 1.25---1.28
+     - 1.22---1.28
      -
      -
 
    * - Ubuntu 22.04 LTS
-     - 1.25---1.28
+     - 1.22---1.28
      -
      -
      -
      -
      - 1.26
 
-   * - CentOS 7
-     - 1.25---1.28
-     -
-     -
-     -
-     -
-     -
-
    * - Red Hat Core OS
      -
      - | 4.9---4.14
@@ -279,10 +284,10 @@ The GPU Operator has been validated in the following scenarios:
        | Enterprise
        | Linux 8.4,
        | 8.6---8.9
-     - 1.25---1.28
+     - 1.22---1.28
      -
      -
-     - 1.25---1.28
+     - 1.22---1.28
      -
      -
 
@@ -407,8 +412,8 @@ Operating System Kubernetes KubeVirt OpenShift Virtual
 ================ =========== ============= ========= ============= ========
 Ubuntu 20.04 LTS 1.22---1.28 0.36+         0.59.1+
 Ubuntu 22.04 LTS 1.22---1.28 0.36+         0.59.1+
-Red Hat Core OS                                      4.11, 4.12,   4.13
-                                                     4.13
+Red Hat Core OS                                      4.11---4.14   4.13,
+                                                                   4.14
 ================ =========== ============= ========= ============= ========
 
 You can run GPU passthrough and NVIDIA vGPU in the same cluster as long as you use
@@ -426,9 +431,8 @@ Support for GPUDirect RDMA
 
 Supported operating systems and NVIDIA GPU Drivers with GPUDirect RDMA.
 
-- Ubuntu 20.04 and 22.04 LTS with Network Operator 23.7.0
-- Red Hat OpenShift 4.9 and higher with Network Operator 23.7.0
-- CentOS 7 with MOFED installed on the node
+- Ubuntu 20.04 and 22.04 LTS with Network Operator 23.10.0
+- Red Hat OpenShift 4.9 and higher with Network Operator 23.10.0
 
 For information about configuring GPUDirect RDMA, refer to :doc:`gpu-operator-rdma`.
 
@@ -438,13 +442,19 @@ Support for GPUDirect Storage
 
 Supported operating systems and NVIDIA GPU Drivers with GPUDirect Storage.
 
-- Ubuntu 20.04 and 22.04 LTS with Network Operator 23.7.0
+- Ubuntu 20.04 and 22.04 LTS with Network Operator 23.10.0
 - Red Hat OpenShift Container Platform 4.11 and higher
 
 .. note::
 
-  Not supported with secure boot.
-  Supported storage types are local NVMe and remote NFS.
+   Version v2.17.5 and higher of the NVIDIA GPUDirect Storage kernel driver, ``nvidia-fs``,
+   requires the NVIDIA open kernel modules.
+   You can install the open kernel modules by specifying the ``driver.useOpenKernelModules=true``
+   argument to the ``helm`` command.
+   Refer to :ref:`chart customization options` for more information.
+
+   Not supported with secure boot.
+   Supported storage types are local NVMe and remote NFS.
 
 Additional Supported Container Management Tools
 -----------------------------------------------