From 6c121145e6e60ebe92d994e0164d846df7660429 Mon Sep 17 00:00:00 2001 From: spolifroni-amd Date: Tue, 3 Dec 2024 11:28:38 -0500 Subject: [PATCH 01/10] these files aren't used. part of them is in What Is --- docs/introduction.rst | 46 -------------------------------------- docs/introduction.rst.orig | 46 -------------------------------------- 2 files changed, 92 deletions(-) delete mode 100644 docs/introduction.rst delete mode 100644 docs/introduction.rst.orig diff --git a/docs/introduction.rst b/docs/introduction.rst deleted file mode 100644 index 60ece9b0..00000000 --- a/docs/introduction.rst +++ /dev/null @@ -1,46 +0,0 @@ - -************* -Introduction -************* - -.. toctree:: - :maxdepth: 4 - :caption: Contents: - -Overview -================== - -hipCUB is a thin wrapper library on top of rocPRIM or CUB. It enables developers to port project -using CUB library to the `HIP `_ layer and to run them -on AMD hardware. In the `ROCm `_ environment, hipCUB uses -rocPRIM library as the backend, however, on CUDA platforms it uses CUB instead. - -- When using hipCUB you should only include ```` header. -- When rocPRIM is used as backend ``HIPCUB_ROCPRIM_API`` is defined. -- When CUB is used as backend ``HIPCUB_CUB_API`` is defined. -- Backends are automaticaly selected based on platform detected by HIP layer - (``__HIP_PLATFORM_AMD__``, ``__HIP_PLATFORM_NVIDIA__``). - -rocPRIM backend -==================================== - -hipCUB with rocPRIM backend may not support all function and features CUB has because of the -differences between ROCm (HIP) platform and CUDA platform. - -Not-supported features and differences: - -- Functions, classes and macros which are not in the public API or not documented are not - supported. -- Device-wide primitives can't be called from kernels (dynamic parallelism is not supported in HIP - on ROCm). 
-- Storage management and debug functions: - - - ``Debug``, ``PtxVersion``, ``SmVersion`` functions and ``CubDebug``, ``CubDebugExit``, - ``_CubLog`` macros are not supported. -- Intrinsics: - - - ``ThreadExit``, ``ThreadTrap`` - not supported. - - Warp thread masks (when used) are 64-bit unsigned integers. - - ``member_mask`` input argument is ignored in ``WARP_*`` functions. - - Arguments ``first_thread``, ``last_thread``, and ``member_mask`` are ignored in ``Shuffle*`` - functions. diff --git a/docs/introduction.rst.orig b/docs/introduction.rst.orig deleted file mode 100644 index 60ece9b0..00000000 --- a/docs/introduction.rst.orig +++ /dev/null @@ -1,46 +0,0 @@ - -************* -Introduction -************* - -.. toctree:: - :maxdepth: 4 - :caption: Contents: - -Overview -================== - -hipCUB is a thin wrapper library on top of rocPRIM or CUB. It enables developers to port project -using CUB library to the `HIP `_ layer and to run them -on AMD hardware. In the `ROCm `_ environment, hipCUB uses -rocPRIM library as the backend, however, on CUDA platforms it uses CUB instead. - -- When using hipCUB you should only include ```` header. -- When rocPRIM is used as backend ``HIPCUB_ROCPRIM_API`` is defined. -- When CUB is used as backend ``HIPCUB_CUB_API`` is defined. -- Backends are automaticaly selected based on platform detected by HIP layer - (``__HIP_PLATFORM_AMD__``, ``__HIP_PLATFORM_NVIDIA__``). - -rocPRIM backend -==================================== - -hipCUB with rocPRIM backend may not support all function and features CUB has because of the -differences between ROCm (HIP) platform and CUDA platform. - -Not-supported features and differences: - -- Functions, classes and macros which are not in the public API or not documented are not - supported. -- Device-wide primitives can't be called from kernels (dynamic parallelism is not supported in HIP - on ROCm). 
-- Storage management and debug functions: - - - ``Debug``, ``PtxVersion``, ``SmVersion`` functions and ``CubDebug``, ``CubDebugExit``, - ``_CubLog`` macros are not supported. -- Intrinsics: - - - ``ThreadExit``, ``ThreadTrap`` - not supported. - - Warp thread masks (when used) are 64-bit unsigned integers. - - ``member_mask`` input argument is ignored in ``WARP_*`` functions. - - Arguments ``first_thread``, ``last_thread``, and ``member_mask`` are ignored in ``Shuffle*`` - functions. From cfc98f61460e709b6befc8d3bbb726262564ab29 Mon Sep 17 00:00:00 2001 From: spolifroni-amd Date: Tue, 3 Dec 2024 11:29:10 -0500 Subject: [PATCH 02/10] changed 'integral' to 'integer' --- README.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/README.md b/README.md index 26f902df..a2895dce 100644 --- a/README.md +++ b/README.md @@ -1,5 +1,8 @@ # hipCUB +> [!NOTE] +> The published documentation is available at [hipCUB](https://rocm.docs.amd.com/projects/hipCUB/en/latest/) in an organized, easy-to-read format, with search and a table of contents. The documentation source files reside in the `docs` folder of this repository. As with all ROCm projects, the documentation is open source. For more information on contributing to the documentation, see [Contribute to ROCm documentation](https://rocm.docs.amd.com/en/latest/contribute/contributing.html). + hipCUB is a thin wrapper library on top of [rocPRIM](https://github.com/ROCm/rocPRIM) or [CUB](https://github.com/thrust/cub). 
You can use it to port a CUB project into From 3cfdf40c88384de68788fdc65b21a7c5bc0e338b Mon Sep 17 00:00:00 2001 From: spolifroni-amd Date: Tue, 3 Dec 2024 11:29:51 -0500 Subject: [PATCH 03/10] Added installation files --- docs/index.rst | 7 +++ docs/install/hipCUB-install-on-Windows.rst | 31 ++++++++++++ docs/install/hipCUB-install-overview.rst | 23 +++++++++ docs/install/hipCUB-install-with-cmake.rst | 56 ++++++++++++++++++++++ docs/install/hipCUB-prerequisites.rst | 34 +++++++++++++ docs/sphinx/_toc.yml.in | 10 ++++ 6 files changed, 161 insertions(+) create mode 100644 docs/install/hipCUB-install-on-Windows.rst create mode 100644 docs/install/hipCUB-install-overview.rst create mode 100644 docs/install/hipCUB-install-with-cmake.rst create mode 100644 docs/install/hipCUB-prerequisites.rst diff --git a/docs/index.rst b/docs/index.rst index ef1d4262..dbc469a1 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -19,6 +19,13 @@ The documentation is structured as follows: .. grid:: 2 + .. grid-item-card:: Installation + + * :doc:`Prerequisites ` + * :doc:`Installation overview ` + * :doc:`Installing on Windows ` + * :doc:`Installing on Linux and Windows with CMake ` + .. grid-item-card:: API Reference * :ref:`data-type-support` diff --git a/docs/install/hipCUB-install-on-Windows.rst b/docs/install/hipCUB-install-on-Windows.rst new file mode 100644 index 00000000..0a9f5c49 --- /dev/null +++ b/docs/install/hipCUB-install-on-Windows.rst @@ -0,0 +1,31 @@ +.. meta:: + :description: Build and install hipCUB with rmake.py + :keywords: install, building, hipCUB, AMD, ROCm, source code, installation script, Windows + +******************************************************************** +Building and installing hipCUB on Windows +******************************************************************** + +You can use ``rmake.py`` to build and install hipCUB on Windows. You can also use `CMake <./hipCUB-install-with-cmake.html>`_ if you want more build and installation options. 
+ +``rmake.py`` is located in the ``hipCUB`` root directory. To build and install hipCUB with ``rmake.py``, run: + +.. code:: shell + + python rmake.py -i + +This command also downloads `rocPRIM `_ and installs it in ``C:\hipSDK``. + +The ``-c`` option builds all clients, including the unit tests: + +.. code:: shell + + python rmake.py -c + +To see a complete list of ``rmake.py`` options, run: + +.. code-block:: shell + + python rmake.py --help + + \ No newline at end of file diff --git a/docs/install/hipCUB-install-overview.rst b/docs/install/hipCUB-install-overview.rst new file mode 100644 index 00000000..772e53d0 --- /dev/null +++ b/docs/install/hipCUB-install-overview.rst @@ -0,0 +1,23 @@ +.. meta:: + :description: hipCUB installation overview + :keywords: install, hipCUB, AMD, ROCm, installation, overview, general + +********************************* +hipCUB installation overview +********************************* + +The hipCUB source code is available from the `hipCUB GitHub Repository `_. + +The develop branch is the default branch. The develop branch is intended for users who want to preview new features or contribute to the hipCUB code base. + +If you don't intend to contribute to the hipCUB code base and won't be previewing features, use a branch that matches the version of ROCm installed on your system. + +hipCUB can be built and installed with |rmake|_ on Windows, or `CMake <./hipCUB-install-with-cmake.html>`_ on both Windows and Linux. + +.. |install| replace:: ``install`` +.. _install: ./rocThrust-install-script.html + +.. |rmake| replace:: ``rmake.py`` +.. _rmake: ./hipCUB-install-on-Windows.html + +CMake provides the most flexibility in building and installing hipCUB. \ No newline at end of file diff --git a/docs/install/hipCUB-install-with-cmake.rst b/docs/install/hipCUB-install-with-cmake.rst new file mode 100644 index 00000000..f21bf0db --- /dev/null +++ b/docs/install/hipCUB-install-with-cmake.rst @@ -0,0 +1,56 @@ +.. 
meta:: + :description: Build and install hipCUB with CMake + :keywords: install, building, hipCUB, AMD, ROCm, source code, cmake + +.. _install-with-cmake: + +******************************************************************** +Building and installing hipCUB with CMake +******************************************************************** + +You can build and install hipCUB with CMake on AMD and NVIDIA GPUs on Windows or Linux. + +Before you begin, set ``CXX`` to ``amdclang++`` or ``hipcc`` if you're building hipCUB on an AMD GPU, or to ``nvcc`` if you're building hipCUB on an NVIDIA GPU. Then set ``CMAKE_CXX_COMPILER`` to the compiler's absolute path. For example: + +.. code:: shell + + CXX=amdclang++ + CMAKE_CXX_COMPILER=/opt/rocm/bin/amdclang++ + +Create the ``build`` directory inside the ``hipCUB`` directory, then change directory to the ``build`` directory: + +.. code:: shell + + mkdir build + cd build + +Generate the makefile using the ``cmake`` command: + +.. code:: shell + + cmake ../. [-D [-D] ...] + +The available build options are: + + +* ``BUILD_BENCHMARK``. Set this to ``ON`` to build benchmark tests. Off by default. +* ``BUILD_TEST``. Set this to ``ON`` to build tests. Off by default. +* ``DEPENDENCIES_FORCE_DOWNLOAD``. Set this to ``ON`` to download the dependencies regardless of whether or not they are already installed. Off by default. + +Build hipCUB using the generated make file: + +.. code:: shell + + make -j4 + +After you've built hipCUB, you can optionally generate tar, zip, and deb packages: + +.. code:: shell + + make package + +Finally, install hipCUB: + +.. code:: shell + + make install diff --git a/docs/install/hipCUB-prerequisites.rst b/docs/install/hipCUB-prerequisites.rst new file mode 100644 index 00000000..077625e7 --- /dev/null +++ b/docs/install/hipCUB-prerequisites.rst @@ -0,0 +1,34 @@ +.. 
meta:: + :description: hipCUB Installation Prerequisites + :keywords: install, hipCUB, AMD, ROCm, prerequisites, dependencies, requirements + +******************************************************************** +hipCUB prerequisites +******************************************************************** + +hipCUB has the following prerequisites on all platforms: + +* CMake version 3.16 or higher + +On AMD GPUs: + +* `ROCm `_ +* `amdclang++ `_ +* `rocPRIM `_ + +amdclang++ is installed with ROCm. rocPRIM is automatically downloaded and installed by the CMake script. + +On NVIDIA GPUs: + +* The CUDA Toolkit +* CCCL library version 2.3.2 or later +* CUB and Thrust +* libcu++ version 2.2.0 + +The CCCL library is automatically downloaded and built by the CMake script. If libcu++ isn't found on the system, it will be downloaded from the CCCL repository. + +On Windows: + +* Python verion 3.6 or later +* Visual Studio 2019 with Clang support +* Strawberry Perl diff --git a/docs/sphinx/_toc.yml.in b/docs/sphinx/_toc.yml.in index 79b81038..aac506db 100644 --- a/docs/sphinx/_toc.yml.in +++ b/docs/sphinx/_toc.yml.in @@ -4,6 +4,16 @@ root: index subtrees: - entries: - file: what-is-hipcub + - caption: Installation + entries: + - file: install/hipCUB-prerequisites + title: Installation prerequisites + - file: install/hipCUB-install-overview + title: Installation overview + - file: install/hipCUB-install-on-Windows + title: Installing on Windows + - file: install/hipCUB-install-with-cmake + title: Installing on Linux and Windows with CMake - caption: API reference entries: - file: api-reference/data-type-support From 3c5eece3bc1e3ecb10fcbdd25cdc45544ba26a8a Mon Sep 17 00:00:00 2001 From: spolifroni-amd Date: Tue, 3 Dec 2024 11:32:54 -0500 Subject: [PATCH 04/10] Update README.md --- README.md | 1 - 1 file changed, 1 deletion(-) diff --git a/README.md b/README.md index 4c1f1d6d..6a0ad19b 100644 --- a/README.md +++ b/README.md @@ -1,7 +1,6 @@ # hipCUB > [!NOTE] - > The published 
documentation is available at [hipCUB](https://rocm.docs.amd.com/projects/hipCUB/en/latest/index.html) in an organized, easy-to-read format, with search and a table of contents. The documentation source files reside in the `docs` folder of this repository. As with all ROCm projects, the documentation is open source. For more information on contributing to the documentation, see [Contribute to ROCm documentation](https://rocm.docs.amd.com/en/latest/contribute/contributing.html). hipCUB is a thin wrapper library on top of From 77560304e63f5cae7c9659d0a5833af2945e34fc Mon Sep 17 00:00:00 2001 From: spolifroni-amd Date: Tue, 3 Dec 2024 11:34:29 -0500 Subject: [PATCH 05/10] changing the link --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index a2895dce..9755b9fe 100644 --- a/README.md +++ b/README.md @@ -1,7 +1,7 @@ # hipCUB > [!NOTE] -> The published documentation is available at [hipCUB](https://rocm.docs.amd.com/projects/hipCUB/en/latest/) in an organized, easy-to-read format, with search and a table of contents. The documentation source files reside in the `docs` folder of this repository. As with all ROCm projects, the documentation is open source. For more information on contributing to the documentation, see [Contribute to ROCm documentation](https://rocm.docs.amd.com/en/latest/contribute/contributing.html). +> The published documentation is available at [hipCUB](https://rocm.docs.amd.com/projects/hipCUB/en/latest/index.html) in an organized, easy-to-read format, with search and a table of contents. The documentation source files reside in the `docs` folder of this repository. As with all ROCm projects, the documentation is open source. For more information on contributing to the documentation, see [Contribute to ROCm documentation](https://rocm.docs.amd.com/en/latest/contribute/contributing.html). 
hipCUB is a thin wrapper library on top of [rocPRIM](https://github.com/ROCm/rocPRIM) or From 2fe183cac13c1662e2bf6f2dbf2ef02b623ebb25 Mon Sep 17 00:00:00 2001 From: spolifroni-amd Date: Wed, 18 Dec 2024 11:24:03 -0500 Subject: [PATCH 06/10] reworded what is and data type support --- docs/api-reference/data-type-support.rst | 61 ++++-------------------- docs/index.rst | 9 ++-- docs/what-is-hipcub.rst | 6 +-- 3 files changed, 15 insertions(+), 61 deletions(-) diff --git a/docs/api-reference/data-type-support.rst b/docs/api-reference/data-type-support.rst index e306c148..ae2761a5 100644 --- a/docs/api-reference/data-type-support.rst +++ b/docs/api-reference/data-type-support.rst @@ -8,57 +8,16 @@ Data type support ****************************************** -The input and output data types supported by hipCUB are listed here: +hipCUB supports the following data types on both ROCm and CUDA: - .. list-table:: Supported Input/Output Types - :header-rows: 1 - :name: supported-input-output-types +* ``int8`` +* ``int16`` +* ``int32`` +* ``float32`` +* ``float64`` - * - - Input/Output Types - - AMD Support - - CUDA Support - * - - int8 - - ✅ - - ✅ - * - - float8 - - ❌ - - ❌ - * - - bfloat8 - - ❌ - - ❌ - * - - int16 - - ✅ - - ✅ - * - - float16 - - ✅ - - ✅ [#]_ - * - - bfloat16 - - ✅ - - ✅ [#]_ - * - - int32 - - ✅ - - ✅ - * - - tensorfloat32 - - ❌ - - ❌ - * - - float32 - - ✅ - - ✅ - * - - float64 - - ✅ - - ✅ +``float8``, ``bfloat8``, and ``tensorfloat32`` are not supported by hipCUB on neither ROCm nor CUDA. -.. rubric:: Footnotes -.. [#] NVIDIA backend can't handle ``float16`` with the following API calls: ``block_adjacent_difference``, ``device_adjacenet_difference``, ``device_reduce``, ``device_scan``, ``device_segmented_reduce`` and ``device_select``. -.. 
[#] NVIDIA backend can't handle ``bfloat16`` with the following API calls: ``block_adjacent_difference``, ``device_adjacenet_difference``, ``device_reduce``, ``device_scan``, ``device_segmented_reduce``, ``device_select`` and ``device_histogram``. +The Nvidia back end does not support ``float16`` nor ``bfloat16`` with the following API calls: ``block_adjacent_difference``, ``device_adjacent_difference``, ``device_reduce``, ``device_scan``, ``device_segmented_reduce`` and ``device_select``. + +The Nvidia backend also does not support ``bfloat16`` with ``device_histogram``. diff --git a/docs/index.rst b/docs/index.rst index dbc469a1..8e4f96eb 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -9,13 +9,10 @@ hipCUB documentation =========================== -hipCUB is a thin header-only wrapper library on top of rocPRIM or CUB. It enables developers to port project -using CUB library to the `HIP `_ layer and to run them -on AMD hardware. To learn more, see :ref:`what-is-hipcub` +hipCUB is a thin, header-only wrapper library for `rocPRIM `_ and `CUB `_. It enables developers to port projects +using the CUB library to the `HIP `_ layer and run on AMD hardware. To learn more, see :ref:`what-is-hipcub` -You can access hipCUB code on our `GitHub repository `_. - -The documentation is structured as follows: +The hipCUB repository is located at `https://github.com/ROCm/hipCUB `_. .. grid:: 2 diff --git a/docs/what-is-hipcub.rst b/docs/what-is-hipcub.rst index 7c48d04b..bf8ea3f4 100644 --- a/docs/what-is-hipcub.rst +++ b/docs/what-is-hipcub.rst @@ -9,10 +9,8 @@ What is hipCUB? ***************** -hipCUB is a thin header-only wrapper library on top of rocPRIM or CUB. It enables developers to port project -using CUB library to the `HIP `_ layer and to run them -on AMD hardware. In the `ROCm `_ environment, hipCUB uses -rocPRIM library as the backend, while on CUDA platforms it uses CUB. +hipCUB is a thin, header-only wrapper library for `rocPRIM `_ and `CUB `_. 
It enables developers to port projects +using the CUB library to the `HIP `_ layer and run on AMD hardware. Here are some key points to be noted: From 630ea5ee3d3057987130b56e030605ad6e156358 Mon Sep 17 00:00:00 2001 From: spolifroni-amd Date: Wed, 18 Dec 2024 11:26:08 -0500 Subject: [PATCH 07/10] Update docs/install/hipCUB-prerequisites.rst Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com> --- docs/install/hipCUB-prerequisites.rst | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/docs/install/hipCUB-prerequisites.rst b/docs/install/hipCUB-prerequisites.rst index 077625e7..8310ffd0 100644 --- a/docs/install/hipCUB-prerequisites.rst +++ b/docs/install/hipCUB-prerequisites.rst @@ -27,7 +27,8 @@ On NVIDIA GPUs: The CCCL library is automatically downloaded and built by the CMake script. If libcu++ isn't found on the system, it will be downloaded from the CCCL repository. -On Windows: +On Microsoft Windows: + * Python verion 3.6 or later * Visual Studio 2019 with Clang support From 6d64bf04c8c7acb92b7c06a4de7174f7229590d0 Mon Sep 17 00:00:00 2001 From: spolifroni-amd Date: Wed, 18 Dec 2024 11:26:20 -0500 Subject: [PATCH 08/10] Update docs/install/hipCUB-install-on-Windows.rst Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com> --- docs/install/hipCUB-install-on-Windows.rst | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/docs/install/hipCUB-install-on-Windows.rst b/docs/install/hipCUB-install-on-Windows.rst index 0a9f5c49..76375abe 100644 --- a/docs/install/hipCUB-install-on-Windows.rst +++ b/docs/install/hipCUB-install-on-Windows.rst @@ -6,7 +6,8 @@ Building and installing hipCUB on Windows ******************************************************************** -You can use ``rmake.py`` to build and install hipCUB on Windows. You can also use `CMake <./hipCUB-install-with-cmake.html>`_ if you want more build and installation options. 
+You can use ``rmake.py`` to build and install hipCUB on Microsoft Windows. You can also use `CMake <./hipCUB-install-with-cmake.html>`_ if you want more build and installation options. + ``rmake.py`` is located in the ``hipCUB`` root directory. To build and install hipCUB with ``rmake.py``, run: From ca8b6b1bc19fda1362d7bab9f37c3aeb076207ee Mon Sep 17 00:00:00 2001 From: spolifroni-amd Date: Wed, 18 Dec 2024 11:27:23 -0500 Subject: [PATCH 09/10] added link to cmake --- docs/install/hipCUB-prerequisites.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/install/hipCUB-prerequisites.rst b/docs/install/hipCUB-prerequisites.rst index 8310ffd0..881cef31 100644 --- a/docs/install/hipCUB-prerequisites.rst +++ b/docs/install/hipCUB-prerequisites.rst @@ -8,7 +8,7 @@ hipCUB prerequisites hipCUB has the following prerequisites on all platforms: -* CMake version 3.16 or higher +* `CMake `_ version 3.16 or higher On AMD GPUs: From be6a12b63be21f9eeb42b33ff1418facecde8d8d Mon Sep 17 00:00:00 2001 From: spolifroni-amd Date: Wed, 18 Dec 2024 12:51:07 -0500 Subject: [PATCH 10/10] uppercases. --- docs/api-reference/data-type-support.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/api-reference/data-type-support.rst b/docs/api-reference/data-type-support.rst index ae2761a5..47f01cf9 100644 --- a/docs/api-reference/data-type-support.rst +++ b/docs/api-reference/data-type-support.rst @@ -18,6 +18,6 @@ hipCUB supports the following data types on both ROCm and CUDA: ``float8``, ``bfloat8``, and ``tensorfloat32`` are not supported by hipCUB on neither ROCm nor CUDA. -The Nvidia back end does not support ``float16`` nor ``bfloat16`` with the following API calls: ``block_adjacent_difference``, ``device_adjacent_difference``, ``device_reduce``, ``device_scan``, ``device_segmented_reduce`` and ``device_select``. 
+The NVIDIA backend does not support ``float16`` or ``bfloat16`` with the following API calls: ``block_adjacent_difference``, ``device_adjacent_difference``, ``device_reduce``, ``device_scan``, ``device_segmented_reduce`` and ``device_select``. -The Nvidia backend also does not support ``bfloat16`` with ``device_histogram``. +The NVIDIA backend also does not support ``bfloat16`` with ``device_histogram``.