From 983519493e77be104544c4a4586c7c93a7db176e Mon Sep 17 00:00:00 2001
From: Istvan Kiss
Date: Sun, 15 Dec 2024 18:53:40 +0100
Subject: [PATCH] Apply suggestions from code review

Co-authored-by: randyh62 <42045079+randyh62@users.noreply.github.com>
---
 docs/how-to/hip_runtime_api/asynchronous.rst | 33 ++++++++++----------
 1 file changed, 16 insertions(+), 17 deletions(-)

diff --git a/docs/how-to/hip_runtime_api/asynchronous.rst b/docs/how-to/hip_runtime_api/asynchronous.rst
index f7b1fae5ba..cf427431d4 100644
--- a/docs/how-to/hip_runtime_api/asynchronous.rst
+++ b/docs/how-to/hip_runtime_api/asynchronous.rst
@@ -8,10 +8,10 @@ Asynchronous concurrent execution
*******************************************************************************

-Asynchronous concurrent execution important for efficient parallelism and
+Asynchronous concurrent execution is important for efficient parallelism and
resource utilization, with techniques such as overlapping computation and data
transfer, managing concurrent kernel execution with streams on single or
-multiple devices or using HIP graphs.
+multiple devices, or using HIP graphs.

Streams and concurrent execution
===============================================================================
@@ -20,10 +20,10 @@ All asynchronous APIs, such as kernel execution, data movement and
potentially data allocation/freeing all happen in the context of device
streams.

Streams are FIFO buffers of commands to execute in order on a given device.
-Commands which enqueue tasks on a stream all return promptly and the command is
-executed asynchronously. Multiple streams may point to the same device and
-those streams may be fed from multiple concurrent host-side threads. Execution
-on multiple streams may be concurrent but isn't required to be.
+Commands which enqueue tasks on a stream all return promptly and the task is
+executed asynchronously. Multiple streams can point to the same device and
+those streams might be fed from multiple concurrent host-side threads. Execution
+on multiple streams might be concurrent but isn't required to be.

Managing streams
-------------------------------------------------------------------------------
@@ -31,11 +31,11 @@ Managing streams

Streams enable the overlap of computation and data transfer, ensuring
continuous GPU activity.

-To create a stream, the following functions are used, each returning a handle
+To create a stream, the following functions are used, each defining a handle
to the newly created stream:

- :cpp:func:`hipStreamCreate`: Creates a stream with default settings.
-- :cpp:func:`hipStreamCreateWithFlags`: Allows creating a stream, with specific
+- :cpp:func:`hipStreamCreateWithFlags`: Creates a stream, with specific
  flags, listed below, enabling more control over stream behavior:

  - ``hipStreamDefault``: creates a default stream suitable for most
@@ -45,7 +45,7 @@ to the newly created stream:
    simultaneously without waiting for each other to complete, thus improving
    overall performance.

-- :cpp:func:`hipStreamCreateWithPriority``: Allows creating a stream with a
+- :cpp:func:`hipStreamCreateWithPriority`: Creates a stream with a
  specified priority, enabling prioritization of certain tasks.

The :cpp:func:`hipStreamSynchronize` function is used to block the calling host
@@ -80,17 +80,17 @@ achieved using :cpp:func:`hipStreamWaitEvent`, which allows a kernel to wait
for a specific event before starting execution.

Independent kernels can only run concurrently, if there are enough registers
-and share memories for the kernels. To reach concurrent kernel executions, the
+and shared memory for the kernels. To enable concurrent kernel execution, the
developer may have to reduce the block size of the kernels. The kernel runtimes
-can be misleading at concurrent kernel runs, that's why during optimization
-it's better to check the trace files, to see if a kernel is blocking another
-kernel, while they are running parallel.
+can be misleading for concurrent kernel runs, which is why during optimization
+it is a good practice to check the trace files to see if one kernel is blocking another
+kernel while they are running in parallel.

When running kernels in parallel, the execution time can increase due to
contention for shared resources. This is because multiple kernels may attempt
to access the same GPU resources simultaneously, leading to delays.

-Asynchronous kernel execution is beneficial only under specific conditions It
+Asynchronous kernel execution is beneficial only under specific conditions. It
is most effective when the kernels do not fully utilize the GPU's resources.
In such cases, overlapping kernel execution can improve overall throughput and
efficiency by keeping the GPU busy without exceeding its capacity.
@@ -170,7 +170,7 @@ are used when immediate completion is required. When a synchronous function is
called, control is not returned to the host thread before the device has
completed the requested task. The behavior of the host thread—whether to
yield, block, or spin—can be specified using :cpp:func:`hipSetDeviceFlags` with
-specific flags. Understanding when to use synchronous calls is important for
+appropriate flags. Understanding when to use synchronous calls is important for
managing execution flow and avoiding data races.

Events for synchronization
@@ -179,7 +179,7 @@ Events for synchronization
By creating an event with :cpp:func:`hipEventCreate` and recording it with
:cpp:func:`hipEventRecord`, developers can synchronize operations across
streams, ensuring correct task execution order. :cpp:func:`hipEventSynchronize`
-allows waiting for an event to complete before proceeding with the next
+lets the application wait for an event to complete before proceeding with the next
operation.

Programmatic dependent launch and synchronization
@@ -483,5 +483,4 @@ abstraction for managing dependencies and synchronization. By representing
sequences of kernels and memory operations as a single graph, they simplify
complex workflows and enhance performance, particularly for applications with
intricate dependencies and multiple execution stages.
-
For more details, see the :ref:`how_to_HIP_graph` documentation.
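To show how the stream and event APIs touched by this patch fit together, the following sketch creates one non-blocking stream and one prioritized stream, enqueues asynchronous work on both, and uses an event together with ``hipStreamWaitEvent`` to order them across streams. This is a minimal illustration under stated assumptions, not text from the patched page: the kernel ``scaleKernel``, the stream names, the buffer size, and the launch configuration are placeholders, and error checking of the ``hipError_t`` return values is omitted for brevity.

.. code-block:: cpp

   #include <hip/hip_runtime.h>

   // Placeholder kernel standing in for any independent device work.
   __global__ void scaleKernel(float* data, size_t n)
   {
       size_t i = blockIdx.x * blockDim.x + threadIdx.x;
       if (i < n)
           data[i] *= 2.0f;
   }

   int main()
   {
       constexpr size_t elementCount = 1 << 20; // placeholder problem size
       constexpr size_t sizeBytes    = elementCount * sizeof(float);

       // Pinned host memory so the asynchronous copy can overlap other work.
       float* hostData = nullptr;
       hipHostMalloc(reinterpret_cast<void**>(&hostData), sizeBytes, hipHostMallocDefault);
       for (size_t i = 0; i < elementCount; ++i)
           hostData[i] = 1.0f;

       float* deviceData = nullptr;
       hipMalloc(reinterpret_cast<void**>(&deviceData), sizeBytes);

       // One non-blocking stream and one stream with elevated priority.
       hipStream_t copyStream, computeStream;
       hipStreamCreateWithFlags(&copyStream, hipStreamNonBlocking);

       int leastPriority = 0, greatestPriority = 0;
       hipDeviceGetStreamPriorityRange(&leastPriority, &greatestPriority);
       hipStreamCreateWithPriority(&computeStream, hipStreamDefault, greatestPriority);

       // Enqueue a copy and a kernel on the copy stream; both calls return
       // promptly and the work executes asynchronously to the host.
       hipMemcpyAsync(deviceData, hostData, sizeBytes, hipMemcpyHostToDevice, copyStream);
       scaleKernel<<<dim3(elementCount / 256), dim3(256), 0, copyStream>>>(deviceData, elementCount);

       // Record an event after that work and make the compute stream wait on
       // it, so the second kernel starts only once the first has finished.
       hipEvent_t ready;
       hipEventCreate(&ready);
       hipEventRecord(ready, copyStream);
       hipStreamWaitEvent(computeStream, ready, 0);
       scaleKernel<<<dim3(elementCount / 256), dim3(256), 0, computeStream>>>(deviceData, elementCount);

       // Copy the result back and block the host only at the very end.
       hipMemcpyAsync(hostData, deviceData, sizeBytes, hipMemcpyDeviceToHost, computeStream);
       hipStreamSynchronize(computeStream);

       hipEventDestroy(ready);
       hipStreamDestroy(copyStream);
       hipStreamDestroy(computeStream);
       hipFree(deviceData);
       hipHostFree(hostData);
       return 0;
   }

Recording the event on the first stream and waiting on it from the second creates a cross-stream dependency without blocking the host thread; only the final ``hipStreamSynchronize`` call blocks the host, which matches the pattern of deferring synchronization that the patched text describes.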