Commit

Apply suggestions from code review
Co-authored-by: randyh62 <[email protected]>
2 people authored and matyas-streamhpc committed Dec 16, 2024
1 parent 46e45bc commit 9835194
Showing 1 changed file with 16 additions and 17 deletions.
33 changes: 16 additions & 17 deletions docs/how-to/hip_runtime_api/asynchronous.rst
@@ -8,10 +8,10 @@
Asynchronous concurrent execution
*******************************************************************************

-Asynchronous concurrent execution important for efficient parallelism and
+Asynchronous concurrent execution is important for efficient parallelism and
resource utilization, with techniques such as overlapping computation and data
transfer, managing concurrent kernel execution with streams on single or
-multiple devices or using HIP graphs.
+multiple devices, or using HIP graphs.

Streams and concurrent execution
===============================================================================
@@ -20,22 +20,22 @@ All asynchronous APIs, such as kernel execution, data movement and potentially
data allocation/freeing all happen in the context of device streams.

Streams are FIFO buffers of commands to execute in order on a given device.
-Commands which enqueue tasks on a stream all return promptly and the command is
-executed asynchronously. Multiple streams may point to the same device and
-those streams may be fed from multiple concurrent host-side threads. Execution
-on multiple streams may be concurrent but isn't required to be.
+Commands which enqueue tasks on a stream all return promptly and the task is
+executed asynchronously. Multiple streams can point to the same device and
+those streams might be fed from multiple concurrent host-side threads. Execution
+on multiple streams might be concurrent but isn't required to be.

Managing streams
-------------------------------------------------------------------------------

Streams enable the overlap of computation and data transfer, ensuring
continuous GPU activity.

-To create a stream, the following functions are used, each returning a handle
+To create a stream, the following functions are used, each defining a handle
to the newly created stream:

- :cpp:func:`hipStreamCreate`: Creates a stream with default settings.
-- :cpp:func:`hipStreamCreateWithFlags`: Allows creating a stream, with specific
+- :cpp:func:`hipStreamCreateWithFlags`: Creates a stream, with specific
flags, listed below, enabling more control over stream behavior:

- ``hipStreamDefault``: creates a default stream suitable for most
@@ -45,7 +45,7 @@ to the newly created stream:
simultaneously without waiting for each other to complete, thus improving
overall performance.

-- :cpp:func:`hipStreamCreateWithPriority``: Allows creating a stream with a
+- :cpp:func:`hipStreamCreateWithPriority`: Allows creating a stream with a
specified priority, enabling prioritization of certain tasks.

The :cpp:func:`hipStreamSynchronize` function is used to block the calling host
@@ -80,17 +80,17 @@ achieved using :cpp:func:`hipStreamWaitEvent`, which allows a kernel to wait
for a specific event before starting execution.

Independent kernels can only run concurrently, if there are enough registers
-and share memories for the kernels. To reach concurrent kernel executions, the
+and shared memory for the kernels. To enable concurrent kernel executions, the
developer may have to reduce the block size of the kernels. The kernel runtimes
-can be misleading at concurrent kernel runs, that's why during optimization
-it's better to check the trace files, to see if a kernel is blocking another
-kernel, while they are running parallel.
+can be misleading for concurrent kernel runs, that is why during optimization
+it is a good practice to check the trace files, to see if one kernel is blocking another
+kernel, while they are running in parallel.

When running kernels in parallel, the execution time can increase due to
contention for shared resources. This is because multiple kernels may attempt
to access the same GPU resources simultaneously, leading to delays.

-Asynchronous kernel execution is beneficial only under specific conditions It
+Asynchronous kernel execution is beneficial only under specific conditions. It
is most effective when the kernels do not fully utilize the GPU's resources. In
such cases, overlapping kernel execution can improve overall throughput and
efficiency by keeping the GPU busy without exceeding its capacity.
@@ -170,7 +170,7 @@ are used when immediate completion is required. When a synchronous function is
called, control is not returned to the host thread before the device has
completed the requested task. The behavior of the host thread—whether to yield,
block, or spin—can be specified using :cpp:func:`hipSetDeviceFlags` with
-specific flags. Understanding when to use synchronous calls is important for
+appropriate flags. Understanding when to use synchronous calls is important for
managing execution flow and avoiding data races.

Events for synchronization
@@ -179,7 +179,7 @@ Events for synchronization
By creating an event with :cpp:func:`hipEventCreate` and recording it with
:cpp:func:`hipEventRecord`, developers can synchronize operations across
streams, ensuring correct task execution order. :cpp:func:`hipEventSynchronize`
-allows waiting for an event to complete before proceeding with the next
+lets the application wait for an event to complete before proceeding with the next
operation.

Programmatic dependent launch and synchronization
@@ -483,5 +483,4 @@ abstraction for managing dependencies and synchronization. By representing
sequences of kernels and memory operations as a single graph, they simplify
complex workflows and enhance performance, particularly for applications with
intricate dependencies and multiple execution stages.

For more details, see the :ref:`how_to_HIP_graph` documentation.
