From 983519493e77be104544c4a4586c7c93a7db176e Mon Sep 17 00:00:00 2001
From: Istvan Kiss
Date: Sun, 15 Dec 2024 18:53:40 +0100
Subject: [PATCH] Apply suggestions from code review

Co-authored-by: randyh62 <42045079+randyh62@users.noreply.github.com>
---
 docs/how-to/hip_runtime_api/asynchronous.rst | 33 ++++++++++----------
 1 file changed, 16 insertions(+), 17 deletions(-)

diff --git a/docs/how-to/hip_runtime_api/asynchronous.rst b/docs/how-to/hip_runtime_api/asynchronous.rst
index f7b1fae5ba..cf427431d4 100644
--- a/docs/how-to/hip_runtime_api/asynchronous.rst
+++ b/docs/how-to/hip_runtime_api/asynchronous.rst
@@ -8,10 +8,10 @@ Asynchronous concurrent execution
*******************************************************************************

-Asynchronous concurrent execution important for efficient parallelism and
+Asynchronous concurrent execution is important for efficient parallelism and
resource utilization, with techniques such as overlapping computation and data
transfer, managing concurrent kernel execution with streams on single or
-multiple devices or using HIP graphs.
+multiple devices, or using HIP graphs.

Streams and concurrent execution
===============================================================================
@@ -20,10 +20,10 @@ All asynchronous APIs, such as kernel execution, data movement and
potentially data allocation/freeing all happen in the context of device
streams.

Streams are FIFO buffers of commands to execute in order on a given device.
-Commands which enqueue tasks on a stream all return promptly and the command is
-executed asynchronously. Multiple streams may point to the same device and
-those streams may be fed from multiple concurrent host-side threads. Execution
-on multiple streams may be concurrent but isn't required to be.
+Commands which enqueue tasks on a stream all return promptly and the task is
+executed asynchronously. Multiple streams can point to the same device and
+those streams might be fed from multiple concurrent host-side threads. Execution
+on multiple streams might be concurrent but isn't required to be.

Managing streams
-------------------------------------------------------------------------------
@@ -31,11 +31,11 @@ Managing streams

Streams enable the overlap of computation and data transfer, ensuring
continuous GPU activity.

-To create a stream, the following functions are used, each returning a handle
+To create a stream, the following functions are used, each defining a handle
to the newly created stream:

- :cpp:func:`hipStreamCreate`: Creates a stream with default settings.
-- :cpp:func:`hipStreamCreateWithFlags`: Allows creating a stream, with specific
+- :cpp:func:`hipStreamCreateWithFlags`: Creates a stream, with specific
  flags, listed below, enabling more control over stream behavior:

  - ``hipStreamDefault``: creates a default stream suitable for most
@@ -45,7 +45,7 @@ to the newly created stream:
    simultaneously without waiting for each other to complete, thus improving
    overall performance.

-- :cpp:func:`hipStreamCreateWithPriority``: Allows creating a stream with a
+- :cpp:func:`hipStreamCreateWithPriority`: Creates a stream with a
  specified priority, enabling prioritization of certain tasks.

The :cpp:func:`hipStreamSynchronize` function is used to block the calling host
@@ -80,17 +80,17 @@ achieved using :cpp:func:`hipStreamWaitEvent`, which allows a kernel to wait
for a specific event before starting execution.

Independent kernels can only run concurrently, if there are enough registers
-and share memories for the kernels. To reach concurrent kernel executions, the
+and shared memory for the kernels. To enable concurrent kernel execution, the
developer may have to reduce the block size of the kernels. The kernel runtimes
-can be misleading at concurrent kernel runs, that's why during optimization
-it's better to check the trace files, to see if a kernel is blocking another
-kernel, while they are running parallel.
+can be misleading for concurrent kernel runs, which is why during optimization
+it is a good practice to check the trace files to see if one kernel is blocking another
+kernel while they are running in parallel.

When running kernels in parallel, the execution time can increase due to
contention for shared resources. This is because multiple kernels may attempt
to access the same GPU resources simultaneously, leading to delays.

-Asynchronous kernel execution is beneficial only under specific conditions It
+Asynchronous kernel execution is beneficial only under specific conditions. It
is most effective when the kernels do not fully utilize the GPU's resources.
In such cases, overlapping kernel execution can improve overall throughput and
efficiency by keeping the GPU busy without exceeding its capacity.
@@ -170,7 +170,7 @@ are used when immediate completion is required. When a synchronous function is
called, control is not returned to the host thread before the device has
completed the requested task. The behavior of the host thread—whether to
yield, block, or spin—can be specified using :cpp:func:`hipSetDeviceFlags` with
-specific flags. Understanding when to use synchronous calls is important for
+appropriate flags. Understanding when to use synchronous calls is important for
managing execution flow and avoiding data races.

Events for synchronization
@@ -179,7 +179,7 @@ Events for synchronization
By creating an event with :cpp:func:`hipEventCreate` and recording it with
:cpp:func:`hipEventRecord`, developers can synchronize operations across
streams, ensuring correct task execution order. :cpp:func:`hipEventSynchronize`
-allows waiting for an event to complete before proceeding with the next
+lets the application wait for an event to complete before proceeding with the next
operation.

Programmatic dependent launch and synchronization
@@ -483,5 +483,4 @@ abstraction for managing dependencies and synchronization. By representing
sequences of kernels and memory operations as a single graph, they simplify
complex workflows and enhance performance, particularly for applications with
intricate dependencies and multiple execution stages.
-
For more details, see the :ref:`how_to_HIP_graph` documentation.
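To show how the stream and event APIs touched by this patch fit together, the following sketch creates one non-blocking stream and one prioritized stream, enqueues asynchronous work on both, and uses an event together with ``hipStreamWaitEvent`` to order them across streams. This is a minimal illustration under stated assumptions, not text from the patched page: the kernel ``scaleKernel``, the stream names, the buffer size, and the launch configuration are placeholders, and error checking of the ``hipError_t`` return values is omitted for brevity.

.. code-block:: cpp

   #include <hip/hip_runtime.h>

   // Placeholder kernel standing in for any independent device work.
   __global__ void scaleKernel(float* data, size_t n)
   {
       size_t i = blockIdx.x * blockDim.x + threadIdx.x;
       if (i < n)
           data[i] *= 2.0f;
   }

   int main()
   {
       constexpr size_t elementCount = 1 << 20; // placeholder problem size
       constexpr size_t sizeBytes    = elementCount * sizeof(float);

       // Pinned host memory so the asynchronous copy can overlap other work.
       float* hostData = nullptr;
       hipHostMalloc(reinterpret_cast<void**>(&hostData), sizeBytes, hipHostMallocDefault);
       for (size_t i = 0; i < elementCount; ++i)
           hostData[i] = 1.0f;

       float* deviceData = nullptr;
       hipMalloc(reinterpret_cast<void**>(&deviceData), sizeBytes);

       // One non-blocking stream and one stream with elevated priority.
       hipStream_t copyStream, computeStream;
       hipStreamCreateWithFlags(&copyStream, hipStreamNonBlocking);

       int leastPriority = 0, greatestPriority = 0;
       hipDeviceGetStreamPriorityRange(&leastPriority, &greatestPriority);
       hipStreamCreateWithPriority(&computeStream, hipStreamDefault, greatestPriority);

       // Enqueue a copy and a kernel on the copy stream; both calls return
       // promptly and the work executes asynchronously to the host.
       hipMemcpyAsync(deviceData, hostData, sizeBytes, hipMemcpyHostToDevice, copyStream);
       scaleKernel<<<dim3(elementCount / 256), dim3(256), 0, copyStream>>>(deviceData, elementCount);

       // Record an event after that work and make the compute stream wait on
       // it, so the second kernel starts only once the first has finished.
       hipEvent_t ready;
       hipEventCreate(&ready);
       hipEventRecord(ready, copyStream);
       hipStreamWaitEvent(computeStream, ready, 0);
       scaleKernel<<<dim3(elementCount / 256), dim3(256), 0, computeStream>>>(deviceData, elementCount);

       // Copy the result back and block the host only at the very end.
       hipMemcpyAsync(hostData, deviceData, sizeBytes, hipMemcpyDeviceToHost, computeStream);
       hipStreamSynchronize(computeStream);

       hipEventDestroy(ready);
       hipStreamDestroy(copyStream);
       hipStreamDestroy(computeStream);
       hipFree(deviceData);
       hipHostFree(hostData);
       return 0;
   }

Recording the event on the first stream and waiting on it from the second creates a cross-stream dependency without blocking the host thread; only the final ``hipStreamSynchronize`` call blocks the host, which matches the pattern of deferring synchronization that the patched text describes.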