
Add asynchronous concurrent execution #3687

Open · wants to merge 11 commits into docs/develop from async-doc

Conversation

matyas-streamhpc

No description provided.

@matyas-streamhpc matyas-streamhpc self-assigned this Nov 25, 2024
@neon60 neon60 force-pushed the async-doc branch 2 times, most recently from 1484d67 to f81588d Compare December 2, 2024 08:46
@neon60 neon60 marked this pull request as ready for review December 2, 2024 08:53
@neon60 neon60 force-pushed the async-doc branch 4 times, most recently from fd5af51 to 6a139c6 Compare December 6, 2024 18:18

@randyh62 randyh62 left a comment


Left comments. Looks good overall.

docs/how-to/hip_runtime_api/asynchronous.rst: ten resolved review threads (nine marked outdated)

Concurrent execution between the host (CPU) and device (GPU) allows the CPU to
perform other tasks while the GPU is executing kernels. Kernels can be launched
asynchronously using ``hipLaunchKernelDefault`` with a stream, enabling the CPU
Contributor

What is hipLaunchKernelDefault? Where is it defined?

Contributor

Fixed this sentence.

and shared memory for the kernels. To enable concurrent kernel executions, the
developer may have to reduce the block size of the kernels. The kernel runtimes
can be misleading for concurrent kernel runs, that is why during optimization
it is a good practice to check the trace files, to see if one kernel is blocking another
Contributor

Can we clarify what we mean by tracing here?
Users might confuse it with ltrace, etc. Also, can we point the user to the document that helps them trace, so that they do not have to search for it?

Contributor

I am linking in the rocprof documentation.

utilization and improved performance.

Asynchronous execution is particularly advantageous in iterative processes. For
instance, if an iteration calculation is initiated, it can be efficient to
Contributor

Not sure what you mean by

iteration calculation is initiated

Can we provide an example here?

Contributor

Added example link, plus rephrased.

<< status << ": " \
<< hipGetErrorString(status) \
<< " at " << __FILE__ << ":" \
<< __LINE__ << std::endl; \
Contributor

Maybe add an exit or std::abort here if the check fails.

Contributor

I would avoid using exit. We mention this on the error handling page:
https://rocm.docs.amd.com/projects/HIP/en/docs-develop/how-to/hip_runtime_api/error_handling.html#hip-check-macros

constexpr int numOfBlocks = 256;
constexpr int threadsPerBlock = 4096;
constexpr int numberOfIterations = 50;
size_t arraySize = 1U << 20;
Contributor

might as well make this constexpr too

Contributor

Made the requested changes.

}

// Wait for all operations to complete
HIP_CHECK(hipDeviceSynchronize());
Contributor

Can we do some sort of validation here? All we did was check that the kernels executed and that the user did not get an error.

This should help the user gain confidence that the output is the same regardless of sync or async.

Contributor

Made the requested changes.

<< status << ": " \
<< hipGetErrorString(status) \
<< " at " << __FILE__ << ":" \
<< __LINE__ << std::endl; \
Contributor

Same comments as sync variant

Contributor

Made the requested changes.

<< status << ": " \
<< hipGetErrorString(status) \
<< " at " << __FILE__ << ":" \
<< __LINE__ << std::endl; \
Contributor

Same comments as sync variant

Contributor

Made the requested changes.

@neon60 neon60 requested a review from cjatin January 10, 2025 11:44

@AidanBeltonS AidanBeltonS left a comment


Overall LGTM, some minor comments

Comment on lines +25 to +26
those streams might be fed from multiple concurrent host-side threads. Execution
on multiple streams might be concurrent but isn't required to be.


A bit vague for quite an important detail

Suggested change
those streams might be fed from multiple concurrent host-side threads. Execution
on multiple streams might be concurrent but isn't required to be.
those streams might be fed from multiple concurrent host-side threads. Multiple streams
tied to the same device are not guaranteed to execute their commands in order.

Comment on lines +31 to +32
Streams enable the overlap of computation and data transfer, ensuring
continuous GPU activity.


[NIT] Seems vague and out of place.

docs/how-to/hip_runtime_api/asynchronous.rst: two more resolved review threads (outdated)
Comment on lines +127 to +128
Asynchronous memory operations allow data to be transferred between the host
and device while kernels are being executed on the GPU. Using operations like

@AidanBeltonS AidanBeltonS Jan 10, 2025


I think this sentence is a bit misleading. A reader could miss the fact that these operations must be on different streams to get this behavior. Suggested wording:

Asynchronous memory operations on multiple streams allow data to be transferred between the host and device while kernels are executed (and do not block the host while copying this data).

Comment on lines +102 to +104
One of the primary benefits of asynchronous operations is the ability to
overlap data transfer with kernel execution, leading to better resource
utilization and improved performance.


[NIT] You could clarify that multiple streams are needed to copy while executing a kernel in parallel.

another. This technique is especially useful in applications with large data
sets that need to be processed quickly.

Concurrent data transfers


How does this differ from the Asynchronous memory operations section?
It feels repetitive, and it is not clear how you wish to distinguish between concurrent and asynchronous in this context.

5 participants