Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplification of capture mode usages in submit / parallel_for calls #1959

Draft
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

SergeyKopienko
Copy link
Contributor

@SergeyKopienko SergeyKopienko commented Dec 6, 2024

In this PR we simplify our code in the part of capture modes in submit / parallel_for calls:

  • we avoid extra captures by value of some variables.

Justification.

  1. sycl::submit always executed synchronously till the call of parallel_for inside.
    This mean that no reasons to capture by value any variables used only inside submit call but NOT inside parallel_for
  2. sycl::parallel_for function always captures all variables by copy ([=]).
    This mean that inside lambda, passed into sycl::parallel_for call all references and so on will be in real new local copies of source variables.

SYCL™ 2020 Specification (revision 9).

As described at 4.6.5. Queue class, All member functions of the queue class are synchronous and errors are handled by throwing synchronous SYCL exceptions. The submit member function synchronously invokes the provided command group function object (as described in Section 3.7.1.2) in the calling thread, thereby scheduling a command group for asynchronous execution. Any error in the submission of a command group is handled by throwing a synchronous SYCL exception. Any errors from the command group after it has been submitted are handled by passing asynchronous errors at specific times to an async_handler, as described in Section 4.13.

Example:

Thanks to @dmitriy-sobolev for this reference to sycl documentation.

…pture in __parallel_for_submitter::operator()

Signed-off-by: Sergey Kopienko <[email protected]>
…pture in __parallel_partial_sort_submitter::operator()

Signed-off-by: Sergey Kopienko <[email protected]>
…ix capture in __parallel_for_fpga_submitter::operator()

Signed-off-by: Sergey Kopienko <[email protected]>
….h - fix capture in __merge_sort_global_submitter::operator()

Signed-off-by: Sergey Kopienko <[email protected]>
….h - fix capture in __merge_sort_copy_back_submitter::operator()

Signed-off-by: Sergey Kopienko <[email protected]>
… fix capture in __parallel_transform_reduce_small_submitter::operator()

Signed-off-by: Sergey Kopienko <[email protected]>
… fix capture in __parallel_transform_reduce_small_submitter::operator()

Signed-off-by: Sergey Kopienko <[email protected]>
… fix capture in __parallel_transform_reduce_small_submitter::operator()

Signed-off-by: Sergey Kopienko <[email protected]>
… fix capture in __parallel_transform_reduce_impl::submit

Signed-off-by: Sergey Kopienko <[email protected]>
@SergeyKopienko
Copy link
Contributor Author

@danhoeflinger you wrote this capturing:

return __exec.queue().submit([&, this](sycl::handler& __cgh) {

It's technically correct but a little bit tricky.
Could you please fix it too in this PR?

@SergeyKopienko SergeyKopienko marked this pull request as ready for review December 6, 2024 16:17
@mmichel11
Copy link
Contributor

What is the benefit of capturing types with a size less than or equal to the size of a pointer by reference? Throughout oneDPL, we accept small types by value instead of by reference.

If the type is large, then this change makes sense to me.

@SergeyKopienko
Copy link
Contributor Author

SergeyKopienko commented Dec 6, 2024

What is the benefit of capturing types with a size less than or equal to the size of a pointer by reference? Throughout oneDPL, we accept small types by value instead of by reference.

If the type is large, then this change makes sense to me.

Exactly in which place? Inside parallel_for we will work with the copies, I mean.

@mmichel11
Copy link
Contributor

mmichel11 commented Dec 6, 2024

What is the benefit of capturing types with a size less than or equal to the size of a pointer by reference? Throughout oneDPL, we accept small types by value instead of by reference.
If the type is large, then this change makes sense to me.

Exactly in which place? Inside parallel_for we will work with the copies, I mean.

In the outer lambda passed to submit. I assume the capture by reference pulls in the address of the variable. If the size of the type is smaller than the size of the type that holds the address (likely a pointer), then I do not see the benefit. Although I assume the compiler optimizes either case and there is no real performance concern.

@MikeDvorskiy
Copy link
Contributor

What is the benefit of capturing types with a size less than or equal to the size of a pointer by reference? Throughout oneDPL, we accept small types by value instead of by reference.
If the type is large, then this change makes sense to me.

Exactly in which place? Inside parallel_for we will work with the copies, I mean.

In the outer lambda passed to submit. I assume the capture by reference pulls in the address of the variable. If the size of the type is smaller than the size of the type that holds the address (likely a pointer), then I do not see the benefit. Although I assume the compiler optimizes either case and there is no real performance concern.

Yes, I believe in a compiler too.

@SergeyKopienko SergeyKopienko changed the title Fix capture modes in submit calls Simplification of capture mode usages in submit / parallel_for calls Dec 9, 2024
@SergeyKopienko SergeyKopienko marked this pull request as draft December 9, 2024 15:30
@SergeyKopienko SergeyKopienko removed this from the 2022.8.0 milestone Dec 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants