add ThreadLocalCache #1380
Merged
31 commits
aff7f03 add ThreadLocalCache (jcosborn)
8f297f1 fix CI error (jcosborn)
1a3614c add coalescing to ThreadLocalCache (jcosborn)
c2235ee don't return reference for ThreadLocalCache indexing (jcosborn)
f686285 update docs (jcosborn)
b16b9b2 add shared memory helper (jcosborn)
d6ac933 add shared memory helper to HIP (jcosborn)
f29ff00 refactor SharedMemoryCache to use new SharedMemory object (jcosborn)
f2419e1 remove unused parameter (jcosborn)
dd7b975 update thread_array (jcosborn)
3fee9f9 Merge branch 'develop' into feature/sycl-merge (jcosborn)
72cc52b update remaining shared memory uses (jcosborn)
13f6701 fix HIP target (jcosborn)
7df0d93 fix HIP build (jcosborn)
c73d2b4 update shared memory object (jcosborn)
bab726e fix clang build (jcosborn)
b576582 code cleanup (jcosborn)
5889bc4 format (jcosborn)
9f7074f Fix some issues TuneParam::shared_bytes settings with some of the tun… (maddyscientist)
12c5c98 Fix sharedBytesPerThread for kernels that utilize thread_array (maddyscientist)
5ce1230 fix overlapping shared mem (jcosborn)
56e0ee6 Bug fix for VUV/VLV kernels now that dynamic shared memory is used (maddyscientist)
c2d336c Add some shared memory checks when launching kernels (maddyscientist)
a5cb472 Merge branch 'develop' into feature/sycl-merge (jcosborn)
ce5cf6b Merge branch 'develop' into feature/sycl-merge (jcosborn)
2048bd1 fix some shared bytes amounts (jcosborn)
d5e73b6 move some code to address comments (jcosborn)
13075c7 fix typo (jcosborn)
fc3ea42 clang format (jcosborn)
bad098a Apply clang format (maddyscientist)
418824e Merge branch 'develop' into feature/sycl-merge (jcosborn)
@@ -19,6 +19,7 @@ namespace quda
 // matrix+matrix = 18 floating-point ops
 // => Total number of floating point ops per function call
 // dims * (2*18 + 4*198) = dims*828
+using computeStapleOps = thread_array<int, 4>;
 template <typename Arg, typename Staple, typename Int>
 __host__ __device__ inline void computeStaple(const Arg &arg, const int *x, const Int *X, const int parity, const int nu, Staple &staple, const int dir_ignore)
 {

@@ -94,6 +95,7 @@ namespace quda
 // matrix+matrix = 18 floating-point ops
 // => Total number of floating point ops per function call
 // dims * (8*18 + 28*198) = dims*5688
+using computeStapleRectangleOps = thread_array<int, 4>;

Review comment on the line above: Use this on line 107 or delete this line

 template <typename Arg, typename Staple, typename Rectangle, typename Int>
 __host__ __device__ inline void computeStapleRectangle(const Arg &arg, const int *x, const Int *X, const int parity, const int nu,
                                                        Staple &staple, Rectangle &rectangle, const int dir_ignore)
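The flop totals quoted in the two comment blocks of the diff can be checked with a few lines of compile-time arithmetic. This is only a sketch: the diff states that matrix+matrix costs 18 flops, while reading 198 as the cost of a 3x3 complex matrix product is an assumption (the line defining it is outside the visible hunk), and the function names below are hypothetical, not part of QUDA.

```cpp
// Flops for one complex add (2) and one complex multiply (4 mul + 2 add = 6).
constexpr long complex_add_flops = 2;
constexpr long complex_mul_flops = 6;

// 3x3 complex matrix sum: 9 complex adds = 18 flops (stated in the diff).
constexpr long matrix_add_flops = 9 * complex_add_flops;

// 3x3 complex matrix product (assumed meaning of 198):
// 9 entries, each needing 3 complex multiplies and 2 complex adds.
constexpr long matrix_mul_flops = 9 * (3 * complex_mul_flops + 2 * complex_add_flops);

// Hypothetical helpers reproducing the per-call totals from the comments.
constexpr long staple_flops(long dims)
{
  return dims * (2 * matrix_add_flops + 4 * matrix_mul_flops);
}

constexpr long staple_rectangle_flops(long dims)
{
  return dims * (8 * matrix_add_flops + 28 * matrix_mul_flops);
}

static_assert(matrix_add_flops == 18, "matrix+matrix");
static_assert(matrix_mul_flops == 198, "matrix*matrix (assumed)");
static_assert(staple_flops(1) == 828, "dims*828");
static_assert(staple_rectangle_flops(1) == 5688, "dims*5688");
```

If the static_asserts compile, the per-dimension constants 828 and 5688 in the comments agree with the operation counts they list.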
Not a blocker, but if this is being defined here, we should use this type down on line 29
I disagree with your logic here @weinbe2: it's not the case that we're using this type in this kernel function; rather, this user-defined type is set to match the type used in the kernel function. Making line 29 use computeStapleOps would serve to obfuscate the code.
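To make the distinction in this reply concrete, here is a minimal host-only sketch of the pattern being discussed: the function body names the concrete type directly, and the alias next to it mirrors that type so other code can query it. Everything here is an assumption for illustration: the thread_array stand-in is not QUDA's class, and shiftAndSum and bytesPerThread are hypothetical names. That the alias exists so launch code can size per-thread storage is a guess suggested by the commit messages, not something the thread states.

```cpp
#include <cstddef>

// Hypothetical stand-in for QUDA's thread_array; the real class differs.
template <typename T, int N> struct thread_array {
  T data[N] = {};
  T &operator[](int i) { return data[i]; }
  const T &operator[](int i) const { return data[i]; }
};

// Alias declared beside the function; it mirrors the type used inside it.
using computeStapleOps = thread_array<int, 4>;

// Simplified stand-in for the kernel function: it spells out the concrete
// type (the style the reply prefers) rather than the alias.
inline int shiftAndSum(const int *x, int mu)
{
  thread_array<int, 4> dx;
  for (int d = 0; d < 4; d++) dx[d] = x[d];
  dx[mu]++; // step one site in direction mu
  return dx[0] + dx[1] + dx[2] + dx[3];
}

// Hypothetical launch-side helper: storage needed per thread for an Ops type.
template <typename Ops> constexpr std::size_t bytesPerThread()
{
  return sizeof(Ops);
}

static_assert(bytesPerThread<computeStapleOps>() == 4 * sizeof(int),
              "alias must match the array used in the function body");
```

With this layout the function body stays readable on its own, while the alias gives callers a single name to keep in sync if the internal array type ever changes.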