During our training, one of my team members added a private declaration for all module-inlined kernels and got a significant speedup. I verified this, so it is now confirmed for gfortran 8.4, 11.4, and 14-something. Timings (from my game-of-life test in the training):
No inlining (-O3): 38.933 seconds
(module) inlining: 27.603 seconds
(module) inlining + private declaration of all kernels: 10.159 seconds
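To put the three timings in relative terms, the speedup factors can be worked out as in the short sketch below. It only does arithmetic on the numbers quoted above; the variable names are illustrative and not part of the test case.

```python
# Runtimes in seconds, copied from the timings listed above.
timings = {
    "no inlining (-O3)": 38.933,
    "module inlining": 27.603,
    "module inlining + private kernels": 10.159,
}

baseline = timings["no inlining (-O3)"]

# Speedup of each variant relative to the non-inlined build.
speedups = {label: baseline / seconds for label, seconds in timings.items()}

# Additional gain from the private declarations on top of module inlining.
private_gain = (
    timings["module inlining"] / timings["module inlining + private kernels"]
)

for label, factor in speedups.items():
    print(f"{label}: {factor:.2f}x vs. no inlining")
print(f"private on top of inlining: {private_gain:.2f}x")
```

So module inlining alone gives roughly a 1.4x speedup, while adding the private declarations gives roughly 3.8x over the non-inlined build, i.e. about 2.7x on top of inlining.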
The assembly output indicates that, without the private declaration, only two of the four kernels are inlined. Without private:
So there are still two calls left. Adding the private declaration (details below):
Test case: https://github.com/stfc/PSyclone/tree/1623_add_training/tutorial/training/gocean/2.6-GameOfLife-fuse/solution
(just modify the Makefile to use inline.py instead of fuse_loops.py).
Then manually add:
When loop fusion is also used, it gets even worse: fusing the first three loops (the fourth one cannot be fused) together with inlining results in a runtime of 30 seconds. Adding the private declarations described above brings the runtime down to 7.5 seconds.