Performance improvement by using private declaration for gfortran #2850

Open
hiker opened this issue Jan 15, 2025 · 0 comments
During our training, one of my team members added a private declaration for all module-inlined kernels and got a significant speedup. I verified this, so it is now confirmed for gfortran 8.4, 11.4, and 14-something. Timings (of my game-of-life test in the training):

No inlining (-O3):                                      38.933 seconds
(module) inlining:                                      27.603 seconds
(module) inlining + private declaration of all kernels: 10.159 seconds

The assembly output indicates that, without the private declaration, only two of the four kernels are inlined. Without private:

~/work/psyclone/tutorial/training/gocean/2.6-GameOfLife-fuse/solution$ gfortran -S  -c -O3 -I/home/joerg/work/psyclone/tutorial/training/gocean/gol-lib -I/home/joerg/work/psyclone/external/dl_esm_inf/finite_difference/src time_step_alg_mod_psy.f90
~/work/psyclone/tutorial/training/gocean/2.6-GameOfLife-fuse/solution$ grep call time_step_alg_mod_psy.s 
	call	__psy_time_step_alg_mod_MOD_compute_die_code
	call	__psy_time_step_alg_mod_MOD_count_neighbours_code
~/work/psyclone/tutorial/training/gocean/2.6-GameOfLife-fuse/solution$ 

So two calls remain. After adding the private declaration (details below), none are left:

~/work/psyclone/tutorial/training/gocean/2.6-GameOfLife-fuse/solution$ gfortran -S  -c -O3 -I/home/joerg/work/psyclone/tutorial/training/gocean/gol-lib -I/home/joerg/work/psyclone/external/dl_esm_inf/finite_difference/src time_step_alg_mod_psy.f90
~/work/psyclone/tutorial/training/gocean/2.6-GameOfLife-fuse/solution$ grep call time_step_alg_mod_psy.s 
~/work/psyclone/tutorial/training/gocean/2.6-GameOfLife-fuse/solution$ 

Test case: https://github.com/stfc/PSyclone/tree/1623_add_training/tutorial/training/gocean/2.6-GameOfLife-fuse/solution
(just modify the Makefile to use inline.py instead of fuse_loops.py).
Then manually add:

    private:: count_neighbours_code, compute_born_code, compute_die_code, combine_code

With loop fusion as well, the gap is even larger: fusing the first three loops (the fourth cannot be fused) together with inlining results in a runtime of 30 seconds. Adding the private declaration above brings the runtime down to 7.5 seconds.
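
To illustrate where the private declaration goes, here is a minimal sketch of a module with one module-inlined kernel. The module name `psy_demo_mod`, the routine `invoke_count`, and the argument lists are hypothetical and are not the code PSyclone generates; only `count_neighbours_code` is taken from the test case. A plausible explanation for the speedup (not verified in this issue) is that a private module procedure cannot be called from outside the file, so gfortran is free to inline it at its only call site and drop the out-of-line copy.

```fortran
module psy_demo_mod
  implicit none
  integer, parameter :: dp = kind(1.0d0)

  ! The kernel is only called from within this module. Declaring it
  ! private tells the compiler that no other file can reference it.
  private :: count_neighbours_code
  public :: invoke_count   ! hypothetical invoke routine

contains

  subroutine invoke_count(neighbours, current, n)
    integer, intent(in) :: n
    real(kind=dp), intent(inout) :: neighbours(n, n)
    real(kind=dp), intent(in) :: current(0:n+1, 0:n+1)
    integer :: i, j

    do j = 1, n
      do i = 1, n
        ! With the private declaration this call is a candidate for inlining.
        call count_neighbours_code(i, j, neighbours, current, n)
      end do
    end do
  end subroutine invoke_count

  ! Module-inlined kernel: sums the eight cells surrounding (i, j).
  subroutine count_neighbours_code(i, j, neighbours, current, n)
    integer, intent(in) :: i, j, n
    real(kind=dp), intent(inout) :: neighbours(n, n)
    real(kind=dp), intent(in) :: current(0:n+1, 0:n+1)

    neighbours(i, j) = current(i-1, j-1) + current(i, j-1) + current(i+1, j-1) &
                     + current(i-1, j  )                   + current(i+1, j  ) &
                     + current(i-1, j+1) + current(i, j+1) + current(i+1, j+1)
  end subroutine count_neighbours_code

end module psy_demo_mod
```

Compiling such a file with `gfortran -S -c -O3` and grepping the resulting `.s` file for `call`, as done above, is one way to check whether the kernel call was actually inlined.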
