MPI_ALLREDUCE needs to have a different receive buffer than send buffer #404

Closed
ekluzek opened this issue Jun 26, 2023 · 2 comments
@ekluzek
Collaborator

ekluzek commented Jun 26, 2023

I found this while working on izumi, which uses the mvapich2 MPI library for all compilers: intel, gnu, and nag.

In mpi_process.f90, subroutine pass_global_data calls MPI_ALLREDUCE with the same integer variable, maxtdh, as both the send and receive buffer. The send and receive buffers just need to be different (the MPI standard does not allow them to alias unless MPI_IN_PLACE is used).

The cesm.log gives this kind of error:

[0] (shr_orb_params) ------ Computed Orbital Parameters ------
[0] (shr_orb_params) Eccentricity      =   1.670366E-02
[0] (shr_orb_params) Obliquity (deg)   =   2.343977E+01
[0] (shr_orb_params) Obliquity (rad)   =   4.091011E-01
[0] (shr_orb_params) Long of perh(deg) =   1.028955E+02
[0] (shr_orb_params) Long of perh(rad) =   4.937458E+00
[0] (shr_orb_params) Long at v.e.(rad) =  -3.247250E-02
[0] (shr_orb_params) -----------------------------------------
[[email protected]] HYDT_bscd_pbs_wait_for_completion (tools/bootstrap/external/pbs_wait.c:67): tm_poll(obit_event) failed with TM error 17002
[[email protected]] HYDT_bsci_wait_for_completion (tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
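
For reference, a minimal standalone sketch of the call pattern with a receive buffer distinct from the send buffer is below (the program and variable names are illustrative, not taken from mpi_process.f90):

! Minimal sketch: each rank contributes a local integer and every rank
! receives the maximum, with separate send and receive variables.
program allreduce_sketch
  use mpi
  implicit none
  integer :: ierr, myrank, maxrank

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, myrank, ierr)

  ! Send buffer (myrank) and receive buffer (maxrank) are different variables
  call MPI_ALLREDUCE(myrank, maxrank, 1, MPI_INTEGER, MPI_MAX, MPI_COMM_WORLD, ierr)

  write(*,*) 'rank ', myrank, ' sees max rank = ', maxrank
  call MPI_FINALIZE(ierr)
end program allreduce_sketch

Compiled with mpif90 and run under mpirun, every rank should report the same maximum.
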
ekluzek added the bug label Jun 26, 2023
ekluzek self-assigned this Jun 26, 2023
@ekluzek
Collaborator Author

ekluzek commented Jun 26, 2023

This is the fix that is working for me:

index fa4af147..36d0ddac 100644
--- a/route/build/src/mpi_process.f90
+++ b/route/build/src/mpi_process.f90
@@ -2883,6 +2883,8 @@ SUBROUTINE pass_global_data(comm, ierr, message)   ! output: error control
   integer(i4b),                   intent(out) :: ierr
   character(len=strLen),          intent(out) :: message ! error message
 
+  integer(i4b) :: receivemax      ! Receive buffer for MAX over all tasks
+
   ierr=0; message='pass_global_data/'
 
   ! send scalars
@@ -2892,7 +2894,8 @@ SUBROUTINE pass_global_data(comm, ierr, message)   ! output: error control
   call MPI_BCAST(calendar,  strLen,  MPI_CHARACTER,        root, comm, ierr)
   call MPI_BCAST(time_units,strLen,  MPI_CHARACTER,        root, comm, ierr)
 
-  CALL MPI_ALLREDUCE(maxtdh, maxtdh, 1, MPI_INTEGER, MPI_MAX, comm, ierr)
+  CALL MPI_ALLREDUCE(maxtdh, receivemax, 1, MPI_INTEGER, MPI_MAX, comm, ierr)
+  maxtdh = receivemax
 
  END SUBROUTINE pass_global_data
 
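Note that the MPI standard also provides MPI_IN_PLACE for exactly this situation, which would avoid the temporary variable; this is only a sketch of that alternative, not the change applied above:

! Sketch: with MPI_IN_PLACE as the send buffer, maxtdh serves as both input and output
CALL MPI_ALLREDUCE(MPI_IN_PLACE, maxtdh, 1, MPI_INTEGER, MPI_MAX, comm, ierr)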

@ekluzek
Collaborator Author

ekluzek commented Jul 19, 2023

This was done in #391

ekluzek closed this as completed Jul 19, 2023