64 simple mpi support #65

Merged: 21 commits, Feb 12, 2024

Changes from 7 commits

Commits
2d06968
#64 Allow not using a t-mask with non-periodic boundary conditions.
hiker Feb 14, 2022
4500abd
#64 Added support for gathering a distributed field into a Fortran ar…
hiker Feb 24, 2022
28a5647
#64 Updated documentation.
hiker Feb 24, 2022
319129f
#64 Moved MPI into parallel_utils_mod, so that field is independent o…
hiker Feb 25, 2022
e2af613
#64 Reduce buffer size to optimise communication, added more comments.
hiker Feb 25, 2022
0f77d94
#1623 Ignore build directory.
hiker Aug 21, 2023
483a0a3
#64 Use only the .o files for the required library version (not all .…
hiker Aug 23, 2023
129beaa
#64 Added TODO back in.
hiker Jan 7, 2024
0bd0905
#64 Checked allocate return status.
hiker Jan 7, 2024
35f556f
#64 Added tests for new reduction and initialisation functionality.
hiker Jan 8, 2024
33f1abd
#64 Added tests for new reduction and initialisation functionality.
hiker Jan 8, 2024
18b3ae4
Merge branch '64_simple_mpi_support' of github.com:stfc/dl_esm_inf in…
hiker Jan 8, 2024
86c442a
#64 Improved documentation.
hiker Jan 8, 2024
b6a727f
#64 omp-parallelise initialisation loop.
hiker Jan 29, 2024
be87b09
#64 Updated comments.
hiker Jan 29, 2024
f3ce7f6
#64 Added license to grid_mod, and updated years.
hiker Jan 29, 2024
dd9eab1
#64 Abort early if PBC and no tmask is specified. Create dummy tmask …
hiker Jan 31, 2024
a7d7479
#64 Allow periodic boundary condition in non-distributed memory setti…
hiker Jan 31, 2024
79bc758
#64 Updated comments.
hiker Jan 31, 2024
685f19f
#64 Updated documentation and comments.
hiker Jan 31, 2024
18fed02
#64 Updated documentation conf.py file to remove warnings.
hiker Jan 31, 2024
1 change: 1 addition & 0 deletions doc/.gitignore
@@ -0,0 +1 @@
_build
17 changes: 10 additions & 7 deletions doc/api.rst
@@ -49,7 +49,7 @@ Three types of boundary condition are currently supported:
Name Description
=============== =========================================
GO_BC_NONE No boundary conditions are applied.
GO_BC_EXTERNAL Some external forcing is applied. This must be implemented by a kernel. The domain must be defined with a T-point mask (see :ref:`gocean1.0-grid-init`).
GO_BC_EXTERNAL Some external forcing is applied. This must be implemented by a kernel.
GO_BC_PERIODIC Periodic boundary conditions are applied.
=============== =========================================

@@ -98,11 +98,8 @@ object. This is done via a call to the ``grid_init`` subroutine::
!! wet (1), dry (0) or external (-1).
integer, dimension(m,n), intent(in), optional :: tmask

If no T-mask is supplied then this routine configures the grid
appropriately for an all-wet domain with periodic boundary conditions
in both the *x*- and *y*-dimensions. It should also be noted that
currently only grids with constant resolution in *x* and *y* are
supported by this routine.
It should be noted that currently only grids with constant
resolution in *x* and *y* are supported by this routine.
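
For illustration, a call for a small all-wet 8x6 domain might look as
follows (a sketch only; ``model_grid`` is assumed to have been created
already, and the spacings, mask values and the ``go_wp`` kind are
illustrative)::

    integer, dimension(8,6) :: tmask

    ! Mark every T-point as wet (1); dry points would be 0 and
    ! points outside the domain -1.
    tmask(:,:) = 1
    call grid_init(model_grid, 1000.0_go_wp, 1000.0_go_wp, tmask)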

.. _gocean1.0-fields:

@@ -128,11 +125,17 @@ constructor::
sshn_v = r2d_field(model_grid, GO_V_POINTS)
sshn_t = r2d_field(model_grid, GO_T_POINTS)

The constructor takes two arguments:
The constructor takes two mandatory and two optional arguments (an example call follows the list):

1. The grid on which the field exists
2. The type of grid point at which the field is defined
(``GO_U_POINTS``, ``GO_V_POINTS``, ``GO_T_POINTS`` or ``GO_F_POINTS``)
3. ``do_tile``: Whether the field should be tiled among all threads or
allocated as a single field (the latter is not currently supported
by PSyclone).
4. ``init_global_data``: a global 2D Fortran array, which must be
provided on each rank. On each rank the field will be initialised
with the data from the corresponding subdomain.
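
For example, a field initialised from a globally-sized array could be
created as follows (a sketch; the array ``global_ssh`` and its bounds
are illustrative)::

    real(go_wp), dimension(global_nx, global_ny) :: global_ssh
    type(r2d_field) :: ssh

    ! Every rank passes the same global array; the constructor copies
    ! the values belonging to this rank's subdomain into the local field.
    ssh = r2d_field(model_grid, GO_T_POINTS, &
                    init_global_data=global_ssh)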

Note that the grid object must have been fully configured (by a
call to ``grid_init`` for instance) before it is passed into this
2 changes: 1 addition & 1 deletion finite_difference/src/Makefile
@@ -89,7 +89,7 @@ all: ${API_LIB}

# Create the archive.
${API_LIB}: ${MODULES}
${AR} ${ARFLAGS} ${API_LIB} *.o
${AR} ${ARFLAGS} ${API_LIB} $(MODULES)

install:
${MAKE} -C .. install
99 changes: 95 additions & 4 deletions finite_difference/src/field_mod.f90
@@ -161,6 +161,7 @@ end subroutine write_to_device_f_interface
procedure, pass :: read_from_device
procedure, pass :: write_to_device
procedure, public :: halo_exchange
procedure, public :: gather_inner_data
end type r2d_field

!> Interface for the copy_field operation. Overloaded to take
@@ -237,9 +238,10 @@ end subroutine write_to_device_f_interface

!===================================================

function r2d_field_constructor(grid, &
function r2d_field_constructor(grid, &
grid_points, &
do_tile) result(self)
do_tile, &
init_global_data) result(self)
use parallel_mod, only: go_decompose, on_master
!$ use omp_lib, only : omp_get_max_threads
implicit none
@@ -252,6 +254,9 @@ function r2d_field_constructor(grid, &
!> a single field should be allocated (which is not currently
!> supported by PSyclone)
logical, intent(in), optional :: do_tile
!> An optional global array with which the field in each domain
!> will be initialised
real(go_wp), dimension(:,:), intent(in), optional :: init_global_data
! Local declarations
type(r2d_field), target :: self
logical :: local_do_tiling = .false.
@@ -261,7 +266,7 @@ function r2d_field_constructor(grid, &
!> The upper bounds actually used to allocate arrays (as opposed
!! to the limits carried around with the field)
integer :: upper_x_bound, upper_y_bound
integer :: itile, nthreads, ntilex, ntiley
integer :: itile, nthreads, ntilex, ntiley, dx, dy

if (present(do_tile)) then
local_do_tiling = do_tile
@@ -349,7 +354,7 @@ function r2d_field_constructor(grid, &
end if

! Since we're allocating the arrays to be larger than strictly
! required we explicitly set all elements to -999 in case the code
! required we explicitly set all elements to 0 in case the code
! does access 'out-of-bounds' elements during speculative
! execution. If we're running with OpenMP this also gives
! us the opportunity to do a 'first touch' policy to aid with
@@ -368,6 +373,17 @@ function r2d_field_constructor(grid, &
else
self%data(:,:) = 0.0_go_wp
end if

if (present(init_global_data)) then
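! dx/dy translate an index of the local field into the index of the
! corresponding point in the global initialisation array.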
dx = grid%subdomain%global%xstart - self%internal%xstart
dy = grid%subdomain%global%ystart - self%internal%ystart

do jj = grid%subdomain%internal%ystart, grid%subdomain%internal%ystop
do ji = grid%subdomain%internal%xstart, grid%subdomain%internal%xstop
self%data(ji, jj) = init_global_data(ji+dx, jj+dy)
end do
end do
end if
end function r2d_field_constructor

!===================================================
@@ -506,6 +522,8 @@ subroutine write_to_device(self, startx, starty, nx, ny, blocking)
end subroutine write_to_device


!===================================================

function get_data(self) result(dptr)
!> Getter for the data associated with a field. If the data is on a
! device it ensures that the host copy is up-to-date with that on
@@ -1287,6 +1305,79 @@ end function array_checksum

!===================================================

!> Collect a (distributed) field into a global array
!! on the master node.
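!! The output array is allocated inside this routine; its contents are
!! only meaningful on the master rank. Typical use (a sketch):
!! call fld%gather_inner_data(global_data), then process the gathered
!! array on the master rank only.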
subroutine gather_inner_data(self, global_data)
use parallel_utils_mod, only: get_num_ranks, gather
use parallel_mod, only: on_master

class(r2d_field), intent(in) :: self
real(go_wp), dimension(:,:), &
allocatable, intent(out) :: global_data

real(go_wp), dimension(:), allocatable :: send_buffer, recv_buffer

integer :: dx, dy, ji, jj, i, n, rank, halo_x, halo_y
integer :: x_start, x_stop, y_start, y_stop

allocate(global_data(self%grid%global_nx, self%grid%global_ny))

! No MPI (or single process), just copy the data out
if (get_num_ranks() == 1) then
! Compute size of inner area only
dx = self%internal%xstart - 1
dy = self%internal%ystart - 1
do jj= self%internal%ystart, self%internal%ystop
do ji = self%internal%xstart, self%internal%xstop
global_data(ji-dx,jj-dy) = self%data(ji,jj)
end do
end do
return
endif

! Determine maximum size of data to be sent. We don't need
! to send the halo, so reduce max_width and max_height by
! 2*halo size.
halo_x = self%internal%xstart-1
halo_y = self%internal%ystart-1
n = (self%grid%decomp%max_width - 2*halo_x) * &
(self%grid%decomp%max_height - 2*halo_y)
allocate(send_buffer(n))
allocate(recv_buffer(n*get_num_ranks()))

! Copy data into 1D send buffer.
i = 0
do jj= self%internal%ystart, self%internal%ystop
do ji = self%internal%xstart, self%internal%xstop
i = i + 1
send_buffer(i) = self%data(ji,jj)
end do
end do

! Collect all send_buffers on the master:
call gather(send_buffer, recv_buffer)

if (on_master()) then
! Copy the data from each process into the global array
do rank=1, get_num_ranks()
x_start = self%grid%decomp%subdomains(rank)%global%xstart
x_stop = self%grid%decomp%subdomains(rank)%global%xstop
y_start = self%grid%decomp%subdomains(rank)%global%ystart
y_stop = self%grid%decomp%subdomains(rank)%global%ystop
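! Each rank's data occupies a contiguous block of n elements in
! the receive buffer, in rank order.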
i = (rank-1) * n
do jj= y_start, y_stop
do ji = x_start, x_stop
i = i + 1
global_data(ji, jj) = recv_buffer(i)
end do
end do
enddo
endif

end subroutine gather_inner_data

!===================================================

subroutine init_periodic_bc_halos(fld)
implicit none
class(field_type), intent(inout) :: fld
20 changes: 3 additions & 17 deletions finite_difference/src/grid_mod.f90
@@ -67,9 +67,8 @@ module grid_mod
!! This is the key quantity that determines the region that
!! is actually simulated. However, we also support the
!! specification of a model consisting entirely of wet points
!! with periodic boundary conditions. Since this does not
!! require a T-mask, we do not allocate this array for that
!! case.
!! Since this does not require a T-mask, we do not allocate
!! this array for that case.
integer, allocatable :: tmask(:,:)
!> Pointer to tmask on remote device (if any)
type(c_ptr) :: tmask_device
@@ -294,8 +293,7 @@ end function grid_constructor
!! @param[in] dxarg Grid spacing in x dimension
!! @param[in] dyarg Grid spacing in y dimension
!! @param[in] tmask Array holding the T-point mask which defines
!! the contents of the local domain. Need not be
!! supplied if domain is all wet and has PBCs.
!! the contents of the local domain.
subroutine grid_init(grid, dxarg, dyarg, tmask)
use decomposition_mod, only: subdomain_type, decomposition_type
use parallel_mod, only: map_comms, get_rank, get_num_ranks, on_master
@@ -395,18 +393,6 @@ subroutine grid_init(grid, dxarg, dyarg, tmask)
do ji = xstop+2, grid%nx
grid%tmask(ji, :) = grid%tmask(xstop+1, :)
end do
else
! No T-mask supplied. Check that grid has PBCs in both
! x and y dimensions otherwise we won't know what to do.
if( .not. ( (grid%boundary_conditions(1) == GO_BC_PERIODIC) .and. &
(grid%boundary_conditions(2) == GO_BC_PERIODIC) ) )then
call gocean_stop('grid_init: ERROR: No T-mask supplied and '// &
'grid does not have periodic boundary conditions!')
end if
!> TODO add support for PBCs in parallel
if(get_num_ranks() > 1)then
call gocean_stop('grid_init: PBCs not yet implemented with MPI')
end if
end if ! T-mask supplied

! For a regular, orthogonal mesh the spatial resolution is constant
20 changes: 18 additions & 2 deletions finite_difference/src/parallel/parallel_utils_mod.f90
@@ -64,7 +64,7 @@ module parallel_utils_mod

public parallel_init, parallel_finalise, parallel_abort, get_max_tag
public get_rank, get_num_ranks, post_receive, post_send, global_sum
public msg_wait, msg_wait_all
public msg_wait, msg_wait_all, gather
public MSG_UNDEFINED, MSG_REQUEST_NULL, DIST_MEM_ENABLED

contains
@@ -103,7 +103,6 @@ end subroutine parallel_finalise
!! @param[in] msg Message to print - reason we're stopping
subroutine parallel_abort(msg)
use iso_fortran_env, only : error_unit ! access computing environment
implicit none
character(len=*), intent(in) :: msg

write(error_unit, *) msg
@@ -238,4 +237,21 @@ subroutine global_sum(var)

end subroutine global_sum

!================================================

subroutine gather(send_buffer, recv_buffer)
!> Gathers the data in the send buffer from all
!> processes into the receive buffer.
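!> The receive buffer is only significant on the root process and must
!> be able to hold size(send_buffer) elements from every rank.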
real(go_wp), dimension(:) :: send_buffer, recv_buffer

integer :: ierr
integer :: n

n = size(send_buffer)
call MPI_Gather(send_buffer, n, MPI_DOUBLE_PRECISION, &
recv_buffer, n, MPI_DOUBLE_PRECISION, &
0, MPI_COMM_WORLD, ierr)

end subroutine gather

end module parallel_utils_mod
14 changes: 13 additions & 1 deletion finite_difference/src/parallel/parallel_utils_stub_mod.f90
@@ -45,7 +45,7 @@ module parallel_utils_mod
logical, parameter :: DIST_MEM_ENABLED = .False.

public parallel_init, parallel_finalise, parallel_abort
public get_rank, get_num_ranks, get_max_tag
public get_rank, get_num_ranks, get_max_tag, gather
public msg_wait, msg_wait_all, post_receive, post_send, global_sum
public MSG_UNDEFINED, MSG_REQUEST_NULL, DIST_MEM_ENABLED

@@ -149,4 +149,16 @@ subroutine global_sum(var)
real(go_wp), intent(inout) :: var
end subroutine global_sum

!================================================

subroutine gather(send_buffer, recv_buffer)
!> Gathers the data in the send buffer from all
!> processes into the receive buffer.
real(go_wp), dimension(:) :: send_buffer, recv_buffer

recv_buffer = send_buffer

end subroutine gather


end module parallel_utils_mod