This library wraps the blocking communication procedure amrex::FillBoundary
in an internal MPI_Waitsome loop, which allows asynchronous dispatch using
the concepts found in libunifex.
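Conceptually, the wrapper posts the non-blocking ghost cell exchanges and then completes them with MPI_Waitsome, handing each finished box to the compute scheduler as soon as it is ready instead of waiting for all of them. A rough sketch of such a completion loop, with hypothetical names (CompleteAndDispatch, box_of_request, on_box_ready) and the simplifying assumption of one request per box, could look like:

// Hypothetical sketch of the internal completion loop, not the library's actual code.
#include <mpi.h>
#include <functional>
#include <vector>

void CompleteAndDispatch(std::vector<MPI_Request>& requests,
                         const std::vector<int>& box_of_request,
                         const std::function<void(int)>& on_box_ready) {
  std::vector<int> done(requests.size());
  int n_pending = static_cast<int>(requests.size());
  while (n_pending > 0) {
    int n_done = 0;
    // Block until at least one request finishes, then hand the associated
    // boxes to the compute scheduler right away.
    MPI_Waitsome(static_cast<int>(requests.size()), requests.data(),
                 &n_done, done.data(), MPI_STATUSES_IGNORE);
    for (int i = 0; i < n_done; ++i) {
      on_box_ready(box_of_request[done[i]]);
    }
    n_pending -= n_done;
  }
}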
Consider a blocking call to FillBoundary
as in:
// Block the current thread until all pending requests are done, then compute.
void EulerAmrCore::Advance(double dt) {
  // Blocking call to FillBoundary.
  // It waits until every pending MPI_Request is done.
  // states is a MultiFab member variable.
  states.FillBoundary();
  // dt_over_dx[d] = dt / dx[d]; assumed to be computed here from dt and the grid spacing.
  // Parallel loop over all FABs in the MultiFab.
#ifdef AMREX_USE_OMP
#pragma omp parallel if (Gpu::notInLaunchRegion())
#endif
  for (MFIter mfi(states); mfi.isValid(); ++mfi) {
    const Box box = mfi.growntilebox();   // cell-centered box including ghost cells
    const Box inner_box = mfi.tilebox();  // cell-centered box without ghost cells
    auto advance_dir = [&](Direction dir) {
      const int dir_v = int(dir);
      auto csarray = states.const_array(mfi);
      auto farray = fluxes[dir_v].array(mfi);
      Box faces = shrink(convert(box, dim_vec(dir)), dir_v, 1);
      ComputeNumericFluxes(faces, farray, csarray, dir);
      auto cfarray = fluxes[dir_v].const_array(mfi);
      auto sarray = states.array(mfi);
      UpdateConservatively(inner_box, sarray, cfarray, dt_over_dx[dir_v], dir);
    };
    // first order accurate operator splitting
    advance_dir(Direction::x);
    advance_dir(Direction::y);
    advance_dir(Direction::z);
  }
  // implicit join of all OpenMP threads here
}
This library provides thin wrappers around this FillBoundary call and enables the use of the sender/receiver model developed in libunifex.
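For orientation, the following self-contained sketch shows the plain libunifex bulk primitives used below (bulk_schedule, bulk_transform, bulk_join, sync_wait) summing the indices 0..9; it is not part of this library:

#include <unifex/bulk_join.hpp>
#include <unifex/bulk_schedule.hpp>
#include <unifex/bulk_transform.hpp>
#include <unifex/single_thread_context.hpp>
#include <unifex/sync_wait.hpp>

#include <cstdio>

int main() {
  unifex::single_thread_context ctx;
  int sum = 0;
  // bulk_schedule emits the indices 0..9 as a "many sender", bulk_transform runs a
  // function for every index, and bulk_join turns the result back into a single sender.
  auto work = unifex::bulk_join(unifex::bulk_transform(
      unifex::bulk_schedule(ctx.get_scheduler(), 10),
      [&](int i) noexcept { sum += i; },
      unifex::par_unseq));
  // Nothing runs until the sender is awaited; sync_wait starts it and blocks until completion.
  unifex::sync_wait(std::move(work));
  std::printf("sum = %d\n", sum);  // prints 45
}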
With these wrappers, the blocking example above could instead read:
void EulerAmrCore::AsyncAdvance(double dt) {
  // dt_over_dx[d] = dt / dx[d]; assumed to be computed here from dt and the grid spacing.
  auto advance = unifex::bulk_join(unifex::bulk_transform(
      // tbb_scheduler schedules the work items and comm_scheduler schedules the
      // MPI_WaitAny/MPI_WaitAll threads.
      // The feedback function (Box, int) is called for every box that is ready,
      // i.e. all of its ghost cells are filled. K is the index of that box.
      FillBoundary(tbb_scheduler, comm_scheduler, states),
      [this, dt_over_dx](const Box& box, int K) {
        const Box inner_box = states.box(K);  // valid box of FAB K, without ghost cells
        auto advance_dir = [&](Direction dir) {
          const int dir_v = int(dir);
          auto csarray = states.const_array(K);
          auto farray = fluxes[dir_v].array(K);
          Box faces = shrink(convert(box, dim_vec(dir)), dir_v, 1);
          ComputeNumericFluxes(faces, farray, csarray, dir);
          auto cfarray = fluxes[dir_v].const_array(K);
          auto sarray = states.array(K);
          UpdateConservatively(inner_box, sarray, cfarray, dt_over_dx[dir_v], dir);
        };
        // first order accurate operator splitting
        advance_dir(Direction::x);
        advance_dir(Direction::y);
        advance_dir(Direction::z);
      },
      unifex::par_unseq));
  // Explicitly wait here until the above is done for all boxes.
  unifex::sync_wait(std::move(advance));
}
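The example above does not show where tbb_scheduler and comm_scheduler come from. A minimal setup could look like the following sketch, which substitutes unifex::static_thread_pool for both the compute scheduler and the dedicated communication thread; the variable names simply mirror the example:

#include <unifex/static_thread_pool.hpp>

// Hypothetical setup, not prescribed by the library.
unifex::static_thread_pool compute_pool{8};  // runs the per-box compute kernels
unifex::static_thread_pool comm_pool{1};     // runs the MPI waiting loop

auto tbb_scheduler  = compute_pool.get_scheduler();
auto comm_scheduler = comm_pool.get_scheduler();
// Both handles are then passed to FillBoundary(tbb_scheduler, comm_scheduler, states).

Any scheduler satisfying the libunifex scheduler concept, for example a TBB-backed one, could take the place of the compute pool.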
The intent is to try out a structured parallel programming model in classical HPC applications, such as a finite volume flow solver on structured grids.