Gather and Scatter for FabArrays? #4284
I have a loop with loop dependencies and dependencies on values in other boxes, hence I am trying to implement a global loop over the whole domain instead of ParallelFor to avoid race conditions. I think what I need is the equivalent of MPI_Gather and MPI_Scatter for FabArray. My initial attempt was something like
When compiling with CUDA, the call to copy in the last for loop gives me an
Replies: 2 comments 6 replies
Thank you @WeiqunZhang, that points me in the right direction.
Ah yes, I found that in the documentation, sorry I missed it earlier.
My for loop essentially looks like this:
The background is that we construct colored noise in Fourier space that needs to satisfy complex conjugate symmetries. I am actually trying to eliminate the dependencies since later on we pass only half the domain to
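To make the conjugate-symmetry constraint concrete, here is a minimal 1D sketch in plain C++, not the loop from this thread (which is not shown). The function name `hermitian_noise` is hypothetical, it produces white rather than colored noise, and a single seeded generator is assumed for simplicity. The point is that each iteration writes only mode `k` and its mirror `n - k`, so no iteration reads a value written by another.

```cpp
#include <complex>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper (not from the thread): fill a length-n 1D spectrum so that
// F[n - k] == conj(F[k]), which is the condition for a real-valued inverse DFT.
std::vector<std::complex<double>> hermitian_noise(std::size_t n, unsigned seed)
{
    std::vector<std::complex<double>> F(n);
    if (n == 0) return F;

    std::mt19937 gen(seed);
    std::normal_distribution<double> gauss(0.0, 1.0);

    // k = 0 is its own mirror, so it has to be purely real.
    F[0] = std::complex<double>(gauss(gen), 0.0);

    // Draw each mode in the lower half once and write its conjugate into the
    // mirrored slot. Iterations touch disjoint pairs of entries.
    for (std::size_t k = 1; 2 * k < n; ++k) {
        const double re = gauss(gen);
        const double im = gauss(gen);
        F[k] = std::complex<double>(re, im);
        F[n - k] = std::conj(F[k]);
    }

    // For even n the Nyquist mode k = n/2 is also self-conjugate, hence real.
    if (n % 2 == 0) F[n / 2] = std::complex<double>(gauss(gen), 0.0);

    return F;
}
```

Calling `hermitian_noise(8, 42u)` returns eight modes with `F[0]` and `F[4]` real and `F[8 - k] == std::conj(F[k])` for the rest. The shared generator still makes this loop sequential. A parallel version would need an independent random stream per mode, and the spectral filter that colors the noise is left out here.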
I assume only one process needs to do the work. Right? You could do something like this.
You will need to use managed memory.
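The snippet that went with this reply did not survive, so the following is only a library-free model of the gather, serial loop, scatter pattern being discussed. None of the names are AMReX API: each inner vector stands in for one box, and the running sum stands in for the real update with its loop-carried dependency. In AMReX the corresponding steps would presumably be copies between the distributed FabArray and a single-box FabArray owned by one process, but that mapping is an assumption.

```cpp
#include <cstddef>
#include <vector>

// Stand-in for a distributed array: one inner vector per "box".
using Boxes = std::vector<std::vector<double>>;

// "Gather": copy every box into one contiguous buffer covering the whole domain.
std::vector<double> gather(const Boxes& boxes)
{
    std::vector<double> whole;
    for (const auto& b : boxes) {
        whole.insert(whole.end(), b.begin(), b.end());
    }
    return whole;
}

// Serial loop with a loop-carried dependency (a running sum). Because
// element i reads element i - 1, the iterations cannot run concurrently.
void serial_update(std::vector<double>& whole)
{
    for (std::size_t i = 1; i < whole.size(); ++i) {
        whole[i] += whole[i - 1];
    }
}

// "Scatter": copy the updated values back into the original box layout.
void scatter(const std::vector<double>& whole, Boxes& boxes)
{
    std::size_t offset = 0;
    for (auto& b : boxes) {
        for (auto& v : b) {
            v = whole[offset++];
        }
    }
}
```

With boxes `{{1, 2}, {3}, {4, 5, 6}}`, calling `gather`, `serial_update`, and `scatter` in turn leaves the boxes holding `{{1, 3}, {6}, {10, 15, 21}}`: the sum crosses box boundaries, which a per-box parallel loop could not do without extra communication.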
I am also curious. What kind of data dependencies do you have?
(Re: the error. Changing `copy` to `template copy<RunOn::Device>` or `RunOn::Host` should fix the error. When compiling with CUDA, we force the user to choose where Fab-level functions run, which helps eliminate a lot of bugs caused by synchronization issues.)