added non-blocking root communicator #1478
base: develop
Unit testing and documentation will be added to this PR in follow-up commits.
src/axom/lumberjack/MPIUtility.cpp
Outdated
MPI_Status mpiStatus;

// Get size and source of MPI message
int mpiFlag = true;
Similar comment here and below. You could define static constexpr integer variables that use names containing true and false to make the code more readable and avoid magic numbers.
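One way to read this suggestion, as a minimal sketch only: the constant and function names below are hypothetical and do not come from the PR. MPI reports a probe result through an int flag, where zero means false and any non-zero value means true.

```cpp
// Hypothetical names for illustration; not taken from the PR.
// MPI probe calls report their result through an int flag:
// zero means "false", any non-zero value means "true".
static constexpr int LJ_MPI_FLAG_FALSE = 0;
static constexpr int LJ_MPI_FLAG_TRUE = 1;

// Returns true when a probe flag indicates that a message is waiting.
constexpr bool isMessagePending(int mpiFlag)
{
  return mpiFlag != LJ_MPI_FLAG_FALSE;
}
```

With constants like these, an initialization such as int mpiFlag = LJ_MPI_FLAG_FALSE; states its intent without relying on an implicit bool-to-int conversion.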
src/axom/lumberjack/MPIUtility.hpp
Outdated
* \param [in] comm The MPI Communicator.
*****************************************************************************
*/
const char* mpiNonBlockingReceiveMessages(MPI_Comm comm, const int tag = 0);
Why is the tag argument here?
The MPI communication calls currently use the value associated with LJ_TAG by default (defined in MPIUtility.cpp). The non-blocking receives used by the new communicator in this PR work better when we use another tag in order to not conflict with other communicators. I added logic into the MPI utility functions to check whether the tag was overridden (i.e. non-zero). In those cases, the sends/receives will use the tag value passed in. Otherwise, we revert to the default LJ_TAG for MPI communication. Setting this default in the function declarations prevents us from having to change all the existing calls to these methods by other communicators.
Ah. Got it. Thanks for the explanation.
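The tag fallback described in this thread can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: resolveTag is a hypothetical helper name, and the LJ_TAG value shown is a placeholder because the thread does not state the real value defined in MPIUtility.cpp.

```cpp
// Assumed placeholder value; the real LJ_TAG is defined in MPIUtility.cpp
// and its value is not shown in this thread.
constexpr int LJ_TAG = 32766;

// Hypothetical helper: choose the tag used for a send or receive.
// A tag of 0 means "not overridden", so fall back to the default LJ_TAG.
// Any non-zero tag is used as passed in.
constexpr int resolveTag(int tag)
{
  return (tag == 0) ? LJ_TAG : tag;
}
```

Existing callers that omit the argument get tag = 0 and therefore keep using LJ_TAG. One consequence of this design is that 0 itself cannot be used as an explicit tag, since it is reserved to mean "use the default".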
src/axom/lumberjack/MPIUtility.hpp
Outdated
 *****************************************************************************
 */
 void mpiNonBlockingSendMessages(MPI_Comm comm,
                                 int destinationRank,
-                                const char* packedMessagesToBeSent);
+                                const char* packedMessagesToBeSent,
+                                const int tag = 0);
Similar comment about why the tag arg is here.
Similar response as above.
src/axom/lumberjack/MPIUtility.cpp
Outdated
// Receive packed Message
MPI_Recv(charArray,
         messageSize,
         MPI_CHAR,
         mpiStatus.MPI_SOURCE,
         mpiTag,
         comm,
         &mpiStatus);
As I understand the MPI API, this is actually a blocking MPI_Recv call? So this mpiNonBlockingReceiveMessages function is currently blocking to receive messages.
Yes, that's correct. The non-blocking part is the call to MPI_Iprobe, but then the Recv is blocking. My intent here is to be sure that the receive is fully finished before anything else is done, but to not block any further execution if there are no messages to be received (i.e. when mpiFlag is false). I can change the function name to clarify the intent here.
To clarify the above point, MPI_Iprobe is used instead of MPI_Probe because the former returns immediately with an mpiFlag value regardless of whether any message is waiting, whereas the latter is a blocking call that returns only once there is a message to be received.
Gotcha, the combination of MPI_Iprobe + MPI_Recv makes sense now!
I had tunnel vision comparing the non-blocking and blocking MPI interfaces.
This does also make me think about renaming this communicator to something like "NonCollectiveCommunicator" rather than "NonBlockingCommunicator". It's true that it calls these non-blocking functions, but I think the main feature is actually that we don't rely on collective calls to communicate messages to root.
I like that idea.
// Get size and source of MPI message
int mpiFlag = true;
MPI_Iprobe(MPI_ANY_SOURCE, tag, comm, &mpiFlag, &mpiStatus);
MPI_Iprobe is nonblocking here, so is there a chance the mpiFlag is not set to true when it is expected to be? Would it be better to have this be a blocking MPI_Probe? Basing this comment off this Stack Overflow post: https://stackoverflow.com/questions/43823458/mpi-iprobe-vs-mpi-probe
Additionally, if using MPI_Iprobe, should the mpiFlag default be set to false, so it can be set to true only by a successful function call?
I think the mpiFlag will be set in either context to either true or false, but to your point, it is safer to initialize this as false.
The Stack Overflow example illustrates an interesting but slightly different approach from what I'm intending to do. They are calling MPI_Iprobe in a while loop that does not exit until it returns a flag that is non-zero. In my case, I am checking only once to see if any messages need to be received, and if there are no messages, the function exits by returning nullptr. The intent in the Stack Overflow example is to continuously monitor the status, whereas I'm only intending to periodically monitor the status whenever the code path enters this function. Both could be relevant to the problem I'm trying to solve with this communicator, where the root rank needs to receive information from other ranks that they are aborting. I had a preference toward the latter option (periodically monitoring the status whenever the root rank reaches a point where it enters this code path) because it seemed to me like the more efficient option, even if it comes at the cost of sometimes not receiving the status before the program aborts. But I'm not really sure which option is best for this scenario. I'd be curious to hear your thoughts.
> I had a preference toward the latter option (periodically monitoring the status whenever the root rank reaches a point where it enters this code path) because it seemed to me like the more efficient option, even if it comes at a cost of sometimes not receiving the status before the program aborts.
I agree, I would expect the latter option to have less overhead, doing a single poll with MPI_Iprobe instead of spinning on MPI_Iprobe until status is updated in the former case. Nevertheless, I might not be considering something, so am also curious if others have ideas.
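The two options weighed in this thread differ only in control flow, which the following MPI-free sketch tries to show. Everything here is an assumption for illustration: Mailbox is a hypothetical in-memory stand-in whose tryReceive() plays the role of MPI_Iprobe followed by MPI_Recv, and pollOnce and spinUntilMessage are made-up names, not functions from the PR.

```cpp
#include <functional>
#include <queue>
#include <string>

// Hypothetical stand-in for the MPI layer (illustration only).
// tryReceive() mimics "probe, then receive if something is there".
class Mailbox
{
public:
  void post(const std::string& msg) { m_pending.push(msg); }

  // Returns false right away when nothing is waiting (like mpiFlag == false);
  // otherwise copies the oldest message into out and returns true.
  bool tryReceive(std::string& out)
  {
    if(m_pending.empty())
    {
      return false;
    }
    out = m_pending.front();
    m_pending.pop();
    return true;
  }

private:
  std::queue<std::string> m_pending;
};

// Option preferred in this thread: check once and return immediately.
bool pollOnce(Mailbox& box, std::string& out) { return box.tryReceive(out); }

// Option from the Stack Overflow post: keep probing until a message arrives.
// onIdle runs after each empty probe; maxSpins bounds the loop so this sketch
// cannot hang if no message ever arrives.
bool spinUntilMessage(Mailbox& box,
                      std::string& out,
                      const std::function<void()>& onIdle,
                      int maxSpins)
{
  for(int i = 0; i < maxSpins; ++i)
  {
    if(box.tryReceive(out))
    {
      return true;
    }
    onIdle();
  }
  return false;
}
```

pollOnce costs one probe per call but can miss a message that arrives just after it returns, while spinUntilMessage keeps the caller busy until something shows up. That matches the trade-off described above.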
7921ec5 to 926fd00
LGTM!
…or to NonCollectiveRootCommunicator
… and added unit testing
1b09d37 to 0c0da25
@@ -49,16 +48,50 @@ const char* mpiBlockingReceiveMessages(MPI_Comm comm)
  return charArray;
}

const char* mpiNonBlockingReceiveMessages(MPI_Comm comm, int tag)
Does it make sense to add tag to mpiBlockingReceiveMessages?
mpiBlockingReceiveMessages is not being used by the non-collective root communicator that's added in this PR, so it just uses the default LJ_TAG defined in this cpp file. That said, if we do want to go for consistency with the other function signatures in this file, I can add the tag argument to that function as well. It could potentially be useful in the future. Any thoughts/preferences?
MPI_Comm_size(m_mpiComm, &m_mpiCommSize);
m_ranksLimit = ranksLimit;
m_mpiTag = mpiTag;
++mpiTag;
Why is mpiTag being incremented here? This could use a comment.
Each communicator that sends/receives messages non-collectively needs its own mpiTag in order to not interfere with other communicators. I added this comment above.
Also this could overflow the integer size and should be checked. Resetting back to the original should be safe.
I would hope we don't have users who regularly initialize over 2 billion communicators 😆 (although I've seen crazier things). Anyways, I added the numeric_limits check in, and I reset the MPI tag to the initial value if it hits that limit.
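The numeric_limits check mentioned here might look something like the sketch below. It is an assumption about the shape of the logic, not the code from the PR: takeNextTag, nextTag, and initialTag are hypothetical names.

```cpp
#include <limits>

// Hypothetical helper mirroring the behaviour described above: hand out the
// current tag to a new communicator, then advance the shared counter.
// If the counter is already at the largest int, reset it to initialTag
// instead of incrementing, which would overflow.
int takeNextTag(int& nextTag, int initialTag)
{
  const int assigned = nextTag;
  if(nextTag == std::numeric_limits<int>::max())
  {
    nextTag = initialTag;
  }
  else
  {
    ++nextTag;
  }
  return assigned;
}
```

Separately from int overflow, MPI only guarantees that tags up to MPI_TAG_UB are valid, and the standard requires that bound to be at least 32767, so a stricter version of this check might compare against that attribute rather than the int maximum.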
src/axom/lumberjack/tests/lumberjack_NonCollectiveRootCommunicator.hpp
Outdated
Co-authored-by: Chris White
Co-authored-by: Chris White
…ator.hpp Co-authored-by: Chris White
Summary
This PR adds a communicator that sends messages from any rank to the root rank non-collectively. This is useful when an arbitrary rank throws an error that needs to be sent to the root rank so it can be written to a file.