Question: Shared memory tests #316
Comments
Sorry for the delay in responding.

The support for OpenMP in pFUnit is minimal, in the sense that there is no special mechanism to identify the particular thread where a failure occurs, nor to set up an OpenMP parallel region for you. (Though it might be straightforward to extend the framework to do either of these.) Rather, the current expectation is that the layer being tested creates and completes its own parallel region. The "limited" support is that there are OpenMP directives in the bit of code where exceptions are accumulated, so that if failures happen on multiple threads they do not all hit the same memory at the same time.

This should be quite sufficient for testing most OpenMP-instrumented procedures. But if one is using OpenMP in a particularly sophisticated manner, one might want the framework to handle more, as I mentioned above, e.g., requesting that the same test be run on 1, 2, ... n threads, much as is possible with MPI. If you are interested in helping to evaluate such extensions, I can work to create them.

With regard to multiple nodes: there is no special support in pFUnit, and failures such as you describe can be tricky to diagnose. Often the problem is the MPI implementation itself rather than your own code. E.g., we have had situations where the use of one-sided MPI calls exposed bugs in various flavors of MPI when used in a multi-node context. Sometimes there are environment variables that allow the MPI to work properly in that context. All such issues are, unfortunately, outside the scope of pFUnit.

From the perspective of your own source code, "legal" use of MPI should not be able to detect the difference in the number of nodes involved, except in some very special cases where MPI knows about shared memory segments, or if you are somehow otherwise creating subcommunicators that are associated with given nodes (e.g., using …).

You can of course still use pFUnit to create various tests around the code and use them for triangulation. But you'll need to run the tests on a multi-node cluster unless/until you can expose a bit that fails on a single node. Happy to speculate further if you want to provide more details.
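To make the OpenMP point concrete, here is a minimal sketch of the two patterns described in the reply: a procedure under test that opens and closes its own parallel region, and failure records that are appended inside a critical section so that threads cannot collide. It is written in C rather than Fortran only to keep it self-contained. The names `parallel_sum`, `count_negative_entries`, `failed_index`, and `MAX_FAILURES` are hypothetical and are not part of pFUnit, and the failure log is just a stand-in for whatever the framework does internally.

```c
/* Hypothetical procedure under test: it creates and completes its own
   OpenMP parallel region, so the calling test code stays serial. */
double parallel_sum(const double *x, int n)
{
    double total = 0.0;
    /* Without OpenMP enabled the pragma is ignored and the loop runs serially. */
    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < n; ++i) {
        total += x[i];
    }
    return total;
}

/* Hypothetical failure log, standing in for a framework's exception list. */
#define MAX_FAILURES 64
static int failed_index[MAX_FAILURES];
static int n_failures = 0;

/* Checks every element in parallel. A failure on any thread is recorded
   inside a critical section, so only one thread updates the log at a time.
   Returns the number of failures recorded (capped at MAX_FAILURES). */
int count_negative_entries(const double *x, int n)
{
    n_failures = 0;
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        if (x[i] < 0.0) {
            #pragma omp critical
            {
                if (n_failures < MAX_FAILURES) {
                    failed_index[n_failures] = i;
                    n_failures++;
                }
            }
        }
    }
    return n_failures;
}
```

With GCC, building with `-fopenmp` enables the pragmas, and the standard `OMP_NUM_THREADS` environment variable can then be used to rerun the same test binary with 1, 2, ... n threads by hand. The order of entries in `failed_index` depends on thread scheduling, so a test should check the count or the set of indices, not their order.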
I have a project using shared memory both from MPI and from OpenMP. So far my only exposure to pFUnit is from the demos repo and a YouTube video, but it seems like a great option to test my program. I have two questions:
… `npes` option, which by my understanding sets the number of jobs to run. It looks like this calls `mpirun` on my workstation. Is there any way to simulate several nodes, e.g. 2 nodes running 2 processes each for a total of 4 processes (preferably in a "simulated" way so that I can still use my workstation)? I guess not, as it seems a big ask. Is there any workaround? (My program exhibits an error if I run it on several nodes, but no error on one node, even with the same number of processes. I would like to use pFUnit to track down the problem.)

P.S. Please redirect if this is not the place to ask this.