-
I'm going to suggest that @anagainaru chime in because she has been working with some of these sorts of issues, but in the meantime, I can address the SST questions. Yes, in general Open is a collective operation across all the engines. ADIOS's focus is on optimizing collective I/O, and in some sense the sweet spot is supporting applications like parallel timestepped simulations, where all ranks dump data at the same time and in the aggregate it forms a global data structure. So while BeginStep might or might not be collective, EndStep pretty much always is. That said, @anagainaru has been using SST in more of a job-distribution sort of way, by having multiple readers connect to a single producer and using SST's "OnDemand" value for the StepDistributionMode parameter. Hopefully she can fill you in on the details.
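To illustrate why an on-demand handout suits job distribution, here is a minimal sketch in plain Python. It does not call ADIOS2 at all: it only simulates which reader would receive each step under a fixed round-robin assignment versus an on-demand one, where a step goes to whichever reader asks next. The function names and the per-reader processing times are made up for illustration, and the simulation assumes the producer always has a step available.

```python
import heapq


def round_robin(num_steps, num_readers):
    """Step i goes to reader i % num_readers, regardless of reader speed."""
    return [step % num_readers for step in range(num_steps)]


def on_demand(num_steps, step_cost):
    """Each step goes to the reader that asks for work first.

    step_cost[r] is a hypothetical time reader r needs to process one step.
    Assumes the producer always has a step ready when a reader asks.
    """
    # Heap of (time at which the reader next requests a step, reader id).
    ready = [(0.0, r) for r in range(len(step_cost))]
    heapq.heapify(ready)
    assignment = []
    for _ in range(num_steps):
        t, r = heapq.heappop(ready)
        assignment.append(r)
        # The reader becomes free again after processing this step.
        heapq.heappush(ready, (t + step_cost[r], r))
    return assignment


if __name__ == "__main__":
    print(round_robin(6, 2))         # fixed alternation between the readers
    print(on_demand(6, [1.0, 3.0]))  # the faster reader ends up with more steps
```

With one fast and one slow reader, the round-robin assignment splits the steps evenly, while the on-demand handout lets the faster reader take more of them.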
-
Yes, @eisenhauer is right, I have been looking at this exact scenario. Typically, ADIOS was developed for simulations that generate data in steps. When you connect a producer with a consumer, each time a producer generates a step (which means all the puts between a [...]
I added a simple example for using on demand using SST here: https://github.com/anagainaru/ADIOS2-addons/tree/main/DataStreaming/ADIOS_ondemand
The [...]
To run the code:
Let me know if this makes sense. I can take a look at code snippets if you'd like me to figure out why your test case looks sequential.
-
Hi, I am a developer in the Melissa team and we are currently investigating adios2 as a way to improve the performance of our software.
To give you a bit of context, Melissa is a framework that coordinates ensemble runs to perform large-scale, online sensitivity analysis or deep surrogate training. Our usual workflow consists of multiple data producers (any MPI code instrumented with our API) which dynamically connect to a parallel Python server and send it data as they are produced (either in an MxN or a round-robin fashion). Our goal is to avoid files by communicating data directly through our API, which is based on ZeroMQ sockets.
The main drawback of our current configuration is that ZeroMQ cannot take full advantage of the data transport layer on supercomputers, which is why adios2 seemed like a good alternative. In replacing our current API with adios2, we would still like to handle client connections and message reception somewhat asynchronously on the server side.
So far we have managed to build a preliminary working version with adios2. However, it is pretty sequential: as soon as server rank 0 detects that a data producer is running, its ID is broadcast to all ranks and the corresponding engine is opened on all ranks at the same time. For message extraction, we basically loop over all active engines and try to extract data from any that are holding some.
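For concreteness, the extraction loop described above can be sketched roughly as follows. This is a simplified stand-in, not our actual server code and not the adios2 API: `FakeEngine`, `try_begin_step`, and the local `StepStatus` enum are hypothetical names that only mimic a non-blocking "is a step ready?" check (in the ADIOS2 C++ API, the closest equivalent is calling `BeginStep` in read mode with a zero timeout and checking the returned step status). The sketch also ignores the MPI coordination between server ranks.

```python
from enum import Enum


class StepStatus(Enum):
    OK = 0
    NOT_READY = 1
    END_OF_STREAM = 2


class FakeEngine:
    """Hypothetical stand-in for a reader engine; NOT the adios2 API."""

    def __init__(self, script):
        # Sequence of payloads; None means "no step ready on this poll".
        self._script = list(script)

    def try_begin_step(self):
        """Non-blocking check: return (status, payload) without waiting."""
        if not self._script:
            return StepStatus.END_OF_STREAM, None
        payload = self._script.pop(0)
        if payload is None:
            return StepStatus.NOT_READY, None
        return StepStatus.OK, payload


def poll_once(engines):
    """Visit every active engine once and collect whatever is ready."""
    received = []
    for client_id in list(engines):  # copy keys: we may delete while looping
        status, payload = engines[client_id].try_begin_step()
        if status is StepStatus.OK:
            received.append((client_id, payload))
        elif status is StepStatus.END_OF_STREAM:
            del engines[client_id]  # producer finished; drop its engine
    return received


def drain(engines):
    """Keep polling until every producer has closed its stream."""
    out = []
    while engines:
        out.extend(poll_once(engines))
    return out


if __name__ == "__main__":
    active = {"a": FakeEngine([1, None, 2]), "b": FakeEngine([None, 10])}
    print(drain(active))
```

Every engine is visited in turn on each pass, so a producer with nothing ready costs a poll but does not block the others.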
From the adios2 documentation, it seems that Open and Close on an engine (SST in our case) are collective functions (we are not sure about BeginStep/EndStep). It also says that adios2 should be considered not thread-safe. Indeed, as soon as we try to optimize our approach with threads, for instance to handle engine opening, the server runs into deadlocks (the same happens with message extraction).
Given our preliminary tests, we are wondering whether adios2 is actually well suited to our purpose. If so, what do you think would be the best way to handle client connections and message reception?
Best regards,
Marc