The master and nodes for Gemini currently operate on a master-oriented model, where nodes individually make calls to the master to exchange information. Here is a non-exhaustive list of information that will be transferred:
- Ping/heartbeat
- Jobs to execute
- CPU/memory and other metrics
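
To make the current node-initiated exchange concrete, here is a minimal sketch of one polling round. The names `build_heartbeat`, `poll_master`, and `fake_master`, along with the JSON field names, are assumptions for illustration and not Gemini's actual API; the transport is passed in as a plain callable so the example runs without a network.

```python
import json
import time
from typing import Callable, Dict, List, Optional


def build_heartbeat(node_id: str, metrics: Dict[str, float],
                    now: Optional[float] = None) -> Dict:
    """Bundle the heartbeat and metrics a node reports on each call (hypothetical schema)."""
    return {
        "node_id": node_id,
        "timestamp": time.time() if now is None else now,
        "metrics": metrics,
    }


def poll_master(send: Callable[[str], str], node_id: str,
                metrics: Dict[str, float]) -> List[str]:
    """One node-initiated exchange: send heartbeat and metrics, receive jobs to execute."""
    reply = send(json.dumps(build_heartbeat(node_id, metrics)))
    return json.loads(reply).get("jobs", [])


def fake_master(body: str) -> str:
    """Stand-in for the master's HTTP endpoint; hands back one job per request."""
    request = json.loads(body)
    return json.dumps({"jobs": [f"job-for-{request['node_id']}"]})


if __name__ == "__main__":
    print(poll_master(fake_master, "node-1", {"cpu": 0.5, "memory": 0.25}))
```

In a real deployment `send` would wrap an HTTP request to the master; because the node opens every connection, the master never needs to know how to reach a node.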
We may also take the other approach and choose a node-oriented model where a master initiates communication with nodes. Some points about this that I have in mind:
- Nodes will need to run a Flask server for the masters to hit. Each node would have to share its server details (presumably address and port) with the master on startup. This also calls for additional firewall/routing rules.
- It is unclear how multiple masters would interact with each other and divide work.
- We will need to code retries into calls. For example, if a master calls a node when a job is submitted and that call fails, the master will have to schedule a retry.
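
The retry requirement could look roughly like the sketch below. It assumes failures surface as `ConnectionError` and uses exponential backoff, which is one possible policy and not something decided in this issue; `call_with_retries` and its parameters are hypothetical names.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retries(call: Callable[[], T], max_attempts: int = 3,
                      base_delay: float = 0.5,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `call`, retrying on ConnectionError with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error: Optional[ConnectionError] = None
    for attempt in range(max_attempts):
        try:
            return call()
        except ConnectionError as exc:
            last_error = exc
            # Wait base_delay, then 2x, 4x, ... but not after the final attempt.
            if attempt < max_attempts - 1:
                sleep(base_delay * 2 ** attempt)
    assert last_error is not None
    raise last_error


class FlakyNode:
    """Stand-in for a node that rejects the first `failures` calls."""

    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.calls = 0

    def submit_job(self) -> str:
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionError("node unreachable")
        return "accepted"


if __name__ == "__main__":
    node = FlakyNode(failures=2)
    print(call_with_retries(node.submit_job, sleep=lambda seconds: None))
```

A blocking retry like this is the simplest form. A master handling many nodes would more likely put the failed call back on a queue to be retried later, which is closer to the "schedule a retry" wording above.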
We can also choose to use a hybrid model, where some information is pushed from the nodes to the master and some is pushed from the master to the nodes.
/cc @ncatelli