[dashboard] Remove DataSource.agents with internal kv get. #50025

Open
wants to merge 3 commits into
base: master

rynewang
Contributor

@rynewang rynewang commented Jan 23, 2025

Many Heads monitor all node updates and keep a gRPC connection to each agent. This is not needed, because most use cases only talk to a single agent, for example to read logs.

Remove DataSource.agents and every instance of the "connect to all agents" pattern. Instead, when the need arises, make a point request via internal kv get to fetch the corresponding agent's gRPC address, and connect to that agent.
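
The point lookup described above could look roughly like the sketch below. It is not the code in this PR: `FakeInternalKV`, the `agent_addr:` key prefix, the JSON value layout, and `lookup_agent_grpc_address` are all hypothetical names chosen for illustration, and the in-memory store only stands in for the real internal kv.

```python
import asyncio
import json
from typing import Dict, Optional

# Hypothetical key prefix; the real key layout is not shown in this PR description.
AGENT_ADDR_PREFIX = "agent_addr:"


class FakeInternalKV:
    """In-memory stand-in for the internal kv (illustration only)."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    async def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    async def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)


async def lookup_agent_grpc_address(kv: FakeInternalKV, node_id: str) -> Optional[str]:
    """Point lookup: one kv get for one node, with no cluster-wide cache."""
    raw = await kv.get(AGENT_ADDR_PREFIX + node_id)
    if raw is None:
        # The agent has not registered yet, or the node was already removed.
        return None
    ip, grpc_port = json.loads(raw)  # assumed value layout: [ip, grpc_port]
    return f"{ip}:{grpc_port}"


async def demo() -> Optional[str]:
    kv = FakeInternalKV()
    value = json.dumps(["10.0.0.5", 52365]).encode()
    await kv.put(AGENT_ADDR_PREFIX + "node-1", value)
    return await lookup_agent_grpc_address(kv, "node-1")
```

The caller pays one kv round trip per request instead of holding a connection to every agent, which matches the "single agent to read logs" use case.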

NodeHead used to add agents to DataSource.agents when a node was added, and remove them from DataSource.agents when a node was removed. Now it no longer does any work on node addition, but it deletes the related internal kv keys on node removal.
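
As a rough illustration of the removal path only, the sketch below deletes a node's agent key when the node goes away and has no hook for node addition. `FakeInternalKV`, `on_node_removed`, and the `agent_addr:` prefix are hypothetical; the actual keys that NodeHead deletes are not listed in this description.

```python
import asyncio
from typing import Dict, Tuple

# Hypothetical key prefix, used only for this illustration.
AGENT_ADDR_PREFIX = "agent_addr:"


class FakeInternalKV:
    """In-memory stand-in for the internal kv (illustration only)."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    async def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    async def delete(self, key: str) -> bool:
        # Returns True only if the key existed and was removed.
        return self._data.pop(key, None) is not None

    async def exists(self, key: str) -> bool:
        return key in self._data


async def on_node_removed(kv: FakeInternalKV, node_id: str) -> bool:
    """Removal hook: drop the agent address key for the removed node."""
    return await kv.delete(AGENT_ADDR_PREFIX + node_id)


async def demo() -> Tuple[bool, bool, bool]:
    kv = FakeInternalKV()
    await kv.put(AGENT_ADDR_PREFIX + "node-1", b"10.0.0.5:52365")
    first = await on_node_removed(kv, "node-1")   # key existed
    second = await on_node_removed(kv, "node-1")  # already gone
    still_there = await kv.exists(AGENT_ADDR_PREFIX + "node-1")
    return first, second, still_there
```

Making the delete idempotent keeps a repeated removal notification harmless.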

One notable exception is JobHead, which wants to choose randomly from all agents. It uses internal_kv_keys to list all known agents.
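
A minimal sketch of "list all known agents, then pick one at random", under the same assumptions as before: `FakeInternalKV`, its `keys` method, the `agent_addr:` prefix, and `pick_random_agent_address` are hypothetical and do not mirror the real JobHead code. Unlike the `_pick_random_agent` method in the diff, which is commented as blocking until at least one agent is known, this sketch simply returns `None` when no agent is listed.

```python
import asyncio
import json
import random
from typing import Dict, List, Optional

# Hypothetical key prefix, used only for this illustration.
AGENT_ADDR_PREFIX = "agent_addr:"


class FakeInternalKV:
    """In-memory stand-in for the internal kv (illustration only)."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    async def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    async def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    async def keys(self, prefix: str) -> List[str]:
        # Prefix scan: cost grows with the number of registered agents.
        return [k for k in self._data if k.startswith(prefix)]


async def pick_random_agent_address(
    kv: FakeInternalKV, rng: random.Random
) -> Optional[str]:
    keys = await kv.keys(AGENT_ADDR_PREFIX)
    if not keys:
        return None
    raw = await kv.get(rng.choice(keys))
    if raw is None:
        # The node was removed between the listing and the get.
        return None
    ip, grpc_port = json.loads(raw)  # assumed value layout: [ip, grpc_port]
    return f"{ip}:{grpc_port}"


async def demo() -> Optional[str]:
    kv = FakeInternalKV()
    for i, ip in enumerate(["10.0.0.5", "10.0.0.6"]):
        value = json.dumps([ip, 52365]).encode()
        await kv.put(f"{AGENT_ADDR_PREFIX}node-{i}", value)
    return await pick_random_agent_address(kv, random.Random(0))
```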

@rynewang rynewang added the go add ONLY when ready to merge, run all tests label Jan 23, 2025
Signed-off-by: Ruiyang Wang <[email protected]>
@@ -191,76 +198,135 @@ async def _pick_random_agent(self) -> Optional[JobAgentSubmissionClient]:
         """
         # NOTE: Following call will block until there's at least 1 agent info
         # being populated from GCS
-        agent_infos = await self._fetch_agent_infos()
+        agent_infos = await self._fetch_all_agent_infos()
Collaborator

This might be expensive for a large cluster. I feel we could let the dashboard GCS client subscribe to the nodes table so that we have a local cache of nodes.
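
A rough sketch of the local cache idea, assuming node updates arrive as a stream of events: `NodeEvent` and `NodeCache` are hypothetical types made up for this example, and the actual subscription to the nodes table is left out.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class NodeEvent:
    """Hypothetical shape of one node-table update."""

    node_id: str
    alive: bool
    agent_address: str = ""


class NodeCache:
    """Local view of alive nodes, kept current by applying subscribed events."""

    def __init__(self) -> None:
        self._alive: Dict[str, str] = {}

    def apply(self, event: NodeEvent) -> None:
        if event.alive:
            self._alive[event.node_id] = event.agent_address
        else:
            # A dead node is dropped; unknown node ids are ignored.
            self._alive.pop(event.node_id, None)

    def agent_addresses(self) -> List[str]:
        # Served from memory, so listing agents needs no kv scan.
        return sorted(self._alive.values())


cache = NodeCache()
for event in [
    NodeEvent("node-1", True, "10.0.0.5:52365"),
    NodeEvent("node-2", True, "10.0.0.6:52365"),
    NodeEvent("node-1", False),
]:
    cache.apply(event)
```

The trade-off is the one raised in the PR description: the cache makes listing cheap, but it brings back a component that tracks every node update.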

Contributor Author

Yeah, though this is only used for "random agents for job submission" anyway. Is this what we are going to support?

Labels
go add ONLY when ready to merge, run all tests