[Improvement-16850][API] Broadcast to cluster when worker group changed #16860
...r-api/src/main/java/org/apache/dolphinscheduler/api/service/impl/WorkerGroupServiceImpl.java
LGTM
        .withHost(master.getHost() + ":" + master.getPort())
        .refreshWorkerGroup();
} catch (Exception e) {
    log.error("Broadcast to master: {} that worker group changed failed", master, e);
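For readers without the full diff, the pattern in this excerpt can be sketched roughly as below: the API server walks the list of masters, asks each one to refresh its worker groups, and logs a failure without aborting the rest of the broadcast. This is only an illustration under assumptions. `MasterClient`, `MasterClientFactory`, and the `"host:port"` strings are hypothetical stand-ins, not the project's actual RPC classes, and `System.err` replaces the real logger.

```java
import java.util.ArrayList;
import java.util.List;

public class WorkerGroupBroadcastSketch {

    /** Hypothetical stand-in for the RPC client the API server holds for one master. */
    public interface MasterClient {
        void refreshWorkerGroup() throws Exception;
    }

    /** Hypothetical factory that resolves a client from a "host:port" address. */
    public interface MasterClientFactory {
        MasterClient forAddress(String address);
    }

    /**
     * Notifies every master that the worker groups changed.
     * A failure on one master is logged and recorded, and the loop continues.
     * Returns the addresses that could not be notified.
     */
    public static List<String> broadcast(List<String> masterAddresses, MasterClientFactory factory) {
        List<String> failed = new ArrayList<>();
        for (String address : masterAddresses) {
            try {
                factory.forAddress(address).refreshWorkerGroup();
            } catch (Exception e) {
                // Mirrors the log.error call in the excerpt; the master catches up on its next periodic refresh.
                System.err.println("Broadcast to master: " + address + " that worker group changed failed: " + e);
                failed.add(address);
            }
        }
        return failed;
    }
}
```

Returning the failed addresses is a choice made for this sketch so the behaviour is easy to check; the excerpt itself only logs the error.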
If there are occasional network interruptions between the API server and master, should retries be implemented during broadcasting? Otherwise, master will take 5 minutes to update.
> If there are occasional network interruptions between the API server and master, should retries be implemented during broadcasting? Otherwise, master will take 5 minutes to update.
I think if users need a faster update, they can just change the 5 minutes to a smaller value, since it is user-configurable.
The max delay is 5 minutes; users can reduce it by modifying the value in the master's config. I don't want to add retry here: since we use long-lived connections, the network should work well in most cases. Otherwise we may need to add unified retry in the RPC framework.
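To make the "unified retry" idea concrete, here is a minimal sketch of a bounded retry wrapper with linear backoff. It is not part of this PR (the author explicitly declines to add retry at this call site) and it is not the RPC framework's API; `RetrySketch`, `callWithRetry`, and its parameters are hypothetical names chosen for illustration.

```java
import java.util.concurrent.Callable;

public class RetrySketch {

    /**
     * Runs the call up to maxAttempts times, sleeping backoffMillis * attempt after each failure.
     * Rethrows the last exception if every attempt fails.
     */
    public static <T> T callWithRetry(Callable<T> call, int maxAttempts, long backoffMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (InterruptedException e) {
                // Do not retry on interruption; restore the flag and propagate.
                Thread.currentThread().interrupt();
                throw e;
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMillis * attempt);
                }
            }
        }
        throw last;
    }
}
```

A wrapper like this would sit inside the RPC layer rather than at each call site, which is the trade-off discussed in this thread: callers stay simple, and the retry policy is defined in one place.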
> add unify retry at rpc framework
LGTM. Maybe we can create a new issue about this? @ruanwenjun
I created #16870 to track this.
Quality Gate failed
Purpose of the pull request
close #16850
Brief change log
Verify this pull request
This pull request is code cleanup without any test coverage.
(or)
This pull request is already covered by existing tests, such as (please describe tests).
(or)
This change added tests and can be verified as follows:
(or)
Pull Request Notice
If your pull request contains an incompatible change, you should also add it to
docs/docs/en/guide/upgrade/incompatible.md