drenv: make accept_cluster idempotent #1106
Conversation
When a managed cluster is already added to OCM, we should move on.
Signed-off-by: Raghavendra Talur <[email protected]>
@nirs I know you had a comment on this commit as part of a different PR. I want to restart the conversation here, and I have provided the link and a sample error that I hit without this PR. Let me know what you think.
If this is really broken, running ocm-cluster twice should fail, but it did not fail for me when I worked on this. Can you reproduce this with the ocm env?
I have seen this error in the past, but only in external k8s clusters, where we had some issue and the cluster was broken.
When running ocm-cluster/start again after an env was started, it works:
I ran it many times and could not reproduce the issue. Tested with:
I "think" I hit this in the case where I started a cluster, stopped it and started it again. At this point it fails to start with the following logs (tried to (re)start twice, hence 2 log outputs below). The trick could be to stop and start the env to see if the problem is reproducible:
|
I used the changes from this PR to move forward; they resolved the issue (not a review, just a "works" comment).
I finally reproduced this when trying to start a stopped environment. When running the ocm-cluster/start script manually we see:
This should be fixed in clusteradm, but we can do a temporary fix in drenv for now.
managed_clusters = clusteradm.get("clusters", output="yaml", context=hub)
managed_clusters = managed_clusters.split("---")
Why not parse all documents using yaml.safe_load_all()?
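For reference, here is a minimal, self-contained sketch of the suggested approach: yaml.safe_load_all() parses multi-document YAML directly, so no manual splitting on "---" is needed. The sample manifest below is illustrative only, not real clusteradm output.

```python
import yaml

# Illustrative multi-document YAML, shaped like ManagedCluster resources;
# not actual `clusteradm get clusters -o yaml` output.
clusters_yaml = """\
apiVersion: cluster.open-cluster-management.io/v1
kind: ManagedCluster
metadata:
  name: c1
spec:
  hubAcceptsClient: true
---
apiVersion: cluster.open-cluster-management.io/v1
kind: ManagedCluster
metadata:
  name: c2
spec:
  hubAcceptsClient: false
"""

# safe_load_all() yields one dict per YAML document, so there is no need
# to split the text on "---" and load each piece separately.
for doc in yaml.safe_load_all(clusters_yaml):
    print(doc["metadata"]["name"], doc["spec"]["hubAcceptsClient"])
```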
@@ -6,6 +6,7 @@
 import json
 import os
 import sys
+import yaml
yaml is a third-party library, so we separate it from the standard library imports. See how it is imported in other files.
if metadata_name == cluster and hub_accepts_client:
    print(f"Cluster '{cluster}' already accepted")
    return
Since this is a temporary hack until this is fixed in clusteradm, it would be better to put this code in a helper like is_accepted() (a fuller sketch follows after this comment), something like:
# TODO: Remove when {link to clusteradm issue} is fixed.
if is_accepted(cluster, hub):
return
I reported the clusteradm issue here: open-cluster-management-io/clusteradm#395
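Expanding on the is_accepted() suggestion above, a minimal standalone sketch could look like the following. It takes the YAML text as an argument so it can run on its own; the signature and field lookups (metadata.name, spec.hubAcceptsClient) are illustrative, not the exact drenv code.

```python
import yaml


# TODO: Remove when the clusteradm issue linked above is fixed.
def is_accepted(cluster, clusters_yaml):
    """
    Return True if `cluster` already exists on the hub with
    spec.hubAcceptsClient set, given the multi-document YAML text
    from `clusteradm get clusters -o yaml`.
    """
    for doc in yaml.safe_load_all(clusters_yaml):
        if not doc:
            continue
        name = doc.get("metadata", {}).get("name")
        accepted = doc.get("spec", {}).get("hubAcceptsClient", False)
        if name == cluster and accepted:
            return True
    return False
```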
Mike Ng suggested a simpler way to fix this issue; see #1148.
This is already fixed in #1148; we can close this PR.
There is a bug in the clusteradm accept command where it doesn't work for managed clusters that are already approved. It errors out as shown below:
Error: no CSR to approve for cluster c2
See open-cluster-management-io/clusteradm#56
In this PR, we check if the managed cluster is already approved by the hub and, if so, we skip the accept command.
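A rough sketch of that flow, pieced together from the diff above, is shown below. The clusteradm.accept() wrapper call, its arguments, and the import path are assumptions; only clusteradm.get() appears in this PR's diff.

```python
import yaml

from drenv import clusteradm  # assumed import path


def accept_cluster(hub, cluster):
    """Accept `cluster` on the `hub`, skipping clusters already accepted."""
    out = clusteradm.get("clusters", output="yaml", context=hub)

    # Parse the multi-document YAML output, as in the diff above.
    for text in out.split("---"):
        doc = yaml.safe_load(text)
        if not doc:
            continue
        metadata_name = doc["metadata"]["name"]
        hub_accepts_client = doc["spec"].get("hubAcceptsClient", False)
        if metadata_name == cluster and hub_accepts_client:
            print(f"Cluster '{cluster}' already accepted")
            return

    # Not accepted yet - run the accept command (assumed wrapper call).
    clusteradm.accept(clusters=[cluster], context=hub)
```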