drenv: make accept_cluster idempotent #1106

raghavendra-talur
Member

There is a bug in the clusteradm accept command: it does not work for managed clusters that are already approved. It fails with the error shown below:

Error: no CSR to approve for cluster c2
See open-cluster-management-io/clusteradm#56

In this PR, we check whether the managed cluster is already approved by the hub and, if so, we skip the accept command.

When a managed cluster is already added to OCM, we should move on.

Signed-off-by: Raghavendra Talur <[email protected]>
@raghavendra-talur
Member Author

@nirs I know you had a comment on this commit as part of a different PR. I want to restart the conversation here and I have provided the link and a sample error that I face without this PR.

Let me know what you think.

@nirs
Member

nirs commented Oct 24, 2023

If this is really broken, running ocm-cluster twice should fail, but it did not fail for me when I worked on this.

Can you reproduce this with the ocm env?

drenv start envs/ocm.yaml
for i in $(seq 5); do
    addons/ocm-cluster/start dr1 hub
done

I have seen this error in the past:

Error: no CSR to approve for cluster c2

Only in external k8s clusters, where we had some issue and the cluster was broken.

@nirs
Member

nirs commented Oct 25, 2023

When running ocm-cluster/start again after an env was started, it works:

$ addons/ocm-cluster/start dr1 hub
...
Accepting cluster
CSR dr1-x95wh already approved
no CSR to approve for cluster dr1
hubAcceptsClient already set for managed cluster dr1

 Your managed cluster dr1 has joined the Hub successfully. Visit https://open-cluster-management.io/scenarios or https://github.com/open-cluster-management-io/OCM/tree/main/solutions for next steps.
...

I ran it many times and cannot reproduce the issue.

Tested with:

$ clusteradm version
client		version	:v0.6.0
server release	version	:v1.27.4
default bundle	version	:0.11.0

$ minikube version
minikube version: v1.31.2
commit: fd7ecd9c4599bef9f04c0986c4a0187f98a4396e

$ kubectl version
Client Version: v1.28.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.27.4

@ShyamsundarR
Member

I "think" I hit this in the case where I started a cluster, stopped it and started it again. At this point it fails to start with the following logs (tried to (re)start twice, hence 2 log outputs below). The trick could be to stop and start the env to see if the problem is reproducible:

(ramen) [e2e@localhost test]$ drenv start envs/regional-dr.yaml 
2023-10-26 20:27:05,923 INFO    [rdr] Starting environment
2023-10-26 20:27:06,179 INFO    [dr1] Starting minikube cluster
2023-10-26 20:27:07,139 INFO    [dr2] Starting minikube cluster
2023-10-26 20:27:11,275 INFO    [hub] Starting minikube cluster
2023-10-26 20:28:23,194 INFO    [dr1] Cluster started in 77.01 seconds
2023-10-26 20:28:23,194 INFO    [dr1] Waiting until all deployments are available
2023-10-26 20:28:54,206 INFO    [dr2] Cluster started in 107.07 seconds
2023-10-26 20:28:54,206 INFO    [dr2] Waiting until all deployments are available
2023-10-26 20:29:05,771 INFO    [hub] Cluster started in 114.50 seconds
2023-10-26 20:29:05,771 INFO    [hub] Waiting until all deployments are available
2023-10-26 20:29:38,243 INFO    [hub] Deployments are available in 32.47 seconds
2023-10-26 20:29:38,245 INFO    [hub/0] Running addons/ocm-hub/start
2023-10-26 20:29:38,245 INFO    [hub/1] Running addons/submariner/start
2023-10-26 20:29:42,102 INFO    [hub/0] addons/ocm-hub/start completed in 3.86 seconds
2023-10-26 20:29:42,102 INFO    [hub/0] Running addons/ocm-controller/start
2023-10-26 20:30:32,920 INFO    [hub/0] addons/ocm-controller/start completed in 50.82 seconds
2023-10-26 20:30:32,922 INFO    [hub/0] Running addons/cert-manager/start
2023-10-26 20:30:38,255 INFO    [hub/0] addons/cert-manager/start completed in 5.33 seconds
2023-10-26 20:30:38,255 INFO    [hub/0] Running addons/olm/start
2023-10-26 20:30:45,664 INFO    [hub/0] addons/olm/start completed in 7.41 seconds
2023-10-26 20:31:09,973 INFO    [hub/1] addons/submariner/start completed in 91.73 seconds
2023-10-26 20:31:09,973 INFO    [hub/1] Running addons/submariner/test
2023-10-26 20:31:34,167 INFO    [dr2] Deployments are available in 159.96 seconds
2023-10-26 20:31:34,167 INFO    [dr2/0] Running addons/cert-manager/start
2023-10-26 20:31:34,168 INFO    [dr2/1] Running addons/ocm-cluster/start
2023-10-26 20:31:34,170 INFO    [dr2/2] Running addons/csi-addons/start
2023-10-26 20:31:38,516 INFO    [dr2/2] addons/csi-addons/start completed in 4.35 seconds
2023-10-26 20:31:38,516 INFO    [dr2/2] Running addons/olm/start
2023-10-26 20:31:39,736 INFO    [dr2/0] addons/cert-manager/start completed in 5.57 seconds
2023-10-26 20:31:39,736 INFO    [dr2/0] Running addons/rook-operator/start
2023-10-26 20:31:47,276 INFO    [dr2/2] addons/olm/start completed in 8.76 seconds
2023-10-26 20:31:47,276 INFO    [dr2/2] Running addons/minio/start
2023-10-26 20:31:50,351 INFO    [dr2/2] addons/minio/start completed in 3.07 seconds
2023-10-26 20:31:50,351 INFO    [dr2/2] Running addons/velero/start
2023-10-26 20:31:56,666 INFO    [dr2/2] addons/velero/start completed in 6.31 seconds
2023-10-26 20:31:56,667 INFO    [dr2/2] Running addons/velero/test
2023-10-26 20:32:14,483 INFO    [hub/1] addons/submariner/test completed in 64.51 seconds
2023-10-26 20:32:24,885 INFO    [dr1] Deployments are available in 241.69 seconds
2023-10-26 20:32:24,887 INFO    [dr1/0] Running addons/cert-manager/start
2023-10-26 20:32:24,889 INFO    [dr1/1] Running addons/ocm-cluster/start
2023-10-26 20:32:24,890 INFO    [dr1/2] Running addons/csi-addons/start
2023-10-26 20:32:27,487 INFO    [dr1/2] addons/csi-addons/start completed in 2.60 seconds
2023-10-26 20:32:27,487 INFO    [dr1/2] Running addons/olm/start
2023-10-26 20:32:30,048 INFO    [dr1/0] addons/cert-manager/start completed in 5.16 seconds
2023-10-26 20:32:30,048 INFO    [dr1/0] Running addons/rook-operator/start
2023-10-26 20:32:36,980 INFO    [dr1/2] addons/olm/start completed in 9.49 seconds
2023-10-26 20:32:36,980 INFO    [dr1/2] Running addons/minio/start
2023-10-26 20:32:38,229 INFO    [dr1/2] addons/minio/start completed in 1.25 seconds
2023-10-26 20:32:38,229 INFO    [dr1/2] Running addons/velero/start
2023-10-26 20:32:44,536 INFO    [dr1/2] addons/velero/start completed in 6.31 seconds
2023-10-26 20:32:44,536 INFO    [dr1/2] Running addons/velero/test
2023-10-26 20:32:55,782 INFO    [dr2/2] addons/velero/test completed in 59.11 seconds
2023-10-26 20:33:19,329 INFO    [dr2/0] addons/rook-operator/start completed in 99.59 seconds
2023-10-26 20:33:19,330 INFO    [dr2/0] Running addons/rook-cluster/start
2023-10-26 20:33:25,160 INFO    [dr1/2] addons/velero/test completed in 40.62 seconds
2023-10-26 20:33:38,886 INFO    [dr1/0] addons/rook-operator/start completed in 68.84 seconds
2023-10-26 20:33:38,886 INFO    [dr1/0] Running addons/rook-cluster/start
2023-10-26 20:35:27,528 INFO    [dr2/0] addons/rook-cluster/start completed in 128.20 seconds
2023-10-26 20:35:27,528 INFO    [dr2/0] Running addons/rook-pool/start
2023-10-26 20:35:28,726 INFO    [dr2/0] addons/rook-pool/start completed in 1.20 seconds
2023-10-26 20:35:28,726 INFO    [dr2/0] Running addons/rook-toolbox/start
2023-10-26 20:35:33,502 INFO    [dr2/0] addons/rook-toolbox/start completed in 4.78 seconds
2023-10-26 20:35:48,001 INFO    [dr1/0] addons/rook-cluster/start completed in 129.11 seconds
2023-10-26 20:35:48,001 INFO    [dr1/0] Running addons/rook-pool/start
2023-10-26 20:35:49,224 INFO    [dr1/0] addons/rook-pool/start completed in 1.22 seconds
2023-10-26 20:35:49,225 INFO    [dr1/0] Running addons/rook-toolbox/start
2023-10-26 20:35:52,975 INFO    [dr1/0] addons/rook-toolbox/start completed in 3.75 seconds
2023-10-26 20:36:35,807 ERROR   [dr2/1] Cluster failed
Traceback (most recent call last):
  File "/home/e2e/go/src/github.com/ramendr/ramen/test/drenv/__main__.py", line 128, in execute
    f.result()
  File "/usr/lib64/python3.10/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/usr/lib64/python3.10/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception
  File "/usr/lib64/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/e2e/go/src/github.com/ramendr/ramen/test/drenv/__main__.py", line 284, in run_worker
    run_addon(addon, worker["name"], hooks=hooks, allow_failure=allow_failure)
  File "/home/e2e/go/src/github.com/ramendr/ramen/test/drenv/__main__.py", line 300, in run_addon
    run_hook(hook, addon["args"], name, allow_failure=allow_failure)
  File "/home/e2e/go/src/github.com/ramendr/ramen/test/drenv/__main__.py", line 307, in run_hook
    run(hook, *args, name=name)
  File "/home/e2e/go/src/github.com/ramendr/ramen/test/drenv/__main__.py", line 322, in run
    for line in commands.watch(*cmd):
  File "/home/e2e/go/src/github.com/ramendr/ramen/test/drenv/commands.py", line 148, in watch
    raise Error(args, error, exitcode=p.returncode)
drenv.commands.Error: Command failed:
   command: ('addons/ocm-cluster/start', 'dr2', 'hub')
   exitcode: 1
   error:
      Traceback (most recent call last):
        File "/home/e2e/go/src/github.com/ramendr/ramen/test/addons/ocm-cluster/start", line 130, in <module>
          deploy(cluster_name, hub_name)
        File "/home/e2e/go/src/github.com/ramendr/ramen/test/addons/ocm-cluster/start", line 46, in deploy
          accept_cluster(cluster, hub)
        File "/home/e2e/go/src/github.com/ramendr/ramen/test/addons/ocm-cluster/start", line 100, in accept_cluster
          clusteradm.accept([cluster], wait=True, context=hub)
        File "/home/e2e/go/src/github.com/ramendr/ramen/test/drenv/clusteradm.py", line 73, in accept
          _watch(*cmd, log=log)
        File "/home/e2e/go/src/github.com/ramendr/ramen/test/drenv/clusteradm.py", line 95, in _watch
          for line in commands.watch(*cmd):
        File "/home/e2e/go/src/github.com/ramendr/ramen/test/drenv/commands.py", line 148, in watch
          raise Error(args, error, exitcode=p.returncode)
      drenv.commands.Error: Command failed:
         command: ('clusteradm', 'accept', '--clusters', 'dr2', '--wait', '--context', 'hub')
         exitcode: 1
         error:
            Error: timed out waiting for the condition

2023-10-26 20:37:28,124 ERROR   [dr1/1] Cluster failed
Traceback (most recent call last):
  File "/home/e2e/go/src/github.com/ramendr/ramen/test/drenv/__main__.py", line 128, in execute
    f.result()
  File "/usr/lib64/python3.10/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/usr/lib64/python3.10/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception
  File "/usr/lib64/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/e2e/go/src/github.com/ramendr/ramen/test/drenv/__main__.py", line 284, in run_worker
    run_addon(addon, worker["name"], hooks=hooks, allow_failure=allow_failure)
  File "/home/e2e/go/src/github.com/ramendr/ramen/test/drenv/__main__.py", line 300, in run_addon
    run_hook(hook, addon["args"], name, allow_failure=allow_failure)
  File "/home/e2e/go/src/github.com/ramendr/ramen/test/drenv/__main__.py", line 307, in run_hook
    run(hook, *args, name=name)
  File "/home/e2e/go/src/github.com/ramendr/ramen/test/drenv/__main__.py", line 322, in run
    for line in commands.watch(*cmd):
  File "/home/e2e/go/src/github.com/ramendr/ramen/test/drenv/commands.py", line 148, in watch
    raise Error(args, error, exitcode=p.returncode)
drenv.commands.Error: Command failed:
   command: ('addons/ocm-cluster/start', 'dr1', 'hub')
   exitcode: 1
   error:
      Traceback (most recent call last):
        File "/home/e2e/go/src/github.com/ramendr/ramen/test/addons/ocm-cluster/start", line 130, in <module>
          deploy(cluster_name, hub_name)
        File "/home/e2e/go/src/github.com/ramendr/ramen/test/addons/ocm-cluster/start", line 46, in deploy
          accept_cluster(cluster, hub)
        File "/home/e2e/go/src/github.com/ramendr/ramen/test/addons/ocm-cluster/start", line 100, in accept_cluster
          clusteradm.accept([cluster], wait=True, context=hub)
        File "/home/e2e/go/src/github.com/ramendr/ramen/test/drenv/clusteradm.py", line 73, in accept
          _watch(*cmd, log=log)
        File "/home/e2e/go/src/github.com/ramendr/ramen/test/drenv/clusteradm.py", line 95, in _watch
          for line in commands.watch(*cmd):
        File "/home/e2e/go/src/github.com/ramendr/ramen/test/drenv/commands.py", line 148, in watch
          raise Error(args, error, exitcode=p.returncode)
      drenv.commands.Error: Command failed:
         command: ('clusteradm', 'accept', '--clusters', 'dr1', '--wait', '--context', 'hub')
         exitcode: 1
         error:
            Error: timed out waiting for the condition

(ramen) [e2e@localhost test]$ 
(ramen) [e2e@localhost test]$ drenv start envs/regional-dr.yaml 
2023-10-26 20:39:08,319 INFO    [rdr] Starting environment
2023-10-26 20:39:10,045 INFO    [dr1] Starting minikube cluster
2023-10-26 20:39:10,318 INFO    [dr2] Starting minikube cluster
2023-10-26 20:39:11,353 INFO    [hub] Starting minikube cluster
2023-10-26 20:39:22,798 INFO    [hub] Cluster started in 11.44 seconds
2023-10-26 20:39:22,798 INFO    [hub] Waiting until all deployments are available
2023-10-26 20:39:32,147 INFO    [dr2] Cluster started in 21.83 seconds
2023-10-26 20:39:32,147 INFO    [dr2] Waiting until all deployments are available
2023-10-26 20:39:32,234 INFO    [dr1] Cluster started in 22.19 seconds
2023-10-26 20:39:32,234 INFO    [dr1] Waiting until all deployments are available
2023-10-26 20:39:55,184 INFO    [hub] Deployments are available in 32.39 seconds
2023-10-26 20:39:55,185 INFO    [hub/0] Running addons/ocm-hub/start
2023-10-26 20:39:55,186 INFO    [hub/1] Running addons/submariner/start
2023-10-26 20:39:58,491 INFO    [hub/0] addons/ocm-hub/start completed in 3.31 seconds
2023-10-26 20:39:58,491 INFO    [hub/0] Running addons/ocm-controller/start
2023-10-26 20:40:06,983 INFO    [dr2] Deployments are available in 34.84 seconds
2023-10-26 20:40:06,983 INFO    [dr2/0] Running addons/cert-manager/start
2023-10-26 20:40:06,984 INFO    [dr2/1] Running addons/ocm-cluster/start
2023-10-26 20:40:06,985 INFO    [dr2/2] Running addons/csi-addons/start
2023-10-26 20:40:07,312 INFO    [dr1] Deployments are available in 35.08 seconds
2023-10-26 20:40:07,313 INFO    [dr1/0] Running addons/cert-manager/start
2023-10-26 20:40:07,316 INFO    [dr1/2] Running addons/csi-addons/start
2023-10-26 20:40:07,313 INFO    [dr1/1] Running addons/ocm-cluster/start
2023-10-26 20:40:09,601 INFO    [dr1/2] addons/csi-addons/start completed in 2.29 seconds
2023-10-26 20:40:09,601 INFO    [dr1/2] Running addons/olm/start
2023-10-26 20:40:09,816 INFO    [dr2/2] addons/csi-addons/start completed in 2.83 seconds
2023-10-26 20:40:09,816 INFO    [dr2/2] Running addons/olm/start
2023-10-26 20:40:10,891 INFO    [dr2/0] addons/cert-manager/start completed in 3.91 seconds
2023-10-26 20:40:10,891 INFO    [dr2/0] Running addons/rook-operator/start
2023-10-26 20:40:11,095 INFO    [dr1/0] addons/cert-manager/start completed in 3.78 seconds
2023-10-26 20:40:11,096 INFO    [dr1/0] Running addons/rook-operator/start
2023-10-26 20:40:17,256 INFO    [dr1/0] addons/rook-operator/start completed in 6.16 seconds
2023-10-26 20:40:17,256 INFO    [dr1/0] Running addons/rook-cluster/start
2023-10-26 20:40:17,275 INFO    [dr2/0] addons/rook-operator/start completed in 6.38 seconds
2023-10-26 20:40:17,275 INFO    [dr2/0] Running addons/rook-cluster/start
2023-10-26 20:40:18,984 INFO    [dr2/0] addons/rook-cluster/start completed in 1.71 seconds
2023-10-26 20:40:18,984 INFO    [dr2/0] Running addons/rook-pool/start
2023-10-26 20:40:19,004 INFO    [dr1/0] addons/rook-cluster/start completed in 1.75 seconds
2023-10-26 20:40:19,004 INFO    [dr1/0] Running addons/rook-pool/start
2023-10-26 20:40:19,020 INFO    [dr1/2] addons/olm/start completed in 9.42 seconds
2023-10-26 20:40:19,020 INFO    [dr1/2] Running addons/minio/start
2023-10-26 20:40:19,148 INFO    [dr2/2] addons/olm/start completed in 9.33 seconds
2023-10-26 20:40:19,148 INFO    [dr2/2] Running addons/minio/start
2023-10-26 20:40:20,327 INFO    [dr2/2] addons/minio/start completed in 1.18 seconds
2023-10-26 20:40:20,328 INFO    [dr2/2] Running addons/velero/start
2023-10-26 20:40:20,338 INFO    [dr1/2] addons/minio/start completed in 1.32 seconds
2023-10-26 20:40:20,338 INFO    [dr1/2] Running addons/velero/start
2023-10-26 20:40:20,502 INFO    [dr1/0] addons/rook-pool/start completed in 1.50 seconds
2023-10-26 20:40:20,502 INFO    [dr1/0] Running addons/rook-toolbox/start
2023-10-26 20:40:20,519 INFO    [dr2/0] addons/rook-pool/start completed in 1.53 seconds
2023-10-26 20:40:20,519 INFO    [dr2/0] Running addons/rook-toolbox/start
2023-10-26 20:40:21,392 INFO    [dr2/0] addons/rook-toolbox/start completed in 0.87 seconds
2023-10-26 20:40:21,432 INFO    [dr1/0] addons/rook-toolbox/start completed in 0.93 seconds
2023-10-26 20:40:26,572 INFO    [dr1/2] addons/velero/start completed in 6.23 seconds
2023-10-26 20:40:26,572 INFO    [dr1/2] Running addons/velero/test
2023-10-26 20:40:26,648 INFO    [dr2/2] addons/velero/start completed in 6.32 seconds
2023-10-26 20:40:26,648 INFO    [dr2/2] Running addons/velero/test
2023-10-26 20:40:32,214 INFO    [hub/0] addons/ocm-controller/start completed in 33.72 seconds
2023-10-26 20:40:32,214 INFO    [hub/0] Running addons/cert-manager/start
2023-10-26 20:40:35,311 INFO    [hub/0] addons/cert-manager/start completed in 3.10 seconds
2023-10-26 20:40:35,311 INFO    [hub/0] Running addons/olm/start
2023-10-26 20:40:42,698 INFO    [hub/0] addons/olm/start completed in 7.39 seconds
2023-10-26 20:41:02,706 INFO    [dr2/2] addons/velero/test completed in 36.06 seconds
2023-10-26 20:41:04,213 INFO    [dr1/2] addons/velero/test completed in 37.64 seconds
2023-10-26 20:41:10,391 INFO    [hub/1] addons/submariner/start completed in 75.21 seconds
2023-10-26 20:41:10,392 INFO    [hub/1] Running addons/submariner/test
2023-10-26 20:42:26,112 INFO    [hub/1] addons/submariner/test completed in 75.72 seconds
2023-10-26 20:45:09,118 ERROR   [dr2/1] Cluster failed
Traceback (most recent call last):
  File "/home/e2e/go/src/github.com/ramendr/ramen/test/drenv/__main__.py", line 128, in execute
    f.result()
  File "/usr/lib64/python3.10/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/usr/lib64/python3.10/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception
  File "/usr/lib64/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/e2e/go/src/github.com/ramendr/ramen/test/drenv/__main__.py", line 284, in run_worker
    run_addon(addon, worker["name"], hooks=hooks, allow_failure=allow_failure)
  File "/home/e2e/go/src/github.com/ramendr/ramen/test/drenv/__main__.py", line 300, in run_addon
    run_hook(hook, addon["args"], name, allow_failure=allow_failure)
  File "/home/e2e/go/src/github.com/ramendr/ramen/test/drenv/__main__.py", line 307, in run_hook
    run(hook, *args, name=name)
  File "/home/e2e/go/src/github.com/ramendr/ramen/test/drenv/__main__.py", line 322, in run
    for line in commands.watch(*cmd):
  File "/home/e2e/go/src/github.com/ramendr/ramen/test/drenv/commands.py", line 148, in watch
    raise Error(args, error, exitcode=p.returncode)
drenv.commands.Error: Command failed:
   command: ('addons/ocm-cluster/start', 'dr2', 'hub')
   exitcode: 1
   error:
      Traceback (most recent call last):
        File "/home/e2e/go/src/github.com/ramendr/ramen/test/addons/ocm-cluster/start", line 130, in <module>
          deploy(cluster_name, hub_name)
        File "/home/e2e/go/src/github.com/ramendr/ramen/test/addons/ocm-cluster/start", line 46, in deploy
          accept_cluster(cluster, hub)
        File "/home/e2e/go/src/github.com/ramendr/ramen/test/addons/ocm-cluster/start", line 100, in accept_cluster
          clusteradm.accept([cluster], wait=True, context=hub)
        File "/home/e2e/go/src/github.com/ramendr/ramen/test/drenv/clusteradm.py", line 73, in accept
          _watch(*cmd, log=log)
        File "/home/e2e/go/src/github.com/ramendr/ramen/test/drenv/clusteradm.py", line 95, in _watch
          for line in commands.watch(*cmd):
        File "/home/e2e/go/src/github.com/ramendr/ramen/test/drenv/commands.py", line 148, in watch
          raise Error(args, error, exitcode=p.returncode)
      drenv.commands.Error: Command failed:
         command: ('clusteradm', 'accept', '--clusters', 'dr2', '--wait', '--context', 'hub')
         exitcode: 1
         error:
            Error: timed out waiting for the condition

2023-10-26 20:45:09,505 ERROR   [dr1/1] Cluster failed
Traceback (most recent call last):
  File "/home/e2e/go/src/github.com/ramendr/ramen/test/drenv/__main__.py", line 128, in execute
    f.result()
  File "/usr/lib64/python3.10/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/usr/lib64/python3.10/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception
  File "/usr/lib64/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/e2e/go/src/github.com/ramendr/ramen/test/drenv/__main__.py", line 284, in run_worker
    run_addon(addon, worker["name"], hooks=hooks, allow_failure=allow_failure)
  File "/home/e2e/go/src/github.com/ramendr/ramen/test/drenv/__main__.py", line 300, in run_addon
    run_hook(hook, addon["args"], name, allow_failure=allow_failure)
  File "/home/e2e/go/src/github.com/ramendr/ramen/test/drenv/__main__.py", line 307, in run_hook
    run(hook, *args, name=name)
  File "/home/e2e/go/src/github.com/ramendr/ramen/test/drenv/__main__.py", line 322, in run
    for line in commands.watch(*cmd):
  File "/home/e2e/go/src/github.com/ramendr/ramen/test/drenv/commands.py", line 148, in watch
    raise Error(args, error, exitcode=p.returncode)
drenv.commands.Error: Command failed:
   command: ('addons/ocm-cluster/start', 'dr1', 'hub')
   exitcode: 1
   error:
      Traceback (most recent call last):
        File "/home/e2e/go/src/github.com/ramendr/ramen/test/addons/ocm-cluster/start", line 130, in <module>
          deploy(cluster_name, hub_name)
        File "/home/e2e/go/src/github.com/ramendr/ramen/test/addons/ocm-cluster/start", line 46, in deploy
          accept_cluster(cluster, hub)
        File "/home/e2e/go/src/github.com/ramendr/ramen/test/addons/ocm-cluster/start", line 100, in accept_cluster
          clusteradm.accept([cluster], wait=True, context=hub)
        File "/home/e2e/go/src/github.com/ramendr/ramen/test/drenv/clusteradm.py", line 73, in accept
          _watch(*cmd, log=log)
        File "/home/e2e/go/src/github.com/ramendr/ramen/test/drenv/clusteradm.py", line 95, in _watch
          for line in commands.watch(*cmd):
        File "/home/e2e/go/src/github.com/ramendr/ramen/test/drenv/commands.py", line 148, in watch
          raise Error(args, error, exitcode=p.returncode)
      drenv.commands.Error: Command failed:
         command: ('clusteradm', 'accept', '--clusters', 'dr1', '--wait', '--context', 'hub')
         exitcode: 1
         error:
            Error: timed out waiting for the condition

(ramen) [e2e@localhost test]$ 

@ShyamsundarR
Member

I "think" I hit this in the case where I started a cluster, stopped it and started it again. At this point it fails to start with the following logs (tried to (re)start twice, hence 2 log outputs below). The trick could be to stop and start the env to see if the problem is reproducible:

I used the changes from this PR to move forward, and they resolved the issue (not a review, just a "works" comment).

@nirs
Member

nirs commented Nov 19, 2023

I finally reproduced this when trying to start a stopped environment.

When running the ocm-cluster/start script manually we see:

$ addons/ocm-cluster/start dr1 hub
Waiting until cluster 'hub' is ready
Cluster 'hub' is ready
'namespace/open-cluster-management-hub' output='jsonpath={.metadata.name}' found in 0.05 seconds
'deploy/cluster-manager-placement-controller' output='jsonpath={.metadata.name}' found in 0.05 seconds
deployment "cluster-manager-placement-controller" successfully rolled out
'deploy/cluster-manager-registration-controller' output='jsonpath={.metadata.name}' found in 0.06 seconds
deployment "cluster-manager-registration-controller" successfully rolled out
'deploy/cluster-manager-registration-webhook' output='jsonpath={.metadata.name}' found in 0.05 seconds
deployment "cluster-manager-registration-webhook" successfully rolled out
'deploy/cluster-manager-work-webhook' output='jsonpath={.metadata.name}' found in 0.05 seconds
deployment "cluster-manager-work-webhook" successfully rolled out
Joining cluster 'hub'
Please log onto the hub cluster and run the following command:

    clusteradm accept --clusters dr1

Accepting cluster
no CSR to approve for cluster dr1
hubAcceptsClient already set for managed cluster dr1

 Your managed cluster dr1 has joined the Hub successfully. Visit https://open-cluster-management.io/scenarios or https://github.com/open-cluster-management-io/OCM/tree/main/solutions for next steps.
no CSR to approve for cluster dr1
hubAcceptsClient already set for managed cluster dr1

 Your managed cluster dr1 has joined the Hub successfully. Visit https://open-cluster-management.io/scenarios or https://github.com/open-cluster-management-io/OCM/tree/main/solutions for next steps.
no CSR to approve for cluster dr1
hubAcceptsClient already set for managed cluster dr1

 Your managed cluster dr1 has joined the Hub successfully. Visit https://open-cluster-management.io/scenarios or https://github.com/open-cluster-management-io/OCM/tree/main/solutions for next steps.
no CSR to approve for cluster dr1
hubAcceptsClient already set for managed cluster dr1

This should be fixed in clusteradm, but we can do a temporary fix in drenv for now.


managed_clusters = clusteradm.get("clusters", output="yaml", context=hub)

managed_clusters = managed_clusters.split("---")
Member

Why not parse all documents using yaml.safe_load_all()?
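
A minimal sketch of that suggestion, assuming the command output is a multi-document YAML stream in which each document carries metadata.name and spec.hubAcceptsClient. The sample text and the accepted_clusters() name are hypothetical and not taken from the PR; the real output of the command is not shown in this thread.

```python
import yaml

# Hypothetical sample of multi-document YAML (an assumption for illustration).
SAMPLE = """\
metadata:
  name: dr1
spec:
  hubAcceptsClient: true
---
metadata:
  name: dr2
spec:
  hubAcceptsClient: false
"""


def accepted_clusters(text):
    """Return the names of clusters whose spec.hubAcceptsClient is true."""
    names = []
    # safe_load_all() yields one parsed object per YAML document, so there
    # is no need to split the text on "---" by hand.
    for doc in yaml.safe_load_all(text):
        # An empty document (for example after a bare "---") loads as None.
        if not doc:
            continue
        if doc.get("spec", {}).get("hubAcceptsClient"):
            names.append(doc["metadata"]["name"])
    return names


print(accepted_clusters(SAMPLE))
```

Compared with splitting on "---", this also copes with a "---" that happens to appear inside a string value, since the YAML parser decides where documents end.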

@@ -6,6 +6,7 @@
import json
import os
import sys
import yaml
Member

yaml is a third-party library, so we separate it from the standard library imports. See how it is imported in other files.


if metadata_name == cluster and hub_accepts_client:
    print(f"Cluster '{cluster}' already accepted")
    return
Member

Since this is a temporary hack until this is fixed in clusteradm, it would be better to put this code in a helper such as is_accepted(), something like:

# TODO: Remove when {link to clusteradm issue} is fixed.
if is_accepted(cluster, hub):
    return
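
One possible shape for such a helper, as a sketch only. It assumes the check can be done by reading spec.hubAcceptsClient of the ManagedCluster on the hub with kubectl; the PR itself used clusteradm output instead. The function names, the `query` parameter, and the `accept` callback are hypothetical and exist only to keep the example self-contained and testable without a cluster.

```python
import subprocess


def _kubectl_hub_accepts_client(cluster, hub):
    """Read spec.hubAcceptsClient of a ManagedCluster from the hub (assumed approach)."""
    cmd = [
        "kubectl", "get", "managedcluster", cluster,
        "--context", hub,
        "--output", "jsonpath={.spec.hubAcceptsClient}",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True)
    # kubectl exits non-zero if the ManagedCluster does not exist yet;
    # treat that as "not accepted".
    return result.stdout if result.returncode == 0 else ""


def is_accepted(cluster, hub, query=_kubectl_hub_accepts_client):
    """Return True if the hub already accepted this managed cluster."""
    return query(cluster, hub).strip() == "true"


def accept_cluster(cluster, hub, accept, query=_kubectl_hub_accepts_client):
    # TODO: Remove when the clusteradm issue is fixed.
    if is_accepted(cluster, hub, query=query):
        print(f"Cluster '{cluster}' already accepted")
        return
    # `accept` stands in for the real call that runs "clusteradm accept".
    accept(cluster, hub)
```

Keeping the workaround in a single helper means that removing it later is a small change confined to one function and one call site.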

@nirs
Member

nirs commented Nov 22, 2023

I reported the clusteradm issue here: open-cluster-management-io/clusteradm#395

@nirs
Member

nirs commented Nov 23, 2023

Mike Ng suggested a simpler way to fix this issue, see #1148

@nirs
Member

nirs commented Jan 21, 2024

This is already fixed in #1148, we can close this PR.

@raghavendra-talur raghavendra-talur deleted the rtalur-drenv-hub-accept branch January 30, 2024 10:36