Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[operator] bump operator base image to 1.36.0 #857

Merged
merged 2 commits into from
Dec 23, 2024

Conversation

jmazzitelli
Copy link
Contributor

@jmazzitelli jmazzitelli commented Dec 20, 2024

It is easier now to bump to the upstream image because the upstream image doesn't have collections; our Dockerfile installs the collections we want and that match downstream.

To test, first you need to check out this PR and the associated helm chart PR 304. Then run this set of commands to install minikube with test infrastructure, build/push/deploy the dev images to the cluster, and run the molecule tests:

./hack/k8s-minikube.sh --dex-enabled true -mp ci start \
&& ./hack/k8s-minikube.sh -mp ci istio \
&& make CLUSTER_TYPE=minikube MINIKUBE_PROFILE=ci HELM_CHARTS_REPO_PULL=false build build-ui cluster-push \
&& ./hack/run-molecule-tests.sh --helm-charts-repo-pull false --client-exe "$(which kubectl)" --minikube-exe $(which minikube) --minikube-profile ci --cluster-type minikube -udi true

NOTE! If you do not want to wait for the entire test suite to pass (it will take about an hour), you can pass in the operation -at <tests> to that last command ./hack/run-molecule-tests.sh to just run a few tests. For example, a good set of tests to exercise the operator will be -at "accessible-namespaces-test metrics-test roles-tests"

An important test to look at is "metrics-test" - that originally failed because ansible operator base image 1.36 changed its metrics port default value from the previous 1.35 (for details, see the PR comments below). The operator and helm chart PRs addresses this. So the metrics-test should pass now.

NOTE 2! If you don't even want to run the molecule tests, just install the operator and kiali server via make cluster-push operator-create kiali-create and if kiali server installs successfully, then things are working. You could also see that the operator's metrics endpoint is still at 8080 by running this command and getting prometheus metrics successfully: kubectl exec -it -n kiali-operator deployments/kiali-operator -- curl http://localhost:8080/metrics

@jmazzitelli jmazzitelli self-assigned this Dec 20, 2024
@jmazzitelli
Copy link
Contributor Author

On minikube, all molecule tests pass EXCEPT for metrics-test. I'll investigate why that fails. But I will next run on OpenShift/CRC to see how the molecule tests run on that platform.

@jmazzitelli
Copy link
Contributor Author

jmazzitelli commented Dec 20, 2024

metrics-test also failed on OpenShift (all other molecule tests passed).

So this PR is good except for the operator metrics testing, which we need to figure out what changed.

@jmazzitelli
Copy link
Contributor Author

TASK [Query Prometheus from Kiali Operator Pod] ********************************
fatal: [localhost]: FAILED! => {"changed": true, "rc": 3, "return_code": 3, "stderr": "", "stderr_lines": [], "stdout":
 "{\"status\":\"success\",\"data\":{\"resultType\":\"vector\",\"result\":[]}}", "stdout_lines": ["{\"status\":\"success\",\
"data\":{\"resultType\":\"vector\",\"result\":[]}}"]}
...ignoring

TASK [Raw Prometheus query results] ********************************************
ok: [localhost] => {
    "msg": "URL: http://prometheus.istio-system:9090/api/v1/query }}; Request: {'query': 'workqueue_work_duration_secon
ds_count{app=\\\\\"kiali-operator\\\\\",namespace=\\\\\"kiali-operator\\\\\"}', 'time': '2024-12-20T19:12:47Z'}; Results: {
'changed': True, 'stdout': '{\"status\":\"success\",\"data\":{\"resultType\":\"vector\",\"result\":[]}}', 'stderr': '', 'rc
': 3, 'return_code': 3, 'stdout_lines': ['{\"status\":\"success\",\"data\":{\"resultType\":\"vector\",\"result\":[]}}'], 's
tderr_lines': [], 'failed': True}"
}

TASK [set_fact] ****************************************************************
ok: [localhost]

TASK [fail] ********************************************************************
skipping: [localhost]

TASK [assert] ******************************************************************
fatal: [localhost]: FAILED! => {
    "assertion": "prometheus_query_results.json.data.result | length > 0",
    "changed": false,
    "evaluated_to": false,
    "msg": "Assertion failed"
}

PLAY RECAP *********************************************************************
localhost                  : ok=23   changed=1    unreachable=0    failed=1    skipped=5   

@jmazzitelli
Copy link
Contributor Author

Yeah, something is different with ansible operator base image 1.36.0... the metric endpoint changed.

When on 1.35, the operator has a metrics endpoint that can be reached. But on 1.36.0 (without changing any config), that endpoint fails. I didn't see anything in the 1.36.0 release notes that talk about the metrics endpoint going away.

With 1.36.0 as the base image (error!):

$ kubectl exec -it -n kiali-operator deployments/kiali-operator -- curl http://localhost:8080/metrics
curl: (7) Failed to connect to localhost port 8080: Connection refused
command terminated with exit code 7

With 1.35.0 as the base image (successfully see metrics):

$ kubectl exec -it -n kiali-operator deployments/kiali-operator -- curl http://localhost:8080/metrics
# HELP ansible_operator_build_info Build information for the ansible-operator binary
# TYPE ansible_operator_build_info gauge
ansible_operator_build_info{commit="42b5d80c75f1ddda8f2dbe1629b9454d366a8d6a",version="v1.35.0"} 1
# HELP certwatcher_read_certificate_errors_total Total number of certificate read errors
# TYPE certwatcher_read_certificate_errors_total counter
certwatcher_read_certificate_errors_total 0
....

@jmazzitelli
Copy link
Contributor Author

The problem is --metrics-bind-address default changed from 8080 to 8443. We need to pass that in explicitly to the operator startup command and set it back to 8080.

@jmazzitelli
Copy link
Contributor Author

jmazzitelli commented Dec 20, 2024

we need to make the change to the cmd line (to add --metrics-bind-address=:8080) to the OLM metadata for the upstream. We will do that after the next sprint release is performed because this change won't go in until Kiali 2.4. After the sprint release is done, we'll want to add this to the metadata file: manifests/kiali-upstream/2.4.0/manifests/kiali.v2.4.0.clusterserviceversion.yaml

@jmazzitelli
Copy link
Contributor Author

With the change in the operator command line (--metrics-bind-address=:8080) the metrics-test molecule test now passes.

@@ -261,6 +261,7 @@ spec:
- "--leader-election-id=kiali-operator"
- "--watches-file=./$(WATCHES_FILE)"
- "--health-probe-bind-address=:6789"
- "--metrics-bind-address=:8080"
Copy link
Contributor Author

@jmazzitelli jmazzitelli Dec 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NOTE: it is OK changing this for the OSSM OLM metadata because this option exists in earlier versions, too. This includes the base image used downstream, as you see verified below:

4.15:

$ podman run  -it --rm registry.redhat.io/openshift4/ose-ansible-operator:v4.15 --help | grep metrics-bind-address
      --metrics-bind-address string    The address the metric endpoint binds to (default ":8080")

4.17:

$ podman run  -it --rm registry.redhat.io/openshift4/ose-ansible-operator:v4.17 --help | grep metrics-bind-address
      --metrics-bind-address string    The address the metric endpoint binds to (default ":8080")

But notice the default of these older versions is 8080, which is what we did not expect to change but did in newer versions of the upstream base image:

$ podman run -it --rm quay.io/operator-framework/ansible-operator:v1.35.0 --help | grep metrics-bind-address
      --metrics-bind-address string    The address the metric endpoint binds to (default ":8080")

$ podman run -it --rm quay.io/operator-framework/ansible-operator:v1.36.0 --help | grep metrics-bind-address
      --metrics-bind-address string    The address the metric endpoint binds to (default ":8443")

@jmazzitelli jmazzitelli marked this pull request as ready for review December 23, 2024 13:41
It is easier now to bump to the upstream image because the upstream image doesn't have collections; our Dockerfile installs the collections we want and that match downstream.
@jmazzitelli
Copy link
Contributor Author

jmazzitelli commented Dec 23, 2024

we need to make the change to the cmd line (to add --metrics-bind-address=:8080) to the OLM metadata for the upstream. We will do that after the next sprint release is performed because this change won't go in until Kiali 2.4. After the sprint release is done, we'll want to add this to the metadata file: manifests/kiali-upstream/2.4.0/manifests/kiali.v2.4.0.clusterserviceversion.yaml

This PR has this done now. Ready for review and merge. Test instructions are found in this PR's description.

@jmazzitelli
Copy link
Contributor Author

Molecule tests all pass:

=====================
=== TEST RESULTS: ===
=====================

              accessible-namespaces-test... success [7m 48s]
     affinity-tolerations-resources-test... success [4m 22s]
                cluster-wide-access-test... success [8m 51s]
                      config-values-test... success [2m 51s]
                  default-namespace-test... success [2m 37s]
                            grafana-test... success [3m 45s]
                        header-auth-test... success [2m 9s]
                      instance-name-test... success [3m 8s]
                             jaeger-test... success [3m 4s]
                            metrics-test... success [4m 12s]
                     null-cr-values-test... success [2m 16s]
                only-view-only-mode-test... success [3m 2s]
                             openid-test... success [2m 14s]
                   os-console-links-test... skipped
          ossmconsole-config-values-test... skipped
           remote-cluster-resources-test... success [3m 20s]
                              roles-test... success [5m 16s]
                    rolling-restart-test... success [2m 55s]
                              token-test... success [2m 48s]

@jmazzitelli jmazzitelli merged commit 1acbf39 into kiali:master Dec 23, 2024
1 check passed
@jmazzitelli jmazzitelli deleted the update-base-image branch December 23, 2024 18:00
@jshaughn jshaughn added the test: n/a PR does not need test additions or updates label Jan 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
requires helm chart PR test: n/a PR does not need test additions or updates
Projects
Development

Successfully merging this pull request may close these issues.

3 participants