Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OCPBUGS-48177: Exclude etcd readiness checks from /readyz to ignore temporary etcd hiccups #753

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

ingvagabund
Copy link
Member

Explicitly exclude etcd and etcd-readiness checks (OCPBUGS-48177) and have etcd operator take responsibility for properly reporting etcd readiness. Justification: kube-apiserver instances get removed from a load balancer when etcd starts to report not ready (as will KA's /readyz). Client connections can withstand etcd unreadiness longer than the readiness timeout is. Thus, it is not necessary to drop connections in case etcd resumes its readiness before a client connection times out naturally.

Follow up for openshift/kubernetes#2174

@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. labels Jan 20, 2025
@openshift-ci-robot
Copy link
Contributor

@ingvagabund: This pull request references Jira Issue OCPBUGS-48177, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.19.0) matches configured target version for branch (4.19.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @wangke19

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

Explicitly exclude etcd and etcd-readiness checks (OCPBUGS-48177) and have etcd operator take responsibility for properly reporting etcd readiness. Justification: kube-apiserver instances get removed from a load balancer when etcd starts to report not ready (as will KA's /readyz). Client connections can withstand etcd unreadiness longer than the readiness timeout is. Thus, it is not necessary to drop connections in case etcd resumes its readiness before a client connection times out naturally.

Follow up for openshift/kubernetes#2174

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot requested review from wangke19, ibihim and liouk January 20, 2025 13:24
Copy link
Contributor

openshift-ci bot commented Jan 20, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ingvagabund
Once this PR has been reviewed and has the lgtm label, please assign deads2k for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ingvagabund ingvagabund force-pushed the exclude-etcd-from-readyz branch 2 times, most recently from 8cc0e59 to 52172a0 Compare January 20, 2025 14:32
…iccups

Explicitly exclude etcd and etcd-readiness checks (OCPBUGS-48177)
and have etcd operator take responsibility for properly reporting etcd readiness.
Justification: kube-apiserver instances get removed from a load balancer when etcd starts
to report not ready (as will KA's /readyz). Client connections can withstand etcd unreadiness
longer than the readiness timeout is. Thus, it is not necessary to drop connections
in case etcd resumes its readiness before a client connection times out naturally.
@ingvagabund ingvagabund force-pushed the exclude-etcd-from-readyz branch from 52172a0 to 304071a Compare January 20, 2025 15:09
Copy link
Contributor

openshift-ci bot commented Jan 20, 2025

@ingvagabund: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-aws-single-node 304071a link false /test e2e-aws-single-node
ci/prow/test-operator-integration 304071a link false /test test-operator-integration

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants