Verify application pods not scheduled on master nodes #42
Comments
I don't think this requires a new option in the config, but it is OpenShift-specific and not applicable to generic Kubernetes. We need to decide whether a subset of Cerberus should be OpenShift-specific before implementing this.
@mffiedler I'm thinking that I should run "oc describe node/" on each of the master nodes and verify that the Taints section lists node-role.kubernetes.io/master:NoSchedule every time. Looking at the Bugzilla link, it looked like the Taints section when they ran into the bug was . Is this roughly what you were thinking? Do you know of an easier way to do that?
Checking for the taint would be correct. Bigger picture, I think we need an option for "verbose health checks" that would enable a selection of smaller, optional, nice-to-have checks. There are a lot of details about a cluster that can be checked to vet cluster health, but our current set of checks already takes 30+ seconds to run on all openshift-* namespaces. This likely belongs in a separate issue, but we might want to tackle that before this. I will open one and we can discuss there. /cc: @yashashreesuresh @chaitanyaenr
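The taint check discussed above could be sketched roughly as follows. This is a minimal illustration, not the code that ended up in Cerberus: the function name `schedulable_masters` and the `SAMPLE_NODES` data are hypothetical, and the sketch assumes the input has the shape of `oc get nodes -o json` output and that master nodes carry the node-role.kubernetes.io/master label.

```python
# Assumption: master nodes carry this key both as a label and as the taint key.
MASTER_ROLE_KEY = "node-role.kubernetes.io/master"


def schedulable_masters(node_list):
    """Return the names of master nodes that lack the NoSchedule taint.

    node_list is a dict shaped like the output of `oc get nodes -o json`.
    """
    offenders = []
    for node in node_list.get("items", []):
        metadata = node.get("metadata", {})
        # Skip anything that is not labelled as a master node.
        if MASTER_ROLE_KEY not in metadata.get("labels", {}):
            continue
        # spec.taints may be missing entirely when a node has no taints.
        taints = node.get("spec", {}).get("taints") or []
        has_taint = any(
            t.get("key") == MASTER_ROLE_KEY and t.get("effect") == "NoSchedule"
            for t in taints
        )
        if not has_taint:
            offenders.append(metadata.get("name", "<unknown>"))
    return offenders


# Hypothetical sample data: one healthy master, one untainted master, one worker.
SAMPLE_NODES = {
    "items": [
        {
            "metadata": {"name": "master-0", "labels": {MASTER_ROLE_KEY: ""}},
            "spec": {"taints": [{"key": MASTER_ROLE_KEY, "effect": "NoSchedule"}]},
        },
        {
            "metadata": {"name": "master-1", "labels": {MASTER_ROLE_KEY: ""}},
            "spec": {},
        },
        {
            "metadata": {"name": "worker-0", "labels": {}},
            "spec": {},
        },
    ]
}

if __name__ == "__main__":
    print(schedulable_masters(SAMPLE_NODES))
```

Working from the JSON form of the node list avoids parsing the human-readable "oc describe" output, and a single query covers all master nodes at once.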
See issue #53.
I have created branch schedMaster for this issue. For now, I have added the master node schedulability check as an always-run check that is not controlled by the config.
@mffiedler Do you think this does not require an additional check in the config?
I don't see a need for a config check here. This should be part of the standard monitor/check loop.
Fixed by #57. Thanks @paigerube14.
This commit:
- Adds a high-level config option called distribution so that operations specific to OpenShift can run in addition to the generic Kubernetes ones, as mentioned in krkn-chaos#42. One example is the inspect_component mode, which lets the user collect logs/events related to the failed component using the oc inspect command.
- Updates the readme with info about setting the namespaces to monitor in the config depending on the distribution, since the defaults assume OpenShift. It also adds blogs and other useful resources related to Cerberus.
This might be pushing the scope of what Cerberus is intended for, but an easy "health check" would be making sure master nodes are not schedulable for application workloads. See https://bugzilla.redhat.com/show_bug.cgi?id=1827996