Verify application pods not scheduled on master nodes #42

mffiedler · 2020-04-27T13:24:47Z

This might be pushing the scope of what cerberus is intended for, but an easy "health check" would be making sure master nodes are not schedulable for application workloads. See https://bugzilla.redhat.com/show_bug.cgi?id=1827996

mffiedler · 2020-05-11T14:36:47Z

I don't think this requires a new option in the config, but it is OpenShift specific - not applicable to generic Kubernetes. We need to make a decision on having a subset of cerberus be OpenShift specific before implementing this.

paigerube14 · 2020-05-14T18:37:34Z

@mffiedler I'm thinking that I should run "oc describe node/" on each master node and verify that the Taints section contains node-role.kubernetes.io/master:NoSchedule every time. Looking at the bugzilla link it looked like the Taints section when they ran into the bug was .

Is this around what you were thinking? Do you know of an easier way to do that?

mffiedler · 2020-05-14T19:13:14Z

Checking for the taint would be correct.
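A minimal sketch of such a taint check, assuming the node list is obtained with `oc get nodes -l node-role.kubernetes.io/master -o json`. The function names and the sample data are hypothetical illustrations, not code from cerberus:

```python
import json
import subprocess

MASTER_TAINT_KEY = "node-role.kubernetes.io/master"


def masters_missing_noschedule(node_list):
    """Return the names of nodes that lack the master NoSchedule taint.

    node_list is the parsed JSON of a Kubernetes NodeList.
    """
    missing = []
    for node in node_list.get("items", []):
        # spec.taints is absent entirely when a node carries no taints.
        taints = node.get("spec", {}).get("taints") or []
        has_taint = any(
            t.get("key") == MASTER_TAINT_KEY and t.get("effect") == "NoSchedule"
            for t in taints
        )
        if not has_taint:
            missing.append(node["metadata"]["name"])
    return missing


def fetch_master_nodes():
    """Fetch master nodes through the oc CLI (assumes a logged-in session)."""
    result = subprocess.run(
        ["oc", "get", "nodes", "-l", MASTER_TAINT_KEY, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


# Illustrative data only: master-0 is tainted correctly, master-1 is not.
sample = {
    "items": [
        {
            "metadata": {"name": "master-0"},
            "spec": {"taints": [{"key": MASTER_TAINT_KEY, "effect": "NoSchedule"}]},
        },
        {"metadata": {"name": "master-1"}, "spec": {}},
    ]
}

schedulable_masters = masters_missing_noschedule(sample)
```

Against a live cluster, `masters_missing_noschedule(fetch_master_nodes())` would return the masters that application pods could land on; an empty list means the check passes.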

Bigger picture, I think we need an option for "verbose health checks" that would enable a selection of smaller, optional, nice-to-have checks. There are a lot of details about a cluster that can be checked to vet cluster health, but our current set of checks already takes 30+ seconds to run on all openshift-* namespaces. This is likely a topic for a separate issue, but we might want to tackle that before this. I will open one and we can discuss there /cc: @yashashreesuresh @chaitanyaenr
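One way the "verbose health checks" idea could look, as a rough sketch only. The `verbose` flag, the `run_checks` function, and the check names are all hypothetical; nothing here reflects an existing cerberus option:

```python
def run_checks(standard_checks, optional_checks, verbose=False):
    """Run standard checks always; run optional checks only when verbose is set.

    Each check is a (name, callable) pair whose callable returns True when healthy.
    Returns a dict mapping check name to its boolean result.
    """
    checks = list(standard_checks)
    if verbose:
        checks.extend(optional_checks)
    return {name: bool(check()) for name, check in checks}


# Placeholder checks that stand in for real cluster queries.
standard = [("pods_running", lambda: True)]
optional = [("masters_not_schedulable", lambda: False)]

quick_result = run_checks(standard, optional)
verbose_result = run_checks(standard, optional, verbose=True)
```

Keeping the optional checks in a separate list means the default loop does not get slower as more nice-to-have checks are added.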

mffiedler · 2020-05-14T19:23:19Z

See issue #53

paigerube14 · 2020-05-15T19:32:35Z

I have created branch schedMaster for this issue. For now I have added the check of the master nodes' schedulable status as an always-run check, not one that is set in the config.

paigerube14 · 2020-05-18T15:42:56Z

@mffiedler Do you think this does not require an additional check in the config?

mffiedler · 2020-05-19T19:29:09Z

I don't see a need for a config check here. This should be part of the standard monitor/check loop.

chaitanyaenr · 2020-05-21T20:36:32Z

Fixed by #57. Thanks @paigerube14.

This commit:
- Adds a high-level config option called distribution to be able to run operations which are specific to OpenShift in addition to kube, as mentioned in krkn-chaos#42, like inspect_component mode, which enables the user to collect logs/events related to the failed component using the oc inspect command.
- Updates the readme to add info about setting the namespaces to monitor in the config depending on the distribution, as the defaults assume OpenShift. It also adds blogs and other useful resources related to Cerberus.


mffiedler assigned paigerube14 May 11, 2020

mffiedler mentioned this issue May 14, 2020

Ability to add/enable collections of optional monitors #53

Open

mffiedler added the enhancement New feature or request label May 14, 2020

paigerube14 mentioned this issue May 20, 2020

Verify not scheduled status on master nodes #57

Merged

chaitanyaenr closed this as completed May 21, 2020

chaitanyaenr mentioned this issue May 22, 2020

Separate operations based on distribution #61

Merged
