Verify application pods not scheduled on master nodes #42

Closed
mffiedler opened this issue Apr 27, 2020 · 8 comments
Labels
enhancement New feature or request

Comments

@mffiedler
Collaborator

This might push the scope of what Cerberus is intended for, but an easy "health check" would be to make sure master nodes are not schedulable for application workloads. See https://bugzilla.redhat.com/show_bug.cgi?id=1827996

@mffiedler
Collaborator Author

I don't think this requires a new option in the config, but it is OpenShift-specific and not applicable to generic Kubernetes. We need to decide whether a subset of Cerberus should be OpenShift-specific before implementing this.

@paigerube14
Collaborator

@mffiedler I'm thinking I should run "oc describe node/" on all the master nodes each time and verify that the Taints section contains node-role.kubernetes.io/master:NoSchedule. Looking at the Bugzilla link, it looked like the Taints section listed no taints when they ran into the bug.

Is this around what you were thinking? Do you know of an easier way to do that?
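A minimal sketch of that taint check in Python, assuming the node list is read with `oc get nodes -o json` instead of parsing `oc describe` output, since the JSON exposes `spec.taints` directly. It treats any node carrying the `node-role.kubernetes.io/master` label as a master. The function names `schedulable_masters` and `fetch_nodes` are hypothetical; this is an illustration, not code taken from the implementation in #57.

```python
import json
import subprocess

# Label that marks a node as a master, and the taint we expect masters to carry.
MASTER_LABEL = "node-role.kubernetes.io/master"
MASTER_TAINT_KEY = "node-role.kubernetes.io/master"
MASTER_TAINT_EFFECT = "NoSchedule"


def schedulable_masters(nodes):
    """Return names of master nodes that lack the master NoSchedule taint.

    `nodes` is the parsed JSON NodeList returned by `oc get nodes -o json`.
    An empty result means every master is protected from app workloads.
    """
    offenders = []
    for node in nodes.get("items", []):
        labels = node.get("metadata", {}).get("labels", {})
        if MASTER_LABEL not in labels:
            continue  # skip worker/infra nodes
        # spec.taints is absent entirely when a node has no taints.
        taints = node.get("spec", {}).get("taints") or []
        has_taint = any(
            t.get("key") == MASTER_TAINT_KEY and t.get("effect") == MASTER_TAINT_EFFECT
            for t in taints
        )
        if not has_taint:
            offenders.append(node["metadata"]["name"])
    return offenders


def fetch_nodes():
    """Fetch the node list with the oc CLI (assumes a logged-in oc session)."""
    result = subprocess.run(
        ["oc", "get", "nodes", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

On a live cluster this would be called as `schedulable_masters(fetch_nodes())`; a non-empty list names the masters that application pods could land on, which is the condition from the Bugzilla report.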

@mffiedler
Collaborator Author

Checking for the taint would be correct.

Bigger picture, I think we need an option for "verbose health checks" that would enable a selection of smaller, optional, nice-to-have checks. There are a lot of details about a cluster that can be checked to vet cluster health, but our current set of checks already takes 30+ seconds to run on all openshift-* namespaces. This probably belongs in a separate issue, but we might want to tackle it before this one. I will open one and we can discuss there /cc: @yashashreesuresh @chaitanyaenr

@mffiedler
Collaborator Author

See issue #53

@mffiedler mffiedler added the enhancement New feature or request label May 14, 2020
@paigerube14
Collaborator

paigerube14 commented May 15, 2020

I have created branch schedMaster for this issue. I have implemented the master node schedulability check as an always-run check, not set in the config.

@paigerube14
Collaborator

@mffiedler Do you agree this does not require an additional option in the config?

@mffiedler
Collaborator Author

I don't see a need for a config check here. This should be part of the standard monitor/check loop.

@chaitanyaenr
Collaborator

Fixed by #57. Thanks @paigerube14.

chaitanyaenr added a commit that referenced this issue May 26, 2020
This commit:
- Adds a high-level config option called distribution so that
  operations specific to OpenShift, such as inspect_component mode
  (which lets the user collect logs/events related to the failed
  component using the oc inspect command), can run in addition to the
  Kubernetes checks, as mentioned in #42.
- Updates the readme with info about setting the namespaces to monitor
  in the config depending on the distribution, since the defaults
  assume OpenShift. It also adds blogs and other useful resources
  related to Cerberus.
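A minimal sketch of how a distribution switch like this might gate OpenShift-only checks. The config layout (`cerberus` → `distribution`), the `openshift` default, and every check name below are assumptions for illustration, not Cerberus's actual config schema or internals.

```python
# Checks that apply to any Kubernetes cluster (hypothetical names).
COMMON_CHECKS = ["node_status", "namespace_pods"]
# Checks that only make sense on OpenShift, e.g. the master taint check
# discussed in #42 and inspect_component mode (hypothetical names).
OPENSHIFT_CHECKS = ["master_schedulability", "inspect_component"]
SUPPORTED_DISTRIBUTIONS = ("kubernetes", "openshift")


def checks_to_run(config):
    """Return the check names to run for the configured distribution."""
    # Assumed default of "openshift", mirroring the note that defaults assume OpenShift.
    distribution = str(config.get("cerberus", {}).get("distribution", "openshift")).lower()
    if distribution not in SUPPORTED_DISTRIBUTIONS:
        raise ValueError(f"unsupported distribution: {distribution!r}")
    checks = list(COMMON_CHECKS)
    if distribution == "openshift":
        checks.extend(OPENSHIFT_CHECKS)
    return checks
```

With `{"cerberus": {"distribution": "kubernetes"}}` only the common checks run, so OpenShift-only commands such as `oc inspect` are never attempted on a generic Kubernetes cluster.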
3 participants