-
Notifications
You must be signed in to change notification settings - Fork 988
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix kubernetes_daemon_set not waiting for rollout. Fixes #2092 #2419
base: main
Are you sure you want to change the base?
Conversation
fa9e3fe
to
8426fa4
Compare
8426fa4
to
d330044
Compare
d330044
to
4187873
Compare
|
||
resources { | ||
limits = { | ||
cpu = "0.5" | ||
memory = "512Mi" | ||
"nvidia/gpu" = "1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: this is causing the pods are unschedulable and thus rollout fails and the test never passes.
4187873
to
fde82c2
Compare
@@ -912,6 +916,7 @@ func testAccKubernetesDaemonSetV1ConfigWithContainerSecurityContextSeccompProfil | |||
} | |||
} | |||
} | |||
wait_for_rollout = false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: rollout of this resource fails on K8s nodes that do not have any seccomp profiles locally available (e.g. nodes in a kind
cluster), so let's not wait for rollout, as it's not important for this test.
@@ -962,6 +967,7 @@ func testAccKubernetesDaemonSetV1ConfigWithContainerSecurityContextSeccompProfil | |||
} | |||
} | |||
} | |||
wait_for_rollout = false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: rollout of this resource fails on K8s nodes that do not have any seccomp profiles locally available (e.g. nodes in a kind
cluster), so let's not wait for rollout, as it's not important for this test.
@@ -685,6 +685,7 @@ func testAccKubernetesDaemonSetV1ConfigWithTemplateMetadata(depName, imageName s | |||
container { | |||
image = "%s" | |||
name = "containername" | |||
command = ["sleep", "infinity"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: the container commands must be added as by default the container just exits and the test is thus failing. This is now added for the majority of the test cases for this resource.
fde82c2
to
5f6ee8a
Compare
8321d84
to
f4cbd9b
Compare
f4cbd9b
to
be1c10e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sbocinec thank you for contributing this fix and the very detailed diagnosis of the issue.
The changes look good to me. While we wait for the tests to run, would you mind expanding the change log entry a bit to describe the change in behaviour that users can expect after upgrading?
@@ -0,0 +1,4 @@ | |||
```release-note:bug | |||
`resource/kubernetes_daemon_set_v1`: fix an issue with the provider not waiting for rollout with `wait_for_rollout = true`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest a mention of the change in behaviour here, so users are well advised of what to expect when upgrading.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alexsomesan That's a great point. I'm not sure about the wording of the note. I'm thinking about adding the following to the existing entry: "Note: As the wait_for_rollout
is true
by default, users might experience the apply
operation of the existing code taking longer."
Or should the UPGRADE NOTES
section be used instead as I see it was used in past e.g. https://github.com/hashicorp/terraform-provider-kubernetes/blob/main/CHANGELOG.md#161-april-18-2019 . In this case, I'm not sure how to add this extra section.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it is undesirable to change the semantics of the existing wait_for_rollout
attribute (due to potential breakage of existing code), perhaps a new really_wait_for_rollout
attribute can be introduced instead, and the old one deprecated.
This fix ensures the resource correctly waits for the DaemonSet rollout by using similar logic than the kubernetes_deployment. Additionally, waitForDaemonSetReplicasFunc is renamed to waitForDaemonSetPodsFunc to correctly describe the operation (there are no replicas in a DaemonSet).
be1c10e
to
080bb15
Compare
This is a very useful fix as it obviates the need for nasty hacky workarounds (such as a null resource with a local-exec provisioner running kubectl). Can it be included in the July release? |
moving this to |
How about making this change non-breaking by introducing a differently-named parameter?
|
Description
Fixes
kubernetes_daemon_set_v1
resource to wait for rollout. Current check for the rollout whenwait_for_rollout = true
is ineffective and is not waiting for rollout at all as it checks a wrong status field of a DaemonSet.❗ What is worse, there is currently no way how to correctly manage a DaemonSet using
kubernetes
provider as alsokubernetes_manifest
is affected by a nasty bug so every DaemonSet change ends up with a failure..Impacted by the issue, I prepared a reproducer TF code here. Troubleshooting the issue by applying a new resource and changing existing one I noticed, the current code is checking a wrong field, so
wait_for_rollout = true
has no effect as incorrect status fiels is check.Additionally, majority of the daemonset resource tests must have been fixed as after fixing the rollout issues, these have been failing as the the original manifests failed to be rolled out.
wait_for_rollout
is set to true in the resource. Until now, waiting for rollout was ineffective, though after it's fixed, apply is going to wait until all the daemonset pods are in Ready state. I expect this might cause some issues for users using this resource as their existing TF code might start to time out on apply. This is also visible on the number of test case changes, that must have been updated.Troubleshooting - apply of a new resource
terraform apply
The apply is almost instant. While the apply is running,, this is how the daemonset status looks like: `
As you can see, checking
CurrentNumberScheduled
is not the right property to check if when we need to wait for the rollout as this number does not represent the DaemonSet rollout status. We must check fornumberReady
here.Troubleshooting - change of an existing resource
Apply of a change:
DS status output:
Acceptance tests
Output from acceptance testing:
Release Note
Release note for CHANGELOG:
References
Community Note