Disabling VisibilityOnDemand feature gate blocks namespace deletion #3943

Open
dgrove-oss opened this issue Jan 8, 2025 · 5 comments · May be fixed by #3947
Labels
kind/support Categorizes issue or PR as a support question.

Comments

@dgrove-oss
Contributor

What happened:

I installed Kueue with the VisibilityOnDemand feature disabled via --feature-gates=VisibilityOnDemand=false.

I then created and attempted to delete a namespace. The namespace deletion stalled indefinitely.

What you expected to happen:

I expected to be able to delete a namespace.

How to reproduce it (as minimally and precisely as possible):

Deploy Kueue with --feature-gates=VisibilityOnDemand=false. I happened to install from master (bf4657a) to verify
that #3908 didn't fix the problem, but I also saw the same incorrect behavior on Kueue 0.10 during the Christmas break.

kubectl create ns test

kubectl delete ns test

Namespace deletion will hang.
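So that the reproduction does not block the terminal forever, the two commands above can be wrapped with a timeout. This is only a sketch: the function name `repro_ns_hang` and the 60 second timeout are assumptions for illustration, not part of the original report.

```shell
# Hypothetical helper: create a namespace, then try to delete it with a
# bounded wait. Prints "deleted" on success and "stuck" if the delete
# does not complete within the timeout (60s is an arbitrary choice).
repro_ns_hang() {
  ns="${1:-test}"
  kubectl create ns "$ns" || return 2
  if kubectl delete ns "$ns" --timeout=60s; then
    echo "deleted"
  else
    echo "stuck"
    return 1
  fi
}

# Usage (against a cluster where Kueue runs with VisibilityOnDemand=false):
#   repro_ns_hang test
```

On an affected cluster this would be expected to report "stuck", with the namespace left in the Terminating phase.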

Anything else we need to know?:

Doing a get on the namespace gets:

apiVersion: v1
kind: Namespace
metadata:
  creationTimestamp: "2025-01-08T16:29:06Z"
  deletionTimestamp: "2025-01-08T16:29:15Z"
  labels:
    kubernetes.io/metadata.name: test
  name: test
  resourceVersion: "2497"
  uid: 461a7652-ae28-4029-acf9-104800161b17
spec:
  finalizers:
  - kubernetes
status:
  conditions:
  - lastTransitionTime: "2025-01-08T16:29:20Z"
    message: 'Discovery failed for some groups, 1 failing: unable to retrieve the
      complete list of server APIs: visibility.kueue.x-k8s.io/v1beta1: stale GroupVersion
      discovery: visibility.kueue.x-k8s.io/v1beta1'
    reason: DiscoveryFailed
    status: "True"
    type: NamespaceDeletionDiscoveryFailure
  - lastTransitionTime: "2025-01-08T16:29:20Z"
    message: All legacy kube types successfully parsed
    reason: ParsedGroupVersions
    status: "False"
    type: NamespaceDeletionGroupVersionParsingFailure
  - lastTransitionTime: "2025-01-08T16:29:20Z"
    message: All content successfully deleted, may be waiting on finalization
    reason: ContentDeleted
    status: "False"
    type: NamespaceDeletionContentFailure
  - lastTransitionTime: "2025-01-08T16:29:20Z"
    message: All content successfully removed
    reason: ContentRemoved
    status: "False"
    type: NamespaceContentRemaining
  - lastTransitionTime: "2025-01-08T16:29:20Z"
    message: All content-preserving finalizers finished
    reason: ContentHasNoFinalizers
    status: "False"
    type: NamespaceFinalizersRemaining
  phase: Terminating
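To pull out only the discovery failure from the status above instead of reading the whole object, a JSONPath filter on the condition type should work. A minimal sketch; the function name `ns_discovery_failure` is made up for illustration.

```shell
# Hypothetical helper: print the message of the
# NamespaceDeletionDiscoveryFailure condition for the given namespace.
ns_discovery_failure() {
  kubectl get ns "$1" \
    -o jsonpath='{.status.conditions[?(@.type=="NamespaceDeletionDiscoveryFailure")].message}'
}

# Usage:
#   ns_discovery_failure test
```

For the namespace shown above, the message names visibility.kueue.x-k8s.io/v1beta1 as the group that failed discovery.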
dgrove-oss added the kind/bug label on Jan 8, 2025
@dgrove-oss
Contributor Author

/cc @varshaprasad96

@tenzen-y
Member

tenzen-y commented Jan 8, 2025

I suspect that you installed the APIService resource (https://github.com/kubernetes-sigs/kueue/blob/main/config/components/visibility/apiservice_v1beta1.yaml). Could you check whether your cluster has the Kueue APIService resource? If it does, what happens if you remove it?

@dgrove-oss
Contributor Author

Good guess :)

Yes, it installed:

(base) dgrove@Dave's IBM Mac kueue % kubectl get APIService 
NAME                                   SERVICE                                AVAILABLE                      AGE
...
v1beta1.visibility.kueue.x-k8s.io      kueue-system/kueue-visibility-server   False (FailedDiscoveryCheck)   3h11m
...

After I do kubectl delete APIService v1beta1.visibility.kueue.x-k8s.io, then namespace deletion works as expected.

I guess this is mainly a documentation issue then? Disabling the feature requires more than just setting the feature flag to false.
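The check and cleanup described above can be sketched as a small filter over the `kubectl get apiservice` table. The function name `unavailable_apiservices` is hypothetical, and the filter assumes the default column layout shown in the output above (NAME, SERVICE, AVAILABLE, AGE).

```shell
# Hypothetical helper: read `kubectl get apiservice` table output on stdin
# and print the names of entries whose AVAILABLE column is not "True".
unavailable_apiservices() {
  awk 'NR > 1 && $3 != "True" { print $1 }'
}

# Usage:
#   kubectl get apiservice | unavailable_apiservices
#   kubectl delete apiservice v1beta1.visibility.kueue.x-k8s.io
```

Listing first and deleting by name keeps the destructive step explicit, since other unavailable APIServices may be unrelated to Kueue.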

@tenzen-y
Member

tenzen-y commented Jan 8, 2025

Thank you for checking that.

> I guess this is mainly a documentation issue then? Disabling the feature requires more than just setting the feature flag to false.

I think so, too. Would you mind opening a PR that adds a note about the APIService manifest for users who disable the VisibilityOnDemand feature gate? I think we can add it to https://kueue.sigs.k8s.io/docs/tasks/manage/monitor_pending_workloads/pending_workloads_on_demand/.

Once we remove the VisibilityOnDemand feature gate, we can remove the note as well. We typically remove a feature gate two minor releases after the feature reaches GA.

@tenzen-y
Member

tenzen-y commented Jan 8, 2025

/remove-kind bug
/kind support

k8s-ci-robot added the kind/support label and removed the kind/bug label on Jan 8, 2025

3 participants