MVP support / extension of support for serving workloads #2717

Closed
1 task
Tracked by #3192
mimowo opened this issue Jul 29, 2024 · 17 comments · Fixed by #2813, #3001 or #3312
Assignees
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@mimowo
Contributor

mimowo commented Jul 29, 2024

What would you like to be added:

I would like to make sure we have basic support for running serving workloads for the use case of running AI inference.
In particular I would like to have support for Deployments, StatefulSets, and LeaderWorkerSets.

In the MVP work the integrations are based on single Plain pods (for Deployments) or Pod Groups (for StatefulSets).

  1. Deployments
    This is a follow-up to "Document how to use Kueue for Deployments" (#2677).

What is needed:

  • introduce a dedicated Deployment integration, and validate that it can only be enabled when pod integration is enabled
  • copy the queue-name from Deployment down to PodTemplates
  2. StatefulSets

What is needed:

  • introduce a dedicated StatefulSet integration, and validate that it can only be enabled when pod integration is enabled
  • copy the queue-name from StatefulSet down to PodTemplates
  • set the PodTemplate labels for the PodGroup:
    • kueue.x-k8s.io/queue-name - from STS
    • kueue.x-k8s.io/pod-group-name - STS_ + STS name (+ probably some hash to avoid collisions, as for workloads)
    • kueue.x-k8s.io/pod-group-total-count - STS replica count

In the longer run, to support scaling of StatefulSets, we may need to do #77, but this is out of scope for this issue.

  3. LeaderWorkerSet support has moved to a dedicated issue: "MVP support for serving workloads running as LeaderWorkerSet" (#3232).
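
As an illustration only, the label propagation and the "requires pod integration" check described for Deployments and StatefulSets could be sketched as below. This is not Kueue's implementation: objects are modeled as plain dicts, the integration names ("pod", "deployment", "statefulset"), the function names, and the 5-character hash suffix are all assumptions made for this sketch. Only the three label keys and the "STS_ + name" prefix come from the description above.

```python
import copy
import hashlib

# Label keys as listed in the issue description.
QUEUE_NAME = "kueue.x-k8s.io/queue-name"
POD_GROUP_NAME = "kueue.x-k8s.io/pod-group-name"
POD_GROUP_TOTAL_COUNT = "kueue.x-k8s.io/pod-group-total-count"


def validate_integrations(enabled):
    """Return errors for integrations that are enabled without "pod".

    The integration names are hypothetical placeholders.
    """
    enabled = set(enabled)
    errors = []
    for name in ("deployment", "statefulset"):
        if name in enabled and "pod" not in enabled:
            errors.append(f"{name} integration requires the pod integration")
    return errors


def _template_labels(obj):
    """Return the pod template's label dict, creating it if missing."""
    template = obj["spec"]["template"]
    return template.setdefault("metadata", {}).setdefault("labels", {})


def propagate_deployment_labels(deployment):
    """Copy the queue-name label from a Deployment to its pod template."""
    out = copy.deepcopy(deployment)  # leave the caller's object untouched
    queue = out.get("metadata", {}).get("labels", {}).get(QUEUE_NAME)
    if queue is not None:
        _template_labels(out)[QUEUE_NAME] = queue
    return out


def pod_group_name(namespace, name):
    """Build "STS_<name>-<hash>"; the hash scheme is an assumption."""
    digest = hashlib.sha256(f"{namespace}/{name}".encode()).hexdigest()
    return f"STS_{name}-{digest[:5]}"


def propagate_statefulset_labels(sts):
    """Set queue-name and pod-group labels on a StatefulSet's pod template."""
    out = copy.deepcopy(sts)
    meta = out.get("metadata", {})
    queue = meta.get("labels", {}).get(QUEUE_NAME)
    if queue is None:
        return out  # not managed by a queue: nothing to propagate
    labels = _template_labels(out)
    labels[QUEUE_NAME] = queue
    labels[POD_GROUP_NAME] = pod_group_name(
        meta.get("namespace", "default"), meta["name"]
    )
    # Label values are strings; replicas defaults to 1 when unset.
    labels[POD_GROUP_TOTAL_COUNT] = str(out["spec"].get("replicas", 1))
    return out
```

In this sketch a StatefulSet without the queue-name label is returned unchanged, so workloads that do not opt in to a queue are not touched. Whether the real integration behaves that way is not stated in this issue.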

Why is this needed:

To support use cases of running AI training and inference in the same clusters, where the access to GPU is constrained by Kueue.

Completion requirements:

The API changes required are minimal (just potentially new labels / annotations), so I believe a new KEP is not required, but we need proper documentation.

This enhancement requires the following artifacts:

  • Docs update

The artifacts should be linked in subsequent comments.

@mimowo mimowo added the kind/feature Categorizes issue or PR as related to a new feature. label Jul 29, 2024
@mimowo
Contributor Author

mimowo commented Jul 29, 2024

/assign @trasc

@mimowo
Contributor Author

mimowo commented Jul 29, 2024

/cc @mwielgus @tenzen-y @dgrove-oss

@kannon92
Contributor

/cc @liurupeng @ahg-g
for LWS.

@kannon92
Contributor

For LWS, would including a suspend field be a better forward thinking strategy?

@mimowo
Contributor Author

mimowo commented Jul 29, 2024

For LWS, would including a suspend field be a better forward thinking strategy?

For now, a complete "suspend" of a serving workload isn't a use case we hear about. The preference is to reduce capacity by preempting individual pods, so that stopping a serving workload completely is the last resort.

However, it is hard to say "never" in the long run, but I would keep it out of scope for this enhancement.

@kannon92
Contributor

Sounds good. I guess in the LWS case preemption would apply to the entire leader-worker group? Or would it preempt only some workers?

@mimowo
Contributor Author

mimowo commented Jul 29, 2024

For now, the entire group.

@vladikkuzn
Contributor

/assign

@tenzen-y
Member

It looks like this contains LWS and StatefulSet.
/reopen

@k8s-ci-robot
Contributor

@tenzen-y: Reopened this issue.

In response to this:

It looks like this contains LWS and StatefulSet.
/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@kerthcet
Contributor

For now, a complete "suspend" of a serving workload isn't a use case we hear about.

+1 on behalf of LWS. Evicting the entire group (leaderPod + workerSts) is the right path because they work as a unit; all you need to do is reduce the replicas to fit within the resource boundary.

Some other feedback: as the maintainer of llmaz, another inference platform, what we need most is accelerator fungibility, meaning the same model could be served by several different kinds of GPUs for the sake of cost and performance. I think Kueue can help in some ways, and this is actually part of our integration roadmap. We will implement this capability ourselves as well, but considering our customers are also using Kueue, it could be a centralized control plane.

@tenzen-y
Member

@mimowo Couldn't we split LWS to a separate issue as we mentioned in the next release issue?

@mimowo
Contributor Author

mimowo commented Oct 14, 2024

Sure, we can, would you like to do so? Otherwise I can split it tomorrow.

@tenzen-y
Member

Sure, we can, would you like to do so? Otherwise I can split it tomorrow.

I'm not in a hurry. So, I'm ok with tomorrow.

@mimowo
Contributor Author

mimowo commented Oct 15, 2024

Done: #3232. PTAL

@mimowo
Contributor Author

mimowo commented Oct 23, 2024

/reopen
Let's close it when documentation for StatefulSet lands. cc @vladikkuzn

@k8s-ci-robot
Contributor

@mimowo: Reopened this issue.

In response to this:

/reopen
Let's close it when documentation for StatefulSet lands. cc @vladikkuzn

7 participants