MVP support / extension of support for serving workloads #2717
Comments
/assign @trasc
/cc @liurupeng @ahg-g
For LWS, would including a suspend field be a better forward-thinking strategy?
For now, completely "suspending" a serving workload isn't a use case we hear about. The preference is to reduce capacity by preempting individual pods, so that stopping a serving workload entirely is the last-resort option. That said, it is hard to say "never" in the long run, but I would keep it out of scope for this enhancement.
Sounds good. I guess in the LWS case preemption would target the entire leader-worker group? Or just some workers?
For now, the entire group. |
/assign |
It looks like this covers both LWS and StatefulSet.
@tenzen-y: Reopened this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
+1 on behalf of LWS. Evicting the entire group (leader pod + worker StatefulSet) is the right path, because they work as a unit; all you need to do is reduce the replicas down to the resource boundary. Some other feedback: as the maintainer of llmaz, another inference platform, what we need most is accelerator fungibility, i.e. the same model can be served by several different kinds of GPUs for the sake of cost and performance. I think Kueue can help in some ways; this is actually part of our integration roadmap. We will implement the capability ourselves as well, but since our customers are also using Kueue, it could act as a centralized control plane.
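To make the accelerator-fungibility point concrete, below is a rough sketch of how two interchangeable GPU flavors could be expressed against Kueue's v1beta1 Go API; the flavor names, quota values, and object names are made up, and a real setup would also need the matching ResourceFlavor objects and a LocalQueue pointing at this ClusterQueue.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	kueue "sigs.k8s.io/kueue/apis/kueue/v1beta1"
)

func main() {
	cq := &kueue.ClusterQueue{
		ObjectMeta: metav1.ObjectMeta{Name: "inference-cq"}, // hypothetical name
		Spec: kueue.ClusterQueueSpec{
			ResourceGroups: []kueue.ResourceGroup{{
				// One resource group covering GPUs, with two flavors; Kueue can
				// admit a workload under whichever flavor still has free quota.
				CoveredResources: []corev1.ResourceName{"nvidia.com/gpu"},
				Flavors: []kueue.FlavorQuotas{
					{
						Name: "gpu-a100", // assumes a ResourceFlavor "gpu-a100" exists
						Resources: []kueue.ResourceQuota{{
							Name:         "nvidia.com/gpu",
							NominalQuota: resource.MustParse("8"),
						}},
					},
					{
						Name: "gpu-l4", // assumes a ResourceFlavor "gpu-l4" exists
						Resources: []kueue.ResourceQuota{{
							Name:         "nvidia.com/gpu",
							NominalQuota: resource.MustParse("16"),
						}},
					},
				},
			}},
		},
	}
	fmt.Printf("ClusterQueue %q exposes %d GPU flavors\n",
		cq.Name, len(cq.Spec.ResourceGroups[0].Flavors))
}
```

With a layout like this, a serving workload that fits on either GPU type can land on whichever flavor has headroom, which is the centralized-control-plane angle mentioned above.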
@mimowo Couldn't we split LWS into a separate issue, as we mentioned in the next-release issue?
Sure, we can. Would you like to do so? Otherwise I can split it tomorrow.
I'm not in a hurry. So, I'm ok with tomorrow. |
Done: #3232. PTAL |
/reopen |
@mimowo: Reopened this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
What would you like to be added:
I would like to make sure we have basic support for running serving workloads for the use case of running AI inference.
In particular, I would like support for Deployments, StatefulSets, and LeaderWorkerSets.
In the MVP, the integrations are based on single plain Pods (for Deployments) or Pod groups (for StatefulSets).
This is a follow-up to "Document how to use Kueue for Deployments" #2677.
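As a rough illustration of the plain-pod path (a sketch, not the design this issue will settle on), the Go snippet below builds a Deployment whose pod template carries the kueue.x-k8s.io/queue-name label, so each replica would be gated and admitted as its own Workload; the names, namespace, image, and queue are placeholders.

```go
package main

import (
	"fmt"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	replicas := int32(2)
	podLabels := map[string]string{
		"app": "model-server",
		// Label consumed by Kueue's plain-pod integration; each replica is
		// admitted through the (hypothetical) "inference-queue" LocalQueue.
		"kueue.x-k8s.io/queue-name": "inference-queue",
	}
	dep := &appsv1.Deployment{
		ObjectMeta: metav1.ObjectMeta{Name: "model-server", Namespace: "serving"},
		Spec: appsv1.DeploymentSpec{
			Replicas: &replicas,
			Selector: &metav1.LabelSelector{
				MatchLabels: map[string]string{"app": "model-server"},
			},
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{Labels: podLabels},
				Spec: corev1.PodSpec{
					Containers: []corev1.Container{{
						Name:  "server",
						Image: "registry.example.com/model-server:latest", // placeholder
					}},
				},
			},
		},
	}
	fmt.Printf("Deployment %s/%s: %d replicas queued via %q\n",
		dep.Namespace, dep.Name, *dep.Spec.Replicas,
		dep.Spec.Template.Labels["kueue.x-k8s.io/queue-name"])
}
```

This lines up with the completion note below: the Deployment side should only need labels (and possibly a few annotations), not new APIs.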
What is needed:
In the longer run, to support scaling of StatefulSets, we may need #77, but that is out of scope for this issue.
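To show why scaling is the awkward part, here is a hedged sketch of the pod-template metadata a StatefulSet might carry under the pod-group variant of the plain-pod integration; the label/annotation keys reflect my reading of that integration, and the queue name, group name, and count are placeholders. Because the group size is pinned by an annotation, changing .spec.replicas alone is not enough, hence the pointer to #77.

```go
package main

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// podGroupTemplateMeta builds pod-template metadata for a StatefulSet whose
// replicas should be admitted together as a single Workload. The size must
// match the StatefulSet's .spec.replicas, which is what makes scaling hard.
func podGroupTemplateMeta(queue, group string, size int32) metav1.ObjectMeta {
	return metav1.ObjectMeta{
		Labels: map[string]string{
			"kueue.x-k8s.io/queue-name":     queue, // LocalQueue to admit through
			"kueue.x-k8s.io/pod-group-name": group, // all replicas share one Workload
		},
		Annotations: map[string]string{
			// Fixed group size expected by the pod-group integration.
			"kueue.x-k8s.io/pod-group-total-count": fmt.Sprint(size),
		},
	}
}

func main() {
	fmt.Printf("%+v\n", podGroupTemplateMeta("inference-queue", "model-shards", 4))
}
```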
Why is this needed:
To support use cases where AI training and inference run in the same clusters and access to GPUs is constrained by Kueue.
Completion requirements:
The API changes required are minimal (potentially just new labels / annotations), so I believe a new KEP is not required, but we need proper documentation.
This enhancement requires the following artifacts:
The artifacts should be linked in subsequent comments.