Initial Configuration API Design #1
Conversation
* Init go project
* Initial API design documentation
* Code Generation
* Initial API struct

Signed-off-by: Lionel Jouin <[email protected]>
I approve, wholeheartedly! The conversation herein is questions and thoughts that I have. But, at this point -- I believe getting the repository started with the first PR is most important, and those details can be hammered out, nothing here in my comments that's a show stopper.
```yaml
    - macvlan-eth0
  opaque:
    driver: cni.dra.networking.k8s.io
    parameters: # CNIParameters with the GVK, interface name and CNI Config (in YAML format).
```
This is.... quasi-uncharted territory, but... I think it'll work, and I think people will like it overall. It's an improvement over "straight up json in the yaml" -- for sure. Here's the tricky part... It's hard to validate. I wonder what it'll look like if it's invalid, a CNI plugin kicks it back, and then we see the failing pod... I'm curious how that experience will be for the user. This is a common CNI problem I see users facing.
As Doug mentioned, somewhere we need to describe 'the way to convert from JSON to YAML', because even if the conversion is intuitive, there is currently no official document for it. Otherwise, we will get issues in this repo like: how do I apply my current CNI JSON file?
I understand. Maybe a solution could be to accept JSON format in the `config` field (type `runtime.RawExtension`) and have a mutating webhook that will `json.Unmarshal` the CNI config and put it back in the `config` field (`config.raw`). This way, the CNI config will be transformed into YAML format even though the user applied it in JSON.
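For illustration, here is a minimal, hypothetical sketch of converting an existing CNI JSON config into the YAML form discussed above, using `sigs.k8s.io/yaml` (the macvlan config values and the use of this library are assumptions, not part of the design; a mutating webhook could perform something similar before writing the config back):

```go
package main

import (
	"fmt"

	"sigs.k8s.io/yaml"
)

func main() {
	// An existing CNI config in its usual JSON form (example values only).
	cniJSON := []byte(`{
	  "cniVersion": "1.0.0",
	  "type": "macvlan",
	  "master": "eth0",
	  "mode": "bridge",
	  "ipam": {"type": "host-local", "subnet": "10.10.1.0/24"}
	}`)

	// sigs.k8s.io/yaml converts JSON documents to YAML directly, which is
	// essentially the transformation a webhook (or the user) would perform.
	cniYAML, err := yaml.JSONToYAML(cniJSON)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(cniYAML))
}
```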
* [cni.dev](https://www.cni.dev/)
* [k8snetworkplumbingwg/multus-cni](https://github.com/k8snetworkplumbingwg/multus-cni)
* [KEP-4817 - Resource Claim Status with possible standardized network interface data](https://github.com/kubernetes/enhancements/issues/4817)
* [containernetworking/cni#1132 - VALIDATE Operation](https://github.com/containernetworking/cni/issues/1132)
Well, clearly you've already thought about the validation, because... here it is :) So... yeah. This may be inevitable. Maybe each terminal CNI plugin can provide its own kind of schema.
Yes, I think that's the idea with this CNI issue: each CNI should provide a validation schema. I will try to work on it soon in CNI; once this is in place we could have full validation.
One of the issues with the CNI-DRA-Driver might be how to gather all the schemas.
If the invocation of the CNI `ADD` operation fails, the `DEL` operation will be invoked immediately to clean up the failing interface. An error will also be returned during pod creation to fail the pod creation.

The interface will still be reported in the `status.devices` field of the ResourceClaim but with the `Ready` condition type set to `False`, with the `reason` field set to `NetworkInterfaceNotReady`, and with the `message` field set with the error returned while calling the CNI `ADD` operation.
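Purely as an illustration of the behaviour described above, a failed `ADD` could surface in the ResourceClaim status roughly as follows (device, pool, and error message values are hypothetical; the shape follows the device status introduced by KEP-4817):

```yaml
status:
  devices:
  - driver: cni.dra.networking.k8s.io
    pool: worker-1          # hypothetical pool name
    device: macvlan-eth0    # hypothetical device name
    conditions:
    - type: Ready
      status: "False"
      reason: NetworkInterfaceNotReady
      message: 'CNI ADD failed: plugin type="macvlan": master interface eth0 not found'  # example error
```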
I think this is really clean to have that message from the failed ADD available. Good one. Do we need any messaging other than the static "CNI-DRA-Driver configured the interface" on a successful ADD?
Do you have anything else in mind?
```yaml
uid: 680f0a77-8d0b-4e21-8599-62581e335ed6
```

#### Failures
Do we need to have anything about failed CNI DELs? Or, does it not matter since the pod object is being deleted at this point?
Maybe we need something, right. The pod could be deleted, but maybe we could block the deletion of the ResourceClaim and keep retrying DEL?
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: dougbtv, LionelJouin. The full list of commands accepted by this bot can be found here. The pull request process is described here.
* [cni.dev](https://www.cni.dev/)
* [k8snetworkplumbingwg/multus-cni](https://github.com/k8snetworkplumbingwg/multus-cni)
* [KEP-4817 - Resource Claim Status with possible standardized network interface data](https://github.com/kubernetes/enhancements/issues/4817)
Good to mention https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/4817-resource-claim-device-status as well as the issue above.
Opaque Parameter validation:
* All properties in the CNI object must be valid.
* The CNI config must follow correct syntax and semantics.
I'd like to know how to validate the CNI object. For example, will cni-dra-driver check each plugin's parameters, somehow, as follows? (This is an invalid config sample and it has 3 syntax/semantics errors; do you see what is wrong?)
```yaml
- type: bridge
  master: ens2f0np0
  hairpinMode: yes-we-need!
  vlan: "110"
(snip)
```
Yeah, I understand that validating the user input is important, but in this spec I'd like to know what should be validated and what should be skipped. Currently the CNI community does not have a 'CNI config validator', and it is actually impossible to implement one because each CNI plugin introduces its own parameters and we cannot clarify which are 'required' or 'optional'. From the CNI semantics point of view, I'm wondering whether we can verify the CNI config or not...
My idea was to rely on the CNI project to provide the tools for that (they don't exist yet, but we can build them from this issue: containernetworking/cni#1132). I don't think the CNI-DRA-Driver should try to validate anything else in the CNI config by itself.
From the above description, it (validation) seems to be a 'must have'. How about adding some text to mention that validation is optional? At least, current CNI plugins do not have containernetworking/cni#1132, so you cannot use it for now.
Thank you for the suggestion, I added a note to mention that this feature must first be provided by the CNI project.
* A ResourceClaim utilizing the device class `cni.networking.k8s.io` must be claimed by one and only one pod.

Opaque Parameter validation:
* All properties in the CNI object must be valid.
What does cni-dra-driver validate in the CNI object? I mean, does cni-dra-driver check the CNI object syntax, or more? I'd like to know how cni-dra-driver checks each field (syntax/semantics).
I was referring to the other properties of the CNI object, like `IfName`; I clarified this with the latest commit.
```go
/*
Copyright The Kubernetes Authors.
```
Should we have `YEAR` in this line, as in https://github.com/kubernetes-sigs/controller-tools/blob/main/boilerplate.go.txt?
I copied it from gateway-api project: https://github.com/kubernetes-sigs/gateway-api/blob/main/hack/boilerplate/boilerplate.generatego.txt
@LionelJouin looking forward to reviewing this tomorrow!
/cc
… object Signed-off-by: Lionel Jouin <[email protected]>
```yaml
- opaque:
    driver: cni.dra.networking.k8s.io
    parameters:
      apiVersion: cni.dra.networking.k8s.io/v1alpha1
```
If this is going to be an opaque config of the drivers, I feel we should not use the reserved namespace `k8s.io` and instead use `x-k8s.io`: https://github.com/kubernetes/community/blob/master/sig-architecture/api-review-process.md#voluntary
Thank you, I also replaced the driver name and the DeviceClass name. I am not sure if it should apply there.
## Design

A new API with the Kind `CNI` will be introduced under the `cni.networking.k8s.io` Group. This API will be providing the CNI configuration along with necessary parameters and optional fields for invoking CNI plugins. The configuration data specified in the `opaque.parameters` will be reflected in the ResourceClaim status by Kubernetes. This reported status will provide the necessary details for executing CNI operations such as `ADD` on pod creation and `DEL` on pod deletion which will enable seamless lifecycle management.
it sounds more like CNIConfig
To me, CNI config is the "JSON-serialized plugin configuration object"; there are also the "Required environment parameters". That's why I called this `CNI`, but a better name is welcome.
https://www.cni.dev/docs/spec/#add-add-container-to-network-or-apply-modifications
As a reader of the Kind `CNIConfig`, I would understand what its purpose is. CNIConfig is a good suggestion.
+1
Thank you, I renamed the `CNI` struct to `CNIConfig`.
```go
// IfName represents the name of the network interface requested.
IfName string `json:"ifName"`
```
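For context, here is a rough Go sketch of what the opaque parameters object discussed in this thread might look like, assuming the renamed `CNIConfig` Kind and a `runtime.RawExtension` config field as mentioned above (the field set and tags are illustrative, not the final API):

```go
package v1alpha1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime"
)

// CNIConfig is an illustrative sketch of the opaque parameters object: it
// carries the requested interface name and the CNI plugin configuration.
type CNIConfig struct {
	metav1.TypeMeta `json:",inline"`

	// IfName represents the name of the network interface requested.
	IfName string `json:"ifName"`

	// Config holds the CNI plugin configuration. A runtime.RawExtension would
	// allow it to be supplied either as YAML or as the original JSON.
	Config runtime.RawExtension `json:"config"`
}
```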
I miss more detail about the entire workflow. Let's see: there is a driver that exposes Devices in a ResourceSlice with some Attributes and Quantities; what is the driver going to expose there?
Then there is a user that is going to choose to schedule a Pod using a Claim that references one of these Devices... and they can use a DeviceClass and/or ResourceClaim to pass configuration... which I assume is this object. How does the driver merge the configs from the Class and the Claim?
How does the driver execute against the exposed Device? This seems to create an entirely new device.
I still have to investigate this part in more detail and I wanted to try to address it in another design document. I am not sure the current ResourceSlice API supports it since, as you mentioned, a new interface is being created with this driver. Here are some ideas/examples I had in mind to address with the scheduler part (some have been mentioned some time ago during the multi-network meeting):
- Interface Availability: macvlan is requested, so the pod must be scheduled on a node where the master interface exists.
- IP Pool Exhaustion: The pod must be scheduled on a node where the IPAM can provide IPs (let's say a subnet is provided per node).
- CNI Availability: The CNI must exist on the node.
- Bandwidth Limits: A pod must be scheduled on a node where the master interface/interface provided fulfills the bandwidth requirement.
Otherwise, for the DeviceClass, I don't really have any configuration idea. This will be provided by this project (cni-dra-driver), right? Only the ResourceClaim will be provided by the user themselves.
Let's start from the use cases then and walk through them.

> Interface Availability: macvlan is requested, so the pod must be scheduled on a node where the master interface exists.

OK, so we need to expose "master interfaces" and users need to be able to "share" this master interface. I talked with @johnbelamaric the other day about this; the simplest solution as of today seems to be to add some kind of quota in the driver, say "interface X has 50 macvlan-type subinterfaces", and users can request them and the driver can track them.

> IP Pool Exhaustion: The pod must be scheduled on a node where the IPAM can provide IPs (let's say a subnet is provided per node).

Is this the primary IPAM or secondary IPAMs? We need to model this in terms of resources; is the resource an IPPool?

> CNI Availability: The CNI must exist on the node.

Can you be more specific on what CNI availability means? If, in the macvlan example, we mean a macvlan CNI plugin, then that is determined by the resources exposed: you expose macvlan subinterfaces in the driver only if you have that capability.

> Bandwidth Limits: A pod must be scheduled on a node where the master interface/interface provided fulfills the bandwidth requirement.

IIUIC this means that the driver needs to expose the total bandwidth of the uplink and each pod on the node must claim some bandwidth, no? This only works if the pods claim bandwidth so the driver can track it.
> OK, so we need to expose "master interfaces" and users need to be able to "share" this master interface. I talked with @johnbelamaric the other day about this; the simplest solution as of today seems to be to add some kind of quota in the driver, say "interface X has 50 macvlan-type subinterfaces", and users can request them and the driver can track them.

Thank you for checking this. This seems fine to me; I believe that would be the first step for the scheduling part.

> Is this the primary IPAM or secondary IPAMs? We need to model this in terms of resources; is the resource an IPPool?

I was referring to any IPAM (primary, secondary, it doesn't really matter). I am not sure this could be solved with this driver. Maybe the IPAMs could publish and model the resources by themselves somehow?

> Can you be more specific on what CNI availability means? If, in the macvlan example, we mean a macvlan CNI plugin, then that is determined by the resources exposed: you expose macvlan subinterfaces in the driver only if you have that capability.

I meant that the CNI binary itself exists on the node.

> IIUIC this means that the driver needs to expose the total bandwidth of the uplink and each pod on the node must claim some bandwidth, no? This only works if the pods claim bandwidth so the driver can track it.

Right, this is correct. Probably, if a pod uses specific bandwidth of an interface, then only pods requesting specific bandwidth could be scheduled on that node?
These are some ideas to solve with the scheduling part. There are also the VLAN/VxLAN IDs: a pod requesting a VLAN with ID 2000 cannot be scheduled if that VLAN is already in use on that node (with the same master interface).
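Purely as an illustration of the quota idea above (not something this document defines), a driver could publish a master interface with a subinterface capacity in a ResourceSlice along these lines; the API version, driver name, and attribute/capacity names are assumptions, and whether the allocator actually deducts such capacity is exactly the open question discussed below:

```yaml
apiVersion: resource.k8s.io/v1alpha3
kind: ResourceSlice
metadata:
  name: worker-1-cni
spec:
  driver: cni.dra.networking.x-k8s.io   # hypothetical driver name
  nodeName: worker-1
  pool:
    name: worker-1
    generation: 1
    resourceSliceCount: 1
  devices:
  - name: eth0
    basic:
      attributes:
        masterInterface:      # hypothetical attribute
          string: eth0
      capacity:
        subInterfaces: "50"   # "interface X has 50 macvlan type subinterfaces"
```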
Hi,
Please allow me to join this discussion. I'm also interested in this project, especially in supporting shared network devices. However, I don't think this discussion should block this PR.
I have checked the latest code of Kubernetes's DRA allocator. There is no reduction of the allocated amount from the device; they are used only for matching, in my understanding. However, I also think it would be useful for some other cases as well to have a feature to deduct from the available capacity in the attributes and allow the device to be shared.
Code pointers:
- allocator: https://github.com/kubernetes/kubernetes/blob/9709d36dda75ff82d1a8121edb23568a1d65ab82/staging/src/k8s.io/dynamic-resource-allocation/structured/allocator.go#L107
- deviceMatches: https://github.com/kubernetes/kubernetes/blob/9709d36dda75ff82d1a8121edb23568a1d65ab82/staging/src/k8s.io/dynamic-resource-allocation/cel/compile.go#L196
- allocatedDevices (insert without considering partially consumed quantity): https://github.com/kubernetes/kubernetes/blob/9709d36dda75ff82d1a8121edb23568a1d65ab82/test/integration/scheduler_perf/dra.go#L320
I did a PoC by allowing the latest claim status to be stored and merging the results, to have macvlan work with the as-is idea of a shared resource claim (as in the GPU example). The merged results I get from this example for now are:
```yaml
devices:
- conditions: null
  data:
    results:
    - cniVersion: 1.0.0
      interfaces:
      - mac: 66:19:a8:eb:09:ce
        name: net1
        sandbox: /var/run/netns/cni-4f0a3a8f-f68d-a71b-82de-7e4c166f4501
      ips:
      - address: 10.10.1.55/24
        gateway: 10.10.1.1
        interface: 0
    - cniVersion: 1.0.0
      interfaces:
      - mac: 96:44:e9:f3:fb:99
        name: net1
        sandbox: /var/run/netns/cni-acb0eed4-39d8-e525-f722-89c310f4b222
      ips:
      - address: 10.10.1.56/24
        gateway: 10.10.1.1
        interface: 0
  device: cni
  driver: poc.dra.networking
  networkData:
    hardwareAddress: 66:19:a8:eb:09:ce,96:44:e9:f3:fb:99
    interfaceName: net1
    ips:
    - 10.10.1.55/24
    - 10.10.1.56/24
  pool: kind-worker
reservedFor:
- name: demo-b-1
  resource: pods
  uid: 25fc07d1-e28a-4fb7-b99b-9f0c475c90d0
- name: demo-b-2
  resource: pods
  uid: 07b785a9-c5f6-4b70-b310-e10d32b1b219
```
No blocking at all. What I'm trying to say is that to design an API we need to understand the use cases that we want to support. Let's pick 1 or 2 to start with, the ones you are sure about, and then we walk through the use cases and describe how the user is expected to use them... it is like writing an e2e test before coding.
I think we have more or less clear how the claim config will look and what its output is, but we first need to define how the driver is going to expose things and how the user will claim them; those are fundamental pieces that, at least, I fail to see here.
@aojea Thank you for your explanation. I somehow see the point.
To have a picture of the result of the defined CNI configuration, how about adding an example result of the configured devices inside the Pod (from the macvlan+host-local config example)? Or having an illustration of how the driver passes the configuration from the claim config to the CNI.
About exposing the network device, I also see that the document still misses the definitions of the resource naming and attributes that will be published to the ResourceSlice. I would like to suggest also including the host device's common attributes such as PCI address, vendor, product, and IP address. I have exposed and used this information in the multi-nic CNI project's interface discovery function to cover configuring macvlan, ipvlan, and SR-IOV networks.
For the host-device CNI plugin, I think we are good with the current design. A user/admin can use a ResourceClaim or ResourceClaimTemplate for that as described in the doc. We may need to add an example of this use case.
For shared devices like macvlan and ipvlan, the pod-end device is created on the fly and I don't think we can have multiple ResourceClaims linked to the same device in the current version. In my understanding, we need a pre-defined ResourceClaim (per master) and have user pods linked to it.
Is the goal of this document to design something that works with the current version, or for the future of DRA?
This document defines the configuration API. My goal is to have a separate document for the allocation API (scheduling).
In that new document (allocation API) we could define each use case (virtual devices, macvlan, vlan, bandwidth...) and start with the most simple ones for now (1 or 2 would be good, as @aojea mentioned; macvlan and host-device seem fine to me). More advanced use cases might require DRA changes/updates/KEPs, but we will see at that time I guess.
apis/v1alpha1/types.go (outdated):

```go
// +genclient
// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object
```
do we need the generated code?
I am not fully sure yet; the example DRA drivers were just doing it.
I think those were based on the CRDs and the legacy/classic DRA, we may not need that complexity
Signed-off-by: Lionel Jouin <[email protected]>
Signed-off-by: Lionel Jouin <[email protected]>
## Design

A new API with the Kind `CNI` will be introduced under the `cni.networking.x-k8s.io` Group. This API will be providing the CNI configuration along with necessary parameters and optional fields for invoking CNI plugins. The configuration data specified in the `opaque.parameters` will be reflected in the ResourceClaim status by Kubernetes. This reported status will provide the necessary details for executing CNI operations such as `ADD` on pod creation and `DEL` on pod deletion which will enable seamless lifecycle management.
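To make the mechanism above concrete, here is a rough, hypothetical sketch of a ResourceClaim carrying the CNI configuration in `opaque.parameters`; the driver/DeviceClass names follow the snippets in this PR, while the exact DRA field layout and the macvlan config values are illustrative assumptions:

```yaml
apiVersion: resource.k8s.io/v1alpha3
kind: ResourceClaim
metadata:
  name: macvlan-eth0-attachment
spec:
  devices:
    requests:
    - name: macvlan-eth0
      deviceClassName: cni.networking.x-k8s.io   # hypothetical DeviceClass name
    config:
    - requests:
      - macvlan-eth0
      opaque:
        driver: cni.dra.networking.x-k8s.io
        parameters:
          apiVersion: cni.networking.x-k8s.io/v1alpha1
          kind: CNIConfig
          ifName: net1
          config:                # CNI config expressed in YAML
            cniVersion: 1.0.0
            type: macvlan
            master: eth0
            mode: bridge
            ipam:
              type: host-local
              subnet: 10.10.1.0/24
```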
Small thing I figured I would ask while I'm here: is/how-is the version of the underlying CNI itself being handled?
I believe since it'll have basically a yaml representation of a CNI config JSON, it'll have the version information there for when it's exec'd. (Definitely correct me if I'm wrong there Lionel)
As each ResourceClaim can be claimed by at most one pod, a ResourceClaimTemplate can be used to link the same ResourceClaim configuration to multiple pods.

To support scenarios where multiple network interfaces are required, a ResourceClaim can contain one or multiple requests and a pod can reference one or multiple ResourceClaims.
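As a hypothetical illustration of the paragraph above (object names are invented, and the pod-level field layout may vary with the Kubernetes version), a pod could reference a ResourceClaimTemplate like this:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod
spec:
  resourceClaims:
  - name: macvlan-eth0
    resourceClaimTemplateName: macvlan-eth0-template  # each pod gets its own ResourceClaim generated from the template
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9
```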
What do you think about adding to this section something about the roles of users which are expected to use or be affected by this API and its consequences? Could it be valuable to explicitly spell out who the users/roles are?
You are referring to Cluster Operators, Application Developers... right?
Maybe it could be useful, yes. Maybe we should have another document about that? Or maybe this should be part of the DRA documentation instead?
From what I understand, ResourceClaims are deployed mainly by developers, and drivers are deployed by operators.
The design aims to support all possible functionalities the Container Network Interface (CNI) project offers while ensuring compliance with its specifications.

## Design
Since this design introduces a whole new API, do you think it would be valuable to add a `## Motivation` section here for future readers to better understand the "why"?
Thank you for the suggestion, I added the motivation section.
* Add Motivation section for the device configuration API
* Rename CNI struct to CNIConfig in the API
* Remove generated code and codegen
* Add note for the CNI config validation

Signed-off-by: Lionel Jouin <[email protected]>
The document to review is `docs/design/device-configuration-api.md`. I took the `hack/update-codegen.sh` code from the Gateway-API project and adapted it to this project. This is the initial API; it will be extended later. For now we can use it to invoke the CNI ADD and DEL.