From f8840a6be39acaa7257a762b5bd34d01b62134ce Mon Sep 17 00:00:00 2001
From: kubevirt-bot
Node balancing with Descheduler
This annotation allows the descheduler to evict the VM's pod, which the scheduler can then
place on a different node. A VirtualMachine will never restart or re-create a
VirtualMachineInstance until the current instance of the VirtualMachineInstance is deleted from the cluster.
When the VM rollout strategy is set to LiveUpdate
, changes to a VM's
+node selector or affinities will dynamically propagate to the VMI (unless the RestartRequired
condition is set).
+Changes to tolerations will not dynamically propagate, and will trigger a RestartRequired
condition if changed on a
+running VM.
Modifications of the node selector / affinities will only take effect on next migration, the change +alone will not trigger one.
diff --git a/search/search_index.json b/search/search_index.json index 6ac1512e..6d9f0760 100644 --- a/search/search_index.json +++ b/search/search_index.json @@ -1 +1 @@ -{"config":{"lang":["en"],"separator":"[\\s\\-,:!=\\[\\]\\(\\)\"/]+|\\.(?!\\d)","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"Welcome","text":"The KubeVirt User Guide is divided into the following sections:
KubeVirt on Killercoda: https://killercoda.com/kubevirt
KubeVirt on Minikube: https://kubevirt.io/quickstart_minikube/
KubeVirt on Kind: https://kubevirt.io/quickstart_kind/
KubeVirt on cloud providers: https://kubevirt.io/quickstart_cloud/
Use KubeVirt
Experiment with Containerized Data Importer (CDI)
Experiment with KubeVirt Upgrades
Live Migration
File a bug: https://github.com/kubevirt/kubevirt/issues
Mailing list: https://groups.google.com/forum/#!forum/kubevirt-dev
Slack: https://kubernetes.slack.com/messages/virtualization
Start contributing: Contributing
API Reference: http://kubevirt.io/api-reference/
Check our privacy policy at: https://kubevirt.io/privacy/
We use the https://netlify.com Open Source Plan for rendering pull requests to the documentation repository
KubeVirt is built using a service oriented architecture and a choreography pattern.
"},{"location":"architecture/#stack","title":"Stack","text":" +---------------------+\n | KubeVirt |\n~~+---------------------+~~\n | Orchestration (K8s) |\n +---------------------+\n | Scheduling (K8s) |\n +---------------------+\n | Container Runtime |\n~~+---------------------+~~\n | Operating System |\n +---------------------+\n | Virtual(kvm) |\n~~+---------------------+~~\n | Physical |\n +---------------------+\n
Users who need virtualization services talk to the Virtualization API (see below), which in turn talks to the Kubernetes cluster to schedule the requested Virtual Machine Instances (VMIs). Scheduling, networking, and storage are all delegated to Kubernetes, while KubeVirt provides the virtualization functionality.
"},{"location":"architecture/#additional-services","title":"Additional Services","text":"KubeVirt provides additional functionality to your Kubernetes cluster, to perform virtual machine management
Recall how Kubernetes handles Pods: a Pod is created by posting a Pod specification to the Kubernetes API Server. This specification is then transformed into an object inside the API Server. The object has a specific type, or kind, as it is called in the specification; a Pod is of the type Pod. Controllers within Kubernetes know how to handle these Pod objects. Once a new Pod object is seen, those controllers perform the necessary actions to bring the Pod alive and to match the required state.
This same mechanism is used by KubeVirt. Thus KubeVirt delivers three things to provide the new functionality:
Once all three steps have been completed, you are able to
virt-handler takes care of a host, alongside the kubelet, to launch the VMI and configure it until it matches the required state.
One final note: both controllers and daemons run as Pods (or similar) on top of the Kubernetes cluster and are not installed alongside it. The type is, as said before, even defined inside the Kubernetes API server. This allows users to speak to Kubernetes, but modify VMIs.
The following diagram illustrates how the additional controllers and daemons communicate with Kubernetes and where the additional types are stored:
And a simplified version:
"},{"location":"architecture/#application-layout","title":"Application Layout","text":"VirtualMachineInstance (VMI) is the custom resource that represents the basic ephemeral building block of an instance. In a lot of cases this object won't be created directly by the user but by a high level resource. High level resources for VMI can be:
KubeVirt is deployed on top of a Kubernetes cluster. This means that you can continue to run your Kubernetes-native workloads next to the VMIs managed through KubeVirt.
Furthermore: if you can run native workloads, and you have KubeVirt installed, you should be able to run VM-based workloads, too. For example, Application Operators should not require additional permissions to use cluster features for VMs, compared to using that feature with a plain Pod.
Security-wise, installing and using KubeVirt must not grant users any permission they do not already have regarding native workloads. For example, a non-privileged Application Operator must never gain access to a privileged Pod by using a KubeVirt feature.
"},{"location":"architecture/#the-razor","title":"The Razor","text":"We love virtual machines, think that they are very important and work hard to make them easy to use in Kubernetes. But even more than VMs, we love good design and modular, reusable components. Quite frequently, we face a dilemma: should we solve a problem in KubeVirt in a way that is best optimized for VMs, or should we take a longer path and introduce the solution to Pod-based workloads too?
To decide these dilemmas we came up with the KubeVirt Razor: \"If something is useful for Pods, we should not implement it only for VMs\".
For example, we debated how we should connect VMs to external network resources. The quickest way seemed to be introducing KubeVirt-specific code that attaches a VM to a host bridge. However, we chose the longer path of integrating with Multus and CNI and improving them.
"},{"location":"architecture/#virtualmachine","title":"VirtualMachine","text":"A VirtualMachine
provides additional management capabilities to a VirtualMachineInstance inside the cluster. That includes:
API stability
Start/stop/restart capabilities on the controller level
Offline configuration change with propagation on VirtualMachineInstance recreation
Ensure that the VirtualMachineInstance is running if it should be running
It focuses on a 1:1 relationship between the controller instance and a virtual machine instance. In many ways it is very similar to a StatefulSet with spec.replicas set to 1.
A VirtualMachine makes sure that a VirtualMachineInstance object with an identical name is present in the cluster if spec.running is set to true. It also makes sure that the VirtualMachineInstance is removed from the cluster if spec.running is set to false.
The spec.runStrategy field can also be used to control the state of the associated VirtualMachineInstance object. To avoid confusing and contradictory states, spec.running and spec.runStrategy are mutually exclusive.
An extended explanation of spec.runStrategy vs spec.running can be found in Run Strategies.
After creating a VirtualMachine it can be switched on or off like this:
# Start the virtual machine:\nvirtctl start vm\n\n# Stop the virtual machine:\nvirtctl stop vm\n
kubectl can be used too:
# Start the virtual machine:\nkubectl patch virtualmachine vm --type merge -p \\\n '{\"spec\":{\"running\":true}}'\n\n# Stop the virtual machine:\nkubectl patch virtualmachine vm --type merge -p \\\n '{\"spec\":{\"running\":false}}'\n
Find more details about a VM's life-cycle in the relevant section
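The patch bodies passed to kubectl above can also be generated programmatically. The following Python sketch is illustrative only: the helper names running_patch and check_exclusive are hypothetical and are not part of KubeVirt or virtctl. It builds the JSON merge patch that toggles spec.running and models the rule that spec.running and spec.runStrategy must not be set together.

```python
import json


def running_patch(running: bool) -> str:
    """Return the JSON merge patch that sets spec.running (hypothetical helper)."""
    return json.dumps({"spec": {"running": running}}, separators=(",", ":"))


def check_exclusive(spec: dict) -> None:
    """Raise ValueError if both mutually exclusive fields are set.

    This only models the rule stated in the text; it is an assumption
    about how a client-side check could look, not KubeVirt's own code.
    """
    if "running" in spec and "runStrategy" in spec:
        raise ValueError("spec.running and spec.runStrategy are mutually exclusive")


if __name__ == "__main__":
    # The printed body can be passed to:
    #   kubectl patch virtualmachine vm --type merge -p '<patch>'
    print(running_patch(True))
```

The compact separators keep the output identical to the hand-written patch strings used in the kubectl examples.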
"},{"location":"architecture/#controller-status","title":"Controller status","text":"Once a VirtualMachineInstance is created, its state will be tracked via status.created
and status.ready fields of the VirtualMachine. If a VirtualMachineInstance exists in the cluster, status.created will equal true. If the VirtualMachineInstance is also ready, status.ready will equal true too.
If a VirtualMachineInstance reaches a final state but spec.running equals true, the VirtualMachine controller will set status.ready to false and re-create the VirtualMachineInstance.
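The status rules above can be summarized in a small model. This Python sketch is not KubeVirt code: the VMIState class, the vm_status function, and the recreate key are hypothetical names used only to illustrate how status.created and status.ready follow from the presence and state of the VirtualMachineInstance.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VMIState:
    """Hypothetical, simplified view of a VirtualMachineInstance."""
    ready: bool
    final: bool = False  # True once the instance has reached a final state


def vm_status(spec_running: bool, vmi: Optional[VMIState]) -> dict:
    """Derive the VirtualMachine status fields described in the text."""
    created = vmi is not None
    ready = created and vmi.ready
    recreate = False
    if created and vmi.final and spec_running:
        # A finished instance of a VM that should be running is reported
        # as not ready, and the controller re-creates it.
        ready = False
        recreate = True
    return {"created": created, "ready": ready, "recreate": recreate}
```

For example, vm_status(True, VMIState(ready=True, final=True)) reports created as true, ready as false, and flags the instance for re-creation.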
Additionally, the status.printableStatus field provides high-level summary information about the state of the VirtualMachine. This information is also displayed when listing VirtualMachines using the CLI:
$ kubectl get virtualmachines\nNAME AGE STATUS VOLUME\nvm1 4m Running\nvm2 11s Stopped\n
Here's the list of states currently supported and their meanings. Note that states may be added/removed in future releases, so caution should be used if consumed by automated programs.
A VirtualMachineInstance restart can be triggered by deleting the VirtualMachineInstance. This will also propagate configuration changes from the template in the VirtualMachine:
# Restart the virtual machine (you delete the instance!):\nkubectl delete virtualmachineinstance vm\n
To restart a VirtualMachine named vm using virtctl:
$ virtctl restart vm\n
This performs a normal restart of the VirtualMachineInstance and reschedules it on a new virt-launcher Pod.
To force restart a VirtualMachine named vm using virtctl:
$ virtctl restart vm --force --grace-period=0\n
This tries to perform a normal restart and also deletes the virt-launcher Pod of the VirtualMachineInstance, setting GracePeriodSeconds to the number of seconds passed in the command.
Currently, only setting grace-period=0 is supported.
Note
Force restart can cause data corruption, and should be used in cases of kernel panic or VirtualMachine being unresponsive to normal restarts.
"},{"location":"architecture/#fencing-considerations","title":"Fencing considerations","text":"A VirtualMachine will never restart or re-create a VirtualMachineInstance until the current instance of the VirtualMachineInstance is deleted from the cluster.
"},{"location":"architecture/#exposing-as-a-service","title":"Exposing as a Service","text":"A VirtualMachine can be exposed as a service. The actual service will be available once the VirtualMachineInstance starts without additional interaction.
For example, exposing SSH port (22) as a ClusterIP service using virtctl after the VirtualMachine was created, but before it started:
$ virtctl expose virtualmachine vmi-ephemeral --name vmiservice --port 27017 --target-port 22\n
All service exposure options that apply to a VirtualMachineInstance apply to a VirtualMachine.
See Service Objects for more details.
"},{"location":"architecture/#when-to-use-a-virtualmachine","title":"When to use a VirtualMachine","text":""},{"location":"architecture/#when-api-stability-is-required-between-restarts","title":"When API stability is required between restarts","text":"A VirtualMachine
makes sure that VirtualMachineInstance API configurations are consistent between restarts. A classic example is licenses that are bound to the firmware UUID of a virtual machine. The VirtualMachine makes sure that the UUID always stays the same without the user having to take care of it.
One of the main benefits is that a user can still make use of defaulting logic, although a stable API is needed.
"},{"location":"architecture/#when-config-updates-should-be-picked-up-on-the-next-restart","title":"When config updates should be picked up on the next restart","text":"Use a VirtualMachine if the VirtualMachineInstance configuration should be modifiable inside the cluster and the changes should be picked up on the next VirtualMachineInstance restart. This means that no hotplug is involved.
"},{"location":"architecture/#when-you-want-to-let-the-cluster-manage-your-individual-virtualmachineinstance","title":"When you want to let the cluster manage your individual VirtualMachineInstance","text":"Kubernetes as a declarative system can help you to manage the VirtualMachineInstance. You tell it that you want this VirtualMachineInstance with your application running, and the VirtualMachine will try to make sure it stays running.
Note
The current belief is that if it is defined that the VirtualMachineInstance should be running, it should be running. This is different from many classical virtualization platforms, where VMs stay down if they were switched off. Restart policies may be added if needed. Please provide your use-case if you need this!
"},{"location":"architecture/#example","title":"Example","text":"apiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n labels:\n kubevirt.io/vm: vm-cirros\n name: vm-cirros\nspec:\n running: false\n template:\n metadata:\n labels:\n kubevirt.io/vm: vm-cirros\n spec:\n domain:\n devices:\n disks:\n - disk:\n bus: virtio\n name: containerdisk\n - disk:\n bus: virtio\n name: cloudinitdisk\n machine:\n type: \"\"\n resources:\n requests:\n memory: 64M\n terminationGracePeriodSeconds: 0\n volumes:\n - name: containerdisk\n containerDisk:\n image: kubevirt/cirros-container-disk-demo:latest\n - cloudInitNoCloud:\n userDataBase64: IyEvYmluL3NoCgplY2hvICdwcmludGVkIGZyb20gY2xvdWQtaW5pdCB1c2VyZGF0YScK\n name: cloudinitdisk\n
Saving this manifest into vm.yaml and submitting it to Kubernetes will create the controller instance:
$ kubectl create -f vm.yaml\nvirtualmachine \"vm-cirros\" created\n
Since spec.running is set to false, no VMI will be created:
$ kubectl get vmis\nNo resources found.\n
Let's start the VirtualMachine:
$ virtctl start vm-cirros\n
As expected, a VirtualMachineInstance called vm-cirros was created:
$ kubectl describe vm vm-cirros\nName: vm-cirros\nNamespace: default\nLabels: kubevirt.io/vm=vm-cirros\nAnnotations: <none>\nAPI Version: kubevirt.io/v1\nKind: VirtualMachine\nMetadata:\n Cluster Name:\n Creation Timestamp: 2018-04-30T09:25:08Z\n Generation: 0\n Resource Version: 6418\n Self Link: /apis/kubevirt.io/v1/namespaces/default/virtualmachines/vm-cirros\n UID: 60043358-4c58-11e8-8653-525500d15501\nSpec:\n Running: true\n Template:\n Metadata:\n Creation Timestamp: <nil>\n Labels:\n Kubevirt . Io / Ovmi: vm-cirros\n Spec:\n Domain:\n Devices:\n Disks:\n Disk:\n Bus: virtio\n Name: containerdisk\n Volume Name: containerdisk\n Disk:\n Bus: virtio\n Name: cloudinitdisk\n Volume Name: cloudinitdisk\n Machine:\n Type:\n Resources:\n Requests:\n Memory: 64M\n Termination Grace Period Seconds: 0\n Volumes:\n Name: containerdisk\n Registry Disk:\n Image: kubevirt/cirros-registry-disk-demo:latest\n Cloud Init No Cloud:\n User Data Base 64: IyEvYmluL3NoCgplY2hvICdwcmludGVkIGZyb20gY2xvdWQtaW5pdCB1c2VyZGF0YScK\n Name: cloudinitdisk\nStatus:\n Created: true\n Ready: true\nEvents:\n Type Reason Age From Message\n ---- ------ ---- ---- -------\n Normal SuccessfulCreate 15s virtualmachine-controller Created virtual machine: vm-cirros\n
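The userDataBase64 value in the manifest above is a Base64-encoded cloud-init script. As a quick check, the following Python snippet decodes it using only the standard library; the variable names are chosen for illustration.

```python
import base64

# Value copied from the userDataBase64 field of the example manifest.
USER_DATA_B64 = "IyEvYmluL3NoCgplY2hvICdwcmludGVkIGZyb20gY2xvdWQtaW5pdCB1c2VyZGF0YScK"

# Decode the Base64 payload back into the original cloud-init user data.
user_data = base64.b64decode(USER_DATA_B64).decode("utf-8")
print(user_data)
```

The decoded text is a short shell script that echoes a message, which makes it easy to confirm inside the guest that cloud-init processed the user data.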
"},{"location":"architecture/#kubectl-commandline-interactions","title":"Kubectl commandline interactions","text":"Whenever you want to manipulate the VirtualMachine through the commandline you can use the kubectl command. The following are examples demonstrating how to do it.
# Define a virtual machine:\n kubectl create -f vm.yaml\n\n # Start the virtual machine:\n kubectl patch virtualmachine vm --type merge -p \\\n '{\"spec\":{\"running\":true}}'\n\n # Look at virtual machine status and associated events:\n kubectl describe virtualmachine vm\n\n # Look at the now created virtual machine instance status and associated events:\n kubectl describe virtualmachineinstance vm\n\n # Stop the virtual machine instance:\n kubectl patch virtualmachine vm --type merge -p \\\n '{\"spec\":{\"running\":false}}'\n\n # Restart the virtual machine (you delete the instance!):\n kubectl delete virtualmachineinstance vm\n\n # Implicit cascade delete (first deletes the virtual machine and then the virtual machine instance)\n kubectl delete virtualmachine vm\n\n # Explicit cascade delete (first deletes the virtual machine and then the virtual machine instance)\n kubectl delete virtualmachine vm --cascade=true\n\n # Orphan delete (The running virtual machine is only detached, not deleted)\n # Recreating the virtual machine would lead to the adoption of the virtual machine instance\n kubectl delete virtualmachine vm --cascade=false\n
"},{"location":"contributing/","title":"Contributing","text":"Welcome!! And thank you for taking the first step to contributing to the KubeVirt project. On this page you should be able to find all the information required to get started on your contribution journey, as well as information on how to become a community member and grow into roles of responsibility.
If you think something might be missing from this page, please help us by raising a bug!
"},{"location":"contributing/#prerequisites","title":"Prerequisites","text":"Reviewing the following will prepare you for contributing:
For code contributors:
The following will help you decide where to start:
good-first-issue for issues that make good entry points.
You should familiarize yourself with the following documents, which are critical to being a member of the community:
Killercoda provides an interactive environment for exploring KubeVirt scenarios:
Guides for deploying KubeVirt with different Kubernetes tools:
KubeVirt on minikube
KubeVirt on kind
KubeVirt on cloud providers
Released on: Tue Mar 05 2024
KubeVirt v1.2 is built for Kubernetes v1.29 and additionally supported for the previous two versions. See the KubeVirt support matrix for more information.
"},{"location":"release_notes/#api-change","title":"API change","text":"Status.GuestOSInfo.Version
vmRolloutStrategy
setting to define whether changes to VMs should either be always staged or live-updated when possible.kubevirt.io:default
clusterRole to get,list kubevirtsMachine
Released on: Tue Nov 07 2023
"},{"location":"release_notes/#api-change_1","title":"API change","text":"common-instancetypes
resources can now be deployed by virt-operator
using the CommonInstancetypesDeploymentGate
feature gate.spec.config.machineType
in KubeVirt CR.ControllerRevisions
containing instancetype.kubevirt.io
CRDs
are now decorated with labels detailing specific metadata of the underlying stashed objectnodeSelector
and schedulerName
fields have been added to VirtualMachineInstancetype spec.virtctl create clone
marshalling and replacement of kubectl
with kubectl virt
AutoResourceLimits
FeatureGate is enabledkubevirt.io/schedulable
label when finding lowest TSC frequency on the clusterquay.io/kubevirt/network-slirp-binding:20230830_638c60fc8
. On next release (v1.2.0) no default image will be set and registering an image would be mandatory.list
and watch
verbs from virt-controller's RBACinstancetype.kubevirt.io:view
ClusterRole
has been introduced that can be bound to users via a ClusterRoleBinding
to provide read only access to the cluster scoped VirtualMachineCluster{Instancetype,Preference}
resources.kubevirt_vmi_*_usage_seconds
from Gauge to Counterkubevirt_vmi_vcpu_delay_seconds_total
reporting amount of seconds VM spent in waiting in the queue instead of running.kubevirt_vmi_cpu_affinity
and use sum as valuekubevirt_vmi_phase_count
not being created
Released on: Thu Jul 11 17:39:42 2023 +0000
"},{"location":"release_notes/#api-changes","title":"API changes","text":"podConfigDone
field in favor of a new source option in infoSource
.Name
of a {Instancetype,Preference}Matcher
without also updating the RevisionName
are now rejected.dedicatedCPUPlacement
attribute is once again supported within the VirtualMachineInstancetype
and VirtualMachineClusterInstancetype
CRDs after a recent bugfix improved VirtualMachine
validations, ensuring defaults are applied before any attempt to validate.RUNBOOK_URL_TEMPLATE
for the runbooks URL template
Released on: Wed Mar 1 16:49:27 2023 +0000
dedicatedCPUPlacement
attribute is once again supported within the VirtualMachineInstancetype
and VirtualMachineClusterInstancetype
CRDs after a recent bugfix improved VirtualMachine
validations, ensuring defaults are applied before any attempt to validate./dev/vhost-vsock
explicitly to ensure that the right vsock module is loadedinferFromVolume
now uses labels instead of annotations to lookup default instance type and preference details from a referenced Volume
. This has changed in order to provide users with a way of looking up suitably decorated resources through these labels before pointing to them within the VirtualMachine
.inferFromVolume
attributes have been introduced to the {Instancetype,Preference}Matchers
of a VirtualMachine
. When provided the Volume
referenced by the attribute is checked for the following annotations with which to populate the {Instancetype,Preference}Matchers
:kubevirt-prometheus-metrics
now sets ClusterIP
to None
to make it a headless service.Timer
is now correctly omitted from Clock
fixing bug #8844.virtqemud
daemon instead of libvirtd
Released on: Thu Feb 11 00:08:46 2023 +0000
Released on: Thu Oct 13 00:24:51 2022 +0000
tlsConfiguration
to Kubevirt ConfigurationDockerSELinuxMCSWorkaround
feature gate before upgrading
Released on: Mon Sep 12 14:00:44 2022 +0000
AutoattachInputDevice
has been added to Devices
allowing an Input
device to be automatically attached to a VirtualMachine
on start up. PreferredAutoattachInputDevice
has also been added to DevicePreferences
allowing users to control this behaviour with a set of preferences.
Released on: Thu Aug 18 20:10:29 2022 +0000
VirtualMachine{Flavor,ClusterFlavor}
are renamed to instancetype and VirtualMachine{Instancetype,ClusterInstancetype}
.virtctl expose
ip-family
parameter to be empty value instead of IPv4.VirtualMachine
defines any CPU
or Memory
resource requests.
Released on: Thu Jul 14 16:33:25 2022 +0000
ControllerRevisions
of any VirtualMachineFlavorSpec
or VirtualMachinePreferenceSpec
are stored during the initial start of a VirtualMachine
and used for subsequent restarts ensuring changes to the original VirtualMachineFlavor
or VirtualMachinePreference
do not modify the VirtualMachine
and the VirtualMachineInstance
it creates.make generate
to fail when API code comments contain backticks. (#7844, @janeczku)VirtualMachineInstance
at runtime.
Released on: Wed Jun 8 14:15:43 2022 +0000
nil
values) of Address
and Driver
fields in XML will be omitted.virtualmachines/migrate
subresource to admin/edit usersDisk
or Filesystem
for each Volume
associated with a VirtualMachine
has been removed. Any Volumes
without a Disk
or Filesystem
defined will have a Disk
defined within the VirtualMachineInstance
at runtime.
Released on: Tue May 17 14:55:54 2022 +0000
Released on: Mon May 9 14:02:20 2022 +0000
virtctl scp
to ease copying files from and to VMs and VMIsLiveMigrate
as a workload-update strategy if the LiveMigration
feature gate is not enabled.virtctl ssh
Released on: Fri Apr 8 16:17:56 2022 +0000
KubeVirtComponentExceedsRequestedMemory
alert complaining about many-to-many matching not allowed.--address [ip_address]
when using virtctl vnc
rather than only using 127.0.0.1kubectl logs <vmi-pod>
and kubectl exec <vmi-pod>
.
Released on: Tue Mar 8 21:06:59 2022 +0000
Released on: Wed Feb 9 18:01:08 2022 +0000
time.Ticker
in agent poller and fix default values for qemu-*-interval
flagsmigrate_cancel
was added to virtctl. It cancels an active VM migration.
Released on: Tue Jan 11 17:27:09 2022 +0000
virtctl
exposed services IPFamilyPolicyType
default to IPFamilyPolicyPreferDualStack
make
and make test
Released on: Wed Dec 15 15:11:55 2021 +0000
Released on: Mon Dec 6 18:26:51 2021 +0000
Released on: Thu Nov 11 15:52:59 2021 +0000
Released on: Tue Oct 19 15:41:10 2021 +0000
Released on: Fri Oct 8 21:12:33 2021 +0000
ssh
command to virtctl
that can be used to open SSH sessions to VMs/VMIs.
Released on: Tue Oct 19 15:39:42 2021 +0000
Released on: Wed Sep 8 13:56:47 2021 +0000
Released on: Tue Oct 19 15:38:22 2021 +0000
Released on: Thu Oct 7 12:55:34 2021 +0000
Released on: Thu Aug 12 12:28:02 2021 +0000
Released on: Mon Aug 9 14:20:14 2021 +0000
/portforward
subresource to VirtualMachine
and VirtualMachineInstance
that can tunnel TCP traffic through the API Server using a websocket stream.guestfs
to virtctl--force --gracePeriod 0
Released on: Tue Oct 19 15:36:32 2021 +0000
Released on: Fri Jul 9 15:46:22 2021 +0000
spec.migrations.disableTLS
to the KubeVirt CR to allow disabling encrypted migrations. They stay secure by default.LiveMigrate
and request the invtsc
cpuflag are now live-migrateablemain
for kubevirt/kubevirt
repositoryNotReady
after migration when Istio is used.virtctl start --paused
Released on: Tue Oct 19 15:34:37 2021 +0000
Released on: Thu Jun 10 01:31:52 2021 +0000
Released on: Tue Jun 8 12:09:49 2021 +0000
Released on: Tue Oct 19 15:31:59 2021 +0000
Released on: Thu Aug 12 16:35:43 2021 +0000
--force --gracePeriod 0
Released on: Wed Jul 28 12:13:19 2021 -0400
"},{"location":"release_notes/#v0411","title":"v0.41.1","text":"Released on: Wed Jul 28 12:08:42 2021 -0400
"},{"location":"release_notes/#v0410","title":"v0.41.0","text":"Released on: Wed May 12 14:30:49 2021 +0000
docker save
and docker push
issues with released kubevirt imagesvmIPv6NetworkCIDR
under NetworkSource.pod
to support custom IPv6 CIDR for the vm network when using masquerade binding.
Released on: Tue Oct 19 13:33:33 2021 +0000
docker save
issues with kubevirt images
Released on: Mon Apr 19 12:25:41 2021 +0000
permittedHostDevices
section will now remove all user-defined host device plugins.
Released on: Tue Oct 19 13:29:33 2021 +0000
docker save
issues with kubevirt images
Released on: Tue Apr 13 12:10:13 2021 +0000
"},{"location":"release_notes/#v0390","title":"v0.39.0","text":"Released on: Wed Mar 10 14:51:58 2021 +0000
CHECK
RPC call, will not cause VMI pods to enter a failed state.
Released on: Tue Oct 19 13:24:57 2021 +0000
Released on: Mon Feb 8 19:00:24 2021 +0000
Released on: Mon Feb 8 13:15:32 2021 +0000
Released on: Wed Jan 27 17:49:36 2021 +0000
Released on: Thu Jan 21 16:20:52 2021 +0000
Released on: Mon Jan 18 17:57:03 2021 +0000
Released on: Mon Feb 22 10:20:40 2021 -0500
"},{"location":"release_notes/#v0361","title":"v0.36.1","text":"Released on: Tue Jan 19 12:30:33 2021 +0100
"},{"location":"release_notes/#v0360","title":"v0.36.0","text":"Released on: Wed Dec 16 14:30:37 2020 +0000
domain
label removed from metric kubevirt_vmi_memory_unused_bytes
Released on: Mon Nov 9 13:08:27 2020 +0000
ip-family
to the virtctl expose
command.virt-launcher
Pods to speed up Pod instantiation and decrease Kubelet load in namespaces with many services.kubectl explain
for Kubevirt resources.
Released on: Tue Nov 17 08:13:22 2020 -0500
"},{"location":"release_notes/#v0341","title":"v0.34.1","text":"Released on: Mon Nov 16 08:22:56 2020 -0500
"},{"location":"release_notes/#v0340","title":"v0.34.0","text":"Released on: Wed Oct 7 13:59:50 2020 +0300
bootOrder
will no longer be candidates for boot when using the BIOS bootloader, as documentedconfiguration
key. The usage of the kubevirt-config configMap will be deprecated in the future.customizeComponents
to the kubevirt api
Released on: Tue Sep 15 14:46:00 2020 +0000
Released on: Tue Aug 11 19:21:56 2020 +0000
Released on: Thu Jul 9 16:08:18 2020 +0300
Released on: Mon Oct 26 11:57:21 2020 -0400
"},{"location":"release_notes/#v0306","title":"v0.30.6","text":"Released on: Wed Aug 12 10:55:31 2020 +0200
"},{"location":"release_notes/#v0305","title":"v0.30.5","text":"Released on: Fri Jul 17 05:26:37 2020 -0400
"},{"location":"release_notes/#v0304","title":"v0.30.4","text":"Released on: Fri Jul 10 07:44:00 2020 -0400
"},{"location":"release_notes/#v0303","title":"v0.30.3","text":"Released on: Tue Jun 30 17:39:42 2020 -0400
"},{"location":"release_notes/#v0302","title":"v0.30.2","text":"Released on: Thu Jun 25 17:05:59 2020 -0400
"},{"location":"release_notes/#v0301","title":"v0.30.1","text":"Released on: Tue Jun 16 13:10:17 2020 -0400
"},{"location":"release_notes/#v0300","title":"v0.30.0","text":"Released on: Fri Jun 5 12:19:57 2020 +0200
Released on: Mon May 25 21:15:30 2020 +0200
"},{"location":"release_notes/#v0291","title":"v0.29.1","text":"Released on: Tue May 19 10:03:27 2020 +0200
"},{"location":"release_notes/#v0290","title":"v0.29.0","text":"Released on: Wed May 6 15:01:57 2020 +0200
Released on: Thu Apr 9 23:01:29 2020 +0200
Released on: Fri Mar 6 22:40:34 2020 +0100
Released on: Tue Apr 14 15:07:04 2020 -0400
"},{"location":"release_notes/#v0264","title":"v0.26.4","text":"Released on: Mon Mar 30 03:43:48 2020 +0200
"},{"location":"release_notes/#v0263","title":"v0.26.3","text":"Released on: Tue Mar 10 08:57:27 2020 -0400
"},{"location":"release_notes/#v0262","title":"v0.26.2","text":"Released on: Tue Mar 3 12:31:56 2020 -0500
"},{"location":"release_notes/#v0261","title":"v0.26.1","text":"Released on: Fri Feb 14 20:42:46 2020 +0100
"},{"location":"release_notes/#v0260","title":"v0.26.0","text":"Released on: Fri Feb 7 09:40:07 2020 +0100
Released on: Mon Jan 13 20:37:15 2020 +0100
Released on: Tue Dec 3 15:34:34 2019 +0100
Released on: Tue Jan 21 13:17:20 2020 -0500
"},{"location":"release_notes/#v0232","title":"v0.23.2","text":"Released on: Fri Jan 10 10:36:36 2020 -0500
"},{"location":"release_notes/#v0231","title":"v0.23.1","text":"Released on: Thu Nov 28 09:36:41 2019 +0100
"},{"location":"release_notes/#v0230","title":"v0.23.0","text":"Released on: Mon Nov 4 16:42:54 2019 +0100
Released on: Thu Oct 10 18:55:08 2019 +0200
Released on: Mon Sep 9 09:59:08 2019 +0200
virtctl migrate
Released on: Thu Oct 3 12:03:40 2019 +0200
"},{"location":"release_notes/#v0207","title":"v0.20.7","text":"Released on: Fri Sep 27 15:21:56 2019 +0200
"},{"location":"release_notes/#v0206","title":"v0.20.6","text":"Released on: Wed Sep 11 06:09:47 2019 -0400
"},{"location":"release_notes/#v0205","title":"v0.20.5","text":"Released on: Thu Sep 5 17:48:59 2019 +0200
"},{"location":"release_notes/#v0204","title":"v0.20.4","text":"Released on: Mon Sep 2 18:55:35 2019 +0200
"},{"location":"release_notes/#v0203","title":"v0.20.3","text":"Released on: Tue Aug 27 16:58:15 2019 +0200
"},{"location":"release_notes/#v0202","title":"v0.20.2","text":"Released on: Tue Aug 20 15:51:07 2019 +0200
"},{"location":"release_notes/#v0201","title":"v0.20.1","text":"Released on: Fri Aug 9 19:48:17 2019 +0200
virtctl
by using the basename of the call, this enables nicer output when installed via krew plugin package managerkubevirt_vm_
to kubevirt_vmi_
to better reflect their purpose
Released on: Fri Aug 9 16:42:41 2019 +0200
virtctl
by using the basename of the call, this enables nicer output when installed via krew plugin package managerkubevirt_vm_
to kubevirt_vmi_
to better reflect their purpose
Released on: Fri Jul 5 12:52:16 2019 +0200
Released on: Thu Jun 13 12:00:56 2019 +0200
"},{"location":"release_notes/#v0180","title":"v0.18.0","text":"Released on: Wed Jun 5 22:25:09 2019 +0200
Released on: Tue Jun 25 07:49:12 2019 -0400
"},{"location":"release_notes/#v0173","title":"v0.17.3","text":"Released on: Wed Jun 19 12:00:45 2019 -0400
"},{"location":"release_notes/#v0172","title":"v0.17.2","text":"Released on: Wed Jun 5 08:12:04 2019 -0400
"},{"location":"release_notes/#v0171","title":"v0.17.1","text":"Released on: Tue Jun 4 14:41:10 2019 -0400
"},{"location":"release_notes/#v0170","title":"v0.17.0","text":"Released on: Mon May 6 16:18:01 2019 +0200
Released on: Thu May 2 23:51:08 2019 +0200
"},{"location":"release_notes/#v0162","title":"v0.16.2","text":"Released on: Fri Apr 26 12:24:33 2019 +0200
"},{"location":"release_notes/#v0161","title":"v0.16.1","text":"Released on: Tue Apr 23 19:31:19 2019 +0200
"},{"location":"release_notes/#v0160","title":"v0.16.0","text":"Released on: Fri Apr 5 23:18:22 2019 +0200
Released on: Tue Mar 5 10:35:08 2019 +0100
Released on: Mon Feb 4 22:04:14 2019 +0100
Released on: Mon Oct 28 17:02:35 2019 -0400
"},{"location":"release_notes/#v0136","title":"v0.13.6","text":"Released on: Wed Sep 25 17:19:44 2019 +0200
"},{"location":"release_notes/#v0135","title":"v0.13.5","text":"Released on: Thu Aug 1 11:25:00 2019 -0400
"},{"location":"release_notes/#v0134","title":"v0.13.4","text":"Released on: Thu Aug 1 09:52:35 2019 -0400
"},{"location":"release_notes/#v0133","title":"v0.13.3","text":"Released on: Mon Feb 4 15:46:48 2019 -0500
"},{"location":"release_notes/#v0132","title":"v0.13.2","text":"Released on: Thu Jan 24 23:24:06 2019 +0100
"},{"location":"release_notes/#v0131","title":"v0.13.1","text":"Released on: Thu Jan 24 11:16:20 2019 +0100
"},{"location":"release_notes/#v0130","title":"v0.13.0","text":"Released on: Tue Jan 15 08:26:25 2019 +0100
Released on: Fri Jan 11 22:22:02 2019 +0100
Released on: Thu Dec 13 10:21:56 2018 +0200
"},{"location":"release_notes/#v0110","title":"v0.11.0","text":"Released on: Thu Dec 6 10:15:51 2018 +0100
Released on: Thu Nov 8 15:21:34 2018 +0100
Released on: Thu Nov 22 17:14:18 2018 +0100
"},{"location":"release_notes/#v095","title":"v0.9.5","text":"Released on: Thu Nov 8 09:57:48 2018 +0100
"},{"location":"release_notes/#v094","title":"v0.9.4","text":"Released on: Wed Nov 7 08:22:14 2018 -0500
"},{"location":"release_notes/#v093","title":"v0.9.3","text":"Released on: Mon Oct 22 09:04:02 2018 -0400
"},{"location":"release_notes/#v092","title":"v0.9.2","text":"Released on: Thu Oct 18 12:14:09 2018 +0200
"},{"location":"release_notes/#v091","title":"v0.9.1","text":"Released on: Fri Oct 5 09:01:51 2018 +0200
"},{"location":"release_notes/#v090","title":"v0.9.0","text":"Released on: Thu Oct 4 14:42:28 2018 +0200
Released on: Thu Sep 6 14:25:22 2018 +0200
Released on: Wed Jul 4 17:41:33 2018 +0200
Released on: Tue Aug 21 17:29:28 2018 +0300
"},{"location":"release_notes/#v063","title":"v0.6.3","text":"Released on: Mon Jul 30 16:14:22 2018 +0200
"},{"location":"release_notes/#v062","title":"v0.6.2","text":"Released on: Wed Jul 4 17:49:37 2018 +0200
Released on: Mon Jun 18 17:07:48 2018 -0400
"},{"location":"release_notes/#v060","title":"v0.6.0","text":"Released on: Mon Jun 11 09:30:28 2018 +0200
Released on: Fri May 4 18:25:32 2018 +0200
Released on: Thu Apr 12 11:46:09 2018 +0200
Released on: Fri Apr 6 16:40:31 2018 +0200
Released on: Thu Mar 8 10:21:57 2018 +0100
Released on: Fri Jan 5 16:30:45 2018 +0100
Released on: Fri Dec 8 20:43:06 2017 +0100
Released on: Tue Nov 7 11:51:45 2017 +0100
Released on: Fri Oct 6 10:21:16 2017 +0200
Released on: Mon Sep 4 21:12:46 2017 +0200
virtctl
KubeVirt has a set of features that are not mature enough to be enabled by default. As such, they are protected by a Kubernetes concept called feature gates.
"},{"location":"cluster_admin/activating_feature_gates/#how-to-activate-a-feature-gate","title":"How to activate a feature gate","text":"You can activate a specific feature gate directly in KubeVirt's CR, by provisioning the following yaml, which uses the LiveMigration
feature gate as an example:
cat << END > enable-feature-gate.yaml\n---\napiVersion: kubevirt.io/v1\nkind: KubeVirt\nmetadata:\n  name: kubevirt\n  namespace: kubevirt\nspec:\n  configuration:\n    developerConfiguration:\n      featureGates:\n        - LiveMigration\nEND\n\nkubectl apply -f enable-feature-gate.yaml\n
Alternatively, the existing kubevirt CR can be altered:
kubectl edit kubevirt kubevirt -n kubevirt\n
...\nspec:\n  configuration:\n    developerConfiguration:\n      featureGates:\n        - DataVolumes\n        - LiveMigration\n
Note: feature gate names are case sensitive.
The snippet above assumes KubeVirt is installed in the kubevirt namespace. Change the namespace to suit your installation.
The list of feature gates (which evolves over time) can be checked directly in the source code.
"},{"location":"cluster_admin/annotations_and_labels/","title":"Annotations and labels","text":"KubeVirt builds on and exposes a number of labels and annotations that either are used for internal implementation needs or expose useful information to API users. This page documents the labels and annotations that may be useful for regular API consumers. This page intentionally does not list labels and annotations that are merely part of internal implementation.
Note: Annotations and labels that are not specific to KubeVirt are also documented here.
"},{"location":"cluster_admin/annotations_and_labels/#kubevirtio","title":"kubevirt.io","text":"Example: kubevirt.io=virt-launcher
Used on: Pod
This label marks resources that belong to KubeVirt. An optional value may indicate which specific KubeVirt component a resource belongs to. This label may be used to list all resources that belong to KubeVirt, for example, to uninstall it from a cluster.
"},{"location":"cluster_admin/annotations_and_labels/#kubevirtioschedulable","title":"kubevirt.io/schedulable","text":"Example: kubevirt.io/schedulable=true
Used on: Node
This label declares whether a particular node is available for scheduling virtual machine instances on it.
"},{"location":"cluster_admin/annotations_and_labels/#kubevirtioheartbeat","title":"kubevirt.io/heartbeat","text":"Example: kubevirt.io/heartbeat=2018-07-03T20:07:25Z
Used on: Node
This annotation is regularly updated by virt-handler to help determine if a particular node is alive and hence should be available for new virtual machine instance scheduling.
"},{"location":"cluster_admin/api_validation/","title":"API Validation","text":"The KubeVirt VirtualMachineInstance API is implemented using a Kubernetes Custom Resource Definition (CRD). Because of this, KubeVirt is able to leverage a couple of features Kubernetes provides in order to perform validation checks on our API as objects created and updated on the cluster.
"},{"location":"cluster_admin/api_validation/#how-api-validation-works","title":"How API Validation Works","text":""},{"location":"cluster_admin/api_validation/#crd-openapiv3-schema","title":"CRD OpenAPIv3 Schema","text":"The KubeVirt API is registered with Kubernetes at install time through a series of CRD definitions. KubeVirt includes an OpenAPIv3 schema in these definitions which indicates to the Kubernetes Apiserver some very basic information about our API, such as what fields are required and what type of data is expected for each value.
This OpenAPIv3 schema validation is installed automatically and requires no thought on the user's part to enable.
"},{"location":"cluster_admin/api_validation/#admission-control-webhooks","title":"Admission Control Webhooks","text":"The OpenAPIv3 schema validation is limited. It only validates the general structure of a KubeVirt object looks correct. It does not however verify that the contents of that object make sense.
With OpenAPIv3 validation alone, users can easily make simple mistakes (like not referencing a volume's name correctly with a disk) and the cluster will still accept the object. However, the VirtualMachineInstance will of course not start if these errors in the API exist. Ideally we'd like to catch configuration issues as early as possible and not allow an object to even be posted to the cluster if we can detect there's a problem with the object's Spec.
In order to perform this advanced validation, KubeVirt implements its own admission controller, which is registered with Kubernetes as an admission controller webhook. This webhook is registered with Kubernetes at install time. As KubeVirt objects are posted to the cluster, the Kubernetes API server forwards creation requests to our webhook for validation before persisting the object into storage.
Note however that the KubeVirt admission controller requires certain features to be enabled on the cluster before it can be used.
"},{"location":"cluster_admin/api_validation/#enabling-kubevirt-admission-controller-on-kubernetes","title":"Enabling KubeVirt Admission Controller on Kubernetes","text":"When provisioning a new Kubernetes cluster, ensure that both the MutatingAdmissionWebhook and ValidatingAdmissionWebhook values are present in the Apiserver's --admission-control cli argument.
Below is an example of the --admission-control values we use during development:
--admission-control='Initializers,NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,DefaultTolerationSeconds,NodeRestriction,MutatingAdmissionWebhook,ValidatingAdmissionWebhook,ResourceQuota'\n
Note that the old --admission-control flag was deprecated in 1.10 and replaced with --enable-admission-plugins. MutatingAdmissionWebhook and ValidatingAdmissionWebhook are enabled by default.
"},{"location":"cluster_admin/api_validation/#enabling-kubevirt-admission-controller-on-okd","title":"Enabling KubeVirt Admission Controller on OKD","text":"OKD also requires the admission control webhooks to be enabled at install time. The process is slightly different though. With OKD, we enable webhooks using an admission plugin.
These admission control plugins can be configured in openshift-ansible by setting the following value in the Ansible inventory file:
openshift_master_admission_plugin_config={\"ValidatingAdmissionWebhook\":{\"configuration\":{\"kind\": \"DefaultAdmissionConfig\",\"apiVersion\": \"v1\",\"disable\": false}},\"MutatingAdmissionWebhook\":{\"configuration\":{\"kind\": \"DefaultAdmissionConfig\",\"apiVersion\": \"v1\",\"disable\": false}}}\n
"},{"location":"cluster_admin/authorization/","title":"Authorization","text":"KubeVirt authorization is performed using Kubernetes's Resource Based Authorization Control system (RBAC). RBAC allows cluster admins to grant access to cluster resources by binding RBAC roles to users.
For example, an admin creates an RBAC role that represents the permissions required to create a VirtualMachineInstance. The admin can then bind that role to users in order to grant them the permissions required to launch a VirtualMachineInstance.
With RBAC roles, admins can grant users targeted access to various KubeVirt features.
"},{"location":"cluster_admin/authorization/#kubevirt-default-rbac-clusterroles","title":"KubeVirt Default RBAC ClusterRoles","text":"KubeVirt comes with a set of predefined RBAC ClusterRoles that can be used to grant users permissions to access KubeVirt Resources.
"},{"location":"cluster_admin/authorization/#default-view-role","title":"Default View Role","text":"The kubevirt.io:view ClusterRole gives users permissions to view all KubeVirt resources in the cluster. The permissions to create, delete, modify or access any KubeVirt resources beyond viewing the resource's spec are not included in this role. This means a user with this role could see that a VirtualMachineInstance is running, but neither shutdown nor gain access to that VirtualMachineInstance via console/VNC.
"},{"location":"cluster_admin/authorization/#default-edit-role","title":"Default Edit Role","text":"The kubevirt.io:edit ClusterRole gives users permissions to modify all KubeVirt resources in the cluster. For example, a user with this role can create new VirtualMachineInstances, delete VirtualMachineInstances, and gain access to both console and VNC.
"},{"location":"cluster_admin/authorization/#default-admin-role","title":"Default Admin Role","text":"The kubevirt.io:admin ClusterRole grants users full permissions to all KubeVirt resources, including the ability to delete collections of resources.
The admin role also grants users access to view and modify the KubeVirt runtime config. This config exists within the KubeVirt Custom Resource under the configuration key in the namespace where the KubeVirt operator is running.
NOTE Users are only guaranteed the ability to modify the kubevirt runtime configuration if a ClusterRoleBinding is used. A RoleBinding will work to provide kubevirt CR access only if the RoleBinding targets the same namespace that the kubevirt CR exists in.
"},{"location":"cluster_admin/authorization/#binding-default-clusterroles-to-users","title":"Binding Default ClusterRoles to Users","text":"The KubeVirt default ClusterRoles are granted to users by creating either a ClusterRoleBinding or RoleBinding object.
"},{"location":"cluster_admin/authorization/#binding-within-all-namespaces","title":"Binding within All Namespaces","text":"With a ClusterRoleBinding, users receive the permissions granted by the role across all namespaces.
"},{"location":"cluster_admin/authorization/#binding-within-single-namespace","title":"Binding within Single Namespace","text":"With a RoleBinding, users receive the permissions granted by the role only within a targeted namespace.
"},{"location":"cluster_admin/authorization/#extending-kubernetes-default-roles-with-kubevirt-permissions","title":"Extending Kubernetes Default Roles with KubeVirt permissions","text":"The aggregated ClusterRole Kubernetes feature facilitates combining multiple ClusterRoles into a single aggregated ClusterRole. This feature is commonly used to extend the default Kubernetes roles with permissions to access custom resources that do not exist in the Kubernetes core.
In order to extend the default Kubernetes roles to provide permission to access KubeVirt resources, we need to add the following labels to the KubeVirt ClusterRoles.
kubectl label clusterrole kubevirt.io:admin rbac.authorization.k8s.io/aggregate-to-admin=true\nkubectl label clusterrole kubevirt.io:edit rbac.authorization.k8s.io/aggregate-to-edit=true\nkubectl label clusterrole kubevirt.io:view rbac.authorization.k8s.io/aggregate-to-view=true\n
By adding these labels, any user with a RoleBinding or ClusterRoleBinding involving one of the default Kubernetes roles will automatically gain access to the equivalent KubeVirt roles as well.
More information about aggregated cluster roles can be found here
"},{"location":"cluster_admin/authorization/#creating-custom-rbac-roles","title":"Creating Custom RBAC Roles","text":"If the default KubeVirt ClusterRoles are not expressive enough, admins can create their own custom RBAC roles to grant user access to KubeVirt resources. The creation of a RBAC role is inclusive only, meaning there's no way to deny access. Instead access is only granted.
Below is an example of what KubeVirt's default admin ClusterRole looks like. A custom RBAC role can be created by reducing the permissions in this example role.
apiVersion: rbac.authorization.k8s.io/v1\nkind: ClusterRole\nmetadata:\n  name: my-custom-rbac-role\n  labels:\n    kubevirt.io: \"\"\nrules:\n  - apiGroups:\n      - subresources.kubevirt.io\n    resources:\n      - virtualmachineinstances/console\n      - virtualmachineinstances/vnc\n    verbs:\n      - get\n  - apiGroups:\n      - kubevirt.io\n    resources:\n      - virtualmachineinstances\n      - virtualmachines\n      - virtualmachineinstancepresets\n      - virtualmachineinstancereplicasets\n    verbs:\n      - get\n      - delete\n      - create\n      - update\n      - patch\n      - list\n      - watch\n      - deletecollection\n
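To illustrate reducing the permissions, here is a sketch of a read-only role limited to VirtualMachines and VirtualMachineInstances. The role name is a hypothetical placeholder, and the selection of resources and verbs is only one possible reduction, not a role shipped by KubeVirt.

```yaml
# Sketch only: a hypothetical read-only role, not one shipped by KubeVirt.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: vmi-read-only-example   # hypothetical role name
rules:
  - apiGroups:
      - kubevirt.io
    resources:
      - virtualmachineinstances
      - virtualmachines
    verbs:                      # read-only verbs; no create, update or delete
      - get
      - list
      - watch
```

Because the console and VNC subresources are left out, a user bound to this role could list and inspect VMs but not connect to them.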
"},{"location":"cluster_admin/confidential_computing/","title":"Confidential computing","text":""},{"location":"cluster_admin/confidential_computing/#amd-secure-encrypted-virtualization-sev","title":"AMD Secure Encrypted Virtualization (SEV)","text":"FEATURE STATE: KubeVirt v0.49.0 (experimental support)
Secure Encrypted Virtualization (SEV) is a feature of AMD's EPYC CPUs that allows the memory of a virtual machine to be encrypted on the fly.
KubeVirt supports running confidential VMs on AMD EPYC hardware with the SEV feature.
"},{"location":"cluster_admin/confidential_computing/#preconditions","title":"Preconditions","text":"In order to run an SEV guest the following condition must be met:
The WorkloadEncryptionSEV feature gate must be enabled.
SEV memory encryption can be requested by setting the spec.domain.launchSecurity.sev element in the VMI definition:
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n  labels:\n    special: vmi-fedora\n  name: vmi-fedora\nspec:\n  domain:\n    launchSecurity:\n      sev: {}\n    firmware:\n      bootloader:\n        efi:\n          secureBoot: false\n    devices:\n      disks:\n      - disk:\n          bus: virtio\n        name: containerdisk\n      - disk:\n          bus: virtio\n        name: cloudinitdisk\n      rng: {}\n    resources:\n      requests:\n        memory: 1024M\n  terminationGracePeriodSeconds: 0\n  volumes:\n  - containerDisk:\n      image: registry:5000/kubevirt/fedora-with-test-tooling-container-disk:devel\n    name: containerdisk\n  - cloudInitNoCloud:\n      userData: |-\n        #cloud-config\n        password: fedora\n        chpasswd: { expire: False }\n    name: cloudinitdisk\n
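The WorkloadEncryptionSEV feature gate can presumably be switched on in the same way as any other feature gate. The sketch below assumes the KubeVirt CR is named kubevirt and lives in the kubevirt namespace; adjust both to match your installation.

```yaml
# Sketch only: assumes the KubeVirt CR is named kubevirt in the kubevirt namespace.
apiVersion: kubevirt.io/v1
kind: KubeVirt
metadata:
  name: kubevirt
  namespace: kubevirt
spec:
  configuration:
    developerConfiguration:
      featureGates:
        - WorkloadEncryptionSEV   # feature gate names are case sensitive
```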
"},{"location":"cluster_admin/confidential_computing/#current-limitations","title":"Current limitations","text":"If the patch created is invalid KubeVirt will not be able to update or deploy the system. This is intended for special use cases and should not be used unless you know what you are doing.
Valid resource types are: Deployment, DaemonSet, Service, ValidatingWebhookConfiguration, MutatingWebhookConfiguration, APIService, and CertificateSecret. More information can be found in the API spec.
Example customization patch:
---\napiVersion: kubevirt.io/v1\nkind: KubeVirt\nmetadata:\n  name: kubevirt\n  namespace: kubevirt\nspec:\n  certificateRotateStrategy: {}\n  configuration: {}\n  customizeComponents:\n    patches:\n    - resourceType: Deployment\n      resourceName: virt-controller\n      patch: '[{\"op\": \"remove\", \"path\": \"/spec/template/spec/containers/0/livenessProbe\"}]'\n      type: json\n    - resourceType: Deployment\n      resourceName: virt-controller\n      patch: '{\"metadata\":{\"annotations\":{\"patch\": \"true\"}}}'\n      type: strategic\n
The above example will update the virt-controller deployment to have an annotation in its metadata that says patch: true and will remove the livenessProbe from the container definition.
If the flags are invalid, or become invalid on update, the component will not be able to run.
When the customize flags option is used for a component, all of that component's default flags are removed and only the specified flags are used. Flags can be changed on the api, controller and handler resources. You can find out more details about the API in the API spec.
apiVersion: kubevirt.io/v1\nkind: KubeVirt\nmetadata:\n  name: kubevirt\n  namespace: kubevirt\nspec:\n  certificateRotateStrategy: {}\n  configuration: {}\n  customizeComponents:\n    flags:\n      api:\n        v: \"5\"\n        port: \"8443\"\n        console-server-port: \"8186\"\n        subresources-only: \"true\"\n
The above example would produce a virt-api pod with the following command:
...\nspec:\n  ....\n  containers:\n  - name: virt-api\n    command:\n    - virt-api\n    - --v\n    - \"5\"\n    - --console-server-port\n    - \"8186\"\n    - --port\n    - \"8443\"\n    - --subresources-only\n    - \"true\"\n    ...\n
"},{"location":"cluster_admin/device_status_on_Arm64/","title":"Device Status on Arm64","text":"This page is based on https://github.com/kubevirt/kubevirt/issues/8916
Devices | Description | Status on Arm64
DisableHotplug | | supported
Disks | sata/ virtio bus | support virtio bus
Watchdog | i6300esb | not supported
UseVirtioTransitional | virtio-transitional | supported
Interfaces | e1000/ virtio-net-device | support virtio-net-device
Inputs | tablet virtio/usb bus | supported
AutoattachPodInterface | connect to /net/tun (devices.kubevirt.io/tun) | supported
AutoattachGraphicsDevice | create a virtio-gpu device / vga device | support virtio-gpu
AutoattachMemBalloon | virtio-balloon-pci-non-transitional | supported
AutoattachInputDevice | auto add tablet | supported
Rng | virtio-rng-pci-non-transitional host:/dev/urandom | supported
BlockMultiQueue | \"driver\":\"virtio-blk-pci-non-transitional\",\"num-queues\":$cpu_number | supported
NetworkInterfaceMultiQueue | -netdev tap,fds=21:23:24:25,vhost=on,vhostfds=26:27:28:29,id=hostua-default #fd number equals to queue number | supported
GPUs | | not verified
Filesystems | virtiofs, vhost-user-fs-pci, need to enable featuregate: ExperimentalVirtiofsSupport | supported
ClientPassthrough | https://www.linaro.org/blog/kvm-pciemsi-passthrough-armarm64/ on x86_64, iommu need to be enabled | not verified
Sound | ich9/ ac97 | not supported
TPM | tpm-tis-device https://qemu.readthedocs.io/en/latest/specs/tpm.html | supported
Sriov | vfio-pci | not verified"},{"location":"cluster_admin/feature_gate_status_on_Arm64/","title":"Feature Gate Status on Arm64","text":"This page is based on https://github.com/kubevirt/kubevirt/issues/9749. It records the feature gate status on the Arm64 platform. Here is the explanation of the status:
-blockdev {\"driver\":\"file\",\"filename\":\"/var/run/kubevirt-private/downwardapi-disks/vhostmd0\",\"node-name\":\"libvirt-1-storage\",\"cache\":{\"direct\":true,\"no-flush\":false},\"auto-read-only\":true,\"discard\":\"unmap\"}
But unable to get information via vm-dump-metrics:
LIBMETRICS: read_mdisk(): Unable to read metrics disk
LIBMETRICS: get_virtio_metrics(): Unable to export metrics: open(/dev/virtio-ports/org.github.vhostmd.1) No such file or directory
LIBMETRICS: get_virtio_metrics(): Unable to read metrics
NonRootDeprecated | Supported |
NonRoot | Supported |
Root | Supported |
ClusterProfiler | Supported |
WorkloadEncryptionSEV | Not supported | SEV is only available on x86_64
VSOCKGate | Supported |
HotplugNetworkIfacesGate | Not supported yet | Need to set up multus-cni and multus-dynamic-networks-controller: https://github.com/k8snetworkplumbingwg/multus-cni cat ./deployments/multus-daemonset-thick.yml \\| kubectl apply -f - https://github.com/k8snetworkplumbingwg/multus-dynamic-networks-controller kubectl apply -f manifests/dynamic-networks-controller.yaml Currently, the image ghcr.io/k8snetworkplumbingwg/multus-cni:snapshot-thick does not support Arm64 server. For more information please refer to https://github.com/k8snetworkplumbingwg/multus-cni/pull/1027.
CommonInstancetypesDeploymentGate | Not supported yet | Support of common-instancetypes instancetypes needs to be tested, common-instancetypes preferences for ARM workloads are still missing"},{"location":"cluster_admin/gitops/","title":"Managing KubeVirt with GitOps","text":"The GitOps way uses Git repositories as a single source of truth to deliver infrastructure as code. Automation is employed to keep the desired and the live state of clusters in sync at all times. This means any change to a repository is automatically applied to one or more clusters while changes to a cluster will be automatically reverted to the state described in the single source of truth.
With GitOps, separating testing and production environments, improving the availability of applications, and working with multi-cluster environments all become considerably easier.
"},{"location":"cluster_admin/gitops/#demo-repository","title":"Demo repository","text":"A demo with detailed explanation on how to manage KubeVirt with GitOps can be found here.
The demo uses Open Cluster Management and ArgoCD to deploy KubeVirt and virtual machines across multiple clusters.
"},{"location":"cluster_admin/installation/","title":"Installation","text":"KubeVirt is a virtualization add-on to Kubernetes and this guide assumes that a Kubernetes cluster is already installed.
If installed on OKD, the web console is extended for management of virtual machines.
"},{"location":"cluster_admin/installation/#requirements","title":"Requirements","text":"A few requirements need to be met before you can begin:
--allow-privileged=true in order to run KubeVirt's privileged DaemonSet.
kubectl client utility
KubeVirt is currently supported on the following container runtimes:
Other container runtimes, which do not use virtualization features, should work too. However, the mentioned ones are the main target.
"},{"location":"cluster_admin/installation/#integration-with-apparmor","title":"Integration with AppArmor","text":"In most of the scenarios, KubeVirt can run normally on systems with AppArmor. However, there are several known use cases that may require additional user interaction.
On a system with AppArmor enabled, the locally installed profiles may block the execution of the KubeVirt privileged containers. That usually results in initialization failure of the virt-handler pod:
$ kubectl get pods -n kubevirt\nNAME READY STATUS RESTARTS AGE\nvirt-api-77df5c4f87-7mqv4 1/1 Running 1 (17m ago) 27m\nvirt-api-77df5c4f87-wcq44 1/1 Running 1 (17m ago) 27m\nvirt-controller-749d8d99d4-56gb7 1/1 Running 1 (17m ago) 27m\nvirt-controller-749d8d99d4-78j6x 1/1 Running 1 (17m ago) 27m\nvirt-handler-4w99d 0/1 Init:Error 14 (5m18s ago) 27m\nvirt-operator-564f568975-g9wh4 1/1 Running 1 (17m ago) 31m\nvirt-operator-564f568975-wnpz8 1/1 Running 1 (17m ago) 31m\n\n$ kubectl logs -n kubevirt virt-handler-4w99d virt-launcher\nerror: failed to get emulator capabilities\n\nerror: internal error: Failed to start QEMU binary /usr/libexec/qemu-kvm for probing: libvirt: error : cannot execute binary /usr/libexec/qemu-kvm: Permission denied\n\n$ journalctl -b | grep DEN\n...\nMay 18 16:44:20 debian audit[6316]: AVC apparmor=\"DENIED\" operation=\"exec\" profile=\"libvirtd\" name=\"/usr/libexec/qemu-kvm\" pid=6316 comm=\"rpc-worker\" requested_mask=\"x\" denied_mask=\"x\" fsuid=107 ouid=0\nMay 18 16:44:20 debian kernel: audit: type=1400 audit(1652888660.539:39): apparmor=\"DENIED\" operation=\"exec\" profile=\"libvirtd\" name=\"/usr/libexec/qemu-kvm\" pid=6316 comm=\"rpc-worker\" requested_mask=\"x\" denied_mask=\"x\" fsuid=107 ouid=0\n...\n
Here, the host AppArmor profile for libvirtd does not allow the execution of the /usr/libexec/qemu-kvm binary. In the future this will hopefully work out of the box (tracking issue), but until then there are a couple of possible workarounds.
The first (and simplest) one is to remove the libvirt package from the host: assuming the host is a dedicated Kubernetes node, you likely won't need it anyway.
If you actually need libvirt to be present on the host, then you can add the following rule to the AppArmor profile for libvirtd (usually /etc/apparmor.d/usr.sbin.libvirtd):
# vim /etc/apparmor.d/usr.sbin.libvirtd\n...\n/usr/libexec/qemu-kvm PUx,\n...\n# apparmor_parser -r /etc/apparmor.d/usr.sbin.libvirtd # or systemctl reload apparmor.service\n
The default AppArmor profile used by the container runtimes usually denies mount calls for workloads. That may prevent VMs with VirtIO-FS from running. This is a known issue. The current workaround is to run such a VM as unconfined by adding the following annotation to the VM or VMI object:
annotations:\n  container.apparmor.security.beta.kubernetes.io/compute: unconfined\n
Hardware with virtualization support is recommended. You can use virt-host-validate to ensure that your hosts are capable of running virtualization workloads:
$ virt-host-validate qemu\n QEMU: Checking for hardware virtualization : PASS\n QEMU: Checking if device /dev/kvm exists : PASS\n QEMU: Checking if device /dev/kvm is accessible : PASS\n QEMU: Checking if device /dev/vhost-net exists : PASS\n QEMU: Checking if device /dev/net/tun exists : PASS\n...\n
"},{"location":"cluster_admin/installation/#selinux-support","title":"SELinux support","text":"SELinux-enabled nodes need Container-selinux installed. The minimum version is documented inside the kubevirt/kubevirt repository, in docs/getting-started.md, under \"SELinux support\".
For (older) release branches that don't specify a container-selinux version, version 2.170.0 or newer is recommended.
"},{"location":"cluster_admin/installation/#installing-kubevirt-on-kubernetes","title":"Installing KubeVirt on Kubernetes","text":"KubeVirt can be installed using the KubeVirt operator, which manages the lifecycle of all the KubeVirt core components. Below is an example of how to install KubeVirt's latest official release. It supports to deploy KubeVirt on both x86_64 and Arm64 platforms.
# Point at latest release\n$ export RELEASE=$(curl https://storage.googleapis.com/kubevirt-prow/release/kubevirt/kubevirt/stable.txt)\n# Deploy the KubeVirt operator\n$ kubectl apply -f https://github.com/kubevirt/kubevirt/releases/download/${RELEASE}/kubevirt-operator.yaml\n# Create the KubeVirt CR (instance deployment request) which triggers the actual installation\n$ kubectl apply -f https://github.com/kubevirt/kubevirt/releases/download/${RELEASE}/kubevirt-cr.yaml\n# wait until all KubeVirt components are up\n$ kubectl -n kubevirt wait kv kubevirt --for condition=Available\n
If hardware virtualization is not available, then a software emulation fallback can be enabled by setting spec.configuration.developerConfiguration.useEmulation to true in the KubeVirt CR as follows:
$ kubectl edit -n kubevirt kubevirt kubevirt\n
Add the following to the kubevirt.yaml file:
spec:\n  ...\n  configuration:\n    developerConfiguration:\n      useEmulation: true\n
Note: Prior to release v0.20.0 the condition for the kubectl wait command was named \"Ready\" instead of \"Available\".
Note: Prior to KubeVirt 0.34.2 a ConfigMap called kubevirt-config in the install-namespace was used to configure KubeVirt. Since 0.34.2 this method is deprecated. If the ConfigMap still exists, it takes precedence over configuration on the CR, but it will not receive future updates and you should migrate any custom configurations to spec.configuration on the KubeVirt CR.
All new components will be deployed under the kubevirt namespace:
kubectl get pods -n kubevirt\nNAME READY STATUS RESTARTS AGE\nvirt-api-6d4fc3cf8a-b2ere 1/1 Running 0 1m\nvirt-controller-5d9fc8cf8b-n5trt 1/1 Running 0 1m\nvirt-handler-vwdjx 1/1 Running 0 1m\n...\n
"},{"location":"cluster_admin/installation/#installing-kubevirt-on-okd","title":"Installing KubeVirt on OKD","text":"The following SCC needs to be added prior KubeVirt deployment:
$ oc adm policy add-scc-to-user privileged -n kubevirt -z kubevirt-operator\n
Once privileges are granted, KubeVirt can be deployed as described above.
"},{"location":"cluster_admin/installation/#web-user-interface-on-okd","title":"Web user interface on OKD","text":"No additional steps are required to extend OKD's web console for KubeVirt.
The virtualization extension is automatically enabled when KubeVirt deployment is detected.
"},{"location":"cluster_admin/installation/#from-service-catalog-as-an-apb","title":"From Service Catalog as an APB","text":"You can find KubeVirt in the OKD Service Catalog and install it from there. In order to do that please follow the documentation in the KubeVirt APB repository.
"},{"location":"cluster_admin/installation/#installing-kubevirt-on-k3os","title":"Installing KubeVirt on k3OS","text":"The following configuration needs to be added to all nodes prior KubeVirt deployment:
k3os:\n  modules:\n  - kvm\n  - vhost_net\n
Once nodes are restarted with this configuration, KubeVirt can be deployed as described above.
"},{"location":"cluster_admin/installation/#installing-the-daily-developer-builds","title":"Installing the Daily Developer Builds","text":"KubeVirt releases daily a developer build from the current main branch. One can see when the last release happened by looking at our nightly-build-jobs.
To install the latest developer build, run the following commands:
$ LATEST=$(curl -L https://storage.googleapis.com/kubevirt-prow/devel/nightly/release/kubevirt/kubevirt/latest)\n$ kubectl apply -f https://storage.googleapis.com/kubevirt-prow/devel/nightly/release/kubevirt/kubevirt/${LATEST}/kubevirt-operator.yaml\n$ kubectl apply -f https://storage.googleapis.com/kubevirt-prow/devel/nightly/release/kubevirt/kubevirt/${LATEST}/kubevirt-cr.yaml\n
To find out which commit this build is based on, run:
$ LATEST=$(curl -L https://storage.googleapis.com/kubevirt-prow/devel/nightly/release/kubevirt/kubevirt/latest)\n$ curl https://storage.googleapis.com/kubevirt-prow/devel/nightly/release/kubevirt/kubevirt/${LATEST}/commit\nd358cf085b5a86cc4fa516215f8b757a4e61def2\n
"},{"location":"cluster_admin/installation/#arm64-developer-builds","title":"ARM64 developer builds","text":"ARM64 developer builds can be installed like this:
$ LATEST=$(curl -L https://storage.googleapis.com/kubevirt-prow/devel/nightly/release/kubevirt/kubevirt/latest-arm64)\n$ kubectl apply -f https://storage.googleapis.com/kubevirt-prow/devel/nightly/release/kubevirt/kubevirt/${LATEST}/kubevirt-operator-arm64.yaml\n$ kubectl apply -f https://storage.googleapis.com/kubevirt-prow/devel/nightly/release/kubevirt/kubevirt/${LATEST}/kubevirt-cr-arm64.yaml\n
"},{"location":"cluster_admin/installation/#deploying-from-source","title":"Deploying from Source","text":"See the Developer Getting Started Guide to understand how to build and deploy KubeVirt from source.
"},{"location":"cluster_admin/installation/#installing-network-plugins-optional","title":"Installing network plugins (optional)","text":"KubeVirt alone does not bring any additional network plugins; it just allows users to utilize them. If you want to attach your VMs to multiple networks (Multus CNI) or have full control over L2 (OVS CNI), you need to deploy the respective network plugins. For more information, refer to the OVS CNI installation guide.
Note: KubeVirt Ansible network playbook installs these plugins by default.
"},{"location":"cluster_admin/installation/#restricting-kubevirt-components-node-placement","title":"Restricting KubeVirt components node placement","text":"You can restrict the placement of the KubeVirt components across your cluster nodes by editing the KubeVirt CR:
.spec.infra.nodePlacement
field in the KubeVirt CR. .spec.workloads.nodePlacement
field in the KubeVirt CR. For each of these .nodePlacement
objects, the .affinity
, .nodeSelector
and .tolerations
sub-fields can be configured. See the description in the API reference for further information about using these fields.
For example, to restrict the virt-controller and virt-api pods to only run on the control-plane nodes:
kubectl patch -n kubevirt kubevirt kubevirt --type merge --patch '{\"spec\": {\"infra\": {\"nodePlacement\": {\"nodeSelector\": {\"node-role.kubernetes.io/control-plane\": \"\"}}}}}'\n
To restrict the virt-handler pods to only run on nodes with the \"region=primary\" label:
kubectl patch -n kubevirt kubevirt kubevirt --type merge --patch '{\"spec\": {\"workloads\": {\"nodePlacement\": {\"nodeSelector\": {\"region\": \"primary\"}}}}}'\n
"},{"location":"cluster_admin/ksm/","title":"KSM Management","text":"Kernel Samepage Merging (KSM) allows de-duplication of memory. KSM tries to find identical Memory Pages and merge those to free memory.
Further Information: - KSM (Kernel Samepage Merging) feature - Kernel Same-page Merging (KSM)
"},{"location":"cluster_admin/ksm/#enabling-ksm-through-kubevirt-cr","title":"Enabling KSM through KubeVirt CR","text":"KSM can be enabled on nodes by spec.configuration.ksmConfiguration
in the KubeVirt CR. ksmConfiguration
instructs on which nodes KSM will be enabled, exposing a nodeLabelSelector
. nodeLabelSelector
is a LabelSelector and defines the filter, based on the node labels. If a node's labels match the label selector term, then on that node, KSM will be enabled.
NOTE If nodeLabelSelector
is nil KSM will not be enabled on any nodes. Empty nodeLabelSelector
will enable KSM on every node.
Enabling KSM on nodes in which the hostname is node01
or node03
:
spec:\n configuration:\n ksmConfiguration:\n nodeLabelSelector:\n matchExpressions:\n - key: kubernetes.io/hostname\n operator: In\n values:\n - node01\n - node03\n
Enabling KSM on nodes with labels kubevirt.io/first-label: true
, kubevirt.io/second-label: true
:
spec:\n configuration:\n ksmConfiguration:\n nodeLabelSelector:\n matchLabels:\n kubevirt.io/first-label: \"true\"\n kubevirt.io/second-label: \"true\"\n
Enabling KSM on every node:
spec:\n configuration:\n ksmConfiguration:\n nodeLabelSelector: {}\n
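The selector semantics described above can be sketched in a few lines of Python. This is an illustration only, not KubeVirt's implementation: the function name ksm_enabled_on is hypothetical, and only matchLabels and the In operator of matchExpressions are modeled.

```python
def ksm_enabled_on(node_labels, selector):
    '''Return True if KSM would be enabled on a node with these labels.

    selector is None (nil), an empty dict, or a dict with optional
    'matchLabels' and 'matchExpressions' keys (only operator 'In' is modeled).
    '''
    if selector is None:
        # A nil nodeLabelSelector enables KSM on no node.
        return False
    for key, value in selector.get('matchLabels', {}).items():
        if node_labels.get(key) != value:
            return False
    for expr in selector.get('matchExpressions', []):
        if expr['operator'] != 'In':
            raise ValueError('only the In operator is modeled in this sketch')
        if node_labels.get(expr['key']) not in expr['values']:
            return False
    # An empty selector has no requirements, so it matches every node.
    return True


hostname_selector = {
    'matchExpressions': [
        {'key': 'kubernetes.io/hostname', 'operator': 'In',
         'values': ['node01', 'node03']},
    ]
}
print(ksm_enabled_on({'kubernetes.io/hostname': 'node01'}, hostname_selector))
print(ksm_enabled_on({'kubernetes.io/hostname': 'node02'}, hostname_selector))
```

Passing None models a nil selector (KSM enabled nowhere), while an empty dict models the empty selector (KSM enabled on every node).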
On those nodes where KubeVirt enables the KSM via configuration, an annotation will be added (kubevirt.io/ksm-handler-managed
). This annotation is an internal record to keep track of which nodes are currently managed by virt-handler, so that it is possible to distinguish which nodes should be restored in case of future ksmConfiguration changes.
Let's imagine this scenario:
node01
) has KSM externally enabled. node02
and node03
. Thanks to the annotation, virt-handler is able to disable KSM on only those nodes where it had enabled it itself (node02
, node03
), leaving the others unchanged (node01
).
KubeVirt can discover on which nodes KSM is enabled and will mark them with a special label (kubevirt.io/ksm-enabled
) with value true
. This label can be used to schedule VMs on nodes with or without KSM enabled.
apiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n name: testvm\nspec:\n running: true\n template:\n metadata:\n labels:\n kubevirt.io/vm: testvm\n spec:\n nodeSelector:\n kubevirt.io/ksm-enabled: \"true\"\n [...]\n
"},{"location":"cluster_admin/migration_policies/","title":"Migration Policies","text":"Migration policies provide a new way of applying migration configurations to Virtual Machines. The policies can refine the KubeVirt CR's MigrationConfiguration
that sets the cluster-wide migration configurations. This way, the cluster-wide settings serve as a default that can be refined (i.e. changed, removed or added) by the migration policy.
Please bear in mind that migration policies are in version v1alpha1
. This means that this API is not fully stable yet and that APIs may change in the future.
KubeVirt supports Live Migrations of Virtual Machine workloads. Before migration policies were introduced, migration settings could be configured only cluster-wide, by editing KubevirtCR's spec or, more specifically, the MigrationConfiguration CRD.
Several aspects (although not all) of migration behaviour that can be customized are: - Bandwidth - Auto-convergence - Post/Pre-copy - Max number of parallel migrations - Timeout
Migration policies generalize the concept of defining migration configurations, so it would be possible to apply different configurations to specific groups of VMs.
This capability is useful whenever workloads need to be treated differently. Different configurations may be needed because workloads have different priorities, security segregation or other requirements, or to help workloads that aren't migration-friendly converge, among many other reasons.
"},{"location":"cluster_admin/migration_policies/#api-examples","title":"API Examples","text":""},{"location":"cluster_admin/migration_policies/#migration-configurations","title":"Migration Configurations","text":"Currently the MigrationPolicy spec will only include the following configurations from KubevirtCR's MigrationConfiguration (in the future more configurations that aren't part of Kubevirt CR are intended to be added):
apiVersion: migrations.kubevirt.io/v1alpha1\nkind: MigrationPolicy\n spec:\n allowAutoConverge: true\n bandwidthPerMigration: 217Ki\n completionTimeoutPerGiB: 23\n allowPostCopy: false\n
All of the above fields are optional. When a field is omitted, the value defined in KubevirtCR's MigrationConfiguration is applied. This way, KubevirtCR serves as a configurable set of defaults both for VMs that are not bound to any MigrationPolicy and for VMs that are bound to a MigrationPolicy that does not define every field.
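The fallback behaviour can be pictured as a simple overlay of the policy's fields on top of the cluster-wide defaults. The sketch below is only an illustration under assumptions: the function name effective_migration_config is hypothetical, and the cluster-wide values are made up for the example rather than taken from KubeVirt.

```python
def effective_migration_config(cluster_defaults, policy_spec):
    '''Overlay the fields a policy sets on top of the cluster-wide defaults.'''
    effective = dict(cluster_defaults)
    for field, value in policy_spec.items():
        # A field that is omitted (or left unset) keeps the cluster-wide value.
        if value is not None:
            effective[field] = value
    return effective


# Hypothetical cluster-wide values, chosen only for illustration.
cluster_defaults = {
    'allowAutoConverge': False,
    'bandwidthPerMigration': '64Mi',
    'completionTimeoutPerGiB': 800,
    'allowPostCopy': False,
}
# A policy that defines only two of the four fields.
policy_spec = {'allowAutoConverge': True, 'bandwidthPerMigration': '217Ki'}

print(effective_migration_config(cluster_defaults, policy_spec))
```

Fields that the policy leaves out, here completionTimeoutPerGiB and allowPostCopy, keep the cluster-wide values.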
"},{"location":"cluster_admin/migration_policies/#matching-policies-to-vms","title":"Matching Policies to VMs","text":"Next in the spec are the selectors that define the group of VMs on which to apply the policy. The options to do so are the following.
This policy applies to the VMs in namespaces that have all the required labels:
apiVersion: migrations.kubevirt.io/v1alpha1\nkind: MigrationPolicy\n spec:\n selectors:\n namespaceSelector:\n hpc-workloads: \"true\" # Matches a key and a value \n
This policy applies for the VMs that have all the required labels:
apiVersion: migrations.kubevirt.io/v1alpha1\nkind: MigrationPolicy\n spec:\n selectors:\n virtualMachineInstanceSelector:\n workload-type: db # Matches a key and a value \n
It is also possible to combine the previous two:
apiVersion: migrations.kubevirt.io/v1alpha1\nkind: MigrationPolicy\n spec:\n selectors:\n namespaceSelector:\n hpc-workloads: \"true\"\n virtualMachineInstanceSelector:\n workload-type: db\n
"},{"location":"cluster_admin/migration_policies/#full-manifest","title":"Full Manifest:","text":"apiVersion: migrations.kubevirt.io/v1alpha1\nkind: MigrationPolicy\nmetadata:\n name: my-awesome-policy\nspec:\n # Migration Configuration\n allowAutoConverge: true\n bandwidthPerMigration: 217Ki\n completionTimeoutPerGiB: 23\n allowPostCopy: false\n\n # Matching to VMs\n selectors:\n namespaceSelector:\n hpc-workloads: \"true\"\n virtualMachineInstanceSelector:\n workload-type: db\n
"},{"location":"cluster_admin/migration_policies/#policies-precedence","title":"Policies' Precedence","text":"It is possible that multiple policies apply to the same VMI. In such cases, the precedence is in the same order as the bullets above (VMI labels first, then namespace labels). It is not allowed to define two policies with the exact same selectors.
If multiple policies apply to the same VMI: * The most detailed policy will be applied, that is, the policy with the highest number of matching labels.
For example, let's imagine a VMI with the following labels:
size: small
os: fedora
gpu: nvidia
And let's say the namespace to which the VMI belongs contains the following labels:
priority: high
bandwidth: medium
hpc-workload: true
The following policies are listed by their precedence (high to low):
1) VMI labels: {size: small, gpu: nvidia}
, Namespace labels: {priority:high, bandwidth: medium}
bandwidth
.
2) VMI labels: {size: small, gpu: nvidia}
, Namespace labels: {priority:high, hpc-workload:true}
gpu
.
3) VMI labels: {size: small, gpu: nvidia}
, Namespace labels: {priority:high}
gpu
.
4) VMI labels: {size: small}
, Namespace labels: {priority:high, hpc-workload:true}
hpc-workload
.
5) VMI labels: {gpu: nvidia}
, Namespace labels: {priority:high}
gpu
.
6) VMI labels: {gpu: nvidia}
, Namespace labels: {}
gpu
.
7) VMI labels: {gpu: intel}
, Namespace labels: {priority:high}
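The counting rule used in the list above can be sketched as follows. This is a simplified illustration, not KubeVirt's implementation: the function names are hypothetical, a policy is modeled as two plain label dictionaries, and tie-breaking between policies with the same number of matching labels is deliberately not modeled.

```python
def policy_applies(policy, vmi_labels, ns_labels):
    '''A policy applies only if every label it selects on is present and equal.'''
    return (all(vmi_labels.get(k) == v for k, v in policy['vmi'].items())
            and all(ns_labels.get(k) == v for k, v in policy['namespace'].items()))


def matching_label_count(policy):
    '''For a policy that applies, every selector label is a matching label.'''
    return len(policy['vmi']) + len(policy['namespace'])


def pick_policy(policies, vmi_labels, ns_labels):
    '''Return the applicable policy with the most matching labels, or None.'''
    candidates = [p for p in policies if policy_applies(p, vmi_labels, ns_labels)]
    if not candidates:
        return None
    # Ties are not resolved in this sketch; max() simply keeps the first one.
    return max(candidates, key=matching_label_count)


vmi_labels = {'size': 'small', 'os': 'fedora', 'gpu': 'nvidia'}
ns_labels = {'priority': 'high', 'bandwidth': 'medium', 'hpc-workload': 'true'}

policies = [
    {'name': 'p5', 'vmi': {'gpu': 'nvidia'}, 'namespace': {'priority': 'high'}},
    {'name': 'p3', 'vmi': {'size': 'small', 'gpu': 'nvidia'},
     'namespace': {'priority': 'high'}},
    {'name': 'p7', 'vmi': {'gpu': 'intel'}, 'namespace': {'priority': 'high'}},
]
print(pick_policy(policies, vmi_labels, ns_labels)['name'])
```

Here p3 wins with three matching labels against two for p5, and p7 is never a candidate because its gpu label does not match the VMI.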
Before removing a Kubernetes node from the cluster, users will want to ensure that VirtualMachineInstances have been gracefully terminated before powering down the node. Since all VirtualMachineInstances are backed by a Pod, the recommended method of evicting VirtualMachineInstances is to use the kubectl drain command, or in the case of OKD the oc adm drain command.
"},{"location":"cluster_admin/node_maintenance/#evict-all-vms-from-a-node","title":"Evict all VMs from a Node","text":"Select the node you'd like to evict VirtualMachineInstances from by identifying the node from the list of cluster nodes.
kubectl get nodes
The following command will gracefully terminate all VMs on a specific node. Replace <node-name>
with the name of the node where the eviction should occur.
kubectl drain <node-name> --delete-local-data --ignore-daemonsets=true --force --pod-selector=kubevirt.io=virt-launcher
Below is a breakdown of why each argument passed to the drain command is required.
kubectl drain <node-name>
is selecting a specific node as a target for the eviction
--delete-local-data
is a required flag that is necessary for removing any pod that utilizes an emptyDir volume. The VirtualMachineInstance Pod does use emptyDir volumes, however the data in those volumes are ephemeral which means it is safe to delete after termination.
--ignore-daemonsets=true
is a required flag because every node running a VirtualMachineInstance will also be running our helper DaemonSet called virt-handler. DaemonSets are not allowed to be evicted using kubectl drain. By default, if this command encounters a DaemonSet on the target node, the command will fail. This flag tells the command it is safe to proceed with the eviction and to just ignore DaemonSets.
--force
is a required flag because VirtualMachineInstance pods are not owned by a ReplicaSet or DaemonSet controller. This means kubectl can't guarantee that the pods being terminated on the target node will get re-scheduled replacements placed elsewhere in the cluster after the pods are evicted. KubeVirt has its own controllers which manage the underlying VirtualMachineInstance pods. Each controller reacts differently to a VirtualMachineInstance being evicted. That behavior is outlined further down in this document.
--pod-selector=kubevirt.io=virt-launcher
means only VirtualMachineInstance pods managed by KubeVirt will be removed from the node.
By removing the --pod-selector
argument from the previous command, we can issue the eviction of all Pods on a node. This command ensures Pods associated with VMs as well as all other Pods are evicted from the target node.
kubectl drain <node name> --delete-local-data --ignore-daemonsets=true --force
If the LiveMigration
feature gate is enabled, it is possible to specify an evictionStrategy
on VMIs, which causes them to be live-migrated in response to specific taints on nodes. The following snippet, on a VMI or on the VMI template in a VM, ensures that the VMI is migrated during node eviction:
spec:\n evictionStrategy: LiveMigrate\n
Here is a full VMI:
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n name: testvmi-nocloud\nspec:\n terminationGracePeriodSeconds: 30\n evictionStrategy: LiveMigrate\n domain:\n resources:\n requests:\n memory: 1024M\n devices:\n disks:\n - name: containerdisk\n disk:\n bus: virtio\n - disk:\n bus: virtio\n name: cloudinitdisk\n volumes:\n - name: containerdisk\n containerDisk:\n image: kubevirt/fedora-cloud-container-disk-demo:latest\n - name: cloudinitdisk\n cloudInitNoCloud:\n userData: |-\n #cloud-config\n password: fedora\n chpasswd: { expire: False }\n
Behind the scenes a PodDisruptionBudget is created for each VMI which has an evictionStrategy defined. This ensures that evictions are blocked on these VMIs and that we can guarantee that a VMI will be migrated instead of shut off. Note: Prior to v0.34 the drain process with live migrations was detached from the kubectl drain
itself and additionally required specifying a special taint on the nodes: kubectl taint nodes foo kubevirt.io/drain=draining:NoSchedule
. This is no longer needed. The taint will still be respected if provided but is obsolete.
The kubectl drain will result in the target node being marked as unschedulable. This means the node will not be eligible for running new VirtualMachineInstances or Pods.
If it is decided that the target node should become schedulable again, the following command must be run.
kubectl uncordon <node name>
or in the case of OKD.
oc adm uncordon <node name>
From KubeVirt's perspective, a node is safe to shutdown once all VirtualMachineInstances have been evicted from the node. In a multi-use cluster where VirtualMachineInstances are being scheduled alongside other containerized workloads, it is up to the cluster admin to ensure all other pods have been safely evicted before powering down the node.
"},{"location":"cluster_admin/node_maintenance/#virtualmachine-evictions","title":"VirtualMachine Evictions","text":"The eviction of any VirtualMachineInstance that is owned by a VirtualMachine set to running=true will result in the VirtualMachineInstance being re-scheduled to another node.
The VirtualMachineInstance in this case will be forced to power down and restart on another node. In the future, once KubeVirt introduces live migration support, the VM will be able to seamlessly migrate to another node during eviction.
"},{"location":"cluster_admin/node_maintenance/#virtualmachineinstancereplicaset-eviction-behavior","title":"VirtualMachineInstanceReplicaSet Eviction Behavior","text":"The eviction of VirtualMachineInstances owned by a VirtualMachineInstanceReplicaSet will result in the VirtualMachineInstanceReplicaSet scheduling replacements for the evicted VirtualMachineInstances on other nodes in the cluster.
"},{"location":"cluster_admin/node_maintenance/#virtualmachineinstance-eviction-behavior","title":"VirtualMachineInstance Eviction Behavior","text":"VirtualMachineInstances not backed by either a VirtualMachineInstanceReplicaSet or a VirtualMachine object will not be re-scheduled after eviction.
"},{"location":"cluster_admin/operations_on_Arm64/","title":"Arm64 Operations","text":"This page summarizes all operations that are not supported on Arm64.
"},{"location":"cluster_admin/operations_on_Arm64/#hotplug-network-interfaces","title":"Hotplug Network Interfaces","text":"Hotplug Network Interfaces are not supported on Arm64, because the image ghcr.io/k8snetworkplumbingwg/multus-cni:snapshot-thick does not support the Arm64 platform. For more information please refer to https://github.com/k8snetworkplumbingwg/multus-cni/pull/1027.
"},{"location":"cluster_admin/operations_on_Arm64/#hotplug-volumes","title":"Hotplug Volumes","text":"Hotplug Volumes are not supported on Arm64, because the Containerized Data Importer is not supported on Arm64 for now.
"},{"location":"cluster_admin/operations_on_Arm64/#hugepages-support","title":"Hugepages support","text":"The Hugepages feature is not supported on Arm64. The hugepage mechanism differs between X86_64 and Arm64. Currently, KubeVirt is only verified on systems with a 4k page size.
"},{"location":"cluster_admin/operations_on_Arm64/#containerized-data-importer","title":"Containerized Data Importer","text":"This project is not yet supported on Arm64, but support is planned.
"},{"location":"cluster_admin/operations_on_Arm64/#export-api","title":"Export API","text":"Export API is partially supported on the Arm64 platform. As CDI is not supported yet, the export of DataVolumes and MemoryDump is not supported on Arm64.
"},{"location":"cluster_admin/operations_on_Arm64/#virtual-machine-memory-dump","title":"Virtual machine memory dump","text":"As explained above, MemoryDump requires CDI, and is not yet supported on Arm64.
"},{"location":"cluster_admin/operations_on_Arm64/#mediated-devices-and-virtual-gpus","title":"Mediated devices and virtual GPUs","text":"This is not verified on Arm64 platform.
"},{"location":"cluster_admin/scheduler/","title":"KubeVirt Scheduler","text":"Scheduling is the process of matching Pods/VMs to Nodes. By default, the scheduler used is kube-scheduler. Further details can be found at Kubernetes Scheduler Documentation.
Custom schedulers can be used if the default scheduler does not satisfy your needs. For instance, you might want to schedule VMs using a load aware scheduler such as Trimaran Schedulers.
"},{"location":"cluster_admin/scheduler/#creating-a-custom-scheduler","title":"Creating a Custom Scheduler","text":"KubeVirt is compatible with custom schedulers. The configuration steps are described in the Official Kubernetes Documentation. Please note, the Kubernetes version KubeVirt is running on and the Kubernetes version used to build the custom scheduler have to match. To get the Kubernetes version KubeVirt is running on, you can run the following command:
$ kubectl version\nClient Version: version.Info{Major:\"1\", Minor:\"22\", GitVersion:\"v1.22.13\", GitCommit:\"a43c0904d0de10f92aa3956c74489c45e6453d6e\", GitTreeState:\"clean\", BuildDate:\"2022-08-17T18:28:56Z\", GoVersion:\"go1.16.15\", Compiler:\"gc\", Platform:\"linux/amd64\"}\nServer Version: version.Info{Major:\"1\", Minor:\"22\", GitVersion:\"v1.22.13\", GitCommit:\"a43c0904d0de10f92aa3956c74489c45e6453d6e\", GitTreeState:\"clean\", BuildDate:\"2022-08-17T18:23:45Z\", GoVersion:\"go1.16.15\", Compiler:\"gc\", Platform:\"linux/amd64\"}\n
Pay attention to the Server
line. In this case, the Kubernetes version is v1.22.13
. You have to check out the matching Kubernetes version and build the Kubernetes project:
$ cd kubernetes\n$ git checkout v1.22.13\n$ make\n
Then, you can follow the configuration steps described here. Additionally, the ClusterRole system:kube-scheduler
needs permissions to use the verbs watch
, list
and get
on StorageClasses.
- apiGroups: \n - storage.k8s.io \n resources: \n - storageclasses \n verbs: \n - watch \n - list \n - get \n
"},{"location":"cluster_admin/scheduler/#scheduling-vms-with-the-custom-scheduler","title":"Scheduling VMs with the Custom Scheduler","text":"The second scheduler should be up and running. You can check it with:
$ kubectl get all -n kube-system\n
The deployment my-scheduler
should be up and running if everything is set up properly. In order to launch the VM using the custom scheduler, you need to set the schedulerName
in the VM's spec to my-scheduler
. Here is an example VM definition:
apiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n name: vm-fedora\nspec:\n running: true\n template:\n spec:\n schedulerName: my-scheduler\n domain:\n devices:\n disks:\n - name: containerdisk\n disk:\n bus: virtio\n - name: cloudinitdisk\n disk:\n bus: virtio\n rng: {}\n resources:\n requests:\n memory: 1Gi\n terminationGracePeriodSeconds: 180\n volumes:\n - containerDisk:\n image: quay.io/containerdisks/fedora:latest\n name: containerdisk\n - cloudInitNoCloud:\n userData: |-\n #cloud-config\n chpasswd:\n expire: false\n password: fedora\n user: fedora\n name: cloudinitdisk\n
In case the specified schedulerName
does not match any existing scheduler, the virt-launcher
pod will stay in the Pending state until the specified scheduler can be found. You can check whether the VM has been scheduled using my-scheduler
by checking the virt-launcher
pod events associated with the VM. The pod should have been scheduled with my-scheduler
. $ kubectl get pods\nNAME READY STATUS RESTARTS AGE\nvirt-launcher-vm-fedora-dpc87 2/2 Running 0 24m\n\n$ kubectl describe pod virt-launcher-vm-fedora-dpc87\n[...] \nEvents:\n Type Reason Age From Message\n ---- ------ ---- ---- -------\n Normal Scheduled 21m my-scheduler Successfully assigned default/virt-launcher-vm-fedora-dpc87 to node01\n[...]\n
"},{"location":"cluster_admin/tekton_tasks/","title":"KubeVirt Tekton","text":""},{"location":"cluster_admin/tekton_tasks/#prerequisites","title":"Prerequisites","text":"KubeVirt-specific Tekton Tasks, which are focused on:
KubeVirt Tekton Tasks and example Pipelines are available in artifacthub.io from where you can easily deploy them to your cluster.
"},{"location":"cluster_admin/tekton_tasks/#existing-tasks","title":"Existing Tasks","text":""},{"location":"cluster_admin/tekton_tasks/#create-virtual-machines","title":"Create Virtual Machines","text":"All these Tasks can be used for creating Pipelines. We prepared example Pipelines which show what you can do with the KubeVirt Tasks.
Windows efi installer - This Pipeline will prepare a Windows 10/11/2k22 datavolume with virtio drivers installed. The user has to provide a working link to a Windows 10/11/2k22 iso file. The Pipeline is suitable for Windows versions that require EFI (e.g. Windows 10/11/2k22). More information about the Pipeline can be found here
Windows customize - This Pipeline will install SQL Server or VS Code in a Windows VM. More information about the Pipeline can be found here
Note
kubevirt-os-images
namespace. baseDvNamespace
attribute in Pipeline), additional RBAC permissions will be required (list of all required RBAC permissions can be found here). KubeVirt has its own node daemon, called virt-handler. In addition to the usual k8s methods of detecting issues on nodes, the virt-handler daemon has its own heartbeat mechanism. This allows for fine-tuned error handling of VirtualMachineInstances.
"},{"location":"cluster_admin/unresponsive_nodes/#virt-handler-heartbeat","title":"virt-handler heartbeat","text":"virt-handler
periodically tries to update the kubevirt.io/schedulable
label and the kubevirt.io/heartbeat
annotation on the node it is running on:
$ kubectl get nodes -o yaml\napiVersion: v1\nitems:\n- apiVersion: v1\n kind: Node\n metadata:\n annotations:\n kubevirt.io/heartbeat: 2018-11-05T09:42:25Z\n creationTimestamp: 2018-11-05T08:55:53Z\n labels:\n beta.kubernetes.io/arch: amd64\n beta.kubernetes.io/os: linux\n cpumanager: \"false\"\n kubernetes.io/hostname: node01\n kubevirt.io/schedulable: \"true\"\n node-role.kubernetes.io/control-plane: \"\"\n
If a VirtualMachineInstance
gets scheduled, the scheduler is only considering nodes where kubevirt.io/schedulable
is true
. This can be seen when looking at the corresponding pod of a VirtualMachineInstance
:
$ kubectl get pods virt-launcher-vmi-nocloud-ct6mr -o yaml\napiVersion: v1\nkind: Pod\nmetadata:\n [...]\nspec:\n [...]\n nodeName: node01\n nodeSelector:\n kubevirt.io/schedulable: \"true\"\n [...]\n
In case there is a communication issue or the host goes down, virt-handler
can't update its labels and annotations anymore. Once the last kubevirt.io/heartbeat
timestamp is older than five minutes, the KubeVirt node-controller kicks in and sets the kubevirt.io/schedulable
label to false
. As a consequence no more VMIs will be scheduled to this node until virt-handler is connected again.
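The staleness check described above boils down to a timestamp comparison. The following sketch is an illustration only, not the node-controller's code: the function name heartbeat_is_stale is hypothetical, and the timestamp format is assumed from the kubevirt.io/heartbeat value shown in the node output above.

```python
from datetime import datetime, timedelta, timezone

# The five-minute threshold mentioned in the text.
HEARTBEAT_TIMEOUT = timedelta(minutes=5)


def heartbeat_is_stale(heartbeat, now):
    '''Return True if the heartbeat annotation is older than the timeout.'''
    last = datetime.strptime(heartbeat, '%Y-%m-%dT%H:%M:%SZ')
    last = last.replace(tzinfo=timezone.utc)
    return now - last > HEARTBEAT_TIMEOUT


# The check time is passed in explicitly so the example is reproducible.
now = datetime(2018, 11, 5, 9, 48, 0, tzinfo=timezone.utc)
print(heartbeat_is_stale('2018-11-05T09:42:25Z', now))
```

With the heartbeat value from the example node and a check time of 09:48:00 UTC, the heartbeat is 5 minutes and 35 seconds old, so the node would be treated as unschedulable.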
In cases where virt-handler
has some issues but the node is in general fine, a VirtualMachineInstance
can be deleted as usual via kubectl delete vmi <myvm>
. Pods of a VirtualMachineInstance
will be told by the cluster-controllers they should shut down. As soon as the Pod is gone, the VirtualMachineInstance
will be moved to Failed
state, if virt-handler
did not manage to update its heartbeat in the meantime. If virt-handler
could recover in the meantime, virt-handler
will move the VirtualMachineInstance
to failed state instead of the cluster-controllers.
If the whole node is unresponsive, deleting a VirtualMachineInstance
via kubectl delete vmi <myvmi>
alone will never remove the VirtualMachineInstance
. In this case all pods on the unresponsive node need to be force-deleted: First make sure that the node is really dead. Then delete all pods on the node via a force-delete: kubectl delete pod --force --grace-period=0 <mypod>
.
As soon as the pod disappears and the heartbeat from virt-handler has timed out, the VMIs will be moved to Failed
state. If they were already marked for deletion they will simply disappear. If not, they can be deleted and will disappear almost immediately.
It takes up to five minutes until the KubeVirt cluster components can detect that virt-handler is unhealthy. During that time-frame it is possible that new VMIs are scheduled to the affected node. If virt-handler is not capable of connecting to these pods on the node, the pods will sooner or later go to failed state. As soon as the cluster finally detects the issue, the VMIs will be set to failed by the cluster.
"},{"location":"cluster_admin/updating_and_deletion/","title":"Updating and deletion","text":""},{"location":"cluster_admin/updating_and_deletion/#updating-kubevirt-control-plane","title":"Updating KubeVirt Control Plane","text":"Zero downtime rolling updates are supported starting with release v0.17.0
onward. Updating from any release prior to the KubeVirt v0.17.0
release is not supported.
Note: Updating is only supported from N-1 to N release.
Updates are triggered in one of two ways.
By changing the imageTag value in the KubeVirt CR's spec.
For example, updating from v0.17.0-alpha.1
to v0.17.0
is as simple as patching the KubeVirt CR with the imageTag: v0.17.0
value. From there the KubeVirt operator will begin the process of rolling out the new version of KubeVirt. Existing VM/VMIs will remain uninterrupted both during and after the update succeeds.
$ kubectl patch kv kubevirt -n kubevirt --type=json -p '[{ \"op\": \"add\", \"path\": \"/spec/imageTag\", \"value\": \"v0.17.0\" }]'\n
Or, by updating the kubevirt operator if no imageTag value is set.
When no imageTag value is set in the kubevirt CR, the system assumes that the version of KubeVirt is locked to the version of the operator. This means that updating the operator will result in the underlying KubeVirt installation being updated as well.
$ export RELEASE=v0.26.0\n$ kubectl apply -f https://github.com/kubevirt/kubevirt/releases/download/${RELEASE}/kubevirt-operator.yaml\n
The first way provides a fine-grained approach where you have full control over what version of KubeVirt is installed independently of what version of the KubeVirt operator you might be running. The second approach allows you to lock both the operator and operand to the same version.
A newer KubeVirt version may require additional or extended RBAC rules. In this case, the first update method may fail, because the virt-operator present in the cluster doesn't have these RBAC rules itself. You then need to update the virt-operator
first, and then proceed to update KubeVirt. See this issue for more details.
Workload updates are supported as an opt-in feature starting with v0.39.0
. By default, when KubeVirt is updated this only involves the control plane components. Any existing VirtualMachineInstance (VMI) workloads that are running before an update occurs remain 100% untouched. The workloads continue to run and are not interrupted as part of the default update process.
It's important to note that these VMI workloads do involve components such as libvirt, qemu, and virt-launcher, which can optionally be updated during the KubeVirt update process as well. However that requires opting in to having virt-operator perform automated actions on workloads.
Opting in to VMI updates involves configuring the workloadUpdateStrategy
field on the KubeVirt CR. This field controls the methods virt-operator will use when updating the VMI workload pods.
There are two methods supported.
LiveMigrate: Which results in VMIs being updated by live migrating the virtual machine guest into a new pod with all the updated components enabled.
Evict: Which results in the VMI's pod being shut down. If the VMI is controlled by a higher level VirtualMachine object with runStrategy: Always
, then a new VMI will spin up in a new pod with updated components.
The least disruptive way to update VMI workloads is to use LiveMigrate. Any VMI workload that is not live migratable will be left untouched. If live migration is not enabled in the cluster, then the only option available for virt-operator managed VMI updates is the Evict method.
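How the configured methods translate into an action for a single VMI can be sketched as a small decision function. This is an illustration of the documented behaviour, not virt-operator's code, and the function name update_action is hypothetical: a live-migratable VMI is migrated whenever LiveMigrate is configured, Evict is used otherwise if it is configured, and a VMI that cannot be handled by any configured method is left untouched.

```python
def update_action(live_migratable, methods):
    '''Return the update method applied to one VMI, or None if it is left untouched.'''
    if live_migratable and 'LiveMigrate' in methods:
        # Live migration is preferred whenever it is configured and possible.
        return 'LiveMigrate'
    if 'Evict' in methods:
        return 'Evict'
    return None


for methods in (['LiveMigrate'], ['Evict'], ['LiveMigrate', 'Evict']):
    for migratable in (True, False):
        print(methods, migratable, update_action(migratable, methods))
```

With only LiveMigrate configured, a VMI that is not live migratable yields None, which corresponds to the workload being left untouched.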
Example: Enabling VMI workload updates via LiveMigration
apiVersion: kubevirt.io/v1\nkind: KubeVirt\nmetadata:\n name: kubevirt\n namespace: kubevirt\nspec:\n imagePullPolicy: IfNotPresent\n workloadUpdateStrategy:\n workloadUpdateMethods:\n - LiveMigrate\n
Example: Enabling VMI workload updates via Evict with batch tunings
The batch tunings allow configuring how quickly VMIs are evicted. In large clusters, it's desirable to ensure that VMIs are evicted in batches in order to distribute load.
apiVersion: kubevirt.io/v1\nkind: KubeVirt\nmetadata:\n name: kubevirt\n namespace: kubevirt\nspec:\n imagePullPolicy: IfNotPresent\n workloadUpdateStrategy:\n workloadUpdateMethods:\n - Evict\n batchEvictionSize: 10\n batchEvictionInterval: \"1m\"\n
Example: Enabling VMI workload updates with both LiveMigrate and Evict
When both LiveMigrate and Evict are specified, then any workloads which are live migratable will be guaranteed to be live migrated. Only workloads which are not live migratable will be evicted.
apiVersion: kubevirt.io/v1\nkind: KubeVirt\nmetadata:\n name: kubevirt\n namespace: kubevirt\nspec:\n imagePullPolicy: IfNotPresent\n workloadUpdateStrategy:\n workloadUpdateMethods:\n - LiveMigrate\n - Evict\n batchEvictionSize: 10\n batchEvictionInterval: \"1m\"\n
"},{"location":"cluster_admin/updating_and_deletion/#deleting-kubevirt","title":"Deleting KubeVirt","text":"To delete KubeVirt, you should first delete the KubeVirt
custom resource and then delete the KubeVirt operator.
$ export RELEASE=v0.17.0\n$ kubectl delete -n kubevirt kubevirt kubevirt --wait=true # --wait=true should anyway be default\n$ kubectl delete apiservices v1.subresources.kubevirt.io # this needs to be deleted to avoid stuck terminating namespaces\n$ kubectl delete mutatingwebhookconfigurations virt-api-mutator # not blocking but would be left over\n$ kubectl delete validatingwebhookconfigurations virt-operator-validator # not blocking but would be left over\n$ kubectl delete validatingwebhookconfigurations virt-api-validator # not blocking but would be left over\n$ kubectl delete -f https://github.com/kubevirt/kubevirt/releases/download/${RELEASE}/kubevirt-operator.yaml --wait=false\n
Note: If by mistake you deleted the operator first, the KV custom resource will get stuck in the Terminating
state. To fix it, manually delete the finalizer from the resource.
Note: The apiservice
and the webhookconfigurations
need to be deleted manually due to a bug.
$ kubectl -n kubevirt patch kv kubevirt --type=json -p '[{ \"op\": \"remove\", \"path\": \"/metadata/finalizers\" }]'\n
"},{"location":"cluster_admin/virtual_machines_on_Arm64/","title":"Virtual Machines on Arm64","text":"This page summarizes all unsupported Virtual Machine configurations and different default setups on the Arm64 platform.
"},{"location":"cluster_admin/virtual_machines_on_Arm64/#virtual-hardware","title":"Virtual hardware","text":""},{"location":"cluster_admin/virtual_machines_on_Arm64/#machine-type","title":"Machine Type","text":"Currently, we only support one machine type, virt
, which is set by default.
On Arm64 platform, we only support UEFI boot which is set by default. UEFI secure boot is not supported.
"},{"location":"cluster_admin/virtual_machines_on_Arm64/#cpu","title":"CPU","text":""},{"location":"cluster_admin/virtual_machines_on_Arm64/#node-labeller","title":"Node-labeller","text":"Currently, Node-labeller is partially supported on Arm64 platform. It does not yet support parsing virsh_domcapabilities.xml and capabilities.xml, and extracting related information such as CPU features.
"},{"location":"cluster_admin/virtual_machines_on_Arm64/#model","title":"Model","text":"host-passthrough
is the only model supported on Arm64. The CPU model is set by default on the Arm64 platform.
kvm
and hyperv
timers are not supported on Arm64 platform.
We do not support vga devices but use virtio-gpu by default.
"},{"location":"cluster_admin/virtual_machines_on_Arm64/#hugepages","title":"Hugepages","text":"Hugepages are not supported on Arm64 platform.
"},{"location":"cluster_admin/virtual_machines_on_Arm64/#resources-requests-and-limits","title":"Resources Requests and Limits","text":"CPU pinning is supported on Arm64 platform.
"},{"location":"cluster_admin/virtual_machines_on_Arm64/#numa","title":"NUMA","text":"As Hugepages are a precondition of the NUMA feature, and Hugepages are not enabled on the Arm64 platform, the NUMA feature does not work on Arm64.
"},{"location":"cluster_admin/virtual_machines_on_Arm64/#disks-and-volumes","title":"Disks and Volumes","text":"Arm64 only supports virtio and scsi disk bus types.
"},{"location":"cluster_admin/virtual_machines_on_Arm64/#interface-and-networks","title":"Interface and Networks","text":""},{"location":"cluster_admin/virtual_machines_on_Arm64/#macvlan","title":"macvlan","text":"We do not support macvlan
network because the project https://github.com/kubevirt/macvtap-cni does not support Arm64.
This class of devices is not verified on the Arm64 platform.
"},{"location":"cluster_admin/virtual_machines_on_Arm64/#liveness-and-readiness-probes","title":"Liveness and Readiness Probes","text":"Watchdog
device is not supported on Arm64 platform.
KubeVirt includes support for redirecting devices from the client's machine to the VMI with the help of the virtctl command.
"},{"location":"compute/client_passthrough/#usb-redirection","title":"USB Redirection","text":"Support for redirection of client's USB device was introduced in release v0.44. This feature is not enabled by default. To enable it, add an empty clientPassthrough
under devices, as such:
spec:\n domain:\n devices:\n clientPassthrough: {}\n
This configuration currently adds 4 USB slots to the VMI that can only be used with virtctl.
There are two ways of redirecting the same USB device: either using the device's vendor and product information or using its actual bus and device address information. In Linux, you can gather this info with lsusb
, a redacted example below:
> lsusb\nBus 002 Device 008: ID 0951:1666 Kingston Technology DataTraveler 100 G3/G4/SE9 G2/50\nBus 001 Device 003: ID 13d3:5406 IMC Networks Integrated Camera\nBus 001 Device 010: ID 0781:55ae SanDisk Corp. Extreme 55AE\n
"},{"location":"compute/client_passthrough/#using-vendor-and-product","title":"Using Vendor and Product","text":"Redirecting the Kingston storage device.
virtctl usbredir 0951:1666 vmi-name\n
"},{"location":"compute/client_passthrough/#using-bus-and-device-address","title":"Using Bus and Device address","text":"Redirecting the integrated camera
virtctl usbredir 01-03 vmi-name\n
"},{"location":"compute/client_passthrough/#requirements-for-virtctl-usbredir","title":"Requirements for virtctl usbredir
","text":"The virtctl
command uses an application called usbredirect
to handle the client's USB device by unplugging the device from the client OS and channeling the communication between the device and the VMI.
The usbredirect
binary comes from the usbredir project and is supported by most Linux distros. You can either fetch the latest release or MSI installer for Windows support.
Managing USB devices requires privileged access in most operating systems. The user running virtctl usbredir
would need to be privileged or run it in a privileged manner (e.g. with sudo
)
usbredirect
included in the PATH environment variable.
The CPU hotplug feature was introduced in KubeVirt v1.0, making it possible to configure the VM workload to allow for adding or removing virtual CPUs while the VM is running.
"},{"location":"compute/cpu_hotplug/#abstract","title":"Abstract","text":"A virtual CPU (vCPU) is the CPU that is seen by the guest VM OS. A VM owner can manage the number of vCPUs from the VM spec template using the CPU topology fields (spec.template.spec.domain.cpu
). The cpu
object has the integers cores,sockets,threads
so that the total number of vCPUs is calculated by the following formula: cores * sockets * threads
.
Before CPU hotplug was introduced, the VM owner could change these integers in the VM template while the VM is running, and they were staged until the next boot cycle. With CPU hotplug, it is possible to patch the sockets
integer in the VM template and the change will take effect right away.
For each new socket that is hot-plugged, the number of new vCPUs seen by the guest is cores * threads
, since the overall calculation of vCPUs is cores * sockets * threads
.
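As a quick worked illustration of the arithmetic above, the following Python sketch computes the vCPU count before and after hot-plugging one socket. The function name total_vcpus and the example topology are hypothetical and used only for illustration; they are not part of any KubeVirt API.

```python
def total_vcpus(cores: int, sockets: int, threads: int) -> int:
    '''Number of vCPUs the guest sees: cores * sockets * threads.'''
    return cores * sockets * threads


# Hypothetical topology, chosen only for illustration.
cores, sockets, threads = 2, 4, 2

before = total_vcpus(cores, sockets, threads)
after = total_vcpus(cores, sockets + 1, threads)  # hot-plug one socket
added = after - before

# Each hot-plugged socket contributes cores * threads new vCPUs.
print(before, after, added)
```

With this topology, one extra socket adds cores * threads = 4 vCPUs, taking the guest from 16 to 20.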
In order to enable CPU hotplug we need to add the VMLiveUpdateFeatures
feature gate in the KubeVirt CR:
apiVersion: kubevirt.io/v1\nkind: KubeVirt\nspec:\n configuration:\n developerConfiguration:\n featureGates:\n - VMLiveUpdateFeatures\n
"},{"location":"compute/cpu_hotplug/#configure-the-workload-update-strategy","title":"Configure the workload update strategy","text":"Current implementation of the hotplug process requires the VM to live-migrate. The migration will be triggered automatically by the workload updater. The workload update strategy in the KubeVirt CR must be configured with LiveMigrate
, as follows:
apiVersion: kubevirt.io/v1\nkind: KubeVirt\nspec:\n workloadUpdateStrategy:\n workloadUpdateMethods:\n - LiveMigrate\n
"},{"location":"compute/cpu_hotplug/#configure-the-vm-rollout-strategy","title":"Configure the VM rollout strategy","text":"Hotplug requires a VM rollout strategy of LiveUpdate
, so that the changes made to the VM object propagate to the VMI without a restart. This is also done in the KubeVirt CR configuration:
apiVersion: kubevirt.io/v1\nkind: KubeVirt\nspec:\n configuration:\n vmRolloutStrategy: \"LiveUpdate\"\n
More information can be found on the VM Rollout Strategies page
"},{"location":"compute/cpu_hotplug/#optional-set-maximum-sockets-or-hotplug-ratio","title":"[OPTIONAL] Set maximum sockets or hotplug ratio","text":"You can explicitly set the maximum amount of sockets in three ways:
With the cluster-level ratio, the maximum is computed as maxSockets = ratio * sockets
.
Note: the third way (cluster-level ratio) will also affect other quantitative hotplug resources like memory.
VM level:
apiVersion: kubevirt.io/v1\nkind: VirtualMachine\nspec:\n template:\n spec:\n domain:\n cpu:\n maxSockets: 8\n
Cluster level value:
apiVersion: kubevirt.io/v1\nkind: KubeVirt\nspec:\n configuration:\n liveUpdateConfiguration:\n maxCpuSockets: 8\n
Cluster level ratio:
apiVersion: kubevirt.io/v1\nkind: KubeVirt\nspec:\n configuration:\n liveUpdateConfiguration:\n maxHotplugRatio: 4\n
The VM-level configuration will take precedence over the cluster-wide configuration.
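The precedence rule can be sketched in Python as follows. This is a simplified illustration, not KubeVirt's implementation: the function name effective_max_sockets is hypothetical, and the sketch assumes that, among the two cluster-wide settings, an explicit cluster-level value is consulted before the ratio (the text above only states that the VM-level setting wins over cluster-wide ones).

```python
from typing import Optional


def effective_max_sockets(
    sockets: int,
    vm_max_sockets: Optional[int] = None,
    cluster_max_sockets: Optional[int] = None,
    cluster_ratio: Optional[int] = None,
) -> Optional[int]:
    '''Resolve the maximum socket count from the three possible settings.'''
    # The VM-level setting takes precedence over any cluster-wide setting.
    if vm_max_sockets is not None:
        return vm_max_sockets
    # Assumption: an explicit cluster-level value is preferred over the ratio.
    if cluster_max_sockets is not None:
        return cluster_max_sockets
    # Cluster-level ratio: maxSockets = ratio * sockets.
    if cluster_ratio is not None:
        return cluster_ratio * sockets
    return None


print(effective_max_sockets(2, vm_max_sockets=8, cluster_max_sockets=16))
print(effective_max_sockets(2, cluster_ratio=4))
```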
"},{"location":"compute/cpu_hotplug/#hotplug-process","title":"Hotplug process","text":"Let's assume we have a running VM with the 4 vCPUs, which were configured with sockets:4 cores:1 threads:1
In the VMI status we can observe the current CPU topology the VM is running with:
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\n...\nstatus:\n currentCPUTopology:\n cores: 1\n sockets: 4\n threads: 1\n
Now we want to hotplug another socket, by patching the VM object: kubectl patch vm vm-cirros --type='json' \\\n-p='[{\"op\": \"replace\", \"path\": \"/spec/template/spec/domain/cpu/sockets\", \"value\": 5}]'\n
We can observe the CPU hotplug process in the VMI status: status:\n conditions:\n - lastProbeTime: null\n lastTransitionTime: null\n status: \"True\"\n type: LiveMigratable\n - lastProbeTime: null\n lastTransitionTime: null\n status: \"True\"\n type: HotVCPUChange\n currentCPUTopology:\n cores: 1\n sockets: 4\n threads: 1\n
Please note the condition HotVCPUChange
that indicates the hotplug process is taking place. You can also see the VirtualMachineInstanceMigration object that was created for the VM in question:
NAME PHASE VMI\nkubevirt-workload-update-kflnl Running vm-cirros\n
When the hotplug process has completed, the currentCPUTopology
will be updated with the new number of sockets and the migration is marked as successful. #kubectl get vmi vm-cirros -oyaml\n\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n name: vm-cirros\nspec:\n domain:\n cpu:\n cores: 1\n sockets: 5\n threads: 1\n...\n...\nstatus:\n currentCPUTopology:\n cores: 1\n sockets: 5\n threads: 1\n\n\n#kubectl get vmim -l kubevirt.io/vmi-name=vm-cirros\nNAME PHASE VMI\nkubevirt-workload-update-cgdgd Succeeded vm-cirros\n
"},{"location":"compute/cpu_hotplug/#limitations","title":"Limitations","text":"Certain workloads that require predictable latency and enhanced performance during execution would benefit from obtaining dedicated CPU resources. KubeVirt, relying on the Kubernetes CPU manager, is able to pin a guest's vCPUs to the host's pCPUs.
"},{"location":"compute/dedicated_cpu_resources/#kubernetes-cpu-manager","title":"Kubernetes CPU manager","text":"Kubernetes CPU manager is a mechanism that affects the scheduling of workloads, placing them on a host which can allocate Guaranteed
resources and pin certain Pod's containers to host pCPUs, if the following requirements are met:
Additional information:
Setting spec.domain.cpu.dedicatedCpuPlacement
to true
in a VMI spec will indicate the desire to allocate dedicated CPU resources to the VMI.
KubeVirt will verify that all the necessary conditions are met for the Kubernetes CPU manager to pin the virt-launcher container to dedicated host CPUs. Once virt-launcher is running, the VMI's vCPUs will be pinned to the pCPUs that have been dedicated to the virt-launcher container.
Expressing the desired amount of VMI's vCPUs can be done by either setting the guest topology in spec.domain.cpu
(sockets
, cores
, threads
) or spec.domain.resources.[requests/limits].cpu
to a whole number ([1-9]+) indicating the number of vCPUs requested for the VMI. The number of vCPUs is counted as sockets * cores * threads
or, if spec.domain.cpu
is empty, it is taken from spec.domain.resources.requests.cpu
or spec.domain.resources.limits.cpu
.
Note: Users should not specify both spec.domain.cpu
and spec.domain.resources.[requests/limits].cpu
Note: spec.domain.resources.requests.cpu
must be equal to spec.domain.resources.limits.cpu
Note: Multiple cpu-bound microbenchmarks show a significant performance advantage when using spec.domain.cpu.sockets
instead of spec.domain.cpu.cores
.
All inconsistent requirements will be rejected.
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nspec:\n domain:\n cpu:\n sockets: 2\n cores: 1\n threads: 1\n dedicatedCpuPlacement: true\n resources:\n limits:\n memory: 2Gi\n[...]\n
OR
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nspec:\n domain:\n cpu:\n dedicatedCpuPlacement: true\n resources:\n limits:\n cpu: 2\n memory: 2Gi\n[...]\n
"},{"location":"compute/dedicated_cpu_resources/#requesting-dedicated-cpu-for-qemu-emulator","title":"Requesting dedicated CPU for QEMU emulator","text":"A number of QEMU threads, such as QEMU main event loop, async I/O operation completion, etc., also execute on the same physical CPUs as the VMI's vCPUs. This may affect the expected latency of a vCPU. In order to enhance the real-time support in KubeVirt and provide improved latency, KubeVirt will allocate an additional dedicated CPU, exclusively for the emulator thread, to which it will be pinned. This will effectively \"isolate\" the emulator thread from the vCPUs of the VMI. In case ioThreadsPolicy
is set to auto
, IOThreads will also be \"isolated\" and placed on the same physical CPU as the QEMU emulator thread.
This functionality can be enabled by specifying isolateEmulatorThread: true
inside the VMI spec's Spec.Domain.CPU
section. Naturally, this setting has to be specified in combination with dedicatedCpuPlacement: true
.
Example:
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nspec:\n domain:\n cpu:\n dedicatedCpuPlacement: true\n isolateEmulatorThread: true\n resources:\n limits:\n cpu: 2\n memory: 2Gi\n
"},{"location":"compute/dedicated_cpu_resources/#compute-nodes-with-smt-enabled","title":"Compute Nodes with SMT Enabled","text":"When the following conditions are met:
dedicatedCpuPlacement
and isolateEmulatorThread
are enabled
The VM is scheduled, but rejected by the kubelet with the following event:
SMT Alignment Error: requested 3 cpus not multiple cpus per core = 2\n
In order to address this issue:
Enable the AlignCPUs
feature gate in the KubeVirt CR.
Add the following annotation to the VM: alpha.kubevirt.io/EmulatorThreadCompleteToEvenParity:\n
KubeVirt will then add one or two dedicated CPUs for the emulator threads, in a way that completes the total CPU count to be even.
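The parity rule described above can be sketched in Python as follows. This is a simplified illustration rather than KubeVirt's actual implementation; the function name emulator_thread_cpus is hypothetical, and the sketch assumes the extra CPUs are simply added on top of the VMI's vCPU count.

```python
def emulator_thread_cpus(vcpus: int) -> int:
    '''Extra dedicated CPUs for the emulator thread so the total is even.'''
    # Odd vCPU count: one extra CPU already makes the total even.
    # Even vCPU count: one extra CPU would make the total odd, so add two.
    return 1 if vcpus % 2 == 1 else 2


for vcpus in (1, 2, 3, 4):
    total = vcpus + emulator_thread_cpus(vcpus)
    print(vcpus, total)
```

For the 2-vCPU example above, a single isolated emulator CPU gives the 3 CPUs reported in the SMT alignment error, while completing to even parity gives 4.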
"},{"location":"compute/dedicated_cpu_resources/#identifying-nodes-with-a-running-cpu-manager","title":"Identifying nodes with a running CPU manager","text":"At this time, Kubernetes doesn't label the nodes that has CPU manager running on it.
KubeVirt has a mechanism to identify which nodes have the CPU manager running and add a cpumanager=true
label. This label will be removed when KubeVirt identifies that the CPU manager is no longer running on the node. This automatic identification should be viewed as a temporary workaround until Kubernetes provides the required functionality. Therefore, this feature must be manually enabled by adding the CPUManager
feature gate to the KubeVirt CR.
When automatic identification is disabled, a cluster administrator may manually add the above label to all the nodes where the CPU manager is running.
Node labels can be viewed with: kubectl describe nodes
Administrators may manually label a missing node: kubectl label node [node_name] cpumanager=true
Note: In order to run sidecar containers, KubeVirt requires the Sidecar
feature gate to be enabled in KubeVirt's CR.
According to the Kubernetes CPU manager model, for the Pod to reach the required QoS level Guaranteed
, all containers in the Pod must express CPU and memory requirements. At this time, KubeVirt often uses a sidecar container to mount the VMI's registry disk. It also uses a sidecar container for its hooking mechanism. These additional resources can be viewed as an overhead and should be taken into account when calculating node capacity.
Note: The current defaults for sidecar's resources: CPU: 200m
Memory: 64M
As the CPU resource is not expressed as a whole number, CPU manager will not attempt to pin the sidecar container to a host CPU.
KubeVirt provides a mechanism for assigning host devices to a virtual machine. This mechanism is generic and allows various types of PCI devices, such as accelerators (including GPUs) or any other devices attached to a PCI bus, to be assigned. It also allows Linux Mediated devices, such as pre-configured virtual GPUs to be assigned using the same mechanism.
"},{"location":"compute/host-devices/#host-preparation-for-pci-passthrough","title":"Host preparation for PCI Passthrough","text":"Host Devices passthrough requires the virtualization extension and the IOMMU extension (Intel VT-d or AMD IOMMU) to be enabled in the BIOS.
To enable IOMMU, depending on the CPU type, a host should be booted with an additional kernel parameter, intel_iommu=on
for Intel and amd_iommu=on
for AMD.
Append these parameters to the end of the GRUB_CMDLINE_LINUX line in the grub configuration file.
# vi /etc/default/grub\n...\nGRUB_CMDLINE_LINUX=\"nofb splash=quiet console=tty0 ... intel_iommu=on\"\n...\n\n# grub2-mkconfig -o /boot/grub2/grub.cfg\n\n# reboot\n
# modprobe vfio-pci\n
At this time, KubeVirt is only able to assign PCI devices that are using the vfio-pci
driver. To prepare a specific device for device assignment, it should first be unbound from its original driver and bound to the vfio-pci
driver.
$ lspci -DD|grep NVIDIA\n0000:65:00.0 3D controller [0302]: NVIDIA Corporation TU104GL [Tesla T4] [10de:1eb8] (rev a1)\n
vfio-pci
driver: echo 0000:65:00.0 > /sys/bus/pci/drivers/nvidia/unbind\necho \"vfio-pci\" > /sys/bus/pci/devices/0000\\:65\\:00.0/driver_override\necho 0000:65:00.0 > /sys/bus/pci/drivers/vfio-pci/bind\n
In general, configuration of mediated devices (mdevs), such as vGPUs, should be done according to the vendor's directions. KubeVirt can now facilitate the creation of mediated devices / vGPUs on the cluster nodes. This assumes that the required vendor driver is already installed on the nodes. See the Mediated devices and virtual GPUs page to learn more about this functionality.
Once the mdev is configured, KubeVirt will be able to discover and use it for device assignment.
"},{"location":"compute/host-devices/#listing-permitted-devices","title":"Listing permitted devices","text":"Administrators can control which host devices are exposed and permitted to be used in the cluster. Permitted host devices must be allowlisted in the KubeVirt CR by their vendor:product
selector for PCI devices or mediated device names.
configuration:\n permittedHostDevices:\n pciHostDevices:\n - pciVendorSelector: \"10DE:1EB8\"\n resourceName: \"nvidia.com/TU104GL_Tesla_T4\"\n externalResourceProvider: true\n - pciVendorSelector: \"8086:6F54\"\n resourceName: \"intel.com/qat\"\n mediatedDevices:\n - mdevNameSelector: \"GRID T4-1Q\"\n resourceName: \"nvidia.com/GRID_T4-1Q\"\n
pciVendorSelector
is a PCI vendor ID and product ID tuple in the form vendor_id:product_id
. This tuple can identify specific types of devices on a host. For example, the identifier 10de:1eb8
, shown above, can be found using lspci
.
$ lspci -nnv|grep -i nvidia\n65:00.0 3D controller [0302]: NVIDIA Corporation TU104GL [Tesla T4] [10de:1eb8] (rev a1)\n
mdevNameSelector
is a name of a Mediated device type that can identify specific types of Mediated devices on a host.
You can see what mediated types a given PCI device supports by examining the contents of /sys/bus/pci/devices/SLOT:BUS:DOMAIN.FUNCTION/mdev_supported_types/TYPE/name
. For example, if you have an NVIDIA T4 GPU on your system, and you substitute in the SLOT
, BUS
, DOMAIN
, and FUNCTION
values that are correct for your system into the above path name, you will see that a TYPE
of nvidia-226
contains the selector string GRID T4-2A
in its name
file.
Taking GRID T4-2A
and specifying it as the mdevNameSelector
allows KubeVirt to find a corresponding mediated device by matching it against /sys/class/mdev_bus/SLOT:BUS:DOMAIN.FUNCTION/$mdevUUID/mdev_type/name
for some values of SLOT:BUS:DOMAIN.FUNCTION
and $mdevUUID
.
External providers: externalResourceProvider
field indicates that this resource is being provided by an external device plugin. In this case, KubeVirt will only permit the usage of this device in the cluster but will leave the allocation and monitoring to an external device plugin.
Host devices can be assigned to virtual machines via the gpus
and hostDevices
fields. The deviceNames
can reference both PCI and Mediated device resource names.
kind: VirtualMachineInstance\nspec:\n domain:\n devices:\n gpus:\n - deviceName: nvidia.com/TU104GL_Tesla_T4\n name: gpu1\n - deviceName: nvidia.com/GRID_T4-1Q\n name: gpu2\n hostDevices:\n - deviceName: intel.com/qat\n name: quickaccess1\n
"},{"location":"compute/host-devices/#nvme-pci-passthrough","title":"NVMe PCI passthrough","text":"To pass through an NVMe device, the procedure is very similar to the GPU case. The device needs to be listed under permittedHostDevices
and under hostDevices
in the VM declaration.
Currently, the KubeVirt device plugin doesn't allow the user to select a specific device by specifying the address. Therefore, if multiple NVMe devices with the same vendor and product id exist in the cluster, they could be randomly assigned to a VM. If the devices are not on the same node, then the nodeSelector mitigates the issue.
Example:
Modify the permittedHostDevices
configuration:\n permittedHostDevices:\n pciHostDevices:\n - pciVendorSelector: 8086:5845\n resourceName: devices.kubevirt.io/nvme\n
VMI declaration:
kind: VirtualMachineInstance\nmetadata:\n labels:\n special: vmi-nvme\n name: vmi-nvme\nspec:\n nodeSelector: \n kubernetes.io/hostname: node03 # <--\n domain:\n devices:\n disks:\n - disk:\n bus: virtio\n name: containerdisk\n - disk:\n bus: virtio\n name: cloudinitdisk\n hostDevices: # <--\n - name: nvme # <--\n deviceName: devices.kubevirt.io/nvme # <--\n resources:\n requests:\n memory: 1024M\n terminationGracePeriodSeconds: 0\n volumes:\n - containerDisk:\n image: registry:5000/kubevirt/fedora-with-test-tooling-container-disk:devel\n name: containerdisk\n - cloudInitNoCloud:\n userData: |-\n #cloud-config\n password: fedora\n chpasswd: { expire: False }\n name: cloudinitdisk\n
"},{"location":"compute/host-devices/#usb-host-passthrough","title":"USB Host Passthrough","text":"Since KubeVirt v1.1, we can provide USB devices that are plugged in a Node to the VM running in the same Node.
"},{"location":"compute/host-devices/#requirements","title":"Requirements","text":"Cluster admin privilege to edit the KubeVirt CR in order to:
Enable the HostDevices
feature gate
Edit the permittedHostDevices
configuration to expose node USB devices to the cluster
In order to assign USB devices to your VMI, you'll need to expose those devices to the cluster under a resource name. The device allowlist can be edited in the KubeVirt CR under configuration.permittedHostDevices.usb
.
For this example, we will use the kubevirt.io/storage
resource name for the device with vendor: \"46f4\"
and product: \"0001\"
.
spec:\n configuration:\n permittedHostDevices:\n usb:\n - resourceName: kubevirt.io/storage\n selectors:\n - vendor: \"46f4\"\n product: \"0001\"\n
After adding the usb
configuration under permittedHostDevices
to the KubeVirt CR, KubeVirt's device-plugin will expose this resource name and you can use it in your VMI.
Now, in the VMI configuration, you can add the devices.hostDevices.deviceName
and reference the resource name provided in the previous step, and also give it a local name
, for example:
spec:\n domain:\n devices:\n hostDevices:\n - deviceName: kubevirt.io/storage\n name: usb-storage\n
You can find a working example, which uses QEMU's emulated USB storage, under examples/vmi-usb.yaml.
"},{"location":"compute/host-devices/#bundle-of-usb-devices","title":"Bundle of USB devices","text":"You might want to redirect more than one USB device to a VMI, for example, a keyboard, a mouse and a smartcard device. The KubeVirt CR supports assigning multiple USB devices under the same resource name, so you could do:
spec:\n configuration:\n permittedHostDevices:\n usb:\n - resourceName: kubevirt.io/peripherals\n selectors:\n - vendor: \"045e\"\n product: \"07a5\"\n - vendor: \"062a\"\n product: \"4102\"\n - vendor: \"072f\"\n product: \"b100\"\n
Adding to the VMI configuration:
spec:\n domain:\n devices:\n hostDevices:\n - deviceName: kubevirt.io/peripherals\n name: local-peripherals \n
Note that all USB devices need to be present in order for the assignment to work.
Note that you can easily find the vendor:product
value with the lsusb
command.
For hugepages support you need at least Kubernetes version 1.9
.
To enable hugepages on Kubernetes, check the official documentation.
To enable hugepages on OKD, check the official documentation.
"},{"location":"compute/hugepages/#pre-allocate-hugepages-on-a-node","title":"Pre-allocate hugepages on a node","text":"To pre-allocate hugepages at boot time, you will need to specify hugepages under kernel boot parameters hugepagesz=2M hugepages=64
and restart your machine.
You can find more about hugepages in the official documentation.
"},{"location":"compute/live_migration/","title":"Live Migration","text":"Live migration is a process during which a running Virtual Machine Instance moves to another compute node while the guest workload continues to run and remains accessible.
"},{"location":"compute/live_migration/#enabling-the-live-migration-support","title":"Enabling the live-migration support","text":"Live migration is enabled by default in recent versions of KubeVirt. In versions prior to v0.56, it must be enabled in the feature gates: the feature gates field in the KubeVirt CR must be expanded by adding LiveMigration
to it.
Virtual machines using a PersistentVolumeClaim (PVC) must have a shared ReadWriteMany (RWX) access mode to be live migrated.
Live migration is not allowed with a pod network binding of bridge interface type.
Live migration requires ports 49152, 49153
to be available in the virt-launcher pod. If these ports are explicitly specified in a masquerade interface, live migration will not function.
Live migration is initiated by posting a VirtualMachineInstanceMigration (VMIM) object to the cluster. The example below starts a migration process for a virtual machine instance vmi-fedora
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstanceMigration\nmetadata:\n name: migration-job\nspec:\n vmiName: vmi-fedora\n
"},{"location":"compute/live_migration/#using-virtctl-to-initiate-live-migration","title":"Using virtctl to initiate live migration","text":"Live migration can also be initiated using virtctl
virtctl migrate vmi-fedora\n
"},{"location":"compute/live_migration/#migration-status-reporting","title":"Migration Status Reporting","text":""},{"location":"compute/live_migration/#condition-and-migration-method","title":"Condition and migration method","text":"When a virtual machine instance starts, KubeVirt also calculates whether it is live migratable. The result is stored in the VMI under VMI.status.conditions
. The calculation can be based on multiple parameters of the VMI, however, at the moment, the calculation is largely based on the Access Mode
of the VMI volumes. Live migration is only permitted when the volume access mode is set to ReadWriteMany
. Requests to migrate a non-LiveMigratable VMI will be rejected.
The reported Migration Method
is also calculated during VMI start. BlockMigration
indicates that some of the VMI disks require copying from the source to the destination. LiveMigration
means that only the instance memory will be copied.
Status:\n Conditions:\n Status: True\n Type: LiveMigratable\n Migration Method: BlockMigration\n
"},{"location":"compute/live_migration/#migration-status","title":"Migration Status","text":"The migration progress status is reported in the VMI under VMI.status
. Most importantly, it indicates whether the migration has been Completed
or if it Failed
.
Below is an example of a successful migration.
Migration State:\n Completed: true\n End Timestamp: 2019-03-29T03:37:52Z\n Migration Config:\n Completion Timeout Per GiB: 800\n Progress Timeout: 150\n Migration UID: c64d4898-51d3-11e9-b370-525500d15501\n Source Node: node02\n Start Timestamp: 2019-03-29T04:02:47Z\n Target Direct Migration Node Ports:\n 35001: 0\n 41068: 49152\n 38284: 49153\n Target Node: node01\n Target Node Address: 10.128.0.46\n Target Node Domain Detected: true\n Target Pod: virt-launcher-testvmimcbjgw6zrzcmp8wpddvztvzm7x2k6cjbdgktwv8tkq\n
"},{"location":"compute/live_migration/#canceling-a-live-migration","title":"Canceling a live migration","text":"Live migration can also be canceled by simply deleting the migration object. A successfully aborted migration will indicate that the abort has been requested Abort Requested
, and that it succeeded: Abort Status: Succeeded
. The migration in this case will be Completed
and Failed
.
Migration State:\n Abort Requested: true\n Abort Status: Succeeded\n Completed: true\n End Timestamp: 2019-03-29T04:02:49Z\n Failed: true\n Migration Config:\n Completion Timeout Per GiB: 800\n Progress Timeout: 150\n Migration UID: 57a693d6-51d7-11e9-b370-525500d15501\n Source Node: node02\n Start Timestamp: 2019-03-29T04:02:47Z\n Target Direct Migration Node Ports:\n 39445: 0\n 43345: 49152\n 44222: 49153\n Target Node: node01\n Target Node Address: 10.128.0.46\n Target Node Domain Detected: true\n Target Pod: virt-launcher-testvmimcbjgw6zrzcmp8wpddvztvzm7x2k6cjbdgktwv8tkq\n
"},{"location":"compute/live_migration/#using-virtctl-to-cancel-a-live-migration","title":"Using virtctl to cancel a live migration","text":"Live migration can also be canceled using virtctl, by specifying the name of a VMI which is currently being migrated
virtctl migrate-cancel vmi-fedora\n
"},{"location":"compute/live_migration/#changing-cluster-wide-migration-limits","title":"Changing Cluster Wide Migration Limits","text":"KubeVirt puts some limits in place, so that migrations don't overwhelm the cluster. By default, it is configured to only run 5
migrations in parallel with an additional limit of a maximum of 2
outbound migrations per node. Finally, every migration is limited to a bandwidth of 64MiB/s
.
These values can be changed in the kubevirt
CR:
apiVersion: kubevirt.io/v1\nkind: KubeVirt\nmetadata:\n name: kubevirt\n namespace: kubevirt\nspec:\n configuration:\n migrations:\n parallelMigrationsPerCluster: 5\n parallelOutboundMigrationsPerNode: 2\n bandwidthPerMigration: 64Mi\n completionTimeoutPerGiB: 800\n progressTimeout: 150\n disableTLS: false\n nodeDrainTaintKey: \"kubevirt.io/drain\"\n allowAutoConverge: false\n allowPostCopy: false\n unsafeMigrationOverride: false\n
Bear in mind that most of these settings can be overridden and fine-tuned for a specific group of VMs. For more information, please see Migration Policies.
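To get a feel for what the default bandwidth limit implies, the Python sketch below estimates a lower bound on the transfer time of a single migration. It assumes the guest memory is copied exactly once (no pages are dirtied again during the copy) and that the full 64MiB/s is available, so real migrations will usually take longer. The function name min_transfer_seconds is hypothetical.

```python
def min_transfer_seconds(memory_gib: float, bandwidth_mib_per_s: float = 64) -> float:
    '''Lower bound on the time to copy guest memory once at the given bandwidth.'''
    memory_mib = memory_gib * 1024  # 1 GiB = 1024 MiB
    return memory_mib / bandwidth_mib_per_s


# A guest with 8 GiB of memory at the default per-migration bandwidth.
print(min_transfer_seconds(8))
```

Under these assumptions, an 8 GiB guest needs at least 128 seconds; raising bandwidthPerMigration shortens this bound proportionally.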
"},{"location":"compute/live_migration/#understanding-different-migration-strategies","title":"Understanding different migration strategies","text":"Live migration is a complex process. During a migration, the source VM needs to transfer its whole state (mainly RAM) to the target VM. If there are enough resources available, such as network bandwidth and CPU power, migrations should converge nicely. If this is not the scenario, however, the migration might get stuck without an ability to progress.
The main factor that affects migrations from the guest perspective is its dirty rate
, which is the rate at which the VM dirties memory. Guests with a high dirty rate lead to a race during migration: on the one hand, memory is transferred continuously to the target, and on the other, the same memory is dirtied again by the guest. In such scenarios, consider using more advanced migration strategies.
Let's explain the 3 supported migration strategies as of today.
"},{"location":"compute/live_migration/#pre-copy","title":"Pre-copy","text":"Pre-copy is the default strategy. It should be used for most cases.
It works as follows:
Pre-copy is the safest and fastest strategy for most cases. Furthermore, it can be easily cancelled, can utilize multithreading, and more. If there is no real reason to use another strategy, this is definitely the strategy to go with.
However, in some cases migrations might not converge easily: by the time a chunk of the source VM's state is received by the target VM, it has already been modified on the source VM (the VM the guest executes on). There are many reasons for a migration to fail to converge, such as a high dirty rate or scarce resources like network bandwidth and CPU. In such scenarios, consider the alternative strategies below.
"},{"location":"compute/live_migration/#post-copy","title":"Post-copy","text":"The way post-copy migrations work is as follows:
The main idea here is that the guest starts to run immediately on the target VM. This approach has advantages and disadvantages:
advantages:
disadvantages:
Auto-converge is a technique to help pre-copy migrations converge faster without changing the core algorithm of how the migration works.
Since a high dirty rate is usually the most significant reason for a migration not to converge, auto-converge simply throttles the guest's CPU. If the migration converges fast enough, the guest's CPU is not throttled, or only negligibly. If it does not, the CPU is throttled more and more as time goes on.
This technique dramatically increases the probability of the migration converging eventually.
"},{"location":"compute/live_migration/#using-a-different-network-for-migrations","title":"Using a different network for migrations","text":"Live migrations can be configured to happen on a different network than the one Kubernetes is configured to use. That potentially allows for more determinism, control and/or bandwidth, depending on use-cases.
"},{"location":"compute/live_migration/#creating-a-migration-network-on-a-cluster","title":"Creating a migration network on a cluster","text":"A separate physical network is required, meaning that every node on the cluster has to have at least 2 NICs, and the NICs that will be used for migrations need to be interconnected, i.e. all plugged to the same switch. The examples below assume that eth1
will be used for migrations.
It is also required for the Kubernetes cluster to have multus installed.
If the desired network doesn't include a DHCP server, then whereabouts will be needed as well.
Finally, a NetworkAttachmentDefinition needs to be created in the namespace where KubeVirt is installed. Here is an example:
apiVersion: \"k8s.cni.cncf.io/v1\"\nkind: NetworkAttachmentDefinition\nmetadata:\n  name: migration-network\n  namespace: kubevirt\nspec:\n  config: '{\n      \"cniVersion\": \"0.3.1\",\n      \"name\": \"migration-bridge\",\n      \"type\": \"macvlan\",\n      \"master\": \"eth1\",\n      \"mode\": \"bridge\",\n      \"ipam\": {\n        \"type\": \"whereabouts\",\n        \"range\": \"10.1.1.0/24\"\n      }\n    }'\n
"},{"location":"compute/live_migration/#configuring-kubevirt-to-migrate-vmis-over-that-network","title":"Configuring KubeVirt to migrate VMIs over that network","text":"This is just a matter of adding the name of the NetworkAttachmentDefinition to the KubeVirt CR, like so:
apiVersion: kubevirt.io/v1\nkind: KubeVirt\nmetadata:\n  name: kubevirt\n  namespace: kubevirt\nspec:\n  configuration:\n    developerConfiguration:\n      featureGates:\n        - LiveMigration\n    migrations:\n      network: migration-network\n
That change will trigger a restart of the virt-handler pods, as they get connected to that new network.
From now on, migrations will happen over that network.
"},{"location":"compute/live_migration/#configuring-kubevirtci-for-testing-migration-networks","title":"Configuring KubeVirtCI for testing migration networks","text":"Developers and people wanting to test the feature before deploying it on a real cluster might want to configure a dedicated migration network in KubeVirtCI.
KubeVirtCI can simply be configured to include a virtual secondary network, as well as automatically install multus and whereabouts. The following environment variables just have to be declared before running make cluster-up
:
export KUBEVIRT_NUM_NODES=2;\nexport KUBEVIRT_NUM_SECONDARY_NICS=1;\nexport KUBEVIRT_DEPLOY_ISTIO=true;\nexport KUBEVIRT_WITH_CNAO=true\n
"},{"location":"compute/live_migration/#migration-timeouts","title":"Migration timeouts","text":"Depending on the type, the live migration process will copy virtual machine memory pages and disk blocks to the destination. During this process non-locked pages and blocks are being copied and become free for the instance to use again. To achieve a successful migration, it is assumed that the instance will write to the free pages and blocks (pollute the pages) at a lower rate than these are being copied.
"},{"location":"compute/live_migration/#completion-time","title":"Completion time","text":"In some cases the virtual machine can write to different memory pages / disk blocks at a higher rate than these can be copied, which will prevent the migration process from completing in a reasonable amount of time. In this case, live migration will be aborted if it runs for too long. The timeout is calculated based on the size of the VMI: its memory and the ephemeral disks that need to be copied. The configurable parameter completionTimeoutPerGiB
, which defaults to 800s, is the time to wait per GiB of data for the migration to complete before aborting it. A VMI with 8GiB of memory will time out after 6400 seconds.
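As a rough illustration of the arithmetic above, the following Python sketch multiplies the per-GiB timeout by the amount of data to copy. The helper name and the choice to pass the size as a whole number of GiB are assumptions made for this example; how KubeVirt accounts for ephemeral disks or rounds sizes internally is not covered here.

```python
# Hypothetical helper, not part of KubeVirt: estimates the migration
# completion timeout from the documented per-GiB setting.
DEFAULT_COMPLETION_TIMEOUT_PER_GIB = 800  # seconds, documented default


def completion_timeout_seconds(size_gib, per_gib=DEFAULT_COMPLETION_TIMEOUT_PER_GIB):
    '''Return the number of seconds after which the migration is aborted.'''
    if size_gib <= 0:
        raise ValueError('size_gib must be positive')
    return size_gib * per_gib


# A VMI with 8GiB of memory and the default setting: 8 * 800 seconds.
print(completion_timeout_seconds(8))
```

With the default of 800 seconds per GiB this reproduces the 6400 seconds quoted for an 8GiB VMI; lowering completionTimeoutPerGiB shortens the timeout proportionally.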
Live migration is also aborted when copying memory makes no progress. The time to wait for live migration to make progress in transferring data is configurable via the progressTimeout
parameter, which defaults to 150s.
FEATURE STATE: KubeVirt v0.43
Sometimes it may be desirable to disable TLS encryption of migrations to improve performance. Use disableTLS
to do that:
apiVersion: kubevirt.io/v1\nkind: KubeVirt\nmetadata:\n  name: kubevirt\n  namespace: kubevirt\nspec:\n  configuration:\n    developerConfiguration:\n      featureGates:\n        - \"LiveMigration\"\n    migrations:\n      disableTLS: true\n
Note: While this increases performance it may allow MITM attacks. Be careful.
"},{"location":"compute/mediated_devices_configuration/","title":"Mediated devices and virtual GPUs","text":""},{"location":"compute/mediated_devices_configuration/#configuring-mediated-devices-and-virtual-gpus","title":"Configuring mediated devices and virtual GPUs","text":"KubeVirt aims to facilitate the configuration of mediated devices on large clusters. Administrators can use the mediatedDevicesConfiguration
API in the KubeVirt CR to create or remove mediated devices in a declarative way, by providing a list of the desired mediated device types that they expect to be configured in the cluster.
You can also include the nodeMediatedDeviceTypes
option to provide a more specific configuration that targets a specific node or a group of nodes directly with a node selector. The nodeMediatedDeviceTypes
option must be used in combination with mediatedDevicesTypes
in order to override the global configuration set in the mediatedDevicesTypes
section.
KubeVirt will use the provided configuration to automatically create the relevant mdev/vGPU devices on nodes that can support it.
Currently, a single mdev type per card will be configured. The maximum number of instances of the selected mdev type will be configured per card.
Note: Some vendors, such as NVIDIA, require a driver to be installed on the nodes to provide mediated devices, including vGPUs.
Example snippet of a KubeVirt CR configuration that includes both nodeMediatedDeviceTypes
and mediatedDevicesTypes
:
spec:\n  configuration:\n    mediatedDevicesConfiguration:\n      mediatedDevicesTypes:\n      - nvidia-222\n      - nvidia-228\n      nodeMediatedDeviceTypes:\n      - nodeSelector:\n          kubernetes.io/hostname: nodeName\n        mediatedDevicesTypes:\n        - nvidia-234\n
"},{"location":"compute/mediated_devices_configuration/#configuration-scenarios","title":"Configuration scenarios","text":""},{"location":"compute/mediated_devices_configuration/#example-large-cluster-with-multiple-cards-on-each-node","title":"Example: Large cluster with multiple cards on each node","text":"On nodes with multiple cards that can support similar vGPU types, the relevant desired types will be created in a round-robin manner.
For example, considering the following KubeVirt CR configuration:
spec:\n  configuration:\n    mediatedDevicesConfiguration:\n      mediatedDevicesTypes:\n      - nvidia-222\n      - nvidia-228\n      - nvidia-105\n      - nvidia-108\n
This cluster has nodes with two different PCIe cards:
Nodes with 3 Tesla T4 cards, where each card can support multiple device types:
Nodes with 2 Tesla V100 cards, where each card can support multiple device types:
KubeVirt will then create the following devices:
When nodes only have a single card, the first supported type from the list will be configured.
For example, consider the following list of desired types, where nvidia-223 and nvidia-224 are supported:
spec:\n  configuration:\n    mediatedDevicesConfiguration:\n      mediatedDevicesTypes:\n      - nvidia-223\n      - nvidia-224\n
In this case, nvidia-223 will be configured on the node because it is the first supported type in the list."},{"location":"compute/mediated_devices_configuration/#overriding-configuration-on-a-specifc-node","title":"Overriding configuration on a specific node","text":"To override the global configuration set by mediatedDevicesTypes
, include the nodeMediatedDeviceTypes
option, specifying the node selector and the mediatedDevicesTypes
that you want to override for that node.
In this example, the KubeVirt CR includes the nodeMediatedDeviceTypes
option to override the global configuration specifically for node 2, which will only use the nvidia-234 type.
spec:\n  configuration:\n    mediatedDevicesConfiguration:\n      mediatedDevicesTypes:\n      - nvidia-230\n      - nvidia-223\n      - nvidia-224\n      nodeMediatedDeviceTypes:\n      - nodeSelector:\n          kubernetes.io/hostname: node2\n        mediatedDevicesTypes:\n        - nvidia-234\n
The cluster has two nodes that both have 3 Tesla T4 cards.
KubeVirt will then create the following devices:
Node 1 has been configured in a round-robin manner based on the global configuration but node 2 only uses the nvidia-234 that was specified for it.
"},{"location":"compute/mediated_devices_configuration/#updating-and-removing-vgpu-types","title":"Updating and Removing vGPU types","text":"Changes made to the mediatedDevicesTypes
section of the KubeVirt CR will trigger a re-evaluation of the configured mdevs/vGPU types on the cluster nodes.
Any change to the node labels that match the nodeMediatedDeviceTypes
nodeSelector in the KubeVirt CR will trigger a similar re-evaluation.
Consequently, mediated devices will be reconfigured or entirely removed based on the updated configuration.
"},{"location":"compute/mediated_devices_configuration/#assigning-vgpumdev-to-a-virtual-machine","title":"Assigning vGPU/MDEV to a Virtual Machine","text":"See the Host Devices Assignment to learn how to consume the newly created mediated devices/vGPUs.
"},{"location":"compute/memory_dump/","title":"Virtual machine memory dump","text":"KubeVirt supports getting a VM memory dump for analysis purposes. The memory dump can be used to diagnose, identify and resolve issues in the VM, typically by providing information about the last state of the programs, applications and system before they were terminated or crashed.
Note This memory dump is not used for saving VM state and resuming it later.
"},{"location":"compute/memory_dump/#prerequisites","title":"Prerequisites","text":""},{"location":"compute/memory_dump/#hot-plug-feature-gate","title":"Hot plug Feature Gate","text":"The memory dump process mounts a PVC to the virt-launcher in order to get the output in that PVC, hence the hot plug volumes feature gate must be enabled. The feature gates field in the KubeVirt CR must be expanded by adding the HotplugVolumes
to it.
Now let's assume we have a running VM named 'my-vm'. We can either dump to an existing PVC or request one to be created.
"},{"location":"compute/memory_dump/#existing-pvc","title":"Existing PVC","text":"The size of the PVC must be big enough to hold the memory dump. The calculation is (VMMemorySize + 100Mi) * FileSystemOverhead, where VMMemorySize
is the memory size, 100Mi is reserved space for the memory dump overhead and FileSystemOverhead
is the value used to adjust the requested PVC size for the filesystem overhead. The PVC must also have a Filesystem
volume mode.
Example for such PVC:
apiVersion: v1\nkind: PersistentVolumeClaim\nmetadata:\n  name: my-pvc\nspec:\n  accessModes:\n    - ReadWriteOnce\n  resources:\n    requests:\n      storage: 2Gi\n  storageClassName: rook-ceph-block\n  volumeMode: Filesystem\n
We can get a memory dump of the VM to the PVC by using the 'memory-dump get' command available with virtctl:
$ virtctl memory-dump get my-vm --claim-name=my-pvc\n
"},{"location":"compute/memory_dump/#on-demand-pvc","title":"On demand PVC","text":"For an on-demand PVC, we need to add the --create-claim
flag to the virtctl request:
$ virtctl memory-dump get my-vm --claim-name=new-pvc --create-claim\n
A PVC large enough for the dump will be created. We can also request a specific storage class and access mode with the appropriate flags.
"},{"location":"compute/memory_dump/#download-memory-dump","title":"Download memory dump","text":"By adding the --output
flag, the memory will be dumped to the PVC and then downloaded to the given output path.
$ virtctl memory-dump get my-vm --claim-name=memoryvolume --create-claim --output=memoryDump.dump.gz\n
To download the last memory dump from the PVC associated with the VM without triggering another memory dump, use the memory dump download command.
$ virtctl memory-dump download my-vm --output=memoryDump.dump.gz\n
To download a memory dump from a PVC that is already disassociated from the VM, you can use the virtctl vmexport command.
"},{"location":"compute/memory_dump/#monitoring-the-memory-dump","title":"Monitoring the memory dump","text":"Information regarding the memory dump process will be available on the VM's status section
memoryDumpRequest:\n  claimName: memory-dump\n  phase: Completed\n  startTimestamp: \"2022-03-29T11:00:04Z\"\n  endTimestamp: \"2022-03-29T11:00:09Z\"\n  fileName: my-vm-my-pvc-20220329-110004\n
During the process, the volumeStatus on the VMI is updated with process information such as the attachment pod information and messages. If all goes well, once the process is completed the PVC is unmounted from the virt-launcher pod and the volumeStatus is deleted. A memory dump annotation with the memory dump file name is added to the PVC.
"},{"location":"compute/memory_dump/#retriggering-the-memory-dump","title":"Retriggering the memory dump","text":"Getting a new memory dump to the same PVC is possible without the need to use any flag:
$ virtctl memory-dump get my-vm\n
Note Each memory-dump command will delete the previous dump in that PVC.
To get a memory dump to a different PVC, you need to 'remove' the current memory-dump PVC and then do a new get with the new PVC name.
"},{"location":"compute/memory_dump/#remove-memory-dump","title":"Remove memory dump","text":"As mentioned, to remove the associated memory dump PVC you need to run the 'memory-dump remove' command. This allows you to replace the current PVC and get the memory dump to a new one.
$ virtctl memory-dump remove my-vm\n
"},{"location":"compute/memory_dump/#handle-the-memory-dump","title":"Handle the memory dump","text":"Once the memory dump process is completed the PVC will hold the output. You can manage the dump in one of the following ways: - Download the memory dump - Create a pod with troubleshooting tools that mounts the PVC, and inspect the dump within the pod. - Include the memory dump in a VM Snapshot (which will include both the memory dump and the disks) to save a snapshot of the VM at that point in time and inspect it when needed. (The VM Snapshot can be exported and downloaded.)
The output of the memory dump can be inspected with memory analysis tools, for example Volatility3.
"},{"location":"compute/memory_hotplug/","title":"Memory Hotplug","text":"Memory hotplug was introduced in KubeVirt version 1.1, enabling the dynamic resizing of the amount of memory available to a running VM.
"},{"location":"compute/memory_hotplug/#limitations","title":"Limitations","text":"To use memory hotplug we need to add the VMLiveUpdateFeatures
feature gate in the KubeVirt CR:
apiVersion: kubevirt.io/v1\nkind: KubeVirt\nspec:\n  configuration:\n    developerConfiguration:\n      featureGates:\n      - VMLiveUpdateFeatures\n
"},{"location":"compute/memory_hotplug/#configure-the-workload-update-strategy","title":"Configure the Workload Update Strategy","text":"Configure LiveMigrate
as workloadUpdateStrategy
in the KubeVirt CR, since the current implementation of the hotplug process requires the VM to live-migrate.
apiVersion: kubevirt.io/v1\nkind: KubeVirt\nspec:\n  workloadUpdateStrategy:\n    workloadUpdateMethods:\n    - LiveMigrate\n
"},{"location":"compute/memory_hotplug/#configure-the-vm-rollout-strategy","title":"Configure the VM rollout strategy","text":"Finally, set the VM rollout strategy to LiveUpdate
, so that the changes made to the VM object propagate to the VMI without a restart. This is also done in the KubeVirt CR configuration:
apiVersion: kubevirt.io/v1\nkind: KubeVirt\nspec:\n  configuration:\n    vmRolloutStrategy: \"LiveUpdate\"\n
NOTE: If memory hotplug is enabled/disabled on an already running VM, a reboot is necessary for the changes to take effect.
More information can be found on the VM Rollout Strategies page.
"},{"location":"compute/memory_hotplug/#optional-set-a-cluster-wide-maximum-amount-of-memory","title":"[OPTIONAL] Set a cluster-wide maximum amount of memory","text":"You can set the maximum amount of memory for the guest using a cluster level setting in the KubeVirt CR.
apiVersion: kubevirt.io/v1\nkind: KubeVirt\nspec:\n  configuration:\n    liveUpdateConfiguration:\n      maxGuest: 8Gi\n
The VM-level configuration will take precedence over the cluster-wide one.
"},{"location":"compute/memory_hotplug/#memory-hotplug-in-action","title":"Memory Hotplug in Action","text":"First we enable the VMLiveUpdateFeatures
feature gate, set the rollout strategy to LiveUpdate
and set LiveMigrate
as workloadUpdateStrategy
in the KubeVirt CR.
$ kubectl --namespace kubevirt patch kv kubevirt -p='[{\"op\": \"add\", \"path\": \"/spec/configuration/developerConfiguration/featureGates\", \"value\": [\"VMLiveUpdateFeatures\"]}]' --type='json'\n$ kubectl --namespace kubevirt patch kv kubevirt -p='[{\"op\": \"add\", \"path\": \"/spec/configuration/vmRolloutStrategy\", \"value\": \"LiveUpdate\"}]' --type='json'\n$ kubectl --namespace kubevirt patch kv kubevirt -p='[{\"op\": \"add\", \"path\": \"/spec/workloadUpdateStrategy/workloadUpdateMethods\", \"value\": [\"LiveMigrate\"]}]' --type='json'\n
Now we create a VM with memory hotplug enabled.
apiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n  name: vm-cirros\nspec:\n  running: true\n  template:\n    spec:\n      domain:\n        memory:\n          maxGuest: 2Gi\n          guest: 128Mi\n        devices:\n          disks:\n          - disk:\n              bus: virtio\n            name: containerdisk\n      volumes:\n      - containerDisk:\n          image: registry:5000/kubevirt/alpine-container-disk-demo:devel\n        name: containerdisk\n
The Virtual Machine will automatically start and once booted it will report the currently available memory to the guest in the status.memory
field inside the VMI.
$ kubectl get vmi vm-cirros -o json | jq .status.memory\n
{\n  \"guestAtBoot\": \"128Mi\",\n  \"guestCurrent\": \"128Mi\",\n  \"guestRequested\": \"128Mi\"\n}\n
Since the Virtual Machine is now running we can patch the VM object to double the available guest memory so that we'll go from 128Mi to 256Mi.
$ kubectl patch vm vm-cirros -p='[{\"op\": \"replace\", \"path\": \"/spec/template/spec/domain/memory/guest\", \"value\": \"256Mi\"}]' --type='json'\n
After the hotplug request is processed and the Virtual Machine is live migrated, the new amount of memory should be available to the guest and visible in the VMI object.
$ kubectl get vmi vm-cirros -o json | jq .status.memory\n
{\n  \"guestAtBoot\": \"128Mi\",\n  \"guestCurrent\": \"256Mi\",\n  \"guestRequested\": \"256Mi\"\n}\n
"},{"location":"compute/node_assignment/","title":"Node assignment","text":"You can constrain the VM to only run on specific nodes or to prefer running on specific nodes:
Setting spec.nodeSelector
requirements constrains the scheduler to only schedule VMs on nodes that carry the specified labels. In the following example the VMI requires nodes labeled cpu: slow
and storage: fast
:
metadata:\n  name: testvmi-ephemeral\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nspec:\n  nodeSelector:\n    cpu: slow\n    storage: fast\n  domain:\n    resources:\n      requests:\n        memory: 64M\n    devices:\n      disks:\n      - name: mypvcdisk\n        lun: {}\n  volumes:\n    - name: mypvcdisk\n      persistentVolumeClaim:\n        claimName: mypvc\n
Thus the scheduler will only schedule the VMI to nodes which carry these labels in their metadata. It works exactly like the Pod nodeSelector
. See the Pod nodeSelector Documentation for more examples.
The spec.affinity
field allows specifying hard- and soft-affinity for VMs. It is possible to write matching rules against workloads (VMs and Pods) and Nodes. Since VMs are a workload type based on Pods, Pod-affinity affects VMs as well.
An example for podAffinity
and podAntiAffinity
may look like this:
metadata:\n  name: testvmi-ephemeral\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nspec:\n  nodeSelector:\n    cpu: slow\n    storage: fast\n  domain:\n    resources:\n      requests:\n        memory: 64M\n    devices:\n      disks:\n      - name: mypvcdisk\n        lun: {}\n  affinity:\n    podAffinity:\n      requiredDuringSchedulingIgnoredDuringExecution:\n      - labelSelector:\n          matchExpressions:\n          - key: security\n            operator: In\n            values:\n            - S1\n        topologyKey: failure-domain.beta.kubernetes.io/zone\n    podAntiAffinity:\n      preferredDuringSchedulingIgnoredDuringExecution:\n      - weight: 100\n        podAffinityTerm:\n          labelSelector:\n            matchExpressions:\n            - key: security\n              operator: In\n              values:\n              - S2\n          topologyKey: kubernetes.io/hostname\n  volumes:\n    - name: mypvcdisk\n      persistentVolumeClaim:\n        claimName: mypvc\n
Affinity and anti-affinity work exactly like the Pod affinity
. This includes podAffinity
, podAntiAffinity
and nodeAffinity
. Node anti-affinity has no dedicated field; it is expressed through nodeAffinity with operators such as NotIn or DoesNotExist. See the Pod affinity and anti-affinity Documentation for more examples and details.
Affinity, as described above, is a property of VMs that attracts them to a set of nodes (either as a preference or a hard requirement). Taints are the opposite - they allow a node to repel a set of VMs.
Taints and tolerations work together to ensure that VMs are not scheduled onto inappropriate nodes. One or more taints are applied to a node; this marks that the node should not accept any VMs that do not tolerate the taints. Tolerations are applied to VMs, and allow (but do not require) the VMs to schedule onto nodes with matching taints.
You add a taint to a node using kubectl taint. For example,
kubectl taint nodes node1 key=value:NoSchedule\n
An example for tolerations
may look like this:
metadata:\n  name: testvmi-ephemeral\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nspec:\n  nodeSelector:\n    cpu: slow\n    storage: fast\n  domain:\n    resources:\n      requests:\n        memory: 64M\n    devices:\n      disks:\n      - name: mypvcdisk\n        lun: {}\n  tolerations:\n  - key: \"key\"\n    operator: \"Equal\"\n    value: \"value\"\n    effect: \"NoSchedule\"\n
"},{"location":"compute/node_assignment/#node-balancing-with-descheduler","title":"Node balancing with Descheduler","text":"In some cases we might need to rebalance the cluster based on the current scheduling policy and load conditions. The Descheduler can find pods that, for example, violate scheduling decisions and evict them based on descheduler policies. KubeVirt VMs are handled as pods with local storage, so by default the descheduler will not evict them. This can be easily overridden by adding a special annotation to the VMI template in the VM:
spec:\n  template:\n    metadata:\n      annotations:\n        descheduler.alpha.kubernetes.io/evict: \"true\"\n
This annotation allows the descheduler to evict the VM's pod, which can then be rescheduled by the scheduler on a different node. A VirtualMachine will never restart or re-create a VirtualMachineInstance until the current instance of the VirtualMachineInstance is deleted from the cluster.
"},{"location":"compute/node_overcommit/","title":"Node overcommit","text":"KubeVirt does not yet support classical Memory Overcommit Management or Memory Ballooning. In other words VirtualMachineInstances can't give back memory they have allocated. However, a few other things can be tweaked to reduce the memory footprint and overcommit the per-VMI memory overhead.
"},{"location":"compute/node_overcommit/#remove-the-graphical-devices","title":"Remove the Graphical Devices","text":"First, the safest option to reduce the memory footprint is removing the graphical device from the VMI by setting spec.domain.devices.autoattachGraphicsDevice
to false
. See the video and graphics device documentation for further details and examples.
This will save a constant amount of 16MB
per VirtualMachineInstance but also disable VNC access.
Before you continue, make sure you are familiar with the Out of Resource Management of Kubernetes.
Every VirtualMachineInstance requests slightly more memory from Kubernetes than what was requested by the user for the Operating System. The additional memory is used for the per-VMI overhead consisting of our infrastructure which is wrapping the actual VirtualMachineInstance process.
In order to increase the VMI density on the node, it is possible to not request the additional overhead by setting spec.domain.resources.overcommitGuestOverhead
to true
:
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n  name: testvmi-nocloud\nspec:\n  terminationGracePeriodSeconds: 30\n  domain:\n    resources:\n      overcommitGuestOverhead: true\n      requests:\n        memory: 1024M\n[...]\n
This will work fine as long as most of the VirtualMachineInstances do not use all of their memory. That is especially the case if you have short-lived VMIs. But if you have long-lived VirtualMachineInstances or run extremely memory-intensive tasks inside the VirtualMachineInstance, your VMIs will sooner or later use all the memory they are granted.
"},{"location":"compute/node_overcommit/#overcommit-guest-memory","title":"Overcommit Guest Memory","text":"The third option is real memory overcommit on the VMI. In this scenario the VMI is explicitly told that it has more memory available than what is requested from the cluster by setting spec.domain.memory.guest
to a value higher than spec.domain.resources.requests.memory
.
The following definition requests 1024MB
from the cluster but tells the VMI that it has 2048MB
of memory available:
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n  name: testvmi-nocloud\nspec:\n  terminationGracePeriodSeconds: 30\n  domain:\n    resources:\n      overcommitGuestOverhead: true\n      requests:\n        memory: 1024M\n    memory:\n      guest: 2048M\n[...]\n
For as long as there is enough free memory available on the node, the VMI can happily consume up to 2048MB
. This VMI will get the Burstable
resource class assigned by Kubernetes (see QoS classes in Kubernetes for more details). The same eviction rules as for Pods apply to the VMI if the node comes under memory pressure.
Implicit memory overcommit is disabled by default. This means that when memory request is not specified, it is set to match spec.domain.memory.guest
. However, it can be enabled using spec.configuration.developerConfiguration.memoryOvercommit
in the kubevirt
CR. For example, by setting memoryOvercommit: \"150\"
we define that when memory request is not explicitly set, it will be implicitly set to achieve memory overcommit of 150%. For instance, when spec.domain.memory.guest: 3072M
, memory request is set to 2048M, if omitted. Note that the actual memory request depends on additional configuration options like OvercommitGuestOverhead.
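The relationship between guest memory, the overcommit percentage and the implicit request can be sketched in Python as follows. The helper name, the use of plain megabyte integers and the rounding down are assumptions made for illustration; the sketch also ignores the additional options mentioned above, such as the guest overhead.

```python
# Hypothetical helper, not part of KubeVirt: derives the implicit memory
# request from the guest memory and the memoryOvercommit percentage.
def implicit_memory_request(guest_megabytes, memory_overcommit_percent=100):
    '''Return the memory request (in M) used when none is set explicitly.'''
    if memory_overcommit_percent <= 0:
        raise ValueError('memory_overcommit_percent must be positive')
    # An overcommit of 150 means the guest sees 150% of what is requested.
    return guest_megabytes * 100 // memory_overcommit_percent


# Guest memory of 3072M with memoryOvercommit set to 150.
print(implicit_memory_request(3072, 150))
# Without overcommit the request simply matches the guest memory.
print(implicit_memory_request(3072))
```

For the values used in the text (3072M guest memory and an overcommit of 150) this yields the 2048M request mentioned above.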
If the node gets under memory pressure, depending on the kubelet
configuration the virtual machines may get killed by the OOM handler or by the kubelet
itself. It is possible to tweak that behaviour based on the requirements of your VirtualMachineInstances by:
--system-reserved
and --kube-reserved
Note: Soft Eviction will effectively shut down VirtualMachineInstances. They are not paused, hibernated or migrated. Further, Soft Eviction is disabled by default.
If configured, VirtualMachineInstances get evicted once the available memory falls below the threshold specified via --eviction-soft
and the VirtualMachineInstance is given the chance to shut down within a timespan specified via --eviction-max-pod-grace-period
. The flag --eviction-soft-grace-period
specifies for how long a soft eviction condition must be held before soft evictions are triggered.
If set properly according to the demands of the VMIs, overcommitting should only lead to soft evictions in rare cases for some VMIs. They may even get re-scheduled to the same node with less initial memory demand. For some workload types, this can be perfectly fine and lead to better overall memory-utilization.
"},{"location":"compute/node_overcommit/#configuring-hard-eviction-thresholds","title":"Configuring Hard Eviction Thresholds","text":"Note: If unspecified, the kubelet will do hard evictions for Pods once memory.available
falls below 100Mi
.
Limits set via --eviction-hard
will lead to immediate eviction of VirtualMachineInstances or Pods. This stops VMIs without a grace period and is comparable with power-loss on a real computer.
If the hard limit is hit, VMIs may from time to time simply be killed. They may be re-scheduled to the same node immediately again, since they start with less memory consumption again. This can be a simple option, if the memory threshold is only very seldom hit and the work performed by the VMIs is reproducible or it can be resumed from some checkpoints.
"},{"location":"compute/node_overcommit/#requesting-the-right-qos-class-for-virtualmachineinstances","title":"Requesting the right QoS Class for VirtualMachineInstances","text":"Different QoS classes get assigned to Pods and VirtualMachineInstances based on the requests.memory
and limits.memory
. KubeVirt right now supports the QoS classes Burstable
and Guaranteed
. Burstable
VMIs are evicted before Guaranteed
VMIs.
This allows creating two classes of VMIs:
requests.memory
and limits.memory
set and therefore gets the Guaranteed
class assigned. This one will not get evicted and should never run into memory issues, but is more demanding.limits.memory
or a limits.memory
which is greater than requests.memory
and therefore gets the Burstable
class assigned. These VMIs will be evicted first.--system-reserved
and --kube-reserved
","text":"It may be important to reserve some memory for other daemons (not DaemonSets) which are running on the same node (ssh, dhcp servers, etc). The reservation can be done with the --system-reserved
switch. Additionally, for the kubelet and Docker, a dedicated flag called --kube-reserved
exists.
The KSM (Kernel same-page merging) daemon can be started on the node. Depending on its tuning parameters it can more or less aggressively try to merge identical pages between applications and VirtualMachineInstances. The more aggressively it is configured, the more CPU it will use itself, so the memory overcommit advantages come with a slight CPU performance hit.
Config file tuning allows changes to scanning frequency (how often will KSM activate) and aggressiveness (how many pages per second will it scan).
"},{"location":"compute/node_overcommit/#enabling-swap","title":"Enabling Swap","text":"Note: This will definitely make sure that your VirtualMachines can't crash or get evicted from the node but it comes with the cost of pretty unpredictable performance once the node runs out of memory and the kubelet may not detect that it should evict Pods to increase the performance again.
Enabling swap is in general not recommended on Kubernetes right now. However, it can be useful in combination with KSM, since KSM merges identical pages over time. Swap allows the VMIs to successfully allocate memory which will then effectively never be used because of the later de-duplication done by KSM.
"},{"location":"compute/node_overcommit/#node-cpu-allocation-ratio","title":"Node CPU allocation ratio","text":"KubeVirt runs Virtual Machines in a Kubernetes Pod. This pod requests a certain amount of CPU time from the host. The Virtual Machine, on the other hand, is created with a certain number of vCPUs. The number of vCPUs does not necessarily correlate with the number of CPUs requested by the pod. Depending on the QoS of the pod, vCPUs can be scheduled on a variable number of physical CPUs; this depends on the available CPU resources on a node. When there are fewer available CPUs on the node than requested vCPUs, the vCPUs will be overcommitted.
By default, each pod requests 100m of CPU time. The CPU requested on the pod sets the cgroups cpu.shares, which serves as a priority for the scheduler to provide CPU time for vCPUs in this pod. As the number of vCPUs increases, the amount of CPU time each vCPU may get decreases when competing with other processes on the node or with other Virtual Machine Instances that have fewer vCPUs.
The cpuAllocationRatio normalizes the amount of CPU time the pod will request based on the number of vCPUs. For example, pod CPU request = number of vCPUs * 1/cpuAllocationRatio. When cpuAllocationRatio is set to 1, a full amount of vCPUs will be requested for the pod.
Note: In Kubernetes, one full core is 1000m of CPU time.
Administrators can change this ratio by updating the KubeVirt CR
...\nspec:\n configuration:\n developerConfiguration:\n cpuAllocationRatio: 10\n
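As a rough illustration of the formula above, the following Python sketch computes the pod CPU request in millicores. The function name pod_cpu_request_millicores is hypothetical and not part of KubeVirt, and rounding down is an assumption made here for simplicity.

```python
def pod_cpu_request_millicores(vcpus, cpu_allocation_ratio):
    # Hypothetical helper, not a KubeVirt API. It applies
    # pod CPU request = number of vCPUs * 1/cpuAllocationRatio,
    # expressed in millicores (1000m is one full core).
    if vcpus < 1 or cpu_allocation_ratio < 1:
        raise ValueError('vcpus and cpu_allocation_ratio must be >= 1')
    # Integer division rounds down; this is an assumption of the sketch.
    return (vcpus * 1000) // cpu_allocation_ratio


if __name__ == '__main__':
    for vcpus, ratio in [(1, 10), (4, 10), (4, 1)]:
        print(vcpus, ratio, pod_cpu_request_millicores(vcpus, ratio))
```

With a ratio of 10, a single vCPU maps to a request of 100m, while a ratio of 1 requests one full core per vCPU.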
"},{"location":"compute/numa/","title":"NUMA","text":"FEATURE STATE: KubeVirt v0.43
NUMA support in KubeVirt is at this stage limited to a small set of special use-cases and will improve over time together with improvements made to Kubernetes.
In general, the goal is to map the host NUMA topology as efficiently as possible to the Virtual Machine topology to improve the performance.
The following NUMA mapping strategies can be used:
In order to use current NUMA support, the following preconditions must be met:
The NUMA feature gate must be enabled.
GuestMappingPassthrough will pass through the node NUMA topology to the guest. The topology is based on the dedicated CPUs which the VMI got assigned from the kubelet via the CPU Manager. It can be requested by setting spec.domain.cpu.numa.guestMappingPassthrough on the VMI.
Since KubeVirt does not know upfront which exclusive CPUs the VMI will get from the kubelet, there are some limitations:
While this NUMA modelling strategy has its limitations, aligning the guest's NUMA architecture with the node's can be critical for high-performance applications.
An example VMI may look like this:
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n name: numavm\nspec:\n domain:\n cpu:\n cores: 4\n dedicatedCpuPlacement: true\n numa:\n guestMappingPassthrough: { }\n devices:\n disks:\n - disk:\n bus: virtio\n name: containerdisk\n - disk:\n bus: virtio\n name: cloudinitdisk\n resources:\n requests:\n memory: 64Mi\n memory:\n hugepages:\n pageSize: 2Mi\n volumes:\n - containerDisk:\n image: quay.io/kubevirt/cirros-container-disk-demo\n name: containerdisk\n - cloudInitNoCloud:\n userData: |\n #!/bin/sh\n echo 'printed from cloud-init userdata'\n name: cloudinitdisk\n
"},{"location":"compute/numa/#running-real-time-workloads","title":"Running real-time workloads","text":""},{"location":"compute/numa/#overview","title":"Overview","text":"It is possible to deploy Virtual Machines that run a real-time kernel and make use of libvirtd's guest cpu and memory optimizations that improve the overall latency. These changes leverage mostly on already available settings in KubeVirt, as we will see shortly, but the VMI manifest now exposes two new settings that instruct KubeVirt to configure the generated libvirt XML with the recommended tuning settings for running real-time workloads.
To make use of the optimized settings, two new settings have been added to the VMI schema:
spec.domain.cpu.realtime: When defined, it instructs KubeVirt to configure the Linux scheduler for the VCPUs to run processes with the FIFO scheduling policy (SCHED_FIFO) and priority 1. This setting guarantees that all processes running in the host will be executed with real-time priority.
spec.domain.cpu.realtime.mask: It defines which VCPUs assigned to the VM are used for real-time. If not defined, libvirt will configure all assigned VCPUs to run processes with FIFO scheduling and in the highest priority (1).
A prerequisite for running real-time workloads is locking resources in the cluster to allow the real-time VM exclusive usage. This translates into one or more nodes that have been configured with a dedicated set of CPUs and that also provide support for NUMA with a free number of hugepages of 2Mi or 1Gi size (depending on the configuration in the VMI). Additionally, the node must be configured to allow the scheduler to run processes with real-time policy.
"},{"location":"compute/numa/#nodes-capable-of-running-real-time-workloads","title":"Nodes capable of running real-time workloads","text":"When the KubeVirt pods are deployed on a node, they check whether it is capable of running processes with a real-time scheduling policy and, if so, label the node as real-time capable (kubevirt.io/realtime). If, on the other hand, the node is not able to deliver such capability, the label is not applied. To check which nodes are able to host real-time VM workloads, run this command:
$>kubectl get nodes -l kubevirt.io/realtime\nNAME STATUS ROLES AGE VERSION\nworker-0-0 Ready worker 12d v1.20.0+df9c838\n
Internally, the KubeVirt pod running on each node checks whether the kernel setting kernel.sched_rt_runtime_us equals -1, which allows processes to run with a real-time scheduling policy for an unlimited amount of time.
Here is an example of a VM manifest that runs a custom Fedora container disk configured to run with a real-time kernel. The settings have been configured for optimal efficiency.
---\napiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n labels:\n kubevirt.io/vm: fedora-realtime\n name: fedora-realtime\nspec:\n running: true\n template:\n metadata:\n labels:\n kubevirt.io/vm: fedora-realtime\n spec:\n domain:\n devices:\n autoattachSerialConsole: true\n autoattachMemBalloon: false\n autoattachGraphicsDevice: false\n disks:\n - disk:\n bus: virtio\n name: containerdisk \n - disk:\n bus: virtio\n name: cloudinitdisk\n machine:\n type: \"\"\n resources:\n requests:\n memory: 1Gi\n cpu: 2\n limits:\n memory: 1Gi\n cpu: 2\n cpu:\n model: host-passthrough\n dedicatedCpuPlacement: true\n isolateEmulatorThread: true\n ioThreadsPolicy: auto\n features:\n - name: tsc-deadline\n policy: require\n numa:\n guestMappingPassthrough: {}\n realtime: {}\n memory:\n hugepages:\n pageSize: 1Gi\n terminationGracePeriodSeconds: 0\n volumes:\n - containerDisk:\n image: quay.io/kubevirt/fedora-realtime-container-disk:v20211008-22109a3\n name: containerdisk\n - cloudInitNoCloud:\n userData: |-\n #cloud-config\n password: fedora\n chpasswd: { expire: False }\n bootcmd:\n - tuned-adm profile realtime\n name: cloudinitdisk\n
Breaking down the tuned sections, we have the following configuration:
Devices: - Disable the guest's memory balloon capability - Avoid attaching a graphics device, to reduce the number of interrupts to the kernel.
spec:\n domain:\n devices:\n autoattachSerialConsole: true\n autoattachMemBalloon: false\n autoattachGraphicsDevice: false\n
CPU:
- model: host-passthrough to allow the guest to see the host CPU without masking any capability.
- dedicatedCpuPlacement: The VM needs to have dedicated CPUs assigned to it. The Kubernetes CPU Manager takes care of this aspect.
- isolateEmulatorThread: requests an additional CPU to run the emulator on, thus avoiding the use of CPU cycles from the workload CPUs.
- ioThreadsPolicy: set to auto to let the dedicated IO thread run on the same CPU as the emulator thread.
- NUMA: defining guestMappingPassthrough enables NUMA support for this VM.
- realtime: instructs the virt-handler to configure this VM for real-time workloads, such as configuring the VCPUs to use the FIFO scheduler policy and setting the priority to 1.
cpu:\n model: host-passthrough\n dedicatedCpuPlacement: true\n isolateEmulatorThread: true\n ioThreadsPolicy: auto\n features:\n - name: tsc-deadline\n policy: require\n numa:\n guestMappingPassthrough: {}\n realtime: {}\n
Memory - pageSize: allocate the pod's memory in hugepages of the given size, in this case of 1Gi.
memory:\n hugepages:\n pageSize: 1Gi\n
"},{"location":"compute/numa/#how-to-dedicate-vcpus-for-real-time-only","title":"How to dedicate VCPUS for real-time only","text":"It is possible to pass a regular expression of the VCPUs to isolate to use real-time scheduling policy, by using the realtime.mask
setting.
cpu:\n numa:\n guestMappingPassthrough: {}\n realtime:\n mask: \"0\"\n
When this configuration is applied, KubeVirt will only set the first VCPU for real-time scheduler policy, leaving the remaining VCPUs to use the default scheduler policy. Other examples of valid masks are:
- 0-3: Use cores 0 to 3 for real-time scheduling, assuming that the VM has requested at least 4 cores.
- 0-3,^1: Use cores 0, 2 and 3 for real-time scheduling only, assuming that the VM has requested at least 4 cores.
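To make the mask examples above concrete, here is a minimal Python sketch that expands such a mask into the list of selected VCPU indexes. It is not KubeVirt or libvirt code; parse_realtime_mask is a hypothetical name, and the grammar (single index, inclusive range, and a leading ^ for exclusion) is inferred only from the examples shown above.

```python
def parse_realtime_mask(mask):
    # Illustrative parser (hypothetical, not KubeVirt or libvirt code).
    # Accepted comma-separated entries, inferred from the examples above:
    #   'N'    a single VCPU index
    #   'N-M'  an inclusive range of VCPU indexes
    #   '^N'   exclude VCPU N from the selection
    included = set()
    excluded = set()
    for entry in mask.split(','):
        entry = entry.strip()
        if not entry:
            continue
        target = included
        if entry.startswith('^'):
            target = excluded
            entry = entry[1:]
        if '-' in entry:
            first, last = entry.split('-', 1)
            target.update(range(int(first), int(last) + 1))
        else:
            target.add(int(entry))
    return sorted(included - excluded)


if __name__ == '__main__':
    for mask in ['0', '0-3', '0-3,^1']:
        print(mask, parse_realtime_mask(mask))
```

Exclusions are applied after all inclusions, so the order of the entries within the mask does not matter in this sketch.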
Kubernetes provides additional NUMA components that may be relevant to your use-case but typically are not enabled by default. Please consult the Kubernetes documentation for details on configuration of these components.
"},{"location":"compute/numa/#topology-manager","title":"Topology Manager","text":"Topology Manager provides optimizations related to CPU isolation, memory and device locality. It is useful, for example, where an SR-IOV network adaptor VF allocation needs to be aligned with a NUMA node.
https://kubernetes.io/docs/tasks/administer-cluster/topology-manager/
"},{"location":"compute/numa/#memory-manager","title":"Memory Manager","text":"Memory Manager is analogous to CPU Manager. It is useful, for example, where you want to align hugepage allocations with a NUMA node. It works in conjunction with Topology Manager.
The Memory Manager employs hint generation protocol to yield the most suitable NUMA affinity for a pod. The Memory Manager feeds the central manager (Topology Manager) with these affinity hints. Based on both the hints and Topology Manager policy, the pod is rejected or admitted to the node.
https://kubernetes.io/docs/tasks/administer-cluster/memory-manager/
"},{"location":"compute/persistent_tpm_and_uefi_state/","title":"Persistent TPM and UEFI state","text":"FEATURE STATE: KubeVirt v1.0.0
For both TPM and UEFI, libvirt supports persisting data created by a virtual machine as files on the virtualization host. In KubeVirt, the virtualization host is the virt-launcher pod, which is ephemeral (created on VM start and destroyed on VM stop). As of v1.0.0, KubeVirt supports using a PVC to persist those files. KubeVirt usually refers to that storage area as \"backend storage\".
"},{"location":"compute/persistent_tpm_and_uefi_state/#backend-storage","title":"Backend storage","text":"KubeVirt automatically creates backend storage PVCs for VMs that need it. However, the admin must first enable the VMPersistentState
feature gate, and tell KubeVirt which storage class to use by setting the vmStateStorageClass
configuration parameter in the KubeVirt Custom Resource (CR). The storage class must support read-write-many (RWX) in filesystem mode (FS). Here's an example of KubeVirt CR that sets both:
apiVersion: kubevirt.io/v1\nkind: KubeVirt\nspec:\n configuration:\n vmStateStorageClass: \"nfs-csi\"\n developerConfiguration:\n featureGates:\n - VMPersistentState\n
"},{"location":"compute/persistent_tpm_and_uefi_state/#limitations","title":"Limitations","text":"Since KubeVirt v0.53.0, a TPM device can be added to a VM (with just tpm: {}
). However, the data stored in it does not persist across reboots. Support for persistence was added in v1.0.0 using a simple persistent boolean parameter that defaults to false, to preserve the previous behavior. Backend storage must be configured before adding a persistent TPM to a VM; see above. Here's a portion of a VM definition that includes a persistent TPM:
apiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n name: vm\nspec:\n template:\n spec:\n domain:\n devices:\n tpm:\n persistent: true\n
"},{"location":"compute/persistent_tpm_and_uefi_state/#uses","title":"Uses","text":"tpm-crb
model is used (instead of tpm-tis for non-persistent vTPMs).
EFI support is handled by libvirt using OVMF. OVMF data usually consists of 2 files, CODE and VARS. VARS is where persistent data from the guest can be stored. When EFI persistence is enabled on a VM, the VARS file will be persisted inside the backend storage. Backend storage must be configured before enabling EFI persistence on a VM; see above. Here's a portion of a VM definition that includes a persistent EFI:
apiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n name: vm\nspec:\n template:\n spec:\n domain:\n firmware:\n bootloader:\n efi:\n persistent: true\n
"},{"location":"compute/persistent_tpm_and_uefi_state/#uses_1","title":"Uses","text":"In this document, we are talking about the resources values set on the virt-launcher compute container, referred to as \"the container\" below for simplicity.
"},{"location":"compute/resources_requests_and_limits/#cpu","title":"CPU","text":"Note: dedicated CPUs (and isolated emulator thread) are ignored here as they have a dedicated page.
"},{"location":"compute/resources_requests_and_limits/#cpu-requests-on-the-container","title":"CPU requests on the container","text":"KubeVirt provides two ways to automatically set CPU limits on VM(I)s:
AutoResourceLimitsGate feature gate.
In both cases, the VM(I) created will have a CPU limit of 1 per vCPU.
"},{"location":"compute/resources_requests_and_limits/#autoresourcelimitsgate-feature-gate","title":"AutoResourceLimitsGate feature gate","text":"By enabling this feature gate, CPU limits will be added to the VMI if all the following conditions are true:
Cluster admins can define a label selector in the KubeVirt CR. Once that label selector is defined, if the creation namespace matches the selector, all VM(I)s created in it will have a CPU limit set.
Example:
CR:
apiVersion: kubevirt.io/v1\nkind: KubeVirt\nmetadata:\n name: kubevirt\n namespace: kubevirt\nspec:\n configuration:\n autoCPULimitNamespaceLabelSelector:\n matchLabels:\n autoCpuLimit: \"true\"\n
Namespace:
apiVersion: v1\nkind: Namespace\nmetadata:\n labels:\n autoCpuLimit: \"true\"\n kubernetes.io/metadata.name: default\n name: default\n
KubeVirt provides a feature gate (AutoResourceLimitsGate) to automatically set memory limits on VM(I)s. By enabling this feature gate, memory limits will be added to the VMI if all the following conditions are true:
If all the previous conditions are true, the memory limits will be set to twice (2x) the memory requests. This ratio can be adjusted, per namespace, by adding the annotation alpha.kubevirt.io/auto-memory-limits-ratio with the desired custom value. For example, with alpha.kubevirt.io/auto-memory-limits-ratio: 1.2, the memory limits will be set to 1.2x the memory requests.
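The ratio arithmetic described above can be sketched in a few lines of Python. The helper auto_memory_limit_mi is hypothetical, works in Mi for readability, and rounds to the nearest Mi, which is an assumption of this sketch rather than documented KubeVirt behavior.

```python
def auto_memory_limit_mi(request_mi, ratio=2.0):
    # Hypothetical helper, not a KubeVirt API: limit = ratio x request.
    # The default ratio of 2.0 is the 2x value described above; a namespace
    # annotation such as alpha.kubevirt.io/auto-memory-limits-ratio: 1.2
    # would correspond to ratio=1.2 here.
    if request_mi <= 0 or ratio <= 0:
        raise ValueError('request_mi and ratio must be positive')
    # Rounding to the nearest Mi is an assumption of this sketch.
    return round(request_mi * ratio)


if __name__ == '__main__':
    print(auto_memory_limit_mi(512))
    print(auto_memory_limit_mi(1000, ratio=1.2))
```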
VirtualMachines have a Running setting that determines whether or not there should be a guest running. Because KubeVirt will always immediately restart a VirtualMachineInstance for VirtualMachines with spec.running: true, a simple boolean is not always enough to fully describe the desired behavior. For instance, there are cases when a user would like the ability to shut down a guest from inside the virtual machine. With spec.running: true, KubeVirt would immediately restart the VirtualMachineInstance.
To allow for greater variation of user states, the RunStrategy field has been introduced. This is mutually exclusive with Running as they have somewhat overlapping conditions. There are currently four RunStrategies defined:
Always: The system is tasked with keeping the VM in a running state. This is achieved by respawning a VirtualMachineInstance whenever the current one terminates in a controlled (e.g. shutdown from inside the guest) or uncontrolled (e.g. crash) way. This behavior is equal to spec.running: true.
RerunOnFailure: Similar to Always, except that the VM is only restarted if it terminated in an uncontrolled way (e.g. crash) and due to an infrastructure reason (e.g. the node crashed, the KVM related process OOMed). This allows a user to determine when the VM should be shut down by initiating the shutdown inside the guest. Note: Guest-side crashes (e.g. BSOD) are not covered by this. In such cases liveness checks or the use of a watchdog can help.
Manual: The system will not automatically turn the VM on or off; instead the user manually controls the VM status by issuing start, stop, and restart commands on the VirtualMachine subresource endpoints.
Halted: The system is asked to ensure that no VM is running. This is achieved by stopping any VirtualMachineInstance that is associated with the VM. If a guest is already running, it will be stopped. This behavior is equal to spec.running: false.
Note: RunStrategy and running are mutually exclusive, because they can be contradictory. The API server will reject VirtualMachine resources that define both.
The start, stop and restart methods of virtctl will invoke their respective subresources of VirtualMachines. This can have an effect on the runStrategy of the VirtualMachine as below:
RunStrategy | start | stop | restart
Always | - | Halted | Always
RerunOnFailure | RerunOnFailure | RerunOnFailure | RerunOnFailure
Manual | Manual | Manual | Manual
Halted | Always | - | -
Table entries marked with - don't make sense, so won't have an effect on RunStrategy.
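The transitions listed above can also be read as a simple lookup. The following Python sketch encodes them in a dictionary, with rows as the current RunStrategy and columns as the virtctl command in the order start, stop, restart; the names TRANSITIONS and next_run_strategy are hypothetical, and None stands for the entries marked with -.

```python
# Transcribed from the transitions above; None stands for a '-' entry.
TRANSITIONS = {
    'Always': {'start': None, 'stop': 'Halted', 'restart': 'Always'},
    'RerunOnFailure': {
        'start': 'RerunOnFailure',
        'stop': 'RerunOnFailure',
        'restart': 'RerunOnFailure',
    },
    'Manual': {'start': 'Manual', 'stop': 'Manual', 'restart': 'Manual'},
    'Halted': {'start': 'Always', 'stop': None, 'restart': None},
}


def next_run_strategy(current, command):
    # Hypothetical helper: returns the RunStrategy after a virtctl command.
    # A '-' entry has no effect, so the current strategy is kept.
    result = TRANSITIONS[current][command]
    return current if result is None else result


if __name__ == '__main__':
    print(next_run_strategy('Always', 'stop'))
    print(next_run_strategy('Halted', 'start'))
```

This only models the effect on the runStrategy field; it says nothing about whether the underlying VirtualMachineInstance is actually started or stopped.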
An example usage of the Always RunStrategy.
apiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n labels:\n kubevirt.io/vm: vm-cirros\n name: vm-cirros\nspec:\n runStrategy: Always\n template:\n metadata:\n labels:\n kubevirt.io/vm: vm-cirros\n spec:\n domain:\n devices:\n disks:\n - disk:\n bus: virtio\n name: containerdisk\n terminationGracePeriodSeconds: 0\n volumes:\n - containerDisk:\n image: kubevirt/cirros-container-disk-demo:latest\n name: containerdisk\n
"},{"location":"compute/virtual_hardware/","title":"Virtual hardware","text":"Fine-tuning different aspects of the hardware which are not device related (BIOS, mainboard, etc.) is sometimes necessary to allow guest operating systems to properly boot and reboot.
"},{"location":"compute/virtual_hardware/#machine-type","title":"Machine Type","text":"QEMU is able to work with two different classes of chipsets for x86_64, so called machine types. The x86_64 chipsets are i440fx (also called pc) and q35. They are versioned based on qemu-system-${ARCH}, following the format pc-${machine_type}-${qemu_version}
, e.g. pc-i440fx-2.10 and pc-q35-2.10.
KubeVirt defaults to QEMU's newest q35 machine type. If a custom machine type is desired, it is configurable through the following structure:
metadata:\n name: myvmi\nspec:\n domain:\n machine:\n # This value indicates QEMU machine type.\n type: pc-q35-2.10\n resources:\n requests:\n memory: 512M\n devices:\n disks:\n - name: myimage\n disk: {}\n volumes:\n - name: myimage\n persistentVolumeClaim:\n claimName: myclaim\n
Comparison of the machine types' internals can be found at QEMU wiki."},{"location":"compute/virtual_hardware/#biosuefi","title":"BIOS/UEFI","text":"All virtual machines use BIOS by default for booting.
It is possible to utilize UEFI/OVMF by setting a value via spec.domain.firmware.bootloader:
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n labels:\n special: vmi-alpine-efi\n name: vmi-alpine-efi\nspec:\n domain:\n devices:\n disks:\n - disk:\n bus: virtio\n name: containerdisk\n features:\n smm:\n enabled: true\n firmware:\n # this sets the bootloader type\n bootloader:\n efi: {}\n
Enabling EFI automatically enables Secure Boot, unless the secureBoot
field under efi
is set to false
. Secure Boot itself requires the SMM CPU feature to be enabled as above, which does not happen automatically, for security reasons.
In order to provide a consistent view on the virtualized hardware for the guest OS, the SMBIOS UUID can be set to a constant value via spec.domain.firmware.uuid:
metadata:\n name: myvmi\nspec:\n domain:\n firmware:\n # this sets the UUID\n uuid: 5d307ca9-b3ef-428c-8861-06e72d69f223\n serial: e4686d2c-6e8d-4335-b8fd-81bee22f4815\n resources:\n requests:\n memory: 512M\n devices:\n disks:\n - name: myimage\n disk: {}\n volumes:\n - name: myimage\n persistentVolumeClaim:\n claimName: myclaim\n
In addition, the SMBIOS serial number can be set to a constant value via spec.domain.firmware.serial, as demonstrated above.
Note: This is not related to scheduling decisions or resource assignment.
"},{"location":"compute/virtual_hardware/#topology","title":"Topology","text":"Setting the number of CPU cores is possible via spec.domain.cpu.cores
. The following VM will have a CPU with 3
cores:
metadata:\n name: myvmi\nspec:\n domain:\n cpu:\n # this sets the cores\n cores: 3\n resources:\n requests:\n memory: 512M\n devices:\n disks:\n - name: myimage\n disk: {}\n volumes:\n - name: myimage\n persistentVolumeClaim:\n claimName: myclaim\n
"},{"location":"compute/virtual_hardware/#labeling-nodes-with-cpu-models-and-cpu-features","title":"Labeling nodes with cpu models and cpu features","text":"KubeVirt can create node selectors based on VM cpu models and features. With these node selectors, VMs will be scheduled on the nodes that support the matching VM cpu model and features.
To properly label the nodes, users can use the KubeVirt node-labeller, which creates all necessary labels, or create the node labels themselves.
The KubeVirt node-labeller creates 3 types of labels: CPU models, CPU features and KVM info. It uses libvirt to get all supported CPU models and CPU features on the host and then creates labels from the CPU models.
The node-labeller supports a list of obsolete CPU models and a minimal baseline CPU model for features. Both can be set via the KubeVirt CR:
apiVersion: kubevirt.io/v1\nkind: KubeVirt\nmetadata:\n name: kubevirt\n namespace: kubevirt\nspec:\n ...\n configuration:\n obsoleteCPUModels:\n 486: true\n pentium: true\n...\n
Obsolete CPU models will not be inserted in labels. If the KubeVirt CR doesn't contain the obsoleteCPUModels variable, the labeller sets default values (\"pentium, pentium2, pentium3, pentiumpro, coreduo, n270, core2duo, Conroe, athlon, phenom, kvm32, kvm64, qemu32 and qemu64\").
Users can change obsoleteCPUModels by adding or removing CPU models in the config map. KubeVirt then updates the nodes with the new labels.
For homogeneous clusters or clusters without live migration enabled, it's possible to disable the node-labeller and avoid adding labels to the nodes by adding the following annotation to the nodes:
node-labeller.kubevirt.io/skip-node.
Note: If the CPU model isn't defined, the VM will have the CPU model closest to the one used on the node where the VM is running.
Note: CPU model is case sensitive.
Setting the CPU model is possible via spec.domain.cpu.model
. The following VM will have a CPU with the Conroe
model:
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n name: myvmi\nspec:\n domain:\n cpu:\n # this sets the CPU model\n model: Conroe\n...\n
You can check the list of available models here.
When the CPUNodeDiscovery feature gate is enabled and the VM has a CPU model, KubeVirt creates a node selector with the format cpu-model.node.kubevirt.io/<cpuModel>, e.g. cpu-model.node.kubevirt.io/Conroe. When the VM doesn't have a CPU model, no node selector is created.
To set a default CPU model, users may add the cpuModel field in the KubeVirt CR.
apiVersion: kubevirt.io/v1\nkind: KubeVirt\nmetadata:\n name: kubevirt\n namespace: kubevirt\nspec:\n ...\n configuration:\n cpuModel: \"EPYC\"\n...\n
The default CPU model is applied when the VMI doesn't have any CPU model. When the VMI has a CPU model set, the VMI's CPU model is preferred. When neither the default CPU model nor the VMI's CPU model is set, host-model will be used. The default CPU model can be changed while KubeVirt is running. When the CPUNodeDiscovery feature gate is enabled, KubeVirt creates a node selector with the default CPU model.
As special cases, you can set spec.domain.cpu.model to:
- host-passthrough to pass the CPU through from the node to the VM
metadata:\n name: myvmi\nspec:\n domain:\n cpu:\n # this passthrough the node CPU to the VM\n model: host-passthrough\n...\n
- host-model to get a CPU on the VM close to the one on the node
metadata:\n name: myvmi\nspec:\n domain:\n cpu:\n # this set the VM CPU close to the node one\n model: host-model\n...\n
See the CPU API reference for more details.
"},{"location":"compute/virtual_hardware/#features","title":"Features","text":"Setting CPU features is possible via spec.domain.cpu.features
and can contain zero or more CPU features:
metadata:\n name: myvmi\nspec:\n domain:\n cpu:\n # this sets the CPU features\n features:\n # this is the feature's name\n - name: \"apic\"\n # this is the feature's policy\n policy: \"require\"\n...\n
Note: Policy attribute can either be omitted or contain one of the following policies: force, require, optional, disable, forbid.
Note: In case a policy is omitted for a feature, it will default to require.
Behaviour according to Policies:
Full description about features and policies can be found here.
When the CPUNodeDiscovery feature gate is enabled, KubeVirt creates node selectors from the CPU features with the format cpu-feature.node.kubevirt.io/<cpuFeature>, e.g. cpu-feature.node.kubevirt.io/apic. When the VM doesn't have any CPU features, no node selector is created.
Sets the virtualized hardware clock inside the VM to a specific time. Available options are
utc
timezone
See the Clock API Reference for all possible configuration options.
"},{"location":"compute/virtual_hardware/#utc","title":"utc","text":"If utc
is specified, the VM's clock will be set to UTC.
metadata:\n name: myvmi\nspec:\n domain:\n clock:\n utc: {}\n resources:\n requests:\n memory: 512M\n devices:\n disks:\n - name: myimage\n disk: {}\n volumes:\n - name: myimage\n persistentVolumeClaim:\n claimName: myclaim\n
"},{"location":"compute/virtual_hardware/#timezone","title":"timezone","text":"If timezone
is specified, the VM's clock will be set to the specified local time.
metadata:\n name: myvmi\nspec:\n domain:\n clock:\n timezone: \"America/New_York\"\n resources:\n requests:\n memory: 512M\n devices:\n disks:\n - name: myimage\n disk: {}\n volumes:\n - name: myimage\n persistentVolumeClaim:\n claimName: myclaim\n
"},{"location":"compute/virtual_hardware/#timers","title":"Timers","text":"pit
rtc
kvm
hyperv
A pretty common timer configuration for VMs looks like this:
metadata:\n name: myvmi\nspec:\n domain:\n clock:\n utc: {}\n # here are the timer\n timer:\n hpet:\n present: false\n pit:\n tickPolicy: delay\n rtc:\n tickPolicy: catchup\n hyperv: {}\n resources:\n requests:\n memory: 512M\n devices:\n disks:\n - name: myimage\n disk: {}\n volumes:\n - name: myimage\n persistentVolumeClaim:\n claimName: myclaim\n
hpet is disabled, pit and rtc are configured to use a specific tickPolicy. Finally, hyperv is made available too.
See the Timer API Reference for all possible configuration options.
Note: Timers can be part of a machine type. Thus it may be necessary to explicitly disable them. We may in the future decide to add them via cluster-level defaulting, if they are part of a QEMU machine definition.
"},{"location":"compute/virtual_hardware/#random-number-generator-rng","title":"Random number generator (RNG)","text":"You may want to use entropy collected by your cluster nodes inside your guest. KubeVirt allows adding a virtio RNG device to a virtual machine as follows.
metadata:\n name: vmi-with-rng\nspec:\n domain:\n devices:\n rng: {}\n
For Linux guests, the virtio-rng kernel module should be loaded early in the boot process to acquire access to the entropy source. Other systems may require similar adjustments to work with the virtio RNG device.
Note: Some guest operating systems or user payloads may require the RNG device with enough entropy and may fail to boot without it. For example, fresh Fedora images with newer kernels (4.16.4+) may require the virtio RNG device to be present to boot to login.
By default a minimal Video and Graphics device configuration will be applied to the VirtualMachineInstance. The video device is vga compatible and comes with a memory size of 16 MB. This device allows connecting to the OS via vnc.
It is possible to not attach it by setting spec.domain.devices.autoattachGraphicsDevice to false:
metadata:\n name: myvmi\nspec:\n domain:\n devices:\n autoattachGraphicsDevice: false\n disks:\n - name: myimage\n disk: {}\n volumes:\n - name: myimage\n persistentVolumeClaim:\n claimName: myclaim\n
VMIs without graphics and video devices are very often referred to as headless VMIs.
When running a large number of small VMs, this can help increase the VMI density per node, since no memory needs to be reserved for video.
"},{"location":"compute/virtual_hardware/#features_1","title":"Features","text":"KubeVirt supports a range of virtualization features which may be tweaked in order to allow non-Linux based operating systems to properly boot. Most noteworthy are
acpi
apic
hyperv
A common feature configuration is shown by the following example:
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n name: myvmi\nspec:\n domain:\n # typical features\n features:\n acpi: {}\n apic: {}\n hyperv:\n relaxed: {}\n vapic: {}\n spinlocks:\n spinlocks: 8191\n resources:\n requests:\n memory: 512M\n devices:\n disks:\n - name: myimage\n disk: {}\n volumes:\n - name: myimage\n persistentVolumeClaim:\n claimName: myclaim\n
See the Features API Reference for all available features and configuration options.
"},{"location":"compute/virtual_hardware/#resources-requests-and-limits","title":"Resources Requests and Limits","text":"An optional resource request can be specified by the users to allow the scheduler to make a better decision in finding the most suitable Node to place the VM.
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n name: myvmi\nspec:\n domain:\n resources:\n requests:\n memory: \"1Gi\"\n cpu: \"1\"\n limits:\n memory: \"2Gi\"\n cpu: \"2\"\n devices:\n disks:\n - name: myimage\n disk: {}\n volumes:\n - name: myimage\n persistentVolumeClaim:\n claimName: myclaim\n
"},{"location":"compute/virtual_hardware/#cpu_1","title":"CPU","text":"Specifying CPU limits will determine the amount of cpu shares set on the control group the VM is running in, in other words, the amount of time the VM's CPUs can execute on the assigned resources when there is a competition for CPU resources.
For more information please refer to how Pods with resource limits are run.
"},{"location":"compute/virtual_hardware/#memory-overhead","title":"Memory Overhead","text":"Various VM resources, such as a video adapter, IOThreads, and supplementary system software, consume additional memory from the Node, beyond the requested memory intended for the guest OS consumption. In order to provide a better estimate for the scheduler, this memory overhead will be calculated and added to the requested memory.
Please see how Pods with resource requests are scheduled for additional information on resource requests and limits.
"},{"location":"compute/virtual_hardware/#hugepages","title":"Hugepages","text":"KubeVirt lets you use hugepages as backing memory for your VM. You need to provide the desired amount of memory via resources.requests.memory and the size of the hugepages to use via memory.hugepages.pageSize; for example, on the x86_64 architecture it can be 2Mi.
apiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n name: myvm\nspec:\n domain:\n resources:\n requests:\n memory: \"64Mi\"\n memory:\n hugepages:\n pageSize: \"2Mi\"\n disks:\n - name: myimage\n disk: {}\n volumes:\n - name: myimage\n persistentVolumeClaim:\n claimName: myclaim\n
In the above example the VM will have 64Mi
of memory, but instead of regular memory it will use node hugepages of the size of 2Mi
.
a node must have pre-allocated hugepages
hugepages size cannot be bigger than requested memory
requested memory must be divisible by hugepages size
hugepages use memfd by default. Memfd is supported on kernels >= 4.14. If you run on an older host (e.g. CentOS 7.9), you must disable memfd with the annotation kubevirt.io/memfd: \"false\"
in the VMI metadata annotations.
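The two size constraints above can be checked before a VMI is submitted. The following is a minimal sketch, assuming quantities use only the binary suffixes Ki, Mi and Gi; the helpers parse_quantity and check_hugepages are hypothetical and not part of KubeVirt.

```python
# Hypothetical helpers, not part of KubeVirt: validate a hugepages request
# against the size constraints listed above.
UNITS = {'Ki': 2**10, 'Mi': 2**20, 'Gi': 2**30}


def parse_quantity(value):
    '''Convert a quantity such as 64Mi to bytes (binary suffixes only).'''
    for suffix, factor in UNITS.items():
        if value.endswith(suffix):
            return int(value[:-len(suffix)]) * factor
    return int(value)  # assumption: a bare number means bytes


def check_hugepages(requested_memory, page_size):
    '''Return the list of violated constraints (empty if the request is valid).'''
    memory = parse_quantity(requested_memory)
    page = parse_quantity(page_size)
    problems = []
    if page > memory:
        problems.append('hugepages size is bigger than requested memory')
    if memory % page != 0:
        problems.append('requested memory is not divisible by hugepages size')
    return problems


print(check_hugepages('64Mi', '2Mi'))  # the request from the example above
print(check_hugepages('65Mi', '2Mi'))  # 65Mi is not a multiple of 2Mi
```

Whether the node actually has pre-allocated hugepages cannot be checked this way; that depends on the node configuration.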
KubeVirt supports input devices. The only supported type is tablet
. The tablet input device supports only the virtio
and usb
buses. The bus can be left empty. In that case, usb
will be selected.
apiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n name: myvm\nspec:\n domain:\n devices:\n inputs:\n - type: tablet\n bus: virtio\n name: tablet1\n disks:\n - name: myimage\n disk: {}\n volumes:\n - name: myimage\n persistentVolumeClaim:\n claimName: myclaim\n
"},{"location":"compute/vsock/","title":"VSOCK","text":"VM Sockets (vsock) is a fast and efficient guest-host communication mechanism.
"},{"location":"compute/vsock/#background","title":"Background","text":"Right now KubeVirt uses virtio-serial for local guest-host communication. Currently it used in KubeVirt by libvirt and qemu to communicate with the qemu-guest-agent. Virtio-serial can also be used by other agents, but it is a little bit cumbersome due to:
With virtio-vsock we get support for easy guest-host communication which solves the above issues from a user/admin perspective.
"},{"location":"compute/vsock/#usage","title":"Usage","text":""},{"location":"compute/vsock/#feature-gate","title":"Feature Gate","text":"To enable VSOCK in KubeVirt cluster, the user may expand the featureGates
field in the KubeVirt CR by adding the VSOCK
to it.
apiVersion: kubevirt.io/v1\nkind: KubeVirt\nmetadata:\n name: kubevirt\n namespace: kubevirt\nspec:\n ...\n configuration:\n developerConfiguration:\n featureGates:\n - \"VSOCK\"\n
Alternatively, users can edit an existing kubevirt CR:
kubectl edit kubevirt kubevirt -n kubevirt
...\nspec:\n configuration:\n developerConfiguration:\n featureGates:\n - \"VSOCK\"\n
"},{"location":"compute/vsock/#virtual-machine-instance","title":"Virtual Machine Instance","text":"To attach VSOCK device to a Virtual Machine, the user has to add autoattachVSOCK: true
in a devices
section of Virtual Machine Instance specification:
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n name: testvmi-vsock\nspec:\n domain:\n resources:\n requests:\n memory: 64M\n devices:\n autoattachVSOCK: true\n
This will expose a VSOCK device to the VM. The CID
will be assigned randomly by virt-controller
, and exposed in the Virtual Machine Instance status:
status:\n VSOCKCID: 123\n
"},{"location":"compute/vsock/#security","title":"Security","text":"NOTE: The /dev/vhost-vsock
device is NOT NEEDED to connect or bind to a VSOCK socket.
To make the VSOCK feature secure, the following measures are in place:
CAP_NET_BIND_SERVICE
capability.
AF_VSOCK
socket syscall gets blocked in containerd 1.7+ (containerd/containerd#7442). It is currently the responsibility of the vendor to ensure that the CRI in use selects a default seccomp policy that blocks VSOCK socket calls, similar to what was done for containerd.
virt-controller
and are unique per Virtual Machine Instance to ensure that virt-handler
has an easy way of tracking the identity without races. While this still allows virt-launcher
to fake-use an assigned CID, it eliminates possible assignment races which attackers could exploit to redirect VSOCK calls.
The purpose of this document is to explain how to install virtio drivers for Microsoft Windows running in a fully virtualized guest.
"},{"location":"compute/windows_virtio_drivers/#do-i-need-virtio-drivers","title":"Do I need virtio drivers?","text":"Yes. Without the virtio drivers, you cannot use paravirtualized hardware properly. It would either not work, or will have a severe performance penalty.
For more information about VirtIO and paravirtualization, see VirtIO and paravirtualization
For more details on configuring your VirtIO driver please refer to Installing VirtIO driver on a new Windows virtual machine and Installing VirtIO driver on an existing Windows virtual machine.
"},{"location":"compute/windows_virtio_drivers/#which-drivers-i-need-to-install","title":"Which drivers I need to install?","text":"There are usually up to 8 possible devices that are required to run Windows smoothly in a virtualized environment. KubeVirt currently supports only:
viostor, the block driver, applies to SCSI Controller in the Other devices group.
viorng, the entropy source driver, applies to PCI Device in the Other devices group.
NetKVM, the network driver, applies to Ethernet Controller in the Other devices group. Available only if a virtio NIC is configured.
Other virtio drivers that exist and might be supported in the future:
Balloon, the balloon driver, applies to PCI Device in the Other devices group
vioserial, the paravirtual serial driver, applies to PCI Simple Communications Controller in the Other devices group.
vioscsi, the SCSI block driver, applies to SCSI Controller in the Other devices group.
qemupciserial, the emulated PCI serial driver, applies to PCI Serial Port in the Other devices group.
qxl, the paravirtual video driver, applies to Microsoft Basic Display Adapter in the Display adapters group.
pvpanic, the paravirtual panic driver, applies to Unknown device in the Other devices group.
Note
Some drivers are required in the installation phase. When you are installing Windows onto virtio block storage, you have to provide an appropriate virtio driver. Namely, choose the viostor driver for your version of Microsoft Windows, e.g. do not install the XP driver when you run Windows 10.
Other drivers can be installed after the successful Windows installation. Again, please install only drivers matching your Windows version.
"},{"location":"compute/windows_virtio_drivers/#how-to-install-during-windows-install","title":"How to install during Windows install?","text":"To install drivers before the Windows starts its install, make sure you have virtio-win package attached to your VirtualMachine as SATA CD-ROM. In the Windows installation, choose advanced install and load driver. Then please navigate to loaded Virtio CD-ROM and install one of viostor or vioscsi, depending on whichever you have set up.
Step by step screenshots:
"},{"location":"compute/windows_virtio_drivers/#how-to-install-after-windows-install","title":"How to install after Windows install?","text":"After windows install, please go to Device Manager. There you should see undetected devices in \"available devices\" section. You can install virtio drivers one by one going through this list.
For more details on how to choose a proper driver and how to install the driver, please refer to the Windows Guest Virtual Machines on Red Hat Enterprise Linux 7.
"},{"location":"compute/windows_virtio_drivers/#how-to-obtain-virtio-drivers","title":"How to obtain virtio drivers?","text":"The virtio Windows drivers are distributed in a form of containerDisk, which can be simply mounted to the VirtualMachine. The container image, containing the disk is located at: https://quay.io/repository/kubevirt/virtio-container-disk?tab=tags and the image be pulled as any other docker container:
docker pull quay.io/kubevirt/virtio-container-disk\n
However, pulling the image manually is not required; if it is not present, Kubernetes will download it when deploying the VirtualMachine.
"},{"location":"compute/windows_virtio_drivers/#attaching-to-virtualmachine","title":"Attaching to VirtualMachine","text":"KubeVirt distributes virtio drivers for Microsoft Windows in a form of container disk. The package contains the virtio drivers and QEMU guest agent. The disk was tested on Microsoft Windows Server 2012. Supported Windows version is XP and up.
The package is intended to be used as CD-ROM attached to the virtual machine with Microsoft Windows. It can be used as SATA CDROM during install phase or to provide drivers in an existing Windows installation.
Attaching the virtio-win package can be done simply by adding ContainerDisk to you VirtualMachine.
spec:\n domain:\n devices:\n disks:\n - name: virtiocontainerdisk\n # Any other disk you want to use must go before virtioContainerDisk.\n # KubeVirt boots from disks in the order they are defined.\n # Therefore virtioContainerDisk must come after the bootable disk.\n # The other option is to choose the boot order explicitly:\n # - https://kubevirt.io/api-reference/v0.13.2/definitions.html#_v1_disk\n # NOTE: You either specify bootOrder explicitly or sort the items in\n # disks. You cannot do both at the same time.\n # bootOrder: 2\n cdrom:\n bus: sata\nvolumes:\n - containerDisk:\n image: quay.io/kubevirt/virtio-container-disk\n name: virtiocontainerdisk\n
Once you are done installing the virtio drivers, you can remove the virtio container disk by simply removing the disk from the YAML specification and restarting the VirtualMachine.
"},{"location":"debug_virt_stack/debug/","title":"Debug","text":"This page contains instructions on how to debug KubeVirt.
This is useful to both KubeVirt developers and advanced users who would like to gain a deep understanding of what's happening behind the scenes.
"},{"location":"debug_virt_stack/debug/#log-verbosity","title":"Log Verbosity","text":"KubeVirt produces a lot of logging throughout its codebase. Some log entries have a verbosity level assigned to them. The verbosity level of a log entry is the minimum configured verbosity at which the entry is exposed.
In code, the log entry looks similar to: log.Log.V(verbosity).Infof(\"...\")
where verbosity
is the minimum verbosity level for this entry.
For example, if the log verbosity for some log entry is 3
, then the log would be exposed only if the log verbosity is defined to be equal or greater than 3
, or else it would be filtered out.
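The filtering rule can be sketched in a few lines. This only illustrates the rule described above and is not KubeVirt's logging code; visible_entries and the sample entries are hypothetical.

```python
def visible_entries(entries, configured_verbosity):
    '''Keep the entries whose verbosity level does not exceed the configured one.'''
    return [msg for level, msg in entries if level <= configured_verbosity]


# Each sample entry is a (verbosity level, message) pair.
entries = [(1, 'starting'), (3, 'sync loop details'), (5, 'raw payload')]
print(visible_entries(entries, 3))
```

With a configured verbosity of 3, the entry logged at level 5 is filtered out while the entries at levels 1 and 3 are kept.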
Currently, log verbosity can be defined per-component or per-node. The most up-to-date API is detailed here.
"},{"location":"debug_virt_stack/debug/#setting-verbosity-per-kubevirt-component","title":"Setting verbosity per KubeVirt component","text":"One way of raising log verbosity is to set it manually for the different components in the KubeVirt
CR:
apiVersion: kubevirt.io/v1\nkind: KubeVirt\nmetadata:\n name: kubevirt\n namespace: kubevirt\nspec:\n configuration:\n developerConfiguration:\n logVerbosity:\n virtLauncher: 2\n virtHandler: 3\n virtController: 4\n virtAPI: 5\n virtOperator: 6\n
This option is best for debugging specific components.
"},{"location":"debug_virt_stack/debug/#libvirt-virtqemudconf-set-log_filters-according-to-virt-launcher-log-verbosity","title":"libvirt virtqemud.conf set log_filters according to virt-launcher log Verbosity","text":"Verbosity level log_filters in virtqemud.conf 5 log_filters=\"3:remote 4:event 3:util.json 3:util.object 3:util.dbus 3:util.netlink 3:node_device 3:rpc 3:access 3:util.threadjob 3:cpu.cpu 3:qemu.qemu_monitor 3:qemu.qemu_monitor_json 3:conf.domain_addr 1:*\" 6 3:remote 4:event 3:util.json 3:util.object 3:util.dbus 3:util.netlink 3:node_device 3:rpc 3:access 3:util.threadjob 3:cpu.cpu 3:qemu.qemu_monitor 1:* 7 3:remote 4:event 3:util.json 3:util.object 3:util.dbus 3:util.netlink 3:node_device 3:rpc 3:access 3:util.threadjob 3:cpu.cpu 1:* 8 and above 3:remote 4:event 3:util.json 3:util.object 3:util.dbus 3:util.netlink 3:node_device 3:rpc 3:access 1:*User can set self-defined log-filters via the annotations tag kubevirt.io/libvirt-log-filters
in VMI configuration. e.g.
kind: VirtualMachineInstance\nmetadata:\n name: my-vmi\n annotations:\n kubevirt.io/libvirt-log-filters: \"3:remote 4:event 1:*\"\n
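A filter string can be sanity-checked before it is placed in the annotation. The following is a minimal sketch, assuming the value is a space-separated list of level:match pairs as in the examples above; parse_log_filters is a hypothetical helper, and the meaning of each numeric level is defined by libvirt, not by this sketch.

```python
def parse_log_filters(value):
    '''Split a filter string into (level, match) tuples, rejecting malformed items.'''
    filters = []
    for item in value.split():
        level, sep, match = item.partition(':')
        if not sep or not level.isdigit() or not match:
            raise ValueError('malformed filter: ' + item)
        filters.append((int(level), match))
    return filters


print(parse_log_filters('3:remote 4:event 1:*'))
```

An item without a numeric level or without a match pattern raises ValueError, which catches typos before the VMI is created.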
"},{"location":"debug_virt_stack/debug/#setting-verbosity-per-nodes","title":"Setting verbosity per nodes","text":"Another way is to set verbosity level per node:
apiVersion: kubevirt.io/v1\nkind: KubeVirt\nmetadata:\n name: kubevirt\n namespace: kubevirt\nspec:\n configuration:\n developerConfiguration:\n nodeVerbosity:\n \"node01\": 4\n \"otherNodeName\": 6\n
nodeVerbosity
is essentially a map from string to int where the key is the node name and the value is the verbosity level. The verbosity level would be defined for all the different components in that node (e.g. virt-handler
, virt-launcher
, etc).
In Kubernetes, logs are defined at the Pod level. Therefore, we first need to list the Pods of KubeVirt's core components. To do that, we can list the Pods in KubeVirt's install namespace.
For example:
$> kubectl get pods -n <KubeVirt Install Namespace>\nNAME READY STATUS RESTARTS AGE\ndisks-images-provider-7gqbc 1/1 Running 0 32m\ndisks-images-provider-vg4kx 1/1 Running 0 32m\nvirt-api-57fcc4497b-7qfmc 1/1 Running 0 31m\nvirt-api-57fcc4497b-tx9nc 1/1 Running 0 31m\nvirt-controller-76c784655f-7fp6m 1/1 Running 0 30m\nvirt-controller-76c784655f-f4pbd 1/1 Running 0 30m\nvirt-handler-2m86x 1/1 Running 0 30m\nvirt-handler-9qs6z 1/1 Running 0 30m\nvirt-operator-7ccfdbf65f-q5snk 1/1 Running 0 32m\nvirt-operator-7ccfdbf65f-vllz8 1/1 Running 0 32m\n
Then, we can pick one of the pods and fetch its logs. For example:
$> kubectl logs -n <KubeVirt Install Namespace> virt-handler-2m86x | head -n8\n{\"component\":\"virt-handler\",\"level\":\"info\",\"msg\":\"set verbosity to 2\",\"pos\":\"virt-handler.go:453\",\"timestamp\":\"2022-04-17T08:58:37.373695Z\"}\n{\"component\":\"virt-handler\",\"level\":\"info\",\"msg\":\"set verbosity to 2\",\"pos\":\"virt-handler.go:453\",\"timestamp\":\"2022-04-17T08:58:37.373726Z\"}\n{\"component\":\"virt-handler\",\"level\":\"info\",\"msg\":\"setting rate limiter to 5 QPS and 10 Burst\",\"pos\":\"virt-handler.go:462\",\"timestamp\":\"2022-04-17T08:58:37.373782Z\"}\n{\"component\":\"virt-handler\",\"level\":\"info\",\"msg\":\"CPU features of a minimum baseline CPU model: map[apic:true clflush:true cmov:true cx16:true cx8:true de:true fpu:true fxsr:true lahf_lm:true lm:true mca:true mce:true mmx:true msr:true mtrr:true nx:true pae:true pat:true pge:true pni:true pse:true pse36:true sep:true sse:true sse2:true sse4.1:true ssse3:true syscall:true tsc:true]\",\"pos\":\"cpu_plugin.go:96\",\"timestamp\":\"2022-04-17T08:58:37.390221Z\"}\n{\"component\":\"virt-handler\",\"level\":\"warning\",\"msg\":\"host model mode is expected to contain only one model\",\"pos\":\"cpu_plugin.go:103\",\"timestamp\":\"2022-04-17T08:58:37.390263Z\"}\n{\"component\":\"virt-handler\",\"level\":\"info\",\"msg\":\"node-labeller is running\",\"pos\":\"node_labeller.go:94\",\"timestamp\":\"2022-04-17T08:58:37.391011Z\"}\n
Obviously, for both examples above, <KubeVirt Install Namespace>
needs to be replaced with the actual namespace KubeVirt is installed in.
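Because each log line in the output above is a JSON object, the logs can be filtered by field. The following is a minimal sketch that keeps only the entries of a given level; filter_by_level is a hypothetical helper and the sample lines are abbreviated versions of the output above.

```python
import json


def filter_by_level(lines, level):
    '''Yield the parsed log entries whose level field equals the requested level.'''
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if isinstance(entry, dict) and entry.get('level') == level:
            yield entry


# Sample lines built with json.dumps, mirroring the fields shown above.
sample = [
    json.dumps({'component': 'virt-handler', 'level': 'info',
                'msg': 'set verbosity to 2'}),
    json.dumps({'component': 'virt-handler', 'level': 'warning',
                'msg': 'host model mode is expected to contain only one model'}),
]
for entry in filter_by_level(sample, 'warning'):
    print(entry['msg'])
```

The function accepts any iterable of lines, so it could also read the output of kubectl logs from standard input.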
Using the cluster-profiler
client tool, a developer can get the PProf profiling data for every component in the KubeVirt control plane. Here is a user guide:
cluster-profiler
","text":"Build from source code
$ git clone https://github.com/kubevirt/kubevirt.git\n$ cd kubevirt/tools/cluster-profiler\n$ go build\n
"},{"location":"debug_virt_stack/debug/#enable-the-feature-gate","title":"Enable the feature gate","text":"Add ClusterProfiler
to the KubeVirt configuration
$ cat << END > enable-feature-gate.yaml\n\n---\napiVersion: kubevirt.io/v1\nkind: KubeVirt\nmetadata:\n name: kubevirt\n namespace: kubevirt\nspec:\n configuration:\n developerConfiguration:\n featureGates:\n - ClusterProfiler\nEND\n\n$ kubectl apply -f enable-feature-gate.yaml\n
"},{"location":"debug_virt_stack/debug/#do-the-profiling","title":"Do the profiling","text":"Start CPU profiling
$ cluster-profiler --cmd start\n\n2023/05/17 09:31:09 SUCCESS: started cpu profiling KubeVirt control plane\n
Stop CPU profiling
$ cluster-profiler --cmd stop\n\n2023/05/17 09:31:14 SUCCESS: stopped cpu profiling KubeVirt control plane\n
Dump the pprof result
$ cluster-profiler --cmd dump\n\n2023/05/17 09:31:18 Moving already existing \"cluster-profiler-results\" => \"cluster-profiler-results-old-67fq\"\nSUCCESS: Dumped PProf 6 results for KubeVirt control plane to [cluster-profiler-results]\n
The PProf result can be found in the folder cluster-profiler-results
$ tree cluster-profiler-results\n\ncluster-profiler-results\n\u251c\u2500\u2500 virt-api-5f96f84dcb-lkpb7\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 allocs.pprof\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 block.pprof\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 cpu.pprof\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 goroutine.pprof\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 heap.pprof\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 mutex.pprof\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 threadcreate.pprof\n\u251c\u2500\u2500 virt-controller-5bbd9554d9-2f8j2\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 allocs.pprof\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 block.pprof\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 cpu.pprof\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 goroutine.pprof\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 heap.pprof\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 mutex.pprof\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 threadcreate.pprof\n\u251c\u2500\u2500 virt-controller-5bbd9554d9-qct2w\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 allocs.pprof\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 block.pprof\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 cpu.pprof\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 goroutine.pprof\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 heap.pprof\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 mutex.pprof\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 threadcreate.pprof\n\u251c\u2500\u2500 virt-handler-ccq6c\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 allocs.pprof\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 block.pprof\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 cpu.pprof\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 goroutine.pprof\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 heap.pprof\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 mutex.pprof\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 threadcreate.pprof\n\u251c\u2500\u2500 virt-operator-cdc677b7-pg9j2\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 allocs.pprof\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 block.pprof\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 cpu.pprof\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 goroutine.pprof\n\u2502\u00a0\u00a0 
\u251c\u2500\u2500 heap.pprof\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 mutex.pprof\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 threadcreate.pprof\n\u2514\u2500\u2500 virt-operator-cdc677b7-pjqdx\n \u251c\u2500\u2500 allocs.pprof\n \u251c\u2500\u2500 block.pprof\n \u251c\u2500\u2500 cpu.pprof\n \u251c\u2500\u2500 goroutine.pprof\n \u251c\u2500\u2500 heap.pprof\n \u251c\u2500\u2500 mutex.pprof\n \u2514\u2500\u2500 threadcreate.pprof\n
"},{"location":"debug_virt_stack/launch-qemu-gdb/","title":"Launch QEMU with gdb and connect locally with gdb client","text":"This guide is for cases where QEMU counters very early failures and it is hard to synchronize it in a later point in time.
"},{"location":"debug_virt_stack/launch-qemu-gdb/#image-creation-and-pvc-population","title":"Image creation and PVC population","text":"This scenario is a slight variation of the guide about starting strace, hence some of the details on the image build and the PVC population are simply skipped and explained in the other section.
In this example, QEMU will be launched with gdbserver
and later we will connect to it using a local gdb
client.
The wrapping script looks like:
#!/bin/bash\n\nLD_LIBRARY_PATH=$LD_LIBRARY_PATH:/var/run/debug/usr/lib64 /var/run/debug/usr/bin/gdbserver \\\n localhost:1234 \\\n /usr/libexec/qemu-kvm \"$@\" &\nprintf \"%d\" $(pgrep gdbserver) > /run/libvirt/qemu/run/default_vmi-debug-tools.pid\n
First, we need to build and push the image with the wrapping script and the gdbserver:
FROM quay.io/centos/centos:stream9 as build\n\nENV DIR /debug-tools\nENV DEBUGINFOD_URLS https://debuginfod.centos.org/\nRUN mkdir -p ${DIR}/logs\n\nRUN yum install --installroot=${DIR} -y gdb-gdbserver && yum clean all\n\nCOPY ./wrap_qemu_gdb.sh $DIR/wrap_qemu_gdb.sh\nRUN chmod 0755 ${DIR}/wrap_qemu_gdb.sh\nRUN chown 107:107 ${DIR}/wrap_qemu_gdb.sh\nRUN chown 107:107 ${DIR}/logs\n
Then, we can create and populate the debug-tools
PVC as we did in the strace example:
$ kubectl apply -f debug-tools-pvc.yaml\npersistentvolumeclaim/debug-tools created\n$ kubectl apply -f populate-job-pvc.yaml\njob.batch/populate-pvc created\n$ kubectl get jobs\nNAME COMPLETIONS DURATION AGE\npopulate-pvc 1/1 7s 2m12s\n
ConfigMap:
apiVersion: v1\nkind: ConfigMap\nmetadata:\n name: my-config-map\ndata:\n my_script.sh: |\n #!/bin/sh\n tempFile=`mktemp --dry-run`\n echo $4 > $tempFile\n sed -i \"s|<emulator>/usr/libexec/qemu-kvm</emulator>|<emulator>/var/run/debug/wrap_qemu_gdb.sh</emulator>|\" $tempFile\n cat $tempFile\n
As the last step, we need to create the ConfigMap that modifies the VM XML:
$ kubectl apply -f configmap.yaml\nconfigmap/my-config-map created\n
"},{"location":"debug_virt_stack/launch-qemu-gdb/#build-client-image","title":"Build client image","text":"In this scenario, we use an additional container image containing gdb
and the same qemu binary as the target process to debug. This image will be run locally with podman
.
In order to build this image, we need to identify the image of the virt-launcher
container we want to debug. Based on the KubeVirt installation, the namespace and the name of the KubeVirt CR could vary. In this example, we'll assume that KubeVirt CR is called kubevirt
and installed in the kubevirt
namespace.
You can easily find out the right names in your cluster by searching with:
$ kubectl get kubevirt -A\nNAMESPACE NAME AGE PHASE\nkubevirt kubevirt 3h11m Deployed\n
The steps to build the image are:
Get the registry of the images of the KubeVirt installation:
$ export registry=$(kubectl get kubevirt kubevirt -n kubevirt -o jsonpath='{.status.observedDeploymentConfig}' |jq '.registry'|tr -d \"\\\"\")\n$ echo $registry\nregistry:5000/kubevirt\n
Get the shasum of the virt-launcher image:
$ export tag=$(kubectl get kubevirt kubevirt -n kubevirt -o jsonpath='{.status.observedDeploymentConfig}' |jq '.virtLauncherSha'|tr -d \"\\\"\")\n$ echo $tag\nsha256:6c8b85eed8e83a4c70779836b246c057d3e882eb513f3ded0a02e0a4c4bda837\n
Example of Dockerfile:
ARG registry\nARG tag\nFROM ${registry}/kubevirt/virt-launcher${tag} AS launcher\nFROM quay.io/centos/centos:stream9 as build\n\nRUN yum install -y gdb && yum clean all\n\nCOPY --from=launcher /usr/libexec/qemu-kvm /usr/libexec/qemu-kvm\n
registry
and the tag
retrieved in the previous steps:
$ podman build \\\n -t gdb-client \\\n --build-arg registry=$registry \\\n --build-arg tag=@$tag \\\n -f Dockerfile.client .\n
Podman will replace the registry and tag arguments provided on the command line. In this way, we can specify the image registry and shasum for the KubeVirt version to debug.
"},{"location":"debug_virt_stack/launch-qemu-gdb/#run-the-vm-to-troubleshoot","title":"Run the VM to troubleshoot","text":"For this example, we add an annotation to keep the virt-launcher pod running even if any errors occur:
metadata:\n annotations:\n kubevirt.io/keep-launcher-alive-after-failure: \"true\"\n
Then, we can launch the VM:
$ kubectl apply -f debug-vmi.yaml\nvirtualmachineinstance.kubevirt.io/vmi-debug-tools created\n$ kubectl get vmi\nNAME AGE PHASE IP NODENAME READY\nvmi-debug-tools 28s Scheduled node01 False\n$ kubectl get po\nNAME READY STATUS RESTARTS AGE\npopulate-pvc-dnxld 0/1 Completed 0 4m17s\nvirt-launcher-vmi-debug-tools-tfh28 4/4 Running 0 25s\n
The wrapping script starts the gdbserver
and exposes it on port 1234
inside the container. In order to be able to connect remotely to the gdbserver, we can use the command kubectl port-forward
to expose the gdb port on our machine.
$ kubectl port-forward virt-launcher-vmi-debug-tools-tfh28 1234\nForwarding from 127.0.0.1:1234 -> 1234\nForwarding from [::1]:1234 -> 1234\n
Finally, we can start the gdb client in the container:
$ podman run -ti --network host gdb-client:latest\n$ gdb /usr/libexec/qemu-kvm -ex 'target remote localhost:1234'\nGNU gdb (GDB) Red Hat Enterprise Linux 10.2-12.el9\nCopyright (C) 2021 Free Software Foundation, Inc.\nLicense GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>\nThis is free software: you are free to change and redistribute it.\nThere is NO WARRANTY, to the extent permitted by law.\nType \"show copying\" and \"show warranty\" for details.\nThis GDB was configured as \"x86_64-redhat-linux-gnu\".\nType \"show configuration\" for configuration details.\nFor bug reporting instructions, please see:\n<https://www.gnu.org/software/gdb/bugs/>.\nFind the GDB manual and other documentation resources online at:\n <http://www.gnu.org/software/gdb/documentation/>.\n\nFor help, type \"help\".\n--Type <RET> for more, q to quit, c to continue without paging--\nType \"apropos word\" to search for commands related to \"word\"...\nReading symbols from /usr/libexec/qemu-kvm...\n\nReading symbols from /root/.cache/debuginfod_client/26221a84fabd219a68445ad0cc87283e881fda15/debuginfo...\nRemote debugging using localhost:1234\nReading /lib64/ld-linux-x86-64.so.2 from remote target...\nwarning: File transfers from remote targets can be slow. Use \"set sysroot\" to access files locally instead.\nReading /lib64/ld-linux-x86-64.so.2 from remote target...\nReading symbols from target:/lib64/ld-linux-x86-64.so.2...\nDownloading separate debug info for /system-supplied DSO at 0x7ffc10eff000...\n0x00007f1a70225e70 in _start () from target:/lib64/ld-linux-x86-64.so.2\n
For simplicity, we started podman with the option --network host
so that the container is able to access any port mapped on the host.
This guide explains how to launch QEMU with a debugging tool in the virt-launcher pod. This method can be useful to debug early failures or to start QEMU as a child of a debug tool that relies on ptrace. The second point is particularly relevant when a process is operating in a non-privileged environment since otherwise, it would need root access to be able to ptrace the process.
Ephemeral containers are among the emerging techniques to overcome the lack of debugging tools inside the original image. This solution does, however, come with a number of limitations. For example, it is possible to spawn a new container inside the same pod of the application to debug and share the same PID namespace. Though they share the same PID namespace, KubeVirt's usage of unprivileged containers makes it, for example, impossible to ptrace a running container. Therefore, this technique isn't appropriate for our needs.
For security and reduced image size, KubeVirt container images are based on distroless containers. These kinds of images are extremely beneficial for deployments, but they are challenging to troubleshoot because there is no package management, which prevents the installation of additional tools on the fly.
Wrapping the QEMU binary in a script is one practical method for debugging QEMU launched by Libvirt. This script launches QEMU as a child of this process together with the debugging tool (such as strace or valgrind).
The final part that needs to be added is the configuration for Libvirt to use the wrapped script rather than calling the QEMU program directly.
It is possible to alter the generated XML with the help of KubeVirt sidecars. This allows us to use the wrapping script in place of the built-in emulator.
The primary concept behind this configuration is that all of the additional tools, scripts, and final output files will be stored in a PersistentVolumeClaim (PVC) that this guide refers to as debug-tools
. The virt-launcher pod that we wish to debug will have this PVC attached to it.
PVC:
apiVersion: v1\nkind: PersistentVolumeClaim\nmetadata:\n name: debug-tools\nspec:\n accessModes:\n - ReadWriteOnce\n volumeMode: Filesystem\n resources:\n requests:\n storage: 1Gi\n
In this guide, we'll apply the above concepts to debug QEMU inside virt-launcher using strace without the need to build a custom virt-launcher image.
You can see a full demo of this setup:
"},{"location":"debug_virt_stack/launch-qemu-strace/#how-to-bring-the-debug-tools-and-wrapping-script-into-distroless-containers","title":"How to bring the debug tools and wrapping script into distroless containers","text":"This section provides an example of how to provide extra tools into the distroless container that will be supplied as a PVC using a Dockerfile. Although there are several ways to accomplish this, this covers a relatively simple technique. Alternatively, you could run a pod and manually populate the PVC by execing into the pod.
Dockerfile:
FROM quay.io/centos/centos:stream9 as build\n\nENV DIR /debug-tools\nRUN mkdir -p ${DIR}/logs\n\nRUN yum install --installroot=${DIR} -y strace && yum clean all\n\nCOPY ./wrap_qemu_strace.sh $DIR/wrap_qemu_strace.sh\nRUN chmod 0755 ${DIR}/wrap_qemu_strace.sh\nRUN chown 107:107 ${DIR}/wrap_qemu_strace.sh\nRUN chown 107:107 ${DIR}/logs\n
The directory debug-tools
stores the content that will be later copied inside the debug-tools
PVC. We are essentially adding the missing utilities in the custom directory with yum install --installroot=${DIR}
, and the parent image matches the parent image of virt-launcher.
The wrap_qemu_strace.sh
is the wrapping script that will be used to launch QEMU with strace
similarly to the example with valgrind
.
#!/bin/bash\n\nLD_LIBRARY_PATH=$LD_LIBRARY_PATH:/var/run/debug/usr/lib64 /var/run/debug/usr/bin/strace \\\n -o /var/run/debug/logs/strace.out \\\n /usr/libexec/qemu-kvm \"$@\"\n
It is important to set the dynamic library path LD_LIBRARY_PATH
to the path where the PVC will be mounted in the virt-launcher container.
Then, you will simply need to build the image and your debug setup is ready. The Dockerfile and the script wrap_qemu_strace.sh
need to be in the same directory where you run the command.
$ podman build -t debug .\n
The second step is to populate the PVC. This can be easily achieved using a Kubernetes Job
like:
apiVersion: batch/v1\nkind: Job\nmetadata:\n name: populate-pvc\nspec:\n template:\n spec:\n volumes:\n - name: populate\n persistentVolumeClaim:\n claimName: debug-tools\n containers:\n - name: populate\n image: registry:5000/debug:latest\n command: [\"sh\", \"-c\", \"cp -r /debug-tools/* /vol\"]\n imagePullPolicy: Always\n volumeMounts:\n - mountPath: \"/vol\"\n name: populate\n restartPolicy: Never\n backoffLimit: 4\n
The image referenced in the Job
is the image we built in the previous step. Once this is applied and the job has completed, the debug-tools
PVC is ready to be used.
This part is achieved by using ConfigMaps and a KubeVirt sidecar (more details in the section Using ConfigMap to run custom script).
ConfigMap:
apiVersion: v1\nkind: ConfigMap\nmetadata:\n name: my-config-map\ndata:\n my_script.sh: |\n #!/bin/sh\n tempFile=`mktemp --dry-run`\n echo $4 > $tempFile\n sed -i \"s|<emulator>/usr/libexec/qemu-kvm</emulator>|<emulator>/var/run/debug/wrap_qemu_strace.sh</emulator>|\" $tempFile\n cat $tempFile\n
The script that replaces the QEMU binary with the wrapping script in the XML is stored in the configmap my-config-map
. This script will run as a hook, as explained in full in the documentation for the KubeVirt sidecar.
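The substitution performed by the hook script can be tried outside the cluster. The following is a minimal sketch in Python, assuming the domain XML contains the emulator element exactly as written in the sed expression above; replace_emulator and the sample XML are hypothetical and only mirror what the shell script does.

```python
ORIGINAL = '<emulator>/usr/libexec/qemu-kvm</emulator>'
WRAPPED = '<emulator>/var/run/debug/wrap_qemu_strace.sh</emulator>'


def replace_emulator(domain_xml):
    '''Return the domain XML with the QEMU binary replaced by the wrapping script.'''
    # str.replace substitutes every occurrence; a domain normally has one emulator.
    return domain_xml.replace(ORIGINAL, WRAPPED)


sample = '<domain><devices>' + ORIGINAL + '</devices></domain>'
print(replace_emulator(sample))
```

If the emulator path in the generated XML differs from the one hard-coded here, the text is returned unchanged, which is also how the sed expression behaves when its pattern does not match.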
Once all the objects are created, we can finally run the guest to debug.
VMI:
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n annotations:\n hooks.kubevirt.io/hookSidecars: '[{\"args\": [\"--version\", \"v1alpha2\"],\n \"image\":\"registry:5000/kubevirt/sidecar-shim:devel\",\n \"pvc\": {\"name\": \"debug-tools\",\"volumePath\": \"/debug\", \"sharedComputePath\": \"/var/run/debug\"},\n \"configMap\": {\"name\": \"my-config-map\",\"key\": \"my_script.sh\", \"hookPath\": \"/usr/bin/onDefineDomain\"}}]'\n labels:\n special: vmi-debug-tools\n name: vmi-debug-tools\nspec:\n domain:\n devices:\n disks:\n - disk:\n bus: virtio\n name: containerdisk\n - disk:\n bus: virtio\n name: cloudinitdisk\n rng: {}\n resources:\n requests:\n memory: 1024M\n terminationGracePeriodSeconds: 0\n volumes:\n - containerDisk:\n image: registry:5000/kubevirt/fedora-with-test-tooling-container-disk:devel\n name: containerdisk\n - cloudInitNoCloud:\n userData: |-\n #cloud-config\n password: fedora\n chpasswd: { expire: False }\n name: cloudinitdisk\n
The VMI example is a simple VM instance declaration; the interesting parts are the annotations for the hook: * image
refers to the sidecar-shim already built and shipped with KubeVirt * pvc
refers to the PVC populated with the debug setup. The name
refers to the claim name, the volumePath
is the path inside the sidecar container where the volume is mounted while the sharedComputePath
is the path of the same volume inside the compute container. * configMap
refers to the ConfigMap containing the script that modifies the XML to use the wrapping script
Once the VM is declared, the hook will modify the emulator section and Libvirt will call the wrapping script instead of QEMU directly.
"},{"location":"debug_virt_stack/launch-qemu-strace/#how-to-fetch-the-output","title":"How to fetch the output","text":"The wrapping script configures strace
to store the output in the PVC. This makes it possible to retrieve the output file at a later time, for example using an additional pod like:
apiVersion: v1\nkind: Pod\nmetadata:\n name: fetch-logs\nspec:\n securityContext:\n runAsUser: 107\n fsGroup: 107\n volumes:\n - name: populate\n persistentVolumeClaim:\n claimName: debug-tools\n containers:\n - name: populate\n image: busybox:latest\n command: [\"tail\", \"-f\", \"/dev/null\"]\n volumeMounts:\n - mountPath: \"/vol\"\n name: populate\n
It is then possible to copy the file locally with:
$ kubectl cp fetch-logs:/vol/logs/strace.out strace.out\n
"},{"location":"debug_virt_stack/logging/","title":"Control libvirt logging for each component","text":"Generally, cluster admins can control the log verbosity of each KubeVirt component in the KubeVirt CR. For more details, please check the KubeVirt documentation.
Nonetheless, regular users can also adjust the qemu component logging to have finer control over it. The annotation kubevirt.io/libvirt-log-filters
enables you to modify each component's log level.
Example:
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n annotations:\n kubevirt.io/libvirt-log-filters: \"2:qemu.qemu_monitor 3:*\"\n labels:\n special: vmi-debug-tools\n name: vmi-debug-tools\nspec:\n domain:\n devices:\n disks:\n - disk:\n bus: virtio\n name: containerdisk\n - disk:\n bus: virtio\n name: cloudinitdisk\n rng: {}\n resources:\n requests:\n memory: 1024M\n volumes:\n - containerDisk:\n image: registry:5000/kubevirt/fedora-with-test-tooling-container-disk:devel\n name: containerdisk\n - cloudInitNoCloud:\n userData: |-\n #cloud-config\n password: fedora\n chpasswd: { expire: False }\n name: cloudinitdisk\n
Then, it is possible to obtain the logs from the virt-launcher output:
$ kubectl get pods\nNAME READY STATUS RESTARTS AGE\nvirt-launcher-vmi-debug-tools-fk64q 3/3 Running 0 64s\n$ kubectl logs virt-launcher-vmi-debug-tools-fk64q\n[..]\n{\"component\":\"virt-launcher\",\"level\":\"info\",\"msg\":\"QEMU_MONITOR_RECV_EVENT: mon=0x7faa8801f5d0 event={\\\"timestamp\\\": {\\\"seconds\\\": 1698324640, \\\"microseconds\\\": 523652}, \\\"event\\\": \\\"NIC_RX_FILTER_CHANGED\\\", \\\"data\\\": {\\\"name\\\": \\\"ua-default\\\", \\\"path\\\": \\\"/machine/peripheral/ua-default/virtio-backend\\\"}}\",\"pos\":\"qemuMonitorJSONIOProcessLine:205\",\"subcomponent\":\"libvirt\",\"thread\":\"80\",\"timestamp\":\"2023-10-26T12:50:40.523000Z\"}\n{\"component\":\"virt-launcher\",\"level\":\"info\",\"msg\":\"QEMU_MONITOR_RECV_EVENT: mon=0x7faa8801f5d0 event={\\\"timestamp\\\": {\\\"seconds\\\": 1698324644, \\\"microseconds\\\": 165626}, \\\"event\\\": \\\"VSERPORT_CHANGE\\\", \\\"data\\\": {\\\"open\\\": true, \\\"id\\\": \\\"channel0\\\"}}\",\"pos\":\"qemuMonitorJSONIOProcessLine:205\",\"subcomponent\":\"libvirt\",\"thread\":\"80\",\"timestamp\":\"2023-10-26T12:50:44.165000Z\"}\n[..]\n{\"component\":\"virt-launcher\",\"level\":\"info\",\"msg\":\"QEMU_MONITOR_RECV_EVENT: mon=0x7faa8801f5d0 event={\\\"timestamp\\\": {\\\"seconds\\\": 1698324646, \\\"microseconds\\\": 707666}, \\\"event\\\": \\\"RTC_CHANGE\\\", \\\"data\\\": {\\\"offset\\\": 0, \\\"qom-path\\\": \\\"/machine/unattached/device[8]\\\"}}\",\"pos\":\"qemuMonitorJSONIOProcessLine:205\",\"subcomponent\":\"libvirt\",\"thread\":\"80\",\"timestamp\":\"2023-10-26T12:50:46.708000Z\"}\n[..]\n
The annotation enables the filter from container creation onwards. However, in certain cases you might want to change the logging level dynamically once the container and libvirt have already been started. In this case, virt-admin
comes to the rescue.
Example:
$ kubectl get pods\nNAME READY STATUS RESTARTS AGE\nvirt-launcher-vmi-ephemeral-nqcld 3/3 Running 0 26m\n$ kubectl exec -ti virt-launcher-vmi-ephemeral-nqcld -- virt-admin -c virtqemud:///session daemon-log-filters \"1:libvirt 1:qemu 1:conf 1:security 3:event 3:json 3:file 3:object 1:util\"\n$ kubectl exec -ti virt-launcher-vmi-ephemeral-nqcld -- virt-admin -c virtqemud:///session daemon-log-filters\n Logging filters: 1:*libvirt* 1:*qemu* 1:*conf* 1:*security* 3:*event* 3:*json* 3:*file* 3:*object* 1:*util*\n
Otherwise, if you prefer to redirect the output to a file and fetch it later, you can rely on kubectl cp
to retrieve the file. In this case, we are saving the file in the /var/run/libvirt
directory because the compute container has the permissions to write there.
Example:
$ kubectl get pods\nNAME READY STATUS RESTARTS AGE\nvirt-launcher-vmi-ephemeral-nqcld 3/3 Running 0 26m\n$ kubectl exec -ti virt-launcher-vmi-ephemeral-nqcld -- virt-admin -c virtqemud:///session daemon-log-outputs \"1:file:/var/run/libvirt/libvirtd.log\"\n$ kubectl cp virt-launcher-vmi-ephemeral-nqcld:/var/run/libvirt/libvirtd.log libvirt-kubevirt.log\ntar: Removing leading `/' from member names\n
"},{"location":"debug_virt_stack/privileged-node-debugging/","title":"Privileged debugging on the node","text":"This article describes the scenarios in which you can create privileged pods and have root access to the cluster nodes.
With privileged pods, you may access devices in /dev
, utilize host namespaces and ptrace processes that are running on the node, and use the hostPath
volume to mount node directories in the container.
A quick way to verify if you are allowed to create privileged pods is to create a sample pod with the --dry-run=server
option, like:
$ kubectl apply -f debug-pod.yaml --dry-run=server\n
"},{"location":"debug_virt_stack/privileged-node-debugging/#build-the-container-image","title":"Build the container image","text":"KubeVirt uses distroless containers, and those images don't have a package manager. For this reason, it isn't possible to use the image as a parent for installing additional packages.
In certain debugging scenarios, the tools need exactly the same binary to be available. However, if the debug tools are operating in a different container, this can be especially difficult as the filesystems of the containers are isolated.
This section will cover how to build a container image with the debug tools plus binaries of the KubeVirt version you want to debug.
Depending on your installation, the namespace and the name of the KubeVirt CR could vary. In this example, we'll assume that the KubeVirt CR is called kubevirt
and installed in the kubevirt
namespace. You can find out what it is called in your cluster with kubectl get kubevirt -A
. This is necessary as we need to retrieve the original virt-launcher
image to have exactly the same QEMU binary we want to debug.
Get the registry of the images of the KubeVirt installation:
$ export registry=$(kubectl get kubevirt kubevirt -n kubevirt -o jsonpath='{.status.observedDeploymentConfig}' |jq '.registry'|tr -d \"\\\"\")\n$ echo $registry\n\"registry:5000/kubevirt\"\n
Get the shasum of the virt-launcher image:
$ export tag=$(kubectl get kubevirt kubevirt -n kubevirt -o jsonpath='{.status.observedDeploymentConfig}' |jq '.virtLauncherSha'|tr -d \"\\\"\")\n$ echo $tag\n\"sha256:6c8b85eed8e83a4c70779836b246c057d3e882eb513f3ded0a02e0a4c4bda837\"\n
Dockerfile:
ARG registry\nARG tag\nFROM ${registry}/kubevirt/virt-launcher${tag} AS launcher\n\nFROM quay.io/centos/centos:stream9\n\nRUN yum install -y \\\n gdb \\\n kernel-devel \\\n qemu-kvm-tools \\\n strace \\\n systemtap-client \\\n systemtap-devel \\\n && yum clean all\nCOPY --from=launcher / /\n
Then, we can build the image by using the registry
and the tag
retrieved in the previous steps:
$ podman build \\\n -t debug-tools \\\n --build-arg registry=$registry \\\n --build-arg tag=@$tag \\\n -f Dockerfile .\n
"},{"location":"debug_virt_stack/privileged-node-debugging/#deploy-the-privileged-debug-pod","title":"Deploy the privileged debug pod","text":"This is an example that gives you a couple of suggestions on how you can define your debugging pod:
apiVersion: v1\nkind: Pod\nmetadata:\n name: node01-debug\nspec:\n containers:\n - command:\n - /bin/sh\n image: registry:5000/debug-tools:latest\n imagePullPolicy: Always\n name: debug\n securityContext:\n privileged: true\n runAsUser: 0\n stdin: true\n stdinOnce: true\n tty: true\n volumeMounts:\n - mountPath: /host\n name: host\n - mountPath: /usr/lib/modules\n name: modules\n - mountPath: /sys/kernel\n name: sys-kernel\n hostNetwork: true\n hostPID: true\n nodeName: node01\n restartPolicy: Never\n volumes:\n - hostPath:\n path: /\n type: Directory\n name: host\n - hostPath:\n path: /usr/lib/modules\n type: Directory\n name: modules\n - hostPath:\n path: /sys/kernel\n type: Directory\n name: sys-kernel\n
The privileged
option is required to have access to almost all the resources on the node.
The nodeName
ensures that the debugging pod will be scheduled on the desired node. In order to select the right node, you can use the -owide
option with kubectl get po
, which reports the node where each pod is running.
Example:
$ kubectl get pods -owide\nNAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES\nlocal-volume-provisioner-4jtkb 1/1 Running 0 152m 10.244.196.129 node01 <none> <none>\nnode01-debug 1/1 Running 0 44m 192.168.66.101 node01 <none> <none>\nvirt-launcher-vmi-ephemeral-xg98p 3/3 Running 0 2m54s 10.244.196.148 node01 <none> 1/1\n
In the volumes
section, you can specify the directories you want to be directly mounted in the debugging container. For example, /usr/lib/modules
is particularly useful if you need to load some kernel modules.
Sharing the host pid namespace with the option hostPID
allows you to see all the processes on the node and attach to them with tools like gdb
and strace
.
exec
-ing into the pod gives you a shell with privileged access to the node plus the tooling you installed into the image:
$ kubectl exec -ti node01-debug -- bash\n
The following examples assume you have already execed into the node01-debug
pod.
The tool virt-host-validate
is a utility that validates whether the host can run the libvirt hypervisor. For example, it can be used to check whether a particular node is KVM capable.
Example:
$ virt-host-validate\n QEMU: Checking for hardware virtualization : PASS\n QEMU: Checking if device /dev/kvm exists : PASS\n QEMU: Checking if device /dev/kvm is accessible : PASS\n QEMU: Checking if device /dev/vhost-net exists : PASS\n QEMU: Checking if device /dev/net/tun exists : PASS\n QEMU: Checking for cgroup 'cpu' controller support : PASS\n QEMU: Checking for cgroup 'cpuacct' controller support : PASS\n QEMU: Checking for cgroup 'cpuset' controller support : PASS\n QEMU: Checking for cgroup 'memory' controller support : PASS\n QEMU: Checking for cgroup 'devices' controller support : PASS\n QEMU: Checking for cgroup 'blkio' controller support : PASS\n QEMU: Checking for device assignment IOMMU support : PASS\n QEMU: Checking if IOMMU is enabled by kernel : PASS\n QEMU: Checking for secure guest support : WARN (Unknown if this platform has Secure\n
"},{"location":"debug_virt_stack/privileged-node-debugging/#run-a-command-directly-on-the-node","title":"Run a command directly on the node","text":"The debug container has in the volume section the host filesystem mounted under /host
. This can be particularly useful if you want to access the node filesystem or execute a command directly on the host. However, the tool must already be present on the node.
# chroot /host\nsh-5.1# cat /etc/os-release\nNAME=\"CentOS Stream\"\nVERSION=\"9\"\nID=\"centos\"\nID_LIKE=\"rhel fedora\"\nVERSION_ID=\"9\"\nPLATFORM_ID=\"platform:el9\"\nPRETTY_NAME=\"CentOS Stream 9\"\nANSI_COLOR=\"0;31\"\nLOGO=\"fedora-logo-icon\"\nCPE_NAME=\"cpe:/o:centos:centos:9\"\nHOME_URL=\"https://centos.org/\"\nBUG_REPORT_URL=\"https://bugzilla.redhat.com/\"\nREDHAT_SUPPORT_PRODUCT=\"Red Hat Enterprise Linux 9\"\nREDHAT_SUPPORT_PRODUCT_VERSION=\"CentOS Stream\"\n
"},{"location":"debug_virt_stack/privileged-node-debugging/#attach-to-a-running-process-eg-strace-or-gdb","title":"Attach to a running process (e.g strace or gdb)","text":"This requires the field hostPID: true
so that you can list all the processes running on the node.
$ ps -ef |grep qemu-kvm\nqemu 50122 49850 0 12:34 ? 00:00:25 /usr/libexec/qemu-kvm -name guest=default_vmi-ephemeral,debug-threads=on -S -object {\"qom-type\":\"secret\",\"id\":\"masterKey0\",\"format\":\"raw\",\"file\":\"/var/run/kubevirt-private/libvirt/qemu/lib/domain-1-default_vmi-ephemera/master-key.aes\"} -machine pc-q35-rhel9.2.0,usb=off,dump-guest-core=off,memory-backend=pc.ram,acpi=on -accel kvm -cpu Skylake-Client-IBRS,ss=on,vmx=on,pdcm=on,hypervisor=on,tsc-adjust=on,clflushopt=on,umip=on,md-clear=on,stibp=on,flush-l1d=on,arch-capabilities=on,ssbd=on,xsaves=on,pdpe1gb=on,ibpb=on,ibrs=on,amd-stibp=on,amd-ssbd=on,rdctl-no=on,ibrs-all=on,skip-l1dfl-vmentry=on,mds-no=on,pschange-mc-no=on,tsx-ctrl=on,fb-clear=on,hle=off,rtm=off -m size=131072k -object {\"qom-type\":\"memory-backend-ram\",\"id\":\"pc.ram\",\"size\":134217728} -overcommit mem-lock=off -smp 1,sockets=1,dies=1,cores=1,threads=1 -object {\"qom-type\":\"iothread\",\"id\":\"iothread1\"} -uuid b56f06f0-07e9-4fe5-8913-18a14e83a4d1 -smbios type=1,manufacturer=KubeVirt,product=None,uuid=b56f06f0-07e9-4fe5-8913-18a14e83a4d1,family=KubeVirt -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=21,server=on,wait=off -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device {\"driver\":\"pcie-root-port\",\"port\":16,\"chassis\":1,\"id\":\"pci.1\",\"bus\":\"pcie.0\",\"multifunction\":true,\"addr\":\"0x2\"} -device {\"driver\":\"pcie-root-port\",\"port\":17,\"chassis\":2,\"id\":\"pci.2\",\"bus\":\"pcie.0\",\"addr\":\"0x2.0x1\"} -device {\"driver\":\"pcie-root-port\",\"port\":18,\"chassis\":3,\"id\":\"pci.3\",\"bus\":\"pcie.0\",\"addr\":\"0x2.0x2\"} -device {\"driver\":\"pcie-root-port\",\"port\":19,\"chassis\":4,\"id\":\"pci.4\",\"bus\":\"pcie.0\",\"addr\":\"0x2.0x3\"} -device {\"driver\":\"pcie-root-port\",\"port\":20,\"chassis\":5,\"id\":\"pci.5\",\"bus\":\"pcie.0\",\"addr\":\"0x2.0x4\"} -device 
{\"driver\":\"pcie-root-port\",\"port\":21,\"chassis\":6,\"id\":\"pci.6\",\"bus\":\"pcie.0\",\"addr\":\"0x2.0x5\"} -device {\"driver\":\"pcie-root-port\",\"port\":22,\"chassis\":7,\"id\":\"pci.7\",\"bus\":\"pcie.0\",\"addr\":\"0x2.0x6\"} -device {\"driver\":\"pcie-root-port\",\"port\":23,\"chassis\":8,\"id\":\"pci.8\",\"bus\":\"pcie.0\",\"addr\":\"0x2.0x7\"} -device {\"driver\":\"pcie-root-port\",\"port\":24,\"chassis\":9,\"id\":\"pci.9\",\"bus\":\"pcie.0\",\"addr\":\"0x3\"} -device {\"driver\":\"virtio-scsi-pci-non-transitional\",\"id\":\"scsi0\",\"bus\":\"pci.5\",\"addr\":\"0x0\"} -device {\"driver\":\"virtio-serial-pci-non-transitional\",\"id\":\"virtio-serial0\",\"bus\":\"pci.6\",\"addr\":\"0x0\"} -blockdev {\"driver\":\"file\",\"filename\":\"/var/run/kubevirt/container-disks/disk_0.img\",\"node-name\":\"libvirt-2-storage\",\"cache\":{\"direct\":true,\"no-flush\":false},\"auto-read-only\":true,\"discard\":\"unmap\"} -blockdev {\"node-name\":\"libvirt-2-format\",\"read-only\":true,\"discard\":\"unmap\",\"cache\":{\"direct\":true,\"no-flush\":false},\"driver\":\"qcow2\",\"file\":\"libvirt-2-storage\"} -blockdev {\"driver\":\"file\",\"filename\":\"/var/run/kubevirt-ephemeral-disks/disk-data/containerdisk/disk.qcow2\",\"node-name\":\"libvirt-1-storage\",\"cache\":{\"direct\":true,\"no-flush\":false},\"auto-read-only\":true,\"discard\":\"unmap\"} -blockdev {\"node-name\":\"libvirt-1-format\",\"read-only\":false,\"discard\":\"unmap\",\"cache\":{\"direct\":true,\"no-flush\":false},\"driver\":\"qcow2\",\"file\":\"libvirt-1-storage\",\"backing\":\"libvirt-2-format\"} -device {\"driver\":\"virtio-blk-pci-non-transitional\",\"bus\":\"pci.7\",\"addr\":\"0x0\",\"drive\":\"libvirt-1-format\",\"id\":\"ua-containerdisk\",\"bootindex\":1,\"write-cache\":\"on\",\"werror\":\"stop\",\"rerror\":\"stop\"} -netdev {\"type\":\"tap\",\"fd\":\"22\",\"vhost\":true,\"vhostfd\":\"24\",\"id\":\"hostua-default\"} -device 
{\"driver\":\"virtio-net-pci-non-transitional\",\"host_mtu\":1480,\"netdev\":\"hostua-default\",\"id\":\"ua-default\",\"mac\":\"7e:cb:ba:c3:71:88\",\"bus\":\"pci.1\",\"addr\":\"0x0\",\"romfile\":\"\"} -add-fd set=0,fd=20,opaque=serial0-log -chardev socket,id=charserial0,fd=18,server=on,wait=off,logfile=/dev/fdset/0,logappend=on -device {\"driver\":\"isa-serial\",\"chardev\":\"charserial0\",\"id\":\"serial0\",\"index\":0} -chardev socket,id=charchannel0,fd=19,server=on,wait=off -device {\"driver\":\"virtserialport\",\"bus\":\"virtio-serial0.0\",\"nr\":1,\"chardev\":\"charchannel0\",\"id\":\"channel0\",\"name\":\"org.qemu.guest_agent.0\"} -audiodev {\"id\":\"audio1\",\"driver\":\"none\"} -vnc vnc=unix:/var/run/kubevirt-private/3a8f7774-7ec7-4cfb-97ce-581db52ee053/virt-vnc,audiodev=audio1 -device {\"driver\":\"VGA\",\"id\":\"video0\",\"vgamem_mb\":16,\"bus\":\"pcie.0\",\"addr\":\"0x1\"} -global ICH9-LPC.noreboot=off -watchdog-action reset -device {\"driver\":\"virtio-balloon-pci-non-transitional\",\"id\":\"balloon0\",\"free-page-reporting\":true,\"bus\":\"pci.8\",\"addr\":\"0x0\"} -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on\n$ gdb -p 50122 /usr/libexec/qemu-kvm\n
"},{"location":"debug_virt_stack/privileged-node-debugging/#debugging-using-crictl","title":"Debugging using crictl
","text":"Crictl
is a CLI for CRI runtimes and can be particularly useful to troubleshoot container failures (for a more detailed guide, please refer to this Kubernetes article).
In this example, we'll concentrate on finding where libvirt creates its files and directories in the compute
container of the virt-launcher pod.
$ crictl ps |grep compute\n67bc7be3222da 5ef5ba25a087a80e204f28be6c9250bbf378fd87fa927085abd516188993d695 25 minutes ago Running compute 0 7b045ea9f485f virt-launcher-vmi-ephemeral-xg98p\n$ crictl inspect 67bc7be3222da\n[..]\n \"mounts\": [\n {\n {\n \"containerPath\": \"/var/run/libvirt\",\n \"hostPath\": \"/var/lib/kubelet/pods/2ccc3e93-d1c3-4f22-bb31-321bfa74edf6/volumes/kubernetes.io~empty-dir/libvirt-runtime\",\n \"propagation\": \"PROPAGATION_PRIVATE\",\n \"readonly\": false,\n \"selinuxRelabel\": true\n },\n[..]\n$ ls /var/lib/kubelet/pods/2ccc3e93-d1c3-4f22-bb31-321bfa74edf6/volumes/kubernetes.io~empty-dir/libvirt-runtime/\ncommon qemu virtlogd-sock virtqemud-admin-sock virtqemud.conf\nhostdevmgr virtlogd-admin-sock virtlogd.pid virtqemud-sock virtqemud.pid\n
"},{"location":"debug_virt_stack/virsh-commands/","title":"Execute virsh commands in virt-launcher pod","text":"A powerful utility to check and troubleshoot the VM state is virsh
, which is already installed in the compute
container of the virt-launcher pod.
For example, it is possible to run any QMP command.
For a full list of QMP commands, please refer to the QEMU documentation.
$ kubectl get po\nNAME READY STATUS RESTARTS AGE\nvirt-launcher-vmi-ephemeral-xg98p 3/3 Running 0 44m\n$ kubectl exec -ti virt-launcher-vmi-debug-tools-fk64q -- bash\nbash-5.1$ virsh list\n Id Name State\n-----------------------------------------\n 1 default_vmi-debug-tools running\nbash-5.1$ virsh qemu-monitor-command default_vmi-debug-tools query-status --pretty\n{\n \"return\": {\n \"status\": \"running\",\n \"singlestep\": false,\n \"running\": true\n },\n \"id\": \"libvirt-439\"\n}\n$ virsh qemu-monitor-command default_vmi-debug-tools query-kvm --pretty\n{\n \"return\": {\n \"enabled\": true,\n \"present\": true\n },\n \"id\": \"libvirt-438\"\n}\n
Another useful virsh command is qemu-monitor-event
. Once invoked, it observes and reports the QEMU events.
The following example shows the events generated for pausing and unpausing the guest.
$ kubectl get po\nNAME READY STATUS RESTARTS AGE\nvirt-launcher-vmi-ephemeral-nqcld 3/3 Running 0 57m\n$ kubectl exec -ti virt-launcher-vmi-ephemeral-nqcld -- virsh qemu-monitor-event --pretty --loop\n
Then, you can, for example, pause and then unpause the guest and check the triggered events:
$ virtctl pause vmi vmi-ephemeral\nVMI vmi-ephemeral was scheduled to pause\n$ virtctl unpause vmi vmi-ephemeral\nVMI vmi-ephemeral was scheduled to unpause\n
From the monitored events:
$ kubectl exec -ti virt-launcher-vmi-ephemeral-nqcld -- virsh qemu-monitor-event --pretty --loop\nevent STOP at 1698405797.422823 for domain 'default_vmi-ephemeral': <null>\nevent RESUME at 1698405823.162458 for domain 'default_vmi-ephemeral': <null>\n
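The event timestamps in this output are Unix epoch seconds with a fractional part. The following is a minimal sketch for turning them into readable UTC times. It assumes a date implementation that accepts -d @SECONDS (GNU coreutils does), and the helper name event_time is hypothetical, not part of virsh or KubeVirt:

```shell
# Hypothetical helper: convert a QEMU event timestamp (epoch seconds,
# optionally with a fractional part) into an ISO 8601 UTC string.
event_time() {
  # Strip the fractional part first; assumes `date -d @SECONDS` is supported.
  date -u -d "@${1%.*}" +%Y-%m-%dT%H:%M:%SZ
}

event_time 1698405797.422823   # STOP event from the output above
event_time 1698405823.162458   # RESUME event from the output above
```

This can help when correlating the events with the timestamps found in the virt-launcher logs.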
"},{"location":"network/dns/","title":"DNS records","text":"In order to create unique DNS records per VirtualMachineInstance, it is possible to set spec.hostname
and spec.subdomain
. If a subdomain is set and a headless service whose name matches the subdomain exists, kube-dns will create unique DNS entries for every VirtualMachineInstance which matches the selector of the service. Have a look at the DNS for Services and Pods documentation for additional information.
The following example consists of a VirtualMachineInstance and a headless Service which matches the labels and the subdomain of the VirtualMachineInstance:
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n name: vmi-fedora\n labels:\n expose: me\nspec:\n hostname: \"myvmi\"\n subdomain: \"mysubdomain\"\n domain:\n devices:\n disks:\n - disk:\n bus: virtio\n name: containerdisk\n - disk:\n bus: virtio\n name: cloudinitdisk\n resources:\n requests:\n memory: 1024M\n terminationGracePeriodSeconds: 0\n volumes:\n - name: containerdisk\n containerDisk:\n image: kubevirt/fedora-cloud-registry-disk-demo:latest\n - cloudInitNoCloud:\n userDataBase64: IyEvYmluL2Jhc2gKZWNobyAiZmVkb3JhOmZlZG9yYSIgfCBjaHBhc3N3ZAo=\n name: cloudinitdisk\n---\napiVersion: v1\nkind: Service\nmetadata:\n name: mysubdomain\nspec:\n selector:\n expose: me\n clusterIP: None\n ports:\n - name: foo # Actually, no port is needed.\n port: 1234\n targetPort: 1234\n
As a consequence, when we enter the VirtualMachineInstance via e.g. virtctl console vmi-fedora
and ping myvmi.mysubdomain
, we find a DNS entry for myvmi.mysubdomain.default.svc.cluster.local
which points to 10.244.0.57
, which is the IP of the VirtualMachineInstance (not of the Service):
[fedora@myvmi ~]$ ping myvmi.mysubdomain\nPING myvmi.mysubdomain.default.svc.cluster.local (10.244.0.57) 56(84) bytes of data.\n64 bytes from myvmi.mysubdomain.default.svc.cluster.local (10.244.0.57): icmp_seq=1 ttl=64 time=0.029 ms\n[fedora@myvmi ~]$ ip a\n2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000\n link/ether 0a:58:0a:f4:00:39 brd ff:ff:ff:ff:ff:ff\n inet 10.244.0.57/24 brd 10.244.0.255 scope global dynamic eth0\n valid_lft 86313556sec preferred_lft 86313556sec\n inet6 fe80::858:aff:fef4:39/64 scope link\n valid_lft forever preferred_lft forever\n
So spec.hostname
and spec.subdomain
get translated to a DNS A-record of the form <vmi.spec.hostname>.<vmi.spec.subdomain>.<vmi.metadata.namespace>.svc.cluster.local
. If no spec.hostname
is set, then we fall back to the VirtualMachineInstance name itself. The resulting DNS A-record then looks like this: <vmi.metadata.name>.<vmi.spec.subdomain>.<vmi.metadata.namespace>.svc.cluster.local
.
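The naming rule above, including the fallback to the VMI name, can be sketched as a small shell function. This is only an illustration: the function name vmi_fqdn and its argument order are hypothetical, and the cluster domain is assumed to be cluster.local as in the example output:

```shell
# Hypothetical helper: build the DNS A-record name for a VMI.
# Arguments: <vmi name> <spec.hostname, may be empty> <spec.subdomain> <namespace>
vmi_fqdn() {
  vmi_name=$1
  hostname=$2
  subdomain=$3
  namespace=$4
  # Fall back to the VMI name when spec.hostname is not set.
  host=${hostname:-$vmi_name}
  # Assumes the default cluster domain "cluster.local".
  printf '%s.%s.%s.svc.cluster.local\n' "$host" "$subdomain" "$namespace"
}

vmi_fqdn vmi-fedora myvmi mysubdomain default   # spec.hostname is set
vmi_fqdn vmi-fedora "" mysubdomain default      # falls back to the VMI name
```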
Release: - v1.1.0: Alpha - v1.3.0: Beta
KubeVirt supports hotplugging and unplugging network interfaces into a running Virtual Machine (VM).
Hotplug is supported for interfaces using the virtio
model connected through bridge binding or SR-IOV binding.
Hot-unplug is supported only for interfaces connected through bridge binding.
"},{"location":"network/hotplug_interfaces/#requirements","title":"Requirements","text":"Adding an interface to a KubeVirt Virtual Machine requires that an interface first be added to a running pod. This is not trivial, and has some requirements:
Network interface hotplug support must be enabled via a feature gate. The feature gates array in the KubeVirt CR must include HotplugNICs
.
First start a VM. You can refer to the following example:
---\napiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n name: vm-fedora\nspec:\n running: true\n template:\n spec:\n domain:\n devices:\n disks:\n - disk:\n bus: virtio\n name: containerdisk\n interfaces:\n - masquerade: {}\n name: defaultnetwork\n rng: {}\n resources:\n requests:\n memory: 1024M\n networks:\n - name: defaultnetwork\n pod: {}\n terminationGracePeriodSeconds: 0\n volumes:\n - containerDisk:\n image: quay.io/kubevirt/fedora-with-test-tooling-container-disk:devel\n name: containerdisk\n
You should configure a network attachment definition, which holds the pod interface configuration. The snippet below shows an example of a very simple one:
apiVersion: k8s.cni.cncf.io/v1\nkind: NetworkAttachmentDefinition\nmetadata:\n name: new-fancy-net\nspec:\n config: '{\n \"cniVersion\": \"0.3.1\",\n \"type\": \"bridge\",\n \"mtu\": 1300,\n \"name\":\"new-fancy-net\"\n }'\n
Please refer to the Multus documentation for more information. Once the virtual machine is running, and the attachment configuration provisioned, the user can request the interface hotplug operation by editing the VM spec template and adding the desired interface and network:
apiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n name: vm-fedora\ntemplate:\n spec:\n domain:\n devices:\n interfaces:\n - name: defaultnetwork\n masquerade: {}\n # new interface\n - name: dyniface1\n bridge: {}\n networks:\n - name: defaultnetwork\n pod: {}\n # new network\n - name: dyniface1\n multus:\n networkName: new-fancy-net\n ...\n
Note: virtctl
addinterface
and removeinterface
commands are no longer available; interfaces are hotplugged and unplugged by editing the VM spec template.
KubeVirt will also add the interface and network to the corresponding VMI object.
You can now check the VMI status for the presence of this new interface:
kubectl get vmi vm-fedora -ojsonpath=\"{ @.status.interfaces }\"\n
"},{"location":"network/hotplug_interfaces/#removing-an-interface-from-a-running-vm","title":"Removing an interface from a running VM","text":"Following the example above, the user can request an interface unplug operation by editing the VM spec template and setting the desired interface state to absent
:
apiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n name: vm-fedora\ntemplate:\n spec:\n domain:\n devices:\n interfaces:\n - name: defaultnetwork\n masquerade: {}\n # set the interface state to absent \n - name: dyniface1\n state: absent\n bridge: {}\n networks:\n - name: defaultnetwork\n pod: {}\n - name: dyniface1\n multus:\n networkName: new-fancy-net\n
KubeVirt will also set the interface state to 'absent' in the corresponding VMI object. Note: Existing VMs from version v0.59.0 and below do not support interface hot-unplug.
"},{"location":"network/hotplug_interfaces/#migration-based-hotplug","title":"Migration based hotplug","text":"If your cluster doesn't run Multus as a thick plugin together with the Multus Dynamic Networks controller, it's possible to hotplug an interface by migrating the VM.
The actual attachment won't take place immediately, and the new interface will be available in the guest once the migration is completed.
"},{"location":"network/hotplug_interfaces/#add-new-interface","title":"Add new interface","text":"Add the desired interface and network to the VM spec template:
apiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n name: vm-fedora\ntemplate:\n spec:\n domain:\n devices:\n interfaces:\n - name: defaultnetwork\n masquerade: {}\n # new interface\n - name: dyniface1\n bridge: {}\n networks:\n - name: defaultnetwork\n pod: {}\n # new network\n - name: dyniface1\n multus:\n networkName: new-fancy-net\n ...\n
At this point the interface and network will be added to the corresponding VMI object as well, but won't be attached to the guest.
"},{"location":"network/hotplug_interfaces/#migrate-the-vm","title":"Migrate the VM","text":"cat <<EOF | kubectl apply -f -\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstanceMigration\nmetadata:\n name: migration-job\nspec:\n vmiName: vm-fedora\nEOF\n
Please refer to the Live Migration documentation for more information. Once the migration is completed the VM will have the new interface attached.
Note: It is recommended to avoid performing migrations in parallel to a hotplug operation. It is safer to ensure that the hotplug succeeded, or at least reached the VMI specification, before issuing a migration.
"},{"location":"network/hotplug_interfaces/#remove-interface","title":"Remove interface","text":"Set the desired interface state to absent
in the VM spec template:
apiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n name: vm-fedora\ntemplate:\n spec:\n domain:\n devices:\n interfaces:\n - name: defaultnetwork\n masquerade: {}\n # set the interface state to absent \n - name: dyniface1\n state: absent\n bridge: {}\n networks:\n - name: defaultnetwork\n pod: {}\n - name: dyniface1\n multus:\n networkName: new-fancy-net\n
At this point the interface should be detached from the guest but still exist in the pod.
Note: Existing VMs from version v0.59.0 and below do not support interface hot-unplug.
"},{"location":"network/hotplug_interfaces/#migrate-the-vm_1","title":"Migrate the VM","text":"cat <<EOF | kubectl apply -f -\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstanceMigration\nmetadata:\n name: migration-job\nspec:\n vmiName: vm-fedora\nEOF\n
Please refer to the Live Migration documentation for more information. Once the VM is migrated, the interface will not exist in the migration target pod.
Note: It is recommended to avoid performing migrations in parallel to an unplug operation. It is safer to ensure that the unplug succeeded, or at least reached the VMI specification, before issuing a migration.
"},{"location":"network/hotplug_interfaces/#sr-iov-interfaces","title":"SR-IOV interfaces","text":"KubeVirt supports hot-plugging of SR-IOV interfaces to running VMs.
Similar to bridge binding interfaces, edit the VM spec template and add the desired SR-IOV interface and network:
apiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n name: vm-fedora\ntemplate:\n spec:\n domain:\n devices:\n interfaces:\n - name: defaultnetwork\n masquerade: {}\n # new interface\n - name: sriov-net\n sriov: {}\n networks:\n - name: defaultnetwork\n pod: {}\n # new network\n - name: sriov-net\n multus:\n networkName: sriov-net-1\n ...\n
Please refer to the Interface and Networks documentation for more information about SR-IOV networking. At this point the interface and network will be added to the corresponding VMI object as well, but won't be attached to the guest.
"},{"location":"network/hotplug_interfaces/#migrate-the-vm_2","title":"Migrate the VM","text":"cat <<EOF | kubectl apply -f -\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstanceMigration\nmetadata:\n name: migration-job\nspec:\n vmiName: vm-fedora\nEOF\n
Please refer to the Live Migration documentation for more information. Once the migration is completed the VM will have the new interface attached. Because the Kubernetes device plugin API cannot allocate resources dynamically, the SR-IOV device plugin cannot allocate additional SR-IOV resources for KubeVirt to hotplug. Thus, SR-IOV interface hotplug is limited to migration based hotplug only, regardless of Multus \"thick\" version.
"},{"location":"network/hotplug_interfaces/#virtio-limitations","title":"Virtio Limitations","text":"The hotplugged interfaces have model: virtio
. This imposes several limitations: each interface consumes a PCI slot in the VM, and at most 32 slots are available in total. Furthermore, other devices also use these PCI slots (e.g. disks, guest-agent, etc.).
KubeVirt reserves resources for 4 interfaces to allow later hotplug operations. The actual maximum number of available resources depends on the machine type (e.g. q35 adds another PCI slot). For more information on maximum limits, see the libvirt documentation.
Yet, upon a VM restart, the hotplugged interface will become part of the standard networks; this mitigates the maximum hotplug interfaces (per machine type) limitation.
Note: The user can execute this command against a stopped VM - i.e. a VM without an associated VMI. When this happens, KubeVirt mutates the VM spec template on behalf of the user.
"},{"location":"network/interfaces_and_networks/","title":"Interfaces and Networks","text":"Connecting a virtual machine to a network consists of two parts. First, networks are specified in spec.networks
. Then, interfaces backed by the networks are added to the VM by specifying them in spec.domain.devices.interfaces
.
Each interface must have a corresponding network with the same name.
An interface
defines a virtual network interface of a virtual machine (also called a frontend). A network
specifies the backend of an interface
and declares which logical or physical device it is connected to (also called a backend).
There are multiple ways of configuring an interface
as well as a network
.
All possible configuration options are available in the Interface API Reference and Network API Reference.
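The rule that each interface needs a network with the same name can be checked before a manifest is applied. The following is a minimal sketch, assuming the VMI spec has already been loaded into a plain Python dictionary; the helper name unmatched_interfaces is hypothetical and is not part of any KubeVirt API.

```python
def unmatched_interfaces(spec):
    # Return the names of interfaces that have no network with the same name.
    # The spec argument is a plain dict shaped like a VMI spec (an assumption).
    network_names = {net['name'] for net in spec.get('networks', [])}
    devices = spec.get('domain', {}).get('devices', {})
    interfaces = devices.get('interfaces', [])
    return [i['name'] for i in interfaces if i['name'] not in network_names]


# Example: the ovs-net interface has no matching entry under networks.
spec = {
    'domain': {'devices': {'interfaces': [
        {'name': 'default', 'masquerade': {}},
        {'name': 'ovs-net', 'bridge': {}},
    ]}},
    'networks': [{'name': 'default', 'pod': {}}],
}
print(unmatched_interfaces(spec))  # ['ovs-net']
```

An empty list means every interface has a backing network; the check says nothing about whether the chosen binding is valid for that network type.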
"},{"location":"network/interfaces_and_networks/#backend","title":"Backend","text":"Network backends are configured in spec.networks
. A network must have a unique name. Additional fields declare which logical or physical device the network relates to.
Each network should declare its type by defining one of the following fields:
Type | Description
pod | Default Kubernetes network
multus | Secondary network provided using Multus
"},{"location":"network/interfaces_and_networks/#pod","title":"pod","text":"A pod
network represents the default pod eth0
interface configured by cluster network solution that is present in each pod.
kind: VM\nspec:\n domain:\n devices:\n interfaces:\n - name: default\n masquerade: {}\n networks:\n - name: default\n pod: {} # Stock pod network\n
"},{"location":"network/interfaces_and_networks/#multus","title":"multus","text":"It is also possible to connect VMIs to secondary networks using Multus. This assumes that multus is installed across your cluster and a corresponding NetworkAttachmentDefinition
CRD was created.
The following example defines a network which uses the bridge CNI plugin, which will connect the VMI to Linux bridge br1
. Other CNI plugins such as ptp, ovs-cni, or Flannel might be used as well. For their installation and usage refer to the respective project documentation.
First the NetworkAttachmentDefinition
needs to be created. That is usually done by an administrator. Users can then reference the definition.
apiVersion: \"k8s.cni.cncf.io/v1\"\nkind: NetworkAttachmentDefinition\nmetadata:\n name: bridge-test\nspec:\n config: '{\n \"cniVersion\": \"0.3.1\",\n \"name\": \"bridge-test\",\n \"type\": \"bridge\",\n \"bridge\": \"br1\",\n \"disableContainerInterface\": true\n }'\n
With the following definition, the VMI will be connected to the default pod network and to the secondary Open vSwitch network.
kind: VM\nspec:\n domain:\n devices:\n interfaces:\n - name: default\n masquerade: {}\n bootOrder: 1 # attempt to boot from an external tftp server\n dhcpOptions:\n bootFileName: default_image.bin\n tftpServerName: tftp.example.com\n - name: ovs-net\n bridge: {}\n bootOrder: 2 # if first attempt failed, try to PXE-boot from this L2 networks\n networks:\n - name: default\n pod: {} # Stock pod network\n - name: ovs-net\n multus: # Secondary multus network\n networkName: ovs-vlan-100\n
It is also possible to define a Multus network as the default pod network. A version of Multus after this Pull Request is required (currently master).
Note the following:
A multus default network and a pod network type are mutually exclusive.
The virt-launcher pod that starts the VMI will not have the pod network configured.
The multus delegate chosen as default must return at least one IP address.
Create a NetworkAttachmentDefinition
with IPAM.
apiVersion: \"k8s.cni.cncf.io/v1\"\nkind: NetworkAttachmentDefinition\nmetadata:\n name: bridge-test\nspec:\n config: '{\n \"cniVersion\": \"0.3.1\",\n \"name\": \"bridge-test\",\n \"type\": \"bridge\",\n \"bridge\": \"br1\",\n \"ipam\": {\n \"type\": \"host-local\",\n \"subnet\": \"10.250.250.0/24\"\n }\n }'\n
Define a VMI with a Multus network as the default.
kind: VM\nspec:\n domain:\n devices:\n interfaces:\n - name: test1\n bridge: {}\n networks:\n - name: test1\n multus: # Multus network as default\n default: true\n networkName: bridge-test\n
"},{"location":"network/interfaces_and_networks/#invalid-cnis-for-secondary-networks","title":"Invalid CNIs for secondary networks","text":"The following list of CNIs is known not to work for bridge interfaces - which are most common for secondary interfaces.
macvlan
ipvlan
The reason is the same for both: the bridge interface type moves the pod interface MAC address to the VM, leaving the pod interface with a different address. The aforementioned CNIs require the pod interface to have the original MAC address.
These issues are tracked individually:
macvlan
ipvlan
Feel free to discuss and / or propose fixes for them; we'd like to have these plugins as valid options in our ecosystem.
"},{"location":"network/interfaces_and_networks/#frontend","title":"Frontend","text":"Network interfaces are configured in spec.domain.devices.interfaces
. They describe properties of virtual interfaces as \"seen\" inside guest instances. The same network backend may be connected to a virtual machine in multiple different ways, each with their own connectivity guarantees and characteristics.
Each interface should declare its type by defining one of the following fields:
Type | Description
bridge | Connect using a Linux bridge
slirp | Connect using QEMU user networking mode
sriov | Pass through an SR-IOV PCI device via vfio
masquerade | Connect using iptables rules to NAT the traffic
Each interface may also have additional configuration fields that modify properties \"seen\" inside guest instances, as listed below:
Name | Format | Default value | Description
model | One of: e1000, e1000e, ne2k_pci, pcnet, rtl8139, virtio | virtio | NIC type
macAddress | ff:ff:ff:ff:ff:ff or FF-FF-FF-FF-FF-FF | | MAC address as seen inside the guest system, for example: de:ad:00:00:be:af
ports | | empty | List of ports to be forwarded to the virtual machine.
pciAddress | 0000:81:00.1 | | Set network interface PCI address, for example: 0000:81:00.1
kind: VM\nspec:\n domain:\n devices:\n interfaces:\n - name: default\n model: e1000 # expose e1000 NIC to the guest\n masquerade: {} # connect through a masquerade\n ports:\n - name: http\n port: 80\n networks:\n - name: default\n pod: {}\n
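The macAddress field accepts two notations, colon-separated and dash-separated. The sketch below is one way to pre-check a value against those two formats before putting it in a manifest. It is an illustration only: the helper name is_valid_mac is hypothetical, and the check mirrors the formats listed above rather than KubeVirt's own validation code.

```python
import string

HEX_DIGITS = set(string.hexdigits)


def is_valid_mac(mac):
    # Accept six two-digit hex groups separated consistently by ':' or '-'.
    for separator in (':', '-'):
        groups = mac.split(separator)
        if len(groups) == 6 and all(
            len(group) == 2 and set(group) <= HEX_DIGITS for group in groups
        ):
            return True
    return False


print(is_valid_mac('de:ad:00:00:be:af'))  # True
print(is_valid_mac('FF-FF-FF-FF-FF-FF'))  # True
print(is_valid_mac('de:ad-00:00:be:af'))  # False (mixed separators)
```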
Note: For secondary interfaces, when a MAC address is specified for a virtual machine interface, it is passed to the underlying CNI plugin which is, in turn, expected to configure the backend to allow for this particular MAC. Not every plugin has native support for custom MAC addresses.
Note: For some CNI plugins without native support for custom MAC addresses, there is a workaround, which is to use the tuning
CNI plugin to adjust pod interface MAC address. This can be used as follows:
apiVersion: \"k8s.cni.cncf.io/v1\"\nkind: NetworkAttachmentDefinition\nmetadata:\n name: ptp-mac\nspec:\n config: '{\n \"cniVersion\": \"0.3.1\",\n \"name\": \"ptp-mac\",\n \"plugins\": [\n {\n \"type\": \"ptp\",\n \"ipam\": {\n \"type\": \"host-local\",\n \"subnet\": \"10.1.1.0/24\"\n }\n },\n {\n \"type\": \"tuning\"\n }\n ]\n }'\n
This approach may not work for all plugins. For example, OKD SDN is not compatible with tuning
plugin.
Plugins that handle custom MAC addresses natively: ovs
, bridge
.
Plugins that are compatible with tuning
plugin: flannel
, ptp
.
Plugins that don't need special MAC address treatment: sriov
(in vfio
mode).
Declare the ports that the virtual machine listens on
Note: When using the slirp interface only the configured ports will be forwarded to the virtual machine.
Name | Format | Required | Description
name | | no | Name
port | 1 - 65535 | yes | Port to expose
protocol | TCP,UDP | no | Connection protocol
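The constraints in the ports table (a required port in the range 1 - 65535 and an optional protocol of TCP or UDP) can be expressed as a small check. This is a sketch under the assumption that each ports entry has been loaded as a plain dictionary; the function name port_problems is hypothetical.

```python
def port_problems(entry):
    # Return a list of problems found in one ports entry (empty when valid).
    problems = []
    port = entry.get('port')
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(port, bool) or not isinstance(port, int) or not 1 <= port <= 65535:
        problems.append('port is required and must be in 1 - 65535')
    if 'protocol' in entry and entry['protocol'] not in ('TCP', 'UDP'):
        problems.append('protocol must be TCP or UDP')
    return problems


print(port_problems({'name': 'http', 'port': 80}))        # []
print(port_problems({'port': 70000, 'protocol': 'ICMP'}))  # two problems
```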
Tip: Use e1000
model if your guest image doesn't ship with virtio drivers.
If spec.domain.devices.interfaces
is omitted, the virtual machine is connected using the default pod network interface of bridge
type. If you'd like to have a virtual machine instance without any network connectivity, you can use the autoattachPodInterface
field as follows:
kind: VM\nspec:\n domain:\n devices:\n autoattachPodInterface: false\n
"},{"location":"network/interfaces_and_networks/#mtu","title":"MTU","text":"There are two methods for the MTU to be propagated to the guest interface.
On Windows guest non virtio interfaces, MTU has to be set manually using netsh
or other tool since the Windows DHCP client doesn't request/read the MTU.
The table below summarizes MTU propagation to the guest.
Interface model | masquerade | bridge with CNI IP | bridge with no CNI IP | Windows
virtio | DHCP & libvirt | DHCP & libvirt | libvirt | libvirt
non-virtio | DHCP | DHCP | X | X
In bridge
mode, virtual machines are connected to the network backend through a linux \"bridge\". The pod network IPv4 address (if exists) is delegated to the virtual machine via DHCPv4. The virtual machine should be configured to use DHCP to acquire IPv4 addresses.
Note: If a specific MAC address is not configured in the virtual machine interface spec the MAC address from the relevant pod interface is delegated to the virtual machine.
kind: VM\nspec:\n domain:\n devices:\n interfaces:\n - name: red\n bridge: {} # connect through a bridge\n networks:\n - name: red\n multus:\n networkName: red\n
At this time, bridge
mode doesn't support additional configuration fields.
Note: due to IPv4 address delegation, in bridge
mode the pod doesn't have an IP address configured, which may introduce issues with third-party solutions that may rely on it. For example, Istio may not work in this mode.
Note: admin can forbid using bridge
interface type for pod networks via a designated configuration flag. To achieve it, the admin should set the following option to false
:
apiVersion: kubevirt.io/v1\nkind: Kubevirt\nmetadata:\n name: kubevirt\n namespace: kubevirt\nspec:\n configuration:\n network:\n permitBridgeInterfaceOnPodNetwork: false\n
Note: binding the pod network using bridge
interface type may cause issues. Other than the third-party issue mentioned in the above note, live migration is not allowed with a pod network binding of bridge
interface type, and some CNI plugins might not allow using a custom MAC address for your VM instances. If you think you may be affected by any of the issues mentioned above, consider changing the default interface type to masquerade
, and disabling the bridge
type for pod network, as shown in the example above.
In slirp
mode, virtual machines are connected to the network backend using QEMU user networking mode. In this mode, QEMU allocates internal IP addresses to virtual machines and hides them behind NAT.
kind: VM\nspec:\n domain:\n devices:\n interfaces:\n - name: red\n slirp: {} # connect using SLIRP mode\n networks:\n - name: red\n pod: {}\n
At this time, slirp
mode doesn't support additional configuration fields.
Note: in slirp
mode, the only supported protocols are TCP and UDP. ICMP is not supported.
More information about SLIRP mode can be found in QEMU Wiki.
Note: Since v1.1.0, Kubevirt delegates Slirp network configuration to the Slirp network binding plugin by default. In case the binding plugin is not registered, Kubevirt will use the following default image: quay.io/kubevirt/network-slirp-binding:20230830_638c60fc8
.
Note: In the next release (v1.2.0), no default image will be set by KubeVirt; registering an image will be mandatory.
Note: On disconnected clusters it will be necessary to mirror Slirp binding plugin image to the cluster registry.
"},{"location":"network/interfaces_and_networks/#masquerade","title":"masquerade","text":"In masquerade
mode, KubeVirt allocates internal IP addresses to virtual machines and hides them behind NAT. All the traffic exiting virtual machines is \"source NAT'ed\" using pod IP addresses; thus, cluster workloads should use the pod's IP address to contact the VM over this interface. This IP address is reported in the VMI's status.interfaces
. A guest operating system should be configured to use DHCP to acquire IPv4 addresses.
To allow the VM to live-migrate or hard restart (both cause the VM to run on a different pod, with a different IP address) and still be reachable, it should be exposed by a Kubernetes service.
To allow traffic of specific ports into virtual machines, the template ports
section of the interface should be configured as follows. If the ports
section is missing, all ports are forwarded into the VM.
kind: VM\nspec:\n domain:\n devices:\n interfaces:\n - name: red\n masquerade: {} # connect using masquerade mode\n ports:\n - port: 80 # allow incoming traffic on port 80 to get into the virtual machine\n networks:\n - name: red\n pod: {}\n
Note: Masquerade is only allowed to connect to the pod network.
Note: The network CIDR can be configured in the pod network section using the vmNetworkCIDR
attribute.
masquerade
mode can be used in IPv4 and IPv6 dual-stack clusters to provide a VM with IP connectivity over both protocols.
As with the IPv4 masquerade
mode, the VM can be contacted using the pod's IP address - which will be in this case two IP addresses, one IPv4 and one IPv6. Outgoing traffic is also \"NAT'ed\" to the pod's respective IP address from the given family.
Unlike in IPv4, the configuration of the IPv6 address and the default route is not automatic; it should be configured via cloud init, as shown below:
kind: VM\nspec:\n domain:\n devices:\n disks:\n - disk:\n bus: virtio\n name: cloudinitdisk\n interfaces:\n - name: red\n masquerade: {} # connect using masquerade mode\n ports:\n - port: 80 # allow incoming traffic on port 80 to get into the virtual machine\n networks:\n - name: red\n pod: {}\n volumes:\n - cloudInitNoCloud:\n networkData: |\n version: 2\n ethernets:\n eth0:\n dhcp4: true\n addresses: [ fd10:0:2::2/120 ]\n gateway6: fd10:0:2::1\n userData: |-\n #!/bin/bash\n echo \"fedora\" |passwd fedora --stdin\n
Note: The IPv6 address for the VM and default gateway must be the ones shown above.
"},{"location":"network/interfaces_and_networks/#masquerade-ipv6-single-stack-support","title":"masquerade - IPv6 single-stack support","text":"masquerade
mode can be used in IPv6 single-stack clusters to provide a VM with IPv6-only connectivity.
As with the IPv4 masquerade
mode, the VM can be contacted using the pod's IP address - which will be in this case the IPv6 one. Outgoing traffic is also \"NAT'ed\" to the pod's respective IPv6 address.
As with the dual-stack cluster, the configuration of the IPv6 address and the default route is not automatic; it should be configured via cloud init, as shown in the dual-stack section.
Unlike the dual-stack cluster, which has a DHCP server for IPv4, the IPv6 single stack cluster has no DHCP server at all. Therefore, the VM won't have the search domains information and reaching a destination using its FQDN is not possible. Tracking issue - https://github.com/kubevirt/kubevirt/issues/7184
"},{"location":"network/interfaces_and_networks/#passt","title":"passt","text":"Warning: The core binding is being deprecated and targeted for removal in v1.3 . As an alternative, the same functionality is introduced and available as a binding plugin.
passt
is a new approach for user-mode networking which can be used as a simple replacement for Slirp (which is practically dead).
passt
is a universal tool which implements a translation layer between a Layer-2 network interface and native Layer-4 sockets (TCP, UDP, ICMP/ICMPv6 echo) on a host.
Its main benefits are:
- It doesn't require extra network capabilities such as CAP_NET_RAW and CAP_NET_ADMIN.
- It allows integration with service meshes (which expect applications to run locally) out of the box.
- It supports IPv6 out of the box (in contrast to the existing bindings, which require configuring IPv6 manually).
Feature | Masquerade | Bridge | Passt
Supports migration | Yes | No | No (will be supported in the future)
VM uses Pod IP | No | Yes | Yes (in the future it will be possible to configure the VM IP; currently the default is the pod IP)
Service Mesh out of the box | No (only Istio is supported; adjustments to both Istio and KubeVirt had to be made to make it work) | No | Yes
Doesn't require extra capabilities on the virt-launcher pod | Yes (multiple workarounds had to be added to KubeVirt to make it work) | No (multiple workarounds had to be added to KubeVirt to make it work) | Yes
Doesn't require extra network devices on the virt-launcher pod | No (bridge and tap device are created) | No (bridge and tap device are created) | Yes
Supports IPv6 | Yes (requires manual configuration on the VM) | No | Yes
kind: VM\nspec:\n domain:\n devices:\n interfaces:\n - name: red\n passt: {} # connect using passt mode\n ports:\n - port: 8080 # allow incoming traffic on port 8080 to get into the virtual machine\n networks:\n - name: red\n pod: {}\n
"},{"location":"network/interfaces_and_networks/#requirementsrecommendations","title":"Requirements/Recommendations:","text":"sysctl -w net.core.rmem_max = 33554432\nsysctl -w net.core.wmem_max = 33554432\n
fs.file-max
should be increased (for a VM that forwards all IPv4 and IPv6 ports, for TCP and UDP, passt needs to create ~2^18 sockets): sysctl -w fs.file-max=9223372036854775807\n
NOTE: To achieve optimal memory consumption with Passt binding, specify ports required for your workload. When no ports are explicitly specified, all ports are forwarded, leading to memory overhead of up to 800 Mi.
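The ~2^18 socket estimate can be reproduced with simple arithmetic. The sketch below assumes one socket per forwarded port, per transport protocol, per IP family, with 65535 usable ports each; these factors are an illustrative assumption and are not taken from the passt source.

```python
PORTS = 65535      # usable port numbers 1 - 65535
PROTOCOLS = 2      # TCP and UDP
IP_FAMILIES = 2    # IPv4 and IPv6

sockets = PORTS * PROTOCOLS * IP_FAMILIES
print(sockets)   # 262140
print(2 ** 18)   # 262144
```

The two numbers differ by only four, which matches the approximate figure quoted above and shows why listing only the ports a workload needs keeps the socket count, and with it the memory overhead, far lower.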
"},{"location":"network/interfaces_and_networks/#temporary-restrictions","title":"Temporary restrictions:","text":"passt
is currently only supported as the primary network and doesn't allow extra Multus networks to be configured on the VM.
passt interfaces are feature gated; to enable the feature, follow these instructions to activate the Passt
feature gate (case sensitive).
More information about passt mode can be found in passt Wiki.
"},{"location":"network/interfaces_and_networks/#virtio-net-multiqueue","title":"virtio-net multiqueue","text":"Setting the networkInterfaceMultiqueue
to true
enables the multi-queue functionality, increasing the number of vhost queues for interfaces configured with a virtio
model.
kind: VM\nspec:\n domain:\n devices:\n networkInterfaceMultiqueue: true\n
Users of a Virtual Machine with multiple vCPUs may benefit from increased network throughput and performance.
Currently, the number of queues is determined by the number of vCPUs of a VM. This is because multi-queue support optimizes RX interrupt affinity and TX queue selection in order to make a specific queue private to a specific vCPU.
Without enabling the feature, network performance does not scale as the number of vCPUs increases. Guests cannot transmit or retrieve packets in parallel, as virtio-net has only one TX and RX queue.
Virtio interfaces advertise a field named queueCount on their status.interfaces.interface entry. The queueCount field indicates how many queues were assigned to the interface. The queue count value is derived from the domain XML. If the number of queues can't be determined (e.g. for an interface that is reported only by the guest agent), the field is omitted.
NOTE: Although the virtio-net multiqueue feature provides a performance benefit, it has some limitations and therefore should not be unconditionally enabled.
"},{"location":"network/interfaces_and_networks/#some-known-limitations","title":"Some known limitations","text":"Guest OS is limited to ~200 MSI vectors. Each NIC queue requires a MSI vector, as well as any virtio device or assigned PCI device. Defining an instance with multiple virtio NICs and vCPUs might lead to a possibility of hitting the guest MSI limit.
virtio-net multiqueue works well for incoming traffic, but can occasionally cause a performance degradation for outgoing traffic. Specifically, this may occur when sending packets under 1,500 bytes over a Transmission Control Protocol (TCP) stream.
Enabling virtio-net multiqueue increases the total network throughput, but in parallel it also increases the CPU consumption.
Enabling virtio-net multiqueue in the host QEMU config does not enable the functionality in the guest OS. The guest OS administrator needs to manually turn it on for each guest NIC that requires this feature, using ethtool.
MSI vectors would still be consumed (wasted), if multiqueue was enabled in the host, but has not been enabled in the guest OS by the administrator.
In case the number of vNICs in a guest instance is proportional to the number of vCPUs, enabling the multiqueue feature is less important.
Each virtio-net queue consumes 64 KiB of kernel memory for the vhost driver.
NOTE: Virtio-net multiqueue should be enabled in the guest OS manually, using ethtool. For example: ethtool -L <NIC> combined #num_of_queues
For more information, please refer to KVM/QEMU MultiQueue.
"},{"location":"network/interfaces_and_networks/#sriov","title":"sriov","text":"In sriov
mode, virtual machines are directly exposed to an SR-IOV PCI device, usually allocated by Intel SR-IOV device plugin. The device is passed through into the guest operating system as a host device, using the vfio userspace interface, to maintain high networking performance.
To simplify the procedure, please use the SR-IOV network operator to deploy and configure SR-IOV components in your cluster. For details on how to use the operator, please refer to its documentation.
Note: KubeVirt relies on VFIO userspace driver to pass PCI devices into VMI guest. Because of that, when configuring SR-IOV operator policies, make sure you define a pool of VF resources that uses deviceType: vfio-pci
.
Once the operator is deployed, an SriovNetworkNodePolicy must be provisioned, in which the list of SR-IOV devices to expose (with respective configurations) is defined.
Please refer to the following SriovNetworkNodePolicy
for an example:
apiVersion: sriovnetwork.openshift.io/v1\nkind: SriovNetworkNodePolicy\nmetadata:\n name: policy-1\n namespace: sriov-network-operator\nspec:\n deviceType: vfio-pci\n mtu: 9000\n nicSelector:\n pfNames:\n - ens1f0\n nodeSelector:\n sriov: \"true\"\n numVfs: 8\n priority: 90\n resourceName: sriov-nic\n
The policy above will configure the SR-IOV
device plugin, allowing the PF named ens1f0
to be exposed in the SRIOV capable nodes as a resource named sriov-nic
.
Once all the SR-IOV components are deployed, you need to specify how the SR-IOV network is configured. Refer to the following SriovNetwork
for an example:
apiVersion: sriovnetwork.openshift.io/v1\nkind: SriovNetwork\nmetadata:\n name: sriov-net\n namespace: sriov-network-operator\nspec:\n ipam: |\n {}\n networkNamespace: default\n resourceName: sriov-nic\n spoofChk: \"off\"\n
Finally, to create a VM that will attach to the aforementioned Network, refer to the following VMI spec:
---\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n labels:\n special: vmi-perf\n name: vmi-perf\nspec:\n domain:\n cpu:\n sockets: 2\n cores: 1\n threads: 1\n dedicatedCpuPlacement: true\n resources:\n requests:\n memory: \"4Gi\"\n limits:\n memory: \"4Gi\"\n devices:\n disks:\n - disk:\n bus: virtio\n name: containerdisk\n - disk:\n bus: virtio\n name: cloudinitdisk\n interfaces:\n - masquerade: {}\n name: default\n - name: sriov-net\n sriov: {}\n rng: {}\n machine:\n type: \"\"\n networks:\n - name: default\n pod: {}\n - multus:\n networkName: default/sriov-net\n name: sriov-net\n terminationGracePeriodSeconds: 0\n volumes:\n - containerDisk:\n image: docker.io/kubevirt/fedora-cloud-container-disk-demo:latest\n name: containerdisk\n - cloudInitNoCloud:\n userData: |\n #!/bin/bash\n echo \"fedora\" |passwd fedora --stdin\n dhclient eth1\n name: cloudinitdisk\n
Note: for some NICs (e.g. Mellanox), the kernel module needs to be installed in the guest VM.
Note: Placement on dedicated CPUs can only be achieved if the Kubernetes CPU manager is running on the SR-IOV capable workers. For further details please refer to the dedicated cpu resources documentation.
"},{"location":"network/interfaces_and_networks/#macvtap","title":"Macvtap","text":"Note: The core binding will be deprecated soon. As an alternative, the same functionality is introduced and available as a binding plugin.
In macvtap
mode, virtual machines are directly exposed to the Kubernetes nodes' L2 network. This is achieved by 'extending' an existing network interface with a virtual device that has its own MAC address.
Macvtap interfaces are feature gated; to enable the feature, follow these instructions, in order to activate the Macvtap
feature gate (case sensitive).
Note: On KinD clusters, the user needs to adjust the cluster configuration, mounting dev
of the running host onto the KinD nodes, because of a known issue.
To simplify the procedure, please use the Cluster Network Addons Operator to deploy and configure the macvtap components in your cluster.
The aforementioned operator effectively deploys the macvtap-cni CNI / device plugin combo.
There are two different alternatives to configure which host interfaces get exposed to the user, enabling them to create macvtap interfaces on top of:
Both options are configured via the macvtap-deviceplugin-config
ConfigMap, and more information on how to configure it can be found in the macvtap-cni repo.
Below is a minimal example in which the eth0
interface of the Kubernetes nodes is exposed via the lowerDevice
attribute.
kind: ConfigMap\napiVersion: v1\nmetadata:\n name: macvtap-deviceplugin-config\ndata:\n DP_MACVTAP_CONF: |\n [\n {\n \"name\" : \"dataplane\",\n \"lowerDevice\": \"eth0\",\n \"mode\" : \"bridge\",\n \"capacity\" : 50\n }\n ]\n
This step can be omitted, since the default configuration of the aforementioned ConfigMap
is to expose all host interfaces (which is represented by the following configuration):
kind: ConfigMap\napiVersion: v1\nmetadata:\n name: macvtap-deviceplugin-config\ndata:\n DP_MACVTAP_CONF: '[]'\n
"},{"location":"network/interfaces_and_networks/#start-a-vm-with-macvtap-interfaces","title":"Start a VM with macvtap interfaces","text":"Once the macvtap components are deployed, it is needed to indicate how to configure the macvtap network. Refer to the following NetworkAttachmentDefinition
for a simple example:
---\nkind: NetworkAttachmentDefinition\napiVersion: k8s.cni.cncf.io/v1\nmetadata:\n name: macvtapnetwork\n annotations:\n k8s.v1.cni.cncf.io/resourceName: macvtap.network.kubevirt.io/eth0\nspec:\n config: '{\n \"cniVersion\": \"0.3.1\",\n \"name\": \"macvtapnetwork\",\n \"type\": \"macvtap\",\n \"mtu\": 1500\n }'\n
The requested k8s.v1.cni.cncf.io/resourceName
annotation must point to an exposed host interface (via the lowerDevice
attribute, on the macvtap-deviceplugin-config
ConfigMap
). Finally, to create a VM that will attach to the aforementioned Network, refer to the following VMI spec:
---\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n labels:\n special: vmi-host-network\n name: vmi-host-network\nspec:\n domain:\n devices:\n disks:\n - disk:\n bus: virtio\n name: containerdisk\n - disk:\n bus: virtio\n name: cloudinitdisk\n interfaces:\n - macvtap: {}\n name: hostnetwork\n rng: {}\n machine:\n type: \"\"\n resources:\n requests:\n memory: 1024M\n networks:\n - multus:\n networkName: macvtapnetwork\n name: hostnetwork\n terminationGracePeriodSeconds: 0\n volumes:\n - containerDisk:\n image: docker.io/kubevirt/fedora-cloud-container-disk-demo:devel\n name: containerdisk\n - cloudInitNoCloud:\n userData: |-\n #!/bin/bash\n echo \"fedora\" |passwd fedora --stdin\n name: cloudinitdisk\n
The requested multus
networkName
- i.e. macvtapnetwork
- must match the name of the provisioned NetworkAttachmentDefinition
. Note: VMIs with macvtap interfaces can be migrated, but their MAC addresses must be statically set.
"},{"location":"network/interfaces_and_networks/#security","title":"Security","text":""},{"location":"network/interfaces_and_networks/#mac-spoof-check","title":"MAC spoof check","text":"MAC spoofing refers to the ability to generate traffic with an arbitrary source MAC address. An attacker may use this option to generate attacks on the network.
In order to protect against such scenarios, it is possible to enable the mac-spoof-check support in CNI plugins that support it.
The pod primary network which is served by the cluster network provider is not covered by this documentation. Please refer to the relevant provider to check how to enable spoofing check. The following text refers to the secondary networks, served using multus.
There are two known CNI plugins that support mac-spoof-check:
SR-IOV CNI: through the spoofchk
parameter.
bridge CNI: through the macspoofchk
parameter.
The configuration is done on the NetworkAttachmentDefinition by the operator, and any interface that refers to it will have this feature enabled.
Below is an example of using the bridge
CNI with macspoofchk
enabled:
apiVersion: \"k8s.cni.cncf.io/v1\"\nkind: NetworkAttachmentDefinition\nmetadata:\n name: br-spoof-check\nspec:\n config: '{\n \"cniVersion\": \"0.3.1\",\n \"name\": \"br-spoof-check\",\n \"type\": \"bridge\",\n \"bridge\": \"br10\",\n \"disableContainerInterface\": true,\n \"macspoofchk\": true\n }'\n
On the VMI, the network section should point to this NetworkAttachmentDefinition by name:
networks:\n- name: default\n pod: {}\n- multus:\n networkName: br-spoof-check\n name: br10\n
"},{"location":"network/interfaces_and_networks/#limitations_1","title":"Limitations","text":"bridge
CNI supports mac-spoof-check through nftables, therefore the node must support nftables and have the nft
binary deployed.
A service mesh lets you monitor, visualize, and control traffic between pods. KubeVirt supports running VMs as part of an Istio service mesh.
"},{"location":"network/istio_service_mesh/#limitations","title":"Limitations","text":"Istio service mesh is only supported with a pod network masquerade or passt binding.
Istio uses a list of ports for its own purposes; these ports must not be explicitly specified in a VMI interface.
Istio only supports IPv4.
This guide assumes that Istio is already deployed and uses the Istio CNI plugin. See the Istio documentation for more information.
Optionally, istioctl
binary for troubleshooting. See the Istio installation instructions.
The target namespace where the VM is created must be labelled with istio-injection=enabled
label.
If Multus is used to manage CNI, the following NetworkAttachmentDefinition
is required in the application namespace:
apiVersion: \"k8s.cni.cncf.io/v1\"\nkind: NetworkAttachmentDefinition\nmetadata:\n name: istio-cni\n
The example below specifies a VMI with masquerade network interface and sidecar.istio.io/inject
annotation to register the VM to the service mesh.
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n annotations:\n sidecar.istio.io/inject: \"true\"\n labels:\n app: vmi-istio\n name: vmi-istio\nspec:\n domain:\n devices:\n interfaces:\n - name: default\n masquerade: {}\n disks:\n - disk:\n bus: virtio\n name: containerdisk\n resources:\n requests:\n memory: 1024M\n networks:\n - name: default\n pod: {}\n terminationGracePeriodSeconds: 0\n volumes:\n - name: containerdisk\n containerDisk:\n image: registry:5000/kubevirt/fedora-cloud-container-disk-demo:devel\n
Istio expects each application to be associated with at least one Kubernetes service. Create the following Service exposing port 8080:
apiVersion: v1\nkind: Service\nmetadata:\n name: vmi-istio\nspec:\n selector:\n app: vmi-istio\n ports:\n - port: 8080\n name: http\n protocol: TCP\n
Note: Each Istio enabled VMI must feature the sidecar.istio.io/inject
annotation instructing KubeVirt to perform necessary network configuration.
Verify istio-proxy sidecar is deployed and able to synchronize with Istio control plane using istioctl proxy-status
command. See the Istio Debugging Envoy and Istiod documentation section for more information about the proxy-status
subcommand.
$ kubectl get pods\nNAME READY STATUS RESTARTS AGE\nvirt-launcher-vmi-istio-ncx7r 3/3 Running 0 7s\n\n$ kubectl get pods virt-launcher-vmi-istio-ncx7r -o jsonpath='{.spec.containers[*].name}'\ncompute volumecontainerdisk istio-proxy\n\n$ istioctl proxy-status\nNAME CDS LDS EDS RDS ISTIOD VERSION\n...\nvirt-launcher-vmi-istio-ncx7r.default SYNCED SYNCED SYNCED SYNCED istiod-7c4d8c7757-hshj5 1.10.0\n
"},{"location":"network/istio_service_mesh/#troubleshooting","title":"Troubleshooting","text":""},{"location":"network/istio_service_mesh/#istio-sidecar-is-not-deployed","title":"Istio sidecar is not deployed","text":"$ kubectl get pods\nNAME READY STATUS RESTARTS AGE\nvirt-launcher-vmi-istio-jnw6p 2/2 Running 0 37s\n\n$ kubectl get pods virt-launcher-vmi-istio-jnw6p -o jsonpath='{.spec.containers[*].name}'\ncompute volumecontainerdisk\n
Resolution: Make sure the istio-injection=enabled
label is added to the target namespace. If the issue persists, consult the relevant part of the Istio documentation.
$ kubectl get pods\nNAME READY STATUS RESTARTS AGE\nvirt-launcher-vmi-istio-lg5gp 2/3 Running 0 90s\n\n$ kubectl describe pod virt-launcher-vmi-istio-lg5gp\n ...\n Warning Unhealthy 2d8h (x3 over 2d8h) kubelet Readiness probe failed: Get \"http://10.244.186.222:15021/healthz/ready\": dial tcp 10.244.186.222:15021: connect: no route to host\n Warning Unhealthy 2d8h (x4 over 2d8h) kubelet Readiness probe failed: Get \"http://10.244.186.222:15021/healthz/ready\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)\n
Resolution: Make sure the sidecar.istio.io/inject: \"true\"
annotation is defined in the created VMI and that masquerade or passt binding is used for pod network interface.
$ kubectl get pods\nNAME READY STATUS RESTARTS AGE\nvirt-launcher-vmi-istio-44mws 0/3 Init:0/3 0 29s\n\n$ kubectl describe pod virt-launcher-vmi-istio-44mws\n ...\n Multus: [default/virt-launcher-vmi-istio-44mws]: error loading k8s delegates k8s args: TryLoadPodDelegates: error in getting k8s network for pod: GetNetworkDelegates: failed getting the delegate: getKubernetesDelegate: cannot find a network-attachment-definition (istio-cni) in namespace (default): network-attachment-definitions.k8s.cni.cncf.io \"istio-cni\" not found\n
Resolution: Make sure the istio-cni
NetworkAttachmentDefinition (provided in the Prerequisites section) is created in the target namespace.
[v1.1.0, Alpha feature]
A modular plugin which integrates with Kubevirt to implement a network binding.
"},{"location":"network/network_binding_plugins/#overview","title":"Overview","text":""},{"location":"network/network_binding_plugins/#network-connectivity","title":"Network Connectivity","text":"In order for a VM to have access to external network(s), several layers need to be defined and configured, depending on the connectivity characteristics needs.
These layers include:
This guide focuses on the Network Binding portion.
"},{"location":"network/network_binding_plugins/#network-binding","title":"Network Binding","text":"The network binding defines how the domain (VM) network interface is wired in the VM pod through the domain to the guest.
The network binding includes:
The network bindings have been part of Kubevirt core API and codebase. With the increase of the number of network bindings added and frequent requests to tweak and change the existing network bindings, a decision has been made to create a network binding plugin infrastructure.
The plugin infrastructure provides means to compose a network binding plugin and integrate it into Kubevirt in a modular manner.
Kubevirt is providing several network binding plugins as references. The following plugins are available:
A network binding plugin configuration consists of the following steps:
Deploy network binding optional components:
Binding CNI plugin.
Enable NetworkBindingPlugins
Feature Gate (FG).
Register network binding.
Depending on the plugin, some components need to be deployed in the cluster. Not all network binding plugins require all these components, therefore these steps are optional.
This binary needs to be deployed on each node of the cluster, like any other CNI plugin.
The binary can be built from source or consumed from an existing artifact.
Note: The location of the CNI plugins binaries depends on the platform used and its configuration. A frequently used path for such binaries is /opt/cni/bin/
.
Example:
apiVersion: \"k8s.cni.cncf.io/v1\"\nkind: NetworkAttachmentDefinition\nmetadata:\n name: netbindingpasst\nspec:\n config: '{\n \"cniVersion\": \"1.0.0\",\n \"name\": \"netbindingpasst\",\n \"plugins\": [\n {\n \"type\": \"cni-passt-binding-plugin\"\n }\n ]\n }'\n
Note: It is possible to deploy the NetworkAttachmentDefinition on the default
namespace, where all other namespaces can access it. Nevertheless, it is recommended (for security reasons) to define the NetworkAttachmentDefinition in the same namespace where the VM resides.
Multus: In order for the network binding CNI and the NetworkAttachmentDefinition to operate, Multus must be deployed on the cluster. For more information, check the Quickstart Installation Guide.
Sidecar image: When a core domain-attachment is not a fit, a sidecar is used to configure the vNIC domain configuration. In more complex scenarios, the sidecar also runs services like DHCP to deliver IP information to the guest.
The sidecar image is built and usually pushed to an image registry for consumption. Therefore, the cluster needs to have access to the image.
The image can be built from source and pushed to an accessible registry or used from a given registry that already contains it.
NetworkBindingPlugins
. It is therefore necessary to set the FG in the Kubevirt CR.
Example (valid when the FG subtree is already defined):
kubectl patch kubevirts -n kubevirt kubevirt --type=json -p='[{\"op\": \"add\", \"path\": \"/spec/configuration/developerConfiguration/featureGates/-\", \"value\": \"NetworkBindingPlugins\"}]'\n
"},{"location":"network/network_binding_plugins/#register","title":"Register","text":"In order to use a network binding plugin, the cluster admin needs to register the binding. Registration includes the addition of the binding name with all its parameters to the Kubevirt CR.
The following (optional) parameters are currently supported:
From: v1.1.0
Use the format to specify the NetworkAttachmentDefinition that defines the CNI plugin and the configuration the binding plugin uses. Used when the binding plugin needs to change the pod network namespace."},{"location":"network/network_binding_plugins/#sidecarimage","title":"sidecarImage","text":"
From: v1.1.0
Specify a container image in a registry. Used when the binding plugin needs to modify the domain vNIC configuration or when a service needs to be executed (e.g. DHCP server).
"},{"location":"network/network_binding_plugins/#domainattachmenttype","title":"domainAttachmentType","text":"From: v1.1.1
The Domain Attachment type is a pre-defined core kubevirt method to attach an interface to the domain.
Specify the name of a core domain attachment type. A possible alternative to a sidecar, to configure the domain vNIC.
Supported types:
tap
(from v1.1.1): The domain configuration is set to use an existing tap device. It also supports existing macvtap
devices.When both the domainAttachmentType
and sidecarImage
are specified, the domain will first be configured according to the domainAttachmentType
and then the sidecarImage
may modify it.
From: v1.2.0
Specify whether the network binding plugin supports migration. It is possible to specify a migration method. Supported migration method types: - link-refresh
(from v1.2.0): after migration, the guest nic will be deactivated and then activated again. It can be useful to renew the DHCP lease.
Note: In some deployments the Kubevirt CR is controlled by an external controller (e.g. HCO). In such cases, make sure to configure the wrapper operator/controller so the changes will get preserved.
Example (the passt
binding):
kubectl patch kubevirts -n kubevirt kubevirt --type=json -p='[{\"op\": \"add\", \"path\": \"/spec/configuration/network\", \"value\": {\n \"binding\": {\n \"passt\": {\n \"networkAttachmentDefinition\": \"default/netbindingpasst\",\n \"sidecarImage\": \"quay.io/kubevirt/network-passt-binding:20231205_29a16d5c9\",\n \"migration\": {\n \"method\": \"link-refresh\"\n }\n }\n }\n }}]'\n
"},{"location":"network/network_binding_plugins/#vm-network-interface","title":"VM Network Interface","text":"When configuring the VM/VMI network interface, the binding plugin name can be specified. If it exists in the Kubevirt CR, it will be used to set up the network interface.
Example (passt
binding):
---\napiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n labels:\n kubevirt.io/vm: vm-net-binding-passt\n name: vm-net-binding-passt\nspec:\n running: true\n template:\n metadata:\n labels:\n kubevirt.io/vm: vm-net-binding-passt\n spec:\n domain:\n devices:\n disks:\n - disk:\n bus: virtio\n name: containerdisk\n - disk:\n bus: virtio\n name: cloudinitdisk\n interfaces:\n - name: passtnet\n binding:\n name: passt\n rng: {}\n resources:\n requests:\n memory: 1024M\n networks:\n - name: passtnet\n pod: {}\n terminationGracePeriodSeconds: 0\n volumes:\n - containerDisk:\n image: quay.io/kubevirt/fedora-with-test-tooling-container-disk:v1.1.0\n name: containerdisk\n - cloudInitNoCloud:\n networkData: |\n version: 2\n ethernets:\n eth0:\n dhcp4: true\n name: cloudinitdisk\n
"},{"location":"network/networkpolicy/","title":"NetworkPolicy","text":"Before creating NetworkPolicy objects, make sure you are using a networking solution which supports NetworkPolicy. Network isolation is controlled entirely by NetworkPolicy objects. By default, all vmis in a namespace are accessible from other vmis and network endpoints. To isolate one or more vmis in a project, you can create NetworkPolicy objects in that namespace to indicate the allowed incoming connections.
Note: vmis and pods are treated equally by network policies, since labels are passed through to the pods which contain the running vmi. In other words, labels on vmis can be matched by spec.podSelector
on the policy.
To make a project \"deny by default\" add a NetworkPolicy object that matches all vmis but accepts no traffic.
kind: NetworkPolicy\napiVersion: networking.k8s.io/v1\nmetadata:\n name: deny-by-default\nspec:\n podSelector: {}\n ingress: []\n
"},{"location":"network/networkpolicy/#create-networkpolicy-to-only-accept-connections-from-vmis-within-namespaces","title":"Create NetworkPolicy to only Accept connections from vmis within namespaces","text":"To make vmis accept connections from other vmis in the same namespace, but reject all other connections from vmis in other namespaces:
kind: NetworkPolicy\napiVersion: networking.k8s.io/v1\nmetadata:\n name: allow-same-namespace\nspec:\n podSelector: {}\n ingress:\n - from:\n - podSelector: {}\n
"},{"location":"network/networkpolicy/#create-networkpolicy-to-only-allow-http-and-https-traffic","title":"Create NetworkPolicy to only allow HTTP and HTTPS traffic","text":"To enable only HTTP and HTTPS access to the vmis, add a NetworkPolicy object similar to:
kind: NetworkPolicy\napiVersion: networking.k8s.io/v1\nmetadata:\n name: allow-http-https\nspec:\n podSelector: {}\n ingress:\n - ports:\n - protocol: TCP\n port: 8080\n - protocol: TCP\n port: 8443\n
"},{"location":"network/networkpolicy/#create-networkpolicy-to-deny-traffic-by-labels","title":"Create NetworkPolicy to deny traffic by labels","text":"To make one specific vmi with a label type: test
to reject all traffic from other vmis, create:
kind: NetworkPolicy\napiVersion: networking.k8s.io/v1\nmetadata:\n name: deny-by-label\nspec:\n podSelector:\n matchLabels:\n type: test\n ingress: []\n
Kubernetes NetworkPolicy Documentation can be found here: Kubernetes NetworkPolicy
"},{"location":"network/service_objects/","title":"Service objects","text":"Once the VirtualMachineInstance is started, in order to connect to a VirtualMachineInstance, you can create a Service
object for a VirtualMachineInstance. Currently, three types of service are supported: ClusterIP
, NodePort
and LoadBalancer
. The default type is ClusterIP
.
Note: Labels on a VirtualMachineInstance are passed through to the pod, so simply add your labels for service creation to the VirtualMachineInstance. From there on it works like exposing any other k8s resource, by referencing these labels in a service.
"},{"location":"network/service_objects/#expose-virtualmachineinstance-as-a-clusterip-service","title":"Expose VirtualMachineInstance as a ClusterIP Service","text":"Given a VirtualMachineInstance with the label special: key
:
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n name: vmi-ephemeral\n labels:\n special: key\nspec:\n domain:\n devices:\n disks:\n - disk:\n bus: virtio\n name: containerdisk\n resources:\n requests:\n memory: 64M\n volumes:\n - name: containerdisk\n containerDisk:\n image: kubevirt/cirros-registry-disk-demo:latest\n
we can expose its SSH port (22) by creating a ClusterIP
service:
apiVersion: v1\nkind: Service\nmetadata:\n name: vmiservice\nspec:\n ports:\n - port: 27017\n protocol: TCP\n targetPort: 22\n selector:\n special: key\n type: ClusterIP\n
You just need to create this ClusterIP
service by using kubectl
:
$ kubectl create -f vmiservice.yaml\n
Alternatively, the VirtualMachineInstance could be exposed using the virtctl
command:
$ virtctl expose virtualmachineinstance vmi-ephemeral --name vmiservice --port 27017 --target-port 22\n
Notes: * If --target-port
is not set, it takes the same value as --port
* The cluster IP is usually allocated automatically, but it may also be forced into a value using the --cluster-ip
flag (assuming value is in the valid range and not taken)
Query the service object:
$ kubectl get service\nNAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE\nvmiservice ClusterIP 172.30.3.149 <none> 27017/TCP 2m\n
You can connect to the VirtualMachineInstance by service IP and service port inside the cluster network:
$ ssh cirros@172.30.3.149 -p 27017\n
"},{"location":"network/service_objects/#expose-virtualmachineinstance-as-a-nodeport-service","title":"Expose VirtualMachineInstance as a NodePort Service","text":"Expose the SSH port (22) of a VirtualMachineInstance running on KubeVirt by creating a NodePort
service:
apiVersion: v1\nkind: Service\nmetadata:\n name: nodeport\nspec:\n externalTrafficPolicy: Cluster\n ports:\n - name: nodeport\n nodePort: 30000\n port: 27017\n protocol: TCP\n targetPort: 22\n selector:\n special: key\n type: NodePort\n
You just need to create this NodePort
service by using kubectl
:
$ kubectl create -f nodeport.yaml\n
Alternatively, the VirtualMachineInstance could be exposed using the virtctl
command:
$ virtctl expose virtualmachineinstance vmi-ephemeral --name nodeport --type NodePort --port 27017 --target-port 22 --node-port 30000\n
Notes: * If --node-port
is not set, its value is allocated dynamically (by default from the range 30000-32767) * If the --node-port
value is set, it must be unique across all services
The service can be listed by querying for the service objects:
$ kubectl get service\nNAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE\nnodeport NodePort 172.30.232.73 <none> 27017:30000/TCP 5m\n
Connect to the VirtualMachineInstance by using a node IP and node port outside the cluster network:
$ ssh cirros@$NODE_IP -p 30000\n
"},{"location":"network/service_objects/#expose-virtualmachineinstance-as-a-loadbalancer-service","title":"Expose VirtualMachineInstance as a LoadBalancer Service","text":"Expose the RDP port (3389) of a VirtualMachineInstance running on KubeVirt by creating a LoadBalancer
service. Here is an example:
apiVersion: v1\nkind: Service\nmetadata:\n name: lbsvc\nspec:\n externalTrafficPolicy: Cluster\n ports:\n - port: 27017\n protocol: TCP\n targetPort: 3389\n selector:\n special: key\n type: LoadBalancer\n
You could create this LoadBalancer
service by using kubectl
:
$ kubectl create -f lbsvc.yaml\n
Alternatively, the VirtualMachineInstance could be exposed using the virtctl
command:
$ virtctl expose virtualmachineinstance vmi-ephemeral --name lbsvc --type LoadBalancer --port 27017 --target-port 3389\n
Note that the external IP of the service could be forced to a value using the --external-ip
flag (no validation is performed on this value).
The service can be listed by querying for the service objects:
$ kubectl get svc\nNAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE\nlbsvc LoadBalancer 172.30.27.5 172.29.10.235,172.29.10.235 27017:31829/TCP 5s\n
Use vinagre
client to connect to your VirtualMachineInstance by using the public IP and port.
Note that the external port here (31829) was dynamically allocated.
"},{"location":"network/net_binding_plugins/macvtap/","title":"Macvtap binding","text":""},{"location":"network/net_binding_plugins/macvtap/#overview","title":"Overview","text":"With the macvtap
binding plugin, virtual machines are directly exposed to the Kubernetes nodes L2 network. This is achieved by 'extending' an existing network interface with a virtual device that has its own MAC address.
Its main benefits are:
Warning: On KinD clusters, the user needs to adjust the cluster configuration, mounting dev
of the running host onto the KinD nodes, because of a known issue.
The macvtap
solution consists of a CNI and a DP.
In order to use macvtap
, the following points need to be covered:
To simplify the procedure, use the Cluster Network Addons Operator to deploy and configure the macvtap components in your cluster.
The aforementioned operator effectively deploys the macvtap cni and device plugin.
"},{"location":"network/net_binding_plugins/macvtap/#expose-node-interface-to-the-macvtap-device-plugin","title":"Expose node interface to the macvtap device plugin","text":"There are two different alternatives to configure which host interfaces get exposed to the user, enabling them to create macvtap interfaces on top of:
Both options are configured via the macvtap-deviceplugin-config
ConfigMap, and more information on how to configure it can be found in the macvtap-cni repo.
This is a minimal example, in which the eth0
interface of the Kubernetes nodes is exposed, via the lowerDevice
attribute.
kind: ConfigMap\napiVersion: v1\nmetadata:\n name: macvtap-deviceplugin-config\ndata:\n DP_MACVTAP_CONF: |\n [\n {\n \"name\" : \"dataplane\",\n \"lowerDevice\" : \"eth0\",\n \"mode\" : \"bridge\",\n \"capacity\" : 50\n }\n ]\n
This step can be omitted, since the default configuration of the aforementioned ConfigMap
is to expose all host interfaces (which is represented by the following configuration):
kind: ConfigMap\napiVersion: v1\nmetadata:\n name: macvtap-deviceplugin-config\ndata:\n DP_MACVTAP_CONF: '[]'\n
"},{"location":"network/net_binding_plugins/macvtap/#macvtap-networkattachmentdefinition","title":"Macvtap NetworkAttachmentDefinition","text":"The configuration needed for a macvtap network attachment can be minimalistic:
kind: NetworkAttachmentDefinition\napiVersion: k8s.cni.cncf.io/v1\nmetadata:\n name: macvtapnetwork\n annotations:\n k8s.v1.cni.cncf.io/resourceName: macvtap.network.kubevirt.io/eth0\nspec:\n config: '{\n \"cniVersion\": \"0.3.1\",\n \"name\": \"macvtapnetwork\",\n \"type\": \"macvtap\",\n \"mtu\": 1500\n }'\n
The object should be created either in the \"default\" namespace, which all other namespaces can access, or in the same namespace the VMs reside in.
The requested k8s.v1.cni.cncf.io/resourceName
annotation must point to an exposed host interface (via the lowerDevice
attribute, on the macvtap-deviceplugin-config
ConfigMap
).
[v1.1.1]
The binding plugin replaces the experimental core macvtap binding implementation (including its API).
Note: The network binding plugin infrastructure and the macvtap plugin specifically are in Alpha stage. Please use them with care, preferably on a non-production deployment.
The macvtap binding plugin consists of the following components:
The plugin needs to:
And in detail:
"},{"location":"network/net_binding_plugins/macvtap/#feature-gate","title":"Feature Gate","text":"If not already set, add the NetworkBindingPlugins
FG.
kubectl patch kubevirts -n kubevirt kubevirt --type=json -p='[{\n \"op\": \"add\",\n \"path\": \"/spec/configuration/developerConfiguration/featureGates/-\",\n \"value\": \"NetworkBindingPlugins\"\n}]'\n
Note: The specific macvtap plugin has no FG of its own. It is up to the cluster admin to decide if the plugin is to be available in the cluster. The macvtap binding is still in evaluation; use it with care.
"},{"location":"network/net_binding_plugins/macvtap/#macvtap-registration","title":"Macvtap Registration","text":"The macvtap binding plugin configuration needs to be added to the kubevirt CR in order to be used by VMs.
To register the macvtap binding, patch the kubevirt CR as follows:
kubectl patch kubevirts -n kubevirt kubevirt --type=json -p='[{\"op\": \"add\", \"path\": \"/spec/configuration/network\", \"value\": {\n \"binding\": {\n \"macvtap\": {\n \"domainAttachmentType\": \"tap\"\n }\n }\n }}]'\n
"},{"location":"network/net_binding_plugins/macvtap/#vm-macvtap-network-interface","title":"VM Macvtap Network Interface","text":"Set the VM network interface binding name to reference the one defined in the kubevirt CR.
---\napiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n labels:\n kubevirt.io/vm: vm-net-binding-macvtap\n name: vm-net-binding-macvtap\nspec:\n running: true\n template:\n metadata:\n labels:\n kubevirt.io/vm: vm-net-binding-macvtap\n spec:\n domain:\n devices:\n disks:\n - disk:\n bus: virtio\n name: containerdisk\n - disk:\n bus: virtio\n name: cloudinitdisk\n interfaces:\n - name: podnet\n masquerade: {}\n - name: hostnetwork\n binding:\n name: macvtap\n rng: {}\n resources:\n requests:\n memory: 1024M\n networks:\n - name: podnet\n pod: {}\n - name: hostnetwork\n multus:\n networkName: macvtapnetwork\n terminationGracePeriodSeconds: 0\n volumes:\n - containerDisk:\n image: quay.io/kubevirt/fedora-with-test-tooling-container-disk:v1.1.0\n name: containerdisk\n - cloudInitNoCloud:\n networkData: |\n version: 2\n ethernets:\n eth0:\n dhcp4: true\n name: cloudinitdisk\n
The multus networkName
value should correspond with the name used in the network attachment definition section.
The binding
value should correspond with the name used in the registration.
Plug A Simple Socket Transport is an enhanced alternative to SLIRP, providing user-space network connectivity.
passt
is a universal tool which implements a translation layer between a Layer-2 network interface and native Layer-4 sockets (TCP, UDP, ICMP/ICMPv6 echo) on a host.
Its main benefits are:
sysctl -w net.core.rmem_max=33554432\nsysctl -w net.core.wmem_max=33554432\n
fs.file-max
should be increased (for a VM that forwards all IPv4 and IPv6 ports, for TCP and UDP, passt needs to create ~2^18 sockets): sysctl -w fs.file-max=9223372036854775807\n
NOTE: To achieve optimal memory consumption with Passt binding, specify ports required for your workload. When no ports are explicitly specified, all ports are forwarded, leading to memory overhead of up to 800 Mi.
"},{"location":"network/net_binding_plugins/passt/#passt-network-binding-plugin","title":"Passt network binding plugin","text":"[v1.1.0]
The binding plugin replaces the experimental core passt binding implementation (including its API).
Note: The network binding plugin infrastructure and the passt plugin specifically are in Alpha stage. Please use them with care, preferably on a non-production deployment.
The passt binding plugin consists of the following components:
As described in the definition & flow section, the passt plugin needs to:
And in detail:
"},{"location":"network/net_binding_plugins/passt/#passt-cni-deployment-on-nodes","title":"Passt CNI deployment on nodes","text":"The CNI plugin binary can be retrieved directly from the kubevirt release assets (on GitHub) or built from its sources.
Note: The kubevirt project uses Bazel to build the binaries and container images. For more information on how to build the whole project, visit the developer getting started guide.
Once the binary is ready, you may rename it to a meaningful name (e.g. kubevirt-passt-binding
). This name is used in the NetworkAttachmentDefinition configuration.
Copy the binary to each node in your cluster. The location of the CNI plugins may vary between platforms and versions. One common path is /opt/cni/bin/
.
The configuration needed for passt is minimalistic:
apiVersion: \"k8s.cni.cncf.io/v1\"\nkind: NetworkAttachmentDefinition\nmetadata:\n name: netbindingpasst\nspec:\n config: '{\n \"cniVersion\": \"1.0.0\",\n \"name\": \"netbindingpasst\",\n \"plugins\": [\n {\n \"type\": \"kubevirt-passt-binding\"\n }\n ]\n }'\n
The object should be created either in the \"default\" namespace, which all other namespaces can access, or in the same namespace the VMs reside in.
"},{"location":"network/net_binding_plugins/passt/#passt-sidecar-image","title":"Passt sidecar image","text":"Passt sidecar image is built and pushed to kubevirt quay repository.
The sidecar sources can be found here.
The relevant sidecar image needs to be accessible by the cluster and specified in the Kubevirt CR when registering the network binding plugin.
"},{"location":"network/net_binding_plugins/passt/#feature-gate","title":"Feature Gate","text":"If not already set, add the NetworkBindingPlugins
FG.
kubectl patch kubevirts -n kubevirt kubevirt --type=json -p='[{\"op\": \"add\", \"path\": \"/spec/configuration/developerConfiguration/featureGates/-\", \"value\": \"NetworkBindingPlugins\"}]'\n
Note: The specific passt plugin has no FG of its own. It is up to the cluster admin to decide if the plugin is to be available in the cluster. The passt binding is still in evaluation; use it with care.
"},{"location":"network/net_binding_plugins/passt/#passt-registration","title":"Passt Registration","text":"As described in the registration section, passt binding plugin configuration needs to be added to the kubevirt CR.
To register the passt binding, patch the kubevirt CR as follows:
kubectl patch kubevirts -n kubevirt kubevirt --type=json -p='[{\"op\": \"add\", \"path\": \"/spec/configuration/network\", \"value\": {\n \"binding\": {\n \"passt\": {\n \"networkAttachmentDefinition\": \"default/netbindingpasst\",\n \"sidecarImage\": \"quay.io/kubevirt/network-passt-binding:20231205_29a16d5c9\",\n \"migration\": {\n \"method\": \"link-refresh\"\n }\n }\n }\n }}]'\n
The NetworkAttachmentDefinition and sidecarImage values should correspond with the names used in the previous sections, here and here.
"},{"location":"network/net_binding_plugins/passt/#vm-passt-network-interface","title":"VM Passt Network Interface","text":"Set the VM network interface binding name to reference the one defined in the kubevirt CR.
---\napiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n labels:\n kubevirt.io/vm: vm-net-binding-passt\n name: vm-net-binding-passt\nspec:\n running: true\n template:\n metadata:\n labels:\n kubevirt.io/vm: vm-net-binding-passt\n spec:\n domain:\n devices:\n disks:\n - disk:\n bus: virtio\n name: containerdisk\n - disk:\n bus: virtio\n name: cloudinitdisk\n interfaces:\n - name: passtnet\n binding:\n name: passt\n ports:\n - name: http\n port: 80\n protocol: TCP\n rng: {}\n resources:\n requests:\n memory: 1024M\n networks:\n - name: passtnet\n pod: {}\n terminationGracePeriodSeconds: 0\n volumes:\n - containerDisk:\n image: quay.io/kubevirt/fedora-with-test-tooling-container-disk:v1.1.0\n name: containerdisk\n - cloudInitNoCloud:\n networkData: |\n version: 2\n ethernets:\n eth0:\n dhcp4: true\n name: cloudinitdisk\n
"},{"location":"network/net_binding_plugins/slirp/","title":"Slirp","text":""},{"location":"network/net_binding_plugins/slirp/#overview","title":"Overview","text":"SLIRP provides user-space network connectivity.
Note: in slirp
mode, the only supported protocols are TCP and UDP. ICMP is not supported.
[v1.1.0]
The binding plugin replaces the core slirp
binding API.
Note: The network binding plugin infrastructure is in Alpha stage. Please use it with care.
The slirp binding plugin consists of the following components:
As described in the definition & flow section, the slirp plugin needs to:
Note: In order for the core slirp binding to use the network binding plugin the registered name for this binding should be slirp
.
If not already set, add the NetworkBindingPlugins
FG.
kubectl patch kubevirts -n kubevirt kubevirt --type=json -p='[{\"op\": \"add\", \"path\": \"/spec/configuration/developerConfiguration/featureGates/-\", \"value\": \"NetworkBindingPlugins\"}]'\n
Note: The specific slirp plugin has no FG of its own. It is up to the cluster admin to decide if the plugin is to be available in the cluster.
"},{"location":"network/net_binding_plugins/slirp/#slirp-registration","title":"Slirp Registration","text":"As described in the registration section, slirp binding plugin configuration needs to be added to the kubevirt CR.
To register the slirp binding, patch the kubevirt CR as follows:
kubectl patch kubevirts -n kubevirt kubevirt --type=json -p='[{\"op\": \"add\", \"path\": \"/spec/configuration/network\", \"value\": {\n \"binding\": {\n \"slirp\": {\n \"sidecarImage\": \"quay.io/kubevirt/network-slirp-binding:v1.1.0\"\n }\n }\n }}]'\n
"},{"location":"network/net_binding_plugins/slirp/#vm-slirp-network-interface","title":"VM Slirp Network Interface","text":"Set the VM network interface binding name to reference the one defined in the kubevirt CR.
---\napiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n labels:\n kubevirt.io/vm: vm-net-binding-slirp\n name: vm-net-binding-slirp\nspec:\n running: true\n template:\n metadata:\n labels:\n kubevirt.io/vm: vm-net-binding-slirp\n spec:\n domain:\n devices:\n disks:\n - disk:\n bus: virtio\n name: containerdisk\n - disk:\n bus: virtio\n name: cloudinitdisk\n interfaces:\n - name: slirpnet\n binding:\n name: slirp\n rng: {}\n resources:\n requests:\n memory: 1024M\n networks:\n - name: slirpnet\n pod: {}\n terminationGracePeriodSeconds: 0\n volumes:\n - containerDisk:\n image: quay.io/kubevirt/fedora-with-test-tooling-container-disk:v1.1.0\n name: containerdisk\n - cloudInitNoCloud:\n networkData: |\n version: 2\n ethernets:\n eth0:\n dhcp4: true\n name: cloudinitdisk\n
"},{"location":"storage/clone_api/","title":"Clone API","text":"The clone.kubevirt.io
API Group defines resources for cloning KubeVirt objects. Currently, the only supported cloning type is VirtualMachine
, but more types are planned to be supported in the future (see future roadmap below).
Please bear in mind that the clone API is in version v1alpha1
. This means that this API is not fully stable yet and that APIs may change in the future.
Under the hood, the clone API relies on the Snapshot & Restore APIs. Therefore, to use the clone API, please see the Snapshot & Restore prerequisites.
"},{"location":"storage/clone_api/#snapshot-feature-gate","title":"Snapshot Feature Gate","text":"Currently, the clone API is guarded by the Snapshot feature gate. The feature gates field in the KubeVirt CR must be expanded by adding the Snapshot
to it.
Firstly, as written above, the clone API relies upon Snapshot & Restore APIs under the hood. Therefore, it might be helpful to look at Snapshot & Restore user-guide page for more info.
"},{"location":"storage/clone_api/#virtualmachineclone-object-overview","title":"VirtualMachineClone object overview","text":"In order to initiate cloning, a VirtualMachineClone
object (CRD) needs to be created on the cluster. An example for such an object is:
kind: VirtualMachineClone\napiVersion: \"clone.kubevirt.io/v1alpha1\"\nmetadata:\n  name: testclone\n\nspec:\n  # source & target definitions\n  source:\n    apiGroup: kubevirt.io\n    kind: VirtualMachine\n    name: vm-cirros\n  target:\n    apiGroup: kubevirt.io\n    kind: VirtualMachine\n    name: vm-clone-target\n\n  # labels & annotations definitions\n  labelFilters:\n    - \"*\"\n    - \"!someKey/*\"\n  annotationFilters:\n    - \"anotherKey/*\"\n\n  # template labels & annotations definitions\n  template:\n    labelFilters:\n      - \"*\"\n      - \"!someKey/*\"\n    annotationFilters:\n      - \"anotherKey/*\"\n\n  # other identity stripping specs:\n  newMacAddresses:\n    interfaceName: \"00-11-22\"\n  newSMBiosSerial: \"new-serial\"\n
The following sections describe each of these settings.
"},{"location":"storage/clone_api/#source-target","title":"Source & Target","text":"The source and target indicate the source/target API group, kind and name. A few important notes:
Currently, the only supported kinds are VirtualMachine
(of kubevirt.io
api group) and VirtualMachineSnapshot
(of snapshot.kubevirt.io
api group), but more types are expected to be supported in the future. See \"future roadmap\" below for more info.
The target name is optional. If unspecified, the clone controller will generate a name for the target automatically.
The target and source must reside in the same namespace.
These spec fields determine which labels / annotations are copied to the target and which are stripped away.
The filters are a list of strings. Each string represents a key that may exist at the source. Every source key that matches one of these values is copied to the cloned target. In addition, special regular-expression-like characters can be used, as in the example object above.
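As an illustration of the filter syntax, the fragment below reuses the patterns from the example object. Judging from that example, a trailing * acts as a wildcard and a leading ! negates a pattern; treat this reading as an assumption and verify it against the API reference. The key names are placeholders.

```yaml
spec:
  # assumed semantics: copy every label except those whose key starts with someKey/
  labelFilters:
    - '*'
    - '!someKey/*'
  # assumed semantics: copy only annotations whose key starts with anotherKey/
  annotationFilters:
    - 'anotherKey/*'
```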
Setting label / annotation filters is optional. If unset, all labels / annotations are copied by default.
"},{"location":"storage/clone_api/#template-label-template-annotation-filters","title":"Template Label & Template Annotation filters","text":"Some network CNIs such as Kube-OVN or OVN-Kubernetes inject network information into the annotations of a VM. When cloning from such a source VM, the cloned VM will use the same network. To avoid this, you can use template label and annotation filters.
"},{"location":"storage/clone_api/#newmacaddresses","title":"newMacAddresses","text":"This field is used to explicitly replace MAC addresses for certain interfaces. The field is a string to string map; the keys represent interface names and the values represent the new MAC address for the clone target.
This field is optional. By default, all MAC addresses are stripped out. This suits clusters where kube-mac-pool is deployed, since it automatically assigns the target a fresh, valid MAC address.
"},{"location":"storage/clone_api/#newsmbiosserial","title":"newSMBiosSerial","text":"This field is used to explicitly set an SMBios serial for the target.
This field is optional. By default, the target would have an auto-generated serial that's based on the VM name.
"},{"location":"storage/clone_api/#creating-a-virtualmachineclone-object","title":"Creating a VirtualMachineClone object","text":"After the clone manifest is ready, we can create it:
kubectl create -f clone.yaml\n
To wait for a clone to complete, execute:
kubectl wait vmclone testclone --for condition=Ready\n
You can check the clone's phase in the clone's status. It can be one of the following:
SnapshotInProgress
CreatingTargetVM
RestoreInProgress
Succeeded
Failed
Unknown
After the clone is finished, the target can be inspected:
kubectl get vm vm-clone-target -o yaml\n
"},{"location":"storage/clone_api/#future-roadmap","title":"Future roadmap","text":"The clone API is in an early alpha version and may change dramatically. Many improvements and features are expected to be added; the most significant goals are:
VirtualMachineInstance
in the future. One of the great things that could be accomplished with the clone API when the source is of kind VirtualMachineSnapshot
is to create \"golden VM images\" (a.k.a. Templates / Bookmark VMs / etc). In other words, the following workflow would be available:
Create a golden image
Create a VM
Prepare a \"golden VM\" environment
This can mean different things in different contexts. For example, write files, install applications, apply configurations, or anything else.
Snapshot the VM
Delete the VM
Then, this \"golden image\" can be duplicated as many times as needed. To instantiate a VM from the snapshot:
This feature is still under discussion and may be implemented differently than explained here.
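As a rough sketch of the last step, a clone whose source is a snapshot could be declared as below. It only combines the source and target fields described earlier with the VirtualMachineSnapshot kind; the object names (clone-from-golden, golden-snapshot, vm-from-golden) are hypothetical.

```yaml
kind: VirtualMachineClone
apiVersion: clone.kubevirt.io/v1alpha1
metadata:
  name: clone-from-golden        # hypothetical name
spec:
  source:
    apiGroup: snapshot.kubevirt.io
    kind: VirtualMachineSnapshot
    name: golden-snapshot        # hypothetical snapshot of the golden VM
  target:
    apiGroup: kubevirt.io
    kind: VirtualMachine
    name: vm-from-golden         # hypothetical; optional, generated if omitted
```

Applying one such object per desired VM, each with a different target name, would duplicate the golden image as many times as needed.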
"},{"location":"storage/containerized_data_importer/","title":"Containerized Data Importer","text":"The Containerized Data Importer (CDI) project provides facilities for enabling Persistent Volume Claims (PVCs) to be used as disks for KubeVirt VMs by way of DataVolumes. The three main CDI use cases are:
This document deals with the third use case. So you should have CDI installed in your cluster, a VM disk that you'd like to upload, and virtctl in your path.
"},{"location":"storage/containerized_data_importer/#install-cdi","title":"Install CDI","text":"Install the latest CDI release here
export TAG=$(curl -s -w %{redirect_url} https://github.com/kubevirt/containerized-data-importer/releases/latest)\nexport VERSION=$(echo ${TAG##*/})\nkubectl create -f https://github.com/kubevirt/containerized-data-importer/releases/download/$VERSION/cdi-operator.yaml\nkubectl create -f https://github.com/kubevirt/containerized-data-importer/releases/download/$VERSION/cdi-cr.yaml\n
"},{"location":"storage/containerized_data_importer/#expose-cdi-uploadproxy-service","title":"Expose cdi-uploadproxy service","text":"The cdi-uploadproxy
service must be accessible from outside the cluster. Here are some ways to do that:
NodePort Service
Ingress
Route
kubectl port-forward (not recommended for production clusters)
Look here for example manifests.
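For orientation, a NodePort Service for the upload proxy might look like the sketch below. The label selector, the target port 8443 and the node port 31001 are assumptions (31001 is chosen to match the port used later in this document); check them against the referenced example manifests and your CDI deployment.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: cdi-uploadproxy-nodeport   # hypothetical name
  namespace: cdi
spec:
  type: NodePort
  selector:
    cdi.kubevirt.io: cdi-uploadproxy   # assumed label on the upload proxy pods
  ports:
    - port: 443
      targetPort: 8443   # assumed container port of the upload proxy
      nodePort: 31001
      protocol: TCP
```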
"},{"location":"storage/containerized_data_importer/#supported-image-formats","title":"Supported image formats","text":"CDI supports the raw
and qcow2
image formats which are supported by qemu. See the qemu documentation for more details. Bootable ISO images can also be used and are treated like raw
images. Images may be compressed with either the gz
or xz
format.
The example in this document uses this CirrOS image
"},{"location":"storage/containerized_data_importer/#virtctl-image-upload","title":"virtctl image-upload","text":"virtctl has an image-upload command with the following options:
virtctl image-upload --help\nUpload a VM image to a DataVolume/PersistentVolumeClaim.\n\nUsage:\n virtctl image-upload [flags]\n\nExamples:\n # Upload a local disk image to a newly created DataVolume:\n virtctl image-upload dv dv-name --size=10Gi --image-path=/images/fedora30.qcow2\n\n # Upload a local disk image to an existing DataVolume\n virtctl image-upload dv dv-name --no-create --image-path=/images/fedora30.qcow2\n\n # Upload a local disk image to an existing PersistentVolumeClaim\n virtctl image-upload pvc pvc-name --image-path=/images/fedora30.qcow2\n\n # Upload to a DataVolume with explicit URL to CDI Upload Proxy\n virtctl image-upload dv dv-name --uploadproxy-url=https://cdi-uploadproxy.mycluster.com --image-path=/images/fedora30.qcow2\n\nFlags:\n --access-mode string The access mode for the PVC. (default \"ReadWriteOnce\")\n --block-volume Create a PVC with VolumeMode=Block (default Filesystem).\n -h, --help help for image-upload\n --image-path string Path to the local VM image.\n --insecure Allow insecure server connections when using HTTPS.\n --no-create Don't attempt to create a new DataVolume/PVC.\n --pvc-name string DEPRECATED - The destination DataVolume/PVC name.\n --pvc-size string DEPRECATED - The size of the PVC to create (ex. 10Gi, 500Mi).\n --size string The size of the DataVolume to create (ex. 10Gi, 500Mi).\n --storage-class string The storage class for the PVC.\n --uploadproxy-url string The URL of the cdi-upload proxy service.\n --wait-secs uint Seconds to wait for upload pod to start. (default 60)\n\nUse \"virtctl options\" for a list of global command-line options (applies to all commands).\n
virtctl image-upload
works by creating a DataVolume of the requested size, sending an UploadTokenRequest
to the cdi-apiserver
, and uploading the file to the cdi-uploadproxy
.
virtctl image-upload dv cirros-vm-disk --size=500Mi --image-path=/home/mhenriks/images/cirros-0.4.0-x86_64-disk.img --uploadproxy-url=<url to upload proxy service>\n
"},{"location":"storage/containerized_data_importer/#addressing-certificate-issues-when-uploading-images","title":"Addressing Certificate Issues when Uploading Images","text":"Issues with the certificates can be circumvented by using the --insecure
flag to prevent the virtctl command from verifying the remote host. It is better to resolve certificate issues that prevent uploading images using the virtctl image-upload
command and not use the --insecure
flag.
The following are some common issues with certificates and some easy ways to fix them.
"},{"location":"storage/containerized_data_importer/#does-not-contain-any-ip-sans","title":"Does not contain any IP SANs","text":"This issue happens when trying to upload images using an IP address instead of a resolvable name. For example, trying to upload to the IP address 192.168.39.32 at port 31001 would produce the following error.
virtctl image-upload dv f33 \\\n --size 5Gi \\\n --image-path Fedora-Cloud-Base-33-1.2.x86_64.raw.xz \\\n --uploadproxy-url https://192.168.39.32:31001\n\nPVC default/f33 not found \nDataVolume default/f33 created\nWaiting for PVC f33 upload pod to be ready...\nPod now ready\nUploading data to https://192.168.39.32:31001\n\n 0 B / 193.89 MiB [-------------------------------------------------------] 0.00% 0s\n\nPost https://192.168.39.32:31001/v1beta1/upload: x509: cannot validate certificate for 192.168.39.32 because it doesn't contain any IP SANs\n
This is easily fixed by adding an entry to your local name resolution service. This could be a DNS server or the local hosts file. The upload proxy URL should then be changed to use the resolvable name.
The Subject
and the Subject Alternative Name
in the certificate contain valid names that can be used for resolution. Only one of these names needs to be resolvable. Use the openssl
command to view the names of the cdi-uploadproxy service.
echo | openssl s_client -showcerts -connect 192.168.39.32:31001 2>/dev/null \\\n | openssl x509 -inform pem -noout -text \\\n | sed -n -e '/Subject.*CN/p' -e '/Subject Alternative/{N;p}'\n\n Subject: CN = cdi-uploadproxy\n X509v3 Subject Alternative Name: \n DNS:cdi-uploadproxy, DNS:cdi-uploadproxy.cdi, DNS:cdi-uploadproxy.cdi.svc\n
Adding the following entry to the /etc/hosts file, if it provides name resolution, should fix this issue. Any service that provides name resolution for the system could be used.
echo \"192.168.39.32 cdi-uploadproxy\" >> /etc/hosts\n
The upload should now work.
virtctl image-upload dv f33 \\\n --size 5Gi \\\n --image-path Fedora-Cloud-Base-33-1.2.x86_64.raw.xz \\\n --uploadproxy-url https://cdi-uploadproxy:31001\n\nPVC default/f33 not found \nDataVolume default/f33 created\nWaiting for PVC f33 upload pod to be ready...\nPod now ready\nUploading data to https://cdi-uploadproxy:31001\n\n 193.89 MiB / 193.89 MiB [=============================================] 100.00% 1m38s\n\nUploading data completed successfully, waiting for processing to complete, you can hit ctrl-c without interrupting the progress\nProcessing completed successfully\nUploading Fedora-Cloud-Base-33-1.2.x86_64.raw.xz completed successfully\n
"},{"location":"storage/containerized_data_importer/#certificate-signed-by-unknown-authority","title":"Certificate Signed by Unknown Authority","text":"This happens because the cdi-uploadproxy certificate is self signed and the system does not trust the cdi-uploadproxy as a Certificate Authority.
virtctl image-upload dv f33 \\\n --size 5Gi \\\n --image-path Fedora-Cloud-Base-33-1.2.x86_64.raw.xz \\\n --uploadproxy-url https://cdi-uploadproxy:31001\n\nPVC default/f33 not found \nDataVolume default/f33 created\nWaiting for PVC f33 upload pod to be ready...\nPod now ready\nUploading data to https://cdi-uploadproxy:31001\n\n 0 B / 193.89 MiB [-------------------------------------------------------] 0.00% 0s\n\nPost https://cdi-uploadproxy:31001/v1beta1/upload: x509: certificate signed by unknown authority\n
This can be fixed by adding the certificate to the system's trust store. Download the cdi-uploadproxy-server-cert.
kubectl get secret -n cdi cdi-uploadproxy-server-cert \\\n -o jsonpath=\"{.data['tls\\.crt']}\" \\\n | base64 -d > cdi-uploadproxy-server-cert.crt\n
Add this certificate to the system's trust store. On Fedora, this can be done as follows.
sudo cp cdi-uploadproxy-server-cert.crt /etc/pki/ca-trust/source/anchors\n\nsudo update-ca-trust\n
The upload should now work.
virtctl image-upload dv f33 \\\n --size 5Gi \\\n --image-path Fedora-Cloud-Base-33-1.2.x86_64.raw.xz \\\n --uploadproxy-url https://cdi-uploadproxy:31001\n\nPVC default/f33 not found \nDataVolume default/f33 created\nWaiting for PVC f33 upload pod to be ready...\nPod now ready\nUploading data to https://cdi-uploadproxy:31001\n\n 193.89 MiB / 193.89 MiB [=============================================] 100.00% 1m36s\n\nUploading data completed successfully, waiting for processing to complete, you can hit ctrl-c without interrupting the progress\nProcessing completed successfully\nUploading Fedora-Cloud-Base-33-1.2.x86_64.raw.xz completed successfully\n
"},{"location":"storage/containerized_data_importer/#setting-the-url-of-the-cdi-upload-proxy-service","title":"Setting the URL of the cdi-upload Proxy Service","text":"Setting the URL for the cdi-upload proxy service allows the virtctl image-upload
command to upload the images without specifying the --uploadproxy-url
flag. Permanently setting the URL is done by patching the CDI configuration.
The following will set the default upload proxy to use port 31001 of cdi-uploadproxy. An IP address could also be used instead of the DNS name.
See the section Addressing Certificate Issues when Uploading for why cdi-uploadproxy was chosen and issues that can be encountered when using an IP address.
kubectl patch cdi cdi \\\n --type merge \\\n --patch '{\"spec\":{\"config\":{\"uploadProxyURLOverride\":\"https://cdi-uploadproxy:31001\"}}}'\n
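For reference, the merge patch above should leave the CDI resource looking roughly like the fragment below. This is a declarative view of the same setting, assuming the CDI CR is named cdi and served at cdi.kubevirt.io/v1beta1.

```yaml
apiVersion: cdi.kubevirt.io/v1beta1   # assumed API version of the CDI CR
kind: CDI
metadata:
  name: cdi
spec:
  config:
    # same value as in the kubectl patch command
    uploadProxyURLOverride: https://cdi-uploadproxy:31001
```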
"},{"location":"storage/containerized_data_importer/#create-a-virtualmachineinstance","title":"Create a VirtualMachineInstance","text":"To create a VirtualMachineInstance
from a DataVolume, you can execute the following:
cat <<EOF | kubectl apply -f -\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n name: cirros-vm\nspec:\n domain:\n devices:\n disks:\n - disk:\n bus: virtio\n name: dvdisk\n machine:\n type: \"\"\n resources:\n requests:\n memory: 64M\n terminationGracePeriodSeconds: 0\n volumes:\n - name: dvdisk\n dataVolume:\n name: cirros-vm-disk\nstatus: {}\nEOF\n
"},{"location":"storage/containerized_data_importer/#connect-to-virtualmachineinstance-console","title":"Connect to VirtualMachineInstance console","text":"Use virtctl
to connect to the newly created VirtualMachineInstance
.
virtctl console cirros-vm\n
"},{"location":"storage/disks_and_volumes/","title":"Filesystems, Disks and Volumes","text":"Making persistent storage in the cluster (volumes) accessible to VMs consists of three parts. First, volumes are specified in spec.volumes
. Second, disks are added to the VM by specifying them in spec.domain.devices.disks
. Finally, a reference to the specified volume is added to the disk specification by name.
Like all other VMI devices, a spec.domain.devices.disks
element has a mandatory name
, and furthermore, the disk's name
must reference the name
of a volume inside spec.volumes
.
A disk can be made accessible via four different types:
lun
disk
cdrom
filesystems
All possible configuration options are available in the Disk API Reference.
All types allow you to specify the bus
attribute. The bus
attribute determines how the disk will be presented to the guest operating system.
A lun
disk will expose the volume as a LUN device to the VM. This allows the VM to pass arbitrary iSCSI commands through to the device.
A minimal example which attaches a PersistentVolumeClaim
named mypvc
as a lun
device to the VM:
metadata:\n  name: testvmi-lun\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nspec:\n  domain:\n    resources:\n      requests:\n        memory: 64M\n    devices:\n      disks:\n        - name: mypvcdisk\n          # This makes it a lun device\n          lun: {}\n  volumes:\n    - name: mypvcdisk\n      persistentVolumeClaim:\n        claimName: mypvc\n
"},{"location":"storage/disks_and_volumes/#persistent-reservation","title":"persistent reservation","text":"It is possible to reserve a LUN through the SCSI Persistent Reserve commands. In order to issue privileged SCSI ioctls, the VM requires activation of the persistent reservation flag:
devices:\n  disks:\n    - name: mypvcdisk\n      lun:\n        reservation: true\n
This feature is enabled by the feature gate PersistentReservation
:
configuration:\n  developerConfiguration:\n    featureGates:\n      - PersistentReservation\n
Note: The persistent reservation feature enables an additional privileged component to be deployed together with virt-handler. Because this feature allows for sensitive security procedures, it is disabled by default and requires cluster administrator configuration.
"},{"location":"storage/disks_and_volumes/#disk","title":"disk","text":"A disk
disk will expose the volume as an ordinary disk to the VM.
A minimal example which attaches a PersistentVolumeClaim
named mypvc
as a disk
device to the VM:
metadata:\n name: testvmi-disk\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nspec:\n domain:\n resources:\n requests:\n memory: 64M\n devices:\n disks:\n - name: mypvcdisk\n # This makes it a disk\n disk: {}\n volumes:\n - name: mypvcdisk\n persistentVolumeClaim:\n claimName: mypvc\n
You can set the disk bus
type, overriding the defaults, which in turn depend on the chipset the VM is configured to use:
metadata:\n name: testvmi-disk\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nspec:\n domain:\n resources:\n requests:\n memory: 64M\n devices:\n disks:\n - name: mypvcdisk\n # This makes it a disk\n disk:\n # This makes it exposed as /dev/vda, being the only and thus first\n # disk attached to the VM\n bus: virtio\n volumes:\n - name: mypvcdisk\n persistentVolumeClaim:\n claimName: mypvc\n
"},{"location":"storage/disks_and_volumes/#cdrom","title":"cdrom","text":"A cdrom
disk will expose the volume as a cdrom drive to the VM. It is read-only by default.
A minimal example which attaches a PersistentVolumeClaim
named mypvc
as a cdrom
device to the VM:
metadata:\n name: testvmi-cdrom\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nspec:\n domain:\n resources:\n requests:\n memory: 64M\n devices:\n disks:\n - name: mypvcdisk\n # This makes it a cdrom\n cdrom:\n # This makes the cdrom writeable\n readOnly: false\n # This makes the cdrom be exposed as SATA device\n bus: sata\n volumes:\n - name: mypvcdisk\n persistentVolumeClaim:\n claimName: mypvc\n
"},{"location":"storage/disks_and_volumes/#filesystems","title":"filesystems","text":"A filesystem
device will expose the volume as a filesystem to the VM. filesystems
rely on virtiofs
to make visible external filesystems to KubeVirt
VMs. Further information about virtiofs
can be found at the Official Virtiofs Site.
Compared with disk
, filesystems
allow changes in the source to be dynamically reflected in the volumes inside the VM. For instance, if a given configMap
is shared with filesystems
any change made on it will be reflected in the VMs. However, it is important to note that filesystems
do not allow live migration.
Additionally, filesystem
devices must be mounted inside the VM. This can be done through cloudInitNoCloud or by connecting to the VM shell manually and running the same command. The main challenge is knowing the device tag that identifies the new filesystem, so that it can be mounted with the mount -t virtiofs [device tag] [path]
command. The tag is the name assigned to the filesystem in the VM spec, spec.domain.devices.filesystems.name
. For instance, if a given VM spec sets spec.domain.devices.filesystems.name: foo
, the required command inside the VM to mount the filesystem in the /tmp/foo
path will be mount -t virtiofs foo /tmp/foo
:
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n name: testvmi-filesystems\nspec:\n domain:\n devices:\n filesystems:\n - name: foo\n virtiofs: {}\n disks:\n - name: containerdisk\n disk:\n bus: virtio\n - name: cloudinitdisk\n disk:\n bus: virtio\n volumes:\n - containerDisk:\n image: quay.io/containerdisks/fedora:latest\n name: containerdisk \n - cloudInitNoCloud:\n userData: |-\n #cloud-config\n chpasswd:\n expire: false\n password: fedora\n user: fedora\n bootcmd:\n - \"sudo mkdir /tmp/foo\"\n - \"sudo mount -t virtiofs foo /tmp/foo\"\n - persistentVolumeClaim:\n claimName: mypvc\n name: foo\n
Note: As stated, filesystems
rely on virtiofs
. Moreover, virtiofs
requires Linux kernel support in the VM. To check whether the VM's Linux image has the required support, run the following command: modprobe virtiofs
. If the command output is modprobe: FATAL: Module virtiofs not found
, the Linux image of the VM does not support virtiofs. You can also check that the kernel version is at least 5.4 on any Linux distribution, or at least 4.18 on CentOS/RHEL. To check this, run the following command: uname -r
.
Refer to section Sharing Directories with VMs for usage examples of filesystems
.
The error policy controls how the hypervisor should behave when an IO error occurs on a disk read or write. The default behaviour is to stop the guest and generate a Kubernetes event. However, it is possible to change the value to either:
report
: the error is reported in the guest
ignore
: the error is ignored, but the read/write failure goes undetected
enospace
: error when there isn't enough space on the disk
The error policy can be specified per disk or lun.
Example:
spec:\n  domain:\n    devices:\n      disks:\n        - disk:\n            bus: virtio\n          name: containerdisk\n          errorPolicy: \"report\"\n        - lun:\n            bus: scsi\n          name: scsi-disk\n          errorPolicy: \"report\"\n
"},{"location":"storage/disks_and_volumes/#volumes","title":"Volumes","text":"Supported volume sources are
cloudInitNoCloud
cloudInitConfigDrive
persistentVolumeClaim
dataVolume
ephemeral
containerDisk
emptyDisk
hostDisk
configMap
secret
serviceAccount
downwardMetrics
All possible configuration options are available in the Volume API Reference.
"},{"location":"storage/disks_and_volumes/#cloudinitnocloud","title":"cloudInitNoCloud","text":"Allows attaching cloudInitNoCloud
data-sources to the VM. If the VM contains a proper cloud-init setup, it will pick up the disk as a user-data source.
A simple example which attaches a Secret
as a cloud-init disk
datasource may look like this:
metadata:\n name: testvmi-cloudinitnocloud\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nspec:\n domain:\n resources:\n requests:\n memory: 64M\n devices:\n disks:\n - name: mybootdisk\n lun: {}\n - name: mynoclouddisk\n disk: {}\n volumes:\n - name: mybootdisk\n persistentVolumeClaim:\n claimName: mypvc\n - name: mynoclouddisk\n cloudInitNoCloud:\n secretRef:\n name: testsecret\n
"},{"location":"storage/disks_and_volumes/#cloudinitconfigdrive","title":"cloudInitConfigDrive","text":"Allows attaching cloudInitConfigDrive
data-sources to the VM. If the VM contains a proper cloud-init setup, it will pick up the disk as a user-data source.
A simple example which attaches a Secret
as a cloud-init disk
datasource may look like this:
metadata:\n name: testvmi-cloudinitconfigdrive\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nspec:\n domain:\n resources:\n requests:\n memory: 64M\n devices:\n disks:\n - name: mybootdisk\n lun: {}\n - name: myconfigdrivedisk\n disk: {}\n volumes:\n - name: mybootdisk\n persistentVolumeClaim:\n claimName: mypvc\n - name: myconfigdrivedisk\n cloudInitConfigDrive:\n secretRef:\n name: testsecret\n
The cloudInitConfigDrive
can also be used to configure VMs with Ignition. You just need to replace the cloud-init data by the Ignition data.
Allows connecting a PersistentVolumeClaim
to a VM disk.
Use a PersistentVolumeClaim when the VirtualMachineInstance's disk needs to persist after the VM terminates. This allows for the VM's data to remain persistent between restarts.
A PersistentVolume
can be in \"filesystem\" or \"block\" mode:
Filesystem: For KubeVirt to be able to consume the disk present on a PersistentVolume's filesystem, the disk must be named disk.img
and be placed in the root path of the filesystem. Currently the disk is also required to be in raw format. Important: The disk.img
image file needs to be owned by the user-id 107
in order to avoid permission issues.
Note: If the disk.img
image file has not been created manually before starting a VM then it will be created automatically with the PersistentVolumeClaim
size. Since not every storage provisioner provides volumes with the exact usable amount of space as requested (e.g. due to filesystem overhead), KubeVirt tolerates up to 10% less available space. This can be configured with the developerConfiguration.pvcTolerateLessSpaceUpToPercent
value in the KubeVirt CR (kubectl edit kubevirt kubevirt -n kubevirt
).
Block: Use a block volume for consuming raw block devices. Note: you need to enable the BlockVolume
feature gate.
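To make the Filesystem note above concrete, the tolerance could be lowered with a KubeVirt CR fragment along these lines. This is a sketch only: the field path follows the developerConfiguration.pvcTolerateLessSpaceUpToPercent name given above, and the value 5 is an arbitrary illustration.

```yaml
spec:
  configuration:
    developerConfiguration:
      # accept volumes that provide up to 5% less space than requested
      # (illustrative value; the default mentioned above is 10)
      pvcTolerateLessSpaceUpToPercent: 5
```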
A simple example which attaches a PersistentVolumeClaim
as a disk
may look like this:
metadata:\n  name: testvmi-pvc\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nspec:\n  domain:\n    resources:\n      requests:\n        memory: 64M\n    devices:\n      disks:\n        - name: mypvcdisk\n          lun: {}\n  volumes:\n    - name: mypvcdisk\n      persistentVolumeClaim:\n        claimName: mypvc\n
"},{"location":"storage/disks_and_volumes/#thick-and-thin-volume-provisioning","title":"Thick and thin volume provisioning","text":"Sparsification can make a disk thin-provisioned; in other words, it converts freed space within the disk image back into free space on the host. The fstrim utility can be used on a mounted filesystem to discard the blocks not used by the filesystem. In order to be able to sparsify a disk inside the guest, the disk needs to be configured in the libvirt xml with the option discard=unmap
. In KubeVirt, every disk is passed with this option enabled by default. It is possible to check whether the trim configuration is supported in the guest by running lsblk -D
, and checking the discard options supported on each disk.
Example:
$ lsblk -D\nNAME DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO\nloop0 0 4K 4G 0\nloop1 0 64K 4M 0\nsr0 0 0B 0B 0\nrbd0 0 64K 4M 0\nvda 512 512B 2G 0\n\u2514\u2500vda1 0 512B 2G 0\n
However, in certain cases, such as preallocation or when the disk is thick provisioned, the option needs to be disabled. The disk's PVC has to be marked with an annotation that contains /storage.preallocation
or /storage.thick-provisioned
, and set to true. If the volume is preprovisioned using CDI and the preallocation is enabled, then the PVC is automatically annotated with: cdi.kubevirt.io/storage.preallocation: true
and the discard passthrough option is disabled.
Example of a PVC definition with the annotation to disable discard passthrough:
apiVersion: v1\nkind: PersistentVolumeClaim\nmetadata:\n  name: pvc\n  annotations:\n    user.custom.annotation/storage.thick-provisioned: \"true\"\nspec:\n  storageClassName: local\n  accessModes:\n    - ReadWriteOnce\n  volumeMode: Filesystem\n  resources:\n    requests:\n      storage: 1Gi\n
"},{"location":"storage/disks_and_volumes/#disk-expansion","title":"disk expansion","text":"For some storage methods, Kubernetes may support expanding storage in-use (allowVolumeExpansion feature). KubeVirt can respond to it by making the additional storage available for the virtual machines. This feature is currently off by default, and requires enabling a feature gate. To enable it, add the ExpandDisks feature gate in the kubevirt object:
spec:\n  configuration:\n    developerConfiguration:\n      featureGates:\n        - ExpandDisks\n
Enabling this feature does two things:
The virtual machine is notified about size changes.
If the disk is a Filesystem PVC, the matching file is expanded to the remaining size (while reserving some space for file system overhead).
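On the Kubernetes side, expansion is only possible when the PVC's storage class allows it. A minimal sketch of such a class is shown below; the class name and provisioner are hypothetical placeholders, and only allowVolumeExpansion is the setting referred to above.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: expandable-storage          # hypothetical name
provisioner: example.com/csi-driver # hypothetical provisioner
allowVolumeExpansion: true          # lets bound PVCs request a larger size
```

With such a class in place, raising spec.resources.requests.storage on the PVC asks the storage backend to grow the volume, and KubeVirt can then make the extra space available as described above.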
"},{"location":"storage/disks_and_volumes/#statically-provisioned-block-pvcs","title":"Statically provisioned block PVCs","text":"To use an externally managed local block device from a host ( e.g. /dev/sdb , zvol, LVM, etc... ) in a VM directly, you would need a provisioner that supports block devices, such as OpenEBS LocalPV.
Alternatively, local volumes can be provisioned by hand. For example, the following PVC:
apiVersion: v1\nkind: PersistentVolumeClaim\nmetadata:\n  name: myblock\nspec:\n  storageClassName: local-device\n  volumeMode: Block\n  accessModes:\n    - ReadWriteOnce\n  resources:\n    requests:\n      storage: 100Gi\n
can claim a PersistentVolume pre-created by a cluster admin like so:
apiVersion: storage.k8s.io/v1\nkind: StorageClass\nmetadata:\n  name: local-device\nprovisioner: kubernetes.io/no-provisioner\n---\napiVersion: v1\nkind: PersistentVolume\nmetadata:\n  name: myblock\nspec:\n  volumeMode: Block\n  storageClassName: local-device\n  nodeAffinity:\n    required:\n      nodeSelectorTerms:\n        - matchExpressions:\n            - key: kubernetes.io/hostname\n              operator: In\n              values:\n                - my-node\n  accessModes:\n    - ReadWriteOnce\n  capacity:\n    storage: 100Gi\n  local:\n    path: /dev/sdb\n
"},{"location":"storage/disks_and_volumes/#datavolume","title":"dataVolume","text":"DataVolumes are a way to automate importing virtual machine disks onto PVCs during the virtual machine's launch flow. Without using a DataVolume, users have to prepare a PVC with a disk image before assigning it to a VM or VMI manifest. With a DataVolume, both the PVC creation and the import are automated on behalf of the user.
"},{"location":"storage/disks_and_volumes/#datavolume-vm-behavior","title":"DataVolume VM Behavior","text":"DataVolumes can be defined in the VM spec directly by adding the DataVolumes to the dataVolumeTemplates
list. Below is an example.
apiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n labels:\n kubevirt.io/vm: vm-alpine-datavolume\n name: vm-alpine-datavolume\nspec:\n running: false\n template:\n metadata:\n labels:\n kubevirt.io/vm: vm-alpine-datavolume\n spec:\n domain:\n devices:\n disks:\n - disk:\n bus: virtio\n name: datavolumedisk1\n resources:\n requests:\n memory: 64M\n volumes:\n - dataVolume:\n name: alpine-dv\n name: datavolumedisk1\n dataVolumeTemplates:\n - metadata:\n name: alpine-dv\n spec:\n storage:\n resources:\n requests:\n storage: 2Gi\n source:\n http:\n url: http://cdi-http-import-server.kubevirt/images/alpine.iso\n
You can see that the DataVolume defined in the dataVolumeTemplates section has two parts: the source and the PVC spec (given under storage in the example above).
The source part declares that there is a disk image living on an http server that we want to use as a volume for this VM. The PVC spec part declares the spec that should be used to create the PVC that hosts the source data.
When this VM manifest is posted to the cluster, as part of the launch flow a PVC will be created using the spec provided and the source data will be automatically imported into that PVC before the VM starts. When the VM is deleted, the storage provisioned by the DataVolume will automatically be deleted as well.
"},{"location":"storage/disks_and_volumes/#datavolume-vmi-behavior","title":"DataVolume VMI Behavior","text":"For a VMI object, DataVolumes can be referenced as a volume source for the VMI. When this is done, it is expected that the referenced DataVolume exists in the cluster. The VMI will consume the DataVolume, but the DataVolume's life-cycle will not be tied to the VMI.
Below is an example of a DataVolume being referenced by a VMI. It is expected that the DataVolume alpine-datavolume was created prior to posting the VMI manifest to the cluster. It is okay to post the VMI manifest to the cluster while the DataVolume is still having data imported. KubeVirt knows not to start the VMI until all referenced DataVolumes have finished their clone and import phases.
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n labels:\n special: vmi-alpine-datavolume\n name: vmi-alpine-datavolume\nspec:\n domain:\n devices:\n disks:\n - disk:\n bus: virtio\n name: disk1\n machine:\n type: \"\"\n resources:\n requests:\n memory: 64M\n terminationGracePeriodSeconds: 0\n volumes:\n - name: disk1\n dataVolume:\n name: alpine-datavolume\n
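The VMI above expects a DataVolume named alpine-datavolume to exist already. A minimal sketch of such an object, assuming CDI is installed and serves the cdi.kubevirt.io/v1beta1 API; the image URL and the 2Gi size are borrowed from the VM example earlier and are assumptions here, not requirements:
apiVersion: cdi.kubevirt.io/v1beta1\nkind: DataVolume\nmetadata:\n  name: alpine-datavolume\nspec:\n  source:\n    http:\n      # assumed image location, reused from the VM example\n      url: http://cdi-http-import-server.kubevirt/images/alpine.iso\n  storage:\n    resources:\n      requests:\n        storage: 2Gi\n
Because this DataVolume is created on its own rather than through dataVolumeTemplates, deleting the VMI leaves it and its PVC in place.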
"},{"location":"storage/disks_and_volumes/#enabling-datavolume-support","title":"Enabling DataVolume support.","text":"A DataVolume is a custom resource provided by the Containerized Data Importer (CDI) project. KubeVirt integrates with CDI in order to provide users a workflow for dynamically creating PVCs and importing data into those PVCs.
In order to take advantage of the DataVolume volume source on a VM or VMI, CDI must be installed.
Installing CDI
Go to the CDI release page
Pick the latest stable release and post the corresponding cdi-controller-deployment.yaml manifest to your cluster.
"},{"location":"storage/disks_and_volumes/#ephemeral","title":"ephemeral","text":"An ephemeral volume is a local COW (copy on write) image that uses a network volume as a read-only backing store. With an ephemeral volume, the network backing store is never mutated. Instead all writes are stored on the ephemeral image which exists on local storage. KubeVirt dynamically generates the ephemeral images associated with a VM when the VM starts, and discards the ephemeral images when the VM stops.
Ephemeral volumes are useful in any scenario where disk persistence is not desired. The COW image is discarded when the VM reaches a final state (e.g., succeeded, failed).
Currently, only PersistentVolumeClaim
may be used as a backing store of the ephemeral volume.
Up-to-date information on supported backing stores can be found in the KubeVirt API.
metadata:\n name: testvmi-ephemeral-pvc\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nspec:\n domain:\n resources:\n requests:\n memory: 64M\n devices:\n disks:\n - name: mypvcdisk\n lun: {}\n volumes:\n - name: mypvcdisk\n ephemeral:\n persistentVolumeClaim:\n claimName: mypvc\n
"},{"location":"storage/disks_and_volumes/#containerdisk","title":"containerDisk","text":"containerDisk was originally named registryDisk; update your code where needed.
The containerDisk
feature provides the ability to store and distribute VM disks in the container image registry. containerDisks
can be assigned to VMs in the disks section of the VirtualMachineInstance spec.
No network shared storage devices are utilized by containerDisks
. The disks are pulled from the container registry and reside on the local node hosting the VMs that consume the disks.
containerDisks
are ephemeral storage devices that can be assigned to any number of active VirtualMachineInstances. This makes them an ideal tool for users who want to replicate a large number of VM workloads that do not require persistent data. containerDisks
are commonly used in conjunction with VirtualMachineInstanceReplicaSets.
containerDisks
are not a good solution for any workload that requires persistent root disks across VM restarts.
Users can inject a VirtualMachineInstance disk into a container image in a way that is consumable by the KubeVirt runtime. Disks must be placed into the /disk
directory inside the container. Raw and qcow2 formats are supported. Qcow2 is recommended in order to reduce the container image's size. containerDisks
can and should be based on scratch
. No content except the image is required.
Note: Prior to KubeVirt 0.20, the containerDisk image needed to have kubevirt/container-disk-v1alpha as its base image.
Note: The containerDisk needs to be readable by the user with UID 107 (qemu).
Example: Inject a local VirtualMachineInstance disk into a container image.
cat << END > Dockerfile\nFROM scratch\nADD --chown=107:107 fedora25.qcow2 /disk/\nEND\n\ndocker build -t vmidisks/fedora25:latest .\n
Example: Inject a remote VirtualMachineInstance disk into a container image.
cat << END > Dockerfile\nFROM scratch\nADD --chown=107:107 https://cloud.centos.org/centos/7/images/CentOS-7-x86_64-GenericCloud.qcow2 /disk/\nEND\n
Example: Upload the ContainerDisk container image to a registry.
docker push vmidisks/fedora25:latest\n
Example: Attach the ContainerDisk as an ephemeral disk to a VM.
metadata:\n name: testvmi-containerdisk\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nspec:\n domain:\n resources:\n requests:\n memory: 64M\n devices:\n disks:\n - name: containerdisk\n disk: {}\n volumes:\n - name: containerdisk\n containerDisk:\n image: vmidisks/fedora25:latest\n
Note that a containerDisk
is file-based and therefore cannot be attached as a lun
device to the VM.
A containerDisk can also store disk images in any directory when required. The process is the same as described above. The main difference is that KubeVirt does not scan a custom location for images, so you must provide the full path to the disk image. Providing an image path
is optional. When no path
is provided, KubeVirt searches for disk images in the default location: /disk
.
Example: Build container disk image:
cat << END > Dockerfile\nFROM scratch\nADD fedora25.qcow2 /custom-disk-path/fedora25.qcow2\nEND\n\ndocker build -t vmidisks/fedora25:latest .\ndocker push vmidisks/fedora25:latest\n
Create VMI with container disk pointing to the custom location:
metadata:\n name: testvmi-containerdisk\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nspec:\n domain:\n resources:\n requests:\n memory: 64M\n devices:\n disks:\n - name: containerdisk\n disk: {}\n volumes:\n - name: containerdisk\n containerDisk:\n image: vmidisks/fedora25:latest\n path: /custom-disk-path/fedora25.qcow2\n
"},{"location":"storage/disks_and_volumes/#emptydisk","title":"emptyDisk","text":"An emptyDisk
works similarly to an emptyDir
in Kubernetes. An extra sparse qcow2
disk will be allocated and will live as long as the VM. Thus it will survive guest-side VM reboots, but not a VM re-creation. The disk capacity
needs to be specified.
Example: Boot cirros with an extra emptyDisk
with a size of 2GiB
:
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n name: testvmi-nocloud\nspec:\n terminationGracePeriodSeconds: 5\n domain:\n resources:\n requests:\n memory: 64M\n devices:\n disks:\n - name: containerdisk\n disk:\n bus: virtio\n - name: emptydisk\n disk:\n bus: virtio\n volumes:\n - name: containerdisk\n containerDisk:\n image: kubevirt/cirros-registry-disk-demo:latest\n - name: emptydisk\n emptyDisk:\n capacity: 2Gi\n
"},{"location":"storage/disks_and_volumes/#when-to-use-an-emptydisk","title":"When to use an emptyDisk","text":"Ephemeral VMs very often come with read-only root images and limited tmpfs space. In many cases this is not enough to install application dependencies and provide enough disk space for the application data. While this data is not critical and thus can be lost, it is still needed for the application to function properly during its lifetime. This is where an emptyDisk
can be useful. An emptyDisk is often used and mounted somewhere in /var/lib
or /var/run
.
A hostDisk
volume type provides the ability to create or use a disk image located somewhere on a node. It works similarly to a hostPath
in Kubernetes and provides two usage types:
DiskOrCreate
if a disk image does not exist at a given location then create one
Disk
a disk image must exist at a given location
Note: you need to enable the HostDisk feature gate.
Example: Create a 1Gi disk image located at /data/disk.img and attach it to a VM.
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n labels:\n special: vmi-host-disk\n name: vmi-host-disk\nspec:\n domain:\n devices:\n disks:\n - disk:\n bus: virtio\n name: host-disk\n machine:\n type: \"\"\n resources:\n requests:\n memory: 64M\n terminationGracePeriodSeconds: 0\n volumes:\n - hostDisk:\n capacity: 1Gi\n path: /data/disk.img\n type: DiskOrCreate\n name: host-disk\nstatus: {}\n
Note: This does not always work as expected. Instead, you may want to consider creating a PersistentVolume.
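The HostDisk feature gate mentioned above is typically enabled through the KubeVirt custom resource. A minimal sketch, assuming the resource is named kubevirt and lives in the kubevirt namespace (both depend on how KubeVirt was installed):
apiVersion: kubevirt.io/v1\nkind: KubeVirt\nmetadata:\n  # assumed name and namespace; adjust to your installation\n  name: kubevirt\n  namespace: kubevirt\nspec:\n  configuration:\n    developerConfiguration:\n      featureGates:\n        - HostDisk\n
Other feature gates named in this document, such as DownwardMetrics, would be added to the same featureGates list.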
"},{"location":"storage/disks_and_volumes/#configmap","title":"configMap","text":"A configMap
is a reference to a ConfigMap in Kubernetes. A configMap
can be presented to the VM as disks or as a filesystem. Each method is described in the following sections and both have some advantages and disadvantages, e.g. disk
does not support dynamic change propagation and filesystem
does not support live migration. Therefore, depending on the use-case, one or the other may be more suitable.
By using disk, an extra iso
disk will be allocated which has to be mounted on a VM. To mount the configMap
users can use cloudInit
and the disk's serial number. The name
must be set to reference the created Kubernetes ConfigMap
.
Note: Currently, ConfigMap updates are not propagated into the VMI. If a ConfigMap is updated, only a pod will be aware of changes, not running VMIs.
Note: Due to a Kubernetes CRD issue, you cannot control the paths within the volume where ConfigMap keys are projected.
Example: Attach the configMap
to a VM and use cloudInit
to mount the iso
disk:
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n labels:\n special: vmi-fedora\n name: vmi-fedora\nspec:\n domain:\n devices:\n disks:\n - disk:\n bus: virtio\n name: containerdisk\n - disk:\n bus: virtio\n name: cloudinitdisk\n - disk:\n name: app-config-disk\n # set serial\n serial: CVLY623300HK240D\n machine:\n type: \"\"\n resources:\n requests:\n memory: 1024M\n terminationGracePeriodSeconds: 0\n volumes:\n - name: containerdisk\n containerDisk:\n image: kubevirt/fedora-cloud-container-disk-demo:latest\n - cloudInitNoCloud:\n userData: |-\n #cloud-config\n password: fedora\n chpasswd: { expire: False }\n bootcmd:\n # mount the ConfigMap\n - \"sudo mkdir /mnt/app-config\"\n - \"sudo mount /dev/$(lsblk --nodeps -no name,serial | grep CVLY623300HK240D | cut -f1 -d' ') /mnt/app-config\"\n name: cloudinitdisk\n - configMap:\n name: app-config\n name: app-config-disk\nstatus: {}\n
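The example above assumes that a ConfigMap named app-config already exists in the same namespace as the VMI. A minimal sketch of one; the keys and values are hypothetical placeholders:
apiVersion: v1\nkind: ConfigMap\nmetadata:\n  name: app-config\ndata:\n  # hypothetical keys; each key is expected to appear as a file on the mounted iso\n  DATABASE: staging\n  USERNAME: admin\n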
"},{"location":"storage/disks_and_volumes/#as-a-filesystem","title":"As a filesystem","text":"By using filesystem, configMaps
are shared through virtiofs
. In contrast with using disk for sharing configMaps
, filesystem
allows you to dynamically propagate changes on configMaps
to VMIs (i.e. the VM does not need to be rebooted).
Note: Currently, VMIs cannot be live migrated since virtiofs
does not support live migration.
To share a given configMap
, the following VM definition could be used:
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n labels:\n special: vmi-fedora\n name: vmi-fedora\nspec:\n domain:\n devices:\n filesystems:\n - name: config-fs\n virtiofs: {}\n disks:\n - disk:\n bus: virtio\n name: containerdisk\n machine:\n type: \"\"\n resources:\n requests:\n memory: 1024M\n terminationGracePeriodSeconds: 0\n volumes:\n - name: containerdisk\n containerDisk:\n image: quay.io/containerdisks/fedora:latest\n - cloudInitNoCloud:\n userData: |-\n #cloud-config\n chpasswd:\n expire: false\n password: fedora\n user: fedora\n bootcmd:\n # mount the ConfigMap\n - \"sudo mkdir /mnt/app-config\"\n - \"sudo mount -t virtiofs config-fs /mnt/app-config\"\n name: cloudinitdisk \n - configMap:\n name: app-config\n name: config-fs\n
"},{"location":"storage/disks_and_volumes/#secret","title":"secret","text":"A secret
is a reference to a Secret in Kubernetes. A secret
can be presented to the VM as disks or as a filesystem. Each method is described in the following sections and both have some advantages and disadvantages, e.g. disk
does not support dynamic change propagation and filesystem
does not support live migration. Therefore, depending on the use-case, one or the other may be more suitable.
By using disk, an extra iso
disk will be allocated which has to be mounted on a VM. To mount the secret
users can use cloudInit
and the disk's serial number. The secretName
must be set to reference the created Kubernetes Secret
.
Note: Currently, Secret update propagation is not supported. If a Secret is updated, only a pod will be aware of changes, not running VMIs.
Note: Due to a Kubernetes CRD issue, you cannot control the paths within the volume where Secret keys are projected.
Example: Attach the secret
to a VM and use cloudInit
to mount the iso
disk:
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n labels:\n special: vmi-fedora\n name: vmi-fedora\nspec:\n domain:\n devices:\n disks:\n - disk:\n bus: virtio\n name: containerdisk\n - disk:\n bus: virtio\n name: cloudinitdisk\n - disk:\n name: app-secret-disk\n # set serial\n serial: D23YZ9W6WA5DJ487\n machine:\n type: \"\"\n resources:\n requests:\n memory: 1024M\n terminationGracePeriodSeconds: 0\n volumes:\n - name: containerdisk\n containerDisk:\n image: kubevirt/fedora-cloud-container-disk-demo:latest\n - cloudInitNoCloud:\n userData: |-\n #cloud-config\n password: fedora\n chpasswd: { expire: False }\n bootcmd:\n # mount the Secret\n - \"sudo mkdir /mnt/app-secret\"\n - \"sudo mount /dev/$(lsblk --nodeps -no name,serial | grep D23YZ9W6WA5DJ487 | cut -f1 -d' ') /mnt/app-secret\"\n name: cloudinitdisk\n - secret:\n secretName: app-secret\n name: app-secret-disk\nstatus: {}\n
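The example above assumes that a Secret named app-secret already exists in the same namespace as the VMI. A minimal sketch of one; the keys and values are hypothetical placeholders and should never be used as real credentials:
apiVersion: v1\nkind: Secret\nmetadata:\n  name: app-secret\ntype: Opaque\nstringData:\n  # hypothetical entries; stringData lets Kubernetes handle the base64 encoding\n  username: admin\n  password: changeme\n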
"},{"location":"storage/disks_and_volumes/#as-a-filesystem_1","title":"As a filesystem","text":"By using filesystem, secrets
are shared through virtiofs
. In contrast with using disk for sharing secrets
, filesystem
allows you to dynamically propagate changes on secrets
to VMIs (i.e. the VM does not need to be rebooted).
Note: Currently, VMIs cannot be live migrated since virtiofs
does not support live migration.
To share a given secret
, the following VM definition could be used:
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n labels:\n special: vmi-fedora\n name: vmi-fedora\nspec:\n domain:\n devices:\n filesystems:\n - name: app-secret-fs\n virtiofs: {}\n disks:\n - disk:\n bus: virtio\n name: containerdisk\n machine:\n type: \"\"\n resources:\n requests:\n memory: 1024M\n terminationGracePeriodSeconds: 0\n volumes:\n - name: containerdisk\n containerDisk:\n image: quay.io/containerdisks/fedora:latest\n - cloudInitNoCloud:\n userData: |-\n #cloud-config\n chpasswd:\n expire: false\n password: fedora\n user: fedora\n bootcmd:\n # mount the Secret\n - \"sudo mkdir /mnt/app-secret\"\n - \"sudo mount -t virtiofs app-secret-fs /mnt/app-secret\"\n name: cloudinitdisk\n - secret:\n secretName: app-secret\n name: app-secret-fs\n
"},{"location":"storage/disks_and_volumes/#serviceaccount","title":"serviceAccount","text":"A serviceAccount
volume references a Kubernetes ServiceAccount
. A serviceAccount
can be presented to the VM as disks or as a filesystem. Each method is described in the following sections and both have some advantages and disadvantages, e.g. disk
does not support dynamic change propagation and filesystem
does not support live migration. Therefore, depending on the use-case, one or the other may be more suitable.
By using disk, a new iso
disk will be allocated with the content of the service account (namespace
, token
and ca.crt
), which needs to be mounted in the VM. For automatic mounting, see the configMap
and secret
examples above.
Note: Currently, ServiceAccount update propagation is not supported. If a ServiceAccount is updated, only a pod will be aware of changes, not running VMIs.
Example:
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n labels:\n special: vmi-fedora\n name: vmi-fedora\nspec:\n domain:\n devices:\n disks:\n - disk:\n name: containerdisk\n - disk:\n name: serviceaccountdisk\n machine:\n type: \"\"\n resources:\n requests:\n memory: 1024M\n terminationGracePeriodSeconds: 0\n volumes:\n - name: containerdisk\n containerDisk:\n image: kubevirt/fedora-cloud-container-disk-demo:latest\n - name: serviceaccountdisk\n serviceAccount:\n serviceAccountName: default\n
"},{"location":"storage/disks_and_volumes/#as-a-filesystem_2","title":"As a filesystem","text":"By using filesystem, serviceAccounts
are shared through virtiofs
. In contrast with using disk for sharing serviceAccounts
, filesystem
allows you to dynamically propagate changes on serviceAccounts
to VMIs (i.e. the VM does not need to be rebooted).
Note: Currently, VMIs cannot be live migrated since virtiofs
does not support live migration.
To share a given serviceAccount
, the following VM definition could be used:
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n labels:\n special: vmi-fedora\n name: vmi-fedora\nspec:\n domain:\n devices:\n filesystems:\n - name: serviceaccount-fs\n virtiofs: {}\n disks:\n - disk:\n bus: virtio\n name: containerdisk\n machine:\n type: \"\"\n resources:\n requests:\n memory: 1024M\n terminationGracePeriodSeconds: 0\n volumes:\n - name: containerdisk\n containerDisk:\n image: quay.io/containerdisks/fedora:latest\n - cloudInitNoCloud:\n userData: |-\n #cloud-config\n chpasswd:\n expire: false\n password: fedora\n user: fedora\n bootcmd:\n # mount the ServiceAccount\n - \"sudo mkdir /mnt/serviceaccount\"\n - \"sudo mount -t virtiofs serviceaccount-fs /mnt/serviceaccount\"\n name: cloudinitdisk\n - name: serviceaccount-fs\n serviceAccount:\n serviceAccountName: default\n
"},{"location":"storage/disks_and_volumes/#downwardmetrics","title":"downwardMetrics","text":"downwardMetrics
expose a limited set of VM and host metrics to the guest. The format is compatible with vhostmd.
Getting a limited set of host and VM metrics is in some cases required to allow third parties to diagnose performance issues on their appliances. One prominent example is SAP HANA.
In order to expose downwardMetrics
to VMs, the methods disk
and virtio-serial port
are supported.
Note: The DownwardMetrics feature gate must be enabled to use the metrics. Available starting with KubeVirt v0.42.0.
"},{"location":"storage/disks_and_volumes/#disk_1","title":"Disk","text":"A volume is created, and it is exposed to the guest as a raw block volume. KubeVirt will update it periodically (by default, every 5 seconds).
Example:
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n labels:\n special: vmi-fedora\n name: vmi-fedora\nspec:\n domain:\n devices:\n disks:\n - disk:\n bus: virtio\n name: containerdisk\n - disk:\n bus: virtio\n name: metrics\n machine:\n type: \"\"\n resources:\n requests:\n memory: 1024M\n terminationGracePeriodSeconds: 0\n volumes:\n - name: containerdisk\n containerDisk:\n image: quay.io/containerdisks/fedora:latest\n - name: metrics\n downwardMetrics: {}\n
"},{"location":"storage/disks_and_volumes/#virtio-serial-port","title":"Virtio-serial port","text":"This method uses a virtio-serial port to expose the metrics data to the VM. KubeVirt creates a port named /dev/virtio-ports/org.github.vhostmd.1
inside the VM, in which the Virtio Transport protocol is supported. downwardMetrics
can be retrieved from this port. See vhostmd documentation under Virtio Transport
for further information.
To expose the metrics using a virtio-serial port, a downwardMetrics
device must be added (i.e., spec.domain.devices.downwardMetrics: {}
).
Example:
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n labels:\n special: vmi-fedora\n name: vmi-fedora\nspec:\n domain:\n devices:\n downwardMetrics: {}\n disks:\n - disk:\n bus: virtio\n name: containerdisk\n machine:\n type: \"\"\n resources:\n requests:\n memory: 1024M\n terminationGracePeriodSeconds: 0\n volumes:\n - name: containerdisk\n containerDisk:\n image: quay.io/containerdisks/fedora:latest\n
"},{"location":"storage/disks_and_volumes/#accessing-metrics-data","title":"Accessing Metrics Data","text":"To access the DownwardMetrics shared with a disk or a virtio-serial port, the vm-dump-metrics
tool can be used:
$ sudo dnf install -y vm-dump-metrics\n$ sudo vm-dump-metrics\n<metrics>\n <metric type=\"string\" context=\"host\">\n <name>HostName</name>\n <value>node01</value>\n[...]\n <metric type=\"int64\" context=\"host\" unit=\"s\">\n <name>Time</name>\n <value>1619008605</value>\n </metric>\n <metric type=\"string\" context=\"host\">\n <name>VirtualizationVendor</name>\n <value>kubevirt.io</value>\n </metric>\n</metrics>\n
vm-dump-metrics
is useful as a standalone tool to verify the serial port is working and to inspect the metrics. However, applications that consume metrics will usually connect to the virtio-serial port themselves.
Note: The tool vm-dump-metrics
provides the option --virtio
for cases where the virtio-serial port is used. Please refer to vm-dump-metrics --help
for further information.
Libvirt has the ability to use IOThreads for dedicated disk access (for supported devices). These are dedicated event loop threads that perform block I/O requests and improve scalability on SMP systems. KubeVirt exposes this libvirt feature through the ioThreadsPolicy
setting. Additionally, each Disk
device exposes a dedicatedIOThread
setting. This is a boolean that indicates the specified disk should be allocated an exclusive IOThread that will never be shared with other disks.
Currently valid policies are shared
and auto
. If ioThreadsPolicy
is omitted entirely, use of IOThreads will be disabled. However, if any disk requests a dedicated IOThread, ioThreadsPolicy
will be enabled and default to shared
.
An ioThreadsPolicy
of shared
indicates that KubeVirt should use one thread that will be shared by all disk devices. This policy stems from the fact that a large number of IOThreads is generally not useful, as additional context switching is incurred for each thread.
Disks with dedicatedIOThread
set to true
will not use the shared thread, but will instead be allocated an exclusive thread. This is generally useful if a specific Disk is expected to have heavy I/O traffic, e.g. a database spindle.
auto
IOThreads indicates that KubeVirt should use a pool of IOThreads and allocate disks to IOThreads in a round-robin fashion. The pool size is generally limited to twice the number of vCPUs allocated to the VM. This essentially attempts to dedicate disks to separate IOThreads, but only up to a reasonable limit. This would come into play for systems with a large number of disks and a smaller number of CPUs, for instance.
As a caveat to the size of the IOThread pool, disks with dedicatedIOThread
will always be guaranteed their own thread. This effectively reduces the upper limit on the number of threads allocated to the rest of the disks. For example, a VM with 2 CPUs would normally use 4 IOThreads for all disks. However, if one disk had dedicatedIOThread
set to true, then KubeVirt would only use 3 IOThreads for the shared pool.
There is always guaranteed to be at least one thread for disks that will use the shared IOThreads pool. Thus if a sufficiently large number of disks have dedicated IOThreads assigned, auto
and shared
policies would essentially result in the same layout.
When a guest's vCPUs are pinned to a host's physical CPUs, it is also best to pin the IOThreads to specific CPUs to prevent them from floating between CPUs. KubeVirt will automatically calculate and pin each IOThread to a CPU or a set of CPUs, depending on the ratio between them. If there are more IOThreads than CPUs, each IOThread will be pinned to a CPU in a round-robin fashion. Otherwise, when there are fewer IOThreads than CPUs, each IOThread will be pinned to a set of CPUs.
"},{"location":"storage/disks_and_volumes/#iothreads-with-qemu-emulator-thread-and-dedicated-pinned-cpus","title":"IOThreads with QEMU Emulator thread and Dedicated (pinned) CPUs","text":"To further improve vCPU latency, KubeVirt can allocate an additional dedicated physical CPU exclusively for the emulator thread, to which it will be pinned. This will effectively \"isolate\" the emulator thread from the vCPUs of the VMI. When ioThreadsPolicy
is set to auto
IOThreads will also be \"isolated\" from the vCPUs and placed on the same physical CPU as the QEMU emulator thread.
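A minimal sketch of a VMI spec fragment combining these settings, assuming that dedicatedCpuPlacement and isolateEmulatorThread under spec.domain.cpu are the fields used to request pinned CPUs and the isolated emulator thread; the core count and memory size are placeholder values:
spec:\n  domain:\n    ioThreadsPolicy: auto\n    cpu:\n      cores: 2\n      # assumed fields for CPU pinning and emulator thread isolation\n      dedicatedCpuPlacement: true\n      isolateEmulatorThread: true\n    resources:\n      limits:\n        memory: 2Gi\n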
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n labels:\n special: vmi-shared\n name: vmi-shared\nspec:\n domain:\n ioThreadsPolicy: shared\n cpu:\n cores: 2\n devices:\n disks:\n - disk:\n bus: virtio\n name: vmi-shared_disk\n - disk:\n bus: virtio\n name: emptydisk\n dedicatedIOThread: true\n - disk:\n bus: virtio\n name: emptydisk2\n dedicatedIOThread: true\n - disk:\n bus: virtio\n name: emptydisk3\n - disk:\n bus: virtio\n name: emptydisk4\n - disk:\n bus: virtio\n name: emptydisk5\n - disk:\n bus: virtio\n name: emptydisk6\n machine:\n type: \"\"\n resources:\n requests:\n memory: 64M\n volumes:\n - name: vmi-shared_disk\n persistentVolumeClaim:\n claimName: vmi-shared_pvc\n - emptyDisk:\n capacity: 1Gi\n name: emptydisk\n - emptyDisk:\n capacity: 1Gi\n name: emptydisk2\n - emptyDisk:\n capacity: 1Gi\n name: emptydisk3\n - emptyDisk:\n capacity: 1Gi\n name: emptydisk4\n - emptyDisk:\n capacity: 1Gi\n name: emptydisk5\n - emptyDisk:\n capacity: 1Gi\n name: emptydisk6\n
In this example, emptydisk and emptydisk2 both request a dedicated IOThread. vmi-shared_disk and emptydisk3 through emptydisk6 will all share one IOThread.
mypvc: 1\nemptydisk: 2\nemptydisk2: 3\nemptydisk3: 1\nemptydisk4: 1\nemptydisk5: 1\nemptydisk6: 1\n
"},{"location":"storage/disks_and_volumes/#auto-iothreads","title":"Auto IOThreads","text":"apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n labels:\n special: vmi-shared\n name: vmi-shared\nspec:\n domain:\n ioThreadsPolicy: auto\n cpu:\n cores: 2\n devices:\n disks:\n - disk:\n bus: virtio\n name: mydisk\n - disk:\n bus: virtio\n name: emptydisk\n dedicatedIOThread: true\n - disk:\n bus: virtio\n name: emptydisk2\n dedicatedIOThread: true\n - disk:\n bus: virtio\n name: emptydisk3\n - disk:\n bus: virtio\n name: emptydisk4\n - disk:\n bus: virtio\n name: emptydisk5\n - disk:\n bus: virtio\n name: emptydisk6\n machine:\n type: \"\"\n resources:\n requests:\n memory: 64M\n volumes:\n - name: mydisk\n persistentVolumeClaim:\n claimName: mypvc\n - emptyDisk:\n capacity: 1Gi\n name: emptydisk\n - emptyDisk:\n capacity: 1Gi\n name: emptydisk2\n - emptyDisk:\n capacity: 1Gi\n name: emptydisk3\n - emptyDisk:\n capacity: 1Gi\n name: emptydisk4\n - emptyDisk:\n capacity: 1Gi\n name: emptydisk5\n - emptyDisk:\n capacity: 1Gi\n name: emptydisk6\n
This VM is identical to the first, except it requests auto IOThreads. emptydisk
and emptydisk2
will still be allocated individual IOThreads, but the rest of the disks will be split across 2 separate IOThreads (twice the number of CPU cores is 4, and 2 of those are taken by the dedicated disks).
Disks will be assigned to IOThreads like this:
mypvc: 1\nemptydisk: 3\nemptydisk2: 4\nemptydisk3: 2\nemptydisk4: 1\nemptydisk5: 2\nemptydisk6: 1\n
"},{"location":"storage/disks_and_volumes/#virtio-block-multi-queue","title":"Virtio Block Multi-Queue","text":"Block Multi-Queue is a framework for the Linux block layer that maps device I/O queries to multiple queues. This splits I/O processing up across multiple threads, and therefore multiple CPUs. libvirt recommends that the number of queues used should match the number of CPUs allocated for optimal performance.
This feature is enabled by the BlockMultiQueue
setting under Devices
:
spec:\n domain:\n devices:\n blockMultiQueue: true\n disks:\n - disk:\n bus: virtio\n name: mydisk\n
Note: Due to the way KubeVirt implements CPU allocation, blockMultiQueue can only be used if a specific CPU allocation is requested. If a specific number of CPUs hasn't been allocated to a VirtualMachine, KubeVirt will use all CPUs on the node on a best-effort basis. In that case, the amount of CPU allocated to a VM at the host level could change over time. If blockMultiQueue were to request a number of queues to match all the CPUs on a node, that could lead to over-allocation scenarios. To avoid this, KubeVirt enforces that a specific slice of CPU resources is requested in order to take advantage of this feature.
"},{"location":"storage/disks_and_volumes/#example","title":"Example","text":"metadata:\n name: testvmi-disk\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nspec:\n domain:\n resources:\n requests:\n memory: 64M\n cpu: 4\n devices:\n blockMultiQueue: true\n disks:\n - name: mypvcdisk\n disk:\n bus: virtio\n volumes:\n - name: mypvcdisk\n persistentVolumeClaim:\n claimName: mypvc\n
This example will enable Block Multi-Queue for the disk mypvcdisk
and allocate 4 queues (to match the number of CPUs requested).
KubeVirt supports none
, writeback
, and writethrough
KVM/QEMU cache modes.
none
I/O from the guest is not cached on the host. Use this option for guests with large I/O requirements. This option is generally the best choice.
writeback
I/O from the guest is cached on the host and written through to the physical media when the guest OS issues a flush.
writethrough
I/O from the guest is cached on the host but must be written through to the physical medium before the write operation completes.
Important: none
cache mode is set as default if the file system supports direct I/O, otherwise, writethrough
is used.
Note: It is possible to force a specific cache mode, although if none
mode has been chosen and the file system does not support direct I/O, then the started VMI will return an error.
Example: force writethrough
cache mode
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n labels:\n special: vmi-pvc\n name: vmi-pvc\nspec:\n domain:\n devices:\n disks:\n - disk:\n bus: virtio\n name: pvcdisk\n cache: writethrough\n machine:\n type: \"\"\n resources:\n requests:\n memory: 64M\n terminationGracePeriodSeconds: 0\n volumes:\n - name: pvcdisk\n persistentVolumeClaim:\n claimName: disk-alpine\nstatus: {}\n
"},{"location":"storage/disks_and_volumes/#disk-sharing","title":"Disk sharing","text":"Shareable disks allow multiple VMs to share the same underlying storage. In order to use this feature, special care is required because this could lead to data corruption and the loss of important data. Shareable disks demand either data synchronization at the application level or the use of clustered filesystems. These advanced configurations are not within the scope of this documentation and are use-case specific.
If the shareable
option is set, it indicates to libvirt/QEMU that the disk is going to be accessed by multiple VMs and not to create a lock for the writes.
In this example, we use Rook Ceph to dynamically provision the PVC.
apiVersion: v1\nkind: PersistentVolumeClaim\nmetadata:\n name: block-pvc\nspec:\n accessModes:\n - ReadWriteMany\n volumeMode: Block\n resources:\n requests:\n storage: 1Gi\n storageClassName: rook-ceph-block\n
$ kubectl get pvc\nNAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE\nblock-pvc Bound pvc-0a161bb2-57c7-4d97-be96-0a20ff0222e2 1Gi RWX rook-ceph-block 51s\n
Then, we can declare 2 VMs and set the shareable
option to true for the shared disk. apiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n labels:\n kubevirt.io/vm: vm-block-1\n name: vm-block-1\nspec:\n running: true\n template:\n metadata:\n labels:\n kubevirt.io/vm: vm-block-1\n spec:\n domain:\n devices:\n disks:\n - disk:\n bus: virtio\n name: containerdisk\n - disk:\n bus: virtio\n name: cloudinitdisk\n - disk:\n bus: virtio\n shareable: true\n name: block-disk\n machine:\n type: \"\"\n resources:\n requests:\n memory: 2G\n terminationGracePeriodSeconds: 0\n volumes:\n - containerDisk:\n image: registry:5000/kubevirt/fedora-with-test-tooling-container-disk:devel\n name: containerdisk\n - cloudInitNoCloud:\n userData: |-\n #cloud-config\n password: fedora\n chpasswd: { expire: False }\n name: cloudinitdisk\n - name: block-disk\n persistentVolumeClaim:\n claimName: block-pvc\n---\napiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n labels:\n kubevirt.io/vm: vm-block-2\n name: vm-block-2\nspec:\n running: true\n template:\n metadata:\n labels:\n kubevirt.io/vm: vm-block-2\n spec:\n affinity:\n podAffinity:\n requiredDuringSchedulingIgnoredDuringExecution:\n - labelSelector:\n matchExpressions:\n - key: kubevirt.io/vm\n operator: In\n values:\n - vm-block-1\n topologyKey: \"kubernetes.io/hostname\"\n domain:\n devices:\n disks:\n - disk:\n bus: virtio\n name: containerdisk\n - disk:\n bus: virtio\n name: cloudinitdisk\n - disk:\n bus: virtio\n shareable: true\n name: block-disk\n machine:\n type: \"\"\n resources:\n requests:\n memory: 2G\n terminationGracePeriodSeconds: 0\n volumes:\n - containerDisk:\n image: registry:5000/kubevirt/fedora-with-test-tooling-container-disk:devel\n name: containerdisk\n - cloudInitNoCloud:\n userData: |-\n #cloud-config\n password: fedora\n chpasswd: { expire: False }\n name: cloudinitdisk\n - name: block-disk\n persistentVolumeClaim:\n claimName: block-pvc \n
We can now attempt to write a string from the first guest and then read the string from the second guest to test that the sharing is working. $ virtctl console vm-block-1\n$ printf \"Test awesome shareable disks\" | sudo dd of=/dev/vdc bs=1 count=150 conv=notrunc\n28+0 records in\n28+0 records out\n28 bytes copied, 0.0264182 s, 1.1 kB/s\n# Log into the second guest\n$ virtctl console vm-block-2\n$ sudo dd if=/dev/vdc bs=1 count=150 conv=notrunc\nTest awesome shareable disks150+0 records in\n150+0 records out\n150 bytes copied, 0.136753 s, 1.1 kB/s\n
If you are using local devices or RWO PVCs, setting the affinity on the VMs that share the storage guarantees they will be scheduled on the same node. In the example, we set the affinity on the second VM using the label used on the first VM. If you are using shared storage with RWX PVCs, then the affinity rule is not necessary as the storage can be attached simultaneously on multiple nodes.
"},{"location":"storage/disks_and_volumes/#sharing-directories-with-vms","title":"Sharing Directories with VMs","text":"Virtiofs
makes external filesystems visible to KubeVirt
VMs. Virtiofs
is a shared file system that lets VMs access a directory tree on the host. Further details can be found at Official Virtiofs Site.
KubeVirt supports two PVC sharing modes: non-privileged and privileged.
The non-privileged mode is enabled by default. This mode has the advantage of not requiring any administrative privileges for creating the VM. However, it has some limitations:
To switch to the privileged mode, the feature gate ExperimentalVirtiofsSupport has to be enabled. Take into account that this mode requires privileges to run rootful containers.
"},{"location":"storage/disks_and_volumes/#sharing-persistent-volume-claims","title":"Sharing Persistent Volume Claims","text":""},{"location":"storage/disks_and_volumes/#cluster-configuration","title":"Cluster Configuration","text":"We need to create a new VM definition including the spec.domain.devices.filesystems.virtiofs
and a PVC. Example:
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n name: testvmi-fs\nspec:\n domain:\n devices:\n disks:\n - disk:\n bus: virtio\n name: containerdisk\n - disk:\n bus: virtio\n name: cloudinitdisk\n filesystems:\n - name: virtiofs-disk\n virtiofs: {}\n resources:\n requests:\n memory: 1024Mi\n volumes:\n - name: containerdisk\n containerDisk:\n image: quay.io/containerdisks/fedora:latest\n - cloudInitNoCloud:\n userData: |-\n #cloud-config\n password: fedora\n chpasswd: { expire: False }\n name: cloudinitdisk\n - name: virtiofs-disk\n persistentVolumeClaim:\n claimName: mypvc\n
"},{"location":"storage/disks_and_volumes/#configuration-inside-the-vm","title":"Configuration Inside the VM","text":"The following configuration can be done using a startup script; see the cloudInitNoCloud section for more details. Alternatively, we can do it manually by logging in to the VM and mounting the filesystem. Here are examples of how to mount it in Linux and Windows VMs:
$ sudo mkdir -p /mnt/disks/virtio\n$ sudo mount -t virtiofs virtiofs-disk /mnt/disks/virtio\n
See this guide for details on startup steps needed for Windows VMs.
"},{"location":"storage/disks_and_volumes/#sharing-node-directories","title":"Sharing Node Directories","text":"Node directories can be shared using hostPath volumes. The following configuration example is shown for illustrative purposes only; the PVC method is preferred, since using hostPath is generally discouraged for security reasons.
"},{"location":"storage/disks_and_volumes/#configuration-inside-the-node","title":"Configuration Inside the Node","text":"To share the directory with the VMs, we need to log in to the node, create the shared directory (if it does not already exist), and set the proper SELinux context label container_file_t
to the shared directory. In this example we are going to share a new directory /tmp/data
(if the desired directory is an existing one, you can skip the mkdir
command):
$ mkdir /tmp/data\n$ sudo chcon -t container_file_t /tmp/data\n
Note: If you are attempting to share an existing directory, you must first check the SELinux context label with the command ls -Z <directory>
. In the case that the label is not present or is not container_file_t
you need to label it with the chcon
command.
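The label check described in the note can be scripted. The following is a minimal sketch, assuming the SELinux context is printed in the usual user:role:type:level form; the helper name needs_relabel is hypothetical and not part of KubeVirt.

```shell
# Decide whether a directory still needs the container_file_t label.
# $1: one line of `ls -Zd <directory>` output (hypothetical helper).
# Returns 0 (true) when the type is missing or is not container_file_t.
needs_relabel() {
    case "$1" in
        *:container_file_t:*) return 1 ;;
        *) return 0 ;;
    esac
}
```

On the node this could be combined with the commands above, for example: if needs_relabel "$(ls -Zd /tmp/data)"; then sudo chcon -t container_file_t /tmp/data; fi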
We need a StorageClass
which uses the provider no-provisioner
:
apiVersion: storage.k8s.io/v1\nkind: StorageClass\nmetadata:\n name: no-provisioner-storage-class\nprovisioner: kubernetes.io/no-provisioner\nreclaimPolicy: Delete\nvolumeBindingMode: WaitForFirstConsumer\n
To make the shared directory available to VMs, we need to create a PV and a PVC that can be consumed by the VMs:
kind: PersistentVolume\napiVersion: v1\nmetadata:\n name: hostpath\nspec:\n capacity:\n storage: 10Gi\n accessModes:\n - ReadWriteMany\n hostPath:\n path: \"/tmp/data\"\n storageClassName: \"no-provisioner-storage-class\"\n nodeAffinity:\n required:\n nodeSelectorTerms:\n - matchExpressions:\n - key: kubernetes.io/hostname\n operator: In\n values:\n - node01\n--- \napiVersion: v1\nkind: PersistentVolumeClaim\nmetadata:\n name: hostpath-claim\nspec:\n accessModes:\n - ReadWriteMany\n storageClassName: \"no-provisioner-storage-class\"\n resources:\n requests:\n storage: 10Gi\n
Note: Change the node01
value to the name of the node where the shared directory is located.
The VM definitions have to request the PVC hostpath-claim
and attach it as a virtiofs filesystem:
apiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n labels:\n kubevirt.io/vm: hostpath-vm\n name: hostpath\nspec:\n running: true\n template:\n metadata:\n labels:\n kubevirt.io/domain: hostpath\n kubevirt.io/vm: hostpath\n spec:\n domain:\n cpu:\n cores: 1\n sockets: 1\n threads: 1\n devices:\n filesystems:\n - name: vm-hostpath\n virtiofs: {}\n disks:\n - name: containerdisk\n disk:\n bus: virtio\n - name: cloudinitdisk\n disk:\n bus: virtio\n interfaces:\n - name: default\n masquerade: {}\n rng: {}\n resources:\n requests:\n memory: 1Gi\n networks:\n - name: default\n pod: {}\n terminationGracePeriodSeconds: 180\n volumes:\n - containerDisk:\n image: quay.io/containerdisks/fedora:latest\n name: containerdisk\n - cloudInitNoCloud:\n userData: |-\n #cloud-config\n chpasswd:\n expire: false\n password: password\n user: fedora\n name: cloudinitdisk\n - name: vm-hostpath\n persistentVolumeClaim:\n claimName: hostpath-claim\n
"},{"location":"storage/disks_and_volumes/#configuration-inside-the-vm_1","title":"Configuration Inside the VM","text":"We need to log in to the VM and mount the shared directory:
$ sudo mount -t virtiofs vm-hostpath /mnt\n
"},{"location":"storage/export_api/","title":"Export API","text":"It can be desirable to export a Virtual Machine and its related disks out of a cluster so you can import that Virtual Machine into another system or cluster. The Virtual Machine disks are the most prominent things you will want to export. The export API makes it possible to declaratively export Virtual Machine disks. It is also possible to export individual PVCs and their contents, for instance when you have created a memory dump from a VM or are using virtio-fs to have a Virtual Machine populate a PVC.
To avoid overloading the Kubernetes API server, the data is transferred through a dedicated export proxy server. The proxy server can then be exposed to the outside world through a service associated with an Ingress/Route or NodePort. As an alternative, the port-forward
flag can be used with the virtctl integration to bypass the need for an Ingress/Route.
VMExport support must be enabled in the feature gates to be available. The feature gates field in the KubeVirt CR must be expanded by adding the VMExport
to it.
In order to securely export a Virtual Machine Disk, you must create a token that is used to authorize users accessing the export endpoint. This token must be in the same namespace as the Virtual Machine. The contents of the secret can be passed as a token header or parameter to the export URL. The name of the header or argument is x-kubevirt-export-token
with a value that matches the content of the secret. The secret can have any valid name in the namespace. We recommend generating an alphanumeric token of at least 12 characters. The data key should be token
. For example:
apiVersion: v1\nkind: Secret\nmetadata:\n name: example-token\nstringData:\n token: 1234567890ab\n
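A token that meets this recommendation can be generated from the shell. The following is a minimal sketch, assuming a POSIX shell with od, tr, cut and /dev/urandom available; the helper name gen_export_token and the default length of 20 are illustrative choices, not part of KubeVirt.

```shell
# Print a random lowercase hexadecimal (hence alphanumeric) token.
# $1: token length, defaults to 20 (the text recommends at least 12).
gen_export_token() {
    # Read N random bytes, render them as 2N hex digits, keep the first N.
    od -An -N "${1:-20}" -tx1 /dev/urandom | tr -d ' \n' | cut -c "1-${1:-20}"
}
```

The result could then be stored under the token key, for instance with kubectl create secret generic example-token --from-literal=token=$(gen_export_token), run in the namespace of the Virtual Machine.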
"},{"location":"storage/export_api/#export-virtual-machine-volumes","title":"Export Virtual Machine volumes","text":"After you have created the token you can now create a VMExport CR that identifies the Virtual Machine you want to export. You can create a VMExport that looks like this:
apiVersion: export.kubevirt.io/v1beta1\nkind: VirtualMachineExport\nmetadata:\n name: example-export\nspec:\n tokenSecretRef: example-token\n source:\n apiGroup: \"kubevirt.io\"\n kind: VirtualMachine\n name: example-vm\n
The following volumes present in the VM will be exported:
All other volume types are not exported. To avoid the export of inconsistent data, a Virtual Machine can only be exported while it is powered off. Any active VM exports will be terminated if the Virtual Machine is started. To export data from a running Virtual Machine you must first create a Virtual Machine Snapshot (see below).
If the VM contains multiple volumes that can be exported, each volume will get its own URL links. If the VM contains no volumes that can be exported, the VMExport will go into a Skipped
phase, and no export server is started.
You can create a VMExport CR that identifies the Virtual Machine Snapshot you want to export. You can create a VMExport that looks like this:
apiVersion: export.kubevirt.io/v1beta1\nkind: VirtualMachineExport\nmetadata:\n name: example-export\nspec:\n tokenSecretRef: example-token\n source:\n apiGroup: \"snapshot.kubevirt.io\"\n kind: VirtualMachineSnapshot\n name: example-vmsnapshot\n
When you create a VMExport based on a Virtual Machine Snapshot, the controller will attempt to create PVCs from the volume snapshots contained in the Virtual Machine Snapshot. Once all the PVCs are ready, the export server will start and you can begin the export. If the Virtual Machine Snapshot contains multiple volumes that can be exported, each volume will get its own URL links. If the Virtual Machine Snapshot contains no volumes that can be exported, the VMExport will go into a Skipped
phase, and no export server is started.
You can create a VMExport CR that identifies the Persistent Volume Claim (PVC) you want to export. You can create a VMExport that looks like this:
apiVersion: export.kubevirt.io/v1beta1\nkind: VirtualMachineExport\nmetadata:\n name: example-export\nspec:\n tokenSecretRef: example-token\n source:\n apiGroup: \"\"\n kind: PersistentVolumeClaim\n name: example-pvc\n
In this example the PVC name is example-pvc
. Note that the PVC doesn't need to contain a Virtual Machine Disk; it can contain any content, but the main use case is exporting Virtual Machine Disks. After you post this yaml to the cluster, a new export server is created in the same namespace as the PVC. If the source PVC is in use by another pod (such as the virt-launcher pod), then the export will remain pending until the PVC is no longer in use. If the export server is active and another pod starts using the PVC, the export server will be terminated until the PVC is no longer in use.
The VirtualMachineExport CR will contain a status with internal and external links to the export service. The internal links are only valid inside the cluster, and the external links are valid for external access through an Ingress or Route. The cert
field will contain the CA that signed the certificate of the export server for internal links, or the CA that signed the Route or Ingress.
The following is an example of exporting a PVC that contains a KubeVirt disk image. The controller determines if the PVC contains a kubevirt disk by checking if there is a special annotation on the PVC, or if there is a DataVolume ownerReference on the PVC, or if the PVC has a volumeMode of block.
apiVersion: export.kubevirt.io/v1beta1\nkind: VirtualMachineExport\nmetadata:\n name: example-export\n namespace: example\nspec:\n source:\n apiGroup: \"\"\n kind: PersistentVolumeClaim\n name: example-pvc\n tokenSecretRef: example-token\nstatus:\n conditions:\n - lastProbeTime: null\n lastTransitionTime: \"2022-06-21T14:10:09Z\"\n reason: podReady\n status: \"True\"\n type: Ready\n - lastProbeTime: null\n lastTransitionTime: \"2022-06-21T14:09:02Z\"\n reason: pvcBound\n status: \"True\"\n type: PVCReady\n links:\n external:\n cert: |-\n -----BEGIN CERTIFICATE-----\n ...\n -----END CERTIFICATE-----\n volumes:\n - formats:\n - format: raw\n url: https://vmexport-proxy.test.net/api/export.kubevirt.io/v1beta1/namespaces/example/virtualmachineexports/example-export/volumes/example-disk/disk.img\n - format: gzip\n url: https://vmexport-proxy.test.net/api/export.kubevirt.io/v1beta1/namespaces/example/virtualmachineexports/example-export/volumes/example-disk/disk.img.gz\n name: example-disk\n internal:\n cert: |-\n -----BEGIN CERTIFICATE-----\n ...\n -----END CERTIFICATE-----\n volumes:\n - formats:\n - format: raw\n url: https://virt-export-example-export.example.svc/volumes/example-disk/disk.img\n - format: gzip\n url: https://virt-export-example-export.example.svc/volumes/example-disk/disk.img.gz\n name: example-disk\n phase: Ready\n serviceName: virt-export-example-export\n
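The links in the status can be fetched with any HTTP client that sends the export token. The following is a minimal sketch using curl, assuming the token and the contents of the cert field have already been saved locally; both helper names are hypothetical.

```shell
# Build the token header: name x-kubevirt-export-token, value = token.
# $1: the export token
export_token_header() {
    printf 'x-kubevirt-export-token: %s\n' "$1"
}

# Download a single export URL (hypothetical helper).
# $1: URL from status.links, $2: token, $3: CA certificate file, $4: output file
download_export_volume() {
    curl --fail --cacert "$3" -H "$(export_token_header "$2")" -o "$4" "$1"
}
```

The second helper would be called with one of the format URLs listed in the status, the token, a file holding the matching cert value, and the desired output path.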
"},{"location":"storage/export_api/#archive-content-type","title":"Archive content-type","text":"Archive content-type is automatically selected if the controller cannot determine that the PVC contains a KubeVirt disk. The archive will contain all the files that are in the PVC.
apiVersion: export.kubevirt.io/v1beta1\nkind: VirtualMachineExport\nmetadata:\n name: example-export\n namespace: example\nspec:\n source:\n apiGroup: \"\"\n kind: PersistentVolumeClaim\n name: example-pvc\n tokenSecretRef: example-token\nstatus:\n conditions:\n - lastProbeTime: null\n lastTransitionTime: \"2022-06-21T14:10:09Z\"\n reason: podReady\n status: \"True\"\n type: Ready\n - lastProbeTime: null\n lastTransitionTime: \"2022-06-21T14:09:02Z\"\n reason: pvcBound\n status: \"True\"\n type: PVCReady\n links:\n external:\n cert: |-\n -----BEGIN CERTIFICATE-----\n ...\n -----END CERTIFICATE-----\n volumes:\n - formats:\n - format: dir\n url: https://vmexport-proxy.test.net/api/export.kubevirt.io/v1beta1/namespaces/example/virtualmachineexports/example-export/volumes/example/dir\n - format: tar.gz\n url: https://vmexport-proxy.test.net/api/export.kubevirt.io/v1beta1/namespaces/example/virtualmachineexports/example-export/volumes/example/disk.tar.gz\n name: example-disk\n internal:\n cert: |-\n -----BEGIN CERTIFICATE-----\n ...\n -----END CERTIFICATE-----\n volumes:\n - formats:\n - format: dir\n url: https://virt-export-example-export.example.svc/volumes/example/dir\n - format: tar.gz\n url: https://virt-export-example-export.example.svc/volumes/example/disk.tar.gz\n name: example-disk\n phase: Ready\n serviceName: virt-export-example-export\n
"},{"location":"storage/export_api/#manifests","title":"Manifests","text":"The VirtualMachine manifests can be retrieved by accessing the manifests
in the VirtualMachineExport status. The all
type will return the VirtualMachine manifest, any DataVolumes, and a configMap that contains the public CA certificate of the Ingress/Route of the external URL, or the CA of the export server of the internal URL. The auth-header-secret
will be a secret that contains a Containerized Data Importer (CDI) compatible header. This header contains a text version of the export token.
Both internal and external links will contain a manifests
field. If there are no external links, then there will not be any external manifests either. The virtualMachine manifests
field is only available if the source is a VirtualMachine
or VirtualMachineSnapshot
. Exporting a PersistentVolumeClaim
will not generate a Virtual Machine manifest.
apiVersion: export.kubevirt.io/v1beta1\nkind: VirtualMachineExport\nmetadata:\n name: example-export\n namespace: example\nspec:\n source:\n apiGroup: \"\"\n kind: PersistentVolumeClaim\n name: example-pvc\n tokenSecretRef: example-token\nstatus:\n conditions:\n - lastProbeTime: null\n lastTransitionTime: \"2022-06-21T14:10:09Z\"\n reason: podReady\n status: \"True\"\n type: Ready\n - lastProbeTime: null\n lastTransitionTime: \"2022-06-21T14:09:02Z\"\n reason: pvcBound\n status: \"True\"\n type: PVCReady\n links:\n external:\n ...\n manifests:\n - type: all\n url: https://vmexport-proxy.test.net/api/export.kubevirt.io/v1beta1/namespaces/example/virtualmachineexports/example-export/external/manifests/all\n - type: auth-header-secret\n url: https://vmexport-proxy.test.net/api/export.kubevirt.io/v1beta1/namespaces/example/virtualmachineexports/example-export/external/manifests/secret\n internal:\n ...\n manifests:\n - type: all\n url: https://virt-export-export-pvc.default.svc/internal/manifests/all\n - type: auth-header-secret\n url: https://virt-export-export-pvc.default.svc/internal/manifests/secret\n phase: Ready\n serviceName: virt-export-example-export\n
"},{"location":"storage/export_api/#format-types","title":"Format types","text":"There are 4 format types that are possible:
Raw and Gzip will be selected if the PVC is determined to be a KubeVirt disk. KubeVirt disks contain a single disk.img file (or are a block device). Dir will return a list of the files in the PVC. To download a specific file, you can replace /dir
in the URL with the path and file name. For instance if the PVC contains the file /example/data.txt
you can replace /dir
with /example/data.txt
to download just the data.txt file. Alternatively, you can use the tar.gz URL to get all the contents of the PVC in a tar file.
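The /dir substitution can be written as a small shell helper. This is a sketch only; the function name export_file_url is hypothetical, and it assumes the URL passed in ends in /dir as in the status examples.

```shell
# Turn a dir-format URL into the URL of a single file in the PVC.
# $1: dir URL (ending in /dir), $2: absolute path of the file inside the PVC
export_file_url() {
    # Strip the trailing /dir and append the file path in its place.
    printf '%s%s\n' "${1%/dir}" "$2"
}
```

For the internal link shown in the archive example, export_file_url https://virt-export-example-export.example.svc/volumes/example/dir /example/data.txt prints https://virt-export-example-export.example.svc/volumes/example/example/data.txt.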
The export server certificate is valid for 7 days after which it is rotated by deleting the export server pod and associated secret and generating a new one. If for whatever reason the export server pod dies, the associated secret is also automatically deleted and a new pod and secret are generated. The VirtualMachineExport object status will be automatically updated to reflect the new certificate.
"},{"location":"storage/export_api/#external-link-certificates","title":"External link certificates","text":"The external link certificates are associated with the Ingress/Route that points to the service created by the KubeVirt operator. The CA that signed the Ingress/Route will be part of the certificates.
"},{"location":"storage/export_api/#ttl-time-to-live-for-an-export","title":"TTL (Time to live) for an Export","text":"For various reasons (security being one), users should be able to specify a TTL for the VMExport objects that limits the lifetime of an export. This is done via the ttlDuration
field which accepts a k8s duration, which defaults to 2 hours when not specified.
apiVersion: export.kubevirt.io/v1beta1\nkind: VirtualMachineExport\nmetadata:\n name: example-export\nspec:\n source:\n apiGroup: \"kubevirt.io\"\n kind: VirtualMachine\n name: example-vm\n tokenSecretRef: example-token\n ttlDuration: 1h\n
"},{"location":"storage/export_api/#virtctl-integration-vmexport","title":"virtctl integration: vmexport","text":"The virtctl vmexport
command allows users to interact with the export API in an easy-to-use way.
vmexport
uses two mandatory arguments:
These three functions are:
"},{"location":"storage/export_api/#create","title":"Create","text":"# Creates a VMExport object according to the specified flag.\n\n# The flag should either be:\n\n# --pvc, to specify the name of the pvc to export.\n# --snapshot, to specify the name of the VM snapshot to export.\n# --vm, to specify the name of the Virtual Machine to export.\n\n$ virtctl vmexport create name [flags]\n
"},{"location":"storage/export_api/#delete","title":"Delete","text":"# Deletes the specified VMExport object.\n\n$ virtctl vmexport delete name\n
"},{"location":"storage/export_api/#download","title":"Download","text":"# Downloads a volume from the defined VMExport object.\n\n# The main available flags are:\n\n# --output, mandatory flag to specify the output file.\n# --volume, optional flag to specify the name of the downloadable volume.\n# --vm|--snapshot|--pvc, if specified, are used to create the VMExport object assuming it doesn't exist. The name of the object to export has to be specified.\n# --format, optional flag to specify whether to download the file in compressed (default) or raw format.\n# --port-forward, optional flag to easily download the volume without the need of an ingress or route. Also, the local port can be optionally specified with the --local-port flag.\n\n$ virtctl vmexport download name [flags]\n
By default, the volume will be downloaded in compressed format. Users can specify the desired format (gzip or raw) by using the format
flag, as shown below:
# Downloads a volume from the defined VMExport object and, if necessary, decompresses it.\n$ virtctl vmexport download name --format=raw [flags]\n
"},{"location":"storage/export_api/#ttl-time-to-live","title":"TTL (Time to live)","text":"TTL can also be added when creating a VMExport via virtctl
$ virtctl vmexport create name --ttl=1h\n
For more information about usage and examples:
$ virtctl vmexport --help\n\nExport a VM volume.\n\nUsage:\n virtctl vmexport [flags]\n\nExamples:\n # Create a VirtualMachineExport to export a volume from a virtual machine:\n virtctl vmexport create vm1-export --vm=vm1\n\n # Create a VirtualMachineExport to export a volume from a virtual machine snapshot\n virtctl vmexport create snap1-export --snapshot=snap1\n\n # Create a VirtualMachineExport to export a volume from a PVC\n virtctl vmexport create pvc1-export --pvc=pvc1\n\n # Delete a VirtualMachineExport resource\n virtctl vmexport delete snap1-export\n\n # Download a volume from an already existing VirtualMachineExport (--volume is optional when only one volume is available)\n virtctl vmexport download vm1-export --volume=volume1 --output=disk.img.gz\n\n # Create a VirtualMachineExport and download the requested volume from it\n virtctl vmexport download vm1-export --vm=vm1 --volume=volume1 --output=disk.img.gz\n\nFlags:\n -h, --help help for vmexport\n --insecure When used with the 'download' option, specifies that the http request should be insecure.\n --keep-vme When used with the 'download' option, specifies that the vmexport object should not be deleted after the download finishes.\n --output string Specifies the output path of the volume to be downloaded.\n --pvc string Sets PersistentVolumeClaim as vmexport kind and specifies the PVC name.\n --snapshot string Sets VirtualMachineSnapshot as vmexport kind and specifies the snapshot name.\n --vm string Sets VirtualMachine as vmexport kind and specifies the vm name.\n --volume string Specifies the volume to be downloaded.\n\nUse \"virtctl options\" for a list of global command-line options (applies to all commands).\n
"},{"location":"storage/export_api/#use-cases","title":"Use cases","text":""},{"location":"storage/export_api/#clone-vm-from-one-cluster-to-another-cluster","title":"Clone VM from one cluster to another cluster","text":"If you want to transfer KubeVirt disk images from a source cluster to another target cluster, you can use the VMExport in the source to expose the disks and use Containerized Data Importer (CDI) in the target cluster to import the image into the target cluster. Let's assume we have an Ingress or Route in the source cluster that exposes the export proxy with the following example domain virt-exportproxy-example.example.com
and we have a Virtual Machine in the source cluster with one disk, which looks like this:
apiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n labels:\n kubevirt.io/vm: vm-example-datavolume\n name: example-vm\nspec:\n dataVolumeTemplates:\n - metadata:\n creationTimestamp: null\n name: example-dv\n spec:\n storage:\n accessModes:\n - ReadWriteOnce\n resources:\n requests:\n storage: 20Gi\n storageClassName: local\n source:\n registry:\n url: docker://quay.io/containerdisks/centos-stream:9\n running: false\n template:\n metadata:\n labels:\n kubevirt.io/vm: vm-example-datavolume\n spec:\n domain:\n devices:\n disks:\n - disk:\n bus: virtio\n name: datavolumedisk1\n resources:\n requests:\n memory: 2Gi\n terminationGracePeriodSeconds: 0\n volumes:\n - dataVolume:\n name: example-dv\n name: datavolumedisk1\n
This is a VM that has a DataVolume (DV) example-dv
that is populated from a container disk and we want to export that disk to the target cluster. To export this VM we have to create a token that we can use in the target cluster to get access to the export, or we can let the export controller generate one for us. For example
apiVersion: v1\nkind: Secret\nmetadata:\n name: example-token\nstringData:\n token: 1234567890ab\n
The value of the token is 1234567890ab
, which is hardly a secure token, but it serves as an example. We can now create a VMExport that looks like this: apiVersion: export.kubevirt.io/v1beta1\nkind: VirtualMachineExport\nmetadata:\n name: example-export\nspec:\n tokenSecretRef: example-token #optional, if omitted the export controller will generate a token\n source:\n apiGroup: \"kubevirt.io\"\n kind: VirtualMachine\n name: example-vm\n
If the VM is not running the status of the VMExport object will get updated once the export-server pod is running to look something like this: apiVersion: export.kubevirt.io/v1beta1\nkind: VirtualMachineExport\nmetadata:\n name: example-export\n namespace: example\nspec:\n tokenSecretRef: example-token\n source:\n apiGroup: \"kubevirt.io\"\n kind: VirtualMachine\n name: example-vm\nstatus:\n conditions:\n - lastProbeTime: null\n reason: podReady\n status: \"True\"\n type: Ready\n - lastProbeTime: null\n reason: pvcBound\n status: \"True\"\n type: PVCReady\n links:\n external:\n cert: |-\n -----BEGIN CERTIFICATE-----\n ...\n -----END CERTIFICATE-----\n volumes:\n - formats:\n - format: raw\n url: https://virt-exportproxy-example.example.com/api/export.kubevirt.io/v1beta1/namespaces/example/virtualmachineexports/example-export/volumes/example-dv/disk.img\n - format: gzip\n url: https://virt-exportproxy-example.example.com/api/export.kubevirt.io/v1beta1/namespaces/example/virtualmachineexports/example-export/volumes/example-dv/disk.img.gz\n name: example-disk\n internal:\n cert: |-\n -----BEGIN CERTIFICATE-----\n ...\n -----END CERTIFICATE-----\n volumes:\n - formats:\n - format: raw\n url: https://virt-export-example-export.example.svc/volumes/example-dv/disk.img\n - format: gzip\n url: https://virt-export-example-export.example.svc/volumes/example-dv/disk.img.gz\n name: example-disk\n phase: Ready\n serviceName: virt-export-example-export\n
Note in this example we are in the example
namespace in the source cluster, which is why the internal links domain ends with .example.svc
. The external links are visible outside the source cluster, so we use them when importing into the target cluster. Now we are ready to import this disk into the target cluster. In order for CDI to import, we need to provide appropriate yaml that contains the following: - CA cert (as config map) - The token needed to access the disk images in a CDI compatible format - The VM yaml - DataVolume yaml (optional if not part of the VM definition)
virtctl provides an additional argument to the download command called --manifest
that will retrieve the appropriate information from the export server, and either save it to a file with the --output
argument or write to standard out. By default this output will not contain the header secret as it contains the token in plaintext. To get the header secret you specify the --include-secret
argument. The default output format is yaml
but it is possible to get json
output as well.
Assume there is a running VirtualMachineExport called example-export
and that the same namespace exists in the target cluster. If the kubeconfig of the target cluster is named kubeconfig-target
, run the following commands to clone the VM into the target cluster:
$ virtctl vmexport download example-export --manifest --include-secret --output=import.yaml\n$ kubectl apply -f import.yaml --kubeconfig=kubeconfig-target\n
The first command generates the yaml and writes it to import.yaml
. The second command applies the generated yaml to the target cluster. It is possible to combine the two commands by writing to standard out
with the first command and piping it into the second command. Use this option if the export token should not be written to a file anywhere. This creates the VM in the target cluster and provides CDI in the target cluster with everything required to import the disk images.
After the import completes you should be able to start the VM in the target cluster.
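The piped variant of the two commands can be sketched as follows. It assumes, as described above, that the manifest is written to standard out when --output is omitted; the wrapper name clone_export_to_target is hypothetical.

```shell
# Clone through a pipe so the export token is never written to a file.
# $1: name of the VirtualMachineExport, $2: kubeconfig of the target cluster
clone_export_to_target() {
    virtctl vmexport download "$1" --manifest --include-secret \
        | kubectl apply -f - --kubeconfig="$2"
}
```

With the names used in this section, the call would be clone_export_to_target example-export kubeconfig-target.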
"},{"location":"storage/export_api/#download-a-vm-volume-locally-using-virtctl-vmexport","title":"Download a VM volume locally using virtctl vmexport","text":"Several steps from the previous section can be simplified considerably by using the vmexport
command.
Again, let's assume we have an Ingress or Route in our cluster that exposes the export proxy, and that we have a Virtual Machine in the cluster with one disk like this:
apiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n labels:\n kubevirt.io/vm: vm-example-datavolume\n name: example-vm\nspec:\n dataVolumeTemplates:\n - metadata:\n creationTimestamp: null\n name: example-dv\n spec:\n storage:\n accessModes:\n - ReadWriteOnce\n resources:\n requests:\n storage: 20Gi\n storageClassName: local\n source:\n registry:\n url: docker://quay.io/containerdisks/centos-stream:9\n running: false\n template:\n metadata:\n labels:\n kubevirt.io/vm: vm-example-datavolume\n spec:\n domain:\n devices:\n disks:\n - disk:\n bus: virtio\n name: datavolumedisk1\n resources:\n requests:\n memory: 2Gi\n terminationGracePeriodSeconds: 0\n volumes:\n - dataVolume:\n name: example-dv\n name: datavolumedisk1\n
Once we meet these requirements, the process of downloading the volume locally can be accomplished by different means:
"},{"location":"storage/export_api/#performing-each-step-separately","title":"Performing each step separately","text":"We can download the volume by performing every single step in a different command. We start by creating the export object:
# We use an arbitrary name for the VMExport object, but specify our VM name in the flag.\n\n$ virtctl vmexport create vmexportname --vm=example-vm\n
Then, we download the volume in the specified output:
# Since our virtual machine only has one volume, there's no need to specify the volume name with the --volume flag.\n\n# After the download, the VMExport object is deleted by default, so we are using the optional --keep-vme flag to keep it and delete it manually afterwards.\n\n$ virtctl vmexport download vmexportname --output=/tmp/disk.img --keep-vme\n
Lastly, we delete the VMExport object:
$ virtctl vmexport delete vmexportname\n
"},{"location":"storage/export_api/#performing-one-single-step","title":"Performing one single step","text":"All the previous steps can be simplified in one, single command:
# Since we are using a create flag (--vm) with download, the command creates the object assuming the VMExport doesn't exist.\n\n# Also, since we are not using --keep-vme, the VMExport object is deleted after the download.\n\n$ virtctl vmexport download vmexportname --vm=example-vm --output=/tmp/disk.img\n
After the download finishes, we can find our disk in /tmp/disk.img
.
Libguestfs tools are a set of utilities for accessing and modifying VM disk images. The command virtctl guestfs
helps to deploy an interactive container with the libguestfs-tools and the PVC attached to it. This command is particularly useful if the users need to modify, inspect or debug VM disks on a PVC.
$ virtctl guestfs -h\nCreate a pod with libguestfs-tools, mount the pvc and attach a shell to it. The pvc is mounted under the /disks directory inside the pod for filesystem-based pvcs, or as /dev/vda for block-based pvcs\n\nUsage:\n virtctl guestfs [flags]\n\nExamples:\n # Create a pod with libguestfs-tools, mount the pvc and attach a shell to it:\n virtctl guestfs <pvc-name>\n\nFlags:\n -h, --help help for guestfs\n --image string libguestfs-tools container image\n --kvm Use kvm for the libguestfs-tools container (default true)\n --pull-policy string pull policy for the libguestfs image (default \"IfNotPresent\")\n\nUse \"virtctl options\" for a list of global command-line options (applies to all commands).\n
By default virtctl guestfs
sets up kvm
for the interactive container. This considerably speeds up the execution of the libguestfs-tools since they use QEMU. If the cluster doesn't have any kvm supporting nodes, the user must disable kvm by setting the option --kvm=false
. If not set, the libguestfs-tools pod will remain pending because it cannot be scheduled on any node.
The command automatically uses the image exposed by KubeVirt under the http endpoint /apis/subresources.kubevirt.io/<kubevirt-version>/guestfs
, but it can be configured to use a custom image by using the option --image
. Users can also override the pull policy of the image by setting the option --pull-policy
.
The command checks whether the PVC is used by another pod and fails if it is. However, once libguestfs-tools has started, nothing prevents a new pod from starting and using the same PVC. The user needs to verify that there are no active virtctl guestfs pods before starting a VM which accesses the same PVC.
Currently, virtctl guestfs
supports only a single PVC. Future versions might support multiple PVCs attached to the interactive pod.
Generally, the user can take advantage of the virtctl guestfs
command for all typical usage of libguestfs-tools. It is strongly recommended to consult the official documentation. This command simply aims to help in configuring the correct containerized environment in the Kubernetes cluster where KubeVirt is installed.
For all the examples, the user has to start the interactive container by referencing the PVC in the virtctl guestfs
command. This will deploy the interactive pod and attach the stdin and stdout.
Example:
$ virtctl guestfs pvc-test\nUse image: registry:5000/kubevirt/libguestfs-tools@sha256:6644792751b2ba9442e06475a809448b37d02d1937dbd15ad8da4d424b5c87dd \nThe PVC has been mounted at /disk \nWaiting for container libguestfs still in pending, reason: ContainerCreating, message: \nWaiting for container libguestfs still in pending, reason: ContainerCreating, message: \nbash-5.0#\n
Once the libguestfs-tools pod has been deployed, the user can access the disk and execute the desired commands. Later, once the user has completed the operations on the disk, simply exit
 the container and the pod will be automatically terminated.
bash-5.0# virt-cat -a disk.img /etc/os-release \nNAME=Fedora\nVERSION=\"34 (Cloud Edition)\"\nID=fedora\nVERSION_ID=34\nVERSION_CODENAME=\"\"\nPLATFORM_ID=\"platform:f34\"\nPRETTY_NAME=\"Fedora 34 (Cloud Edition)\"\nANSI_COLOR=\"0;38;2;60;110;180\"\nLOGO=fedora-logo-icon\nCPE_NAME=\"cpe:/o:fedoraproject:fedora:34\"\nHOME_URL=\"https://fedoraproject.org/\"\nDOCUMENTATION_URL=\"https://docs.fedoraproject.org/en-US/fedora/34/system-administrators-guide/\"\nSUPPORT_URL=\"https://fedoraproject.org/wiki/Communicating_and_getting_help\"\nBUG_REPORT_URL=\"https://bugzilla.redhat.com/\"\nREDHAT_BUGZILLA_PRODUCT=\"Fedora\"\nREDHAT_BUGZILLA_PRODUCT_VERSION=34\nREDHAT_SUPPORT_PRODUCT=\"Fedora\"\nREDHAT_SUPPORT_PRODUCT_VERSION=34\nPRIVACY_POLICY_URL=\"https://fedoraproject.org/wiki/Legal:PrivacyPolicy\"\nVARIANT=\"Cloud Edition\"\nVARIANT_ID=cloud\n
bash-5.0# virt-customize -a disk.img --run-command 'useradd -m test-user -s /bin/bash' --password 'test-user:password:test-password'\n[ 0.0] Examining the guest ...\n[ 4.1] Setting a random seed\n[ 4.2] Setting the machine ID in /etc/machine-id\n[ 4.2] Running: useradd -m test-user -s /bin/bash\n[ 4.3] Setting passwords\n[ 5.3] Finishing off\n
Run virt-rescue and repair a broken partition or initrd (for example by running dracut)
bash-5.0# virt-rescue -a disk.img\n[...]\nThe virt-rescue escape key is \u2018^]\u2019. Type \u2018^] h\u2019 for help.\n\n------------------------------------------------------------\n\nWelcome to virt-rescue, the libguestfs rescue shell.\n\nNote: The contents of / (root) are the rescue appliance.\nYou have to mount the guest\u2019s partitions under /sysroot\nbefore you can examine them.\n><rescue> fdisk -l\nDisk /dev/sda: 6 GiB, 6442450944 bytes, 12582912 sectors\nDisk model: QEMU HARDDISK \nUnits: sectors of 1 * 512 = 512 bytes\nSector size (logical/physical): 512 bytes / 512 bytes\nI/O size (minimum/optimal): 512 bytes / 512 bytes\nDisklabel type: gpt\nDisk identifier: F8DC0844-9194-4B34-B432-13FA4B70F278\n\nDevice Start End Sectors Size Type\n/dev/sda1 2048 4095 2048 1M BIOS boot\n/dev/sda2 4096 2101247 2097152 1G Linux filesystem\n/dev/sda3 2101248 12580863 10479616 5G Linux filesystem\n\n\nDisk /dev/sdb: 4 GiB, 4294967296 bytes, 8388608 sectors\nDisk model: QEMU HARDDISK \nUnits: sectors of 1 * 512 = 512 bytes\nSector size (logical/physical): 512 bytes / 512 bytes\nI/O size (minimum/optimal): 512 bytes / 512 bytes\n><rescue> mount /dev/sda3 sysroot/\n><rescue> mount /dev/sda2 sysroot/boot\n><rescue> chroot sysroot/\n><rescue> ls boot/\nSystem.map-5.11.12-300.fc34.x86_64\nconfig-5.11.12-300.fc34.x86_64\nefi\ngrub2\ninitramfs-0-rescue-8afb5b540fab48728e48e4196a3a48ee.img\ninitramfs-5.11.12-300.fc34.x86_64.img\nloader\nvmlinuz-0-rescue-8afb5b540fab48728e48e4196a3a48ee\n><rescue> dracut -f boot/initramfs-5.11.12-300.fc34.x86_64.img 5.11.12-300.fc34.x86_64\n[...]\n><rescue> exit # <- exit from chroot\n><rescue> umount sysroot/boot\n><rescue> umount sysroot\n><rescue> exit\n
Install an OS from scratch
bash-5.0# virt-builder centos-8.2 -o disk.img --root-password password:password-test\n[ 1.5] Downloading: http://builder.libguestfs.org/centos-8.2.xz\n######################################################################## 100.0%#=#=# ######################################################################## 100.0%\n[ 58.3] Planning how to build this image\n[ 58.3] Uncompressing\n[ 65.7] Opening the new disk\n[ 70.8] Setting a random seed\n[ 70.8] Setting passwords\n[ 72.0] Finishing off\n Output file: disk.img\n Output size: 6.0G\n Output format: raw\n Total usable space: 5.3G\n Free space: 4.0G (74%)\n
bash-5.0# virt-filesystems -a disk.img --partitions --filesystem --long\nName Type VFS Label MBR Size Parent\n/dev/sda2 filesystem ext4 - - 1023303680 -\n/dev/sda4 filesystem xfs - - 4710203392 -\n/dev/sda1 partition - - - 1048576 /dev/sda\n/dev/sda2 partition - - - 1073741824 /dev/sda\n/dev/sda3 partition - - - 644874240 /dev/sda\n/dev/sda4 partition - - - 4720689152 /dev/sda\n
Currently, it is not possible to resize the xfs filesystem.
"},{"location":"storage/hotplug_volumes/","title":"Hotplug Volumes","text":"KubeVirt now supports hotplugging volumes into a running Virtual Machine Instance (VMI). The volume must be either a block volume or contain a disk image. When a VM that has hotplugged volumes is rebooted, the hotplugged volumes will be attached to the restarted VM. If the volumes are persisted, they become part of the VM spec and are no longer considered hotplugged. If they are not persisted, the volumes are reattached as hotplugged volumes.
"},{"location":"storage/hotplug_volumes/#enabling-hotplug-volume-support","title":"Enabling hotplug volume support","text":"Hotplug volume support is controlled by a feature gate. To enable it, add HotplugVolumes
 to the feature gates field in the KubeVirt CR.
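As a minimal sketch, the feature gate could be added to the KubeVirt CR as follows. The resource name `kubevirt` and the namespace `kubevirt` are assumptions based on a default installation; adjust them to your cluster, and keep any feature gates that are already listed.

```yaml
apiVersion: kubevirt.io/v1
kind: KubeVirt
metadata:
  name: kubevirt        # assumed default name
  namespace: kubevirt   # assumed default namespace
spec:
  configuration:
    developerConfiguration:
      featureGates:
        - HotplugVolumes
```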
In order to hotplug a volume, you must first prepare a volume. This can be done by using a DataVolume (DV). In this example, we use a blank DV to add some extra storage to a running VMI.
apiVersion: cdi.kubevirt.io/v1beta1\nkind: DataVolume\nmetadata:\n  name: example-volume-hotplug\nspec:\n  source:\n    blank: {}\n  storage:\n    resources:\n      requests:\n        storage: 5Gi\n
In this example we are using ReadWriteOnce
accessMode, and the default FileSystem volume mode. Volume hotplugging supports all combinations of block volume mode and ReadWriteMany
/ReadWriteOnce
/ReadOnlyMany
 accessModes, if your storage supports the combination."},{"location":"storage/hotplug_volumes/#addvolume","title":"Addvolume","text":"Now let's assume we have started a VMI like the Fedora VMI in examples and the name of the VMI is 'vmi-fedora'. We can add the above blank volume to this running VMI by using the 'addvolume' command available with virtctl:
$ virtctl addvolume vmi-fedora --volume-name=example-volume-hotplug\n
This will hotplug the volume into the running VMI, and set the serial of the disk to the volume name. In this example it is set to example-volume-hotplug.
"},{"location":"storage/hotplug_volumes/#why-virtio-scsi","title":"Why virtio-scsi","text":"A hotplugged disk is attached using the scsi
 bus. Why is it not attached as virtio
 instead, like regular disks? The reason is a limitation of virtio
 disks: each disk uses a PCIe slot in the virtual machine, and there is a maximum of 32 slots. This puts a low limit on the number of disks you can hotplug, especially since other devices also need PCIe slots. Another issue is that these slots need to be reserved ahead of time, so if the number of hotplugged disks is not known in advance, it is impossible to reserve the required number of slots. To work around this issue, each VM has a virtio-scsi controller, which allows the use of a scsi
 bus for hotplugged disks. This controller allows for hotplugging of over 4 million disks. virtio-scsi
 is very close in performance to virtio.
You can change the serial of the disk by specifying the --serial parameter, for example:
$ virtctl addvolume vmi-fedora --volume-name=example-volume-hotplug --serial=1234567890\n
The serial is exposed to the guest, so you can use it to identify the disk inside the guest. For instance, in Fedora the entry under /dev/disk/by-id contains the serial.
$ virtctl console vmi-fedora\n\nFedora 32 (Cloud Edition)\nKernel 5.6.6-300.fc32.x86_64 on an x86_64 (ttyS0)\n\nSSH host key: SHA256:c8ik1A9F4E7AxVrd6eE3vMNOcMcp6qBxsf8K30oC/C8 (ECDSA)\nSSH host key: SHA256:fOAKptNAH2NWGo2XhkaEtFHvOMfypv2t6KIPANev090 (ED25519)\neth0: 10.244.196.144 fe80::d8b7:51ff:fec4:7099\nvmi-fedora login:fedora\nPassword:fedora\n[fedora@vmi-fedora ~]$ ls /dev/disk/by-id\nscsi-0QEMU_QEMU_HARDDISK_1234567890\n[fedora@vmi-fedora ~]$ \n
As you can see the serial is part of the disk name, so you can uniquely identify it. The format and length of serials are specified according to the libvirt documentation:
If present, this specify serial number of virtual hard drive. For example, it may look like <serial>WD-WMAP9A966149</serial>. Not supported for scsi-block devices, that is those using disk type 'block' using device 'lun' on bus 'scsi'. Since 0.7.1\n\n Note that depending on hypervisor and device type the serial number may be truncated silently. IDE/SATA devices are commonly limited to 20 characters. SCSI devices depending on hypervisor version are limited to 20, 36 or 247 characters.\n\n Hypervisors may also start rejecting overly long serials instead of truncating them in the future so it's advised to avoid the implicit truncation by testing the desired serial length range with the desired device and hypervisor combination.\n
"},{"location":"storage/hotplug_volumes/#supported-disk-types","title":"Supported Disk types","text":"KubeVirt supports hotplugging disk devices of type disk and lun. As with other volumes, using type disk
will expose the hotplugged volume as a regular disk, while using lun
allows additional functionalities like the execution of iSCSI commands.
You can specify the desired type by using the --disk-type parameter, for example:
# Allowed values are lun and disk. If no option is specified, we use disk by default.\n$ virtctl addvolume vmi-fedora --volume-name=example-lun-hotplug --disk-type=lun\n
"},{"location":"storage/hotplug_volumes/#retain-hotplugged-volumes-after-restart","title":"Retain hotplugged volumes after restart","text":"In many cases it is desirable to keep hotplugged volumes after a VM restart. It may also be desirable to be able to unplug these volumes after the restart. The persist
option makes it impossible to unplug the disks after a restart. If you don't specify persist
the default behaviour is to retain hotplugged volumes as hotplugged volumes after a VM restart. This makes the persist
flag mostly obsolete unless you want to make a volume permanent on restart.
In some cases you want a hotplugged volume to become part of the standard disks after a restart of the VM, for instance if you added some permanent storage to the VM. We also assume that the running VMI has a matching VM that defines its specification. You can call the addvolume command with the --persist flag. This will update the VM domain disks section in addition to updating the VMI domain disks. This means that when you restart the VM, the disk is already defined in the VM, and thus in the new VMI.
$ virtctl addvolume vm-fedora --volume-name=example-volume-hotplug --persist\n
In the VM spec this will now show as a new disk:
spec:\n  domain:\n    devices:\n      disks:\n      - disk:\n          bus: virtio\n        name: containerdisk\n      - disk:\n          bus: virtio\n        name: cloudinitdisk\n      - disk:\n          bus: scsi\n        name: example-volume-hotplug\n    machine:\n      type: \"\"\n
"},{"location":"storage/hotplug_volumes/#removevolume","title":"Removevolume","text":"In addition to hotplugging a volume, you can also unplug it by using the 'removevolume' command available with virtctl:
$ virtctl removevolume vmi-fedora --volume-name=example-volume-hotplug\n
NOTE You can only unplug volumes that were dynamically added with addvolume, or using the API.
"},{"location":"storage/hotplug_volumes/#volumestatus","title":"VolumeStatus","text":"VMI objects have a new status.VolumeStatus
field. This is an array containing each disk, hotplugged or not. For example, after hotplugging the volume in the addvolume example, the VMI status will contain this:
volumeStatus:\n- name: cloudinitdisk\n  target: vdb\n- name: containerdisk\n  target: vda\n- hotplugVolume:\n    attachPodName: hp-volume-7fmz4\n    attachPodUID: 62a7f6bf-474c-4e25-8db5-1db9725f0ed2\n  message: Successfully attach hotplugged volume volume-hotplug to VM\n  name: example-volume-hotplug\n  phase: Ready\n  reason: VolumeReady\n  target: sda\n
vda is the container disk that contains the Fedora OS, and vdb is the cloudinit disk. As you can see, those entries just contain the name and target used when assigning them to the VM. The target is the value passed to QEMU when specifying the disks. The value is unique for the VM and does NOT represent the naming inside the guest. For instance, for a Windows guest OS the target has no meaning. The same is true for hotplugged volumes: the target is just a unique identifier meant for QEMU, and inside the guest the disk can be assigned a different name. The hotplugVolume has some extra information that regular volume statuses do not have. The attachPodName is the name of the pod that was used to attach the volume to the node the VMI is running on. If this pod is deleted, the VMI will also be stopped, as we cannot guarantee the volume will remain attached to the node. The other fields are similar to conditions and indicate the status of the hotplug process. Once a volume is ready, it can be used by the VM.
"},{"location":"storage/hotplug_volumes/#live-migration","title":"Live Migration","text":"Currently Live Migration is enabled for any VMI that has volumes hotplugged into it.
NOTE However there is a known issue that the migration may fail for VMIs with hotplugged block volumes if the target node uses CPU manager with static policy and runc
prior to version v1.0.0
.
The snapshot.kubevirt.io
 API Group defines resources for snapshotting and restoring KubeVirt VirtualMachines.
KubeVirt leverages the VolumeSnapshot
functionality of Kubernetes CSI drivers for capturing persistent VirtualMachine
state. So, you should make sure that your VirtualMachine
uses DataVolumes
or PersistentVolumeClaims
backed by a StorageClass
that supports VolumeSnapshots
and a VolumeSnapshotClass
is properly configured for that StorageClass
.
KubeVirt looks for Kubernetes Volume Snapshot related APIs/resources in the v1
version. To make sure that KubeVirt's snapshot controller is able to snapshot the VirtualMachine and referenced volumes as expected, Kubernetes Volume Snapshot APIs must be served from v1
version.
To list VolumeSnapshotClasses
:
kubectl get volumesnapshotclass\n
Make sure the provisioner
property of your StorageClass
matches the driver
 property of the VolumeSnapshotClass.
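The pairing can be illustrated with the following sketch. The object names and the CSI driver name `csi.example.com` are hypothetical placeholders; what matters is that the two marked fields carry the same value.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: example-storage          # hypothetical name
provisioner: csi.example.com     # hypothetical CSI driver
---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: example-snapclass        # hypothetical name
driver: csi.example.com          # must equal the StorageClass provisioner
deletionPolicy: Delete
```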
Even if you have no VolumeSnapshotClasses
in your cluster, VirtualMachineSnapshots
 are still useful: they back up your VirtualMachine
configuration.
Snapshot/Restore support is controlled by a feature gate. To enable it, add Snapshot
 to the feature gates field in the KubeVirt CR.
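A minimal sketch of the relevant part of the KubeVirt CR, assuming the default resource name `kubevirt` in the `kubevirt` namespace (both are assumptions; adjust to your installation and keep any feature gates already present):

```yaml
apiVersion: kubevirt.io/v1
kind: KubeVirt
metadata:
  name: kubevirt        # assumed default name
  namespace: kubevirt   # assumed default namespace
spec:
  configuration:
    developerConfiguration:
      featureGates:
        - Snapshot
```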
Snapshotting a VirtualMachine is supported for both online (running) and offline (stopped) VMs.
When snapshotting a running VM, the controller checks for the QEMU guest agent in the VM. If the agent is present, it freezes the VM filesystems before taking the snapshot and unfreezes them afterwards. Taking online snapshots with the guest agent is recommended because it yields a more consistent snapshot; if the agent is not present, a best-effort snapshot is taken.
Note To check whether your VM has a qemu-guest-agent, look for 'AgentConnected' in the VM status.
The vmSnapshot status indicates whether the snapshot was taken online and whether the guest agent participated.
Note Online snapshots with hotplugged disks are supported; only persistent hotplugged disks are included in the snapshot.
To snapshot a VirtualMachine
named larry
, apply the following yaml.
apiVersion: snapshot.kubevirt.io/v1beta1\nkind: VirtualMachineSnapshot\nmetadata:\n  name: snap-larry\nspec:\n  source:\n    apiGroup: kubevirt.io\n    kind: VirtualMachine\n    name: larry\n
To wait for a snapshot to complete, execute:
kubectl wait vmsnapshot snap-larry --for condition=Ready\n
You can check the vmSnapshot phase in the vmSnapshot status. It can be one of the following:
The vmSnapshot has a default deadline of 5 minutes. If the vmSnapshot has not successfully completed before the deadline, it will be marked as Failed. The VM will be unfrozen and the created snapshot content will be cleaned up if necessary. The vmSnapshot object will remain in Failed state until deleted by the user. To change the default deadline, add 'failureDeadline' to the VirtualMachineSnapshot spec with a new value. The allowed format is a duration string, which is a possibly signed sequence of decimal numbers, each with optional fraction and a unit suffix, such as \"300ms\", \"-1.5h\" or \"2h45m\".
apiVersion: snapshot.kubevirt.io/v1beta1\nkind: VirtualMachineSnapshot\nmetadata:\n  name: snap-larry\nspec:\n  source:\n    apiGroup: kubevirt.io\n    kind: VirtualMachine\n    name: larry\n  failureDeadline: 1m\n
In order to set an infinite deadline you can set it to 0 (not recommended).
"},{"location":"storage/snapshot_restore_api/#restoring-a-virtualmachine","title":"Restoring a VirtualMachine","text":"To restore the VirtualMachine
larry
from VirtualMachineSnapshot
snap-larry
, stop the VM, wait for it to be stopped, and then apply the following YAML.
apiVersion: snapshot.kubevirt.io/v1beta1\nkind: VirtualMachineRestore\nmetadata:\n  name: restore-larry\nspec:\n  target:\n    apiGroup: kubevirt.io\n    kind: VirtualMachine\n    name: larry\n  virtualMachineSnapshotName: snap-larry\n
To wait for a restore to complete, execute:
kubectl wait vmrestore restore-larry --for condition=Ready\n
"},{"location":"storage/snapshot_restore_api/#cleanup","title":"Cleanup","text":"Keep VirtualMachineSnapshots
(and their corresponding VirtualMachineSnapshotContents
) around as long as you may want to restore from them again.
Feel free to delete restore-larry
as it is not needed once the restore is complete.
Once a virtual machine is started you are able to connect to the consoles it exposes. Usually there are two types of consoles:
Note: You need to have virtctl
installed to gain access to the VirtualMachineInstance.
The serial console of a virtual machine can be accessed by using the console
command:
virtctl console testvm\n
"},{"location":"user_workloads/accessing_virtual_machines/#accessing-the-graphical-console-vnc","title":"Accessing the Graphical Console (VNC)","text":"To access the graphical console of a virtual machine the VNC protocol is typically used. This requires remote-viewer
to be installed. Once the tool is installed, you can access the graphical console using:
virtctl vnc testvm\n
If you only want to open a vnc-proxy without executing the remote-viewer
command, it can be accomplished with:
virtctl vnc --proxy-only testvm\n
This prints the port number on your machine to which you can connect manually using any VNC viewer.
"},{"location":"user_workloads/accessing_virtual_machines/#debugging-console-access","title":"Debugging console access","text":"If the connection fails, you can use the -v
flag to get more verbose output from both virtctl
and the remote-viewer
tool to troubleshoot the problem.
virtctl vnc testvm -v 4\n
Note: If you are using virtctl via SSH on a remote machine, you need to forward the X session to your machine. Look up the -X and -Y flags of ssh
if you are not familiar with that. As an alternative you can proxy the API server port with SSH to your machine (either direct or in combination with kubectl proxy
).
A common operational pattern used when managing virtual machines is to inject SSH public keys into the virtual machines at boot. This allows automation tools (like Ansible) to provision the virtual machine. It also gives operators a way of gaining secure and passwordless access to a virtual machine.
KubeVirt provides multiple ways to inject SSH public keys into a virtual machine.
In general, these methods fall into two categories: - Static key injection, which places keys on the virtual machine the first time it is booted. - Dynamic key injection, which allows keys to be dynamically updated both at boot and during runtime.
Once an SSH public key is injected into the virtual machine, the machine can be accessed via virtctl
.
Users creating virtual machines can provide startup scripts to their virtual machines, allowing multiple customization operations.
One option for injecting public SSH keys into a VM is via a cloud-init startup script. However, there are more flexible options available.
The virtual machine's access credential API allows statically injecting SSH public keys at startup time independently of the cloud-init user data by placing the SSH public key into a Kubernetes Secret
. This allows keeping the application data in the cloud-init user data separate from the credentials used to access the virtual machine.
A Kubernetes Secret
can be created from an SSH public key like this:
# Place SSH public key into a Secret\nkubectl create secret generic my-pub-key --from-file=key1=id_rsa.pub\n
The Secret
containing the public key is then assigned to a virtual machine using the access credentials API with the noCloud
propagation method.
KubeVirt injects the SSH public key into the virtual machine by using the generated cloud-init metadata instead of the user data. This separates the application user data and user credentials.
Note: The cloud-init userData
is not touched.
# Create a VM referencing the Secret using propagation method noCloud\nkubectl create -f - <<EOF\napiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n  name: testvm\nspec:\n  running: true\n  template:\n    spec:\n      domain:\n        devices:\n          disks:\n          - disk:\n              bus: virtio\n            name: containerdisk\n          - disk:\n              bus: virtio\n            name: cloudinitdisk\n          rng: {}\n        resources:\n          requests:\n            memory: 1024M\n      terminationGracePeriodSeconds: 0\n      accessCredentials:\n      - sshPublicKey:\n          source:\n            secret:\n              secretName: my-pub-key\n          propagationMethod:\n            noCloud: {}\n      volumes:\n      - containerDisk:\n          image: quay.io/containerdisks/fedora:latest\n        name: containerdisk\n      - cloudInitNoCloud:\n          userData: |-\n            #cloud-config\n            password: fedora\n            chpasswd: { expire: False }\n        name: cloudinitdisk\nEOF\n
"},{"location":"user_workloads/accessing_virtual_machines/#dynamic-ssh-public-key-injection-via-qemu-guest-agent","title":"Dynamic SSH public key injection via qemu-guest-agent","text":"KubeVirt allows the dynamic injection of SSH public keys into a VirtualMachine with the access credentials API.
Utilizing the qemuGuestAgent
propagation method, configured Secrets are attached to a VirtualMachine when the VM is started. This allows for dynamic injection of SSH public keys at runtime by updating the attached Secrets.
Please note that new Secrets cannot be attached to a running VM: You must restart the VM to attach the new Secret.
Note: This requires the qemu-guest-agent to be installed within the guest.
Note: When using qemuGuestAgent propagation, the /home/$USER/.ssh/authorized_keys
file will be owned by the guest agent. Changes to the file not made by the guest agent will be lost.
Note: More information about the motivation behind the access credentials API can be found in the pull request description that introduced the API.
In the example below the Secret
containing the SSH public key is attached to the virtual machine via the access credentials API with the qemuGuestAgent
propagation method. This allows updating the contents of the Secret
at any time, which will result in the changes getting applied to the running virtual machine immediately. The Secret
may also contain multiple SSH public keys.
# Place SSH public key into a secret\nkubectl create secret generic my-pub-key --from-file=key1=id_rsa.pub\n
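Since the Secret may hold several keys, it could also be declared as a manifest with one entry per key. This is only a sketch: the key values are placeholders to be replaced with the contents of real public key files, and it assumes that every entry of the Secret is propagated to the guest.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: my-pub-key
type: Opaque
stringData:
  # Placeholder values: paste the contents of each public key file here
  key1: "<contents of the first public key file>"
  key2: "<contents of the second public key file>"
```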
Now reference this secret in the VirtualMachine
spec with the access credentials API using qemuGuestAgent
propagation.
# Create a VM referencing the Secret using propagation method qemuGuestAgent\nkubectl create -f - <<EOF\napiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n  name: testvm\nspec:\n  running: true\n  template:\n    spec:\n      domain:\n        devices:\n          disks:\n          - disk:\n              bus: virtio\n            name: containerdisk\n          - disk:\n              bus: virtio\n            name: cloudinitdisk\n          rng: {}\n        resources:\n          requests:\n            memory: 1024M\n      terminationGracePeriodSeconds: 0\n      accessCredentials:\n      - sshPublicKey:\n          source:\n            secret:\n              secretName: my-pub-key\n          propagationMethod:\n            qemuGuestAgent:\n              users:\n              - fedora\n      volumes:\n      - containerDisk:\n          image: quay.io/containerdisks/fedora:latest\n        name: containerdisk\n      - cloudInitNoCloud:\n          userData: |-\n            #cloud-config\n            password: fedora\n            chpasswd: { expire: False }\n            # Disable SELinux for now, so qemu-guest-agent can write the authorized_keys file\n            # The selinux-policy is too restrictive currently, see open bugs:\n            # - https://bugzilla.redhat.com/show_bug.cgi?id=1917024\n            # - https://bugzilla.redhat.com/show_bug.cgi?id=2028762\n            # - https://bugzilla.redhat.com/show_bug.cgi?id=2057310\n            bootcmd:\n              - setenforce 0\n        name: cloudinitdisk\nEOF\n
"},{"location":"user_workloads/accessing_virtual_machines/#accessing-the-vmi-using-virtctl","title":"Accessing the VMI using virtctl","text":"The user can create a websocket backed network tunnel to a port inside the instance by using the virtualmachineinstances/portforward
subresource of the VirtualMachineInstance
.
One use-case for this subresource is to forward SSH traffic into the VirtualMachineInstance
either from the CLI or a web-UI.
To connect to a VirtualMachineInstance
from your local machine, virtctl
provides a lightweight SSH client with the ssh
command, that uses port forwarding. Refer to the command's help for more details.
virtctl ssh\n
To transfer files from or to a VirtualMachineInstance
virtctl
also provides a lightweight SCP client with the scp
command. Its usage is similar to the ssh
command. Refer to the command's help for more details.
virtctl scp\n
"},{"location":"user_workloads/accessing_virtual_machines/#using-virtctl-as-proxy","title":"Using virtctl as proxy","text":"If you prefer to use your local OpenSSH client, there are two ways of doing that in combination with virtctl.
Note: Most of this applies to the virtctl scp
command too.
virtctl ssh
command has a --local-ssh
option. With this option virtctl
wraps the local OpenSSH client transparently to the user. The executed SSH command can be viewed by increasing the verbosity (-v 3
).
virtctl ssh --local-ssh -v 3 testvm\n
virtctl port-forward
command provides an option to tunnel a single port to your local stdout/stdin. This allows the command to be used in combination with the OpenSSH client's ProxyCommand
 option.
ssh -o 'ProxyCommand=virtctl port-forward --stdio=true vmi/testvm.mynamespace 22' fedora@testvm.mynamespace\n
To provide easier access to arbitrary virtual machines you can add the following lines to your SSH config
:
Host vmi/*\n ProxyCommand virtctl port-forward --stdio=true %h %p\nHost vm/*\n ProxyCommand virtctl port-forward --stdio=true %h %p\n
This allows you to simply call ssh user@vmi/testvmi.mynamespace
and your SSH config and virtctl will do the rest. Using this method it becomes easy to set up different identities for different namespaces inside your SSH config
.
This feature can also be used with Ansible to automate configuration of virtual machines running on KubeVirt. You can put the snippet above into its own file (e.g. ~/.ssh/virtctl-proxy-config
) and add the following lines to your .ansible.cfg
:
[ssh_connection]\nssh_args = -F ~/.ssh/virtctl-proxy-config\n
Note that all port forwarding traffic will be sent over the Kubernetes control plane. A high number of connections and a large amount of traffic can increase pressure on the API server. If you regularly need many connections or a lot of traffic, consider using a dedicated Kubernetes Service
instead.
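As a hedged sketch, such a Service could expose the SSH port of the VM as shown below. The Service name and the `app: testvm` label are hypothetical; the selector only matches if that label is set in the VM's `spec.template.metadata.labels`.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: testvm-ssh          # hypothetical name
  namespace: mynamespace
spec:
  type: NodePort
  selector:
    app: testvm             # assumes this label is set in the VM template metadata
  ports:
    - name: ssh
      protocol: TCP
      port: 22
      targetPort: 22
```

With a Service like this, SSH traffic reaches the VM through the cluster network rather than through the API server.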
Create virtual machine and inject SSH public key as explained above
SSH into virtual machine
# Add --local-ssh to transparently use local OpenSSH client\nvirtctl ssh -i id_rsa fedora@testvm\n
or
ssh -o 'ProxyCommand=virtctl port-forward --stdio=true vmi/testvm.mynamespace 22' -i id_rsa fedora@vmi/testvm.mynamespace\n
# Add --local-ssh to transparently use local OpenSSH client\nvirtctl scp -i id_rsa testfile fedora@testvm:/tmp\n
or
scp -o 'ProxyCommand=virtctl port-forward --stdio=true vmi/testvm.mynamespace 22' -i id_rsa testfile fedora@testvm.mynamespace:/tmp\n
"},{"location":"user_workloads/accessing_virtual_machines/#rbac-permissions-for-consolevncssh-access","title":"RBAC permissions for Console/VNC/SSH access","text":""},{"location":"user_workloads/accessing_virtual_machines/#using-default-rbac-cluster-roles","title":"Using default RBAC cluster roles","text":"Every KubeVirt installation starting with version v0.5.1 ships a set of default RBAC cluster roles that can be used to grant users access to VirtualMachineInstances.
The kubevirt.io:admin
and kubevirt.io:edit
 cluster roles include permissions for console, VNC, and port-forwarding (used for SSH) access. By binding either of these roles to a user, you give that user the ability to use virtctl to access the console, VNC and SSH.
The default KubeVirt cluster roles grant access to more than just the console, VNC and port-forwarding. The ClusterRole
 below demonstrates how to craft a custom role that only allows access to the console, VNC and port-forwarding.
apiVersion: rbac.authorization.k8s.io/v1\nkind: ClusterRole\nmetadata:\n  name: allow-console-vnc-port-forward-access\nrules:\n  - apiGroups:\n      - subresources.kubevirt.io\n    resources:\n      - virtualmachineinstances/console\n      - virtualmachineinstances/vnc\n    verbs:\n      - get\n  - apiGroups:\n      - subresources.kubevirt.io\n    resources:\n      - virtualmachineinstances/portforward\n    verbs:\n      - update\n
When bound with a ClusterRoleBinding
the ClusterRole
above grants access to virtual machines across all namespaces.
In order to reduce the scope to a single namespace, bind this ClusterRole
using a RoleBinding
that targets a single namespace.
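A namespaced binding might look like the following sketch. The binding name, the namespace `mynamespace`, and the user `alice` are hypothetical values chosen for illustration.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: console-vnc-port-forward-access   # hypothetical name
  namespace: mynamespace                  # access is limited to this namespace
subjects:
  - kind: User
    name: alice                           # hypothetical user
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: allow-console-vnc-port-forward-access
  apiGroup: rbac.authorization.k8s.io
```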
Using KubeVirt should be fairly natural if you are used to working with Kubernetes.
The primary way of using KubeVirt is by working with the KubeVirt kinds in the Kubernetes API:
$ kubectl create -f vmi.yaml\n$ kubectl wait --for=condition=Ready vmis/testvmi\n$ kubectl get vmis\n$ kubectl delete vmis testvmi\n
The following pages describe how to use and discover the API, manage, and access virtual machines.
"},{"location":"user_workloads/basic_use/#user-interface","title":"User Interface","text":"KubeVirt does not come with a UI; it only extends the Kubernetes API with virtualization functionality.
"},{"location":"user_workloads/boot_from_external_source/","title":"Booting From External Source","text":"When installing a new guest virtual machine OS, it is often useful to boot directly from a kernel and initrd stored in the host physical machine OS, allowing command line arguments to be passed directly to the installer.
Booting from an external source is supported in KubeVirt starting from version v0.42.0-rc.0. This makes it possible to define a Virtual Machine that uses a custom kernel / initrd binary, with optional custom arguments, during its boot process.
The binaries are provided through a container image. The container is pulled from the container registry and resides on the local node hosting the VMs.
"},{"location":"user_workloads/boot_from_external_source/#use-cases","title":"Use cases","text":"Some use cases for this may be:
- A kernel developer can conveniently launch VMs that boot from the latest kernel binary, which changes frequently.
- The initrd can be populated with files that need to reside in memory for the VM's entire life cycle.
"},{"location":"user_workloads/boot_from_external_source/#workflow","title":"Workflow","text":"Defining an external boot source can be done in the following way:
apiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n name: ext-kernel-boot-vm\nspec:\n runStrategy: Manual\n template:\n spec:\n domain:\n devices: {}\n firmware:\n kernelBoot:\n container:\n image: vmi_ext_boot/kernel_initrd_binaries_container:latest\n initrdPath: /boot/initramfs-virt\n kernelPath: /boot/vmlinuz-virt\n imagePullPolicy: Always\n imagePullSecret: my-pull-secret\n kernelArgs: console=ttyS0\n resources:\n requests:\n memory: 1Gi\n
Notes:
initrdPath
and kernelPath
define the path for the binaries inside the container.
Kernel and Initrd binaries must be owned by qemu
user & group.
To change ownership: chown qemu:qemu <binary>
where <binary>
is the binary file.
kernelArgs
can only be provided if a kernel binary is provided (i.e. kernelPath
is defined). These arguments will be passed to the kernel the VM boots from.
imagePullSecret
and imagePullPolicy
are optional.
If imagePullPolicy
is Always
and the container image is updated, the VM boots into the new kernel the next time it restarts.
All KubeVirt system-components expose Prometheus metrics at their /metrics
REST endpoint.
You can consult the complete and up-to-date metric list at kubevirt/monitoring.
"},{"location":"user_workloads/component_monitoring/#custom-service-discovery","title":"Custom Service Discovery","text":"Prometheus supports service discovery based on Pods and Endpoints out of the box. Both can be used to discover KubeVirt services.
All Pods which expose metrics are labeled with prometheus.kubevirt.io
and contain a port-definition which is called metrics
. In the KubeVirt release-manifests, the default metrics
port is 8443
.
The above labels and port information are collected by a Service
called kubevirt-prometheus-metrics
. Kubernetes automatically creates a corresponding Endpoints object
with the same name:
$ kubectl get endpoints -n kubevirt kubevirt-prometheus-metrics -o yaml\napiVersion: v1\nkind: Endpoints\nmetadata:\n labels:\n kubevirt.io: \"\"\n prometheus.kubevirt.io: \"\"\n name: kubevirt-prometheus-metrics\n namespace: kubevirt\nsubsets:\n- addresses:\n - ip: 10.244.0.5\n nodeName: node01\n targetRef:\n kind: Pod\n name: virt-handler-cjzg6\n namespace: kubevirt\n resourceVersion: \"4891\"\n uid: c67331f9-bfcf-11e8-bc54-525500d15501\n - ip: 10.244.0.6\n [...]\n ports:\n - name: metrics\n port: 8443\n protocol: TCP\n
By watching this endpoint for added and removed IPs to subsets.addresses
and appending the metrics
port from subsets.ports
, it is possible to always get a complete list of ready-to-be-scraped Prometheus targets.
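The target construction described above can be sketched in a few lines of Python. This is an illustrative example only, assuming the Endpoints object has already been fetched and parsed into a dictionary (for instance from the JSON output of kubectl); the function name scrape_targets is hypothetical.

```python
def scrape_targets(endpoints, port_name='metrics'):
    # Combine every address in subsets.addresses with the named
    # port from subsets.ports of the same subset.
    targets = []
    for subset in endpoints.get('subsets') or []:
        ports = [
            p['port']
            for p in subset.get('ports') or []
            if p.get('name') == port_name
        ]
        for address in subset.get('addresses') or []:
            for port in ports:
                targets.append('{}:{}'.format(address['ip'], port))
    return targets


if __name__ == '__main__':
    # Sample data shaped like the Endpoints object shown above.
    sample = {
        'subsets': [
            {
                'addresses': [{'ip': '10.244.0.5'}, {'ip': '10.244.0.6'}],
                'ports': [{'name': 'metrics', 'port': 8443, 'protocol': 'TCP'}],
            }
        ]
    }
    print(scrape_targets(sample))
```

Re-running this whenever the Endpoints object changes keeps the target list in step with the set of ready Pods.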
The prometheus-operator can make use of the kubevirt-prometheus-metrics
service to automatically create the appropriate Prometheus config.
KubeVirt's virt-operator
checks if the ServiceMonitor
custom resource exists when creating an install strategy for deployment. If it does, KubeVirt automatically creates a ServiceMonitor
resource in the monitorNamespace
, as well as an appropriate role and rolebinding in KubeVirt's namespace.
Three settings are exposed in the KubeVirt
custom resource to direct KubeVirt to create these resources correctly:
monitorNamespace
: The namespace that prometheus-operator runs in. Defaults to openshift-monitoring
.
monitorAccount
: The serviceAccount that prometheus-operator runs with. Defaults to prometheus-k8s
.
serviceMonitorNamespace
: The namespace that the serviceMonitor runs in. Defaults to monitorNamespace
Please note that if you set serviceMonitorNamespace
, this namespace must be included in the serviceMonitorNamespaceSelector
field of the Prometheus spec.
If the prometheus-operator for a given deployment uses these defaults, then these values can be omitted.
An example of the KubeVirt resource depicting these default values:
apiVersion: kubevirt.io/v1\nkind: KubeVirt\nmetadata:\n name: kubevirt\nspec:\n monitorNamespace: openshift-monitoring\n monitorAccount: prometheus-k8s\n
"},{"location":"user_workloads/component_monitoring/#integrating-with-the-okd-cluster-monitoring-operator","title":"Integrating with the OKD cluster-monitoring-operator","text":"After the cluster-monitoring-operator is up and running, KubeVirt will detect the existence of the ServiceMonitor
resource. Because the definition contains the openshift.io/cluster-monitoring
label, it will automatically be picked up by the cluster monitor.
The endpoints report metrics related to the runtime behaviour of the Virtual Machines. All the relevant metrics are prefixed with kubevirt_vmi
.
The metrics carry labels that link them to the VMI objects they refer to. At minimum, the labels will expose node
, name
and namespace
of the related VMI object.
For example, reported metrics could look like:
kubevirt_vmi_memory_resident_bytes{domain=\"default_vm-test-01\",name=\"vm-test-01\",namespace=\"default\",node=\"node01\"} 2.5595904e+07\nkubevirt_vmi_network_traffic_bytes_total{domain=\"default_vm-test-01\",interface=\"vnet0\",name=\"vm-test-01\",namespace=\"default\",node=\"node01\",type=\"rx\"} 8431\nkubevirt_vmi_network_traffic_bytes_total{domain=\"default_vm-test-01\",interface=\"vnet0\",name=\"vm-test-01\",namespace=\"default\",node=\"node01\",type=\"tx\"} 1835\nkubevirt_vmi_vcpu_seconds_total{domain=\"default_vm-test-01\",id=\"0\",name=\"vm-test-01\",namespace=\"default\",node=\"node01\",state=\"1\"} 19\n
Please note the domain
label in the above example. This label is deprecated and it will be removed in a future release. You should identify the VMI using the node
, namespace
, name
labels instead.
Use the following query to get a counter for all REST calls that indicate connection issues:
rest_client_requests_total{code=\"<error>\"}\n
If this counter is continuously increasing, it is an indicator that the corresponding KubeVirt component has general issues connecting to the apiserver.
"},{"location":"user_workloads/creating_vms/","title":"Creating VirtualMachines","text":"The virtctl sub command create vm
allows easy creation of VirtualMachine manifests from the command line. By default it uses instance types and preferences, inferring them where possible (see Specifying or inferring instance types and preferences), and provides several flags to control details of the created virtual machine.
For example there are flags to specify the name or run strategy of a virtual machine or flags to add volumes to a virtual machine. Instance types and preferences can either be specified directly or it is possible to let KubeVirt infer those from the volume used to boot the virtual machine.
For a full set of flags and their description use the following command:
virtctl create vm -h\n
"},{"location":"user_workloads/creating_vms/#creating-virtualmachines-on-a-cluster","title":"Creating VirtualMachines on a cluster","text":"The output of virtctl create vm
can be piped into kubectl
to directly create a VirtualMachine on a cluster, e.g.:
# Create a VM with name my-vm on the cluster\nvirtctl create vm --name my-vm | kubectl create -f -\nvirtualmachine.kubevirt.io/my-vm created\n
"},{"location":"user_workloads/creating_vms/#creating-instance-types","title":"Creating Instance Types","text":"The virtctl subcommand create instancetype
allows easy creation of an instance type manifest from the command line. The command also provides several flags that can be used to create your desired manifest.
There are two required flags that need to be specified: the number of vCPUs and the amount of memory to be requested. Additionally, there are several optional flags that can be used, such as specifying a list of GPUs for passthrough, choosing the desired IOThreadsPolicy, or simply providing the name of our instance type.
By default, the command creates the cluster-wide resource. To create the namespaced version instead, pass the --namespaced flag. The namespace name can be specified with the --namespace flag.
For a complete list of flags and their descriptions, use the following command:
virtctl create instancetype -h\n
"},{"location":"user_workloads/creating_vms/#examples","title":"Examples","text":"Create a manifest for a VirtualMachineClusterInstancetype with the required --cpu and --memory flags
virtctl create instancetype --cpu 2 --memory 256Mi\n
Create a manifest for a VirtualMachineInstancetype with a specified namespace
virtctl create instancetype --cpu 2 --memory 256Mi --namespace my-namespace\n
Create a manifest for a VirtualMachineInstancetype without a specified namespace name
virtctl create instancetype --cpu 2 --memory 256Mi --namespaced\n
"},{"location":"user_workloads/creating_vms/#creating-preferences","title":"Creating Preferences","text":"The virtctl subcommand create preference
allows easy creation of a preference manifest from the command line. This command serves as a starting point to create the basic structure of a manifest, as it does not allow specifying all of the options that are supported in preferences.
The current set of flags allows us, for example, to specify the preferred CPU topology, machine type or a storage class.
By default, the command creates the cluster-wide resource. To create the namespaced version instead, pass the --namespaced flag. The namespace name can be specified with the --namespace flag.
For a complete list of flags and their descriptions, use the following command:
virtctl create preference -h\n
"},{"location":"user_workloads/creating_vms/#examples_1","title":"Examples","text":"Create a manifest for a VirtualMachineClusterPreference with a preferred cpu topology
virtctl create preference --cpu-topology preferSockets\n
Create a manifest for a VirtualMachinePreference with a specified namespace
virtctl create preference --namespace my-namespace\n
Create a manifest for a VirtualMachinePreference with the preferred storage class
virtctl create preference --namespaced --volume-storage-class my-storage\n
"},{"location":"user_workloads/creating_vms/#specifying-or-inferring-instance-types-and-preferences","title":"Specifying or inferring instance types and preferences","text":"Instance types and preferences can be specified with the appropriate flags, e.g.:
virtctl create vm --instancetype my-instancetype --preference my-preference\n
The type of the instance type or preference (namespaced or cluster scope) can be controlled by prefixing the instance type or preference name with the corresponding CRD name, e.g.:
# Using a cluster scoped instancetype and a namespaced preference\nvirtctl create vm \\\n --instancetype virtualmachineclusterinstancetype/my-instancetype \\\n --preference virtualmachinepreference/my-preference\n
If no prefix is supplied, the cluster-scoped resources are used by default.
To explicitly infer instance types and/or preferences from the volume used to boot the virtual machine add the following flags:
virtctl create vm --infer-instancetype --infer-preference\n
The implicit default is to always try inferring an instancetype and preference from the boot volume. This feature makes use of the IgnoreInferFromVolumeFailure
policy, which suppresses failures on inference of instancetypes and preferences. If one of the above switches was provided explicitly, then the RejectInferFromVolumeFailure
policy is used instead. This way users are made aware of potential issues during the virtual machine creation.
Please note that volumes of different kinds currently have the following fixed boot order regardless of the order their flags were specified on the command line:
If multiple volumes of the same kind were specified their order is determined by the order in which their flags were specified.
"},{"location":"user_workloads/creating_vms/#specifying-cloud-init-user-data","title":"Specifying cloud-init user data","text":"To pass cloud-init user data to virtctl, it needs to be encoded into a base64 string. Here is an example of how to do it:
# Put your cloud-init user data into a file.\n# This will add an authorized key to the default user.\n# To get the default username read the documentation for the cloud image\n$ cat cloud-init.txt\n#cloud-config\nssh_authorized_keys:\n - ssh-rsa AAAA...\n\n# Base64 encode the contents of the file without line wraps and store it in a variable\n$ CLOUD_INIT_USERDATA=$(base64 -w 0 cloud-init.txt)\n\n# Show the contents of the variable\n$ echo $CLOUD_INIT_USERDATA\nI2Nsb3VkLWNvbmZpZwpzc2hfYXV0aG9yaXplZF9rZXlzOgogIC0gc3NoLXJzYSBBQUFBLi4uCg==\n
You can now use this variable as an argument to the --cloud-init-user-data
flag:
virtctl create vm --cloud-init-user-data $CLOUD_INIT_USERDATA\n
"},{"location":"user_workloads/creating_vms/#examples_2","title":"Examples","text":"Create a manifest for a VirtualMachine with a random name:
virtctl create vm\n
Create a manifest for a VirtualMachine with a specified name and RunStrategy Always
virtctl create vm --name=my-vm --run-strategy=Always\n
Create a manifest for a VirtualMachine with a specified VirtualMachineClusterInstancetype
virtctl create vm --instancetype=my-instancetype\n
Create a manifest for a VirtualMachine with a specified VirtualMachineInstancetype (namespaced)
virtctl create vm --instancetype=virtualmachineinstancetype/my-instancetype\n
Create a manifest for a VirtualMachine with a specified VirtualMachineClusterPreference
virtctl create vm --preference=my-preference\n
Create a manifest for a VirtualMachine with a specified VirtualMachinePreference (namespaced)
virtctl create vm --preference=virtualmachinepreference/my-preference\n
Create a manifest for a VirtualMachine with an ephemeral containerdisk volume
virtctl create vm --volume-containerdisk=src:my.registry/my-image:my-tag\n
Create a manifest for a VirtualMachine with a cloned DataSource in namespace and specified size
virtctl create vm --volume-datasource=src:my-ns/my-ds,size:50Gi\n
Create a manifest for a VirtualMachine with a cloned DataSource and inferred instancetype and preference
virtctl create vm --volume-datasource=src:my-annotated-ds --infer-instancetype --infer-preference\n
Create a manifest for a VirtualMachine with a cloned PVC
virtctl create vm --volume-clone-pvc=my-ns/my-pvc\n
Create a manifest for a VirtualMachine with a directly used PVC
virtctl create vm --volume-pvc=my-pvc\n
Create a manifest for a VirtualMachine with a cloned DataSource and a blank volume
virtctl create vm --volume-datasource=src:my-ns/my-ds --volume-blank=size:50Gi\n
Create a manifest for a VirtualMachine with a specified VirtualMachineCluster{Instancetype,Preference} and cloned DataSource
virtctl create vm --instancetype=my-instancetype --preference=my-preference --volume-datasource=src:my-ds\n
Create a manifest for a VirtualMachine with a specified VirtualMachineCluster{Instancetype,Preference} and two cloned DataSources (flag can be provided multiple times)
virtctl create vm --instancetype=my-instancetype --preference=my-preference --volume-datasource=src:my-ds1 --volume-datasource=src:my-ds2\n
Create a manifest for a VirtualMachine with a specified VirtualMachineCluster{Instancetype,Preference} and directly used PVC
virtctl create vm --instancetype=my-instancetype --preference=my-preference --volume-pvc=my-pvc\n
"},{"location":"user_workloads/deploy_common_instancetypes/","title":"Deploy common-instancetypes","text":"The kubevirt/common-instancetypes
project provides a set of instancetypes and preferences to help create KubeVirt VirtualMachines
.
Beginning with the 1.1 release of KubeVirt, cluster-wide resources can be deployed directly through KubeVirt, without another operator. This allows deployment of a set of default instancetypes and preferences alongside KubeVirt.
"},{"location":"user_workloads/deploy_common_instancetypes/#enable-automatic-deployment-of-common-instancetypes","title":"Enable automatic deployment of common-instancetypes","text":"To enable the deployment of cluster-wide common-instancetypes through the KubeVirt virt-operator
, the CommonInstancetypesDeploymentGate
feature gate needs to be enabled.
See Activating feature gates on how to enable it.
"},{"location":"user_workloads/deploy_common_instancetypes/#deploy-common-instancetypes-manually","title":"Deploy common-instancetypes manually","text":"For customization purposes or to install namespaced resources, common-instancetypes can also be deployed by hand.
To install all resources provided by the kubevirt/common-instancetypes
project without further customizations, simply apply with kustomize
enabled (-k flag):
$ kubectl apply -k https://github.com/kubevirt/common-instancetypes.git\n
Alternatively, targets for each of the available custom resource types (e.g. namespaced instancetypes) are available.
For example, to deploy VirtualMachineInstancetypes
run the following command:
$ kubectl apply -k https://github.com/kubevirt/common-instancetypes.git/VirtualMachineInstancetypes\n
"},{"location":"user_workloads/guest_agent_information/","title":"Guest Agent information","text":"Guest Agent (GA) is an optional component that can run inside of Virtual Machines. The GA provides plenty of additional runtime information about the running operating system (OS). More technical detail about available GA commands is available here.
"},{"location":"user_workloads/guest_agent_information/#guest-agent-info-in-virtual-machine-status","title":"Guest Agent info in Virtual Machine status","text":"GA presence in the Virtual Machine is signaled with a condition in the VirtualMachineInstance status. The condition indicates that the GA is connected and can be used.
GA condition on VirtualMachineInstance
status:\n conditions:\n - lastProbeTime: \"2020-02-28T10:22:59Z\"\n lastTransitionTime: null\n status: \"True\"\n type: AgentConnected\n
When the GA is connected, additional OS information is shown in the status. This information comprises:
Below is an example of the information shown in the VirtualMachineInstance status.
GA info merged into status
status:\n guestOSInfo:\n id: fedora\n kernelRelease: 4.18.16-300.fc29.x86_64\n kernelVersion: '#1 SMP Sat Oct 20 23:24:08 UTC 2018'\n name: Fedora\n prettyName: Fedora 29 (Cloud Edition)\n version: \"29\"\n versionId: \"29\"\n interfaces:\n - infoSource: domain, guest-agent\n interfaceName: eth0\n ipAddress: 10.244.0.23/24\n ipAddresses:\n - 10.244.0.23/24\n - fe80::858:aff:fef4:17/64\n mac: 0a:58:0a:f4:00:17\n name: default\n
When the Guest Agent is not present in the Virtual Machine, the Guest Agent information is not shown. No error is reported because the Guest Agent is an optional component.
The infoSource field indicates where the info is gathered from. Valid values:
The data shown in the VirtualMachineInstance status are a subset of the information available. The rest of the data is available via the REST API exposed in the Kubernetes kube-api
server.
There are three new subresources added to the VirtualMachineInstance object:
- guestosinfo\n- userlist\n- filesystemlist\n
All GA data is returned via the guestosinfo
subresource, available at the following API endpoint:
/apis/subresources.kubevirt.io/v1/namespaces/{namespace}/virtualmachineinstances/{name}/guestosinfo\n
GuestOSInfo sample data:
{\n \"fsInfo\": {\n \"disks\": [\n {\n \"diskName\": \"vda1\",\n \"fileSystemType\": \"ext4\",\n \"mountPoint\": \"/\",\n \"totalBytes\": 0,\n \"usedBytes\": 0\n }\n ]\n },\n \"guestAgentVersion\": \"2.11.2\",\n \"hostname\": \"testvmi6m5krnhdlggc9mxfsrnhlxqckgv5kqrwcwpgr5mdpv76grrk\",\n \"metadata\": {\n \"creationTimestamp\": null\n },\n \"os\": {\n \"id\": \"fedora\",\n \"kernelRelease\": \"4.18.16-300.fc29.x86_64\",\n \"kernelVersion\": \"#1 SMP Sat Oct 20 23:24:08 UTC 2018\",\n \"machine\": \"x86_64\",\n \"name\": \"Fedora\",\n \"prettyName\": \"Fedora 29 (Cloud Edition)\",\n \"version\": \"29 (Cloud Edition)\",\n \"versionId\": \"29\"\n },\n \"timezone\": \"UTC, 0\"\n}\n
Items FSInfo and UserList are capped to the max capacity of 10 items, as a precaution for VMs with thousands of users.
The full list of filesystems is available through the subresource filesystemlist
, which is exposed at the following endpoint:
/apis/subresources.kubevirt.io/v1/namespaces/{namespace}/virtualmachineinstances/{name}/filesystemlist\n
Filesystem sample data:
{\n \"items\": [\n {\n \"diskName\": \"vda1\",\n \"fileSystemType\": \"ext4\",\n \"mountPoint\": \"/\",\n \"totalBytes\": 3927900160,\n \"usedBytes\": 1029201920\n }\n ],\n \"metadata\": {}\n}\n
The full list of users is available through the subresource userlist
, which is exposed at the following endpoint:
/apis/subresources.kubevirt.io/v1/namespaces/{namespace}/virtualmachineinstances/{name}/userlist\n
Userlist sample data:
{\n \"items\": [\n {\n \"loginTime\": 1580467675.876078,\n \"userName\": \"fedora\"\n }\n ],\n \"metadata\": {}\n}\n
User LoginTime is in fractional seconds since epoch time. It is left for the consumer to convert to the desired format.
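As an illustration of that conversion, the sketch below turns a loginTime value into a timezone-aware datetime. It assumes the value is seconds since the Unix epoch (UTC); the helper name login_time_to_datetime is hypothetical.

```python
from datetime import datetime, timezone


def login_time_to_datetime(login_time):
    # Interpret the fractional seconds as an offset from the
    # Unix epoch and return an aware datetime in UTC.
    return datetime.fromtimestamp(login_time, tz=timezone.utc)


if __name__ == '__main__':
    # Value taken from the userlist sample data above.
    print(login_time_to_datetime(1580467675.876078).isoformat())
```

For the sample value this yields a timestamp on 2020-01-31 in UTC, which can then be formatted as needed with strftime.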
"},{"location":"user_workloads/guest_operating_system_information/","title":"Guest Operating System Information","text":"Guest operating system identity for the VirtualMachineInstance will be provided by the label kubevirt.io/os
:
metadata:\n name: myvmi\n labels:\n kubevirt.io/os: win2k12r2\n
The kubevirt.io/os
label is based on the short OS identifier from libosinfo database. The following Short IDs are currently supported:
win2k12r2: Microsoft Windows Server 2012 R2, version 6.3, family winnt, https://www.microsoft.com/en-us/evalcenter/evaluate-windows-server-2012-r2
"},{"location":"user_workloads/guest_operating_system_information/#use-with-presets","title":"Use with presets","text":"A VirtualMachineInstancePreset representing an operating system with a kubevirt.io/os
label can be applied to any VirtualMachineInstance that carries a matching kubevirt.io/os
label.
Default presets for the OS identifiers above are included in the current release.
"},{"location":"user_workloads/guest_operating_system_information/#windows-server-2012r2-virtualmachineinstancepreset-example","title":"Windows Server 2012R2 VirtualMachineInstancePreset
Example","text":"apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstancePreset\nmetadata:\n name: windows-server-2012r2\n selector:\n matchLabels:\n kubevirt.io/os: win2k12r2\nspec:\n domain:\n cpu:\n cores: 2\n resources:\n requests:\n memory: 2G\n features:\n acpi: {}\n apic: {}\n hyperv:\n relaxed: {}\n vapic: {}\n spinlocks:\n spinlocks: 8191\n clock:\n utc: {}\n timer:\n hpet:\n present: false\n pit:\n tickPolicy: delay\n rtc:\n tickPolicy: catchup\n hyperv: {}\n---\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n labels:\n kubevirt.io/os: win2k12r2\n name: windows2012r2\nspec:\n terminationGracePeriodSeconds: 0\n domain:\n firmware:\n uuid: 5d307ca9-b3ef-428c-8861-06e72d69f223\n devices:\n disks:\n - name: server2012r2\n disk:\n dev: vda\n volumes:\n - name: server2012r2\n persistentVolumeClaim:\n claimName: my-windows-image\n\nOnce the `VirtualMachineInstancePreset` is applied to the\n`VirtualMachineInstance`, the resulting resource would look like this:\n\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n annotations:\n presets.virtualmachineinstances.kubevirt.io/presets-applied: kubevirt.io/v1\n virtualmachineinstancepreset.kubevirt.io/windows-server-2012r2: kubevirt.io/v1\n labels:\n kubevirt.io/os: win2k12r2\n name: windows2012r2\nspec:\n terminationGracePeriodSeconds: 0\n domain:\n cpu:\n cores: 2\n resources:\n requests:\n memory: 2G\n features:\n acpi: {}\n apic: {}\n hyperv:\n relaxed: {}\n vapic: {}\n spinlocks:\n spinlocks: 8191\n clock:\n utc: {}\n timer:\n hpet:\n present: false\n pit:\n tickPolicy: delay\n rtc:\n tickPolicy: catchup\n hyperv: {}\n firmware:\n uuid: 5d307ca9-b3ef-428c-8861-06e72d69f223\n devices:\n disks:\n - name: server2012r2\n disk:\n dev: vda\n volumes:\n - name: server2012r2\n persistentVolumeClaim:\n claimName: my-windows-image\n
For more information see VirtualMachineInstancePresets
"},{"location":"user_workloads/guest_operating_system_information/#hyperv-optimizations","title":"HyperV optimizations","text":"KubeVirt supports many so-called \"HyperV enlightenments\", which are optimizations for Windows guests. Some of these optimizations may require up-to-date host kernel support to work properly, or to deliver the maximum performance gains.
KubeVirt can perform extra checks on the hosts before running Hyper-V enabled VMs, to make sure the host has no known issues with Hyper-V support and properly exposes all the required features, so that optimal performance can be expected. These checks are disabled by default for backward compatibility and because they depend on the node-feature-discovery and on extra configuration.
To enable strict host checking, the user may expand the featureGates
field in the KubeVirt CR by adding the HypervStrictCheck
to it.
apiVersion: kubevirt.io/v1\nkind: KubeVirt\nmetadata:\n name: kubevirt\n namespace: kubevirt\nspec:\n ...\n configuration:\n developerConfiguration:\n featureGates:\n - \"HypervStrictCheck\"\n
Alternatively, users can edit an existing kubevirt CR:
kubectl edit kubevirt kubevirt -n kubevirt
...\nspec:\n configuration:\n developerConfiguration:\n featureGates:\n - \"HypervStrictCheck\"\n - \"CPUManager\"\n
"},{"location":"user_workloads/hook-sidecar/","title":"Hook Sidecar Container","text":""},{"location":"user_workloads/hook-sidecar/#introduction","title":"Introduction","text":"In KubeVirt, a Hook Sidecar container is a sidecar container (a secondary container that runs along with the main application container within the same Pod) used to apply customizations before the Virtual Machine is initialized. This ability is provided since configurable elements in the VMI specification do not cover all of the libvirt domain XML elements.
The sidecar containers communicate with the main container over a socket with a gRPC protocol. There are two main sidecar hooks:
onDefineDomain
: This hook helps to customize libvirt's XML and return the new XML over gRPC for the VM creation.
preCloudInitIso
: This hook helps to customize the cloud-init configuration. It operates on and returns JSON formatted cloud-init data.
Sidecar
feature gate","text":"Sidecar
feature gate can be enabled by following the steps mentioned in Activating feature gates.
In case of a development cluster created using kubevirtci, follow the steps mentioned in the developer doc to enable the feature gate.
"},{"location":"user_workloads/hook-sidecar/#sidecar-shim-container-image","title":"Sidecar-shim container image","text":"To run a VM with custom modifications, the sidecar-shim-image takes care of implementing the communication with the main container.
The image contains the sidecar-shim
binary built using sidecar_shim.go
which should be kept as the entrypoint of the container. This binary will search in $PATH
for binaries named after the hook names (e.g. onDefineDomain
and preCloudInitIso
) and run them. Users must provide the necessary arguments as command line options (flags).
In the case of onDefineDomain
, the arguments are the VMI information as a JSON string (e.g. --vmi vmiJSON
) and the current domain XML (e.g. --domain domainXML
). It outputs the modified domain XML on the standard output.
In the case of preCloudInitIso
, the arguments are the VMI information as a JSON string (e.g. --vmi vmiJSON
) and the CloudInitData (e.g. --cloud-init cloudInitJSON
). It outputs the modified CloudInitData (as JSON) on the standard output.
Shell or python scripts can be used as alternatives to the binary, by making them available at the expected location (/usr/bin/onDefineDomain
or /usr/bin/preCloudInitIso
depending upon the hook).
A prebuilt image named sidecar-shim
capable of running Shell or Python scripts is shipped as part of KubeVirt releases.
Although a binary doesn't strictly need to be generated from Go code, and a script doesn't strictly need to be one among Shell or Python, for the purpose of this guide, we will use those as examples.
"},{"location":"user_workloads/hook-sidecar/#go-binary","title":"Go binary","text":"Example Go code modifying the SMBIOS system information can be found in the KubeVirt repo. The binary generated from this code, when available under /usr/bin/onDefineDomain
in the sidecar-shim-image, is run right before VMI creation and the baseboard manufacturer value is modified to reflect what's provided in the smbios.vm.kubevirt.io/baseBoardManufacturer
annotation in VMI spec.
If you prefer writing a shell or python script instead of a Go program, create a Kubernetes ConfigMap and use annotations to make sure the script is run before the VMI creation. The flow would be as below:
hooks.kubevirt.io/hookSidecars
and mention the ConfigMap information in it.
apiVersion: v1\nkind: ConfigMap\nmetadata:\n name: my-config-map\ndata:\n my_script.sh: |\n #!/bin/sh\n tempFile=`mktemp --dry-run`\n echo $4 > $tempFile\n sed -i \"s|<baseBoard></baseBoard>|<baseBoard><entry name='manufacturer'>Radical Edward</entry></baseBoard>|\" $tempFile\n cat $tempFile\n
"},{"location":"user_workloads/hook-sidecar/#configmap-with-python-script","title":"ConfigMap with python script","text":"apiVersion: v1\nkind: ConfigMap\nmetadata:\n name: my-config-map\ndata:\n my_script.sh: |\n #!/usr/bin/env python3\n\n import xml.etree.ElementTree as ET\n import sys\n\n def main(s):\n # write to a temporary file\n f = open(\"/tmp/orig.xml\", \"w\")\n f.write(s)\n f.close()\n\n # parse xml from file\n xml = ET.parse(\"/tmp/orig.xml\")\n # get the root element\n root = xml.getroot()\n # find the baseBoard element\n baseBoard = root.find(\"sysinfo\").find(\"baseBoard\")\n\n # prepare new element to be inserted into the xml definition\n element = ET.Element(\"entry\", {\"name\": \"manufacturer\"})\n element.text = \"Radical Edward\"\n # insert the element\n baseBoard.insert(0, element)\n\n # write to a new file\n xml.write(\"/tmp/new.xml\")\n # print file contents to stdout\n f = open(\"/tmp/new.xml\")\n print(f.read())\n f.close()\n\n if __name__ == \"__main__\":\n main(sys.argv[4])\n
After creating one of the above ConfigMaps, create the VMI using the manifest in this example. Of importance here is the ConfigMap information stored in the annotations:
annotations:\n hooks.kubevirt.io/hookSidecars: >\n [\n {\n \"args\": [\"--version\", \"v1alpha2\"],\n \"configMap\": {\"name\": \"my-config-map\", \"key\": \"my_script.sh\", \"hookPath\": \"/usr/bin/onDefineDomain\"}\n }\n ]\n
The name
field indicates the name of the ConfigMap on the cluster which contains the script you want to execute. The key
field indicates the key in the ConfigMap which contains the script to be executed. Finally, hookPath
indicates the path where you want the script to be mounted. It could be either of /usr/bin/onDefineDomain
or /usr/bin/preCloudInitIso
depending upon the hook you want to execute. An optional value can be specified with the \"image\"
key if a custom image is needed; if omitted, the default Sidecar-shim image built together with the other KubeVirt images will be used. Unless overridden with a custom value, the default Sidecar-shim image is also updated along with the other images, as described in Updating KubeVirt Workloads.
Whether you used the Go binary or a Shell/Python script from the above examples, the newly created VMI should have the modified baseboard manufacturer information. After creating the VMI, verify that it is in the Running
state, then connect to its console and check whether the desired changes to the baseboard manufacturer are reflected:
# Once the VM is ready, connect to its display and login using name and password \"fedora\"\ncluster/virtctl.sh vnc vmi-with-sidecar-hook-configmap\n\n# Check whether the base board manufacturer value was successfully overwritten\nsudo dmidecode -s baseboard-manufacturer\n
"},{"location":"user_workloads/instancetypes/","title":"Instance types and preferences","text":"FEATURE STATE:
instancetype.kubevirt.io/v1alpha1 (Experimental) as of the v0.56.0 KubeVirt release
instancetype.kubevirt.io/v1alpha2 (Experimental) as of the v0.58.0 KubeVirt release
instancetype.kubevirt.io/v1beta1 as of the v1.0.0 KubeVirt release
See the Version History section for more details.
"},{"location":"user_workloads/instancetypes/#introduction","title":"Introduction","text":"KubeVirt's VirtualMachine
API contains many advanced options for tuning the performance of a VM that go beyond what typical users need to be aware of. Users have previously been unable to simply define the storage/network they want assigned to their VM and then declare in broad terms what quality of resources and kind of performance characteristics they need for their VM.
Instance types and preferences provide a way to define a set of resource, performance and other runtime characteristics, allowing users to reuse these definitions across multiple VirtualMachines
.
---\napiVersion: instancetype.kubevirt.io/v1beta1\nkind: VirtualMachineInstancetype\nmetadata:\n name: example-instancetype\nspec:\n cpu:\n guest: 1\n memory:\n guest: 128Mi\n
KubeVirt provides two CRDs
for instance types, a cluster wide VirtualMachineClusterInstancetype
and a namespaced VirtualMachineInstancetype
. These CRDs
encapsulate the following resource related characteristics of a VirtualMachine
through a shared VirtualMachineInstancetypeSpec
:
CPU: Required number of vCPUs presented to the guest
Memory: Required amount of memory presented to the guest
GPUs: Optional list of vGPUs to passthrough
HostDevices: Optional list of HostDevices to passthrough
IOThreadsPolicy: Optional IOThreadsPolicy to be used
LaunchSecurity: Optional LaunchSecurity to be used
Anything provided within an instance type cannot be overridden within the VirtualMachine
. For example, as CPU
and Memory
are both required attributes of an instance type, if a user makes any requests for CPU
or Memory
resources within the underlying VirtualMachine
, the instance type will conflict and the request will be rejected during creation.
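As a hedged illustration of the conflict described above, the following sketch shows a VirtualMachine that references the example-instancetype instance type while also setting guest memory itself. The name example-vm-conflict and the 256Mi value are hypothetical, chosen only for this sketch; under the rule above, a request like this would be expected to be rejected at creation time.

```yaml
---
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  # Hypothetical name, used only for this illustration
  name: example-vm-conflict
spec:
  instancetype:
    kind: VirtualMachineInstancetype
    name: example-instancetype
  running: false
  template:
    spec:
      domain:
        # Memory is already provided by the instance type (guest: 128Mi),
        # so setting it here as well is expected to conflict.
        memory:
          guest: 256Mi
        devices: {}
```

Removing the memory stanza from the VirtualMachine, and leaving it to the instance type, should avoid the conflict.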
---\napiVersion: instancetype.kubevirt.io/v1beta1\nkind: VirtualMachinePreference\nmetadata:\n name: example-preference\nspec:\n devices:\n preferredDiskBus: virtio\n preferredInterfaceModel: virtio\n
KubeVirt also provides two further preference based CRDs
, again a cluster wide VirtualMachineClusterPreference
and namespaced VirtualMachinePreference
. These CRDs
encapsulate the preferred value of any remaining attributes of a VirtualMachine
required to run a given workload, again through a shared VirtualMachinePreferenceSpec
.
Unlike instance types, preferences only represent the preferred values and as such, they can be overridden by values in the VirtualMachine
provided by the user.
In the example shown below, a user has provided a VirtualMachine
with a disk bus already defined within a DiskTarget
and has also selected a set of preferences with DevicePreference
and preferredDiskBus
, so the user's original choice within the VirtualMachine
and DiskTarget
is used:
$ kubectl apply -f - << EOF\n---\napiVersion: instancetype.kubevirt.io/v1beta1\nkind: VirtualMachinePreference\nmetadata:\n name: example-preference-disk-virtio\nspec:\n devices:\n preferredDiskBus: virtio\n---\napiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n name: example-preference-user-override\nspec:\n preference:\n kind: VirtualMachinePreference\n name: example-preference-disk-virtio\n running: false\n template:\n spec:\n domain:\n memory:\n guest: 128Mi\n devices:\n disks:\n - disk:\n bus: sata\n name: containerdisk\n - disk: {}\n name: cloudinitdisk\n resources: {}\n terminationGracePeriodSeconds: 0\n volumes:\n - containerDisk:\n image: registry:5000/kubevirt/cirros-container-disk-demo:devel\n name: containerdisk\n - cloudInitNoCloud:\n userData: |\n #!/bin/sh\n\n echo 'printed from cloud-init userdata'\n name: cloudinitdisk\nEOF\nvirtualmachinepreference.instancetype.kubevirt.io/example-preference-disk-virtio created\nvirtualmachine.kubevirt.io/example-preference-user-override configured\n\n\n$ virtctl start example-preference-user-override\nVM example-preference-user-override was scheduled to start\n\n# We can see the original request from the user within the VirtualMachine lists `containerdisk` with a `SATA` bus\n$ kubectl get vms/example-preference-user-override -o json | jq .spec.template.spec.domain.devices.disks\n[\n {\n \"disk\": {\n \"bus\": \"sata\"\n },\n \"name\": \"containerdisk\"\n },\n {\n \"disk\": {},\n \"name\": \"cloudinitdisk\"\n }\n]\n\n# This is still the case in the VirtualMachineInstance with the remaining disk using the `preferredDiskBus` from the preference of `virtio`\n$ kubectl get vmis/example-preference-user-override -o json | jq .spec.domain.devices.disks\n[\n {\n \"disk\": {\n \"bus\": \"sata\"\n },\n \"name\": \"containerdisk\"\n },\n {\n \"disk\": {\n \"bus\": \"virtio\"\n },\n \"name\": \"cloudinitdisk\"\n }\n]\n
"},{"location":"user_workloads/instancetypes/#virtualmachine","title":"VirtualMachine","text":"---\napiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n name: example-vm\nspec:\n instancetype:\n kind: VirtualMachineInstancetype\n name: example-instancetype\n preference:\n kind: VirtualMachinePreference\n name: example-preference\n
The previous instance type and preference CRDs are matched to a given VirtualMachine
through the use of a matcher. Each matcher consists of the following:
Name (string): Name of the resource being referenced
Kind (string): Optional, defaults to the cluster wide CRD kinds of VirtualMachineClusterInstancetype or VirtualMachineClusterPreference if not provided
RevisionName (string): Optional, name of a ControllerRevision containing a copy of the VirtualMachineInstancetypeSpec or VirtualMachinePreferenceSpec taken when the VirtualMachine is first created. See the Versioning section below for more details on how and why this is captured.
InferFromVolume (string): Optional, see the Inferring defaults from a Volume section below for more details.
It is possible to streamline the creation of instance types, preferences, and virtual machines with the usage of the virtctl command-line tool. To read more about it, please see the Creating VirtualMachines.
"},{"location":"user_workloads/instancetypes/#versioning","title":"Versioning","text":"Versioning of these resources is required to ensure the eventual VirtualMachineInstance
created when starting a VirtualMachine
does not change between restarts if any referenced instance type or set of preferences are updated during the lifetime of the VirtualMachine
.
This is currently achieved by using ControllerRevision
to retain a copy of the VirtualMachineInstancetype
or VirtualMachinePreference
at the time the VirtualMachine
is created. References to these ControllerRevisions
are then retained in the InstancetypeMatcher
and PreferenceMatcher
within the VirtualMachine
for future use.
$ kubectl apply -f examples/csmall.yaml -f examples/vm-cirros-csmall.yaml\nvirtualmachineinstancetype.instancetype.kubevirt.io/csmall created\nvirtualmachine.kubevirt.io/vm-cirros-csmall created\n\n$ kubectl get vm/vm-cirros-csmall -o json | jq .spec.instancetype\n{\n \"kind\": \"VirtualMachineInstancetype\",\n \"name\": \"csmall\",\n \"revisionName\": \"vm-cirros-csmall-csmall-72c3a35b-6e18-487d-bebf-f73c7d4f4a40-1\"\n}\n\n$ kubectl get controllerrevision/vm-cirros-csmall-csmall-72c3a35b-6e18-487d-bebf-f73c7d4f4a40-1 -o json | jq .\n{\n \"apiVersion\": \"apps/v1\",\n \"data\": {\n \"apiVersion\": \"instancetype.kubevirt.io/v1beta1\",\n \"kind\": \"VirtualMachineInstancetype\",\n \"metadata\": {\n \"creationTimestamp\": \"2022-09-30T12:20:19Z\",\n \"generation\": 1,\n \"name\": \"csmall\",\n \"namespace\": \"default\",\n \"resourceVersion\": \"10303\",\n \"uid\": \"72c3a35b-6e18-487d-bebf-f73c7d4f4a40\"\n },\n \"spec\": {\n \"cpu\": {\n \"guest\": 1\n },\n \"memory\": {\n \"guest\": \"128Mi\"\n }\n }\n },\n \"kind\": \"ControllerRevision\",\n \"metadata\": {\n \"creationTimestamp\": \"2022-09-30T12:20:19Z\",\n \"name\": \"vm-cirros-csmall-csmall-72c3a35b-6e18-487d-bebf-f73c7d4f4a40-1\",\n \"namespace\": \"default\",\n \"ownerReferences\": [\n {\n \"apiVersion\": \"kubevirt.io/v1\",\n \"blockOwnerDeletion\": true,\n \"controller\": true,\n \"kind\": \"VirtualMachine\",\n \"name\": \"vm-cirros-csmall\",\n \"uid\": \"5216527a-1d31-4637-ad3a-b640cb9949a2\"\n }\n ],\n \"resourceVersion\": \"10307\",\n \"uid\": \"a7bc784b-4cea-45d7-8432-15418e1dd7d3\"\n },\n \"revision\": 0\n}\n\n\n$ kubectl delete vm/vm-cirros-csmall\nvirtualmachine.kubevirt.io \"vm-cirros-csmall\" deleted\n\n$ kubectl get controllerrevision/vm-cirros-csmall-csmall-72c3a35b-6e18-487d-bebf-f73c7d4f4a40-1\nError from server (NotFound): controllerrevisions.apps \"vm-cirros-csmall-csmall-72c3a35b-6e18-487d-bebf-f73c7d4f4a40-1\" not found\n
Users can opt in to moving to a newer generation of an instance type or preference by removing the referenced revisionName
from the appropriate matcher within the VirtualMachine
object. This will result in fresh ControllerRevisions
being captured and used.
The following example creates a VirtualMachine
using an initial version of the csmall instance type before increasing the number of vCPUs provided by the instance type:
$ kubectl apply -f examples/csmall.yaml -f examples/vm-cirros-csmall.yaml\nvirtualmachineinstancetype.instancetype.kubevirt.io/csmall created\nvirtualmachine.kubevirt.io/vm-cirros-csmall created\n\n$ kubectl get vm/vm-cirros-csmall -o json | jq .spec.instancetype\n{\n \"kind\": \"VirtualMachineInstancetype\",\n \"name\": \"csmall\",\n \"revisionName\": \"vm-cirros-csmall-csmall-3e86e367-9cd7-4426-9507-b14c27a08671-1\"\n}\n\n$ virtctl start vm-cirros-csmall\nVM vm-cirros-csmall was scheduled to start\n\n$ kubectl get vmi/vm-cirros-csmall -o json | jq .spec.domain.cpu\n{\n \"cores\": 1,\n \"model\": \"host-model\",\n \"sockets\": 1,\n \"threads\": 1\n}\n\n$ kubectl patch VirtualMachineInstancetype/csmall --type merge -p '{\"spec\":{\"cpu\":{\"guest\":2}}}'\nvirtualmachineinstancetype.instancetype.kubevirt.io/csmall patched\n
In order for this change to be picked up within the VirtualMachine
, we need to stop the running VirtualMachine
and clear the revisionName
referenced by the InstancetypeMatcher
:
$ virtctl stop vm-cirros-csmall\nVM vm-cirros-csmall was scheduled to stop\n\n$ kubectl patch vm/vm-cirros-csmall --type merge -p '{\"spec\":{\"instancetype\":{\"revisionName\":\"\"}}}'\nvirtualmachine.kubevirt.io/vm-cirros-csmall patched\n\n$ kubectl get vm/vm-cirros-csmall -o json | jq .spec.instancetype\n{\n \"kind\": \"VirtualMachineInstancetype\",\n \"name\": \"csmall\",\n \"revisionName\": \"vm-cirros-csmall-csmall-3e86e367-9cd7-4426-9507-b14c27a08671-2\"\n}\n
As you can see above, the InstancetypeMatcher
now references a new ControllerRevision
containing generation 2 of the instance type. We can now start the VirtualMachine
again and see the new number of vCPUs being used by the VirtualMachineInstance
:
$ virtctl start vm-cirros-csmall\nVM vm-cirros-csmall was scheduled to start\n\n$ kubectl get vmi/vm-cirros-csmall -o json | jq .spec.domain.cpu\n{\n \"cores\": 1,\n \"model\": \"host-model\",\n \"sockets\": 2,\n \"threads\": 1\n}\n
"},{"location":"user_workloads/instancetypes/#inferfromvolume","title":"inferFromVolume","text":"The inferFromVolume
attribute of both the InstancetypeMatcher
and PreferenceMatcher
allows a user to request that defaults are inferred from a volume. When requested, KubeVirt will look for the following labels on the underlying PVC
, DataSource
or DataVolume
to determine the default name and kind:
instancetype.kubevirt.io/default-instancetype
instancetype.kubevirt.io/default-instancetype-kind (optional, defaults to VirtualMachineClusterInstancetype)
instancetype.kubevirt.io/default-preference
instancetype.kubevirt.io/default-preference-kind (optional, defaults to VirtualMachineClusterPreference)
These values are then written into the appropriate matcher by the mutation webhook and used during validation before the VirtualMachine
is formally accepted.
The validation can be controlled by the value provided to inferFromVolumeFailurePolicy
in either the InstancetypeMatcher
or PreferenceMatcher
of a VirtualMachine
.
The default value of Reject
will cause the request to be rejected on failure to find the referenced Volume
or labels on an underlying resource.
If Ignore
was provided, the respective InstancetypeMatcher
or PreferenceMatcher
will be cleared on a failure instead.
Example with implicit default value of Reject
:
$ kubectl apply -k https://github.com/kubevirt/common-instancetypes.git\n[..]\n$ virtctl image-upload pvc cirros-pvc --size=1Gi --image-path=./cirros-0.5.2-x86_64-disk.img\n[..]\n$ kubectl label pvc/cirros-pvc \\\n instancetype.kubevirt.io/default-instancetype=server.tiny \\\n instancetype.kubevirt.io/default-preference=cirros\n[..]\n$ kubectl apply -f - << EOF\n---\napiVersion: cdi.kubevirt.io/v1beta1\nkind: DataSource\nmetadata:\n name: cirros-datasource\nspec:\n source:\n pvc:\n name: cirros-pvc\n namespace: default\n---\napiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n name: cirros\nspec:\n instancetype:\n inferFromVolume: cirros-volume\n preference:\n inferFromVolume: cirros-volume\n running: false\n dataVolumeTemplates:\n - metadata:\n name: cirros-datavolume\n spec:\n storage:\n resources:\n requests:\n storage: 1Gi\n storageClassName: local\n sourceRef:\n kind: DataSource\n name: cirros-datasource\n namespace: default\n template:\n spec:\n domain:\n devices: {}\n volumes:\n - dataVolume:\n name: cirros-datavolume\n name: cirros-volume\nEOF\n[..]\nkubectl get vms/cirros -o json | jq '.spec.instancetype, .spec.preference'\n{\n \"kind\": \"virtualmachineclusterinstancetype\",\n \"name\": \"server.tiny\",\n \"revisionName\": \"cirros-server.tiny-76454433-3d82-43df-a7e5-586e48c71f68-1\"\n}\n{\n \"kind\": \"virtualmachineclusterpreference\",\n \"name\": \"cirros\",\n \"revisionName\": \"cirros-cirros-85823ddc-9e8c-4d23-a94c-143571b5489c-1\"\n}\n
Example with explicit value of Ignore
:
$ virtctl image-upload pvc cirros-pvc --size=1Gi --image-path=./cirros-0.5.2-x86_64-disk.img\n$ kubectl apply -f - << EOF\n---\napiVersion: cdi.kubevirt.io/v1beta1\nkind: DataSource\nmetadata:\n name: cirros-datasource\nspec:\n source:\n pvc:\n name: cirros-pvc\n namespace: default\n---\napiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n name: cirros\nspec:\n instancetype:\n inferFromVolume: cirros-volume\n inferFromVolumeFailurePolicy: Ignore\n preference:\n inferFromVolume: cirros-volume\n inferFromVolumeFailurePolicy: Ignore\n running: false\n dataVolumeTemplates:\n - metadata:\n name: cirros-datavolume\n spec:\n storage:\n accessModes:\n - ReadWriteOnce\n resources:\n requests:\n storage: 1Gi\n storageClassName: local\n sourceRef:\n kind: DataSource\n name: cirros-datasource\n namespace: default\n template:\n spec:\n domain:\n devices: {}\n volumes:\n - dataVolume:\n name: cirros-datavolume\n name: cirros-volume\nEOF\n[..]\nkubectl get vms/cirros -o json | jq '.spec.instancetype, .spec.preference'\nnull\nnull\n
"},{"location":"user_workloads/instancetypes/#common-instancetypes","title":"common-instancetypes","text":"The kubevirt/common-instancetypes
provides a set of instance types and preferences to help create KubeVirt VirtualMachines
.
See Deploy common-instancetypes on how to deploy them.
"},{"location":"user_workloads/instancetypes/#examples","title":"Examples","text":"Various examples are available within the kubevirt
repo under /examples
. The following uses an example VirtualMachine
provided by the containerdisk/fedora
repo and replaces much of the DomainSpec
with the equivalent instance type and preferences:
$ kubectl apply -f - << EOF\n---\napiVersion: instancetype.kubevirt.io/v1beta1\nkind: VirtualMachineInstancetype\nmetadata:\n name: cmedium\nspec:\n cpu:\n guest: 1\n memory:\n guest: 1Gi\n---\napiVersion: instancetype.kubevirt.io/v1beta1\nkind: VirtualMachinePreference\nmetadata:\n name: fedora\nspec:\n devices:\n preferredDiskBus: virtio\n preferredInterfaceModel: virtio\n preferredRng: {}\n features:\n preferredAcpi: {}\n preferredSmm: {}\n firmware:\n preferredUseEfi: true\n preferredUseSecureBoot: true \n---\napiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n creationTimestamp: null\n name: fedora\nspec:\n instancetype:\n name: cmedium\n kind: virtualMachineInstancetype\n preference:\n name: fedora\n kind: virtualMachinePreference\n runStrategy: Always\n template:\n metadata:\n creationTimestamp: null\n spec:\n domain:\n devices: {}\n volumes:\n - containerDisk:\n image: quay.io/containerdisks/fedora:latest\n name: containerdisk\n - cloudInitNoCloud:\n userData: |-\n #cloud-config\n ssh_authorized_keys:\n - ssh-rsa AAAA...\n name: cloudinit\nEOF\n
"},{"location":"user_workloads/instancetypes/#version-history","title":"Version History","text":""},{"location":"user_workloads/instancetypes/#instancetypekubevirtiov1alpha1-experimental","title":"instancetype.kubevirt.io/v1alpha1
(Experimental)","text":"instancetype.kubevirt.io/v1alpha2
(Experimental)","text":"This version captured complete VirtualMachine{Instancetype,ClusterInstancetype,Preference,ClusterPreference}
objects within the created ControllerRevisions
This version is backward compatible with instancetype.kubevirt.io/v1alpha1
.
instancetype.kubevirt.io/v1beta1
","text":"Spec.Memory.OvercommitPercent
The following preference attributes have been added:
Spec.CPU.PreferredCPUFeatures
Spec.Devices.PreferredInterfaceMasquerade
Spec.PreferredSubdomain
Spec.PreferredTerminationGracePeriodSeconds
Spec.Requirements
This version is backward compatible with instancetype.kubevirt.io/v1alpha1
and instancetype.kubevirt.io/v1alpha2
objects, no modifications are required to existing VirtualMachine{Instancetype,ClusterInstancetype,Preference,ClusterPreference}
or ControllerRevisions
.
As with the migration to kubevirt.io/v1
it is recommended that previous users of instancetype.kubevirt.io/v1alpha1
or instancetype.kubevirt.io/v1alpha2
use kube-storage-version-migrator
to upgrade any stored objects to instancetype.kubevirt.io/v1beta1
.
Every VirtualMachineInstance
represents a single virtual machine instance. In general, the management of VirtualMachineInstances is kept similar to how Pods
are managed: every VM that is defined in the cluster is expected to be running, just like Pods. Deleting a VirtualMachineInstance is equivalent to shutting it down, which also matches how Pods behave.
In order to start a VirtualMachineInstance, you just need to create a VirtualMachineInstance
object using kubectl
:
$ kubectl create -f vmi.yaml\n
"},{"location":"user_workloads/lifecycle/#listing-virtual-machines","title":"Listing virtual machines","text":"VirtualMachineInstances can be listed by querying for VirtualMachineInstance objects:
$ kubectl get vmis\n
"},{"location":"user_workloads/lifecycle/#retrieving-a-virtual-machine-instance-definition","title":"Retrieving a virtual machine instance definition","text":"A single VirtualMachineInstance definition can be retrieved by getting the specific VirtualMachineInstance object:
$ kubectl get vmis testvmi\n
"},{"location":"user_workloads/lifecycle/#stopping-a-virtual-machine-instance","title":"Stopping a virtual machine instance","text":"To stop the VirtualMachineInstance, you just need to delete the corresponding VirtualMachineInstance
object using kubectl
.
$ kubectl delete -f vmi.yaml\n# OR\n$ kubectl delete vmis testvmi\n
Note: Stopping a VirtualMachineInstance implies that it will be deleted from the cluster. You will not be able to start this VirtualMachineInstance object again.
"},{"location":"user_workloads/lifecycle/#starting-and-stopping-a-virtual-machine","title":"Starting and stopping a virtual machine","text":"Virtual machines, in contrast to VirtualMachineInstances, have a running state. Thus, on a VM you can define whether it should be running or not. VirtualMachineInstances are, if they are defined in the cluster, always running and consuming resources.
virtctl
is used in order to start and stop a VirtualMachine:
$ virtctl start my-vm\n$ virtctl stop my-vm\n
Note: You can force stop a VM (which is like pulling the power cord, with all its implications like data inconsistencies or [in the worst case] data loss) by
$ virtctl stop my-vm --grace-period 0 --force\n
"},{"location":"user_workloads/lifecycle/#pausing-and-unpausing-a-virtual-machine","title":"Pausing and unpausing a virtual machine","text":"Note: Pausing in this context refers to libvirt's virDomainSuspend
command: \"The process is frozen without further access to CPU resources and I/O but the memory used by the domain at the hypervisor level will stay allocated\"
To pause a virtual machine, you need the virtctl
command line tool. Its pause
command works on either VirtualMachine
s or VirtualMachineInstance
s:
$ virtctl pause vm testvm\n# OR\n$ virtctl pause vmi testvm\n
Paused VMIs have a Paused
condition in their status:
$ kubectl get vmi testvm -o=jsonpath='{.status.conditions[?(@.type==\"Paused\")].message}'\nVMI was paused by user\n
Unpausing works similarly to pausing:
$ virtctl unpause vm testvm\n# OR\n$ virtctl unpause vmi testvm\n
"},{"location":"user_workloads/liveness_and_readiness_probes/","title":"Liveness and Readiness Probes","text":"Liveness and Readiness Probes can be configured for VirtualMachineInstances in much the same way as they are configured on Containers.
Liveness Probes will effectively stop the VirtualMachineInstance if they fail, which will allow higher level controllers, like VirtualMachine or VirtualMachineInstanceReplicaSet to spawn new instances, which will hopefully be responsive again.
Readiness Probes are an indicator for Services and Endpoints if the VirtualMachineInstance is ready to receive traffic from Services. If Readiness Probes fail, the VirtualMachineInstance will be removed from the Endpoints which back services until the probe recovers.
Watchdogs focus on ensuring that an Operating System is still responsive. They complement the probes which are more workload centric. Watchdogs require kernel support from the guest and additional tooling like the commonly used watchdog
binary.
Exec probes are Liveness or Readiness probes specifically intended for VMs. These probes run a command inside the VM and determine the VM ready/live state based on its success. For running commands inside the VMs, the qemu-guest-agent package is used. A command supplied to an exec probe will be wrapped by virt-probe
in the operator and forwarded to the guest.
The following VirtualMachineInstance configures an HTTP Liveness Probe via spec.livenessProbe.httpGet
, which will query port 1500 of the VirtualMachineInstance, after an initial delay of 120 seconds. The VirtualMachineInstance itself installs and runs a minimal HTTP server on port 1500 via cloud-init.
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n labels:\n kubevirt.io/vm: vmi-fedora-vmi\n name: vmi-fedora\nspec:\n template:\n metadata:\n labels:\n kubevirt.io/domain: vmi-fedora\n kubevirt.io/vm: vmi-fedora\n domain:\n devices:\n disks:\n - disk:\n bus: virtio\n name: containerdisk\n - disk:\n bus: virtio\n name: cloudinitdisk\n rng: {}\n resources:\n requests:\n memory: 1024M\n livenessProbe:\n initialDelaySeconds: 120\n periodSeconds: 20\n httpGet:\n port: 1500\n timeoutSeconds: 10\n terminationGracePeriodSeconds: 0\n volumes:\n - name: containerdisk\n containerDisk:\n image: quay.io/containerdisks/fedora:latest\n - cloudInitNoCloud:\n userData: |-\n #cloud-config\n password: fedora\n user: fedora\n chpasswd: { expire: False }\n bootcmd:\n - [\"sudo\", \"dnf\", \"install\", \"-y\", \"nmap-ncat\"]\n - [\"sudo\", \"systemd-run\", \"--unit=httpserver\", \"nc\", \"-klp\", \"1500\", \"-e\", '/usr/bin/echo -e HTTP/1.1 200 OK\\\\nContent-Length: 12\\\\n\\\\nHello World!']\n name: cloudinitdisk\n
"},{"location":"user_workloads/liveness_and_readiness_probes/#define-a-tcp-liveness-probe","title":"Define a TCP Liveness Probe","text":"The following VirtualMachineInstance configures a TCP Liveness Probe via spec.livenessProbe.tcpSocket
, which will query port 1500 of the VirtualMachineInstance, after an initial delay of 120 seconds. The VirtualMachineInstance itself installs and runs a minimal HTTP server on port 1500 via cloud-init.
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n labels:\n kubevirt.io/vm: vmi-fedora-vmi\n name: vmi-fedora\nspec:\n template:\n metadata:\n labels:\n kubevirt.io/domain: vmi-fedora\n kubevirt.io/vm: vmi-fedora\n domain:\n devices:\n disks:\n - disk:\n bus: virtio\n name: containerdisk\n - disk:\n bus: virtio\n name: cloudinitdisk\n rng: {}\n resources:\n requests:\n memory: 1024M\n livenessProbe:\n initialDelaySeconds: 120\n periodSeconds: 20\n tcpSocket:\n port: 1500\n timeoutSeconds: 10\n terminationGracePeriodSeconds: 0\n volumes:\n - name: containerdisk\n containerDisk:\n image: quay.io/containerdisks/fedora:latest\n - cloudInitNoCloud:\n userData: |-\n #cloud-config\n password: fedora\n user: fedora\n chpasswd: { expire: False }\n bootcmd:\n - [\"sudo\", \"dnf\", \"install\", \"-y\", \"nmap-ncat\"]\n - [\"sudo\", \"systemd-run\", \"--unit=httpserver\", \"nc\", \"-klp\", \"1500\", \"-e\", '/usr/bin/echo -e HTTP/1.1 200 OK\\\\nContent-Length: 12\\\\n\\\\nHello World!']\n name: cloudinitdisk\n
"},{"location":"user_workloads/liveness_and_readiness_probes/#define-readiness-probes","title":"Define Readiness Probes","text":"Readiness Probes are configured in the same way as liveness probes. Instead of spec.livenessProbe
, spec.readinessProbe
needs to be filled:
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n labels:\n kubevirt.io/vm: vmi-fedora-vmi\n name: vmi-fedora\nspec:\n template:\n metadata:\n labels:\n kubevirt.io/domain: vmi-fedora\n kubevirt.io/vm: vmi-fedora\n domain:\n devices:\n disks:\n - disk:\n bus: virtio\n name: containerdisk\n - disk:\n bus: virtio\n name: cloudinitdisk\n rng: {}\n resources:\n requests:\n memory: 1024M\n readinessProbe:\n initialDelaySeconds: 120\n periodSeconds: 20\n timeoutSeconds: 10\n failureThreshold: 3\n successThreshold: 3\n httpGet:\n port: 1500\n terminationGracePeriodSeconds: 0\n volumes:\n - name: containerdisk\n containerDisk:\n image: quay.io/containerdisks/fedora:latest\n - cloudInitNoCloud:\n userData: |-\n #cloud-config\n password: fedora\n user: fedora\n chpasswd: { expire: False }\n bootcmd:\n - [\"sudo\", \"dnf\", \"install\", \"-y\", \"nmap-ncat\"]\n - [\"sudo\", \"systemd-run\", \"--unit=httpserver\", \"nc\", \"-klp\", \"1500\", \"-e\", '/usr/bin/echo -e HTTP/1.1 200 OK\\\\nContent-Length: 12\\\\n\\\\nHello World!']\n name: cloudinitdisk\n
Note that in the case of Readiness Probes, it is also possible to set a failureThreshold
and a successThreshold
to only flip between ready and non-ready state if the probe succeeded or failed multiple times.
Some context is needed to understand the limitations imposed by a dual-stack network configuration on readiness - or liveness - probes. Users must be fully aware that a dual-stack configuration is currently only available when using a masquerade binding type. Furthermore, it must be recalled that accessing a VM using masquerade binding type is performed via the pod IP address; in dual-stack mode, both IPv4 and IPv6 addresses can be used to reach the VM.
Dual-stack networking configurations have a limitation when using HTTP / TCP probes - you cannot probe the VMI by its IPv6 address. The reason for this is the host
field for both the HTTP and TCP probe actions defaults to the pod's IP address, which is currently always the IPv4 address.
Since the pod's IP address is not known before creating the VMI, it is not possible to pre-provision the probe's host field.
"},{"location":"user_workloads/liveness_and_readiness_probes/#defining-a-watchdog","title":"Defining a Watchdog","text":"A watchdog is a more VM-centric approach that focuses on the responsiveness of the Operating System. One can configure the i6300esb
watchdog device:
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n labels:\n special: vmi-with-watchdog\n name: vmi-with-watchdog\nspec:\n domain:\n devices:\n watchdog:\n name: mywatchdog\n i6300esb:\n action: \"poweroff\"\n disks:\n - disk:\n bus: virtio\n name: containerdisk\n - disk:\n bus: virtio\n name: cloudinitdisk\n machine:\n type: \"\"\n resources:\n requests:\n memory: 1024M\n terminationGracePeriodSeconds: 0\n volumes:\n - containerDisk:\n image: quay.io/containerdisks/fedora:latest\n name: containerdisk\n - cloudInitNoCloud:\n userData: |-\n #cloud-config\n password: fedora\n user: fedora\n chpasswd: { expire: False }\n bootcmd:\n - [\"sudo\", \"dnf\", \"install\", \"-y\", \"busybox\"]\n name: cloudinitdisk\n
The example above configures it with the poweroff
action. It defines what will happen if the OS can't respond anymore. Other possible actions are reset
and shutdown
. The VM in this example will have the device exposed as /dev/watchdog
. This device can then be used by the watchdog
binary. For example, if root executes this command inside the VM:
sudo busybox watchdog -t 2000ms -T 4000ms /dev/watchdog\n
the watchdog will send a heartbeat every two seconds to /dev/watchdog
and after four seconds without a heartbeat the defined action will be executed. In this case a hard poweroff
.
Guest-Agent probes are based on qemu-guest-agent guest-ping
. This will ping the guest and return an error if the guest is not up and running. To define this in a VM spec, specify guestAgentPing: {}
in the VM's spec.template.spec.readinessProbe
. virt-controller
will translate this into a corresponding command wrapped by virt-probe
.
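A minimal sketch of such a probe definition is shown here as a fragment of a VirtualMachine manifest. Only the guestAgentPing field and its location come from the text above; the timing values are assumptions picked for illustration and should be tuned to how long the guest and its agent take to come up.

```yaml
# Fragment of a VirtualMachine manifest; only the probe-related fields are shown.
spec:
  template:
    spec:
      readinessProbe:
        # Ping the guest through qemu-guest-agent
        guestAgentPing: {}
        # Assumed values for illustration only
        initialDelaySeconds: 120
        periodSeconds: 20
        failureThreshold: 10
```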
Note: You can define only one type of probe, i.e. either a guest-agent exec probe or a guest-agent ping probe.
Important: If the qemu-guest-agent is not installed and enabled inside the VM, the probe will fail. Many images don't enable the agent by default so make sure you either run one that does or enable it.
Make sure to provide enough delay and failureThreshold for the VM and the agent to be online.
In the following example the Fedora image does have qemu-guest-agent available by default. Nevertheless, in case qemu-guest-agent is not installed, it will be installed and enabled via cloud-init as shown in the example below. Also, cloud-init assigns the proper SELinux context, i.e. virt_qemu_ga_exec_t, to the /tmp/healthy.txt
file. Otherwise, SELinux will deny the attempts to open the /tmp/healthy.txt
file causing the probe to fail.
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n labels:\n kubevirt.io/vm: vmi-guest-probe-vmi\n name: vmi-fedora\nspec:\n template:\n metadata:\n labels:\n kubevirt.io/domain: vmi-guest-probe\n kubevirt.io/vm: vmi-guest-probe\n domain:\n devices:\n disks:\n - disk:\n bus: virtio\n name: containerdisk\n - disk:\n bus: virtio\n name: cloudinitdisk\n rng: {}\n resources:\n requests:\n memory: 1024M\n readinessProbe:\n exec:\n command: [\"cat\", \"/tmp/healthy.txt\"]\n failureThreshold: 10\n initialDelaySeconds: 20\n periodSeconds: 10\n timeoutSeconds: 5\n terminationGracePeriodSeconds: 180\n volumes:\n - name: containerdisk\n containerDisk:\n image: quay.io/containerdisks/fedora\n - cloudInitNoCloud:\n userData: |-\n #cloud-config\n password: fedora\n user: fedora\n chpasswd: { expire: False }\n packages:\n qemu-guest-agent\n runcmd:\n - [\"touch\", \"/tmp/healthy.txt\"]\n - [\"sudo\", \"chcon\", \"-t\", \"virt_qemu_ga_exec_t\", \"/tmp/healthy.txt\"]\n - [\"sudo\", \"systemctl\", \"enable\", \"--now\", \"qemu-guest-agent\"]\n name: cloudinitdisk\n
Note that if SELinux is not installed in your container disk image, the command chcon
should be removed from the VM manifest shown above. Otherwise, the chcon
command will fail.
The .status.ready
field will switch to true
indicating that probes are returning successfully:
kubectl wait vmis/vmi-fedora --for=condition=Ready --timeout=5m\n
Additionally, the following command can be used inside the VM to watch the incoming qemu-ga commands:
journalctl _COMM=qemu-ga --follow \n
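For comparison with the exec probe above, the guest-agent ping variant can be sketched as follows. This is a minimal fragment rather than a complete manifest, and the timing values are illustrative assumptions that should be tuned to how long the guest and its agent take to come online.

```yaml
# Fragment of a VMI spec: the probe succeeds once the guest agent answers a ping.
# Threshold and delay values are illustrative, not recommendations.
spec:
  readinessProbe:
    guestAgentPing: {}
    failureThreshold: 10
    initialDelaySeconds: 20
    periodSeconds: 10
    timeoutSeconds: 5
```

As with the exec probe, this only works if qemu-guest-agent is installed and running inside the VM.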
"},{"location":"user_workloads/pool/","title":"VirtualMachinePool","text":"A VirtualMachinePool tries to ensure that a specified number of VirtualMachine replicas and their respective VirtualMachineInstances are in the ready state at any time. In other words, a VirtualMachinePool makes sure that a VirtualMachine or a set of VirtualMachines is always up and ready.
No state is kept and no guarantees are made about the maximum number of VirtualMachineInstance replicas running at any time. For example, the VirtualMachinePool may decide to create new replicas if possibly still running VMs are entering an unknown state.
"},{"location":"user_workloads/pool/#using-virtualmachinepool","title":"Using VirtualMachinePool","text":"The VirtualMachinePool allows us to specify a VirtualMachineTemplate in spec.virtualMachineTemplate
. It consists of ObjectMetadata
in spec.virtualMachineTemplate.metadata
, and a VirtualMachineSpec
in spec.virtualMachineTemplate.spec
. The specification of the virtual machine is equal to the specification of the virtual machine in the VirtualMachine
workload.
spec.replicas
can be used to specify how many replicas are wanted. If unspecified, the default value is 1. This value can be updated anytime. The controller will react to the changes.
spec.selector
is used by the controller to keep track of managed virtual machines. The selector specified there must be able to match the virtual machine labels as specified in spec.virtualMachineTemplate.metadata.labels
. If the selector does not match these labels, or they are empty, the controller will simply do nothing except log an error. The user is responsible for avoiding the creation of other virtual machines or VirtualMachinePools which may conflict with the selector and the template labels.
VirtualMachinePool is part of the KubeVirt API pool.kubevirt.io/v1alpha1
.
The example below shows how to create a simple VirtualMachinePool
:
apiVersion: pool.kubevirt.io/v1alpha1\nkind: VirtualMachinePool\nmetadata:\n name: vm-pool-cirros\nspec:\n replicas: 3\n selector:\n matchLabels:\n kubevirt.io/vmpool: vm-pool-cirros\n virtualMachineTemplate:\n metadata:\n creationTimestamp: null\n labels:\n kubevirt.io/vmpool: vm-pool-cirros\n spec:\n running: true\n template:\n metadata:\n creationTimestamp: null\n labels:\n kubevirt.io/vmpool: vm-pool-cirros\n spec:\n domain:\n devices:\n disks:\n - disk:\n bus: virtio\n name: containerdisk\n resources:\n requests:\n memory: 128Mi\n terminationGracePeriodSeconds: 0\n volumes:\n - containerDisk:\n image: kubevirt/cirros-container-disk-demo:latest\n name: containerdisk \n
Saving this manifest into vm-pool-cirros.yaml
and submitting it to Kubernetes will create three virtual machines based on the template.
$ kubectl create -f vm-pool-cirros.yaml\nvirtualmachinepool.pool.kubevirt.io/vm-pool-cirros created\n$ kubectl describe vmpool vm-pool-cirros\nName: vm-pool-cirros\nNamespace: default\nLabels: <none>\nAnnotations: <none>\nAPI Version: pool.kubevirt.io/v1alpha1\nKind: VirtualMachinePool\nMetadata:\n Creation Timestamp: 2023-02-09T18:30:08Z\n Generation: 1\n Manager: kubectl-create\n Operation: Update\n Time: 2023-02-09T18:30:08Z\n API Version: pool.kubevirt.io/v1alpha1\n Fields Type: FieldsV1\n fieldsV1:\n f:status:\n .:\n f:labelSelector:\n f:readyReplicas:\n f:replicas:\n Manager: virt-controller\n Operation: Update\n Subresource: status\n Time: 2023-02-09T18:30:44Z\n Resource Version: 6606\n UID: ba51daf4-f99f-433c-89e5-93f39bc9989d\nSpec:\n Replicas: 3\n Selector:\n Match Labels:\n kubevirt.io/vmpool: vm-pool-cirros\n Virtual Machine Template:\n Metadata:\n Creation Timestamp: <nil>\n Labels:\n kubevirt.io/vmpool: vm-pool-cirros\n Spec:\n Running: true\n Template:\n Metadata:\n Creation Timestamp: <nil>\n Labels:\n kubevirt.io/vmpool: vm-pool-cirros\n Spec:\n Domain:\n Devices:\n Disks:\n Disk:\n Bus: virtio\n Name: containerdisk\n Resources:\n Requests:\n Memory: 128Mi\n Termination Grace Period Seconds: 0\n Volumes:\n Container Disk:\n Image: kubevirt/cirros-container-disk-demo:latest\n Name: containerdisk\nStatus:\n Label Selector: kubevirt.io/vmpool=vm-pool-cirros\n Ready Replicas: 2\n Replicas: 3\nEvents:\n Type Reason Age From Message\n ---- ------ ---- ---- -------\n Normal SuccessfulCreate 17s virtualmachinepool-controller Created VM default/vm-pool-cirros-0\n Normal SuccessfulCreate 17s virtualmachinepool-controller Created VM default/vm-pool-cirros-2\n Normal SuccessfulCreate 17s virtualmachinepool-controller Created VM default/vm-pool-cirros-1\n
Replicas
is 3
and Ready Replicas
is 2
. This means that when the status was captured, three VirtualMachines had already been created, but only two were running and ready.
Note: This requires KubeVirt 0.59 or newer.
The VirtualMachinePool
supports the scale
subresource. As a consequence it is possible to scale it via kubectl
:
$ kubectl scale vmpool vm-pool-cirros --replicas 5\n
"},{"location":"user_workloads/pool/#removing-a-virtualmachine-from-virtualmachinepool","title":"Removing a VirtualMachine from VirtualMachinePool","text":"It is also possible to remove a VirtualMachine
from its VirtualMachinePool
.
In this scenario, the ownerReferences
needs to be removed from the VirtualMachine
. This can be achieved either by using kubectl edit
or kubectl patch
. Using kubectl patch
it would look like:
kubectl patch vm vm-pool-cirros-0 --type merge --patch '{\"metadata\":{\"ownerReferences\":null}}'\n
Note: You may want to update your VirtualMachine labels as well to avoid impact on selectors.
"},{"location":"user_workloads/pool/#using-the-horizontal-pod-autoscaler","title":"Using the Horizontal Pod Autoscaler","text":"Note: This requires KubeVirt 0.59 or newer.
The HorizontalPodAutoscaler (HPA) can be used with a VirtualMachinePool
. Simply reference it in the spec of the autoscaler:
apiVersion: autoscaling/v1\nkind: HorizontalPodAutoscaler\nmetadata:\n creationTimestamp: null\n name: vm-pool-cirros\nspec:\n maxReplicas: 10\n minReplicas: 3\n scaleTargetRef:\n apiVersion: pool.kubevirt.io/v1alpha1\n kind: VirtualMachinePool\n name: vm-pool-cirros\n targetCPUUtilizationPercentage: 50\n
or use kubectl autoscale
to define the HPA via the commandline:
$ kubectl autoscale vmpool vm-pool-cirros --min=3 --max=10 --cpu-percent=50\n
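On clusters that serve the autoscaling/v2 API (stable since Kubernetes 1.23), the same autoscaler could probably be written as shown below. This is a sketch under the assumption that CPU metrics are available for the pods backing the pool; the names are reused from the example above.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: vm-pool-cirros
spec:
  minReplicas: 3
  maxReplicas: 10
  scaleTargetRef:
    apiVersion: pool.kubevirt.io/v1alpha1
    kind: VirtualMachinePool
    name: vm-pool-cirros
  metrics:
  # Equivalent of targetCPUUtilizationPercentage: 50 in autoscaling/v1
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
```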
"},{"location":"user_workloads/pool/#exposing-a-virtualmachinepool-as-a-service","title":"Exposing a VirtualMachinePool as a Service","text":"A VirtualMachinePool may be exposed as a service. When this is done, one of the VirtualMachine replicas will be picked for the actual delivery of the service.
For example, exposing SSH port (22) as a ClusterIP service:
apiVersion: v1\nkind: Service\nmetadata:\n name: vm-pool-cirros-ssh\nspec:\n type: ClusterIP\n selector:\n kubevirt.io/vmpool: vm-pool-cirros\n ports:\n - protocol: TCP\n port: 2222\n targetPort: 22\n
Saving this manifest into vm-pool-cirros-ssh.yaml
and submitting it to Kubernetes will create the ClusterIP
service listening on port 2222 and forwarding to port 22. See Service Objects for more details.
"},{"location":"user_workloads/pool/#using-persistent-storage","title":"Using Persistent Storage","text":"Note: DataVolumes are part of CDI
Usage of DataVolumeTemplates
within a spec.virtualMachineTemplate.spec
will result in the creation of unique persistent storage for each VM within a VMPool. The DataVolumeTemplate
name will have the VM's sequential postfix appended to it when the VM is created from the spec.virtualMachineTemplate.spec.dataVolumeTemplates
. This makes each VM a completely unique stateful workload.
By default, any secret or configMap references in a spec.virtualMachineTemplate.spec.template
Volume section will be used as is, without any modification to the naming. This means that if you specify a secret in a CloudInitNoCloud
volume, every VM spawned from the VirtualMachinePool with this volume will get the exact same secret for its cloud-init user data.
This default behavior can be modified by setting the AppendPostfixToSecretReferences
and AppendPostfixToConfigMapReferences
booleans to true on the VMPool spec. When these booleans are enabled, the VM's sequential postfix is appended to each referenced secret and configMap name. This allows you to pre-generate unique per-VM secret
and configMap
data for a VirtualMachinePool ahead of time in a way that will be predictably assigned to VMs within the VirtualMachinePool.
FEATURE STATE:
VirtualMachineInstancePresets
are deprecated as of the v0.57.0
release and will be removed in a future release. VirtualMachineInstancePresets
are an extension to general VirtualMachineInstance
configuration behaving much like PodPresets
from Kubernetes. When a VirtualMachineInstance
is created, any applicable VirtualMachineInstancePresets
will be applied to the existing spec for the VirtualMachineInstance
. This allows for re-use of common settings that should apply to multiple VirtualMachineInstances
.
You can describe a VirtualMachineInstancePreset
in a YAML file. For example, the vmi-preset.yaml
file below describes a VirtualMachineInstancePreset
that requests a VirtualMachineInstance
be created with a resource request for 64M of RAM.
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstancePreset\nmetadata:\n name: small-qemu\nspec:\n selector:\n matchLabels:\n kubevirt.io/size: small\n domain:\n resources:\n requests:\n memory: 64M\n
Create a VirtualMachineInstancePreset
based on that YAML file: kubectl create -f vmi-preset.yaml\n
"},{"location":"user_workloads/presets/#required-fields","title":"Required Fields","text":"As with most Kubernetes resources, a VirtualMachineInstancePreset
requires apiVersion
, kind
and metadata
fields.
Additionally VirtualMachineInstancePresets
also need a spec
section. While not technically required to satisfy syntax, it is strongly recommended to include a Selector
in the spec
section, otherwise a VirtualMachineInstancePreset
will match all VirtualMachineInstances
in a namespace.
KubeVirt uses Kubernetes Labels
and Selectors
to determine which VirtualMachineInstancePresets
apply to a given VirtualMachineInstance
, similarly to how PodPresets
work in Kubernetes. If a setting from a VirtualMachineInstancePreset
is applied to a VirtualMachineInstance
, the VirtualMachineInstance
will be marked with an Annotation upon completion.
Any domain structure can be listed in the spec
of a VirtualMachineInstancePreset
, e.g. Clock, Features, Memory, CPU, or Devices such as network interfaces. All elements of the spec
section of a VirtualMachineInstancePreset
will be applied to the VirtualMachineInstance
.
Once a VirtualMachineInstancePreset
is successfully applied to a VirtualMachineInstance
, the VirtualMachineInstance
will be marked with an annotation to indicate that it was applied. If a conflict occurs while a VirtualMachineInstancePreset
is being applied, that portion of the VirtualMachineInstancePreset
will be skipped.
Any valid Label
can be matched against, but as a rule of thumb it is suggested to use os/shortname, e.g. kubevirt.io/os: rhel7
.
If a VirtualMachineInstancePreset
is modified, changes will not be applied to existing VirtualMachineInstances
. This applies to both the Selector
indicating which VirtualMachineInstances
should be matched, and also the Domain
section which lists the settings that should be applied to a VirtualMachineInstance
.
VirtualMachineInstancePresets
use a similar conflict resolution strategy to Kubernetes PodPresets
. If a portion of the domain spec is present in both a VirtualMachineInstance
and a VirtualMachineInstancePreset
and both resources have the identical information, then creation of the VirtualMachineInstance
will continue normally. If however there is a difference between the resources, an Event will be created indicating which DomainSpec
element of which VirtualMachineInstancePreset
was overridden. For example: If both the VirtualMachineInstance
and VirtualMachineInstancePreset
define a CPU
, but use a different number of Cores
, KubeVirt will note the difference.
If any settings from the VirtualMachineInstancePreset
were successfully applied, the VirtualMachineInstance
will be annotated.
In the event that there is a difference between the Domains
of a VirtualMachineInstance
and VirtualMachineInstancePreset
, KubeVirt will create an Event
. kubectl get events
can be used to show all Events
. For example:
$ kubectl get events\n ....\n Events:\n FirstSeen LastSeen Count From SubobjectPath Reason Message\n 2m 2m 1 myvmi.1515bbb8d397f258 VirtualMachineInstance Warning Conflict virtualmachineinstance-preset-controller Unable to apply VirtualMachineInstancePreset 'example-preset': spec.cpu: &{6} != &{4}\n
"},{"location":"user_workloads/presets/#usage","title":"Usage","text":"VirtualMachineInstancePresets
are namespaced resources, so should be created in the same namespace as the VirtualMachineInstances
that will use them:
kubectl create -f <preset>.yaml [--namespace <namespace>]
KubeVirt will determine which VirtualMachineInstancePresets
apply to a particular VirtualMachineInstance
by matching Labels
. For example:
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstancePreset\nmetadata:\n name: example-preset\nspec:\n selector:\n matchLabels:\n kubevirt.io/os: win10\n ...\n
would match any VirtualMachineInstance
in the same namespace with a Label
of kubevirt.io/os: win10
. For example:
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nversion: v1\nmetadata:\n name: myvmi\n labels:\n kubevirt.io/os: win10\n ...\n
"},{"location":"user_workloads/presets/#conflicts","title":"Conflicts","text":"When multiple VirtualMachineInstancePresets
match a particular VirtualMachineInstance
, if they specify the same settings within a Domain, those settings must match. If two VirtualMachineInstancePresets
have conflicting settings (e.g. for the number of CPU cores requested), an error will occur, and the VirtualMachineInstance
will enter the Failed
state, and a Warning
event will be emitted explaining which settings of which VirtualMachineInstancePresets
were problematic.
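To make the conflict case concrete, the following sketch shows two presets that select the same VirtualMachineInstances but disagree on the number of CPU cores. The preset names are hypothetical and only serve as an illustration; a VirtualMachineInstance labeled kubevirt.io/os: win10 would match both.

```yaml
# Two hypothetical presets with the same selector and conflicting CPU settings.
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstancePreset
metadata:
  name: four-cores   # hypothetical name
spec:
  selector:
    matchLabels:
      kubevirt.io/os: win10
  domain:
    cpu:
      cores: 4
---
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstancePreset
metadata:
  name: six-cores    # hypothetical name
spec:
  selector:
    matchLabels:
      kubevirt.io/os: win10
  domain:
    cpu:
      cores: 6
```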
VirtualMachineInstances
","text":"The main use case for VirtualMachineInstancePresets
is to create re-usable settings that can be applied across various machines. Multiple methods are available to match the labels of a VirtualMachineInstance
using selectors.
matchLabels: Each VirtualMachineInstance
can use a specific label shared by all instances.
matchExpressions: Logical operators for sets can be used to match multiple labels.
Using matchLabels, the label used in the VirtualMachineInstancePreset
must match one of the labels of the VirtualMachineInstance
:
selector:\n matchLabels:\n kubevirt.io/memory: large\n
would match
metadata:\n labels:\n kubevirt.io/memory: large\n kubevirt.io/os: win10\n
or
metadata:\n labels:\n kubevirt.io/memory: large\n kubevirt.io/os: fedora27\n
Using matchExpressions allows for matching multiple labels of VirtualMachineInstances
without needing to explicitly list a label.
selector:\n matchExpressions:\n - {key: kubevirt.io/os, operator: In, values: [fedora27, fedora26]}\n
would match both:
metadata:\n labels:\n kubevirt.io/os: fedora26\n\nmetadata:\n labels:\n kubevirt.io/os: fedora27\n
The Kubernetes documentation has a detailed explanation. Examples are provided below.
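The two selector forms can also be combined. The sketch below assumes the standard Kubernetes label selector semantics, in which every matchLabels entry and every matchExpressions entry must be satisfied at the same time; the label keys are reused from the examples in this document.

```yaml
selector:
  # Must carry this exact label ...
  matchLabels:
    kubevirt.io/memory: large
  matchExpressions:
  # ... and must not be a win7 guest ...
  - {key: kubevirt.io/os, operator: NotIn, values: [win7]}
  # ... and must define a kubevirt.io/cpu label with any value.
  - {key: kubevirt.io/cpu, operator: Exists}
```

Besides In, NotIn and Exists, Kubernetes also defines the DoesNotExist operator.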
"},{"location":"user_workloads/presets/#exclusions","title":"Exclusions","text":"Since VirtualMachineInstancePresets
use Selectors
that indicate which VirtualMachineInstances
their settings should apply to, there needs to exist a mechanism by which VirtualMachineInstances
can opt out of VirtualMachineInstancePresets
altogether. This is done using an annotation:
kind: VirtualMachineInstance\nversion: v1\nmetadata:\n name: myvmi\n annotations:\n virtualmachineinstancepresets.admission.kubevirt.io/exclude: \"true\"\n ...\n
"},{"location":"user_workloads/presets/#examples","title":"Examples","text":""},{"location":"user_workloads/presets/#simple-virtualmachineinstancepreset-example","title":"Simple VirtualMachineInstancePreset
Example","text":"apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstancePreset\nversion: v1\nmetadata:\n name: example-preset\nspec:\n selector:\n matchLabels:\n kubevirt.io/os: win10\n domain:\n features:\n acpi: {}\n apic: {}\n hyperv:\n relaxed: {}\n vapic: {}\n spinlocks:\n spinlocks: 8191\n---\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nversion: v1\nmetadata:\n name: myvmi\n labels:\n kubevirt.io/os: win10\nspec:\n domain:\n firmware:\n uuid: c8f99fc8-20f5-46c4-85e5-2b841c547cef\n
Once the VirtualMachineInstancePreset
is applied to the VirtualMachineInstance
, the resulting resource would look like this:
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n annotations:\n presets.virtualmachineinstances.kubevirt.io/presets-applied: kubevirt.io/v1\n virtualmachineinstancepreset.kubevirt.io/example-preset: kubevirt.io/v1\n labels:\n kubevirt.io/os: win10\n kubevirt.io/nodeName: master\n name: myvmi\n namespace: default\nspec:\n domain:\n devices: {}\n features:\n acpi:\n enabled: true\n apic:\n enabled: true\n hyperv:\n relaxed:\n enabled: true\n spinlocks:\n enabled: true\n spinlocks: 8191\n vapic:\n enabled: true\n firmware:\n uuid: c8f99fc8-20f5-46c4-85e5-2b841c547cef\n machine:\n type: q35\n resources:\n requests:\n memory: 8Mi\n
"},{"location":"user_workloads/presets/#conflict-example","title":"Conflict Example","text":"This is an example of a merge conflict. In this case both the VirtualMachineInstance
and VirtualMachineInstancePreset
request a different number of CPU cores.
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstancePreset\nversion: v1\nmetadata:\n name: example-preset\nspec:\n selector:\n matchLabels:\n kubevirt.io/flavor: default-features\n domain:\n cpu:\n cores: 4\n---\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nversion: v1\nmetadata:\n name: myvmi\n labels:\n kubevirt.io/flavor: default-features\nspec:\n domain:\n cpu:\n cores: 6\n
In this case the VirtualMachineInstance
Spec will remain unmodified. Use kubectl get events
to show events.
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n annotations:\n presets.virtualmachineinstances.kubevirt.io/presets-applied: kubevirt.io/v1\n generation: 0\n labels:\n kubevirt.io/flavor: default-features\n name: myvmi\n namespace: default\nspec:\n domain:\n cpu:\n cores: 6\n devices: {}\n machine:\n type: \"\"\n resources: {}\nstatus: {}\n
Calling kubectl get events
would have a line like:
2m 2m 1 myvmi.1515bbb8d397f258 VirtualMachineInstance Warning Conflict virtualmachineinstance-preset-controller Unable to apply VirtualMachineInstancePreset example-preset: spec.cpu: &{6} != &{4}\n
"},{"location":"user_workloads/presets/#matching-multiple-virtualmachineinstances-using-matchlabels","title":"Matching Multiple VirtualMachineInstances Using MatchLabels","text":"These VirtualMachineInstances
have multiple labels, one that is unique and one that is shared.
Note: This example breaks from the convention of using os-shortname as a Label
for demonstration purposes.
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstancePreset\nmetadata:\n name: twelve-cores\nspec:\n selector:\n matchLabels:\n kubevirt.io/cpu: dodecacore\n domain:\n cpu:\n cores: 12\n---\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n name: windows-10\n labels:\n kubevirt.io/os: win10\n kubevirt.io/cpu: dodecacore\nspec:\n---\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n name: windows-7\n labels:\n kubevirt.io/os: win7\n kubevirt.io/cpu: dodecacore\nspec:\n terminationGracePeriodSeconds: 0\n
Adding this VirtualMachineInstancePreset
and these VirtualMachineInstances
will result in:
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n annotations:\n presets.virtualmachineinstances.kubevirt.io/presets-applied: kubevirt.io/v1\n virtualmachineinstancepreset.kubevirt.io/twelve-cores: kubevirt.io/v1\n labels:\n kubevirt.io/cpu: dodecacore\n kubevirt.io/os: win10\n name: windows-10\nspec:\n domain:\n cpu:\n cores: 12\n devices: {}\n resources:\n requests:\n memory: 4Gi\n---\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n annotations:\n presets.virtualmachineinstances.kubevirt.io/presets-applied: kubevirt.io/v1\n virtualmachineinstancepreset.kubevirt.io/twelve-cores: kubevirt.io/v1\n labels:\n kubevirt.io/cpu: dodecacore\n kubevirt.io/os: win7\n name: windows-7\nspec:\n domain:\n cpu:\n cores: 12\n devices: {}\n resources:\n requests:\n memory: 4Gi\n terminationGracePeriodSeconds: 0\n
"},{"location":"user_workloads/presets/#matching-multiple-virtualmachineinstances-using-matchexpressions","title":"Matching Multiple VirtualMachineInstances Using MatchExpressions","text":"This VirtualMachineInstancePreset
has a matchExpression that will match two labels: kubevirt.io/os: win10
and kubevirt.io/os: win7
.
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstancePreset\nmetadata:\n name: windows-vmis\nspec:\n selector:\n matchExpressions:\n - {key: kubevirt.io/os, operator: In, values: [win10, win7]}\n domain:\n resources:\n requests:\n memory: 128M\n---\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n name: smallvmi\n labels:\n kubevirt.io/os: win10\nspec:\n terminationGracePeriodSeconds: 60\n---\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n name: largevmi\n labels:\n kubevirt.io/os: win7\nspec:\n terminationGracePeriodSeconds: 120\n
Applying the preset to both VMIs will result in:
apiVersion: v1\nitems:\n- apiVersion: kubevirt.io/v1\n kind: VirtualMachineInstance\n metadata:\n annotations:\n presets.virtualmachineinstances.kubevirt.io/presets-applied: kubevirt.io/v1\n virtualmachineinstancepreset.kubevirt.io/windows-vmis: kubevirt.io/v1\n labels:\n kubevirt.io/os: win7\n name: largevmi\n spec:\n domain:\n resources:\n requests:\n memory: 128M\n terminationGracePeriodSeconds: 120\n- apiVersion: kubevirt.io/v1\n kind: VirtualMachineInstance\n metadata:\n annotations:\n presets.virtualmachineinstances.kubevirt.io/presets-applied: kubevirt.io/v1\n virtualmachineinstancepreset.kubevirt.io/windows-vmis: kubevirt.io/v1\n labels:\n kubevirt.io/os: win10\n name: smallvmi\n spec:\n domain:\n resources:\n requests:\n memory: 128M\n terminationGracePeriodSeconds: 60\n
"},{"location":"user_workloads/replicaset/","title":"VirtualMachineInstanceReplicaSet","text":"A VirtualMachineInstanceReplicaSet tries to ensure that a specified number of VirtualMachineInstance replicas are running at any time. In other words, a VirtualMachineInstanceReplicaSet makes sure that a VirtualMachineInstance or a homogeneous set of VirtualMachineInstances is always up and ready. It is very similar to a Kubernetes ReplicaSet.
No state is kept and no guarantees are made about the maximum number of VirtualMachineInstance replicas running at any time. For example, the VirtualMachineInstanceReplicaSet may decide to create new replicas if possibly still running VMs are entering an unknown state.
"},{"location":"user_workloads/replicaset/#using-virtualmachineinstancereplicaset","title":"Using VirtualMachineInstanceReplicaSet","text":"The VirtualMachineInstanceReplicaSet allows us to specify a VirtualMachineInstanceTemplate in spec.template
. It consists of ObjectMetadata
in spec.template.metadata
, and a VirtualMachineInstanceSpec
in spec.template.spec
. The specification of the virtual machine is equal to the specification of the virtual machine in the VirtualMachineInstance
workload.
spec.replicas
can be used to specify how many replicas are wanted. If unspecified, the default value is 1. This value can be updated anytime. The controller will react to the changes.
spec.selector
is used by the controller to keep track of managed virtual machines. The selector specified there must be able to match the virtual machine labels as specified in spec.template.metadata.labels
. If the selector does not match these labels, or they are empty, the controller will simply do nothing except log an error. The user is responsible for avoiding the creation of other virtual machines or VirtualMachineInstanceReplicaSets which conflict with the selector and the template labels.
A VirtualMachineInstanceReplicaSet could be exposed as a service. When this is done, one of the VirtualMachineInstances replicas will be picked for the actual delivery of the service.
For example, exposing SSH port (22) as a ClusterIP service using virtctl on a VirtualMachineInstanceReplicaSet:
$ virtctl expose vmirs vmi-ephemeral --name vmiservice --port 27017 --target-port 22\n
All service exposure options that apply to a VirtualMachineInstance apply to a VirtualMachineInstanceReplicaSet. See Exposing VirtualMachineInstance for more details.
"},{"location":"user_workloads/replicaset/#when-to-use-a-virtualmachineinstancereplicaset","title":"When to use a VirtualMachineInstanceReplicaSet","text":"Note: The base assumption is that referenced disks are read-only or that the VMIs are writing internally to a tmpfs. The most obvious volume sources for VirtualMachineInstanceReplicaSets which KubeVirt supports are referenced below. If other types are used data corruption is possible.
Using VirtualMachineInstanceReplicaSet is the right choice when one wants many identical VMs and does not care about maintaining any disk state after the VMs are terminated.
Volume types which work well in combination with a VirtualMachineInstanceReplicaSet are:
This use-case involves small and fast booting VMs with little provisioning performed during initialization.
In this scenario, migrations are not important. Redistributing VM workloads between Nodes can be achieved simply by deleting managed VirtualMachineInstances which are running on an overloaded Node. The eviction
of such a VirtualMachineInstance can happen by directly deleting the VirtualMachineInstance (KubeVirt-aware workload redistribution) or by deleting the corresponding Pod in which the virtual machine runs (workload redistribution that is only Kubernetes-aware).
In this use-case one has big and slow booting VMs, and complex or resource intensive provisioning is done during boot. More specifically, the timespan between the creation of a new VM and it entering the ready state is long.
In this scenario, one still does not care about the state, but since re-provisioning VMs is expensive, migrations are important. Workload redistribution between Nodes can be achieved by migrating VirtualMachineInstances to different Nodes. A workload redistributor needs to be aware of KubeVirt and create migrations, instead of evicting
VirtualMachineInstances by deletion.
Note: The simplest way to have a migratable ephemeral VirtualMachineInstance is to use local storage based on ContainerDisks
in combination with a file based backing store. However, migratable backing store support has not officially landed yet in KubeVirt and is untested.
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstanceReplicaSet\nmetadata:\n name: testreplicaset\nspec:\n replicas: 3\n selector:\n matchLabels:\n myvmi: myvmi\n template:\n metadata:\n name: test\n labels:\n myvmi: myvmi\n spec:\n domain:\n devices:\n disks:\n - disk:\n name: containerdisk\n resources:\n requests:\n memory: 64M\n volumes:\n - name: containerdisk\n containerDisk:\n image: kubevirt/cirros-container-disk-demo:latest\n
Saving this manifest into testreplicaset.yaml
and submitting it to Kubernetes will create three virtual machines based on the template. $ kubectl create -f testreplicaset.yaml\nvirtualmachineinstancereplicaset \"testreplicaset\" created\n$ kubectl describe vmirs testreplicaset\nName: testreplicaset\nNamespace: default\nLabels: <none>\nAnnotations: <none>\nAPI Version: kubevirt.io/v1\nKind: VirtualMachineInstanceReplicaSet\nMetadata:\n Cluster Name:\n Creation Timestamp: 2018-01-03T12:42:30Z\n Generation: 0\n Resource Version: 6380\n Self Link: /apis/kubevirt.io/v1/namespaces/default/virtualmachineinstancereplicasets/testreplicaset\n UID: 903a9ea0-f083-11e7-9094-525400ee45b0\nSpec:\n Replicas: 3\n Selector:\n Match Labels:\n Myvmi: myvmi\n Template:\n Metadata:\n Creation Timestamp: <nil>\n Labels:\n Myvmi: myvmi\n Name: test\n Spec:\n Domain:\n Devices:\n Disks:\n Disk:\n Name: containerdisk\n Volume Name: containerdisk\n Resources:\n Requests:\n Memory: 64M\n Volumes:\n Name: containerdisk\n Container Disk:\n Image: kubevirt/cirros-container-disk-demo:latest\nStatus:\n Conditions: <nil>\n Ready Replicas: 2\n Replicas: 3\nEvents:\n Type Reason Age From Message\n ---- ------ ---- ---- -------\n Normal SuccessfulCreate 13s virtualmachineinstancereplicaset-controller Created virtual machine: testh8998\n Normal SuccessfulCreate 13s virtualmachineinstancereplicaset-controller Created virtual machine: testf474w\n Normal SuccessfulCreate 13s virtualmachineinstancereplicaset-controller Created virtual machine: test5lvkd\n
Replicas
is 3
and Ready Replicas
is 2
. This means that when the status was captured, three virtual machines had already been created, but only two were running and ready.
Note: This requires the CustomResourceSubresources
feature gate to be enabled for clusters prior to 1.11.
The VirtualMachineInstanceReplicaSet
supports the scale
subresource. As a consequence it is possible to scale it via kubectl
:
$ kubectl scale vmirs myvmirs --replicas 5\n
"},{"location":"user_workloads/replicaset/#using-the-horizontal-pod-autoscaler","title":"Using the Horizontal Pod Autoscaler","text":"Note: This requires a cluster at version 1.11 or newer.
The HorizontalPodAutoscaler (HPA) can be used with a VirtualMachineInstanceReplicaSet
. Simply reference it in the spec of the autoscaler:
apiVersion: autoscaling/v1\nkind: HorizontalPodAutoscaler\nmetadata:\n name: myhpa\nspec:\n scaleTargetRef:\n kind: VirtualMachineInstanceReplicaSet\n name: vmi-replicaset-cirros\n apiVersion: kubevirt.io/v1\n minReplicas: 3\n maxReplicas: 10\n targetCPUUtilizationPercentage: 50\n
or use kubectl autoscale
to define the HPA via the commandline:
$ kubectl autoscale vmirs vmi-replicaset-cirros --min=3 --max=10 --cpu-percent=50\n
"},{"location":"user_workloads/startup_scripts/","title":"Startup Scripts","text":"KubeVirt supports the ability to assign a startup script to a VirtualMachineInstance, which is executed automatically when the VM initializes.
These scripts are commonly used to automate injection of users and SSH keys into VMs in order to provide remote access to the machine. For example, a startup script can be used to inject credentials into a VM that allows an Ansible job running on a remote host to access and provision the VM.
Startup scripts are not limited to any specific use case though. They can be used to run any arbitrary script in a VM on boot.
"},{"location":"user_workloads/startup_scripts/#cloud-init","title":"Cloud-init","text":"cloud-init is a widely adopted project used for early initialization of a VM. Used by cloud providers such as AWS and GCP, cloud-init has established itself as the de facto method of providing startup scripts to VMs.
Cloud-init documentation can be found here: Cloud-init Documentation.
KubeVirt supports cloud-init's NoCloud and ConfigDrive datasources which involve injecting startup scripts into a VM instance through the use of an ephemeral disk. VMs with the cloud-init package installed will detect the ephemeral disk and execute custom userdata scripts at boot.
"},{"location":"user_workloads/startup_scripts/#ignition","title":"Ignition","text":"Ignition is an alternative to cloud-init which allows for configuring the VM disk on first boot. You can find the Ignition documentation here. You can also find a comparison between cloud-init and Ignition here.
Ignition can be used with KubeVirt by using the cloudInitConfigDrive volume.
Sysprep is an automation tool for Windows that automates Windows installation, setup, and custom software provisioning.
The general flow is:
Seal the VM image with the Sysprep tool, for example by running:
%WINDIR%\\system32\\sysprep\\sysprep.exe /generalize /shutdown /oobe /mode:vm\n
Note
Make sure the base VM does not restart; this can be done by setting the VM's run strategy to RerunOnFailure.
VM runStrategy:
spec:\n  runStrategy: RerunOnFailure\n
More information can be found here:
Note
It is important that no answer file is detected when the Sysprep tool is triggered, because Windows Setup searches for answer files at the beginning of each configuration pass and caches the one it finds. If that happens, the OS will use the cached answer file when it starts and ignore the one provided through the Sysprep API. More information can be found here.
Provide an answer file named autounattend.xml on attached media. The answer file can be provided in a ConfigMap or a Secret with the key autounattend.xml.
The configuration file can be generated with Windows SIM or it can be specified manually according to the information found here:
Note
There are also many easy-to-find online tools for creating an answer file.
KubeVirt supports the cloud-init NoCloud and ConfigDrive data sources which involve injecting startup scripts through the use of a disk attached to the VM.
In order to assign a custom userdata script to a VirtualMachineInstance using this method, users must define a disk and a volume for the NoCloud or ConfigDrive datasource in the VirtualMachineInstance's spec.
"},{"location":"user_workloads/startup_scripts/#data-sources","title":"Data Sources","text":"Under most circumstances users should stick to the NoCloud data source, as it is the simplest cloud-init data source. Users should switch to ConfigDrive only if NoCloud is not supported by the cloud-init implementation (e.g. coreos-cloudinit).
Switching the cloud-init data source to ConfigDrive is as easy as changing the volume type in the VirtualMachineInstance's spec from cloudInitNoCloud to cloudInitConfigDrive.
NoCloud data source:
volumes:\n  - name: cloudinitvolume\n    cloudInitNoCloud:\n      userData: \"#cloud-config\"\n
ConfigDrive data source:
volumes:\n  - name: cloudinitvolume\n    cloudInitConfigDrive:\n      userData: \"#cloud-config\"\n
When using the ConfigDrive datasource, the networkData part has to be in the OpenStack Metadata Service Network format:
spec:\n  domain:\n    devices:\n      interfaces:\n        - name: secondary-net\n          bridge: {}\n          macAddress: '02:26:19:00:00:30'\n          model: virtio\n  networks:\n    - multus:\n        networkName: my-ns/my-net\n      name: secondary-net\n  volumes:\n    - name: cloudinitvolume\n      cloudInitConfigDrive:\n        networkData: |\n          {\"links\":[{\"id\":\"enp2s0\",\"type\":\"phy\",\"ethernet_mac_address\":\"02:26:19:00:00:30\"}],\"networks\":[{\"id\":\"NAD1\",\"type\":\"ipv4\",\"link\":\"enp2s0\",\"ip_address\":\"10.184.0.244\",\"netmask\":\"255.255.240.0\",\"routes\":[{\"network\":\"0.0.0.0\",\"netmask\":\"0.0.0.0\",\"gateway\":\"23.253.157.1\"}],\"network_id\":\"\"}],\"services\":[]}\n        userData: \"#cloud-config\"\n
Note: The MAC address of the secondary interface should be predefined and identical in the network interface and the cloud-init networkData.
See the examples below for more complete cloud-init examples.
"},{"location":"user_workloads/startup_scripts/#cloud-init-user-data-as-clear-text","title":"Cloud-init user-data as clear text","text":"In the example below, an SSH key is stored in the cloudInitNoCloud Volume's userData field as clear text. There is a corresponding disks entry that references the cloud-init volume and assigns it to the VM's device.
# Create a VM manifest with the startup script in\n# a cloudInitNoCloud volume's userData field.\n\ncat << END > my-vmi.yaml\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n  name: myvmi\nspec:\n  terminationGracePeriodSeconds: 5\n  domain:\n    resources:\n      requests:\n        memory: 64M\n    devices:\n      disks:\n        - name: containerdisk\n          disk:\n            bus: virtio\n        - name: cloudinitdisk\n          disk:\n            bus: virtio\n  volumes:\n    - name: containerdisk\n      containerDisk:\n        image: kubevirt/cirros-container-disk-demo:latest\n    - name: cloudinitdisk\n      cloudInitNoCloud:\n        userData: |\n          #cloud-config\n          ssh_authorized_keys:\n            - ssh-rsa AAAAB3NzaK8L93bWxnyp test@test.com\n\nEND\n\n# Post the Virtual Machine spec to KubeVirt.\n\nkubectl create -f my-vmi.yaml\n
"},{"location":"user_workloads/startup_scripts/#cloud-init-user-data-as-base64-string","title":"Cloud-init user-data as base64 string","text":"In the example below, a simple bash script is base64 encoded and stored in the cloudInitNoCloud Volume's userDataBase64 field. There is a corresponding disks entry that references the cloud-init volume and assigns it to the VM's device.
Users also have the option of storing the startup script in a Kubernetes Secret and referencing the Secret in the VM's spec. Examples further down in the document illustrate how that is done.
# Create a simple startup script\n\ncat << END > startup-script.sh\n#!/bin/bash\necho \"Hi from startup script!\"\nEND\n\n# Create a VM manifest with the startup script base64 encoded into\n# a cloudInitNoCloud volume's userDataBase64 field.\n\ncat << END > my-vmi.yaml\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n  name: myvmi\nspec:\n  terminationGracePeriodSeconds: 5\n  domain:\n    resources:\n      requests:\n        memory: 64M\n    devices:\n      disks:\n        - name: containerdisk\n          disk:\n            bus: virtio\n        - name: cloudinitdisk\n          disk:\n            bus: virtio\n  volumes:\n    - name: containerdisk\n      containerDisk:\n        image: kubevirt/cirros-container-disk-demo:latest\n    - name: cloudinitdisk\n      cloudInitNoCloud:\n        userDataBase64: $(cat startup-script.sh | base64 -w0)\nEND\n\n# Post the Virtual Machine spec to KubeVirt.\n\nkubectl create -f my-vmi.yaml\n
"},{"location":"user_workloads/startup_scripts/#cloud-init-userdata-as-k8s-secret","title":"Cloud-init UserData as k8s Secret","text":"Users who do not wish to store the cloud-init userdata directly in the VirtualMachineInstance spec have the option to store the userdata in a Kubernetes Secret and reference that Secret in the spec.
Multiple VirtualMachineInstance specs can reference the same Kubernetes Secret containing cloud-init userdata.
Below is an example of how to create a Kubernetes Secret containing a startup script and reference that Secret in the VM's spec.
# Create a simple startup script\n\ncat << END > startup-script.sh\n#!/bin/bash\necho \"Hi from startup script!\"\nEND\n\n# Store the startup script in a Kubernetes Secret\nkubectl create secret generic my-vmi-secret --from-file=userdata=startup-script.sh\n\n# Create a VM manifest and reference the Secret's name in the cloudInitNoCloud\n# Volume's secretRef field\n\ncat << END > my-vmi.yaml\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n  name: myvmi\nspec:\n  terminationGracePeriodSeconds: 5\n  domain:\n    resources:\n      requests:\n        memory: 64M\n    devices:\n      disks:\n        - name: containerdisk\n          disk:\n            bus: virtio\n        - name: cloudinitdisk\n          disk:\n            bus: virtio\n  volumes:\n    - name: containerdisk\n      containerDisk:\n        image: kubevirt/cirros-registry-disk-demo:latest\n    - name: cloudinitdisk\n      cloudInitNoCloud:\n        secretRef:\n          name: my-vmi-secret\nEND\n\n# Post the VM\nkubectl create -f my-vmi.yaml\n
"},{"location":"user_workloads/startup_scripts/#injecting-ssh-keys-with-cloud-inits-cloud-config","title":"Injecting SSH keys with Cloud-init's Cloud-config","text":"In the examples so far, the cloud-init userdata script has been a bash script. Cloud-init has its own configuration format that can handle some common tasks such as user creation and SSH key injection.
More cloud-config examples can be found here: Cloud-init Examples
Below is an example of using cloud-config to inject an SSH key for the default user (fedora in this case) of a Fedora Atomic disk image.
# Create the cloud-init cloud-config userdata.\ncat << END > startup-script\n#cloud-config\npassword: atomic\nchpasswd: { expire: False }\nssh_pwauth: False\nssh_authorized_keys:\n - ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC6zdgFiLr1uAK7PdcchDd+LseA5fEOcxCCt7TLlr7Mx6h8jUg+G+8L9JBNZuDzTZSF0dR7qwzdBBQjorAnZTmY3BhsKcFr8Gt4KMGrS6r3DNmGruP8GORvegdWZuXgASKVpXeI7nCIjRJwAaK1x+eGHwAWO9Z8ohcboHbLyffOoSZDSIuk2kRIc47+ENRjg0T6x2VRsqX27g6j4DfPKQZGk0zvXkZaYtr1e2tZgqTBWqZUloMJK8miQq6MktCKAS4VtPk0k7teQX57OGwD6D7uo4b+Cl8aYAAwhn0hc0C2USfbuVHgq88ESo2/+NwV4SQcl3sxCW21yGIjAGt4Hy7J fedora@localhost.localdomain\nEND\n\n# Create the VM spec\ncat << END > my-vmi.yaml\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n name: sshvmi\nspec:\n terminationGracePeriodSeconds: 0\n domain:\n resources:\n requests:\n memory: 1024M\n devices:\n disks:\n - name: containerdisk\n disk:\n dev: vda\n - name: cloudinitdisk\n disk:\n dev: vdb\n volumes:\n - name: containerdisk\n containerDisk:\n image: kubevirt/fedora-atomic-registry-disk-demo:latest\n - name: cloudinitdisk\n cloudInitNoCloud:\n userDataBase64: $(cat startup-script | base64 -w0)\nEND\n\n# Post the VirtualMachineInstance spec to KubeVirt.\nkubectl create -f my-vmi.yaml\n\n# Connect to VM with passwordless SSH key\nssh -i <insert private key here> fedora@<insert ip here>\n
"},{"location":"user_workloads/startup_scripts/#inject-ssh-key-using-a-custom-shell-script","title":"Inject SSH key using a Custom Shell Script","text":"Depending on the boot image in use, users may have a mixed experience using cloud-init's cloud-config to create users and inject SSH keys.
Below is an example of creating a user and injecting SSH keys for that user using a script instead of cloud-config.
# Quote the heredoc delimiter so that the variables below are expanded\n# inside the VM when the script runs, not by the local shell.\ncat << 'END' > startup-script.sh\n#!/bin/bash\nexport NEW_USER=\"foo\"\nexport SSH_PUB_KEY=\"ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC6zdgFiLr1uAK7PdcchDd+LseA5fEOcxCCt7TLlr7Mx6h8jUg+G+8L9JBNZuDzTZSF0dR7qwzdBBQjorAnZTmY3BhsKcFr8Gt4KMGrS6r3DNmGruP8GORvegdWZuXgASKVpXeI7nCIjRJwAaK1x+eGHwAWO9Z8ohcboHbLyffOoSZDSIuk2kRIc47+ENRjg0T6x2VRsqX27g6j4DfPKQZGk0zvXkZaYtr1e2tZgqTBWqZUloMJK8miQq6MktCKAS4VtPk0k7teQX57OGwD6D7uo4b+Cl8aYAAwhn0hc0C2USfbuVHgq88ESo2/+NwV4SQcl3sxCW21yGIjAGt4Hy7J $NEW_USER@localhost.localdomain\"\n\nsudo adduser -U -m $NEW_USER\necho \"$NEW_USER:atomic\" | chpasswd\nsudo mkdir /home/$NEW_USER/.ssh\nsudo echo \"$SSH_PUB_KEY\" > /home/$NEW_USER/.ssh/authorized_keys\nsudo chown -R ${NEW_USER}: /home/$NEW_USER/.ssh\nEND\n\n# Create the VM spec\ncat << END > my-vmi.yaml\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n  name: sshvmi\nspec:\n  terminationGracePeriodSeconds: 0\n  domain:\n    resources:\n      requests:\n        memory: 1024M\n    devices:\n      disks:\n        - name: containerdisk\n          disk:\n            dev: vda\n        - name: cloudinitdisk\n          disk:\n            dev: vdb\n  volumes:\n    - name: containerdisk\n      containerDisk:\n        image: kubevirt/fedora-atomic-registry-disk-demo:latest\n    - name: cloudinitdisk\n      cloudInitNoCloud:\n        userDataBase64: $(cat startup-script.sh | base64 -w0)\nEND\n\n# Post the VirtualMachineInstance spec to KubeVirt.\nkubectl create -f my-vmi.yaml\n\n# Connect to VM with passwordless SSH key\nssh -i <insert private key here> foo@<insert ip here>\n
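The examples in this document write files with shell heredocs. The following sketch shows how quoting the heredoc delimiter changes when variables are expanded, which matters for startup scripts that define their own variables. The value outer-value and the file names expanded.sh and literal.sh are assumptions made for illustration.

```shell
# A value set in the local shell, standing in for whatever the variable
# happens to hold (often nothing) on the machine that writes the file.
NEW_USER=outer-value

# Unquoted delimiter: the local shell expands $NEW_USER while writing the file.
cat << END > expanded.sh
echo $NEW_USER
END

# Quoted delimiter: the text is written verbatim, so $NEW_USER is only
# expanded later, when the script itself runs inside the VM.
cat << 'END' > literal.sh
echo $NEW_USER
END

# Compare the two generated files.
cat expanded.sh
cat literal.sh
```

A manifest that relies on command substitution, such as the base64 encoding of the script, still needs the unquoted form.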
"},{"location":"user_workloads/startup_scripts/#network-config","title":"Network Config","text":"A cloud-init network version 1 configuration can be set to configure the network at boot.
Cloud-init user-data must be set for cloud-init to parse network-config even if it is just the user-data config header:
#cloud-config\n
"},{"location":"user_workloads/startup_scripts/#cloud-init-network-config-as-clear-text","title":"Cloud-init network-config as clear text","text":"In the example below, a simple cloud-init network-config is stored in the cloudInitNoCloud Volume's networkData field as clear text. There is a corresponding disks entry that references the cloud-init volume and assigns it to the VM's device.
# Create a VM manifest with the network-config in\n# a cloudInitNoCloud volume's networkData field.\n\ncat << END > my-vmi.yaml\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n  name: myvmi\nspec:\n  terminationGracePeriodSeconds: 5\n  domain:\n    resources:\n      requests:\n        memory: 64M\n    devices:\n      disks:\n        - name: containerdisk\n          disk:\n            bus: virtio\n        - name: cloudinitdisk\n          disk:\n            bus: virtio\n  volumes:\n    - name: containerdisk\n      containerDisk:\n        image: kubevirt/cirros-container-disk-demo:latest\n    - name: cloudinitdisk\n      cloudInitNoCloud:\n        userData: \"#cloud-config\"\n        networkData: |\n          network:\n            version: 1\n            config:\n              - type: physical\n                name: eth0\n                subnets:\n                  - type: dhcp\n\nEND\n\n# Post the Virtual Machine spec to KubeVirt.\n\nkubectl create -f my-vmi.yaml\n
"},{"location":"user_workloads/startup_scripts/#cloud-init-network-config-as-base64-string","title":"Cloud-init network-config as base64 string","text":"In the example below, a simple network-config is base64 encoded and stored in the cloudInitNoCloud Volume's networkDataBase64 field. There is a corresponding disks entry that references the cloud-init volume and assigns it to the VM's device.
Users also have the option of storing the network-config in a Kubernetes Secret and referencing the Secret in the VM's spec. Examples further down in the document illustrate how that is done.
# Create a simple network-config\n\ncat << END > network-config\nnetwork:\n  version: 1\n  config:\n    - type: physical\n      name: eth0\n      subnets:\n        - type: dhcp\nEND\n\n# Create a VM manifest with the networkData base64 encoded into\n# a cloudInitNoCloud volume's networkDataBase64 field.\n\ncat << END > my-vmi.yaml\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n  name: myvmi\nspec:\n  terminationGracePeriodSeconds: 5\n  domain:\n    resources:\n      requests:\n        memory: 64M\n    devices:\n      disks:\n        - name: containerdisk\n          disk:\n            bus: virtio\n        - name: cloudinitdisk\n          disk:\n            bus: virtio\n  volumes:\n    - name: containerdisk\n      containerDisk:\n        image: kubevirt/cirros-container-disk-demo:latest\n    - name: cloudinitdisk\n      cloudInitNoCloud:\n        userData: \"#cloud-config\"\n        networkDataBase64: $(cat network-config | base64 -w0)\nEND\n\n# Post the Virtual Machine spec to KubeVirt.\n\nkubectl create -f my-vmi.yaml\n
"},{"location":"user_workloads/startup_scripts/#cloud-init-network-config-as-k8s-secret","title":"Cloud-init network-config as k8s Secret","text":"Users who do not wish to store the cloud-init network-config directly in the VirtualMachineInstance spec have the option to store the network-config in a Kubernetes Secret and reference that Secret in the spec.
Multiple VirtualMachineInstance specs can reference the same Kubernetes Secret containing cloud-init network-config.
Below is an example of how to create a Kubernetes Secret containing a network-config and reference that Secret in the VM's spec.
# Create a simple network-config\n\ncat << END > network-config\nnetwork:\n  version: 1\n  config:\n    - type: physical\n      name: eth0\n      subnets:\n        - type: dhcp\nEND\n\n# Store the network-config in a Kubernetes Secret\nkubectl create secret generic my-vmi-secret --from-file=networkdata=network-config\n\n# Create a VM manifest and reference the Secret's name in the cloudInitNoCloud\n# Volume's networkDataSecretRef field\n\ncat << END > my-vmi.yaml\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n  name: myvmi\nspec:\n  terminationGracePeriodSeconds: 5\n  domain:\n    resources:\n      requests:\n        memory: 64M\n    devices:\n      disks:\n        - name: containerdisk\n          disk:\n            bus: virtio\n        - name: cloudinitdisk\n          disk:\n            bus: virtio\n  volumes:\n    - name: containerdisk\n      containerDisk:\n        image: kubevirt/cirros-registry-disk-demo:latest\n    - name: cloudinitdisk\n      cloudInitNoCloud:\n        userData: \"#cloud-config\"\n        networkDataSecretRef:\n          name: my-vmi-secret\nEND\n\n# Post the VM\nkubectl create -f my-vmi.yaml\n
"},{"location":"user_workloads/startup_scripts/#debugging","title":"Debugging","text":"Depending on the operating system distribution in use, cloud-init output is often printed to the console output on boot up. When developing userdata scripts, users can connect to the VM's console during boot up to debug.
Example of connecting to console using virtctl:
virtctl console <name of vmi>\n
"},{"location":"user_workloads/startup_scripts/#device-role-tagging","title":"Device Role Tagging","text":"KubeVirt provides a mechanism for users to tag devices such as Network Interfaces with a specific role. The tag will be matched to the hardware address of the device and this mapping exposed to the guest OS via cloud-init.
This additional metadata helps guest OS users with multiple network interfaces identify devices that have a specific role, such as a network device dedicated to a specific service or a disk intended to be used by a specific application (database, webcache, etc.).
This functionality already exists in platforms such as OpenStack. KubeVirt will provide the data in a similar format, known to users and services like cloud-init.
For example:
kind: VirtualMachineInstance\nspec:\n  domain:\n    devices:\n      interfaces:\n        - masquerade: {}\n          name: default\n        - bridge: {}\n          name: ptp\n          tag: ptp\n        - name: sriov-net\n          sriov: {}\n          tag: nfvfunc\n  networks:\n    - name: default\n      pod: {}\n    - multus:\n        networkName: ptp-conf\n      name: ptp\n    - multus:\n        networkName: sriov/sriov-network\n      name: sriov-net\n\nThe metadata will be available in the guest's config drive at `openstack/latest/meta_data.json`:\n\n{\n  \"devices\": [\n    {\n      \"type\": \"nic\",\n      \"bus\": \"pci\",\n      \"address\": \"0000:00:02.0\",\n      \"mac\": \"01:22:22:42:22:21\",\n      \"tags\": [\"ptp\"]\n    },\n    {\n      \"type\": \"nic\",\n      \"bus\": \"pci\",\n      \"address\": \"0000:81:10.1\",\n      \"mac\": \"01:22:22:42:22:22\",\n      \"tags\": [\"nfvfunc\"]\n    }\n  ]\n}\n
"},{"location":"user_workloads/startup_scripts/#ignition-examples","title":"Ignition Examples","text":"Ignition data can be passed into a cloudInitConfigDrive
source using either clear text, a base64 string or a k8s Secret.
Some examples of Ignition configurations can be found in the Ignition documentation.
"},{"location":"user_workloads/startup_scripts/#ignition-as-clear-text","title":"Ignition as clear text","text":"Here is a complete example of a KubeVirt VM using Ignition to add an SSH key to the coreos user at first boot:
apiVersion: kubevirt.io/v1alpha3\nkind: VirtualMachine\nmetadata:\n name: ign-demo\nspec:\n running: false\n template:\n metadata:\n labels:\n kubevirt.io/size: small\n kubevirt.io/domain: ign-demo\n spec:\n domain:\n devices:\n disks:\n - name: containerdisk\n disk:\n bus: virtio\n - name: cloudinitdisk\n disk:\n bus: virtio\n interfaces:\n - name: default\n masquerade: {}\n resources:\n requests:\n memory: 2G\n networks:\n - name: default\n pod: {}\n volumes:\n - name: containerdisk\n containerDisk:\n image: quay.io/containerdisks/rhcos:4.9\n - name: cloudinitdisk\n cloudInitConfigDrive:\n userData: |\n {\n \"ignition\": {\n \"config\": {},\n \"proxy\": {},\n \"security\": {},\n \"timeouts\": {},\n \"version\": \"3.2.0\"\n },\n \"passwd\": {\n \"users\": [\n {\n \"name\": \"coreos\",\n \"sshAuthorizedKeys\": [\n \"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIPL3axFGHI3db9iJWkPXVbYzD7OaWTtHuqmxLvj+DztB user@example\"\n ]\n }\n ]\n },\n \"storage\": {},\n \"systemd\": {}\n }\n
Note that the Ignition config is simply passed to the userData field of the cloudInitConfigDrive volume.
You can also pass the Ignition config as a base64 string by using the userDataBase64 field:
...\ncloudInitConfigDrive:\n userDataBase64: eyJpZ25pdGlvbiI6eyJjb25maWciOnt9LCJwcm94eSI6e30sInNlY3VyaXR5Ijp7fSwidGltZW91dHMiOnt9LCJ2ZXJzaW9uIjoiMy4yLjAifSwicGFzc3dkIjp7InVzZXJzIjpbeyJuYW1lIjoiY29yZW9zIiwic3NoQXV0aG9yaXplZEtleXMiOlsic3NoLWVkMjU1MTlBQUFBQzNOemFDMWxaREkxTlRFNUFBQUFJUEwzYXhGR0hJM2RiOWlKV2tQWFZiWXpEN09hV1R0SHVxbXhMdmorRHp0QiB1c2VyQGV4YW1wbGUiXX1dfSwic3RvcmFnZSI6e30sInN5c3RlbWQiOnt9fQ==\n
You can obtain the base64 string by running cat ignition.json | base64 -w0 in your terminal.
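Before embedding the encoded string in a manifest, it can help to confirm that it decodes back to the original file. The following is a minimal sketch assuming GNU coreutils base64 (the -w0 flag is not available in every implementation). The empty JSON object is only a stand-in for a real Ignition config, and the file names encoded.txt and decoded.json are chosen for illustration.

```shell
# Placeholder content; a real Ignition config would go here.
echo '{}' > ignition.json

# Encode without line wrapping, as in the command above.
base64 -w0 ignition.json > encoded.txt

# Decode again and compare byte for byte with the original.
base64 -d encoded.txt > decoded.json
if cmp -s ignition.json decoded.json; then
  echo 'round trip ok'
else
  echo 'round trip failed' >&2
fi
```

The content of encoded.txt is the value that would go into the userDataBase64 field.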
If you do not want to store the Ignition config in the VM configuration, you can use a k8s Secret.
First, create the Secret with the Ignition data in it:
kubectl create secret generic my-ign-secret --from-file=ignition=ignition.json\n
Then reference this Secret in your VM configuration:
...\ncloudInitConfigDrive:\n  secretRef:\n    name: my-ign-secret\n
"},{"location":"user_workloads/startup_scripts/#sysprep-examples","title":"Sysprep Examples","text":""},{"location":"user_workloads/startup_scripts/#sysprep-in-a-configmap","title":"Sysprep in a ConfigMap","text":"The answer file can be provided in a ConfigMap:
apiVersion: v1\nkind: ConfigMap\nmetadata:\n name: sysprep-config\ndata:\n autounattend.xml: |\n <?xml version=\"1.0\" encoding=\"utf-8\"?>\n <unattend xmlns=\"urn:schemas-microsoft-com:unattend\">\n ...\n </unattend>\n
And attached to the VM like so:
apiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n  name: windows-with-sysprep\nspec:\n  running: false\n  template:\n    metadata:\n      labels:\n        kubevirt.io/domain: windows-with-sysprep\n    spec:\n      domain:\n        cpu:\n          cores: 3\n        devices:\n          disks:\n            - bootOrder: 1\n              disk:\n                bus: virtio\n              name: harddrive\n            - name: sysprep\n              cdrom:\n                bus: sata\n        machine:\n          type: q35\n        resources:\n          requests:\n            memory: 6G\n      volumes:\n        - name: harddrive\n          persistentVolumeClaim:\n            claimName: windows-pvc\n        - name: sysprep\n          sysprep:\n            configMap:\n              name: sysprep-config\n
"},{"location":"user_workloads/startup_scripts/#sysprep-in-a-secret","title":"Sysprep in a Secret","text":"The answer file can be provided in a Secret:
apiVersion: v1\nkind: Secret\nmetadata:\n  name: sysprep-secret\nstringData:\n  autounattend.xml: |\n    <?xml version=\"1.0\" encoding=\"utf-8\"?>\n    <unattend xmlns=\"urn:schemas-microsoft-com:unattend\">\n    ...\n    </unattend>\n
And attached to the VM like so:
apiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n  name: windows-with-sysprep\nspec:\n  running: false\n  template:\n    metadata:\n      labels:\n        kubevirt.io/domain: windows-with-sysprep\n    spec:\n      domain:\n        cpu:\n          cores: 3\n        devices:\n          disks:\n            - bootOrder: 1\n              disk:\n                bus: virtio\n              name: harddrive\n            - name: sysprep\n              cdrom:\n                bus: sata\n        machine:\n          type: q35\n        resources:\n          requests:\n            memory: 6G\n      volumes:\n        - name: harddrive\n          persistentVolumeClaim:\n            claimName: windows-pvc\n        - name: sysprep\n          sysprep:\n            secret:\n              name: sysprep-secret\n
"},{"location":"user_workloads/startup_scripts/#base-sysprep-vm","title":"Base Sysprep VM","text":"In the example below, a ConfigMap with an autounattend.xml file is used to modify the Windows ISO image downloaded from Microsoft and to create a base installed Windows machine with virtio drivers installed and all the commands in post-install.ps1 executed. The manifests below require a win10-iso DataVolume to exist.
apiVersion: v1\nkind: ConfigMap\nmetadata:\n name: win10-template-configmap\ndata:\n autounattend.xml: |-\n <?xml version=\"1.0\" encoding=\"utf-8\"?>\n <unattend xmlns=\"urn:schemas-microsoft-com:unattend\">\n <settings pass=\"windowsPE\">\n <component xmlns:wcm=\"http://schemas.microsoft.com/WMIConfig/2002/State\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" name=\"Microsoft-Windows-International-Core-WinPE\" processorArchitecture=\"amd64\" publicKeyToken=\"31bf3856ad364e35\" language=\"neutral\" versionScope=\"nonSxS\">\n <SetupUILanguage>\n <UILanguage>en-US</UILanguage>\n </SetupUILanguage>\n <InputLocale>0409:00000409</InputLocale>\n <SystemLocale>en-US</SystemLocale>\n <UILanguage>en-US</UILanguage>\n <UILanguageFallback>en-US</UILanguageFallback>\n <UserLocale>en-US</UserLocale>\n </component>\n <component xmlns:wcm=\"http://schemas.microsoft.com/WMIConfig/2002/State\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" name=\"Microsoft-Windows-PnpCustomizationsWinPE\" processorArchitecture=\"amd64\" publicKeyToken=\"31bf3856ad364e35\" language=\"neutral\" versionScope=\"nonSxS\">\n <DriverPaths>\n <PathAndCredentials wcm:keyValue=\"4b29ba63\" wcm:action=\"add\">\n <Path>E:\\amd64\\2k19</Path>\n </PathAndCredentials>\n <PathAndCredentials wcm:keyValue=\"25fe51ea\" wcm:action=\"add\">\n <Path>E:\\NetKVM\\2k19\\amd64</Path>\n </PathAndCredentials>\n </DriverPaths>\n </component>\n <component xmlns:wcm=\"http://schemas.microsoft.com/WMIConfig/2002/State\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" name=\"Microsoft-Windows-Setup\" processorArchitecture=\"amd64\" publicKeyToken=\"31bf3856ad364e35\" language=\"neutral\" versionScope=\"nonSxS\">\n <DiskConfiguration>\n <Disk wcm:action=\"add\">\n <CreatePartitions>\n <CreatePartition wcm:action=\"add\">\n <Order>1</Order>\n <Type>Primary</Type>\n <Size>100</Size>\n </CreatePartition>\n <CreatePartition wcm:action=\"add\">\n <Extend>true</Extend>\n <Order>2</Order>\n 
<Type>Primary</Type>\n </CreatePartition>\n </CreatePartitions>\n <ModifyPartitions>\n <ModifyPartition wcm:action=\"add\">\n <Format>NTFS</Format>\n <Label>System Reserved</Label>\n <Order>1</Order>\n <PartitionID>1</PartitionID>\n <TypeID>0x27</TypeID>\n </ModifyPartition>\n <ModifyPartition wcm:action=\"add\">\n <Format>NTFS</Format>\n <Label>OS</Label>\n <Letter>C</Letter>\n <Order>2</Order>\n <PartitionID>2</PartitionID>\n </ModifyPartition>\n </ModifyPartitions>\n <DiskID>0</DiskID>\n <WillWipeDisk>true</WillWipeDisk>\n </Disk>\n </DiskConfiguration>\n <ImageInstall>\n <OSImage>\n <InstallFrom>\n <MetaData wcm:action=\"add\">\n <Key>/Image/Description</Key>\n <Value>Windows 10 Pro</Value>\n </MetaData>\n </InstallFrom>\n <InstallTo>\n <DiskID>0</DiskID>\n <PartitionID>2</PartitionID>\n </InstallTo>\n </OSImage>\n </ImageInstall>\n <UserData>\n <AcceptEula>true</AcceptEula>\n <FullName/>\n <Organization/>\n <ProductKey>\n <Key/>\n </ProductKey>\n </UserData>\n </component>\n </settings>\n <settings pass=\"offlineServicing\">\n <component xmlns:wcm=\"http://schemas.microsoft.com/WMIConfig/2002/State\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" name=\"Microsoft-Windows-LUA-Settings\" processorArchitecture=\"amd64\" publicKeyToken=\"31bf3856ad364e35\" language=\"neutral\" versionScope=\"nonSxS\">\n <EnableLUA>false</EnableLUA>\n </component>\n </settings>\n <settings pass=\"specialize\">\n <component xmlns:wcm=\"http://schemas.microsoft.com/WMIConfig/2002/State\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" name=\"Microsoft-Windows-International-Core\" processorArchitecture=\"amd64\" publicKeyToken=\"31bf3856ad364e35\" language=\"neutral\" versionScope=\"nonSxS\">\n <InputLocale>0409:00000409</InputLocale>\n <SystemLocale>en-US</SystemLocale>\n <UILanguage>en-US</UILanguage>\n <UILanguageFallback>en-US</UILanguageFallback>\n <UserLocale>en-US</UserLocale>\n </component>\n <component 
xmlns:wcm=\"http://schemas.microsoft.com/WMIConfig/2002/State\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" name=\"Microsoft-Windows-Security-SPP-UX\" processorArchitecture=\"amd64\" publicKeyToken=\"31bf3856ad364e35\" language=\"neutral\" versionScope=\"nonSxS\">\n <SkipAutoActivation>true</SkipAutoActivation>\n </component>\n <component xmlns:wcm=\"http://schemas.microsoft.com/WMIConfig/2002/State\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" name=\"Microsoft-Windows-SQMApi\" processorArchitecture=\"amd64\" publicKeyToken=\"31bf3856ad364e35\" language=\"neutral\" versionScope=\"nonSxS\">\n <CEIPEnabled>0</CEIPEnabled>\n </component>\n </settings>\n <settings pass=\"oobeSystem\">\n <component xmlns:wcm=\"http://schemas.microsoft.com/WMIConfig/2002/State\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" name=\"Microsoft-Windows-Shell-Setup\" processorArchitecture=\"amd64\" publicKeyToken=\"31bf3856ad364e35\" language=\"neutral\" versionScope=\"nonSxS\">\n <OOBE>\n <HideEULAPage>true</HideEULAPage>\n <HideOEMRegistrationScreen>true</HideOEMRegistrationScreen>\n <HideOnlineAccountScreens>true</HideOnlineAccountScreens>\n <HideWirelessSetupInOOBE>true</HideWirelessSetupInOOBE>\n <NetworkLocation>Work</NetworkLocation>\n <SkipUserOOBE>true</SkipUserOOBE>\n <SkipMachineOOBE>true</SkipMachineOOBE>\n <ProtectYourPC>3</ProtectYourPC>\n </OOBE>\n <AutoLogon>\n <Password>\n <Value>123456</Value>\n <PlainText>true</PlainText>\n </Password>\n <Enabled>true</Enabled>\n <Username>Administrator</Username>\n </AutoLogon>\n <UserAccounts>\n <AdministratorPassword>\n <Value>123456</Value>\n <PlainText>true</PlainText>\n </AdministratorPassword>\n </UserAccounts>\n <RegisteredOrganization/>\n <RegisteredOwner/>\n <TimeZone>Eastern Standard Time</TimeZone>\n <FirstLogonCommands>\n <SynchronousCommand wcm:action=\"add\">\n <CommandLine>powershell -ExecutionPolicy Bypass -NoExit -NoProfile f:\\post-install.ps1</CommandLine>\n 
<RequiresUserInput>false</RequiresUserInput>\n <Order>1</Order>\n <Description>Post Installation Script</Description>\n </SynchronousCommand>\n </FirstLogonCommands>\n </component>\n </settings>\n </unattend>\n\n\n post-install.ps1: |-\n # Remove AutoLogin\n # https://docs.microsoft.com/en-us/windows-hardware/customize/desktop/unattend/microsoft-windows-shell-setup-autologon-logoncount#logoncount-known-issue\n reg add \"HKEY_LOCAL_MACHINE\\SOFTWARE\\Microsoft\\Windows NT\\CurrentVersion\\Winlogon\" /v AutoAdminLogon /t REG_SZ /d 0 /f\n\n # install Qemu Tools (Drivers)\n Start-Process msiexec -Wait -ArgumentList '/i e:\\virtio-win-gt-x64.msi /qn /passive /norestart'\n\n # install Guest Agent\n Start-Process msiexec -Wait -ArgumentList '/i e:\\guest-agent\\qemu-ga-x86_64.msi /qn /passive /norestart'\n\n # Rename cached unattend.xml to avoid it is picked up by sysprep\n mv C:\\Windows\\Panther\\unattend.xml C:\\Windows\\Panther\\unattend.install.xml\n\n # Eject CD, to avoid that the autounattend.xml on the CD is picked up by sysprep\n (new-object -COM Shell.Application).NameSpace(17).ParseName('F:').InvokeVerb('Eject')\n\n # Run Sysprep and Shutdown\n C:\\Windows\\System32\\Sysprep\\sysprep.exe /generalize /oobe /shutdown /mode:vm\n\n---\n\napiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n annotations:\n name.os.template.kubevirt.io/win10: Microsoft Windows 10\n vm.kubevirt.io/validations: |\n [\n {\n \"name\": \"minimal-required-memory\",\n \"path\": \"jsonpath::.spec.domain.resources.requests.memory\",\n \"rule\": \"integer\",\n \"message\": \"This VM requires more memory.\",\n \"min\": 2147483648\n }, {\n \"name\": \"windows-virtio-bus\",\n \"path\": \"jsonpath::.spec.domain.devices.disks[*].disk.bus\",\n \"valid\": \"jsonpath::.spec.domain.devices.disks[*].disk.bus\",\n \"rule\": \"enum\",\n \"message\": \"virto disk bus type has better performance, install virtio drivers in VM and change bus type\",\n \"values\": [\"virtio\"],\n \"justWarning\": 
true\n }, {\n \"name\": \"windows-disk-bus\",\n \"path\": \"jsonpath::.spec.domain.devices.disks[*].disk.bus\",\n \"valid\": \"jsonpath::.spec.domain.devices.disks[*].disk.bus\",\n \"rule\": \"enum\",\n \"message\": \"disk bus has to be either virtio or sata or scsi\",\n \"values\": [\"virtio\", \"sata\", \"scsi\"]\n }, {\n \"name\": \"windows-cd-bus\",\n \"path\": \"jsonpath::.spec.domain.devices.disks[*].cdrom.bus\",\n \"valid\": \"jsonpath::.spec.domain.devices.disks[*].cdrom.bus\",\n \"rule\": \"enum\",\n \"message\": \"cd bus has to be sata\",\n \"values\": [\"sata\"]\n }\n ]\n name: win10-template\n namespace: default\n labels:\n app: win10-template\n flavor.template.kubevirt.io/medium: 'true'\n os.template.kubevirt.io/win10: 'true'\n vm.kubevirt.io/template: windows10-desktop-medium\n vm.kubevirt.io/template.namespace: openshift\n vm.kubevirt.io/template.revision: '1'\n vm.kubevirt.io/template.version: v0.14.0\n workload.template.kubevirt.io/desktop: 'true'\nspec:\n runStrategy: RerunOnFailure\n dataVolumeTemplates:\n - metadata:\n name: win10-template-windows-iso\n spec:\n storage: {}\n source:\n pvc:\n name: windows10-iso\n namespace: default\n - metadata:\n name: win10-template\n spec:\n storage:\n resources:\n requests:\n storage: 25Gi\n volumeMode: Filesystem\n source:\n blank: {}\n template:\n metadata:\n annotations:\n vm.kubevirt.io/flavor: medium\n vm.kubevirt.io/os: windows10\n vm.kubevirt.io/workload: desktop\n labels:\n flavor.template.kubevirt.io/medium: 'true'\n kubevirt.io/domain: win10-template\n kubevirt.io/size: medium\n os.template.kubevirt.io/win10: 'true'\n vm.kubevirt.io/name: win10-template\n workload.template.kubevirt.io/desktop: 'true'\n spec:\n domain:\n clock:\n timer:\n hpet:\n present: false\n hyperv: {}\n pit:\n tickPolicy: delay\n rtc:\n tickPolicy: catchup\n utc: {}\n cpu:\n cores: 1\n sockets: 1\n threads: 1\n devices:\n disks:\n - bootOrder: 1\n disk:\n bus: virtio\n name: win10-template\n - bootOrder: 2\n cdrom:\n bus: 
sata\n name: windows-iso\n - cdrom:\n bus: sata\n name: windows-guest-tools\n - name: sysprep\n cdrom:\n bus: sata\n inputs:\n - bus: usb\n name: tablet\n type: tablet\n interfaces:\n - masquerade: {}\n model: virtio\n name: default\n features:\n acpi: {}\n apic: {}\n hyperv:\n reenlightenment: {}\n ipi: {}\n synic: {}\n synictimer:\n direct: {}\n spinlocks:\n spinlocks: 8191\n reset: {}\n relaxed: {}\n vpindex: {}\n runtime: {}\n tlbflush: {}\n frequencies: {}\n vapic: {}\n machine:\n type: pc-q35-rhel8.4.0\n resources:\n requests:\n memory: 4Gi\n hostname: win10-template\n networks:\n - name: default\n pod: {}\n volumes:\n - dataVolume:\n name: win10-iso\n name: windows-iso\n - dataVolume:\n name: win10-template-windows-iso\n name: win10-template\n - containerDisk:\n image: quay.io/kubevirt/virtio-container-disk\n name: windows-guest-tools\n - name: sysprep\n sysprep:\n configMap:\n name: win10-template-configmap\n
"},{"location":"user_workloads/startup_scripts/#launching-a-vm-from-template","title":"Launching a VM from template","text":"From the above example after the sysprep command is executed in the post-install.ps1
and the VM is in the shutdown state, a new VM can be launched from the base win10-template
with the additional changes specified in the unattend.xml
key of the ConfigMap below, sysprep-config
. The new VM can take up to 5 minutes to reach the running state, since Windows goes through OOBE setup in the background with the customizations specified in the unattend.xml
file.
apiVersion: v1\nkind: ConfigMap\nmetadata:\n name: sysprep-config\ndata:\n autounattend.xml: |-\n <?xml version=\"1.0\" encoding=\"utf-8\"?>\n <!-- responsible for installing windows, ignored on sysprepped images -->\n unattend.xml: |-\n <?xml version=\"1.0\" encoding=\"utf-8\"?>\n <unattend xmlns=\"urn:schemas-microsoft-com:unattend\">\n <settings pass=\"oobeSystem\">\n <component xmlns:wcm=\"http://schemas.microsoft.com/WMIConfig/2002/State\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" name=\"Microsoft-Windows-Shell-Setup\" processorArchitecture=\"amd64\" publicKeyToken=\"31bf3856ad364e35\" language=\"neutral\" versionScope=\"nonSxS\">\n <OOBE>\n <HideEULAPage>true</HideEULAPage>\n <HideOEMRegistrationScreen>true</HideOEMRegistrationScreen>\n <HideOnlineAccountScreens>true</HideOnlineAccountScreens>\n <HideWirelessSetupInOOBE>true</HideWirelessSetupInOOBE>\n <NetworkLocation>Work</NetworkLocation>\n <SkipUserOOBE>true</SkipUserOOBE>\n <SkipMachineOOBE>true</SkipMachineOOBE>\n <ProtectYourPC>3</ProtectYourPC>\n </OOBE>\n <AutoLogon>\n <Password>\n <Value>123456</Value>\n <PlainText>true</PlainText>\n </Password>\n <Enabled>true</Enabled>\n <Username>Administrator</Username>\n </AutoLogon>\n <UserAccounts>\n <AdministratorPassword>\n <Value>123456</Value>\n <PlainText>true</PlainText>\n </AdministratorPassword>\n </UserAccounts>\n <RegisteredOrganization>Kuebvirt</RegisteredOrganization>\n <RegisteredOwner>Kubevirt</RegisteredOwner>\n <TimeZone>Eastern Standard Time</TimeZone>\n <FirstLogonCommands>\n <SynchronousCommand wcm:action=\"add\">\n <CommandLine>powershell -ExecutionPolicy Bypass -NoExit -WindowStyle Hidden -NoProfile d:\\customize.ps1</CommandLine>\n <RequiresUserInput>false</RequiresUserInput>\n <Order>1</Order>\n <Description>Customize Script</Description>\n </SynchronousCommand>\n </FirstLogonCommands>\n </component>\n </settings>\n </unattend>\n customize.ps1: |-\n # Enable RDP\n Set-ItemProperty -Path 
'HKLM:\\System\\CurrentControlSet\\Control\\Terminal Server' -name \"fDenyTSConnections\" -value 0\n Enable-NetFirewallRule -DisplayGroup \"Remote Desktop\"\n\n\n # https://docs.microsoft.com/en-us/windows-server/administration/openssh/openssh_install_firstuse\n # Install the OpenSSH Server\n Add-WindowsCapability -Online -Name OpenSSH.Server~~~~0.0.1.0\n # Start the sshd service\n Start-Service sshd\n\n Set-Service -Name sshd -StartupType 'Automatic'\n\n # https://docs.microsoft.com/en-us/windows-server/administration/openssh/openssh_server_configuration\n # use powershell as default shell for ssh\n New-ItemProperty -Path \"HKLM:\\SOFTWARE\\OpenSSH\" -Name DefaultShell -Value \"C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe\" -PropertyType String -Force\n\n\n # Add ssh authorized_key for administrator\n # https://docs.microsoft.com/en-us/windows-server/administration/openssh/openssh_keymanagement\n $MyDir = $MyInvocation.MyCommand.Path | Split-Path -Parent\n $PublicKey = Get-Content -Path $MyDir\\id_rsa.pub\n $authrized_keys_path = $env:ProgramData + \"\\ssh\\administrators_authorized_keys\" \n Add-Content -Path $authrized_keys_path -Value $PublicKey\n icacls.exe $authrized_keys_path /inheritance:r /grant \"Administrators:F\" /grant \"SYSTEM:F\"\n\n\n # install application via exe file installer from url\n function Install-Exe {\n $dlurl = $args[0]\n $installerPath = Join-Path $env:TEMP (Split-Path $dlurl -Leaf)\n Invoke-WebRequest -UseBasicParsing $dlurl -OutFile $installerPath\n Start-Process -FilePath $installerPath -Args \"/S\" -Verb RunAs -Wait\n Remove-Item $installerPath\n\n }\n\n # Wait for networking before running a task at startup\n do {\n $ping = test-connection -comp kubevirt.io -count 1 -Quiet\n } until ($ping)\n\n # Installing the Latest Notepad++ with PowerShell\n $BaseUri = \"https://notepad-plus-plus.org\"\n $BasePage = Invoke-WebRequest -Uri $BaseUri -UseBasicParsing\n $ChildPath = $BasePage.Links | Where-Object { $_.outerHTML 
-like '*Current Version*' } | Select-Object -ExpandProperty href\n $DownloadPageUri = $BaseUri + $ChildPath\n $DownloadPage = Invoke-WebRequest -Uri $DownloadPageUri -UseBasicParsing\n $DownloadUrl = $DownloadPage.Links | Where-Object { $_.outerHTML -like '*npp.*.Installer.x64.exe\"*' } | Select-Object -ExpandProperty href\n Install-Exe $DownloadUrl\n id_rsa.pub: |-\n ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC6zdgFiLr1uAK7PdcchDd+LseA5fEOcxCCt7TLlr7Mx6h8jUg+G+8L9JBNZuDzTZSF0dR7qwzdBBQjorAnZTmY3BhsKcFr8Gt4KMGrS6r3DNmGruP8GORvegdWZuXgASKVpXeI7nCIjRJwAaK1x+eGHwAWO9Z8ohcboHbLyffOoSZDSIuk2kRIc47+ENRjg0T6x2VRsqX27g6j4DfPKQZGk0zvXkZaYtr1e2tZgqTBWqZUloMJK8miQq6MktCKAS4VtPk0k7teQX57OGwD6D7uo4b+Cl8aYAAwhn0hc0C2USfbuVHgq88ESo2/+NwV4SQcl3sxCW21yGIjAGt4Hy7J fedora@localhost.localdomain\n
"},{"location":"user_workloads/templates/","title":"Templates","text":"Note
By deploying KubeVirt on top of OpenShift, the user can benefit from the OpenShift Template functionality.
"},{"location":"user_workloads/templates/#virtual-machine-templates","title":"Virtual machine templates","text":""},{"location":"user_workloads/templates/#what-is-a-virtual-machine-template","title":"What is a virtual machine template?","text":"The KubeVirt project provides a set of templates for creating VMs that handle common usage scenarios. Each template combines a few key factors, which can be further customized and processed to produce a Virtual Machine object. The key factors that define a template are:
Workload: Most virtual machines should use the server or desktop workload for maximum flexibility; the highperformance workload trades some of this flexibility for better performance.
Guest Operating System (OS): This ensures that the emulated hardware is compatible with the guest OS. It also helps maximize the stability of the VM and enables performance optimizations.
Size (flavor): Defines the amount of resources (CPU, memory) to allocate to the VM.
More documentation is available in the common templates subproject.
"},{"location":"user_workloads/templates/#accessing-the-virtual-machine-templates","title":"Accessing the virtual machine templates","text":"If you installed KubeVirt using a supported method, you should find the common templates preinstalled in the cluster. Should you want to upgrade the templates, or install them from scratch, you can use one of the supported releases.
To install the templates:
$ export VERSION=$(curl -s https://api.github.com/repos/kubevirt/common-templates/releases | grep tag_name | grep -v -- '-rc' | head -1 | awk -F': ' '{print $2}' | sed 's/,//' | xargs)\n $ oc create -f https://github.com/kubevirt/common-templates/releases/download/$VERSION/common-templates-$VERSION.yaml\n
"},{"location":"user_workloads/templates/#editable-fields","title":"Editable fields","text":"You can edit the fields of the templates which define the amount of resources which the VMs will receive.
Each template can list a different set of fields that are to be considered editable. The fields are used as hints for the user interface, and also for other components in the cluster.
The editable fields are taken from annotations in the template. Here is a snippet presenting a couple of the most commonly found editable fields:
metadata:\n annotations:\n template.kubevirt.io/editable: |\n /objects[0].spec.template.spec.domain.cpu.sockets\n /objects[0].spec.template.spec.domain.cpu.cores\n /objects[0].spec.template.spec.domain.cpu.threads\n /objects[0].spec.template.spec.domain.resources.requests.memory\n
Each entry in the editable field list must be a jsonpath. The jsonpath root is the objects: element of the template. The field that is actually editable is the last entry (the \"leaf\") of the path. For example, the following minimal snippet highlights the fields which you can edit:
objects:\n spec:\n template:\n spec:\n domain:\n cpu:\n sockets:\n VALUE # this is editable\n cores:\n VALUE # this is editable\n threads:\n VALUE # this is editable\n resources:\n requests:\n memory:\n VALUE # this is editable\n
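The way an editable-field entry points at its leaf can be sketched in Python. This is only an illustration: the resolve_editable helper and the sample template dictionary are hypothetical, and the sketch treats the whole template as the starting point so that the leading /objects[0] segment of each path can be followed literally.

```python
def resolve_editable(template, path):
    # Follow a path such as '/objects[0].spec.template.spec.domain.cpu.sockets'
    # through nested dicts and lists and return the value stored at its leaf.
    node = template
    for part in path.lstrip('/').split('.'):
        key, bracket, rest = part.partition('[')
        node = node[key]
        if bracket:
            # A segment like 'objects[0]' also selects a list index.
            node = node[int(rest.rstrip(']'))]
    return node


# Hypothetical, heavily trimmed template used only for this illustration.
template = {
    'objects': [
        {'spec': {'template': {'spec': {'domain': {
            'cpu': {'sockets': 2, 'cores': 1, 'threads': 1},
            'resources': {'requests': {'memory': '8Gi'}},
        }}}}}
    ]
}

editable = [
    '/objects[0].spec.template.spec.domain.cpu.sockets',
    '/objects[0].spec.template.spec.domain.resources.requests.memory',
]

for path in editable:
    print(path, '=', resolve_editable(template, path))
```

A user interface could use the same kind of lookup to decide which form fields to expose for a given template.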
"},{"location":"user_workloads/templates/#relationship-between-templates-and-vms","title":"Relationship between templates and VMs","text":"Once processed the templates produce VM objects to be used in the cluster. The VMs produced from templates will have a vm.kubevirt.io/template
label, whose value will be the name of the parent template, for example fedora-desktop-medium
:
metadata:\n labels:\n vm.kubevirt.io/template: fedora-desktop-medium\n
In addition, these VMs can include an optional label vm.kubevirt.io/template-namespace
, whose value will be the namespace of the parent template, for example:
metadata:\n labels:\n vm.kubevirt.io/template-namespace: openshift\n
If this label is not defined, the template is expected to belong to the same namespace as the VM.
This makes it possible to query for all the VMs built from a given template.
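As a rough sketch of such a query, the following Python snippet filters a list of VM objects by the two labels described above. The vms_from_template helper and the sample VM data are hypothetical; a real query would normally go through a label selector against the cluster API instead.

```python
TEMPLATE_LABEL = 'vm.kubevirt.io/template'
TEMPLATE_NAMESPACE_LABEL = 'vm.kubevirt.io/template-namespace'


def vms_from_template(vms, template_name, template_namespace=None):
    # Return the names of all VMs whose labels point at the given template.
    matches = []
    for vm in vms:
        metadata = vm.get('metadata', {})
        labels = metadata.get('labels', {})
        if labels.get(TEMPLATE_LABEL) != template_name:
            continue
        # Without the namespace label, the template is expected to live
        # in the same namespace as the VM itself.
        owner_namespace = labels.get(TEMPLATE_NAMESPACE_LABEL, metadata.get('namespace'))
        if template_namespace is None or owner_namespace == template_namespace:
            matches.append(metadata['name'])
    return matches


# Hypothetical VM objects, reduced to the fields used above.
vms = [
    {'metadata': {'name': 'vm-a', 'namespace': 'default', 'labels': {
        TEMPLATE_LABEL: 'fedora-desktop-medium',
        TEMPLATE_NAMESPACE_LABEL: 'openshift'}}},
    {'metadata': {'name': 'vm-b', 'namespace': 'default', 'labels': {
        TEMPLATE_LABEL: 'fedora-desktop-medium'}}},
    {'metadata': {'name': 'vm-c', 'namespace': 'default', 'labels': {
        TEMPLATE_LABEL: 'rhel8-server-tiny'}}},
]

print(vms_from_template(vms, 'fedora-desktop-medium'))
print(vms_from_template(vms, 'fedora-desktop-medium', 'openshift'))
```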
Example:
oc process -o yaml -f dist/templates/rhel8-server-tiny.yaml NAME=rheltinyvm SRC_PVC_NAME=rhel SRC_PVC_NAMESPACE=kubevirt\n
And the output:
apiVersion: v1\nitems:\n- apiVersion: kubevirt.io/v1\n kind: VirtualMachine\n metadata:\n annotations:\n vm.kubevirt.io/flavor: tiny\n vm.kubevirt.io/os: rhel8\n vm.kubevirt.io/validations: |\n [\n {\n \"name\": \"minimal-required-memory\",\n \"path\": \"jsonpath::.spec.domain.resources.requests.memory\",\n \"rule\": \"integer\",\n \"message\": \"This VM requires more memory.\",\n \"min\": 1610612736\n }\n ]\n vm.kubevirt.io/workload: server\n labels:\n app: rheltinyvm\n vm.kubevirt.io/template: rhel8-server-tiny\n vm.kubevirt.io/template.revision: \"45\"\n vm.kubevirt.io/template.version: 0.11.3\n name: rheltinyvm\n spec:\n dataVolumeTemplates:\n - apiVersion: cdi.kubevirt.io/v1beta1\n kind: DataVolume\n metadata:\n name: rheltinyvm\n spec:\n storage:\n accessModes:\n - ReadWriteMany\n source:\n pvc:\n name: rhel\n namespace: kubevirt\n running: false\n template:\n metadata:\n labels:\n kubevirt.io/domain: rheltinyvm\n kubevirt.io/size: tiny\n spec:\n domain:\n cpu:\n cores: 1\n sockets: 1\n threads: 1\n devices:\n disks:\n - disk:\n bus: virtio\n name: rheltinyvm\n - disk:\n bus: virtio\n name: cloudinitdisk\n interfaces:\n - masquerade: {}\n name: default\n networkInterfaceMultiqueue: true\n rng: {}\n resources:\n requests:\n memory: 1.5Gi\n networks:\n - name: default\n pod: {}\n terminationGracePeriodSeconds: 180\n volumes:\n - dataVolume:\n name: rheltinyvm\n name: rheltinyvm\n - cloudInitNoCloud:\n userData: |-\n #cloud-config\n user: cloud-user\n password: lymp-fda4-m1cv\n chpasswd: { expire: False }\n name: cloudinitdisk\nkind: List\nmetadata: {}\n
You can add the VM from the template to the cluster in one go:
oc process rhel8-server-tiny NAME=rheltinyvm SRC_PVC_NAME=rhel SRC_PVC_NAMESPACE=kubevirt | oc apply -f -\n
Please note that after the generation step, VM and template objects have no relationship with each other besides the aforementioned label. Changes in templates do not automatically affect VMs, or vice versa.
"},{"location":"user_workloads/templates/#common-template-customization","title":"common template customization","text":"The templates provided by the KubeVirt project define a set of conventions and annotations that augment the basic features of OpenShift templates. You can customize your KubeVirt-provided templates by editing these annotations, or you can add them to your existing templates to make them consumable by the KubeVirt services.
Here's a description of the kubevirt annotations. Unless otherwise specified, the following keys are meant to be top-level entries of the template metadata, like
apiVersion: v1\nkind: Template\nmetadata:\n name: windows-10\n annotations:\n openshift.io/display-name: \"Generic demo template\"\n
All the following annotations are prefixed with defaults.template.kubevirt.io
, which is omitted below for brevity. So the actual annotations you should use will look like
apiVersion: v1\nkind: Template\nmetadata:\n name: windows-10\n annotations:\n defaults.template.kubevirt.io/disk: default-disk\n defaults.template.kubevirt.io/volume: default-volume\n defaults.template.kubevirt.io/nic: default-nic\n defaults.template.kubevirt.io/network: default-network\n
Unless otherwise specified, all annotations are meant to be safe defaults, both for performance and compatibility, and hints for the CNV-aware UI and tooling.
"},{"location":"user_workloads/templates/#disk","title":"disk","text":"See the section references
below.
Example:
apiVersion: v1\nkind: Template\nmetadata:\n name: Linux\n annotations:\n defaults.template.kubevirt.io/disk: rhel-disk\n
"},{"location":"user_workloads/templates/#nic","title":"nic","text":"See the section references
below.
Example:
apiVersion: v1\nkind: Template\nmetadata:\n name: Windows\n annotations:\n defaults.template.kubevirt.io/nic: my-nic\n
"},{"location":"user_workloads/templates/#volume","title":"volume","text":"See the section references
below.
Example:
apiVersion: v1\nkind: Template\nmetadata:\n name: Linux\n annotations:\n defaults.template.kubevirt.io/volume: custom-volume\n
"},{"location":"user_workloads/templates/#network","title":"network","text":"See the section references
below.
Example:
apiVersion: v1\nkind: Template\nmetadata:\n name: Linux\n annotations:\n defaults.template.kubevirt.io/network: fast-net\n
"},{"location":"user_workloads/templates/#references","title":"references","text":"The default values for network, nic, volume, disk are meant to be the name of a section later in the document that the UI will find and consume to find the default values for the corresponding types. For example, considering the annotation defaults.template.kubevirt.io/disk: my-disk
: we assume that later in the document there is an element called my-disk
that the UI can use to find the data it needs. The names themselves don't matter as long as they are legal for Kubernetes and consistent with the content of the document.
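The lookup described above can be sketched as follows. This is an illustration under stated assumptions, not the way any particular UI implements it: the find_named and resolve_defaults helpers are hypothetical, and the sample data only mirrors the names used in the demo template.

```python
PREFIX = 'defaults.template.kubevirt.io/'


def find_named(node, name):
    # Depth-first search for the first dict whose 'name' entry equals name.
    if isinstance(node, dict):
        if node.get('name') == name:
            return node
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_named(child, name)
        if found is not None:
            return found
    return None


def resolve_defaults(annotations, spec):
    # Map each defaults annotation (disk, nic, ...) to the element it names.
    resolved = {}
    for key, target in annotations.items():
        if key.startswith(PREFIX):
            resolved[key[len(PREFIX):]] = find_named(spec, target)
    return resolved


# Hypothetical data modelled on the demo template.
annotations = {
    PREFIX + 'disk': 'rhel-default-disk',
    PREFIX + 'nic': 'rhel-default-net',
}
spec = {
    'volumes': [{'containerDisk': {'image': 'registry:5000/kubevirt/cirros-container-disk-demo:devel'},
                 'name': 'rhel-default-disk'}],
    'networks': [{'genie': {'networkName': 'flannel'}, 'name': 'rhel-default-net'}],
}

for kind, element in resolve_defaults(annotations, spec).items():
    print(kind, '->', element)
```

An annotation whose target name does not appear anywhere in the spec resolves to None, which is the inconsistency the text warns about.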
demo-template.yaml
apiVersion: v1\nitems:\n- apiVersion: kubevirt.io/v1\n kind: VirtualMachine\n metadata:\n labels:\n vm.kubevirt.io/template: rhel7-generic-tiny\n name: rheltinyvm\n osinfoname: rhel7.0\n defaults.template.kubevirt.io/disk: rhel-default-disk\n defaults.template.kubevirt.io/nic: rhel-default-net\n spec:\n running: false\n template:\n spec:\n domain:\n cpu:\n sockets: 1\n cores: 1\n threads: 1\n devices:\n rng: {}\n resources:\n requests:\n memory: 1g\n terminationGracePeriodSeconds: 0\n volumes:\n - containerDisk:\n image: registry:5000/kubevirt/cirros-container-disk-demo:devel\n name: rhel-default-disk\n networks:\n - genie:\n networkName: flannel\n name: rhel-default-net\nkind: List\nmetadata: {}\n
once processed becomes: demo-vm.yaml
apiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n labels:\n vm.kubevirt.io/template: rhel7-generic-tiny\n name: rheltinyvm\n osinfoname: rhel7.0\nspec:\n running: false\n template:\n spec:\n domain:\n cpu:\n sockets: 1\n cores: 1\n threads: 1\n resources:\n requests:\n memory: 1g\n devices:\n rng: {}\n disks:\n - disk:\n name: rhel-default-disk\n interfaces:\n - bridge: {}\n name: rhel-default-nic\n terminationGracePeriodSeconds: 0\n volumes:\n - containerDisk:\n image: registry:5000/kubevirt/cirros-container-disk-demo:devel\n name: containerdisk\n networks:\n - genie:\n networkName: flannel\n name: rhel-default-nic\n
"},{"location":"user_workloads/templates/#virtual-machine-creation","title":"Virtual machine creation","text":""},{"location":"user_workloads/templates/#overview","title":"Overview","text":"The KubeVirt project provides a set of templates for creating VMs that handle common usage scenarios. Each template combines a few key factors, which can be further customized and processed to produce a Virtual Machine object.
The key factors that define a template are: - Workload: Most virtual machines should use the server or desktop workload for maximum flexibility; the highperformance workload trades some of this flexibility for better performance. - Guest Operating System (OS): This ensures that the emulated hardware is compatible with the guest OS. It also helps maximize the stability of the VM and enables performance optimizations. - Size (flavor): Defines the amount of resources (CPU, memory) to allocate to the VM.
"},{"location":"user_workloads/templates/#openshift-console","title":"Openshift Console","text":"VMs can be created through the OpenShift Cluster Console UI. This UI supports creating VMs from templates, including the template features flavors and workload profiles. To create a VM from a template, choose Workloads in the left panel >> choose Virtualization >> press the blue \"Create Virtual Machine\" button >> choose \"Create from wizard\". The \"Create Virtual Machine\" window should then appear.
"},{"location":"user_workloads/templates/#common-templates","title":"Common-templates","text":"The common-templates subproject provides official, ready-made templates. You can also create templates by hand. You can find an example below, in the \"Example template\" section.
"},{"location":"user_workloads/templates/#example-template","title":"Example template","text":"In order to create a virtual machine via OpenShift CLI, you need to provide a template defining the corresponding object and its metadata.
NOTE: Only the VirtualMachine
object is currently supported.
Here is an example template that defines an instance of the VirtualMachine
object:
apiVersion: template.openshift.io/v1\nkind: Template\nmetadata:\n name: fedora-desktop-large\n annotations:\n openshift.io/display-name: \"Fedora 32+ VM\"\n description: >-\n Template for Fedora 32 VM or newer.\n A PVC with the Fedora disk image must be available.\n Recommended disk image:\n https://download.fedoraproject.org/pub/fedora/linux/releases/32/Cloud/x86_64/images/Fedora-Cloud-Base-32-1.6.x86_64.qcow2\n tags: \"hidden,kubevirt,virtualmachine,fedora\"\n iconClass: \"icon-fedora\"\n openshift.io/provider-display-name: \"KubeVirt\"\n openshift.io/documentation-url: \"https://github.com/kubevirt/common-templates\"\n openshift.io/support-url: \"https://github.com/kubevirt/common-templates/issues\"\n template.openshift.io/bindable: \"false\"\n template.kubevirt.io/version: v1alpha1\n defaults.template.kubevirt.io/disk: rootdisk\n template.kubevirt.io/editable: |\n /objects[0].spec.template.spec.domain.cpu.sockets\n /objects[0].spec.template.spec.domain.cpu.cores\n /objects[0].spec.template.spec.domain.cpu.threads\n /objects[0].spec.template.spec.domain.resources.requests.memory\n /objects[0].spec.template.spec.domain.devices.disks\n /objects[0].spec.template.spec.volumes\n /objects[0].spec.template.spec.networks\n name.os.template.kubevirt.io/fedora32: Fedora 32 or higher\n name.os.template.kubevirt.io/fedora33: Fedora 32 or higher\n name.os.template.kubevirt.io/silverblue32: Fedora 32 or higher\n name.os.template.kubevirt.io/silverblue33: Fedora 32 or higher\n labels:\n os.template.kubevirt.io/fedora32: \"true\"\n os.template.kubevirt.io/fedora33: \"true\"\n os.template.kubevirt.io/silverblue32: \"true\"\n os.template.kubevirt.io/silverblue33: \"true\"\n workload.template.kubevirt.io/desktop: \"true\"\n flavor.template.kubevirt.io/large: \"true\"\n template.kubevirt.io/type: \"base\"\n template.kubevirt.io/version: \"0.11.3\"\nobjects:\n- apiVersion: kubevirt.io/v1\n kind: VirtualMachine\n metadata:\n name: ${NAME}\n labels:\n vm.kubevirt.io/template: 
fedora-desktop-large\n vm.kubevirt.io/template.version: \"0.11.3\"\n vm.kubevirt.io/template.revision: \"45\"\n app: ${NAME}\n annotations:\n vm.kubevirt.io/os: \"fedora\"\n vm.kubevirt.io/workload: \"desktop\"\n vm.kubevirt.io/flavor: \"large\"\n vm.kubevirt.io/validations: |\n [\n {\n \"name\": \"minimal-required-memory\",\n \"path\": \"jsonpath::.spec.domain.resources.requests.memory\",\n \"rule\": \"integer\",\n \"message\": \"This VM requires more memory.\",\n \"min\": 1073741824\n }\n ]\n spec:\n dataVolumeTemplates:\n - apiVersion: cdi.kubevirt.io/v1beta1\n kind: DataVolume\n metadata:\n name: ${NAME}\n spec:\n storage:\n accessModes:\n - ReadWriteMany\n source:\n pvc:\n name: ${SRC_PVC_NAME}\n namespace: ${SRC_PVC_NAMESPACE}\n running: false\n template:\n metadata:\n labels:\n kubevirt.io/domain: ${NAME}\n kubevirt.io/size: large\n spec:\n domain:\n cpu:\n sockets: 2\n cores: 1\n threads: 1\n resources:\n requests:\n memory: 8Gi\n devices:\n rng: {}\n networkInterfaceMultiqueue: true\n inputs:\n - type: tablet\n bus: virtio\n name: tablet\n disks:\n - disk:\n bus: virtio\n name: ${NAME}\n - disk:\n bus: virtio\n name: cloudinitdisk\n interfaces:\n - masquerade: {}\n name: default\n terminationGracePeriodSeconds: 180\n networks:\n - name: default\n pod: {}\n volumes:\n - dataVolume:\n name: ${NAME}\n name: ${NAME}\n - cloudInitNoCloud:\n userData: |-\n #cloud-config\n user: fedora\n password: ${CLOUD_USER_PASSWORD}\n chpasswd: { expire: False }\n name: cloudinitdisk\nparameters:\n- description: VM name\n from: 'fedora-[a-z0-9]{16}'\n generate: expression\n name: NAME\n- name: SRC_PVC_NAME\n description: Name of the PVC to clone\n value: 'fedora'\n- name: SRC_PVC_NAMESPACE\n description: Namespace of the source PVC\n value: kubevirt-os-images\n- description: Randomized password for the cloud-init user fedora\n from: '[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{4}'\n generate: expression\n name: CLOUD_USER_PASSWORD\n
Note that the template above defines free parameters (NAME
, SRC_PVC_NAME
, SRC_PVC_NAMESPACE
, CLOUD_USER_PASSWORD
) and the NAME
parameter has no fixed default value; its value is generated from an expression.
An OpenShift template has to be converted into a JSON file via the oc process
command, which also allows you to set the template parameters.
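To make the parameter step more concrete, here is a small Python sketch of the substitution idea. It is not how oc process is implemented; it only mimics the effect of replacing placeholders such as ${NAME} with defaults and command-line overrides, using the standard library string.Template class on a hypothetical, heavily trimmed fragment.

```python
from string import Template

# Hypothetical, heavily trimmed fragment of a template object.
fragment = Template('''metadata:
  name: ${NAME}
spec:
  source:
    pvc:
      name: ${SRC_PVC_NAME}
      namespace: ${SRC_PVC_NAMESPACE}
''')

# Defaults as declared in the parameters section of the example template.
defaults = {'SRC_PVC_NAME': 'fedora', 'SRC_PVC_NAMESPACE': 'kubevirt-os-images'}

# Values that would be passed with -p on the command line.
overrides = {'NAME': 'testvm', 'SRC_PVC_NAMESPACE': 'kubevirt'}

# Overrides win over defaults; a parameter with no value raises KeyError.
params = {**defaults, **overrides}
rendered = fragment.substitute(params)
print(rendered)
```

In this sketch, leaving NAME out of the overrides raises a KeyError, whereas the real template would generate a value for it from its expression.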
A complete example can be found in the KubeVirt repository.
Note: You need to be logged in via the oc login
command.
$ oc process -f cluster/vmi-template-fedora.yaml \\\n -p NAME=testvmi \\\n -p SRC_PVC_NAME=fedora \\\n -p SRC_PVC_NAMESPACE=kubevirt\n{\n \"kind\": \"List\",\n \"apiVersion\": \"v1\",\n \"metadata\": {},\n \"items\": [\n {\n
The JSON output is usually applied directly by piping it to the oc create
command.
$ oc process -f cluster/examples/vm-template-fedora.yaml \\\n -p NAME=testvm \\\n -p SRC_PVC_NAME=fedora \\\n -p SRC_PVC_NAMESPACE=kubevirt \\\n | oc create -f -\nvirtualmachine.kubevirt.io/testvm created\n
The command above results in creating a Kubernetes object according to the specification given by the template (in this example it is an instance of the VirtualMachine object).
You can list the available parameters using the following command:
$ oc process -f dist/templates/fedora-desktop-large.yaml --parameters\nNAME DESCRIPTION GENERATOR VALUE\nNAME VM name expression fedora-[a-z0-9]{16}\nSRC_PVC_NAME Name of the PVC to clone fedora\nSRC_PVC_NAMESPACE Namespace of the source PVC kubevirt-os-images\nCLOUD_USER_PASSWORD Randomized password for the cloud-init user fedora expression [a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{4}\n
"},{"location":"user_workloads/templates/#starting-virtual-machine-from-the-created-object","title":"Starting virtual machine from the created object","text":"The created object is now a regular VirtualMachine object, and from now on it can be controlled by accessing Kubernetes API resources. The preferred way to do this from within the OpenShift environment is to use the oc patch
command.
$ oc patch virtualmachine testvm --type merge -p '{\"spec\":{\"running\":true}}'\nvirtualmachine.kubevirt.io/testvm patched\n
The virtctl tool is also available. In practice, it can be more convenient than using the Kubernetes API directly. Example:
$ virtctl start testvm\nVM testvm was scheduled to start\n
As soon as the VM starts, a new object of type VirtualMachineInstance is created. It is named after the VirtualMachine. Example (output truncated for brevity):
$ kubectl describe vm testvm\nName: testvm\nNamespace: myproject\nLabels: kubevirt-vm=vm-testvm\n kubevirt.io/os=fedora33\nAnnotations: <none>\nAPI Version: kubevirt.io/v1\nKind: VirtualMachine\n
"},{"location":"user_workloads/templates/#cloud-init-script-and-parameters","title":"Cloud-init script and parameters","text":"KubeVirt VM templates, just like KubeVirt VM/VMI YAML configs, support cloud-init scripts.
"},{"location":"user_workloads/templates/#hack-use-pre-downloaded-image","title":"Hack - use pre-downloaded image","text":"KubeVirt VM templates, just like KubeVirt VM/VMI YAML configs, can use a pre-downloaded VM image, which is especially useful for debugging, development, and testing. No special parameters are required in the VM template or VM/VMI YAML config. The main idea is to create a Kubernetes PersistentVolume and PersistentVolumeClaim corresponding to an existing image in the file system. Example:
---\nkind: PersistentVolume\napiVersion: v1\nmetadata:\n name: mypv\n labels:\n type: local\nspec:\n storageClassName: manual\n capacity:\n storage: 10G\n accessModes:\n - ReadWriteOnce\n hostPath:\n path: \"/mnt/sda1/images/testvm\"\n---\nkind: PersistentVolumeClaim\napiVersion: v1\nmetadata:\n name: mypvc\nspec:\n storageClassName: manual\n accessModes:\n - ReadWriteOnce\n resources:\n requests:\n storage: 10G\n
"},{"location":"user_workloads/templates/#using-datavolumes","title":"Using DataVolumes","text":"KubeVirt VM templates use dataVolumeTemplates. Before using DataVolumes, CDI has to be installed in the cluster. After that, the source DataVolume can be created.
---\napiVersion: cdi.kubevirt.io/v1beta1\nkind: DataVolume\nmetadata:\n name: fedora-datavolume-original\n namespace: kubevirt\nspec:\n source:\n registry:\n url: \"image_url\"\n storage:\n resources:\n requests:\n storage: 30Gi\n
After the import is completed, the VM can be created:
$ oc process -f cluster/examples/vm-template-fedora.yaml \\\n -p NAME=testvmi \\\n -p SRC_PVC_NAME=fedora-datavolume-original \\\n -p SRC_PVC_NAMESPACE=kubevirt \\\n | oc create -f -\nvirtualmachine.kubevirt.io/testvmi created\n
"},{"location":"user_workloads/templates/#additional-information","title":"Additional information","text":"See the Virtual Machine Lifecycle Guide for further reference.
"},{"location":"user_workloads/virtctl_client_tool/","title":"Download and Install the virtctl Command Line Interface","text":""},{"location":"user_workloads/virtctl_client_tool/#download-the-virtctl-client-tool","title":"Download the virtctl
client tool","text":"Basic VirtualMachineInstance operations can be performed with the stock kubectl
utility. However, the virtctl
binary utility is required to use advanced features such as:
It also provides convenience commands for:
Starting and stopping VirtualMachineInstances
Live migrating VirtualMachineInstances and canceling live migrations
Uploading virtual machine disk images
There are two ways to get it:
the most recent version of the tool can be retrieved from the official release page
it can be installed as a kubectl
plugin using krew
Example:
export VERSION=$(curl https://storage.googleapis.com/kubevirt-prow/release/kubevirt/kubevirt/stable.txt)\nwget https://github.com/kubevirt/kubevirt/releases/download/${VERSION}/virtctl-${VERSION}-linux-amd64\n
"},{"location":"user_workloads/virtctl_client_tool/#install-virtctl-with-krew","title":"Install virtctl
with krew
","text":"The krew
plugin manager must be installed beforehand. If krew
is installed, virtctl
can be installed via krew
:
$ kubectl krew install virt\n
Then virtctl
can be used as a kubectl plugin. For a list of available commands run:
$ kubectl virt help\n
Every occurrence throughout this guide of
$ ./virtctl <command>...\n
should then be read as
$ kubectl virt <command>...\n
"},{"location":"user_workloads/virtual_machine_instances/","title":"Virtual Machines Instances","text":"The VirtualMachineInstance
type conceptually has two parts:
Information for making scheduling decisions
Information about the virtual machine API
Every VirtualMachineInstance
object represents a single running virtual machine instance.
With the installation of KubeVirt, new types are added to the Kubernetes API to manage Virtual Machines.
You can interact with the new resources (via kubectl
) as you would with any other API resource.
Note: A full API reference is available at https://kubevirt.io/api-reference/.
Here is an example of a VirtualMachineInstance object:
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n name: testvmi-nocloud\nspec:\n terminationGracePeriodSeconds: 30\n domain:\n resources:\n requests:\n memory: 1024M\n devices:\n disks:\n - name: containerdisk\n disk:\n bus: virtio\n - name: emptydisk\n disk:\n bus: virtio\n - disk:\n bus: virtio\n name: cloudinitdisk\n volumes:\n - name: containerdisk\n containerDisk:\n image: kubevirt/fedora-cloud-container-disk-demo:latest\n - name: emptydisk\n emptyDisk:\n capacity: \"2Gi\"\n - name: cloudinitdisk\n cloudInitNoCloud:\n userData: |-\n #cloud-config\n password: fedora\n chpasswd: { expire: False }\n
This example uses a fedora cloud image in combination with cloud-init and an ephemeral empty disk with a capacity of 2Gi
. For the sake of simplicity, the volume sources in this example are ephemeral and don't require a provisioner in your cluster.
Using instancetypes and preferences with a VirtualMachine: Instancetypes and preferences
More information about persistent and ephemeral volumes: Disks and Volumes
How to access a VirtualMachineInstance via console
or vnc
: Console Access
How to customize VirtualMachineInstances with cloud-init
: Cloud Init
In KubeVirt, the VM rollout strategy defines how changes to a VM object affect a running guest. In other words, it defines when and how changes to a VM object get propagated to its corresponding VMI object.
There are currently 2 rollout strategies: LiveUpdate
and Stage
. Only 1 can be specified and the default is Stage
.
As long as the VMLiveUpdateFeatures
feature gate is not enabled, the VM rollout strategy is ignored and defaults to \"Stage\". The feature gate is set in the KubeVirt custom resource (CR) as follows:
apiVersion: kubevirt.io/v1\nkind: KubeVirt\nspec:\n configuration:\n developerConfiguration:\n featureGates:\n - VMLiveUpdateFeatures\n
"},{"location":"user_workloads/vm_rollout_strategies/#liveupdate","title":"LiveUpdate","text":"The LiveUpdate
VM rollout strategy tries to propagate VM object changes to running VMIs as soon as possible. For example, changing the number of CPU sockets will trigger a CPU hotplug.
Enable the LiveUpdate
VM rollout strategy in the KubeVirt CR:
apiVersion: kubevirt.io/v1\nkind: KubeVirt\nspec:\n configuration:\n vmRolloutStrategy: \"LiveUpdate\"\n
"},{"location":"user_workloads/vm_rollout_strategies/#stage","title":"Stage","text":"The Stage
VM rollout strategy stages every change made to the VM object until its next reboot.
apiVersion: kubevirt.io/v1\nkind: KubeVirt\nspec:\n configuration:\n vmRolloutStrategy: \"Stage\"\n
"},{"location":"user_workloads/vm_rollout_strategies/#restartrequired-condition","title":"RestartRequired condition","text":"Any change made to a VM object when the rollout strategy is Stage
will trigger the RestartRequired
VM condition. When the rollout strategy is LiveUpdate
, only non-propagatable changes will trigger the condition.
Once the RestartRequired
condition is set on a VM object, no further changes can be propagated, even if the strategy is set to LiveUpdate
. Changes will become effective on next reboot, and the condition will be removed.
The current implementation has the following limitations:
RestartRequired
condition is set, the only way to clear it is to restart the VM. In the future, we plan to implement a way to clear it by reverting the VM template spec to its last non-RestartRequired state.
RestartRequired
condition comes with a message stating what kind of change triggered the condition (CPU/memory/other). That message pertains only to the first change that triggered the condition. Additional changes that would usually trigger the condition will just get staged and no additional RestartRequired
condition will be added.
The purpose of this document is to explain how to install virtio drivers for Microsoft Windows running in a fully virtualized guest.
"},{"location":"user_workloads/windows_virtio_drivers/#do-i-need-virtio-drivers","title":"Do I need virtio drivers?","text":"Yes. Without the virtio drivers, you cannot use paravirtualized hardware properly: it will either not work at all or incur a severe performance penalty.
For more information about VirtIO and paravirtualization, see VirtIO and paravirtualization
For more details on configuring your VirtIO driver please refer to Installing VirtIO driver on a new Windows virtual machine and Installing VirtIO driver on an existing Windows virtual machine.
"},{"location":"user_workloads/windows_virtio_drivers/#which-drivers-i-need-to-install","title":"Which drivers I need to install?","text":"There are usually up to 8 possible devices that are required to run Windows smoothly in a virtualized environment. KubeVirt currently supports only:
viostor, the block driver, applies to SCSI Controller in the Other devices group.
viorng, the entropy source driver, applies to PCI Device in the Other devices group.
NetKVM, the network driver, applies to Ethernet Controller in the Other devices group. Available only if a virtio NIC is configured.
Other virtio drivers that exist and might be supported in the future:
Balloon, the balloon driver, applies to PCI Device in the Other devices group
vioserial, the paravirtual serial driver, applies to PCI Simple Communications Controller in the Other devices group.
vioscsi, the SCSI block driver, applies to SCSI Controller in the Other devices group.
qemupciserial, the emulated PCI serial driver, applies to PCI Serial Port in the Other devices group.
qxl, the paravirtual video driver, applies to Microsoft Basic Display Adapter in the Display adapters group.
pvpanic, the paravirtual panic driver, applies to Unknown device in the Other devices group.
Note
Some drivers are required in the installation phase. When you are installing Windows onto virtio block storage, you have to provide an appropriate virtio driver. Namely, choose the viostor driver for your version of Microsoft Windows; e.g., do not install the XP driver when you run Windows 10.
Other drivers can be installed after the successful Windows installation. Again, please install only drivers matching your Windows version.
"},{"location":"user_workloads/windows_virtio_drivers/#how-to-install-during-windows-install","title":"How to install during Windows install?","text":"To install drivers before Windows starts its installation, make sure the virtio-win package is attached to your VirtualMachine as a SATA CD-ROM. In the Windows installer, choose the advanced install option and load a driver. Then navigate to the loaded virtio CD-ROM and install either viostor or vioscsi, depending on which one you have set up.
Step by step screenshots:
"},{"location":"user_workloads/windows_virtio_drivers/#how-to-install-after-windows-install","title":"How to install after Windows install?","text":"After Windows is installed, go to Device Manager. There you should see undetected devices in the \"available devices\" section. You can install the virtio drivers one by one by going through this list.
For more details on how to choose a proper driver and how to install the driver, please refer to the Windows Guest Virtual Machines on Red Hat Enterprise Linux 7.
"},{"location":"user_workloads/windows_virtio_drivers/#how-to-obtain-virtio-drivers","title":"How to obtain virtio drivers?","text":"The virtio Windows drivers are distributed in the form of a containerDisk, which can simply be mounted to the VirtualMachine. The container image containing the disk is located at https://quay.io/repository/kubevirt/virtio-container-disk?tab=tags and can be pulled like any other container image:
docker pull quay.io/kubevirt/virtio-container-disk\n
However, pulling the image manually is not required; Kubernetes will download it if it is not present when deploying the VirtualMachine.
"},{"location":"user_workloads/windows_virtio_drivers/#attaching-to-virtualmachine","title":"Attaching to VirtualMachine","text":"KubeVirt distributes virtio drivers for Microsoft Windows in the form of a container disk. The package contains the virtio drivers and the QEMU guest agent. The disk was tested on Microsoft Windows Server 2012. Windows XP and later are supported.
The package is intended to be used as a CD-ROM attached to the virtual machine running Microsoft Windows. It can be used as a SATA CD-ROM during the install phase or to provide drivers in an existing Windows installation.
Attaching the virtio-win package can be done simply by adding a ContainerDisk to your VirtualMachine.
spec:\n domain:\n devices:\n disks:\n - name: virtiocontainerdisk\n # Any other disk you want to use must go before virtioContainerDisk.\n # KubeVirt boots from disks in the order they are defined.\n # Therefore virtioContainerDisk must come after the bootable disk.\n # The other option is to choose the boot order explicitly:\n # - https://kubevirt.io/api-reference/v0.13.2/definitions.html#_v1_disk\n # NOTE: You either specify bootOrder explicitly or sort the items in\n # disks. You cannot do both at the same time.\n # bootOrder: 2\n cdrom:\n bus: sata\nvolumes:\n - containerDisk:\n image: quay.io/kubevirt/virtio-container-disk\n name: virtiocontainerdisk\n
Once you are done installing the virtio drivers, you can remove the virtio container disk by removing the disk from the YAML specification and restarting the VirtualMachine.
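The comments in the snippet above mention that the boot order can be set explicitly instead of relying on the order of the disks list. A minimal sketch of that alternative follows. The rootdisk entry and the PVC name are hypothetical placeholders for whatever bootable disk your VirtualMachine already uses.

```yaml
spec:
  domain:
    devices:
      disks:
        # Hypothetical bootable system disk; boots first.
        - name: rootdisk
          bootOrder: 1
          disk:
            bus: sata
        # virtio driver CD-ROM; given a later boot order.
        - name: virtiocontainerdisk
          bootOrder: 2
          cdrom:
            bus: sata
  volumes:
    - name: rootdisk
      persistentVolumeClaim:
        claimName: windows-root-pvc   # hypothetical PVC name
    - name: virtiocontainerdisk
      containerDisk:
        image: quay.io/kubevirt/virtio-container-disk
```

With explicit bootOrder values, the position of the entries in the disks list no longer determines which disk is booted.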
"}]} \ No newline at end of file +{"config":{"lang":["en"],"separator":"[\\s\\-,:!=\\[\\]\\(\\)\"/]+|\\.(?!\\d)","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"Welcome","text":"The KubeVirt User Guide is divided into the following sections:
Kubevirt on Killercoda: https://killercoda.com/kubevirt
Kubevirt on Minikube: https://kubevirt.io/quickstart_minikube/
Kubevirt on Kind: https://kubevirt.io/quickstart_kind/
Kubevirt on cloud providers: https://kubevirt.io/quickstart_cloud/
Use KubeVirt
Experiment with Containerized Data Importer (CDI)
Experiment with KubeVirt Upgrades
Live Migration
File a bug: https://github.com/kubevirt/kubevirt/issues
Mailing list: https://groups.google.com/forum/#!forum/kubevirt-dev
Slack: https://kubernetes.slack.com/messages/virtualization
Start contributing: Contributing
API Reference: http://kubevirt.io/api-reference/
Check our privacy policy at: https://kubevirt.io/privacy/
We do use https://netlify.com Open Source Plan for Rendering Pull Requests to the documentation repository
KubeVirt is built using a service oriented architecture and a choreography pattern.
"},{"location":"architecture/#stack","title":"Stack","text":" +---------------------+\n | KubeVirt |\n~~+---------------------+~~\n | Orchestration (K8s) |\n +---------------------+\n | Scheduling (K8s) |\n +---------------------+\n | Container Runtime |\n~~+---------------------+~~\n | Operating System |\n +---------------------+\n | Virtual(kvm) |\n~~+---------------------+~~\n | Physical |\n +---------------------+\n
Users requiring virtualization services speak to the Virtualization API (see below), which in turn speaks to the Kubernetes cluster to schedule the requested Virtual Machine Instances (VMIs). Scheduling, networking, and storage are all delegated to Kubernetes, while KubeVirt provides the virtualization functionality.
"},{"location":"architecture/#additional-services","title":"Additional Services","text":"KubeVirt provides additional functionality to your Kubernetes cluster to perform virtual machine management.
If we recall how Kubernetes handles Pods, we remember that Pods are created by posting a Pod specification to the Kubernetes API Server. This specification is then transformed into an object inside the API Server. This object is of a specific type, or kind, as it is called in the specification. A Pod is of the type Pod
. Controllers within Kubernetes know how to handle these Pod objects. Thus once a new Pod object is seen, those controllers perform the necessary actions to bring the Pod alive, and to match the required state.
This same mechanism is used by KubeVirt. Thus KubeVirt delivers three things to provide the new functionality:
Once all three steps have been completed, you are able to
virt-handler
- is taking care of a host - alongside the kubelet
- to launch the VMI and configure it until it matches the required state.
One final note: both controllers and daemons run as Pods (or similar) on top of the Kubernetes cluster, and are not installed alongside it. The type is, as said before, even defined inside the Kubernetes API server. This allows users to speak to Kubernetes, but modify VMIs.
The following diagram illustrates how the additional controllers and daemons communicate with Kubernetes and where the additional types are stored:
And a simplified version:
"},{"location":"architecture/#application-layout","title":"Application Layout","text":"VirtualMachineInstance (VMI) is the custom resource that represents the basic ephemeral building block of an instance. In a lot of cases this object won't be created directly by the user but by a high level resource. High level resources for VMI can be:
KubeVirt is deployed on top of a Kubernetes cluster. This means that you can continue to run your Kubernetes-native workloads next to the VMIs managed through KubeVirt.
Furthermore: if you can run native workloads, and you have KubeVirt installed, you should be able to run VM-based workloads, too. For example, Application Operators should not require additional permissions to use cluster features for VMs, compared to using that feature with a plain Pod.
Security-wise, installing and using KubeVirt must not grant users any permission they do not already have regarding native workloads. For example, a non-privileged Application Operator must never gain access to a privileged Pod by using a KubeVirt feature.
"},{"location":"architecture/#the-razor","title":"The Razor","text":"We love virtual machines, think that they are very important and work hard to make them easy to use in Kubernetes. But even more than VMs, we love good design and modular, reusable components. Quite frequently, we face a dilemma: should we solve a problem in KubeVirt in a way that is best optimized for VMs, or should we take a longer path and introduce the solution to Pod-based workloads too?
To decide these dilemmas we came up with the KubeVirt Razor: \"If something is useful for Pods, we should not implement it only for VMs\".
For example, we debated how we should connect VMs to external network resources. The quickest way seemed to be to introduce KubeVirt-specific code that attaches a VM to a host bridge. However, we chose the longer path of integrating with Multus and CNI and improving them.
"},{"location":"architecture/#virtualmachine","title":"VirtualMachine","text":"A VirtualMachine
provides additional management capabilities to a VirtualMachineInstance inside the cluster. That includes:
API stability
Start/stop/restart capabilities on the controller level
Offline configuration change with propagation on VirtualMachineInstance recreation
Ensure that the VirtualMachineInstance is running if it should be running
It focuses on a 1:1 relationship between the controller instance and a virtual machine instance. In many ways it is very similar to a StatefulSet with spec.replicas
set to 1
.
A VirtualMachine will make sure that a VirtualMachineInstance object with an identical name will be present in the cluster, if spec.running
is set to true
. Further it will make sure that a VirtualMachineInstance will be removed from the cluster if spec.running
is set to false
.
There exists a field spec.runStrategy
which can also be used to control the state of the associated VirtualMachineInstance object. To avoid confusing and contradictory states, these fields are mutually exclusive.
An extended explanation of spec.runStrategy
vs spec.running
can be found in Run Strategies
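As a rough illustration of the mutual exclusivity described above, the following fragment uses spec.runStrategy in place of spec.running. The value Always is taken here to be the counterpart of running: true; the rest of the VirtualMachine (metadata and template) is left out and would be the same as in any other manifest.

```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
spec:
  # Set either runStrategy or running, never both.
  runStrategy: Always
```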
After creating a VirtualMachine it can be switched on or off like this:
# Start the virtual machine:\nvirtctl start vm\n\n# Stop the virtual machine:\nvirtctl stop vm\n
kubectl
can be used too:
# Start the virtual machine:\nkubectl patch virtualmachine vm --type merge -p \\\n '{\"spec\":{\"running\":true}}'\n\n# Stop the virtual machine:\nkubectl patch virtualmachine vm --type merge -p \\\n '{\"spec\":{\"running\":false}}'\n
Find more details about a VM's life-cycle in the relevant section
"},{"location":"architecture/#controller-status","title":"Controller status","text":"Once a VirtualMachineInstance is created, its state will be tracked via status.created
and status.ready
fields of the VirtualMachine. If a VirtualMachineInstance exists in the cluster, status.created
will equal true
. If the VirtualMachineInstance is also ready, status.ready
will equal true
too.
If a VirtualMachineInstance reaches a final state but the spec.running
equals true
, the VirtualMachine controller will set status.ready
to false
and re-create the VirtualMachineInstance.
Additionally, the status.printableStatus
field provides high-level summary information about the state of the VirtualMachine. This information is also displayed when listing VirtualMachines using the CLI:
$ kubectl get virtualmachines\nNAME AGE STATUS VOLUME\nvm1 4m Running\nvm2 11s Stopped\n
Here's the list of states currently supported and their meanings. Note that states may be added/removed in future releases, so caution should be used if consumed by automated programs.
A VirtualMachineInstance restart can be triggered by deleting the VirtualMachineInstance. This will also propagate configuration changes from the template in the VirtualMachine:
# Restart the virtual machine (you delete the instance!):\nkubectl delete virtualmachineinstance vm\n
To restart a VirtualMachine named vm using virtctl:
$ virtctl restart vm\n
This would perform a normal restart for the VirtualMachineInstance and would reschedule the VirtualMachineInstance on a new virt-launcher Pod
To force restart a VirtualMachine named vm using virtctl:
$ virtctl restart vm --force --grace-period=0\n
This would try to perform a normal restart, and would also delete the virt-launcher Pod of the VirtualMachineInstance, with GracePeriodSeconds set to the number of seconds passed in the command.
Currently, only setting grace-period=0 is supported.
Note
Force restart can cause data corruption, and should be used in cases of kernel panic or VirtualMachine being unresponsive to normal restarts.
"},{"location":"architecture/#fencing-considerations","title":"Fencing considerations","text":"A VirtualMachine will never restart or re-create a VirtualMachineInstance until the current instance of the VirtualMachineInstance is deleted from the cluster.
"},{"location":"architecture/#exposing-as-a-service","title":"Exposing as a Service","text":"A VirtualMachine can be exposed as a service. The actual service will be available once the VirtualMachineInstance starts without additional interaction.
For example, exposing SSH port (22) as a ClusterIP
service using virtctl
after the VirtualMachine was created, but before it started:
$ virtctl expose virtualmachine vmi-ephemeral --name vmiservice --port 27017 --target-port 22\n
All service exposure options that apply to a VirtualMachineInstance apply to a VirtualMachine.
See Service Objects for more details.
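For comparison, a hand-written Service roughly corresponding to the virtctl expose command above might look like the sketch below. The selector label is an assumption: it only works if the VirtualMachine template sets a matching label, which is then carried by the virt-launcher Pod. The selector that virtctl generates may differ.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: vmiservice
spec:
  type: ClusterIP
  selector:
    # Hypothetical label; it must match a label set in the VM template.
    kubevirt.io/vm: vmi-ephemeral
  ports:
    - protocol: TCP
      port: 27017      # port exposed by the Service
      targetPort: 22   # SSH port inside the guest
```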
"},{"location":"architecture/#when-to-use-a-virtualmachine","title":"When to use a VirtualMachine","text":""},{"location":"architecture/#when-api-stability-is-required-between-restarts","title":"When API stability is required between restarts","text":"A VirtualMachine
makes sure that VirtualMachineInstance API configurations are consistent between restarts. A classic example is licenses that are bound to the firmware UUID of a virtual machine. The VirtualMachine
makes sure that the UUID will always stay the same without the user having to take care of it.
One of the main benefits is that a user can still make use of defaulting logic, although a stable API is needed.
"},{"location":"architecture/#when-config-updates-should-be-picked-up-on-the-next-restart","title":"When config updates should be picked up on the next restart","text":"Use a VirtualMachine if the VirtualMachineInstance configuration should be modifiable inside the cluster and these changes should be picked up on the next VirtualMachineInstance restart. This means that no hotplug is involved.
"},{"location":"architecture/#when-you-want-to-let-the-cluster-manage-your-individual-virtualmachineinstance","title":"When you want to let the cluster manage your individual VirtualMachineInstance","text":"Kubernetes as a declarative system can help you to manage the VirtualMachineInstance. You tell it that you want this VirtualMachineInstance with your application running, and the VirtualMachine will try to make sure it stays running.
Note
The current belief is that if it is defined that the VirtualMachineInstance should be running, it should be running. This is different from many classical virtualization platforms, where VMs stay down if they were switched off. Restart policies may be added if needed. Please provide your use-case if you need this!
"},{"location":"architecture/#example","title":"Example","text":"apiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n labels:\n kubevirt.io/vm: vm-cirros\n name: vm-cirros\nspec:\n running: false\n template:\n metadata:\n labels:\n kubevirt.io/vm: vm-cirros\n spec:\n domain:\n devices:\n disks:\n - disk:\n bus: virtio\n name: containerdisk\n - disk:\n bus: virtio\n name: cloudinitdisk\n machine:\n type: \"\"\n resources:\n requests:\n memory: 64M\n terminationGracePeriodSeconds: 0\n volumes:\n - name: containerdisk\n containerDisk:\n image: kubevirt/cirros-container-disk-demo:latest\n - cloudInitNoCloud:\n userDataBase64: IyEvYmluL3NoCgplY2hvICdwcmludGVkIGZyb20gY2xvdWQtaW5pdCB1c2VyZGF0YScK\n name: cloudinitdisk\n
Saving this manifest into vm.yaml
and submitting it to Kubernetes will create the controller instance:
$ kubectl create -f vm.yaml\nvirtualmachine \"vm-cirros\" created\n
Since spec.running
is set to false
, no vmi will be created:
$ kubectl get vmis\nNo resources found.\n
Let's start the VirtualMachine:
$ virtctl start vm-cirros\n
As expected, a VirtualMachineInstance called vm-cirros
got created:
$ kubectl describe vm vm-cirros\nName: vm-cirros\nNamespace: default\nLabels: kubevirt.io/vm=vm-cirros\nAnnotations: <none>\nAPI Version: kubevirt.io/v1\nKind: VirtualMachine\nMetadata:\n Cluster Name:\n Creation Timestamp: 2018-04-30T09:25:08Z\n Generation: 0\n Resource Version: 6418\n Self Link: /apis/kubevirt.io/v1/namespaces/default/virtualmachines/vm-cirros\n UID: 60043358-4c58-11e8-8653-525500d15501\nSpec:\n Running: true\n Template:\n Metadata:\n Creation Timestamp: <nil>\n Labels:\n Kubevirt . Io / Ovmi: vm-cirros\n Spec:\n Domain:\n Devices:\n Disks:\n Disk:\n Bus: virtio\n Name: containerdisk\n Volume Name: containerdisk\n Disk:\n Bus: virtio\n Name: cloudinitdisk\n Volume Name: cloudinitdisk\n Machine:\n Type:\n Resources:\n Requests:\n Memory: 64M\n Termination Grace Period Seconds: 0\n Volumes:\n Name: containerdisk\n Registry Disk:\n Image: kubevirt/cirros-registry-disk-demo:latest\n Cloud Init No Cloud:\n User Data Base 64: IyEvYmluL3NoCgplY2hvICdwcmludGVkIGZyb20gY2xvdWQtaW5pdCB1c2VyZGF0YScK\n Name: cloudinitdisk\nStatus:\n Created: true\n Ready: true\nEvents:\n Type Reason Age From Message\n ---- ------ ---- ---- -------\n Normal SuccessfulCreate 15s virtualmachine-controller Created virtual machine: vm-cirros\n
"},{"location":"architecture/#kubectl-commandline-interactions","title":"Kubectl commandline interactions","text":"Whenever you want to manipulate the VirtualMachine through the commandline you can use the kubectl command. The following are examples demonstrating how to do it.
# Define a virtual machine:\n kubectl create -f vm.yaml\n\n # Start the virtual machine:\n kubectl patch virtualmachine vm --type merge -p \\\n '{\"spec\":{\"running\":true}}'\n\n # Look at virtual machine status and associated events:\n kubectl describe virtualmachine vm\n\n # Look at the now created virtual machine instance status and associated events:\n kubectl describe virtualmachineinstance vm\n\n # Stop the virtual machine instance:\n kubectl patch virtualmachine vm --type merge -p \\\n '{\"spec\":{\"running\":false}}'\n\n # Restart the virtual machine (you delete the instance!):\n kubectl delete virtualmachineinstance vm\n\n # Implicit cascade delete (first deletes the virtual machine and then the virtual machine instance)\n kubectl delete virtualmachine vm\n\n # Explicit cascade delete (first deletes the virtual machine and then the virtual machine instance)\n kubectl delete virtualmachine vm --cascade=true\n\n # Orphan delete (The running virtual machine is only detached, not deleted)\n # Recreating the virtual machine would lead to the adoption of the virtual machine instance\n kubectl delete virtualmachine vm --cascade=false\n
"},{"location":"contributing/","title":"Contributing","text":"Welcome!! And thank you for taking the first step to contributing to the KubeVirt project. On this page you should be able to find all the information required to get started on your contribution journey, as well as information on how to become a community member and grow into roles of responsibility.
If you think something might be missing from this page, please help us by raising a bug!
"},{"location":"contributing/#prerequisites","title":"Prerequisites","text":"Reviewing the following will prepare you for contributing:
For code contributors:
The following will help you decide where to start:
good-first-issue
for issues that make good entry points.
You should familiarize yourself with the following documents, which are critical to being a member of the community:
Killercoda provides an interactive environment for exploring KubeVirt scenarios:
Guides for deploying KubeVirt with different Kubernetes tools:
KubeVirt on minikube
KubeVirt on kind
KubeVirt on cloud providers
Released on: Tue Mar 05 2024
KubeVirt v1.2 is built for Kubernetes v1.29 and additionally supported for the previous two versions. See the KubeVirt support matrix for more information.
"},{"location":"release_notes/#api-change","title":"API change","text":"Status.GuestOSInfo.Version
vmRolloutStrategy
setting to define whether changes to VMs should either be always staged or live-updated when possible.
kubevirt.io:default
clusterRole to get,list kubevirtsMachine
Released on: Tue Nov 07 2023
"},{"location":"release_notes/#api-change_1","title":"API change","text":"common-instancetypes
resources can now be deployed by
using the CommonInstancetypesDeploymentGate
feature gate.spec.config.machineType
in KubeVirt CR.ControllerRevisions
containing instancetype.kubevirt.io
CRDs
are now decorated with labels detailing specific metadata of the underlying stashed objectnodeSelector
and schedulerName
fields have been added to VirtualMachineInstancetype spec.virtctl create clone
marshalling and replacement of kubectl
with kubectl virt
AutoResourceLimits
FeatureGate is enabledkubevirt.io/schedulable
label when finding lowest TSC frequency on the clusterquay.io/kubevirt/network-slirp-binding:20230830_638c60fc8
. On next release (v1.2.0) no default image will be set and registering an image would be mandatory.list
and watch
verbs from virt-controller's RBACinstancetype.kubevirt.io:view
ClusterRole
has been introduced that can be bound to users via a ClusterRoleBinding
to provide read only access to the cluster scoped VirtualMachineCluster{Instancetype,Preference}
resources.kubevirt_vmi_*_usage_seconds
from Gauge to Counterkubevirt_vmi_vcpu_delay_seconds_total
reporting the number of seconds the VM spent waiting in the queue instead of running.
kubevirt_vmi_cpu_affinity
and use sum as valuekubevirt_vmi_phase_count
not being created
Released on: Thu Jul 11 17:39:42 2023 +0000
"},{"location":"release_notes/#api-changes","title":"API changes","text":"podConfigDone
field in favor of a new source option in infoSource
.Name
of a {Instancetype,Preference}Matcher
without also updating the RevisionName
are now rejected.dedicatedCPUPlacement
attribute is once again supported within the VirtualMachineInstancetype
and VirtualMachineClusterInstancetype
CRDs after a recent bugfix improved VirtualMachine
validations, ensuring defaults are applied before any attempt to validate.RUNBOOK_URL_TEMPLATE
for the runbooks URL template
Released on: Wed Mar 1 16:49:27 2023 +0000
dedicatedCPUPlacement
attribute is once again supported within the VirtualMachineInstancetype
and VirtualMachineClusterInstancetype
CRDs after a recent bugfix improved VirtualMachine
validations, ensuring defaults are applied before any attempt to validate./dev/vhost-vsock
explicitly to ensure that the right vsock module is loadedinferFromVolume
now uses labels instead of annotations to lookup default instance type and preference details from a referenced Volume
. This has changed in order to provide users with a way of looking up suitably decorated resources through these labels before pointing to them within the VirtualMachine
.inferFromVolume
attributes have been introduced to the {Instancetype,Preference}Matchers
of a VirtualMachine
. When provided the Volume
referenced by the attribute is checked for the following annotations with which to populate the {Instancetype,Preference}Matchers
:kubevirt-prometheus-metrics
now sets ClusterIP
to None
to make it a headless service.Timer
is now correctly omitted from Clock
fixing bug #8844.virtqemud
daemon instead of libvirtd
Released on: Thu Feb 11 00:08:46 2023 +0000
Released on: Thu Oct 13 00:24:51 2022 +0000
tlsConfiguration
to Kubevirt ConfigurationDockerSELinuxMCSWorkaround
feature gate before upgrading
Released on: Mon Sep 12 14:00:44 2022 +0000
AutoattachInputDevice
has been added to Devices
allowing an Input
device to be automatically attached to a VirtualMachine
on start up. PreferredAutoattachInputDevice
has also been added to DevicePreferences
allowing users to control this behaviour with a set of preferences.Released on: Thu Aug 18 20:10:29 2022 +0000
VirtualMachine{Flavor,ClusterFlavor}
are renamed to instancetype and VirtualMachine{Instancetype,ClusterInstancetype}
.virtctl expose
ip-family
parameter to be empty value instead of IPv4.VirtualMachine
defines any CPU
or Memory
resource requests.Released on: Thu Jul 14 16:33:25 2022 +0000
ControllerRevisions
of any VirtualMachineFlavorSpec
or VirtualMachinePreferenceSpec
are stored during the initial start of a VirtualMachine
and used for subsequent restarts ensuring changes to the original VirtualMachineFlavor
or VirtualMachinePreference
do not modify the VirtualMachine
and the VirtualMachineInstance
it creates.make generate
to fail when API code comments contain backticks. (#7844, @janeczku)VirtualMachineInstance
at runtime.Released on: Wed Jun 8 14:15:43 2022 +0000
nil
values) of Address
and Driver
fields in XML will be omitted.virtualmachines/migrate
subresource to admin/edit usersDisk
or Filesystem
for each Volume
associated with a VirtualMachine
has been removed. Any Volumes
without a Disk
or Filesystem
defined will have a Disk
defined within the VirtualMachineInstance
at runtime.Released on: Tue May 17 14:55:54 2022 +0000
Released on: Mon May 9 14:02:20 2022 +0000
virtctl scp
to ease copying files from and to VMs and VMIsLiveMigrate
as a workload-update strategy if the LiveMigration
feature gate is not enabled.virtctl ssh
Released on: Fri Apr 8 16:17:56 2022 +0000
KubeVirtComponentExceedsRequestedMemory
alert complaining about many-to-many matching not allowed.--address [ip_address]
when using virtctl vnc
rather than only using 127.0.0.1kubectl logs <vmi-pod>
and kubectl exec <vmi-pod>
.Released on: Tue Mar 8 21:06:59 2022 +0000
Released on: Wed Feb 9 18:01:08 2022 +0000
time.Ticker
in agent poller and fix default values for qemu-*-interval
flagsmigrate_cancel
was added to virtctl. It cancels an active VM migration.Released on: Tue Jan 11 17:27:09 2022 +0000
virtctl
exposed services IPFamilyPolicyType
default to IPFamilyPolicyPreferDualStack
make
and make test
Released on: Wed Dec 15 15:11:55 2021 +0000
Released on: Mon Dec 6 18:26:51 2021 +0000
Released on: Thu Nov 11 15:52:59 2021 +0000
Released on: Tue Oct 19 15:41:10 2021 +0000
Released on: Fri Oct 8 21:12:33 2021 +0000
ssh
command to virtctl
that can be used to open SSH sessions to VMs/VMIs.
Released on: Tue Oct 19 15:39:42 2021 +0000
Released on: Wed Sep 8 13:56:47 2021 +0000
Released on: Tue Oct 19 15:38:22 2021 +0000
Released on: Thu Oct 7 12:55:34 2021 +0000
Released on: Thu Aug 12 12:28:02 2021 +0000
Released on: Mon Aug 9 14:20:14 2021 +0000
/portforward
subresource to VirtualMachine
and VirtualMachineInstance
that can tunnel TCP traffic through the API Server using a websocket stream.guestfs
to virtctl--force --gracePeriod 0
Released on: Tue Oct 19 15:36:32 2021 +0000
Released on: Fri Jul 9 15:46:22 2021 +0000
spec.migrations.disableTLS
to the KubeVirt CR to allow disabling encrypted migrations. They stay secure by default.LifeMigrate
and request the invtsc
cpuflag are now live-migrateablemain
for kubevirt/kubevirt
repositoryNotReady
after migration when Istio is used.virtctl start --paused
Released on: Tue Oct 19 15:34:37 2021 +0000
Released on: Thu Jun 10 01:31:52 2021 +0000
Released on: Tue Jun 8 12:09:49 2021 +0000
Released on: Tue Oct 19 15:31:59 2021 +0000
Released on: Thu Aug 12 16:35:43 2021 +0000
--force --gracePeriod 0
Released on: Wed Jul 28 12:13:19 2021 -0400
"},{"location":"release_notes/#v0411","title":"v0.41.1","text":"Released on: Wed Jul 28 12:08:42 2021 -0400
"},{"location":"release_notes/#v0410","title":"v0.41.0","text":"Released on: Wed May 12 14:30:49 2021 +0000
docker save
and docker push
issues with released kubevirt imagesvmIPv6NetworkCIDR
under NetworkSource.pod
to support custom IPv6 CIDR for the vm network when using masquerade binding.
Released on: Tue Oct 19 13:33:33 2021 +0000
docker save
issues with kubevirt images
Released on: Mon Apr 19 12:25:41 2021 +0000
permittedHostDevices
section will now remove all user-defined host device plugins.
Released on: Tue Oct 19 13:29:33 2021 +0000
docker save
issues with kubevirt images
Released on: Tue Apr 13 12:10:13 2021 +0000
"},{"location":"release_notes/#v0390","title":"v0.39.0","text":"Released on: Wed Mar 10 14:51:58 2021 +0000
CHECK
RPC call, will not cause VMI pods to enter a failed state.
Released on: Tue Oct 19 13:24:57 2021 +0000
Released on: Mon Feb 8 19:00:24 2021 +0000
Released on: Mon Feb 8 13:15:32 2021 +0000
Released on: Wed Jan 27 17:49:36 2021 +0000
Released on: Thu Jan 21 16:20:52 2021 +0000
Released on: Mon Jan 18 17:57:03 2021 +0000
Released on: Mon Feb 22 10:20:40 2021 -0500
"},{"location":"release_notes/#v0361","title":"v0.36.1","text":"Released on: Tue Jan 19 12:30:33 2021 +0100
"},{"location":"release_notes/#v0360","title":"v0.36.0","text":"Released on: Wed Dec 16 14:30:37 2020 +0000
domain
label removed from metric kubevirt_vmi_memory_unused_bytes
Released on: Mon Nov 9 13:08:27 2020 +0000
ip-family
to the virtctl expose
command.
virt-launcher
Pods to speed up Pod instantiation and decrease Kubelet load in namespaces with many services.
kubectl explain
for Kubevirt resources.
Released on: Tue Nov 17 08:13:22 2020 -0500
"},{"location":"release_notes/#v0341","title":"v0.34.1","text":"Released on: Mon Nov 16 08:22:56 2020 -0500
"},{"location":"release_notes/#v0340","title":"v0.34.0","text":"Released on: Wed Oct 7 13:59:50 2020 +0300
bootOrder
will no longer be candidates for boot when using the BIOS bootloader, as documented
configuration
key. The usage of the kubevirt-config configMap will be deprecated in the future.
customizeComponents
to the kubevirt api
Released on: Tue Sep 15 14:46:00 2020 +0000
Released on: Tue Aug 11 19:21:56 2020 +0000
Released on: Thu Jul 9 16:08:18 2020 +0300
Released on: Mon Oct 26 11:57:21 2020 -0400
"},{"location":"release_notes/#v0306","title":"v0.30.6","text":"Released on: Wed Aug 12 10:55:31 2020 +0200
"},{"location":"release_notes/#v0305","title":"v0.30.5","text":"Released on: Fri Jul 17 05:26:37 2020 -0400
"},{"location":"release_notes/#v0304","title":"v0.30.4","text":"Released on: Fri Jul 10 07:44:00 2020 -0400
"},{"location":"release_notes/#v0303","title":"v0.30.3","text":"Released on: Tue Jun 30 17:39:42 2020 -0400
"},{"location":"release_notes/#v0302","title":"v0.30.2","text":"Released on: Thu Jun 25 17:05:59 2020 -0400
"},{"location":"release_notes/#v0301","title":"v0.30.1","text":"Released on: Tue Jun 16 13:10:17 2020 -0400
"},{"location":"release_notes/#v0300","title":"v0.30.0","text":"Released on: Fri Jun 5 12:19:57 2020 +0200
Released on: Mon May 25 21:15:30 2020 +0200
"},{"location":"release_notes/#v0291","title":"v0.29.1","text":"Released on: Tue May 19 10:03:27 2020 +0200
"},{"location":"release_notes/#v0290","title":"v0.29.0","text":"Released on: Wed May 6 15:01:57 2020 +0200
Released on: Thu Apr 9 23:01:29 2020 +0200
Released on: Fri Mar 6 22:40:34 2020 +0100
Released on: Tue Apr 14 15:07:04 2020 -0400
"},{"location":"release_notes/#v0264","title":"v0.26.4","text":"Released on: Mon Mar 30 03:43:48 2020 +0200
"},{"location":"release_notes/#v0263","title":"v0.26.3","text":"Released on: Tue Mar 10 08:57:27 2020 -0400
"},{"location":"release_notes/#v0262","title":"v0.26.2","text":"Released on: Tue Mar 3 12:31:56 2020 -0500
"},{"location":"release_notes/#v0261","title":"v0.26.1","text":"Released on: Fri Feb 14 20:42:46 2020 +0100
"},{"location":"release_notes/#v0260","title":"v0.26.0","text":"Released on: Fri Feb 7 09:40:07 2020 +0100
Released on: Mon Jan 13 20:37:15 2020 +0100
Released on: Tue Dec 3 15:34:34 2019 +0100
Released on: Tue Jan 21 13:17:20 2020 -0500
"},{"location":"release_notes/#v0232","title":"v0.23.2","text":"Released on: Fri Jan 10 10:36:36 2020 -0500
"},{"location":"release_notes/#v0231","title":"v0.23.1","text":"Released on: Thu Nov 28 09:36:41 2019 +0100
"},{"location":"release_notes/#v0230","title":"v0.23.0","text":"Released on: Mon Nov 4 16:42:54 2019 +0100
Released on: Thu Oct 10 18:55:08 2019 +0200
Released on: Mon Sep 9 09:59:08 2019 +0200
virtctl migrate
Released on: Thu Oct 3 12:03:40 2019 +0200
"},{"location":"release_notes/#v0207","title":"v0.20.7","text":"Released on: Fri Sep 27 15:21:56 2019 +0200
"},{"location":"release_notes/#v0206","title":"v0.20.6","text":"Released on: Wed Sep 11 06:09:47 2019 -0400
"},{"location":"release_notes/#v0205","title":"v0.20.5","text":"Released on: Thu Sep 5 17:48:59 2019 +0200
"},{"location":"release_notes/#v0204","title":"v0.20.4","text":"Released on: Mon Sep 2 18:55:35 2019 +0200
"},{"location":"release_notes/#v0203","title":"v0.20.3","text":"Released on: Tue Aug 27 16:58:15 2019 +0200
"},{"location":"release_notes/#v0202","title":"v0.20.2","text":"Released on: Tue Aug 20 15:51:07 2019 +0200
"},{"location":"release_notes/#v0201","title":"v0.20.1","text":"Released on: Fri Aug 9 19:48:17 2019 +0200
virtctl
by using the basename of the call, this enables nicer output when installed via krew plugin package manager
kubevirt_vm_
to kubevirt_vmi_
to better reflect their purpose
Released on: Fri Aug 9 16:42:41 2019 +0200
virtctl
by using the basename of the call, this enables nicer output when installed via krew plugin package manager
kubevirt_vm_
to kubevirt_vmi_
to better reflect their purpose
Released on: Fri Jul 5 12:52:16 2019 +0200
Released on: Thu Jun 13 12:00:56 2019 +0200
"},{"location":"release_notes/#v0180","title":"v0.18.0","text":"Released on: Wed Jun 5 22:25:09 2019 +0200
Released on: Tue Jun 25 07:49:12 2019 -0400
"},{"location":"release_notes/#v0173","title":"v0.17.3","text":"Released on: Wed Jun 19 12:00:45 2019 -0400
"},{"location":"release_notes/#v0172","title":"v0.17.2","text":"Released on: Wed Jun 5 08:12:04 2019 -0400
"},{"location":"release_notes/#v0171","title":"v0.17.1","text":"Released on: Tue Jun 4 14:41:10 2019 -0400
"},{"location":"release_notes/#v0170","title":"v0.17.0","text":"Released on: Mon May 6 16:18:01 2019 +0200
Released on: Thu May 2 23:51:08 2019 +0200
"},{"location":"release_notes/#v0162","title":"v0.16.2","text":"Released on: Fri Apr 26 12:24:33 2019 +0200
"},{"location":"release_notes/#v0161","title":"v0.16.1","text":"Released on: Tue Apr 23 19:31:19 2019 +0200
"},{"location":"release_notes/#v0160","title":"v0.16.0","text":"Released on: Fri Apr 5 23:18:22 2019 +0200
Released on: Tue Mar 5 10:35:08 2019 +0100
Released on: Mon Feb 4 22:04:14 2019 +0100
Released on: Mon Oct 28 17:02:35 2019 -0400
"},{"location":"release_notes/#v0136","title":"v0.13.6","text":"Released on: Wed Sep 25 17:19:44 2019 +0200
"},{"location":"release_notes/#v0135","title":"v0.13.5","text":"Released on: Thu Aug 1 11:25:00 2019 -0400
"},{"location":"release_notes/#v0134","title":"v0.13.4","text":"Released on: Thu Aug 1 09:52:35 2019 -0400
"},{"location":"release_notes/#v0133","title":"v0.13.3","text":"Released on: Mon Feb 4 15:46:48 2019 -0500
"},{"location":"release_notes/#v0132","title":"v0.13.2","text":"Released on: Thu Jan 24 23:24:06 2019 +0100
"},{"location":"release_notes/#v0131","title":"v0.13.1","text":"Released on: Thu Jan 24 11:16:20 2019 +0100
"},{"location":"release_notes/#v0130","title":"v0.13.0","text":"Released on: Tue Jan 15 08:26:25 2019 +0100
Released on: Fri Jan 11 22:22:02 2019 +0100
Released on: Thu Dec 13 10:21:56 2018 +0200
"},{"location":"release_notes/#v0110","title":"v0.11.0","text":"Released on: Thu Dec 6 10:15:51 2018 +0100
Released on: Thu Nov 8 15:21:34 2018 +0100
Released on: Thu Nov 22 17:14:18 2018 +0100
"},{"location":"release_notes/#v095","title":"v0.9.5","text":"Released on: Thu Nov 8 09:57:48 2018 +0100
"},{"location":"release_notes/#v094","title":"v0.9.4","text":"Released on: Wed Nov 7 08:22:14 2018 -0500
"},{"location":"release_notes/#v093","title":"v0.9.3","text":"Released on: Mon Oct 22 09:04:02 2018 -0400
"},{"location":"release_notes/#v092","title":"v0.9.2","text":"Released on: Thu Oct 18 12:14:09 2018 +0200
"},{"location":"release_notes/#v091","title":"v0.9.1","text":"Released on: Fri Oct 5 09:01:51 2018 +0200
"},{"location":"release_notes/#v090","title":"v0.9.0","text":"Released on: Thu Oct 4 14:42:28 2018 +0200
Released on: Thu Sep 6 14:25:22 2018 +0200
Released on: Wed Jul 4 17:41:33 2018 +0200
Released on: Tue Aug 21 17:29:28 2018 +0300
"},{"location":"release_notes/#v063","title":"v0.6.3","text":"Released on: Mon Jul 30 16:14:22 2018 +0200
"},{"location":"release_notes/#v062","title":"v0.6.2","text":"Released on: Wed Jul 4 17:49:37 2018 +0200
Released on: Mon Jun 18 17:07:48 2018 -0400
"},{"location":"release_notes/#v060","title":"v0.6.0","text":"Released on: Mon Jun 11 09:30:28 2018 +0200
Released on: Fri May 4 18:25:32 2018 +0200
Released on: Thu Apr 12 11:46:09 2018 +0200
Released on: Fri Apr 6 16:40:31 2018 +0200
Released on: Thu Mar 8 10:21:57 2018 +0100
Released on: Fri Jan 5 16:30:45 2018 +0100
Released on: Fri Dec 8 20:43:06 2017 +0100
Released on: Tue Nov 7 11:51:45 2017 +0100
Released on: Fri Oct 6 10:21:16 2017 +0200
Released on: Mon Sep 4 21:12:46 2017 +0200
virtctl
KubeVirt has a set of features that are not mature enough to be enabled by default. As such, they are protected by a Kubernetes concept called feature gates.
"},{"location":"cluster_admin/activating_feature_gates/#how-to-activate-a-feature-gate","title":"How to activate a feature gate","text":"You can activate a specific feature gate directly in KubeVirt's CR, by provisioning the following yaml, which uses the LiveMigration
feature gate as an example:
cat << END > enable-feature-gate.yaml\n---\napiVersion: kubevirt.io/v1\nkind: KubeVirt\nmetadata:\n name: kubevirt\n namespace: kubevirt\nspec:\n configuration:\n developerConfiguration: \n featureGates:\n - LiveMigration\nEND\n\nkubectl apply -f enable-feature-gate.yaml\n
Alternatively, the existing kubevirt CR can be altered:
kubectl edit kubevirt kubevirt -n kubevirt\n
...\nspec:\n configuration:\n developerConfiguration:\n featureGates:\n - DataVolumes\n - LiveMigration\n
Note: feature gate names are case sensitive.
The snippet above assumes KubeVirt is installed in the kubevirt
namespace. Change the namespace to suit your installation.
The list of feature gates (which evolves over time) can be checked directly from the source code.
"},{"location":"cluster_admin/annotations_and_labels/","title":"Annotations and labels","text":"KubeVirt builds on and exposes a number of labels and annotations that either are used for internal implementation needs or expose useful information to API users. This page documents the labels and annotations that may be useful for regular API consumers. This page intentionally does not list labels and annotations that are merely part of internal implementation.
Note: Annotations and labels that are not specific to KubeVirt are also documented here.
"},{"location":"cluster_admin/annotations_and_labels/#kubevirtio","title":"kubevirt.io","text":"Example: kubevirt.io=virt-launcher
Used on: Pod
This label marks resources that belong to KubeVirt. An optional value may indicate which specific KubeVirt component a resource belongs to. This label may be used to list all resources that belong to KubeVirt, for example, to uninstall it from a cluster.
"},{"location":"cluster_admin/annotations_and_labels/#kubevirtioschedulable","title":"kubevirt.io/schedulable","text":"Example: kubevirt.io/schedulable=true
Used on: Node
This label declares whether a particular node is available for scheduling virtual machine instances on it.
"},{"location":"cluster_admin/annotations_and_labels/#kubevirtioheartbeat","title":"kubevirt.io/heartbeat","text":"Example: kubevirt.io/heartbeat=2018-07-03T20:07:25Z
Used on: Node
This annotation is regularly updated by virt-handler to help determine if a particular node is alive and hence should be available for new virtual machine instance scheduling.
"},{"location":"cluster_admin/api_validation/","title":"API Validation","text":"The KubeVirt VirtualMachineInstance API is implemented using a Kubernetes Custom Resource Definition (CRD). Because of this, KubeVirt is able to leverage a couple of features Kubernetes provides in order to perform validation checks on our API as objects are created and updated on the cluster.
"},{"location":"cluster_admin/api_validation/#how-api-validation-works","title":"How API Validation Works","text":""},{"location":"cluster_admin/api_validation/#crd-openapiv3-schema","title":"CRD OpenAPIv3 Schema","text":"The KubeVirt API is registered with Kubernetes at install time through a series of CRD definitions. KubeVirt includes an OpenAPIv3 schema in these definitions which indicates to the Kubernetes Apiserver some very basic information about our API, such as what fields are required and what type of data is expected for each value.
This OpenAPIv3 schema validation is installed automatically and requires no thought on the user's part to enable.
"},{"location":"cluster_admin/api_validation/#admission-control-webhooks","title":"Admission Control Webhooks","text":"The OpenAPIv3 schema validation is limited. It only validates that the general structure of a KubeVirt object looks correct. It does not, however, verify that the contents of that object make sense.
With OpenAPIv3 validation alone, users can easily make simple mistakes (like not referencing a volume's name correctly with a disk) and the cluster will still accept the object. However, the VirtualMachineInstance will of course not start if these errors in the API exist. Ideally we'd like to catch configuration issues as early as possible and not allow an object to even be posted to the cluster if we can detect there's a problem with the object's Spec.
In order to perform this advanced validation, KubeVirt implements its own admission controller, which is registered with Kubernetes as an admission controller webhook at install time. As KubeVirt objects are posted to the cluster, the Kubernetes API server forwards creation requests to our webhook for validation before persisting the object into storage.
Note, however, that the KubeVirt admission controller only works if the admission webhook features described below are enabled on the cluster.
"},{"location":"cluster_admin/api_validation/#enabling-kubevirt-admission-controller-on-kubernetes","title":"Enabling KubeVirt Admission Controller on Kubernetes","text":"When provisioning a new Kubernetes cluster, ensure that both the MutatingAdmissionWebhook and ValidatingAdmissionWebhook values are present in the Apiserver's --admission-control CLI argument.
Below is an example of the --admission-control values we use during development:
--admission-control='Initializers,NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,DefaultTolerationSeconds,NodeRestriction,MutatingAdmissionWebhook,ValidatingAdmissionWebhook,ResourceQuota'\n
Note that the old --admission-control flag was deprecated in 1.10 and replaced with --enable-admission-plugins. MutatingAdmissionWebhook and ValidatingAdmissionWebhook are enabled by default.
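As a rough sketch of where the newer flag could be set explicitly, the fragment below assumes a control plane bootstrapped with kubeadm and its v1beta3 ClusterConfiguration format; neither is mentioned in the text above. Because both webhook plugins are enabled by default, listing them is normally unnecessary, and the fragment only shows where such a setting would live.

```yaml
# Assumption: kubeadm-managed control plane using the kubeadm.k8s.io/v1beta3 config format.
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
apiServer:
  extraArgs:
    # Plugins named here are enabled in addition to the API server defaults.
    enable-admission-plugins: NodeRestriction,MutatingAdmissionWebhook,ValidatingAdmissionWebhook
```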
"},{"location":"cluster_admin/api_validation/#enabling-kubevirt-admission-controller-on-okd","title":"Enabling KubeVirt Admission Controller on OKD","text":"OKD also requires the admission control webhooks to be enabled at install time. The process is slightly different though. With OKD, we enable webhooks using an admission plugin.
These admission control plugins can be configured in openshift-ansible by setting the following value in the Ansible inventory file.
openshift_master_admission_plugin_config={\"ValidatingAdmissionWebhook\":{\"configuration\":{\"kind\": \"DefaultAdmissionConfig\",\"apiVersion\": \"v1\",\"disable\": false}},\"MutatingAdmissionWebhook\":{\"configuration\":{\"kind\": \"DefaultAdmissionConfig\",\"apiVersion\": \"v1\",\"disable\": false}}}\n
"},{"location":"cluster_admin/authorization/","title":"Authorization","text":"KubeVirt authorization is performed using Kubernetes's Role-Based Access Control (RBAC) system. RBAC allows cluster admins to grant access to cluster resources by binding RBAC roles to users.
For example, an admin creates an RBAC role that represents the permissions required to create a VirtualMachineInstance. The admin can then bind that role to users in order to grant them the permissions required to launch a VirtualMachineInstance.
With RBAC roles, admins can grant users targeted access to various KubeVirt features.
"},{"location":"cluster_admin/authorization/#kubevirt-default-rbac-clusterroles","title":"KubeVirt Default RBAC ClusterRoles","text":"KubeVirt comes with a set of predefined RBAC ClusterRoles that can be used to grant users permissions to access KubeVirt Resources.
"},{"location":"cluster_admin/authorization/#default-view-role","title":"Default View Role","text":"The kubevirt.io:view ClusterRole gives users permissions to view all KubeVirt resources in the cluster. The permissions to create, delete, modify or access any KubeVirt resources beyond viewing the resource's spec are not included in this role. This means a user with this role could see that a VirtualMachineInstance is running, but could neither shut it down nor gain access to it via console/VNC.
"},{"location":"cluster_admin/authorization/#default-edit-role","title":"Default Edit Role","text":"The kubevirt.io:edit ClusterRole gives users permissions to modify all KubeVirt resources in the cluster. For example, a user with this role can create new VirtualMachineInstances, delete VirtualMachineInstances, and gain access to both console and VNC.
"},{"location":"cluster_admin/authorization/#default-admin-role","title":"Default Admin Role","text":"The kubevirt.io:admin ClusterRole grants users full permissions to all KubeVirt resources, including the ability to delete collections of resources.
The admin role also grants users access to view and modify the KubeVirt runtime config. This config exists within the KubeVirt Custom Resource under the configuration
key in the namespace where the KubeVirt operator is running.
NOTE Users are only guaranteed the ability to modify the kubevirt runtime configuration if a ClusterRoleBinding is used. A RoleBinding will work to provide kubevirt CR access only if the RoleBinding targets the same namespace that the kubevirt CR exists in.
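The difference between the two binding types in the note above can be sketched as follows. The user names, binding names and the `team-a` namespace are hypothetical; only the `kubevirt.io:admin` and `kubevirt.io:edit` ClusterRole names come from this page.

```yaml
# Cluster-wide: the (hypothetical) user alice gets kubevirt.io:admin in every
# namespace, including the one that holds the KubeVirt CR.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kubevirt-admin-alice
subjects:
  - kind: User
    name: alice
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: kubevirt.io:admin
  apiGroup: rbac.authorization.k8s.io
---
# Namespace-scoped: the (hypothetical) user bob gets kubevirt.io:edit only
# inside the team-a namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: kubevirt-edit-bob
  namespace: team-a
subjects:
  - kind: User
    name: bob
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: kubevirt.io:edit
  apiGroup: rbac.authorization.k8s.io
```

A RoleBinding may reference a ClusterRole, as done here; the permissions are then limited to the namespace of the binding.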
"},{"location":"cluster_admin/authorization/#binding-default-clusterroles-to-users","title":"Binding Default ClusterRoles to Users","text":"The KubeVirt default ClusterRoles are granted to users by creating either a ClusterRoleBinding or RoleBinding object.
"},{"location":"cluster_admin/authorization/#binding-within-all-namespaces","title":"Binding within All Namespaces","text":"With a ClusterRoleBinding, users receive the permissions granted by the role across all namespaces.
"},{"location":"cluster_admin/authorization/#binding-within-single-namespace","title":"Binding within Single Namespace","text":"With a RoleBinding, users receive the permissions granted by the role only within a targeted namespace.
"},{"location":"cluster_admin/authorization/#extending-kubernetes-default-roles-with-kubevirt-permissions","title":"Extending Kubernetes Default Roles with KubeVirt permissions","text":"The aggregated ClusterRole Kubernetes feature facilitates combining multiple ClusterRoles into a single aggregated ClusterRole. This feature is commonly used to extend the default Kubernetes roles with permissions to access custom resources that do not exist in the Kubernetes core.
In order to extend the default Kubernetes roles to provide permission to access KubeVirt resources, we need to add the following labels to the KubeVirt ClusterRoles.
kubectl label clusterrole kubevirt.io:admin rbac.authorization.k8s.io/aggregate-to-admin=true\nkubectl label clusterrole kubevirt.io:edit rbac.authorization.k8s.io/aggregate-to-edit=true\nkubectl label clusterrole kubevirt.io:view rbac.authorization.k8s.io/aggregate-to-view=true\n
By adding these labels, any user with a RoleBinding or ClusterRoleBinding involving one of the default Kubernetes roles will automatically gain access to the equivalent KubeVirt roles as well.
More information about aggregated cluster roles can be found here
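The same aggregation mechanism can also be expressed declaratively. The sketch below is an illustration only: the role name is hypothetical, and the rule set is an arbitrary read-only subset chosen for the example.

```yaml
# Hypothetical ClusterRole whose rules are merged into the default
# Kubernetes "view" role through the aggregation label.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: vmi-view-extension
  labels:
    rbac.authorization.k8s.io/aggregate-to-view: "true"
rules:
  - apiGroups:
      - kubevirt.io
    resources:
      - virtualmachineinstances
    verbs:
      - get
      - list
      - watch
```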
"},{"location":"cluster_admin/authorization/#creating-custom-rbac-roles","title":"Creating Custom RBAC Roles","text":"If the default KubeVirt ClusterRoles are not expressive enough, admins can create their own custom RBAC roles to grant user access to KubeVirt resources. The creation of a RBAC role is inclusive only, meaning there's no way to deny access. Instead access is only granted.
Below is an example of what KubeVirt's default admin ClusterRole looks like. A custom RBAC role can be created by reducing the permissions in this example role.
apiVersion: rbac.authorization.k8s.io/v1\nkind: ClusterRole\nmetadata:\n name: my-custom-rbac-role\n labels:\n kubevirt.io: \"\"\nrules:\n - apiGroups:\n - subresources.kubevirt.io\n resources:\n - virtualmachineinstances/console\n - virtualmachineinstances/vnc\n verbs:\n - get\n - apiGroups:\n - kubevirt.io\n resources:\n - virtualmachineinstances\n - virtualmachines\n - virtualmachineinstancepresets\n - virtualmachineinstancereplicasets\n verbs:\n - get\n - delete\n - create\n - update\n - patch\n - list\n - watch\n - deletecollection\n
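One possible reduction of the role above, shown as a sketch: it keeps read-only access to VMs and VMIs plus serial console access, and drops VNC and every mutating verb. The role name is hypothetical.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: my-console-viewer-role   # hypothetical name
rules:
  # Serial console access only; the vnc subresource is left out on purpose.
  - apiGroups:
      - subresources.kubevirt.io
    resources:
      - virtualmachineinstances/console
    verbs:
      - get
  # Read-only access to the core KubeVirt objects.
  - apiGroups:
      - kubevirt.io
    resources:
      - virtualmachineinstances
      - virtualmachines
    verbs:
      - get
      - list
      - watch
```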
"},{"location":"cluster_admin/confidential_computing/","title":"Confidential computing","text":""},{"location":"cluster_admin/confidential_computing/#amd-secure-encrypted-virtualization-sev","title":"AMD Secure Encrypted Virtualization (SEV)","text":"FEATURE STATE: KubeVirt v0.49.0 (experimental support)
Secure Encrypted Virtualization (SEV) is a feature of AMD's EPYC CPUs that allows the memory of a virtual machine to be encrypted on the fly.
KubeVirt supports running confidential VMs on AMD EPYC hardware with the SEV feature.
"},{"location":"cluster_admin/confidential_computing/#preconditions","title":"Preconditions","text":"In order to run an SEV guest the following condition must be met:
WorkloadEncryptionSEV
feature gate must be enabled.
SEV memory encryption can be requested by setting the spec.domain.launchSecurity.sev
element in the VMI definition:
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n labels:\n special: vmi-fedora\n name: vmi-fedora\nspec:\n domain:\n launchSecurity:\n sev: {}\n firmware:\n bootloader:\n efi:\n secureBoot: false\n devices:\n disks:\n - disk:\n bus: virtio\n name: containerdisk\n - disk:\n bus: virtio\n name: cloudinitdisk\n rng: {}\n resources:\n requests:\n memory: 1024M\n terminationGracePeriodSeconds: 0\n volumes:\n - containerDisk:\n image: registry:5000/kubevirt/fedora-with-test-tooling-container-disk:devel\n name: containerdisk\n - cloudInitNoCloud:\n userData: |-\n #cloud-config\n password: fedora\n chpasswd: { expire: False }\n name: cloudinitdisk\n
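The feature gate named in the preconditions could be enabled in the KubeVirt CR along these lines. This is a sketch that assumes KubeVirt is installed in the `kubevirt` namespace with a CR named `kubevirt`; any feature gates already listed in an existing CR would need to be kept alongside it.

```yaml
apiVersion: kubevirt.io/v1
kind: KubeVirt
metadata:
  name: kubevirt
  namespace: kubevirt   # assumption: default installation namespace
spec:
  configuration:
    developerConfiguration:
      featureGates:
        - WorkloadEncryptionSEV
```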
"},{"location":"cluster_admin/confidential_computing/#current-limitations","title":"Current limitations","text":"If the patch created is invalid KubeVirt will not be able to update or deploy the system. This is intended for special use cases and should not be used unless you know what you are doing.
Valid resource types are: Deployment, DaemonSet, Service, ValidatingWebhookConfiguration, MutatingWebhookConfiguration, APIService, and CertificateSecret. More information can be found in the API spec.
Example customization patch:
---\napiVersion: kubevirt.io/v1\nkind: KubeVirt\nmetadata:\n name: kubevirt\n namespace: kubevirt\nspec:\n certificateRotateStrategy: {}\n configuration: {}\n customizeComponents:\n patches:\n - resourceType: Deployment\n resourceName: virt-controller\n patch: '[{\"op\": \"remove\", \"path\": \"/spec/template/spec/containers/0/livenessProbe\"}]'\n type: json\n - resourceType: Deployment\n resourceName: virt-controller\n patch: '{\"metadata\":{\"annotations\":{\"patch\": \"true\"}}}'\n type: strategic\n
The above example will update the virt-controller
deployment to have an annotation in its metadata that says patch: true
and will remove the livenessProbe from the container definition.
If the flags are invalid or become invalid on update, the component will not be able to run.
When flags are customized for a component, all of that component's default flags are removed and only the specified flags are used. The available resources to change the flags on are api
, controller
and handler
. You can find out more details about the API in the API spec.
apiVersion: kubevirt.io/v1\nkind: KubeVirt\nmetadata:\n name: kubevirt\n namespace: kubevirt\nspec:\n certificateRotateStrategy: {}\n configuration: {}\n customizeComponents:\n flags:\n api:\n v: \"5\"\n port: \"8443\"\n console-server-port: \"8186\"\n subresources-only: \"true\"\n
The above example would produce a virt-api
pod with the following command
...\nspec:\n ....\n containers:\n - name: virt-api\n command:\n - virt-api\n - --v\n - \"5\"\n - --console-server-port\n - \"8186\"\n - --port\n - \"8443\"\n - --subresources-only\n - \"true\"\n ...\n
"},{"location":"cluster_admin/device_status_on_Arm64/","title":"Device Status on Arm64","text":"This page is based on https://github.com/kubevirt/kubevirt/issues/8916
Devices | Description | Status on Arm64
DisableHotplug | | supported
Disks | sata/ virtio bus | support virtio bus
Watchdog | i6300esb | not supported
UseVirtioTransitional | virtio-transitional | supported
Interfaces | e1000/ virtio-net-device | support virtio-net-device
Inputs | tablet virtio/usb bus | supported
AutoattachPodInterface | connect to /net/tun (devices.kubevirt.io/tun) | supported
AutoattachGraphicsDevice | create a virtio-gpu device / vga device | support virtio-gpu
AutoattachMemBalloon | virtio-balloon-pci-non-transitional | supported
AutoattachInputDevice | auto add tablet | supported
Rng | virtio-rng-pci-non-transitional host:/dev/urandom | supported
BlockMultiQueue | \"driver\":\"virtio-blk-pci-non-transitional\",\"num-queues\":$cpu_number | supported
NetworkInterfaceMultiQueue | -netdev tap,fds=21:23:24:25,vhost=on,vhostfds=26:27:28:29,id=hostua-default #fd number equals to queue number | supported
GPUs | | not verified
Filesystems | virtiofs, vhost-user-fs-pci, need to enable featuregate: ExperimentalVirtiofsSupport | supported
ClientPassthrough | https://www.linaro.org/blog/kvm-pciemsi-passthrough-armarm64/ on x86_64, iommu need to be enabled | not verified
Sound | ich9/ ac97 | not supported
TPM | tpm-tis-device https://qemu.readthedocs.io/en/latest/specs/tpm.html | supported
Sriov | vfio-pci | not verified"},{"location":"cluster_admin/feature_gate_status_on_Arm64/","title":"Feature Gate Status on Arm64","text":"This page is based on https://github.com/kubevirt/kubevirt/issues/9749. It records the feature gate status on the Arm64 platform. Here is the explanation of the status:
-blockdev {\"driver\":\"file\",\"filename\":\"/var/run/kubevirt-private/downwardapi-disks/vhostmd0\",\"node-name\":\"libvirt-1-storage\",\"cache\":{\"direct\":true,\"no-flush\":false},\"auto-read-only\":true,\"discard\":\"unmap\"}
But unable to get information via vm-dump-metrics
:LIBMETRICS: read_mdisk(): Unable to read metrics disk
LIBMETRICS: get_virtio_metrics(): Unable to export metrics: open(/dev/virtio-ports/org.github.vhostmd.1) No such file or directory
LIBMETRICS: get_virtio_metrics(): Unable to read metrics
NonRootDeprecated | Supported |
NonRoot | Supported |
Root | Supported |
ClusterProfiler | Supported |
WorkloadEncryptionSEV | Not supported | SEV is only available on x86_64
VSOCKGate | Supported |
HotplugNetworkIfacesGate | Not supported yet | Need to setup multus-cni and multus-dynamic-networks-controller: https://github.com/k8snetworkplumbingwg/multus-cni cat ./deployments/multus-daemonset-thick.yml \\| kubectl apply -f -
https://github.com/k8snetworkplumbingwg/multus-dynamic-networks-controller kubectl apply -f manifests/dynamic-networks-controller.yaml
Currently, the image ghcr.io/k8snetworkplumbingwg/multus-cni:snapshot-thick does not support Arm64 server. For more information please refer to https://github.com/k8snetworkplumbingwg/multus-cni/pull/1027.
CommonInstancetypesDeploymentGate | Not supported yet | Support of common-instancetypes instancetypes needs to be tested, common-instancetypes preferences for ARM workloads are still missing"},{"location":"cluster_admin/gitops/","title":"Managing KubeVirt with GitOps","text":"The GitOps way uses Git repositories as a single source of truth to deliver infrastructure as code. Automation is employed to keep the desired and the live state of clusters in sync at all times. This means any change to a repository is automatically applied to one or more clusters while changes to a cluster will be automatically reverted to the state described in the single source of truth.
With GitOps the separation of testing and production environments, improving the availability of applications and working with multi-cluster environments becomes considerably easier.
"},{"location":"cluster_admin/gitops/#demo-repository","title":"Demo repository","text":"A demo with detailed explanation on how to manage KubeVirt with GitOps can be found here.
The demo is using Open Cluster Management and ArgoCD to deploy KubeVirt and virtual machines across multiple clusters.
"},{"location":"cluster_admin/installation/","title":"Installation","text":"KubeVirt is a virtualization add-on to Kubernetes and this guide assumes that a Kubernetes cluster is already installed.
If installed on OKD, the web console is extended for management of virtual machines.
"},{"location":"cluster_admin/installation/#requirements","title":"Requirements","text":"A few requirements need to be met before you can begin:
--allow-privileged=true
in order to run KubeVirt's privileged DaemonSet.
kubectl
client utility
KubeVirt is currently supported on the following container runtimes:
Other container runtimes, which do not use virtualization features, should work too. However, the mentioned ones are the main target.
"},{"location":"cluster_admin/installation/#integration-with-apparmor","title":"Integration with AppArmor","text":"In most of the scenarios, KubeVirt can run normally on systems with AppArmor. However, there are several known use cases that may require additional user interaction.
On a system with AppArmor enabled, the locally installed profiles may block the execution of the KubeVirt privileged containers. That usually results in initialization failure of the virt-handler
pod:
$ kubectl get pods -n kubevirt\nNAME READY STATUS RESTARTS AGE\nvirt-api-77df5c4f87-7mqv4 1/1 Running 1 (17m ago) 27m\nvirt-api-77df5c4f87-wcq44 1/1 Running 1 (17m ago) 27m\nvirt-controller-749d8d99d4-56gb7 1/1 Running 1 (17m ago) 27m\nvirt-controller-749d8d99d4-78j6x 1/1 Running 1 (17m ago) 27m\nvirt-handler-4w99d 0/1 Init:Error 14 (5m18s ago) 27m\nvirt-operator-564f568975-g9wh4 1/1 Running 1 (17m ago) 31m\nvirt-operator-564f568975-wnpz8 1/1 Running 1 (17m ago) 31m\n\n$ kubectl logs -n kubevirt virt-handler-4w99d virt-launcher\nerror: failed to get emulator capabilities\n\nerror: internal error: Failed to start QEMU binary /usr/libexec/qemu-kvm for probing: libvirt: error : cannot execute binary /usr/libexec/qemu-kvm: Permission denied\n\n$ journalctl -b | grep DEN\n...\nMay 18 16:44:20 debian audit[6316]: AVC apparmor=\"DENIED\" operation=\"exec\" profile=\"libvirtd\" name=\"/usr/libexec/qemu-kvm\" pid=6316 comm=\"rpc-worker\" requested_mask=\"x\" denied_mask=\"x\" fsuid=107 ouid=0\nMay 18 16:44:20 debian kernel: audit: type=1400 audit(1652888660.539:39): apparmor=\"DENIED\" operation=\"exec\" profile=\"libvirtd\" name=\"/usr/libexec/qemu-kvm\" pid=6316 comm=\"rpc-worker\" requested_mask=\"x\" denied_mask=\"x\" fsuid=107 ouid=0\n...\n
Here, the host AppArmor profile for libvirtd
does not allow the execution of the /usr/libexec/qemu-kvm
binary. In the future this will hopefully work out of the box (tracking issue), but until then there are a couple of possible workarounds.
The first (and simplest) one is to remove the libvirt package from the host: assuming the host is a dedicated Kubernetes node, you likely won't need it anyway.
If you actually need libvirt to be present on the host, then you can add the following rule to the AppArmor profile for libvirtd (usually /etc/apparmor.d/usr.sbin.libvirtd
):
# vim /etc/apparmor.d/usr.sbin.libvirtd\n...\n/usr/libexec/qemu-kvm PUx,\n...\n# apparmor_parser -r /etc/apparmor.d/usr.sbin.libvirtd # or systemctl reload apparmor.service\n
The default AppArmor profile used by the container runtimes usually denies mount
calls made by workloads. That may prevent VMs with VirtIO-FS from running. This is a known issue. The current workaround is to run such a VM as unconfined
by adding the following annotation to the VM or VMI object:
annotations:\n  container.apparmor.security.beta.kubernetes.io/compute: unconfined\n
Hardware with virtualization support is recommended. You can use virt-host-validate to ensure that your hosts are capable of running virtualization workloads:
$ virt-host-validate qemu\n QEMU: Checking for hardware virtualization : PASS\n QEMU: Checking if device /dev/kvm exists : PASS\n QEMU: Checking if device /dev/kvm is accessible : PASS\n QEMU: Checking if device /dev/vhost-net exists : PASS\n QEMU: Checking if device /dev/net/tun exists : PASS\n...\n
"},{"location":"cluster_admin/installation/#selinux-support","title":"SELinux support","text":"SELinux-enabled nodes need Container-selinux installed. The minimum version is documented inside the kubevirt/kubevirt repository, in docs/getting-started.md, under \"SELinux support\".
For (older) release branches that don't specify a container-selinux version, version 2.170.0 or newer is recommended.
"},{"location":"cluster_admin/installation/#installing-kubevirt-on-kubernetes","title":"Installing KubeVirt on Kubernetes","text":"KubeVirt can be installed using the KubeVirt operator, which manages the lifecycle of all the KubeVirt core components. Below is an example of how to install KubeVirt's latest official release. Deployment is supported on both x86_64 and Arm64 platforms.
# Point at latest release\n$ export RELEASE=$(curl https://storage.googleapis.com/kubevirt-prow/release/kubevirt/kubevirt/stable.txt)\n# Deploy the KubeVirt operator\n$ kubectl apply -f https://github.com/kubevirt/kubevirt/releases/download/${RELEASE}/kubevirt-operator.yaml\n# Create the KubeVirt CR (instance deployment request) which triggers the actual installation\n$ kubectl apply -f https://github.com/kubevirt/kubevirt/releases/download/${RELEASE}/kubevirt-cr.yaml\n# wait until all KubeVirt components are up\n$ kubectl -n kubevirt wait kv kubevirt --for condition=Available\n
If hardware virtualization is not available, then a software emulation fallback can be enabled by setting the KubeVirt CR field spec.configuration.developerConfiguration.useEmulation
to true
as follows:
$ kubectl edit -n kubevirt kubevirt kubevirt\n
Add the following to the kubevirt.yaml
file
spec:\n  ...\n  configuration:\n    developerConfiguration:\n      useEmulation: true\n
Note: Prior to release v0.20.0 the condition for the kubectl wait
command was named \"Ready\" instead of \"Available\"
Note: Prior to KubeVirt 0.34.2 a ConfigMap called kubevirt-config
in the install-namespace was used to configure KubeVirt. Since 0.34.2 this method is deprecated. If the ConfigMap exists, it still takes precedence over configuration on the CR, but it will not receive future updates and you should migrate any custom configurations to spec.configuration
on the KubeVirt CR.
All new components will be deployed under the kubevirt
namespace:
$ kubectl get pods -n kubevirt\nNAME READY STATUS RESTARTS AGE\nvirt-api-6d4fc3cf8a-b2ere 1/1 Running 0 1m\nvirt-controller-5d9fc8cf8b-n5trt 1/1 Running 0 1m\nvirt-handler-vwdjx 1/1 Running 0 1m\n...\n
"},{"location":"cluster_admin/installation/#installing-kubevirt-on-okd","title":"Installing KubeVirt on OKD","text":"The following SCC needs to be added prior to KubeVirt deployment:
$ oc adm policy add-scc-to-user privileged -n kubevirt -z kubevirt-operator\n
Once privileges are granted, KubeVirt can be deployed as described above.
"},{"location":"cluster_admin/installation/#web-user-interface-on-okd","title":"Web user interface on OKD","text":"No additional steps are required to extend OKD's web console for KubeVirt.
The virtualization extension is automatically enabled when a KubeVirt deployment is detected.
"},{"location":"cluster_admin/installation/#from-service-catalog-as-an-apb","title":"From Service Catalog as an APB","text":"You can find KubeVirt in the OKD Service Catalog and install it from there. In order to do that please follow the documentation in the KubeVirt APB repository.
"},{"location":"cluster_admin/installation/#installing-kubevirt-on-k3os","title":"Installing KubeVirt on k3OS","text":"The following configuration needs to be added to all nodes prior to KubeVirt deployment:
k3os:\n  modules:\n  - kvm\n  - vhost_net\n
Once nodes are restarted with this configuration, KubeVirt can be deployed as described above.
"},{"location":"cluster_admin/installation/#installing-the-daily-developer-builds","title":"Installing the Daily Developer Builds","text":"KubeVirt publishes a developer build daily from the current main branch. You can see when the last build was published by looking at our nightly-build-jobs.
To install the latest developer build, run the following commands:
$ LATEST=$(curl -L https://storage.googleapis.com/kubevirt-prow/devel/nightly/release/kubevirt/kubevirt/latest)\n$ kubectl apply -f https://storage.googleapis.com/kubevirt-prow/devel/nightly/release/kubevirt/kubevirt/${LATEST}/kubevirt-operator.yaml\n$ kubectl apply -f https://storage.googleapis.com/kubevirt-prow/devel/nightly/release/kubevirt/kubevirt/${LATEST}/kubevirt-cr.yaml\n
To find out which commit this build is based on, run:
$ LATEST=$(curl -L https://storage.googleapis.com/kubevirt-prow/devel/nightly/release/kubevirt/kubevirt/latest)\n$ curl https://storage.googleapis.com/kubevirt-prow/devel/nightly/release/kubevirt/kubevirt/${LATEST}/commit\nd358cf085b5a86cc4fa516215f8b757a4e61def2\n
"},{"location":"cluster_admin/installation/#arm64-developer-builds","title":"ARM64 developer builds","text":"ARM64 developer builds can be installed like this:
$ LATEST=$(curl -L https://storage.googleapis.com/kubevirt-prow/devel/nightly/release/kubevirt/kubevirt/latest-arm64)\n$ kubectl apply -f https://storage.googleapis.com/kubevirt-prow/devel/nightly/release/kubevirt/kubevirt/${LATEST}/kubevirt-operator-arm64.yaml\n$ kubectl apply -f https://storage.googleapis.com/kubevirt-prow/devel/nightly/release/kubevirt/kubevirt/${LATEST}/kubevirt-cr-arm64.yaml\n
"},{"location":"cluster_admin/installation/#deploying-from-source","title":"Deploying from Source","text":"See the Developer Getting Started Guide to understand how to build and deploy KubeVirt from source.
"},{"location":"cluster_admin/installation/#installing-network-plugins-optional","title":"Installing network plugins (optional)","text":"KubeVirt alone does not bring any additional network plugins; it only allows users to utilize them. If you want to attach your VMs to multiple networks (Multus CNI) or have full control over L2 (OVS CNI), you need to deploy the respective network plugins. For more information, refer to the OVS CNI installation guide.
Note: KubeVirt Ansible network playbook installs these plugins by default.
"},{"location":"cluster_admin/installation/#restricting-kubevirt-components-node-placement","title":"Restricting KubeVirt components node placement","text":"You can restrict the placement of the KubeVirt components across your cluster nodes by editing the KubeVirt CR:
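The placement of the infrastructure components (virt-controller and virt-api) is governed by the .spec.infra.nodePlacement field in the KubeVirt CR. The placement of the virt-handler pods is governed by the .spec.workloads.nodePlacement field in the KubeVirt CR.
For each of these .nodePlacement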
objects, the .affinity
, .nodeSelector
and .tolerations
sub-fields can be configured. See the description in the API reference for further information about using these fields.
For example, to restrict the virt-controller and virt-api pods to only run on the control-plane nodes:
kubectl patch -n kubevirt kubevirt kubevirt --type merge --patch '{\"spec\": {\"infra\": {\"nodePlacement\": {\"nodeSelector\": {\"node-role.kubernetes.io/control-plane\": \"\"}}}}}'\n
To restrict the virt-handler pods to only run on nodes with the \"region=primary\" label:
kubectl patch -n kubevirt kubevirt kubevirt --type merge --patch '{\"spec\": {\"workloads\": {\"nodePlacement\": {\"nodeSelector\": {\"region\": \"primary\"}}}}}'\n
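The two kubectl patch commands above pass a JSON merge patch that differs only in the component group and the node selector. The following is a minimal Python sketch of building that patch string. The helper name node_placement_patch is hypothetical and is not part of KubeVirt or kubectl; it only assembles the same JSON document shown in the commands.

```python
import json


def node_placement_patch(component_group, node_selector):
    # component_group is one of the two KubeVirt CR fields named above:
    # 'infra' (virt-controller, virt-api) or 'workloads' (virt-handler).
    if component_group not in ('infra', 'workloads'):
        raise ValueError('component_group must be infra or workloads')
    patch = {
        'spec': {
            component_group: {
                'nodePlacement': {'nodeSelector': node_selector}
            }
        }
    }
    # The resulting string can be passed to: kubectl patch ... --type merge --patch
    return json.dumps(patch)


# Usage: build the patch used in the region=primary example.
print(node_placement_patch('workloads', {'region': 'primary'}))
```

The printed string can be passed as the --patch argument of the commands above.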
"},{"location":"cluster_admin/ksm/","title":"KSM Management","text":"Kernel Samepage Merging (KSM) allows de-duplication of memory. KSM tries to find identical Memory Pages and merge those to free memory.
Further Information: - KSM (Kernel Samepage Merging) feature - Kernel Same-page Merging (KSM)
"},{"location":"cluster_admin/ksm/#enabling-ksm-through-kubevirt-cr","title":"Enabling KSM through KubeVirt CR","text":"KSM can be enabled on nodes by setting spec.configuration.ksmConfiguration
in the KubeVirt CR. ksmConfiguration
instructs on which nodes KSM will be enabled, exposing a nodeLabelSelector
. nodeLabelSelector
is a LabelSelector and defines the filter, based on the node labels. If a node's labels match the label selector term, then on that node, KSM will be enabled.
NOTE If nodeLabelSelector
is nil KSM will not be enabled on any nodes. Empty nodeLabelSelector
will enable KSM on every node.
Enabling KSM on nodes in which the hostname is node01
or node03
:
spec:\n  configuration:\n    ksmConfiguration:\n      nodeLabelSelector:\n        matchExpressions:\n          - key: kubernetes.io/hostname\n            operator: In\n            values:\n              - node01\n              - node03\n
Enabling KSM on nodes with labels kubevirt.io/first-label: true
, kubevirt.io/second-label: true
:
spec:\n  configuration:\n    ksmConfiguration:\n      nodeLabelSelector:\n        matchLabels:\n          kubevirt.io/first-label: \"true\"\n          kubevirt.io/second-label: \"true\"\n
Enabling KSM on every node:
spec:\n  configuration:\n    ksmConfiguration:\n      nodeLabelSelector: {}\n
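The selector rules described above (nil enables KSM nowhere, an empty selector enables it everywhere, otherwise all terms must match) can be sketched in Python as follows. This is an illustrative model only, not KubeVirt's implementation: the function name selector_matches is hypothetical, and only the In operator used in the example is modelled.

```python
def selector_matches(selector, node_labels):
    # None models a nil nodeLabelSelector: KSM is not enabled on any node.
    if selector is None:
        return False
    # matchLabels: every key/value pair must be present on the node.
    for key, value in selector.get('matchLabels', {}).items():
        if node_labels.get(key) != value:
            return False
    # matchExpressions: only the In operator from the example is modelled.
    for expr in selector.get('matchExpressions', []):
        if expr['operator'] != 'In':
            raise NotImplementedError(expr['operator'])
        if node_labels.get(expr['key']) not in expr['values']:
            return False
    # An empty selector has no requirements, so it matches every node.
    return True


hostname_selector = {
    'matchExpressions': [
        {'key': 'kubernetes.io/hostname', 'operator': 'In',
         'values': ['node01', 'node03']},
    ]
}
print(selector_matches(hostname_selector, {'kubernetes.io/hostname': 'node01'}))
```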
On those nodes where KubeVirt enables the KSM via configuration, an annotation will be added (kubevirt.io/ksm-handler-managed
). This annotation is an internal record to keep track of which nodes are currently managed by virt-handler, so that it is possible to distinguish which nodes should be restored in case of future ksmConfiguration changes.
Let's imagine this scenario:
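One node (node01) has KSM enabled externally, outside of KubeVirt. An administrator then adds a ksmConfiguration to the KubeVirt CR that matches node02 and node03, and later removes it again.
Thanks to the annotation, virt-handler is able to disable KSM only on those nodes where it had itself enabled it (node02 and node03), leaving the others unchanged (node01).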
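The bookkeeping role of the annotation in this scenario can be sketched in Python. This is a simplified model under stated assumptions: the function name nodes_to_restore is hypothetical, nodes are represented as a plain mapping from node name to annotations, and only the presence of the annotation is checked (the value virt-handler actually writes is not stated here).

```python
KSM_MANAGED_ANNOTATION = 'kubevirt.io/ksm-handler-managed'


def nodes_to_restore(nodes):
    # nodes maps a node name to its annotations dict.
    # Only nodes carrying the annotation had KSM enabled by virt-handler,
    # so only those are candidates for having KSM disabled again.
    return sorted(
        name for name, annotations in nodes.items()
        if KSM_MANAGED_ANNOTATION in annotations
    )


cluster = {
    'node01': {},  # KSM enabled externally, no annotation
    'node02': {KSM_MANAGED_ANNOTATION: 'true'},
    'node03': {KSM_MANAGED_ANNOTATION: 'true'},
}
print(nodes_to_restore(cluster))
```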
KubeVirt can discover on which nodes KSM is enabled and will mark them with a special label (kubevirt.io/ksm-enabled
) with value true
. This label can be used to schedule VMs onto nodes with or without KSM enabled.
apiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n  name: testvm\nspec:\n  running: true\n  template:\n    metadata:\n      labels:\n        kubevirt.io/vm: testvm\n    spec:\n      nodeSelector:\n        kubevirt.io/ksm-enabled: \"true\"\n      [...]\n
"},{"location":"cluster_admin/migration_policies/","title":"Migration Policies","text":"Migration policies provide a new way of applying migration configurations to Virtual Machines. The policies can refine the KubeVirt CR's MigrationConfiguration
that sets the cluster-wide migration configurations. This way, the cluster-wide settings serve as a default that can be refined (i.e. changed, removed or added) by the migration policy.
Please bear in mind that migration policies are in version v1alpha1
. This means that this API is not fully stable yet and that APIs may change in the future.
KubeVirt supports Live Migrations of Virtual Machine workloads. Before migration policies were introduced, migration settings could be configured only cluster-wide, by editing the KubeVirt CR's spec or, more specifically, its MigrationConfiguration.
Several aspects (although not all) of migration behaviour that can be customized are: - Bandwidth - Auto-convergence - Post/Pre-copy - Max number of parallel migrations - Timeout
Migration policies generalize the concept of defining migration configurations, so it would be possible to apply different configurations to specific groups of VMs.
This capability is useful whenever workloads need to be treated differently, for example because they have different priorities, must be segregated for security reasons, have different requirements, or are not migration-friendly and need help to converge.
"},{"location":"cluster_admin/migration_policies/#api-examples","title":"API Examples","text":""},{"location":"cluster_admin/migration_policies/#migration-configurations","title":"Migration Configurations","text":"Currently the MigrationPolicy spec will only include the following configurations from KubevirtCR's MigrationConfiguration (in the future more configurations that aren't part of Kubevirt CR are intended to be added):
apiVersion: migrations.kubevirt.io/v1alpha1\nkind: MigrationPolicy\nspec:\n  allowAutoConverge: true\n  bandwidthPerMigration: 217Ki\n  completionTimeoutPerGiB: 23\n  allowPostCopy: false\n
All above fields are optional. When omitted, the configuration will be applied as defined in KubevirtCR's MigrationConfiguration. This way, KubevirtCR will serve as a configurable set of defaults for both VMs that are not bound to any MigrationPolicy and VMs that are bound to a MigrationPolicy that does not define all fields of the configurations.
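The fallback behaviour described above can be illustrated with a small Python sketch. It is a model of the described semantics, not KubeVirt code: the function name effective_migration_config is hypothetical, and the cluster-wide default values used below are made up for illustration.

```python
def effective_migration_config(cluster_defaults, policy_spec):
    # Start from the cluster-wide MigrationConfiguration ...
    effective = dict(cluster_defaults)
    # ... and let the policy override only the fields it actually defines.
    for field, value in policy_spec.items():
        if value is not None:
            effective[field] = value
    return effective


# Hypothetical cluster-wide defaults (illustrative values only).
cluster_defaults = {
    'allowAutoConverge': False,
    'bandwidthPerMigration': '64Mi',
    'completionTimeoutPerGiB': 800,
    'allowPostCopy': False,
}
# A policy that defines only two of the four fields.
policy_spec = {'bandwidthPerMigration': '217Ki', 'completionTimeoutPerGiB': 23}
print(effective_migration_config(cluster_defaults, policy_spec))
```

A VMI that is not bound to any policy corresponds to an empty policy_spec, in which case the cluster-wide defaults are returned unchanged.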
"},{"location":"cluster_admin/migration_policies/#matching-policies-to-vms","title":"Matching Policies to VMs","text":"Next in the spec are the selectors that define the group of VMs on which to apply the policy. The options to do so are the following.
This policy applies to the VMs in namespaces that have all the required labels:
apiVersion: migrations.kubevirt.io/v1alpha1\nkind: MigrationPolicy\nspec:\n  selectors:\n    namespaceSelector:\n      hpc-workloads: \"true\" # Matches a key and a value\n
This policy applies to the VMs that have all the required labels:
apiVersion: migrations.kubevirt.io/v1alpha1\nkind: MigrationPolicy\nspec:\n  selectors:\n    virtualMachineInstanceSelector:\n      workload-type: db # Matches a key and a value\n
It is also possible to combine the previous two:
apiVersion: migrations.kubevirt.io/v1alpha1\nkind: MigrationPolicy\nspec:\n  selectors:\n    namespaceSelector:\n      hpc-workloads: \"true\"\n    virtualMachineInstanceSelector:\n      workload-type: db\n
"},{"location":"cluster_admin/migration_policies/#full-manifest","title":"Full Manifest:","text":"apiVersion: migrations.kubevirt.io/v1alpha1\nkind: MigrationPolicy\nmetadata:\n  name: my-awesome-policy\nspec:\n  # Migration Configuration\n  allowAutoConverge: true\n  bandwidthPerMigration: 217Ki\n  completionTimeoutPerGiB: 23\n  allowPostCopy: false\n\n  # Matching to VMs\n  selectors:\n    namespaceSelector:\n      hpc-workloads: \"true\"\n    virtualMachineInstanceSelector:\n      workload-type: db\n
"},{"location":"cluster_admin/migration_policies/#policies-precedence","title":"Policies' Precedence","text":"It is possible that multiple policies apply to the same VMI. In such cases, the precedence is in the same order as the bullets above (VMI labels first, then namespace labels). It is not allowed to define two policies with the exact same selectors.
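If multiple policies apply to the same VMI: * The most detailed policy will be applied, that is, the policy with the highest number of matching labels * If multiple policies match a VMI with the same number of matching labels, the policies are sorted by the lexicographic order of the matching label keys, and the first one in this order is applied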
For example, let's imagine a VMI with the following labels:
size: small
os: fedora
gpu: nvidia
And let's say the namespace to which the VMI belongs contains the following labels:
priority: high
bandwidth: medium
hpc-workload: true
The following policies are listed by their precedence (high to low):
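1) VMI labels: {size: small, gpu: nvidia}, Namespace labels: {priority:high, bandwidth: medium}. Matching labels: 4, first in lexicographic order by bandwidth.
2) VMI labels: {size: small, gpu: nvidia}, Namespace labels: {priority:high, hpc-workload:true}. Matching labels: 4, first in lexicographic order by gpu.
3) VMI labels: {size: small, gpu: nvidia}, Namespace labels: {priority:high}. Matching labels: 3, first in lexicographic order by gpu.
4) VMI labels: {size: small}, Namespace labels: {priority:high, hpc-workload:true}. Matching labels: 3, first in lexicographic order by hpc-workload.
5) VMI labels: {gpu: nvidia}, Namespace labels: {priority:high}. Matching labels: 2, first in lexicographic order by gpu.
6) VMI labels: {gpu: nvidia}, Namespace labels: {}. Matching labels: 1, first in lexicographic order by gpu.
7) VMI labels: {gpu: intel}, Namespace labels: {priority:high}. The VMI label does not match, so this policy cannot be applied.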
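The ordering in the list above can be reproduced with a short Python sketch. It is an illustration, not KubeVirt's actual implementation: the names matches and rank_policies are hypothetical, policies are modelled as (name, VMI selector, namespace selector) tuples, and the tie-break on sorted label keys is an assumption inferred from the example list.

```python
def matches(selector, labels):
    # A selector matches when every key/value pair it names is present.
    return all(labels.get(key) == value for key, value in selector.items())


def rank_policies(policies, vmi_labels, namespace_labels):
    applicable = []
    for name, vmi_selector, ns_selector in policies:
        if matches(vmi_selector, vmi_labels) and matches(ns_selector, namespace_labels):
            count = len(vmi_selector) + len(ns_selector)
            keys = sorted(list(vmi_selector) + list(ns_selector))
            # More matching labels first; ties broken by sorted label keys.
            applicable.append((-count, keys, name))
    return [name for _, _, name in sorted(applicable)]


vmi_labels = {'size': 'small', 'os': 'fedora', 'gpu': 'nvidia'}
namespace_labels = {'priority': 'high', 'bandwidth': 'medium', 'hpc-workload': 'true'}
policies = [
    ('policy-2', {'size': 'small', 'gpu': 'nvidia'}, {'priority': 'high', 'hpc-workload': 'true'}),
    ('policy-7', {'gpu': 'intel'}, {'priority': 'high'}),
    ('policy-1', {'size': 'small', 'gpu': 'nvidia'}, {'priority': 'high', 'bandwidth': 'medium'}),
]
# The first entry of the result is the policy that would be applied.
print(rank_policies(policies, vmi_labels, namespace_labels))
```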
Before removing a Kubernetes node from the cluster, users will want to ensure that VirtualMachineInstances have been gracefully terminated before powering down the node. Since all VirtualMachineInstances are backed by a Pod, the recommended method of evicting VirtualMachineInstances is to use the kubectl drain command or, in the case of OKD, the oc adm drain command.
"},{"location":"cluster_admin/node_maintenance/#evict-all-vms-from-a-node","title":"Evict all VMs from a Node","text":"Select the node you'd like to evict VirtualMachineInstances from by identifying the node from the list of cluster nodes.
kubectl get nodes
The following command will gracefully terminate all VMs on a specific node. Replace <node-name>
with the name of the node where the eviction should occur.
kubectl drain <node-name> --delete-local-data --ignore-daemonsets=true --force --pod-selector=kubevirt.io=virt-launcher
Below is a breakdown of why each argument passed to the drain command is required.
kubectl drain <node-name>
is selecting a specific node as a target for the eviction
--delete-local-data
is required for removing any pod that uses an emptyDir volume. The VirtualMachineInstance Pod does use emptyDir volumes; however, the data in those volumes is ephemeral, which means it is safe to delete after termination.
--ignore-daemonsets=true
is a required flag because every node running a VirtualMachineInstance will also be running our helper DaemonSet called virt-handler. DaemonSets are not allowed to be evicted using kubectl drain. By default, if this command encounters a DaemonSet on the target node, the command will fail. This flag tells the command it is safe to proceed with the eviction and to just ignore DaemonSets.
--force
is a required flag because VirtualMachineInstance pods are not owned by a ReplicaSet or DaemonSet controller. This means kubectl can't guarantee that the pods being terminated on the target node will get re-scheduled replacements placed elsewhere in the cluster after the pods are evicted. KubeVirt has its own controllers which manage the underlying VirtualMachineInstance pods. Each controller reacts differently to a VirtualMachineInstance being evicted. That behavior is outlined further down in this document.
--pod-selector=kubevirt.io=virt-launcher
means only VirtualMachineInstance pods managed by KubeVirt will be removed from the node.
By removing the --pod-selector
argument from the previous command, we can issue the eviction of all Pods on a node. This command ensures Pods associated with VMs as well as all other Pods are evicted from the target node.
kubectl drain <node name> --delete-local-data --ignore-daemonsets=true --force
If the LiveMigration
feature gate is enabled, it is possible to specify an evictionStrategy
on VMIs, which makes them react to a node drain with a live migration. The following snippet, on a VMI or on the VMI template in a VM, ensures that the VMI is migrated during node eviction:
spec:\n  evictionStrategy: LiveMigrate\n
Here is a full VMI:
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n  name: testvmi-nocloud\nspec:\n  terminationGracePeriodSeconds: 30\n  evictionStrategy: LiveMigrate\n  domain:\n    resources:\n      requests:\n        memory: 1024M\n    devices:\n      disks:\n      - name: containerdisk\n        disk:\n          bus: virtio\n      - disk:\n          bus: virtio\n        name: cloudinitdisk\n  volumes:\n  - name: containerdisk\n    containerDisk:\n      image: kubevirt/fedora-cloud-container-disk-demo:latest\n  - name: cloudinitdisk\n    cloudInitNoCloud:\n      userData: |-\n        #cloud-config\n        password: fedora\n        chpasswd: { expire: False }\n
Behind the scenes, a PodDisruptionBudget is created for each VMI which has an evictionStrategy defined. This ensures that evictions are blocked on these VMIs and guarantees that a VMI will be migrated instead of shut off.
Note: Prior to v0.34, the drain process with live migrations was detached from kubectl drain itself and additionally required specifying a special taint on the nodes: kubectl taint nodes foo kubevirt.io/drain=draining:NoSchedule
. This is no longer needed. The taint will still be respected if provided but is obsolete.
The kubectl drain will result in the target node being marked as unschedulable. This means the node will not be eligible for running new VirtualMachineInstances or Pods.
If it is decided that the target node should become schedulable again, the following command must be run.
kubectl uncordon <node name>
or, in the case of OKD:
oc adm uncordon <node name>
From KubeVirt's perspective, a node is safe to shut down once all VirtualMachineInstances have been evicted from the node. In a multi-use cluster where VirtualMachineInstances are being scheduled alongside other containerized workloads, it is up to the cluster admin to ensure all other pods have been safely evicted before powering down the node.
"},{"location":"cluster_admin/node_maintenance/#virtualmachine-evictions","title":"VirtualMachine Evictions","text":"The eviction of any VirtualMachineInstance that is owned by a VirtualMachine set to running=true will result in the VirtualMachineInstance being re-scheduled to another node.
The VirtualMachineInstance in this case will be forced to power down and restart on another node, unless its evictionStrategy is set to LiveMigrate as described above, in which case it is live-migrated to another node during eviction.
"},{"location":"cluster_admin/node_maintenance/#virtualmachineinstancereplicaset-eviction-behavior","title":"VirtualMachineInstanceReplicaSet Eviction Behavior","text":"The eviction of VirtualMachineInstances owned by a VirtualMachineInstanceReplicaSet will result in the VirtualMachineInstanceReplicaSet scheduling replacements for the evicted VirtualMachineInstances on other nodes in the cluster.
"},{"location":"cluster_admin/node_maintenance/#virtualmachineinstance-eviction-behavior","title":"VirtualMachineInstance Eviction Behavior","text":"VirtualMachineInstances not backed by either a VirtualMachineInstanceReplicaSet or a VirtualMachine object will not be re-scheduled after eviction.
"},{"location":"cluster_admin/operations_on_Arm64/","title":"Arm64 Operations","text":"This page summarizes all operations that are not supported on Arm64.
"},{"location":"cluster_admin/operations_on_Arm64/#hotplug-network-interfaces","title":"Hotplug Network Interfaces","text":"Hotplug Network Interfaces are not supported on Arm64, because the image ghcr.io/k8snetworkplumbingwg/multus-cni:snapshot-thick does not support the Arm64 platform. For more information please refer to https://github.com/k8snetworkplumbingwg/multus-cni/pull/1027.
"},{"location":"cluster_admin/operations_on_Arm64/#hotplug-volumes","title":"Hotplug Volumes","text":"Hotplug Volumes are not supported on Arm64, because the Containerized Data Importer is not yet supported on Arm64.
"},{"location":"cluster_admin/operations_on_Arm64/#hugepages-support","title":"Hugepages support","text":"The Hugepages feature is not supported on Arm64. The hugepage mechanism differs between x86_64 and Arm64. Currently, KubeVirt is only verified on systems with a 4k page size.
"},{"location":"cluster_admin/operations_on_Arm64/#containerized-data-importer","title":"Containerized Data Importer","text":"This project is not yet supported on Arm64, but support is planned.
"},{"location":"cluster_admin/operations_on_Arm64/#export-api","title":"Export API","text":"Export API is partially supported on the Arm64 platform. As CDI is not supported yet, the export of DataVolumes and MemoryDump is not supported on Arm64.
"},{"location":"cluster_admin/operations_on_Arm64/#virtual-machine-memory-dump","title":"Virtual machine memory dump","text":"As explained above, MemoryDump requires CDI, and is not yet supported on Arm64.
"},{"location":"cluster_admin/operations_on_Arm64/#mediated-devices-and-virtual-gpus","title":"Mediated devices and virtual GPUs","text":"This is not verified on the Arm64 platform.
"},{"location":"cluster_admin/scheduler/","title":"KubeVirt Scheduler","text":"Scheduling is the process of matching Pods/VMs to Nodes. By default, the scheduler used is kube-scheduler. Further details can be found at Kubernetes Scheduler Documentation.
Custom schedulers can be used if the default scheduler does not satisfy your needs. For instance, you might want to schedule VMs using a load aware scheduler such as Trimaran Schedulers.
"},{"location":"cluster_admin/scheduler/#creating-a-custom-scheduler","title":"Creating a Custom Scheduler","text":"KubeVirt is compatible with custom schedulers. The configuration steps are described in the Official Kubernetes Documentation. Please note, the Kubernetes version KubeVirt is running on and the Kubernetes version used to build the custom scheduler have to match. To get the Kubernetes version KubeVirt is running on, you can run the following command:
$ kubectl version\nClient Version: version.Info{Major:\"1\", Minor:\"22\", GitVersion:\"v1.22.13\", GitCommit:\"a43c0904d0de10f92aa3956c74489c45e6453d6e\", GitTreeState:\"clean\", BuildDate:\"2022-08-17T18:28:56Z\", GoVersion:\"go1.16.15\", Compiler:\"gc\", Platform:\"linux/amd64\"}\nServer Version: version.Info{Major:\"1\", Minor:\"22\", GitVersion:\"v1.22.13\", GitCommit:\"a43c0904d0de10f92aa3956c74489c45e6453d6e\", GitTreeState:\"clean\", BuildDate:\"2022-08-17T18:23:45Z\", GoVersion:\"go1.16.15\", Compiler:\"gc\", Platform:\"linux/amd64\"}\n
Pay attention to the Server
line. In this case, the Kubernetes version is v1.22.13
. You have to check out the matching Kubernetes version and build the Kubernetes project:
$ cd kubernetes\n$ git checkout v1.22.13\n$ make\n
Then, you can follow the configuration steps described here. Additionally, the ClusterRole system:kube-scheduler
needs permissions to use the verbs watch
, list
and get
on StorageClasses.
- apiGroups:\n  - storage.k8s.io\n  resources:\n  - storageclasses\n  verbs:\n  - watch\n  - list\n  - get\n
"},{"location":"cluster_admin/scheduler/#scheduling-vms-with-the-custom-scheduler","title":"Scheduling VMs with the Custom Scheduler","text":"The second scheduler should be up and running. You can check it with:
$ kubectl get all -n kube-system\n
The deployment my-scheduler
should be up and running if everything is set up properly. To launch the VM using the custom scheduler, you need to set schedulerName
in the VM's spec to my-scheduler
. Here is an example VM definition:
apiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n  name: vm-fedora\nspec:\n  running: true\n  template:\n    spec:\n      schedulerName: my-scheduler\n      domain:\n        devices:\n          disks:\n            - name: containerdisk\n              disk:\n                bus: virtio\n            - name: cloudinitdisk\n              disk:\n                bus: virtio\n          rng: {}\n        resources:\n          requests:\n            memory: 1Gi\n      terminationGracePeriodSeconds: 180\n      volumes:\n        - containerDisk:\n            image: quay.io/containerdisks/fedora:latest\n          name: containerdisk\n        - cloudInitNoCloud:\n            userData: |-\n              #cloud-config\n              chpasswd:\n                expire: false\n              password: fedora\n              user: fedora\n          name: cloudinitdisk\n
If the specified schedulerName
does not match any existing scheduler, the virt-launcher
pod will stay in the Pending state until the specified scheduler can be found. You can check whether the VM has been scheduled by my-scheduler by checking the virt-launcher pod events associated with the VM. The pod should have been scheduled by my-scheduler.
$ kubectl get pods\nNAME READY STATUS RESTARTS AGE\nvirt-launcher-vm-fedora-dpc87 2/2 Running 0 24m\n\n$ kubectl describe pod virt-launcher-vm-fedora-dpc87\n[...] \nEvents:\n Type Reason Age From Message\n ---- ------ ---- ---- -------\n Normal Scheduled 21m my-scheduler Successfully assigned default/virt-launcher-vm-fedora-dpc87 to node01\n[...]\n
"},{"location":"cluster_admin/tekton_tasks/","title":"KubeVirt Tekton","text":""},{"location":"cluster_admin/tekton_tasks/#prerequisites","title":"Prerequisites","text":"KubeVirt-specific Tekton Tasks, which are focused on:
KubeVirt Tekton Tasks and example Pipelines are available on artifacthub.io, from where you can easily deploy them to your cluster.
"},{"location":"cluster_admin/tekton_tasks/#existing-tasks","title":"Existing Tasks","text":""},{"location":"cluster_admin/tekton_tasks/#create-virtual-machines","title":"Create Virtual Machines","text":"All these Tasks can be used for creating Pipelines. We prepared example Pipelines which show what you can do with the KubeVirt Tasks.
Windows efi installer - This Pipeline will prepare a Windows 10/11/2k22 datavolume with virtio drivers installed. The user has to provide a working link to a Windows 10/11/2k22 iso file. The Pipeline is suitable for Windows versions that require EFI (e.g. Windows 10/11/2k22). More information about the Pipeline can be found here
Windows customize - This Pipeline will install SQL Server or VS Code in a Windows VM. More information about the Pipeline can be found here
Note
kubevirt-os-images
namespace. baseDvNamespace
attribute in Pipeline), additional RBAC permissions will be required (list of all required RBAC permissions can be found here).
KubeVirt has its own node daemon, called virt-handler. In addition to the usual k8s methods of detecting issues on nodes, the virt-handler daemon has its own heartbeat mechanism. This allows for fine-tuned error handling of VirtualMachineInstances.
"},{"location":"cluster_admin/unresponsive_nodes/#virt-handler-heartbeat","title":"virt-handler heartbeat","text":"virt-handler
periodically tries to update the kubevirt.io/schedulable
label and the kubevirt.io/heartbeat
annotation on the node it is running on:
$ kubectl get nodes -o yaml\napiVersion: v1\nitems:\n- apiVersion: v1\n  kind: Node\n  metadata:\n    annotations:\n      kubevirt.io/heartbeat: 2018-11-05T09:42:25Z\n    creationTimestamp: 2018-11-05T08:55:53Z\n    labels:\n      beta.kubernetes.io/arch: amd64\n      beta.kubernetes.io/os: linux\n      cpumanager: \"false\"\n      kubernetes.io/hostname: node01\n      kubevirt.io/schedulable: \"true\"\n      node-role.kubernetes.io/control-plane: \"\"\n
If a VirtualMachineInstance
gets scheduled, the scheduler only considers nodes where kubevirt.io/schedulable is true. This can be seen by looking at the corresponding pod of a VirtualMachineInstance
:
$ kubectl get pods virt-launcher-vmi-nocloud-ct6mr -o yaml\napiVersion: v1\nkind: Pod\nmetadata:\n  [...]\nspec:\n  [...]\n  nodeName: node01\n  nodeSelector:\n    kubevirt.io/schedulable: \"true\"\n  [...]\n
In case there is a communication issue or the host goes down, virt-handler
can no longer update its labels and annotations. Once the last kubevirt.io/heartbeat timestamp is older than five minutes, the KubeVirt node-controller kicks in and sets the kubevirt.io/schedulable label to false. As a consequence, no more VMIs will be scheduled to this node until virt-handler is connected again.
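The five-minute rule can be sketched in Python as follows. This models only the timestamp comparison described above, not the node-controller itself: the names parse_heartbeat and node_is_schedulable are hypothetical, and the timestamp format is taken from the node annotation shown earlier.

```python
from datetime import datetime, timedelta, timezone

# Five minutes, as stated in the text above.
HEARTBEAT_TIMEOUT = timedelta(minutes=5)


def parse_heartbeat(value):
    # Parses a timestamp such as 2018-11-05T09:42:25Z as UTC.
    parsed = datetime.strptime(value, '%Y-%m-%dT%H:%M:%SZ')
    return parsed.replace(tzinfo=timezone.utc)


def node_is_schedulable(heartbeat, now):
    # The node stays schedulable while the last heartbeat is
    # not older than the timeout.
    return now - parse_heartbeat(heartbeat) <= HEARTBEAT_TIMEOUT


now = datetime(2018, 11, 5, 9, 46, 0, tzinfo=timezone.utc)
print(node_is_schedulable('2018-11-05T09:42:25Z', now))
```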
In cases where virt-handler
has some issues but the node is in general fine, a VirtualMachineInstance
can be deleted as usual via kubectl delete vmi <myvm>
. Pods of a VirtualMachineInstance
will be told by the cluster-controllers they should shut down. As soon as the Pod is gone, the VirtualMachineInstance
will be moved to Failed
state, if virt-handler
did not manage to update its heartbeat in the meantime. If virt-handler
could recover in the meantime, virt-handler
will move the VirtualMachineInstance
to failed state instead of the cluster-controllers.
If the whole node is unresponsive, deleting a VirtualMachineInstance
via kubectl delete vmi <myvmi>
alone will never remove the VirtualMachineInstance
. In this case all pods on the unresponsive node need to be force-deleted: First make sure that the node is really dead. Then delete all pods on the node via a force-delete: kubectl delete pod --force --grace-period=0 <mypod>
.
As soon as the pod disappears and the heartbeat from virt-handler has timed out, the VMIs will be moved to Failed
state. If they were already marked for deletion they will simply disappear. If not, they can be deleted and will disappear almost immediately.
It takes up to five minutes until the KubeVirt cluster components can detect that virt-handler is unhealthy. During that time-frame it is possible that new VMIs are scheduled to the affected node. If virt-handler is not capable of connecting to these pods on the node, the pods will sooner or later go to failed state. As soon as the cluster finally detects the issue, the VMIs will be set to failed by the cluster.
"},{"location":"cluster_admin/updating_and_deletion/","title":"Updating and deletion","text":""},{"location":"cluster_admin/updating_and_deletion/#updating-kubevirt-control-plane","title":"Updating KubeVirt Control Plane","text":"Zero downtime rolling updates are supported starting with release v0.17.0
onward. Updating from any release prior to the KubeVirt v0.17.0
release is not supported.
Note: Updating is only supported from N-1 to N release.
Updates are triggered in one of two ways.
By changing the imageTag value in the KubeVirt CR's spec.
For example, updating from v0.17.0-alpha.1
to v0.17.0
is as simple as patching the KubeVirt CR with the imageTag: v0.17.0
value. From there the KubeVirt operator will begin the process of rolling out the new version of KubeVirt. Existing VM/VMIs will remain uninterrupted both during and after the update succeeds.
$ kubectl patch kv kubevirt -n kubevirt --type=json -p '[{ \"op\": \"add\", \"path\": \"/spec/imageTag\", \"value\": \"v0.17.0\" }]'\n
Or, by updating the kubevirt operator if no imageTag value is set.
When no imageTag value is set in the kubevirt CR, the system assumes that the version of KubeVirt is locked to the version of the operator. This means that updating the operator will result in the underlying KubeVirt installation being updated as well.
$ export RELEASE=v0.26.0\n$ kubectl apply -f https://github.com/kubevirt/kubevirt/releases/download/${RELEASE}/kubevirt-operator.yaml\n
The first way provides a fine-grained approach where you have full control over which version of KubeVirt is installed, independently of which version of the KubeVirt operator you might be running. The second approach allows you to lock both the operator and operand to the same version.
Newer KubeVirt versions may require additional or extended RBAC rules. In this case, the first update method may fail, because the virt-operator present in the cluster doesn't have these RBAC rules itself. If that happens, you need to update the virt-operator
first, and then proceed to update kubevirt. See this issue for more details.
Workload updates are supported as an opt-in feature starting with v0.39.0
By default, when KubeVirt is updated this only involves the control plane components. Any existing VirtualMachineInstance (VMI) workloads that are running before an update occurs remain 100% untouched. The workloads continue to run and are not interrupted as part of the default update process.
It's important to note that these VMI workloads do involve components such as libvirt, qemu, and virt-launcher, which can optionally be updated during the KubeVirt update process as well. However that requires opting in to having virt-operator perform automated actions on workloads.
Opting in to VMI updates involves configuring the workloadUpdateStrategy
field on the KubeVirt CR. This field controls the methods virt-operator will use when updating the VMI workload pods.
There are two methods supported.
LiveMigrate: Which results in VMIs being updated by live migrating the virtual machine guest into a new pod with all the updated components enabled.
Evict: Which results in the VMI's pod being shut down. If the VMI is controlled by a higher level VirtualMachine object with runStrategy: Always
, then a new VMI will spin up in a new pod with updated components.
The least disruptive way to update VMI workloads is to use LiveMigrate. Any VMI workload that is not live migratable will be left untouched. If live migration is not enabled in the cluster, then the only option available for virt-operator managed VMI updates is the Evict method.
Example: Enabling VMI workload updates via LiveMigration
apiVersion: kubevirt.io/v1\nkind: KubeVirt\nmetadata:\n name: kubevirt\n namespace: kubevirt\nspec:\n imagePullPolicy: IfNotPresent\n workloadUpdateStrategy:\n workloadUpdateMethods:\n - LiveMigrate\n
Example: Enabling VMI workload updates via Evict with batch tunings
The batch tunings allow configuring how quickly VMIs are evicted. In large clusters, it's desirable to ensure that VMIs are evicted in batches in order to distribute load.
apiVersion: kubevirt.io/v1\nkind: KubeVirt\nmetadata:\n name: kubevirt\n namespace: kubevirt\nspec:\n imagePullPolicy: IfNotPresent\n workloadUpdateStrategy:\n workloadUpdateMethods:\n - Evict\n batchEvictionSize: 10\n batchEvictionInterval: \"1m\"\n
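To get a feeling for the batch tunings, the following Python sketch estimates how many batches an eviction rollout needs and a rough lower bound for how long it takes. It assumes one batch of at most batchEvictionSize VMIs per batchEvictionInterval, with the first batch starting immediately; the function names are hypothetical and the real pacing of virt-operator may differ.

```python
import math
from datetime import timedelta


def eviction_batches(vmi_count: int, batch_size: int) -> int:
    # Number of batches needed to evict every VMI.
    return math.ceil(vmi_count / batch_size)


def minimum_rollout_time(vmi_count: int, batch_size: int, interval: timedelta) -> timedelta:
    # Assumes one batch per interval, with the first batch starting immediately.
    batches = eviction_batches(vmi_count, batch_size)
    return max(batches - 1, 0) * interval


print(eviction_batches(95, 10))
print(minimum_rollout_time(95, 10, timedelta(minutes=1)))
```

With the values from the example above (a batch size of 10 and an interval of one minute), 95 VMIs need 10 batches and at least nine minutes under these assumptions.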
Example: Enabling VMI workload updates with both LiveMigrate and Evict
When both LiveMigrate and Evict are specified, then any workloads which are live migratable will be guaranteed to be live migrated. Only workloads which are not live migratable will be evicted.
apiVersion: kubevirt.io/v1\nkind: KubeVirt\nmetadata:\n name: kubevirt\n namespace: kubevirt\nspec:\n imagePullPolicy: IfNotPresent\n workloadUpdateStrategy:\n workloadUpdateMethods:\n - LiveMigrate\n - Evict\n batchEvictionSize: 10\n batchEvictionInterval: \"1m\"\n
"},{"location":"cluster_admin/updating_and_deletion/#deleting-kubevirt","title":"Deleting KubeVirt","text":"To delete KubeVirt, first delete the KubeVirt
custom resource and then delete the KubeVirt operator.
$ export RELEASE=v0.17.0\n$ kubectl delete -n kubevirt kubevirt kubevirt --wait=true # --wait=true should anyway be default\n$ kubectl delete apiservices v1.subresources.kubevirt.io # this needs to be deleted to avoid stuck terminating namespaces\n$ kubectl delete mutatingwebhookconfigurations virt-api-mutator # not blocking but would be left over\n$ kubectl delete validatingwebhookconfigurations virt-operator-validator # not blocking but would be left over\n$ kubectl delete validatingwebhookconfigurations virt-api-validator # not blocking but would be left over\n$ kubectl delete -f https://github.com/kubevirt/kubevirt/releases/download/${RELEASE}/kubevirt-operator.yaml --wait=false\n
Note: If you deleted the operator first by mistake, the KV custom resource will get stuck in the Terminating
state. To fix it, manually delete the finalizer from the resource.
Note: The apiservice
and the webhookconfigurations
need to be deleted manually due to a bug.
$ kubectl -n kubevirt patch kv kubevirt --type=json -p '[{ \"op\": \"remove\", \"path\": \"/metadata/finalizers\" }]'\n
"},{"location":"cluster_admin/virtual_machines_on_Arm64/","title":"Virtual Machines on Arm64","text":"This page summarizes all unsupported Virtual Machine configurations and the differing default setups on the Arm64 platform.
"},{"location":"cluster_admin/virtual_machines_on_Arm64/#virtual-hardware","title":"Virtual hardware","text":""},{"location":"cluster_admin/virtual_machines_on_Arm64/#machine-type","title":"Machine Type","text":"Currently, we only support one machine type, virt
, which is set by default.
On Arm64 platform, we only support UEFI boot which is set by default. UEFI secure boot is not supported.
"},{"location":"cluster_admin/virtual_machines_on_Arm64/#cpu","title":"CPU","text":""},{"location":"cluster_admin/virtual_machines_on_Arm64/#node-labeller","title":"Node-labeller","text":"Currently, Node-labeller is partially supported on Arm64 platform. It does not yet support parsing virsh_domcapabilities.xml and capabilities.xml, and extracting related information such as CPU features.
"},{"location":"cluster_admin/virtual_machines_on_Arm64/#model","title":"Model","text":"host-passthrough
is the only model that is supported on Arm64. The CPU model is set by default on the Arm64 platform.
kvm
and hyperv
timers are not supported on Arm64 platform.
We do not support vga devices but use virtio-gpu by default.
"},{"location":"cluster_admin/virtual_machines_on_Arm64/#hugepages","title":"Hugepages","text":"Hugepages are not supported on Arm64 platform.
"},{"location":"cluster_admin/virtual_machines_on_Arm64/#resources-requests-and-limits","title":"Resources Requests and Limits","text":"CPU pinning is supported on Arm64 platform.
"},{"location":"cluster_admin/virtual_machines_on_Arm64/#numa","title":"NUMA","text":"As Hugepages are a precondition of the NUMA feature, and Hugepages are not enabled on the Arm64 platform, the NUMA feature does not work on Arm64.
"},{"location":"cluster_admin/virtual_machines_on_Arm64/#disks-and-volumes","title":"Disks and Volumes","text":"Arm64 only supports virtio and scsi disk bus types.
"},{"location":"cluster_admin/virtual_machines_on_Arm64/#interface-and-networks","title":"Interface and Networks","text":""},{"location":"cluster_admin/virtual_machines_on_Arm64/#macvlan","title":"macvlan","text":"We do not support macvlan
network because the project https://github.com/kubevirt/macvtap-cni does not support Arm64.
This class of devices is not verified on the Arm64 platform.
"},{"location":"cluster_admin/virtual_machines_on_Arm64/#liveness-and-readiness-probes","title":"Liveness and Readiness Probes","text":"Watchdog
device is not supported on Arm64 platform.
KubeVirt includes support for redirecting devices from the client's machine to the VMI with the help of the virtctl command.
"},{"location":"compute/client_passthrough/#usb-redirection","title":"USB Redirection","text":"Support for redirection of client's USB device was introduced in release v0.44. This feature is not enabled by default. To enable it, add an empty clientPassthrough
under devices, as such:
spec:\n domain:\n devices:\n clientPassthrough: {}\n
This configuration currently adds 4 USB slots to the VMI that can only be used with virtctl.
There are two ways of redirecting a USB device: either using the device's vendor and product information or the actual bus and device address information. In Linux, you can gather this info with lsusb
, a redacted example below:
> lsusb\nBus 002 Device 008: ID 0951:1666 Kingston Technology DataTraveler 100 G3/G4/SE9 G2/50\nBus 001 Device 003: ID 13d3:5406 IMC Networks Integrated Camera\nBus 001 Device 010: ID 0781:55ae SanDisk Corp. Extreme 55AE\n
"},{"location":"compute/client_passthrough/#using-vendor-and-product","title":"Using Vendor and Product","text":"Redirecting the Kingston storage device.
virtctl usbredir 0951:1666 vmi-name\n
"},{"location":"compute/client_passthrough/#using-bus-and-device-address","title":"Using Bus and Device address","text":"Redirecting the integrated camera
virtctl usbredir 01-03 vmi-name\n
"},{"location":"compute/client_passthrough/#requirements-for-virtctl-usbredir","title":"Requirements for virtctl usbredir
","text":"The virtctl
command uses an application called usbredirect
to handle the client's USB device by unplugging the device from the client OS and channeling the communication between the device and the VMI.
The usbredirect
binary comes from the usbredir project and is supported by most Linux distros. You can either fetch the latest release or MSI installer for Windows support.
Managing USB devices requires privileged access in most operating systems. The user running virtctl usbredir
would need to be privileged or run it in a privileged manner (e.g: with sudo
)
usbredirect
must be included in the PATH environment variable.
The CPU hotplug feature was introduced in KubeVirt v1.0, making it possible to configure the VM workload to allow for adding or removing virtual CPUs while the VM is running.
"},{"location":"compute/cpu_hotplug/#abstract","title":"Abstract","text":"A virtual CPU (vCPU) is the CPU that is seen by the guest VM OS. A VM owner can manage the number of vCPUs from the VM spec template using the CPU topology fields (spec.template.spec.domain.cpu
). The cpu
object has the integers cores,sockets,threads
so that the virtual CPU is calculated by the following formula: cores * sockets * threads
.
Before CPU hotplug was introduced, the VM owner could change these integers in the VM template while the VM is running, and they were staged until the next boot cycle. With CPU hotplug, it is possible to patch the sockets
integer in the VM template and the change will take effect right away.
For each new socket that is hot-plugged, the number of new vCPUs seen by the guest is cores * threads
, since the overall calculation of vCPUs is cores * sockets * threads
.
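As a quick illustration of the arithmetic above, the following Python sketch computes the total vCPU count and the number of vCPUs gained per hot-plugged socket. The function names are hypothetical and only mirror the formulas in the text.

```python
def total_vcpus(cores: int, sockets: int, threads: int) -> int:
    # Total vCPUs visible to the guest: cores * sockets * threads.
    return cores * sockets * threads


def vcpus_added_per_socket(cores: int, threads: int) -> int:
    # Each hot-plugged socket contributes cores * threads new vCPUs.
    return cores * threads


# Example: going from 4 to 5 sockets with 2 cores and 1 thread per core.
before = total_vcpus(cores=2, sockets=4, threads=1)
after = total_vcpus(cores=2, sockets=5, threads=1)
print(before, after, after - before)
```

Here the guest goes from 8 to 10 vCPUs, a gain of 2, which matches cores * threads for a single new socket.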
In order to enable CPU hotplug we need to add the VMLiveUpdateFeatures
feature gate in the KubeVirt CR:
apiVersion: kubevirt.io/v1\nkind: KubeVirt\nspec:\n configuration:\n developerConfiguration:\n featureGates:\n - VMLiveUpdateFeatures\n
"},{"location":"compute/cpu_hotplug/#configure-the-workload-update-strategy","title":"Configure the workload update strategy","text":"The current implementation of the hotplug process requires the VM to live-migrate. The migration will be triggered automatically by the workload updater. The workload update strategy in the KubeVirt CR must be configured with LiveMigrate
, as follows:
apiVersion: kubevirt.io/v1\nkind: KubeVirt\nspec:\n workloadUpdateStrategy:\n workloadUpdateMethods:\n - LiveMigrate\n
"},{"location":"compute/cpu_hotplug/#configure-the-vm-rollout-strategy","title":"Configure the VM rollout strategy","text":"Hotplug requires a VM rollout strategy of LiveUpdate
, so that the changes made to the VM object propagate to the VMI without a restart. This is also done in the KubeVirt CR configuration:
apiVersion: kubevirt.io/v1\nkind: KubeVirt\nspec:\n configuration:\n vmRolloutStrategy: \"LiveUpdate\"\n
More information can be found on the VM Rollout Strategies page
"},{"location":"compute/cpu_hotplug/#optional-set-maximum-sockets-or-hotplug-ratio","title":"[OPTIONAL] Set maximum sockets or hotplug ratio","text":"You can explicitly set the maximum number of sockets in three ways: with a value at the VM level, with a value at the cluster level, or with a ratio at the cluster level (maxSockets = ratio * sockets).
Note: the third way (cluster-level ratio) will also affect other quantitative hotplug resources like memory.
VM level
Cluster level value
Cluster level ratio
apiVersion: kubevirt.io/v1\nkind: VirtualMachine\nspec:\n template:\n spec:\n domain:\n cpu:\n maxSockets: 8\n
apiVersion: kubevirt.io/v1\nkind: KubeVirt\nspec:\n configuration:\n liveUpdateConfiguration:\n maxCpuSockets: 8\n
apiVersion: kubevirt.io/v1\nkind: KubeVirt\nspec:\n configuration:\n liveUpdateConfiguration:\n maxHotplugRatio: 4\n
The VM-level configuration will take precedence over the cluster-wide configuration.
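The precedence rule can be sketched in Python as follows. Only the first rule (VM level wins over cluster level) and the formula maxSockets = ratio * sockets come from the text above; the relative order of the two cluster-level settings and the None result when nothing is configured are assumptions made for this illustration, and the function name is hypothetical.

```python
from typing import Optional


def effective_max_sockets(
    sockets: int,
    vm_max_sockets: Optional[int] = None,
    cluster_max_cpu_sockets: Optional[int] = None,
    max_hotplug_ratio: Optional[int] = None,
) -> Optional[int]:
    # The VM-level maxSockets takes precedence over any cluster-wide setting.
    if vm_max_sockets is not None:
        return vm_max_sockets
    # Assumption: an explicit cluster-level value is preferred over the ratio.
    if cluster_max_cpu_sockets is not None:
        return cluster_max_cpu_sockets
    # Cluster-level ratio: maxSockets = ratio * sockets.
    if max_hotplug_ratio is not None:
        return max_hotplug_ratio * sockets
    # Defaults are not modeled in this sketch.
    return None


print(effective_max_sockets(2, vm_max_sockets=8, cluster_max_cpu_sockets=16))
print(effective_max_sockets(2, max_hotplug_ratio=4))
```

In the first call the VM-level value of 8 wins; in the second call the ratio of 4 applied to 2 sockets also yields 8.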
"},{"location":"compute/cpu_hotplug/#hotplug-process","title":"Hotplug process","text":"Let's assume we have a running VM with the 4 vCPUs, which were configured with sockets:4 cores:1 threads:1
In the VMI status we can observe the current CPU topology the VM is running with:
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\n...\nstatus:\n currentCPUTopology:\n cores: 1\n sockets: 4\n threads: 1\n
Now we want to hotplug another socket, by patching the VM object: kubectl patch vm vm-cirros --type='json' \\\n-p='[{\"op\": \"replace\", \"path\": \"/spec/template/spec/domain/cpu/sockets\", \"value\": 5}]'\n
We can observe the CPU hotplug process in the VMI status: status:\n conditions:\n - lastProbeTime: null\n lastTransitionTime: null\n status: \"True\"\n type: LiveMigratable\n - lastProbeTime: null\n lastTransitionTime: null\n status: \"True\"\n type: HotVCPUChange\n currentCPUTopology:\n cores: 1\n sockets: 4\n threads: 1\n
Please note the condition HotVCPUChange
that indicates the hotplug process is taking place. You can also see the VirtualMachineInstanceMigration object that was created for the VM in question:
NAME PHASE VMI\nkubevirt-workload-update-kflnl Running vm-cirros\n
When the hotplug process has completed, the currentCPUTopology
will be updated with the new number of sockets and the migration is marked as successful. #kubectl get vmi vm-cirros -oyaml\n\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n name: vm-cirros\nspec:\n domain:\n cpu:\n cores: 1\n sockets: 5\n threads: 1\n...\n...\nstatus:\n currentCPUTopology:\n cores: 1\n sockets: 5\n threads: 1\n\n\n#kubectl get vmim -l kubevirt.io/vmi-name=vm-cirros\nNAME PHASE VMI\nkubevirt-workload-update-cgdgd Succeeded vm-cirros\n
"},{"location":"compute/cpu_hotplug/#limitations","title":"Limitations","text":"Certain workloads that require predictable latency and enhanced performance during their execution benefit from dedicated CPU resources. KubeVirt, relying on the Kubernetes CPU manager, is able to pin a guest's vCPUs to the host's pCPUs.
"},{"location":"compute/dedicated_cpu_resources/#kubernetes-cpu-manager","title":"Kubernetes CPU manager","text":"Kubernetes CPU manager is a mechanism that affects the scheduling of workloads, placing them on a host which can allocate Guaranteed
resources and pin certain Pod's containers to host pCPUs, if the following requirements are met:
Additional information:
Setting spec.domain.cpu.dedicatedCpuPlacement
to true
in a VMI spec will indicate the desire to allocate dedicated CPU resource to the VMI
KubeVirt will verify that all the necessary conditions are met for the Kubernetes CPU manager to pin the virt-launcher container to dedicated host CPUs. Once virt-launcher is running, the VMI's vCPUs will be pinned to the pCPUs that have been dedicated to the virt-launcher container.
Expressing the desired amount of VMI's vCPUs can be done by either setting the guest topology in spec.domain.cpu
(sockets
, cores
, threads
) or spec.domain.resources.[requests/limits].cpu
to a whole number integer ([1-9]+) indicating the number of vCPUs requested for the VMI. Number of vCPUs is counted as sockets * cores * threads
or if spec.domain.cpu
is empty then it takes value from spec.domain.resources.requests.cpu
or spec.domain.resources.limits.cpu
.
Note: Users should not specify both spec.domain.cpu
and spec.domain.resources.[requests/limits].cpu
Note: spec.domain.resources.requests.cpu
must be equal to spec.domain.resources.limits.cpu
Note: Multiple cpu-bound microbenchmarks show a significant performance advantage when using spec.domain.cpu.sockets
instead of spec.domain.cpu.cores
.
All inconsistent requirements will be rejected.
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nspec:\n domain:\n cpu:\n sockets: 2\n cores: 1\n threads: 1\n dedicatedCpuPlacement: true\n resources:\n limits:\n memory: 2Gi\n[...]\n
OR
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nspec:\n domain:\n cpu:\n dedicatedCpuPlacement: true\n resources:\n limits:\n cpu: 2\n memory: 2Gi\n[...]\n
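The vCPU counting rules from this section can be sketched in Python as follows. This is an illustration of the rules as described here, not KubeVirt's validation code: the function name is hypothetical, and treating missing topology fields as 1 is an assumption.

```python
def requested_vcpus(topology=None, requests_cpu=None, limits_cpu=None):
    # topology is a dict with optional sockets, cores and threads keys.
    has_resources = requests_cpu is not None or limits_cpu is not None
    if topology and has_resources:
        raise ValueError('specify either the guest topology or resources cpu, not both')
    if topology:
        # Assumption: missing topology fields count as 1.
        return (topology.get('sockets', 1)
                * topology.get('cores', 1)
                * topology.get('threads', 1))
    if requests_cpu is not None and limits_cpu is not None and requests_cpu != limits_cpu:
        raise ValueError('requests.cpu must be equal to limits.cpu')
    cpu = requests_cpu if requests_cpu is not None else limits_cpu
    if cpu is None or cpu < 1 or cpu != int(cpu):
        raise ValueError('cpu must be a whole number of at least 1')
    return int(cpu)


# The two VMI examples above both resolve to 2 vCPUs.
print(requested_vcpus(topology={'sockets': 2, 'cores': 1, 'threads': 1}))
print(requested_vcpus(limits_cpu=2))
```

Raising an error for inconsistent input mirrors the statement that all inconsistent requirements will be rejected.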
"},{"location":"compute/dedicated_cpu_resources/#requesting-dedicated-cpu-for-qemu-emulator","title":"Requesting dedicated CPU for QEMU emulator","text":"A number of QEMU threads, such as QEMU main event loop, async I/O operation completion, etc., also execute on the same physical CPUs as the VMI's vCPUs. This may affect the expected latency of a vCPU. In order to enhance the real-time support in KubeVirt and provide improved latency, KubeVirt will allocate an additional dedicated CPU, exclusively for the emulator thread, to which it will be pinned. This will effectively \"isolate\" the emulator thread from the vCPUs of the VMI. In case ioThreadsPolicy
is set to auto
, IOThreads will also be \"isolated\" and placed on the same physical CPU as the QEMU emulator thread.
This functionality can be enabled by specifying isolateEmulatorThread: true
inside VMI spec's Spec.Domain.CPU
section. Naturally, this setting has to be specified in combination with dedicatedCpuPlacement: true
.
Example:
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nspec:\n domain:\n cpu:\n dedicatedCpuPlacement: true\n isolateEmulatorThread: true\n resources:\n limits:\n cpu: 2\n memory: 2Gi\n
"},{"location":"compute/dedicated_cpu_resources/#compute-nodes-with-smt-enabled","title":"Compute Nodes with SMT Enabled","text":"When the following conditions are met:
dedicatedCpuPlacement
and isolateEmulatorThread
are enabled
The VM is scheduled, but rejected by the kubelet with the following event:
SMT Alignment Error: requested 3 cpus not multiple cpus per core = 2\n
In order to address this issue:
Enable the AlignCPUs
feature gate in the KubeVirt CR.
Add the following annotation to the VM: alpha.kubevirt.io/EmulatorThreadCompleteToEvenParity:\n
KubeVirt will then add one or two dedicated CPUs for the emulator threads, in a way that completes the total CPU count to be even.
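The even-parity rule can be sketched in Python as follows. The function name is hypothetical, and the sketch assumes that exactly one emulator CPU is needed before alignment, which matches the example event above where 2 vCPUs plus 1 emulator CPU gave an odd request of 3.

```python
def emulator_thread_cpus(vcpus: int) -> int:
    # One emulator CPU is always added; a second one is added only when
    # vcpus + 1 would be odd, so that the total becomes even.
    return 1 if (vcpus + 1) % 2 == 0 else 2


for vcpus in (2, 3):
    extra = emulator_thread_cpus(vcpus)
    print(vcpus, extra, vcpus + extra)
```

For 2 vCPUs the sketch adds 2 emulator CPUs (total 4), and for 3 vCPUs it adds 1 (total 4), so the total is even in both cases.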
"},{"location":"compute/dedicated_cpu_resources/#identifying-nodes-with-a-running-cpu-manager","title":"Identifying nodes with a running CPU manager","text":"At this time, Kubernetes doesn't label the nodes that has CPU manager running on it.
KubeVirt has a mechanism to identify which nodes have the CPU manager running and add a cpumanager=true
label. This label will be removed when KubeVirt identifies that the CPU manager is no longer running on the node. This automatic identification should be viewed as a temporary workaround until Kubernetes provides the required functionality. Therefore, this feature should be manually enabled by activating the CPUManager
feature gate in the KubeVirt CR.
When automatic identification is disabled, the cluster administrator may manually add the above label to all the nodes where the CPU manager is running.
Nodes' labels are viewable: kubectl describe nodes
Administrators may manually label a missing node: kubectl label node [node_name] cpumanager=true
Note: In order to run sidecar containers, KubeVirt requires the Sidecar
feature gate to be enabled in KubeVirt's CR.
According to the Kubernetes CPU manager model, for the Pod to reach the required QoS level Guaranteed
, all containers in the Pod must express CPU and memory requirements. At this time, KubeVirt often uses a sidecar container to mount the VMI's registry disk. It also uses a sidecar container for its hooking mechanism. These additional resources can be viewed as an overhead and should be taken into account when calculating node capacity.
Note: The current defaults for sidecar's resources: CPU: 200m
Memory: 64M
As the CPU resource is not expressed as a whole number, CPU manager will not attempt to pin the sidecar container to a host CPU.
KubeVirt provides a mechanism for assigning host devices to a virtual machine. This mechanism is generic and allows various types of PCI devices, such as accelerators (including GPUs) or any other devices attached to a PCI bus, to be assigned. It also allows Linux Mediated devices, such as pre-configured virtual GPUs to be assigned using the same mechanism.
"},{"location":"compute/host-devices/#host-preparation-for-pci-passthrough","title":"Host preparation for PCI Passthrough","text":"Host Devices passthrough requires the virtualization extension and the IOMMU extension (Intel VT-d or AMD IOMMU) to be enabled in the BIOS.
To enable IOMMU, depending on the CPU type, a host should be booted with an additional kernel parameter, intel_iommu=on
for Intel and amd_iommu=on
for AMD.
Append these parameters to the end of the GRUB_CMDLINE_LINUX line in the grub configuration file.
# vi /etc/default/grub\n...\nGRUB_CMDLINE_LINUX=\"nofb splash=quiet console=tty0 ... intel_iommu=on\n...\n\n# grub2-mkconfig -o /boot/grub2/grub.cfg\n\n# reboot\n
# modprobe vfio-pci\n
At this time, KubeVirt is only able to assign PCI devices that are using the vfio-pci
driver. To prepare a specific device for device assignment, it should first be unbound from its original driver and bound to the vfio-pci
driver.
$ lspci -DD|grep NVIDIA\n0000:65:00.0 3D controller [0302]: NVIDIA Corporation TU104GL [Tesla T4] [10de:1eb8] (rev a1)\n
vfio-pci
driver: echo 0000:65:00.0 > /sys/bus/pci/drivers/nvidia/unbind\necho \"vfio-pci\" > /sys/bus/pci/devices/0000\\:65\\:00.0/driver_override\necho 0000:65:00.0 > /sys/bus/pci/drivers/vfio-pci/bind\n
In general, configuration of mediated devices (mdevs), such as vGPUs, should be done according to the vendor directions. KubeVirt can now facilitate the creation of the mediated devices / vGPUs on the cluster nodes. This assumes that the required vendor driver is already installed on the nodes. See the Mediated devices and virtual GPUs page to learn more about this functionality.
Once the mdev is configured, KubeVirt will be able to discover and use it for device assignment.
"},{"location":"compute/host-devices/#listing-permitted-devices","title":"Listing permitted devices","text":"Administrators can control which host devices are exposed and permitted to be used in the cluster. Permitted host devices in the cluster must be allowlisted in the KubeVirt CR by their vendor:product
selector for PCI devices or mediated device names.
configuration:\n permittedHostDevices:\n pciHostDevices:\n - pciVendorSelector: \"10DE:1EB8\"\n resourceName: \"nvidia.com/TU104GL_Tesla_T4\"\n externalResourceProvider: true\n - pciVendorSelector: \"8086:6F54\"\n resourceName: \"intel.com/qat\"\n mediatedDevices:\n - mdevNameSelector: \"GRID T4-1Q\"\n resourceName: \"nvidia.com/GRID_T4-1Q\"\n
pciVendorSelector
is a PCI vendor ID and product ID tuple in the form vendor_id:product_id
. This tuple can identify specific types of devices on a host. For example, the identifier 10de:1eb8
, shown above, can be found using lspci
.
$ lspci -nnv|grep -i nvidia\n65:00.0 3D controller [0302]: NVIDIA Corporation TU104GL [Tesla T4] [10de:1eb8] (rev a1)\n
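As an illustration, the vendor:product tuple can be pulled out of an lspci line like the one above with a few lines of Python. The helper name is hypothetical, and returning the tuple in upper case simply mirrors the permittedHostDevices example above; whether the selector is case-sensitive is not stated here.

```python
import string


def pci_vendor_selector(lspci_line: str):
    # Scan every [bracketed] token and return the first vendor:product pair.
    for part in lspci_line.split('[')[1:]:
        token = part.split(']')[0]
        vendor, sep, product = token.partition(':')
        is_hex = all(c in string.hexdigits for c in vendor + product)
        if sep and len(vendor) == 4 and len(product) == 4 and is_hex:
            return token.upper()
    return None


line = '65:00.0 3D controller [0302]: NVIDIA Corporation TU104GL [Tesla T4] [10de:1eb8] (rev a1)'
print(pci_vendor_selector(line))
```

For the sample line this prints 10DE:1EB8, the value used as pciVendorSelector in the configuration above.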
mdevNameSelector
is a name of a Mediated device type that can identify specific types of Mediated devices on a host.
You can see what mediated types a given PCI device supports by examining the contents of /sys/bus/pci/devices/DOMAIN:BUS:SLOT.FUNCTION/mdev_supported_types/TYPE/name
. For example, if you have an NVIDIA T4 GPU on your system, and you substitute in the DOMAIN
, BUS
, SLOT
, and FUNCTION
values that are correct for your system into the above path name, you will see that a TYPE
of nvidia-226
contains the selector string GRID T4-2A
in its name
file.
Taking GRID T4-2A
and specifying it as the mdevNameSelector
allows KubeVirt to find a corresponding mediated device by matching it against /sys/class/mdev_bus/DOMAIN:BUS:SLOT.FUNCTION/$mdevUUID/mdev_type/name
for some values of DOMAIN:BUS:SLOT.FUNCTION
and $mdevUUID
.
External providers: externalResourceProvider
field indicates that this resource is being provided by an external device plugin. In this case, KubeVirt will only permit the usage of this device in the cluster but will leave the allocation and monitoring to an external device plugin.
Host devices can be assigned to virtual machines via the gpus
and hostDevices
fields. The deviceNames
can reference both PCI and Mediated device resource names.
kind: VirtualMachineInstance\nspec:\n domain:\n devices:\n gpus:\n - deviceName: nvidia.com/TU104GL_Tesla_T4\n name: gpu1\n - deviceName: nvidia.com/GRID_T4-1Q\n name: gpu2\n hostDevices:\n - deviceName: intel.com/qat\n name: quickaccess1\n
"},{"location":"compute/host-devices/#nvme-pci-passthrough","title":"NVMe PCI passthrough","text":"The procedure for passing through an NVMe device is very similar to the GPU case. The device needs to be listed under permittedHostDevices
and under hostDevices
in the VM declaration.
Currently, the KubeVirt device plugin doesn't allow the user to select a specific device by specifying the address. Therefore, if multiple NVMe devices with the same vendor and product id exist in the cluster, they could be randomly assigned to a VM. If the devices are not on the same node, then the nodeSelector mitigates the issue.
Example:
Modify the permittedHostDevices
configuration:\n permittedHostDevices:\n pciHostDevices:\n - pciVendorSelector: 8086:5845\n resourceName: devices.kubevirt.io/nvme\n
VMI declaration:
kind: VirtualMachineInstance\nmetadata:\n labels:\n special: vmi-nvme\n name: vmi-nvme\nspec:\n nodeSelector: \n kubernetes.io/hostname: node03 # <--\n domain:\n devices:\n disks:\n - disk:\n bus: virtio\n name: containerdisk\n - disk:\n bus: virtio\n name: cloudinitdisk\n hostDevices: # <--\n - name: nvme # <--\n deviceName: devices.kubevirt.io/nvme # <--\n resources:\n requests:\n memory: 1024M\n terminationGracePeriodSeconds: 0\n volumes:\n - containerDisk:\n image: registry:5000/kubevirt/fedora-with-test-tooling-container-disk:devel\n name: containerdisk\n - cloudInitNoCloud:\n userData: |-\n #cloud-config\n password: fedora\n chpasswd: { expire: False }\n name: cloudinitdisk\n
"},{"location":"compute/host-devices/#usb-host-passthrough","title":"USB Host Passthrough","text":"Since KubeVirt v1.1, we can provide USB devices that are plugged in a Node to the VM running in the same Node.
"},{"location":"compute/host-devices/#requirements","title":"Requirements","text":"Cluster admin privilege to edit the KubeVirt CR in order to:
Enable the HostDevices
feature gate
Edit the permittedHostDevices
configuration to expose node USB devices to the cluster
In order to assign USB devices to your VMI, you'll need to expose those devices to the cluster under a resource name. The device allowlist can be edited in the KubeVirt CR under configuration.permittedHostDevices.usb
.
For this example, we will use the kubevirt.io/storage
resource name for the device with vendor: \"46f4\"
and product: \"0001\"
1.
spec:\n configuration:\n permittedHostDevices:\n usb:\n - resourceName: kubevirt.io/storage\n selectors:\n - vendor: \"46f4\"\n product: \"0001\"\n
After adding the usb
configuration under permittedHostDevices
to the KubeVirt CR, KubeVirt's device-plugin will expose this resource name and you can use it in your VMI.
Now, in the VMI configuration, you can add the devices.hostDevices.deviceName
and reference the resource name provided in the previous step, and also give it a local name
, for example:
spec:\n domain:\n devices:\n hostDevices:\n - deviceName: kubevirt.io/storage\n name: usb-storage\n
You can find a working example, which uses QEMU's emulated USB storage, under examples/vmi-usb.yaml.
"},{"location":"compute/host-devices/#bundle-of-usb-devices","title":"Bundle of USB devices","text":"You might want to redirect more than one USB device to a VMI, for example, a keyboard, a mouse and a smartcard device. The KubeVirt CR supports assigning multiple USB devices under the same resource name, so you could do:
spec:\n configuration:\n permittedHostDevices:\n usb:\n - resourceName: kubevirt.io/peripherals\n selectors:\n - vendor: \"045e\"\n product: \"07a5\"\n - vendor: \"062a\"\n product: \"4102\"\n - vendor: \"072f\"\n product: \"b100\"\n
Adding to the VMI configuration:
spec:\n domain:\n devices:\n hostDevices:\n - deviceName: kubevirt.io/peripherals\n name: local-peripherals \n
Note that all USB devices need to be present in order for the assignment to work.
Note that you can easily find the vendor:product
value with the lsusb
command.
For hugepages support you need at least Kubernetes version 1.9
.
To enable hugepages on Kubernetes, check the official documentation.
To enable hugepages on OKD, check the official documentation.
"},{"location":"compute/hugepages/#pre-allocate-hugepages-on-a-node","title":"Pre-allocate hugepages on a node","text":"To pre-allocate hugepages at boot time, you will need to specify hugepages under kernel boot parameters hugepagesz=2M hugepages=64
and restart your machine.
You can find more about hugepages in the official documentation.
"},{"location":"compute/live_migration/","title":"Live Migration","text":"Live migration is a process during which a running Virtual Machine Instance moves to another compute node while the guest workload continues to run and remains accessible.
"},{"location":"compute/live_migration/#enabling-the-live-migration-support","title":"Enabling the live-migration support","text":"Live migration is enabled by default in recent versions of KubeVirt. In versions prior to v0.56, it must be enabled through the feature gates: add LiveMigration
to the feature gates field in the KubeVirt CR.
Virtual machines using a PersistentVolumeClaim (PVC) must have a shared ReadWriteMany (RWX) access mode to be live migrated.
Live migration is not allowed with a pod network binding of bridge interface type.
Live migration requires ports 49152, 49153
to be available in the virt-launcher pod. If these ports are explicitly specified in a masquerade interface, live migration will not function.
Live migration is initiated by posting a VirtualMachineInstanceMigration (VMIM) object to the cluster. The example below starts a migration process for a virtual machine instance vmi-fedora
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstanceMigration\nmetadata:\n name: migration-job\nspec:\n vmiName: vmi-fedora\n
"},{"location":"compute/live_migration/#using-virtctl-to-initiate-live-migration","title":"Using virtctl to initiate live migration","text":"Live migration can also be initiated using virtctl
virtctl migrate vmi-fedora\n
"},{"location":"compute/live_migration/#migration-status-reporting","title":"Migration Status Reporting","text":""},{"location":"compute/live_migration/#condition-and-migration-method","title":"Condition and migration method","text":"When starting a virtual machine instance, it has also been calculated whether the machine is live migratable. The result is being stored in the VMI VMI.status.conditions
. The calculation can be based on multiple parameters of the VMI, however, at the moment, the calculation is largely based on the Access Mode
of the VMI volumes. Live migration is only permitted when the volume access mode is set to ReadWriteMany
. Requests to migrate a non-LiveMigratable VMI will be rejected.
The reported Migration Method
 is also calculated during VMI start. BlockMigration
indicates that some of the VMI disks require copying from the source to the destination. LiveMigration
means that only the instance memory will be copied.
Status:\n Conditions:\n Status: True\n Type: LiveMigratable\n Migration Method: BlockMigration\n
"},{"location":"compute/live_migration/#migration-status","title":"Migration Status","text":"The migration progress status is being reported in the VMI VMI.status
. Most importantly, it indicates whether the migration has been Completed
or if it Failed
.
Below is an example of a successful migration.
Migration State:\n Completed: true\n End Timestamp: 2019-03-29T03:37:52Z\n Migration Config:\n Completion Timeout Per GiB: 800\n Progress Timeout: 150\n Migration UID: c64d4898-51d3-11e9-b370-525500d15501\n Source Node: node02\n Start Timestamp: 2019-03-29T04:02:47Z\n Target Direct Migration Node Ports:\n 35001: 0\n 41068: 49152\n 38284: 49153\n Target Node: node01\n Target Node Address: 10.128.0.46\n Target Node Domain Detected: true\n Target Pod: virt-launcher-testvmimcbjgw6zrzcmp8wpddvztvzm7x2k6cjbdgktwv8tkq\n
"},{"location":"compute/live_migration/#canceling-a-live-migration","title":"Canceling a live migration","text":"Live migration can also be canceled by simply deleting the migration object. A successfully aborted migration will indicate that the abort has been requested Abort Requested
, and that it succeeded: Abort Status: Succeeded
. The migration in this case will be Completed
and Failed
.
Migration State:\n Abort Requested: true\n Abort Status: Succeeded\n Completed: true\n End Timestamp: 2019-03-29T04:02:49Z\n Failed: true\n Migration Config:\n Completion Timeout Per GiB: 800\n Progress Timeout: 150\n Migration UID: 57a693d6-51d7-11e9-b370-525500d15501\n Source Node: node02\n Start Timestamp: 2019-03-29T04:02:47Z\n Target Direct Migration Node Ports:\n 39445: 0\n 43345: 49152\n 44222: 49153\n Target Node: node01\n Target Node Address: 10.128.0.46\n Target Node Domain Detected: true\n Target Pod: virt-launcher-testvmimcbjgw6zrzcmp8wpddvztvzm7x2k6cjbdgktwv8tkq\n
"},{"location":"compute/live_migration/#using-virtctl-to-cancel-a-live-migration","title":"Using virtctl to cancel a live migration","text":"Live migration can also be canceled using virtctl, by specifying the name of a VMI which is currently being migrated
virtctl migrate-cancel vmi-fedora\n
"},{"location":"compute/live_migration/#changing-cluster-wide-migration-limits","title":"Changing Cluster Wide Migration Limits","text":"KubeVirt puts some limits in place, so that migrations don't overwhelm the cluster. By default, it is configured to only run 5
migrations in parallel with an additional limit of a maximum of 2
outbound migrations per node. Finally, every migration is limited to a bandwidth of 64MiB/s
.
These values can be changed in the kubevirt
CR:
apiVersion: kubevirt.io/v1\nkind: KubeVirt\nmetadata:\n  name: kubevirt\n  namespace: kubevirt\nspec:\n  configuration:\n    migrations:\n      parallelMigrationsPerCluster: 5\n      parallelOutboundMigrationsPerNode: 2\n      bandwidthPerMigration: 64Mi\n      completionTimeoutPerGiB: 800\n      progressTimeout: 150\n      disableTLS: false\n      nodeDrainTaintKey: \"kubevirt.io/drain\"\n      allowAutoConverge: false\n      allowPostCopy: false\n      unsafeMigrationOverride: false\n
Bear in mind that most of these settings can be overridden and fine-tuned for a specific group of VMs. For more information, please see Migration Policies.
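As a rough illustration of what the bandwidth limit means in practice, the sketch below computes a lower bound on the transfer time for a given amount of guest memory at the default 64MiB/s. The function name is hypothetical, and the estimate assumes every page is copied exactly once, so real migrations of busy guests take longer.

```python
# Lower bound on migration transfer time under a bandwidth cap.
# Assumption: memory is copied exactly once (no re-sent dirty pages).

def min_transfer_seconds(memory_mib, bandwidth_mib_per_s=64):
    '''Return the minimum time in seconds to copy memory_mib at the given rate.'''
    if bandwidth_mib_per_s <= 0:
        raise ValueError('bandwidth must be positive')
    return memory_mib / bandwidth_mib_per_s

# An 8GiB guest (8192 MiB) at the default 64MiB/s limit:
print(min_transfer_seconds(8 * 1024))  # 128.0
```

Comparing this lower bound with the completion timeout gives a quick feel for whether the bandwidth limit leaves enough headroom for a given VM size.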
"},{"location":"compute/live_migration/#understanding-different-migration-strategies","title":"Understanding different migration strategies","text":"Live migration is a complex process. During a migration, the source VM needs to transfer its whole state (mainly RAM) to the target VM. If there are enough resources available, such as network bandwidth and CPU power, migrations should converge nicely. If this is not the scenario, however, the migration might get stuck without an ability to progress.
The main factor that affects migrations from the guest perspective is its dirty rate
, which is the rate at which the VM dirties memory. Guests with a high dirty rate lead to a race during migration: on the one hand, memory is transferred continuously to the target, and on the other, the same memory gets dirtied again by the guest. In such scenarios, consider using a more advanced migration strategy.
Let's explain the 3 supported migration strategies as of today.
"},{"location":"compute/live_migration/#pre-copy","title":"Pre-copy","text":"Pre-copy is the default strategy. It should be used for most cases.
It works as follows:
Pre-copy is the safest and fastest strategy for most cases. Furthermore, it can be easily cancelled, can utilize multithreading, and more. If there is no real reason to use another strategy, this is definitely the strategy to go with.
However, in some cases migrations might not converge easily: by the time a chunk of the source VM's state is received by the target VM, it has already been mutated by the source VM (which is the VM the guest executes on). There are many reasons for migrations to fail to converge, such as a high dirty rate or scarce resources like network bandwidth and CPU. In such scenarios, consider the alternative strategies below.
"},{"location":"compute/live_migration/#post-copy","title":"Post-copy","text":"Post-copy migrations work as follows:
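The convergence problem can be illustrated with a deliberately simplified model of iterative pre-copy. This is a sketch under stated assumptions, not the code KubeVirt or QEMU actually run: the function name, the constant dirty rate and the 64 MiB stop threshold are all hypothetical.

```python
# Simplified model of iterative pre-copy (not KubeVirt's or QEMU's actual code).
# Each round copies the memory that is currently dirty; while that copy runs,
# the guest dirties more memory at dirty_rate (MiB/s).

def precopy_rounds(memory_mib, dirty_rate, bandwidth, threshold_mib=64, max_rounds=30):
    '''Return the number of copy rounds until the remaining dirty memory is
    at most threshold_mib, or None if that does not happen within max_rounds.'''
    remaining = float(memory_mib)
    for round_no in range(1, max_rounds + 1):
        copy_time = remaining / bandwidth  # seconds spent on this round
        # Memory dirtied while copying, capped at the guest's total memory.
        remaining = min(memory_mib, dirty_rate * copy_time)
        if remaining <= threshold_mib:
            return round_no
    return None

print(precopy_rounds(4096, dirty_rate=16, bandwidth=64))  # 3
print(precopy_rounds(4096, dirty_rate=64, bandwidth=64))  # None
```

In this model the dirty set shrinks by the factor dirty_rate / bandwidth per round, so the migration only converges when the guest dirties memory more slowly than the network can copy it.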
The main idea here is that the guest starts to run immediately on the target VM. This approach has advantages and disadvantages:
advantages:
disadvantages:
Auto-converge is a technique to help pre-copy migrations converge faster without changing the core algorithm of how the migration works.
Since a high dirty rate is usually the most significant factor preventing migrations from converging, auto-converge simply throttles the guest's CPU. If the migration converges fast enough, the guest's CPU is not throttled, or only negligibly. If it does not converge fast enough, the CPU is throttled more and more as time goes on.
This technique dramatically increases the probability of the migration converging eventually.
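A toy model can show why throttling helps. The sketch below assumes that throttling the CPU lowers the dirty rate proportionally and that the throttle grows in fixed steps whenever a round fails to halve the dirty memory. The step size, the cap and the function name are hypothetical and are not KubeVirt or QEMU defaults.

```python
# Toy model of auto-converge (assumed parameters, not KubeVirt or QEMU defaults).
# If a pre-copy round does not shrink the dirty memory enough, the guest CPU is
# throttled further, modelled here as a proportional drop in the dirty rate.

def autoconverge_rounds(memory_mib, dirty_rate, bandwidth, threshold_mib=64,
                        throttle_step_pct=20, max_throttle_pct=80, max_rounds=30):
    '''Return (rounds, throttle_pct); rounds is None if the model never converges.'''
    remaining = float(memory_mib)
    throttle_pct = 0
    for round_no in range(1, max_rounds + 1):
        copy_time = remaining / bandwidth
        effective_rate = dirty_rate * (100 - throttle_pct) / 100
        new_remaining = min(memory_mib, effective_rate * copy_time)
        if new_remaining <= threshold_mib:
            return round_no, throttle_pct
        if new_remaining > remaining / 2:
            # Progress is too slow: throttle the guest more for the next round.
            throttle_pct = min(max_throttle_pct, throttle_pct + throttle_step_pct)
        remaining = new_remaining
    return None, throttle_pct

# A guest that dirties memory as fast as the network can copy it:
rounds, throttle = autoconverge_rounds(4096, dirty_rate=64, bandwidth=64)
print(rounds, throttle)
```

Without throttling, the same guest never converges in this model, which is the situation auto-converge is meant to resolve at the cost of guest performance.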
"},{"location":"compute/live_migration/#using-a-different-network-for-migrations","title":"Using a different network for migrations","text":"Live migrations can be configured to happen on a different network than the one Kubernetes is configured to use. That potentially allows for more determinism, control and/or bandwidth, depending on use-cases.
"},{"location":"compute/live_migration/#creating-a-migration-network-on-a-cluster","title":"Creating a migration network on a cluster","text":"A separate physical network is required, meaning that every node on the cluster has to have at least 2 NICs, and the NICs that will be used for migrations need to be interconnected, i.e. all plugged to the same switch. The examples below assume that eth1
will be used for migrations.
It is also required for the Kubernetes cluster to have multus installed.
If the desired network doesn't include a DHCP server, then whereabouts will be needed as well.
Finally, a NetworkAttachmentDefinition needs to be created in the namespace where KubeVirt is installed. Here is an example:
apiVersion: \"k8s.cni.cncf.io/v1\"\nkind: NetworkAttachmentDefinition\nmetadata:\n name: migration-network\n namespace: kubevirt\nspec:\n config: '{\n \"cniVersion\": \"0.3.1\",\n \"name\": \"migration-bridge\",\n \"type\": \"macvlan\",\n \"master\": \"eth1\",\n \"mode\": \"bridge\",\n \"ipam\": {\n \"type\": \"whereabouts\",\n \"range\": \"10.1.1.0/24\"\n }\n }'\n
"},{"location":"compute/live_migration/#configuring-kubevirt-to-migrate-vmis-over-that-network","title":"Configuring KubeVirt to migrate VMIs over that network","text":"This is just a matter of adding the name of the NetworkAttachmentDefinition to the KubeVirt CR, like so:
apiVersion: kubevirt.io/v1\nkind: KubeVirt\nmetadata:\n  name: kubevirt\n  namespace: kubevirt\nspec:\n  configuration:\n    developerConfiguration:\n      featureGates:\n        - LiveMigration\n    migrations:\n      network: migration-network\n
That change will trigger a restart of the virt-handler pods, as they get connected to that new network.
From now on, migrations will happen over that network.
"},{"location":"compute/live_migration/#configuring-kubevirtci-for-testing-migration-networks","title":"Configuring KubeVirtCI for testing migration networks","text":"Developers and people wanting to test the feature before deploying it on a real cluster might want to configure a dedicated migration network in KubeVirtCI.
KubeVirtCI can simply be configured to include a virtual secondary network, as well as automatically install multus and whereabouts. The following environment variables just have to be declared before running make cluster-up
:
export KUBEVIRT_NUM_NODES=2;\nexport KUBEVIRT_NUM_SECONDARY_NICS=1;\nexport KUBEVIRT_DEPLOY_ISTIO=true;\nexport KUBEVIRT_WITH_CNAO=true\n
"},{"location":"compute/live_migration/#migration-timeouts","title":"Migration timeouts","text":"Depending on the type, the live migration process will copy virtual machine memory pages and disk blocks to the destination. During this process non-locked pages and blocks are being copied and become free for the instance to use again. To achieve a successful migration, it is assumed that the instance will write to the free pages and blocks (pollute the pages) at a lower rate than these are being copied.
"},{"location":"compute/live_migration/#completion-time","title":"Completion time","text":"In some cases the virtual machine can write to different memory pages / disk blocks at a higher rate than these can be copied, which will prevent the migration process from completing in a reasonable amount of time. In this case, live migration will be aborted if it is running for a long period of time. The timeout is calculated base on the size of the VMI, it's memory and the ephemeral disks that are needed to be copied. The configurable parameter completionTimeoutPerGiB
, which defaults to 800s is the time for GiB of data to wait for the migration to be completed before aborting it. A VMI with 8Gib of memory will time out after 6400 seconds.
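The arithmetic behind that figure is simple enough to sketch. The helper name below is hypothetical; size_gib stands for the total data to copy (memory plus any ephemeral disks), as described above.

```python
# Completion timeout as described above: seconds allowed per GiB, times the
# amount of data (memory plus ephemeral disks) that has to be copied.

def completion_timeout_seconds(size_gib, completion_timeout_per_gib=800):
    '''Return the overall migration timeout in seconds for size_gib of data.'''
    return size_gib * completion_timeout_per_gib

print(completion_timeout_seconds(8))  # 6400
```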
Live migration will also be aborted when copying memory stops making progress. The time to wait for live migration to make progress in transferring data is configurable with the progressTimeout
 parameter, which defaults to 150s.
FEATURE STATE: KubeVirt v0.43
Sometimes it may be desirable to disable TLS encryption of migrations to improve performance. Use disableTLS
to do that:
apiVersion: kubevirt.io/v1\nkind: Kubevirt\nmetadata:\n name: kubevirt\n namespace: kubevirt\nspec:\n configuration:\n developerConfiguration:\n featureGates:\n - \"LiveMigration\"\n migrationConfiguration:\n disableTLS: true\n
Note: While this increases performance it may allow MITM attacks. Be careful.
"},{"location":"compute/mediated_devices_configuration/","title":"Mediated devices and virtual GPUs","text":""},{"location":"compute/mediated_devices_configuration/#configuring-mediated-devices-and-virtual-gpus","title":"Configuring mediated devices and virtual GPUs","text":"KubeVirt aims to facilitate the configuration of mediated devices on large clusters. Administrators can use the mediatedDevicesConfiguration
API in the KubeVirt CR to create or remove mediated devices in a declarative way, by providing a list of the desired mediated device types that they expect to be configured in the cluster.
You can also include the nodeMediatedDeviceTypes
option to provide a more specific configuration that targets a specific node or a group of nodes directly with a node selector. The nodeMediatedDeviceTypes
option must be used in combination with mediatedDevicesTypes
in order to override the global configuration set in the mediatedDevicesTypes
section.
KubeVirt will use the provided configuration to automatically create the relevant mdev/vGPU devices on nodes that can support it.
Currently, a single mdev type per card will be configured. The maximum number of instances of the selected mdev type will be configured per card.
Note: Some vendors, such as NVIDIA, require a driver to be installed on the nodes to provide mediated devices, including vGPUs.
Example snippet of a KubeVirt CR configuration that includes both nodeMediatedDeviceTypes
and mediatedDevicesTypes
:
spec:\n configuration:\n mediatedDevicesConfiguration:\n mediatedDevicesTypes:\n - nvidia-222\n - nvidia-228\n nodeMediatedDeviceTypes:\n - nodeSelector:\n kubernetes.io/hostname: nodeName\n mediatedDevicesTypes:\n - nvidia-234\n
"},{"location":"compute/mediated_devices_configuration/#configuration-scenarios","title":"Configuration scenarios","text":""},{"location":"compute/mediated_devices_configuration/#example-large-cluster-with-multiple-cards-on-each-node","title":"Example: Large cluster with multiple cards on each node","text":"On nodes with multiple cards that can support similar vGPU types, the relevant desired types will be created in a round-robin manner.
For example, considering the following KubeVirt CR configuration:
spec:\n configuration:\n mediatedDevicesConfiguration:\n mediatedDevicesTypes:\n - nvidia-222\n - nvidia-228\n - nvidia-105\n - nvidia-108\n
This cluster has nodes with two different PCIe cards:
Nodes with 3 Tesla T4 cards, where each card can support multiple device types:
Nodes with 2 Tesla V100 cards, where each card can support multiple device types:
KubeVirt will then create the following devices:
When nodes only have a single card, the first supported type from the list will be configured.
For example, consider the following list of desired types, where nvidia-223 and nvidia-224 are supported:
spec:\n configuration:\n mediatedDevicesConfiguration:\n mediatedDevicesTypes:\n - nvidia-223\n - nvidia-224\n
In this case, nvidia-223 will be configured on the node because it is the first supported type in the list."},{"location":"compute/mediated_devices_configuration/#overriding-configuration-on-a-specifc-node","title":"Overriding configuration on a specific node","text":"To override the global configuration set by mediatedDevicesTypes
, include the nodeMediatedDeviceTypes
option, specifying the node selector and the mediatedDevicesTypes
that you want to override for that node.
In this example, the KubeVirt CR includes the nodeMediatedDeviceTypes
option to override the global configuration specifically for node 2, which will only use the nvidia-234 type.
spec:\n configuration:\n mediatedDevicesConfiguration:\n mediatedDevicesTypes:\n - nvidia-230\n - nvidia-223\n - nvidia-224\n nodeMediatedDeviceTypes:\n - nodeSelector:\n kubernetes.io/hostname: node2 \n mediatedDevicesTypes:\n - nvidia-234\n
The cluster has two nodes that both have 3 Tesla T4 cards.
KubeVirt will then create the following devices:
Node 1 has been configured in a round-robin manner based on the global configuration but node 2 only uses the nvidia-234 that was specified for it.
"},{"location":"compute/mediated_devices_configuration/#updating-and-removing-vgpu-types","title":"Updating and Removing vGPU types","text":"Changes made to the mediatedDevicesTypes
section of the KubeVirt CR will trigger a re-evaluation of the configured mdevs/vGPU types on the cluster nodes.
Any change to the node labels that match the nodeMediatedDeviceTypes
nodeSelector in the KubeVirt CR will trigger a similar re-evaluation.
Consequently, mediated devices will be reconfigured or entirely removed based on the updated configuration.
"},{"location":"compute/mediated_devices_configuration/#assigning-vgpumdev-to-a-virtual-machine","title":"Assigning vGPU/MDEV to a Virtual Machine","text":"See the Host Devices Assignment to learn how to consume the newly created mediated devices/vGPUs.
"},{"location":"compute/memory_dump/","title":"Virtual machine memory dump","text":"Kubevirt now supports getting a VM memory dump for analysis purposes. The Memory dump can be used to diagnose, identify and resolve issues in the VM. Typically providing information about the last state of the programs, applications and system before they were terminated or crashed.
Note This memory dump is not used for saving VM state and resuming it later.
"},{"location":"compute/memory_dump/#prerequisites","title":"Prerequisites","text":""},{"location":"compute/memory_dump/#hot-plug-feature-gate","title":"Hot plug Feature Gate","text":"The memory dump process mounts a PVC to the virt-launcher in order to get the output in that PVC, hence the hot plug volumes feature gate must be enabled. The feature gates field in the KubeVirt CR must be expanded by adding the HotplugVolumes
to it.
Now let's assume we have a running VM named 'my-vm'. We can either dump to an existing PVC, or request one to be created.
"},{"location":"compute/memory_dump/#existing-pvc","title":"Existing PVC","text":"The size of the PVC must be big enough to hold the memory dump. The calculation is (VMMemorySize + 100Mi) * FileSystemOverhead, where VMMemorySize
 is the memory size, 100Mi is reserved space for the memory dump overhead and FileSystemOverhead
 is the value used to adjust the requested PVC size for the filesystem overhead. The PVC must also have the Filesystem
 volume mode.
Example for such PVC:
apiVersion: v1\nkind: PersistentVolumeClaim\nmetadata:\n name: my-pvc\nspec:\n accessModes:\n - ReadWriteOnce\n resources:\n requests:\n storage: 2Gi\n storageClassName: rook-ceph-block\n volumeMode: Filesystem\n
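The sizing rule above can be sketched as a small calculation. This is only an illustration of the stated formula: the function name is hypothetical, and FileSystemOverhead is assumed here to be a multiplier of at least 1 (for example 1.055 for 5.5% overhead), which the text does not spell out.

```python
import math

# Sketch of the formula above: (VMMemorySize + 100Mi) * FileSystemOverhead.
# Assumption: the overhead is expressed as a multiplier >= 1 (e.g. 1.055).

def memory_dump_pvc_mib(vm_memory_mib, fs_overhead_factor):
    '''Return the minimum PVC size in MiB, rounded up to a whole MiB.'''
    if fs_overhead_factor < 1:
        raise ValueError('overhead is treated as a multiplier >= 1 in this sketch')
    return math.ceil((vm_memory_mib + 100) * fs_overhead_factor)

# A VM with 1GiB of memory and an assumed 5.5% filesystem overhead:
print(memory_dump_pvc_mib(1024, 1.055))  # 1186
```

Under these assumptions, the 2Gi PVC in the example would comfortably hold the dump of a VM with 1GiB of memory.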
We can get a memory dump of the VM to the PVC by using the 'memory-dump get' command available with virtctl
$ virtctl memory-dump get my-vm --claim-name=my-pvc\n
"},{"location":"compute/memory_dump/#on-demand-pvc","title":"On demand PVC","text":"For on demand PVC, we need to add --create-claim
flag to the virtctl request:
$ virtctl memory-dump get my-vm --claim-name=new-pvc --create-claim\n
A PVC with size big enough for the dump will be created. We can also request specific storage class and access mode with appropriate flags.
"},{"location":"compute/memory_dump/#download-memory-dump","title":"Download memory dump","text":"By adding the --output
flag, the memory will be dumped to the PVC and then downloaded to the given output path.
$ virtctl memory-dump get myvm --claim-name=memoryvolume --create-claim --output=memoryDump.dump.gz\n
For downloading the last memory dump from the PVC associated with the VM, without triggering another memory dump, use the memory dump download command.
$ virtctl memory-dump download myvm --output=memoryDump.dump.gz\n
For downloading a memory dump from a PVC already disassociated from the VM you can use the virtctl vmexport command
"},{"location":"compute/memory_dump/#monitoring-the-memory-dump","title":"Monitoring the memory dump","text":"Information regarding the memory dump process will be available on the VM's status section
memoryDumpRequest:\n claimName: memory-dump\n phase: Completed\n startTimestamp: \"2022-03-29T11:00:04Z\"\n endTimestamp: \"2022-03-29T11:00:09Z\"\n fileName: my-vm-my-pvc-20220329-110004\n
During the process, the volumeStatus on the VMI will be updated with process information such as the attachment pod information and messages. If all goes well, once the process is completed the PVC is unmounted from the virt-launcher pod and the volumeStatus is deleted. A memory dump annotation will be added to the PVC with the memory dump file name.
"},{"location":"compute/memory_dump/#retriggering-the-memory-dump","title":"Retriggering the memory dump","text":"Getting a new memory dump to the same PVC is possible without the need to use any flag:
$ virtctl memory-dump get my-vm\n
Note Each memory-dump command will delete the previous dump in that PVC.
In order to get a memory dump to a different PVC you need to 'remove' the current memory-dump PVC and then do a new get with the new PVC name.
"},{"location":"compute/memory_dump/#remove-memory-dump","title":"Remove memory dump","text":"As mentioned in order to remove the associated memory dump PVC you need to run a 'memory-dump remove' command. This will allow you to replace the current PVC and get the memory dump to a new one.
$ virtctl memory-dump remove my-vm\n
"},{"location":"compute/memory_dump/#handle-the-memory-dump","title":"Handle the memory dump","text":"Once the memory dump process is completed the PVC will hold the output. You can manage the dump in one of the following ways: - Download the memory dump - Create a pod with troubleshooting tools that will mount the PVC and inspect it within the pod. - Include the memory dump in the VM Snapshot (will include both the memory dump and the disks) to save a snapshot of the VM in that point of time and inspect it when needed. (The VM Snapshot can be exported and downloaded).
The output of the memory dump can be inspected with memory analysis tools for example Volatility3
"},{"location":"compute/memory_hotplug/","title":"Memory Hotplug","text":"Memory hotplug was introduced in KubeVirt version 1.1, enabling the dynamic resizing of the amount of memory available to a running VM.
"},{"location":"compute/memory_hotplug/#limitations","title":"Limitations","text":"To use memory hotplug we need to add the VMLiveUpdateFeatures
feature gate in the KubeVirt CR:
apiVersion: kubevirt.io/v1\nkind: KubeVirt\nspec:\n configuration:\n developerConfiguration:\n featureGates:\n - VMLiveUpdateFeatures\n
"},{"location":"compute/memory_hotplug/#configure-the-workload-update-strategy","title":"Configure the Workload Update Strategy","text":"Configure LiveMigrate
as workloadUpdateStrategy
in the KubeVirt CR, since the current implementation of the hotplug process requires the VM to live-migrate.
apiVersion: kubevirt.io/v1\nkind: KubeVirt\nspec:\n workloadUpdateStrategy:\n workloadUpdateMethods:\n - LiveMigrate\n
"},{"location":"compute/memory_hotplug/#configure-the-vm-rollout-strategy","title":"Configure the VM rollout strategy","text":"Finally, set the VM rollout strategy to LiveUpdate
, so that the changes made to the VM object propagate to the VMI without a restart. This is also done in the KubeVirt CR configuration:
apiVersion: kubevirt.io/v1\nkind: KubeVirt\nspec:\n configuration:\n vmRolloutStrategy: \"LiveUpdate\"\n
NOTE: If memory hotplug is enabled/disabled on an already running VM, a reboot is necessary for the changes to take effect.
More information can be found on the VM Rollout Strategies page.
"},{"location":"compute/memory_hotplug/#optional-set-a-cluster-wide-maximum-amount-of-memory","title":"[OPTIONAL] Set a cluster-wide maximum amount of memory","text":"You can set the maximum amount of memory for the guest using a cluster level setting in the KubeVirt CR.
apiVersion: kubevirt.io/v1\nkind: KubeVirt\nspec:\n configuration:\n liveUpdateConfiguration:\n maxGuest: 8Gi\n
The VM-level configuration will take precedence over the cluster-wide one.
"},{"location":"compute/memory_hotplug/#memory-hotplug-in-action","title":"Memory Hotplug in Action","text":"First we enable the VMLiveUpdateFeatures
feature gate, set the rollout strategy to LiveUpdate
and set LiveMigrate
as workloadUpdateStrategy
in the KubeVirt CR.
$ kubectl --namespace kubevirt patch kv kubevirt -p='[{\"op\": \"add\", \"path\": \"/spec/configuration/developerConfiguration/featureGates\", \"value\": [\"VMLiveUpdateFeatures\"]}]' --type='json'\n$ kubectl --namespace kubevirt patch kv kubevirt -p='[{\"op\": \"add\", \"path\": \"/spec/configuration/vmRolloutStrategy\", \"value\": \"LiveUpdate\"}]' --type='json'\n$ kubectl --namespace kubevirt patch kv kubevirt -p='[{\"op\": \"add\", \"path\": \"/spec/workloadUpdateStrategy/workloadUpdateMethods\", \"value\": [\"LiveMigrate\"]}]' --type='json'\n
Now we create a VM with memory hotplug enabled.
apiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n name: vm-cirros\nspec:\n running: true\n template:\n spec:\n domain:\n memory:\n maxGuest: 2Gi\n guest: 128Mi\n devices:\n disks:\n - disk:\n bus: virtio\n name: containerdisk\n volumes:\n - containerDisk:\n image: registry:5000/kubevirt/alpine-container-disk-demo:devel\n name: containerdisk\n
The Virtual Machine will automatically start and once booted it will report the currently available memory to the guest in the status.memory
field inside the VMI.
$ kubectl get vmi vm-cirros -o json | jq .status.memory\n
{\n \"guestAtBoot\": \"128Mi\",\n \"guestCurrent\": \"128Mi\",\n \"guestRequested\": \"128Mi\"\n}\n
Since the Virtual Machine is now running we can patch the VM object to double the available guest memory so that we'll go from 128Mi to 256Mi.
$ kubectl patch vm vm-cirros -p='[{\"op\": \"replace\", \"path\": \"/spec/template/spec/domain/memory/guest\", \"value\": \"256Mi\"}]' --type='json'\n
After the hotplug request is processed and the Virtual Machine is live migrated, the new amount of memory should be available to the guest and visible in the VMI object.
$ kubectl get vmi vm-cirros -o json | jq .status.memory\n
{\n \"guestAtBoot\": \"128Mi\",\n \"guestCurrent\": \"256Mi\",\n \"guestRequested\": \"256Mi\"\n}\n
"},{"location":"compute/node_assignment/","title":"Node assignment","text":"You can constrain the VM to only run on specific nodes or to prefer running on specific nodes:
Setting spec.nodeSelector
 requirements constrains the scheduler to only schedule VMs on nodes which contain the specified labels. In the following example the VMI requires nodes with the labels cpu: slow
and storage: fast
:
metadata:\n name: testvmi-ephemeral\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nspec:\n nodeSelector:\n cpu: slow\n storage: fast\n domain:\n resources:\n requests:\n memory: 64M\n devices:\n disks:\n - name: mypvcdisk\n lun: {}\n volumes:\n - name: mypvcdisk\n persistentVolumeClaim:\n claimName: mypvc\n
Thus the scheduler will only schedule the VMI to nodes which contain these labels in their metadata. It works exactly like the Pod's nodeSelector
. See the Pod nodeSelector Documentation for more examples.
The spec.affinity
field allows specifying hard- and soft-affinity for VMs. It is possible to write matching rules against workloads (VMs and Pods) and Nodes. Since VMs are a workload type based on Pods, Pod-affinity affects VMs as well.
An example for podAffinity
and podAntiAffinity
may look like this:
metadata:\n name: testvmi-ephemeral\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nspec:\n nodeSelector:\n cpu: slow\n storage: fast\n domain:\n resources:\n requests:\n memory: 64M\n devices:\n disks:\n - name: mypvcdisk\n lun: {}\n affinity:\n podAffinity:\n requiredDuringSchedulingIgnoredDuringExecution:\n - labelSelector:\n matchExpressions:\n - key: security\n operator: In\n values:\n - S1\n topologyKey: failure-domain.beta.kubernetes.io/zone\n podAntiAffinity:\n preferredDuringSchedulingIgnoredDuringExecution:\n - weight: 100\n podAffinityTerm:\n labelSelector:\n matchExpressions:\n - key: security\n operator: In\n values:\n - S2\n topologyKey: kubernetes.io/hostname\n volumes:\n - name: mypvcdisk\n persistentVolumeClaim:\n claimName: mypvc\n
Affinity and anti-affinity work exactly like the Pod's affinity
. This includes podAffinity
, podAntiAffinity
 and nodeAffinity
. Node anti-affinity is expressed through nodeAffinity with operators such as NotIn or DoesNotExist. See the Pod affinity and anti-affinity Documentation for more examples and details.
Affinity as described above, is a property of VMs that attracts them to a set of nodes (either as a preference or a hard requirement). Taints are the opposite - they allow a node to repel a set of VMs.
Taints and tolerations work together to ensure that VMs are not scheduled onto inappropriate nodes. One or more taints are applied to a node; this marks that the node should not accept any VMs that do not tolerate the taints. Tolerations are applied to VMs, and allow (but do not require) the VMs to schedule onto nodes with matching taints.
You add a taint to a node using kubectl taint. For example,
kubectl taint nodes node1 key=value:NoSchedule\n
An example for tolerations
may look like this:
metadata:\n name: testvmi-ephemeral\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nspec:\n nodeSelector:\n cpu: slow\n storage: fast\n domain:\n resources:\n requests:\n memory: 64M\n devices:\n disks:\n - name: mypvcdisk\n lun: {}\n tolerations:\n - key: \"key\"\n operator: \"Equal\"\n value: \"value\"\n effect: \"NoSchedule\"\n
"},{"location":"compute/node_assignment/#node-balancing-with-descheduler","title":"Node balancing with Descheduler","text":"In some cases we might need to rebalance the cluster on current scheduling policy and load conditions. Descheduler can find pods, which violates e.g. scheduling decisions and evict them based on descheduler policies. Kubevirt VMs are handled as pods with local storage, so by default, descheduler will not evict them. But it can be easily overridden by adding special annotation to the VMI template in the VM:
spec:\n template:\n metadata:\n annotations:\n descheduler.alpha.kubernetes.io/evict: true\n
This annotation will cause, that the descheduler will be able to evict the VM's pod which can then be scheduled by scheduler on different nodes. A VirtualMachine will never restart or re-create a VirtualMachineInstance until the current instance of the VirtualMachineInstance is deleted from the cluster.
"},{"location":"compute/node_assignment/#live-update","title":"Live update","text":"When the VM rollout strategy is set to LiveUpdate
, changes to a VM's node selector or affinities will dynamically propagate to the VMI (unless the RestartRequired
condition is set). Changes to tolerations will not dynamically propagate, and will trigger a RestartRequired
condition if changed on a running VM.
Modifications of the node selector / affinities will only take effect on the next migration; the change alone will not trigger one.
"},{"location":"compute/node_overcommit/","title":"Node overcommit","text":"KubeVirt does not yet support classical Memory Overcommit Management or Memory Ballooning. In other words VirtualMachineInstances can't give back memory they have allocated. However, a few other things can be tweaked to reduce the memory footprint and overcommit the per-VMI memory overhead.
"},{"location":"compute/node_overcommit/#remove-the-graphical-devices","title":"Remove the Graphical Devices","text":"First the safest option to reduce the memory footprint, is removing the graphical device from the VMI by setting spec.domain.devices.autottachGraphicsDevice
to false
. See the video and graphics device documentation for further details and examples.
This will save a constant amount of 16MB
per VirtualMachineInstance but also disable VNC access.
Before you continue, make sure you are familiar with the Out of Resource Management of Kubernetes.
Every VirtualMachineInstance requests slightly more memory from Kubernetes than what was requested by the user for the Operating System. The additional memory is used for the per-VMI overhead consisting of our infrastructure which is wrapping the actual VirtualMachineInstance process.
In order to increase the VMI density on the node, it is possible to not request the additional overhead by setting spec.domain.resources.overcommitGuestOverhead
to true
:
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n name: testvmi-nocloud\nspec:\n terminationGracePeriodSeconds: 30\n domain:\n resources:\n overcommitGuestOverhead: true\n requests:\n memory: 1024M\n[...]\n
This will work fine as long as most of the VirtualMachineInstances do not use all of the memory they requested. That is especially the case if you have short-lived VMIs. But if you have long-lived VirtualMachineInstances or run extremely memory-intensive tasks inside the VirtualMachineInstance, your VMIs will sooner or later use all the memory they are granted.
"},{"location":"compute/node_overcommit/#overcommit-guest-memory","title":"Overcommit Guest Memory","text":"The third option is real memory overcommit on the VMI. In this scenario the VMI is explicitly told that it has more memory available than what is requested from the cluster by setting spec.domain.memory.guest
to a value higher than spec.domain.resources.requests.memory
.
The following definition requests 1024MB
from the cluster but tells the VMI that it has 2048MB
of memory available:
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n name: testvmi-nocloud\nspec:\n terminationGracePeriodSeconds: 30\n domain:\n resources:\n overcommitGuestOverhead: true\n requests:\n memory: 1024M\n memory:\n guest: 2048M\n[...]\n
For as long as there is enough free memory available on the node, the VMI can happily consume up to 2048MB
. This VMI will get the Burstable
resource class assigned by Kubernetes (see QoS classes in Kubernetes for more details). The same eviction rules as for Pods apply to the VMI if the node comes under memory pressure.
Implicit memory overcommit is disabled by default. This means that when memory request is not specified, it is set to match spec.domain.memory.guest
. However, it can be enabled using spec.configuration.developerConfiguration.memoryOvercommit
in the kubevirt
CR. For example, by setting memoryOvercommit: \"150\"
we define that when memory request is not explicitly set, it will be implicitly set to achieve memory overcommit of 150%. For instance, when spec.domain.memory.guest: 3072M
, memory request is set to 2048M, if omitted. Note that the actual memory request depends on additional configuration options like OvercommitGuestOverhead.
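As a sketch of the implicit overcommit setting described above, the KubeVirt CR could look like the following. The value is written as a plain integer percentage, which is an assumption about the accepted form; the arithmetic in the comment follows the 150% example from the text.

```yaml
apiVersion: kubevirt.io/v1
kind: KubeVirt
metadata:
  name: kubevirt
  namespace: kubevirt
spec:
  configuration:
    developerConfiguration:
      # overcommit percentage applied when a VMI sets memory.guest but no memory request:
      # guest 3072M -> implicit request 3072M * 100 / 150 = 2048M
      memoryOvercommit: 150
```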
If the node gets under memory pressure, depending on the kubelet
configuration the virtual machines may get killed by the OOM handler or by the kubelet
itself. It is possible to tweak that behaviour based on the requirements of your VirtualMachineInstances by:
--system-reserved
and --kubelet-reserved
Note: Soft Eviction will effectively shut down VirtualMachineInstances. They are not paused, hibernated or migrated. Further, Soft Eviction is disabled by default.
If configured, VirtualMachineInstances get evicted once the available memory falls below the threshold specified via --eviction-soft
and the VirtualMachineInstance is given the chance to shut down within a timespan specified via --eviction-max-pod-grace-period
. The flag --eviction-soft-grace-period
specifies for how long a soft eviction condition must be held before soft evictions are triggered.
If set properly according to the demands of the VMIs, overcommitting should only lead to soft evictions in rare cases for some VMIs. They may even get re-scheduled to the same node with less initial memory demand. For some workload types, this can be perfectly fine and lead to better overall memory-utilization.
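The three soft eviction flags can also be expressed in a kubelet configuration file. The following is a sketch only: the thresholds and periods are illustrative values, not recommendations, and need to be sized to the memory demands of the VMIs on the node.

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# counterpart of --eviction-soft
evictionSoft:
  memory.available: 1Gi
# counterpart of --eviction-soft-grace-period:
# how long the condition must hold before soft evictions start
evictionSoftGracePeriod:
  memory.available: 1m30s
# counterpart of --eviction-max-pod-grace-period (in seconds):
# upper bound on the time a VMI gets to shut down
evictionMaxPodGracePeriod: 120
```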
"},{"location":"compute/node_overcommit/#configuring-hard-eviction-thresholds","title":"Configuring Hard Eviction Thresholds","text":"Note: If unspecified, the kubelet will do hard evictions for Pods once memory.available
falls below 100Mi
.
Limits set via --eviction-hard
will lead to immediate eviction of VirtualMachineInstances or Pods. This stops VMIs without a grace period and is comparable with power-loss on a real computer.
If the hard limit is hit, VMIs may from time to time simply be killed. They may be re-scheduled to the same node immediately, since they start with less memory consumption again. This can be a simple option if the memory threshold is only rarely hit and the work performed by the VMIs is reproducible or can be resumed from checkpoints.
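A hard eviction threshold can be set in the kubelet configuration file as well. This is a minimal sketch with an illustrative threshold; the appropriate value depends on the node size and the workloads.

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# counterpart of --eviction-hard: evict immediately, without a grace period,
# once available memory drops below this value
evictionHard:
  memory.available: 500Mi
```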
"},{"location":"compute/node_overcommit/#requesting-the-right-qos-class-for-virtualmachineinstances","title":"Requesting the right QoS Class for VirtualMachineInstances","text":"Different QoS classes get assigned to Pods and VirtualMachineInstances based on the requests.memory
and limits.memory
. KubeVirt right now supports the QoS classes Burstable
and Guaranteed
. Burstable
VMIs are evicted before Guaranteed
VMIs.
This allows creating two classes of VMIs:
- VMIs with equal requests.memory and limits.memory set, which therefore get the Guaranteed class assigned. These will not get evicted and should never run into memory issues, but are more demanding.
- VMIs with no limits.memory, or a limits.memory which is greater than requests.memory, which therefore get the Burstable
class assigned. These VMIs will be evicted first.--system-reserved
and --kubelet-reserved
","text":"It may be important to reserve some memory for other daemons (not DaemonSets) which are running on the same node (ssh, dhcp servers, etc). The reservation can be done with the --system-reserved
switch. Further for the Kubelet and Docker a special flag called --kubelet-reserved
exists.
The KSM (Kernel same-page merging) daemon can be started on the node. Depending on its tuning parameters, it can more or less aggressively try to merge identical pages between applications and VirtualMachineInstances. The more aggressively it is configured, the more CPU it will use itself, so the memory overcommit advantages come with a slight CPU performance hit.
Config file tuning allows changes to the scanning frequency (how often KSM activates) and aggressiveness (how many pages per second it scans).
"},{"location":"compute/node_overcommit/#enabling-swap","title":"Enabling Swap","text":"Note: This will definitely make sure that your VirtualMachines can't crash or get evicted from the node, but it comes at the cost of unpredictable performance once the node runs out of memory, and the kubelet may not detect that it should evict Pods to restore performance.
Enabling swap is in general not recommended on Kubernetes right now. However, it can be useful in combination with KSM, since KSM merges identical pages over time. Swap allows the VMIs to successfully allocate memory which will then effectively never be used because of the later de-duplication done by KSM.
"},{"location":"compute/node_overcommit/#node-cpu-allocation-ratio","title":"Node CPU allocation ratio","text":"KubeVirt runs Virtual Machines in a Kubernetes Pod. This pod requests a certain amount of CPU time from the host. On the other hand, the Virtual Machine is created with a certain number of vCPUs. The number of vCPUs does not necessarily correlate with the number of CPUs requested by the pod. Depending on the QoS class of the pod, vCPUs can be scheduled on a variable number of physical CPUs; this depends on the available CPU resources on the node. When there are fewer CPUs available on the node than vCPUs requested, the vCPUs will be overcommitted.
By default, each pod requests 100m of CPU time. The CPU requested on the pod sets the cgroups cpu.shares, which serves as a priority for the scheduler when providing CPU time to the vCPUs in this pod. As the number of vCPUs increases, the amount of CPU time each vCPU may get decreases when competing with other processes on the node or with other Virtual Machine Instances that have fewer vCPUs.
The cpuAllocationRatio
normalizes the amount of CPU time the pod will request based on the number of vCPUs. For example, pod CPU request = number of vCPUs * 1/cpuAllocationRatio. When cpuAllocationRatio is set to 1, the pod requests one full CPU for each vCPU.
Note: In Kubernetes, one full core is 1000m of CPU time. More Information
Administrators can change this ratio by updating the KubeVirt CR:
...\nspec:\n configuration:\n developerConfiguration:\n cpuAllocationRatio: 10\n
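To make the formula concrete, here is a sketch of a VMI with four vCPUs. The VMI name and memory request are placeholders; the request values in the comments are derived only from the formula above and assume no explicit CPU request is set on the VMI.

```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  name: testvmi-4vcpu   # hypothetical name
spec:
  domain:
    cpu:
      cores: 4
    resources:
      requests:
        memory: 1024M     # illustrative value
# pod CPU request = number of vCPUs * 1/cpuAllocationRatio
#   cpuAllocationRatio: 10 -> 4 * 1/10 = 0.4 CPU = 400m
#   cpuAllocationRatio: 1  -> 4 * 1/1  = 4 CPUs  = 4000m
```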
"},{"location":"compute/numa/","title":"NUMA","text":"FEATURE STATE: KubeVirt v0.43
NUMA support in KubeVirt is at this stage limited to a small set of special use-cases and will improve over time together with improvements made to Kubernetes.
In general, the goal is to map the host NUMA topology as efficiently as possible to the Virtual Machine topology to improve the performance.
The following NUMA mapping strategies can be used:
In order to use current NUMA support, the following preconditions must be met:
NUMA
feature gate must be enabled.
GuestMappingPassthrough will pass through the node NUMA topology to the guest. The topology is based on the dedicated CPUs which the VMI got assigned from the kubelet via the CPU Manager. It can be requested by setting spec.domain.cpu.numa.guestMappingPassthrough
on the VMI.
Since KubeVirt does not know upfront which exclusive CPUs the VMI will get from the kubelet, there are some limitations:
While this NUMA modelling strategy has its limitations, aligning the guest's NUMA architecture with the node's can be critical for high-performance applications.
An example VMI may look like this:
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n name: numavm\nspec:\n domain:\n cpu:\n cores: 4\n dedicatedCpuPlacement: true\n numa:\n guestMappingPassthrough: { }\n devices:\n disks:\n - disk:\n bus: virtio\n name: containerdisk\n - disk:\n bus: virtio\n name: cloudinitdisk\n resources:\n requests:\n memory: 64Mi\n memory:\n hugepages:\n pageSize: 2Mi\n volumes:\n - containerDisk:\n image: quay.io/kubevirt/cirros-container-disk-demo\n name: containerdisk\n - cloudInitNoCloud:\n userData: |\n #!/bin/sh\n echo 'printed from cloud-init userdata'\n name: cloudinitdisk\n
"},{"location":"compute/numa/#running-real-time-workloads","title":"Running real-time workloads","text":""},{"location":"compute/numa/#overview","title":"Overview","text":"It is possible to deploy Virtual Machines that run a real-time kernel and make use of libvirtd's guest cpu and memory optimizations that improve the overall latency. These changes leverage mostly on already available settings in KubeVirt, as we will see shortly, but the VMI manifest now exposes two new settings that instruct KubeVirt to configure the generated libvirt XML with the recommended tuning settings for running real-time workloads.
To make use of the optimized settings, two new settings have been added to the VMI schema:
spec.domain.cpu.realtime
: When defined, it instructs KubeVirt to configure the linux scheduler for the VCPUS to run processes in FIFO scheduling policy (SCHED_FIFO) with priority 1. This setting guarantees that all processes running in the host will be executed with real-time priority.
spec.domain.cpu.realtime.mask
: It defines which VCPUs assigned to the VM are used for real-time. If not defined, libvirt will define all VCPUS assigned to run processes in FIFO scheduling and in the highest priority (1).
A prerequisite for running real-time workloads is locking resources in the cluster to allow the real-time VM exclusive usage. This translates into one or more nodes that have been configured with a dedicated set of CPUs and that also provide support for NUMA with a free number of hugepages of 2Mi or 1Gi size (depending on the configuration in the VMI). Additionally, the node must be configured to allow the scheduler to run processes with real-time policy.
"},{"location":"compute/numa/#nodes-capable-of-running-real-time-workloads","title":"Nodes capable of running real-time workloads","text":"When the KubeVirt pods are deployed on a node, they check whether the node is capable of running processes with a real-time scheduling policy and, if so, label the node as real-time capable (kubevirt.io/realtime). If the node cannot deliver this capability, the label is not applied. To check which nodes are able to host real-time VM workloads, run this command:
$>kubectl get nodes -l kubevirt.io/realtime\nNAME STATUS ROLES AGE VERSION\nworker-0-0 Ready worker 12d v1.20.0+df9c838\n
Internally, the KubeVirt pod running in each node checks if the kernel setting kernel.sched_rt_runtime_us
equals -1, which allows processes to run with a real-time scheduling policy for an unlimited amount of time.
Here is an example of a VM manifest that runs a custom fedora container disk configured to run with a real-time kernel. The settings have been configured for optimal efficiency.
---\napiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n labels:\n kubevirt.io/vm: fedora-realtime\n name: fedora-realtime\nspec:\n running: true\n template:\n metadata:\n labels:\n kubevirt.io/vm: fedora-realtime\n spec:\n domain:\n devices:\n autoattachSerialConsole: true\n autoattachMemBalloon: false\n autoattachGraphicsDevice: false\n disks:\n - disk:\n bus: virtio\n name: containerdisk \n - disk:\n bus: virtio\n name: cloudinitdisk\n machine:\n type: \"\"\n resources:\n requests:\n memory: 1Gi\n cpu: 2\n limits:\n memory: 1Gi\n cpu: 2\n cpu:\n model: host-passthrough\n dedicatedCpuPlacement: true\n isolateEmulatorThread: true\n ioThreadsPolicy: auto\n features:\n - name: tsc-deadline\n policy: require\n numa:\n guestMappingPassthrough: {}\n realtime: {}\n memory:\n hugepages:\n pageSize: 1Gi\n terminationGracePeriodSeconds: 0\n volumes:\n - containerDisk:\n image: quay.io/kubevirt/fedora-realtime-container-disk:v20211008-22109a3\n name: containerdisk\n - cloudInitNoCloud:\n userData: |-\n #cloud-config\n password: fedora\n chpasswd: { expire: False }\n bootcmd:\n - tuned-adm profile realtime\n name: cloudinitdisk\n
Breaking down the tuned sections, we have the following configuration:
Devices: - Disable the guest's memory balloon capability - Avoid attaching a graphics device, to reduce the number of interrupts to the kernel.
spec:\n domain:\n devices:\n autoattachSerialConsole: true\n autoattachMemBalloon: false\n autoattachGraphicsDevice: false\n
CPU: - model: host-passthrough
to allow the guest to see the host CPU without masking any capability. - dedicatedCpuPlacement: the VM needs to have dedicated CPUs assigned to it. The Kubernetes CPU Manager takes care of this aspect. - isolateEmulatorThread: requests an additional CPU to run the emulator on, thus avoiding the use of CPU cycles from the workload CPUs. - ioThreadsPolicy: set to auto to let the dedicated IO thread run on the same CPU as the emulator thread. - NUMA: defining guestMappingPassthrough
enables NUMA support for this VM. - realtime: instructs the virt-handler to configure this VM for real-time workloads, such as configuring the VCPUS to use FIFO scheduler policy and set priority to 1. cpu:
cpu:\n model: host-passthrough\n dedicatedCpuPlacement: true\n isolateEmulatorThread: true\n ioThreadsPolicy: auto\n features:\n - name: tsc-deadline\n policy: require\n numa:\n guestMappingPassthrough: {}\n realtime: {}\n
Memory - pageSize: allocate the pod's memory in hugepages of the given size, in this case of 1Gi.
memory:\n hugepages:\n pageSize: 1Gi\n
"},{"location":"compute/numa/#how-to-dedicate-vcpus-for-real-time-only","title":"How to dedicate VCPUS for real-time only","text":"It is possible to pass a regular expression selecting the VCPUs that should use the real-time scheduling policy, by using the realtime.mask
setting.
cpu:\n numa:\n guestMappingPassthrough: {}\n realtime:\n mask: \"0\"\n
When this configuration is applied, KubeVirt will only set the first VCPU for real-time scheduler policy, leaving the remaining VCPUS to use the default scheduler policy. Other examples of valid masks are: - 0-3
: Use cores 0 to 3 for real-time scheduling, assuming that the VM has requested at least 4 cores. - 0-3,^1
: Use cores 0, 2 and 3 for real-time scheduling only, assuming that the VM has requested at least 4 cores.
Kubernetes provides additional NUMA components that may be relevant to your use-case but typically are not enabled by default. Please consult the Kubernetes documentation for details on configuration of these components.
"},{"location":"compute/numa/#topology-manager","title":"Topology Manager","text":"Topology Manager provides optimizations related to CPU isolation, memory and device locality. It is useful, for example, where an SR-IOV network adaptor VF allocation needs to be aligned with a NUMA node.
https://kubernetes.io/docs/tasks/administer-cluster/topology-manager/
"},{"location":"compute/numa/#memory-manager","title":"Memory Manager","text":"Memory Manager is analogous to CPU Manager. It is useful, for example, where you want to align hugepage allocations with a NUMA node. It works in conjunction with Topology Manager.
The Memory Manager employs hint generation protocol to yield the most suitable NUMA affinity for a pod. The Memory Manager feeds the central manager (Topology Manager) with these affinity hints. Based on both the hints and Topology Manager policy, the pod is rejected or admitted to the node.
https://kubernetes.io/docs/tasks/administer-cluster/memory-manager/
"},{"location":"compute/persistent_tpm_and_uefi_state/","title":"Persistent TPM and UEFI state","text":"FEATURE STATE: KubeVirt v1.0.0
For both TPM and UEFI, libvirt supports persisting data created by a virtual machine as files on the virtualization host. In KubeVirt, the virtualization host is the virt-launcher pod, which is ephemeral (created on VM start and destroyed on VM stop). As of v1.0.0, KubeVirt supports using a PVC to persist those files. KubeVirt usually refers to that storage area as \"backend storage\".
"},{"location":"compute/persistent_tpm_and_uefi_state/#backend-storage","title":"Backend storage","text":"KubeVirt automatically creates backend storage PVCs for VMs that need it. However, the admin must first enable the VMPersistentState
feature gate, and tell KubeVirt which storage class to use by setting the vmStateStorageClass
configuration parameter in the KubeVirt Custom Resource (CR). The storage class must support read-write-many (RWX) in filesystem mode (FS). Here's an example of KubeVirt CR that sets both:
apiVersion: kubevirt.io/v1\nkind: KubeVirt\nspec:\n configuration:\n vmStateStorageClass: \"nfs-csi\"\n developerConfiguration:\n featureGates:\n - VMPersistentState\n
"},{"location":"compute/persistent_tpm_and_uefi_state/#limitations","title":"Limitations","text":"Since KubeVirt v0.53.0, a TPM device can be added to a VM (with just tpm: {}
). However, the data stored in it does not persist across reboots. Support for persistence was added in v1.0.0 using a simple persistent
boolean parameter that defaults to false, to preserve previous behavior. Of course, backend storage must first be configured before adding a persistent TPM to a VM. See above. Here's a portion of a VM definition that includes a persistent TPM:
apiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n name: vm\nspec:\n template:\n spec:\n domain:\n devices:\n tpm:\n persistent: true\n
"},{"location":"compute/persistent_tpm_and_uefi_state/#uses","title":"Uses","text":"tpm-crb
model is used (instead of tpm-tis
for non-persistent vTPMs)
EFI support is handled by libvirt using OVMF. OVMF data usually consists of 2 files, CODE and VARS. VARS is where persistent data from the guest can be stored. When EFI persistence is enabled on a VM, the VARS file will be persisted inside the backend storage. Of course, backend storage must first be configured before enabling EFI persistence on a VM. See above. Here's a portion of a VM definition that includes a persistent EFI:
apiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n name: vm\nspec:\n template:\n spec:\n domain:\n firmware:\n bootloader:\n efi:\n persistent: true\n
"},{"location":"compute/persistent_tpm_and_uefi_state/#uses_1","title":"Uses","text":"In this document, we are talking about the resources values set on the virt-launcher compute container, referred to as \"the container\" below for simplicity.
"},{"location":"compute/resources_requests_and_limits/#cpu","title":"CPU","text":"Note: dedicated CPUs (and isolated emulator thread) are ignored here as they have a dedicated page.
"},{"location":"compute/resources_requests_and_limits/#cpu-requests-on-the-container","title":"CPU requests on the container","text":"KubeVirt provides two ways to automatically set CPU limits on VM(I)s:
AutoResourceLimitsGate
feature gate.
In both cases, the VM(I) created will have a CPU limit of 1 per vCPU.
"},{"location":"compute/resources_requests_and_limits/#autoresourcelimitsgate-feature-gate","title":"AutoResourceLimitsGate feature gate","text":"By enabling this feature gate, cpu limits will be added to the vmi if all the following conditions are true:
Cluster admins can define a label selector in the KubeVirt CR. Once that label selector is defined, all VM(I)s created in a namespace that matches the selector will have CPU limits set.
Example:
CR:
apiVersion: kubevirt.io/v1\nkind: KubeVirt\nmetadata:\n name: kubevirt\n namespace: kubevirt\nspec:\n configuration:\n autoCPULimitNamespaceLabelSelector:\n matchLabels:\n autoCpuLimit: \"true\"\n
Namespace:
apiVersion: v1\nkind: Namespace\nmetadata:\n labels:\n autoCpuLimit: \"true\"\n kubernetes.io/metadata.name: default\n name: default\n
KubeVirt provides a feature gate(AutoResourceLimitsGate
) to automatically set memory limits on VM(I)s. By enabling this feature gate, memory limits will be added to the vmi if all the following conditions are true:
If all the previous conditions are true, the memory limits will be set to a value (2x
) of the memory requests. This ratio can be adjusted, per namespace, by adding the annotation alpha.kubevirt.io/auto-memory-limits-ratio
, with the desired custom value. For example, with alpha.kubevirt.io/auto-memory-limits-ratio: 1.2
, the memory limits set will be equal to (1.2x
) of the memory requests.
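As a sketch, the ratio annotation could be placed on a namespace as follows. The namespace name is a placeholder, and the annotation value is quoted because annotation values must be strings; the arithmetic in the comment follows the 1.2x example above.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: default   # illustrative namespace
  annotations:
    # memory limit = 1.2 * memory request,
    # e.g. a request of 1000M would yield a limit of 1200M
    alpha.kubevirt.io/auto-memory-limits-ratio: '1.2'
```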
VirtualMachines have a Running
setting that determines whether or not there should be a guest running. Because KubeVirt will always immediately restart a VirtualMachineInstance for VirtualMachines with spec.running: true
, a simple boolean is not always enough to fully describe desired behavior. For instance, there are cases when a user would like the ability to shut down a guest from inside the virtual machine. With spec.running: true
, KubeVirt would immediately restart the VirtualMachineInstance.
To allow for greater variation of user states, the RunStrategy
field has been introduced. This is mutually exclusive with Running
as they have somewhat overlapping conditions. There are currently four RunStrategies defined:
Always: The system is tasked with keeping the VM in a running state. This is achieved by respawning a VirtualMachineInstance whenever the current one terminated in a controlled (e.g. shutdown from inside the guest) or uncontrolled (e.g. crash) way. This behavior is equal to spec.running: true
.
RerunOnFailure: Similar to Always
, except that the VM is only restarted if it terminated in an uncontrolled way (e.g. crash) and due to an infrastructure reason (i.e. the node crashed, the KVM related process OOMed). This allows a user to determine when the VM should be shut down by initiating the shut down inside the guest. Note: Guest sided crashes (i.e. BSOD) are not covered by this. In such cases liveness checks or the use of a watchdog can help.
Manual: The system will not automatically turn the VM on or off; instead the user manually controls the VM status by issuing start, stop, and restart commands on the VirtualMachine subresource endpoints.
Halted: The system is asked to ensure that no VM is running. This is achieved by stopping any VirtualMachineInstance that is associated with the VM. If a guest is already running, it will be stopped. This behavior is equal to spec.running: false
.
Note: RunStrategy
and running
are mutually exclusive, because they can be contradictory. The API server will reject VirtualMachine resources that define both.
The start
, stop
and restart
methods of virtctl will invoke their respective subresources of VirtualMachines. This can have an effect on the runStrategy of the VirtualMachine as below:
| RunStrategy | start | stop | restart |
|---|---|---|---|
| Always | - | Halted | Always |
| RerunOnFailure | RerunOnFailure | RerunOnFailure | RerunOnFailure |
| Manual | Manual | Manual | Manual |
| Halted | Always | - | - |
Table entries marked with -
don't make sense, so won't have an effect on RunStrategy.
An example usage of the Always RunStrategy.
apiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n labels:\n kubevirt.io/vm: vm-cirros\n name: vm-cirros\nspec:\n runStrategy: Always\n template:\n metadata:\n labels:\n kubevirt.io/vm: vm-cirros\n spec:\n domain:\n devices:\n disks:\n - disk:\n bus: virtio\n name: containerdisk\n terminationGracePeriodSeconds: 0\n volumes:\n - containerDisk:\n image: kubevirt/cirros-container-disk-demo:latest\n name: containerdisk\n
"},{"location":"compute/virtual_hardware/","title":"Virtual hardware","text":"Fine-tuning different aspects of the hardware which are not device related (BIOS, mainboard, etc.) is sometimes necessary to allow guest operating systems to properly boot and reboot.
"},{"location":"compute/virtual_hardware/#machine-type","title":"Machine Type","text":"QEMU is able to work with two different classes of chipsets for x86_64, so called machine types. The x86_64 chipsets are i440fx (also called pc) and q35. They are versioned based on qemu-system-${ARCH}, following the format pc-${machine_type}-${qemu_version}
, e.g. pc-i440fx-2.10
and pc-q35-2.10
.
KubeVirt defaults to QEMU's newest q35 machine type. If a custom machine type is desired, it is configurable through the following structure:
metadata:\n name: myvmi\nspec:\n domain:\n machine:\n # This value indicates QEMU machine type.\n type: pc-q35-2.10\n resources:\n requests:\n memory: 512M\n devices:\n disks:\n - name: myimage\n disk: {}\n volumes:\n - name: myimage\n persistentVolumeClaim:\n claimName: myclaim\n
Comparison of the machine types' internals can be found at QEMU wiki."},{"location":"compute/virtual_hardware/#biosuefi","title":"BIOS/UEFI","text":"All virtual machines use BIOS by default for booting.
It is possible to utilize UEFI/OVMF by setting a value via spec.domain.firmware.bootloader
:
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n labels:\n special: vmi-alpine-efi\n name: vmi-alpine-efi\nspec:\n domain:\n devices:\n disks:\n - disk:\n bus: virtio\n name: containerdisk\n features:\n smm:\n enabled: true\n firmware:\n # this sets the bootloader type\n bootloader:\n efi: {}\n
Enabling EFI automatically enables Secure Boot, unless the secureBoot
field under efi
is set to false
. Secure Boot itself requires the SMM CPU feature to be enabled as above, which does not happen automatically, for security reasons.
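For comparison, a sketch of an EFI VMI with Secure Boot turned off is shown here. The VMI name is a placeholder; the secureBoot field under efi is the one described above, and the rest of the VMI spec is elided.

```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  name: vmi-alpine-efi-nosecureboot   # hypothetical name
spec:
  domain:
    firmware:
      bootloader:
        efi:
          # boot with UEFI but without Secure Boot
          secureBoot: false
```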
In order to provide a consistent view on the virtualized hardware for the guest OS, the SMBIOS UUID can be set to a constant value via spec.domain.firmware.uuid
:
metadata:\n name: myvmi\nspec:\n domain:\n firmware:\n # this sets the UUID\n uuid: 5d307ca9-b3ef-428c-8861-06e72d69f223\n serial: e4686d2c-6e8d-4335-b8fd-81bee22f4815\n resources:\n requests:\n memory: 512M\n devices:\n disks:\n - name: myimage\n disk: {}\n volumes:\n - name: myimage\n persistentVolumeClaim:\n claimName: myclaim\n
In addition, the SMBIOS serial number can be set to a constant value via spec.domain.firmware.serial
, as demonstrated above.
Note: This is not related to scheduling decisions or resource assignment.
"},{"location":"compute/virtual_hardware/#topology","title":"Topology","text":"Setting the number of CPU cores is possible via spec.domain.cpu.cores
. The following VM will have a CPU with 3
cores:
metadata:\n name: myvmi\nspec:\n domain:\n cpu:\n # this sets the cores\n cores: 3\n resources:\n requests:\n memory: 512M\n devices:\n disks:\n - name: myimage\n disk: {}\n volumes:\n - name: myimage\n persistentVolumeClaim:\n claimName: myclaim\n
"},{"location":"compute/virtual_hardware/#labeling-nodes-with-cpu-models-and-cpu-features","title":"Labeling nodes with cpu models and cpu features","text":"KubeVirt can create node selectors based on VM cpu models and features. With these node selectors, VMs will be scheduled on the nodes that support the matching VM cpu model and features.
To properly label the nodes, users can use the KubeVirt node-labeller, which creates all necessary labels, or create the node labels themselves.
The KubeVirt node-labeller creates 3 types of labels: CPU models, CPU features and KVM info. It uses libvirt to get all CPU models and CPU features supported on the host, and then creates labels from them.
The node-labeller supports a list of obsolete CPU models and a minimal baseline CPU model for features. Both can be set via the KubeVirt CR:
apiVersion: kubevirt.io/v1\nkind: KubeVirt\nmetadata:\n name: kubevirt\n namespace: kubevirt\nspec:\n ...\n configuration:\n obsoleteCPUModels:\n 486: true\n pentium: true\n...\n
Obsolete cpus will not be inserted in labels. If KubeVirt CR doesn't contain obsoleteCPUModels
variable, Labeller sets default values (\"pentium, pentium2, pentium3, pentiumpro, coreduo, n270, core2duo, Conroe, athlon, phenom, kvm32, kvm64, qemu32 and qemu64\").
Users can change obsoleteCPUModels by adding or removing CPU models in the configuration. KubeVirt then updates the nodes with new labels.
For homogeneous clusters, or clusters without live migration enabled, it is possible to disable the node-labeller and avoid adding labels to the nodes by adding the following annotation to the nodes:
node-labeller.kubevirt.io/skip-node
.
Note: If the CPU model isn't defined, the VM will have the CPU model closest to the one used on the node where the VM is running.
Note: CPU model is case sensitive.
Setting the CPU model is possible via spec.domain.cpu.model
. The following VM will have a CPU with the Conroe
model:
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n name: myvmi\nspec:\n domain:\n cpu:\n # this sets the CPU model\n model: Conroe\n...\n
You can check the list of available models here.
When the CPUNodeDiscovery feature gate is enabled and the VM has a CPU model, KubeVirt creates a node selector with the format: cpu-model.node.kubevirt.io/<cpuModel>
, e.g. cpu-model.node.kubevirt.io/Conroe
. When the VM doesn\u2019t have a CPU model, no node selector is created.
To enable the default CPU model, users may add the cpuModel
field in the KubeVirt CR.
apiVersion: kubevirt.io/v1\nkind: KubeVirt\nmetadata:\n name: kubevirt\n namespace: kubevirt\nspec:\n ...\n configuration:\n cpuModel: \"EPYC\"\n...\n
The default CPU model is used when the VMI doesn't have a CPU model. When the VMI has a CPU model set, the VMI's CPU model is preferred. When neither the default CPU model nor the VMI's CPU model is set, host-model
will be set. The default CPU model can be changed while KubeVirt is running. When the CPUNodeDiscovery feature gate is enabled, KubeVirt creates a node selector with the default CPU model.
As special cases you can set spec.domain.cpu.model
to one of: - host-passthrough
to pass the CPU through from the node to the VM
metadata:\n name: myvmi\nspec:\n domain:\n cpu:\n # this passes the node CPU through to the VM\n model: host-passthrough\n...\n
- host-model
to get a CPU on the VM close to the node's CPU
metadata:\n name: myvmi\nspec:\n domain:\n cpu:\n # this sets the VM CPU close to the node one\n model: host-model\n...\n
See the CPU API reference for more details.
"},{"location":"compute/virtual_hardware/#features","title":"Features","text":"Setting CPU features is possible via spec.domain.cpu.features
and can contain zero or more CPU features:
metadata:\n name: myvmi\nspec:\n domain:\n cpu:\n # this sets the CPU features\n features:\n # this is the feature's name\n - name: \"apic\"\n # this is the feature's policy\n policy: \"require\"\n...\n
Note: Policy attribute can either be omitted or contain one of the following policies: force, require, optional, disable, forbid.
Note: In case a policy is omitted for a feature, it will default to require.
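As an illustration of the two notes above, the following Python sketch resolves the effective policy of a feature entry. The helper effective_policy and the dictionary shape are assumptions made for this example; only the list of policies and the default of require come from the text.

```python
VALID_POLICIES = ('force', 'require', 'optional', 'disable', 'forbid')


def effective_policy(feature: dict) -> str:
    # Hypothetical helper: an omitted (or empty) policy defaults to require.
    policy = feature.get('policy') or 'require'
    if policy not in VALID_POLICIES:
        raise ValueError('unsupported CPU feature policy: ' + policy)
    return policy


print(effective_policy({'name': 'apic'}))
print(effective_policy({'name': 'apic', 'policy': 'optional'}))
```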
Behaviour according to Policies:
Full description about features and policies can be found here.
When the CPUNodeDiscovery feature gate is enabled, KubeVirt creates node selectors from the CPU features with the format cpu-feature.node.kubevirt.io/<cpuFeature>
, e.g. cpu-feature.node.kubevirt.io/apic
. When the VM doesn\u2019t specify any CPU feature, no node selector is created.
Sets the virtualized hardware clock inside the VM to a specific time. Available options are
utc
timezone
See the Clock API Reference for all possible configuration options.
"},{"location":"compute/virtual_hardware/#utc","title":"utc","text":"If utc
is specified, the VM's clock will be set to UTC.
metadata:\n name: myvmi\nspec:\n domain:\n clock:\n utc: {}\n resources:\n requests:\n memory: 512M\n devices:\n disks:\n - name: myimage\n disk: {}\n volumes:\n - name: myimage\n persistentVolumeClaim:\n claimName: myclaim\n
"},{"location":"compute/virtual_hardware/#timezone","title":"timezone","text":"If timezone
is specified, the VM's clock will be set to the specified local time.
metadata:\n name: myvmi\nspec:\n domain:\n clock:\n timezone: \"America/New_York\"\n resources:\n requests:\n memory: 512M\n devices:\n disks:\n - name: myimage\n disk: {}\n volumes:\n - name: myimage\n persistentVolumeClaim:\n claimName: myclaim\n
"},{"location":"compute/virtual_hardware/#timers","title":"Timers","text":"pit
rtc
kvm
hyperv
A pretty common timer configuration for VMs looks like this:
metadata:\n name: myvmi\nspec:\n domain:\n clock:\n utc: {}\n # here are the timers\n timer:\n hpet:\n present: false\n pit:\n tickPolicy: delay\n rtc:\n tickPolicy: catchup\n hyperv: {}\n resources:\n requests:\n memory: 512M\n devices:\n disks:\n - name: myimage\n disk: {}\n volumes:\n - name: myimage\n persistentVolumeClaim:\n claimName: myclaim\n
hpet
is disabled, pit
and rtc
are configured to use a specific tickPolicy
. Finally, hyperv
is made available too.
See the Timer API Reference for all possible configuration options.
Note: Timers can be part of a machine type, so it may be necessary to explicitly disable them. We may in the future decide to add them via cluster-level defaulting, if they are part of a QEMU machine definition.
"},{"location":"compute/virtual_hardware/#random-number-generator-rng","title":"Random number generator (RNG)","text":"You may want to use entropy collected by your cluster nodes inside your guest. KubeVirt allows you to add a virtio
RNG device to a virtual machine as follows.
metadata:\n name: vmi-with-rng\nspec:\n domain:\n devices:\n rng: {}\n
For Linux guests, the virtio-rng
kernel module should be loaded early in the boot process to acquire access to the entropy source. Other systems may require similar adjustments to work with the virtio
RNG device.
Note: Some guest operating systems or user payloads may require the RNG device with enough entropy and may fail to boot without it. For example, fresh Fedora images with newer kernels (4.16.4+) may require the virtio
RNG device to be present to boot to login.
By default a minimal Video and Graphics device configuration will be applied to the VirtualMachineInstance. The video device is vga
compatible and comes with a memory size of 16 MB. This device allows connecting to the OS via vnc
.
It is possible to not attach it by setting spec.domain.devices.autoattachGraphicsDevice
to false
:
metadata:\n name: myvmi\nspec:\n domain:\n devices:\n autoattachGraphicsDevice: false\n disks:\n - name: myimage\n disk: {}\n volumes:\n - name: myimage\n persistentVolumeClaim:\n claimName: myclaim\n
VMIs without graphics and video devices are often referred to as headless
VMIs.
When running a large number of small VMs, this can help increase the VMI density per node, since no memory needs to be reserved for video.
"},{"location":"compute/virtual_hardware/#features_1","title":"Features","text":"KubeVirt supports a range of virtualization features which may be tweaked in order to allow non-Linux based operating systems to properly boot. Most noteworthy are
acpi
apic
hyperv
A common feature configuration is shown by the following example:
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n name: myvmi\nspec:\n domain:\n # typical features\n features:\n acpi: {}\n apic: {}\n hyperv:\n relaxed: {}\n vapic: {}\n spinlocks:\n spinlocks: 8191\n resources:\n requests:\n memory: 512M\n devices:\n disks:\n - name: myimage\n disk: {}\n volumes:\n - name: myimage\n persistentVolumeClaim:\n claimName: myclaim\n
See the Features API Reference for all available features and configuration options.
"},{"location":"compute/virtual_hardware/#resources-requests-and-limits","title":"Resources Requests and Limits","text":"An optional resource request can be specified by the users to allow the scheduler to make a better decision in finding the most suitable Node to place the VM.
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n name: myvmi\nspec:\n domain:\n resources:\n requests:\n memory: \"1Gi\"\n cpu: \"1\"\n limits:\n memory: \"2Gi\"\n cpu: \"2\"\n disks:\n - name: myimage\n disk: {}\n volumes:\n - name: myimage\n persistentVolumeClaim:\n claimName: myclaim\n
"},{"location":"compute/virtual_hardware/#cpu_1","title":"CPU","text":"Specifying CPU limits will determine the amount of cpu shares set on the control group the VM is running in, in other words, the amount of time the VM's CPUs can execute on the assigned resources when there is a competition for CPU resources.
For more information please refer to how Pods with resource limits are run.
"},{"location":"compute/virtual_hardware/#memory-overhead","title":"Memory Overhead","text":"Various VM resources, such as a video adapter, IOThreads, and supplementary system software, consume additional memory from the Node, beyond the requested memory intended for the guest OS consumption. In order to provide a better estimate for the scheduler, this memory overhead will be calculated and added to the requested memory.
Please see how Pods with resource requests are scheduled for additional information on resource requests and limits.
"},{"location":"compute/virtual_hardware/#hugepages","title":"Hugepages","text":"KubeVirt lets you use hugepages as backing memory for your VM. You need to provide the desired amount of memory via resources.requests.memory
and the hugepage size to use via memory.hugepages.pageSize
, for example, on the x86_64 architecture it can be 2Mi
.
apiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n name: myvm\nspec:\n domain:\n resources:\n requests:\n memory: \"64Mi\"\n memory:\n hugepages:\n pageSize: \"2Mi\"\n disks:\n - name: myimage\n disk: {}\n volumes:\n - name: myimage\n persistentVolumeClaim:\n claimName: myclaim\n
In the above example the VM will have 64Mi
of memory, but instead of regular memory it will use node hugepages of the size of 2Mi
.
a node must have pre-allocated hugepages
hugepages size cannot be bigger than requested memory
requested memory must be divisible by hugepages size
hugepages use memfd by default. Memfd is supported on kernels >= 4.14. If you run on an older host (e.g. CentOS 7.9), you must disable memfd with the annotation kubevirt.io/memfd: \"false\"
in the VMI metadata annotations.
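The two size constraints listed above (the hugepage size may not exceed the requested memory, and the requested memory must be divisible by the hugepage size) can be checked with a short Python sketch. The helpers to_bytes and hugepages_errors are hypothetical, and the quantity parser is deliberately simplified: it is an assumption of this example that only the binary suffixes Ki, Mi and Gi appear.

```python
UNITS = {'Ki': 1024, 'Mi': 1024 ** 2, 'Gi': 1024 ** 3}


def to_bytes(quantity: str) -> int:
    # Simplified parser (assumption): handles only the binary suffixes Ki, Mi and Gi.
    for suffix, factor in UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[:-len(suffix)]) * factor
    # No known suffix: treat the value as plain bytes.
    return int(quantity)


def hugepages_errors(requested_memory: str, page_size: str) -> list:
    # Returns the violated constraints; an empty list means both checks pass.
    memory = to_bytes(requested_memory)
    page = to_bytes(page_size)
    errors = []
    if page > memory:
        errors.append('hugepages size cannot be bigger than requested memory')
    if memory % page != 0:
        errors.append('requested memory must be divisible by hugepages size')
    return errors


print(hugepages_errors('64Mi', '2Mi'))
print(hugepages_errors('3Mi', '2Mi'))
```

The first call uses the values from the example above (64Mi of memory with 2Mi pages) and reports no violations; the second call requests 3Mi, which is not a multiple of 2Mi.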
KubeVirt supports input devices. The only supported type is tablet
. A tablet input device supports only the virtio
and usb
buses. The bus can be left empty, in which case usb
will be selected.
apiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n name: myvm\nspec:\n domain:\n devices:\n inputs:\n - type: tablet\n bus: virtio\n name: tablet1\n disks:\n - name: myimage\n disk: {}\n volumes:\n - name: myimage\n persistentVolumeClaim:\n claimName: myclaim\n
"},{"location":"compute/vsock/","title":"VSOCK","text":"VM Sockets (vsock) is a fast and efficient guest-host communication mechanism.
"},{"location":"compute/vsock/#background","title":"Background","text":"Right now KubeVirt uses virtio-serial for local guest-host communication. It is currently used by libvirt and QEMU to communicate with the qemu-guest-agent. Virtio-serial can also be used by other agents, but it is somewhat cumbersome due to:
With virtio-vsock we get support for easy guest-host communication which solves the above issues from a user/admin perspective.
"},{"location":"compute/vsock/#usage","title":"Usage","text":""},{"location":"compute/vsock/#feature-gate","title":"Feature Gate","text":"To enable VSOCK in KubeVirt cluster, the user may expand the featureGates
field in the KubeVirt CR by adding VSOCK
to it.
apiVersion: kubevirt.io/v1\nkind: KubeVirt\nmetadata:\n name: kubevirt\n namespace: kubevirt\nspec:\n ...\n configuration:\n developerConfiguration:\n featureGates:\n - \"VSOCK\"\n
Alternatively, users can edit an existing KubeVirt CR:
kubectl edit kubevirt kubevirt -n kubevirt
...\nspec:\n configuration:\n developerConfiguration:\n featureGates:\n - \"VSOCK\"\n
"},{"location":"compute/vsock/#virtual-machine-instance","title":"Virtual Machine Instance","text":"To attach VSOCK device to a Virtual Machine, the user has to add autoattachVSOCK: true
in the devices
section of the Virtual Machine Instance specification:
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n name: testvmi-vsock\nspec:\n domain:\n resources:\n requests:\n memory: 64M\n devices:\n autoattachVSOCK: true\n
This will expose a VSOCK device to the VM. The CID
will be assigned randomly by virt-controller
, and exposed to the Virtual Machine Instance status:
status:\n VSOCKCID: 123\n
"},{"location":"compute/vsock/#security","title":"Security","text":"NOTE: The /dev/vhost-vsock
device is NOT NEEDED to connect or bind to a VSOCK socket.
To make the VSOCK feature secure, the following measures are in place:
CAP_NET_BIND_SERVICE
capability.
AF_VSOCK
socket syscall gets blocked in containerd 1.7+ (containerd/containerd#7442). It is currently the responsibility of the vendor to ensure that the CRI in use selects a default seccomp policy which blocks VSOCK socket calls in a similar way to what was done for containerd.
virt-controller
and are unique per Virtual Machine Instance to ensure that virt-handler
has an easy way of tracking the identity without races. While this still allows virt-launcher
to fake-use an assigned CID, it eliminates possible assignment races which attackers could make use of to redirect VSOCK calls.
The purpose of this document is to explain how to install virtio drivers for Microsoft Windows running in a fully virtualized guest.
"},{"location":"compute/windows_virtio_drivers/#do-i-need-virtio-drivers","title":"Do I need virtio drivers?","text":"Yes. Without the virtio drivers, you cannot use paravirtualized hardware properly. It would either not work, or will have a severe performance penalty.
For more information about VirtIO and paravirtualization, see VirtIO and paravirtualization
For more details on configuring your VirtIO driver please refer to Installing VirtIO driver on a new Windows virtual machine and Installing VirtIO driver on an existing Windows virtual machine.
"},{"location":"compute/windows_virtio_drivers/#which-drivers-i-need-to-install","title":"Which drivers I need to install?","text":"There are usually up to 8 possible devices that are required to run Windows smoothly in a virtualized environment. KubeVirt currently supports only:
viostor, the block driver, applies to SCSI Controller in the Other devices group.
viorng, the entropy source driver, applies to PCI Device in the Other devices group.
NetKVM, the network driver, applies to Ethernet Controller in the Other devices group. Available only if a virtio NIC is configured.
Other virtio drivers that exist and might be supported in the future:
Balloon, the balloon driver, applies to PCI Device in the Other devices group
vioserial, the paravirtual serial driver, applies to PCI Simple Communications Controller in the Other devices group.
vioscsi, the SCSI block driver, applies to SCSI Controller in the Other devices group.
qemupciserial, the emulated PCI serial driver, applies to PCI Serial Port in the Other devices group.
qxl, the paravirtual video driver, applies to Microsoft Basic Display Adapter in the Display adapters group.
pvpanic, the paravirtual panic driver, applies to Unknown device in the Other devices group.
Note
Some drivers are required in the installation phase. When you are installing Windows onto virtio block storage, you have to provide an appropriate virtio driver. Namely, choose the viostor driver for your version of Microsoft Windows, e.g. do not install the XP driver when you run Windows 10.
Other drivers can be installed after the successful Windows installation. Again, please install only drivers matching your Windows version.
"},{"location":"compute/windows_virtio_drivers/#how-to-install-during-windows-install","title":"How to install during Windows install?","text":"To install drivers before Windows starts its installation, make sure you have the virtio-win package attached to your VirtualMachine as a SATA CD-ROM. In the Windows installation, choose advanced install and load the driver. Then navigate to the loaded virtio CD-ROM and install either viostor or vioscsi, depending on which one you have set up.
Step by step screenshots:
"},{"location":"compute/windows_virtio_drivers/#how-to-install-after-windows-install","title":"How to install after Windows install?","text":"After the Windows installation, go to Device Manager. There you should see undetected devices in the \"available devices\" section. You can install the virtio drivers one by one by going through this list.
For more details on how to choose a proper driver and how to install the driver, please refer to the Windows Guest Virtual Machines on Red Hat Enterprise Linux 7.
"},{"location":"compute/windows_virtio_drivers/#how-to-obtain-virtio-drivers","title":"How to obtain virtio drivers?","text":"The virtio Windows drivers are distributed in the form of a containerDisk, which can simply be mounted to the VirtualMachine. The container image containing the disk is located at https://quay.io/repository/kubevirt/virtio-container-disk?tab=tags and can be pulled like any other docker container image:
docker pull quay.io/kubevirt/virtio-container-disk\n
However, pulling the image manually is not required; Kubernetes will download it if it is not present when deploying the VirtualMachine.
"},{"location":"compute/windows_virtio_drivers/#attaching-to-virtualmachine","title":"Attaching to VirtualMachine","text":"KubeVirt distributes virtio drivers for Microsoft Windows in the form of a container disk. The package contains the virtio drivers and the QEMU guest agent. The disk was tested on Microsoft Windows Server 2012. Supported Windows versions are XP and up.
The package is intended to be used as a CD-ROM attached to the virtual machine running Microsoft Windows. It can be used as a SATA CD-ROM during the install phase or to provide drivers in an existing Windows installation.
Attaching the virtio-win package can be done simply by adding a ContainerDisk to your VirtualMachine.
spec:\n domain:\n devices:\n disks:\n - name: virtiocontainerdisk\n # Any other disk you want to use must go before virtioContainerDisk.\n # KubeVirt boots from disks in the order they are defined.\n # Therefore virtioContainerDisk must come after the bootable disk.\n # The other option is to choose the boot order explicitly:\n # - https://kubevirt.io/api-reference/v0.13.2/definitions.html#_v1_disk\n # NOTE: You either specify bootOrder explicitly or sort the items in\n # disks. You cannot do both at the same time.\n # bootOrder: 2\n cdrom:\n bus: sata\nvolumes:\n - containerDisk:\n image: quay.io/kubevirt/virtio-container-disk\n name: virtiocontainerdisk\n
Once you are done installing the virtio drivers, you can remove the virtio container disk by simply removing the disk from the YAML specification and restarting the VirtualMachine.
"},{"location":"debug_virt_stack/debug/","title":"Debug","text":"This page contains instructions on how to debug KubeVirt.
This is useful to both KubeVirt developers and advanced users who would like to gain a deep understanding of what's happening behind the scenes.
"},{"location":"debug_virt_stack/debug/#log-verbosity","title":"Log Verbosity","text":"KubeVirt produces a lot of logging throughout its codebase. Some log entries have a verbosity level assigned to them. The verbosity level assigned to a log entry determines the minimum configured verbosity level required to expose that entry.
In code, the log entry looks similar to: log.Log.V(verbosity).Infof(\"...\")
where verbosity
is the minimum verbosity level for this entry.
For example, if the log verbosity for some log entry is 3
, then the entry is exposed only if the configured log verbosity is equal to or greater than 3
; otherwise it is filtered out.
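The filtering rule can be illustrated with a minimal Python sketch. This is not KubeVirt's logging code; the function is_exposed and the sample messages are hypothetical and only demonstrate the comparison described above.

```python
def is_exposed(entry_verbosity: int, configured_verbosity: int) -> bool:
    # An entry is exposed only if the configured verbosity reaches the entry's level.
    return configured_verbosity >= entry_verbosity


# Hypothetical log entries as (verbosity level, message) pairs.
entries = [(2, 'sync started'), (3, 'domain XML generated'), (5, 'raw monitor reply')]

# With a configured verbosity of 3, the level-5 entry is filtered out.
visible = [message for level, message in entries if is_exposed(level, 3)]
print(visible)
```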
Currently, log verbosity can be defined per-component or per-node. The most up-to-date API is detailed here.
"},{"location":"debug_virt_stack/debug/#setting-verbosity-per-kubevirt-component","title":"Setting verbosity per KubeVirt component","text":"One way of raising log verbosity is to set it manually for the different components in the KubeVirt
CR:
apiVersion: kubevirt.io/v1\nkind: KubeVirt\nmetadata:\n name: kubevirt\n namespace: kubevirt\nspec:\n configuration:\n developerConfiguration:\n logVerbosity:\n virtLauncher: 2\n virtHandler: 3\n virtController: 4\n virtAPI: 5\n virtOperator: 6\n
This option is best for debugging specific components.
"},{"location":"debug_virt_stack/debug/#libvirt-virtqemudconf-set-log_filters-according-to-virt-launcher-log-verbosity","title":"libvirt virtqemud.conf set log_filters according to virt-launcher log Verbosity","text":"Verbosity level | log_filters in virtqemud.conf
5 | log_filters=\"3:remote 4:event 3:util.json 3:util.object 3:util.dbus 3:util.netlink 3:node_device 3:rpc 3:access 3:util.threadjob 3:cpu.cpu 3:qemu.qemu_monitor 3:qemu.qemu_monitor_json 3:conf.domain_addr 1:*\"
6 | 3:remote 4:event 3:util.json 3:util.object 3:util.dbus 3:util.netlink 3:node_device 3:rpc 3:access 3:util.threadjob 3:cpu.cpu 3:qemu.qemu_monitor 1:*
7 | 3:remote 4:event 3:util.json 3:util.object 3:util.dbus 3:util.netlink 3:node_device 3:rpc 3:access 3:util.threadjob 3:cpu.cpu 1:*
8 and above | 3:remote 4:event 3:util.json 3:util.object 3:util.dbus 3:util.netlink 3:node_device 3:rpc 3:access 1:*
Users can set self-defined log filters via the annotation kubevirt.io/libvirt-log-filters
in VMI configuration. e.g.
kind: VirtualMachineInstance\nmetadata:\n name: my-vmi\n annotations:\n kubevirt.io/libvirt-log-filters: \"3:remote 4:event 1:*\"\n
"},{"location":"debug_virt_stack/debug/#setting-verbosity-per-nodes","title":"Setting verbosity per nodes","text":"Another way is to set verbosity level per node:
apiVersion: kubevirt.io/v1\nkind: KubeVirt\nmetadata:\n name: kubevirt\n namespace: kubevirt\nspec:\n configuration:\n developerConfiguration:\n nodeVerbosity:\n \"node01\": 4\n \"otherNodeName\": 6\n
nodeVerbosity
is essentially a map from string to int where the key is the node name and the value is the verbosity level. The verbosity level would be defined for all the different components in that node (e.g. virt-handler
, virt-launcher
, etc).
In Kubernetes, logs are defined at the Pod level. Therefore, we first need to list the Pods of KubeVirt's core components. To do that, we can list the Pods under KubeVirt's install namespace.
For example:
$> kubectl get pods -n <KubeVirt Install Namespace>\nNAME READY STATUS RESTARTS AGE\ndisks-images-provider-7gqbc 1/1 Running 0 32m\ndisks-images-provider-vg4kx 1/1 Running 0 32m\nvirt-api-57fcc4497b-7qfmc 1/1 Running 0 31m\nvirt-api-57fcc4497b-tx9nc 1/1 Running 0 31m\nvirt-controller-76c784655f-7fp6m 1/1 Running 0 30m\nvirt-controller-76c784655f-f4pbd 1/1 Running 0 30m\nvirt-handler-2m86x 1/1 Running 0 30m\nvirt-handler-9qs6z 1/1 Running 0 30m\nvirt-operator-7ccfdbf65f-q5snk 1/1 Running 0 32m\nvirt-operator-7ccfdbf65f-vllz8 1/1 Running 0 32m\n
Then, we can pick one of the pods and fetch its logs. For example:
$> kubectl logs -n <KubeVirt Install Namespace> virt-handler-2m86x | head -n8\n{\"component\":\"virt-handler\",\"level\":\"info\",\"msg\":\"set verbosity to 2\",\"pos\":\"virt-handler.go:453\",\"timestamp\":\"2022-04-17T08:58:37.373695Z\"}\n{\"component\":\"virt-handler\",\"level\":\"info\",\"msg\":\"set verbosity to 2\",\"pos\":\"virt-handler.go:453\",\"timestamp\":\"2022-04-17T08:58:37.373726Z\"}\n{\"component\":\"virt-handler\",\"level\":\"info\",\"msg\":\"setting rate limiter to 5 QPS and 10 Burst\",\"pos\":\"virt-handler.go:462\",\"timestamp\":\"2022-04-17T08:58:37.373782Z\"}\n{\"component\":\"virt-handler\",\"level\":\"info\",\"msg\":\"CPU features of a minimum baseline CPU model: map[apic:true clflush:true cmov:true cx16:true cx8:true de:true fpu:true fxsr:true lahf_lm:true lm:true mca:true mce:true mmx:true msr:true mtrr:true nx:true pae:true pat:true pge:true pni:true pse:true pse36:true sep:true sse:true sse2:true sse4.1:true ssse3:true syscall:true tsc:true]\",\"pos\":\"cpu_plugin.go:96\",\"timestamp\":\"2022-04-17T08:58:37.390221Z\"}\n{\"component\":\"virt-handler\",\"level\":\"warning\",\"msg\":\"host model mode is expected to contain only one model\",\"pos\":\"cpu_plugin.go:103\",\"timestamp\":\"2022-04-17T08:58:37.390263Z\"}\n{\"component\":\"virt-handler\",\"level\":\"info\",\"msg\":\"node-labeller is running\",\"pos\":\"node_labeller.go:94\",\"timestamp\":\"2022-04-17T08:58:37.391011Z\"}\n
Obviously, for both examples above, <KubeVirt Install Namespace>
needs to be replaced with the actual namespace KubeVirt is installed in.
Using the cluster-profiler
client tool, a developer can get the PProf profiling data for every component in the KubeVirt control plane. Here is a user guide:
cluster-profiler
","text":"Build from source code
$ git clone https://github.com/kubevirt/kubevirt.git\n$ cd kubevirt/tools/cluster-profiler\n$ go build\n
"},{"location":"debug_virt_stack/debug/#enable-the-feature-gate","title":"Enable the feature gate","text":"Add ClusterProfiler
in KubeVirt config
$ cat << END > enable-feature-gate.yaml\n\n---\napiVersion: kubevirt.io/v1\nkind: KubeVirt\nmetadata:\n name: kubevirt\n namespace: kubevirt\nspec:\n configuration:\n developerConfiguration:\n featureGates:\n - ClusterProfiler\nEND\n\n$ kubectl apply -f enable-feature-gate.yaml\n
"},{"location":"debug_virt_stack/debug/#do-the-profiling","title":"Do the profiling","text":"Start CPU profiling
$ cluster-profiler --cmd start\n\n2023/05/17 09:31:09 SUCCESS: started cpu profiling KubeVirt control plane\n
Stop CPU profiling
$ cluster-profiler --cmd stop\n\n2023/05/17 09:31:14 SUCCESS: stopped cpu profiling KubeVirt control plane\n
Dump the pprof result
$ cluster-profiler --cmd dump\n\n2023/05/17 09:31:18 Moving already existing \"cluster-profiler-results\" => \"cluster-profiler-results-old-67fq\"\nSUCCESS: Dumped PProf 6 results for KubeVirt control plane to [cluster-profiler-results]\n
The PProf result can be found in the folder cluster-profiler-results
$ tree cluster-profiler-results\n\ncluster-profiler-results\n\u251c\u2500\u2500 virt-api-5f96f84dcb-lkpb7\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 allocs.pprof\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 block.pprof\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 cpu.pprof\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 goroutine.pprof\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 heap.pprof\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 mutex.pprof\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 threadcreate.pprof\n\u251c\u2500\u2500 virt-controller-5bbd9554d9-2f8j2\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 allocs.pprof\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 block.pprof\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 cpu.pprof\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 goroutine.pprof\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 heap.pprof\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 mutex.pprof\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 threadcreate.pprof\n\u251c\u2500\u2500 virt-controller-5bbd9554d9-qct2w\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 allocs.pprof\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 block.pprof\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 cpu.pprof\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 goroutine.pprof\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 heap.pprof\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 mutex.pprof\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 threadcreate.pprof\n\u251c\u2500\u2500 virt-handler-ccq6c\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 allocs.pprof\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 block.pprof\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 cpu.pprof\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 goroutine.pprof\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 heap.pprof\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 mutex.pprof\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 threadcreate.pprof\n\u251c\u2500\u2500 virt-operator-cdc677b7-pg9j2\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 allocs.pprof\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 block.pprof\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 cpu.pprof\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 goroutine.pprof\n\u2502\u00a0\u00a0 
\u251c\u2500\u2500 heap.pprof\n\u2502\u00a0\u00a0 \u251c\u2500\u2500 mutex.pprof\n\u2502\u00a0\u00a0 \u2514\u2500\u2500 threadcreate.pprof\n\u2514\u2500\u2500 virt-operator-cdc677b7-pjqdx\n \u251c\u2500\u2500 allocs.pprof\n \u251c\u2500\u2500 block.pprof\n \u251c\u2500\u2500 cpu.pprof\n \u251c\u2500\u2500 goroutine.pprof\n \u251c\u2500\u2500 heap.pprof\n \u251c\u2500\u2500 mutex.pprof\n \u2514\u2500\u2500 threadcreate.pprof\n
"},{"location":"debug_virt_stack/launch-qemu-gdb/","title":"Launch QEMU with gdb and connect locally with gdb client","text":"This guide is for cases where QEMU encounters very early failures and it is hard to synchronize with it at a later point in time.
"},{"location":"debug_virt_stack/launch-qemu-gdb/#image-creation-and-pvc-population","title":"Image creation and PVC population","text":"This scenario is a slight variation of the guide about starting strace, hence some of the details on the image build and the PVC population are simply skipped and explained in the other section.
In this example, QEMU will be launched with gdbserver
and later we will connect to it using a local gdb
client.
The wrapping script looks like:
#!/bin/bash\n\nLD_LIBRARY_PATH=$LD_LIBRARY_PATH:/var/run/debug/usr/lib64 /var/run/debug/usr/bin/gdbserver \\\n localhost:1234 \\\n /usr/libexec/qemu-kvm \"$@\" &\nprintf \"%d\" $(pgrep gdbserver) > /run/libvirt/qemu/run/default_vmi-debug-tools.pid\n
First, we need to build and push the image with the wrapping script and the gdbserver:
FROM quay.io/centos/centos:stream9 as build\n\nENV DIR /debug-tools\nENV DEBUGINFOD_URLS https://debuginfod.centos.org/\nRUN mkdir -p ${DIR}/logs\n\nRUN yum install --installroot=${DIR} -y gdb-gdbserver && yum clean all\n\nCOPY ./wrap_qemu_gdb.sh $DIR/wrap_qemu_gdb.sh\nRUN chmod 0755 ${DIR}/wrap_qemu_gdb.sh\nRUN chown 107:107 ${DIR}/wrap_qemu_gdb.sh\nRUN chown 107:107 ${DIR}/logs\n
Then, we can create and populate the debug-tools
PVC as we did in the strace example:
$ kubectl apply -f debug-tools-pvc.yaml\npersistentvolumeclaim/debug-tools created\n$ kubectl apply -f populate-job-pvc.yaml\njob.batch/populate-pvc created\n$ kubectl get jobs\nNAME COMPLETIONS DURATION AGE\npopulate-pvc 1/1 7s 2m12s\n
Configmap:
apiVersion: v1\nkind: ConfigMap\nmetadata:\n name: my-config-map\ndata:\n my_script.sh: |\n #!/bin/sh\n tempFile=`mktemp --dry-run`\n echo $4 > $tempFile\n sed -i \"s|<emulator>/usr/libexec/qemu-kvm</emulator>|<emulator>/var/run/debug/wrap_qemu_gdb.sh</emulator>|\" $tempFile\n cat $tempFile\n
As the last step, we need to create the ConfigMap to modify the VM XML:
$ kubectl apply -f configmap.yaml\nconfigmap/my-config-map created\n
"},{"location":"debug_virt_stack/launch-qemu-gdb/#build-client-image","title":"Build client image","text":"In this scenario, we use an additional container image containing gdb
and the same qemu binary as the target process to debug. This image will be run locally with podman
.
In order to build this image, we need to identify the image of the virt-launcher
container we want to debug. Based on the KubeVirt installation, the namespace and the name of the KubeVirt CR could vary. In this example, we'll assume that KubeVirt CR is called kubevirt
and installed in the kubevirt
namespace.
You can easily find out the right names in your cluster by searching with:
$ kubectl get kubevirt -A\nNAMESPACE NAME AGE PHASE\nkubevirt kubevirt 3h11m Deployed\n
The steps to build the image are:
Get the registry of the images of the KubeVirt installation:
$ export registry=$(kubectl get kubevirt kubevirt -n kubevirt -o jsonpath='{.status.observedDeploymentConfig}' |jq '.registry'|tr -d \"\\\"\")\n$ echo $registry\n\"registry:5000/kubevirt\"\n
Get the shasum of the virt-launcher image:
$ export tag=$(kubectl get kubevirt kubevirt -n kubevirt -o jsonpath='{.status.observedDeploymentConfig}' |jq '.virtLauncherSha'|tr -d \"\\\"\")\n$ echo $tag\n\"sha256:6c8b85eed8e83a4c70779836b246c057d3e882eb513f3ded0a02e0a4c4bda837\"\n
Example of Dockerfile:
ARG registry\nARG tag\nFROM ${registry}/kubevirt/virt-launcher${tag} AS launcher\nFROM quay.io/centos/centos:stream9 as build\n\nRUN yum install -y gdb && yum clean all\n\nCOPY --from=launcher /usr/libexec/qemu-kvm /usr/libexec/qemu-kvm\n
registry
and the tag
retrieved in the previous steps:
$ podman build \\\n -t gdb-client \\\n --build-arg registry=$registry \\\n --build-arg tag=@$tag \\\n -f Dockerfile.client .\n
Podman will replace the registry and tag arguments provided on the command line. In this way, we can specify the image registry and shasum for the KubeVirt version to debug.
"},{"location":"debug_virt_stack/launch-qemu-gdb/#run-the-vm-to-troubleshoot","title":"Run the VM to troubleshoot","text":"For this example, we add an annotation to keep the virt-launcher pod running even if any errors occur:
metadata:\n annotations:\n kubevirt.io/keep-launcher-alive-after-failure: \"true\"\n
Then, we can launch the VM:
$ kubectl apply -f debug-vmi.yaml\nvirtualmachineinstance.kubevirt.io/vmi-debug-tools created\n$ kubectl get vmi\nNAME AGE PHASE IP NODENAME READY\nvmi-debug-tools 28s Scheduled node01 False\n$ kubectl get po\nNAME READY STATUS RESTARTS AGE\npopulate-pvc-dnxld 0/1 Completed 0 4m17s\nvirt-launcher-vmi-debug-tools-tfh28 4/4 Running 0 25s\n
The wrapping script starts the gdbserver
and exposes it on port 1234
inside the container. In order to be able to connect remotely to the gdbserver, we can use the command kubectl port-forward
to expose the gdb port on our machine.
$ kubectl port-forward virt-launcher-vmi-debug-tools-tfh28 1234\nForwarding from 127.0.0.1:1234 -> 1234\nForwarding from [::1]:1234 -> 1234\n
Finally, we can start the gdb client in the container:
$ podman run -ti --network host gdb-client:latest\n$ gdb /usr/libexec/qemu-kvm -ex 'target remote localhost:1234'\nGNU gdb (GDB) Red Hat Enterprise Linux 10.2-12.el9\nCopyright (C) 2021 Free Software Foundation, Inc.\nLicense GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>\nThis is free software: you are free to change and redistribute it.\nThere is NO WARRANTY, to the extent permitted by law.\nType \"show copying\" and \"show warranty\" for details.\nThis GDB was configured as \"x86_64-redhat-linux-gnu\".\nType \"show configuration\" for configuration details.\nFor bug reporting instructions, please see:\n<https://www.gnu.org/software/gdb/bugs/>.\nFind the GDB manual and other documentation resources online at:\n <http://www.gnu.org/software/gdb/documentation/>.\n\nFor help, type \"help\".\n--Type <RET> for more, q to quit, c to continue without paging--\nType \"apropos word\" to search for commands related to \"word\"...\nReading symbols from /usr/libexec/qemu-kvm...\n\nReading symbols from /root/.cache/debuginfod_client/26221a84fabd219a68445ad0cc87283e881fda15/debuginfo...\nRemote debugging using localhost:1234\nReading /lib64/ld-linux-x86-64.so.2 from remote target...\nwarning: File transfers from remote targets can be slow. Use \"set sysroot\" to access files locally instead.\nReading /lib64/ld-linux-x86-64.so.2 from remote target...\nReading symbols from target:/lib64/ld-linux-x86-64.so.2...\nDownloading separate debug info for /system-supplied DSO at 0x7ffc10eff000...\n0x00007f1a70225e70 in _start () from target:/lib64/ld-linux-x86-64.so.2\n
For simplicity, we started podman with the option --network host
so that the container can access any port mapped on the host.
This guide explains how to launch QEMU with a debugging tool in the virt-launcher pod. This method can be useful to debug early failures or to start QEMU as a child of a debug tool that relies on ptrace. The second point is particularly relevant when a process is operating in a non-privileged environment, since otherwise it would need root access to be able to ptrace the process.
Ephemeral containers are among the emerging techniques to overcome the lack of debugging tools inside the original image. This solution does, however, come with a number of limitations. For example, it is possible to spawn a new container inside the same pod as the application to debug and share the same PID namespace. Even though they share the same PID namespace, KubeVirt's usage of unprivileged containers makes it impossible, for example, to ptrace a running container. Therefore, this technique isn't appropriate for our needs.
For security and reduced image size, KubeVirt container images are based on distroless containers. These kinds of images are extremely beneficial for deployments, but they are challenging to troubleshoot because there is no package manager, which prevents the installation of additional tools on the fly.
Wrapping the QEMU binary in a script is one practical method for debugging QEMU launched by Libvirt. This script launches the QEMU as a child of this process together with the debugging tool (such as strace or valgrind).
The final part that needs to be added is the configuration for Libvirt to use the wrapped script rather than calling the QEMU program directly.
It is possible to alter the generated XML with the help of KubeVirt sidecars. This allows us to use the wrapping script in place of the built-in emulator.
The primary concept behind this configuration is that all of the additional tools, scripts, and final output files will be stored in a PersistentVolumeClaim (PVC) that this guide refers to as debug-tools
. The virt-launcher pod that we wish to debug will have this PVC attached to it.
PVC:
apiVersion: v1\nkind: PersistentVolumeClaim\nmetadata:\n  name: debug-tools\nspec:\n  accessModes:\n    - ReadWriteOnce\n  volumeMode: Filesystem\n  resources:\n    requests:\n      storage: 1Gi\n
In this guide, we'll apply the above concepts to debug QEMU inside virt-launcher using strace without the need to build a custom virt-launcher image.
You can see a full demo of this setup:
"},{"location":"debug_virt_stack/launch-qemu-strace/#how-to-bring-the-debug-tools-and-wrapping-script-into-distroless-containers","title":"How to bring the debug tools and wrapping script into distroless containers","text":"This section provides an example of how to provide extra tools into the distroless container that will be supplied as a PVC using a Dockerfile. Although there are several ways to accomplish this, this covers a relatively simple technique. Alternatively, you could run a pod and manually populate the PVC by execing into the pod.
Dockerfile:
FROM quay.io/centos/centos:stream9 as build\n\nENV DIR /debug-tools\nRUN mkdir -p ${DIR}/logs\n\nRUN yum install --installroot=${DIR} -y strace && yum clean all\n\nCOPY ./wrap_qemu_strace.sh $DIR/wrap_qemu_strace.sh\nRUN chmod 0755 ${DIR}/wrap_qemu_strace.sh\nRUN chown 107:107 ${DIR}/wrap_qemu_strace.sh\nRUN chown 107:107 ${DIR}/logs\n
The directory debug-tools
stores the content that will be later copied inside the debug-tools
PVC. We are essentially adding the missing utilities in the custom directory with yum install --installroot=${DIR}
, and the parent image matches the parent image of virt-launcher.
The wrap_qemu_strace.sh
is the wrapping script that will be used to launch QEMU with strace
similarly to the example with valgrind
.
#!/bin/bash\n\n# Quote \"$@\" so that every QEMU argument is forwarded unchanged.\nLD_LIBRARY_PATH=$LD_LIBRARY_PATH:/var/run/debug/usr/lib64 /var/run/debug/usr/bin/strace \\\n    -o /var/run/debug/logs/strace.out \\\n    /usr/libexec/qemu-kvm \"$@\"\n
It is important to set the dynamic library path LD_LIBRARY_PATH
to the path where the PVC will be mounted in the virt-launcher container.
Then, you simply need to build the image and your debug setup is ready. The Dockerfile and the script wrap_qemu_strace.sh
need to be in the same directory where you run the command.
$ podman build -t debug .\n
The second step is to populate the PVC. This can be easily achieved using a Kubernetes Job
like:
apiVersion: batch/v1\nkind: Job\nmetadata:\n  name: populate-pvc\nspec:\n  template:\n    spec:\n      volumes:\n        - name: populate\n          persistentVolumeClaim:\n            claimName: debug-tools\n      containers:\n        - name: populate\n          image: registry:5000/debug:latest\n          command: [\"sh\", \"-c\", \"cp -r /debug-tools/* /vol\"]\n          imagePullPolicy: Always\n          volumeMounts:\n            - mountPath: \"/vol\"\n              name: populate\n      restartPolicy: Never\n  backoffLimit: 4\n
The image referenced in the Job
is the image we built in the previous step. Once this has been applied and the job has completed, the debug-tools
PVC is ready to be used.
This part is achieved by using ConfigMaps and a KubeVirt sidecar (more details in the section Using ConfigMap to run custom script).
ConfigMap:
apiVersion: v1\nkind: ConfigMap\nmetadata:\n  name: my-config-map\ndata:\n  my_script.sh: |\n    #!/bin/sh\n    tempFile=`mktemp --dry-run`\n    echo $4 > $tempFile\n    sed -i \"s|<emulator>/usr/libexec/qemu-kvm</emulator>|<emulator>/var/run/debug/wrap_qemu_strace.sh</emulator>|\" $tempFile\n    cat $tempFile\n
The script that replaces the QEMU binary with the wrapping script in the XML is stored in the configmap my-config-map
. This script will run as a hook, as explained in full in the documentation for the KubeVirt sidecar.
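To make the effect of the hook concrete, the following is a minimal sketch that applies the same sed substitution to a sample string instead of the real domain XML. The XML fragment and the helper name rewrite_emulator are hypothetical and only serve as an illustration; the actual XML passed to the hook is much larger.

```shell
#!/bin/sh
# Hypothetical, heavily trimmed domain XML: only the emulator element matters here.
domain_xml='<domain><devices><emulator>/usr/libexec/qemu-kvm</emulator></devices></domain>'

# Same substitution as in my_script.sh, applied to a string on stdin
# instead of a temporary file.
rewrite_emulator() {
  printf '%s\n' "$1" | sed "s|<emulator>/usr/libexec/qemu-kvm</emulator>|<emulator>/var/run/debug/wrap_qemu_strace.sh</emulator>|"
}

patched_xml=$(rewrite_emulator "$domain_xml")
echo "$patched_xml"
```

Only the emulator element changes; the rest of the definition is passed through untouched, which is why Libvirt ends up calling the wrapping script.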
Once all the objects are created, we can finally run the guest to debug.
VMI:
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n  annotations:\n    hooks.kubevirt.io/hookSidecars: '[{\"args\": [\"--version\", \"v1alpha2\"],\n      \"image\":\"registry:5000/kubevirt/sidecar-shim:devel\",\n      \"pvc\": {\"name\": \"debug-tools\",\"volumePath\": \"/debug\", \"sharedComputePath\": \"/var/run/debug\"},\n      \"configMap\": {\"name\": \"my-config-map\",\"key\": \"my_script.sh\", \"hookPath\": \"/usr/bin/onDefineDomain\"}}]'\n  labels:\n    special: vmi-debug-tools\n  name: vmi-debug-tools\nspec:\n  domain:\n    devices:\n      disks:\n      - disk:\n          bus: virtio\n        name: containerdisk\n      - disk:\n          bus: virtio\n        name: cloudinitdisk\n      rng: {}\n    resources:\n      requests:\n        memory: 1024M\n  terminationGracePeriodSeconds: 0\n  volumes:\n  - containerDisk:\n      image: registry:5000/kubevirt/fedora-with-test-tooling-container-disk:devel\n    name: containerdisk\n  - cloudInitNoCloud:\n      userData: |-\n        #cloud-config\n        password: fedora\n        chpasswd: { expire: False }\n    name: cloudinitdisk\n
The VMI example is a simple VM instance declaration, and the interesting parts are the annotations for the hook:
* image refers to the sidecar-shim already built and shipped with KubeVirt
* pvc refers to the PVC populated with the debug setup. The name refers to the claim name, the volumePath is the path inside the sidecar container where the volume is mounted, while the sharedComputePath is the path of the same volume inside the compute container.
* configMap refers to the ConfigMap containing the script that modifies the XML to use the wrapping script
Once the VM is declared, the hook will modify the emulator section and Libvirt will call the wrapping script instead of QEMU directly.
"},{"location":"debug_virt_stack/launch-qemu-strace/#how-to-fetch-the-output","title":"How to fetch the output","text":"The wrapping script configures strace
to store the output in the PVC. In this way, it is possible to retrieve the output file at a later time, for example using an additional pod like:
apiVersion: v1\nkind: Pod\nmetadata:\n  name: fetch-logs\nspec:\n  securityContext:\n    runAsUser: 107\n    fsGroup: 107\n  volumes:\n    - name: populate\n      persistentVolumeClaim:\n        claimName: debug-tools\n  containers:\n    - name: populate\n      image: busybox:latest\n      command: [\"tail\", \"-f\", \"/dev/null\"]\n      volumeMounts:\n        - mountPath: \"/vol\"\n          name: populate\n
It is then possible to copy the file locally with:
$ kubectl cp fetch-logs:/vol/logs/strace.out strace.out\n
"},{"location":"debug_virt_stack/logging/","title":"Control libvirt logging for each component","text":"Generally, cluster admins can control the log verbosity of each KubeVirt component in KubeVirt CR. For more details, please, check the KubeVirt documentation.
Nonetheless, regular users can also adjust the qemu component logging to have a finer control over it. The annotation kubevirt.io/libvirt-log-filters
enables you to modify each component's log level.
Example:
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n  annotations:\n    kubevirt.io/libvirt-log-filters: \"2:qemu.qemu_monitor 3:*\"\n  labels:\n    special: vmi-debug-tools\n  name: vmi-debug-tools\nspec:\n  domain:\n    devices:\n      disks:\n      - disk:\n          bus: virtio\n        name: containerdisk\n      - disk:\n          bus: virtio\n        name: cloudinitdisk\n      rng: {}\n    resources:\n      requests:\n        memory: 1024M\n  volumes:\n  - containerDisk:\n      image: registry:5000/kubevirt/fedora-with-test-tooling-container-disk:devel\n    name: containerdisk\n  - cloudInitNoCloud:\n      userData: |-\n        #cloud-config\n        password: fedora\n        chpasswd: { expire: False }\n    name: cloudinitdisk\n
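The value of the annotation is a space-separated list of level:match filters. As a rough reading aid, the sketch below splits such a string and prints the libvirt level name for each filter (1 is debug, 2 is info, 3 is warning, 4 is error). The helper name describe_filters is hypothetical and not part of KubeVirt or libvirt.

```shell
#!/bin/sh
# Print each "level:match" filter with the libvirt name of its level.
describe_filters() {
  set -f                       # keep a literal "*" from being expanded as a glob
  for filter in $1; do
    level=${filter%%:*}        # text before the first colon
    match=${filter#*:}         # text after the first colon
    case $level in
      1) name=DEBUG ;;
      2) name=INFO ;;
      3) name=WARNING ;;
      4) name=ERROR ;;
      *) name=UNKNOWN ;;
    esac
    printf '%s -> %s\n' "$match" "$name"
  done
  set +f
}

filters_description=$(describe_filters "2:qemu.qemu_monitor 3:*")
echo "$filters_description"
```

With the filters from the example, the qemu monitor messages are logged from the info level upwards, while everything else is logged only from the warning level upwards.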
Then, it is possible to obtain the logs from the virt-launcher output:
$ kubectl get pods\nNAME READY STATUS RESTARTS AGE\nvirt-launcher-vmi-debug-tools-fk64q 3/3 Running 0 64s\n$ kubectl logs virt-launcher-vmi-debug-tools-fk64q\n[..]\n{\"component\":\"virt-launcher\",\"level\":\"info\",\"msg\":\"QEMU_MONITOR_RECV_EVENT: mon=0x7faa8801f5d0 event={\\\"timestamp\\\": {\\\"seconds\\\": 1698324640, \\\"microseconds\\\": 523652}, \\\"event\\\": \\\"NIC_RX_FILTER_CHANGED\\\", \\\"data\\\": {\\\"name\\\": \\\"ua-default\\\", \\\"path\\\": \\\"/machine/peripheral/ua-default/virtio-backend\\\"}}\",\"pos\":\"qemuMonitorJSONIOProcessLine:205\",\"subcomponent\":\"libvirt\",\"thread\":\"80\",\"timestamp\":\"2023-10-26T12:50:40.523000Z\"}\n{\"component\":\"virt-launcher\",\"level\":\"info\",\"msg\":\"QEMU_MONITOR_RECV_EVENT: mon=0x7faa8801f5d0 event={\\\"timestamp\\\": {\\\"seconds\\\": 1698324644, \\\"microseconds\\\": 165626}, \\\"event\\\": \\\"VSERPORT_CHANGE\\\", \\\"data\\\": {\\\"open\\\": true, \\\"id\\\": \\\"channel0\\\"}}\",\"pos\":\"qemuMonitorJSONIOProcessLine:205\",\"subcomponent\":\"libvirt\",\"thread\":\"80\",\"timestamp\":\"2023-10-26T12:50:44.165000Z\"}\n[..]\n{\"component\":\"virt-launcher\",\"level\":\"info\",\"msg\":\"QEMU_MONITOR_RECV_EVENT: mon=0x7faa8801f5d0 event={\\\"timestamp\\\": {\\\"seconds\\\": 1698324646, \\\"microseconds\\\": 707666}, \\\"event\\\": \\\"RTC_CHANGE\\\", \\\"data\\\": {\\\"offset\\\": 0, \\\"qom-path\\\": \\\"/machine/unattached/device[8]\\\"}}\",\"pos\":\"qemuMonitorJSONIOProcessLine:205\",\"subcomponent\":\"libvirt\",\"thread\":\"80\",\"timestamp\":\"2023-10-26T12:50:46.708000Z\"}\n[..]\n
The annotation enables the filter from the container creation. However, in certain cases you might desire to change the logging level dynamically once the container and libvirt have already been started. In this case, virt-admin
comes to the rescue.
Example:
$ kubectl get pods\nNAME READY STATUS RESTARTS AGE\nvirt-launcher-vmi-ephemeral-nqcld 3/3 Running 0 26m\n$ kubectl exec -ti virt-launcher-vmi-ephemeral-nqcld -- virt-admin -c virtqemud:///session daemon-log-filters \"1:libvirt 1:qemu 1:conf 1:security 3:event 3:json 3:file 3:object 1:util\"\n$ kubectl exec -ti virt-launcher-vmi-ephemeral-nqcld -- virt-admin -c virtqemud:///session daemon-log-filters\n Logging filters: 1:*libvirt* 1:*qemu* 1:*conf* 1:*security* 3:*event* 3:*json* 3:*file* 3:*object* 1:*util*\n
Otherwise, if you prefer to redirect the output to a file and fetch it later, you can rely on kubectl cp
to retrieve the file. In this case, we are saving the file in the /var/run/libvirt
directory because the compute container has the permissions to write there.
Example:
$ kubectl get pods\nNAME READY STATUS RESTARTS AGE\nvirt-launcher-vmi-ephemeral-nqcld 3/3 Running 0 26m\n$ kubectl exec -ti virt-launcher-vmi-ephemeral-nqcld -- virt-admin -c virtqemud:///session daemon-log-outputs \"1:file:/var/run/libvirt/libvirtd.log\"\n$ kubectl cp virt-launcher-vmi-ephemeral-nqcld:/var/run/libvirt/libvirtd.log libvirt-kubevirt.log\ntar: Removing leading `/' from member names\n
"},{"location":"debug_virt_stack/privileged-node-debugging/","title":"Privileged debugging on the node","text":"This article describes the scenarios in which you can create privileged pods and have root access to the cluster nodes.
With privileged pods, you may access devices in /dev
, utilize host namespaces and ptrace processes that are running on the node, and use the hostPath
volume to mount node directories in the container.
A quick way to verify if you are allowed to create privileged pods is to create a sample pod with the --dry-run=server
option, like:
$ kubectl apply -f debug-pod.yaml --dry-run=server\n
"},{"location":"debug_virt_stack/privileged-node-debugging/#build-the-container-image","title":"Build the container image","text":"KubeVirt uses distroless containers and those images don't have a package manager, for this reason it isn't possible to use the image as parent for installing additional packages.
In certain debugging scenarios, the tools require exactly the same binary to be available. However, if the debug tools are operating in a different container, this can be especially difficult as the filesystems of the containers are isolated.
This section will cover how to build a container image with the debug tools plus binaries of the KubeVirt version you want to debug.
Based on your installation the namespace and the name of the KubeVirt CR could vary. In this example, we'll assume that KubeVirt CR is called kubevirt
and installed in the kubevirt
namespace. You can easily find out how it is called in your cluster by searching with kubectl get kubevirt -A
. This is necessary as we need to retrieve the original virt-launcher
image to have exactly the same QEMU binary we want to debug.
Get the registry of the images of the KubeVirt installation:
$ export registry=$(kubectl get kubevirt kubevirt -n kubevirt -o jsonpath='{.status.observedDeploymentConfig}' |jq '.registry'|tr -d \"\\\"\")\n$ echo $registry\n\"registry:5000/kubevirt\"\n
Get the shasum of the virt-launcher image:
$ export tag=$(kubectl get kubevirt kubevirt -n kubevirt -o jsonpath='{.status.observedDeploymentConfig}' |jq '.virtLauncherSha'|tr -d \"\\\"\")\n$ echo $tag\n\"sha256:6c8b85eed8e83a4c70779836b246c057d3e882eb513f3ded0a02e0a4c4bda837\"\n
Dockerfile:
ARG registry\nARG tag\nFROM ${registry}/kubevirt/virt-launcher${tag} AS launcher\n\nFROM quay.io/centos/centos:stream9\n\nRUN yum install -y \\\n gdb \\\n kernel-devel \\\n qemu-kvm-tools \\\n strace \\\n systemtap-client \\\n systemtap-devel \\\n && yum clean all\nCOPY --from=launcher / /\n
Then, we can build the image by using the registry
and the tag
retrieved in the previous steps:
$ podman build \\\n -t debug-tools \\\n --build-arg registry=$registry \\\n --build-arg tag=@$tag \\\n -f Dockerfile .\n
"},{"location":"debug_virt_stack/privileged-node-debugging/#deploy-the-privileged-debug-pod","title":"Deploy the privileged debug pod","text":"This is an example that gives you a couple of suggestions how you can define your debugging pod:
apiVersion: v1\nkind: Pod\nmetadata:\n  name: node01-debug\nspec:\n  containers:\n  - command:\n    - /bin/sh\n    image: registry:5000/debug-tools:latest\n    imagePullPolicy: Always\n    name: debug\n    securityContext:\n      privileged: true\n      runAsUser: 0\n    stdin: true\n    stdinOnce: true\n    tty: true\n    volumeMounts:\n    - mountPath: /host\n      name: host\n    - mountPath: /usr/lib/modules\n      name: modules\n    - mountPath: /sys/kernel\n      name: sys-kernel\n  hostNetwork: true\n  hostPID: true\n  nodeName: node01\n  restartPolicy: Never\n  volumes:\n  - hostPath:\n      path: /\n      type: Directory\n    name: host\n  - hostPath:\n      path: /usr/lib/modules\n      type: Directory\n    name: modules\n  - hostPath:\n      path: /sys/kernel\n      type: Directory\n    name: sys-kernel\n
The privileged
option is required to have access to almost all the resources on the node.
The nodeName
ensures that the debugging pod will be scheduled on the desired node. In order to select the right node, you can use the -owide
option with kubectl get po
, which reports the node where each pod is running.
Example:
$ kubectl get pods -owide\nNAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES\nlocal-volume-provisioner-4jtkb 1/1 Running 0 152m 10.244.196.129 node01 <none> <none>\nnode01-debug 1/1 Running 0 44m 192.168.66.101 node01 <none> <none>\nvirt-launcher-vmi-ephemeral-xg98p 3/3 Running 0 2m54s 10.244.196.148 node01 <none> 1/1\n
In the volumes
section, you can specify the directories you want to be directly mounted in the debugging container. For example, /usr/lib/modules
is particularly useful if you need to load some kernel modules.
Sharing the host pid namespace with the option hostPID
allows you to see all the processes on the node and attach to them with tools like gdb
and strace
.
exec
-ing into the pod gives you a shell with privileged access to the node plus the tooling you installed into the image:
$ kubectl exec -ti node01-debug -- bash\n
The following examples assume you have already execed into the node01-debug
pod.
The tool virt-host-validate
is a utility that validates whether the host is able to run the libvirt hypervisor. This, for example, can be used to check if a particular node is KVM capable.
Example:
$ virt-host-validate\n QEMU: Checking for hardware virtualization : PASS\n QEMU: Checking if device /dev/kvm exists : PASS\n QEMU: Checking if device /dev/kvm is accessible : PASS\n QEMU: Checking if device /dev/vhost-net exists : PASS\n QEMU: Checking if device /dev/net/tun exists : PASS\n QEMU: Checking for cgroup 'cpu' controller support : PASS\n QEMU: Checking for cgroup 'cpuacct' controller support : PASS\n QEMU: Checking for cgroup 'cpuset' controller support : PASS\n QEMU: Checking for cgroup 'memory' controller support : PASS\n QEMU: Checking for cgroup 'devices' controller support : PASS\n QEMU: Checking for cgroup 'blkio' controller support : PASS\n QEMU: Checking for device assignment IOMMU support : PASS\n QEMU: Checking if IOMMU is enabled by kernel : PASS\n QEMU: Checking for secure guest support : WARN (Unknown if this platform has Secure\n
"},{"location":"debug_virt_stack/privileged-node-debugging/#run-a-command-directly-on-the-node","title":"Run a command directly on the node","text":"The debug container has in the volume section the host filesystem mounted under /host
. This can be particularly useful if you want to access the node filesystem or execute a command directly on the host. However, the tool must already be present on the node.
# chroot /host\nsh-5.1# cat /etc/os-release\nNAME=\"CentOS Stream\"\nVERSION=\"9\"\nID=\"centos\"\nID_LIKE=\"rhel fedora\"\nVERSION_ID=\"9\"\nPLATFORM_ID=\"platform:el9\"\nPRETTY_NAME=\"CentOS Stream 9\"\nANSI_COLOR=\"0;31\"\nLOGO=\"fedora-logo-icon\"\nCPE_NAME=\"cpe:/o:centos:centos:9\"\nHOME_URL=\"https://centos.org/\"\nBUG_REPORT_URL=\"https://bugzilla.redhat.com/\"\nREDHAT_SUPPORT_PRODUCT=\"Red Hat Enterprise Linux 9\"\nREDHAT_SUPPORT_PRODUCT_VERSION=\"CentOS Stream\"\n
"},{"location":"debug_virt_stack/privileged-node-debugging/#attach-to-a-running-process-eg-strace-or-gdb","title":"Attach to a running process (e.g strace or gdb)","text":"This requires the field hostPID: true
so that you are able to list all the processes running on the node.
$ ps -ef |grep qemu-kvm\nqemu 50122 49850 0 12:34 ? 00:00:25 /usr/libexec/qemu-kvm -name guest=default_vmi-ephemeral,debug-threads=on -S -object {\"qom-type\":\"secret\",\"id\":\"masterKey0\",\"format\":\"raw\",\"file\":\"/var/run/kubevirt-private/libvirt/qemu/lib/domain-1-default_vmi-ephemera/master-key.aes\"} -machine pc-q35-rhel9.2.0,usb=off,dump-guest-core=off,memory-backend=pc.ram,acpi=on -accel kvm -cpu Skylake-Client-IBRS,ss=on,vmx=on,pdcm=on,hypervisor=on,tsc-adjust=on,clflushopt=on,umip=on,md-clear=on,stibp=on,flush-l1d=on,arch-capabilities=on,ssbd=on,xsaves=on,pdpe1gb=on,ibpb=on,ibrs=on,amd-stibp=on,amd-ssbd=on,rdctl-no=on,ibrs-all=on,skip-l1dfl-vmentry=on,mds-no=on,pschange-mc-no=on,tsx-ctrl=on,fb-clear=on,hle=off,rtm=off -m size=131072k -object {\"qom-type\":\"memory-backend-ram\",\"id\":\"pc.ram\",\"size\":134217728} -overcommit mem-lock=off -smp 1,sockets=1,dies=1,cores=1,threads=1 -object {\"qom-type\":\"iothread\",\"id\":\"iothread1\"} -uuid b56f06f0-07e9-4fe5-8913-18a14e83a4d1 -smbios type=1,manufacturer=KubeVirt,product=None,uuid=b56f06f0-07e9-4fe5-8913-18a14e83a4d1,family=KubeVirt -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=21,server=on,wait=off -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device {\"driver\":\"pcie-root-port\",\"port\":16,\"chassis\":1,\"id\":\"pci.1\",\"bus\":\"pcie.0\",\"multifunction\":true,\"addr\":\"0x2\"} -device {\"driver\":\"pcie-root-port\",\"port\":17,\"chassis\":2,\"id\":\"pci.2\",\"bus\":\"pcie.0\",\"addr\":\"0x2.0x1\"} -device {\"driver\":\"pcie-root-port\",\"port\":18,\"chassis\":3,\"id\":\"pci.3\",\"bus\":\"pcie.0\",\"addr\":\"0x2.0x2\"} -device {\"driver\":\"pcie-root-port\",\"port\":19,\"chassis\":4,\"id\":\"pci.4\",\"bus\":\"pcie.0\",\"addr\":\"0x2.0x3\"} -device {\"driver\":\"pcie-root-port\",\"port\":20,\"chassis\":5,\"id\":\"pci.5\",\"bus\":\"pcie.0\",\"addr\":\"0x2.0x4\"} -device 
{\"driver\":\"pcie-root-port\",\"port\":21,\"chassis\":6,\"id\":\"pci.6\",\"bus\":\"pcie.0\",\"addr\":\"0x2.0x5\"} -device {\"driver\":\"pcie-root-port\",\"port\":22,\"chassis\":7,\"id\":\"pci.7\",\"bus\":\"pcie.0\",\"addr\":\"0x2.0x6\"} -device {\"driver\":\"pcie-root-port\",\"port\":23,\"chassis\":8,\"id\":\"pci.8\",\"bus\":\"pcie.0\",\"addr\":\"0x2.0x7\"} -device {\"driver\":\"pcie-root-port\",\"port\":24,\"chassis\":9,\"id\":\"pci.9\",\"bus\":\"pcie.0\",\"addr\":\"0x3\"} -device {\"driver\":\"virtio-scsi-pci-non-transitional\",\"id\":\"scsi0\",\"bus\":\"pci.5\",\"addr\":\"0x0\"} -device {\"driver\":\"virtio-serial-pci-non-transitional\",\"id\":\"virtio-serial0\",\"bus\":\"pci.6\",\"addr\":\"0x0\"} -blockdev {\"driver\":\"file\",\"filename\":\"/var/run/kubevirt/container-disks/disk_0.img\",\"node-name\":\"libvirt-2-storage\",\"cache\":{\"direct\":true,\"no-flush\":false},\"auto-read-only\":true,\"discard\":\"unmap\"} -blockdev {\"node-name\":\"libvirt-2-format\",\"read-only\":true,\"discard\":\"unmap\",\"cache\":{\"direct\":true,\"no-flush\":false},\"driver\":\"qcow2\",\"file\":\"libvirt-2-storage\"} -blockdev {\"driver\":\"file\",\"filename\":\"/var/run/kubevirt-ephemeral-disks/disk-data/containerdisk/disk.qcow2\",\"node-name\":\"libvirt-1-storage\",\"cache\":{\"direct\":true,\"no-flush\":false},\"auto-read-only\":true,\"discard\":\"unmap\"} -blockdev {\"node-name\":\"libvirt-1-format\",\"read-only\":false,\"discard\":\"unmap\",\"cache\":{\"direct\":true,\"no-flush\":false},\"driver\":\"qcow2\",\"file\":\"libvirt-1-storage\",\"backing\":\"libvirt-2-format\"} -device {\"driver\":\"virtio-blk-pci-non-transitional\",\"bus\":\"pci.7\",\"addr\":\"0x0\",\"drive\":\"libvirt-1-format\",\"id\":\"ua-containerdisk\",\"bootindex\":1,\"write-cache\":\"on\",\"werror\":\"stop\",\"rerror\":\"stop\"} -netdev {\"type\":\"tap\",\"fd\":\"22\",\"vhost\":true,\"vhostfd\":\"24\",\"id\":\"hostua-default\"} -device 
{\"driver\":\"virtio-net-pci-non-transitional\",\"host_mtu\":1480,\"netdev\":\"hostua-default\",\"id\":\"ua-default\",\"mac\":\"7e:cb:ba:c3:71:88\",\"bus\":\"pci.1\",\"addr\":\"0x0\",\"romfile\":\"\"} -add-fd set=0,fd=20,opaque=serial0-log -chardev socket,id=charserial0,fd=18,server=on,wait=off,logfile=/dev/fdset/0,logappend=on -device {\"driver\":\"isa-serial\",\"chardev\":\"charserial0\",\"id\":\"serial0\",\"index\":0} -chardev socket,id=charchannel0,fd=19,server=on,wait=off -device {\"driver\":\"virtserialport\",\"bus\":\"virtio-serial0.0\",\"nr\":1,\"chardev\":\"charchannel0\",\"id\":\"channel0\",\"name\":\"org.qemu.guest_agent.0\"} -audiodev {\"id\":\"audio1\",\"driver\":\"none\"} -vnc vnc=unix:/var/run/kubevirt-private/3a8f7774-7ec7-4cfb-97ce-581db52ee053/virt-vnc,audiodev=audio1 -device {\"driver\":\"VGA\",\"id\":\"video0\",\"vgamem_mb\":16,\"bus\":\"pcie.0\",\"addr\":\"0x1\"} -global ICH9-LPC.noreboot=off -watchdog-action reset -device {\"driver\":\"virtio-balloon-pci-non-transitional\",\"id\":\"balloon0\",\"free-page-reporting\":true,\"bus\":\"pci.8\",\"addr\":\"0x0\"} -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on\n$ gdb -p 50122 /usr/libexec/qemu-kvm\n
"},{"location":"debug_virt_stack/privileged-node-debugging/#debugging-using-crictl","title":"Debugging using crictl
","text":"Crictl
is a CLI for CRI runtimes and can be particularly useful to troubleshoot container failures (for a more detailed guide, please refer to this Kubernetes article).
In this example, we'll concentrate on finding where libvirt creates the files and directories in the compute
container of the virt-launcher pod.
$ crictl ps |grep compute\n67bc7be3222da 5ef5ba25a087a80e204f28be6c9250bbf378fd87fa927085abd516188993d695 25 minutes ago Running compute 0 7b045ea9f485f virt-launcher-vmi-ephemeral-xg98p\n$ crictl inspect 67bc7be3222da\n[..]\n \"mounts\": [\n {\n {\n \"containerPath\": \"/var/run/libvirt\",\n \"hostPath\": \"/var/lib/kubelet/pods/2ccc3e93-d1c3-4f22-bb31-321bfa74edf6/volumes/kubernetes.io~empty-dir/libvirt-runtime\",\n \"propagation\": \"PROPAGATION_PRIVATE\",\n \"readonly\": false,\n \"selinuxRelabel\": true\n },\n[..]\n$ ls /var/lib/kubelet/pods/2ccc3e93-d1c3-4f22-bb31-321bfa74edf6/volumes/kubernetes.io~empty-dir/libvirt-runtime/\ncommon qemu virtlogd-sock virtqemud-admin-sock virtqemud.conf\nhostdevmgr virtlogd-admin-sock virtlogd.pid virtqemud-sock virtqemud.pid\n
"},{"location":"debug_virt_stack/virsh-commands/","title":"Execute virsh commands in virt-launcher pod","text":"A powerful utility to check and troubleshoot the VM state is virsh
and the utility is already installed in the compute
container on the virt-launcher pod.
For example, it is possible to run any QMP command.
For a full list of QMP commands, please refer to the QEMU documentation.
$ kubectl get po\nNAME READY STATUS RESTARTS AGE\nvirt-launcher-vmi-debug-tools-fk64q 3/3 Running 0 44m\n$ kubectl exec -ti virt-launcher-vmi-debug-tools-fk64q -- bash\nbash-5.1$ virsh list\n Id Name State\n-----------------------------------------\n 1 default_vmi-debug-tools running\nbash-5.1$ virsh qemu-monitor-command default_vmi-debug-tools query-status --pretty\n{\n  \"return\": {\n    \"status\": \"running\",\n    \"singlestep\": false,\n    \"running\": true\n  },\n  \"id\": \"libvirt-439\"\n}\nbash-5.1$ virsh qemu-monitor-command default_vmi-debug-tools query-kvm --pretty\n{\n  \"return\": {\n    \"enabled\": true,\n    \"present\": true\n  },\n  \"id\": \"libvirt-438\"\n}\n
Another useful virsh command is qemu-monitor-event
. Once invoked, it observes and reports the QEMU events.
The following example shows the events generated for pausing and unpausing the guest.
$ kubectl get po\nNAME READY STATUS RESTARTS AGE\nvirt-launcher-vmi-ephemeral-nqcld 3/3 Running 0 57m\n$ kubectl exec -ti virt-launcher-vmi-ephemeral-nqcld -- virsh qemu-monitor-event --pretty --loop\n
Then, you can, for example, pause and then unpause the guest and check the triggered events:
$ virtctl pause vmi vmi-ephemeral\nVMI vmi-ephemeral was scheduled to pause\n$ virtctl unpause vmi vmi-ephemeral\nVMI vmi-ephemeral was scheduled to unpause\n
From the monitored events:
$ kubectl exec -ti virt-launcher-vmi-ephemeral-nqcld -- virsh qemu-monitor-event --pretty --loop\nevent STOP at 1698405797.422823 for domain 'default_vmi-ephemeral': <null>\nevent RESUME at 1698405823.162458 for domain 'default_vmi-ephemeral': <null>\n
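The timestamps in the event lines are seconds since the Unix epoch, so they can be used to measure how long the guest stayed paused. The following is a minimal sketch that post-processes the two event lines shown above with awk; the variable names are only illustrative, and it assumes a single STOP followed by a single RESUME.

```shell
#!/bin/sh
# Event lines copied from the monitored output above.
events="event STOP at 1698405797.422823 for domain 'default_vmi-ephemeral': <null>
event RESUME at 1698405823.162458 for domain 'default_vmi-ephemeral': <null>"

# Field 2 is the event name and field 4 is the timestamp in seconds.
paused_seconds=$(printf '%s\n' "$events" | awk '
  $2 == "STOP"   { stop = $4 }
  $2 == "RESUME" { printf "%.3f\n", $4 - stop }
')
echo "guest was paused for ${paused_seconds}s"
```

In a real session, the same awk program could read the output of virsh qemu-monitor-event saved to a file.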
"},{"location":"network/dns/","title":"DNS records","text":"In order to create unique DNS records per VirtualMachineInstance, it is possible to set spec.hostname
and spec.subdomain
. If a subdomain is set and a headless service with a name matching the subdomain exists, kube-dns will create unique DNS entries for every VirtualMachineInstance which matches the selector of the service. Have a look at the DNS for Services and Pods documentation for additional information.
The following example consists of a VirtualMachineInstance and a headless Service which matches the labels and the subdomain of the VirtualMachineInstance:
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n  name: vmi-fedora\n  labels:\n    expose: me\nspec:\n  hostname: \"myvmi\"\n  subdomain: \"mysubdomain\"\n  domain:\n    devices:\n      disks:\n      - disk:\n          bus: virtio\n        name: containerdisk\n      - disk:\n          bus: virtio\n        name: cloudinitdisk\n    resources:\n      requests:\n        memory: 1024M\n  terminationGracePeriodSeconds: 0\n  volumes:\n  - name: containerdisk\n    containerDisk:\n      image: kubevirt/fedora-cloud-registry-disk-demo:latest\n  - cloudInitNoCloud:\n      userDataBase64: IyEvYmluL2Jhc2gKZWNobyAiZmVkb3JhOmZlZG9yYSIgfCBjaHBhc3N3ZAo=\n    name: cloudinitdisk\n---\napiVersion: v1\nkind: Service\nmetadata:\n  name: mysubdomain\nspec:\n  selector:\n    expose: me\n  clusterIP: None\n  ports:\n  - name: foo # Actually, no port is needed.\n    port: 1234\n    targetPort: 1234\n
As a consequence, when we enter the VirtualMachineInstance via e.g. virtctl console vmi-fedora
and ping myvmi.mysubdomain
we see that we find a DNS entry for myvmi.mysubdomain.default.svc.cluster.local
which points to 10.244.0.57
, which is the IP of the VirtualMachineInstance (not of the Service):
[fedora@myvmi ~]$ ping myvmi.mysubdomain\nPING myvmi.mysubdomain.default.svc.cluster.local (10.244.0.57) 56(84) bytes of data.\n64 bytes from myvmi.mysubdomain.default.svc.cluster.local (10.244.0.57): icmp_seq=1 ttl=64 time=0.029 ms\n[fedora@myvmi ~]$ ip a\n2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000\n link/ether 0a:58:0a:f4:00:39 brd ff:ff:ff:ff:ff:ff\n inet 10.244.0.57/24 brd 10.244.0.255 scope global dynamic eth0\n valid_lft 86313556sec preferred_lft 86313556sec\n inet6 fe80::858:aff:fef4:39/64 scope link\n valid_lft forever preferred_lft forever\n
So spec.hostname
and spec.subdomain
get translated to a DNS A-record of the form <vmi.spec.hostname>.<vmi.spec.subdomain>.<vmi.metadata.namespace>.svc.cluster.local
. If no spec.hostname
is set, then we fall back to the VirtualMachineInstance name itself. The resulting DNS A-record then looks like this: <vmi.metadata.name>.<vmi.spec.subdomain>.<vmi.metadata.namespace>.svc.cluster.local
.
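The naming rule can be summarised with a small shell sketch. The helper vmi_fqdn is hypothetical and only mirrors the two record forms described above, assuming the default cluster domain cluster.local.

```shell
#!/bin/sh
# Usage: vmi_fqdn <vmi name> <spec.hostname or empty> <spec.subdomain> <namespace>
# An empty hostname falls back to the VMI name.
vmi_fqdn() {
  printf '%s.%s.%s.svc.cluster.local\n' "${2:-$1}" "$3" "$4"
}

with_hostname=$(vmi_fqdn vmi-fedora myvmi mysubdomain default)
without_hostname=$(vmi_fqdn vmi-fedora "" mysubdomain default)
echo "$with_hostname"      # myvmi.mysubdomain.default.svc.cluster.local
echo "$without_hostname"   # vmi-fedora.mysubdomain.default.svc.cluster.local
```

The first call matches the example above; the second shows the record that would be used if spec.hostname were omitted from the same VirtualMachineInstance.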
Release:
- v1.1.0: Alpha
- v1.3.0: Beta
KubeVirt supports hotplugging and unplugging network interfaces into a running Virtual Machine (VM).
Hotplug is supported for interfaces using the virtio
model connected through bridge binding or SR-IOV binding.
Hot-unplug is supported only for interfaces connected through bridge binding.
"},{"location":"network/hotplug_interfaces/#requirements","title":"Requirements","text":"Adding an interface to a KubeVirt Virtual Machine requires first an interface to be added to a running pod. This is not trivial, and has some requirements:
Network interface hotplug support must be enabled via a feature gate. The feature gates array in the KubeVirt CR must include HotplugNICs
.
First start a VM. You can refer to the following example:
---\napiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n  name: vm-fedora\nspec:\n  running: true\n  template:\n    spec:\n      domain:\n        devices:\n          disks:\n          - disk:\n              bus: virtio\n            name: containerdisk\n          interfaces:\n          - masquerade: {}\n            name: defaultnetwork\n          rng: {}\n        resources:\n          requests:\n            memory: 1024M\n      networks:\n      - name: defaultnetwork\n        pod: {}\n      terminationGracePeriodSeconds: 0\n      volumes:\n      - containerDisk:\n          image: quay.io/kubevirt/fedora-with-test-tooling-container-disk:devel\n        name: containerdisk\n
You should configure a network attachment definition - where the pod interface configuration is held. The snippet below shows an example of a very simple one:
apiVersion: k8s.cni.cncf.io/v1\nkind: NetworkAttachmentDefinition\nmetadata:\n name: new-fancy-net\nspec:\n config: '{\n \"cniVersion\": \"0.3.1\",\n \"type\": \"bridge\",\n \"mtu\": 1300,\n \"name\":\"new-fancy-net\"\n }'\n
Please refer to the Multus documentation for more information. Once the virtual machine is running, and the attachment configuration provisioned, the user can request the interface hotplug operation by editing the VM spec template and adding the desired interface and network:
apiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n name: vm-fedora\ntemplate:\n spec:\n domain:\n devices:\n interfaces:\n - name: defaultnetwork\n masquerade: {}\n # new interface\n - name: dyniface1\n bridge: {}\n networks:\n - name: defaultnetwork\n pod: {}\n # new network\n - name: dyniface1\n multus:\n networkName: new-fancy-net\n ...\n
Note: virtctl
addinterface
and removeinterface
commands are no longer available; interfaces are hotplugged and unplugged by editing the VM spec template.
KubeVirt will also add the interface and network to the corresponding VMI object.
You can now check the VMI status for the presence of this new interface:
kubectl get vmi vm-fedora -ojsonpath=\"{ @.status.interfaces }\"\n
"},{"location":"network/hotplug_interfaces/#removing-an-interface-from-a-running-vm","title":"Removing an interface from a running VM","text":"Following the example above, the user can request an interface unplug operation by editing the VM spec template and set the desired interface state to absent
:
apiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n name: vm-fedora\ntemplate:\n spec:\n domain:\n devices:\n interfaces:\n - name: defaultnetwork\n masquerade: {}\n # set the interface state to absent \n - name: dyniface1\n state: absent\n bridge: {}\n networks:\n - name: defaultnetwork\n pod: {}\n - name: dyniface1\n multus:\n networkName: new-fancy-net\n
KubeVirt will also set the interface state to 'absent' in the corresponding VMI object. Note: Existing VMs from version v0.59.0 and below do not support interface hot-unplug.
"},{"location":"network/hotplug_interfaces/#migration-based-hotplug","title":"Migration based hotplug","text":"In case your cluster doesn't run Multus as thick plugin and Multus Dynamic Networks controller, it's possible to hotplug an interface by migrating the VM.
The actual attachment won't take place immediately, and the new interface will be available in the guest once the migration is completed.
"},{"location":"network/hotplug_interfaces/#add-new-interface","title":"Add new interface","text":"Add the desired interface and network to the VM spec template:
apiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n name: vm-fedora\ntemplate:\n spec:\n domain:\n devices:\n interfaces:\n - name: defaultnetwork\n masquerade: {}\n # new interface\n - name: dyniface1\n bridge: {}\n networks:\n - name: defaultnetwork\n pod: {}\n # new network\n - name: dyniface1\n multus:\n networkName: new-fancy-net\n ...\n
At this point the interface and network will be added to the corresponding VMI object as well, but won't be attached to the guest.
"},{"location":"network/hotplug_interfaces/#migrate-the-vm","title":"Migrate the VM","text":"cat <<EOF | kubectl apply -f -\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstanceMigration\nmetadata:\n name: migration-job\nspec:\n vmiName: vm-fedora\nEOF\n
Please refer to the Live Migration documentation for more information. Once the migration is completed the VM will have the new interface attached.
Note: It is recommended to avoid performing migrations in parallel with a hotplug operation. It is safer to ensure that the hotplug succeeded, or at least reached the VMI specification, before issuing a migration.
"},{"location":"network/hotplug_interfaces/#remove-interface","title":"Remove interface","text":"Set the desired interface state to absent
in the VM spec template:
apiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n name: vm-fedora\ntemplate:\n spec:\n domain:\n devices:\n interfaces:\n - name: defaultnetwork\n masquerade: {}\n # set the interface state to absent \n - name: dyniface1\n state: absent\n bridge: {}\n networks:\n - name: defaultnetwork\n pod: {}\n - name: dyniface1\n multus:\n networkName: new-fancy-net\n
At this point the subject interface should be detached from the guest but exist in the pod.
Note: Existing VMs from version v0.59.0 and below do not support interface hot-unplug.
"},{"location":"network/hotplug_interfaces/#migrate-the-vm_1","title":"Migrate the VM","text":"cat <<EOF | kubectl apply -f -\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstanceMigration\nmetadata:\n name: migration-job\nspec:\n vmiName: vm-fedora\nEOF\n
Please refer to the Live Migration documentation for more information. Once the VM is migrated, the interface will not exist in the migration target pod.
Note: It is recommended to avoid performing migrations in parallel with an unplug operation. It is safer to ensure that the unplug succeeded, or at least reached the VMI specification, before issuing a migration.
"},{"location":"network/hotplug_interfaces/#sr-iov-interfaces","title":"SR-IOV interfaces","text":"Kubevirt supports hot-plugging of SR-IOV interfaces to running VMs.
Similar to bridge binding interfaces, edit the VM spec template and add the desired SR-IOV interface and network:
apiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n name: vm-fedora\ntemplate:\n spec:\n domain:\n devices:\n interfaces:\n - name: defaultnetwork\n masquerade: {}\n # new interface\n - name: sriov-net\n sriov: {}\n networks:\n - name: defaultnetwork\n pod: {}\n # new network\n - name: sriov-net\n multus:\n networkName: sriov-net-1\n ...\n
Please refer to the Interface and Networks documentation for more information about SR-IOV networking. At this point the interface and network will be added to the corresponding VMI object as well, but won't be attached to the guest.
"},{"location":"network/hotplug_interfaces/#migrate-the-vm_2","title":"Migrate the VM","text":"cat <<EOF | kubectl apply -f -\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstanceMigration\nmetadata:\n name: migration-job\nspec:\n vmiName: vm-fedora\nEOF\n
Please refer to the Live Migration documentation for more information. Once the migration is completed, the VM will have the new SR-IOV interface attached. Due to a limitation of the Kubernetes device plugin API in allocating resources dynamically, the SR-IOV device plugin cannot allocate additional SR-IOV resources for KubeVirt to hotplug. Thus, SR-IOV interface hotplug is limited to migration based hotplug only, regardless of Multus \"thick\" version.
"},{"location":"network/hotplug_interfaces/#virtio-limitations","title":"Virtio Limitations","text":"The hotplugged interfaces have model: virtio
. This imposes several limitations: each interface consumes a PCI slot in the VM, and there is a maximum of 32 slots in total. Furthermore, other devices also use these PCI slots (e.g. disks, guest-agent, etc.).
KubeVirt reserves resources for 4 interfaces to allow later hotplug operations. The actual maximum amount of available resources depends on the machine type (e.g. q35 adds another PCI slot). For more information on maximum limits, see the libvirt documentation.
Yet, upon a VM restart, the hotplugged interface will become part of the standard networks; this mitigates the maximum hotplug interfaces (per machine type) limitation.
Note: The user can execute this command against a stopped VM - i.e. a VM without an associated VMI. When this happens, KubeVirt mutates the VM spec template on behalf of the user.
"},{"location":"network/interfaces_and_networks/","title":"Interfaces and Networks","text":"Connecting a virtual machine to a network consists of two parts. First, networks are specified in spec.networks
. Then, interfaces backed by the networks are added to the VM by specifying them in spec.domain.devices.interfaces
.
Each interface must have a corresponding network with the same name.
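The name-matching rule can be checked mechanically. The sketch below is not part of KubeVirt; `interfaces_without_network` is a hypothetical helper that takes a VMI spec already loaded into a Python dict and lists the interfaces that lack a network of the same name.

```python
def interfaces_without_network(vmi_spec):
    # Collect the names declared under spec.networks.
    network_names = {net['name'] for net in vmi_spec.get('networks', [])}
    devices = vmi_spec.get('domain', {}).get('devices', {})
    # Return every interface name that has no same-named network.
    return [iface['name'] for iface in devices.get('interfaces', [])
            if iface['name'] not in network_names]


# Illustrative spec: the 'ovs-net' interface has no matching network entry.
spec = {
    'domain': {'devices': {'interfaces': [
        {'name': 'default', 'masquerade': {}},
        {'name': 'ovs-net', 'bridge': {}},
    ]}},
    'networks': [{'name': 'default', 'pod': {}}],
}
print(interfaces_without_network(spec))  # ['ovs-net']
```

An empty list means every interface is backed by a network with the same name.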
An interface
defines a virtual network interface of a virtual machine (also called a frontend). A network
specifies the backend of an interface
and declares which logical or physical device it is connected to (also called a backend).
There are multiple ways of configuring an interface
as well as a network
.
All possible configuration options are available in the Interface API Reference and Network API Reference.
"},{"location":"network/interfaces_and_networks/#backend","title":"Backend","text":"Network backends are configured in spec.networks
. A network must have a unique name. Additional fields declare which logical or physical device the network relates to.
Each network should declare its type by defining one of the following fields:
| Type | Description |
|------|-------------|
| pod | Default Kubernetes network |
| multus | Secondary network provided using Multus |
"},{"location":"network/interfaces_and_networks/#pod","title":"pod","text":"A pod
network represents the default pod eth0
interface configured by cluster network solution that is present in each pod.
kind: VM\nspec:\n domain:\n devices:\n interfaces:\n - name: default\n masquerade: {}\n networks:\n - name: default\n pod: {} # Stock pod network\n
"},{"location":"network/interfaces_and_networks/#multus","title":"multus","text":"It is also possible to connect VMIs to secondary networks using Multus. This assumes that multus is installed across your cluster and a corresponding NetworkAttachmentDefinition
CRD was created.
The following example defines a network which uses the bridge CNI plugin, which will connect the VMI to Linux bridge br1
. Other CNI plugins such as ptp, ovs-cni, or Flannel might be used as well. For their installation and usage refer to the respective project documentation.
First the NetworkAttachmentDefinition
needs to be created. That is usually done by an administrator. Users can then reference the definition.
apiVersion: \"k8s.cni.cncf.io/v1\"\nkind: NetworkAttachmentDefinition\nmetadata:\n name: bridge-test\nspec:\n config: '{\n \"cniVersion\": \"0.3.1\",\n \"name\": \"bridge-test\",\n \"type\": \"bridge\",\n \"bridge\": \"br1\",\n \"disableContainerInterface\": true\n }'\n
With the following definition, the VMI will be connected to the default pod network and to the secondary Open vSwitch network.
kind: VM\nspec:\n domain:\n devices:\n interfaces:\n - name: default\n masquerade: {}\n bootOrder: 1 # attempt to boot from an external tftp server\n dhcpOptions:\n bootFileName: default_image.bin\n tftpServerName: tftp.example.com\n - name: ovs-net\n bridge: {}\n bootOrder: 2 # if first attempt failed, try to PXE-boot from this L2 networks\n networks:\n - name: default\n pod: {} # Stock pod network\n - name: ovs-net\n multus: # Secondary multus network\n networkName: ovs-vlan-100\n
It is also possible to define a Multus network as the default pod network. A version of Multus that includes this Pull Request is required (currently master).
Note the following:
A multus default network and a pod network type are mutually exclusive.
The virt-launcher pod that starts the VMI will not have the pod network configured.
The multus delegate chosen as default must return at least one IP address.
Create a NetworkAttachmentDefinition
with IPAM.
apiVersion: \"k8s.cni.cncf.io/v1\"\nkind: NetworkAttachmentDefinition\nmetadata:\n name: bridge-test\nspec:\n config: '{\n \"cniVersion\": \"0.3.1\",\n \"name\": \"bridge-test\",\n \"type\": \"bridge\",\n \"bridge\": \"br1\",\n \"ipam\": {\n \"type\": \"host-local\",\n \"subnet\": \"10.250.250.0/24\"\n }\n }'\n
Define a VMI with a Multus network as the default.
kind: VM\nspec:\n domain:\n devices:\n interfaces:\n - name: test1\n bridge: {}\n networks:\n - name: test1\n multus: # Multus network as default\n default: true\n networkName: bridge-test\n
"},{"location":"network/interfaces_and_networks/#invalid-cnis-for-secondary-networks","title":"Invalid CNIs for secondary networks","text":"The following list of CNIs is known not to work for bridge interfaces - which are most common for secondary interfaces.
macvlan
ipvlan
The reason is similar: the bridge interface type moves the pod interface MAC address to the VM, leaving the pod interface with a different address. The aforementioned CNIs require the pod interface to have the original MAC address.
These issues are tracked individually:
macvlan
ipvlan
Feel free to discuss and / or propose fixes for them; we'd like to have these plugins as valid options in our ecosystem.
"},{"location":"network/interfaces_and_networks/#frontend","title":"Frontend","text":"Network interfaces are configured in spec.domain.devices.interfaces
. They describe properties of virtual interfaces as \"seen\" inside guest instances. The same network backend may be connected to a virtual machine in multiple different ways, each with their own connectivity guarantees and characteristics.
Each interface should declare its type by defining one of the following fields:
| Type | Description |
|------|-------------|
| bridge | Connect using a Linux bridge |
| slirp | Connect using QEMU user networking mode |
| sriov | Pass through an SR-IOV PCI device via vfio |
| masquerade | Connect using iptables rules to NAT the traffic |
Each interface may also have additional configuration fields that modify properties \"seen\" inside guest instances, as listed below:
| Name | Format | Default value | Description |
|------|--------|---------------|-------------|
| model | One of: e1000, e1000e, ne2k_pci, pcnet, rtl8139, virtio | virtio | NIC type |
| macAddress | ff:ff:ff:ff:ff:ff or FF-FF-FF-FF-FF-FF | | MAC address as seen inside the guest system, for example: de:ad:00:00:be:af |
| ports | | empty | List of ports to be forwarded to the virtual machine. |
| pciAddress | 0000:81:00.1 | | Set network interface PCI address, for example: 0000:81:00.1 |
kind: VM\nspec:\n domain:\n devices:\n interfaces:\n - name: default\n model: e1000 # expose e1000 NIC to the guest\n masquerade: {} # connect through a masquerade\n ports:\n - name: http\n port: 80\n networks:\n - name: default\n pod: {}\n
Note: For secondary interfaces, when a MAC address is specified for a virtual machine interface, it is passed to the underlying CNI plugin which is, in turn, expected to configure the backend to allow for this particular MAC. Not every plugin has native support for custom MAC addresses.
Note: For some CNI plugins without native support for custom MAC addresses, there is a workaround, which is to use the tuning
CNI plugin to adjust pod interface MAC address. This can be used as follows:
apiVersion: \"k8s.cni.cncf.io/v1\"\nkind: NetworkAttachmentDefinition\nmetadata:\n name: ptp-mac\nspec:\n config: '{\n \"cniVersion\": \"0.3.1\",\n \"name\": \"ptp-mac\",\n \"plugins\": [\n {\n \"type\": \"ptp\",\n \"ipam\": {\n \"type\": \"host-local\",\n \"subnet\": \"10.1.1.0/24\"\n }\n },\n {\n \"type\": \"tuning\"\n }\n ]\n }'\n
This approach may not work for all plugins. For example, OKD SDN is not compatible with tuning
plugin.
Plugins that handle custom MAC addresses natively: ovs
, bridge
.
Plugins that are compatible with tuning
plugin: flannel
, ptp
.
Plugins that don't need special MAC address treatment: sriov
(in vfio
mode).
Declare the ports that the virtual machine listens on.
Note: When using the slirp interface only the configured ports will be forwarded to the virtual machine.
| Name | Format | Required | Description |
|------|--------|----------|-------------|
| name | | no | Name |
| port | 1 - 65535 | yes | Port to expose |
| protocol | TCP,UDP | no | Connection protocol |
Tip: Use e1000
model if your guest image doesn't ship with virtio drivers.
If spec.domain.devices.interfaces
is omitted, the virtual machine is connected using the default pod network interface of bridge
type. If you'd like to have a virtual machine instance without any network connectivity, you can use the autoattachPodInterface
field as follows:
kind: VM\nspec:\n domain:\n devices:\n autoattachPodInterface: false\n
"},{"location":"network/interfaces_and_networks/#mtu","title":"MTU","text":"There are two methods for the MTU to be propagated to the guest interface.
On Windows guests with non-virtio interfaces, the MTU has to be set manually using netsh
or another tool, since the Windows DHCP client doesn't request/read the MTU.
The table below summarizes the MTU propagation to the guest.
| | masquerade | bridge with CNI IP | bridge with no CNI IP | Windows |
|---|---|---|---|---|
| virtio | DHCP & libvirt | DHCP & libvirt | libvirt | libvirt |
| non-virtio | DHCP | DHCP | X | X |

In bridge
mode, virtual machines are connected to the network backend through a Linux \"bridge\". The pod network IPv4 address (if it exists) is delegated to the virtual machine via DHCPv4. The virtual machine should be configured to use DHCP to acquire IPv4 addresses.
Note: If a specific MAC address is not configured in the virtual machine interface spec, the MAC address from the relevant pod interface is delegated to the virtual machine.
kind: VM\nspec:\n domain:\n devices:\n interfaces:\n - name: red\n bridge: {} # connect through a bridge\n networks:\n - name: red\n multus:\n networkName: red\n
At this time, bridge
mode doesn't support additional configuration fields.
Note: due to IPv4 address delegation, in bridge
mode the pod doesn't have an IP address configured, which may introduce issues with third-party solutions that may rely on it. For example, Istio may not work in this mode.
Note: an admin can forbid using the bridge
interface type for pod networks via a designated configuration flag. To achieve this, the admin should set the following option to false
:
apiVersion: kubevirt.io/v1\nkind: KubeVirt\nmetadata:\n name: kubevirt\n namespace: kubevirt\nspec:\n configuration:\n network:\n permitBridgeInterfaceOnPodNetwork: false\n
Note: binding the pod network using bridge
interface type may cause issues. Other than the third-party issue mentioned in the above note, live migration is not allowed with a pod network binding of bridge
interface type, and some CNI plugins might not allow a custom MAC address to be used for your VM instances. If you think you may be affected by any of the issues mentioned above, consider changing the default interface type to masquerade
, and disabling the bridge
type for pod network, as shown in the example above.
In slirp
mode, virtual machines are connected to the network backend using QEMU user networking mode. In this mode, QEMU allocates internal IP addresses to virtual machines and hides them behind NAT.
kind: VM\nspec:\n domain:\n devices:\n interfaces:\n - name: red\n slirp: {} # connect using SLIRP mode\n networks:\n - name: red\n pod: {}\n
At this time, slirp
mode doesn't support additional configuration fields.
Note: in slirp
mode, the only supported protocols are TCP and UDP. ICMP is not supported.
More information about SLIRP mode can be found in QEMU Wiki.
Note: Since v1.1.0, KubeVirt delegates Slirp network configuration to the Slirp network binding plugin by default. If the binding plugin is not registered, KubeVirt will use the following default image: quay.io/kubevirt/network-slirp-binding:20230830_638c60fc8
.
Note: In the next release (v1.2.0), KubeVirt will not set a default image; registering an image will be mandatory.
Note: On disconnected clusters it will be necessary to mirror the Slirp binding plugin image to the cluster registry.
"},{"location":"network/interfaces_and_networks/#masquerade","title":"masquerade","text":"In masquerade
mode, KubeVirt allocates internal IP addresses to virtual machines and hides them behind NAT. All the traffic exiting virtual machines is \"source NAT'ed\" using pod IP addresses; thus, cluster workloads should use the pod's IP address to contact the VM over this interface. This IP address is reported in the VMI's status.interfaces
. A guest operating system should be configured to use DHCP to acquire IPv4 addresses.
To allow the VM to live-migrate or hard restart (both cause the VM to run on a different pod, with a different IP address) and still be reachable, it should be exposed by a Kubernetes service.
To allow traffic of specific ports into virtual machines, the template ports
section of the interface should be configured as follows. If the ports
section is missing, all ports are forwarded into the VM.
kind: VM\nspec:\n domain:\n devices:\n interfaces:\n - name: red\n masquerade: {} # connect using masquerade mode\n ports:\n - port: 80 # allow incoming traffic on port 80 to get into the virtual machine\n networks:\n - name: red\n pod: {}\n
Note: Masquerade is only allowed to connect to the pod network.
Note: The network CIDR can be configured in the pod network section using the vmNetworkCIDR
attribute.
masquerade
mode can be used in IPv4 and IPv6 dual-stack clusters to provide a VM with IP connectivity over both protocols.
As with the IPv4 masquerade
mode, the VM can be contacted using the pod's IP address - which will be in this case two IP addresses, one IPv4 and one IPv6. Outgoing traffic is also \"NAT'ed\" to the pod's respective IP address from the given family.
Unlike in IPv4, the configuration of the IPv6 address and the default route is not automatic; it should be configured via cloud init, as shown below:
kind: VM\nspec:\n domain:\n devices:\n disks:\n - disk:\n bus: virtio\n name: cloudinitdisk\n interfaces:\n - name: red\n masquerade: {} # connect using masquerade mode\n ports:\n - port: 80 # allow incoming traffic on port 80 to get into the virtual machine\n networks:\n - name: red\n pod: {}\n volumes:\n - cloudInitNoCloud:\n networkData: |\n version: 2\n ethernets:\n eth0:\n dhcp4: true\n addresses: [ fd10:0:2::2/120 ]\n gateway6: fd10:0:2::1\n userData: |-\n #!/bin/bash\n echo \"fedora\" |passwd fedora --stdin\n
Note: The IPv6 address for the VM and default gateway must be the ones shown above.
"},{"location":"network/interfaces_and_networks/#masquerade-ipv6-single-stack-support","title":"masquerade - IPv6 single-stack support","text":"masquerade
mode can be used in IPv6 single-stack clusters to provide a VM with IPv6-only connectivity.
As with the IPv4 masquerade
mode, the VM can be contacted using the pod's IP address - which will be in this case the IPv6 one. Outgoing traffic is also \"NAT'ed\" to the pod's respective IPv6 address.
As with the dual-stack cluster, the configuration of the IPv6 address and the default route is not automatic; it should be configured via cloud init, as shown in the dual-stack section.
Unlike the dual-stack cluster, which has a DHCP server for IPv4, the IPv6 single stack cluster has no DHCP server at all. Therefore, the VM won't have the search domains information and reaching a destination using its FQDN is not possible. Tracking issue - https://github.com/kubevirt/kubevirt/issues/7184
"},{"location":"network/interfaces_and_networks/#passt","title":"passt","text":"Warning: The core binding is being deprecated and targeted for removal in v1.3 . As an alternative, the same functionality is introduced and available as a binding plugin.
passt
is a new approach for user-mode networking which can be used as a simple replacement for Slirp (which is practically dead).
passt
is a universal tool which implements a translation layer between a Layer-2 network interface and native Layer-4 sockets (TCP, UDP, ICMP/ICMPv6 echo) on a host.
Its main benefits are:
- doesn't require extra network capabilities such as CAP_NET_RAW and CAP_NET_ADMIN.
- allows integration with service meshes (which expect applications to run locally) out of the box.
- supports IPv6 out of the box (in contrast to the existing bindings which require configuring IPv6 manually).

| | Masquerade | Bridge | Passt |
|---|---|---|---|
| Supports migration | Yes | No | No (will be supported in the future) |
| VM uses Pod IP | No | Yes | Yes (in the future it will be possible to configure the VM IP. Currently the default is the pod IP) |
| Service Mesh out of the box | No (only Istio is supported, adjustments on both Istio and KubeVirt had to be done to make it work) | No | Yes |
| Doesn't require extra capabilities on the virt-launcher pod | Yes (multiple workarounds had to be added to KubeVirt to make it work) | No (multiple workarounds had to be added to KubeVirt to make it work) | Yes |
| Doesn't require extra network devices on the virt-launcher pod | No (bridge and tap device are created) | No (bridge and tap device are created) | Yes |
| Supports IPv6 | Yes (requires manual configuration on the VM) | No | Yes |

kind: VM\nspec:\n domain:\n devices:\n interfaces:\n - name: red\n passt: {} # connect using passt mode\n ports:\n - port: 8080 # allow incoming traffic on port 8080 to get into the virtual machine\n networks:\n - name: red\n pod: {}\n
"},{"location":"network/interfaces_and_networks/#requirementsrecommendations","title":"Requirements/Recommendations:","text":"sysctl -w net.core.rmem_max=33554432\nsysctl -w net.core.wmem_max=33554432\n
fs.file-max
should be increased (for a VM that forwards all IPv4 and IPv6 ports, for TCP and UDP, passt needs to create ~2^18 sockets): sysctl -w fs.file-max=9223372036854775807\n
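The ~2^18 figure follows from simple counting, assuming one socket per forwarded port for every combination of IP family and transport protocol. The constant names below are illustrative only.

```python
PORTS_PER_PROTOCOL = 2 ** 16   # port numbers 0-65535
IP_FAMILIES = 2                # IPv4 and IPv6
TRANSPORTS = 2                 # TCP and UDP

# One socket per (port, IP family, transport) combination.
sockets_needed = PORTS_PER_PROTOCOL * IP_FAMILIES * TRANSPORTS
print(sockets_needed)             # 262144
print(sockets_needed == 2 ** 18)  # True
```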
NOTE: To achieve optimal memory consumption with Passt binding, specify ports required for your workload. When no ports are explicitly specified, all ports are forwarded, leading to memory overhead of up to 800 Mi.
"},{"location":"network/interfaces_and_networks/#temporary-restrictions","title":"Temporary restrictions:","text":"passt
is currently only supported as the primary network and doesn't allow extra Multus networks to be configured on the VM.
passt interfaces are feature gated; to enable the feature, follow these instructions to activate the Passt
feature gate (case sensitive).
More information about passt mode can be found in passt Wiki.
"},{"location":"network/interfaces_and_networks/#virtio-net-multiqueue","title":"virtio-net multiqueue","text":"Setting the networkInterfaceMultiqueue
to true
will enable the multi-queue functionality, increasing the number of vhost queues, for interfaces configured with a virtio
model.
kind: VM\nspec:\n domain:\n devices:\n networkInterfaceMultiqueue: true\n
Users of a Virtual Machine with multiple vCPUs may benefit from increased network throughput and performance.
Currently, the number of queues is determined by the number of vCPUs of a VM. This is because multi-queue support optimizes RX interrupt affinity and TX queue selection in order to make a specific queue private to a specific vCPU.
Without enabling the feature, network performance does not scale as the number of vCPUs increases. Guests cannot transmit or retrieve packets in parallel, as virtio-net has only one TX and RX queue.
Virtio interfaces advertise on their status.interfaces.interface entry a field named queueCount. The queueCount field indicates how many queues were assigned to the interface. The queue count value is derived from the domain XML. If the number of queues can't be determined (e.g. for an interface that is reported by the guest agent only), the field is omitted.
NOTE: Although the virtio-net multiqueue feature provides a performance benefit, it has some limitations and therefore should not be unconditionally enabled.
"},{"location":"network/interfaces_and_networks/#some-known-limitations","title":"Some known limitations","text":"Guest OS is limited to ~200 MSI vectors. Each NIC queue requires a MSI vector, as well as any virtio device or assigned PCI device. Defining an instance with multiple virtio NICs and vCPUs might lead to a possibility of hitting the guest MSI limit.
virtio-net multiqueue works well for incoming traffic, but can occasionally cause a performance degradation for outgoing traffic. Specifically, this may occur when sending packets under 1,500 bytes over a Transmission Control Protocol (TCP) stream.
Enabling virtio-net multiqueue increases the total network throughput, but in parallel it also increases the CPU consumption.
Enabling virtio-net multiqueue in the host QEMU config does not enable the functionality in the guest OS. The guest OS administrator needs to manually turn it on for each guest NIC that requires this feature, using ethtool.
MSI vectors are still consumed (wasted) if multiqueue is enabled in the host but has not been enabled in the guest OS by the administrator.
In case the number of vNICs in a guest instance is proportional to the number of vCPUs, enabling the multiqueue feature is less important.
Each virtio-net queue consumes 64 KiB of kernel memory for the vhost driver.
NOTE: Virtio-net multiqueue should be enabled in the guest OS manually, using ethtool. For example: ethtool -L <NIC> combined #num_of_queues
For more information, please refer to KVM/QEMU MultiQueue.
"},{"location":"network/interfaces_and_networks/#sriov","title":"sriov","text":"In sriov
mode, virtual machines are directly exposed to an SR-IOV PCI device, usually allocated by Intel SR-IOV device plugin. The device is passed through into the guest operating system as a host device, using the vfio userspace interface, to maintain high networking performance.
To simplify the procedure, please use the SR-IOV network operator to deploy and configure SR-IOV components in your cluster. For details on how to use the operator, please refer to its documentation.
Note: KubeVirt relies on VFIO userspace driver to pass PCI devices into VMI guest. Because of that, when configuring SR-IOV operator policies, make sure you define a pool of VF resources that uses deviceType: vfio-pci
.
Once the operator is deployed, an SriovNetworkNodePolicy must be provisioned, in which the list of SR-IOV devices to expose (with respective configurations) is defined.
Please refer to the following SriovNetworkNodePolicy
for an example:
apiVersion: sriovnetwork.openshift.io/v1\nkind: SriovNetworkNodePolicy\nmetadata:\n name: policy-1\n namespace: sriov-network-operator\nspec:\n deviceType: vfio-pci\n mtu: 9000\n nicSelector:\n pfNames:\n - ens1f0\n nodeSelector:\n sriov: \"true\"\n numVfs: 8\n priority: 90\n resourceName: sriov-nic\n
The policy above will configure the SR-IOV
device plugin, allowing the PF named ens1f0
to be exposed in the SRIOV capable nodes as a resource named sriov-nic
.
Once all the SR-IOV components are deployed, you need to specify how to configure the SR-IOV network. Refer to the following SriovNetwork
for an example:
apiVersion: sriovnetwork.openshift.io/v1\nkind: SriovNetwork\nmetadata:\n name: sriov-net\n namespace: sriov-network-operator\nspec:\n ipam: |\n {}\n networkNamespace: default\n resourceName: sriov-nic\n spoofChk: \"off\"\n
Finally, to create a VM that will attach to the aforementioned Network, refer to the following VMI spec:
---\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n labels:\n special: vmi-perf\n name: vmi-perf\nspec:\n domain:\n cpu:\n sockets: 2\n cores: 1\n threads: 1\n dedicatedCpuPlacement: true\n resources:\n requests:\n memory: \"4Gi\"\n limits:\n memory: \"4Gi\"\n devices:\n disks:\n - disk:\n bus: virtio\n name: containerdisk\n - disk:\n bus: virtio\n name: cloudinitdisk\n interfaces:\n - masquerade: {}\n name: default\n - name: sriov-net\n sriov: {}\n rng: {}\n machine:\n type: \"\"\n networks:\n - name: default\n pod: {}\n - multus:\n networkName: default/sriov-net\n name: sriov-net\n terminationGracePeriodSeconds: 0\n volumes:\n - containerDisk:\n image: docker.io/kubevirt/fedora-cloud-container-disk-demo:latest\n name: containerdisk\n - cloudInitNoCloud:\n userData: |\n #!/bin/bash\n echo \"centos\" |passwd centos --stdin\n dhclient eth1\n name: cloudinitdisk\n
Note: for some NICs (e.g. Mellanox), the kernel module needs to be installed in the guest VM.
Note: Placement on dedicated CPUs can only be achieved if the Kubernetes CPU manager is running on the SR-IOV capable workers. For further details please refer to the dedicated cpu resources documentation.
"},{"location":"network/interfaces_and_networks/#macvtap","title":"Macvtap","text":"Note: The core binding will be deprecated soon. As an alternative, the same functionality is introduced and available as a binding plugin.
In macvtap
mode, virtual machines are directly exposed to the Kubernetes nodes' L2 network. This is achieved by 'extending' an existing network interface with a virtual device that has its own MAC address.
Macvtap interfaces are feature gated; to enable the feature, follow these instructions, in order to activate the Macvtap
feature gate (case sensitive).
Note: On KinD clusters, the user needs to adjust the cluster configuration, mounting dev
of the running host onto the KinD nodes, because of a known issue.
To simplify the procedure, please use the Cluster Network Addons Operator to deploy and configure the macvtap components in your cluster.
The aforementioned operator effectively deploys both the macvtap-cni CNI plugin and its device plugin.
There are two different alternatives to configure which host interfaces get exposed to the user, enabling them to create macvtap interfaces on top of:
Both options are configured via the macvtap-deviceplugin-config
ConfigMap, and more information on how to configure it can be found in the macvtap-cni repo.
You can find a minimal example, in which the eth0
interface of the Kubernetes nodes is exposed, via the lowerDevice
attribute.
kind: ConfigMap\napiVersion: v1\nmetadata:\n name: macvtap-deviceplugin-config\ndata:\n DP_MACVTAP_CONF: |\n [\n {\n \"name\" : \"dataplane\",\n \"lowerDevice\": \"eth0\",\n \"mode\" : \"bridge\",\n \"capacity\" : 50\n }\n ]\n
This step can be omitted, since the default configuration of the aforementioned ConfigMap
is to expose all host interfaces (which is represented by the following configuration):
kind: ConfigMap\napiVersion: v1\nmetadata:\n name: macvtap-deviceplugin-config\ndata:\n DP_MACVTAP_CONF: '[]'\n
"},{"location":"network/interfaces_and_networks/#start-a-vm-with-macvtap-interfaces","title":"Start a VM with macvtap interfaces","text":"Once the macvtap components are deployed, it is needed to indicate how to configure the macvtap network. Refer to the following NetworkAttachmentDefinition
for a simple example:
---\nkind: NetworkAttachmentDefinition\napiVersion: k8s.cni.cncf.io/v1\nmetadata:\n name: macvtapnetwork\n annotations:\n k8s.v1.cni.cncf.io/resourceName: macvtap.network.kubevirt.io/eth0\nspec:\n config: '{\n \"cniVersion\": \"0.3.1\",\n \"name\": \"macvtapnetwork\",\n \"type\": \"macvtap\",\n \"mtu\": 1500\n }'\n
The requested k8s.v1.cni.cncf.io/resourceName
annotation must point to an exposed host interface (via the lowerDevice
attribute, on the macvtap-deviceplugin-config
ConfigMap
). Finally, to create a VM that will attach to the aforementioned Network, refer to the following VMI spec:
---\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n labels:\n special: vmi-host-network\n name: vmi-host-network\nspec:\n domain:\n devices:\n disks:\n - disk:\n bus: virtio\n name: containerdisk\n - disk:\n bus: virtio\n name: cloudinitdisk\n interfaces:\n - macvtap: {}\n name: hostnetwork\n rng: {}\n machine:\n type: \"\"\n resources:\n requests:\n memory: 1024M\n networks:\n - multus:\n networkName: macvtapnetwork\n name: hostnetwork\n terminationGracePeriodSeconds: 0\n volumes:\n - containerDisk:\n image: docker.io/kubevirt/fedora-cloud-container-disk-demo:devel\n name: containerdisk\n - cloudInitNoCloud:\n userData: |-\n #!/bin/bash\n echo \"fedora\" |passwd fedora --stdin\n name: cloudinitdisk\n
The requested multus
networkName
- i.e. macvtapnetwork
- must match the name of the provisioned NetworkAttachmentDefinition
. Note: VMIs with macvtap interfaces can be migrated, but their MAC addresses must be statically set.
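As a hedged illustration of the note above, a static MAC address can be requested through the macAddress field of the interface. This is only a sketch: the address shown is an arbitrary, locally administered example value that does not come from this guide, and it must be unique on the attached L2 network.

```yaml
# Fragment of spec.domain.devices in the VMI spec shown earlier
interfaces:
- macvtap: {}
  name: hostnetwork
  # Example value only (assumption): pick any unused, locally administered unicast MAC
  macAddress: '02:00:00:00:00:01'
```

With the address pinned in the spec, the guest keeps the same MAC on the target node after a migration.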
"},{"location":"network/interfaces_and_networks/#security","title":"Security","text":""},{"location":"network/interfaces_and_networks/#mac-spoof-check","title":"MAC spoof check","text":"MAC spoofing refers to the ability to generate traffic with an arbitrary source MAC address. An attacker may use this option to generate attacks on the network.
In order to protect against such scenarios, it is possible to enable the mac-spoof-check support in CNI plugins that support it.
The pod primary network which is served by the cluster network provider is not covered by this documentation. Please refer to the relevant provider to check how to enable spoofing check. The following text refers to the secondary networks, served using multus.
There are two known CNI plugins that support mac-spoof-check:
spoofchk
parameter.
macspoofchk
parameter.
The configuration is done on the NetworkAttachmentDefinition by the operator, and any interface that refers to it will have this feature enabled.
Below is an example of using the bridge
CNI with macspoofchk
enabled:
apiVersion: \"k8s.cni.cncf.io/v1\"\nkind: NetworkAttachmentDefinition\nmetadata:\n name: br-spoof-check\nspec:\n config: '{\n \"cniVersion\": \"0.3.1\",\n \"name\": \"br-spoof-check\",\n \"type\": \"bridge\",\n \"bridge\": \"br10\",\n \"disableContainerInterface\": true,\n \"macspoofchk\": true\n }'\n
On the VMI, the network section should point to this NetworkAttachmentDefinition by name:
networks:\n- name: default\n pod: {}\n- multus:\n networkName: br-spoof-check\n name: br10\n
"},{"location":"network/interfaces_and_networks/#limitations_1","title":"Limitations","text":"bridge
CNI supports mac-spoof-check through nftables, therefore the node must support nftables and have the nft
binary deployed.
A service mesh allows you to monitor, visualize and control traffic between pods. Kubevirt supports running VMs as part of an Istio service mesh.
"},{"location":"network/istio_service_mesh/#limitations","title":"Limitations","text":"Istio service mesh is only supported with a pod network masquerade or passt binding.
Istio uses a list of ports for its own purposes; these ports must not be explicitly specified in a VMI interface.
Istio only supports IPv4.
This guide assumes that Istio is already deployed and uses Istio CNI Plugin. See Istio documentation for more information.
Optionally, istioctl
binary for troubleshooting. See Istio installation instructions.
The target namespace where the VM is created must be labelled with the istio-injection=enabled
label.
If Multus is used to manage CNI, the following NetworkAttachmentDefinition
is required in the application namespace:
apiVersion: \"k8s.cni.cncf.io/v1\"\nkind: NetworkAttachmentDefinition\nmetadata:\n name: istio-cni\n
The example below specifies a VMI with a masquerade network interface and the sidecar.istio.io/inject
annotation to register the VM to the service mesh.
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n annotations:\n sidecar.istio.io/inject: \"true\"\n labels:\n app: vmi-istio\n name: vmi-istio\nspec:\n domain:\n devices:\n interfaces:\n - name: default\n masquerade: {}\n disks:\n - disk:\n bus: virtio\n name: containerdisk\n resources:\n requests:\n memory: 1024M\n networks:\n - name: default\n pod: {}\n terminationGracePeriodSeconds: 0\n volumes:\n - name: containerdisk\n containerDisk:\n image: registry:5000/kubevirt/fedora-cloud-container-disk-demo:devel\n
Istio expects each application to be associated with at least one Kubernetes service. Create the following Service exposing port 8080:
apiVersion: v1\nkind: Service\nmetadata:\n name: vmi-istio\nspec:\n selector:\n app: vmi-istio\n ports:\n - port: 8080\n name: http\n protocol: TCP\n
Note: Each Istio enabled VMI must feature the sidecar.istio.io/inject
annotation, which instructs KubeVirt to perform the necessary network configuration.
Verify that the istio-proxy sidecar is deployed and able to synchronize with the Istio control plane using the istioctl proxy-status
command. See the Istio Debugging Envoy and Istiod documentation section for more information about the proxy-status
subcommand.
$ kubectl get pods\nNAME READY STATUS RESTARTS AGE\nvirt-launcher-vmi-istio-ncx7r 3/3 Running 0 7s\n\n$ kubectl get pods virt-launcher-vmi-istio-ncx7r -o jsonpath='{.spec.containers[*].name}'\ncompute volumecontainerdisk istio-proxy\n\n$ istioctl proxy-status\nNAME CDS LDS EDS RDS ISTIOD VERSION\n...\nvirt-launcher-vmi-istio-ncx7r.default SYNCED SYNCED SYNCED SYNCED istiod-7c4d8c7757-hshj5 1.10.0\n
"},{"location":"network/istio_service_mesh/#troubleshooting","title":"Troubleshooting","text":""},{"location":"network/istio_service_mesh/#istio-sidecar-is-not-deployed","title":"Istio sidecar is not deployed","text":"$ kubectl get pods\nNAME READY STATUS RESTARTS AGE\nvirt-launcher-vmi-istio-jnw6p 2/2 Running 0 37s\n\n$ kubectl get pods virt-launcher-vmi-istio-jnw6p -o jsonpath='{.spec.containers[*].name}'\ncompute volumecontainerdisk\n
Resolution: Make sure the istio-injection=enabled
label is added to the target namespace. If the issue persists, consult the relevant part of the Istio documentation.
$ kubectl get pods\nNAME READY STATUS RESTARTS AGE\nvirt-launcher-vmi-istio-lg5gp 2/3 Running 0 90s\n\n$ kubectl describe pod virt-launcher-vmi-istio-lg5gp\n ...\n Warning Unhealthy 2d8h (x3 over 2d8h) kubelet Readiness probe failed: Get \"http://10.244.186.222:15021/healthz/ready\": dial tcp 10.244.186.222:15021: connect: no route to host\n Warning Unhealthy 2d8h (x4 over 2d8h) kubelet Readiness probe failed: Get \"http://10.244.186.222:15021/healthz/ready\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)\n
Resolution: Make sure the sidecar.istio.io/inject: \"true\"
annotation is defined in the created VMI and that the masquerade or passt binding is used for the pod network interface.
$ kubectl get pods\nNAME READY STATUS RESTARTS AGE\nvirt-launcher-vmi-istio-44mws 0/3 Init:0/3 0 29s\n\n$ kubectl describe pod virt-launcher-vmi-istio-44mws\n ...\n Multus: [default/virt-launcher-vmi-istio-44mws]: error loading k8s delegates k8s args: TryLoadPodDelegates: error in getting k8s network for pod: GetNetworkDelegates: failed getting the delegate: getKubernetesDelegate: cannot find a network-attachment-definition (istio-cni) in namespace (default): network-attachment-definitions.k8s.cni.cncf.io \"istio-cni\" not found\n
Resolution: Make sure the istio-cni
NetworkAttachmentDefinition (provided in the Prerequisites section) is created in the target namespace.
[v1.1.0, Alpha feature]
A modular plugin which integrates with Kubevirt to implement a network binding.
"},{"location":"network/network_binding_plugins/#overview","title":"Overview","text":""},{"location":"network/network_binding_plugins/#network-connectivity","title":"Network Connectivity","text":"In order for a VM to have access to external network(s), several layers need to be defined and configured, depending on the connectivity characteristics needs.
These layers include:
This guide focuses on the Network Binding portion.
"},{"location":"network/network_binding_plugins/#network-binding","title":"Network Binding","text":"The network binding defines how the domain (VM) network interface is wired in the VM pod through the domain to the guest.
The network binding includes:
The network bindings have been part of Kubevirt core API and codebase. With the increase of the number of network bindings added and frequent requests to tweak and change the existing network bindings, a decision has been made to create a network binding plugin infrastructure.
The plugin infrastructure provides means to compose a network binding plugin and integrate it into Kubevirt in a modular manner.
Kubevirt is providing several network binding plugins as references. The following plugins are available:
A network binding plugin configuration consists of the following steps:
Deploy network binding optional components:
Binding CNI plugin.
Enable NetworkBindingPlugins
Feature Gate (FG).
Register network binding.
Depending on the plugin, some components need to be deployed in the cluster. Not all network binding plugins require all these components, therefore these steps are optional.
This binary needs to be deployed on each node of the cluster, like any other CNI plugin.
The binary can be built from source or consumed from an existing artifact.
Note: The location of the CNI plugins binaries depends on the platform used and its configuration. A frequently used path for such binaries is /opt/cni/bin/
.
Example:
apiVersion: \"k8s.cni.cncf.io/v1\"\nkind: NetworkAttachmentDefinition\nmetadata:\n name: netbindingpasst\nspec:\n config: '{\n \"cniVersion\": \"1.0.0\",\n \"name\": \"netbindingpasst\",\n \"plugins\": [\n {\n \"type\": \"cni-passt-binding-plugin\"\n }\n ]\n }'\n
Note: It is possible to deploy the NetworkAttachmentDefinition in the default
namespace, where all other namespaces can access it. Nevertheless, it is recommended (for security reasons) to define the NetworkAttachmentDefinition in the same namespace where the VM resides.
Multus: In order for the network binding CNI and the NetworkAttachmentDefinition to operate, Multus must be deployed on the cluster. For more information, check the Quickstart Installation Guide.
Sidecar image: When a core domain-attachment is not a fit, a sidecar is used to configure the vNIC domain configuration. In more complex scenarios, the sidecar also runs services like DHCP to deliver IP information to the guest.
The sidecar image is built and usually pushed to an image registry for consumption. Therefore, the cluster needs to have access to the image.
The image can be built from source and pushed to an accessible registry or used from a given registry that already contains it.
NetworkBindingPlugins
. It is therefore necessary to set the FG in the Kubevirt CR.
Example (valid when the FG subtree is already defined):
kubectl patch kubevirts -n kubevirt kubevirt --type=json -p='[{\"op\": \"add\", \"path\": \"/spec/configuration/developerConfiguration/featureGates/-\", \"value\": \"NetworkBindingPlugins\"}]'\n
"},{"location":"network/network_binding_plugins/#register","title":"Register","text":"In order to use a network binding plugin, the cluster admin needs to register the binding. Registration includes the addition of the binding name with all its parameters to the Kubevirt CR.
The following (optional) parameters are currently supported:
From: v1.1.0
Use the format namespace/name (e.g. default/netbindingpasst) to specify the NetworkAttachmentDefinition that defines the CNI plugin and the configuration the binding plugin uses. Used when the binding plugin needs to change the pod network namespace."},{"location":"network/network_binding_plugins/#sidecarimage","title":"sidecarImage","text":"
From: v1.1.0
Specify a container image in a registry. Used when the binding plugin needs to modify the domain vNIC configuration or when a service needs to be executed (e.g. DHCP server).
"},{"location":"network/network_binding_plugins/#domainattachmenttype","title":"domainAttachmentType","text":"From: v1.1.1
The Domain Attachment type is a pre-defined core kubevirt method to attach an interface to the domain.
Specify the name of a core domain attachment type. A possible alternative to a sidecar, to configure the domain vNIC.
Supported types:
tap
(from v1.1.1): The domain configuration is set to use an existing tap device. It also supports existing macvtap
devices. When both the domainAttachmentType
and sidecarImage
are specified, the domain will first be configured according to the domainAttachmentType
and then the sidecarImage
may modify it.
From: v1.2.0
Specify whether the network binding plugin supports migration. It is possible to specify a migration method. Supported migration method types: - link-refresh
(from v1.2.0): after migration, the guest NIC is deactivated and then activated again. This can be useful for renewing the DHCP lease.
Note: In some deployments the Kubevirt CR is controlled by an external controller (e.g. HCO). In such cases, make sure to configure the wrapper operator/controller so the changes will get preserved.
Example (the passt
binding):
kubectl patch kubevirts -n kubevirt kubevirt --type=json -p='[{\"op\": \"add\", \"path\": \"/spec/configuration/network\", \"value\": {\n \"binding\": {\n \"passt\": {\n \"networkAttachmentDefinition\": \"default/netbindingpasst\",\n \"sidecarImage\": \"quay.io/kubevirt/network-passt-binding:20231205_29a16d5c9\",\n \"migration\": {\n \"method\": \"link-refresh\"\n }\n }\n }\n }}]'\n
"},{"location":"network/network_binding_plugins/#vm-network-interface","title":"VM Network Interface","text":"When configuring the VM/VMI network interface, the binding plugin name can be specified. If it exists in the Kubevirt CR, it will be used to setup the network interface.
Example (passt
binding):
---\napiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n labels:\n kubevirt.io/vm: vm-net-binding-passt\n name: vm-net-binding-passt\nspec:\n running: true\n template:\n metadata:\n labels:\n kubevirt.io/vm: vm-net-binding-passt\n spec:\n domain:\n devices:\n disks:\n - disk:\n bus: virtio\n name: containerdisk\n - disk:\n bus: virtio\n name: cloudinitdisk\n interfaces:\n - name: passtnet\n binding:\n name: passt\n rng: {}\n resources:\n requests:\n memory: 1024M\n networks:\n - name: passtnet\n pod: {}\n terminationGracePeriodSeconds: 0\n volumes:\n - containerDisk:\n image: quay.io/kubevirt/fedora-with-test-tooling-container-disk:v1.1.0\n name: containerdisk\n - cloudInitNoCloud:\n networkData: |\n version: 2\n ethernets:\n eth0:\n dhcp4: true\n name: cloudinitdisk\n
"},{"location":"network/networkpolicy/","title":"NetworkPolicy","text":"Before creating NetworkPolicy objects, make sure you are using a networking solution which supports NetworkPolicy. Network isolation is controlled entirely by NetworkPolicy objects. By default, all vmis in a namespace are accessible from other vmis and network endpoints. To isolate one or more vmis in a project, you can create NetworkPolicy objects in that namespace to indicate the allowed incoming connections.
Note: vmis and pods are treated equally by network policies, since labels are passed through to the pods which contain the running vmi. In other words, labels on vmis can be matched by spec.podSelector
on the policy.
To make a project \"deny by default\" add a NetworkPolicy object that matches all vmis but accepts no traffic.
kind: NetworkPolicy\napiVersion: networking.k8s.io/v1\nmetadata:\n name: deny-by-default\nspec:\n podSelector: {}\n ingress: []\n
"},{"location":"network/networkpolicy/#create-networkpolicy-to-only-accept-connections-from-vmis-within-namespaces","title":"Create NetworkPolicy to only Accept connections from vmis within namespaces","text":"To make vmis accept connections from other vmis in the same namespace, but reject all other connections from vmis in other namespaces:
kind: NetworkPolicy\napiVersion: networking.k8s.io/v1\nmetadata:\n name: allow-same-namespace\nspec:\n podSelector: {}\n ingress:\n - from:\n - podSelector: {}\n
"},{"location":"network/networkpolicy/#create-networkpolicy-to-only-allow-http-and-https-traffic","title":"Create NetworkPolicy to only allow HTTP and HTTPS traffic","text":"To enable only HTTP and HTTPS access to the vmis, add a NetworkPolicy object similar to:
kind: NetworkPolicy\napiVersion: networking.k8s.io/v1\nmetadata:\n name: allow-http-https\nspec:\n podSelector: {}\n ingress:\n - ports:\n - protocol: TCP\n port: 8080\n - protocol: TCP\n port: 8443\n
"},{"location":"network/networkpolicy/#create-networkpolicy-to-deny-traffic-by-labels","title":"Create NetworkPolicy to deny traffic by labels","text":"To make one specific vmi with a label type: test
reject all traffic from other vmis, create:
kind: NetworkPolicy\napiVersion: networking.k8s.io/v1\nmetadata:\n name: deny-by-label\nspec:\n podSelector:\n matchLabels:\n type: test\n ingress: []\n
Kubernetes NetworkPolicy Documentation can be found here: Kubernetes NetworkPolicy
"},{"location":"network/service_objects/","title":"Service objects","text":"Once the VirtualMachineInstance is started, in order to connect to a VirtualMachineInstance, you can create a Service
object for a VirtualMachineInstance. Currently, three types of service are supported: ClusterIP
, NodePort
and LoadBalancer
. The default type is ClusterIP
.
Note: Labels on a VirtualMachineInstance are passed through to the pod, so simply add your labels for service creation to the VirtualMachineInstance. From there on it works like exposing any other k8s resource, by referencing these labels in a service.
"},{"location":"network/service_objects/#expose-virtualmachineinstance-as-a-clusterip-service","title":"Expose VirtualMachineInstance as a ClusterIP Service","text":"Give a VirtualMachineInstance with the label special: key
:
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n name: vmi-ephemeral\n labels:\n special: key\nspec:\n domain:\n devices:\n disks:\n - disk:\n bus: virtio\n name: containerdisk\n resources:\n requests:\n memory: 64M\n volumes:\n - name: containerdisk\n containerDisk:\n image: kubevirt/cirros-registry-disk-demo:latest\n
we can expose its SSH port (22) by creating a ClusterIP
service:
apiVersion: v1\nkind: Service\nmetadata:\n name: vmiservice\nspec:\n ports:\n - port: 27017\n protocol: TCP\n targetPort: 22\n selector:\n special: key\n type: ClusterIP\n
You just need to create this ClusterIP
service by using kubectl
:
$ kubectl create -f vmiservice.yaml\n
Alternatively, the VirtualMachineInstance could be exposed using the virtctl
command:
$ virtctl expose virtualmachineinstance vmi-ephemeral --name vmiservice --port 27017 --target-port 22\n
Notes: * If --target-port
is not set, it will take the same value as --port
* The cluster IP is usually allocated automatically, but it may also be forced into a value using the --cluster-ip
flag (assuming value is in the valid range and not taken)
Query the service object:
$ kubectl get service\nNAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE\nvmiservice ClusterIP 172.30.3.149 <none> 27017/TCP 2m\n
You can connect to the VirtualMachineInstance by service IP and service port inside the cluster network:
$ ssh cirros@172.30.3.149 -p 27017\n
"},{"location":"network/service_objects/#expose-virtualmachineinstance-as-a-nodeport-service","title":"Expose VirtualMachineInstance as a NodePort Service","text":"Expose the SSH port (22) of a VirtualMachineInstance running on KubeVirt by creating a NodePort
service:
apiVersion: v1\nkind: Service\nmetadata:\n name: nodeport\nspec:\n externalTrafficPolicy: Cluster\n ports:\n - name: nodeport\n nodePort: 30000\n port: 27017\n protocol: TCP\n targetPort: 22\n selector:\n special: key\n type: NodePort\n
You just need to create this NodePort
service by using kubectl
:
$ kubectl create -f nodeport.yaml\n
Alternatively, the VirtualMachineInstance could be exposed using the virtctl
command:
$ virtctl expose virtualmachineinstance vmi-ephemeral --name nodeport --type NodePort --port 27017 --target-port 22 --node-port 30000\n
Notes: * If --node-port
is not set, its value will be allocated dynamically (in the range above 30000) * If the --node-port
value is set, it must be unique across all services
The service can be listed by querying for the service objects:
$ kubectl get service\nNAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE\nnodeport NodePort 172.30.232.73 <none> 27017:30000/TCP 5m\n
Connect to the VirtualMachineInstance by using a node IP and node port outside the cluster network:
$ ssh cirros@$NODE_IP -p 30000\n
"},{"location":"network/service_objects/#expose-virtualmachineinstance-as-a-loadbalancer-service","title":"Expose VirtualMachineInstance as a LoadBalancer Service","text":"Expose the RDP port (3389) of a VirtualMachineInstance running on KubeVirt by creating LoadBalancer
service. Here is an example:
apiVersion: v1\nkind: Service\nmetadata:\n name: lbsvc\nspec:\n externalTrafficPolicy: Cluster\n ports:\n - port: 27017\n protocol: TCP\n targetPort: 3389\n selector:\n special: key\n type: LoadBalancer\n
You could create this LoadBalancer
service by using kubectl
:
$ kubectl create -f lbsvc.yaml\n
Alternatively, the VirtualMachineInstance could be exposed using the virtctl
command:
$ virtctl expose virtualmachineinstance vmi-ephemeral --name lbsvc --type LoadBalancer --port 27017 --target-port 3389\n
Note that the external IP of the service could be forced to a value using the --external-ip
flag (no validation is performed on this value).
The service can be listed by querying for the service objects:
$ kubectl get svc\nNAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE\nlbsvc LoadBalancer 172.30.27.5 172.29.10.235,172.29.10.235 27017:31829/TCP 5s\n
Use vinagre
client to connect to your VirtualMachineInstance by using the public IP and port.
Note that the external port here (31829) was dynamically allocated.
"},{"location":"network/net_binding_plugins/macvtap/","title":"Macvtap binding","text":""},{"location":"network/net_binding_plugins/macvtap/#overview","title":"Overview","text":"With the macvtap
binding plugin, virtual machines are directly exposed to the L2 network of the Kubernetes nodes. This is achieved by 'extending' an existing network interface with a virtual device that has its own MAC address.
Its main benefits are:
Warning: On KinD clusters, the user needs to adjust the cluster configuration, mounting dev
of the running host onto the KinD nodes, because of a known issue.
The macvtap
solution consists of a CNI and a DP.
In order to use macvtap
, the following points need to be covered:
To simplify the procedure, use the Cluster Network Addons Operator to deploy and configure the macvtap components in your cluster.
The aforementioned operator effectively deploys the macvtap cni and device plugin.
"},{"location":"network/net_binding_plugins/macvtap/#expose-node-interface-to-the-macvtap-device-plugin","title":"Expose node interface to the macvtap device plugin","text":"There are two different alternatives to configure which host interfaces get exposed to the user, enabling them to create macvtap interfaces on top of:
Both options are configured via the macvtap-deviceplugin-config
ConfigMap, and more information on how to configure it can be found in the macvtap-cni repo.
This is a minimal example, in which the eth0
interface of the Kubernetes nodes is exposed, via the lowerDevice
attribute.
kind: ConfigMap\napiVersion: v1\nmetadata:\n name: macvtap-deviceplugin-config\ndata:\n DP_MACVTAP_CONF: |\n [\n {\n \"name\" : \"dataplane\",\n \"lowerDevice\" : \"eth0\",\n \"mode\" : \"bridge\",\n \"capacity\" : 50\n }\n ]\n
This step can be omitted, since the default configuration of the aforementioned ConfigMap
is to expose all host interfaces (which is represented by the following configuration):
kind: ConfigMap\napiVersion: v1\nmetadata:\n name: macvtap-deviceplugin-config\ndata:\n DP_MACVTAP_CONF: '[]'\n
"},{"location":"network/net_binding_plugins/macvtap/#macvtap-networkattachmentdefinition","title":"Macvtap NetworkAttachmentDefinition","text":"The configuration needed for a macvtap network attachment can be minimalistic:
kind: NetworkAttachmentDefinition\napiVersion: k8s.cni.cncf.io/v1\nmetadata:\n name: macvtapnetwork\n annotations:\n k8s.v1.cni.cncf.io/resourceName: macvtap.network.kubevirt.io/eth0\nspec:\n config: '{\n \"cniVersion\": \"0.3.1\",\n \"name\": \"macvtapnetwork\",\n \"type\": \"macvtap\",\n \"mtu\": 1500\n }'\n
The object should be created either in the \"default\" namespace, which all other namespaces can access, or in the same namespace the VMs reside in.
The requested k8s.v1.cni.cncf.io/resourceName
annotation must point to an exposed host interface (via the lowerDevice
attribute, on the macvtap-deviceplugin-config
ConfigMap
).
[v1.1.1]
The binding plugin replaces the experimental core macvtap binding implementation (including its API).
Note: The network binding plugin infrastructure and the macvtap plugin specifically are in Alpha stage. Please use them with care, preferably on a non-production deployment.
The macvtap binding plugin consists of the following components:
The plugin needs to:
And in detail:
"},{"location":"network/net_binding_plugins/macvtap/#feature-gate","title":"Feature Gate","text":"If not already set, add the NetworkBindingPlugins
FG.
kubectl patch kubevirts -n kubevirt kubevirt --type=json -p='[{\n \"op\": \"add\",\n \"path\": \"/spec/configuration/developerConfiguration/featureGates/-\",\n \"value\": \"NetworkBindingPlugins\"\n}]'\n
Note: The specific macvtap plugin has no FG of its own. It is up to the cluster admin to decide if the plugin is to be available in the cluster. The macvtap binding is still in evaluation; use it with care.
"},{"location":"network/net_binding_plugins/macvtap/#macvtap-registration","title":"Macvtap Registration","text":"The macvtap binding plugin configuration needs to be added to the kubevirt CR in order to be used by VMs.
To register the macvtap binding, patch the kubevirt CR as follows:
kubectl patch kubevirts -n kubevirt kubevirt --type=json -p='[{\"op\": \"add\", \"path\": \"/spec/configuration/network\", \"value\": {\n \"binding\": {\n \"macvtap\": {\n \"domainAttachmentType\": \"tap\"\n }\n }\n }}]'\n
"},{"location":"network/net_binding_plugins/macvtap/#vm-macvtap-network-interface","title":"VM Macvtap Network Interface","text":"Set the VM network interface binding name to reference the one defined in the kubevirt CR.
---\napiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n labels:\n kubevirt.io/vm: vm-net-binding-macvtap\n name: vm-net-binding-macvtap\nspec:\n running: true\n template:\n metadata:\n labels:\n kubevirt.io/vm: vm-net-binding-macvtap\n spec:\n domain:\n devices:\n disks:\n - disk:\n bus: virtio\n name: containerdisk\n - disk:\n bus: virtio\n name: cloudinitdisk\n interfaces:\n - name: podnet\n masquerade: {}\n - name: hostnetwork\n binding:\n name: macvtap\n rng: {}\n resources:\n requests:\n memory: 1024M\n networks:\n - name: podnet\n pod: {}\n - name: hostnetwork\n multus:\n networkName: macvtapnetwork\n terminationGracePeriodSeconds: 0\n volumes:\n - containerDisk:\n image: quay.io/kubevirt/fedora-with-test-tooling-container-disk:v1.1.0\n name: containerdisk\n - cloudInitNoCloud:\n networkData: |\n version: 2\n ethernets:\n eth0:\n dhcp4: true\n name: cloudinitdisk\n
The multus networkName
value should correspond with the name used in the network attachment definition section.
The binding
value should correspond with the name used in the registration.
Plug A Simple Socket Transport is an enhanced alternative to SLIRP, providing user-space network connectivity.
passt
is a universal tool which implements a translation layer between a Layer-2 network interface and native Layer-4 sockets (TCP, UDP, ICMP/ICMPv6 echo) on a host.
Its main benefits are:
sysctl -w net.core.rmem_max=33554432\nsysctl -w net.core.wmem_max=33554432\n
fs.file-max
should be increased (for a VM that forwards all IPv4 and IPv6 ports, for TCP and UDP, passt needs to create ~2^18 sockets): sysctl -w fs.file-max=9223372036854775807\n
NOTE: To achieve optimal memory consumption with Passt binding, specify ports required for your workload. When no ports are explicitly specified, all ports are forwarded, leading to memory overhead of up to 800 Mi.
"},{"location":"network/net_binding_plugins/passt/#passt-network-binding-plugin","title":"Passt network binding plugin","text":"[v1.1.0]
The binding plugin replaces the experimental core passt binding implementation (including its API).
Note: The network binding plugin infrastructure and the passt plugin specifically are in Alpha stage. Please use them with care, preferably on a non-production deployment.
The passt binding plugin consists of the following components:
As described in the definition & flow section, the passt plugin needs to:
And in detail:
"},{"location":"network/net_binding_plugins/passt/#passt-cni-deployment-on-nodes","title":"Passt CNI deployment on nodes","text":"The CNI plugin binary can be retrieved directly from the kubevirt release assets (on GitHub) or built from its sources.
Note: The kubevirt project uses Bazel to build the binaries and container images. For more information on how to build the whole project, visit the developer getting started guide.
Once the binary is ready, you may rename it to a meaningful name (e.g. kubevirt-passt-binding
). This name is used in the NetworkAttachmentDefinition configuration.
Copy the binary to each node in your cluster. The location of the CNI plugins may vary between platforms and versions. One common path is /opt/cni/bin/
.
The configuration needed for passt is minimalistic:
apiVersion: \"k8s.cni.cncf.io/v1\"\nkind: NetworkAttachmentDefinition\nmetadata:\n name: netbindingpasst\nspec:\n config: '{\n \"cniVersion\": \"1.0.0\",\n \"name\": \"netbindingpasst\",\n \"plugins\": [\n {\n \"type\": \"kubevirt-passt-binding\"\n }\n ]\n }'\n
The object should be created either in the \"default\" namespace, which all other namespaces can access, or in the same namespace in which the VMs reside.
"},{"location":"network/net_binding_plugins/passt/#passt-sidecar-image","title":"Passt sidecar image","text":"Passt sidecar image is built and pushed to kubevirt quay repository.
The sidecar sources can be found here.
The relevant sidecar image needs to be accessible by the cluster and specified in the Kubevirt CR when registering the network binding plugin.
"},{"location":"network/net_binding_plugins/passt/#feature-gate","title":"Feature Gate","text":"If not already set, add the NetworkBindingPlugins
feature gate (FG).
kubectl patch kubevirts -n kubevirt kubevirt --type=json -p='[{\"op\": \"add\", \"path\": \"/spec/configuration/developerConfiguration/featureGates/-\", \"value\": \"NetworkBindingPlugins\"}]'\n
Note: The passt plugin has no FG of its own. It is up to the cluster admin to decide whether the plugin is available in the cluster. The passt binding is still under evaluation; use it with care.
"},{"location":"network/net_binding_plugins/passt/#passt-registration","title":"Passt Registration","text":"As described in the registration section, passt binding plugin configuration needs to be added to the kubevirt CR.
To register the passt binding, patch the kubevirt CR as follows:
kubectl patch kubevirts -n kubevirt kubevirt --type=json -p='[{\"op\": \"add\", \"path\": \"/spec/configuration/network\", \"value\": {\n \"binding\": {\n \"passt\": {\n \"networkAttachmentDefinition\": \"default/netbindingpasst\",\n \"sidecarImage\": \"quay.io/kubevirt/network-passt-binding:20231205_29a16d5c9\",\n \"migration\": {\n \"method\": \"link-refresh\"\n }\n }\n }\n }}]'\n
The NetworkAttachmentDefinition and sidecarImage values should correspond with the names used in the previous sections, here and here.
"},{"location":"network/net_binding_plugins/passt/#vm-passt-network-interface","title":"VM Passt Network Interface","text":"Set the VM network interface binding name to reference the one defined in the kubevirt CR.
---\napiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n labels:\n kubevirt.io/vm: vm-net-binding-passt\n name: vm-net-binding-passt\nspec:\n running: true\n template:\n metadata:\n labels:\n kubevirt.io/vm: vm-net-binding-passt\n spec:\n domain:\n devices:\n disks:\n - disk:\n bus: virtio\n name: containerdisk\n - disk:\n bus: virtio\n name: cloudinitdisk\n interfaces:\n - name: passtnet\n binding:\n name: passt\n ports:\n - name: http\n port: 80\n protocol: TCP\n rng: {}\n resources:\n requests:\n memory: 1024M\n networks:\n - name: passtnet\n pod: {}\n terminationGracePeriodSeconds: 0\n volumes:\n - containerDisk:\n image: quay.io/kubevirt/fedora-with-test-tooling-container-disk:v1.1.0\n name: containerdisk\n - cloudInitNoCloud:\n networkData: |\n version: 2\n ethernets:\n eth0:\n dhcp4: true\n name: cloudinitdisk\n
"},{"location":"network/net_binding_plugins/slirp/","title":"Slirp","text":""},{"location":"network/net_binding_plugins/slirp/#overview","title":"Overview","text":"SLIRP provides user-space network connectivity.
Note: in slirp mode, the only supported protocols are TCP and UDP. ICMP is not supported.
[v1.1.0]
The binding plugin replaces the core slirp binding API.
Note: The network binding plugin infrastructure is in Alpha stage. Please use it with care.
The slirp binding plugin consists of the following components:
As described in the definition & flow section, the slirp plugin needs to:
Note: For the core slirp binding to use the network binding plugin, the registered name for this binding must be slirp.
If not already set, add the NetworkBindingPlugins feature gate (FG).
kubectl patch kubevirts -n kubevirt kubevirt --type=json -p='[{\"op\": \"add\", \"path\": \"/spec/configuration/developerConfiguration/featureGates/-\", \"value\": \"NetworkBindingPlugins\"}]'\n
Note: The slirp plugin has no FG of its own. It is up to the cluster admin to decide whether the plugin is available in the cluster.
"},{"location":"network/net_binding_plugins/slirp/#slirp-registration","title":"Slirp Registration","text":"As described in the registration section, slirp binding plugin configuration needs to be added to the kubevirt CR.
To register the slirp binding, patch the kubevirt CR as follows:
kubectl patch kubevirts -n kubevirt kubevirt --type=json -p='[{\"op\": \"add\", \"path\": \"/spec/configuration/network\", \"value\": {\n \"binding\": {\n \"slirp\": {\n \"sidecarImage\": \"quay.io/kubevirt/network-slirp-binding:v1.1.0\"\n }\n }\n }}]'\n
"},{"location":"network/net_binding_plugins/slirp/#vm-slirp-network-interface","title":"VM Slirp Network Interface","text":"Set the VM network interface binding name to reference the one defined in the kubevirt CR.
---\napiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n labels:\n kubevirt.io/vm: vm-net-binding-slirp\n name: vm-net-binding-slirp\nspec:\n running: true\n template:\n metadata:\n labels:\n kubevirt.io/vm: vm-net-binding-slirp\n spec:\n domain:\n devices:\n disks:\n - disk:\n bus: virtio\n name: containerdisk\n - disk:\n bus: virtio\n name: cloudinitdisk\n interfaces:\n - name: slirpnet\n binding:\n name: slirp\n rng: {}\n resources:\n requests:\n memory: 1024M\n networks:\n - name: slirpnet\n pod: {}\n terminationGracePeriodSeconds: 0\n volumes:\n - containerDisk:\n image: quay.io/kubevirt/fedora-with-test-tooling-container-disk:v1.1.0\n name: containerdisk\n - cloudInitNoCloud:\n networkData: |\n version: 2\n ethernets:\n eth0:\n dhcp4: true\n name: cloudinitdisk\n
"},{"location":"storage/clone_api/","title":"Clone API","text":"The clone.kubevirt.io
API Group defines resources for cloning KubeVirt objects. Currently, the only supported cloning type is VirtualMachine, but more types are planned to be supported in the future (see future roadmap below).
Please bear in mind that the clone API is in version v1alpha1. This means that this API is not fully stable yet and may change in the future.
Under the hood, the clone API relies on the Snapshot & Restore APIs. Therefore, in order to use the clone API, please see the Snapshot & Restore prerequisites.
"},{"location":"storage/clone_api/#snapshot-feature-gate","title":"Snapshot Feature Gate","text":"Currently, clone API is guarded by Snapshot feature gate. The feature gates field in the KubeVirt CR must be expanded by adding the Snapshot
to it.
As noted above, the clone API relies on the Snapshot & Restore APIs under the hood, so the Snapshot & Restore user-guide page may be a helpful source of further information.
"},{"location":"storage/clone_api/#virtualmachineclone-object-overview","title":"VirtualMachineClone object overview","text":"In order to initiate cloning, a VirtualMachineClone
object (CRD) needs to be created on the cluster. An example for such an object is:
kind: VirtualMachineClone\napiVersion: \"clone.kubevirt.io/v1alpha1\"\nmetadata:\n name: testclone\n\nspec:\n # source & target definitions\n source:\n apiGroup: kubevirt.io\n kind: VirtualMachine\n name: vm-cirros\n target:\n apiGroup: kubevirt.io\n kind: VirtualMachine\n name: vm-clone-target\n\n # labels & annotations definitions\n labelFilters:\n - \"*\"\n - \"!someKey/*\"\n annotationFilters:\n - \"anotherKey/*\"\n\n # template labels & annotations definitions\n template:\n labelFilters:\n - \"*\"\n - \"!someKey/*\"\n annotationFilters:\n - \"anotherKey/*\"\n\n # other identity stripping specs:\n newMacAddresses:\n interfaceName: \"00-11-22\"\n newSMBiosSerial: \"new-serial\"\n
The following sections describe each of these settings.
"},{"location":"storage/clone_api/#source-target","title":"Source & Target","text":"The source and target indicate the source/target API group, kind and name. A few important notes:
Currently, the only supported kinds are VirtualMachine (of the kubevirt.io API group) and VirtualMachineSnapshot (of the snapshot.kubevirt.io API group), but more types are expected to be supported in the future. See \"future roadmap\" below for more info.
The target name is optional. If unspecified, the clone controller will generate a name for the target automatically.
The target and source must reside in the same namespace.
These spec fields determine which labels / annotations are copied to the target and which are stripped away.
The filters are a list of strings. Each string represents a key that may exist at the source. Every source key that matches one of these values is copied to the cloned target. In addition, special regular-expression-like characters can be used:
Setting label / annotation filters is optional. If unset, all labels / annotations are copied by default.
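The filter behaviour can be illustrated with a short sketch. It assumes the semantics suggested by the example manifest above: a trailing * acts as a prefix wildcard, a leading ! negates a filter, and a negated match always wins. These rules and the helper names (matches, key_is_copied, filter_keys) are assumptions for illustration only and are not taken from the clone controller.

```python
def matches(pattern, key):
    # Assumed rule: a trailing '*' is a prefix wildcard; otherwise exact match.
    if pattern.endswith('*'):
        return key.startswith(pattern[:-1])
    return key == pattern


def key_is_copied(key, filters):
    # Assumed rule: copy a key if some positive filter matches it
    # and no negated ('!'-prefixed) filter matches it.
    positives = [f for f in filters if not f.startswith('!')]
    negatives = [f[1:] for f in filters if f.startswith('!')]
    if any(matches(n, key) for n in negatives):
        return False
    return any(matches(p, key) for p in positives)


def filter_keys(source, filters):
    # Unset filters mean that everything is copied.
    if not filters:
        return dict(source)
    return {k: v for k, v in source.items() if key_is_copied(k, filters)}


labels = {'app': 'web', 'someKey/owner': 'team-a', 'tier': 'db'}
print(filter_keys(labels, ['*', '!someKey/*']))  # {'app': 'web', 'tier': 'db'}
```

With the annotationFilters value from the manifest, only keys under anotherKey/ would survive, since no catch-all * filter is present.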
"},{"location":"storage/clone_api/#template-label-template-annotation-filters","title":"Template Label & Template Annotation filters","text":"Some network CNIs such as Kube-OVN or OVN-Kubernetes inject network information into the annotations of a VM. When cloning a VM from a target VM the cloned VM will use the same network. To avoid this you can use template labels and annotation filters.
"},{"location":"storage/clone_api/#newmacaddresses","title":"newMacAddresses","text":"This field is used to explicitly replace MAC addresses for certain interfaces. The field is a string to string map; the keys represent interface names and the values represent the new MAC address for the clone target.
This field is optional. By default, all MAC addresses are stripped out. This suits situations where kube-mac-pool is deployed in the cluster, since it automatically assigns the target a fresh, valid MAC address.
"},{"location":"storage/clone_api/#newsmbiosserial","title":"newSMBiosSerial","text":"This field is used to explicitly set an SMBios serial for the target.
This field is optional. By default, the target would have an auto-generated serial that's based on the VM name.
"},{"location":"storage/clone_api/#creating-a-virtualmachineclone-object","title":"Creating a VirtualMachineClone object","text":"After the clone manifest is ready, we can create it:
kubectl create -f clone.yaml\n
To wait for a clone to complete, execute:
kubectl wait vmclone testclone --for condition=Ready\n
You can check the clone's phase in the clone's status. It can be one of the following:
SnapshotInProgress
CreatingTargetVM
RestoreInProgress
Succeeded
Failed
Unknown
After the clone is finished, the target can be inspected:
kubectl get vm vm-clone-target -o yaml\n
"},{"location":"storage/clone_api/#future-roadmap","title":"Future roadmap","text":"The clone API is in an early alpha version and may change dramatically. There are many improvements and features that are expected to be added, the most significant goals are:
VirtualMachineInstance in the future.
One of the great things that could be accomplished with the clone API when the source is of kind VirtualMachineSnapshot is to create \"golden VM images\" (a.k.a. Templates / Bookmark VMs / etc). In other words, the following workflow would be available:
Create a golden image
Create a VM
Prepare a \"golden VM\" environment
This can mean different things in different contexts. For example, write files, install applications, apply configurations, or anything else.
Snapshot the VM
Delete the VM
Then, this \"golden image\" can be duplicated as many times as needed. To instantiate a VM from the snapshot:
This feature is still under discussion and may be implemented differently than explained here.
"},{"location":"storage/containerized_data_importer/","title":"Containerized Data Importer","text":"The Containerized Data Importer (CDI) project provides facilities for enabling Persistent Volume Claims (PVCs) to be used as disks for KubeVirt VMs by way of DataVolumes. The three main CDI use cases are:
This document deals with the third use case. So you should have CDI installed in your cluster, a VM disk that you'd like to upload, and virtctl in your path.
"},{"location":"storage/containerized_data_importer/#install-cdi","title":"Install CDI","text":"Install the latest CDI release here
export TAG=$(curl -s -w %{redirect_url} https://github.com/kubevirt/containerized-data-importer/releases/latest)\nexport VERSION=$(echo ${TAG##*/})\nkubectl create -f https://github.com/kubevirt/containerized-data-importer/releases/download/$VERSION/cdi-operator.yaml\nkubectl create -f https://github.com/kubevirt/containerized-data-importer/releases/download/$VERSION/cdi-cr.yaml\n
"},{"location":"storage/containerized_data_importer/#expose-cdi-uploadproxy-service","title":"Expose cdi-uploadproxy service","text":"The cdi-uploadproxy
service must be accessible from outside the cluster. Here are some ways to do that:
NodePort Service
Ingress
Route
kubectl port-forward (not recommended for production clusters)
Look here for example manifests.
"},{"location":"storage/containerized_data_importer/#supported-image-formats","title":"Supported image formats","text":"CDI supports the raw
and qcow2 image formats, which are supported by qemu. See the qemu documentation for more details. Bootable ISO images can also be used and are treated like raw images. Images may be compressed with either the gz or xz format.
The example in this document uses this CirrOS image
"},{"location":"storage/containerized_data_importer/#virtctl-image-upload","title":"virtctl image-upload","text":"virtctl has an image-upload command with the following options:
virtctl image-upload --help\nUpload a VM image to a DataVolume/PersistentVolumeClaim.\n\nUsage:\n virtctl image-upload [flags]\n\nExamples:\n # Upload a local disk image to a newly created DataVolume:\n virtctl image-upload dv dv-name --size=10Gi --image-path=/images/fedora30.qcow2\n\n # Upload a local disk image to an existing DataVolume\n virtctl image-upload dv dv-name --no-create --image-path=/images/fedora30.qcow2\n\n # Upload a local disk image to an existing PersistentVolumeClaim\n virtctl image-upload pvc pvc-name --image-path=/images/fedora30.qcow2\n\n # Upload to a DataVolume with explicit URL to CDI Upload Proxy\n virtctl image-upload dv dv-name --uploadproxy-url=https://cdi-uploadproxy.mycluster.com --image-path=/images/fedora30.qcow2\n\nFlags:\n --access-mode string The access mode for the PVC. (default \"ReadWriteOnce\")\n --block-volume Create a PVC with VolumeMode=Block (default Filesystem).\n -h, --help help for image-upload\n --image-path string Path to the local VM image.\n --insecure Allow insecure server connections when using HTTPS.\n --no-create Don't attempt to create a new DataVolume/PVC.\n --pvc-name string DEPRECATED - The destination DataVolume/PVC name.\n --pvc-size string DEPRECATED - The size of the PVC to create (ex. 10Gi, 500Mi).\n --size string The size of the DataVolume to create (ex. 10Gi, 500Mi).\n --storage-class string The storage class for the PVC.\n --uploadproxy-url string The URL of the cdi-upload proxy service.\n --wait-secs uint Seconds to wait for upload pod to start. (default 60)\n\nUse \"virtctl options\" for a list of global command-line options (applies to all commands).\n
virtctl image-upload
works by creating a DataVolume of the requested size, sending an UploadTokenRequest to the cdi-apiserver, and uploading the file to the cdi-uploadproxy.
virtctl image-upload dv cirros-vm-disk --size=500Mi --image-path=/home/mhenriks/images/cirros-0.4.0-x86_64-disk.img --uploadproxy-url=<url to upload proxy service>\n
"},{"location":"storage/containerized_data_importer/#addressing-certificate-issues-when-uploading-images","title":"Addressing Certificate Issues when Uploading Images","text":"Issues with the certificates can be circumvented by using the --insecure
flag to prevent the virtctl command from verifying the remote host. It is better, however, to resolve the certificate issues that prevent uploading images with the virtctl image-upload command than to rely on the --insecure flag.
The following are some common issues with certificates and some easy ways to fix them.
"},{"location":"storage/containerized_data_importer/#does-not-contain-any-ip-sans","title":"Does not contain any IP SANs","text":"This issue happens when trying to upload images using an IP address instead of a resolvable name. For example, trying to upload to the IP address 192.168.39.32 at port 31001 would produce the following error.
virtctl image-upload dv f33 \\\n --size 5Gi \\\n --image-path Fedora-Cloud-Base-33-1.2.x86_64.raw.xz \\\n --uploadproxy-url https://192.168.39.32:31001\n\nPVC default/f33 not found \nDataVolume default/f33 created\nWaiting for PVC f33 upload pod to be ready...\nPod now ready\nUploading data to https://192.168.39.32:31001\n\n 0 B / 193.89 MiB [-------------------------------------------------------] 0.00% 0s\n\nPost https://192.168.39.32:31001/v1beta1/upload: x509: cannot validate certificate for 192.168.39.32 because it doesn't contain any IP SANs\n
It is easily fixed by adding an entry to your local name resolution service. This could be a DNS server or the local hosts file. The upload proxy URL should then be changed to use the resolvable name.
The Subject and the Subject Alternative Name in the certificate contain valid names that can be used for resolution. Only one of these names needs to be resolvable. Use the openssl command to view the names of the cdi-uploadproxy service.
echo | openssl s_client -showcerts -connect 192.168.39.32:31001 2>/dev/null \\\n | openssl x509 -inform pem -noout -text \\\n | sed -n -e '/Subject.*CN/p' -e '/Subject Alternative/{N;p}'\n\n Subject: CN = cdi-uploadproxy\n X509v3 Subject Alternative Name: \n DNS:cdi-uploadproxy, DNS:cdi-uploadproxy.cdi, DNS:cdi-uploadproxy.cdi.svc\n
Adding the following entry to the /etc/hosts file, if it provides name resolution, should fix this issue. Any service that provides name resolution for the system could be used.
echo \"192.168.39.32 cdi-uploadproxy\" >> /etc/hosts\n
The upload should now work.
virtctl image-upload dv f33 \\\n --size 5Gi \\\n --image-path Fedora-Cloud-Base-33-1.2.x86_64.raw.xz \\\n --uploadproxy-url https://cdi-uploadproxy:31001\n\nPVC default/f33 not found \nDataVolume default/f33 created\nWaiting for PVC f33 upload pod to be ready...\nPod now ready\nUploading data to https://cdi-uploadproxy:31001\n\n 193.89 MiB / 193.89 MiB [=============================================] 100.00% 1m38s\n\nUploading data completed successfully, waiting for processing to complete, you can hit ctrl-c without interrupting the progress\nProcessing completed successfully\nUploading Fedora-Cloud-Base-33-1.2.x86_64.raw.xz completed successfully\n
"},{"location":"storage/containerized_data_importer/#certificate-signed-by-unknown-authority","title":"Certificate Signed by Unknown Authority","text":"This happens because the cdi-uploadproxy certificate is self signed and the system does not trust the cdi-uploadproxy as a Certificate Authority.
virtctl image-upload dv f33 \\\n --size 5Gi \\\n --image-path Fedora-Cloud-Base-33-1.2.x86_64.raw.xz \\\n --uploadproxy-url https://cdi-uploadproxy:31001\n\nPVC default/f33 not found \nDataVolume default/f33 created\nWaiting for PVC f33 upload pod to be ready...\nPod now ready\nUploading data to https://cdi-uploadproxy:31001\n\n 0 B / 193.89 MiB [-------------------------------------------------------] 0.00% 0s\n\nPost https://cdi-uploadproxy:31001/v1beta1/upload: x509: certificate signed by unknown authority\n
This can be fixed by adding the certificate to the system's trust store. Download the cdi-uploadproxy-server-cert.
kubectl get secret -n cdi cdi-uploadproxy-server-cert \\\n -o jsonpath=\"{.data['tls\\.crt']}\" \\\n | base64 -d > cdi-uploadproxy-server-cert.crt\n
Add this certificate to the system's trust store. On Fedora, this can be done as follows.
sudo cp cdi-uploadproxy-server-cert.crt /etc/pki/ca-trust/source/anchors\n\nsudo update-ca-trust\n
The upload should now work.
virtctl image-upload dv f33 \\\n --size 5Gi \\\n --image-path Fedora-Cloud-Base-33-1.2.x86_64.raw.xz \\\n --uploadproxy-url https://cdi-uploadproxy:31001\n\nPVC default/f33 not found \nDataVolume default/f33 created\nWaiting for PVC f33 upload pod to be ready...\nPod now ready\nUploading data to https://cdi-uploadproxy:31001\n\n 193.89 MiB / 193.89 MiB [=============================================] 100.00% 1m36s\n\nUploading data completed successfully, waiting for processing to complete, you can hit ctrl-c without interrupting the progress\nProcessing completed successfully\nUploading Fedora-Cloud-Base-33-1.2.x86_64.raw.xz completed successfully\n
"},{"location":"storage/containerized_data_importer/#setting-the-url-of-the-cdi-upload-proxy-service","title":"Setting the URL of the cdi-upload Proxy Service","text":"Setting the URL for the cdi-upload proxy service allows the virtctl image-upload
command to upload the images without specifying the --uploadproxy-url flag. Permanently setting the URL is done by patching the CDI configuration.
The following will set the default upload proxy to use port 31001 of cdi-uploadproxy. An IP address could also be used instead of the DNS name.
See the section Addressing Certificate Issues when Uploading for why cdi-uploadproxy was chosen and issues that can be encountered when using an IP address.
kubectl patch cdi cdi \\\n --type merge \\\n --patch '{\"spec\":{\"config\":{\"uploadProxyURLOverride\":\"https://cdi-uploadproxy:31001\"}}}'\n
"},{"location":"storage/containerized_data_importer/#create-a-virtualmachineinstance","title":"Create a VirtualMachineInstance","text":"To create a VirtualMachineInstance
from a DataVolume, you can execute the following:
cat <<EOF | kubectl apply -f -\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n name: cirros-vm\nspec:\n domain:\n devices:\n disks:\n - disk:\n bus: virtio\n name: dvdisk\n machine:\n type: \"\"\n resources:\n requests:\n memory: 64M\n terminationGracePeriodSeconds: 0\n volumes:\n - name: dvdisk\n dataVolume:\n name: cirros-vm-disk\nstatus: {}\nEOF\n
"},{"location":"storage/containerized_data_importer/#connect-to-virtualmachineinstance-console","title":"Connect to VirtualMachineInstance console","text":"Use virtctl
to connect to the newly created VirtualMachineInstance.
virtctl console cirros-vm\n
"},{"location":"storage/disks_and_volumes/","title":"Filesystems, Disks and Volumes","text":"Making persistent storage in the cluster (volumes) accessible to VMs consists of three parts. First, volumes are specified in spec.volumes
. Second, disks are added to the VM by specifying them in spec.domain.devices.disks
. Finally, a reference to the specified volume is added to the disk specification by name.
Like all other VMI devices, a spec.domain.devices.disks element has a mandatory name, and furthermore, the disk's name must reference the name of a volume inside spec.volumes.
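The name rule above lends itself to a quick sanity check before a manifest is applied. The sketch below assumes the VMI spec has already been loaded into a plain dictionary; the helper name unresolved_disks and the orphan disk are hypothetical and exist only for this example.

```python
def unresolved_disks(vmi_spec):
    # Return the names of disks that have no volume with the same name.
    volumes = {v['name'] for v in vmi_spec.get('volumes', [])}
    disks = vmi_spec.get('domain', {}).get('devices', {}).get('disks', [])
    return [d['name'] for d in disks if d['name'] not in volumes]


spec = {
    'domain': {'devices': {'disks': [
        {'name': 'mypvcdisk', 'disk': {}},
        {'name': 'orphan', 'disk': {}},   # deliberately has no volume
    ]}},
    'volumes': [
        {'name': 'mypvcdisk', 'persistentVolumeClaim': {'claimName': 'mypvc'}},
    ],
}
print(unresolved_disks(spec))   # ['orphan']
```

An empty result means every disk resolves to a volume by name.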
A disk can be made accessible via four different types:
lun
disk
cdrom
filesystems
All possible configuration options are available in the Disk API Reference.
All types allow you to specify the bus attribute. The bus attribute determines how the disk will be presented to the guest operating system.
A lun disk will expose the volume as a LUN device to the VM. This allows the VM to execute arbitrary iSCSI command passthrough.
A minimal example which attaches a PersistentVolumeClaim named mypvc as a lun device to the VM:
metadata:\n name: testvmi-lun\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nspec:\n domain:\n resources:\n requests:\n memory: 64M\n devices:\n disks:\n - name: mypvcdisk\n # This makes it a lun device\n lun: {}\n volumes:\n - name: mypvcdisk\n persistentVolumeClaim:\n claimName: mypvc\n
"},{"location":"storage/disks_and_volumes/#persistent-reservation","title":"persistent reservation","text":"It is possible to reserve a LUN through the the SCSI Persistent Reserve commands. In order to issue privileged SCSI ioctls, the VM requires activation of the persistent resevation flag:
devices:\n disks:\n - name: mypvcdisk\n lun:\n reservation: true\n
This feature is enabled by the feature gate PersistentReservation:
configuration:\n developerConfiguration:\n featureGates:\n - PersistentReservation\n
Note: The persistent reservation feature enables an additional privileged component to be deployed together with virt-handler. Because this feature allows for sensitive security procedures, it is disabled by default and requires cluster administrator configuration.
"},{"location":"storage/disks_and_volumes/#disk","title":"disk","text":"A disk
disk will expose the volume as an ordinary disk to the VM.
A minimal example which attaches a PersistentVolumeClaim named mypvc as a disk device to the VM:
metadata:\n name: testvmi-disk\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nspec:\n domain:\n resources:\n requests:\n memory: 64M\n devices:\n disks:\n - name: mypvcdisk\n # This makes it a disk\n disk: {}\n volumes:\n - name: mypvcdisk\n persistentVolumeClaim:\n claimName: mypvc\n
You can set the disk bus type, overriding the default, which in turn depends on the chipset the VM is configured to use:
metadata:\n name: testvmi-disk\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nspec:\n domain:\n resources:\n requests:\n memory: 64M\n devices:\n disks:\n - name: mypvcdisk\n # This makes it a disk\n disk:\n # This makes it exposed as /dev/vda, being the only and thus first\n # disk attached to the VM\n bus: virtio\n volumes:\n - name: mypvcdisk\n persistentVolumeClaim:\n claimName: mypvc\n
"},{"location":"storage/disks_and_volumes/#cdrom","title":"cdrom","text":"A cdrom
disk will expose the volume as a cdrom drive to the VM. It is read-only by default.
A minimal example which attaches a PersistentVolumeClaim named mypvc as a cdrom device to the VM:
metadata:\n name: testvmi-cdrom\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nspec:\n domain:\n resources:\n requests:\n memory: 64M\n devices:\n disks:\n - name: mypvcdisk\n # This makes it a cdrom\n cdrom:\n # This makes the cdrom writeable\n readOnly: false\n # This makes the cdrom be exposed as SATA device\n bus: sata\n volumes:\n - name: mypvcdisk\n persistentVolumeClaim:\n claimName: mypvc\n
"},{"location":"storage/disks_and_volumes/#filesystems","title":"filesystems","text":"A filesystem
device will expose the volume as a filesystem to the VM. filesystems rely on virtiofs to make external filesystems visible to KubeVirt VMs. Further information about virtiofs can be found at the Official Virtiofs Site.
Compared with disk, filesystems allow changes in the source to be dynamically reflected in the volumes inside the VM. For instance, if a given configMap is shared with filesystems, any change made to it will be reflected in the VMs. However, it is important to note that filesystems do not allow live migration.
Additionally, filesystem devices must be mounted inside the VM. This can be done through cloudInitNoCloud or by connecting to the VM shell and running the same command manually. The main challenge is knowing which device tag identifies the new filesystem so that it can be mounted with the mount -t virtiofs [device tag] [path] command. The tag is the name assigned to the filesystem in the VM spec under spec.domain.devices.filesystems.name. For instance, if a VM spec sets spec.domain.devices.filesystems.name: foo, the command required inside the VM to mount the filesystem at /tmp/foo is mount -t virtiofs foo /tmp/foo:
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n name: testvmi-filesystems\nspec:\n domain:\n devices:\n filesystems:\n - name: foo\n virtiofs: {}\n disks:\n - name: containerdisk\n disk:\n bus: virtio\n - name: cloudinitdisk\n disk:\n bus: virtio\n volumes:\n - containerDisk:\n image: quay.io/containerdisks/fedora:latest\n name: containerdisk \n - cloudInitNoCloud:\n userData: |-\n #cloud-config\n chpasswd:\n expire: false\n password: fedora\n user: fedora\n bootcmd:\n - \"sudo mkdir /tmp/foo\"\n - \"sudo mount -t virtiofs foo /tmp/foo\"\n - persistentVolumeClaim:\n claimName: mypvc\n name: foo\n
Note: As stated, filesystems rely on virtiofs. Moreover, virtiofs requires Linux kernel support to work in the VM. To check whether the Linux image of the VM has the required support, run modprobe virtiofs. If the command output is modprobe: FATAL: Module virtiofs not found, the Linux image of the VM does not support virtiofs. You can also check that the kernel version is at least 5.4 on any Linux distribution, or at least 4.18 on CentOS/RHEL, by running uname -r.
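The version check described in the note can be scripted. The following sketch compares a release string, as printed by uname -r, against the thresholds mentioned above. The function names and the is_rhel_like switch are hypothetical, and the thresholds are only a heuristic: running modprobe virtiofs remains the more direct test.

```python
def parse_major_minor(release):
    # '4.18.0-513.el8.x86_64' -> (4, 18); '5.4.0-150-generic' -> (5, 4)
    numeric = release.split('-')[0]
    major, minor = numeric.split('.')[:2]
    return int(major), int(minor)


def kernel_supports_virtiofs(release, is_rhel_like=False):
    # Thresholds taken from the note above: 5.4 in general,
    # 4.18 for CentOS/RHEL kernels.
    minimum = (4, 18) if is_rhel_like else (5, 4)
    return parse_major_minor(release) >= minimum


print(kernel_supports_virtiofs('5.4.0-150-generic'))             # True
print(kernel_supports_virtiofs('4.18.0-513.el8.x86_64', True))   # True
print(kernel_supports_virtiofs('4.18.0-513.el8.x86_64'))         # False
```

Inside a guest, the release string could be obtained with platform.release() from the Python standard library.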
Refer to section Sharing Directories with VMs for usage examples of filesystems.
The error policy controls how the hypervisor should behave when an IO error occurs on a disk read or write. The default behaviour is to stop the guest and generate a Kubernetes event. However, it is possible to change the value to one of the following:
report: the error is reported in the guest
ignore: the error is ignored, but the read/write failure goes undetected
enospace: error when there isn't enough space on the disk
The error policy can be specified per disk or lun.
Example:
spec:\n domain:\n devices:\n disks:\n - disk:\n bus: virtio\n name: containerdisk\n errorPolicy: \"report\"\n - lun:\n bus: scsi\n name: scsi-disk\n errorPolicy: \"report\"\n
"},{"location":"storage/disks_and_volumes/#volumes","title":"Volumes","text":"Supported volume sources are
cloudInitNoCloud
cloudInitConfigDrive
persistentVolumeClaim
dataVolume
ephemeral
containerDisk
emptyDisk
hostDisk
configMap
secret
serviceAccount
downwardMetrics
All possible configuration options are available in the Volume API Reference.
"},{"location":"storage/disks_and_volumes/#cloudinitnocloud","title":"cloudInitNoCloud","text":"Allows attaching cloudInitNoCloud
data-sources to the VM. If the VM contains a proper cloud-init setup, it will pick up the disk as a user-data source.
A simple example which attaches a Secret as a cloud-init disk datasource may look like this:
metadata:\n name: testvmi-cloudinitnocloud\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nspec:\n domain:\n resources:\n requests:\n memory: 64M\n devices:\n disks:\n - name: mybootdisk\n lun: {}\n - name: mynoclouddisk\n disk: {}\n volumes:\n - name: mybootdisk\n persistentVolumeClaim:\n claimName: mypvc\n - name: mynoclouddisk\n cloudInitNoCloud:\n secretRef:\n name: testsecret\n
"},{"location":"storage/disks_and_volumes/#cloudinitconfigdrive","title":"cloudInitConfigDrive","text":"Allows attaching cloudInitConfigDrive
data-sources to the VM. If the VM contains a proper cloud-init setup, it will pick up the disk as a user-data source.
A simple example which attaches a Secret as a cloud-init disk datasource may look like this:
metadata:\n name: testvmi-cloudinitconfigdrive\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nspec:\n domain:\n resources:\n requests:\n memory: 64M\n devices:\n disks:\n - name: mybootdisk\n lun: {}\n - name: myconfigdrivedisk\n disk: {}\n volumes:\n - name: mybootdisk\n persistentVolumeClaim:\n claimName: mypvc\n - name: myconfigdrivedisk\n cloudInitConfigDrive:\n secretRef:\n name: testsecret\n
The cloudInitConfigDrive can also be used to configure VMs with Ignition. You just need to replace the cloud-init data with the Ignition data.
Allows connecting a PersistentVolumeClaim to a VM disk.
Use a PersistentVolumeClaim when the VirtualMachineInstance's disk needs to persist after the VM terminates. This allows for the VM's data to remain persistent between restarts.
A PersistentVolume can be in \"filesystem\" or \"block\" mode:
Filesystem: For KubeVirt to be able to consume the disk present on a PersistentVolume's filesystem, the disk must be named disk.img and be placed in the root path of the filesystem. Currently the disk is also required to be in raw format.
Important: The disk.img image file needs to be owned by the user-id 107 in order to avoid permission issues.
Note: If the disk.img image file has not been created manually before starting a VM, it will be created automatically with the PersistentVolumeClaim size. Since not every storage provisioner provides volumes with exactly the usable amount of space requested (e.g. due to filesystem overhead), KubeVirt tolerates up to 10% less available space. This can be configured with the developerConfiguration.pvcTolerateLessSpaceUpToPercent value in the KubeVirt CR (kubectl edit kubevirt kubevirt -n kubevirt).
Block: Use a block volume for consuming raw block devices. Note: you need to enable the BlockVolume
feature gate.
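As a rough sketch of the tolerance setting mentioned for Filesystem volumes, the KubeVirt CR could be edited along these lines. The resource name and namespace follow the kubectl command quoted above; the value 5 is purely illustrative and not a recommendation.

```yaml
# Sketch only: lowers the tolerated shortfall from the default 10% to 5%.
# The value 5 is an assumption chosen for illustration.
apiVersion: kubevirt.io/v1
kind: KubeVirt
metadata:
  name: kubevirt
  namespace: kubevirt
spec:
  configuration:
    developerConfiguration:
      pvcTolerateLessSpaceUpToPercent: 5
```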
A simple example which attaches a PersistentVolumeClaim
as a disk
may look like this:
metadata:\n name: testvmi-pvc\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nspec:\n domain:\n resources:\n requests:\n memory: 64M\n devices:\n disks:\n - name: mypvcdisk\n lun: {}\n volumes:\n - name: mypvcdisk\n persistentVolumeClaim:\n claimName: mypvc\n
"},{"location":"storage/disks_and_volumes/#thick-and-thin-volume-provisioning","title":"Thick and thin volume provisioning","text":"Sparsification can make a disk thin-provisioned, in other words it allows to convert the freed space within the disk image into free space back on the host. The fstrim utility can be used on a mounted filesystem to discard the blocks not used by the filesystem. In order to be able to sparsify a disk inside the guest, the disk needs to be configured in the libvirt xml with the option discard=unmap
. In KubeVirt, every disk is passed with this option enabled by default. To check whether trim is supported in the guest, run lsblk -D
and check the discard options reported for each disk.
Example:
$ lsblk -D\nNAME DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO\nloop0 0 4K 4G 0\nloop1 0 64K 4M 0\nsr0 0 0B 0B 0\nrbd0 0 64K 4M 0\nvda 512 512B 2G 0\n\u2514\u2500vda1 0 512B 2G 0\n
However, in certain cases, such as preallocation or when the disk is thick provisioned, the option needs to be disabled. The disk's PVC has to be marked with an annotation that contains /storage.preallocation
or /storage.thick-provisioned
, and set to true. If the volume is preprovisioned using CDI and the preallocation is enabled, then the PVC is automatically annotated with: cdi.kubevirt.io/storage.preallocation: true
and the discard passthrough option is disabled.
Example of a PVC definition with the annotation to disable discard passthrough:
apiVersion: v1\nkind: PersistentVolumeClaim\nmetadata:\n name: pvc\n annotations:\n user.custom.annotation/storage.thick-provisioned: \"true\"\nspec:\n storageClassName: local\n accessModes:\n - ReadWriteOnce\n volumeMode: Filesystem\n resources:\n requests:\n storage: 1Gi\n
"},{"location":"storage/disks_and_volumes/#disk-expansion","title":"disk expansion","text":"For some storage methods, Kubernetes may support expanding storage in-use (allowVolumeExpansion feature). KubeVirt can respond to it by making the additional storage available for the virtual machines. This feature is currently off by default, and requires enabling a feature gate. To enable it, add the ExpandDisks feature gate in the kubevirt object:
spec:\n configuration:\n developerConfiguration:\n featureGates:\n - ExpandDisks\n
Enabling this feature does two things: it notifies the virtual machine about size changes and, if the disk is a Filesystem PVC, it expands the matching file to the remaining size (while reserving some space for file system overhead).
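On the Kubernetes side, expansion is typically requested by raising the storage request of a PVC whose StorageClass permits it. The following is a minimal sketch of that setup; the StorageClass name and provisioner are hypothetical placeholders, and the sizes are illustrative.

```yaml
# Hypothetical StorageClass: the name and provisioner are placeholders.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: expandable-sc
provisioner: example.com/csi-driver
allowVolumeExpansion: true
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mypvc
spec:
  storageClassName: expandable-sc
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi  # raised from a smaller original request, e.g. 1Gi
```

Whether an in-use volume can actually be grown depends on the storage provider.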
"},{"location":"storage/disks_and_volumes/#statically-provisioned-block-pvcs","title":"Statically provisioned block PVCs","text":"To use an externally managed local block device from a host ( e.g. /dev/sdb , zvol, LVM, etc... ) in a VM directly, you would need a provisioner that supports block devices, such as OpenEBS LocalPV.
Alternatively, local volumes can be provisioned by hand. For example, the following PVC:
apiVersion: v1\nkind: PersistentVolumeClaim\nmetadata:\n name: myblock\nspec:\n storageClassName: local-device\n volumeMode: Block\n accessModes:\n - ReadWriteOnce\n resources:\n requests:\n storage: 100Gi\n
can claim a PersistentVolume pre-created by a cluster admin like so:
apiVersion: storage.k8s.io/v1\nkind: StorageClass\nmetadata:\n name: local-device\nprovisioner: kubernetes.io/no-provisioner\n---\napiVersion: v1\nkind: PersistentVolume\nmetadata:\n name: myblock\nspec:\n volumeMode: Block\n storageClassName: local-device\n nodeAffinity:\n required:\n nodeSelectorTerms:\n - matchExpressions:\n - key: kubernetes.io/hostname\n operator: In\n values:\n - my-node\n accessModes:\n - ReadWriteOnce\n capacity:\n storage: 100Gi\n local:\n path: /dev/sdb\n
"},{"location":"storage/disks_and_volumes/#datavolume","title":"dataVolume","text":"DataVolumes are a way to automate importing virtual machine disks onto PVCs during the virtual machine's launch flow. Without using a DataVolume, users have to prepare a PVC with a disk image before assigning it to a VM or VMI manifest. With a DataVolume, both the PVC creation and import is automated on behalf of the user.
"},{"location":"storage/disks_and_volumes/#datavolume-vm-behavior","title":"DataVolume VM Behavior","text":"DataVolumes can be defined in the VM spec directly by adding the DataVolumes to the dataVolumeTemplates
list. Below is an example.
apiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n labels:\n kubevirt.io/vm: vm-alpine-datavolume\n name: vm-alpine-datavolume\nspec:\n running: false\n template:\n metadata:\n labels:\n kubevirt.io/vm: vm-alpine-datavolume\n spec:\n domain:\n devices:\n disks:\n - disk:\n bus: virtio\n name: datavolumedisk1\n resources:\n requests:\n memory: 64M\n volumes:\n - dataVolume:\n name: alpine-dv\n name: datavolumedisk1\n dataVolumeTemplates:\n - metadata:\n name: alpine-dv\n spec:\n storage:\n resources:\n requests:\n storage: 2Gi\n source:\n http:\n url: http://cdi-http-import-server.kubevirt/images/alpine.iso\n
You can see the DataVolume defined in the dataVolumeTemplates section has two parts: the source and the PVC spec (the storage field in the example above).
The source part declares that there is a disk image living on an HTTP server that we want to use as a volume for this VM. The storage part declares the spec that should be used to create the PVC that hosts the source data.
When this VM manifest is posted to the cluster, as part of the launch flow a PVC will be created using the spec provided and the source data will be automatically imported into that PVC before the VM starts. When the VM is deleted, the storage provisioned by the DataVolume will automatically be deleted as well.
"},{"location":"storage/disks_and_volumes/#datavolume-vmi-behavior","title":"DataVolume VMI Behavior","text":"For a VMI object, DataVolumes can be referenced as a volume source for the VMI. When this is done, it is expected that the referenced DataVolume exists in the cluster. The VMI will consume the DataVolume, but the DataVolume's life-cycle will not be tied to the VMI.
Below is an example of a DataVolume being referenced by a VMI. It is expected that the DataVolume alpine-datavolume was created prior to posting the VMI manifest to the cluster. It is okay to post the VMI manifest to the cluster while the DataVolume is still having data imported. KubeVirt knows not to start the VMI until all referenced DataVolumes have finished their clone and import phases.
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n labels:\n special: vmi-alpine-datavolume\n name: vmi-alpine-datavolume\nspec:\n domain:\n devices:\n disks:\n - disk:\n bus: virtio\n name: disk1\n machine:\n type: \"\"\n resources:\n requests:\n memory: 64M\n terminationGracePeriodSeconds: 0\n volumes:\n - name: disk1\n dataVolume:\n name: alpine-datavolume\n
"},{"location":"storage/disks_and_volumes/#enabling-datavolume-support","title":"Enabling DataVolume support.","text":"A DataVolume is a custom resource provided by the Containerized Data Importer (CDI) project. KubeVirt integrates with CDI in order to provide users a workflow for dynamically creating PVCs and importing data into those PVCs.
In order to take advantage of the DataVolume volume source on a VM or VMI, CDI must be installed.
Installing CDI
Go to the CDI release page
Pick the latest stable release and post the corresponding cdi-controller-deployment.yaml manifest to your cluster.
"},{"location":"storage/disks_and_volumes/#ephemeral","title":"ephemeral","text":"An ephemeral volume is a local COW (copy on write) image that uses a network volume as a read-only backing store. With an ephemeral volume, the network backing store is never mutated. Instead all writes are stored on the ephemeral image which exists on local storage. KubeVirt dynamically generates the ephemeral images associated with a VM when the VM starts, and discards the ephemeral images when the VM stops.
Ephemeral volumes are useful in any scenario where disk persistence is not desired. The COW image is discarded when the VM reaches a final state (e.g., succeeded, failed).
Currently, only PersistentVolumeClaim
may be used as a backing store of the ephemeral volume.
Up-to-date information on supported backing stores can be found in the KubeVirt API.
metadata:\n name: testvmi-ephemeral-pvc\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nspec:\n domain:\n resources:\n requests:\n memory: 64M\n devices:\n disks:\n - name: mypvcdisk\n lun: {}\n volumes:\n - name: mypvcdisk\n ephemeral:\n persistentVolumeClaim:\n claimName: mypvc\n
"},{"location":"storage/disks_and_volumes/#containerdisk","title":"containerDisk","text":"containerDisk was originally registryDisk, please update your code when needed.
The containerDisk
feature provides the ability to store and distribute VM disks in the container image registry. containerDisks
can be assigned to VMs in the disks section of the VirtualMachineInstance spec.
No network shared storage devices are utilized by containerDisks
. The disks are pulled from the container registry and reside on the local node hosting the VMs that consume the disks.
containerDisks
are ephemeral storage devices that can be assigned to any number of active VirtualMachineInstances. This makes them an ideal tool for users who want to replicate a large number of VM workloads that do not require persistent data. containerDisks
are commonly used in conjunction with VirtualMachineInstanceReplicaSets.
containerDisks
are not a good solution for any workload that requires persistent root disks across VM restarts.
Users can inject a VirtualMachineInstance disk into a container image in a way that is consumable by the KubeVirt runtime. Disks must be placed into the /disk
directory inside the container. Raw and qcow2 formats are supported. Qcow2 is recommended in order to reduce the container image's size. containerDisks
can and should be based on scratch
. No content except the image is required.
Note: Prior to kubevirt 0.20, the containerDisk image needed to have kubevirt/container-disk-v1alpha as base image.
Note: The containerDisk needs to be readable for the user with the UID 107 (qemu).
Example: Inject a local VirtualMachineInstance disk into a container image.
cat << END > Dockerfile\nFROM scratch\nADD --chown=107:107 fedora25.qcow2 /disk/\nEND\n\ndocker build -t vmidisks/fedora25:latest .\n
Example: Inject a remote VirtualMachineInstance disk into a container image.
cat << END > Dockerfile\nFROM scratch\nADD --chown=107:107 https://cloud.centos.org/centos/7/images/CentOS-7-x86_64-GenericCloud.qcow2 /disk/\nEND\n
Example: Upload the ContainerDisk container image to a registry.
docker push vmidisks/fedora25:latest\n
Example: Attach the ContainerDisk as an ephemeral disk to a VM.
metadata:\n name: testvmi-containerdisk\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nspec:\n domain:\n resources:\n requests:\n memory: 64M\n devices:\n disks:\n - name: containerdisk\n disk: {}\n volumes:\n - name: containerdisk\n containerDisk:\n image: vmidisks/fedora25:latest\n
Note that a containerDisk
is file-based and therefore cannot be attached as a lun
device to the VM.
A containerDisk can also store disk images in any directory when required. The process is the same as above. The main difference is that KubeVirt does not scan a custom location for images, so it is your responsibility to provide the full path to the disk image. Providing the image path
is optional. When no path
is provided, KubeVirt searches for disk images in the default location: /disk
.
Example: Build container disk image:
cat << END > Dockerfile\nFROM scratch\nADD fedora25.qcow2 /custom-disk-path/fedora25.qcow2\nEND\n\ndocker build -t vmidisks/fedora25:latest .\ndocker push vmidisks/fedora25:latest\n
Create VMI with container disk pointing to the custom location:
metadata:\n name: testvmi-containerdisk\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nspec:\n domain:\n resources:\n requests:\n memory: 64M\n devices:\n disks:\n - name: containerdisk\n disk: {}\n volumes:\n - name: containerdisk\n containerDisk:\n image: vmidisks/fedora25:latest\n path: /custom-disk-path/fedora25.qcow2\n
"},{"location":"storage/disks_and_volumes/#emptydisk","title":"emptyDisk","text":"An emptyDisk
works similarly to an emptyDir
in Kubernetes. An extra sparse qcow2
disk will be allocated and it will live as long as the VM. Thus it will survive guest side VM reboots, but not a VM re-creation. The disk capacity
needs to be specified.
Example: Boot cirros with an extra emptyDisk
with a size of 2GiB
:
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n name: testvmi-nocloud\nspec:\n terminationGracePeriodSeconds: 5\n domain:\n resources:\n requests:\n memory: 64M\n devices:\n disks:\n - name: containerdisk\n disk:\n bus: virtio\n - name: emptydisk\n disk:\n bus: virtio\n volumes:\n - name: containerdisk\n containerDisk:\n image: kubevirt/cirros-registry-disk-demo:latest\n - name: emptydisk\n emptyDisk:\n capacity: 2Gi\n
"},{"location":"storage/disks_and_volumes/#when-to-use-an-emptydisk","title":"When to use an emptyDisk","text":"Ephemeral VMs very often come with read-only root images and limited tmpfs space. In many cases this is not enough to install application dependencies and provide enough disk space for the application data. While this data is not critical and thus can be lost, it is still needed for the application to function properly during its lifetime. This is where an emptyDisk
can be useful. An emptyDisk is often used and mounted somewhere in /var/lib
or /var/run
.
A hostDisk
volume type provides the ability to create or use a disk image located somewhere on a node. It works similarly to a hostPath
in Kubernetes and provides two usage types:
DiskOrCreate
if a disk image does not exist at a given location then create one
Disk
a disk image must exist at a given location
Note: you need to enable the HostDisk feature gate.
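Assuming feature gates are configured in the KubeVirt object in the same way as shown for ExpandDisks earlier in this document, enabling HostDisk might look like this:

```yaml
# Fragment of the KubeVirt object; mirrors the ExpandDisks example above.
spec:
  configuration:
    developerConfiguration:
      featureGates:
        - HostDisk
```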
Example: Create a 1Gi disk image located at /data/disk.img and attach it to a VM.
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n labels:\n special: vmi-host-disk\n name: vmi-host-disk\nspec:\n domain:\n devices:\n disks:\n - disk:\n bus: virtio\n name: host-disk\n machine:\n type: \"\"\n resources:\n requests:\n memory: 64M\n terminationGracePeriodSeconds: 0\n volumes:\n - hostDisk:\n capacity: 1Gi\n path: /data/disk.img\n type: DiskOrCreate\n name: host-disk\nstatus: {}\n
Note: This does not always work as expected. Instead you may want to consider creating a PersistentVolume
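For the Disk usage type, where the image must already exist on the node, a volume definition might look like the sketch below. The VMI name is hypothetical, and capacity is left out on the assumption that it is only needed when the image has to be created.

```yaml
# Sketch: consume an existing image at /data/disk.img (type Disk).
# The VMI name is a placeholder; capacity is omitted on the assumption
# that it only applies to DiskOrCreate.
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  name: vmi-host-disk-existing
spec:
  domain:
    devices:
      disks:
        - name: host-disk
          disk:
            bus: virtio
    resources:
      requests:
        memory: 64M
  volumes:
    - name: host-disk
      hostDisk:
        path: /data/disk.img
        type: Disk
```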
"},{"location":"storage/disks_and_volumes/#configmap","title":"configMap","text":"A configMap
is a reference to a ConfigMap in Kubernetes. A configMap
can be presented to the VM as disks or as a filesystem. Each method is described in the following sections and both have some advantages and disadvantages, e.g. disk
does not support dynamic change propagation and filesystem
does not support live migration. Therefore, depending on the use-case, one or the other may be more suitable.
By using disk, an extra iso
disk will be allocated which has to be mounted on a VM. To mount the configMap
users can use cloudInit
and the disk's serial number. The name
needs to be set to reference the created Kubernetes ConfigMap
.
Note: Currently, ConfigMap updates are not propagated to the VMI. If a ConfigMap is updated, only the pod will be aware of the changes, not the running VMI.
Note: Due to a Kubernetes CRD issue, you cannot control the paths within the volume where ConfigMap keys are projected.
Example: Attach the configMap
to a VM and use cloudInit
to mount the iso
disk:
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n labels:\n special: vmi-fedora\n name: vmi-fedora\nspec:\n domain:\n devices:\n disks:\n - disk:\n bus: virtio\n name: containerdisk\n - disk:\n bus: virtio\n name: cloudinitdisk\n - disk:\n name: app-config-disk\n # set serial\n serial: CVLY623300HK240D\n machine:\n type: \"\"\n resources:\n requests:\n memory: 1024M\n terminationGracePeriodSeconds: 0\n volumes:\n - name: containerdisk\n containerDisk:\n image: kubevirt/fedora-cloud-container-disk-demo:latest\n - cloudInitNoCloud:\n userData: |-\n #cloud-config\n password: fedora\n chpasswd: { expire: False }\n bootcmd:\n # mount the ConfigMap\n - \"sudo mkdir /mnt/app-config\"\n - \"sudo mount /dev/$(lsblk --nodeps -no name,serial | grep CVLY623300HK240D | cut -f1 -d' ') /mnt/app-config\"\n name: cloudinitdisk\n - configMap:\n name: app-config\n name: app-config-disk\nstatus: {}\n
"},{"location":"storage/disks_and_volumes/#as-a-filesystem","title":"As a filesystem","text":"By using filesystem, configMaps
are shared through virtiofs
. In contrast with using disk for sharing configMaps
, filesystem
allows you to dynamically propagate changes on configMaps
to VMIs (i.e. the VM does not need to be rebooted).
Note: Currently, VMIs can not be live migrated since virtiofs
does not support live migration.
To share a given configMap
, the following VM definition could be used:
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n labels:\n special: vmi-fedora\n name: vmi-fedora\nspec:\n domain:\n devices:\n filesystems:\n - name: config-fs\n virtiofs: {}\n disks:\n - disk:\n bus: virtio\n name: containerdisk\n machine:\n type: \"\"\n resources:\n requests:\n memory: 1024M\n terminationGracePeriodSeconds: 0\n volumes:\n - name: containerdisk\n containerDisk:\n image: quay.io/containerdisks/fedora:latest\n - cloudInitNoCloud:\n userData: |-\n #cloud-config\n chpasswd:\n expire: false\n password: fedora\n user: fedora\n bootcmd:\n # mount the ConfigMap\n - \"sudo mkdir /mnt/app-config\"\n - \"sudo mount -t virtiofs config-fs /mnt/app-config\"\n name: cloudinitdisk \n - configMap:\n name: app-config\n name: config-fs\n
"},{"location":"storage/disks_and_volumes/#secret","title":"secret","text":"A secret
is a reference to a Secret in Kubernetes. A secret
can be presented to the VM as disks or as a filesystem. Each method is described in the following sections and both have some advantages and disadvantages, e.g. disk
does not support dynamic change propagation and filesystem
does not support live migration. Therefore, depending on the use-case, one or the other may be more suitable.
By using disk, an extra iso
disk will be allocated which has to be mounted on a VM. To mount the secret
users can use cloudInit
and the disk's serial number. The secretName
needs to be set to reference the created Kubernetes Secret
.
Note: Currently, Secret update propagation is not supported. If a Secret is updated, only a pod will be aware of changes, not running VMIs.
Note: Due to a Kubernetes CRD issue, you cannot control the paths within the volume where Secret keys are projected.
Example: Attach the secret
to a VM and use cloudInit
to mount the iso
disk:
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n labels:\n special: vmi-fedora\n name: vmi-fedora\nspec:\n domain:\n devices:\n disks:\n - disk:\n bus: virtio\n name: containerdisk\n - disk:\n bus: virtio\n name: cloudinitdisk\n - disk:\n name: app-secret-disk\n # set serial\n serial: D23YZ9W6WA5DJ487\n machine:\n type: \"\"\n resources:\n requests:\n memory: 1024M\n terminationGracePeriodSeconds: 0\n volumes:\n - name: containerdisk\n containerDisk:\n image: kubevirt/fedora-cloud-container-disk-demo:latest\n - cloudInitNoCloud:\n userData: |-\n #cloud-config\n password: fedora\n chpasswd: { expire: False }\n bootcmd:\n # mount the Secret\n - \"sudo mkdir /mnt/app-secret\"\n - \"sudo mount /dev/$(lsblk --nodeps -no name,serial | grep D23YZ9W6WA5DJ487 | cut -f1 -d' ') /mnt/app-secret\"\n name: cloudinitdisk\n - secret:\n secretName: app-secret\n name: app-secret-disk\nstatus: {}\n
"},{"location":"storage/disks_and_volumes/#as-a-filesystem_1","title":"As a filesystem","text":"By using filesystem, secrets
are shared through virtiofs
. In contrast with using disk for sharing secrets
, filesystem
allows you to dynamically propagate changes on secrets
to VMIs (i.e. the VM does not need to be rebooted).
Note: Currently, VMIs can not be live migrated since virtiofs
does not support live migration.
To share a given secret
, the following VM definition could be used:
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n labels:\n special: vmi-fedora\n name: vmi-fedora\nspec:\n domain:\n devices:\n filesystems:\n - name: app-secret-fs\n virtiofs: {}\n disks:\n - disk:\n bus: virtio\n name: containerdisk\n machine:\n type: \"\"\n resources:\n requests:\n memory: 1024M\n terminationGracePeriodSeconds: 0\n volumes:\n - name: containerdisk\n containerDisk:\n image: quay.io/containerdisks/fedora:latest\n - cloudInitNoCloud:\n userData: |-\n #cloud-config\n chpasswd:\n expire: false\n password: fedora\n user: fedora\n bootcmd:\n # mount the Secret\n - \"sudo mkdir /mnt/app-secret\"\n - \"sudo mount -t virtiofs app-secret-fs /mnt/app-secret\"\n name: cloudinitdisk\n - secret:\n secretName: app-secret\n name: app-secret-fs\n
"},{"location":"storage/disks_and_volumes/#serviceaccount","title":"serviceAccount","text":"A serviceAccount
volume references a Kubernetes ServiceAccount
. A serviceAccount
can be presented to the VM as disks or as a filesystem. Each method is described in the following sections and both have some advantages and disadvantages, e.g. disk
does not support dynamic change propagation and filesystem
does not support live migration. Therefore, depending on the use-case, one or the other may be more suitable.
By using disk, a new iso
disk will be allocated with the content of the service account (namespace
, token
and ca.crt
), which needs to be mounted in the VM. For automatic mounting, see the configMap
and secret
examples above.
Note: Currently, ServiceAccount update propagation is not supported. If a ServiceAccount is updated, only a pod will be aware of changes, not running VMIs.
Example:
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n labels:\n special: vmi-fedora\n name: vmi-fedora\nspec:\n domain:\n devices:\n disks:\n - disk:\n name: containerdisk\n - disk:\n name: serviceaccountdisk\n machine:\n type: \"\"\n resources:\n requests:\n memory: 1024M\n terminationGracePeriodSeconds: 0\n volumes:\n - name: containerdisk\n containerDisk:\n image: kubevirt/fedora-cloud-container-disk-demo:latest\n - name: serviceaccountdisk\n serviceAccount:\n serviceAccountName: default\n
"},{"location":"storage/disks_and_volumes/#as-a-filesystem_2","title":"As a filesystem","text":"By using filesystem, serviceAccounts
are shared through virtiofs
. In contrast with using disk for sharing serviceAccounts
, filesystem
allows you to dynamically propagate changes on serviceAccounts
to VMIs (i.e. the VM does not need to be rebooted).
Note: Currently, VMIs can not be live migrated since virtiofs
does not support live migration.
To share a given serviceAccount
, the following VM definition could be used:
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n labels:\n special: vmi-fedora\n name: vmi-fedora\nspec:\n domain:\n devices:\n filesystems:\n - name: serviceaccount-fs\n virtiofs: {}\n disks:\n - disk:\n bus: virtio\n name: containerdisk\n machine:\n type: \"\"\n resources:\n requests:\n memory: 1024M\n terminationGracePeriodSeconds: 0\n volumes:\n - name: containerdisk\n containerDisk:\n image: quay.io/containerdisks/fedora:latest\n - cloudInitNoCloud:\n userData: |-\n #cloud-config\n chpasswd:\n expire: false\n password: fedora\n user: fedora\n bootcmd:\n # mount the ServiceAccount\n - \"sudo mkdir /mnt/serviceaccount\"\n - \"sudo mount -t virtiofs serviceaccount-fs /mnt/serviceaccount\"\n name: cloudinitdisk\n - name: serviceaccount-fs\n serviceAccount:\n serviceAccountName: default\n
"},{"location":"storage/disks_and_volumes/#downwardmetrics","title":"downwardMetrics","text":"downwardMetrics
expose a limited set of VM and host metrics to the guest. The format is compatible with vhostmd.
Getting a limited set of host and VM metrics is in some cases required to allow third parties to diagnose performance issues on their appliances. One prominent example is SAP HANA.
In order to expose downwardMetrics
to VMs, the methods disk
and virtio-serial port
are supported.
Note: The DownwardMetrics feature gate must be enabled to use the metrics. Available starting with KubeVirt v0.42.0.
"},{"location":"storage/disks_and_volumes/#disk_1","title":"Disk","text":"A volume is created, and it is exposed to the guest as a raw block volume. KubeVirt will update it periodically (by default, every 5 seconds).
Example:
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n labels:\n special: vmi-fedora\n name: vmi-fedora\nspec:\n domain:\n devices:\n disks:\n - disk:\n bus: virtio\n name: containerdisk\n - disk:\n bus: virtio\n name: metrics\n machine:\n type: \"\"\n resources:\n requests:\n memory: 1024M\n terminationGracePeriodSeconds: 0\n volumes:\n - name: containerdisk\n containerDisk:\n image: quay.io/containerdisks/fedora:latest\n - name: metrics\n downwardMetrics: {}\n
"},{"location":"storage/disks_and_volumes/#virtio-serial-port","title":"Virtio-serial port","text":"This method uses a virtio-serial port to expose the metrics data to the VM. KubeVirt creates a port named /dev/virtio-ports/org.github.vhostmd.1
inside the VM, in which the Virtio Transport protocol is supported. downwardMetrics
can be retrieved from this port. See vhostmd documentation under Virtio Transport
for further information.
To expose the metrics using a virtio-serial port, a downwardMetrics
device must be added (i.e., spec.domain.devices.downwardMetrics: {}
).
Example:
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n labels:\n special: vmi-fedora\n name: vmi-fedora\nspec:\n domain:\n devices:\n downwardMetrics: {}\n disks:\n - disk:\n bus: virtio\n name: containerdisk\n machine:\n type: \"\"\n resources:\n requests:\n memory: 1024M\n terminationGracePeriodSeconds: 0\n volumes:\n - name: containerdisk\n containerDisk:\n image: quay.io/containerdisks/fedora:latest\n
"},{"location":"storage/disks_and_volumes/#accessing-metrics-data","title":"Accessing Metrics Data","text":"To access the DownwardMetrics shared with a disk or a virtio-serial port, the vm-dump-metrics
tool can be used:
$ sudo dnf install -y vm-dump-metrics\n$ sudo vm-dump-metrics\n<metrics>\n <metric type=\"string\" context=\"host\">\n <name>HostName</name>\n <value>node01</value>\n[...]\n <metric type=\"int64\" context=\"host\" unit=\"s\">\n <name>Time</name>\n <value>1619008605</value>\n </metric>\n <metric type=\"string\" context=\"host\">\n <name>VirtualizationVendor</name>\n <value>kubevirt.io</value>\n </metric>\n</metrics>\n
vm-dump-metrics
is useful as a standalone tool to verify the serial port is working and to inspect the metrics. However, applications that consume metrics will usually connect to the virtio-serial port themselves.
Note: The tool vm-dump-metrics
provides the option --virtio
for use with the virtio-serial port. Please refer to vm-dump-metrics --help
for further information.
Libvirt has the ability to use IOThreads for dedicated disk access (for supported devices). These are dedicated event loop threads that perform block I/O requests and improve scalability on SMP systems. KubeVirt exposes this libvirt feature through the ioThreadsPolicy
setting. Additionally, each Disk
device exposes a dedicatedIOThread
setting. This is a boolean that indicates the specified disk should be allocated an exclusive IOThread that will never be shared with other disks.
Currently valid policies are shared
and auto
. If ioThreadsPolicy
is omitted entirely, use of IOThreads will be disabled. However, if any disk requests a dedicated IOThread, ioThreadsPolicy
will be enabled and default to shared
.
An ioThreadsPolicy
of shared
indicates that KubeVirt should use one thread that will be shared by all disk devices. This policy stems from the fact that a large number of IOThreads is generally not useful, as additional context switching is incurred for each thread.
Disks with dedicatedIOThread
set to true
will not use the shared thread, but will instead be allocated an exclusive thread. This is generally useful if a specific Disk is expected to have heavy I/O traffic, e.g. a database spindle.
auto
IOThreads indicates that KubeVirt should use a pool of IOThreads and allocate disks to IOThreads in a round-robin fashion. The pool size is generally limited to twice the number of vCPUs allocated to the VM. This essentially attempts to dedicate disks to separate IOThreads, but only up to a reasonable limit. This comes into play, for instance, on systems with a large number of disks and a smaller number of CPUs.
As a caveat to the size of the IOThread pool, disks with dedicatedIOThread
will always be guaranteed their own thread. This effectively diminishes the upper limit of the number of threads allocated to the rest of the disks. For example, a VM with 2 CPUs would normally use 4 IOThreads for all disks. However if one disk had dedicatedIOThread
set to true, then KubeVirt would only use 3 IOThreads for the shared pool.
There is always guaranteed to be at least one thread for disks that will use the shared IOThreads pool. Thus if a sufficiently large number of disks have dedicated IOThreads assigned, auto
and shared
policies would essentially result in the same layout.
When a guest's vCPUs are pinned to a host's physical CPUs, it is also best to pin the IOThreads to specific CPUs to prevent them from floating between the CPUs. KubeVirt will automatically calculate and pin each IOThread to a CPU or a set of CPUs, depending on the ratio between them. If there are more IOThreads than CPUs, each IOThread will be pinned to a CPU in a round-robin fashion. Otherwise, when there are fewer IOThreads than CPUs, each IOThread will be pinned to a set of CPUs.
"},{"location":"storage/disks_and_volumes/#iothreads-with-qemu-emulator-thread-and-dedicated-pinned-cpus","title":"IOThreads with QEMU Emulator thread and Dedicated (pinned) CPUs","text":"To further improve vCPU latency, KubeVirt can allocate an additional dedicated physical CPU, exclusively for the emulator thread, to which it will be pinned. This will effectively \"isolate\" the emulator thread from the vCPUs of the VMI. When ioThreadsPolicy
is set to auto
IOThreads will also be \"isolated\" from the vCPUs and placed on the same physical CPU as the QEMU emulator thread.
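A minimal sketch of the relevant part of a VMI spec for this setup follows. It assumes the dedicatedCpuPlacement and isolateEmulatorThread CPU settings are used to request pinned vCPUs and an isolated emulator thread; the core count is illustrative.

```yaml
# Fragment of a VMI spec. Pinned vCPUs plus an isolated emulator thread;
# with ioThreadsPolicy auto, the IOThreads are placed alongside the emulator thread.
spec:
  domain:
    ioThreadsPolicy: auto
    cpu:
      cores: 2
      dedicatedCpuPlacement: true
      isolateEmulatorThread: true
```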
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n labels:\n special: vmi-shared\n name: vmi-shared\nspec:\n domain:\n ioThreadsPolicy: shared\n cpu:\n cores: 2\n devices:\n disks:\n - disk:\n bus: virtio\n name: vmi-shared_disk\n - disk:\n bus: virtio\n name: emptydisk\n dedicatedIOThread: true\n - disk:\n bus: virtio\n name: emptydisk2\n dedicatedIOThread: true\n - disk:\n bus: virtio\n name: emptydisk3\n - disk:\n bus: virtio\n name: emptydisk4\n - disk:\n bus: virtio\n name: emptydisk5\n - disk:\n bus: virtio\n name: emptydisk6\n machine:\n type: \"\"\n resources:\n requests:\n memory: 64M\n volumes:\n - name: vmi-shared_disk\n persistentVolumeClaim:\n claimName: vmi-shared_pvc\n - emptyDisk:\n capacity: 1Gi\n name: emptydisk\n - emptyDisk:\n capacity: 1Gi\n name: emptydisk2\n - emptyDisk:\n capacity: 1Gi\n name: emptydisk3\n - emptyDisk:\n capacity: 1Gi\n name: emptydisk4\n - emptyDisk:\n capacity: 1Gi\n name: emptydisk5\n - emptyDisk:\n capacity: 1Gi\n name: emptydisk6\n
In this example, emptydisk and emptydisk2 both request a dedicated IOThread. vmi-shared_disk and emptydisk3 through emptydisk6 will all share one IOThread.
mypvc: 1\nemptydisk: 2\nemptydisk2: 3\nemptydisk3: 1\nemptydisk4: 1\nemptydisk5: 1\nemptydisk6: 1\n
"},{"location":"storage/disks_and_volumes/#auto-iothreads","title":"Auto IOThreads","text":"apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n labels:\n special: vmi-shared\n name: vmi-shared\nspec:\n domain:\n ioThreadsPolicy: auto\n cpu:\n cores: 2\n devices:\n disks:\n - disk:\n bus: virtio\n name: mydisk\n - disk:\n bus: virtio\n name: emptydisk\n dedicatedIOThread: true\n - disk:\n bus: virtio\n name: emptydisk2\n dedicatedIOThread: true\n - disk:\n bus: virtio\n name: emptydisk3\n - disk:\n bus: virtio\n name: emptydisk4\n - disk:\n bus: virtio\n name: emptydisk5\n - disk:\n bus: virtio\n name: emptydisk6\n machine:\n type: \"\"\n resources:\n requests:\n memory: 64M\n volumes:\n - name: mydisk\n persistentVolumeClaim:\n claimName: mypvc\n - emptyDisk:\n capacity: 1Gi\n name: emptydisk\n - emptyDisk:\n capacity: 1Gi\n name: emptydisk2\n - emptyDisk:\n capacity: 1Gi\n name: emptydisk3\n - emptyDisk:\n capacity: 1Gi\n name: emptydisk4\n - emptyDisk:\n capacity: 1Gi\n name: emptydisk5\n - emptyDisk:\n capacity: 1Gi\n name: emptydisk6\n
This VM is identical to the first, except it requests auto IOThreads. emptydisk
and emptydisk2
will still be allocated individual IOThreads, but the rest of the disks will be split across 2 separate IOThreads: the total is limited to twice the number of CPU cores (4), and 2 of those are already dedicated.
Disks will be assigned to IOThreads like this:
mypvc: 1\nemptydisk: 3\nemptydisk2: 4\nemptydisk3: 2\nemptydisk4: 1\nemptydisk5: 2\nemptydisk6: 1\n
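The assignment above can be reproduced with a short sketch. This is only an illustration of the pooling described in this section, not KubeVirt's actual implementation: the function name assign_iothreads and the numbering order (shared pool first, dedicated threads after) are assumptions chosen to match the listing, which refers to the first disk (mydisk) by its claim name mypvc.

```python
def assign_iothreads(disks, cores):
    '''Map each disk name to an IOThread number (illustrative sketch only).

    disks is a list of (name, wants_dedicated_iothread) tuples and cores is
    the number of CPU cores requested by the VMI.
    '''
    limit = 2 * cores  # assumed cap: twice the number of CPU cores
    dedicated = [name for name, wants in disks if wants]
    shared = [name for name, wants in disks if not wants]

    # Threads left for the shared pool once dedicated ones are reserved.
    pool = min(max(1, limit - len(dedicated)), len(shared)) if shared else 0

    assignment = {}
    # Shared disks are spread over the pool in round-robin order.
    for index, name in enumerate(shared):
        assignment[name] = (index % pool) + 1
    # Dedicated disks each get their own thread, numbered after the pool.
    for index, name in enumerate(dedicated):
        assignment[name] = pool + index + 1
    return assignment


example_disks = [
    ('mydisk', False),
    ('emptydisk', True),
    ('emptydisk2', True),
    ('emptydisk3', False),
    ('emptydisk4', False),
    ('emptydisk5', False),
    ('emptydisk6', False),
]
example_assignment = assign_iothreads(example_disks, cores=2)
```

With 2 cores, the two dedicated disks land on IOThreads 3 and 4, and the five remaining disks alternate between IOThreads 1 and 2.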
"},{"location":"storage/disks_and_volumes/#virtio-block-multi-queue","title":"Virtio Block Multi-Queue","text":"Block Multi-Queue is a framework for the Linux block layer that maps device I/O queries to multiple queues. This splits I/O processing across multiple threads, and therefore multiple CPUs. libvirt recommends that the number of queues match the number of CPUs allocated for optimal performance.
This feature is enabled by the BlockMultiQueue
setting under Devices
:
spec:\n domain:\n devices:\n blockMultiQueue: true\n disks:\n - disk:\n bus: virtio\n name: mydisk\n
Note: Due to the way KubeVirt implements CPU allocation, blockMultiQueue can only be used if a specific CPU allocation is requested. If a specific number of CPUs hasn't been allocated to a VirtualMachine, KubeVirt will use all CPUs on the node on a best-effort basis. In that case the amount of CPU allocated to a VM at the host level could change over time. If blockMultiQueue were to request a number of queues to match all the CPUs on a node, that could lead to over-allocation scenarios. To avoid this, KubeVirt enforces that a specific slice of CPU resources is requested in order to take advantage of this feature.
"},{"location":"storage/disks_and_volumes/#example","title":"Example","text":"metadata:\n name: testvmi-disk\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nspec:\n domain:\n resources:\n requests:\n memory: 64M\n cpu: 4\n devices:\n blockMultiQueue: true\n disks:\n - name: mypvcdisk\n disk:\n bus: virtio\n volumes:\n - name: mypvcdisk\n persistentVolumeClaim:\n claimName: mypvc\n
This example will enable Block Multi-Queue for the disk mypvcdisk
and allocate 4 queues (to match the number of CPUs requested).
KubeVirt supports none
, writeback
, and writethrough
KVM/QEMU cache modes.
none
I/O from the guest is not cached on the host. Use this option for guests with large I/O requirements. This option is generally the best choice.
writeback
I/O from the guest is cached on the host and written through to the physical media when the guest OS issues a flush.
writethrough
I/O from the guest is cached on the host but must be written through to the physical medium before the write operation completes.
Important: none
cache mode is set as default if the file system supports direct I/O, otherwise, writethrough
is used.
Note: It is possible to force a specific cache mode, although if none
mode has been chosen and the file system does not support direct I/O, then starting the VMI will return an error.
Example: force writethrough
cache mode
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n labels:\n special: vmi-pvc\n name: vmi-pvc\nspec:\n domain:\n devices:\n disks:\n - disk:\n bus: virtio\n name: pvcdisk\n cache: writethrough\n machine:\n type: \"\"\n resources:\n requests:\n memory: 64M\n terminationGracePeriodSeconds: 0\n volumes:\n - name: pvcdisk\n persistentVolumeClaim:\n claimName: disk-alpine\nstatus: {}\n
"},{"location":"storage/disks_and_volumes/#disk-sharing","title":"Disk sharing","text":"Shareable disks allow multiple VMs to share the same underlying storage. In order to use this feature, special care is required because this could lead to data corruption and the loss of important data. Shareable disks demand either data synchronization at the application level or the use of clustered filesystems. These advanced configurations are not within the scope of this documentation and are use-case specific.
If the shareable
option is set, it indicates to libvirt/QEMU that the disk is going to be accessed by multiple VMs and not to create a lock for the writes.
In this example, we use Rook Ceph to dynamically provision the PVC.
apiVersion: v1\nkind: PersistentVolumeClaim\nmetadata:\n name: block-pvc\nspec:\n accessModes:\n - ReadWriteMany\n volumeMode: Block\n resources:\n requests:\n storage: 1Gi\n storageClassName: rook-ceph-block\n
$ kubectl get pvc\nNAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE\nblock-pvc Bound pvc-0a161bb2-57c7-4d97-be96-0a20ff0222e2 1Gi RWO rook-ceph-block 51s\n
Then, we can declare 2 VMs and set the shareable
option to true for the shared disk. apiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n labels:\n kubevirt.io/vm: vm-block-1\n name: vm-block-1\nspec:\n running: true\n template:\n metadata:\n labels:\n kubevirt.io/vm: vm-block-1\n spec:\n domain:\n devices:\n disks:\n - disk:\n bus: virtio\n name: containerdisk\n - disk:\n bus: virtio\n name: cloudinitdisk\n - disk:\n bus: virtio\n shareable: true\n name: block-disk\n machine:\n type: \"\"\n resources:\n requests:\n memory: 2G\n terminationGracePeriodSeconds: 0\n volumes:\n - containerDisk:\n image: registry:5000/kubevirt/fedora-with-test-tooling-container-disk:devel\n name: containerdisk\n - cloudInitNoCloud:\n userData: |-\n #cloud-config\n password: fedora\n chpasswd: { expire: False }\n name: cloudinitdisk\n - name: block-disk\n persistentVolumeClaim:\n claimName: block-pvc\n---\napiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n labels:\n kubevirt.io/vm: vm-block-2\n name: vm-block-2\nspec:\n running: true\n template:\n metadata:\n labels:\n kubevirt.io/vm: vm-block-2\n spec:\n affinity:\n podAffinity:\n requiredDuringSchedulingIgnoredDuringExecution:\n - labelSelector:\n matchExpressions:\n - key: kubevirt.io/vm\n operator: In\n values:\n - vm-block-1\n topologyKey: \"kubernetes.io/hostname\"\n domain:\n devices:\n disks:\n - disk:\n bus: virtio\n name: containerdisk\n - disk:\n bus: virtio\n name: cloudinitdisk\n - disk:\n bus: virtio\n shareable: true\n name: block-disk\n machine:\n type: \"\"\n resources:\n requests:\n memory: 2G\n terminationGracePeriodSeconds: 0\n volumes:\n - containerDisk:\n image: registry:5000/kubevirt/fedora-with-test-tooling-container-disk:devel\n name: containerdisk\n - cloudInitNoCloud:\n userData: |-\n #cloud-config\n password: fedora\n chpasswd: { expire: False }\n name: cloudinitdisk\n - name: block-disk\n persistentVolumeClaim:\n claimName: block-pvc \n
We can now attempt to write a string from the first guest and then read the string from the second guest to test that the sharing is working. $ virtctl console vm-block-1\n$ printf \"Test awesome shareable disks\" | sudo dd of=/dev/vdc bs=1 count=150 conv=notrunc\n28+0 records in\n28+0 records out\n28 bytes copied, 0.0264182 s, 1.1 kB/s\n# Log into the second guest\n$ virtctl console vm-block-2\n$ sudo dd if=/dev/vdc bs=1 count=150 conv=notrunc\nTest awesome shareable disks150+0 records in\n150+0 records out\n150 bytes copied, 0.136753 s, 1.1 kB/s\n
If you are using local devices or RWO PVCs, setting the affinity on the VMs that share the storage guarantees they will be scheduled on the same node. In the example, we set the affinity on the second VM using the label used on the first VM. If you are using shared storage with RWX PVCs, then the affinity rule is not necessary as the storage can be attached simultaneously on multiple nodes.
"},{"location":"storage/disks_and_volumes/#sharing-directories-with-vms","title":"Sharing Directories with VMs","text":"Virtiofs
makes external filesystems visible to KubeVirt
VMs. Virtiofs
is a shared file system that lets VMs access a directory tree on the host. Further details can be found at the official Virtiofs site.
KubeVirt supports two PVC sharing modes: non-privileged and privileged.
The non-privileged mode is enabled by default. This mode has the advantage of not requiring any administrative privileges for creating the VM. However, it has some limitations:
To switch to the privileged mode, the feature gate ExperimentalVirtiofsSupport has to be enabled. Take into account that this mode requires privileges to run rootful containers.
"},{"location":"storage/disks_and_volumes/#sharing-persistent-volume-claims","title":"Sharing Persistent Volume Claims","text":""},{"location":"storage/disks_and_volumes/#cluster-configuration","title":"Cluster Configuration","text":"We need to create a new VM definition including the spec.devices.disk.filesystems.virtiofs
and a PVC. Example:
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n name: testvmi-fs\nspec:\n domain:\n devices:\n disks:\n - disk:\n bus: virtio\n name: containerdisk\n - disk:\n bus: virtio\n name: cloudinitdisk\n filesystems:\n - name: virtiofs-disk\n virtiofs: {}\n resources:\n requests:\n memory: 1024Mi\n volumes:\n - name: containerdisk\n containerDisk:\n image: quay.io/containerdisks/fedora:latest\n - cloudInitNoCloud:\n userData: |-\n #cloud-config\n password: fedora\n chpasswd: { expire: False }\n name: cloudinitdisk\n - name: virtiofs-disk\n persistentVolumeClaim:\n claimName: mypvc\n
"},{"location":"storage/disks_and_volumes/#configuration-inside-the-vm","title":"Configuration Inside the VM","text":"The following configuration can be done using a startup script. See the cloudInitNoCloud section for more details. However, we can also do it manually by logging in to the VM and mounting the filesystem. Here are examples of how to mount it in Linux and Windows VMs:
$ sudo mkdir -p /mnt/disks/virtio\n$ sudo mount -t virtiofs virtiofs-disk /mnt/disks/virtio\n
See this guide for details on startup steps needed for Windows VMs.
"},{"location":"storage/disks_and_volumes/#sharing-node-directories","title":"Sharing Node Directories","text":"Node directories can be shared using hostpaths. The following configuration example is shown for illustrative purposes. However, the PVC method is preferred, since using hostpath is generally discouraged for security reasons.
"},{"location":"storage/disks_and_volumes/#configuration-inside-the-node","title":"Configuration Inside the Node","text":"To share the directory with the VMs, we need to log in to the node, create the shared directory (if it does not already exist), and set the proper SELinux context label container_file_t
on the shared directory. In this example we are going to share a new directory /tmp/data
(if the desired directory is an existing one, you can skip the mkdir
command):
$ mkdir /tmp/data\n$ sudo chcon -t container_file_t /tmp/data\n
Note: If you are attempting to share an existing directory, you must first check the SELinux context label with the command ls -Z <directory>
. In the case that the label is not present or is not container_file_t
you need to label it with the chcon
command.
We need a StorageClass
which uses the provisioner no-provisioner
:
apiVersion: storage.k8s.io/v1\nkind: StorageClass\nmetadata:\n name: no-provisioner-storage-class\nprovisioner: kubernetes.io/no-provisioner\nreclaimPolicy: Delete\nvolumeBindingMode: WaitForFirstConsumer\n
To make the shared directory available to VMs, we need to create a PV and a PVC that can be consumed by the VMs:
kind: PersistentVolume\napiVersion: v1\nmetadata:\n name: hostpath\nspec:\n capacity:\n storage: 10Gi\n accessModes:\n - ReadWriteMany\n hostPath:\n path: \"/tmp/data\"\n storageClassName: \"no-provisioner-storage-class\"\n nodeAffinity:\n required:\n nodeSelectorTerms:\n - matchExpressions:\n - key: kubernetes.io/hostname\n operator: In\n values:\n - node01\n--- \napiVersion: v1\nkind: PersistentVolumeClaim\nmetadata:\n name: hostpath-claim\nspec:\n accessModes:\n - ReadWriteMany\n storageClassName: \"no-provisioner-storage-class\"\n resources:\n requests:\n storage: 10Gi\n
Note: Change the node01
value to the name of the node where the shared directory is located.
The VM definitions have to request the PVC hostpath-claim
and attach it as a virtiofs filesystem:
apiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n labels:\n kubevirt.io/vm: hostpath-vm\n name: hostpath\nspec:\n running: true\n template:\n metadata:\n labels:\n kubevirt.io/domain: hostpath\n kubevirt.io/vm: hostpath\n spec:\n domain:\n cpu:\n cores: 1\n sockets: 1\n threads: 1\n devices:\n filesystems:\n - name: vm-hostpath\n virtiofs: {}\n disks:\n - name: containerdisk\n disk:\n bus: virtio\n - name: cloudinitdisk\n disk:\n bus: virtio\n interfaces:\n - name: default\n masquerade: {}\n rng: {}\n resources:\n requests:\n memory: 1Gi\n networks:\n - name: default\n pod: {}\n terminationGracePeriodSeconds: 180\n volumes:\n - containerDisk:\n image: quay.io/containerdisks/fedora:latest\n name: containerdisk\n - cloudInitNoCloud:\n userData: |-\n #cloud-config\n chpasswd:\n expire: false\n password: password\n user: fedora\n name: cloudinitdisk\n - name: vm-hostpath\n persistentVolumeClaim:\n claimName: hostpath-claim\n
"},{"location":"storage/disks_and_volumes/#configuration-inside-the-vm_1","title":"Configuration Inside the VM","text":"We need to log in to the VM and mount the shared directory:
$ sudo mount -t virtiofs vm-hostpath /mnt\n
"},{"location":"storage/export_api/","title":"Export API","text":"It can be desirable to export a Virtual Machine and its related disks out of a cluster so you can import that Virtual Machine into another system or cluster. The Virtual Machine disks are the most prominent things you will want to export. The export API makes it possible to declaratively export Virtual Machine disks. It is also possible to export individual PVCs and their contents, for instance when you have created a memory dump from a VM or are using virtio-fs to have a Virtual Machine populate a PVC.
In order not to overload the Kubernetes API server, the data is transferred through a dedicated export proxy server. The proxy server can then be exposed to the outside world through a service associated with an Ingress/Route or NodePort. As an alternative, the port-forward
flag can be used with the virtctl integration to bypass the need of an Ingress/Route.
VMExport support must be enabled in the feature gates to be available. The feature gates field in the KubeVirt CR must be expanded by adding the VMExport
to it.
In order to securely export a Virtual Machine Disk, you must create a token that is used to authorize users accessing the export endpoint. This token must be in the same namespace as the Virtual Machine. The contents of the secret can be passed as a token header or parameter to the export URL. The name of the header or argument is x-kubevirt-export-token
with a value that matches the content of the secret. The secret can have any valid secret name in the namespace. We recommend you generate an alphanumeric token of at least 12 characters. The data key should be token
. For example:
apiVersion: v1\nkind: Secret\nmetadata:\n name: example-token\nstringData:\n token: 1234567890ab\n
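A token that follows the recommendation above could be generated as sketched here. The helper name make_export_token is hypothetical and not part of KubeVirt or virtctl; it only relies on the Python standard library.

```python
import secrets
import string


def make_export_token(length=12):
    '''Return a random alphanumeric token (hypothetical helper, not a KubeVirt API).'''
    if length < 12:
        # The documentation recommends at least 12 characters.
        raise ValueError('token should be at least 12 characters long')
    alphabet = string.ascii_letters + string.digits
    # secrets.choice draws from a cryptographically strong random source.
    return ''.join(secrets.choice(alphabet) for _ in range(length))


token = make_export_token()
```

The resulting value would go under the token key of the Secret's stringData, in place of the example value 1234567890ab.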
"},{"location":"storage/export_api/#export-virtual-machine-volumes","title":"Export Virtual Machine volumes","text":"After you have created the token you can now create a VMExport CR that identifies the Virtual Machine you want to export. You can create a VMExport that looks like this:
apiVersion: export.kubevirt.io/v1beta1\nkind: VirtualMachineExport\nmetadata:\n name: example-export\nspec:\n tokenSecretRef: example-token\n source:\n apiGroup: \"kubevirt.io\"\n kind: VirtualMachine\n name: example-vm\n
The following volumes present in the VM will be exported:
All other volume types are not exported. To avoid the export of inconsistent data, a Virtual Machine can only be exported while it is powered off. Any active VM exports will be terminated if the Virtual Machine is started. To export data from a running Virtual Machine you must first create a Virtual Machine Snapshot (see below).
If the VM contains multiple volumes that can be exported, each volume will get its own URL links. If the VM contains no volumes that can be exported, the VMExport will go into a Skipped
phase, and no export server is started.
You can create a VMExport CR that identifies the Virtual Machine Snapshot you want to export. You can create a VMExport that looks like this:
apiVersion: export.kubevirt.io/v1beta1\nkind: VirtualMachineExport\nmetadata:\n name: example-export\nspec:\n tokenSecretRef: example-token\n source:\n apiGroup: \"snapshot.kubevirt.io\"\n kind: VirtualMachineSnapshot\n name: example-vmsnapshot\n
When you create a VMExport based on a Virtual Machine Snapshot, the controller will attempt to create PVCs from the volume snapshots contained in the Virtual Machine Snapshot. Once all the PVCs are ready, the export server will start and you can begin the export. If the Virtual Machine Snapshot contains multiple volumes that can be exported, each volume will get its own URL links. If the Virtual Machine Snapshot contains no volumes that can be exported, the VMExport will go into a Skipped
phase, and no export server is started.
You can create a VMExport CR that identifies the Persistent Volume Claim (PVC) you want to export. You can create a VMExport that looks like this:
apiVersion: export.kubevirt.io/v1beta1\nkind: VirtualMachineExport\nmetadata:\n name: example-export\nspec:\n tokenSecretRef: example-token\n source:\n apiGroup: \"\"\n kind: PersistentVolumeClaim\n name: example-pvc\n
In this example the PVC name is example-pvc
. Note that the PVC doesn't need to contain a Virtual Machine Disk. It can contain any content, but the main use case is exporting Virtual Machine Disks. After you post this YAML to the cluster, a new export server is created in the same namespace as the PVC. If the source PVC is in use by another pod (such as the virt-launcher pod) then the export will remain pending until the PVC is no longer in use. If the export server is active and another pod starts using the PVC, the export server will be terminated until the PVC is no longer in use.
The VirtualMachineExport CR will contain a status with internal and external links to the export service. The internal links are only valid inside the cluster, and the external links are valid for external access through an Ingress or Route. The cert
field will contain the CA that signed the certificate of the export server for internal links, or the CA that signed the Route or Ingress.
The following is an example of exporting a PVC that contains a KubeVirt disk image. The controller determines whether the PVC contains a KubeVirt disk by checking if there is a special annotation on the PVC, if there is a DataVolume ownerReference on the PVC, or if the PVC has a volumeMode of Block.
apiVersion: export.kubevirt.io/v1beta1\nkind: VirtualMachineExport\nmetadata:\n name: example-export\n namespace: example\nspec:\n source:\n apiGroup: \"\"\n kind: PersistentVolumeClaim\n name: example-pvc\n tokenSecretRef: example-token\nstatus:\n conditions:\n - lastProbeTime: null\n lastTransitionTime: \"2022-06-21T14:10:09Z\"\n reason: podReady\n status: \"True\"\n type: Ready\n - lastProbeTime: null\n lastTransitionTime: \"2022-06-21T14:09:02Z\"\n reason: pvcBound\n status: \"True\"\n type: PVCReady\n links:\n external:\n cert: |-\n -----BEGIN CERTIFICATE-----\n ...\n -----END CERTIFICATE-----\n volumes:\n - formats:\n - format: raw\n url: https://vmexport-proxy.test.net/api/export.kubevirt.io/v1beta1/namespaces/example/virtualmachineexports/example-export/volumes/example-disk/disk.img\n - format: gzip\n url: https://vmexport-proxy.test.net/api/export.kubevirt.io/v1beta1/namespaces/example/virtualmachineexports/example-export/volumes/example-disk/disk.img.gz\n name: example-disk\n internal:\n cert: |-\n -----BEGIN CERTIFICATE-----\n ...\n -----END CERTIFICATE-----\n volumes:\n - formats:\n - format: raw\n url: https://virt-export-example-export.example.svc/volumes/example-disk/disk.img\n - format: gzip\n url: https://virt-export-example-export.example.svc/volumes/example-disk/disk.img.gz\n name: example-disk\n phase: Ready\n serviceName: virt-export-example-export\n
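As a minimal sketch of how one of these links might be consumed, the request below attaches the export token as the x-kubevirt-export-token header described earlier. The function name build_export_request is hypothetical; the URL and token are the example values from this page, and nothing is sent over the network here.

```python
import urllib.request


def build_export_request(url, token):
    '''Prepare (but do not send) a download request for an export link.'''
    request = urllib.request.Request(url)
    # The export server authorizes callers through this header.
    request.add_header('x-kubevirt-export-token', token)
    return request


export_request = build_export_request(
    'https://virt-export-example-export.example.svc/volumes/example-disk/disk.img.gz',
    '1234567890ab',
)
```

Actually sending the request would additionally need a TLS context that trusts the CA from the cert field of the status, for instance one built with ssl.create_default_context(cadata=...) and passed to urllib.request.urlopen.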
"},{"location":"storage/export_api/#archive-content-type","title":"Archive content-type","text":"Archive content-type is automatically selected if we are unable to determine the PVC contains a KubeVirt disk. The archive will contain all the files that are in the PVC.
apiVersion: export.kubevirt.io/v1beta1\nkind: VirtualMachineExport\nmetadata:\n name: example-export\n namespace: example\nspec:\n source:\n apiGroup: \"\"\n kind: PersistentVolumeClaim\n name: example-pvc\n tokenSecretRef: example-token\nstatus:\n conditions:\n - lastProbeTime: null\n lastTransitionTime: \"2022-06-21T14:10:09Z\"\n reason: podReady\n status: \"True\"\n type: Ready\n - lastProbeTime: null\n lastTransitionTime: \"2022-06-21T14:09:02Z\"\n reason: pvcBound\n status: \"True\"\n type: PVCReady\n links:\n external:\n cert: |-\n -----BEGIN CERTIFICATE-----\n ...\n -----END CERTIFICATE-----\n volumes:\n - formats:\n - format: dir\n url: https://vmexport-proxy.test.net/api/export.kubevirt.io/v1beta1/namespaces/example/virtualmachineexports/example-export/volumes/example/dir\n - format: tar.gz\n url: https://vmexport-proxy.test.net/api/export.kubevirt.io/v1beta1/namespaces/example/virtualmachineexports/example-export/volumes/example/disk.tar.gz\n name: example-disk\n internal:\n cert: |-\n -----BEGIN CERTIFICATE-----\n ...\n -----END CERTIFICATE-----\n volumes:\n - formats:\n - format: dir\n url: https://virt-export-example-export.example.svc/volumes/example/dir\n - format: tar.gz\n url: https://virt-export-example-export.example.svc/volumes/example/disk.tar.gz\n name: example-disk\n phase: Ready\n serviceName: virt-export-example-export\n
"},{"location":"storage/export_api/#manifests","title":"Manifests","text":"The VirtualMachine manifests can be retrieved by accessing the manifests
in the VirtualMachineExport status. The all
type will return the VirtualMachine manifest, any DataVolumes, and a configMap that contains the public CA certificate of the Ingress/Route of the external URL, or the CA of the export server of the internal URL. The auth-header-secret
will be a secret that contains a Containerized Data Importer (CDI) compatible header. This header contains a text version of the export token.
Both internal and external links will contain a manifests
field. If there are no external links, then there will not be any external manifests either. The virtualMachine manifests
field is only available if the source is a VirtualMachine
or VirtualMachineSnapshot
. Exporting a PersistentVolumeClaim
will not generate a Virtual Machine manifest.
apiVersion: export.kubevirt.io/v1beta1\nkind: VirtualMachineExport\nmetadata:\n name: example-export\n namespace: example\nspec:\n source:\n apiGroup: \"\"\n kind: PersistentVolumeClaim\n name: example-pvc\n tokenSecretRef: example-token\nstatus:\n conditions:\n - lastProbeTime: null\n lastTransitionTime: \"2022-06-21T14:10:09Z\"\n reason: podReady\n status: \"True\"\n type: Ready\n - lastProbeTime: null\n lastTransitionTime: \"2022-06-21T14:09:02Z\"\n reason: pvcBound\n status: \"True\"\n type: PVCReady\n links:\n external:\n ...\n manifests:\n - type: all\n url: https://vmexport-proxy.test.net/api/export.kubevirt.io/v1beta1/namespaces/example/virtualmachineexports/example-export/external/manifests/all\n - type: auth-header-secret\n url: https://vmexport-proxy.test.net/api/export.kubevirt.io/v1beta1/namespaces/example/virtualmachineexports/example-export/external/manifests/secret\n internal:\n ...\n manifests:\n - type: all\n url: https://virt-export-export-pvc.default.svc/internal/manifests/all\n - type: auth-header-secret\n url: https://virt-export-export-pvc.default.svc/internal/manifests/secret\n phase: Ready\n serviceName: virt-export-example-export\n
"},{"location":"storage/export_api/#format-types","title":"Format types","text":"There are 4 format types that are possible:
Raw and Gzip will be selected if the PVC is determined to be a KubeVirt disk. KubeVirt disks contain a single disk.img file (or are a block device). Dir will return a list of the files in the PVC. To download a specific file, you can replace /dir
in the URL with the path and file name. For instance if the PVC contains the file /example/data.txt
you can replace /dir
with /example/data.txt
to download just the data.txt file. Alternatively, you can use the tar.gz URL to get all the contents of the PVC in a tar file.
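The /dir substitution can be sketched as a small helper. The function name file_url is hypothetical, and the sketch assumes the file path contains no characters that need URL escaping.

```python
def file_url(dir_url, file_path):
    '''Turn a dir-format export URL into the URL of a single file in the PVC.'''
    suffix = '/dir'
    if not dir_url.endswith(suffix):
        raise ValueError('expected a dir-format URL ending in /dir')
    if not file_path.startswith('/'):
        file_path = '/' + file_path
    # Replace the trailing /dir with the path of the file inside the PVC.
    return dir_url[:-len(suffix)] + file_path


data_txt_url = file_url(
    'https://virt-export-example-export.example.svc/volumes/example/dir',
    '/example/data.txt',
)
```

Here data_txt_url ends in /volumes/example/example/data.txt, because only the final /dir segment is replaced.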
The export server certificate is valid for 7 days after which it is rotated by deleting the export server pod and associated secret and generating a new one. If for whatever reason the export server pod dies, the associated secret is also automatically deleted and a new pod and secret are generated. The VirtualMachineExport object status will be automatically updated to reflect the new certificate.
"},{"location":"storage/export_api/#external-link-certificates","title":"External link certificates","text":"The external link certificates are associated with the Ingress/Route that points to the service created by the KubeVirt operator. The CA that signed the Ingress/Route will be part of the certificates.
"},{"location":"storage/export_api/#ttl-time-to-live-for-an-export","title":"TTL (Time to live) for an Export","text":"For various reasons (security being one), users should be able to specify a TTL for the VMExport objects that limits the lifetime of an export. This is done via the ttlDuration
field, which accepts a Kubernetes duration and defaults to 2 hours when not specified.
apiVersion: export.kubevirt.io/v1beta1\nkind: VirtualMachineExport\nmetadata:\n name: example-export\nspec:\n source:\n apiGroup: \"kubevirt.io\"\n kind: VirtualMachine\n name: example-vm\n tokenSecretRef: example-token\n ttlDuration: 1h\n
"},{"location":"storage/export_api/#virtctl-integration-vmexport","title":"virtctl integration: vmexport","text":"The virtctl vmexport
command allows users to interact with the export API in an easy-to-use way.
vmexport
uses two mandatory arguments:
These three functions are:
"},{"location":"storage/export_api/#create","title":"Create","text":"# Creates a VMExport object according to the specified flag.\n\n# The flag should either be:\n\n# --pvc, to specify the name of the pvc to export.\n# --snapshot, to specify the name of the VM snapshot to export.\n# --vm, to specify the name of the Virtual Machine to export.\n\n$ virtctl vmexport create name [flags]\n
"},{"location":"storage/export_api/#delete","title":"Delete","text":"# Deletes the specified VMExport object.\n\n$ virtctl vmexport delete name\n
"},{"location":"storage/export_api/#download","title":"Download","text":"# Downloads a volume from the defined VMExport object.\n\n# The main available flags are:\n\n# --output, mandatory flag to specify the output file.\n# --volume, optional flag to specify the name of the downloadable volume.\n# --vm|--snapshot|--pvc, if specified, are used to create the VMExport object assuming it doesn't exist. The name of the object to export has to be specified.\n# --format, optional flag to specify whether to download the file in compressed (default) or raw format.\n# --port-forward, optional flag to easily download the volume without the need of an ingress or route. Also, the local port can be optionally specified with the --local-port flag.\n\n$ virtctl vmexport download name [flags]\n
By default, the volume will be downloaded in compressed format. Users can specify the desired format (gzip or raw) by using the format
flag, as shown below:
# Downloads a volume from the defined VMExport object and, if necessary, decompresses it.\n$ virtctl vmexport download name --format=raw [flags]\n
"},{"location":"storage/export_api/#ttl-time-to-live","title":"TTL (Time to live)","text":"TTL can also be added when creating a VMExport via virtctl
$ virtctl vmexport create name --ttl=1h\n
For more information about usage and examples:
$ virtctl vmexport --help\n\nExport a VM volume.\n\nUsage:\n virtctl vmexport [flags]\n\nExamples:\n # Create a VirtualMachineExport to export a volume from a virtual machine:\n virtctl vmexport create vm1-export --vm=vm1\n\n # Create a VirtualMachineExport to export a volume from a virtual machine snapshot\n virtctl vmexport create snap1-export --snapshot=snap1\n\n # Create a VirtualMachineExport to export a volume from a PVC\n virtctl vmexport create pvc1-export --pvc=pvc1\n\n # Delete a VirtualMachineExport resource\n virtctl vmexport delete snap1-export\n\n # Download a volume from an already existing VirtualMachineExport (--volume is optional when only one volume is available)\n virtctl vmexport download vm1-export --volume=volume1 --output=disk.img.gz\n\n # Create a VirtualMachineExport and download the requested volume from it\n virtctl vmexport download vm1-export --vm=vm1 --volume=volume1 --output=disk.img.gz\n\nFlags:\n -h, --help help for vmexport\n --insecure When used with the 'download' option, specifies that the http request should be insecure.\n --keep-vme When used with the 'download' option, specifies that the vmexport object should not be deleted after the download finishes.\n --output string Specifies the output path of the volume to be downloaded.\n --pvc string Sets PersistentVolumeClaim as vmexport kind and specifies the PVC name.\n --snapshot string Sets VirtualMachineSnapshot as vmexport kind and specifies the snapshot name.\n --vm string Sets VirtualMachine as vmexport kind and specifies the vm name.\n --volume string Specifies the volume to be downloaded.\n\nUse \"virtctl options\" for a list of global command-line options (applies to all commands).\n
"},{"location":"storage/export_api/#use-cases","title":"Use cases","text":""},{"location":"storage/export_api/#clone-vm-from-one-cluster-to-another-cluster","title":"Clone VM from one cluster to another cluster","text":"If you want to transfer KubeVirt disk images from a source cluster to another target cluster, you can use the VMExport in the source to expose the disks and use Containerized Data Importer (CDI) in the target cluster to import the image into the target cluster. Let's assume we have an Ingress or Route in the source cluster that exposes the export proxy with the following example domain virt-exportproxy-example.example.com
and we have a Virtual Machine in the source cluster with one disk, which looks like this:
apiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n labels:\n kubevirt.io/vm: vm-example-datavolume\n name: example-vm\nspec:\n dataVolumeTemplates:\n - metadata:\n creationTimestamp: null\n name: example-dv\n spec:\n storage:\n accessModes:\n - ReadWriteOnce\n resources:\n requests:\n storage: 20Gi\n storageClassName: local\n source:\n registry:\n url: docker://quay.io/containerdisks/centos-stream:9\n running: false\n template:\n metadata:\n labels:\n kubevirt.io/vm: vm-example-datavolume\n spec:\n domain:\n devices:\n disks:\n - disk:\n bus: virtio\n name: datavolumedisk1\n resources:\n requests:\n memory: 2Gi\n terminationGracePeriodSeconds: 0\n volumes:\n - dataVolume:\n name: example-dv\n name: datavolumedisk1\n
This is a VM that has a DataVolume (DV) example-dv
that is populated from a container disk, and we want to export that disk to the target cluster. To export this VM, we either create a token that the target cluster can use to access the export, or let the export controller generate one for us. For example:
apiVersion: v1\nkind: Secret\nmetadata:\n name: example-token\nstringData:\n token: 1234567890ab\n
The value of the token is 1234567890ab
hardly a secure token, but it is an example. We can now create a VMExport that looks like this: apiVersion: export.kubevirt.io/v1beta1\nkind: VirtualMachineExport\nmetadata:\n name: example-export\nspec:\n tokenSecretRef: example-token #optional, if omitted the export controller will generate a token\n source:\n apiGroup: \"kubevirt.io\"\n kind: VirtualMachine\n name: example-vm\n
If the VM is not running the status of the VMExport object will get updated once the export-server pod is running to look something like this: apiVersion: export.kubevirt.io/v1beta1\nkind: VirtualMachineExport\nmetadata:\n name: example-export\n namespace: example\nspec:\n tokenSecretRef: example-token\n source:\n apiGroup: \"kubevirt.io\"\n kind: VirtualMachine\n name: example-vm\nstatus:\n conditions:\n - lastProbeTime: null\n reason: podReady\n status: \"True\"\n type: Ready\n - lastProbeTime: null\n reason: pvcBound\n status: \"True\"\n type: PVCReady\n links:\n external:\n cert: |-\n -----BEGIN CERTIFICATE-----\n ...\n -----END CERTIFICATE-----\n volumes:\n - formats:\n - format: raw\n url: https://virt-exportproxy-example.example.com/api/export.kubevirt.io/v1beta1/namespaces/example/virtualmachineexports/example-export/volumes/example-dv/disk.img\n - format: gzip\n url: https://virt-exportproxy-example.example.com/api/export.kubevirt.io/v1beta1/namespaces/example/virtualmachineexports/example-export/volumes/example-dv/disk.img.gz\n name: example-disk\n internal:\n cert: |-\n -----BEGIN CERTIFICATE-----\n ...\n -----END CERTIFICATE-----\n volumes:\n - formats:\n - format: raw\n url: https://virt-export-example-export.example.svc/volumes/example-dv/disk.img\n - format: gzip\n url: https://virt-export-example-export.example.svc/volumes/example-dv/disk.img.gz\n name: example-disk\n phase: Ready\n serviceName: virt-export-example-export\n
Note in this example we are in the example
namespace in the source cluster, which is why the internal links domain ends with .example.svc
. The external links are what is visible outside of the source cluster, so we can use them when we import into the target cluster. Now we are ready to import this disk into the target cluster. In order for CDI to import, we need to provide appropriate YAML that contains the following:
- CA cert (as a config map)
- The token needed to access the disk images, in a CDI-compatible format
- The VM YAML
- DataVolume YAML (optional if not part of the VM definition)
virtctl provides an additional argument to the download command called --manifest
that will retrieve the appropriate information from the export server, and either save it to a file with the --output
argument or write to standard out. By default this output will not contain the header secret as it contains the token in plaintext. To get the header secret you specify the --include-secret
argument. The default output format is yaml
but it is possible to get json
output as well.
Assuming there is a running VirtualMachineExport called example-export
and the same namespace exists in the target cluster, and that the kubeconfig of the target cluster is named kubeconfig-target
, clone the VM into the target cluster by running the following commands:
$ virtctl vmexport download example-export --manifest --include-secret --output=import.yaml\n$ kubectl apply -f import.yaml --kubeconfig=kubeconfig-target\n
The first command generates the yaml and writes it to import.yaml
. The second command applies the generated yaml to the target cluster. It is possible to combine the two commands by writing to standard out
with the first command and piping the result into the second command. Use this option if the export token should not be written to a file anywhere. This will create the VM in the target cluster and provide CDI in the target cluster with everything required to import the disk images.
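The piped variant described above could look roughly like the following sketch. The function name `clone_export_to_target` is hypothetical and only used for illustration; the flags are the ones already shown in this section, and `kubectl apply -f -` reads the manifest from standard input.

```shell
# Hypothetical helper (not part of virtctl): stream the export manifest
# straight into the target cluster so the token is never written to a file.
# $1 = name of the VirtualMachineExport, $2 = kubeconfig of the target cluster
clone_export_to_target() {
  virtctl vmexport download "$1" --manifest --include-secret |
    kubectl apply -f - --kubeconfig="$2"
}
```

With the names used in this example, it would be called as `clone_export_to_target example-export kubeconfig-target`.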
After the import completes you should be able to start the VM in the target cluster.
"},{"location":"storage/export_api/#download-a-vm-volume-locally-using-virtctl-vmexport","title":"Download a VM volume locally using virtctl vmexport","text":"Several steps from the previous section can be simplified considerably by using the vmexport
command.
Again, let's assume we have an Ingress or Route in our cluster that exposes the export proxy, and that we have a Virtual Machine in the cluster with one disk like this:
apiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n labels:\n kubevirt.io/vm: vm-example-datavolume\n name: example-vm\nspec:\n dataVolumeTemplates:\n - metadata:\n creationTimestamp: null\n name: example-dv\n spec:\n storage:\n accessModes:\n - ReadWriteOnce\n resources:\n requests:\n storage: 20Gi\n storageClassName: local\n source:\n registry:\n url: docker://quay.io/containerdisks/centos-stream:9\n running: false\n template:\n metadata:\n labels:\n kubevirt.io/vm: vm-example-datavolume\n spec:\n domain:\n devices:\n disks:\n - disk:\n bus: virtio\n name: datavolumedisk1\n resources:\n requests:\n memory: 2Gi\n terminationGracePeriodSeconds: 0\n volumes:\n - dataVolume:\n name: example-dv\n name: datavolumedisk1\n
Once we meet these requirements, the process of downloading the volume locally can be accomplished by different means:
"},{"location":"storage/export_api/#performing-each-step-separately","title":"Performing each step separately","text":"We can download the volume by performing every single step in a different command. We start by creating the export object:
# We use an arbitrary name for the VMExport object, but specify our VM name in the flag.\n\n$ virtctl vmexport create vmexportname --vm=example-vm\n
Then, we download the volume to the specified output path:
# Since our virtual machine only has one volume, there's no need to specify the volume name with the --volume flag.\n\n# After the download, the VMExport object is deleted by default, so we are using the optional --keep-vme flag to delete it manually.\n\n$ virtctl vmexport download vmexportname --output=/tmp/disk.img --keep-vme\n
Lastly, we delete the VMExport object:
$ virtctl vmexport delete vmexportname\n
"},{"location":"storage/export_api/#performing-one-single-step","title":"Performing one single step","text":"All the previous steps can be simplified in one, single command:
# Since we are using a create flag (--vm) with download, the command creates the object assuming the VMExport doesn't exist.\n\n# Also, since we are not using --keep-vme, the VMExport object is deleted after the download.\n\n$ virtctl vmexport download vmexportname --vm=example-vm --output=/tmp/disk.img\n
After the download finishes, we can find our disk in /tmp/disk.img
.
Libguestfs tools are a set of utilities for accessing and modifying VM disk images. The command virtctl guestfs
helps to deploy an interactive container with the libguestfs-tools and the PVC attached to it. This command is particularly useful if the users need to modify, inspect or debug VM disks on a PVC.
$ virtctl guestfs -h\nCreate a pod with libguestfs-tools, mount the pvc and attach a shell to it. The pvc is mounted under the /disks directory inside the pod for filesystem-based pvcs, or as /dev/vda for block-based pvcs\n\nUsage:\n virtctl guestfs [flags]\n\nExamples:\n # Create a pod with libguestfs-tools, mount the pvc and attach a shell to it:\n virtctl guestfs <pvc-name>\n\nFlags:\n -h, --help help for guestfs\n --image string libguestfs-tools container image\n --kvm Use kvm for the libguestfs-tools container (default true)\n --pull-policy string pull policy for the libguestfs image (default \"IfNotPresent\")\n\nUse \"virtctl options\" for a list of global command-line options (applies to all commands).\n
By default virtctl guestfs
sets up kvm
for the interactive container. This considerably speeds up the execution of the libguestfs-tools since they use QEMU. If the cluster doesn't have any kvm supporting nodes, the user must disable kvm by setting the option --kvm=false
. Otherwise, the libguestfs-tools pod will remain pending because it cannot be scheduled on any node.
The command automatically uses the image exposed by KubeVirt under the http endpoint /apis/subresources.kubevirt.io/<kubevirt-version>/guestfs
, but it can be configured to use a custom image by using the option --image
. Users can also override the pull policy of the image by setting the option --pull-policy
.
The command checks if a PVC is used by another pod in which case it will fail. However, once libguestfs-tools has started, the setup doesn't prevent a new pod starting and using the same PVC. The user needs to verify that there are no active virtctl guestfs pods before starting the VM which accesses the same PVC.
Currently, virtctl guestfs
supports only a single PVC. Future versions might support multiple PVCs attached to the interactive pod.
Generally, the user can take advantage of the virtctl guestfs
command for all typical usage of libguestfs-tools. It is strongly recommended to consult the official documentation. This command simply aims to help in configuring the correct containerized environment in the Kubernetes cluster where KubeVirt is installed.
For all the examples, the user has to start the interactive container by referencing the PVC in the virtctl guestfs
command. This will deploy the interactive pod and attach the stdin and stdout.
Example:
$ virtctl guestfs pvc-test\nUse image: registry:5000/kubevirt/libguestfs-tools@sha256:6644792751b2ba9442e06475a809448b37d02d1937dbd15ad8da4d424b5c87dd \nThe PVC has been mounted at /disk \nWaiting for container libguestfs still in pending, reason: ContainerCreating, message: \nWaiting for container libguestfs still in pending, reason: ContainerCreating, message: \nbash-5.0#\n
Once the libguestfs-tools pod has been deployed, the user can access the disk and execute the desired commands. Later, once the user has completed the operations on the disk, simply exit
the container and the pod will be automatically terminated. bash-5.0# virt-cat -a disk.img /etc/os-release \nNAME=Fedora\nVERSION=\"34 (Cloud Edition)\"\nID=fedora\nVERSION_ID=34\nVERSION_CODENAME=\"\"\nPLATFORM_ID=\"platform:f34\"\nPRETTY_NAME=\"Fedora 34 (Cloud Edition)\"\nANSI_COLOR=\"0;38;2;60;110;180\"\nLOGO=fedora-logo-icon\nCPE_NAME=\"cpe:/o:fedoraproject:fedora:34\"\nHOME_URL=\"https://fedoraproject.org/\"\nDOCUMENTATION_URL=\"https://docs.fedoraproject.org/en-US/fedora/34/system-administrators-guide/\"\nSUPPORT_URL=\"https://fedoraproject.org/wiki/Communicating_and_getting_help\"\nBUG_REPORT_URL=\"https://bugzilla.redhat.com/\"\nREDHAT_BUGZILLA_PRODUCT=\"Fedora\"\nREDHAT_BUGZILLA_PRODUCT_VERSION=34\nREDHAT_SUPPORT_PRODUCT=\"Fedora\"\nREDHAT_SUPPORT_PRODUCT_VERSION=34\nPRIVACY_POLICY_URL=\"https://fedoraproject.org/wiki/Legal:PrivacyPolicy\"\nVARIANT=\"Cloud Edition\"\nVARIANT_ID=cloud\n
bash-5.0# virt-customize -a disk.img --run-command 'useradd -m test-user -s /bin/bash' --password 'test-user:password:test-password'\n[ 0.0] Examining the guest ...\n[ 4.1] Setting a random seed\n[ 4.2] Setting the machine ID in /etc/machine-id\n[ 4.2] Running: useradd -m test-user -s /bin/bash\n[ 4.3] Setting passwords\n[ 5.3] Finishing off\n
Run virt-rescue and repair a broken partition or initrd (for example by running dracut)
bash-5.0# virt-rescue -a disk.img\n[...]\nThe virt-rescue escape key is \u2018^]\u2019. Type \u2018^] h\u2019 for help.\n\n------------------------------------------------------------\n\nWelcome to virt-rescue, the libguestfs rescue shell.\n\nNote: The contents of / (root) are the rescue appliance.\nYou have to mount the guest\u2019s partitions under /sysroot\nbefore you can examine them.\n><rescue> fdisk -l\nDisk /dev/sda: 6 GiB, 6442450944 bytes, 12582912 sectors\nDisk model: QEMU HARDDISK \nUnits: sectors of 1 * 512 = 512 bytes\nSector size (logical/physical): 512 bytes / 512 bytes\nI/O size (minimum/optimal): 512 bytes / 512 bytes\nDisklabel type: gpt\nDisk identifier: F8DC0844-9194-4B34-B432-13FA4B70F278\n\nDevice Start End Sectors Size Type\n/dev/sda1 2048 4095 2048 1M BIOS boot\n/dev/sda2 4096 2101247 2097152 1G Linux filesystem\n/dev/sda3 2101248 12580863 10479616 5G Linux filesystem\n\n\nDisk /dev/sdb: 4 GiB, 4294967296 bytes, 8388608 sectors\nDisk model: QEMU HARDDISK \nUnits: sectors of 1 * 512 = 512 bytes\nSector size (logical/physical): 512 bytes / 512 bytes\nI/O size (minimum/optimal): 512 bytes / 512 bytes\n><rescue> mount /dev/sda3 sysroot/\n><rescue> mount /dev/sda2 sysroot/boot\n><rescue> chroot sysroot/\n><rescue> ls boot/\nSystem.map-5.11.12-300.fc34.x86_64\nconfig-5.11.12-300.fc34.x86_64\nefi\ngrub2\ninitramfs-0-rescue-8afb5b540fab48728e48e4196a3a48ee.img\ninitramfs-5.11.12-300.fc34.x86_64.img\nloader\nvmlinuz-0-rescue-8afb5b540fab48728e48e4196a3a48ee\n><rescue> dracut -f boot/initramfs-5.11.12-300.fc34.x86_64.img 5.11.12-300.fc34.x86_64\n[...]\n><rescue> exit # <- exit from chroot\n><rescue> umount sysroot/boot\n><rescue> umount sysroot\n><rescue> exit\n
Install an OS from scratch
bash-5.0# virt-builder centos-8.2 -o disk.img --root-password password:password-test\n[ 1.5] Downloading: http://builder.libguestfs.org/centos-8.2.xz\n######################################################################## 100.0%#=#=# ######################################################################## 100.0%\n[ 58.3] Planning how to build this image\n[ 58.3] Uncompressing\n[ 65.7] Opening the new disk\n[ 70.8] Setting a random seed\n[ 70.8] Setting passwords\n[ 72.0] Finishing off\n Output file: disk.img\n Output size: 6.0G\n Output format: raw\n Total usable space: 5.3G\n Free space: 4.0G (74%)\n
bash-5.0# virt-filesystems -a disk.img --partitions --filesystem --long\nName Type VFS Label MBR Size Parent\n/dev/sda2 filesystem ext4 - - 1023303680 -\n/dev/sda4 filesystem xfs - - 4710203392 -\n/dev/sda1 partition - - - 1048576 /dev/sda\n/dev/sda2 partition - - - 1073741824 /dev/sda\n/dev/sda3 partition - - - 644874240 /dev/sda\n/dev/sda4 partition - - - 4720689152 /dev/sda\n
Currently, it is not possible to resize the xfs filesystem.
"},{"location":"storage/hotplug_volumes/","title":"Hotplug Volumes","text":"KubeVirt now supports hotplugging volumes into a running Virtual Machine Instance (VMI). The volume must be either a block volume or contain a disk image. When a VM that has hotplugged volumes is rebooted, the hotplugged volumes will be attached to the restarted VM. If the volumes are persisted they will become part of the VM spec, and will not be considered hotplugged. If they are not persisted, the volumes will be reattached as hotplugged volumes
"},{"location":"storage/hotplug_volumes/#enabling-hotplug-volume-support","title":"Enabling hotplug volume support","text":"Hotplug volume support must be enabled in the feature gates to be supported. The feature gates field in the KubeVirt CR must be expanded by adding the HotplugVolumes
to it.
In order to hotplug a volume, you must first prepare a volume. This can be done by using a DataVolume (DV). In this example we will use a blank DV in order to add some extra storage to a running VMI:
apiVersion: cdi.kubevirt.io/v1beta1\nkind: DataVolume\nmetadata:\n name: example-volume-hotplug\nspec:\n source:\n blank: {}\n storage:\n resources:\n requests:\n storage: 5Gi\n
In this example we are using ReadWriteOnce
accessMode, and the default FileSystem volume mode. Volume hotplugging supports all combinations of block volume mode and ReadWriteMany
/ReadWriteOnce
/ReadOnlyMany
accessModes, if your storage supports the combination."},{"location":"storage/hotplug_volumes/#addvolume","title":"Addvolume","text":"Now let's assume we have started a VMI like the Fedora VMI in examples and the name of the VMI is 'vmi-fedora'. We can add the above blank volume to this running VMI by using the 'addvolume' command available with virtctl:
$ virtctl addvolume vmi-fedora --volume-name=example-volume-hotplug\n
This will hotplug the volume into the running VMI, and set the serial of the disk to the volume name. In this example it is set to example-volume-hotplug.
"},{"location":"storage/hotplug_volumes/#why-virtio-scsi","title":"Why virtio-scsi","text":"The bus of hotplug disk is specified as a scsi
disk. Why is it not specified as virtio
instead, like regular disks? The reason is a limitation of virtio
disks: each disk uses a PCIe slot in the virtual machine, and there is a maximum of 32 slots. This means there is a low limit on the number of disks you can hotplug, especially given that other devices also need PCIe slots. Another issue is that these slots need to be reserved ahead of time, so if the number of hotplugged disks is not known in advance, it is impossible to reserve the required number of slots. To work around this issue, each VM has a virtio-scsi controller, which allows the use of a scsi
bus for hotplugged disks. This controller allows for hotplugging of over 4 million disks. virtio-scsi
is very close in performance to virtio.
You can change the serial of the disk by specifying the --serial parameter, for example:
$ virtctl addvolume vmi-fedora --volume-name=example-volume-hotplug --serial=1234567890\n
The serial will be used in the guest so you can identify the disk inside the guest by the serial. For instance in Fedora the disk by id will contain the serial.
$ virtctl console vmi-fedora\n\nFedora 32 (Cloud Edition)\nKernel 5.6.6-300.fc32.x86_64 on an x86_64 (ttyS0)\n\nSSH host key: SHA256:c8ik1A9F4E7AxVrd6eE3vMNOcMcp6qBxsf8K30oC/C8 (ECDSA)\nSSH host key: SHA256:fOAKptNAH2NWGo2XhkaEtFHvOMfypv2t6KIPANev090 (ED25519)\neth0: 10.244.196.144 fe80::d8b7:51ff:fec4:7099\nvmi-fedora login:fedora\nPassword:fedora\n[fedora@vmi-fedora ~]$ ls /dev/disk/by-id\nscsi-0QEMU_QEMU_HARDDISK_1234567890\n[fedora@vmi-fedora ~]$ \n
As you can see the serial is part of the disk name, so you can uniquely identify it. The format and length of serials are specified according to the libvirt documentation:
If present, this specify serial number of virtual hard drive. For example, it may look like <serial>WD-WMAP9A966149</serial>. Not supported for scsi-block devices, that is those using disk type 'block' using device 'lun' on bus 'scsi'. Since 0.7.1\n\n Note that depending on hypervisor and device type the serial number may be truncated silently. IDE/SATA devices are commonly limited to 20 characters. SCSI devices depending on hypervisor version are limited to 20, 36 or 247 characters.\n\n Hypervisors may also start rejecting overly long serials instead of truncating them in the future so it's advised to avoid the implicit truncation by testing the desired serial length range with the desired device and hypervisor combination.\n
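Inside the guest, looking up the disk by its serial can be scripted. The following is a minimal sketch: `find_disk_by_serial` is a hypothetical helper, and it assumes the guest exposes disks under `/dev/disk/by-id` with the serial embedded in the entry name, as in the Fedora output above. Given the truncation caveat in the libvirt note, a long serial may only match partially.

```shell
# Hypothetical helper: print the by-id entries whose name contains the serial.
# $1 = serial passed to --serial
# $2 = directory to search (optional, defaults to /dev/disk/by-id)
find_disk_by_serial() {
  serial=$1
  dir=${2:-/dev/disk/by-id}
  for entry in "$dir"/*; do
    # Compare against the file name only, not the full path.
    case "${entry##*/}" in
      *"$serial"*) printf '%s\n' "$entry" ;;
    esac
  done
}
```

For the example above, this would be run inside the guest as `find_disk_by_serial 1234567890`.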
"},{"location":"storage/hotplug_volumes/#supported-disk-types","title":"Supported Disk types","text":"Kubevirt supports hotplugging disk devices of type disk and lun. As with other volumes, using type disk
will expose the hotplugged volume as a regular disk, while using lun
allows additional functionalities like the execution of iSCSI commands.
You can specify the desired type by using the --disk-type parameter, for example:
# Allowed values are lun and disk. If no option is specified, we use disk by default.\n$ virtctl addvolume vmi-fedora --volume-name=example-lun-hotplug --disk-type=lun\n
"},{"location":"storage/hotplug_volumes/#retain-hotplugged-volumes-after-restart","title":"Retain hotplugged volumes after restart","text":"In many cases it is desirable to keep hotplugged volumes after a VM restart. It may also be desirable to be able to unplug these volumes after the restart. The persist
option makes it impossible to unplug the disks after a restart. If you don't specify persist
the default behaviour is to retain hotplugged volumes as hotplugged volumes after a VM restart. This makes the persist
flag mostly obsolete unless you want to make a volume permanent on restart.
In some cases you want a hotplugged volume to become part of the standard disks after a restart of the VM, for instance if you added some permanent storage to the VM. We also assume that the running VMI has a matching VM that defines its specification. You can call the addvolume command with the --persist flag. This will update the VM domain disks section in addition to updating the VMI domain disks. This means that when you restart the VM, the disk is already defined in the VM, and thus in the new VMI.
$ virtctl addvolume vm-fedora --volume-name=example-volume-hotplug --persist\n
In the VM spec this will now show as a new disk:
spec:\ndomain:\n devices:\n disks:\n - disk:\n bus: virtio\n name: containerdisk\n - disk:\n bus: virtio\n name: cloudinitdisk\n - disk:\n bus: scsi\n name: example-volume-hotplug\n machine:\n type: \"\"\n
"},{"location":"storage/hotplug_volumes/#removevolume","title":"Removevolume","text":"In addition to hotplug plugging the volume, you can also unplug it by using the 'removevolume' command available with virtctl
$ virtctl removevolume vmi-fedora --volume-name=example-volume-hotplug\n
NOTE You can only unplug volumes that were dynamically added with addvolume, or using the API.
"},{"location":"storage/hotplug_volumes/#volumestatus","title":"VolumeStatus","text":"VMI objects have a new status.VolumeStatus
field. This is an array containing each disk, hotplugged or not. For example, after hotplugging the volume in the addvolume example, the VMI status will contain this:
volumeStatus:\n- name: cloudinitdisk\n target: vdb\n- name: containerdisk\n target: vda\n- hotplugVolume:\n attachPodName: hp-volume-7fmz4\n attachPodUID: 62a7f6bf-474c-4e25-8db5-1db9725f0ed2\n message: Successfully attach hotplugged volume volume-hotplug to VM\n name: example-volume-hotplug\n phase: Ready\n reason: VolumeReady\n target: sda\n
Vda is the container disk that contains the Fedora OS, vdb is the cloudinit disk. As you can see those just contain the name and target used when assigning them to the VM. The target is the value passed to QEMU when specifying the disks. The value is unique for the VM and does NOT represent the naming inside the guest. For instance for a Windows Guest OS the target has no meaning. The same will be true for hotplugged volumes. The target is just a unique identifier meant for QEMU, inside the guest the disk can be assigned a different name. The hotplugVolume has some extra information that regular volume statuses do not have. The attachPodName is the name of the pod that was used to attach the volume to the node the VMI is running on. If this pod is deleted it will also stop the VMI as we cannot guarantee the volume will remain attached to the node. The other fields are similar to conditions and indicate the status of the hot plug process. Once a Volume is ready it can be used by the VM.
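To read these fields without scanning the whole VMI object, a JSONPath query can be used. This is a sketch only: `list_volume_targets` is a hypothetical helper name, and the field paths are taken from the status excerpt above.

```shell
# Hypothetical helper: print "<name><TAB><target>" for every entry in
# status.volumeStatus of the given VMI.
# $1 = name of the VirtualMachineInstance
list_volume_targets() {
  kubectl get vmi "$1" \
    -o jsonpath='{range .status.volumeStatus[*]}{.name}{"\t"}{.target}{"\n"}{end}'
}
```

For the VMI in this example, it would be called as `list_volume_targets vmi-fedora`.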
"},{"location":"storage/hotplug_volumes/#live-migration","title":"Live Migration","text":"Currently Live Migration is enabled for any VMI that has volumes hotplugged into it.
NOTE However there is a known issue that the migration may fail for VMIs with hotplugged block volumes if the target node uses CPU manager with static policy and runc
prior to version v1.0.0
.
The snapshot.kubevirt.io
API Group defines resources for snapshotting and restoring KubeVirt VirtualMachines
KubeVirt leverages the VolumeSnapshot
functionality of Kubernetes CSI drivers for capturing persistent VirtualMachine
state. So, you should make sure that your VirtualMachine
uses DataVolumes
or PersistentVolumeClaims
backed by a StorageClass
that supports VolumeSnapshots
and a VolumeSnapshotClass
is properly configured for that StorageClass
.
KubeVirt looks for Kubernetes Volume Snapshot related APIs/resources in the v1
version. To make sure that KubeVirt's snapshot controller is able to snapshot the VirtualMachine and referenced volumes as expected, Kubernetes Volume Snapshot APIs must be served from v1
version.
To list VolumeSnapshotClasses
:
kubectl get volumesnapshotclass\n
Make sure the provisioner
property of your StorageClass
matches the driver
property of the VolumeSnapshotClass
Even if you have no VolumeSnapshotClasses
in your cluster, VirtualMachineSnapshots
are not totally useless. They will still back up your VirtualMachine
configuration.
Snapshot/Restore support must be enabled in the feature gates to be supported. The feature gates field in the KubeVirt CR must be expanded by adding the Snapshot
to it.
Snapshotting a VirtualMachine is supported for both online and offline VMs.
When snapshotting a running VM, the controller checks for the qemu guest agent in the VM. If the agent exists, it freezes the VM filesystems before taking the snapshot and unfreezes them afterwards. It is recommended to take online snapshots with the guest agent for a more consistent snapshot; if the agent is not present, a best-effort snapshot is taken.
Note To check if your vm has a qemu-guest-agent check for 'AgentConnected' in the vm status.
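One way to script this check is sketched below. It assumes the condition is reported on the VirtualMachineInstance under `status.conditions` with type `AgentConnected`; the function name `agent_connected` is hypothetical.

```shell
# Hypothetical helper: succeeds only if the VMI reports the AgentConnected
# condition with status "True".
# $1 = name of the VirtualMachineInstance
agent_connected() {
  status=$(kubectl get vmi "$1" \
    -o jsonpath='{.status.conditions[?(@.type=="AgentConnected")].status}')
  [ "$status" = "True" ]
}
```

It can be used as a guard before taking an online snapshot, for example `agent_connected larry && kubectl apply -f snapshot.yaml`, where `snapshot.yaml` is an assumed file name.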
There will be an indication in the vmSnapshot status if the snapshot was taken online and with or without guest agent participation.
Note Online snapshots of VMs with hotplugged disks are supported, but only persistent hotplugged disks will be included in the snapshot.
To snapshot a VirtualMachine
named larry
, apply the following yaml.
apiVersion: snapshot.kubevirt.io/v1beta1\nkind: VirtualMachineSnapshot\nmetadata:\n name: snap-larry\nspec:\n source:\n apiGroup: kubevirt.io\n kind: VirtualMachine\n name: larry\n
To wait for a snapshot to complete, execute:
kubectl wait vmsnapshot snap-larry --for condition=Ready\n
You can check the vmSnapshot phase in the vmSnapshot status. It can be one of the following:
The vmSnapshot has a default deadline of 5 minutes. If the vmSnapshot has not successfully completed before the deadline, it will be marked as Failed. The VM will be unfrozen and the created snapshot content will be cleaned up if necessary. The vmSnapshot object will remain in the Failed state until deleted by the user. To change the default deadline, add 'failureDeadline' to the VirtualMachineSnapshot spec with a new value. The allowed format is a duration string, which is a possibly signed sequence of decimal numbers, each with an optional fraction and a unit suffix, such as \"300ms\", \"-1.5h\" or \"2h45m\":
apiVersion: snapshot.kubevirt.io/v1beta1\nkind: VirtualMachineSnapshot\nmetadata:\n name: snap-larry\nspec:\n source:\n apiGroup: kubevirt.io\n kind: VirtualMachine\n name: larry\n failureDeadline: 1m\n
In order to set an infinite deadline you can set it to 0 (not recommended).
"},{"location":"storage/snapshot_restore_api/#restoring-a-virtualmachine","title":"Restoring a VirtualMachine","text":"To restore the VirtualMachine
larry
from VirtualMachineSnapshot
snap-larry
, stop the VM, wait for it to be stopped, and then apply the following YAML.
apiVersion: snapshot.kubevirt.io/v1beta1\nkind: VirtualMachineRestore\nmetadata:\n name: restore-larry\nspec:\n target:\n apiGroup: kubevirt.io\n kind: VirtualMachine\n name: larry\n virtualMachineSnapshotName: snap-larry\n
To wait for a restore to complete, execute:
kubectl wait vmrestore restore-larry --for condition=Ready\n
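The whole restore sequence (stop, wait, apply, wait) could be scripted roughly as follows. This is a sketch under several assumptions: `restore_vm` is a hypothetical helper, the VirtualMachineRestore manifest is assumed to be saved in a local file, waiting on `status.printableStatus` requires a kubectl version that supports `--for=jsonpath`, and the 5 minute timeout is an arbitrary choice.

```shell
# Hypothetical helper: stop the VM, wait until it is stopped, apply the
# restore manifest, and wait for the restore to become Ready.
# $1 = VM name, $2 = path to the VirtualMachineRestore manifest (assumed file)
# $3 = name of the VirtualMachineRestore object
restore_vm() {
  virtctl stop "$1" &&
    kubectl wait vm "$1" --for=jsonpath='{.status.printableStatus}'=Stopped --timeout=5m &&
    kubectl apply -f "$2" &&
    kubectl wait vmrestore "$3" --for condition=Ready
}
```

For this example, it would be called as `restore_vm larry restore-larry.yaml restore-larry`.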
"},{"location":"storage/snapshot_restore_api/#cleanup","title":"Cleanup","text":"Keep VirtualMachineSnapshots
(and their corresponding VirtualMachineSnapshotContents
) around as long as you may want to restore from them again.
Feel free to delete restore-larry
as it is not needed once the restore is complete.
Once a virtual machine is started you are able to connect to the consoles it exposes. Usually there are two types of consoles:
Note: You need to have virtctl
installed to gain access to the VirtualMachineInstance.
The serial console of a virtual machine can be accessed by using the console
command:
virtctl console testvm\n
"},{"location":"user_workloads/accessing_virtual_machines/#accessing-the-graphical-console-vnc","title":"Accessing the Graphical Console (VNC)","text":"To access the graphical console of a virtual machine the VNC protocol is typically used. This requires remote-viewer
to be installed. Once the tool is installed, you can access the graphical console using:
virtctl vnc testvm\n
If you only want to open a vnc-proxy without executing the remote-viewer
command, it can be accomplished with:
virtctl vnc --proxy-only testvm\n
This would print the port number on your machine where you can manually connect using any VNC viewer.
"},{"location":"user_workloads/accessing_virtual_machines/#debugging-console-access","title":"Debugging console access","text":"If the connection fails, you can use the -v
flag to get more verbose output from both virtctl
and the remote-viewer
tool to troubleshoot the problem.
virtctl vnc testvm -v 4\n
Note: If you are using virtctl via SSH on a remote machine, you need to forward the X session to your machine. Look up the -X and -Y flags of ssh
if you are not familiar with that. As an alternative you can proxy the API server port with SSH to your machine (either direct or in combination with kubectl proxy
).
A common operational pattern used when managing virtual machines is to inject SSH public keys into the virtual machines at boot. This allows automation tools (like Ansible) to provision the virtual machine. It also gives operators a way of gaining secure and passwordless access to a virtual machine.
KubeVirt provides multiple ways to inject SSH public keys into a virtual machine.
In general, these methods fall into two categories:
- Static key injection, which places keys on the virtual machine the first time it is booted.
- Dynamic key injection, which allows keys to be dynamically updated both at boot and during runtime.
Once an SSH public key is injected into the virtual machine, the virtual machine can be accessed via virtctl
.
Users creating virtual machines can provide startup scripts to their virtual machines, allowing multiple customization operations.
One option for injecting public SSH keys into a VM is via a cloud-init startup script. However, there are more flexible options available.
The virtual machine's access credential API allows statically injecting SSH public keys at startup time independently of the cloud-init user data by placing the SSH public key into a Kubernetes Secret
. This allows keeping the application data in the cloud-init user data separate from the credentials used to access the virtual machine.
A Kubernetes Secret
can be created from an SSH public key like this:
# Place SSH public key into a Secret\nkubectl create secret generic my-pub-key --from-file=key1=id_rsa.pub\n
The Secret
containing the public key is then assigned to a virtual machine using the access credentials API with the noCloud
propagation method.
KubeVirt injects the SSH public key into the virtual machine by using the generated cloud-init metadata instead of the user data. This separates the application user data and user credentials.
Note: The cloud-init userData
is not touched.
# Create a VM referencing the Secret using propagation method noCloud\nkubectl create -f - <<EOF\napiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n name: testvm\nspec:\n running: true\n template:\n spec:\n domain:\n devices:\n disks:\n - disk:\n bus: virtio\n name: containerdisk\n - disk:\n bus: virtio\n name: cloudinitdisk\n rng: {}\n resources:\n requests:\n memory: 1024M\n terminationGracePeriodSeconds: 0\n accessCredentials:\n - sshPublicKey:\n source:\n secret:\n secretName: my-pub-key\n propagationMethod:\n noCloud: {}\n volumes:\n - containerDisk:\n image: quay.io/containerdisks/fedora:latest\n name: containerdisk\n - cloudInitNoCloud:\n userData: |-\n #cloud-config\n password: fedora\n chpasswd: { expire: False }\n name: cloudinitdisk\nEOF\n
"},{"location":"user_workloads/accessing_virtual_machines/#dynamic-ssh-public-key-injection-via-qemu-guest-agent","title":"Dynamic SSH public key injection via qemu-guest-agent","text":"KubeVirt allows the dynamic injection of SSH public keys into a VirtualMachine with the access credentials API.
Utilizing the qemuGuestAgent
propagation method, configured Secrets are attached to a VirtualMachine when the VM is started. This allows for dynamic injection of SSH public keys at runtime by updating the attached Secrets.
Please note that new Secrets cannot be attached to a running VM: You must restart the VM to attach the new Secret.
Note: This requires the qemu-guest-agent to be installed within the guest.
Note: When using qemuGuestAgent propagation, the /home/$USER/.ssh/authorized_keys
file will be owned by the guest agent. Changes to the file not made by the guest agent will be lost.
Note: More information about the motivation behind the access credentials API can be found in the pull request description that introduced the API.
In the example below the Secret
containing the SSH public key is attached to the virtual machine via the access credentials API with the qemuGuestAgent
propagation method. This allows updating the contents of the Secret
at any time, which will result in the changes getting applied to the running virtual machine immediately. The Secret
may also contain multiple SSH public keys.
# Place SSH public key into a secret\nkubectl create secret generic my-pub-key --from-file=key1=id_rsa.pub\n
Now reference this secret in the VirtualMachine
spec with the access credentials API using qemuGuestAgent
propagation.
# Create a VM referencing the Secret using propagation method qemuGuestAgent\nkubectl create -f - <<EOF\napiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n name: testvm\nspec:\n running: true\n template:\n spec:\n domain:\n devices:\n disks:\n - disk:\n bus: virtio\n name: containerdisk\n - disk:\n bus: virtio\n name: cloudinitdisk\n rng: {}\n resources:\n requests:\n memory: 1024M\n terminationGracePeriodSeconds: 0\n accessCredentials:\n - sshPublicKey:\n source:\n secret:\n secretName: my-pub-key\n propagationMethod:\n qemuGuestAgent:\n users:\n - fedora\n volumes:\n - containerDisk:\n image: quay.io/containerdisks/fedora:latest\n name: containerdisk\n - cloudInitNoCloud:\n userData: |-\n #cloud-config\n password: fedora\n chpasswd: { expire: False }\n # Disable SELinux for now, so qemu-guest-agent can write the authorized_keys file\n # The selinux-policy is too restrictive currently, see open bugs:\n # - https://bugzilla.redhat.com/show_bug.cgi?id=1917024\n # - https://bugzilla.redhat.com/show_bug.cgi?id=2028762\n # - https://bugzilla.redhat.com/show_bug.cgi?id=2057310\n bootcmd:\n - setenforce 0\n name: cloudinitdisk\nEOF\n
"},{"location":"user_workloads/accessing_virtual_machines/#accessing-the-vmi-using-virtctl","title":"Accessing the VMI using virtctl","text":"The user can create a websocket backed network tunnel to a port inside the instance by using the virtualmachineinstances/portforward
subresource of the VirtualMachineInstance
.
One use-case for this subresource is to forward SSH traffic into the VirtualMachineInstance
either from the CLI or a web-UI.
To connect to a VirtualMachineInstance
from your local machine, virtctl
provides a lightweight SSH client with the ssh
command that uses port forwarding. Refer to the command's help for more details.
virtctl ssh\n
To transfer files from or to a VirtualMachineInstance
virtctl
also provides a lightweight SCP client with the scp
command. Its usage is similar to the ssh
command. Refer to the command's help for more details.
virtctl scp\n
"},{"location":"user_workloads/accessing_virtual_machines/#using-virtctl-as-proxy","title":"Using virtctl as proxy","text":"If you prefer to use your local OpenSSH client, there are two ways of doing that in combination with virtctl.
Note: Most of this applies to the virtctl scp
command too.
virtctl ssh
command has a --local-ssh
option. With this option virtctl
wraps the local OpenSSH client transparently to the user. The executed SSH command can be viewed by increasing the verbosity (-v 3
).
virtctl ssh --local-ssh -v 3 testvm\n
virtctl port-forward
command provides an option to tunnel a single port to your local stdout/stdin. This allows the command to be used in combination with the OpenSSH client's ProxyCommand
option.
ssh -o 'ProxyCommand=virtctl port-forward --stdio=true vmi/testvm.mynamespace 22' fedora@testvm.mynamespace\n
To provide easier access to arbitrary virtual machines you can add the following lines to your SSH config
:
Host vmi/*\n ProxyCommand virtctl port-forward --stdio=true %h %p\nHost vm/*\n ProxyCommand virtctl port-forward --stdio=true %h %p\n
This allows you to simply call ssh user@vmi/testvmi.mynamespace
and your SSH config and virtctl will do the rest. Using this method it becomes easy to set up different identities for different namespaces inside your SSH config
.
This feature can also be used with Ansible to automate configuration of virtual machines running on KubeVirt. You can put the snippet above into its own file (e.g. ~/.ssh/virtctl-proxy-config
) and add the following lines to your .ansible.cfg
:
[ssh_connection]\nssh_args = -F ~/.ssh/virtctl-proxy-config\n
Note that all port forwarding traffic is sent over the Kubernetes control plane. A large number of connections and a high volume of traffic can increase pressure on the API server. If you regularly need many connections or a lot of traffic, consider using a dedicated Kubernetes Service
instead.
Create virtual machine and inject SSH public key as explained above
SSH into virtual machine
# Add --local-ssh to transparently use local OpenSSH client\nvirtctl ssh -i id_rsa fedora@testvm\n
or
ssh -o 'ProxyCommand=virtctl port-forward --stdio=true vmi/testvm.mynamespace 22' -i id_rsa fedora@vmi/testvm.mynamespace\n
# Add --local-ssh to transparently use local OpenSSH client\nvirtctl scp -i id_rsa testfile fedora@testvm:/tmp\n
or
scp -o 'ProxyCommand=virtctl port-forward --stdio=true vmi/testvm.mynamespace 22' -i id_rsa testfile fedora@testvm.mynamespace:/tmp\n
"},{"location":"user_workloads/accessing_virtual_machines/#rbac-permissions-for-consolevncssh-access","title":"RBAC permissions for Console/VNC/SSH access","text":""},{"location":"user_workloads/accessing_virtual_machines/#using-default-rbac-cluster-roles","title":"Using default RBAC cluster roles","text":"Every KubeVirt installation starting with version v0.5.1 ships a set of default RBAC cluster roles that can be used to grant users access to VirtualMachineInstances.
The kubevirt.io:admin
and kubevirt.io:edit
cluster roles have console, VNC and SSH (port-forwarding) access permissions built into them. Binding either of these roles to a user gives that user the ability to use virtctl to access the console, VNC and SSH.
The default KubeVirt cluster roles grant access to more than just the console, VNC and port-forwarding. The ClusterRole
below demonstrates how to craft a custom role, that only allows access to the console, VNC and port-forwarding.
apiVersion: rbac.authorization.k8s.io/v1\nkind: ClusterRole\nmetadata:\n name: allow-console-vnc-port-forward-access\nrules:\n - apiGroups:\n - subresources.kubevirt.io\n resources:\n - virtualmachineinstances/console\n - virtualmachineinstances/vnc\n verbs:\n - get\n - apiGroups:\n - subresources.kubevirt.io\n resources:\n - virtualmachineinstances/portforward\n verbs:\n - update\n
When bound with a ClusterRoleBinding
the ClusterRole
above grants access to virtual machines across all namespaces.
In order to reduce the scope to a single namespace, bind this ClusterRole
using a RoleBinding
that targets a single namespace.
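As a sketch of the namespace-scoped variant described above, the following writes a RoleBinding manifest that binds the custom ClusterRole to one user in one namespace. The user name `jane`, the namespace `my-namespace`, the binding name and the file name are illustrative placeholders, not values defined by KubeVirt.

```shell
# Write a RoleBinding that grants the ClusterRole above inside one namespace only.
# "jane", "my-namespace" and the file name are illustrative placeholders.
cat > console-vnc-rolebinding.yaml <<'EOF'
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: allow-console-vnc-port-forward-access
  namespace: my-namespace
subjects:
  - kind: User
    name: jane
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: allow-console-vnc-port-forward-access
  apiGroup: rbac.authorization.k8s.io
EOF
```

Applying the file with `kubectl apply -f console-vnc-rolebinding.yaml` should then limit the granted access to VirtualMachineInstances in `my-namespace`.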
Using KubeVirt should be fairly natural if you are used to working with Kubernetes.
The primary way of using KubeVirt is by working with the KubeVirt kinds in the Kubernetes API:
$ kubectl create -f vmi.yaml\n$ kubectl wait --for=condition=Ready vmis/my-vmi\n$ kubectl get vmis\n$ kubectl delete vmis testvmi\n
The following pages describe how to use and discover the API, manage, and access virtual machines.
"},{"location":"user_workloads/basic_use/#user-interface","title":"User Interface","text":"KubeVirt does not come with a UI; it only extends the Kubernetes API with virtualization functionality.
"},{"location":"user_workloads/boot_from_external_source/","title":"Booting From External Source","text":"When installing a new guest virtual machine OS, it is often useful to boot directly from a kernel and initrd stored in the host physical machine OS, allowing command line arguments to be passed directly to the installer.
Booting from an external source is supported in KubeVirt starting from version v0.42.0-rc.0. This makes it possible to define a Virtual Machine that uses a custom kernel / initrd binary, with optional custom arguments, during its boot process.
The binaries are provided through a container image. The container is pulled from the container registry and resides on the local node hosting the VMs.
"},{"location":"user_workloads/boot_from_external_source/#use-cases","title":"Use cases","text":"Some use cases for this may be: - For a kernel developer it may be very convenient to launch VMs that are defined to boot from the latest kernel binary, which changes often. - Initrd can be set with files that need to reside in memory during the VM's entire life cycle.
"},{"location":"user_workloads/boot_from_external_source/#workflow","title":"Workflow","text":"Defining an external boot source can be done in the following way:
apiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n name: ext-kernel-boot-vm\nspec:\n runStrategy: Manual\n template:\n spec:\n domain:\n devices: {}\n firmware:\n kernelBoot:\n container:\n image: vmi_ext_boot/kernel_initrd_binaries_container:latest\n initrdPath: /boot/initramfs-virt\n kernelPath: /boot/vmlinuz-virt\n imagePullPolicy: Always\n imagePullSecret: IfNotPresent\n kernelArgs: console=ttyS0\n resources:\n requests:\n memory: 1Gi\n
Notes:
initrdPath
and kernelPath
define the path for the binaries inside the container.
Kernel and Initrd binaries must be owned by qemu
user & group.
To change ownership: chown qemu:qemu <binary>
where <binary>
is the binary file.
kernelArgs
can only be provided if a kernel binary is provided (i.e. kernelPath
must be defined). These arguments are passed to the kernel the VM boots from.
imagePullSecret
and imagePullPolicy
are optional
If imagePullPolicy
is Always
and the container image is updated, the VM boots into the new kernel the next time it restarts.
All KubeVirt system-components expose Prometheus metrics at their /metrics
REST endpoint.
You can consult the complete and up-to-date metric list at kubevirt/monitoring.
"},{"location":"user_workloads/component_monitoring/#custom-service-discovery","title":"Custom Service Discovery","text":"Prometheus supports service discovery based on Pods and Endpoints out of the box. Both can be used to discover KubeVirt services.
All Pods which expose metrics are labeled with prometheus.kubevirt.io
and contain a port-definition which is called metrics
. In the KubeVirt release-manifests, the default metrics
port is 8443
.
The above labels and port information are collected by a Service
called kubevirt-prometheus-metrics
. Kubernetes automatically creates a corresponding Endpoints object
with the same name:
$ kubectl get endpoints -n kubevirt kubevirt-prometheus-metrics -o yaml\napiVersion: v1\nkind: Endpoints\nmetadata:\n labels:\n kubevirt.io: \"\"\n prometheus.kubevirt.io: \"\"\n name: kubevirt-prometheus-metrics\n namespace: kubevirt\nsubsets:\n- addresses:\n - ip: 10.244.0.5\n nodeName: node01\n targetRef:\n kind: Pod\n name: virt-handler-cjzg6\n namespace: kubevirt\n resourceVersion: \"4891\"\n uid: c67331f9-bfcf-11e8-bc54-525500d15501\n - ip: 10.244.0.6\n [...]\n ports:\n - name: metrics\n port: 8443\n protocol: TCP\n
By watching this endpoint for added and removed IPs to subsets.addresses
and appending the metrics
port from subsets.ports
, it is possible to always get a complete list of ready-to-be-scraped Prometheus targets.
The prometheus-operator can make use of the kubevirt-prometheus-metrics
service to automatically create the appropriate Prometheus config.
KubeVirt's virt-operator
checks if the ServiceMonitor
custom resource exists when creating an install strategy for deployment. KubeVirt will automatically create a ServiceMonitor
resource in the monitorNamespace
, as well as an appropriate role and rolebinding in KubeVirt's namespace.
Three settings are exposed in the KubeVirt
custom resource to direct KubeVirt to create these resources correctly:
monitorNamespace
: The namespace that prometheus-operator runs in. Defaults to openshift-monitoring
.
monitorAccount
: The serviceAccount that prometheus-operator runs with. Defaults to prometheus-k8s
.
serviceMonitorNamespace
: The namespace that the serviceMonitor runs in. Defaults to monitorNamespace
Please note that if you decide to set serviceMonitorNamespace
then this namespace must be included in the serviceMonitorNamespaceSelector
field of the Prometheus spec.
If the prometheus-operator for a given deployment uses these defaults, then these values can be omitted.
An example of the KubeVirt resource depicting these default values:
apiVersion: kubevirt.io/v1\nkind: KubeVirt\nmetadata:\n name: kubevirt\nspec:\n monitorNamespace: openshift-monitoring\n monitorAccount: prometheus-k8s\n
"},{"location":"user_workloads/component_monitoring/#integrating-with-the-okd-cluster-monitoring-operator","title":"Integrating with the OKD cluster-monitoring-operator","text":"After the cluster-monitoring-operator is up and running, KubeVirt will detect the existence of the ServiceMonitor
resource. Because the definition contains the openshift.io/cluster-monitoring
label, it will automatically be picked up by the cluster monitor.
The endpoints report metrics related to the runtime behaviour of the Virtual Machines. All the relevant metrics are prefixed with kubevirt_vmi
.
The metrics have labels that allow them to be connected to the VMI objects they refer to. At minimum, the labels will expose node
, name
and namespace
of the related VMI object.
For example, reported metrics could look like
kubevirt_vmi_memory_resident_bytes{domain=\"default_vm-test-01\",name=\"vm-test-01\",namespace=\"default\",node=\"node01\"} 2.5595904e+07\nkubevirt_vmi_network_traffic_bytes_total{domain=\"default_vm-test-01\",interface=\"vnet0\",name=\"vm-test-01\",namespace=\"default\",node=\"node01\",type=\"rx\"} 8431\nkubevirt_vmi_network_traffic_bytes_total{domain=\"default_vm-test-01\",interface=\"vnet0\",name=\"vm-test-01\",namespace=\"default\",node=\"node01\",type=\"tx\"} 1835\nkubevirt_vmi_vcpu_seconds_total{domain=\"default_vm-test-01\",id=\"0\",name=\"vm-test-01\",namespace=\"default\",node=\"node01\",state=\"1\"} 19\n
Please note the domain
label in the above example. This label is deprecated and it will be removed in a future release. You should identify the VMI using the node
, namespace
, name
labels instead.
Use the following query to get a counter for all REST calls that indicate connection issues:
rest_client_requests_total{code=\"<error>\"}\n
If this counter is continuously increasing, it indicates that the corresponding KubeVirt component has general issues connecting to the API server.
"},{"location":"user_workloads/creating_vms/","title":"Creating VirtualMachines","text":"The virtctl subcommand create vm
allows easy creation of VirtualMachine manifests from the command line. It leverages instance types and preferences, inferring them by default (see Specifying or inferring instance types and preferences), and provides several flags to control details of the created virtual machine.
For example there are flags to specify the name or run strategy of a virtual machine or flags to add volumes to a virtual machine. Instance types and preferences can either be specified directly or it is possible to let KubeVirt infer those from the volume used to boot the virtual machine.
For a full set of flags and their description use the following command:
virtctl create vm -h\n
"},{"location":"user_workloads/creating_vms/#creating-virtualmachines-on-a-cluster","title":"Creating VirtualMachines on a cluster","text":"The output of virtctl create vm
can be piped into kubectl
to directly create a VirtualMachine on a cluster, e.g.:
# Create a VM with name my-vm on the cluster\nvirtctl create vm --name my-vm | kubectl create -f -\nvirtualmachine.kubevirt.io/my-vm created\n
"},{"location":"user_workloads/creating_vms/#creating-instance-types","title":"Creating Instance Types","text":"The virtctl subcommand create instancetype
allows easy creation of an instance type manifest from the command line. The command also provides several flags that can be used to create your desired manifest.
There are two required flags that need to be specified: the number of vCPUs and the amount of memory to be requested. Additionally, there are several optional flags that can be used, such as specifying a list of GPUs for passthrough, choosing the desired IOThreadsPolicy, or simply providing the name of our instance type.
By default, the command creates the cluster-wide resource. If the user wants to create the namespaced version, they need to provide the namespaced flag. The namespace name can be specified by using the namespace flag.
For a complete list of flags and their descriptions, use the following command:
virtctl create instancetype -h\n
"},{"location":"user_workloads/creating_vms/#examples","title":"Examples","text":"Create a manifest for a VirtualMachineClusterInstancetype with the required --cpu and --memory flags
virtctl create instancetype --cpu 2 --memory 256Mi\n
Create a manifest for a VirtualMachineInstancetype with a specified namespace
virtctl create instancetype --cpu 2 --memory 256Mi --namespace my-namespace\n
Create a manifest for a VirtualMachineInstancetype without a specified namespace name
virtctl create instancetype --cpu 2 --memory 256Mi --namespaced\n
"},{"location":"user_workloads/creating_vms/#creating-preferences","title":"Creating Preferences","text":"The virtctl subcommand create preference
allows easy creation of a preference manifest from the command line. This command serves as a starting point to create the basic structure of a manifest, as it does not allow specifying all of the options that are supported in preferences.
The current set of flags allows us, for example, to specify the preferred CPU topology, machine type or a storage class.
By default, the command creates the cluster-wide resource. If the user wants to create the namespaced version, they need to provide the namespaced flag. The namespace name can be specified by using the namespace flag.
For a complete list of flags and their descriptions, use the following command:
virtctl create preference -h\n
"},{"location":"user_workloads/creating_vms/#examples_1","title":"Examples","text":"Create a manifest for a VirtualMachineClusterPreference with a preferred cpu topology
virtctl create preference --cpu-topology preferSockets\n
Create a manifest for a VirtualMachinePreference with a specified namespace
virtctl create preference --namespace my-namespace\n
Create a manifest for a VirtualMachinePreference with the preferred storage class
virtctl create preference --namespaced --volume-storage-class my-storage\n
"},{"location":"user_workloads/creating_vms/#specifying-or-inferring-instance-types-and-preferences","title":"Specifying or inferring instance types and preferences","text":"Instance types and preferences can be specified with the appropriate flags, e.g.:
virtctl create vm --instancetype my-instancetype --preference my-preference\n
The type of the instance type or preference (namespaced or cluster scope) can be controlled by prefixing the instance type or preference name with the corresponding CRD name, e.g.:
# Using a cluster scoped instancetype and a namespaced preference\nvirtctl create vm \\\n --instancetype virtualmachineclusterinstancetype/my-instancetype \\\n --preference virtualmachinepreference/my-preference\n
If a prefix was not supplied the cluster scoped resources will be used by default.
To explicitly infer instance types and/or preferences from the volume used to boot the virtual machine add the following flags:
virtctl create vm --infer-instancetype --infer-preference\n
The implicit default is to always try inferring an instancetype and preference from the boot volume. This feature makes use of the IgnoreInferFromVolumeFailure
policy, which suppresses failures on inference of instancetypes and preferences. If one of the above switches was provided explicitly, then the RejectInferFromVolumeFailure
policy is used instead. This way users are made aware of potential issues during the virtual machine creation.
Please note that volumes of different kinds currently have the following fixed boot order regardless of the order their flags were specified on the command line:
If multiple volumes of the same kind were specified, their order is determined by the order in which their flags were specified.
"},{"location":"user_workloads/creating_vms/#specifying-cloud-init-user-data","title":"Specifying cloud-init user data","text":"To pass cloud-init user data to virtctl it needs to be encoded into a base64 string. Here is an example of how to do it:
# Put your cloud-init user data into a file.\n# This will add an authorized key to the default user.\n# To get the default username read the documentation for the cloud image\n$ cat cloud-init.txt\n#cloud-config\nssh_authorized_keys:\n - ssh-rsa AAAA...\n\n# Base64 encode the contents of the file without line wraps and store it in a variable\n$ CLOUD_INIT_USERDATA=$(base64 -w 0 cloud-init.txt)\n\n# Show the contents of the variable\n$ echo $CLOUD_INIT_USERDATA\nI2Nsb3VkLWNvbmZpZwpzc2hfYXV0aG9yaXplZF9rZXlzOgogIC0gc3NoLXJzYSBBQUFBLi4uCg==\n
You can now use this variable as an argument to the --cloud-init-user-data
flag:
virtctl create vm --cloud-init-user-data $CLOUD_INIT_USERDATA\n
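Before handing the encoded string to virtctl, it can help to confirm that it decodes back to the original user data. The sketch below recreates the example file from above (with the same truncated placeholder key) and checks the round trip. It assumes GNU coreutils `base64`, where `-w 0` disables line wrapping; other implementations may use different options.

```shell
# Recreate the example user data (the key is the truncated placeholder from above).
printf '#cloud-config\nssh_authorized_keys:\n  - ssh-rsa AAAA...\n' > cloud-init.txt

# Encode without line wraps (-w 0 is a GNU coreutils option).
CLOUD_INIT_USERDATA=$(base64 -w 0 cloud-init.txt)

# Decode again and compare with the original file contents.
DECODED=$(printf '%s' "$CLOUD_INIT_USERDATA" | base64 -d)
if [ "$DECODED" = "$(cat cloud-init.txt)" ]; then
  echo "user data round-trips cleanly"
fi
```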
"},{"location":"user_workloads/creating_vms/#examples_2","title":"Examples","text":"Create a manifest for a VirtualMachine with a random name:
virtctl create vm\n
Create a manifest for a VirtualMachine with a specified name and RunStrategy Always
virtctl create vm --name=my-vm --run-strategy=Always\n
Create a manifest for a VirtualMachine with a specified VirtualMachineClusterInstancetype
virtctl create vm --instancetype=my-instancetype\n
Create a manifest for a VirtualMachine with a specified VirtualMachineInstancetype (namespaced)
virtctl create vm --instancetype=virtualmachineinstancetype/my-instancetype\n
Create a manifest for a VirtualMachine with a specified VirtualMachineClusterPreference
virtctl create vm --preference=my-preference\n
Create a manifest for a VirtualMachine with a specified VirtualMachinePreference (namespaced)
virtctl create vm --preference=virtualmachinepreference/my-preference\n
Create a manifest for a VirtualMachine with an ephemeral containerdisk volume
virtctl create vm --volume-containerdisk=src:my.registry/my-image:my-tag\n
Create a manifest for a VirtualMachine with a cloned DataSource in namespace and specified size
virtctl create vm --volume-datasource=src:my-ns/my-ds,size:50Gi\n
Create a manifest for a VirtualMachine with a cloned DataSource and inferred instancetype and preference
virtctl create vm --volume-datasource=src:my-annotated-ds --infer-instancetype --infer-preference\n
Create a manifest for a VirtualMachine with a cloned PVC
virtctl create vm --volume-clone-pvc=my-ns/my-pvc\n
Create a manifest for a VirtualMachine with a directly used PVC
virtctl create vm --volume-pvc=my-pvc\n
Create a manifest for a VirtualMachine with a cloned DataSource and a blank volume
virtctl create vm --volume-datasource=src:my-ns/my-ds --volume-blank=size:50Gi\n
Create a manifest for a VirtualMachine with a specified VirtualMachineCluster{Instancetype,Preference} and cloned DataSource
virtctl create vm --instancetype=my-instancetype --preference=my-preference --volume-datasource=src:my-ds\n
Create a manifest for a VirtualMachine with a specified VirtualMachineCluster{Instancetype,Preference} and two cloned DataSources (flag can be provided multiple times)
virtctl create vm --instancetype=my-instancetype --preference=my-preference --volume-datasource=src:my-ds1 --volume-datasource=src:my-ds2\n
Create a manifest for a VirtualMachine with a specified VirtualMachineCluster{Instancetype,Preference} and directly used PVC
virtctl create vm --instancetype=my-instancetype --preference=my-preference --volume-pvc=my-pvc\n
"},{"location":"user_workloads/deploy_common_instancetypes/","title":"Deploy common-instancetypes","text":"The kubevirt/common-instancetypes
project provides a set of instancetypes and preferences to help create KubeVirt VirtualMachines
.
Beginning with the 1.1 release of KubeVirt, cluster-wide resources can be deployed directly through KubeVirt, without another operator. This allows deployment of a set of default instancetypes and preferences alongside KubeVirt.
"},{"location":"user_workloads/deploy_common_instancetypes/#enable-automatic-deployment-of-common-instancetypes","title":"Enable automatic deployment of common-instancetypes","text":"To enable the deployment of cluster-wide common-instancetypes through the KubeVirt virt-operator
, the CommonInstancetypesDeploymentGate
feature gate needs to be enabled.
See Activating feature gates on how to enable it.
"},{"location":"user_workloads/deploy_common_instancetypes/#deploy-common-instancetypes-manually","title":"Deploy common-instancetypes manually","text":"For customization purposes or to install namespaced resources, common-instancetypes can also be deployed by hand.
To install all resources provided by the kubevirt/common-instancetypes
project without further customizations, simply apply with kustomize
enabled (-k flag):
$ kubectl apply -k https://github.com/kubevirt/common-instancetypes.git\n
Alternatively, targets for each of the available custom resource types (e.g. namespaced instancetypes) are available.
For example, to deploy VirtualMachineInstancetypes
run the following command:
$ kubectl apply -k https://github.com/kubevirt/common-instancetypes.git/VirtualMachineInstancetypes\n
"},{"location":"user_workloads/guest_agent_information/","title":"Guest Agent information","text":"Guest Agent (GA) is an optional component that can run inside of Virtual Machines. The GA provides plenty of additional runtime information about the running operating system (OS). More technical detail about available GA commands is available here.
"},{"location":"user_workloads/guest_agent_information/#guest-agent-info-in-virtual-machine-status","title":"Guest Agent info in Virtual Machine status","text":"GA presence in the Virtual Machine is signaled with a condition in the VirtualMachineInstance status. The condition tells that the GA is connected and can be used.
GA condition on VirtualMachineInstance
status:\n conditions:\n - lastProbeTime: \"2020-02-28T10:22:59Z\"\n lastTransitionTime: null\n status: \"True\"\n type: AgentConnected\n
When the GA is connected, additional OS information is shown in the status. This information comprises:
Below is an example of the information shown in the VirtualMachineInstance status.
GA info merged into status
status:\n guestOSInfo:\n id: fedora\n kernelRelease: 4.18.16-300.fc29.x86_64\n kernelVersion: '#1 SMP Sat Oct 20 23:24:08 UTC 2018'\n name: Fedora\n prettyName: Fedora 29 (Cloud Edition)\n version: \"29\"\n versionId: \"29\"\n interfaces:\n - infoSource: domain, guest-agent\n interfaceName: eth0\n ipAddress: 10.244.0.23/24\n ipAddresses:\n - 10.244.0.23/24\n - fe80::858:aff:fef4:17/64\n mac: 0a:58:0a:f4:00:17\n name: default\n
When the Guest Agent is not present in the Virtual Machine, the Guest Agent information is not shown. No error is reported because the Guest Agent is an optional component.
The infoSource field indicates where the info is gathered from. Valid values:
The data shown in the VirtualMachineInstance status are a subset of the information available. The rest of the data is available via the REST API exposed in the Kubernetes kube-api
server.
There are three new subresources added to the VirtualMachineInstance object:
- guestosinfo\n- userlist\n- filesystemlist\n
The whole GA data is returned via guestosinfo
subresource available behind the API endpoint.
/apis/subresources.kubevirt.io/v1/namespaces/{namespace}/virtualmachineinstances/{name}/guestosinfo\n
GuestOSInfo sample data:
{\n \"fsInfo\": {\n \"disks\": [\n {\n \"diskName\": \"vda1\",\n \"fileSystemType\": \"ext4\",\n \"mountPoint\": \"/\",\n \"totalBytes\": 0,\n \"usedBytes\": 0\n }\n ]\n },\n \"guestAgentVersion\": \"2.11.2\",\n \"hostname\": \"testvmi6m5krnhdlggc9mxfsrnhlxqckgv5kqrwcwpgr5mdpv76grrk\",\n \"metadata\": {\n \"creationTimestamp\": null\n },\n \"os\": {\n \"id\": \"fedora\",\n \"kernelRelease\": \"4.18.16-300.fc29.x86_64\",\n \"kernelVersion\": \"#1 SMP Sat Oct 20 23:24:08 UTC 2018\",\n \"machine\": \"x86_64\",\n \"name\": \"Fedora\",\n \"prettyName\": \"Fedora 29 (Cloud Edition)\",\n \"version\": \"29 (Cloud Edition)\",\n \"versionId\": \"29\"\n },\n \"timezone\": \"UTC, 0\"\n}\n
Items FSInfo and UserList are capped to the max capacity of 10 items, as a precaution for VMs with thousands of users.
The full list of filesystems is available through the subresource filesystemlist
which is available at the following endpoint.
/apis/subresources.kubevirt.io/v1/namespaces/{namespace}/virtualmachineinstances/{name}/filesystemlist\n
Filesystem sample data:
{\n \"items\": [\n {\n \"diskName\": \"vda1\",\n \"fileSystemType\": \"ext4\",\n \"mountPoint\": \"/\",\n \"totalBytes\": 3927900160,\n \"usedBytes\": 1029201920\n }\n ],\n \"metadata\": {}\n}\n
The full list of users is available through the subresource userlist
which is available at the following endpoint.
/apis/subresources.kubevirt.io/v1/namespaces/{namespace}/virtualmachineinstances/{name}/userlist\n
Userlist sample data:
{\n \"items\": [\n {\n \"loginTime\": 1580467675.876078,\n \"userName\": \"fedora\"\n }\n ],\n \"metadata\": {}\n}\n
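The three subresource endpoints above differ only in their last path segment. A minimal sketch of a shell helper that builds the path is shown below; the function name `ga_subresource_path` and the example VMI name `testvmi` are hypothetical, chosen only for illustration.

```shell
# Build the API path for one of the guest agent subresources
# (guestosinfo, userlist or filesystemlist) of a VirtualMachineInstance.
# The function name is illustrative and not part of KubeVirt.
ga_subresource_path() {
  namespace=$1
  name=$2
  subresource=$3
  printf '/apis/subresources.kubevirt.io/v1/namespaces/%s/virtualmachineinstances/%s/%s\n' \
    "$namespace" "$name" "$subresource"
}

# Example: path of the user list for the VMI "testvmi" in namespace "default".
ga_subresource_path default testvmi userlist
```

With access to a cluster, the printed path can be passed to `kubectl get --raw`, for example `kubectl get --raw "$(ga_subresource_path default testvmi userlist)"`.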
User LoginTime is in fractional seconds since epoch time. It is left for the consumer to convert to the desired format.
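One way a consumer might do this conversion is sketched below, using the loginTime value from the sample data. It assumes GNU `date` (the `-d @<seconds>` form); the variable names are illustrative, and the sub-second part is simply dropped.

```shell
# loginTime value taken from the userlist sample above.
login_time=1580467675.876078

# Drop the fractional part and format the whole seconds as a UTC timestamp.
# "date -d @<seconds>" is GNU date syntax; BSD/macOS date uses "date -r <seconds>".
login_time_utc=$(date -u -d "@${login_time%.*}" +%Y-%m-%dT%H:%M:%SZ)
echo "$login_time_utc"
```

For the sample value this prints `2020-01-31T10:47:55Z`.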
"},{"location":"user_workloads/guest_operating_system_information/","title":"Guest Operating System Information","text":"Guest operating system identity for the VirtualMachineInstance will be provided by the label kubevirt.io/os
:
metadata:\n name: myvmi\n labels:\n kubevirt.io/os: win2k12r2\n
The kubevirt.io/os
label is based on the short OS identifier from libosinfo database. The following Short IDs are currently supported:
win2k12r2
Microsoft Windows Server 2012 R2
6.3
winnt
https://www.microsoft.com/en-us/evalcenter/evaluate-windows-server-2012-r2
"},{"location":"user_workloads/guest_operating_system_information/#use-with-presets","title":"Use with presets","text":"A VirtualMachineInstancePreset representing an operating system with a kubevirt.io/os
label can be applied to any VirtualMachineInstance that has a matching kubevirt.io/os
label.
Default presets for the OS identifiers above are included in the current release.
"},{"location":"user_workloads/guest_operating_system_information/#windows-server-2012r2-virtualmachineinstancepreset-example","title":"Windows Server 2012R2 VirtualMachineInstancePreset
Example","text":"apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstancePreset\nmetadata:\n name: windows-server-2012r2\n selector:\n matchLabels:\n kubevirt.io/os: win2k12r2\nspec:\n domain:\n cpu:\n cores: 2\n resources:\n requests:\n memory: 2G\n features:\n acpi: {}\n apic: {}\n hyperv:\n relaxed: {}\n vapic: {}\n spinlocks:\n spinlocks: 8191\n clock:\n utc: {}\n timer:\n hpet:\n present: false\n pit:\n tickPolicy: delay\n rtc:\n tickPolicy: catchup\n hyperv: {}\n---\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n labels:\n kubevirt.io/os: win2k12r2\n name: windows2012r2\nspec:\n terminationGracePeriodSeconds: 0\n domain:\n firmware:\n uuid: 5d307ca9-b3ef-428c-8861-06e72d69f223\n devices:\n disks:\n - name: server2012r2\n disk:\n dev: vda\n volumes:\n - name: server2012r2\n persistentVolumeClaim:\n claimName: my-windows-image\n\nOnce the `VirtualMachineInstancePreset` is applied to the\n`VirtualMachineInstance`, the resulting resource would look like this:\n\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n annotations:\n presets.virtualmachineinstances.kubevirt.io/presets-applied: kubevirt.io/v1\n virtualmachineinstancepreset.kubevirt.io/windows-server-2012r2: kubevirt.io/v1\n labels:\n kubevirt.io/os: win2k12r2\n name: windows2012r2\nspec:\n terminationGracePeriodSeconds: 0\n domain:\n cpu:\n cores: 2\n resources:\n requests:\n memory: 2G\n features:\n acpi: {}\n apic: {}\n hyperv:\n relaxed: {}\n vapic: {}\n spinlocks:\n spinlocks: 8191\n clock:\n utc: {}\n timer:\n hpet:\n present: false\n pit:\n tickPolicy: delay\n rtc:\n tickPolicy: catchup\n hyperv: {}\n firmware:\n uuid: 5d307ca9-b3ef-428c-8861-06e72d69f223\n devices:\n disks:\n - name: server2012r2\n disk:\n dev: vda\n volumes:\n - name: server2012r2\n persistentVolumeClaim:\n claimName: my-windows-image\n
For more information see VirtualMachineInstancePresets
"},{"location":"user_workloads/guest_operating_system_information/#hyperv-optimizations","title":"HyperV optimizations","text":"KubeVirt supports quite a lot of so-called \"HyperV enlightenments\", which are optimizations for Windows Guests. Some of these optimization may require an up to date host kernel support to work properly, or to deliver the maximum performance gains.
KubeVirt can perform extra checks on the hosts before to run Hyper-V enabled VMs, to make sure the host has no known issues with Hyper-V support, properly expose all the required features and thus we can expect optimal performance. These checks are disabled by default for backward compatibility and because they depend on the node-feature-discovery and on extra configuration.
To enable strict host checking, the user may expand the featureGates
field in the KubeVirt CR by adding the HypervStrictCheck
to it.
apiVersion: kubevirt.io/v1\nkind: KubeVirt\nmetadata:\n name: kubevirt\n namespace: kubevirt\nspec:\n ...\n configuration:\n developerConfiguration:\n featureGates:\n - \"HypervStrictCheck\"\n
Alternatively, users can edit an existing kubevirt CR:
kubectl edit kubevirt kubevirt -n kubevirt
...\nspec:\n configuration:\n developerConfiguration:\n featureGates:\n - \"HypervStrictCheck\"\n - \"CPUManager\"\n
"},{"location":"user_workloads/hook-sidecar/","title":"Hook Sidecar Container","text":""},{"location":"user_workloads/hook-sidecar/#introduction","title":"Introduction","text":"In KubeVirt, a Hook Sidecar container is a sidecar container (a secondary container that runs along with the main application container within the same Pod) used to apply customizations before the Virtual Machine is initialized. This ability is provided since configurable elements in the VMI specification do not cover all of the libvirt domain XML elements.
The sidecar containers communicate with the main container over a socket with a gRPC protocol. There are two main sidecar hooks:
onDefineDomain
: This hook helps to customize libvirt's XML and return the new XML over gRPC for the VM creation.
preCloudInitIso
: This hook helps to customize the cloud-init configuration. It operates on and returns JSON formatted cloud-init data.Sidecar
feature gate","text":"Sidecar
feature gate can be enabled by following the steps mentioned in Activating feature gates.
In case of a development cluster created using kubevirtci, follow the steps mentioned in the developer doc to enable the feature gate.
"},{"location":"user_workloads/hook-sidecar/#sidecar-shim-container-image","title":"Sidecar-shim container image","text":"To run a VM with custom modifications, the sidecar-shim-image takes care of implementing the communication with the main container.
The image contains the sidecar-shim
binary built using sidecar_shim.go
which should be kept as the entrypoint of the container. This binary will search in $PATH
for binaries named after the hook names (e.g. onDefineDomain
and preCloudInitIso
) and run them. Users must provide the necessary arguments as command line options (flags).
In the case of onDefineDomain
, the arguments will be the VMI information as a JSON string (e.g. --vmi vmiJSON
) and the current domain XML (e.g. --domain domainXML
). It outputs the modified domain XML on the standard output.
In the case of preCloudInitIso
, the arguments will be the VMI information as a JSON string (e.g. --vmi vmiJSON
) and the CloudInitData (e.g. --cloud-init cloudInitJSON
). It outputs the modified CloudInitData (as JSON) on the standard output.
Shell or python scripts can be used as alternatives to the binary, by making them available at the expected location (/usr/bin/onDefineDomain
or /usr/bin/preCloudInitIso
depending upon the hook).
A prebuilt image named sidecar-shim
capable of running Shell or Python scripts is shipped as part of KubeVirt releases.
Although a binary doesn't strictly need to be generated from Go code, and a script doesn't strictly need to be one among Shell or Python, for the purpose of this guide, we will use those as examples.
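As a rough illustration of the contract described above, the following Python sketch shows what an onDefineDomain hook script could look like. It assumes the shim passes the --vmi and --domain flags as described; the function name set_baseboard_manufacturer and the hard-coded manufacturer value are hypothetical and only serve the example.

```python
#!/usr/bin/env python3
# Sketch of an onDefineDomain hook script (illustrative, not the shipped implementation).
import argparse
import sys
import xml.etree.ElementTree as ET


def set_baseboard_manufacturer(domain_xml, manufacturer):
    # Return the domain XML with the baseBoard manufacturer entry set.
    root = ET.fromstring(domain_xml)
    sysinfo = root.find('sysinfo')
    if sysinfo is None:
        sysinfo = ET.SubElement(root, 'sysinfo', {'type': 'smbios'})
    base_board = sysinfo.find('baseBoard')
    if base_board is None:
        base_board = ET.SubElement(sysinfo, 'baseBoard')
    # Reuse an existing manufacturer entry if present, otherwise add one.
    entry = base_board.find("entry[@name='manufacturer']")
    if entry is None:
        entry = ET.SubElement(base_board, 'entry', {'name': 'manufacturer'})
    entry.text = manufacturer
    return ET.tostring(root, encoding='unicode')


def main(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument('--vmi', default='{}')   # VMI information as a JSON string
    parser.add_argument('--domain')              # current domain XML
    # Unknown flags are tolerated so extra arguments do not abort the hook.
    args, _unknown = parser.parse_known_args(argv)
    if args.domain is None:
        print('no --domain argument given', file=sys.stderr)
        return 1
    # The modified domain XML goes to standard output.
    print(set_baseboard_manufacturer(args.domain, 'Radical Edward'))
    return 0


# Only act when flags were actually passed on the command line.
if __name__ == '__main__' and len(sys.argv) > 1:
    main(sys.argv[1:])
```

Parsing the flags by name rather than by position keeps the script independent of the order in which the arguments are supplied.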
"},{"location":"user_workloads/hook-sidecar/#go-binary","title":"Go binary","text":"Example Go code modifiying the SMBIOS system information can be found in the KubeVirt repo. Binary generated from this code, when available under /usr/bin/ondefinedomain
in the sidecar-shim-image, is run right before VMI creation and the baseboard manufacturer value is modified to reflect what's provided in the smbios.vm.kubevirt.io/baseBoardManufacturer
annotation in the VMI spec.
If you prefer writing a shell or python script instead of a Go program, create a Kubernetes ConfigMap and use annotations to make sure the script is run before the VMI creation. The flow would be as below:
hooks.kubevirt.io/hookSidecars
and mention the ConfigMap information in it.
apiVersion: v1\nkind: ConfigMap\nmetadata:\n name: my-config-map\ndata:\n my_script.sh: |\n #!/bin/sh\n tempFile=`mktemp --dry-run`\n echo \"$4\" > \"$tempFile\"\n sed -i \"s|<baseBoard></baseBoard>|<baseBoard><entry name='manufacturer'>Radical Edward</entry></baseBoard>|\" \"$tempFile\"\n cat \"$tempFile\"\n
"},{"location":"user_workloads/hook-sidecar/#configmap-with-python-script","title":"ConfigMap with python script","text":"apiVersion: v1\nkind: ConfigMap\nmetadata:\n name: my-config-map\ndata:\n my_script.sh: |\n #!/usr/bin/env python3\n\n import xml.etree.ElementTree as ET\n import sys\n\n def main(s):\n # write to a temporary file\n f = open(\"/tmp/orig.xml\", \"w\")\n f.write(s)\n f.close()\n\n # parse xml from file\n xml = ET.parse(\"/tmp/orig.xml\")\n # get the root element\n root = xml.getroot()\n # find the baseBoard element\n baseBoard = root.find(\"sysinfo\").find(\"baseBoard\")\n\n # prepare new element to be inserted into the xml definition\n element = ET.Element(\"entry\", {\"name\": \"manufacturer\"})\n element.text = \"Radical Edward\"\n # insert the element\n baseBoard.insert(0, element)\n\n # write to a new file\n xml.write(\"/tmp/new.xml\")\n # print file contents to stdout\n f = open(\"/tmp/new.xml\")\n print(f.read())\n f.close()\n\n if __name__ == \"__main__\":\n main(sys.argv[4])\n
After creating one of the above ConfigMaps, create the VMI using the manifest in this example. Of importance here is the ConfigMap information stored in the annotations:
annotations:\n hooks.kubevirt.io/hookSidecars: >\n [\n {\n \"args\": [\"--version\", \"v1alpha2\"],\n \"configMap\": {\"name\": \"my-config-map\", \"key\": \"my_script.sh\", \"hookPath\": \"/usr/bin/onDefineDomain\"}\n }\n ]\n
The name
field indicates the name of the ConfigMap on the cluster which contains the script you want to execute. The key
field indicates the key in the ConfigMap which contains the script to be executed. Finally, hookPath
indicates the path where you want the script to be mounted. It could be either of /usr/bin/onDefineDomain
or /usr/bin/preCloudInitIso
depending upon the hook you want to execute. An optional value can be specified with the \"image\"
key if a custom image is needed; if omitted, the default sidecar-shim image built together with the other KubeVirt images will be used. The default sidecar-shim image, if not overridden with a custom value, will also be updated along with the other images, as described in Updating KubeVirt Workloads.
Whether you used the Go binary or a Shell/Python script from the above examples, you should see that the newly created VMI has the modified baseboard manufacturer information. After creating the VMI, verify that it is in the Running
state, then connect to its console and check that the desired baseboard manufacturer change is reflected:
# Once the VM is ready, connect to its display and login using name and password \"fedora\"\ncluster/virtctl.sh vnc vmi-with-sidecar-hook-configmap\n\n# Check whether the base board manufacturer value was successfully overwritten\nsudo dmidecode -s baseboard-manufacturer\n
"},{"location":"user_workloads/instancetypes/","title":"Instance types and preferences","text":"FEATURE STATE:
instancetype.kubevirt.io/v1alpha1
(Experimental) as of the v0.56.0
KubeVirt release
instancetype.kubevirt.io/v1alpha2
(Experimental) as of the v0.58.0
KubeVirt release
instancetype.kubevirt.io/v1beta1
as of the v1.0.0
KubeVirt release
See the Version History section for more details.
"},{"location":"user_workloads/instancetypes/#introduction","title":"Introduction","text":"KubeVirt's VirtualMachine
API contains many advanced options for tuning the performance of a VM that go beyond what typical users need to be aware of. Users have previously been unable to simply define the storage/network they want assigned to their VM and then declare in broad terms what quality of resources and kind of performance characteristics they need for their VM.
Instance types and preferences provide a way to define a set of resource, performance and other runtime characteristics, allowing users to reuse these definitions across multiple VirtualMachines
.
---\napiVersion: instancetype.kubevirt.io/v1beta1\nkind: VirtualMachineInstancetype\nmetadata:\n name: example-instancetype\nspec:\n cpu:\n guest: 1\n memory:\n guest: 128Mi\n
KubeVirt provides two CRDs
for instance types, a cluster wide VirtualMachineClusterInstancetype
and a namespaced VirtualMachineInstancetype
. These CRDs
encapsulate the following resource related characteristics of a VirtualMachine
through a shared VirtualMachineInstancetypeSpec
:
CPU
: Required number of vCPUs presented to the guest
Memory
: Required amount of memory presented to the guest
GPUs
: Optional list of vGPUs to passthrough
HostDevices
: Optional list of HostDevices
to passthrough
IOThreadsPolicy
: Optional IOThreadsPolicy
to be used
LaunchSecurity
: Optional LaunchSecurity
to be used
Anything provided within an instance type cannot be overridden within the VirtualMachine
. For example, as CPU
and Memory
are both required attributes of an instance type, if a user makes any requests for CPU
or Memory
resources within the underlying VirtualMachine
, the instance type will conflict and the request will be rejected during creation.
---\napiVersion: instancetype.kubevirt.io/v1beta1\nkind: VirtualMachinePreference\nmetadata:\n name: example-preference\nspec:\n devices:\n preferredDiskBus: virtio\n preferredInterfaceModel: virtio\n
KubeVirt also provides two further preference based CRDs
, again a cluster wide VirtualMachineClusterPreference
and namespaced VirtualMachinePreference
. These CRDs
encapsulate the preferred value of any remaining attributes of a VirtualMachine
required to run a given workload, again through a shared VirtualMachinePreferenceSpec
.
Unlike instance types, preferences only represent the preferred values and as such, they can be overridden by values in the VirtualMachine
provided by the user.
In the example shown below, a user has provided a VirtualMachine
with a disk bus already defined within a DiskTarget
and has also selected a set of preferences with DevicePreference
and preferredDiskBus
, so the user's original choice within the VirtualMachine
and DiskTarget
is used:
$ kubectl apply -f - << EOF\n---\napiVersion: instancetype.kubevirt.io/v1beta1\nkind: VirtualMachinePreference\nmetadata:\n name: example-preference-disk-virtio\nspec:\n devices:\n preferredDiskBus: virtio\n---\napiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n name: example-preference-user-override\nspec:\n preference:\n kind: VirtualMachinePreference\n name: example-preference-disk-virtio\n running: false\n template:\n spec:\n domain:\n memory:\n guest: 128Mi\n devices:\n disks:\n - disk:\n bus: sata\n name: containerdisk\n - disk: {}\n name: cloudinitdisk\n resources: {}\n terminationGracePeriodSeconds: 0\n volumes:\n - containerDisk:\n image: registry:5000/kubevirt/cirros-container-disk-demo:devel\n name: containerdisk\n - cloudInitNoCloud:\n userData: |\n #!/bin/sh\n\n echo 'printed from cloud-init userdata'\n name: cloudinitdisk\nEOF\nvirtualmachinepreference.instancetype.kubevirt.io/example-preference-disk-virtio created\nvirtualmachine.kubevirt.io/example-preference-user-override configured\n\n\n$ virtctl start example-preference-user-override\nVM example-preference-user-override was scheduled to start\n\n# We can see the original request from the user within the VirtualMachine lists `containerdisk` with a `SATA` bus\n$ kubectl get vms/example-preference-user-override -o json | jq .spec.template.spec.domain.devices.disks\n[\n {\n \"disk\": {\n \"bus\": \"sata\"\n },\n \"name\": \"containerdisk\"\n },\n {\n \"disk\": {},\n \"name\": \"cloudinitdisk\"\n }\n]\n\n# This is still the case in the VirtualMachineInstance with the remaining disk using the `preferredDiskBus` from the preference of `virtio`\n$ kubectl get vmis/example-preference-user-override -o json | jq .spec.domain.devices.disks\n[\n {\n \"disk\": {\n \"bus\": \"sata\"\n },\n \"name\": \"containerdisk\"\n },\n {\n \"disk\": {\n \"bus\": \"virtio\"\n },\n \"name\": \"cloudinitdisk\"\n }\n]\n
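The override behaviour seen in the output above can be modelled with a few lines of Python. This is only a sketch of the rule "a preference fills in values the user left empty"; the helper name apply_preferred_disk_bus is hypothetical, and it only considers disk targets, not other device types.

```python
import copy


def apply_preferred_disk_bus(disks, preferred_bus):
    # Return a copy of disks where only disks without a bus get the preferred one.
    result = copy.deepcopy(disks)
    for disk in result:
        target = disk.setdefault('disk', {})
        if not target.get('bus'):
            # Nothing chosen by the user, so the preference applies.
            target['bus'] = preferred_bus
    return result


# Disks as defined by the user in the VirtualMachine from the example above.
vm_disks = [
    {'disk': {'bus': 'sata'}, 'name': 'containerdisk'},
    {'disk': {}, 'name': 'cloudinitdisk'},
]

vmi_disks = apply_preferred_disk_bus(vm_disks, 'virtio')
for disk in vmi_disks:
    print(disk['name'], disk['disk']['bus'])
```

The user-provided sata bus on containerdisk is left alone, while cloudinitdisk picks up virtio from the preference, matching the VirtualMachineInstance output shown earlier.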
"},{"location":"user_workloads/instancetypes/#virtualmachine","title":"VirtualMachine","text":"---\napiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n name: example-vm\nspec:\n instancetype:\n kind: VirtualMachineInstancetype\n name: example-instancetype\n preference:\n kind: VirtualMachinePreference\n name: example-preference\n
The previous instance type and preference CRDs are matched to a given VirtualMachine
through the use of a matcher. Each matcher consists of the following:
Name
(string): Name of the resource being referenced
Kind
(string): Optional, defaults to the cluster wide CRD kinds of VirtualMachineClusterInstancetype
or VirtualMachineClusterPreference
if not provided
RevisionName
(string): Optional, name of a ControllerRevision
containing a copy of the VirtualMachineInstancetypeSpec
or VirtualMachinePreferenceSpec
taken when the VirtualMachine
is first created. See the Versioning section below for more details on how and why this is captured.
InferFromVolume
(string): Optional, see the Inferring defaults from a Volume section below for more details.
It is possible to streamline the creation of instance types, preferences, and virtual machines with the virtctl command-line tool. To read more about it, please see Creating VirtualMachines.
"},{"location":"user_workloads/instancetypes/#versioning","title":"Versioning","text":"Versioning of these resources is required to ensure the eventual VirtualMachineInstance
created when starting a VirtualMachine
does not change between restarts if any referenced instance type or set of preferences are updated during the lifetime of the VirtualMachine
.
This is currently achieved by using ControllerRevision
to retain a copy of the VirtualMachineInstancetype
or VirtualMachinePreference
at the time the VirtualMachine
is created. A reference to these ControllerRevisions
is then retained in the InstancetypeMatcher
and PreferenceMatcher
within the VirtualMachine
for future use.
$ kubectl apply -f examples/csmall.yaml -f examples/vm-cirros-csmall.yaml\nvirtualmachineinstancetype.instancetype.kubevirt.io/csmall created\nvirtualmachine.kubevirt.io/vm-cirros-csmall created\n\n$ kubectl get vm/vm-cirros-csmall -o json | jq .spec.instancetype\n{\n \"kind\": \"VirtualMachineInstancetype\",\n \"name\": \"csmall\",\n \"revisionName\": \"vm-cirros-csmall-csmall-72c3a35b-6e18-487d-bebf-f73c7d4f4a40-1\"\n}\n\n$ kubectl get controllerrevision/vm-cirros-csmall-csmall-72c3a35b-6e18-487d-bebf-f73c7d4f4a40-1 -o json | jq .\n{\n \"apiVersion\": \"apps/v1\",\n \"data\": {\n \"apiVersion\": \"instancetype.kubevirt.io/v1beta1\",\n \"kind\": \"VirtualMachineInstancetype\",\n \"metadata\": {\n \"creationTimestamp\": \"2022-09-30T12:20:19Z\",\n \"generation\": 1,\n \"name\": \"csmall\",\n \"namespace\": \"default\",\n \"resourceVersion\": \"10303\",\n \"uid\": \"72c3a35b-6e18-487d-bebf-f73c7d4f4a40\"\n },\n \"spec\": {\n \"cpu\": {\n \"guest\": 1\n },\n \"memory\": {\n \"guest\": \"128Mi\"\n }\n }\n },\n \"kind\": \"ControllerRevision\",\n \"metadata\": {\n \"creationTimestamp\": \"2022-09-30T12:20:19Z\",\n \"name\": \"vm-cirros-csmall-csmall-72c3a35b-6e18-487d-bebf-f73c7d4f4a40-1\",\n \"namespace\": \"default\",\n \"ownerReferences\": [\n {\n \"apiVersion\": \"kubevirt.io/v1\",\n \"blockOwnerDeletion\": true,\n \"controller\": true,\n \"kind\": \"VirtualMachine\",\n \"name\": \"vm-cirros-csmall\",\n \"uid\": \"5216527a-1d31-4637-ad3a-b640cb9949a2\"\n }\n ],\n \"resourceVersion\": \"10307\",\n \"uid\": \"a7bc784b-4cea-45d7-8432-15418e1dd7d3\"\n },\n \"revision\": 0\n}\n\n\n$ kubectl delete vm/vm-cirros-csmall\nvirtualmachine.kubevirt.io \"vm-cirros-csmall\" deleted\n\n$ kubectl get controllerrevision/vm-cirros-csmall-csmall-72c3a35b-6e18-487d-bebf-f73c7d4f4a40-1\nError from server (NotFound): controllerrevisions.apps \"vm-cirros-csmall-csmall-72c3a35b-6e18-487d-bebf-f73c7d4f4a40-1\" not found\n
Users can opt in to moving to a newer generation of an instance type or preference by removing the referenced revisionName
from the appropriate matcher within the VirtualMachine
object. This will result in fresh ControllerRevisions
being captured and used.
The following example creates a VirtualMachine
using an initial version of the csmall instance type before increasing the number of vCPUs provided by the instance type:
$ kubectl apply -f examples/csmall.yaml -f examples/vm-cirros-csmall.yaml\nvirtualmachineinstancetype.instancetype.kubevirt.io/csmall created\nvirtualmachine.kubevirt.io/vm-cirros-csmall created\n\n$ kubectl get vm/vm-cirros-csmall -o json | jq .spec.instancetype\n{\n \"kind\": \"VirtualMachineInstancetype\",\n \"name\": \"csmall\",\n \"revisionName\": \"vm-cirros-csmall-csmall-3e86e367-9cd7-4426-9507-b14c27a08671-1\"\n}\n\n$ virtctl start vm-cirros-csmall\nVM vm-cirros-csmall was scheduled to start\n\n$ kubectl get vmi/vm-cirros-csmall -o json | jq .spec.domain.cpu\n{\n \"cores\": 1,\n \"model\": \"host-model\",\n \"sockets\": 1,\n \"threads\": 1\n}\n\n$ kubectl patch VirtualMachineInstancetype/csmall --type merge -p '{\"spec\":{\"cpu\":{\"guest\":2}}}'\nvirtualmachineinstancetype.instancetype.kubevirt.io/csmall patched\n
In order for this change to be picked up within the VirtualMachine
, we need to stop the running VirtualMachine
and clear the revisionName
referenced by the InstancetypeMatcher
:
$ virtctl stop vm-cirros-csmall\nVM vm-cirros-csmall was scheduled to stop\n\n$ kubectl patch vm/vm-cirros-csmall --type merge -p '{\"spec\":{\"instancetype\":{\"revisionName\":\"\"}}}'\nvirtualmachine.kubevirt.io/vm-cirros-csmall patched\n\n$ kubectl get vm/vm-cirros-csmall -o json | jq .spec.instancetype\n{\n \"kind\": \"VirtualMachineInstancetype\",\n \"name\": \"csmall\",\n \"revisionName\": \"vm-cirros-csmall-csmall-3e86e367-9cd7-4426-9507-b14c27a08671-2\"\n}\n
As you can see above, the InstancetypeMatcher
now references a new ControllerRevision
containing generation 2 of the instance type. We can now start the VirtualMachine
again and see the new number of vCPUs being used by the VirtualMachineInstance
:
$ virtctl start vm-cirros-csmall\nVM vm-cirros-csmall was scheduled to start\n\n$ kubectl get vmi/vm-cirros-csmall -o json | jq .spec.domain.cpu\n{\n \"cores\": 1,\n \"model\": \"host-model\",\n \"sockets\": 2,\n \"threads\": 1\n}\n
"},{"location":"user_workloads/instancetypes/#inferfromvolume","title":"inferFromVolume","text":"The inferFromVolume
attribute of both the InstancetypeMatcher
and PreferenceMatcher
allows a user to request that defaults are inferred from a volume. When requested, KubeVirt will look for the following labels on the underlying PVC
, DataSource
or DataVolume
to determine the default name and kind:
instancetype.kubevirt.io/default-instancetype
instancetype.kubevirt.io/default-instancetype-kind
(optional, defaults to VirtualMachineClusterInstancetype
)
instancetype.kubevirt.io/default-preference
instancetype.kubevirt.io/default-preference-kind
(optional, defaults to VirtualMachineClusterPreference
)
These values are then written into the appropriate matcher by the mutation webhook and used during validation before the VirtualMachine
is formally accepted.
The validation can be controlled by the value provided to inferFromVolumeFailurePolicy
in either the InstancetypeMatcher
or PreferenceMatcher
of a VirtualMachine
.
The default value of Reject
will cause the request to be rejected on failure to find the referenced Volume
or labels on an underlying resource.
If Ignore
was provided, the respective InstancetypeMatcher
or PreferenceMatcher
will be cleared on a failure instead.
Example with implicit default value of Reject
:
$ kubectl apply -k https://github.com/kubevirt/common-instancetypes.git\n[..]\n$ virtctl image-upload pvc cirros-pvc --size=1Gi --image-path=./cirros-0.5.2-x86_64-disk.img\n[..]\n$ kubectl label pvc/cirros-pvc \\\n instancetype.kubevirt.io/default-instancetype=server.tiny \\\n instancetype.kubevirt.io/default-preference=cirros\n[..]\n$ kubectl apply -f - << EOF\n---\napiVersion: cdi.kubevirt.io/v1beta1\nkind: DataSource\nmetadata:\n name: cirros-datasource\nspec:\n source:\n pvc:\n name: cirros-pvc\n namespace: default\n---\napiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n name: cirros\nspec:\n instancetype:\n inferFromVolume: cirros-volume\n preference:\n inferFromVolume: cirros-volume\n running: false\n dataVolumeTemplates:\n - metadata:\n name: cirros-datavolume\n spec:\n storage:\n resources:\n requests:\n storage: 1Gi\n storageClassName: local\n sourceRef:\n kind: DataSource\n name: cirros-datasource\n namespace: default\n template:\n spec:\n domain:\n devices: {}\n volumes:\n - dataVolume:\n name: cirros-datavolume\n name: cirros-volume\nEOF\n[..]\nkubectl get vms/cirros -o json | jq '.spec.instancetype, .spec.preference'\n{\n \"kind\": \"virtualmachineclusterinstancetype\",\n \"name\": \"server.tiny\",\n \"revisionName\": \"cirros-server.tiny-76454433-3d82-43df-a7e5-586e48c71f68-1\"\n}\n{\n \"kind\": \"virtualmachineclusterpreference\",\n \"name\": \"cirros\",\n \"revisionName\": \"cirros-cirros-85823ddc-9e8c-4d23-a94c-143571b5489c-1\"\n}\n
Example with explicit value of Ignore
:
$ virtctl image-upload pvc cirros-pvc --size=1Gi --image-path=./cirros-0.5.2-x86_64-disk.img\n$ kubectl apply -f - << EOF\n---\napiVersion: cdi.kubevirt.io/v1beta1\nkind: DataSource\nmetadata:\n name: cirros-datasource\nspec:\n source:\n pvc:\n name: cirros-pvc\n namespace: default\n---\napiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n name: cirros\nspec:\n instancetype:\n inferFromVolume: cirros-volume\n inferFromVolumeFailurePolicy: Ignore\n preference:\n inferFromVolume: cirros-volume\n inferFromVolumeFailurePolicy: Ignore\n running: false\n dataVolumeTemplates:\n - metadata:\n name: cirros-datavolume\n spec:\n storage:\n accessModes:\n - ReadWriteOnce\n resources:\n requests:\n storage: 1Gi\n storageClassName: local\n sourceRef:\n kind: DataSource\n name: cirros-datasource\n namespace: default\n template:\n spec:\n domain:\n devices: {}\n volumes:\n - dataVolume:\n name: cirros-datavolume\n name: cirros-volume\nEOF\n[..]\nkubectl get vms/cirros -o json | jq '.spec.instancetype, .spec.preference'\nnull\nnull\n
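The label lookup and the two failure policies shown in the examples above can be summarised in a small Python model. This is a simplified sketch and not KubeVirt's implementation: the function infer_matcher and the exception InferenceRejected are hypothetical names, and the real webhook may normalise the kind differently (the output above shows it in lowercase).

```python
# Default kinds used when the optional *-kind label is absent.
DEFAULT_KINDS = {
    'instancetype': 'VirtualMachineClusterInstancetype',
    'preference': 'VirtualMachineClusterPreference',
}


class InferenceRejected(Exception):
    # Raised when inference fails and the failure policy is Reject.
    pass


def infer_matcher(labels, what, failure_policy='Reject'):
    # what is either 'instancetype' or 'preference'.
    name_label = 'instancetype.kubevirt.io/default-' + what
    name = labels.get(name_label)
    if name is None:
        if failure_policy == 'Ignore':
            return None  # the matcher is cleared instead of failing
        raise InferenceRejected('label %s not found' % name_label)
    kind = labels.get(name_label + '-kind', DEFAULT_KINDS[what])
    return {'name': name, 'kind': kind}


# Labels as applied to the PVC in the Reject example above.
pvc_labels = {
    'instancetype.kubevirt.io/default-instancetype': 'server.tiny',
    'instancetype.kubevirt.io/default-preference': 'cirros',
}
print(infer_matcher(pvc_labels, 'instancetype'))
print(infer_matcher({}, 'preference', failure_policy='Ignore'))
```

With an unlabelled volume and the default policy, the same call raises InferenceRejected, which corresponds to the VirtualMachine request being rejected.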
"},{"location":"user_workloads/instancetypes/#common-instancetypes","title":"common-instancetypes","text":"The kubevirt/common-instancetypes
repository provides a set of instance types and preferences to help create KubeVirt VirtualMachines
.
See Deploy common-instancetypes on how to deploy them.
"},{"location":"user_workloads/instancetypes/#examples","title":"Examples","text":"Various examples are available within the kubevirt
repo under /examples
. The following uses an example VirtualMachine
provided by the containerdisk/fedora
repo and replaces much of the DomainSpec
with the equivalent instance type and preferences:
$ kubectl apply -f - << EOF\n---\napiVersion: instancetype.kubevirt.io/v1beta1\nkind: VirtualMachineInstancetype\nmetadata:\n name: cmedium\nspec:\n cpu:\n guest: 1\n memory:\n guest: 1Gi\n---\napiVersion: instancetype.kubevirt.io/v1beta1\nkind: VirtualMachinePreference\nmetadata:\n name: fedora\nspec:\n devices:\n preferredDiskBus: virtio\n preferredInterfaceModel: virtio\n preferredRng: {}\n features:\n preferredAcpi: {}\n preferredSmm: {}\n firmware:\n preferredUseEfi: true\n preferredUseSecureBoot: true \n---\napiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n creationTimestamp: null\n name: fedora\nspec:\n instancetype:\n name: cmedium\n kind: virtualMachineInstancetype\n preference:\n name: fedora\n kind: virtualMachinePreference\n runStrategy: Always\n template:\n metadata:\n creationTimestamp: null\n spec:\n domain:\n devices: {}\n volumes:\n - containerDisk:\n image: quay.io/containerdisks/fedora:latest\n name: containerdisk\n - cloudInitNoCloud:\n userData: |-\n #cloud-config\n ssh_authorized_keys:\n - ssh-rsa AAAA...\n name: cloudinit\nEOF\n
"},{"location":"user_workloads/instancetypes/#version-history","title":"Version History","text":""},{"location":"user_workloads/instancetypes/#instancetypekubevirtiov1alpha1-experimental","title":"instancetype.kubevirt.io/v1alpha1
(Experimental)","text":"instancetype.kubevirt.io/v1alpha2
(Experimental)","text":"This version captured complete VirtualMachine{Instancetype,ClusterInstancetype,Preference,ClusterPreference}
objects within the created ControllerRevisions
This version is backwardly compatible with instancetype.kubevirt.io/v1alpha1
.
instancetype.kubevirt.io/v1beta1
","text":"Spec.Memory.OvercommitPercent
The following preference attributes have been added:
Spec.CPU.PreferredCPUFeatures
Spec.Devices.PreferredInterfaceMasquerade
Spec.PreferredSubdomain
Spec.PreferredTerminationGracePeriodSeconds
Spec.Requirements
This version is backwardly compatible with instancetype.kubevirt.io/v1alpha1
and instancetype.kubevirt.io/v1alpha2
objects, no modifications are required to existing VirtualMachine{Instancetype,ClusterInstancetype,Preference,ClusterPreference}
or ControllerRevisions
.
As with the migration to kubevirt.io/v1
it is recommended that previous users of instancetype.kubevirt.io/v1alpha1
or instancetype.kubevirt.io/v1alpha2
use kube-storage-version-migrator
to upgrade any stored objects to instancetype.kubevirt.io/v1beta1
.
Every VirtualMachineInstance
represents a single virtual machine instance. In general, the management of VirtualMachineInstances is kept similar to how Pods
are managed: Every VM that is defined in the cluster is expected to be running, just like Pods. Deleting a VirtualMachineInstance is equivalent to shutting it down; this also matches how Pods behave.
In order to start a VirtualMachineInstance, you just need to create a VirtualMachineInstance
object using kubectl
:
$ kubectl create -f vmi.yaml\n
"},{"location":"user_workloads/lifecycle/#listing-virtual-machines","title":"Listing virtual machines","text":"VirtualMachineInstances can be listed by querying for VirtualMachineInstance objects:
$ kubectl get vmis\n
"},{"location":"user_workloads/lifecycle/#retrieving-a-virtual-machine-instance-definition","title":"Retrieving a virtual machine instance definition","text":"A single VirtualMachineInstance definition can be retrieved by getting the specific VirtualMachineInstance object:
$ kubectl get vmis testvmi\n
"},{"location":"user_workloads/lifecycle/#stopping-a-virtual-machine-instance","title":"Stopping a virtual machine instance","text":"To stop the VirtualMachineInstance, you just need to delete the corresponding VirtualMachineInstance
object using kubectl
.
$ kubectl delete -f vmi.yaml\n# OR\n$ kubectl delete vmis testvmi\n
Note: Stopping a VirtualMachineInstance implies that it will be deleted from the cluster. You will not be able to start this VirtualMachineInstance object again.
"},{"location":"user_workloads/lifecycle/#starting-and-stopping-a-virtual-machine","title":"Starting and stopping a virtual machine","text":"Virtual machines, in contrast to VirtualMachineInstances, have a running state. Thus on VM you can define if it should be running, or not. VirtualMachineInstances are, if they are defined in the cluster, always running and consuming resources.
virtctl
is used in order to start and stop a VirtualMachine:
$ virtctl start my-vm\n$ virtctl stop my-vm\n
Note: You can force stop a VM (which is like pulling the power cord, with all its implications like data inconsistencies or [in the worst case] data loss) by running:
$ virtctl stop my-vm --grace-period 0 --force\n
"},{"location":"user_workloads/lifecycle/#pausing-and-unpausing-a-virtual-machine","title":"Pausing and unpausing a virtual machine","text":"Note: Pausing in this context refers to libvirt's virDomainSuspend
command: \"The process is frozen without further access to CPU resources and I/O but the memory used by the domain at the hypervisor level will stay allocated\"
To pause a virtual machine, you need the virtctl
command line tool. Its pause
command works on either VirtualMachine
s or VirtualMachineInstance
s:
$ virtctl pause vm testvm\n# OR\n$ virtctl pause vmi testvm\n
Paused VMIs have a Paused
condition in their status:
$ kubectl get vmi testvm -o=jsonpath='{.status.conditions[?(@.type==\"Paused\")].message}'\nVMI was paused by user\n
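The same check can be done programmatically on the JSON form of a VMI. The sketch below assumes the object has already been loaded into a dictionary (for example from the output of kubectl get vmi testvm -o json); the helper name paused_message and the sample data are hypothetical.

```python
def paused_message(vmi):
    # Return the message of the Paused condition, or None if the VMI is not paused.
    for cond in vmi.get('status', {}).get('conditions', []):
        if cond.get('type') == 'Paused':
            return cond.get('message')
    return None


# Hypothetical, trimmed-down VMI status used only for illustration.
sample_vmi = {
    'status': {
        'conditions': [
            {'type': 'Ready', 'status': 'False'},
            {'type': 'Paused', 'status': 'True', 'message': 'VMI was paused by user'},
        ]
    }
}

print(paused_message(sample_vmi))
print(paused_message({'status': {'conditions': []}}))
```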
Unpausing works similarly to pausing:
$ virtctl unpause vm testvm\n# OR\n$ virtctl unpause vmi testvm\n
"},{"location":"user_workloads/liveness_and_readiness_probes/","title":"Liveness and Readiness Probes","text":"It is possible to configure Liveness and Readiness Probes in a similar fashion like it is possible to configure Liveness and Readiness Probes on Containers.
Liveness Probes will effectively stop the VirtualMachineInstance if they fail, which will allow higher level controllers, like VirtualMachine or VirtualMachineInstanceReplicaSet to spawn new instances, which will hopefully be responsive again.
Readiness Probes are an indicator for Services and Endpoints if the VirtualMachineInstance is ready to receive traffic from Services. If Readiness Probes fail, the VirtualMachineInstance will be removed from the Endpoints which back services until the probe recovers.
Watchdogs focus on ensuring that an Operating System is still responsive. They complement the probes which are more workload centric. Watchdogs require kernel support from the guest and additional tooling like the commonly used watchdog
binary.
Exec probes are Liveness or Readiness probes specifically intended for VMs. These probes run a command inside the VM and determine the VM ready/live state based on its success. For running commands inside the VMs, the qemu-guest-agent package is used. A command supplied to an exec probe will be wrapped by virt-probe
in the operator and forwarded to the guest.
The following VirtualMachineInstance configures an HTTP Liveness Probe via spec.livenessProbe.httpGet
, which will query port 1500 of the VirtualMachineInstance, after an initial delay of 120 seconds. The VirtualMachineInstance itself installs and runs a minimal HTTP server on port 1500 via cloud-init.
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n labels:\n kubevirt.io/vm: vmi-fedora-vmi\n name: vmi-fedora\nspec:\n template:\n metadata:\n labels:\n kubevirt.io/domain: vmi-fedora\n kubevirt.io/vm: vmi-fedora\n domain:\n devices:\n disks:\n - disk:\n bus: virtio\n name: containerdisk\n - disk:\n bus: virtio\n name: cloudinitdisk\n rng: {}\n resources:\n requests:\n memory: 1024M\n livenessProbe:\n initialDelaySeconds: 120\n periodSeconds: 20\n httpGet:\n port: 1500\n timeoutSeconds: 10\n terminationGracePeriodSeconds: 0\n volumes:\n - name: containerdisk\n containerDisk:\n image: quay.io/containerdisks/fedora:latest\n - cloudInitNoCloud:\n userData: |-\n #cloud-config\n password: fedora\n user: fedora\n chpasswd: { expire: False }\n bootcmd:\n - [\"sudo\", \"dnf\", \"install\", \"-y\", \"nmap-ncat\"]\n - [\"sudo\", \"systemd-run\", \"--unit=httpserver\", \"nc\", \"-klp\", \"1500\", \"-e\", '/usr/bin/echo -e HTTP/1.1 200 OK\\\\nContent-Length: 12\\\\n\\\\nHello World!']\n name: cloudinitdisk\n
"},{"location":"user_workloads/liveness_and_readiness_probes/#define-a-tcp-liveness-probe","title":"Define a TCP Liveness Probe","text":"The following VirtualMachineInstance configures a TCP Liveness Probe via spec.livenessProbe.tcpSocket
, which will query port 1500 of the VirtualMachineInstance, after an initial delay of 120 seconds. The VirtualMachineInstance itself installs and runs a minimal HTTP server on port 1500 via cloud-init.
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n labels:\n kubevirt.io/vm: vmi-fedora-vmi\n name: vmi-fedora\nspec:\n template:\n metadata:\n labels:\n kubevirt.io/domain: vmi-fedora\n kubevirt.io/vm: vmi-fedora\n domain:\n devices:\n disks:\n - disk:\n bus: virtio\n name: containerdisk\n - disk:\n bus: virtio\n name: cloudinitdisk\n rng: {}\n resources:\n requests:\n memory: 1024M\n livenessProbe:\n initialDelaySeconds: 120\n periodSeconds: 20\n tcpSocket:\n port: 1500\n timeoutSeconds: 10\n terminationGracePeriodSeconds: 0\n volumes:\n - name: containerdisk\n containerDisk:\n image: quay.io/containerdisks/fedora:latest\n - cloudInitNoCloud:\n userData: |-\n #cloud-config\n password: fedora\n user: fedora\n chpasswd: { expire: False }\n bootcmd:\n - [\"sudo\", \"dnf\", \"install\", \"-y\", \"nmap-ncat\"]\n - [\"sudo\", \"systemd-run\", \"--unit=httpserver\", \"nc\", \"-klp\", \"1500\", \"-e\", '/usr/bin/echo -e HTTP/1.1 200 OK\\\\nContent-Length: 12\\\\n\\\\nHello World!']\n name: cloudinitdisk\n
"},{"location":"user_workloads/liveness_and_readiness_probes/#define-readiness-probes","title":"Define Readiness Probes","text":"Readiness Probes are configured in a similar way like liveness probes. Instead of spec.livenessProbe
, spec.readinessProbe
needs to be filled:
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n labels:\n kubevirt.io/vm: vmi-fedora-vmi\n name: vmi-fedora\nspec:\n template:\n metadata:\n labels:\n kubevirt.io/domain: vmi-fedora\n kubevirt.io/vm: vmi-fedora\n domain:\n devices:\n disks:\n - disk:\n bus: virtio\n name: containerdisk\n - disk:\n bus: virtio\n name: cloudinitdisk\n rng: {}\n resources:\n requests:\n memory: 1024M\n readinessProbe:\n initialDelaySeconds: 120\n periodSeconds: 20\n timeoutSeconds: 10\n failureThreshold: 3\n successThreshold: 3\n httpGet:\n port: 1500\n terminationGracePeriodSeconds: 0\n volumes:\n - name: containerdisk\n containerDisk:\n image: quay.io/containerdisks/fedora:latest\n - cloudInitNoCloud:\n userData: |-\n #cloud-config\n password: fedora\n user: fedora\n chpasswd: { expire: False }\n bootcmd:\n - [\"sudo\", \"dnf\", \"install\", \"-y\", \"nmap-ncat\"]\n - [\"sudo\", \"systemd-run\", \"--unit=httpserver\", \"nc\", \"-klp\", \"1500\", \"-e\", '/usr/bin/echo -e HTTP/1.1 200 OK\\\\nContent-Length: 12\\\\n\\\\nHello World!']\n name: cloudinitdisk\n
Note that in the case of Readiness Probes, it is also possible to set a failureThreshold
and a successThreshold
to only flip between ready and non-ready state if the probe succeeded or failed multiple times.
A dual-stack network configuration imposes limitations on readiness and liveness probes. Dual-stack is currently only available with the masquerade binding type, and a VM using the masquerade binding type is accessed via the pod IP address; in dual-stack mode, both IPv4 and IPv6 addresses can be used to reach the VM.
Dual-stack networking configurations have a limitation when using HTTP / TCP probes: you cannot probe the VMI by its IPv6 address. The reason is that the host
field of both the HTTP and TCP probe actions defaults to the pod's IP address, which is currently always the IPv4 address.
Since the pod's IP address is not known before creating the VMI, it is not possible to pre-provision the probe's host field.
"},{"location":"user_workloads/liveness_and_readiness_probes/#defining-a-watchdog","title":"Defining a Watchdog","text":"A watchdog is a more VM-centric approach that focuses on the responsiveness of the operating system. One can configure the i6300esb
watchdog device:
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n labels:\n special: vmi-with-watchdog\n name: vmi-with-watchdog\nspec:\n domain:\n devices:\n watchdog:\n name: mywatchdog\n i6300esb:\n action: \"poweroff\"\n disks:\n - disk:\n bus: virtio\n name: containerdisk\n - disk:\n bus: virtio\n name: cloudinitdisk\n machine:\n type: \"\"\n resources:\n requests:\n memory: 1024M\n terminationGracePeriodSeconds: 0\n volumes:\n - containerDisk:\n image: quay.io/containerdisks/fedora:latest\n name: containerdisk\n - cloudInitNoCloud:\n userData: |-\n #cloud-config\n password: fedora\n user: fedora\n chpasswd: { expire: False }\n bootcmd:\n - [\"sudo\", \"dnf\", \"install\", \"-y\", \"busybox\"]\n name: cloudinitdisk\n
The example above configures it with the poweroff
action. It defines what will happen if the OS can't respond anymore. Other possible actions are reset
and shutdown
. The VM in this example will have the device exposed as /dev/watchdog
. This device can then be used by the watchdog
binary. For example, if root executes this command inside the VM:
sudo busybox watchdog -t 2000ms -T 4000ms /dev/watchdog\n
the watchdog will send a heartbeat every two seconds to /dev/watchdog
and after four seconds without a heartbeat the defined action will be executed, in this case a hard poweroff
.
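As a minimal sketch, switching to one of the other actions only requires changing the action field of the watchdog device. The fragment below reuses the device name mywatchdog from the example above and is not a complete manifest:

```yaml
# Fragment of a VirtualMachineInstance spec, not a full object.
spec:
  domain:
    devices:
      watchdog:
        name: mywatchdog
        i6300esb:
          # One of: poweroff, reset, shutdown
          action: reset
```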
Guest-Agent probes are based on qemu-guest-agent guest-ping
. This pings the guest and returns an error if the guest is not up and running. To define this in the VM spec, specify guestAgentPing: {}
in the VM's spec.template.spec.readinessProbe
. virt-controller
will translate this into a corresponding command wrapped by virt-probe
.
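A guest-agent ping readiness probe could look roughly like the fragment below. This is a sketch rather than a complete manifest, and the delay, period and threshold values are illustrative assumptions that should be tuned to how long the guest and its agent take to start:

```yaml
# Fragment of a VirtualMachine manifest, not a full object.
spec:
  template:
    spec:
      readinessProbe:
        # Translated by virt-controller into a guest-ping command
        guestAgentPing: {}
        # Illustrative values, chosen only for this sketch
        initialDelaySeconds: 120
        periodSeconds: 10
        failureThreshold: 10
```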
Note: You can define only one type of probe, i.e. either a guest-agent exec probe or a guest-agent ping probe.
Important: If the qemu-guest-agent is not installed and enabled inside the VM, the probe will fail. Many images don't enable the agent by default so make sure you either run one that does or enable it.
Make sure to provide enough delay and failureThreshold for the VM and the agent to be online.
In the following example the Fedora image does have qemu-guest-agent available by default. Nevertheless, in case qemu-guest-agent is not installed, it will be installed and enabled via cloud-init as shown in the example below. Also, cloud-init assigns the proper SELinux context, i.e. virt_qemu_ga_exec_t, to the /tmp/healthy.txt
file. Otherwise, SELinux will deny the attempts to open the /tmp/healthy.txt
file causing the probe to fail.
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n labels:\n kubevirt.io/vm: vmi-guest-probe-vmi\n name: vmi-fedora\nspec:\n template:\n metadata:\n labels:\n kubevirt.io/domain: vmi-guest-probe\n kubevirt.io/vm: vmi-guest-probe\n domain:\n devices:\n disks:\n - disk:\n bus: virtio\n name: containerdisk\n - disk:\n bus: virtio\n name: cloudinitdisk\n rng: {}\n resources:\n requests:\n memory: 1024M\n readinessProbe:\n exec:\n command: [\"cat\", \"/tmp/healthy.txt\"]\n failureThreshold: 10\n initialDelaySeconds: 20\n periodSeconds: 10\n timeoutSeconds: 5\n terminationGracePeriodSeconds: 180\n volumes:\n - name: containerdisk\n containerDisk:\n image: quay.io/containerdisks/fedora\n - cloudInitNoCloud:\n userData: |-\n #cloud-config\n password: fedora\n user: fedora\n chpasswd: { expire: False }\n packages:\n qemu-guest-agent\n runcmd:\n - [\"touch\", \"/tmp/healthy.txt\"]\n - [\"sudo\", \"chcon\", \"-t\", \"virt_qemu_ga_exec_t\", \"/tmp/healthy.txt\"]\n - [\"sudo\", \"systemctl\", \"enable\", \"--now\", \"qemu-guest-agent\"]\n name: cloudinitdisk\n
Note that if SELinux is not installed in your container disk image, the command chcon
should be removed from the VM manifest shown above. Otherwise, the chcon
command will fail.
The .status.ready
field will switch to true
indicating that probes are returning successfully:
kubectl wait vmis/vmi-fedora --for=condition=Ready --timeout=5m\n
Additionally, the following command can be used inside the VM to watch the incoming qemu-ga commands:
journalctl _COMM=qemu-ga --follow \n
"},{"location":"user_workloads/pool/","title":"VirtualMachinePool","text":"A VirtualMachinePool tries to ensure that a specified number of VirtualMachine replicas and their respective VirtualMachineInstances are in the ready state at any time. In other words, a VirtualMachinePool makes sure that a VirtualMachine or a set of VirtualMachines is always up and ready.
No state is kept and no guarantees are made about the maximum number of VirtualMachineInstance replicas running at any time. For example, the VirtualMachinePool may decide to create new replicas if possibly still running VMs are entering an unknown state.
"},{"location":"user_workloads/pool/#using-virtualmachinepool","title":"Using VirtualMachinePool","text":"The VirtualMachinePool allows us to specify a VirtualMachineTemplate in spec.virtualMachineTemplate
. It consists of ObjectMetadata
in spec.virtualMachineTemplate.metadata
, and a VirtualMachineSpec
in spec.virtualMachineTemplate.spec
. The specification of the virtual machine is equal to the specification of the virtual machine in the VirtualMachine
workload.
spec.replicas
can be used to specify how many replicas are wanted. If unspecified, the default value is 1. This value can be updated anytime. The controller will react to the changes.
spec.selector
is used by the controller to keep track of managed virtual machines. The selector specified there must be able to match the virtual machine labels as specified in spec.virtualMachineTemplate.metadata.labels
. If the selector does not match these labels, or they are empty, the controller will simply do nothing except log an error. The user is responsible for avoiding the creation of other virtual machines or VirtualMachinePools which may conflict with the selector and the template labels.
VirtualMachinePool is part of the KubeVirt API pool.kubevirt.io/v1alpha1
.
The example below shows how to create a simple VirtualMachinePool
:
apiVersion: pool.kubevirt.io/v1alpha1\nkind: VirtualMachinePool\nmetadata:\n name: vm-pool-cirros\nspec:\n replicas: 3\n selector:\n matchLabels:\n kubevirt.io/vmpool: vm-pool-cirros\n virtualMachineTemplate:\n metadata:\n creationTimestamp: null\n labels:\n kubevirt.io/vmpool: vm-pool-cirros\n spec:\n running: true\n template:\n metadata:\n creationTimestamp: null\n labels:\n kubevirt.io/vmpool: vm-pool-cirros\n spec:\n domain:\n devices:\n disks:\n - disk:\n bus: virtio\n name: containerdisk\n resources:\n requests:\n memory: 128Mi\n terminationGracePeriodSeconds: 0\n volumes:\n - containerDisk:\n image: kubevirt/cirros-container-disk-demo:latest\n name: containerdisk \n
Saving this manifest into vm-pool-cirros.yaml
and submitting it to Kubernetes will create three virtual machines based on the template.
$ kubectl create -f vm-pool-cirros.yaml\nvirtualmachinepool.pool.kubevirt.io/vm-pool-cirros created\n$ kubectl describe vmpool vm-pool-cirros\nName: vm-pool-cirros\nNamespace: default\nLabels: <none>\nAnnotations: <none>\nAPI Version: pool.kubevirt.io/v1alpha1\nKind: VirtualMachinePool\nMetadata:\n Creation Timestamp: 2023-02-09T18:30:08Z\n Generation: 1\n Manager: kubectl-create\n Operation: Update\n Time: 2023-02-09T18:30:08Z\n API Version: pool.kubevirt.io/v1alpha1\n Fields Type: FieldsV1\n fieldsV1:\n f:status:\n .:\n f:labelSelector:\n f:readyReplicas:\n f:replicas:\n Manager: virt-controller\n Operation: Update\n Subresource: status\n Time: 2023-02-09T18:30:44Z\n Resource Version: 6606\n UID: ba51daf4-f99f-433c-89e5-93f39bc9989d\nSpec:\n Replicas: 3\n Selector:\n Match Labels:\n kubevirt.io/vmpool: vm-pool-cirros\n Virtual Machine Template:\n Metadata:\n Creation Timestamp: <nil>\n Labels:\n kubevirt.io/vmpool: vm-pool-cirros\n Spec:\n Running: true\n Template:\n Metadata:\n Creation Timestamp: <nil>\n Labels:\n kubevirt.io/vmpool: vm-pool-cirros\n Spec:\n Domain:\n Devices:\n Disks:\n Disk:\n Bus: virtio\n Name: containerdisk\n Resources:\n Requests:\n Memory: 128Mi\n Termination Grace Period Seconds: 0\n Volumes:\n Container Disk:\n Image: kubevirt/cirros-container-disk-demo:latest\n Name: containerdisk\nStatus:\n Label Selector: kubevirt.io/vmpool=vm-pool-cirros\n Ready Replicas: 2\n Replicas: 3\nEvents:\n Type Reason Age From Message\n ---- ------ ---- ---- -------\n Normal SuccessfulCreate 17s virtualmachinepool-controller Created VM default/vm-pool-cirros-0\n Normal SuccessfulCreate 17s virtualmachinepool-controller Created VM default/vm-pool-cirros-2\n Normal SuccessfulCreate 17s virtualmachinepool-controller Created VM default/vm-pool-cirros-1\n
Replicas
is 3
and Ready Replicas
is 2
. This means that when the status was shown, three VirtualMachines had already been created, but only two were running and ready.
Note: This requires KubeVirt 0.59 or newer.
The VirtualMachinePool
supports the scale
subresource. As a consequence it is possible to scale it via kubectl
:
$ kubectl scale vmpool vm-pool-cirros --replicas 5\n
"},{"location":"user_workloads/pool/#removing-a-virtualmachine-from-virtualmachinepool","title":"Removing a VirtualMachine from VirtualMachinePool","text":"It is also possible to remove a VirtualMachine
from its VirtualMachinePool
.
In this scenario, the ownerReferences
field needs to be removed from the VirtualMachine
. This can be achieved either by using kubectl edit
or kubectl patch
. Using kubectl patch
it would look like:
kubectl patch vm vm-pool-cirros-0 --type merge --patch '{\"metadata\":{\"ownerReferences\":null}}'\n
Note: You may want to update your VirtualMachine labels as well to avoid impact on selectors.
"},{"location":"user_workloads/pool/#using-the-horizontal-pod-autoscaler","title":"Using the Horizontal Pod Autoscaler","text":"Note: This requires KubeVirt 0.59 or newer.
The HorizontalPodAutoscaler (HPA) can be used with a VirtualMachinePool
. Simply reference it in the spec of the autoscaler:
apiVersion: autoscaling/v1\nkind: HorizontalPodAutoscaler\nmetadata:\n creationTimestamp: null\n name: vm-pool-cirros\nspec:\n maxReplicas: 10\n minReplicas: 3\n scaleTargetRef:\n apiVersion: pool.kubevirt.io/v1alpha1\n kind: VirtualMachinePool\n name: vm-pool-cirros\n targetCPUUtilizationPercentage: 50\n
or use kubectl autoscale
to define the HPA via the command line:
$ kubectl autoscale vmpool vm-pool-cirros --min=3 --max=10 --cpu-percent=50\n
"},{"location":"user_workloads/pool/#exposing-a-virtualmachinepool-as-a-service","title":"Exposing a VirtualMachinePool as a Service","text":"A VirtualMachinePool may be exposed as a service. When this is done, one of the VirtualMachine replicas will be picked for the actual delivery of the service.
For example, exposing SSH port (22) as a ClusterIP service:
apiVersion: v1\nkind: Service\nmetadata:\n name: vm-pool-cirros-ssh\nspec:\n type: ClusterIP\n selector:\n kubevirt.io/vmpool: vm-pool-cirros\n ports:\n - protocol: TCP\n port: 2222\n targetPort: 22\n
Saving this manifest into vm-pool-cirros-ssh.yaml
and submitting it to Kubernetes will create the ClusterIP
service listening on port 2222 and forwarding to port 22. See Service Objects for more details.
"},{"location":"user_workloads/pool/#using-persistent-storage","title":"Using Persistent Storage","text":"Note: DataVolumes are part of CDI
Usage of DataVolumeTemplates
within a spec.virtualMachineTemplate.spec
will result in the creation of unique persistent storage for each VM within a VMPool. The DataVolumeTemplate
name will have the VM's sequential postfix appended to it when the VM is created from the spec.virtualMachineTemplate.spec.dataVolumeTemplates
. This makes each VM a completely unique stateful workload.
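As a rough sketch, a pool whose VMs each get their own disk might declare a DataVolumeTemplate as shown below. The names data-dv and datadisk, the 1Gi size and the blank source are illustrative assumptions that do not come from the text above, and the exact DataVolume fields depend on the installed CDI version:

```yaml
# Fragment of a VirtualMachinePool manifest, not a full object.
spec:
  virtualMachineTemplate:
    spec:
      dataVolumeTemplates:
      - metadata:
          # Hypothetical name; the VM's sequential postfix is appended per VM
          name: data-dv
        spec:
          source:
            blank: {}
          storage:
            resources:
              requests:
                storage: 1Gi
      template:
        spec:
          volumes:
          - name: datadisk
            dataVolume:
              name: data-dv
```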
By default, any secret or configMap references in a spec.virtualMachineTemplate.spec.template
Volume section will be used directly as is, without any modification to the naming. This means that if you specify a secret in a CloudInitNoCloud
volume, every VM instance spawned from the VirtualMachinePool with this volume will get the exact same secret used for their cloud-init user data.
This default behavior can be modified by setting the AppendPostfixToSecretReferences
and AppendPostfixToConfigMapReferences
booleans to true on the VMPool spec. When these booleans are enabled, references to secret and configMap names will have the VM's sequential postfix appended to the secret and configmap name. This allows someone to pre-generate unique per VM secret
and configMap
data for a VirtualMachinePool ahead of time in a way that will be predictably assigned to VMs within the VirtualMachinePool.
FEATURE STATE:
VirtualMachineInstancePresets
are deprecated as of the v0.57.0
release and will be removed in a future release. VirtualMachineInstancePresets
are an extension to general VirtualMachineInstance
configuration behaving much like PodPresets
from Kubernetes. When a VirtualMachineInstance
is created, any applicable VirtualMachineInstancePresets
will be applied to the existing spec for the VirtualMachineInstance
. This allows for re-use of common settings that should apply to multiple VirtualMachineInstances
.
You can describe a VirtualMachineInstancePreset
in a YAML file. For example, the vmi-preset.yaml
file below describes a VirtualMachineInstancePreset
that requests a VirtualMachineInstance
be created with a resource request for 64M of RAM.
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstancePreset\nmetadata:\n name: small-qemu\nspec:\n selector:\n matchLabels:\n kubevirt.io/size: small\n domain:\n resources:\n requests:\n memory: 64M\n
Create a VirtualMachineInstancePreset
based on that YAML file: kubectl create -f vmi-preset.yaml\n
"},{"location":"user_workloads/presets/#required-fields","title":"Required Fields","text":"As with most Kubernetes resources, a VirtualMachineInstancePreset
requires apiVersion
, kind
and metadata
fields.
Additionally, VirtualMachineInstancePresets
need a spec
section. While not technically required to satisfy syntax, it is strongly recommended to include a Selector
in the spec
section, otherwise a VirtualMachineInstancePreset
will match all VirtualMachineInstances
in a namespace.
KubeVirt uses Kubernetes Labels
and Selectors
to determine which VirtualMachineInstancePresets
apply to a given VirtualMachineInstance
, similarly to how PodPresets
work in Kubernetes. If a setting from a VirtualMachineInstancePreset
is applied to a VirtualMachineInstance
, the VirtualMachineInstance
will be marked with an Annotation upon completion.
Any domain structure can be listed in the spec
of a VirtualMachineInstancePreset
, e.g. Clock, Features, Memory, CPU, or Devices such as network interfaces. All elements of the spec
section of a VirtualMachineInstancePreset
will be applied to the VirtualMachineInstance
.
Once a VirtualMachineInstancePreset
is successfully applied to a VirtualMachineInstance
, the VirtualMachineInstance
will be marked with an annotation to indicate that it was applied. If a conflict occurs while a VirtualMachineInstancePreset
is being applied, that portion of the VirtualMachineInstancePreset
will be skipped.
Any valid Label
can be matched against, but as a general rule of thumb it is suggested to use os/shortname, e.g. kubevirt.io/os: rhel7
.
If a VirtualMachineInstancePreset
is modified, changes will not be applied to existing VirtualMachineInstances
. This applies to both the Selector
indicating which VirtualMachineInstances
should be matched, and also the Domain
section which lists the settings that should be applied to a VirtualMachineInstance
.
VirtualMachineInstancePresets
use a similar conflict resolution strategy to Kubernetes PodPresets
. If a portion of the domain spec is present in both a VirtualMachineInstance
and a VirtualMachineInstancePreset
and both resources have the identical information, then creation of the VirtualMachineInstance
will continue normally. If however there is a difference between the resources, an Event will be created indicating which DomainSpec
element of which VirtualMachineInstancePreset
was overridden. For example: If both the VirtualMachineInstance
and VirtualMachineInstancePreset
define a CPU
, but use a different number of Cores
, KubeVirt will note the difference.
If any settings from the VirtualMachineInstancePreset
were successfully applied, the VirtualMachineInstance
will be annotated.
In the event that there is a difference between the Domains
of a VirtualMachineInstance
and VirtualMachineInstancePreset
, KubeVirt will create an Event
. kubectl get events
can be used to show all Events
. For example:
$ kubectl get events\n ....\n Events:\n FirstSeen LastSeen Count From SubobjectPath Reason Message\n 2m 2m 1 myvmi.1515bbb8d397f258 VirtualMachineInstance Warning Conflict virtualmachineinstance-preset-controller Unable to apply VirtualMachineInstancePreset 'example-preset': spec.cpu: &{6} != &{4}\n
"},{"location":"user_workloads/presets/#usage","title":"Usage","text":"VirtualMachineInstancePresets
are namespaced resources, so should be created in the same namespace as the VirtualMachineInstances
that will use them:
kubectl create -f <preset>.yaml [--namespace <namespace>]
KubeVirt will determine which VirtualMachineInstancePresets
apply to a particular VirtualMachineInstance
by matching Labels
. For example:
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstancePreset\nmetadata:\n name: example-preset\nspec:\n selector:\n matchLabels:\n kubevirt.io/os: win10\n ...\n
would match any VirtualMachineInstance
in the same namespace with a Label
of kubevirt.io/os: win10
. For example:
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nversion: v1\nmetadata:\n name: myvmi\n labels:\n kubevirt.io/os: win10\n ...\n
"},{"location":"user_workloads/presets/#conflicts","title":"Conflicts","text":"When multiple VirtualMachineInstancePresets
match a particular VirtualMachineInstance
, if they specify the same settings within a Domain, those settings must match. If two VirtualMachineInstancePresets
have conflicting settings (e.g. for the number of CPU cores requested), an error will occur: the VirtualMachineInstance
will enter the Failed
state, and a Warning
event will be emitted explaining which settings of which VirtualMachineInstancePresets
were problematic.
VirtualMachineInstances
","text":"The main use case for VirtualMachineInstancePresets
is to create re-usable settings that can be applied across various machines. Multiple methods are available to match the labels of a VirtualMachineInstance
using selectors.
matchLabels: Each VirtualMachineInstance
can use a specific label shared by all instances.
matchExpressions: Logical operators for sets can be used to match multiple labels.
Using matchLabels, the label used in the VirtualMachineInstancePreset
must match one of the labels of the VirtualMachineInstance
:
selector:\n matchLabels:\n kubevirt.io/memory: large\n
would match
metadata:\n labels:\n kubevirt.io/memory: large\n kubevirt.io/os: win10\n
or
metadata:\n labels:\n kubevirt.io/memory: large\n kubevirt.io/os: fedora27\n
Using matchExpressions allows for matching multiple labels of VirtualMachineInstances
without needing to explicitly list a label.
selector:\n matchExpressions:\n - {key: kubevirt.io/os, operator: In, values: [fedora27, fedora26]}\n
would match both:
metadata:\n labels:\n kubevirt.io/os: fedora26\n\nmetadata:\n labels:\n kubevirt.io/os: fedora27\n
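Besides In, the standard Kubernetes label selector operators NotIn, Exists and DoesNotExist can also be used, and all listed expressions must hold for a match. The sketch below is an illustrative selector fragment only; the label keys and values are assumptions chosen to mirror the examples above:

```yaml
# Selector fragment, not a full VirtualMachineInstancePreset.
selector:
  matchExpressions:
  # The os label, if present, must not be win7
  - {key: kubevirt.io/os, operator: NotIn, values: [win7]}
  # The memory label must be present, with any value
  - {key: kubevirt.io/memory, operator: Exists}
```

With this selector, an instance labeled kubevirt.io/os: fedora27 and kubevirt.io/memory: large would match, while one without a kubevirt.io/memory label would not.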
The Kubernetes documentation has a detailed explanation. Examples are provided below.
"},{"location":"user_workloads/presets/#exclusions","title":"Exclusions","text":"Since VirtualMachineInstancePresets
use Selectors
that indicate which VirtualMachineInstances
their settings should apply to, there needs to exist a mechanism by which VirtualMachineInstances
can opt out of VirtualMachineInstancePresets
altogether. This is done using an annotation:
kind: VirtualMachineInstance\nversion: v1\nmetadata:\n name: myvmi\n annotations:\n virtualmachineinstancepresets.admission.kubevirt.io/exclude: \"true\"\n ...\n
"},{"location":"user_workloads/presets/#examples","title":"Examples","text":""},{"location":"user_workloads/presets/#simple-virtualmachineinstancepreset-example","title":"Simple VirtualMachineInstancePreset
Example","text":"apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstancePreset\nversion: v1\nmetadata:\n name: example-preset\nspec:\n selector:\n matchLabels:\n kubevirt.io/os: win10\n domain:\n features:\n acpi: {}\n apic: {}\n hyperv:\n relaxed: {}\n vapic: {}\n spinlocks:\n spinlocks: 8191\n---\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nversion: v1\nmetadata:\n name: myvmi\n labels:\n kubevirt.io/os: win10\nspec:\n domain:\n firmware:\n uuid: c8f99fc8-20f5-46c4-85e5-2b841c547cef\n
Once the VirtualMachineInstancePreset
is applied to the VirtualMachineInstance
, the resulting resource would look like this:
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n annotations:\n presets.virtualmachineinstances.kubevirt.io/presets-applied: kubevirt.io/v1\n virtualmachineinstancepreset.kubevirt.io/example-preset: kubevirt.io/v1\n labels:\n kubevirt.io/os: win10\n kubevirt.io/nodeName: master\n name: myvmi\n namespace: default\nspec:\n domain:\n devices: {}\n features:\n acpi:\n enabled: true\n apic:\n enabled: true\n hyperv:\n relaxed:\n enabled: true\n spinlocks:\n enabled: true\n spinlocks: 8191\n vapic:\n enabled: true\n firmware:\n uuid: c8f99fc8-20f5-46c4-85e5-2b841c547cef\n machine:\n type: q35\n resources:\n requests:\n memory: 8Mi\n
"},{"location":"user_workloads/presets/#conflict-example","title":"Conflict Example","text":"This is an example of a merge conflict. In this case both the VirtualMachineInstance
and VirtualMachineInstancePreset
request a different number of CPUs.
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstancePreset\nversion: v1\nmetadata:\n name: example-preset\nspec:\n selector:\n matchLabels:\n kubevirt.io/flavor: default-features\n domain:\n cpu:\n cores: 4\n---\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nversion: v1\nmetadata:\n name: myvmi\n labels:\n kubevirt.io/flavor: default-features\nspec:\n domain:\n cpu:\n cores: 6\n
In this case the VirtualMachineInstance
Spec will remain unmodified. Use kubectl get events
to show events.
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n annotations:\n presets.virtualmachineinstances.kubevirt.io/presets-applied: kubevirt.io/v1\n generation: 0\n labels:\n kubevirt.io/flavor: default-features\n name: myvmi\n namespace: default\nspec:\n domain:\n cpu:\n cores: 6\n devices: {}\n machine:\n type: \"\"\n resources: {}\nstatus: {}\n
Calling kubectl get events
would show a line like:
2m 2m 1 myvmi.1515bbb8d397f258 VirtualMachineInstance Warning Conflict virtualmachineinstance-preset-controller Unable to apply VirtualMachineInstancePreset example-preset: spec.cpu: &{6} != &{4}\n
"},{"location":"user_workloads/presets/#matching-multiple-virtualmachineinstances-using-matchlabels","title":"Matching Multiple VirtualMachineInstances Using MatchLabels","text":"These VirtualMachineInstances
have multiple labels, one that is unique and one that is shared.
Note: This example breaks from the convention of using os-shortname as a Label
for demonstration purposes.
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstancePreset\nmetadata:\n name: twelve-cores\nspec:\n selector:\n matchLabels:\n kubevirt.io/cpu: dodecacore\n domain:\n cpu:\n cores: 12\n---\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n name: windows-10\n labels:\n kubevirt.io/os: win10\n kubevirt.io/cpu: dodecacore\nspec:\n---\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n name: windows-7\n labels:\n kubevirt.io/os: win7\n kubevirt.io/cpu: dodecacore\nspec:\n terminationGracePeriodSeconds: 0\n
Adding this VirtualMachineInstancePreset
and these VirtualMachineInstances
will result in:
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n annotations:\n presets.virtualmachineinstances.kubevirt.io/presets-applied: kubevirt.io/v1\n virtualmachineinstancepreset.kubevirt.io/twelve-cores: kubevirt.io/v1\n labels:\n kubevirt.io/cpu: dodecacore\n kubevirt.io/os: win10\n name: windows-10\nspec:\n domain:\n cpu:\n cores: 12\n devices: {}\n resources:\n requests:\n memory: 4Gi\n---\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n annotations:\n presets.virtualmachineinstances.kubevirt.io/presets-applied: kubevirt.io/v1\n virtualmachineinstancepreset.kubevirt.io/twelve-cores: kubevirt.io/v1\n labels:\n kubevirt.io/cpu: dodecacore\n kubevirt.io/os: win7\n name: windows-7\nspec:\n domain:\n cpu:\n cores: 12\n devices: {}\n resources:\n requests:\n memory: 4Gi\n terminationGracePeriodSeconds: 0\n
"},{"location":"user_workloads/presets/#matching-multiple-virtualmachineinstances-using-matchexpressions","title":"Matching Multiple VirtualMachineInstances Using MatchExpressions","text":"This VirtualMachineInstancePreset
has a matchExpression that will match two labels: kubevirt.io/os: win10
and kubevirt.io/os: win7
.
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstancePreset\nmetadata:\n name: windows-vmis\nspec:\n selector:\n matchExpressions:\n - {key: kubevirt.io/os, operator: In, values: [win10, win7]}\n domain:\n resources:\n requests:\n memory: 128M\n---\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n name: smallvmi\n labels:\n kubevirt.io/os: win10\nspec:\n terminationGracePeriodSeconds: 60\n---\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n name: largevmi\n labels:\n kubevirt.io/os: win7\nspec:\n terminationGracePeriodSeconds: 120\n
Applying the preset to both VMs will result in:
apiVersion: v1\nitems:\n- apiVersion: kubevirt.io/v1\n kind: VirtualMachineInstance\n metadata:\n annotations:\n presets.virtualmachineinstances.kubevirt.io/presets-applied: kubevirt.io/v1\n virtualmachineinstancepreset.kubevirt.io/windows-vmis: kubevirt.io/v1\n labels:\n kubevirt.io/os: win7\n name: largevmi\n spec:\n domain:\n resources:\n requests:\n memory: 128M\n terminationGracePeriodSeconds: 120\n- apiVersion: kubevirt.io/v1\n kind: VirtualMachineInstance\n metadata:\n annotations:\n presets.virtualmachineinstances.kubevirt.io/presets-applied: kubevirt.io/v1\n virtualmachineinstancepreset.kubevirt.io/windows-vmis: kubevirt.io/v1\n labels:\n kubevirt.io/os: win10\n name: smallvmi\n spec:\n domain:\n resources:\n requests:\n memory: 128M\n terminationGracePeriodSeconds: 60\n
"},{"location":"user_workloads/replicaset/","title":"VirtualMachineInstanceReplicaSet","text":"A VirtualMachineInstanceReplicaSet tries to ensure that a specified number of VirtualMachineInstance replicas are running at any time. In other words, a VirtualMachineInstanceReplicaSet makes sure that a VirtualMachineInstance or a homogeneous set of VirtualMachineInstances is always up and ready. It is very similar to a Kubernetes ReplicaSet.
No state is kept and no guarantees are given about the maximum number of VirtualMachineInstance replicas that are up. For example, the VirtualMachineInstanceReplicaSet may decide to create new replicas if possibly still running VMs are entering an unknown state.
"},{"location":"user_workloads/replicaset/#using-virtualmachineinstancereplicaset","title":"Using VirtualMachineInstanceReplicaSet","text":"The VirtualMachineInstanceReplicaSet allows us to specify a VirtualMachineInstanceTemplate in spec.template
. It consists of ObjectMetadata
in spec.template.metadata
, and a VirtualMachineInstanceSpec
in spec.template.spec
. The specification of the virtual machine is equal to the specification of the virtual machine in the VirtualMachineInstance
workload.
spec.replicas
can be used to specify how many replicas are wanted. If unspecified, the default value is 1. This value can be updated anytime. The controller will react to the changes.
spec.selector
is used by the controller to keep track of managed virtual machines. The selector specified there must be able to match the virtual machine labels as specified in spec.template.metadata.labels
. If the selector does not match these labels, or they are empty, the controller will simply do nothing except log an error. The user is responsible for not creating other virtual machines or VirtualMachineInstanceReplicaSets which conflict with the selector and the template labels.
A VirtualMachineInstanceReplicaSet can be exposed as a service. When this is done, one of the VirtualMachineInstance replicas will be picked for the actual delivery of the service.
For example, exposing SSH port (22) as a ClusterIP service using virtctl on a VirtualMachineInstanceReplicaSet:
$ virtctl expose vmirs vmi-ephemeral --name vmiservice --port 27017 --target-port 22\n
All service exposure options that apply to a VirtualMachineInstance apply to a VirtualMachineInstanceReplicaSet. See Exposing VirtualMachineInstance for more details.
"},{"location":"user_workloads/replicaset/#when-to-use-a-virtualmachineinstancereplicaset","title":"When to use a VirtualMachineInstanceReplicaSet","text":"Note: The base assumption is that referenced disks are read-only or that the VMIs are writing internally to a tmpfs. The most obvious volume sources for VirtualMachineInstanceReplicaSets which KubeVirt supports are referenced below. If other types are used data corruption is possible.
Using VirtualMachineInstanceReplicaSet is the right choice when one wants many identical VMs and does not care about maintaining any disk state after the VMs are terminated.
Volume types which work well in combination with a VirtualMachineInstanceReplicaSet are:
This use-case involves small and fast booting VMs with little provisioning performed during initialization.
In this scenario, migrations are not important. Redistributing VM workloads between Nodes can be achieved simply by deleting managed VirtualMachineInstances which are running on an overloaded Node. The eviction
of such a VirtualMachineInstance can happen by directly deleting the VirtualMachineInstance (KubeVirt-aware workload redistribution) or by deleting the corresponding Pod in which the virtual machine runs (workload redistribution that is only Kubernetes-aware).
In this use-case one has big and slow booting VMs, and complex or resource intensive provisioning is done during boot. More specifically, the timespan between the creation of a new VM and it entering the ready state is long.
In this scenario, one still does not care about the state, but since re-provisioning VMs is expensive, migrations are important. Workload redistribution between Nodes can be achieved by migrating VirtualMachineInstances to different Nodes. A workload redistributor needs to be aware of KubeVirt and create migrations, instead of evicting
VirtualMachineInstances by deletion.
Note: The simplest way to get a migratable ephemeral VirtualMachineInstance is to use local storage based on ContainerDisks
in combination with a file based backing store. However, migratable backing store support has not officially landed yet in KubeVirt and is untested.
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstanceReplicaSet\nmetadata:\n name: testreplicaset\nspec:\n replicas: 3\n selector:\n matchLabels:\n myvmi: myvmi\n template:\n metadata:\n name: test\n labels:\n myvmi: myvmi\n spec:\n domain:\n devices:\n disks:\n - disk:\n name: containerdisk\n resources:\n requests:\n memory: 64M\n volumes:\n - name: containerdisk\n containerDisk:\n image: kubevirt/cirros-container-disk-demo:latest\n
Saving this manifest into testreplicaset.yaml
and submitting it to Kubernetes will create three virtual machines based on the template. $ kubectl create -f testreplicaset.yaml\nvirtualmachineinstancereplicaset \"testreplicaset\" created\n$ kubectl describe vmirs testreplicaset\nName: testreplicaset\nNamespace: default\nLabels: <none>\nAnnotations: <none>\nAPI Version: kubevirt.io/v1\nKind: VirtualMachineInstanceReplicaSet\nMetadata:\n Cluster Name:\n Creation Timestamp: 2018-01-03T12:42:30Z\n Generation: 0\n Resource Version: 6380\n Self Link: /apis/kubevirt.io/v1/namespaces/default/virtualmachineinstancereplicasets/testreplicaset\n UID: 903a9ea0-f083-11e7-9094-525400ee45b0\nSpec:\n Replicas: 3\n Selector:\n Match Labels:\n Myvmi: myvmi\n Template:\n Metadata:\n Creation Timestamp: <nil>\n Labels:\n Myvmi: myvmi\n Name: test\n Spec:\n Domain:\n Devices:\n Disks:\n Disk:\n Name: containerdisk\n Volume Name: containerdisk\n Resources:\n Requests:\n Memory: 64M\n Volumes:\n Name: containerdisk\n Container Disk:\n Image: kubevirt/cirros-container-disk-demo:latest\nStatus:\n Conditions: <nil>\n Ready Replicas: 2\n Replicas: 3\nEvents:\n Type Reason Age From Message\n ---- ------ ---- ---- -------\n Normal SuccessfulCreate 13s virtualmachineinstancereplicaset-controller Created virtual machine: testh8998\n Normal SuccessfulCreate 13s virtualmachineinstancereplicaset-controller Created virtual machine: testf474w\n Normal SuccessfulCreate 13s virtualmachineinstancereplicaset-controller Created virtual machine: test5lvkd\n
Replicas
is 3
and Ready Replicas
is 2
. This means that at the moment when showing the status, three Virtual Machines were already created, but only two are running and ready.
Note: This requires the CustomResourceSubresources
feature gate to be enabled for clusters prior to 1.11.
The VirtualMachineInstanceReplicaSet
supports the scale
subresource. As a consequence it is possible to scale it via kubectl
:
$ kubectl scale vmirs myvmirs --replicas 5\n
"},{"location":"user_workloads/replicaset/#using-the-horizontal-pod-autoscaler","title":"Using the Horizontal Pod Autoscaler","text":"Note: This requires at cluster newer or equal to 1.11.
The HorizontalPodAutoscaler (HPA) can be used with a VirtualMachineInstanceReplicaSet
. Simply reference it in the spec of the autoscaler:
apiVersion: autoscaling/v1\nkind: HorizontalPodAutoscaler\nmetadata:\n name: myhpa\nspec:\n scaleTargetRef:\n kind: VirtualMachineInstanceReplicaSet\n name: vmi-replicaset-cirros\n apiVersion: kubevirt.io/v1\n minReplicas: 3\n maxReplicas: 10\n targetCPUUtilizationPercentage: 50\n
or use kubectl autoscale
to define the HPA via the commandline:
$ kubectl autoscale vmirs vmi-replicaset-cirros --min=3 --max=10\n
"},{"location":"user_workloads/startup_scripts/","title":"Startup Scripts","text":"KubeVirt supports the ability to assign a startup script to a VirtualMachineInstance instance which is executed automatically when the VM initializes.
These scripts are commonly used to automate injection of users and SSH keys into VMs in order to provide remote access to the machine. For example, a startup script can be used to inject credentials into a VM that allows an Ansible job running on a remote host to access and provision the VM.
Startup scripts are not limited to any specific use case though. They can be used to run any arbitrary script in a VM on boot.
"},{"location":"user_workloads/startup_scripts/#cloud-init","title":"Cloud-init","text":"cloud-init is a widely adopted project used for early initialization of a VM. Used by cloud providers such as AWS and GCP, cloud-init has established itself as the defacto method of providing startup scripts to VMs.
Cloud-init documentation can be found here: Cloud-init Documentation.
KubeVirt supports cloud-init's NoCloud and ConfigDrive datasources which involve injecting startup scripts into a VM instance through the use of an ephemeral disk. VMs with the cloud-init package installed will detect the ephemeral disk and execute custom userdata scripts at boot.
"},{"location":"user_workloads/startup_scripts/#ignition","title":"Ignition","text":"Ignition is an alternative to cloud-init which allows for configuring the VM disk on first boot. You can find the Ignition documentation here. You can also find a comparison between cloud-init and Ignition here.
Ignition can be used with KubeVirt by using the cloudInitConfigDrive
volume.
Sysprep is an automation tool for Windows that automates Windows installation, setup, and custom software provisioning.
The general flow is:
Seal the vm image with the Sysprep tool, for example by running:
%WINDIR%\\system32\\sysprep\\sysprep.exe /generalize /shutdown /oobe /mode:vm\n
Note
We need to make sure the base VM does not restart, which can be done by setting the VM run strategy to RerunOnFailure
.
VM runStrategy:
spec:\n runStrategy: RerunOnFailure\n
More information can be found here:
Note
It is important that no answer file is detected when the Sysprep tool is triggered, because Windows Setup searches for answer files at the beginning of each configuration pass and caches them. If that happens, the OS will simply use the cached answer file when it starts, ignoring the one we provide through the Sysprep API. More information can be found here.
Provide an answer file named autounattend.xml
on attached media. The answer file can be provided in a ConfigMap or a Secret under the key autounattend.xml.
The configuration file can be generated with Windows SIM or it can be specified manually according to the information found here:
Note
There are also many easy to find online tools available for creating an answer file.
KubeVirt supports the cloud-init NoCloud and ConfigDrive data sources which involve injecting startup scripts through the use of a disk attached to the VM.
In order to assign a custom userdata script to a VirtualMachineInstance using this method, users must define a disk and a volume for the NoCloud or ConfigDrive datasource in the VirtualMachineInstance's spec.
"},{"location":"user_workloads/startup_scripts/#data-sources","title":"Data Sources","text":"Under most circumstances users should stick to the NoCloud data source as it is the simplest cloud-init data source. Only if NoCloud is not supported by the cloud-init implementation (e.g. coreos-cloudinit) users should switch the data source to ConfigDrive.
Switching the cloud-init data source to ConfigDrive is as easy as changing the volume type in the VirtualMachineInstance's spec from cloudInitNoCloud
to cloudInitConfigDrive
.
NoCloud data source:
volumes:\n - name: cloudinitvolume\n cloudInitNoCloud:\n userData: \"#cloud-config\"\n
ConfigDrive data source:
volumes:\n - name: cloudinitvolume\n cloudInitConfigDrive:\n userData: \"#cloud-config\"\n
When using the ConfigDrive datasource, the networkData
part has to be in the OpenStack Metadata Service Network format:
spec:\n domain:\n interfaces: \n - name: secondary-net\n bridge: {}\n macAddress: '02:26:19:00:00:30'\n model: virtio\n networks: \n - multus:\n networkName: my-ns/my-net\n name: secondary-net\n volumes:\n - name: cloudinitvolume\n cloudInitConfigDrive:\n networkData: |\n {\"links\":[{\"id\":\"enp2s0\",\"type\":\"phy\",\"ethernet_mac_address\":\"02:26:19:00:00:30\"}],\"networks\":[{\"id\":\"NAD1\",\"type\":\"ipv4\",\"link\":\"enp2s0\",\"ip_address\":\"10.184.0.244\",\"netmask\":\"255.255.240.0\",\"routes\":[{\"network\":\"0.0.0.0\",\"netmask\":\"0.0.0.0\",\"gateway\":\"23.253.157.1\"}],\"network_id\":\"\"}],\"services\":[]}\n userData: \"#cloud-config\"\n
Note: The MAC address of the secondary interface should be predefined and identical in the network interface and the cloud-init networkData.
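A quick sanity check along the following lines can catch a MAC mismatch before the manifest is posted. This is only a sketch: the file name `vmi-fragment.yaml`, the reduced manifest fragment, and the grep-based matching are assumptions made for illustration and are not a KubeVirt feature.

```shell
# Hypothetical fragment, reduced to the two places where the MAC appears.
cat << 'END' > vmi-fragment.yaml
macAddress: '02:26:19:00:00:30'
networkData: |
  {"links":[{"id":"enp2s0","type":"phy","ethernet_mac_address":"02:26:19:00:00:30"}]}
END

# Pattern for a MAC address: six colon-separated pairs of hex digits.
mac_re='([0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2}'

# Extract the MAC from the interface definition and from the networkData JSON.
iface_mac=$(grep -o "macAddress: '[0-9a-fA-F:]*'" vmi-fragment.yaml | grep -oE "$mac_re")
data_mac=$(grep -o '"ethernet_mac_address":"[0-9a-fA-F:]*"' vmi-fragment.yaml | grep -oE "$mac_re")

if [ "$iface_mac" = "$data_mac" ]; then
  echo "MAC addresses match: $iface_mac"
else
  echo "MAC mismatch: interface=$iface_mac networkData=$data_mac" >&2
fi
```

For real manifests a YAML- and JSON-aware tool would be more robust than grep; the point here is only that both values have to be identical.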
See the examples below for more complete cloud-init examples.
"},{"location":"user_workloads/startup_scripts/#cloud-init-user-data-as-clear-text","title":"Cloud-init user-data as clear text","text":"In the example below, a SSH key is stored in the cloudInitNoCloud Volume's userData field as clean text. There is a corresponding disks entry that references the cloud-init volume and assigns it to the VM's device.
# Create a VM manifest with the startup script\n# a cloudInitNoCloud volume's userData field.\n\ncat << END > my-vmi.yaml\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n name: myvmi\nspec:\n terminationGracePeriodSeconds: 5\n domain:\n resources:\n requests:\n memory: 64M\n devices:\n disks:\n - name: containerdisk\n disk:\n bus: virtio\n - name: cloudinitdisk\n disk:\n bus: virtio\n volumes:\n - name: containerdisk\n containerDisk:\n image: kubevirt/cirros-container-disk-demo:latest\n - name: cloudinitdisk\n cloudInitNoCloud:\n userData: |\n #cloud-config\n ssh_authorized_keys:\n - ssh-rsa AAAAB3NzaK8L93bWxnyp test@test.com\n\nEND\n\n# Post the Virtual Machine spec to KubeVirt.\n\nkubectl create -f my-vmi.yaml\n
"},{"location":"user_workloads/startup_scripts/#cloud-init-user-data-as-base64-string","title":"Cloud-init user-data as base64 string","text":"In the example below, a simple bash script is base64 encoded and stored in the cloudInitNoCloud Volume's userDataBase64 field. There is a corresponding disks entry that references the cloud-init volume and assigns it to the VM's device.
Users also have the option of storing the startup script in a Kubernetes Secret and referencing the Secret in the VM's spec. Examples further down in the document illustrate how that is done.
# Create a simple startup script\n\ncat << END > startup-script.sh\n#!/bin/bash\necho \"Hi from startup script!\"\nEND\n\n# Create a VM manifest with the startup script base64 encoded into\n# a cloudInitNoCloud volume's userDataBase64 field.\n\ncat << END > my-vmi.yaml\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n name: myvmi\nspec:\n terminationGracePeriodSeconds: 5\n domain:\n resources:\n requests:\n memory: 64M\n devices:\n disks:\n - name: containerdisk\n disk:\n bus: virtio\n - name: cloudinitdisk\n disk:\n bus: virtio\n volumes:\n - name: containerdisk\n containerDisk:\n image: kubevirt/cirros-container-disk-demo:latest\n - name: cloudinitdisk\n cloudInitNoCloud:\n userDataBase64: $(cat startup-script.sh | base64 -w0)\nEND\n\n# Post the Virtual Machine spec to KubeVirt.\n\nkubectl create -f my-vmi.yaml\n
"},{"location":"user_workloads/startup_scripts/#cloud-init-userdata-as-k8s-secret","title":"Cloud-init UserData as k8s Secret","text":"Users who wish to not store the cloud-init userdata directly in the VirtualMachineInstance spec have the option to store the userdata into a Kubernetes Secret and reference that Secret in the spec.
Multiple VirtualMachineInstance specs can reference the same Kubernetes Secret containing cloud-init userdata.
Below is an example of how to create a Kubernetes Secret containing a startup script and reference that Secret in the VM's spec.
# Create a simple startup script\n\ncat << END > startup-script.sh\n#!/bin/bash\necho \"Hi from startup script!\"\nEND\n\n# Store the startup script in a Kubernetes Secret\nkubectl create secret generic my-vmi-secret --from-file=userdata=startup-script.sh\n\n# Create a VM manifest and reference the Secret's name in the cloudInitNoCloud\n# Volume's secretRef field\n\ncat << END > my-vmi.yaml\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n name: myvmi\nspec:\n terminationGracePeriodSeconds: 5\n domain:\n resources:\n requests:\n memory: 64M\n devices:\n disks:\n - name: containerdisk\n disk:\n bus: virtio\n - name: cloudinitdisk\n disk:\n bus: virtio\n volumes:\n - name: containerdisk\n containerDisk:\n image: kubevirt/cirros-registry-disk-demo:latest\n - name: cloudinitdisk\n cloudInitNoCloud:\n secretRef:\n name: my-vmi-secret\nEND\n\n# Post the VM\nkubectl create -f my-vmi.yaml\n
"},{"location":"user_workloads/startup_scripts/#injecting-ssh-keys-with-cloud-inits-cloud-config","title":"Injecting SSH keys with Cloud-init's Cloud-config","text":"In the examples so far, the cloud-init userdata script has been a bash script. Cloud-init has it's own configuration that can handle some common tasks such as user creation and SSH key injection.
More cloud-config examples can be found here: Cloud-init Examples
Below is an example of using cloud-config to inject an SSH key for the default user (fedora in this case) of a Fedora Atomic disk image.
# Create the cloud-init cloud-config userdata.\ncat << END > startup-script\n#cloud-config\npassword: atomic\nchpasswd: { expire: False }\nssh_pwauth: False\nssh_authorized_keys:\n - ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC6zdgFiLr1uAK7PdcchDd+LseA5fEOcxCCt7TLlr7Mx6h8jUg+G+8L9JBNZuDzTZSF0dR7qwzdBBQjorAnZTmY3BhsKcFr8Gt4KMGrS6r3DNmGruP8GORvegdWZuXgASKVpXeI7nCIjRJwAaK1x+eGHwAWO9Z8ohcboHbLyffOoSZDSIuk2kRIc47+ENRjg0T6x2VRsqX27g6j4DfPKQZGk0zvXkZaYtr1e2tZgqTBWqZUloMJK8miQq6MktCKAS4VtPk0k7teQX57OGwD6D7uo4b+Cl8aYAAwhn0hc0C2USfbuVHgq88ESo2/+NwV4SQcl3sxCW21yGIjAGt4Hy7J fedora@localhost.localdomain\nEND\n\n# Create the VM spec\ncat << END > my-vmi.yaml\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n name: sshvmi\nspec:\n terminationGracePeriodSeconds: 0\n domain:\n resources:\n requests:\n memory: 1024M\n devices:\n disks:\n - name: containerdisk\n disk:\n dev: vda\n - name: cloudinitdisk\n disk:\n dev: vdb\n volumes:\n - name: containerdisk\n containerDisk:\n image: kubevirt/fedora-atomic-registry-disk-demo:latest\n - name: cloudinitdisk\n cloudInitNoCloud:\n userDataBase64: $(cat startup-script | base64 -w0)\nEND\n\n# Post the VirtualMachineInstance spec to KubeVirt.\nkubectl create -f my-vmi.yaml\n\n# Connect to VM with passwordless SSH key\nssh -i <insert private key here> fedora@<insert ip here>\n
"},{"location":"user_workloads/startup_scripts/#inject-ssh-key-using-a-custom-shell-script","title":"Inject SSH key using a Custom Shell Script","text":"Depending on the boot image in use, users may have a mixed experience using cloud-init's cloud-config to create users and inject SSH keys.
Below is an example of creating a user and injecting SSH keys for that user using a script instead of cloud-config.
cat << 'END' > startup-script.sh\n#!/bin/bash\nexport NEW_USER=\"foo\"\nexport SSH_PUB_KEY=\"ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC6zdgFiLr1uAK7PdcchDd+LseA5fEOcxCCt7TLlr7Mx6h8jUg+G+8L9JBNZuDzTZSF0dR7qwzdBBQjorAnZTmY3BhsKcFr8Gt4KMGrS6r3DNmGruP8GORvegdWZuXgASKVpXeI7nCIjRJwAaK1x+eGHwAWO9Z8ohcboHbLyffOoSZDSIuk2kRIc47+ENRjg0T6x2VRsqX27g6j4DfPKQZGk0zvXkZaYtr1e2tZgqTBWqZUloMJK8miQq6MktCKAS4VtPk0k7teQX57OGwD6D7uo4b+Cl8aYAAwhn0hc0C2USfbuVHgq88ESo2/+NwV4SQcl3sxCW21yGIjAGt4Hy7J $NEW_USER@localhost.localdomain\"\n\nsudo adduser -U -m $NEW_USER\necho \"$NEW_USER:atomic\" | chpasswd\nsudo mkdir /home/$NEW_USER/.ssh\nsudo echo \"$SSH_PUB_KEY\" > /home/$NEW_USER/.ssh/authorized_keys\nsudo chown -R ${NEW_USER}: /home/$NEW_USER/.ssh\nEND\n\n# Create the VM spec\ncat << END > my-vmi.yaml\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n  name: sshvmi\nspec:\n  terminationGracePeriodSeconds: 0\n  domain:\n    resources:\n      requests:\n        memory: 1024M\n    devices:\n      disks:\n      - name: containerdisk\n        disk:\n          dev: vda\n      - name: cloudinitdisk\n        disk:\n          dev: vdb\n  volumes:\n    - name: containerdisk\n      containerDisk:\n        image: kubevirt/fedora-atomic-registry-disk-demo:latest\n    - name: cloudinitdisk\n      cloudInitNoCloud:\n        userDataBase64: $(cat startup-script.sh | base64 -w0)\nEND\n\n# Post the VirtualMachineInstance spec to KubeVirt.\nkubectl create -f my-vmi.yaml\n\n# Connect to VM with passwordless SSH key\nssh -i <insert private key here> foo@<insert ip here>\n
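Because the startup script above relies on shell variables such as $NEW_USER, the way the heredoc delimiter is quoted matters when the script file is generated. The following sketch shows the difference; the file names `expanded.sh` and `literal.sh` are hypothetical and exist only for this illustration.

```shell
# Make sure the variable is not set in the shell that writes the files.
unset NEW_USER

# Unquoted delimiter: the calling shell expands $NEW_USER while writing the
# file, so the generated script contains an empty string in its place.
cat << END > expanded.sh
echo "user is $NEW_USER"
END

# Quoted delimiter: the text is written verbatim, so $NEW_USER is only
# expanded later, when the script itself runs inside the VM.
cat << 'END' > literal.sh
echo "user is $NEW_USER"
END

cat expanded.sh   # echo "user is "
cat literal.sh    # echo "user is $NEW_USER"
```

The heredoc that generates my-vmi.yaml is a different case: it needs an unquoted delimiter, because the `$(cat startup-script.sh | base64 -w0)` substitution has to be evaluated while the manifest is written.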
"},{"location":"user_workloads/startup_scripts/#network-config","title":"Network Config","text":"A cloud-init network version 1 configuration can be set to configure the network at boot.
Cloud-init user-data must be set for cloud-init to parse network-config even if it is just the user-data config header:
#cloud-config\n
"},{"location":"user_workloads/startup_scripts/#cloud-init-network-config-as-clear-text","title":"Cloud-init network-config as clear text","text":"In the example below, a simple cloud-init network-config is stored in the cloudInitNoCloud Volume's networkData field as clean text. There is a corresponding disks entry that references the cloud-init volume and assigns it to the VM's device.
# Create a VM manifest with the network-config in\n# a cloudInitNoCloud volume's networkData field.\n\ncat << END > my-vmi.yaml\napiVersion: kubevirt.io/v1alpha2\nkind: VirtualMachineInstance\nmetadata:\n name: myvmi\nspec:\n terminationGracePeriodSeconds: 5\n domain:\n resources:\n requests:\n memory: 64M\n devices:\n disks:\n - name: containerdisk\n volumeName: registryvolume\n disk:\n bus: virtio\n - name: cloudinitdisk\n volumeName: cloudinitvolume\n disk:\n bus: virtio\n volumes:\n - name: registryvolume\n containerDisk:\n image: kubevirt/cirros-container-disk-demo:latest\n - name: cloudinitvolume\n cloudInitNoCloud:\n userData: \"#cloud-config\"\n networkData: |\n network:\n version: 1\n config:\n - type: physical\n name: eth0\n subnets:\n - type: dhcp\n\nEND\n\n# Post the Virtual Machine spec to KubeVirt.\n\nkubectl create -f my-vmi.yaml\n
"},{"location":"user_workloads/startup_scripts/#cloud-init-network-config-as-base64-string","title":"Cloud-init network-config as base64 string","text":"In the example below, a simple network-config is base64 encoded and stored in the cloudInitNoCloud Volume's networkDataBase64 field. There is a corresponding disks entry that references the cloud-init volume and assigns it to the VM's device.
Users also have the option of storing the network-config in a Kubernetes Secret and referencing the Secret in the VM's spec. Examples further down in the document illustrate how that is done.
# Create a simple network-config\n\ncat << END > network-config\nnetwork:\n  version: 1\n  config:\n  - type: physical\n    name: eth0\n    subnets:\n      - type: dhcp\nEND\n\n# Create a VM manifest with the networkData base64 encoded into\n# a cloudInitNoCloud volume's networkDataBase64 field.\n\ncat << END > my-vmi.yaml\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n  name: myvmi\nspec:\n  terminationGracePeriodSeconds: 5\n  domain:\n    resources:\n      requests:\n        memory: 64M\n    devices:\n      disks:\n      - name: containerdisk\n        disk:\n          bus: virtio\n      - name: cloudinitdisk\n        disk:\n          bus: virtio\n  volumes:\n    - name: containerdisk\n      containerDisk:\n        image: kubevirt/cirros-container-disk-demo:latest\n    - name: cloudinitdisk\n      cloudInitNoCloud:\n        userData: \"#cloud-config\"\n        networkDataBase64: $(cat network-config | base64 -w0)\nEND\n\n# Post the Virtual Machine spec to KubeVirt.\n\nkubectl create -f my-vmi.yaml\n
"},{"location":"user_workloads/startup_scripts/#cloud-init-network-config-as-k8s-secret","title":"Cloud-init network-config as k8s Secret","text":"Users who wish to not store the cloud-init network-config directly in the VirtualMachineInstance spec have the option to store the network-config into a Kubernetes Secret and reference that Secret in the spec.
Multiple VirtualMachineInstance specs can reference the same Kubernetes Secret containing cloud-init network-config.
Below is an example of how to create a Kubernetes Secret containing a network-config and reference that Secret in the VM's spec.
# Create a simple network-config\n\ncat << END > network-config\nnetwork:\n  version: 1\n  config:\n  - type: physical\n    name: eth0\n    subnets:\n      - type: dhcp\nEND\n\n# Store the network-config in a Kubernetes Secret\nkubectl create secret generic my-vmi-secret --from-file=networkdata=network-config\n\n# Create a VM manifest and reference the Secret's name in the cloudInitNoCloud\n# Volume's networkDataSecretRef field\n\ncat << END > my-vmi.yaml\napiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n  name: myvmi\nspec:\n  terminationGracePeriodSeconds: 5\n  domain:\n    resources:\n      requests:\n        memory: 64M\n    devices:\n      disks:\n      - name: containerdisk\n        disk:\n          bus: virtio\n      - name: cloudinitdisk\n        disk:\n          bus: virtio\n  volumes:\n    - name: containerdisk\n      containerDisk:\n        image: kubevirt/cirros-registry-disk-demo:latest\n    - name: cloudinitdisk\n      cloudInitNoCloud:\n        userData: \"#cloud-config\"\n        networkDataSecretRef:\n          name: my-vmi-secret\nEND\n\n# Post the VM\nkubectl create -f my-vmi.yaml\n
"},{"location":"user_workloads/startup_scripts/#debugging","title":"Debugging","text":"Depending on the operating system distribution in use, cloud-init output is often printed to the console output on boot up. When developing userdata scripts, users can connect to the VM's console during boot up to debug.
Example of connecting to console using virtctl:
virtctl console <name of vmi>\n
"},{"location":"user_workloads/startup_scripts/#device-role-tagging","title":"Device Role Tagging","text":"KubeVirt provides a mechanism for users to tag devices such as Network Interfaces with a specific role. The tag will be matched to the hardware address of the device and this mapping exposed to the guest OS via cloud-init.
This additional metadata helps guest OS users with multiple network interfaces identify the devices that have a specific role, such as a network device dedicated to a specific service or a disk intended to be used by a specific application (database, web cache, etc.).
This functionality already exists in platforms such as OpenStack. KubeVirt will provide the data in a similar format, known to users and services like cloud-init.
For example:
kind: VirtualMachineInstance\nspec:\n  domain:\n    devices:\n      interfaces:\n      - masquerade: {}\n        name: default\n      - bridge: {}\n        name: ptp\n        tag: ptp\n      - name: sriov-net\n        sriov: {}\n        tag: nfvfunc\n  networks:\n  - name: default\n    pod: {}\n  - multus:\n      networkName: ptp-conf\n    name: ptp\n  - multus:\n      networkName: sriov/sriov-network\n    name: sriov-net\n\nThe metadata will be available in the guest's config drive at `openstack/latest/meta_data.json`:\n\n{\n  \"devices\": [\n    {\n      \"type\": \"nic\",\n      \"bus\": \"pci\",\n      \"address\": \"0000:00:02.0\",\n      \"mac\": \"01:22:22:42:22:21\",\n      \"tags\": [\"ptp\"]\n    },\n    {\n      \"type\": \"nic\",\n      \"bus\": \"pci\",\n      \"address\": \"0000:81:10.1\",\n      \"mac\": \"01:22:22:42:22:22\",\n      \"tags\": [\"nfvfunc\"]\n    }\n  ]\n}\n
"},{"location":"user_workloads/startup_scripts/#ignition-examples","title":"Ignition Examples","text":"Ignition data can be passed into a cloudInitConfigDrive
source using either clear text, a base64 string or a k8s Secret.
Some examples of Ignition configurations can be found in the examples given by the Ignition documentation.
"},{"location":"user_workloads/startup_scripts/#ignition-as-clear-text","title":"Ignition as clear text","text":"Here is a complete example of a Kubevirt VM using Ignition to add an ssh key to the coreos
user at first boot :
apiVersion: kubevirt.io/v1alpha3\nkind: VirtualMachine\nmetadata:\n name: ign-demo\nspec:\n running: false\n template:\n metadata:\n labels:\n kubevirt.io/size: small\n kubevirt.io/domain: ign-demo\n spec:\n domain:\n devices:\n disks:\n - name: containerdisk\n disk:\n bus: virtio\n - name: cloudinitdisk\n disk:\n bus: virtio\n interfaces:\n - name: default\n masquerade: {}\n resources:\n requests:\n memory: 2G\n networks:\n - name: default\n pod: {}\n volumes:\n - name: containerdisk\n containerDisk:\n image: quay.io/containerdisks/rhcos:4.9\n - name: cloudinitdisk\n cloudInitConfigDrive:\n userData: |\n {\n \"ignition\": {\n \"config\": {},\n \"proxy\": {},\n \"security\": {},\n \"timeouts\": {},\n \"version\": \"3.2.0\"\n },\n \"passwd\": {\n \"users\": [\n {\n \"name\": \"coreos\",\n \"sshAuthorizedKeys\": [\n \"ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIPL3axFGHI3db9iJWkPXVbYzD7OaWTtHuqmxLvj+DztB user@example\"\n ]\n }\n ]\n },\n \"storage\": {},\n \"systemd\": {}\n }\n
Note that the Ignition config is simply passed to the userData
field of the cloudInitConfigDrive
volume.
You can also pass the Ignition config as a base64 string by using the userDataBase64
field:
...\ncloudInitConfigDrive:\n userDataBase64: eyJpZ25pdGlvbiI6eyJjb25maWciOnt9LCJwcm94eSI6e30sInNlY3VyaXR5Ijp7fSwidGltZW91dHMiOnt9LCJ2ZXJzaW9uIjoiMy4yLjAifSwicGFzc3dkIjp7InVzZXJzIjpbeyJuYW1lIjoiY29yZW9zIiwic3NoQXV0aG9yaXplZEtleXMiOlsic3NoLWVkMjU1MTlBQUFBQzNOemFDMWxaREkxTlRFNUFBQUFJUEwzYXhGR0hJM2RiOWlKV2tQWFZiWXpEN09hV1R0SHVxbXhMdmorRHp0QiB1c2VyQGV4YW1wbGUiXX1dfSwic3RvcmFnZSI6e30sInN5c3RlbWQiOnt9fQ==\n
You can obtain the base64 string by running cat ignition.json | base64 -w0
in your terminal.
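Before pasting the encoded string into a manifest, it can be worth confirming that it decodes back to the original file. The sketch below assumes GNU coreutils (`-w0` is a GNU `base64` option) and uses a deliberately tiny, hypothetical ignition.json rather than the full config shown above.

```shell
# Hypothetical minimal Ignition config, used only to demonstrate the encoding.
cat << 'END' > ignition.json
{"ignition":{"version":"3.2.0"}}
END

# Encode without line wrapping so the value fits on a single YAML line.
encoded=$(base64 -w0 < ignition.json)

# Decode again and compare byte for byte with the original file.
if printf '%s' "$encoded" | base64 -d | cmp -s - ignition.json; then
  echo "round trip OK"
else
  echo "decoded data differs from ignition.json" >&2
fi
```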
If you do not want to store the Ignition config in the VM configuration, you can use a k8s Secret.
First, create the secret with the Ignition data in it:
kubectl create secret generic my-ign-secret --from-file=ignition=ignition.json\n
Then reference this secret in your VM configuration:
...\ncloudInitConfigDrive:\n secretRef:\n name: my-ign-secret\n
"},{"location":"user_workloads/startup_scripts/#sysprep-examples","title":"Sysprep Examples","text":""},{"location":"user_workloads/startup_scripts/#sysprep-in-a-configmap","title":"Sysprep in a ConfigMap","text":"The answer file can be provided in a ConfigMap:
apiVersion: v1\nkind: ConfigMap\nmetadata:\n name: sysprep-config\ndata:\n autounattend.xml: |\n <?xml version=\"1.0\" encoding=\"utf-8\"?>\n <unattend xmlns=\"urn:schemas-microsoft-com:unattend\">\n ...\n </unattend>\n
And attached to the VM like so:
kind: VirtualMachine\nmetadata:\n name: windows-with-sysprep\nspec:\n running: false\n template:\n metadata:\n labels:\n kubevirt.io/domain: windows-with-sysprep\n spec:\n domain:\n cpu:\n cores: 3\n devices:\n disks:\n - bootOrder: 1\n disk:\n bus: virtio\n name: harddrive\n - name: sysprep\n cdrom:\n bus: sata\n machine:\n type: q35\n resources:\n requests:\n memory: 6G\n volumes:\n - name: harddrive\n persistentVolumeClaim:\n claimName: windows_pvc\n - name: sysprep\n sysprep:\n configMap:\n name: sysprep-config\n
"},{"location":"user_workloads/startup_scripts/#sysprep-in-a-secret","title":"Sysprep in a Secret","text":"The answer file can be provided in a Secret:
apiVersion: v1\nkind: Secret\nmetadata:\n  name: sysprep-secret\nstringData:\n  autounattend.xml: |\n    <?xml version=\"1.0\" encoding=\"utf-8\"?>\n    <unattend xmlns=\"urn:schemas-microsoft-com:unattend\">\n    ...\n    </unattend>\n
And attached to the VM like so:
kind: VirtualMachine\nmetadata:\n name: windows-with-sysprep\nspec:\n running: false\n template:\n metadata:\n labels:\n kubevirt.io/domain: windows-with-sysprep\n spec:\n domain:\n cpu:\n cores: 3\n devices:\n disks:\n - bootOrder: 1\n disk:\n bus: virtio\n name: harddrive\n - name: sysprep\n cdrom:\n bus: sata\n machine:\n type: q35\n resources:\n requests:\n memory: 6G\n volumes:\n - name: harddrive\n persistentVolumeClaim:\n claimName: windows_pvc\n - name: sysprep\n sysprep:\n secret:\n name: sysprep-secret\n
"},{"location":"user_workloads/startup_scripts/#base-sysprep-vm","title":"Base Sysprep VM","text":"In the example below, a configMap with autounattend.xml
file is used to modify the Windows ISO image downloaded from Microsoft, creating a base Windows installation with the virtio drivers installed and all the commands from post-install.ps1 executed.
The manifests below require a win10-iso
DataVolume to exist.
apiVersion: v1\nkind: ConfigMap\nmetadata:\n name: win10-template-configmap\ndata:\n autounattend.xml: |-\n <?xml version=\"1.0\" encoding=\"utf-8\"?>\n <unattend xmlns=\"urn:schemas-microsoft-com:unattend\">\n <settings pass=\"windowsPE\">\n <component xmlns:wcm=\"http://schemas.microsoft.com/WMIConfig/2002/State\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" name=\"Microsoft-Windows-International-Core-WinPE\" processorArchitecture=\"amd64\" publicKeyToken=\"31bf3856ad364e35\" language=\"neutral\" versionScope=\"nonSxS\">\n <SetupUILanguage>\n <UILanguage>en-US</UILanguage>\n </SetupUILanguage>\n <InputLocale>0409:00000409</InputLocale>\n <SystemLocale>en-US</SystemLocale>\n <UILanguage>en-US</UILanguage>\n <UILanguageFallback>en-US</UILanguageFallback>\n <UserLocale>en-US</UserLocale>\n </component>\n <component xmlns:wcm=\"http://schemas.microsoft.com/WMIConfig/2002/State\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" name=\"Microsoft-Windows-PnpCustomizationsWinPE\" processorArchitecture=\"amd64\" publicKeyToken=\"31bf3856ad364e35\" language=\"neutral\" versionScope=\"nonSxS\">\n <DriverPaths>\n <PathAndCredentials wcm:keyValue=\"4b29ba63\" wcm:action=\"add\">\n <Path>E:\\amd64\\2k19</Path>\n </PathAndCredentials>\n <PathAndCredentials wcm:keyValue=\"25fe51ea\" wcm:action=\"add\">\n <Path>E:\\NetKVM\\2k19\\amd64</Path>\n </PathAndCredentials>\n </DriverPaths>\n </component>\n <component xmlns:wcm=\"http://schemas.microsoft.com/WMIConfig/2002/State\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" name=\"Microsoft-Windows-Setup\" processorArchitecture=\"amd64\" publicKeyToken=\"31bf3856ad364e35\" language=\"neutral\" versionScope=\"nonSxS\">\n <DiskConfiguration>\n <Disk wcm:action=\"add\">\n <CreatePartitions>\n <CreatePartition wcm:action=\"add\">\n <Order>1</Order>\n <Type>Primary</Type>\n <Size>100</Size>\n </CreatePartition>\n <CreatePartition wcm:action=\"add\">\n <Extend>true</Extend>\n <Order>2</Order>\n 
<Type>Primary</Type>\n </CreatePartition>\n </CreatePartitions>\n <ModifyPartitions>\n <ModifyPartition wcm:action=\"add\">\n <Format>NTFS</Format>\n <Label>System Reserved</Label>\n <Order>1</Order>\n <PartitionID>1</PartitionID>\n <TypeID>0x27</TypeID>\n </ModifyPartition>\n <ModifyPartition wcm:action=\"add\">\n <Format>NTFS</Format>\n <Label>OS</Label>\n <Letter>C</Letter>\n <Order>2</Order>\n <PartitionID>2</PartitionID>\n </ModifyPartition>\n </ModifyPartitions>\n <DiskID>0</DiskID>\n <WillWipeDisk>true</WillWipeDisk>\n </Disk>\n </DiskConfiguration>\n <ImageInstall>\n <OSImage>\n <InstallFrom>\n <MetaData wcm:action=\"add\">\n <Key>/Image/Description</Key>\n <Value>Windows 10 Pro</Value>\n </MetaData>\n </InstallFrom>\n <InstallTo>\n <DiskID>0</DiskID>\n <PartitionID>2</PartitionID>\n </InstallTo>\n </OSImage>\n </ImageInstall>\n <UserData>\n <AcceptEula>true</AcceptEula>\n <FullName/>\n <Organization/>\n <ProductKey>\n <Key/>\n </ProductKey>\n </UserData>\n </component>\n </settings>\n <settings pass=\"offlineServicing\">\n <component xmlns:wcm=\"http://schemas.microsoft.com/WMIConfig/2002/State\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" name=\"Microsoft-Windows-LUA-Settings\" processorArchitecture=\"amd64\" publicKeyToken=\"31bf3856ad364e35\" language=\"neutral\" versionScope=\"nonSxS\">\n <EnableLUA>false</EnableLUA>\n </component>\n </settings>\n <settings pass=\"specialize\">\n <component xmlns:wcm=\"http://schemas.microsoft.com/WMIConfig/2002/State\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" name=\"Microsoft-Windows-International-Core\" processorArchitecture=\"amd64\" publicKeyToken=\"31bf3856ad364e35\" language=\"neutral\" versionScope=\"nonSxS\">\n <InputLocale>0409:00000409</InputLocale>\n <SystemLocale>en-US</SystemLocale>\n <UILanguage>en-US</UILanguage>\n <UILanguageFallback>en-US</UILanguageFallback>\n <UserLocale>en-US</UserLocale>\n </component>\n <component 
xmlns:wcm=\"http://schemas.microsoft.com/WMIConfig/2002/State\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" name=\"Microsoft-Windows-Security-SPP-UX\" processorArchitecture=\"amd64\" publicKeyToken=\"31bf3856ad364e35\" language=\"neutral\" versionScope=\"nonSxS\">\n <SkipAutoActivation>true</SkipAutoActivation>\n </component>\n <component xmlns:wcm=\"http://schemas.microsoft.com/WMIConfig/2002/State\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" name=\"Microsoft-Windows-SQMApi\" processorArchitecture=\"amd64\" publicKeyToken=\"31bf3856ad364e35\" language=\"neutral\" versionScope=\"nonSxS\">\n <CEIPEnabled>0</CEIPEnabled>\n </component>\n </settings>\n <settings pass=\"oobeSystem\">\n <component xmlns:wcm=\"http://schemas.microsoft.com/WMIConfig/2002/State\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" name=\"Microsoft-Windows-Shell-Setup\" processorArchitecture=\"amd64\" publicKeyToken=\"31bf3856ad364e35\" language=\"neutral\" versionScope=\"nonSxS\">\n <OOBE>\n <HideEULAPage>true</HideEULAPage>\n <HideOEMRegistrationScreen>true</HideOEMRegistrationScreen>\n <HideOnlineAccountScreens>true</HideOnlineAccountScreens>\n <HideWirelessSetupInOOBE>true</HideWirelessSetupInOOBE>\n <NetworkLocation>Work</NetworkLocation>\n <SkipUserOOBE>true</SkipUserOOBE>\n <SkipMachineOOBE>true</SkipMachineOOBE>\n <ProtectYourPC>3</ProtectYourPC>\n </OOBE>\n <AutoLogon>\n <Password>\n <Value>123456</Value>\n <PlainText>true</PlainText>\n </Password>\n <Enabled>true</Enabled>\n <Username>Administrator</Username>\n </AutoLogon>\n <UserAccounts>\n <AdministratorPassword>\n <Value>123456</Value>\n <PlainText>true</PlainText>\n </AdministratorPassword>\n </UserAccounts>\n <RegisteredOrganization/>\n <RegisteredOwner/>\n <TimeZone>Eastern Standard Time</TimeZone>\n <FirstLogonCommands>\n <SynchronousCommand wcm:action=\"add\">\n <CommandLine>powershell -ExecutionPolicy Bypass -NoExit -NoProfile f:\\post-install.ps1</CommandLine>\n 
<RequiresUserInput>false</RequiresUserInput>\n <Order>1</Order>\n <Description>Post Installation Script</Description>\n </SynchronousCommand>\n </FirstLogonCommands>\n </component>\n </settings>\n </unattend>\n\n\n post-install.ps1: |-\n # Remove AutoLogin\n # https://docs.microsoft.com/en-us/windows-hardware/customize/desktop/unattend/microsoft-windows-shell-setup-autologon-logoncount#logoncount-known-issue\n reg add \"HKEY_LOCAL_MACHINE\\SOFTWARE\\Microsoft\\Windows NT\\CurrentVersion\\Winlogon\" /v AutoAdminLogon /t REG_SZ /d 0 /f\n\n # install Qemu Tools (Drivers)\n Start-Process msiexec -Wait -ArgumentList '/i e:\\virtio-win-gt-x64.msi /qn /passive /norestart'\n\n # install Guest Agent\n Start-Process msiexec -Wait -ArgumentList '/i e:\\guest-agent\\qemu-ga-x86_64.msi /qn /passive /norestart'\n\n # Rename cached unattend.xml to avoid it is picked up by sysprep\n mv C:\\Windows\\Panther\\unattend.xml C:\\Windows\\Panther\\unattend.install.xml\n\n # Eject CD, to avoid that the autounattend.xml on the CD is picked up by sysprep\n (new-object -COM Shell.Application).NameSpace(17).ParseName('F:').InvokeVerb('Eject')\n\n # Run Sysprep and Shutdown\n C:\\Windows\\System32\\Sysprep\\sysprep.exe /generalize /oobe /shutdown /mode:vm\n\n---\n\napiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n annotations:\n name.os.template.kubevirt.io/win10: Microsoft Windows 10\n vm.kubevirt.io/validations: |\n [\n {\n \"name\": \"minimal-required-memory\",\n \"path\": \"jsonpath::.spec.domain.resources.requests.memory\",\n \"rule\": \"integer\",\n \"message\": \"This VM requires more memory.\",\n \"min\": 2147483648\n }, {\n \"name\": \"windows-virtio-bus\",\n \"path\": \"jsonpath::.spec.domain.devices.disks[*].disk.bus\",\n \"valid\": \"jsonpath::.spec.domain.devices.disks[*].disk.bus\",\n \"rule\": \"enum\",\n \"message\": \"virto disk bus type has better performance, install virtio drivers in VM and change bus type\",\n \"values\": [\"virtio\"],\n \"justWarning\": 
true\n }, {\n \"name\": \"windows-disk-bus\",\n \"path\": \"jsonpath::.spec.domain.devices.disks[*].disk.bus\",\n \"valid\": \"jsonpath::.spec.domain.devices.disks[*].disk.bus\",\n \"rule\": \"enum\",\n \"message\": \"disk bus has to be either virtio or sata or scsi\",\n \"values\": [\"virtio\", \"sata\", \"scsi\"]\n }, {\n \"name\": \"windows-cd-bus\",\n \"path\": \"jsonpath::.spec.domain.devices.disks[*].cdrom.bus\",\n \"valid\": \"jsonpath::.spec.domain.devices.disks[*].cdrom.bus\",\n \"rule\": \"enum\",\n \"message\": \"cd bus has to be sata\",\n \"values\": [\"sata\"]\n }\n ]\n name: win10-template\n namespace: default\n labels:\n app: win10-template\n flavor.template.kubevirt.io/medium: 'true'\n os.template.kubevirt.io/win10: 'true'\n vm.kubevirt.io/template: windows10-desktop-medium\n vm.kubevirt.io/template.namespace: openshift\n vm.kubevirt.io/template.revision: '1'\n vm.kubevirt.io/template.version: v0.14.0\n workload.template.kubevirt.io/desktop: 'true'\nspec:\n runStrategy: RerunOnFailure\n dataVolumeTemplates:\n - metadata:\n name: win10-template-windows-iso\n spec:\n storage: {}\n source:\n pvc:\n name: windows10-iso\n namespace: default\n - metadata:\n name: win10-template\n spec:\n storage:\n resources:\n requests:\n storage: 25Gi\n volumeMode: Filesystem\n source:\n blank: {}\n template:\n metadata:\n annotations:\n vm.kubevirt.io/flavor: medium\n vm.kubevirt.io/os: windows10\n vm.kubevirt.io/workload: desktop\n labels:\n flavor.template.kubevirt.io/medium: 'true'\n kubevirt.io/domain: win10-template\n kubevirt.io/size: medium\n os.template.kubevirt.io/win10: 'true'\n vm.kubevirt.io/name: win10-template\n workload.template.kubevirt.io/desktop: 'true'\n spec:\n domain:\n clock:\n timer:\n hpet:\n present: false\n hyperv: {}\n pit:\n tickPolicy: delay\n rtc:\n tickPolicy: catchup\n utc: {}\n cpu:\n cores: 1\n sockets: 1\n threads: 1\n devices:\n disks:\n - bootOrder: 1\n disk:\n bus: virtio\n name: win10-template\n - bootOrder: 2\n cdrom:\n bus: 
sata\n name: windows-iso\n - cdrom:\n bus: sata\n name: windows-guest-tools\n - name: sysprep\n cdrom:\n bus: sata\n inputs:\n - bus: usb\n name: tablet\n type: tablet\n interfaces:\n - masquerade: {}\n model: virtio\n name: default\n features:\n acpi: {}\n apic: {}\n hyperv:\n reenlightenment: {}\n ipi: {}\n synic: {}\n synictimer:\n direct: {}\n spinlocks:\n spinlocks: 8191\n reset: {}\n relaxed: {}\n vpindex: {}\n runtime: {}\n tlbflush: {}\n frequencies: {}\n vapic: {}\n machine:\n type: pc-q35-rhel8.4.0\n resources:\n requests:\n memory: 4Gi\n hostname: win10-template\n networks:\n - name: default\n pod: {}\n volumes:\n - dataVolume:\n name: win10-iso\n name: windows-iso\n - dataVolume:\n name: win10-template-windows-iso\n name: win10-template\n - containerDisk:\n image: quay.io/kubevirt/virtio-container-disk\n name: windows-guest-tools\n - name: sysprep\n sysprep:\n configMap:\n name: win10-template-configmap\n
"},{"location":"user_workloads/startup_scripts/#launching-a-vm-from-template","title":"Launching a VM from template","text":"From the above example after the sysprep command is executed in the post-install.ps1
and the VM is in the shutdown state, a new VM can be launched from the base win10-template
with the additional changes specified in the unattend.xml
in sysprep-config
. The new VM can take up to 5 minutes to reach the running state, because Windows goes through OOBE setup in the background with the customizations specified in the unattend.xml
file below.
apiVersion: v1\nkind: ConfigMap\nmetadata:\n name: sysprep-config\ndata:\n autounattend.xml: |-\n <?xml version=\"1.0\" encoding=\"utf-8\"?>\n <!-- responsible for installing windows, ignored on sysprepped images -->\n unattend.xml: |-\n <?xml version=\"1.0\" encoding=\"utf-8\"?>\n <unattend xmlns=\"urn:schemas-microsoft-com:unattend\">\n <settings pass=\"oobeSystem\">\n <component xmlns:wcm=\"http://schemas.microsoft.com/WMIConfig/2002/State\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" name=\"Microsoft-Windows-Shell-Setup\" processorArchitecture=\"amd64\" publicKeyToken=\"31bf3856ad364e35\" language=\"neutral\" versionScope=\"nonSxS\">\n <OOBE>\n <HideEULAPage>true</HideEULAPage>\n <HideOEMRegistrationScreen>true</HideOEMRegistrationScreen>\n <HideOnlineAccountScreens>true</HideOnlineAccountScreens>\n <HideWirelessSetupInOOBE>true</HideWirelessSetupInOOBE>\n <NetworkLocation>Work</NetworkLocation>\n <SkipUserOOBE>true</SkipUserOOBE>\n <SkipMachineOOBE>true</SkipMachineOOBE>\n <ProtectYourPC>3</ProtectYourPC>\n </OOBE>\n <AutoLogon>\n <Password>\n <Value>123456</Value>\n <PlainText>true</PlainText>\n </Password>\n <Enabled>true</Enabled>\n <Username>Administrator</Username>\n </AutoLogon>\n <UserAccounts>\n <AdministratorPassword>\n <Value>123456</Value>\n <PlainText>true</PlainText>\n </AdministratorPassword>\n </UserAccounts>\n <RegisteredOrganization>Kuebvirt</RegisteredOrganization>\n <RegisteredOwner>Kubevirt</RegisteredOwner>\n <TimeZone>Eastern Standard Time</TimeZone>\n <FirstLogonCommands>\n <SynchronousCommand wcm:action=\"add\">\n <CommandLine>powershell -ExecutionPolicy Bypass -NoExit -WindowStyle Hidden -NoProfile d:\\customize.ps1</CommandLine>\n <RequiresUserInput>false</RequiresUserInput>\n <Order>1</Order>\n <Description>Customize Script</Description>\n </SynchronousCommand>\n </FirstLogonCommands>\n </component>\n </settings>\n </unattend>\n customize.ps1: |-\n # Enable RDP\n Set-ItemProperty -Path 
'HKLM:\\System\\CurrentControlSet\\Control\\Terminal Server' -name \"fDenyTSConnections\" -value 0\n Enable-NetFirewallRule -DisplayGroup \"Remote Desktop\"\n\n\n # https://docs.microsoft.com/en-us/windows-server/administration/openssh/openssh_install_firstuse\n # Install the OpenSSH Server\n Add-WindowsCapability -Online -Name OpenSSH.Server~~~~0.0.1.0\n # Start the sshd service\n Start-Service sshd\n\n Set-Service -Name sshd -StartupType 'Automatic'\n\n # https://docs.microsoft.com/en-us/windows-server/administration/openssh/openssh_server_configuration\n # use powershell as default shell for ssh\n New-ItemProperty -Path \"HKLM:\\SOFTWARE\\OpenSSH\" -Name DefaultShell -Value \"C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe\" -PropertyType String -Force\n\n\n # Add ssh authorized_key for administrator\n # https://docs.microsoft.com/en-us/windows-server/administration/openssh/openssh_keymanagement\n $MyDir = $MyInvocation.MyCommand.Path | Split-Path -Parent\n $PublicKey = Get-Content -Path $MyDir\\id_rsa.pub\n $authrized_keys_path = $env:ProgramData + \"\\ssh\\administrators_authorized_keys\" \n Add-Content -Path $authrized_keys_path -Value $PublicKey\n icacls.exe $authrized_keys_path /inheritance:r /grant \"Administrators:F\" /grant \"SYSTEM:F\"\n\n\n # install application via exe file installer from url\n function Install-Exe {\n $dlurl = $args[0]\n $installerPath = Join-Path $env:TEMP (Split-Path $dlurl -Leaf)\n Invoke-WebRequest -UseBasicParsing $dlurl -OutFile $installerPath\n Start-Process -FilePath $installerPath -Args \"/S\" -Verb RunAs -Wait\n Remove-Item $installerPath\n\n }\n\n # Wait for networking before running a task at startup\n do {\n $ping = test-connection -comp kubevirt.io -count 1 -Quiet\n } until ($ping)\n\n # Installing the Latest Notepad++ with PowerShell\n $BaseUri = \"https://notepad-plus-plus.org\"\n $BasePage = Invoke-WebRequest -Uri $BaseUri -UseBasicParsing\n $ChildPath = $BasePage.Links | Where-Object { $_.outerHTML 
-like '*Current Version*' } | Select-Object -ExpandProperty href\n $DownloadPageUri = $BaseUri + $ChildPath\n $DownloadPage = Invoke-WebRequest -Uri $DownloadPageUri -UseBasicParsing\n $DownloadUrl = $DownloadPage.Links | Where-Object { $_.outerHTML -like '*npp.*.Installer.x64.exe\"*' } | Select-Object -ExpandProperty href\n Install-Exe $DownloadUrl\n id_rsa.pub: |-\n ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC6zdgFiLr1uAK7PdcchDd+LseA5fEOcxCCt7TLlr7Mx6h8jUg+G+8L9JBNZuDzTZSF0dR7qwzdBBQjorAnZTmY3BhsKcFr8Gt4KMGrS6r3DNmGruP8GORvegdWZuXgASKVpXeI7nCIjRJwAaK1x+eGHwAWO9Z8ohcboHbLyffOoSZDSIuk2kRIc47+ENRjg0T6x2VRsqX27g6j4DfPKQZGk0zvXkZaYtr1e2tZgqTBWqZUloMJK8miQq6MktCKAS4VtPk0k7teQX57OGwD6D7uo4b+Cl8aYAAwhn0hc0C2USfbuVHgq88ESo2/+NwV4SQcl3sxCW21yGIjAGt4Hy7J fedora@localhost.localdomain\n
"},{"location":"user_workloads/templates/","title":"Templates","text":"Note
By deploying KubeVirt on top of OpenShift, the user can benefit from the OpenShift Template functionality.
"},{"location":"user_workloads/templates/#virtual-machine-templates","title":"Virtual machine templates","text":""},{"location":"user_workloads/templates/#what-is-a-virtual-machine-template","title":"What is a virtual machine template?","text":"The KubeVirt project provides a set of templates for creating VMs that handle common usage scenarios. Each template combines a few key factors and can be further customized and processed to produce a VirtualMachine object. The key factors that define a template are:
Workload: Most virtual machines should use the server or desktop workload for maximum flexibility; the highperformance workload trades some of this flexibility for better performance.
Guest Operating System (OS): This ensures that the emulated hardware is compatible with the guest OS. It also helps maximize the stability of the VM and enables performance optimizations.
Size (flavor): Defines the amount of resources (CPU, memory) to allocate to the VM.
More documentation is available in the common templates subproject.
"},{"location":"user_workloads/templates/#accessing-the-virtual-machine-templates","title":"Accessing the virtual machine templates","text":"If you installed KubeVirt using a supported method, you should find the common templates preinstalled in the cluster. If you want to upgrade the templates, or install them from scratch, you can use one of the supported releases.
To install the templates:
$ export VERSION=$(curl -s https://api.github.com/repos/kubevirt/common-templates/releases | grep tag_name | grep -v -- '-rc' | head -1 | awk -F': ' '{print $2}' | sed 's/,//' | xargs)\n $ oc create -f https://github.com/kubevirt/common-templates/releases/download/$VERSION/common-templates-$VERSION.yaml\n
"},{"location":"user_workloads/templates/#editable-fields","title":"Editable fields","text":"You can edit the fields of the templates which define the amount of resources which the VMs will receive.
Each template can list a different set of fields that are to be considered editable. The fields are used as hints for the user interface, and also for other components in the cluster.
The editable fields are taken from annotations in the template. Here is a snippet presenting a few of the most commonly found editable fields:
metadata:\n annotations:\n template.kubevirt.io/editable: |\n /objects[0].spec.template.spec.domain.cpu.sockets\n /objects[0].spec.template.spec.domain.cpu.cores\n /objects[0].spec.template.spec.domain.cpu.threads\n /objects[0].spec.template.spec.domain.resources.requests.memory\n
Each entry in the editable field list must be a jsonpath. The jsonpath root is the objects: element of the template. The field that is actually editable is the last entry (the \"leaf\") of the path. For example, the following minimal snippet highlights the fields which you can edit:
objects:\n spec:\n template:\n spec:\n domain:\n cpu:\n sockets:\n VALUE # this is editable\n cores:\n VALUE # this is editable\n threads:\n VALUE # this is editable\n resources:\n requests:\n memory:\n VALUE # this is editable\n
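The way a tool might read the editable annotation can be sketched in Python. This is only an illustration under stated assumptions: the template dictionary is a hypothetical, trimmed-down stand-in, the helper names parse_editable and resolve are invented here, and the resolver handles only dotted keys with an optional [index] suffix rather than full jsonpath.

```python
def parse_editable(annotation):
    # One path per line; surrounding whitespace and blank lines are ignored.
    return [line.strip() for line in annotation.splitlines() if line.strip()]


def resolve(template, path):
    # Walk a path such as '/objects[0].spec.template.spec.domain.cpu.cores'.
    # Assumption: each segment is a plain key with at most one [index] suffix.
    node = template
    for segment in path.lstrip('/').split('.'):
        key, bracket, rest = segment.partition('[')
        node = node[key]
        if bracket:
            node = node[int(rest.rstrip(']'))]
    return node


# Hypothetical template, reduced to the parts the paths refer to.
template = {
    'metadata': {'annotations': {'template.kubevirt.io/editable': '''
        /objects[0].spec.template.spec.domain.cpu.sockets
        /objects[0].spec.template.spec.domain.cpu.cores
        /objects[0].spec.template.spec.domain.resources.requests.memory
    '''}},
    'objects': [{'spec': {'template': {'spec': {'domain': {
        'cpu': {'sockets': 2, 'cores': 1, 'threads': 1},
        'resources': {'requests': {'memory': '8Gi'}},
    }}}}}],
}

annotation = template['metadata']['annotations']['template.kubevirt.io/editable']
paths = parse_editable(annotation)
editable = {path: resolve(template, path) for path in paths}
for path, value in editable.items():
    print(path, '=', value)
```

Each path resolves to a leaf value, which is the field a user interface would offer for editing.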
"},{"location":"user_workloads/templates/#relationship-between-templates-and-vms","title":"Relationship between templates and VMs","text":"Once processed the templates produce VM objects to be used in the cluster. The VMs produced from templates will have a vm.kubevirt.io/template
label, whose value will be the name of the parent template, for example fedora-desktop-medium
:
metadata:\n labels:\n vm.kubevirt.io/template: fedora-desktop-medium\n
In addition, these VMs can include an optional label vm.kubevirt.io/template-namespace
, whose value will be the namespace of the parent template, for example:
metadata:\n labels:\n vm.kubevirt.io/template-namespace: openshift\n
If this label is not defined, the template is expected to belong to the same namespace as the VM.
This makes it possible to query for all the VMs built from a given template.
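As a rough sketch of the lookup described above, the following Python works on plain dictionaries that stand in for VM metadata. The helper names parent_template and vms_from_template and the sample VMs are hypothetical; the label keys are the ones named in this section.

```python
TEMPLATE_LABEL = 'vm.kubevirt.io/template'
NAMESPACE_LABEL = 'vm.kubevirt.io/template-namespace'


def parent_template(metadata):
    # Return (namespace, name) of the parent template, or None when the VM
    # carries no template label at all.
    labels = metadata.get('labels', {})
    name = labels.get(TEMPLATE_LABEL)
    if name is None:
        return None
    # Without the namespace label, fall back to the namespace of the VM.
    namespace = labels.get(NAMESPACE_LABEL, metadata.get('namespace'))
    return namespace, name


def vms_from_template(vms, template_name):
    # Names of all VMs whose template label matches template_name.
    return [vm['name'] for vm in vms
            if vm.get('labels', {}).get(TEMPLATE_LABEL) == template_name]


# Hypothetical VM metadata used only for illustration.
vms = [
    {'name': 'vm-a', 'namespace': 'default',
     'labels': {TEMPLATE_LABEL: 'fedora-desktop-medium',
                NAMESPACE_LABEL: 'openshift'}},
    {'name': 'vm-b', 'namespace': 'default',
     'labels': {TEMPLATE_LABEL: 'fedora-desktop-medium'}},
    {'name': 'vm-c', 'namespace': 'default', 'labels': {}},
]

for vm in vms:
    print(vm['name'], parent_template(vm))
print(vms_from_template(vms, 'fedora-desktop-medium'))
```

On a live cluster the same query is normally expressed as a label selector, for example oc get vm -l vm.kubevirt.io/template=fedora-desktop-medium.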
Example:
oc process -o yaml -f dist/templates/rhel8-server-tiny.yaml NAME=rheltinyvm SRC_PVC_NAME=rhel SRC_PVC_NAMESPACE=kubevirt\n
And the output:
apiVersion: v1\nitems:\n- apiVersion: kubevirt.io/v1\n kind: VirtualMachine\n metadata:\n annotations:\n vm.kubevirt.io/flavor: tiny\n vm.kubevirt.io/os: rhel8\n vm.kubevirt.io/validations: |\n [\n {\n \"name\": \"minimal-required-memory\",\n \"path\": \"jsonpath::.spec.domain.resources.requests.memory\",\n \"rule\": \"integer\",\n \"message\": \"This VM requires more memory.\",\n \"min\": 1610612736\n }\n ]\n vm.kubevirt.io/workload: server\n labels:\n app: rheltinyvm\n vm.kubevirt.io/template: rhel8-server-tiny\n vm.kubevirt.io/template.revision: \"45\"\n vm.kubevirt.io/template.version: 0.11.3\n name: rheltinyvm\n spec:\n dataVolumeTemplates:\n - apiVersion: cdi.kubevirt.io/v1beta1\n kind: DataVolume\n metadata:\n name: rheltinyvm\n spec:\n storage:\n accessModes:\n - ReadWriteMany\n source:\n pvc:\n name: rhel\n namespace: kubevirt\n running: false\n template:\n metadata:\n labels:\n kubevirt.io/domain: rheltinyvm\n kubevirt.io/size: tiny\n spec:\n domain:\n cpu:\n cores: 1\n sockets: 1\n threads: 1\n devices:\n disks:\n - disk:\n bus: virtio\n name: rheltinyvm\n - disk:\n bus: virtio\n name: cloudinitdisk\n interfaces:\n - masquerade: {}\n name: default\n networkInterfaceMultiqueue: true\n rng: {}\n resources:\n requests:\n memory: 1.5Gi\n networks:\n - name: default\n pod: {}\n terminationGracePeriodSeconds: 180\n volumes:\n - dataVolume:\n name: rheltinyvm\n name: rheltinyvm\n - cloudInitNoCloud:\n userData: |-\n #cloud-config\n user: cloud-user\n password: lymp-fda4-m1cv\n chpasswd: { expire: False }\n name: cloudinitdisk\nkind: List\nmetadata: {}\n
You can add the VM from the template to the cluster in one go:
oc process rhel8-server-tiny NAME=rheltinyvm SRC_PVC_NAME=rhel SRC_PVC_NAMESPACE=kubevirt | oc apply -f -\n
Please note that after the generation step, VM and template objects have no relationship with each other besides the aforementioned label. Changes in templates do not automatically affect VMs, or vice versa.
"},{"location":"user_workloads/templates/#common-template-customization","title":"common template customization","text":"The templates provided by the KubeVirt project follow a set of conventions and annotations that augment the basic features of OpenShift templates. You can customize the KubeVirt-provided templates by editing these annotations, or you can add them to your existing templates to make them consumable by the KubeVirt services.
Here is a description of the KubeVirt annotations. Unless otherwise specified, the following keys are meant to be top-level entries of the template metadata, like:
apiVersion: v1\nkind: Template\nmetadata:\n name: windows-10\n annotations:\n openshift.io/display-name: \"Generic demo template\"\n
All the following annotations are prefixed with defaults.template.kubevirt.io
, which is omitted below for brevity. The actual annotations you should use look like:
apiVersion: v1\nkind: Template\nmetadata:\n name: windows-10\n annotations:\n defaults.template.kubevirt.io/disk: default-disk\n defaults.template.kubevirt.io/volume: default-volume\n defaults.template.kubevirt.io/nic: default-nic\n defaults.template.kubevirt.io/network: default-network\n
Unless otherwise specified, all annotations are meant to be safe defaults, both for performance and compatibility, and hints for the CNV-aware UI and tooling.
"},{"location":"user_workloads/templates/#disk","title":"disk","text":"See the section references
below.
Example:
apiVersion: v1\nkind: Template\nmetadata:\n name: Linux\n annotations:\n defaults.template.kubevirt.io/disk: rhel-disk\n
"},{"location":"user_workloads/templates/#nic","title":"nic","text":"See the section references
below.
Example:
apiVersion: v1\nkind: Template\nmetadata:\n name: Windows\n annotations:\n defaults.template.kubevirt.io/nic: my-nic\n
"},{"location":"user_workloads/templates/#volume","title":"volume","text":"See the section references
below.
Example:
apiVersion: v1\nkind: Template\nmetadata:\n name: Linux\n annotations:\n defaults.template.kubevirt.io/volume: custom-volume\n
"},{"location":"user_workloads/templates/#network","title":"network","text":"See the section references
below.
Example:
apiVersion: v1\nkind: Template\nmetadata:\n name: Linux\n annotations:\n defaults.template.kubevirt.io/network: fast-net\n
"},{"location":"user_workloads/templates/#references","title":"references","text":"The default values for network, nic, volume, disk are meant to be the name of a section later in the document that the UI will find and consume to find the default values for the corresponding types. For example, considering the annotation defaults.template.kubevirt.io/disk: my-disk
: we assume that later in the document there is an element called my-disk
that the UI can use to find the data it needs. The names themselves do not matter, as long as they are valid in Kubernetes and consistent with the content of the document.
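The reference lookup described above could be sketched as follows. This is a hypothetical illustration, not the logic of any particular UI: the helpers find_named and resolve_defaults are invented names, and the sample template is a reduced stand-in that places the defaults under metadata.annotations.

```python
PREFIX = 'defaults.template.kubevirt.io/'


def find_named(node, name):
    # Depth-first search for the first mapping whose 'name' equals name.
    if isinstance(node, dict):
        if node.get('name') == name:
            return node
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_named(child, name)
        if found is not None:
            return found
    return None


def resolve_defaults(template):
    # Map each default kind (disk, nic, ...) to the element it refers to,
    # or to None when no element with that name exists in the objects.
    annotations = template.get('metadata', {}).get('annotations', {})
    return {
        key[len(PREFIX):]: find_named(template.get('objects', []), value)
        for key, value in annotations.items()
        if key.startswith(PREFIX)
    }


# Hypothetical template used only for illustration.
template = {
    'metadata': {'annotations': {
        PREFIX + 'disk': 'rhel-default-disk',
        PREFIX + 'network': 'rhel-default-net',
        PREFIX + 'nic': 'missing-nic',
    }},
    'objects': [{
        'metadata': {'name': 'rheltinyvm'},
        'spec': {'template': {'spec': {
            'volumes': [{'name': 'rhel-default-disk',
                         'containerDisk': {'image': 'example/image:devel'}}],
            'networks': [{'name': 'rhel-default-net',
                          'genie': {'networkName': 'flannel'}}],
        }}},
    }],
}

defaults = resolve_defaults(template)
for kind, element in defaults.items():
    print(kind, '->', element)
```

A reference that cannot be resolved comes back as None, which is one way a tool could detect a template whose annotations are not consistent with its content.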
demo-template.yaml
apiVersion: v1\nitems:\n- apiVersion: kubevirt.io/v1\n kind: VirtualMachine\n metadata:\n labels:\n vm.kubevirt.io/template: rhel7-generic-tiny\n name: rheltinyvm\n osinfoname: rhel7.0\n defaults.template.kubevirt.io/disk: rhel-default-disk\n defaults.template.kubevirt.io/nic: rhel-default-net\n spec:\n running: false\n template:\n spec:\n domain:\n cpu:\n sockets: 1\n cores: 1\n threads: 1\n devices:\n rng: {}\n resources:\n requests:\n memory: 1g\n terminationGracePeriodSeconds: 0\n volumes:\n - containerDisk:\n image: registry:5000/kubevirt/cirros-container-disk-demo:devel\n name: rhel-default-disk\n networks:\n - genie:\n networkName: flannel\n name: rhel-default-net\nkind: List\nmetadata: {}\n
Once processed, it becomes demo-vm.yaml:
apiVersion: kubevirt.io/v1\nkind: VirtualMachine\nmetadata:\n labels:\n vm.kubevirt.io/template: rhel7-generic-tiny\n name: rheltinyvm\n osinfoname: rhel7.0\nspec:\n running: false\n template:\n spec:\n domain:\n cpu:\n sockets: 1\n cores: 1\n threads: 1\n resources:\n requests:\n memory: 1g\n devices:\n rng: {}\n disks:\n - disk:\n name: rhel-default-disk\n interfaces:\n - bridge: {}\n name: rhel-default-nic\n terminationGracePeriodSeconds: 0\n volumes:\n - containerDisk:\n image: registry:5000/kubevirt/cirros-container-disk-demo:devel\n name: containerdisk\n networks:\n - genie:\n networkName: flannel\n name: rhel-default-nic\n
"},{"location":"user_workloads/templates/#virtual-machine-creation","title":"Virtual machine creation","text":""},{"location":"user_workloads/templates/#overview","title":"Overview","text":"The KubeVirt project provides a set of templates for creating VMs that handle common usage scenarios. Each template combines a few key factors and can be further customized and processed to produce a VirtualMachine object.
The key factors that define a template are:
- Workload: Most virtual machines should use the server or desktop workload for maximum flexibility; the highperformance workload trades some of this flexibility for better performance.
- Guest Operating System (OS): This ensures that the emulated hardware is compatible with the guest OS. It also helps maximize the stability of the VM and enables performance optimizations.
- Size (flavor): Defines the amount of resources (CPU, memory) to allocate to the VM.
"},{"location":"user_workloads/templates/#openshift-console","title":"Openshift Console","text":"VMs can be created through the OpenShift Cluster Console UI. This UI supports creating VMs from templates and supports template features such as flavors and workload profiles. To create a VM from a template, choose Workloads in the left panel >> choose Virtualization >> press the blue \"Create Virtual Machine\" button >> choose \"Create from wizard\". You should then see the \"Create Virtual Machine\" window.
"},{"location":"user_workloads/templates/#common-templates","title":"Common-templates","text":"The common-templates subproject provides official, ready-made templates. You can also create templates by hand. You can find an example below, in the \"Example template\" section.
"},{"location":"user_workloads/templates/#example-template","title":"Example template","text":"In order to create a virtual machine via the OpenShift CLI, you need to provide a template defining the corresponding object and its metadata.
NOTE: Only the VirtualMachine
object is currently supported.
Here is an example template that defines an instance of the VirtualMachine
object:
apiVersion: template.openshift.io/v1\nkind: Template\nmetadata:\n name: fedora-desktop-large\n annotations:\n openshift.io/display-name: \"Fedora 32+ VM\"\n description: >-\n Template for Fedora 32 VM or newer.\n A PVC with the Fedora disk image must be available.\n Recommended disk image:\n https://download.fedoraproject.org/pub/fedora/linux/releases/32/Cloud/x86_64/images/Fedora-Cloud-Base-32-1.6.x86_64.qcow2\n tags: \"hidden,kubevirt,virtualmachine,fedora\"\n iconClass: \"icon-fedora\"\n openshift.io/provider-display-name: \"KubeVirt\"\n openshift.io/documentation-url: \"https://github.com/kubevirt/common-templates\"\n openshift.io/support-url: \"https://github.com/kubevirt/common-templates/issues\"\n template.openshift.io/bindable: \"false\"\n template.kubevirt.io/version: v1alpha1\n defaults.template.kubevirt.io/disk: rootdisk\n template.kubevirt.io/editable: |\n /objects[0].spec.template.spec.domain.cpu.sockets\n /objects[0].spec.template.spec.domain.cpu.cores\n /objects[0].spec.template.spec.domain.cpu.threads\n /objects[0].spec.template.spec.domain.resources.requests.memory\n /objects[0].spec.template.spec.domain.devices.disks\n /objects[0].spec.template.spec.volumes\n /objects[0].spec.template.spec.networks\n name.os.template.kubevirt.io/fedora32: Fedora 32 or higher\n name.os.template.kubevirt.io/fedora33: Fedora 32 or higher\n name.os.template.kubevirt.io/silverblue32: Fedora 32 or higher\n name.os.template.kubevirt.io/silverblue33: Fedora 32 or higher\n labels:\n os.template.kubevirt.io/fedora32: \"true\"\n os.template.kubevirt.io/fedora33: \"true\"\n os.template.kubevirt.io/silverblue32: \"true\"\n os.template.kubevirt.io/silverblue33: \"true\"\n workload.template.kubevirt.io/desktop: \"true\"\n flavor.template.kubevirt.io/large: \"true\"\n template.kubevirt.io/type: \"base\"\n template.kubevirt.io/version: \"0.11.3\"\nobjects:\n- apiVersion: kubevirt.io/v1\n kind: VirtualMachine\n metadata:\n name: ${NAME}\n labels:\n vm.kubevirt.io/template: 
fedora-desktop-large\n vm.kubevirt.io/template.version: \"0.11.3\"\n vm.kubevirt.io/template.revision: \"45\"\n app: ${NAME}\n annotations:\n vm.kubevirt.io/os: \"fedora\"\n vm.kubevirt.io/workload: \"desktop\"\n vm.kubevirt.io/flavor: \"large\"\n vm.kubevirt.io/validations: |\n [\n {\n \"name\": \"minimal-required-memory\",\n \"path\": \"jsonpath::.spec.domain.resources.requests.memory\",\n \"rule\": \"integer\",\n \"message\": \"This VM requires more memory.\",\n \"min\": 1073741824\n }\n ]\n spec:\n dataVolumeTemplates:\n - apiVersion: cdi.kubevirt.io/v1beta1\n kind: DataVolume\n metadata:\n name: ${NAME}\n spec:\n storage:\n accessModes:\n - ReadWriteMany\n source:\n pvc:\n name: ${SRC_PVC_NAME}\n namespace: ${SRC_PVC_NAMESPACE}\n running: false\n template:\n metadata:\n labels:\n kubevirt.io/domain: ${NAME}\n kubevirt.io/size: large\n spec:\n domain:\n cpu:\n sockets: 2\n cores: 1\n threads: 1\n resources:\n requests:\n memory: 8Gi\n devices:\n rng: {}\n networkInterfaceMultiqueue: true\n inputs:\n - type: tablet\n bus: virtio\n name: tablet\n disks:\n - disk:\n bus: virtio\n name: ${NAME}\n - disk:\n bus: virtio\n name: cloudinitdisk\n interfaces:\n - masquerade: {}\n name: default\n terminationGracePeriodSeconds: 180\n networks:\n - name: default\n pod: {}\n volumes:\n - dataVolume:\n name: ${NAME}\n name: ${NAME}\n - cloudInitNoCloud:\n userData: |-\n #cloud-config\n user: fedora\n password: ${CLOUD_USER_PASSWORD}\n chpasswd: { expire: False }\n name: cloudinitdisk\nparameters:\n- description: VM name\n from: 'fedora-[a-z0-9]{16}'\n generate: expression\n name: NAME\n- name: SRC_PVC_NAME\n description: Name of the PVC to clone\n value: 'fedora'\n- name: SRC_PVC_NAMESPACE\n description: Namespace of the source PVC\n value: kubevirt-os-images\n- description: Randomized password for the cloud-init user fedora\n from: '[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{4}'\n generate: expression\n name: CLOUD_USER_PASSWORD\n
Note that the template above defines free parameters (NAME
, SRC_PVC_NAME
, SRC_PVC_NAMESPACE
, CLOUD_USER_PASSWORD
) and the NAME
parameter has no fixed default value; its value is generated from an expression.
An OpenShift template has to be converted into a JSON file with the oc process
command, which also lets you set the template parameters.
A complete example can be found in the KubeVirt repository.
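To make the idea of parameter substitution concrete, here is a small Python sketch. It is not how oc process is implemented; it only mimics the effect on a hypothetical template fragment using the standard library string.Template class, which happens to use the same ${NAME} placeholder syntax.

```python
from string import Template

# Hypothetical fragment of a template object with three placeholders.
fragment = '''
metadata:
  name: ${NAME}
spec:
  dataVolumeTemplates:
  - spec:
      source:
        pvc:
          name: ${SRC_PVC_NAME}
          namespace: ${SRC_PVC_NAMESPACE}
'''

# Values declared in the parameters section act as defaults ...
defaults = {'SRC_PVC_NAME': 'fedora', 'SRC_PVC_NAMESPACE': 'kubevirt-os-images'}
# ... and values given on the command line with -p override them.
overrides = {'NAME': 'testvm', 'SRC_PVC_NAMESPACE': 'kubevirt'}

# substitute() raises KeyError if a placeholder has no value at all.
rendered = Template(fragment).substitute({**defaults, **overrides})
print(rendered)
```

In this sketch a parameter without any value causes an error instead of being left in the output, which is a design choice of the example and not a statement about the behavior of oc process.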
Note: You need to be logged in with the oc login
command.
$ oc process -f cluster/vmi-template-fedora.yaml \\\n -p NAME=testvmi \\\n -p SRC_PVC_NAME=fedora \\\n -p SRC_PVC_NAMESPACE=kubevirt\n{\n \"kind\": \"List\",\n \"apiVersion\": \"v1\",\n \"metadata\": {},\n \"items\": [\n {\n
The JSON file is usually applied directly by piping the processed output to the oc create
command.
$ oc process -f cluster/examples/vm-template-fedora.yaml \\\n -p NAME=testvm \\\n -p SRC_PVC_NAME=fedora \\\n -p SRC_PVC_NAMESPACE=kubevirt \\\n | oc create -f -\nvirtualmachine.kubevirt.io/testvm created\n
The command above creates a Kubernetes object according to the specification given by the template (in this example, an instance of the VirtualMachine object).
You can list the available parameters with the following command:
$ oc process -f dist/templates/fedora-desktop-large.yaml --parameters\nNAME DESCRIPTION GENERATOR VALUE\nNAME VM name expression fedora-[a-z0-9]{16}\nSRC_PVC_NAME Name of the PVC to clone fedora\nSRC_PVC_NAMESPACE Namespace of the source PVC kubevirt-os-images\nCLOUD_USER_PASSWORD Randomized password for the cloud-init user fedora expression [a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{4}\n
"},{"location":"user_workloads/templates/#starting-virtual-machine-from-the-created-object","title":"Starting virtual machine from the created object","text":"The created object is now a regular VirtualMachine object and can be controlled through the Kubernetes API. The preferred way to do this from within the OpenShift environment is to use the oc patch
command.
$ oc patch virtualmachine testvm --type merge -p '{\"spec\":{\"running\":true}}'\nvirtualmachine.kubevirt.io/testvm patched\n
The virtctl tool is often more convenient in practice than using the Kubernetes API directly. Example:
$ virtctl start testvm\nVM testvm was scheduled to start\n
As soon as the VM starts, a new object of type VirtualMachineInstance is created. It has the same name as the VirtualMachine. Example (output truncated):
$ kubectl describe vm testvm\nName:         testvm\nNamespace:    myproject\nLabels:       kubevirt-vm=vm-testvm\n              kubevirt.io/os=fedora33\nAnnotations:  <none>\nAPI Version:  kubevirt.io/v1\nKind:         VirtualMachine\n
"},{"location":"user_workloads/templates/#cloud-init-script-and-parameters","title":"Cloud-init script and parameters","text":"KubeVirt VM templates, just like KubeVirt VM/VMI YAML configs, support cloud-init scripts.
"},{"location":"user_workloads/templates/#hack-use-pre-downloaded-image","title":"Hack - use pre-downloaded image","text":"KubeVirt VM templates, just like KubeVirt VM/VMI YAML configs, can use a pre-downloaded VM image, which is especially useful for debugging, development, and testing. No special parameters are required in the VM template or VM/VMI YAML config. The main idea is to create a Kubernetes PersistentVolume and PersistentVolumeClaim corresponding to an existing image in the file system. Example:
---\nkind: PersistentVolume\napiVersion: v1\nmetadata:\n  name: mypv\n  labels:\n    type: local\nspec:\n  storageClassName: manual\n  capacity:\n    storage: 10G\n  accessModes:\n    - ReadWriteOnce\n  hostPath:\n    path: \"/mnt/sda1/images/testvm\"\n---\nkind: PersistentVolumeClaim\napiVersion: v1\nmetadata:\n  name: mypvc\nspec:\n  storageClassName: manual\n  accessModes:\n    - ReadWriteOnce\n  resources:\n    requests:\n      storage: 10G\n
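As a sketch of how a VM or VMI might then consume the claim, the fragment below references mypvc as an ordinary persistentVolumeClaim volume. The disk and volume name rootdisk is a hypothetical choice for illustration; only the claim name comes from the example above.

```yaml
# Hypothetical fragment of a VirtualMachineInstance spec.
# Only the claim name mypvc is taken from the PVC defined above.
spec:
  domain:
    devices:
      disks:
        - name: rootdisk          # hypothetical disk name
          disk:
            bus: virtio
  volumes:
    - name: rootdisk              # must match the disk name above
      persistentVolumeClaim:
        claimName: mypvc
```

Because the PersistentVolume uses hostPath, the image is only available on the node that holds the file, so this approach is best kept to single-node development setups.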
"},{"location":"user_workloads/templates/#using-datavolumes","title":"Using DataVolumes","text":"Kubevirt VM templates are using dataVolumeTemplates. Before using dataVolumes, CDI has to be installed in cluster. After that, source Datavolume can be created.
---\napiVersion: cdi.kubevirt.io/v1beta1\nkind: DataVolume\nmetadata:\n  name: fedora-datavolume-original\n  namespace: kubevirt\nspec:\n  source:\n    registry:\n      url: \"image_url\"\n  storage:\n    resources:\n      requests:\n        storage: 30Gi\n
After the import completes, the VM can be created:
$ oc process -f cluster/examples/vm-template-fedora.yaml \\\n -p NAME=testvm \\\n -p SRC_PVC_NAME=fedora-datavolume-original \\\n -p SRC_PVC_NAMESPACE=kubevirt \\\n | oc create -f -\nvirtualmachine.kubevirt.io/testvm created\n
"},{"location":"user_workloads/templates/#additional-information","title":"Additional information","text":"You can follow Virtual Machine Lifecycle Guide for further reference.
"},{"location":"user_workloads/virtctl_client_tool/","title":"Download and Install the virtctl Command Line Interface","text":""},{"location":"user_workloads/virtctl_client_tool/#download-the-virtctl-client-tool","title":"Download thevirtctl
client tool","text":"Basic VirtualMachineInstance operations can be performed with the stock kubectl
utility. However, the virtctl
binary utility is required to use advanced features such as:
It also provides convenience commands for:
Starting and stopping VirtualMachineInstances
Live migrating VirtualMachineInstances and canceling live migrations
Uploading virtual machine disk images
There are two ways to get it:
the most recent version of the tool can be retrieved from the official release page
it can be installed as a kubectl
plugin using krew
Example:
export VERSION=$(curl https://storage.googleapis.com/kubevirt-prow/release/kubevirt/kubevirt/stable.txt)\nwget https://github.com/kubevirt/kubevirt/releases/download/${VERSION}/virtctl-${VERSION}-linux-amd64\n
"},{"location":"user_workloads/virtctl_client_tool/#install-virtctl-with-krew","title":"Install virtctl
with krew
","text":"It is required to install krew
plugin manager beforehand. If krew
is installed, virtctl
can be installed via krew
:
$ kubectl krew install virt\n
Then virtctl
can be used as a kubectl plugin. For a list of available commands run:
$ kubectl virt help\n
Every occurrence throughout this guide of
$ ./virtctl <command>...\n
should then be read as
$ kubectl virt <command>...\n
"},{"location":"user_workloads/virtual_machine_instances/","title":"Virtual Machines Instances","text":"The VirtualMachineInstance
type conceptually has two parts:
Information for making scheduling decisions
Information about the virtual machine API
Every VirtualMachineInstance
object represents a single running virtual machine instance.
With the installation of KubeVirt, new types are added to the Kubernetes API to manage Virtual Machines.
You can interact with the new resources (via kubectl
) as you would with any other API resource.
Note: A full API reference is available at https://kubevirt.io/api-reference/.
Here is an example of a VirtualMachineInstance object:
apiVersion: kubevirt.io/v1\nkind: VirtualMachineInstance\nmetadata:\n  name: testvmi-nocloud\nspec:\n  terminationGracePeriodSeconds: 30\n  domain:\n    resources:\n      requests:\n        memory: 1024M\n    devices:\n      disks:\n      - name: containerdisk\n        disk:\n          bus: virtio\n      - name: emptydisk\n        disk:\n          bus: virtio\n      - disk:\n          bus: virtio\n        name: cloudinitdisk\n  volumes:\n  - name: containerdisk\n    containerDisk:\n      image: kubevirt/fedora-cloud-container-disk-demo:latest\n  - name: emptydisk\n    emptyDisk:\n      capacity: \"2Gi\"\n  - name: cloudinitdisk\n    cloudInitNoCloud:\n      userData: |-\n        #cloud-config\n        password: fedora\n        chpasswd: { expire: False }\n
This example uses a Fedora cloud image in combination with cloud-init and an ephemeral empty disk with a capacity of 2Gi
. For the sake of simplicity, the volume sources in this example are ephemeral and don't require a provisioner in your cluster.
Using instancetypes and preferences with a VirtualMachine: Instancetypes and preferences
More information about persistent and ephemeral volumes: Disks and Volumes
How to access a VirtualMachineInstance via console
or vnc
: Console Access
How to customize VirtualMachineInstances with cloud-init
: Cloud Init
In KubeVirt, the VM rollout strategy defines how changes to a VM object affect a running guest. In other words, it defines when and how changes to a VM object get propagated to its corresponding VMI object.
There are currently 2 rollout strategies: LiveUpdate
and Stage
. Only 1 can be specified and the default is Stage
.
As long as the VMLiveUpdateFeatures
feature gate is not enabled, the VM rollout strategy is ignored and defaults to \"Stage\". The feature gate is set in the KubeVirt custom resource (CR) as follows:
apiVersion: kubevirt.io/v1\nkind: KubeVirt\nspec:\n  configuration:\n    developerConfiguration:\n      featureGates:\n        - VMLiveUpdateFeatures\n
"},{"location":"user_workloads/vm_rollout_strategies/#liveupdate","title":"LiveUpdate","text":"The LiveUpdate
VM rollout strategy tries to propagate VM object changes to running VMIs as soon as possible. For example, changing the number of CPU sockets will trigger a CPU hotplug.
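For illustration, the socket count lives under the domain spec of the VM template. The fragment below is only a sketch: the VM name testvm and the value 4 are assumptions chosen for the example, and whether a hotplug actually succeeds depends on cluster and VM configuration not covered here.

```yaml
# Hypothetical VirtualMachine fragment; name and socket count are
# illustrative assumptions, not values from this guide.
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: testvm
spec:
  template:
    spec:
      domain:
        cpu:
          sockets: 4    # editing this field on a running VM is the change that gets propagated
```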
Enable the LiveUpdate
VM rollout strategy in the KubeVirt CR:
apiVersion: kubevirt.io/v1\nkind: KubeVirt\nspec:\n  configuration:\n    vmRolloutStrategy: \"LiveUpdate\"\n
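Since the rollout strategy is ignored while the VMLiveUpdateFeatures feature gate is disabled, both settings are needed together in practice. A minimal sketch that simply merges the two fragments shown in this guide (metadata omitted, as in those fragments):

```yaml
# Combined sketch: feature gate plus rollout strategy in one KubeVirt CR.
apiVersion: kubevirt.io/v1
kind: KubeVirt
spec:
  configuration:
    developerConfiguration:
      featureGates:
        - VMLiveUpdateFeatures      # without this gate the strategy falls back to Stage
    vmRolloutStrategy: LiveUpdate
```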
"},{"location":"user_workloads/vm_rollout_strategies/#stage","title":"Stage","text":"The Stage
VM rollout strategy stages every change made to the VM object until its next reboot.
apiVersion: kubevirt.io/v1\nkind: KubeVirt\nspec:\n  configuration:\n    vmRolloutStrategy: \"Stage\"\n
"},{"location":"user_workloads/vm_rollout_strategies/#restartrequired-condition","title":"RestartRequired condition","text":"Any change made to a VM object when the rollout strategy is Stage
will trigger the RestartRequired
VM condition. When the rollout strategy is LiveUpdate
, only non-propagatable changes will trigger the condition.
Once the RestartRequired
condition is set on a VM object, no further changes can be propagated, even if the strategy is set to LiveUpdate
. Changes will become effective on next reboot, and the condition will be removed.
The current implementation has the following limitations:
Once the RestartRequired
condition is set, the only way to get rid of it is to restart the VM. In the future, we plan on implementing a way to get rid of it by reverting the VM template spec to its last non-RestartRequired state.
The RestartRequired
condition comes with a message stating what kind of change triggered the condition (CPU/memory/other). That message pertains only to the first change that triggered the condition. Additional changes that would usually trigger the condition will just get staged and no additional RestartRequired
condition will be added.
The purpose of this document is to explain how to install virtio drivers for Microsoft Windows running in a fully virtualized guest.
"},{"location":"user_workloads/windows_virtio_drivers/#do-i-need-virtio-drivers","title":"Do I need virtio drivers?","text":"Yes. Without the virtio drivers, you cannot use paravirtualized hardware properly. It would either not work, or will have a severe performance penalty.
For more information about VirtIO and paravirtualization, see VirtIO and paravirtualization
For more details on configuring your VirtIO driver please refer to Installing VirtIO driver on a new Windows virtual machine and Installing VirtIO driver on an existing Windows virtual machine.
"},{"location":"user_workloads/windows_virtio_drivers/#which-drivers-i-need-to-install","title":"Which drivers I need to install?","text":"There are usually up to 8 possible devices that are required to run Windows smoothly in a virtualized environment. KubeVirt currently supports only:
viostor, the block driver, applies to SCSI Controller in the Other devices group.
viorng, the entropy source driver, applies to PCI Device in the Other devices group.
NetKVM, the network driver, applies to Ethernet Controller in the Other devices group. Available only if a virtio NIC is configured.
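As a sketch of what a virtio NIC configuration might look like, the fragment below sets the interface model to virtio on a masquerade interface attached to the pod network. The interface and network name default mirrors the template example elsewhere in this guide; treat the rest as an illustrative assumption rather than a required layout.

```yaml
# Hypothetical VMI spec fragment: a virtio NIC, which is what the
# NetKVM driver serves inside the Windows guest.
spec:
  domain:
    devices:
      interfaces:
        - name: default
          model: virtio
          masquerade: {}
  networks:
    - name: default
      pod: {}
```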
Other virtio drivers that exist and might be supported in the future:
Balloon, the balloon driver, applies to PCI Device in the Other devices group
vioserial, the paravirtual serial driver, applies to PCI Simple Communications Controller in the Other devices group.
vioscsi, the SCSI block driver, applies to SCSI Controller in the Other devices group.
qemupciserial, the emulated PCI serial driver, applies to PCI Serial Port in the Other devices group.
qxl, the paravirtual video driver, applies to Microsoft Basic Display Adapter in the Display adapters group.
pvpanic, the paravirtual panic driver, applies to Unknown device in the Other devices group.
Note
Some drivers are required in the installation phase. When you install Windows onto virtio block storage, you have to provide an appropriate virtio driver. Namely, choose the viostor driver for your version of Microsoft Windows, e.g. do not install the XP driver when you run Windows 10.
Other drivers can be installed after the successful Windows installation. Again, please install only drivers matching your Windows version.
"},{"location":"user_workloads/windows_virtio_drivers/#how-to-install-during-windows-install","title":"How to install during Windows install?","text":"To install drivers before Windows starts its installation, make sure you have the virtio-win package attached to your VirtualMachine as a SATA CD-ROM. In the Windows installer, choose advanced install and load driver. Then navigate to the loaded virtio CD-ROM and install either viostor or vioscsi, depending on which one you have set up.
Step by step screenshots:
"},{"location":"user_workloads/windows_virtio_drivers/#how-to-install-after-windows-install","title":"How to install after Windows install?","text":"After Windows is installed, go to Device Manager. There you should see undetected devices in the \"available devices\" section. You can install the virtio drivers one by one by going through this list.
For more details on how to choose a proper driver and how to install the driver, please refer to the Windows Guest Virtual Machines on Red Hat Enterprise Linux 7.
"},{"location":"user_workloads/windows_virtio_drivers/#how-to-obtain-virtio-drivers","title":"How to obtain virtio drivers?","text":"The virtio Windows drivers are distributed in the form of a containerDisk, which can simply be mounted to the VirtualMachine. The container image containing the disk is located at: https://quay.io/repository/kubevirt/virtio-container-disk?tab=tags and the image can be pulled like any other container image:
docker pull quay.io/kubevirt/virtio-container-disk\n
However, pulling the image manually is not required. If it is not present, Kubernetes downloads it when the VirtualMachine is deployed.
"},{"location":"user_workloads/windows_virtio_drivers/#attaching-to-virtualmachine","title":"Attaching to VirtualMachine","text":"KubeVirt distributes virtio drivers for Microsoft Windows in the form of a container disk. The package contains the virtio drivers and the QEMU guest agent. The disk was tested on Microsoft Windows Server 2012. Windows XP and later are supported.
The package is intended to be used as a CD-ROM attached to the virtual machine running Microsoft Windows. It can be used as a SATA CD-ROM during the install phase or to provide drivers in an existing Windows installation.
Attaching the virtio-win package can be done simply by adding a ContainerDisk to your VirtualMachine.
spec:\n  domain:\n    devices:\n      disks:\n        - name: virtiocontainerdisk\n          # Any other disk you want to use must go before virtioContainerDisk.\n          # KubeVirt boots from disks in the order they are defined.\n          # Therefore virtioContainerDisk must come after the bootable disk.\n          # The other option is to choose the boot order explicitly:\n          #  - https://kubevirt.io/api-reference/v0.13.2/definitions.html#_v1_disk\n          # NOTE: You either specify bootOrder explicitly or sort the items in\n          #       disks. You cannot do both at the same time.\n          # bootOrder: 2\n          cdrom:\n            bus: sata\n  volumes:\n    - containerDisk:\n        image: quay.io/kubevirt/virtio-container-disk\n      name: virtiocontainerdisk\n
Once you are done installing the virtio drivers, you can remove the virtio container disk by removing the disk and its volume from the YAML specification and restarting the VirtualMachine.
"}]} \ No newline at end of file