Walkthrough: Migrating a VM
A XenAPI client wishes to migrate a VM from one host to another within the same pool. The client issues a command to migrate the VM, and it is dispatched by the autogenerated dispatch_call function from xapi/server.ml. For more information about the generated functions you can have a look at the XAPI IDL model.
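To give a feel for the shape of that generated code, here is a minimal sketch of a string-keyed dispatcher in OCaml. The `call` record and the method name are our own simplifications; the real `dispatch_call` is generated from the IDL and also handles marshalling, sessions, and errors:

```ocaml
(* Hypothetical sketch of an autogenerated dispatcher. The real
   dispatch_call in xapi/server.ml is generated from the XAPI IDL
   and also deals with marshalling, sessions and error handling. *)
type call = { name : string; params : string list }

let dispatch_call call =
  match call.name with
  | "VM.migrate_send" ->
      Printf.printf "dispatching VM.migrate_send with %d params\n"
        (List.length call.params)
  | other -> Printf.printf "unknown method: %s\n" other

let () = dispatch_call { name = "VM.migrate_send"; params = [ "vm-ref"; "host-ref" ] }
```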
The command will trigger the operation VM_migrate, whose low-level operations are performed by the backend. These atomic operations, which we will describe in this document, are listed below (a sketch of how they might be modelled follows the list):
- VM.restore
- VM.rename
- VBD.set_active
- VBD.plug
- VIF.set_active
- VGPU.set_active
- VM.create_device_model
- PCI.plug
- VM.set_domain_action_request
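For orientation only, these atomics could be pictured as constructors of a variant type that the backend interprets one by one. The constructors and payloads below are simplifications of ours, not the actual types used by xenopsd:

```ocaml
(* Illustrative only: a variant in the spirit of xenopsd's atomic
   operations, not the actual constructors or payload types. *)
type atomic =
  | VM_restore of string                    (* VM id *)
  | VM_rename of string * string            (* from id, to id *)
  | VBD_set_active of string * bool
  | VBD_plug of string
  | VIF_set_active of string * bool
  | VGPU_set_active of string * bool
  | VM_create_device_model of string
  | PCI_plug of string
  | VM_set_domain_action_request of string  (* xenstore path *)

let describe = function
  | VM_restore id -> "VM.restore " ^ id
  | VM_rename (a, b) -> Printf.sprintf "VM.rename %s -> %s" a b
  | VBD_set_active (id, b) -> Printf.sprintf "VBD.set_active %s %b" id b
  | VBD_plug id -> "VBD.plug " ^ id
  | VIF_set_active (id, b) -> Printf.sprintf "VIF.set_active %s %b" id b
  | VGPU_set_active (id, b) -> Printf.sprintf "VGPU.set_active %s %b" id b
  | VM_create_device_model id -> "VM.create_device_model " ^ id
  | PCI_plug id -> "PCI.plug " ^ id
  | VM_set_domain_action_request p -> "VM.set_domain_action_request " ^ p

let () = print_endline (describe (VBD_set_active ("xvda", true)))
```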
The command has several parameters, such as: whether it should be run asynchronously, whether it should be forwarded to another host, how its arguments should be marshalled, and so on. A new thread is created by xapi/server_helpers.ml to handle the command asynchronously. At this point the helper also checks whether the command should be passed to the message forwarding layer in order to be executed on another host (the destination), or locally if we are already in the right place.
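A minimal sketch of that dispatch decision, assuming hypothetical `execute_locally` and `forward_to_host` helpers (the real logic in xapi/server_helpers.ml and the message forwarding layer is considerably richer):

```ocaml
(* Minimal sketch (hypothetical helpers): handle a call on a fresh
   thread and decide whether to run it locally or forward it,
   loosely mirroring the flow in xapi/server_helpers.ml and the
   message forwarding layer. Build with the threads library. *)
let forward_to_host host call = Printf.printf "forwarding %s to %s\n" call host

let execute_locally call = Printf.printf "executing %s locally\n" call

let handle_async ~local_host ~destination call =
  Thread.create
    (fun () ->
      if String.equal destination local_host then execute_locally call
      else forward_to_host destination call)
    ()

let () =
  Thread.join (handle_async ~local_host:"host1" ~destination:"host2" "VM.migrate")
```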
It will finally reach xapi/api_server.ml, which posts a command to the message broker, the message switch. This is a JSON-RPC HTTP request sent on a Unix socket, used for communication between XAPI daemons. In the case of the migration, the message sent by XAPI will be consumed by the xenopsd daemon, which will do the job of migrating the VM.
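The following hand-rolled sketch shows the general idea of posting a JSON-RPC payload over a Unix-domain socket. The socket path and the payload are illustrative stand-ins, not the actual message-switch endpoint or wire format:

```ocaml
(* Hand-rolled sketch: POST a JSON-RPC body over a Unix-domain
   socket. Both the socket path and the payload are illustrative
   stand-ins, not the real message-switch endpoint or wire format. *)
let send_json_rpc ~socket_path ~body =
  let fd = Unix.socket Unix.PF_UNIX Unix.SOCK_STREAM 0 in
  Unix.connect fd (Unix.ADDR_UNIX socket_path);
  let request =
    Printf.sprintf "POST / HTTP/1.1\r\ncontent-length: %d\r\n\r\n%s"
      (String.length body) body
  in
  let (_ : int) = Unix.write_substring fd request 0 (String.length request) in
  let buf = Bytes.create 4096 in
  let n = Unix.read fd buf 0 (Bytes.length buf) in
  Unix.close fd;
  Bytes.sub_string buf 0 n

let () =
  let body = {|{"jsonrpc":"2.0","method":"VM.migrate","params":[],"id":1}|} in
  print_endline (send_json_rpc ~socket_path:"/var/run/message-switch.sock" ~body)
```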
The migration of the VM
The migration is an asynchronous task, and a thread is created to handle it. The task's reference is returned to the client, which can then check its status until completion.
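From the client's point of view this amounts to polling the task until it leaves the pending state. A toy version, with `query_task` standing in for the real XenAPI task API:

```ocaml
(* Client-side sketch: poll a task until it completes. The task id
   and query_task function are hypothetical stand-ins for the real
   XenAPI task API. *)
type task_state = Pending of float | Completed | Failed of string

let rec wait_for_task ~query_task ~task_id =
  match query_task task_id with
  | Pending progress ->
      Printf.printf "task %s: %.0f%%\n" task_id (100. *. progress);
      Unix.sleepf 1.0;
      wait_for_task ~query_task ~task_id
  | Completed -> Printf.printf "task %s completed\n" task_id
  | Failed msg -> Printf.printf "task %s failed: %s\n" task_id msg

let () =
  let calls = ref 0 in
  let query_task _id =
    incr calls;
    if !calls < 3 then Pending (float_of_int !calls /. 3.) else Completed
  in
  wait_for_task ~query_task ~task_id:"task-1"
```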
As we saw in the introduction, the xenopsd daemon will pop the operation VM_migrate from the message broker.
Only one backend is currently available: the xc backend, which interacts with libxc, libxenguest, and xenstore.
The entities that need to be migrated are: VDI, VIF, VGPU and PCI components.
During the migration process the destination domain will be built with the same UUID as the original VM, except that the last part of the UUID will be XXXXXXXX-XXXX-XXXX-XXXX-000000000001. The original domain will be removed using XXXXXXXX-XXXX-XXXX-XXXX-000000000000.
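A small illustration of this UUID convention, using a helper of our own invention that rewrites the last component of a UUID:

```ocaml
(* Illustration of the UUID convention described above: rewrite the
   last component of the UUID. The helper is ours, not xenopsd's. *)
let with_final_component uuid suffix =
  match String.rindex_opt uuid '-' with
  | Some i -> String.sub uuid 0 (i + 1) ^ suffix
  | None -> uuid

let () =
  let original = "0e645775-9e4e-41ff-b3a7-000000000000" in
  (* the destination domain is built under the ...0001 variant *)
  print_endline (with_final_component original "000000000001")
```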
There are some points, called hooks, at which xenopsd can execute a script. Before starting a migration, a command is sent to the original domain to execute a pre-migrate script, if one exists.
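A hook point boils down to running an external script when it exists. A minimal sketch, with an illustrative script path and arguments:

```ocaml
(* Sketch of a hook point: run a script if it exists. The script
   path and arguments are illustrative, not the exact ones xenopsd
   passes. Filename.quote_command needs OCaml >= 4.10. *)
let run_hook ~script ~args =
  if Sys.file_exists script then begin
    let rc = Sys.command (Filename.quote_command script args) in
    if rc <> 0 then Printf.eprintf "hook %s exited with %d\n" script rc
  end

let () =
  run_hook ~script:"/etc/xapi.d/vm-pre-migrate"
    ~args:[ "-vmuuid"; "0e645775-9e4e-41ff-b3a7-000000000000" ]
```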
Before starting the migration, a command is also sent to QEMU using the QEMU Machine Protocol (QMP) to check that the domain can be suspended (see xenopsd/xc/device_common.ml). Once QEMU confirms that the VM is suspendable, the migration can start.
Importing metadata
As for hooks, commands to the source domain are sent using stunnel, a daemon which is used as a wrapper to manage SSL-encrypted communication between two hosts in the same pool. To import the metadata, an XML-RPC command is sent to the original domain.
Once imported, it will give us a reference ID and allow us to build the new domain on the destination using the temporary VM UUID XXXXXXXX-XXXX-XXXX-XXXX-000000000001, where XXX... is the reference ID of the original VM.
Setting memory
One of the first things to do is to set up the memory. The backend checks that no ballooning operation is in progress; the migration can fail at this point if a ballooning operation is in progress and takes too long.
Once the memory has been checked, the daemon gets the state of the VM (running, halted, …), and the backend retrieves information about the VM from xenstore, such as the maximum memory the domain can consume and, for example, its quotas.
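A sketch of such a lookup, with `xenstore_read` as a stand-in for a real xenstore client and an illustrative key (not necessarily the exact path xenopsd reads):

```ocaml
(* Hypothetical sketch: gather a memory limit before restoring the
   domain. xenstore_read is a stand-in for a real xenstore client,
   and the key is illustrative. *)
let xenstore_read path =
  (* a real implementation would query xenstored here *)
  if path = "/local/domain/1/memory/static-max" then Some "4194304" else None

let memory_static_max ~domid =
  let path = Printf.sprintf "/local/domain/%d/memory/static-max" domid in
  Option.map int_of_string (xenstore_read path)

let () =
  match memory_static_max ~domid:1 with
  | Some kib -> Printf.printf "static-max: %d KiB\n" kib
  | None -> print_endline "no static-max recorded"
```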
Once the memory setup is complete, we can restore the VIFs and create the domain.
The synchronisation of the memory is the first synchronisation point; everything is now ready for the VM migration.
VM Migration
After receiving the memory we can set up the destination domain. If we have a vGPU, we need to kick off its migration process and wait for the acknowledgement indicating that the entry for the GPU has been properly initialized before starting the main VM migration.
There is a handshake mechanism for synchronizing the source and the destination. Using the handshake protocol, the receiver informs the sender that everything is set up and ready for the save/restore.
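A toy model of that handshake using an in-process channel (the real exchange of course travels over the migration connection between the hosts):

```ocaml
(* Toy model of the handshake using an in-process channel: the
   receiver signals readiness and the sender waits for it before
   starting the save/restore. Build with the threads library. *)
let () =
  let ready = Event.new_channel () in
  let receiver =
    Thread.create
      (fun () ->
        (* ... set up the destination domain ... *)
        Event.sync (Event.send ready "ready"))
      ()
  in
  let msg = Event.sync (Event.receive ready) in
  Printf.printf "sender: got %S, starting save/restore\n" msg;
  Thread.join receiver
```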
VM restore
VM restore is the low-level atomic operation VM.restore. This operation is represented by a function call to the backend. It uses xenguest, a low-level utility from the XAPI toolstack, to interact with the Xen hypervisor and libxc, and to send a migration request to emu-manager.
After the request is sent, the main thread collects the results coming from emu-manager, blocking until they are received.
During the live migration, emu-manager helps ensure the correct state transitions for the devices and handles the message passing for the VM as it’s moved between hosts. This includes making sure that the state of the VM’s virtual devices, like disks or network interfaces, is correctly moved over.
VM renaming
Once all operations are done, we can rename the VM on the target from its temporary name to its real UUID. This is another low-level atomic operation, VM.rename, which takes care of updating xenstore on the destination.
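Conceptually the rename moves the VM's entries in xenstore from the temporary UUID to the final one. A sketch with stubbed xenstore operations and illustrative paths:

```ocaml
(* Conceptual sketch: the rename updates the VM's entries in
   xenstore. The stubs and paths are illustrative, not the actual
   layout xenopsd maintains. *)
let xenstore_write path value = Printf.printf "write %s = %s\n" path value
let xenstore_rm path = Printf.printf "rm %s\n" path

let vm_rename ~from_uuid ~to_uuid =
  xenstore_write ("/vm/" ^ to_uuid ^ "/uuid") to_uuid;
  xenstore_rm ("/vm/" ^ from_uuid)

let () =
  vm_rename
    ~from_uuid:"0e645775-9e4e-41ff-b3a7-000000000001"
    ~to_uuid:"0e645775-9e4e-41ff-b3a7-000000000000"
```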
The next step is to restore the remaining devices and unpause the domain.
Restoring remaining devices
Restoring devices starts by activating the VBDs using the low-level atomic operation VBD.set_active, which is an update of xenstore. VBDs that are read-write must be plugged before read-only ones. Once activated, the low-level atomic operation VBD.plug is called, and the VDIs are attached and activated.
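The ordering rule made concrete: partition the VBDs so the read-write ones come first. The `vbd` record is a simplification of ours, not xenopsd's:

```ocaml
(* Plug read-write VBDs before read-only ones, as described above.
   The vbd record is a simplified stand-in. *)
type vbd = { id : string; read_write : bool }

let plug_order vbds =
  let rw, ro = List.partition (fun v -> v.read_write) vbds in
  rw @ ro

let () =
  let vbds =
    [ { id = "xvdd"; read_write = false }; { id = "xvda"; read_write = true } ]
  in
  plug_order vbds
  |> List.iter (fun v -> Printf.printf "plug %s (rw=%b)\n" v.id v.read_write)
```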
The next devices are the VIFs, which are set as active (VIF.set_active) and plugged (VIF.plug). If there are vGPUs, we set them as active now using the atomic operation VGPU.set_active.
We are almost done. The next step is to create the device model.
Create device model
Creating the device model is done using the atomic operation VM.create_device_model, which configures and starts qemu-dm. This is needed in order to manage PCI devices.
PCI plug
PCI.plug is executed by the backend. It plugs a PCI device and advertises it to QEMU if this option is set, as is the case for NVIDIA SR-IOV vGPUs.
At this point the devices have been restored and the new domain is considered survivable. We can unpause the domain and perform the last actions.
Unpause and done
Unpausing is done by managing the state of the domain using bindings to xenctrl. Once the hypervisor has unpaused the domain, some actions can be requested using VM.set_domain_action_request, which is a path in xenstore. By default no action is requested, but a reboot, for example, can be initiated.
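A sketch of how a value read from that xenstore path might be turned into an action (the key values shown are illustrative):

```ocaml
(* Sketch: after the unpause, an action may be requested through a
   xenstore path. The values and the stubbed read are illustrative. *)
type action = No_action | Reboot | Poweroff

let parse_action = function
  | Some "reboot" -> Reboot
  | Some "poweroff" -> Poweroff
  | _ -> No_action

let () =
  (* stand-in for reading the domain-action-request path from xenstore *)
  let requested = Some "reboot" in
  match parse_action requested with
  | Reboot -> print_endline "reboot requested"
  | Poweroff -> print_endline "poweroff requested"
  | No_action -> print_endline "no action requested"
```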
Previously we spoke about some points, called hooks, at which xenopsd can execute a script. There is also a hook to run a post-migrate script. After the execution of that script, if there is one, the migration is almost done. The last step is a handshake to seal the success of the migration, after which the old VM can be cleaned up.
Links
Some links are old, but even though many changes have occurred since, they remain relevant for a global understanding of the XAPI toolstack.