Walkthrough: Migrating a VM
At the end of this walkthrough, a sequence diagram of the overall process is included.
Invocation
The command to migrate the VM is dispatched by the autogenerated dispatch_call function from xapi/server.ml. For more information about the generated functions, see the XAPI IDL model.
The command triggers the operation VM_migrate, which uses many low-level atomic operations. These are:
- VM.restore
- VM.rename
- VBD.set_active
- VBD.plug
- VIF.set_active
- VGPU.set_active
- VM.create_device_model
- PCI.plug
The migrate command has several parameters, such as:
- Whether it should be started asynchronously,
- Whether it should be forwarded to another host,
- How arguments should be marshalled, and so on.
A new thread is created by xapi/server_helpers.ml to handle the command asynchronously. The helper thread checks whether the command should be passed to the message forwarding layer in order to be executed on another host (the destination), or locally if the current host is already the destination.
It will finally reach xapi/api_server.ml, which posts a command to the message broker, the message switch. The command is a JSON-RPC HTTP request sent over a Unix socket, the mechanism used for communication between the XAPI daemons. In the case of migration, this message sent by XAPI is consumed by the xenopsd daemon, which does the job of migrating the VM.
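The framing described above can be sketched as follows. This is a minimal illustration of a JSON-RPC call wrapped in an HTTP request for delivery over a Unix-domain socket; the method name, parameters and socket path are illustrative, not the actual wire format or paths used by the XAPI message switch.

```python
# Sketch: a JSON-RPC call framed as an HTTP POST over a Unix socket.
# Names and paths are illustrative, not XAPI's actual protocol details.
import json
import socket

def build_jsonrpc_http_request(method: str, params: dict, call_id: int) -> bytes:
    """Frame a JSON-RPC 2.0 call as an HTTP POST body plus headers."""
    body = json.dumps({
        "jsonrpc": "2.0",
        "method": method,
        "params": params,
        "id": call_id,
    }).encode()
    headers = (
        "POST / HTTP/1.1\r\n"
        "Content-Type: application/json\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode()
    return headers + body

def send_over_unix_socket(path: str, payload: bytes) -> None:
    """Deliver the framed request to a daemon listening on a Unix socket."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        sock.sendall(payload)

# Build (but do not send) an example request.
request = build_jsonrpc_http_request(
    "VM.migrate", {"vm": "XXXXXXXX-XXXX-XXXX-XXXX-000000000000"}, 1)
```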
Overview
The migration is an asynchronous task and a thread is created to handle this task. The task reference is returned to the client, which can then check its status until completion.
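The asynchronous-task pattern described above can be sketched as follows: the operation runs in a worker thread, the caller receives a task reference immediately, and polls its status until completion. All names here are illustrative stand-ins, not the real xenopsd task API.

```python
# Sketch of the async-task pattern: start work in a thread, hand back a
# task reference, let the client poll it. Names are illustrative only.
import threading
import time
import uuid

class Task:
    def __init__(self):
        self.ref = str(uuid.uuid4())   # reference handed back to the client
        self.status = "pending"
        self.result = None

def start_async(operation, *args) -> Task:
    """Run `operation` in a background thread and return a task reference."""
    task = Task()
    def run():
        task.status = "running"
        task.result = operation(*args)
        task.status = "completed"
    threading.Thread(target=run, daemon=True).start()
    return task

def migrate_vm(vm_uuid: str) -> str:   # stand-in for the real migration
    time.sleep(0.01)
    return f"{vm_uuid} migrated"

task = start_async(migrate_vm, "XXXXXXXX-XXXX-XXXX-XXXX-000000000000")
while task.status != "completed":      # the client polls until done
    time.sleep(0.005)
```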
As shown in the introduction, xenopsd fetches the VM_migrate operation from the message broker.
All tasks specific to libxenctrl, xenguest and Xenstore are handled by the xenopsd xc backend.
The entities that need to be migrated are: VDI, VIF, VGPU and PCI components.
During the migration process, the destination domain will be built with the same UUID as the original VM, except that the last part of the UUID will be XXXXXXXX-XXXX-XXXX-XXXX-000000000001. The original domain will be removed using XXXXXXXX-XXXX-XXXX-XXXX-000000000000.
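The temporary-UUID scheme above can be sketched as a simple suffix swap on the last field of the UUID, so the original and destination domains can coexist briefly. The helper name is illustrative, not xenopsd's actual code.

```python
# Sketch of the temporary-UUID scheme: the destination domain reuses the
# VM's UUID with the final 12-hex-digit field replaced. Illustrative only.
FINAL_UUID_SUFFIX = "000000000000"      # original domain
TEMP_UUID_SUFFIX = "000000000001"       # destination domain while migrating

def with_suffix(vm_uuid: str, suffix: str) -> str:
    """Replace the last field of a UUID with `suffix`."""
    prefix, _, _ = vm_uuid.rpartition("-")
    return f"{prefix}-{suffix}"

original = "12345678-abcd-ef01-2345-000000000000"
temporary = with_suffix(original, TEMP_UUID_SUFFIX)
# temporary == "12345678-abcd-ef01-2345-000000000001"
```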
Preparing VM migration
At specific places, xenopsd can execute hooks to run scripts. If a pre-migrate script is in place, a command to run this script is sent to the original domain.
Likewise, a command is sent to Qemu using the Qemu Machine Protocol (QMP) to check that the domain can be suspended (see xenopsd/xc/device_common.ml). After checking with Qemu that the VM can be suspended, the migration can begin.
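A QMP exchange like the one above can be sketched as follows. QMP is line-delimited JSON over a Unix socket: the client first negotiates capabilities, then issues commands such as query-status. The socket path and the exact check xenopsd performs are assumptions here; only the general QMP framing is shown.

```python
# Sketch of a QMP client exchange. QMP sends a greeting banner, expects
# a qmp_capabilities negotiation, then accepts commands. The socket path
# is illustrative; the real one is managed by xenopsd.
import json
import socket

def qmp_command(name: str, **arguments) -> bytes:
    """Encode a QMP command as one line of JSON."""
    cmd = {"execute": name}
    if arguments:
        cmd["arguments"] = arguments
    return json.dumps(cmd).encode() + b"\n"

def query_vm_status(path: str) -> str:
    """Connect to QEMU's QMP socket and return the domain's run state."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        reader = sock.makefile("rb")
        json.loads(reader.readline())            # QMP greeting banner
        sock.sendall(qmp_command("qmp_capabilities"))
        json.loads(reader.readline())            # capabilities ack
        sock.sendall(qmp_command("query-status"))
        reply = json.loads(reader.readline())
        return reply["return"]["status"]         # e.g. "running"
```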
Importing metadata
As with hooks, commands to the source domain are sent using stunnel, a daemon which is used as a wrapper to manage SSL-encrypted communication between two hosts in the same pool. To import the metadata, an XML-RPC command is sent to the original domain.
Once imported, the metadata provides a reference ID and allows the new domain to be built on the destination using the temporary VM UUID XXXXXXXX-XXXX-XXXX-XXXX-000000000001, where XXX... is the reference ID of the original VM.
Memory setup
One of the first steps is the setup of the VM's memory: the backend checks that no ballooning operation is in progress, as an active ballooning operation could cause the migration to fail. Once the memory has been checked, the daemon gets the state of the VM (running, halted, …) and the backend retrieves the domain's platform data (memory, vCPUs, etc.) from the Xenstore.
Once this is complete, we can restore the VIFs and create the domain. The memory synchronisation is the first synchronisation point; at this stage, everything is ready for the VM migration.
Destination VM setup
After receiving the memory, we can set up the destination domain. If we have a vGPU, we need to kick off its migration process and wait for the acknowledgement that the GPU entry has been successfully initialized before starting the main VM migration.
The receiver informs the sender using a handshake protocol that everything is set up and ready for save/restore.
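The handshakes at each synchronisation point can be sketched as a small ready/ACK exchange over the control channel: one side announces that its step is done, the other acknowledges before either proceeds. The message contents below are assumptions; the real protocol is internal to xenopsd.

```python
# Sketch of a synchronisation-point handshake: announce, then wait for ACK.
# Message formats are illustrative, not xenopsd's actual wire protocol.
import socket
import threading

def handshake_send(chan: socket.socket, point: str) -> None:
    """Announce a synchronisation point and wait for the peer's ACK."""
    chan.sendall(point.encode() + b"\n")
    assert chan.recv(64) == b"ACK\n", "peer failed the handshake"

def handshake_recv(chan: socket.socket) -> str:
    """Wait for the peer to reach a synchronisation point, then ACK it."""
    point = chan.recv(64).rstrip(b"\n").decode()
    chan.sendall(b"ACK\n")
    return point

# Drive both ends over a local socket pair.
receiver, sender = socket.socketpair()
t = threading.Thread(target=lambda: handshake_send(sender, "sync-point-1"))
t.start()
point = handshake_recv(receiver)   # receiver observes "sync-point-1"
t.join()
```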
Destination VM restore
VM restore is the low-level atomic operation VM.restore. This operation is represented by a function call to the backend. It uses Xenguest, a low-level utility from the XAPI toolstack, to interact with the Xen hypervisor, and libxc for sending a migration request to the emu-manager. After sending the request, the results coming from emu-manager are collected by the main thread, which blocks until the results are received.
During the live migration, emu-manager helps in ensuring the correct state transitions for the devices and handling the message passing for the VM as it’s moved between hosts. This includes making sure that the state of the VM’s virtual devices, like disks or network interfaces, is correctly moved over.
Destination VM rename
Once all operations are done, xenopsd renames the target VM from its temporary name to its real UUID. This is the low-level atomic operation VM.rename, which takes care of updating the Xenstore on the destination host.
Restoring devices
Restoring devices starts by activating the VBDs using the low-level atomic operation VBD.set_active, which updates the Xenstore. VBDs that are read-write must be plugged before read-only ones. Once activated, the low-level atomic operation VBD.plug is called, and the VDIs are attached and activated.
The next devices are the VIFs, which are set as active with VIF.set_active and plugged with VIF.plug. If there are VGPUs, they are set as active now using the atomic operation VGPU.set_active.
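The read-write-before-read-only ordering described above can be sketched as a stable sort over the device list. The device records here are illustrative, not xenopsd's actual data structures.

```python
# Sketch of VBD activation ordering: read-write VBDs are plugged before
# read-only ones. Device records are illustrative only.
vbds = [
    {"id": "xvdc", "mode": "RO"},
    {"id": "xvda", "mode": "RW"},
    {"id": "xvdb", "mode": "RO"},
]

def plug_order(vbds):
    """Return VBDs with read-write devices first, preserving relative order."""
    return sorted(vbds, key=lambda vbd: vbd["mode"] != "RW")

ordered = [vbd["id"] for vbd in plug_order(vbds)]
# ordered == ["xvda", "xvdc", "xvdb"]
```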
Creating the device model
create_device_model configures qemu-dm and starts it. This allows PCI devices to be managed.
PCI plug
PCI.plug is executed by the backend. It plugs a PCI device and advertises it to QEMU if this option is set, as is the case for NVIDIA SR-IOV vGPUs.
Unpause
The libxenctrl call xc_domain_unpause() unpauses the domain, and it starts running.
Cleanup
VM_set_domain_action_request marks the domain as alive: in case xenopsd restarts, it no longer reboots the VM. See the chapter on marking domains as alive for more information. If a post-migrate script is in place, it is executed by the Xenops_hooks.VM_post_migrate hook.
The final step is a handshake to seal the success of the migration and the old VM can now be cleaned up.
Synchronisation point 4 has been reached, and the migration is complete.
Live migration flowchart
This flowchart gives a visual representation of the VM migration workflow:
```mermaid
sequenceDiagram
    autonumber
    participant tx as sender
    participant rx0 as receiver thread 0
    participant rx1 as receiver thread 1
    participant rx2 as receiver thread 2
    activate tx
    tx->>rx0: VM.import_metadata
    tx->>tx: Squash memory to dynamic-min
    tx->>rx1: HTTP /migrate/vm
    activate rx1
    rx1->>rx1: VM_receive_memory<br/>VM_create (00000001)<br/>VM_restore_vifs
    rx1->>tx: handshake (control channel)<br/>Synchronisation point 1
    tx->>rx2: HTTP /migrate/mem
    activate rx2
    rx2->>tx: handshake (memory channel)<br/>Synchronisation point 1-mem
    tx->>rx1: handshake (control channel)<br/>Synchronisation point 1-mem ACK
    rx2->>rx1: memory fd
    tx->>rx1: VM_save/VM_restore<br/>Synchronisation point 2
    tx->>tx: VM_rename
    rx1->>rx2: exit
    deactivate rx2
    tx->>rx1: handshake (control channel)<br/>Synchronisation point 3
    rx1->>rx1: VM_rename<br/>VM_restore_devices<br/>VM_unpause<br/>VM_set_domain_action_request
    rx1->>tx: handshake (control channel)<br/>Synchronisation point 4
    deactivate rx1
    tx->>tx: VM_shutdown<br/>VM_remove
    deactivate tx
```
References
These pages might help for a better understanding of the XAPI toolstack:
- See the XAPI architecture for the overall architecture of Xapi
- See the XAPI dispatcher for service dispatch and message forwarding
- See the Xenopsd architecture for the overall architecture of Xenopsd
- See How Xen suspend and resume works for very similar operations in more detail.