Walkthrough: Migrating a VM

At the end of this walkthrough, a sequence diagram of the overall process is included.

Invocation

The command to migrate the VM is dispatched by the autogenerated dispatch_call function from xapi/server.ml. For more information about the generated functions, have a look at the XAPI IDL model.

The command triggers the operation VM_migrate, which is built from many low-level atomic operations (for example VM.restore, VM.rename, VBD.set_active, VBD.plug, VIF.set_active, VIF.plug, VGPU.set_active and PCI.plug). These are described in the sections below.

The migrate command has several parameters such as:

  • Should it be started asynchronously,
  • Should it be forwarded to another host,
  • How arguments should be marshalled, and so on.

A new thread is created by xapi/server_helpers.ml to handle the command asynchronously. The helper thread checks if the command should be passed to the message forwarding layer in order to be executed on another host (the destination) or locally (if it is already at the destination host).

The call finally reaches xapi/api_server.ml, which posts a command to the message broker, the message switch. The command is a JSON-RPC HTTP request sent over a Unix socket, which is how the XAPI daemons communicate with each other. In the case of migration, the message sent by XAPI is consumed by the xenopsd daemon, which does the actual job of migrating the VM.
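
The exact wire format is defined by the message switch and the Xenops IDL, but the idea of posting a JSON-RPC call over a Unix domain socket can be sketched with nothing more than the OCaml Unix module. In the sketch below, the socket path, URI, method name and payload are illustrative stand-ins, not the real values used by XAPI:

  (* Minimal sketch: post a JSON-RPC request over a Unix domain socket.
     The socket path, URI and framing are illustrative only; the real
     protocol is defined by message-switch and the Xenops IDL. *)
  let post_json_rpc ~socket_path ~body =
    let sock = Unix.socket Unix.PF_UNIX Unix.SOCK_STREAM 0 in
    Fun.protect
      ~finally:(fun () -> Unix.close sock)
      (fun () ->
        Unix.connect sock (Unix.ADDR_UNIX socket_path) ;
        (* A plain HTTP POST carrying the JSON-RPC body *)
        let request =
          Printf.sprintf "POST /jsonrpc HTTP/1.0\r\nContent-Length: %d\r\n\r\n%s"
            (String.length body) body
        in
        ignore (Unix.write_substring sock request 0 (String.length request)) ;
        let buf = Bytes.create 4096 in
        let n = Unix.read sock buf 0 4096 in
        Bytes.sub_string buf 0 n)

  let () =
    let body = {|{"jsonrpc": "2.0", "method": "VM.migrate", "params": [], "id": 1}|} in
    (* Hypothetical socket path, for illustration only *)
    print_endline (post_json_rpc ~socket_path:"/var/run/example-message-switch.sock" ~body)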

Overview

The migration is an asynchronous task and a thread is created to handle this task. The task reference is returned to the client, which can then check its status until completion.
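
From the client's perspective this is the usual asynchronous pattern: start the operation, keep the task reference, and poll it. The sketch below illustrates such a polling loop; get_task_status is a hypothetical helper standing in for the real XenAPI task query, not an actual API.

  (* Hypothetical polling loop for an asynchronous task reference.
     [get_task_status] stands in for the real XenAPI task query. *)
  type task_status = Pending of float (* progress *) | Success | Failure of string

  let rec wait_for_task ~get_task_status task_ref =
    match get_task_status task_ref with
    | Pending progress ->
        Printf.printf "task %s: %.0f%%\n%!" task_ref (100. *. progress) ;
        Unix.sleepf 1.0 ;
        wait_for_task ~get_task_status task_ref
    | Success -> Printf.printf "task %s: completed\n" task_ref
    | Failure msg -> Printf.printf "task %s: failed (%s)\n" task_ref msg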

As shown in the introduction, xenopsd fetches the VM_migrate operation from the message broker.

All tasks specific to libxenctrl, xenguest and Xenstore are handled by the xenopsd xc backend.

The entities that need to be migrated are: VDI, VIF, VGPU and PCI components.

During the migration process, the destination domain will be built with the same UUID as the original VM, except that the last part of the UUID is set to 000000000001 (i.e. XXXXXXXX-XXXX-XXXX-XXXX-000000000001). The original domain will later be removed using the UUID ending in 000000000000 (XXXXXXXX-XXXX-XXXX-XXXX-000000000000).

Preparing VM migration

At specific places, xenopsd can execute hooks to run scripts. In case a pre-migrate script is in place, a command to run this script is sent to the original domain.
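
A hook boils down to running an executable script, if one is installed, with the VM and the reason passed as arguments. The sketch below illustrates the idea only; the directory and argument names are assumptions, not the ones xenopsd actually uses.

  (* Illustrative only: run a hook script if present, passing the VM UUID and
     the reason. The directory and argument names are assumptions. *)
  let run_hook ~hook_name ~vm_uuid =
    let script = Filename.concat "/etc/example-hooks.d" hook_name in
    if Sys.file_exists script then begin
      let cmd = Printf.sprintf "%s -vmuuid %s -reason %s" script vm_uuid hook_name in
      match Sys.command cmd with
      | 0 -> ()
      | code -> failwith (Printf.sprintf "hook %s failed with code %d" hook_name code)
    end

  let () = run_hook ~hook_name:"vm-pre-migrate" ~vm_uuid:"0e645a38-0f18-4a47-a427-1ca8e8f92f0a"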

Likewise, a command is sent to Qemu using the Qemu Machine Protocol (QMP) to check that the domain can be suspended (see xenopsd/xc/device_common.ml). After checking with Qemu that the VM can be suspended, the migration can begin.

Importing metadata

As with hooks, commands to the source domain are sent using stunnel, a daemon used as a wrapper to manage SSL-encrypted communication between two hosts in the same pool. To import the metadata, an XML-RPC command is sent to the original domain.

Once the metadata is imported, it yields a reference ID that allows building the new domain on the destination using the temporary VM UUID XXXXXXXX-XXXX-XXXX-XXXX-000000000001, where XXX... is the reference ID of the original VM.
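
The temporary UUID is simply the VM's UUID with its final field overridden, as described above. A small sketch of that derivation, assuming the standard 8-4-4-4-12 textual UUID layout (the example UUID is made up):

  (* Derive the temporary migration UUID by replacing the last UUID field.
     Assumes the standard 8-4-4-4-12 textual layout; the UUID is made up. *)
  let with_suffix uuid suffix =
    match String.rindex_opt uuid '-' with
    | Some i -> String.sub uuid 0 (i + 1) ^ suffix
    | None -> invalid_arg "not a UUID"

  let () =
    let original = "0e645a38-0f18-4a47-a427-1ca8e8f92f0a" in
    print_endline (with_suffix original "000000000001") ;  (* destination domain *)
    print_endline (with_suffix original "000000000000")    (* original domain *)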

Memory setup

One of the first steps is the setup of the VM’s memory: the backend checks that no ballooning operation is in progress; if one is, the migration could fail.

Once the memory has been checked, the daemon gets the state of the VM (running, halted, …) and the backend retrieves the domain’s platform data (memory, vCPUs, etc.) from the Xenstore.
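
Reading such per-domain data amounts to querying a handful of keys in the Xenstore. The sketch below only illustrates the shape of that lookup: read is a stand-in for a real Xenstore client and the key paths are examples, not the ones xenopsd actually reads.

  (* Illustrative only: look up per-domain data in the Xenstore.
     [read] stands in for a real Xenstore client; the keys are examples. *)
  let platform_data ~(read : string -> string) ~domid =
    let base = Printf.sprintf "/local/domain/%d" domid in
    [ ("name", read (base ^ "/name"))
    ; ("memory/target", read (base ^ "/memory/target")) ]

  let () =
    (* Fake store used for demonstration *)
    let fake_read = function
      | "/local/domain/7/name" -> "my-vm"
      | "/local/domain/7/memory/target" -> "1048576"
      | path -> failwith ("unknown key: " ^ path)
    in
    List.iter
      (fun (k, v) -> Printf.printf "%s = %s\n" k v)
      (platform_data ~read:fake_read ~domid:7)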

Once these checks and lookups are complete, we can restore the VIFs and create the domain.

The synchronisation of the memory is the first synchronisation point; once it is reached, everything is ready for the VM migration.

Destination VM setup

After receiving memory we can set up the destination domain. If we have a vGPU we need to kick off its migration process. We will need to wait for the acknowledgement that the GPU entry has been successfully initialized before starting the main VM migration.

The receiver informs the sender using a handshake protocol that everything is set up and ready for save/restore.
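
Conceptually, each synchronisation point is a blocking exchange of small messages on the control channel: one side announces that it is ready and the other side waits for that message before continuing. The toy sketch below models this with in-process channels from the OCaml threads library rather than the real HTTP connections:

  (* Toy model of a synchronisation point: the receiver announces readiness,
     the sender blocks until it hears it. In-process channels stand in for
     the real HTTP control channel. *)
  let sync_point : string Event.channel = Event.new_channel ()

  let receiver () =
    (* ... create the domain, restore VIFs, set up the vGPU ... *)
    Event.sync (Event.send sync_point "ready")

  let sender () =
    let msg = Event.sync (Event.receive sync_point) in  (* blocks *)
    Printf.printf "receiver is %s: start sending memory\n" msg

  let () =
    let t = Thread.create receiver () in
    sender () ;
    Thread.join t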

Destination VM restore

VM restore is the low-level atomic operation VM.restore. This operation is represented by a function call to the backend. It uses Xenguest, a low-level utility from the XAPI toolstack, to interact with the Xen hypervisor and libxc, and to send a migration request to the emu-manager.

After sending the request, the main thread collects the results coming from emu-manager, blocking until they are received.
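
The "send a request, then block until the results arrive" pattern can be modelled with a mutex and a condition variable, as in the simplified sketch below; it does not reflect the real emu-manager wire protocol, only the blocking behaviour of the collecting thread.

  (* Simplified model of a thread blocking until a helper has delivered all
     of its results. This does not reflect the real emu-manager protocol. *)
  type 'a collector = {
    mutex : Mutex.t ;
    cond : Condition.t ;
    mutable results : 'a list ;
    mutable finished : bool ;
  }

  let make () =
    { mutex = Mutex.create () ; cond = Condition.create () ;
      results = [] ; finished = false }

  (* Called by the thread reading from the helper *)
  let push c r =
    Mutex.lock c.mutex ;
    c.results <- r :: c.results ;
    Condition.signal c.cond ;
    Mutex.unlock c.mutex

  let finish c =
    Mutex.lock c.mutex ;
    c.finished <- true ;
    Condition.signal c.cond ;
    Mutex.unlock c.mutex

  (* Called by the main thread: blocks until the helper reports completion *)
  let wait_all c =
    Mutex.lock c.mutex ;
    while not c.finished do Condition.wait c.cond c.mutex done ;
    let rs = List.rev c.results in
    Mutex.unlock c.mutex ;
    rs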

During the live migration, emu-manager helps in ensuring the correct state transitions for the devices and handling the message passing for the VM as it’s moved between hosts. This includes making sure that the state of the VM’s virtual devices, like disks or network interfaces, is correctly moved over.

Destination VM rename

Once all operations are done, xenopsd renames the target VM from its temporary name to its real UUID. This operation is a low-level atomic VM.rename which takes care of updating the Xenstore on the destination host.
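
Conceptually, the rename rewrites the Xenstore entries that are keyed by the VM UUID and removes the old ones. The sketch below is hypothetical: read, write and rm stand in for a real Xenstore client, and the paths and keys are illustrative.

  (* Hypothetical sketch of renaming a VM in the Xenstore: copy the entries
     keyed by the old UUID to the new one, then delete the old entries.
     [read], [write] and [rm] stand in for a real Xenstore client. *)
  let rename_vm ~read ~write ~rm ~old_uuid ~new_uuid =
    let old_path = "/vm/" ^ old_uuid and new_path = "/vm/" ^ new_uuid in
    List.iter
      (fun key -> write (new_path ^ "/" ^ key) (read (old_path ^ "/" ^ key)))
      [ "name" ; "uuid" ] ;
    rm old_path

  let () =
    let read k = "value-of-" ^ k in
    let write k v = Printf.printf "write %s = %s\n" k v in
    let rm k = Printf.printf "rm %s\n" k in
    rename_vm ~read ~write ~rm
      ~old_uuid:"0e645a38-0f18-4a47-a427-000000000001"  (* temporary name *)
      ~new_uuid:"0e645a38-0f18-4a47-a427-1ca8e8f92f0a"  (* real UUID *)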

Restoring devices

Restoring devices starts by activating the VBDs using the low-level atomic operation VBD.set_active, which is an update of the Xenstore. VBDs that are read-write must be plugged before read-only ones. Once activated, the low-level atomic operation VBD.plug is called, and the VDIs are attached and activated.

The next devices are the VIFs, which are set as active (VIF.set_active) and plugged (VIF.plug). If there are VGPUs, they are set as active now using the atomic operation VGPU.set_active.
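
The ordering constraint above (read-write VBDs plugged before read-only ones) is easy to express as a partition before plugging. In the sketch below, set_active and plug are hypothetical stand-ins for the real atomic operations:

  (* Illustrative ordering of VBD restore: activate everything, then plug
     read-write disks before read-only ones. [set_active] and [plug] are
     hypothetical stand-ins for the real atomic operations. *)
  type mode = ReadWrite | ReadOnly
  type vbd = { id : string ; mode : mode }

  let restore_vbds ~set_active ~plug vbds =
    List.iter set_active vbds ;
    let rw, ro = List.partition (fun v -> v.mode = ReadWrite) vbds in
    List.iter plug (rw @ ro)

  let () =
    let vbds = [ { id = "xvda" ; mode = ReadWrite } ; { id = "xvdd" ; mode = ReadOnly } ] in
    restore_vbds vbds
      ~set_active:(fun v -> Printf.printf "set_active %s\n" v.id)
      ~plug:(fun v -> Printf.printf "plug %s\n" v.id)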

Creating the device model

create_device_model configures qemu-dm and starts it. This allows PCI devices to be managed.

PCI plug

PCI.plug is executed by the backend. It plugs the PCI device and advertises it to QEMU if this option is set, which is the case for NVIDIA SR-IOV vGPUs.

Unpause

The libxenctrl call xc_domain_unpause() unpauses the domain, and it starts running.
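
From OCaml, this goes through the Xenctrl bindings that ship with the Xen tools tree; assuming those bindings are available, the call looks roughly like this (the domain id is illustrative and the call needs privileged access to the hypervisor):

  (* Unpause a domain through the OCaml libxenctrl bindings.
     Requires the xenctrl library and privileged access to the hypervisor. *)
  let unpause_domain domid =
    Xenctrl.with_intf (fun xc -> Xenctrl.domain_unpause xc domid)

  let () = unpause_domain 7  (* illustrative domid *)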

Cleanup

  1. VM_set_domain_action_request marks the domain as alive: In case xenopsd restarts, it no longer reboots the VM. See the chapter on marking domains as alive for more information.

  2. If a post-migrate script is in place, it is executed by the Xenops_hooks.VM_post_migrate hook.

  3. The final step is a handshake to seal the success of the migration and the old VM can now be cleaned up.

Synchronisation point 4 has been reached and the migration is complete.

Live migration flowchart

This diagram gives a visual representation of the VM migration workflow:

sequenceDiagram
autonumber
participant tx as sender
participant rx0 as receiver thread 0
participant rx1 as receiver thread 1
participant rx2 as receiver thread 2

activate tx
tx->>rx0: VM.import_metadata
tx->>tx: Squash memory to dynamic-min

tx->>rx1: HTTP /migrate/vm
activate rx1
rx1->>rx1: VM_receive_memory<br/>VM_create (00000001)<br/>VM_restore_vifs
rx1->>tx: handshake (control channel)<br/>Synchronisation point 1

tx->>rx2: HTTP /migrate/mem
activate rx2
rx2->>tx: handshake (memory channel)<br/>Synchronisation point 1-mem

tx->>rx1: handshake (control channel)<br/>Synchronisation point 1-mem ACK

rx2->>rx1: memory fd

tx->>rx1: VM_save/VM_restore<br/>Synchronisation point 2
tx->>tx: VM_rename
rx1->>rx2: exit
deactivate rx2

tx->>rx1: handshake (control channel)<br/>Synchronisation point 3

rx1->>rx1: VM_rename<br/>VM_restore_devices<br/>VM_unpause<br/>VM_set_domain_action_request

rx1->>tx: handshake (control channel)<br/>Synchronisation point 4

deactivate rx1

tx->>tx: VM_shutdown<br/>VM_remove
deactivate tx

References

These pages might help you gain a better understanding of the XAPI toolstack: