Xenopsd instances run on a host and manage VMs on behalf of clients. This picture shows 3 different Xenopsd instances: 2 named “xenopsd-xc” and 1 named “xenopsd-xenlight”.
Each instance is responsible for managing a disjoint set of VMs. Clients should never ask more than one Xenopsd to manage the same VM. Managing a VM means:
For a full list of features, consult the features list.
Each Xenopsd instance has a unique name on the host. A typical name is
A higher-level tool, such as xapi, will associate VMs with individual Xenopsd names.
Running multiple Xenopsds is necessary because
Communication with Xenopsd is handled through a Xapi-global library: xcp-idl. This library supports
This library allows the communication details to be changed without having to change all the Xapi clients and servers.
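To illustrate that separation, here is a minimal sketch (with hypothetical names; this is not xcp-idl's real API) of how a framing choice can be hidden behind an OCaml module interface, so that swapping the wire format does not require touching RPC clients or servers:

```ocaml
(* Sketch of hiding message framing behind an interface (hypothetical names,
   not the real xcp-idl API). *)
module type TRANSPORT = sig
  val send : Unix.file_descr -> string -> unit
  val recv : Unix.file_descr -> string
end

(* One possible framing: an 8-digit decimal length prefix (an assumption made
   for this example only). *)
module Length_prefixed : TRANSPORT = struct
  let really_write fd buf =
    let len = Bytes.length buf in
    let rec go off =
      if off < len then go (off + Unix.write fd buf off (len - off))
    in
    go 0

  let really_read fd len =
    let buf = Bytes.create len in
    let rec go off =
      if off < len then
        match Unix.read fd buf off (len - off) with
        | 0 -> failwith "unexpected end of stream"
        | n -> go (off + n)
    in
    go 0;
    buf

  let send fd msg =
    really_write fd
      (Bytes.of_string (Printf.sprintf "%08d%s" (String.length msg) msg))

  let recv fd =
    let len = int_of_string (Bytes.to_string (really_read fd 8)) in
    Bytes.to_string (really_read fd len)
end

(* A client written against TRANSPORT never needs to change when the framing
   or encoding module is swapped out. *)
module Make_client (T : TRANSPORT) = struct
  let call fd request = T.send fd request; T.recv fd
end
```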
Xenopsd has a number of “backends” which perform the low-level VM operations such as (on Xen) “create domain”, “hotplug disk” and “destroy domain”. These backends contain all the hypervisor-specific code including
The following diagram shows the internal structure of Xenopsd:
At the top of the diagram two client RPCs have been sent: one to start a VM and the other to fetch the latest events. The RPCs are all defined in xcp-idl/xen/xenops_interface.ml. The RPCs are received by the Xenops_server module and decomposed into “micro-ops” (labelled “μ op”). These micro-ops represent actions like
Each of these micro-ops is represented by a function call in a “backend plugin” interface. The micro-ops are enqueued in queues, one queue per VM. There is a thread pool (whose size can be changed dynamically by the admin) which pulls micro-ops from the VM queues and calls the corresponding backend function.
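To make the queueing scheme concrete, here is a minimal sketch, compiled against OCaml's threads library, of per-VM queues drained by a worker pool. All the names (the micro_op constructors, enqueue, start_pool) are assumptions for illustration, not Xenopsd's actual scheduler; per-VM serialisation comes from marking a queue busy while one of its ops runs.

```ocaml
(* Per-VM micro-op queues serviced by a thread pool (illustrative names). *)
type micro_op = VM_create | VM_build | VM_unpause | VM_destroy

type vm_queue = { mutable pending : micro_op list; mutable busy : bool }

let queues : (string, vm_queue) Hashtbl.t = Hashtbl.create 16
let lock = Mutex.create ()
let cond = Condition.create ()

let enqueue vm op =
  Mutex.lock lock;
  let q =
    match Hashtbl.find_opt queues vm with
    | Some q -> q
    | None ->
        let q = { pending = []; busy = false } in
        Hashtbl.add queues vm q; q
  in
  q.pending <- q.pending @ [ op ];
  Condition.broadcast cond;
  Mutex.unlock lock

(* Pick a queue that has work and is not already being serviced;
   must be called with [lock] held. *)
let rec dequeue () =
  let candidate =
    Hashtbl.fold
      (fun vm q acc ->
        if acc = None && not q.busy && q.pending <> [] then Some (vm, q)
        else acc)
      queues None
  in
  match candidate with
  | Some (vm, q) ->
      q.busy <- true;
      let op = List.hd q.pending in
      q.pending <- List.tl q.pending;
      (vm, q, op)
  | None -> Condition.wait cond lock; dequeue ()

(* [execute] stands for the backend plugin function call for each micro-op. *)
let worker execute =
  while true do
    Mutex.lock lock;
    let vm, q, op = dequeue () in
    Mutex.unlock lock;
    execute vm op;
    Mutex.lock lock;
    q.busy <- false;
    Condition.broadcast cond;
    Mutex.unlock lock
  done

(* The pool size corresponds to the dynamically-resizable pool above. *)
let start_pool size execute =
  Array.init size (fun _ -> Thread.create worker execute)
```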
The active backend (there can only be one backend per Xenopsd instance) executes the micro-ops. The Xenops_server_xen backend in the picture above talks to libxc, libxl and qemu to create and destroy domains. The backend also talks to other Xapi services, in particular
Xenopsd backends are also responsible for monitoring running VMs. In the Xenops_server_xen backend this is done by watching Xenstore for
When such an event happens (for example: @releaseDomain sent when a domain requests a reboot) the corresponding operation does not happen inline. Instead the event is rebroadcast upwards to Xenops_server as a signal (for example: “VM id needs some attention”) and a “VM_stat” micro-op is queued in the appropriate queue. Xenopsd does not allow operations to run on the same VM in parallel and enforces this by routing every operation through the VM’s queue and executing at most one micro-op per queue at a time.
The event takes the form “VM id needs some attention” and not “VM id needs to be rebooted” because, by the time the queue is flushed, the VM may well be in a different state. Perhaps rather than being rebooted it now needs to be shut down; or perhaps the domain is now in a good state because the reboot has already happened. The signals sent by the backend to the Xenops_server are a bit like event channel notifications in the Xen ring protocols: they are requests to ask someone to perform work, they don’t themselves describe the work that needs to be done.
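A minimal sketch of this signalling pattern, with illustrative names: because a signal carries only the VM id, repeated events for one VM coalesce into a single “needs attention” entry, and the decision about what work to do is deferred until the operation actually runs against the VM's current state.

```ocaml
(* "VM id needs some attention" signalling (illustrative names only). *)
module StringSet = Set.Make (String)

let needs_attention = ref StringSet.empty
let lock = Mutex.create ()

(* Called by the backend whenever any event concerning [vm] fires;
   duplicate signals for the same VM collapse into one set entry. *)
let signal vm =
  Mutex.lock lock;
  needs_attention := StringSet.add vm !needs_attention;
  Mutex.unlock lock

(* Called by the server: snapshot and clear the set, then let [check_state]
   inspect each VM and queue whatever micro-op is appropriate *now*
   (a reboot, a shutdown, or nothing at all). *)
let service check_state =
  Mutex.lock lock;
  let vms = !needs_attention in
  needs_attention := StringSet.empty;
  Mutex.unlock lock;
  StringSet.iter check_state vms
```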
An implication of this design is that it should always be possible to answer the question, “what operation should be performed to get the VM into a valid state?”. If an operation is cancelled half-way through, or if Xenopsd is suddenly restarted, it will ask this question about all the VMs and perform the necessary operations. The operations must be designed carefully to make this work. For example, if Xenopsd is restarted half-way through starting a VM, it must be obvious on restart that the VM should either be forcibly shut down or rebooted to return it to a valid state. Note: we don’t demand that operations are performed as transactions; we only demand that the state they leave the system in be “sensible”, in the sense that the admin will recognise it and be able to continue their work.
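As an illustration of asking that question, a toy reconciliation function might look like the following; the state and operation names are invented for this example and are not Xenopsd's real ones.

```ocaml
(* Toy reconciliation: compare what the client wants with what actually
   exists and compute the repair operations (invented state/op names). *)
type desired = Running | Halted
type actual = No_domain | Partially_built | Domain_running

let repair ~desired ~actual =
  match desired, actual with
  | Running, Domain_running | Halted, No_domain -> []  (* already valid *)
  | Running, No_domain -> [ "VM_start" ]
  | Running, Partially_built ->
      [ "VM_shutdown"; "VM_start" ]  (* forcibly clean up, then retry *)
  | Halted, (Partially_built | Domain_running) -> [ "VM_shutdown" ]
```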
Sometimes this can be achieved through careful ordering of side-effects within the operations, taking advantage of artifacts of the system such as:
In the absence of such “tells” from the system, operations are expected to journal their intentions and support restart after failure.
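A minimal journalling sketch, assuming a per-VM intent file (the directory, path and format are hypothetical, not Xenopsd's actual journal): the intention is recorded before any side-effects begin, removed on success, and replayed on restart.

```ocaml
(* Write-ahead journalling sketch (hypothetical paths and format). *)
let journal_dir = "/var/run/xenopsd-journal"

let () =
  if not (Sys.file_exists journal_dir) then Unix.mkdir journal_dir 0o700

(* Record the intention *before* performing any side-effects. *)
let record_intent vm intent =
  let oc = open_out (Filename.concat journal_dir vm) in
  output_string oc intent;
  close_out oc  (* a real journal would also fsync here *)

(* Remove the record once the operation has completed. *)
let clear_intent vm = Sys.remove (Filename.concat journal_dir vm)

(* On restart: every surviving entry is an operation that may have been
   interrupted half-way through and must be completed or undone. *)
let replay handle =
  Array.iter
    (fun vm ->
      let ic = open_in (Filename.concat journal_dir vm) in
      let intent = input_line ic in
      close_in ic;
      handle vm intent)
    (Sys.readdir journal_dir)
```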
There are three categories of metadata associated with VMs:
VM and VmExtra metadata is stored by Xenopsd in the domain 0 filesystem, in a simple directory hierarchy.
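A sketch of such a hierarchy, where the root path and file layout are assumptions rather than Xenopsd's actual on-disk format: one directory per VM, one file per category of metadata.

```ocaml
(* On-disk metadata sketch (assumed paths and category names). *)
let root = "/var/run/xenopsd-metadata"

let mkdir_if_missing d = if not (Sys.file_exists d) then Unix.mkdir d 0o700

let write vm category contents =
  mkdir_if_missing root;
  let dir = Filename.concat root vm in
  mkdir_if_missing dir;
  let oc = open_out (Filename.concat dir category) in
  output_string oc contents;
  close_out oc

let read vm category =
  let ic = open_in (Filename.concat (Filename.concat root vm) category) in
  let contents = really_input_string ic (in_channel_length ic) in
  close_in ic;
  contents

(* e.g. write "my-vm" "VM" vm_json; write "my-vm" "VmExtra" extra_json *)
```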