Domain.build
Overview
```mermaid
flowchart LR
subgraph xenopsd VM_build[
    xenopsd thread pool with two VM_build micro#8209;ops:
    During parallel VM_start, many threads run this in parallel!
]
direction LR
build_domain_exn[
    VM.build_domain_exn
    from thread pool Thread #1
] --> Domain.build
Domain.build --> build_pre
build_pre --> wait_xen_free_mem
build_pre -->|if NUMA/Best_effort| numa_placement
Domain.build --> xenguest[Invoke xenguest]
click Domain.build "https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/domain.ml#L1111-L1210" _blank
click build_domain_exn "https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/xenops_server_xen.ml#L2222-L2225" _blank
click wait_xen_free_mem "https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/domain.ml#L236-L272" _blank
click numa_placement "https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/domain.ml#L862-L897" _blank
click build_pre "https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/domain.ml#L899-L964" _blank
click xenguest "https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/domain.ml#L1139-L1146" _blank

build_domain_exn2[
    VM.build_domain_exn
    from thread pool Thread #2
] --> Domain.build2[Domain.build]
Domain.build2 --> build_pre2[build_pre]
build_pre2 --> wait_xen_free_mem2[wait_xen_free_mem]
build_pre2 -->|if NUMA/Best_effort| numa_placement2[numa_placement]
Domain.build2 --> xenguest2[Invoke xenguest]
click Domain.build2 "https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/domain.ml#L1111-L1210" _blank
click build_domain_exn2 "https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/xenops_server_xen.ml#L2222-L2225" _blank
click wait_xen_free_mem2 "https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/domain.ml#L236-L272" _blank
click numa_placement2 "https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/domain.ml#L862-L897" _blank
click build_pre2 "https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/domain.ml#L899-L964" _blank
click xenguest2 "https://github.com/xapi-project/xen-api/blob/master/ocaml/xenopsd/xc/domain.ml#L1139-L1146" _blank
end
```
`VM.build_domain_exn` calls `Domain.build`, which in turn calls:

- `build_pre` to prepare the build of a VM:
  - If the `xe` configuration option `numa_placement` is set to `Best_effort`, invoke the NUMA placement algorithm.
- `xenguest` to invoke the `xenguest` program to set up the domain's system memory.
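As a rough illustration of this call order, here is a minimal, self-contained sketch. It is hypothetical: `build_pre` and `xenguest` below are stand-in stubs, not the real signatures in `domain.ml`.

```ocaml
(* Minimal sketch of the call order described above. build_pre and xenguest
   are stand-in stubs; the real functions in ocaml/xenopsd/xc/domain.ml take
   many more arguments (Xenctrl handles, platform data, etc.). *)
let build_pre ~vcpus ~memory_mib domid =
  Printf.printf
    "domain %d: wait for free memory, set %d vCPUs, plan NUMA placement (%Ld MiB)\n"
    domid vcpus memory_mib

let xenguest domid =
  Printf.printf "domain %d: invoke the xenguest program to set up system memory\n" domid

(* Domain.build: prepare the build first, then hand over to xenguest *)
let build ~vcpus ~memory_mib domid =
  build_pre ~vcpus ~memory_mib domid ;
  xenguest domid

let () = build ~vcpus:4 ~memory_mib:2048L 1
```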
build_pre: Prepare building the VM
Domain.build calls build_pre (which is also used for VM restore) to:
- Call `wait_xen_free_mem` to wait, if necessary, for the Xen memory scrubber to catch up reclaiming memory (a sketch of this loop follows the list below). It:
  - calls `Xenctrl.physinfo`, which returns:
    - `hostinfo.free_pages`: the free and already scrubbed pages (available)
    - `hostinfo.scrub_pages`: the not yet scrubbed pages (not yet available)
  - repeats this until a timeout, as long as `free_pages` is lower than the required number of pages, unless `scrub_pages` is 0 (no scrubbing left to do)
  - Note: `free_pages` is system-wide memory, not memory specific to a NUMA node. Because this check is not NUMA-aware, it is not sufficient to prevent the VM from being spread over all NUMA nodes in case of a temporary node-specific memory shortage. It is planned to resolve this issue by claiming NUMA node memory during NUMA placement.
- Call the hypercall to set the timer mode 
- Call the hypercall to set the number of vCPUs 
- Call the `numa_placement` function as described in the NUMA feature description when the `xe` configuration option `numa_placement` is set to `Best_effort` (except when the VM has a hard CPU affinity):

  ```ocaml
  match !Xenops_server.numa_placement with
  | Any ->
      ()
  | Best_effort ->
      log_reraise (Printf.sprintf "NUMA placement") (fun () ->
          if has_hard_affinity then
            D.debug "VM has hard affinity set, skipping NUMA optimization"
          else
            numa_placement domid ~vcpus
              ~memory:(Int64.mul memory.xen_max_mib 1048576L)
      )
  ```
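The sketch below illustrates the shape of the `wait_xen_free_mem` polling loop described above. It is a simplified, hypothetical version: the timeout handling, units, and logging of the real function in `domain.ml` differ; only `Xenctrl.physinfo` and its `free_pages`/`scrub_pages` fields are taken from the text.

```ocaml
(* Hedged sketch of the polling loop in wait_xen_free_mem. required_pages and
   timeout_s are illustrative parameters, not the real interface. *)
let wait_xen_free_mem xc ~required_pages ~timeout_s =
  let deadline = Unix.gettimeofday () +. timeout_s in
  let rec loop () =
    let info = Xenctrl.physinfo xc in
    let free = Int64.of_nativeint info.Xenctrl.free_pages in
    let scrub = Int64.of_nativeint info.Xenctrl.scrub_pages in
    if free >= required_pages then
      true (* enough scrubbed memory is available *)
    else if scrub = 0L then
      false (* nothing left to scrub, so waiting longer will not help *)
    else if Unix.gettimeofday () > deadline then
      false (* timeout *)
    else (
      Thread.delay 1.0 ;
      loop ()
    )
  in
  loop ()
```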
NUMA placement
build_pre passes the domid, the number of vCPUs and xen_max_mib to the
numa_placement
function to run the algorithm to find the best NUMA placement.
When it returns a NUMA node to use, it calls the Xen hypercalls to set the vCPU affinity to this NUMA node:
```ocaml
  let vm = NUMARequest.make ~memory ~vcpus in
  let nodea =
    match !numa_resources with
    | None ->
        Array.of_list nodes
    | Some a ->
        Array.map2 NUMAResource.min_memory (Array.of_list nodes) a
  in
  numa_resources := Some nodea ;
  Softaffinity.plan ~vm host nodea
```

By using the default auto_node_affinity feature of Xen,
setting the vCPU affinity causes the Xen hypervisor to activate
NUMA node affinity for memory allocations to be aligned with
the vCPU affinity of the domain.
Summary: This tells the hypervisor that memory allocation for this domain should preferably be done from this NUMA node.
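To make concrete what "set the vCPU affinity to this NUMA node" amounts to, here is a hedged sketch. It assumes a list of physical CPU indices for the chosen node (`node_cpus`) and uses the generic `Xenctrl.vcpu_setaffinity` binding; the real code applies the plan returned by `Softaffinity.plan` and may use a soft-affinity variant of the hypercall.

```ocaml
(* Hypothetical illustration only: pin every vCPU of [domid] to the physical
   CPUs of the chosen NUMA node. node_cpus and nr_cpus are assumed inputs;
   the real xenopsd code applies the plan computed by Softaffinity.plan. *)
let set_affinity_to_node xc domid ~vcpus ~nr_cpus ~node_cpus =
  (* CPU map with [true] for every physical CPU on the chosen node *)
  let cpumap = Array.make nr_cpus false in
  List.iter (fun cpu -> cpumap.(cpu) <- true) node_cpus ;
  (* Set each vCPU's affinity to that CPU map; with Xen's default
     auto_node_affinity, this also steers memory allocation to the node *)
  for vcpu = 0 to vcpus - 1 do
    Xenctrl.vcpu_setaffinity xc domid vcpu cpumap
  done
```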
Invoke the xenguest program
With the preparation in build_pre completed, Domain.build
calls
the xenguest function to invoke the xenguest program to build the domain.
Notes on future design improvements
The Xen domain feature flag domain->auto_node_affinity can be disabled by calling xc_domain_node_setaffinity() to set a specific NUMA node affinity in special cases:
This can be used, for example, when there might not be enough memory on the preferred NUMA node, and there are other NUMA nodes (in the same CPU package) to use (reference).