xc_domain_node_setaffinity()
xc_domain_node_setaffinity() controls the NUMA node affinity of a domain, but it only updates the Xen hypervisor domain’s d->node_affinity mask.
This mask is read by the Xen memory allocator as its second preference for the NUMA node to allocate this domain’s memory from.
[!info] Preferences of the Xen memory allocator:
- A NUMA node passed to the allocator directly takes precedence, if present.
- Then, if the allocation is for a domain, its node_affinity mask is tried.
- Finally, it falls back to spreading the pages over all remaining NUMA nodes.
As this call has no practical effect on the Xen scheduler, vCPU affinities need to be set separately anyway.
The domain’s auto_node_affinity flag is enabled by default by Xen. This means that when setting vCPU affinities, Xen updates the d->node_affinity mask to consist of the NUMA nodes to which its vCPUs have affinity.
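For example, a minimal sketch of such a vCPU affinity update using the libxenctrl API (the helper name is an assumption for illustration, the cpumap setup is omitted, and passing NULL for the unused hard-affinity map is an assumption here):

```c
#include <xenctrl.h>

/* Sketch (hypothetical helper): set a vCPU's soft affinity.
 * With d->auto_node_affinity enabled (the default), Xen derives the
 * domain's node_affinity mask from this vCPU affinity update. */
int set_vcpu_soft_affinity(xc_interface *xch, uint32_t domid, int vcpu,
                           xc_cpumap_t cpumap_soft)
{
    /* Only the soft map is selected by the flags; passing NULL for the
     * hard map is assumed to be acceptable in this sketch. */
    return xc_vcpu_setaffinity(xch, domid, vcpu, NULL /* hard */,
                               cpumap_soft, XEN_VCPUAFFINITY_SOFT);
}
```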
See xc_vcpu_setaffinity() for more information on how d->auto_node_affinity is used to set the NUMA node affinity.
Thus, so far, there is no obvious need to call xc_domain_node_setaffinity()
when building a domain.
Setting the NUMA node affinity using this call can be useful, for example, when the preferred NUMA node does not have enough free memory, but other NUMA nodes have enough free memory to provide the system memory of the domain.
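As an illustration, a minimal sketch of such a call using the libxenctrl API (the domid and node numbers are assumptions for the example):

```c
#include <stdio.h>
#include <stdlib.h>
#include <xenctrl.h>

/* Sketch: restrict further memory allocations for a domain to
 * NUMA nodes 0 and 1 (example node numbers). */
int set_memory_affinity(uint32_t domid)
{
    xc_interface *xch = xc_interface_open(NULL, NULL, 0);
    xc_nodemap_t nodemap;
    int rc;

    if (!xch)
        return -1;

    nodemap = xc_nodemap_alloc(xch);  /* zero-initialised node bitmap */
    if (!nodemap) {
        xc_interface_close(xch);
        return -1;
    }

    nodemap[0 / 8] |= 1 << (0 % 8);   /* allow NUMA node 0 */
    nodemap[1 / 8] |= 1 << (1 % 8);   /* allow NUMA node 1 */

    /* Updates d->node_affinity and disables d->auto_node_affinity. */
    rc = xc_domain_node_setaffinity(xch, domid, nodemap);
    if (rc)
        fprintf(stderr, "xc_domain_node_setaffinity: rc=%d\n", rc);

    free(nodemap);
    xc_interface_close(xch);
    return rc;
}
```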
In terms of future NUMA design, it might be even more favourable to have a strategy in xenguest where, in such cases, the superpages of the preferred node are used first, and a fallback to neighbouring NUMA nodes only happens to the extent necessary.
Likely, the future allocation strategy should be passed to xenguest using Xenstore, like the other platform parameters for the VM.
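As a purely hypothetical sketch of how such a strategy could be passed (the Xenstore key "platform/numa-allocation-strategy" and its value are invented for illustration; no such key exists today):

```c
#include <stdio.h>
#include <string.h>
#include <xenstore.h>

/* Hypothetical sketch: publish an allocation strategy for xenguest.
 * Key name and value are assumptions, not an existing interface. */
void publish_numa_strategy(struct xs_handle *xs, unsigned int domid)
{
    char path[80];
    const char *value = "superpages-first;fallback=neighbour-nodes";

    snprintf(path, sizeof(path),
             "/local/domain/%u/platform/numa-allocation-strategy", domid);
    xs_write(xs, XBT_NULL, path, value, strlen(value));
}
```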
Walk-through of xc_domain_node_setaffinity()
```mermaid
classDiagram
class `xc_domain_node_setaffinity()` {
    +xch: xc_interface #42;
    +domid: uint32_t
    +nodemap: xc_nodemap_t
    0(on success)
    -EINVAL(if a node in the nodemask is not online)
}
click `xc_domain_node_setaffinity()` href "https://github.com/xen-project/xen/blob/master/tools/libs/ctrl/xc_domain.c#L122-L158"
`xc_domain_node_setaffinity()` --> `Xen hypercall: do_domctl()`
`xc_domain_node_setaffinity()` <-- `Xen hypercall: do_domctl()`
class `Xen hypercall: do_domctl()` {
    Calls domain_set_node_affinity#40;#41; and returns its return value
    Passes: domain (struct domain *, looked up using the domid)
    Passes: new_affinity (nodemask, converted from xc_nodemap_t)
}
click `Xen hypercall: do_domctl()` href "https://github.com/xen-project/xen/blob/master/xen/common/domctl.c#L516-L525"
`Xen hypercall: do_domctl()` --> `domain_set_node_affinity()`
`Xen hypercall: do_domctl()` <-- `domain_set_node_affinity()`
class `domain_set_node_affinity()` {
    domain: struct domain
    new_affinity: nodemask
    0(on success, the domain's node_affinity is updated)
    -EINVAL(if a node in the nodemask is not online)
}
click `domain_set_node_affinity()` href "https://github.com/xen-project/xen/blob/master/xen/common/domain.c#L943-L970"
```
domain_set_node_affinity()
This function implements the functionality of xc_domain_node_setaffinity to set the NUMA affinity of a domain as described above.
If the new_affinity does not intersect the node_online_map, it returns -EINVAL. Otherwise, it succeeds and returns 0.
When the new_affinity is a specific set of NUMA nodes, it updates the NUMA node_affinity of the domain to these nodes and disables d->auto_node_affinity for this domain. With d->auto_node_affinity disabled, xc_vcpu_setaffinity() no longer updates the NUMA affinity of this domain.
If new_affinity has all bits set, it re-enables d->auto_node_affinity for this domain and calls domain_update_node_aff() to re-set the domain’s node_affinity mask to the NUMA nodes of the current hard and soft affinity of the domain’s online vCPUs.
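In outline, the hypervisor-side logic looks roughly like this (a condensed sketch of domain_set_node_affinity() in xen/common/domain.c, not the verbatim source):

```c
/* Condensed sketch of domain_set_node_affinity() (xen/common/domain.c) */
int domain_set_node_affinity(struct domain *d, const nodemask_t *affinity)
{
    /* A mask disjoint from the online nodes is rejected. */
    if ( !nodes_intersects(*affinity, node_online_map) )
        return -EINVAL;

    spin_lock(&d->node_affinity_lock);

    if ( nodes_full(*affinity) )
        d->auto_node_affinity = 1;   /* "all": derive from vCPU affinities */
    else
    {
        d->auto_node_affinity = 0;   /* pin to the given nodes */
        d->node_affinity = *affinity;
    }

    spin_unlock(&d->node_affinity_lock);

    /* In auto mode, recomputes node_affinity from the vCPU affinities. */
    domain_update_node_affinity(d);

    return 0;
}
```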
Flowchart in relation to xc_vcpu_setaffinity()
The effect of domain_set_node_affinity() can be seen more clearly in this flowchart, which shows how xc_vcpu_setaffinity() is currently used to set the NUMA affinity of a new domain, but also how domain_set_node_affinity() relates to it:
In the flowchart, two code paths are set in bold:
- Show the path when Host.numa_affinity_policy is the default (off) in xenopsd.
- Show the default path of xc_vcpu_setaffinity(XEN_VCPUAFFINITY_SOFT) in Xen, when the domain’s auto_node_affinity flag is enabled (the default), to show how the vCPU affinity update also updates the domain’s node_affinity in this default case.
xenguest uses the Xenstore to read the static domain configuration that it needs to build the domain.
```mermaid
flowchart TD

subgraph VM.create["xenopsd VM.create"]
    %% Is xe vCPU-params:mask= set? If yes, write to Xenstore:
    is_xe_vCPUparams_mask_set?{"Is <tt>xe vCPU-params:mask=</tt> set? Example: <tt>1,2,3</tt> (Is used to enable vCPU<br>hard-affinity)"} --"yes"--> set_hard_affinity("Write hard-affinity to XenStore: <tt>platform/vcpu/#domid/affinity</tt> (xenguest will read this and other configuration data from Xenstore)")
end

subgraph VM.build["xenopsd VM.build"]
    %% Labels of the decision nodes:
    is_Host.numa_affinity_policy_set?{Is<p><tt>Host.numa_affinity_policy</tt><p>set?}
    has_hard_affinity?{Is hard-affinity configured in<p><tt>platform/vcpu/#domid/affinity</tt>?}

    %% Connections from VM.create:
    set_hard_affinity --> is_Host.numa_affinity_policy_set?
    is_xe_vCPUparams_mask_set? =="no"==> is_Host.numa_affinity_policy_set?

    %% If Host.numa_affinity_policy is "best_effort", check hard-affinity:
    is_Host.numa_affinity_policy_set? -- Host.numa_affinity_policy is<p><tt>best_effort --> has_hard_affinity?

    %% If has_hard_affinity is set, skip numa_placement:
    has_hard_affinity? --"yes"--> exec_xenguest

    %% If has_hard_affinity is not set, run numa_placement:
    has_hard_affinity? --"no"--> numa_placement --> exec_xenguest

    %% If Host.numa_affinity_policy is off (default, for now), skip NUMA placement:
    is_Host.numa_affinity_policy_set? =="default: disabled"==> exec_xenguest
end

%% xenguest subgraph:
subgraph xenguest
    exec_xenguest ==> stub_xc_hvm_build("<tt>stub_xc_hvm_build()") ==> configure_vcpus("<tt>configure_vcpus()") ==> set_hard_affinity?{"Is <tt>platform/<br>vcpu/#domid/affinity</tt> set?"}
end

%% do_domctl Hypercalls:
numa_placement --Set the NUMA placement using soft-affinity--> XEN_VCPUAFFINITY_SOFT("<tt>xc_vcpu_setaffinity(SOFT)") ==> do_domctl
set_hard_affinity? --yes--> XEN_VCPUAFFINITY_HARD("<tt>xc_vcpu_setaffinity(HARD)") --> do_domctl
xc_domain_node_setaffinity("<tt>xc_domain_node_setaffinity()</tt> and <tt>xc_domain_node_getaffinity()") <--> do_domctl

%% Xen subgraph:
subgraph xen[Xen Hypervisor]
    subgraph domain_update_node_affinity["domain_update_node_affinity()"]
        domain_update_node_aff("<tt>domain_update_node_aff()") ==> check_auto_node{"Is domain's<br><tt>auto_node_affinity</tt><br>enabled?"} =="yes (default)"==> set_node_affinity_from_vcpu_affinities("Calculate the domain's <tt>node_affinity</tt> mask from vCPU affinity (used for further NUMA memory allocation for the domain)")
    end

    do_domctl{"do_domctl()<br>op->cmd=?"} ==XEN_DOMCTL_setvcpuaffinity==> vcpu_set_affinity("<tt>vcpu_set_affinity()</tt><br>set the vCPU affinity") ==> domain_update_node_aff
    do_domctl --XEN_DOMCTL_setnodeaffinity (not used currently)--> is_new_affinity_all_nodes?

    subgraph domain_set_node_affinity["domain_set_node_affinity()"]
        is_new_affinity_all_nodes?{new_affinity<br>is #34;all#34;?} --is #34;all#34;--> enable_auto_node_affinity("<tt>auto_node_affinity=1") --> domain_update_node_aff
        is_new_affinity_all_nodes? --not #34;all#34;--> disable_auto_node_affinity("<tt>auto_node_affinity=0") --> domain_update_node_aff
    end

    %% Setting and getting the struct domain's node_affinity:
    disable_auto_node_affinity --node_affinity=new_affinity--> domain_node_affinity
    set_node_affinity_from_vcpu_affinities ==> domain_node_affinity@{ shape: bow-rect, label: "domain: node_affinity" } --XEN_DOMCTL_getnodeaffinity--> do_domctl
end

click is_Host.numa_affinity_policy_set? "https://github.com/xapi-project/xen-api/blob/90ef043c1f3a3bc20f1c5d3ccaaf6affadc07983/ocaml/xenopsd/xc/domain.ml#L951-L962"
click numa_placement "https://github.com/xapi-project/xen-api/blob/90ef043c/ocaml/xenopsd/xc/domain.ml#L862-L897"
click stub_xc_hvm_build "https://github.com/xenserver/xen.pg/blob/65c0438b/patches/xenguest.patch#L2329-L2436" _blank
click get_flags "https://github.com/xenserver/xen.pg/blob/65c0438b/patches/xenguest.patch#L1164-L1288" _blank
click do_domctl "https://github.com/xen-project/xen/blob/7cf163879/xen/common/domctl.c#L282-L894" _blank
click domain_set_node_affinity "https://github.com/xen-project/xen/blob/7cf163879/xen/common/domain.c#L943-L970" _blank
click configure_vcpus "https://github.com/xenserver/xen.pg/blob/65c0438b/patches/xenguest.patch#L1297-L1348" _blank
click set_hard_affinity? "https://github.com/xenserver/xen.pg/blob/65c0438b/patches/xenguest.patch#L1305-L1326" _blank
click xc_vcpu_setaffinity "https://github.com/xen-project/xen/blob/7cf16387/tools/libs/ctrl/xc_domain.c#L199-L250" _blank
click vcpu_set_affinity "https://github.com/xen-project/xen/blob/7cf16387/xen/common/sched/core.c#L1353-L1393" _blank
click domain_update_node_aff "https://github.com/xen-project/xen/blob/7cf16387/xen/common/sched/core.c#L1809-L1876" _blank
click check_auto_node "https://github.com/xen-project/xen/blob/7cf16387/xen/common/sched/core.c#L1840-L1870" _blank
click set_node_affinity_from_vcpu_affinities "https://github.com/xen-project/xen/blob/7cf16387/xen/common/sched/core.c#L1867-L1869" _blank
```
xc_domain_node_setaffinity can be used to set the domain’s node_affinity (which is normally set by xc_vcpu_setaffinity) to different NUMA nodes.
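To inspect the resulting mask, the matching getter can be used, for example (a short sketch; error handling shortened):

```c
#include <stdio.h>
#include <stdlib.h>
#include <xenctrl.h>

/* Sketch: print the NUMA nodes in the domain's node_affinity mask. */
void print_node_affinity(xc_interface *xch, uint32_t domid)
{
    int max_nodes = xc_get_max_nodes(xch);
    xc_nodemap_t nodemap = xc_nodemap_alloc(xch);

    if (!nodemap || max_nodes <= 0) {
        free(nodemap);  /* free(NULL) is a no-op */
        return;
    }

    if (xc_domain_node_getaffinity(xch, domid, nodemap) == 0)
        for (int node = 0; node < max_nodes; node++)
            if (nodemap[node / 8] & (1 << (node % 8)))
                printf("domain %u: affinity to node %d\n", domid, node);

    free(nodemap);
}
```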
No effect on the Xen scheduler
Currently, the node affinity does not affect the Xen scheduler:
In case d->node_affinity would be set before vCPU creation, the initial pCPU of the new vCPU is the first pCPU of the first NUMA node in the domain’s node_affinity. This is further changed when one or more cpupools are set up.
As this is only the initial pCPU of the vCPU, this alone does not change the scheduling of the Xen Credit scheduler, as it reschedules the vCPUs to other pCPUs.
Notes on future design improvements
It may be possible to call it before vCPUs are created
When done early, before vCPU creation, some domain-related data structures could be allocated using the domain’s d->node_affinity NUMA node mask. With further changes in Xen and xenopsd, Xen could allocate the vCPU structs on the affine NUMA nodes of the domain.
For this, xenopsd would have to call xc_domain_node_setaffinity() before vCPU creation, after having decided the domain’s NUMA placement, preferably including claiming the required memory for the domain to ensure that the domain will be populated from the same NUMA node(s).
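A hedged sketch of the ordering this would imply (purely illustrative; the helper name is an assumption and the exact call sites in xenopsd/xenguest would differ):

```c
#include <xenctrl.h>

/* Hypothetical ordering sketch for a future design, not current code.
 * Runs after NUMA placement is decided and before vCPUs are created. */
int place_domain(xc_interface *xch, uint32_t domid,
                 xc_nodemap_t chosen_nodes, unsigned long nr_pages)
{
    int rc;

    /* 1. Pin the domain's memory affinity to the chosen node(s). */
    rc = xc_domain_node_setaffinity(xch, domid, chosen_nodes);
    if (rc)
        return rc;

    /* 2. Claim the memory up front so population cannot fail midway.
     *    (The claim is host-wide; it does not by itself guarantee on
     *    which nodes the pages are later allocated.) */
    rc = xc_domain_claim_pages(xch, domid, nr_pages);
    if (rc)
        return rc;

    /* 3. Only now create vCPUs and populate memory, so allocations can
     *    honour d->node_affinity from the start. */
    return 0;
}
```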
This call cannot influence the past: the xenopsd VM_create micro-op calls Xenctrl.domain_create, which currently creates the domain’s data structures before numa_placement is done.
Improving Xenctrl.domain_create to pass a NUMA node for allocating the hypervisor’s data structures (e.g. vCPU) of the domain would require changes to the Xen hypervisor and the xenopsd VM_create micro-op.