libxenctrl

Subsections of libxenctrl

xc_domain_claim_pages()

Purpose

The purpose of xc_domain_claim_pages() is to stake a claim on an amount of memory for a given domain, which guarantees that memory allocations up to the claimed amount will be successful.

The domain can still attempt to allocate beyond the claim, but those allocations are not guaranteed to succeed and will fail once the domain’s memory reaches its max_mem value.

Each domain can have only one claim, and the domid is the key of the claim. Killing the domain also releases its claim.

Depending on the given size argument, the remaining stake of the domain can be set initially, updated to the given amount, or reset to no claim (0).

Management of claims

  • The stake is centrally managed by the Xen hypervisor using a hypercall.
  • Claims are not reflected in the amount of free memory reported by Xen.

Reporting of claims

  • xl claims reports the outstanding claims of the domains:

    [!info] Sample output of xl claims:

    Name         ID   Mem VCPUs      State   Time(s)  Claimed
    Domain-0      0  2656     8     r-----  957418.2     0
  • xl info reports the host-wide outstanding claims:

    [!info] Sample output from xl info | grep outstanding:

    outstanding_claims     : 0

Tracking of claims

Xen only tracks:

  • the outstanding claims of each domain and
  • the outstanding host-wide claims.

Claiming zero pages effectively cancels the domain’s outstanding claim and is always successful.

[!info]

  • Allocations for outstanding claims are expected to always be successful.
  • However, every such allocation reduces the domain’s outstanding claim by the allocated amount.
  • Freeing memory of the domain increases the domain’s claim again:
    • But once a domain has consumed its claim, the claim is reset.
    • When the claim is reset, freed memory is no longer added back to the outstanding claim!
    • The domain would have to stake a new claim to have guaranteed spare memory again.

[!warning] The domain’s max_mem value is used to deny memory allocations: If an allocation would cause the domain’s memory to exceed its max_mem value, it will always fail.

Implementation

Function signature of the libxenctrl function that issues the Xen hypercall:

long xc_memory_op(libxc_handle, XENMEM_claim_pages, struct xen_memory_reservation *)

struct xen_memory_reservation is defined as:

struct xen_memory_reservation {
    .nr_extents   = nr_pages, /* number of pages to claim */
    .extent_order = 0,        /* an order 0 means: 4k pages, only 0 is allowed */
    .mem_flags    = 0,        /* no flags, only 0 is allowed (at the moment) */
    .domid        = domid     /* numerical domain ID of the domain */
};
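For illustration, here is a minimal sketch of how a caller could stake and later cancel such a claim through the libxenctrl wrapper xc_domain_claim_pages() (which issues the XENMEM_claim_pages hypercall shown above); error handling is reduced to a minimum and the page count is only an example:

#include <stdio.h>
#include <xenctrl.h>

/* Sketch: stake a claim for a domain and cancel it again.
 * 262144 pages of 4 KiB correspond to 1 GiB. */
int claim_example(uint32_t domid)
{
    xc_interface *xch = xc_interface_open(NULL, NULL, 0);

    if (!xch)
        return -1;

    /* Stake a claim for 262144 order-0 (4 KiB) pages. */
    if (xc_domain_claim_pages(xch, domid, 262144))
        fprintf(stderr, "staking the claim failed\n");

    /* ... allocate and populate the domain's memory here ... */

    /* Claiming zero pages cancels the outstanding claim. */
    xc_domain_claim_pages(xch, domid, 0);

    xc_interface_close(xch);
    return 0;
}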

Concurrency

Xen protects the consistency of the domain’s stake using the domain’s page_alloc_lock and the global heap_lock of Xen. These spin-locks prevent any “time-of-check/time-of-use” races. As the hypercall needs to take those spin-locks, it cannot be preempted.

Return value

The call returns 0 if the hypercall successfully claimed the requested amount of memory, else it returns non-zero.

Current users

libxl and the xl CLI

If the struct xc_dom_image passed by libxl to the libxenguest functions meminit_hvm() and meminit_pv() has its claim_enabled field set, these functions first attempt to claim the to-be-allocated memory by calling xc_domain_claim_pages(). Only then do they allocate and populate the domain’s main system memory using the allocation function xc_populate_physmap(), which calls the hypercall to allocate and populate it. If the claim fails, they do not attempt to continue and return the error code of xc_domain_claim_pages().

Both functions also (unconditionally) reset the claim upon return.
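As an illustration of this claim-then-populate-then-reset pattern, here is a simplified sketch. It substitutes the public libxenctrl call xc_domain_populate_physmap_exact() for the internal allocation helper; the function name is illustrative only:

#include <xenctrl.h>

/* Sketch of the claim/populate/reset pattern used by the meminit functions.
 * gpfns must point to nr_pages guest frame numbers to populate. */
int build_memory_sketch(xc_interface *xch, uint32_t domid,
                        unsigned long nr_pages, xen_pfn_t *gpfns)
{
    int rc;

    /* 1. Claim the memory up front so the allocation cannot run out midway. */
    rc = xc_domain_claim_pages(xch, domid, nr_pages);
    if (rc)
        return rc;      /* do not continue if the claim fails */

    /* 2. Allocate and populate the domain's system memory (order-0 pages). */
    rc = xc_domain_populate_physmap_exact(xch, domid, nr_pages, 0, 0, gpfns);

    /* 3. Unconditionally reset the claim before returning. */
    xc_domain_claim_pages(xch, domid, 0);

    return rc;
}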

The xl CLI uses this functionality (unless disabled in xl.conf) to make building a domain fail early instead of running out of memory midway through the meminit_hvm() and meminit_pv() calls: they immediately return an error instead.

This means that in case the claim fails, xl avoids:

  • The effort of allocating the memory, thereby not blocking it for other domains.
  • The effort of potentially needing to scrub the memory after the build failure.

xenguest

While xenguest calls the libxenguest functions meminit_hvm() and meminit_pv() like libxl does, it does not set struct xc_dom_image.claim_enabled. It therefore does not enable the initial call to xc_domain_claim_pages() that would claim the amount of memory these functions are about to allocate and populate for the domain.
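If xenguest were to opt in as well, the change would conceptually be a single flag on the domain-build image before memory initialisation. A hedged sketch (the header name and the surrounding build flow are assumptions based on libxenguest):

#include <xc_dom.h>    /* struct xc_dom_image: libxenguest domain-build state */

/* Sketch: opt in to memory claims for the domain build.
 * With this flag set, meminit_hvm()/meminit_pv() first call
 * xc_domain_claim_pages() for the memory they are about to allocate. */
static void enable_memory_claim(struct xc_dom_image *dom)
{
    dom->claim_enabled = 1;
}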

Future design ideas for improved NUMA support

For improved NUMA support, xenopsd may want to call an updated version of this function when assigning a NUMA node to a new domain, so that the domain has a stake on the NUMA node’s memory before xenguest starts allocating memory for it.

Further, as PV drivers unmap and free memory for grant tables to Xen and then re-allocate memory for those grant tables, xenopsd may want to try to stake a very small claim for the domain on the NUMA node of the domain so that Xen can increase this claim when the PV drivers free this memory and re-use the resulting claimed amount for allocating the grant tables. This would ensure that the grant tables are then allocated on the local NUMA node of the domain, avoiding remote memory accesses when accessing the grant tables from inside the domain.

Note: In case the corresponding backend process in Dom0 is running on another NUMA node, it would access the domain’s grant tables from a remote NUMA node. But this would enable a future improvement for Dom0, where it could prefer to run the corresponding backend process on the same or a neighbouring NUMA node.

xc_domain_node_setaffinity()

xc_domain_node_setaffinity() controls the NUMA node affinity of a domain, but it only updates the Xen hypervisor’s per-domain d->node_affinity mask. This mask is read by the Xen memory allocator as the second preference for the NUMA node to allocate memory from for this domain.

[!info] Preferences of the Xen memory allocator:

  1. A NUMA node passed to the allocator directly takes precedence, if present.
  2. Then, if the allocation is for a domain, its node_affinity mask is tried.
  3. Finally, it falls back to spread the pages over all remaining NUMA nodes.

As this call has no practical effect on the Xen scheduler, vCPU affinities need to be set separately anyway.

The domain’s auto_node_affinity flag is enabled by default by Xen. This means that when setting vCPU affinities, Xen updates the d->node_affinity mask to consist of the NUMA nodes to which its vCPUs have affinity.

See xc_vcpu_setaffinity() for more information on how d->auto_node_affinity is used to set the NUMA node affinity.

Thus, so far, there is no obvious need to call xc_domain_node_setaffinity() when building a domain.

Setting the NUMA node affinity using this call is useful, for example, when there might not be enough memory on the preferred NUMA node, but other NUMA nodes have enough free memory to be used for the system memory of the domain.
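For illustration, a hedged sketch of how a toolstack component could restrict a domain’s NUMA node affinity to nodes 0 and 1 with this call. The bitmap helper xc_nodemap_alloc() is assumed to be the usual libxenctrl allocator (one bit per NUMA node):

#include <stdlib.h>
#include <xenctrl.h>

/* Sketch: set the domain's NUMA node affinity to nodes 0 and 1. */
int set_node_affinity_example(xc_interface *xch, uint32_t domid)
{
    int rc;
    xc_nodemap_t nodemap = xc_nodemap_alloc(xch);   /* zeroed node bitmap */

    if (!nodemap)
        return -1;

    /* Mark NUMA nodes 0 and 1 in the bitmap. */
    nodemap[0] |= 1 << 0;
    nodemap[0] |= 1 << 1;

    /* Updates d->node_affinity and disables d->auto_node_affinity. */
    rc = xc_domain_node_setaffinity(xch, domid, nodemap);

    free(nodemap);
    return rc;
}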

In terms of future NUMA design, it might be even more favourable to have a strategy in xenguest where in such cases, the superpages of the preferred node are used first and a fallback to neighbouring NUMA nodes only happens to the extent necessary.

Likely, the future allocation strategy should be passed to xenguest using Xenstore like the other platform parameters for the VM.

Walk-through of xc_domain_node_setaffinity()

---
theme: ''
---
classDiagram
class `xc_domain_node_setaffinity()` {
    +xch: xc_interface #42;
    +domid: uint32_t
    +nodemap: xc_nodemap_t
    0(on success)
    -EINVAL(if a node in the nodemask is not online)
}
click `xc_domain_node_setaffinity()` href "https://github.com/xen-project/xen/blob/master/tools/libs/ctrl/xc_domain.c#L122-L158"

`xc_domain_node_setaffinity()` --> `Xen hypercall: do_domctl()`
`xc_domain_node_setaffinity()` <-- `Xen hypercall: do_domctl()`

class `Xen hypercall: do_domctl()` {
    Calls domain_set_node_affinity#40;#41; and returns its return value
    Passes: domain (struct domain *, looked up using the domid)
    Passes: new_affinity (nodemask, converted from xc_nodemap_t)
}
click `Xen hypercall: do_domctl()` href "https://github.com/xen-project/xen/blob/master/xen/common/domctl.c#L516-L525"

`Xen hypercall: do_domctl()` --> `domain_set_node_affinity()`
`Xen hypercall: do_domctl()` <-- `domain_set_node_affinity()`

class `domain_set_node_affinity()` {
    domain: struct domain
    new_affinity: nodemask
    0(on success, the domain's node_affinity is updated)
    -EINVAL(if a node in the nodemask is not online)
}
click `domain_set_node_affinity()` href "https://github.com/xen-project/xen/blob/master/xen/common/domain.c#L943-L970"


domain_set_node_affinity()

This function implements the functionality of xc_domain_node_setaffinity to set the NUMA affinity of a domain as described above. If the new_affinity does not intersect the node_online_map, it returns -EINVAL. Otherwise, the result is a success, and it returns 0.

When the new_affinity is a specific set of NUMA nodes, it updates the NUMA node_affinity of the domain to these nodes and disables d->auto_node_affinity for this domain. With d->auto_node_affinity disabled, xc_vcpu_setaffinity() no longer updates the NUMA affinity of this domain.

If new_affinity has all bits set, it re-enables the d->auto_node_affinity for this domain and calls domain_update_node_aff() to re-set the domain’s node_affinity mask to the NUMA nodes of the current hard and soft affinities of the domain’s online vCPUs.
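A hedged sketch of this “reset to automatic” case: passing a nodemap with all node bits set re-enables auto_node_affinity. The helpers xc_nodemap_alloc() and xc_get_max_nodes() are assumed to be the usual libxenctrl ones:

#include <stdlib.h>
#include <xenctrl.h>

/* Sketch: re-enable automatic NUMA node affinity by passing "all nodes". */
int reset_node_affinity_example(xc_interface *xch, uint32_t domid)
{
    int i, rc;
    int max_nodes = xc_get_max_nodes(xch);          /* number of node bits */
    xc_nodemap_t nodemap = xc_nodemap_alloc(xch);   /* zeroed node bitmap */

    if (max_nodes <= 0 || !nodemap) {
        free(nodemap);      /* free(NULL) is a no-op */
        return -1;
    }

    /* Set every node bit: Xen then re-enables d->auto_node_affinity and
     * recalculates node_affinity from the vCPU affinities. */
    for (i = 0; i < max_nodes; i++)
        nodemap[i / 8] |= 1 << (i % 8);

    rc = xc_domain_node_setaffinity(xch, domid, nodemap);
    free(nodemap);
    return rc;
}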

Flowchart in relation to xc_vcpu_setaffinity()

The effect of domain_set_node_affinity() can be seen more clearly in this flowchart, which shows how xc_vcpu_setaffinity() is currently used to set the NUMA affinity of a new domain, and also how domain_set_node_affinity() relates to it:

In the flowchart, two code paths are set in bold:

  • The path taken when Host.numa_affinity_policy is the default (off) in xenopsd.
  • The default path of xc_vcpu_setaffinity(XEN_VCPUAFFINITY_SOFT) in Xen, when the domain’s auto_node_affinity flag is enabled (the default), showing how the vCPU affinity update also updates the domain’s node_affinity in this default case.

xenguest uses Xenstore to read the static domain configuration that it needs to build the domain.

---
theme: ''
---
flowchart TD

subgraph VM.create["xenopsd VM.create"]
    %% Is xe vCPU-params:mask= set? If yes, write to Xenstore:
    is_xe_vCPUparams_mask_set?{"Is <tt>xe vCPU-params:mask=</tt> set? Example: <tt>1,2,3</tt> (Is used to enable vCPU<br>hard-affinity)"}
        --"yes"--> set_hard_affinity("Write hard-affinity to XenStore: <tt>platform/vcpu/#domid/affinity</tt> (xenguest will read this and other configuration data from Xenstore)")
end

subgraph VM.build["xenopsd VM.build"]
    %% Labels of the decision nodes
    is_Host.numa_affinity_policy_set?{Is<p><tt>Host.numa_affinity_policy</tt><p>set?}
    has_hard_affinity?{Is hard-affinity configured in <p><tt>platform/vcpu/#domid/affinity</tt>?}

    %% Connections from VM.create:
    set_hard_affinity --> is_Host.numa_affinity_policy_set?
    is_xe_vCPUparams_mask_set? =="no"==> is_Host.numa_affinity_policy_set?

    %% The Subgraph itself:
    %% Check Host.numa_affinity_policy
    is_Host.numa_affinity_policy_set?
        %% If Host.numa_affinity_policy is "best_effort":
        -- Host.numa_affinity_policy is<p><tt>best_effort -->
        %% If has_hard_affinity is set, skip numa_placement:
        has_hard_affinity? --"yes"--> exec_xenguest
    %% If has_hard_affinity is not set, run numa_placement:
    has_hard_affinity? --"no"--> numa_placement --> exec_xenguest
    %% If Host.numa_affinity_policy is off (default, for now), skip NUMA placement:
    is_Host.numa_affinity_policy_set? =="default: disabled"==> exec_xenguest
end

%% xenguest subgraph
subgraph xenguest
    exec_xenguest
        ==> stub_xc_hvm_build("<tt>stub_xc_hvm_build()")
        ==> configure_vcpus("<tt>configure_vcpus()")
        %% Decision
        ==> set_hard_affinity?{"Is <tt>platform/<br>vcpu/#domid/affinity</tt> set?"}
end

%% do_domctl Hypercalls
numa_placement
    --Set the NUMA placement using soft-affinity-->
    XEN_VCPUAFFINITY_SOFT("<tt>xc_vcpu_setaffinity(SOFT)") ==> do_domctl
set_hard_affinity?
    --yes-->
    XEN_VCPUAFFINITY_HARD("<tt>xc_vcpu_setaffinity(HARD)") --> do_domctl
xc_domain_node_setaffinity("<tt>xc_domain_node_setaffinity()</tt> and <tt>xc_domain_node_getaffinity()") <--> do_domctl

%% Xen subgraph
subgraph xen[Xen Hypervisor]
    subgraph domain_update_node_affinity["domain_update_node_affinity()"]
        domain_update_node_aff("<tt>domain_update_node_aff()")
            ==> check_auto_node{"Is domain's<br><tt>auto_node_affinity</tt><br>enabled?"}
            =="yes (default)"==> set_node_affinity_from_vcpu_affinities("Calculate the domain's <tt>node_affinity</tt> mask from vCPU affinity (used for further NUMA memory allocation for the domain)")
    end

    do_domctl{"do_domctl()<br>op->cmd=?"}
        ==XEN_DOMCTL_setvcpuaffinity==>
        vcpu_set_affinity("<tt>vcpu_set_affinity()</tt><br>set the vCPU affinity")
        ==> domain_update_node_aff
    do_domctl
        --XEN_DOMCTL_setnodeaffinity (not used currently)-->
        is_new_affinity_all_nodes?

    subgraph domain_set_node_affinity["domain_set_node_affinity()"]
        is_new_affinity_all_nodes?{new_affinity<br>is #34;all#34;?}
            --is #34;all#34;--> enable_auto_node_affinity("<tt>auto_node_affinity=1")
            --> domain_update_node_aff
        is_new_affinity_all_nodes?
            --not #34;all#34;--> disable_auto_node_affinity("<tt>auto_node_affinity=0")
            --> domain_update_node_aff
    end

    %% setting and getting the struct domain's node_affinity:
    disable_auto_node_affinity --node_affinity=new_affinity--> domain_node_affinity
    set_node_affinity_from_vcpu_affinities
        ==> domain_node_affinity@{ shape: bow-rect, label: "domain:&nbsp;node_affinity" }
        --XEN_DOMCTL_getnodeaffinity--> do_domctl
end

click is_Host.numa_affinity_policy_set? "https://github.com/xapi-project/xen-api/blob/90ef043c1f3a3bc20f1c5d3ccaaf6affadc07983/ocaml/xenopsd/xc/domain.ml#L951-L962"
click numa_placement "https://github.com/xapi-project/xen-api/blob/90ef043c/ocaml/xenopsd/xc/domain.ml#L862-L897"
click stub_xc_hvm_build "https://github.com/xenserver/xen.pg/blob/65c0438b/patches/xenguest.patch#L2329-L2436" _blank
click get_flags "https://github.com/xenserver/xen.pg/blob/65c0438b/patches/xenguest.patch#L1164-L1288" _blank
click do_domctl "https://github.com/xen-project/xen/blob/7cf163879/xen/common/domctl.c#L282-L894" _blank
click domain_set_node_affinity "https://github.com/xen-project/xen/blob/7cf163879/xen/common/domain.c#L943-L970" _blank
click configure_vcpus "https://github.com/xenserver/xen.pg/blob/65c0438b/patches/xenguest.patch#L1297-L1348" _blank
click set_hard_affinity? "https://github.com/xenserver/xen.pg/blob/65c0438b/patches/xenguest.patch#L1305-L1326" _blank
click xc_vcpu_setaffinity "https://github.com/xen-project/xen/blob/7cf16387/tools/libs/ctrl/xc_domain.c#L199-L250" _blank
click vcpu_set_affinity "https://github.com/xen-project/xen/blob/7cf16387/xen/common/sched/core.c#L1353-L1393" _blank
click domain_update_node_aff "https://github.com/xen-project/xen/blob/7cf16387/xen/common/sched/core.c#L1809-L1876" _blank
click check_auto_node "https://github.com/xen-project/xen/blob/7cf16387/xen/common/sched/core.c#L1840-L1870" _blank
click set_node_affinity_from_vcpu_affinities "https://github.com/xen-project/xen/blob/7cf16387/xen/common/sched/core.c#L1867-L1869" _blank


xc_domain_node_setaffinity() can be used to set the domain’s node_affinity (which is normally set by xc_vcpu_setaffinity()) to different NUMA nodes.

No effect on the Xen scheduler

Currently, the node affinity does not affect the Xen scheduler: If d->node_affinity is set before vCPU creation, the initial pCPU of a new vCPU is the first pCPU of the first NUMA node in the domain’s node_affinity. This changes further when one or more cpupools are set up. As this is only the initial pCPU of the vCPU, this alone does not change the scheduling of the Xen Credit scheduler, as it reschedules the vCPUs to other pCPUs.

Notes on future design improvements

It may be possible to call it before vCPUs are created

When done early, before vCPU creation, some domain-related data structures could be allocated using the domain’s d->node_affinity NUMA node mask.

With further changes in Xen and xenopsd, Xen could allocate the vCPU structs on the affine NUMA nodes of the domain.

For this, xenopsd would have to call xc_domain_node_setaffinity() before vCPU creation, after having decided the domain’s NUMA placement, preferably also claiming the required memory for the domain to ensure that the domain will be populated from the same NUMA node(s).

This call cannot influence the past: the xenopsd VM_create micro-op calls Xenctrl.domain_create, which currently creates the domain’s data structures before numa_placement is done.

Improving Xenctrl.domain_create to pass a NUMA node for allocating the hypervisor’s data structures (e.g. vCPU) of the domain would require changes to the Xen hypervisor and to the xenopsd VM_create micro-op.

xc_vcpu_setaffinity()

Introduction

In the Xen hypervisor, each vCPU has:

  • A soft affinity. This is the list of pCPUs where a vCPU prefers to run:

    This can be used to make vCPUs prefer to run on a set of pCPUs, for example the pCPUs of a NUMA node; but if those pCPUs are already busy, the Credit scheduler can still ignore the soft affinity. A typical use case for this are NUMA machines, where the soft affinity for the vCPUs of a domain should be set equal to the pCPUs of the NUMA node where the domain’s memory shall be placed.

    See the description of the NUMA feature for more details.

  • A hard affinity, also known as pinning. This is the list of pCPUs where a vCPU is allowed to run.

    Hard affinity is currently not used for NUMA placement, but can be configured manually for a given domain, either using xe VCPUs-params:mask= or the API.

    For example, the vCPU’s pinning can be configured using a template with:

    xe template-param-set uuid=<template_uuid> vCPUs-params:mask=1,2,3

    There are also host-level guest_VCPUs_params which are used by host-cpu-tune to exclusively pin Dom0 and guests (i.e. so that their pCPUs never overlap). Note: This isn’t currently supported by the NUMA code: It could result in the NUMA placement picking a node that has reduced capacity or is unavailable due to the host mask that host-cpu-tune has set.

Purpose

The libxenctrl library call xc_vcpu_setaffinity() controls the pCPU affinity of the given vCPU.

xenguest uses it when building domains if xenopsd added vCPU affinity information to the XenStore platform data path platform/vcpu/#domid/affinity of the domain.

Updating the NUMA node affinity of a domain

Besides that, xc_vcpu_setaffinity() can also modify the NUMA node affinity of the Xen domain:

When Xen creates a domain, it enables the domain’s d->auto_node_affinity feature flag.

When it is enabled, setting the vCPU affinity also updates the NUMA node affinity which is used for memory allocations for the domain:
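As an illustration, here is a hedged sketch of setting a vCPU’s soft affinity to the first eight pCPUs (for example, the pCPUs of one NUMA node). With auto_node_affinity enabled, Xen then also updates the domain’s node_affinity. The cpumap helper and flag name follow current libxenctrl, but treat the exact signature as an assumption:

#include <stdlib.h>
#include <xenctrl.h>

/* Sketch: give vCPU 0 of a domain a soft affinity for pCPUs 0-7. */
int set_soft_affinity_example(xc_interface *xch, uint32_t domid)
{
    int rc = -1;
    /* Both maps are passed to the call; only the soft one is applied here. */
    xc_cpumap_t hard = xc_cpumap_alloc(xch);
    xc_cpumap_t soft = xc_cpumap_alloc(xch);

    if (!hard || !soft)
        goto out;

    soft[0] = 0xff;     /* prefer pCPUs 0..7, e.g. the pCPUs of one NUMA node */

    /* With d->auto_node_affinity enabled (the default), Xen also updates
     * the domain's node_affinity mask from this soft affinity. */
    rc = xc_vcpu_setaffinity(xch, domid, /* vcpu */ 0,
                             hard, soft, XEN_VCPUAFFINITY_SOFT);
out:
    free(hard);
    free(soft);
    return rc;
}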

Simplified flowchart

---
theme: ''
---
flowchart TD

subgraph libxenctrl
    xc_vcpu_setaffinity("<tt>xc_vcpu_setaffinity()") --hypercall--> xen
end

subgraph xen[Xen Hypervisor]
    direction LR
    vcpu_set_affinity("<tt>vcpu_set_affinity()</tt><br>set the vCPU affinity")
        --> check_auto_node{"Is the domain's<br><tt>auto_node_affinity</tt><br>enabled?"}
        --"yes<br>(default)"--> auto_node_affinity("Set the<br>domain's<br><tt>node_affinity</tt> mask as well<br>(used for further<br>NUMA memory<br>allocation)")

    click xc_vcpu_setaffinity "https://github.com/xen-project/xen/blob/7cf16387/tools/libs/ctrl/xc_domain.c#L199-L250" _blank
    click vcpu_set_affinity "https://github.com/xen-project/xen/blob/7cf16387/xen/common/sched/core.c#L1353-L1393" _blank
    click domain_update_node_aff "https://github.com/xen-project/xen/blob/7cf16387/xen/common/sched/core.c#L1809-L1876" _blank
    click check_auto_node "https://github.com/xen-project/xen/blob/7cf16387/xen/common/sched/core.c#L1840-L1870" _blank
    click auto_node_affinity "https://github.com/xen-project/xen/blob/7cf16387/xen/common/sched/core.c#L1867-L1869" _blank
end


Current use by xenopsd and xenguest

When Host.numa_affinity_policy is set to best_effort, xenopsd attempts NUMA node placement when building new VMs and instructs xenguest to set the vCPU affinity of the domain.

With the domain’s auto_node_affinity flag enabled by default in Xen, this automatically also sets the d->node_affinity mask of the domain.

This then causes the Xen memory allocator to prefer the NUMA nodes in the d->node_affinity NUMA node mask when allocating memory.

That is, (for completeness) unless Xen’s allocation function alloc_heap_pages() receives a specific NUMA node in its memflags argument when called.
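For completeness, a hedged sketch of what such an explicit-node request looks like inside the hypervisor (Xen-internal code, not a toolstack call; the helper and flag come from Xen’s common memory allocator and are meant purely as an illustration):

/* Xen-internal sketch: allocate 2^order domheap pages for domain d,
 * asking the allocator to prefer a specific NUMA node. MEMF_node(node)
 * encodes the node into memflags, which takes precedence over the
 * domain's node_affinity mask. */
static struct page_info *alloc_pages_on_node(struct domain *d,
                                             unsigned int order,
                                             nodeid_t node)
{
    return alloc_domheap_pages(d, order, MEMF_node(node));
}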

See xc_domain_node_setaffinity() for more information about another way to set the node_affinity NUMA node mask of Xen domains and more depth on how it is used in Xen.

Flowchart of its current use for NUMA affinity

In the flowchart, two code paths are set in bold:

  • The path taken when Host.numa_affinity_policy is the default (off) in xenopsd.
  • The default path of xc_vcpu_setaffinity(XEN_VCPUAFFINITY_SOFT) in Xen, when the domain’s auto_node_affinity flag is enabled (the default), showing how the vCPU affinity update also updates the domain’s node_affinity in this default case.

xenguest uses Xenstore to read the static domain configuration that it needs to build the domain.

---
theme: ''
---
flowchart TD

subgraph VM.create["xenopsd VM.create"]
    %% Is xe vCPU-params:mask= set? If yes, write to Xenstore:
    is_xe_vCPUparams_mask_set?{"Is <tt>xe vCPU-params:mask=</tt> set? Example: <tt>1,2,3</tt> (Is used to enable vCPU<br>hard-affinity)"}
        --"yes"--> set_hard_affinity("Write hard-affinity to XenStore: <tt>platform/vcpu/#domid/affinity</tt> (xenguest will read this and other configuration data from Xenstore)")
end

subgraph VM.build["xenopsd VM.build"]
    %% Labels of the decision nodes
    is_Host.numa_affinity_policy_set?{Is<p><tt>Host.numa_affinity_policy</tt><p>set?}
    has_hard_affinity?{Is hard-affinity configured in <p><tt>platform/vcpu/#domid/affinity</tt>?}

    %% Connections from VM.create:
    set_hard_affinity --> is_Host.numa_affinity_policy_set?
    is_xe_vCPUparams_mask_set? =="no"==> is_Host.numa_affinity_policy_set?

    %% The Subgraph itself:
    %% Check Host.numa_affinity_policy
    is_Host.numa_affinity_policy_set?
        %% If Host.numa_affinity_policy is "best_effort":
        -- Host.numa_affinity_policy is<p><tt>best_effort -->
        %% If has_hard_affinity is set, skip numa_placement:
        has_hard_affinity? --"yes"--> exec_xenguest
    %% If has_hard_affinity is not set, run numa_placement:
    has_hard_affinity? --"no"--> numa_placement --> exec_xenguest
    %% If Host.numa_affinity_policy is off (default, for now), skip NUMA placement:
    is_Host.numa_affinity_policy_set? =="default: disabled"==> exec_xenguest
end

%% xenguest subgraph
subgraph xenguest
    exec_xenguest
        ==> stub_xc_hvm_build("<tt>stub_xc_hvm_build()")
        ==> configure_vcpus("<tt>configure_vcpus()")
        %% Decision
        ==> set_hard_affinity?{"Is <tt>platform/<br>vcpu/#domid/affinity</tt> set?"}
end

%% do_domctl Hypercalls
numa_placement
    --Set the NUMA placement using soft-affinity-->
    XEN_VCPUAFFINITY_SOFT("<tt>xc_vcpu_setaffinity(SOFT)") ==> do_domctl
set_hard_affinity?
    --yes-->
    XEN_VCPUAFFINITY_HARD("<tt>xc_vcpu_setaffinity(HARD)") --> do_domctl
xc_domain_node_setaffinity("<tt>xc_domain_node_setaffinity()</tt> and <tt>xc_domain_node_getaffinity()") <--> do_domctl

%% Xen subgraph
subgraph xen[Xen Hypervisor]
    subgraph domain_update_node_affinity["domain_update_node_affinity()"]
        domain_update_node_aff("<tt>domain_update_node_aff()")
            ==> check_auto_node{"Is domain's<br><tt>auto_node_affinity</tt><br>enabled?"}
            =="yes (default)"==> set_node_affinity_from_vcpu_affinities("Calculate the domain's <tt>node_affinity</tt> mask from vCPU affinity (used for further NUMA memory allocation for the domain)")
    end

    do_domctl{"do_domctl()<br>op->cmd=?"}
        ==XEN_DOMCTL_setvcpuaffinity==>
        vcpu_set_affinity("<tt>vcpu_set_affinity()</tt><br>set the vCPU affinity")
        ==> domain_update_node_aff
    do_domctl
        --XEN_DOMCTL_setnodeaffinity (not used currently)-->
        is_new_affinity_all_nodes?

    subgraph domain_set_node_affinity["domain_set_node_affinity()"]
        is_new_affinity_all_nodes?{new_affinity<br>is #34;all#34;?}
            --is #34;all#34;--> enable_auto_node_affinity("<tt>auto_node_affinity=1")
            --> domain_update_node_aff
        is_new_affinity_all_nodes?
            --not #34;all#34;--> disable_auto_node_affinity("<tt>auto_node_affinity=0")
            --> domain_update_node_aff
    end

    %% setting and getting the struct domain's node_affinity:
    disable_auto_node_affinity --node_affinity=new_affinity--> domain_node_affinity
    set_node_affinity_from_vcpu_affinities
        ==> domain_node_affinity@{ shape: bow-rect, label: "domain:&nbsp;node_affinity" }
        --XEN_DOMCTL_getnodeaffinity--> do_domctl
end

click is_Host.numa_affinity_policy_set? "https://github.com/xapi-project/xen-api/blob/90ef043c1f3a3bc20f1c5d3ccaaf6affadc07983/ocaml/xenopsd/xc/domain.ml#L951-L962"
click numa_placement "https://github.com/xapi-project/xen-api/blob/90ef043c/ocaml/xenopsd/xc/domain.ml#L862-L897"
click stub_xc_hvm_build "https://github.com/xenserver/xen.pg/blob/65c0438b/patches/xenguest.patch#L2329-L2436" _blank
click get_flags "https://github.com/xenserver/xen.pg/blob/65c0438b/patches/xenguest.patch#L1164-L1288" _blank
click do_domctl "https://github.com/xen-project/xen/blob/7cf163879/xen/common/domctl.c#L282-L894" _blank
click domain_set_node_affinity "https://github.com/xen-project/xen/blob/7cf163879/xen/common/domain.c#L943-L970" _blank
click configure_vcpus "https://github.com/xenserver/xen.pg/blob/65c0438b/patches/xenguest.patch#L1297-L1348" _blank
click set_hard_affinity? "https://github.com/xenserver/xen.pg/blob/65c0438b/patches/xenguest.patch#L1305-L1326" _blank
click xc_vcpu_setaffinity "https://github.com/xen-project/xen/blob/7cf16387/tools/libs/ctrl/xc_domain.c#L199-L250" _blank
click vcpu_set_affinity "https://github.com/xen-project/xen/blob/7cf16387/xen/common/sched/core.c#L1353-L1393" _blank
click domain_update_node_aff "https://github.com/xen-project/xen/blob/7cf16387/xen/common/sched/core.c#L1809-L1876" _blank
click check_auto_node "https://github.com/xen-project/xen/blob/7cf16387/xen/common/sched/core.c#L1840-L1870" _blank
click set_node_affinity_from_vcpu_affinities "https://github.com/xen-project/xen/blob/7cf16387/xen/common/sched/core.c#L1867-L1869" _blank
