# xc_domain_claim_pages()
## Purpose

The purpose of `xc_domain_claim_pages()` is to attempt to stake a claim on an amount of memory for a given domain which guarantees that memory allocations for the claimed amount will be successful.

The domain can still attempt to allocate beyond the claim, but those allocations are not guaranteed to succeed and will fail if the domain's memory reaches its `max_mem` value.

Each domain can only have one claim, and the `domid` is the key of the claim. Killing the domain also releases its claim.

Depending on the given size argument, the remaining stake of the domain can be set initially, updated to the given amount, or reset to no claim (0).
## Management of claims

- The stake is centrally managed by the Xen hypervisor using a hypercall.
- Claims are not reflected in the amount of free memory reported by Xen.
## Reporting of claims

`xl claims` reports the outstanding claims of the domains:

> [!info] Sample output of `xl claims`:
> ```
> Name         ID   Mem VCPUs     State   Time(s)  Claimed
> Domain-0      0  2656     8     r-----  957418.2       0
> ```

`xl info` reports the host-wide outstanding claims:

> [!info] Sample output from `xl info | grep outstanding`:
> ```
> outstanding_claims     : 0
> ```
## Tracking of claims

Xen only tracks:
- the outstanding claims of each domain and
- the outstanding host-wide claims.

Claiming zero pages effectively cancels the domain's outstanding claim and is always successful.
> [!info]
> - Allocations for outstanding claims are expected to always be successful.
> - But they reduce the outstanding claim of the domain.
> - Freeing memory of the domain increases the domain's claim again:
>   - But, when a domain consumes its claim, it is reset.
>   - When the claim is reset, freed memory is no longer moved to the outstanding claims!
>   - It would have to get a new claim on memory to have spare memory again.
> [!warning] The domain's `max_mem` value is used to deny memory allocation
> If an allocation would cause the domain's memory to exceed its `max_mem` value, it will always fail.
## Implementation

Function signature of the libXenCtrl function to call the Xen hypercall:

```c
long xc_memory_op(libxc_handle, XENMEM_claim_pages, struct xen_memory_reservation *);
```

`struct xen_memory_reservation` is defined as:

```c
struct xen_memory_reservation {
    .nr_extents   = nr_pages, /* number of pages to claim */
    .extent_order = 0,        /* an order 0 means: 4k pages, only 0 is allowed */
    .mem_flags    = 0,        /* no flags, only 0 is allowed (at the moment) */
    .domid        = domid     /* numerical domain ID of the domain */
};
```
## Concurrency

Xen protects the consistency of the domain's stake using the domain's `page_alloc_lock` and the global `heap_lock` of Xen. These spin-locks prevent any "time-of-check-time-of-use" races. As the hypercall needs to take those spin-locks, it cannot be preempted.
## Return value

The call returns 0 if the hypercall successfully claimed the requested amount of memory, else it returns non-zero.
## Current users

### libxl and the xl CLI
If the `struct xc_dom_image` passed by `libxl` to the `libxenguest` functions `meminit_hvm()` and `meminit_pv()` has its `claim_enabled` field set, then, before allocating the domain's system memory using `xc_populate_physmap()` (which calls the hypercall to allocate and populate the domain's main system memory), they attempt to claim the to-be-allocated memory using a call to `xc_domain_claim_pages()`.

In case this fails, they do not attempt to continue and return the error code of `xc_domain_claim_pages()`.

Both functions also (unconditionally) reset the claim upon return.
The `xl` CLI uses this functionality (unless disabled in `xl.conf`) to make building a domain fail early when it would run out of memory inside the `meminit_hvm()` and `meminit_pv()` calls: instead of attempting the allocation, they immediately return an error.

This means that in case the claim fails, `xl` avoids:
- The effort of allocating the memory, thereby not blocking it for other domains.
- The effort of potentially needing to scrub the memory after the build failure.
### xenguest

While `xenguest` calls the `libxenguest` functions `meminit_hvm()` and `meminit_pv()` like `libxl` does, it does not set `struct xc_dom_image.claim_enabled`, so it does not enable the first call to `xc_domain_claim_pages()` which would claim the amount of memory that these functions will attempt to allocate and populate for the domain.
## Future design ideas for improved NUMA support

For improved NUMA support, `xenopsd` may want to call an updated version of this function when assigning a NUMA node to a new domain, so that the domain has a stake on the NUMA node's memory before `xenguest` allocates for it.
Further, as PV drivers `unmap` and `free` memory for grant tables to Xen and then re-allocate memory for those grant tables, `xenopsd` may want to stake a very small claim for the domain on the domain's NUMA node so that Xen can increase this claim when the PV drivers `free` this memory, and re-use the resulting claimed amount for allocating the grant tables. This would ensure that the grant tables are then allocated on the local NUMA node of the domain, avoiding remote memory accesses when accessing the grant tables from inside the domain.
Note: In case the corresponding backend process in Dom0 is running on another NUMA node, it would access the domain's grant tables from a remote NUMA node. But this would enable a future improvement for Dom0, where it could prefer to run the corresponding backend process on the same or a neighbouring NUMA node.