Design document | |
---|---|
Revision | v11 |
Status | confirmed |
Review | #139 |
Revision history | |
v1 | Initial version |
v2 | Added details about the VDI's binary format and size, and the SR capability name. |
v3 | Tar was not needed after all! |
v4 | Add details about discovering the VDI using a new vdi_type. |
v5 | Add details about the http handlers and interaction with xapi's database |
v6 | Add details about the framing of the data within the VDI |
v7 | Redesign semantics of the rrd_updates handler |
v8 | Redesign semantics of the rrd_updates handler (again) |
v9 | Magic number change in framing format of vdi |
v10 | Add details of new APIs added to xapi and xcp-rrdd |
v11 | Remove unneeded API calls |
Xapi has RRDs to track VM- and host-level metrics. There is a desire to have SR-level RRDs as a new category, because SR stats are not specific to a certain VM or host. Examples are size and free space on the SR. While recording SR metrics is relatively straightforward within the current RRD system, the main question is where to archive them, which is what this design aims to address.
All SR types, including the existing ones, should be able to have RRDs defined for them. Some RRDs, such as a “free space” one, may make sense for multiple (if not all) SR types. However, the way to measure something like free space will be SR specific. Furthermore, it should be possible for each type of SR to have its own specialised RRDs.
It follows that each SR will need its own xcp-rrdd
plugin, which runs on the SR master and defines and collects the stats. For the new thin-lvhd SR this could be xenvmd
itself. The plugin registers itself with xcp-rrdd
, so that the latter records the live stats from the plugin into RRDs.
SR-level RRDs will be archived in the SR itself, in a VDI, rather than in the local filesystem of the SR master. This way, we don’t need to worry about master failover.
The VDI will be 4MB in size. This is a little more space than we would need for the RRDs we have in mind at the moment, but will give us enough headroom for the foreseeable future. It will not have a filesystem on it for simplicity and performance. There will only be one RRD archive file for each SR (possibly containing data for multiple metrics), which is gzipped by xcp-rrdd
, and can be copied onto the VDI.
There will be a simple framing format for the data on the VDI. This will be as follows:
Offset | Type | Name | Comment |
---|---|---|---|
0 | 32 bit network-order int | magic | Magic number = 0x7ada7ada |
4 | 32 bit network-order int | version | 1 |
8 | 32 bit network-order int | length | length of payload |
12 | gzipped data | data |
Xapi will be in charge of the lifecycle of this VDI, not the plugin or xcp-rrdd
, which will make it a little easier to manage them. Only xapi will attach/detach and read from/write to this VDI. We will keep xcp-rrdd
as simple as possible, and have it archive to its standard path in the local file system. Xapi will then copy the RRDs in and out of the VDI.
A new value "rrd"
in the vdi_type
enum of the datamodel will be defined, and the VDI.type
of the VDI will be set to that value. The storage backend will write the VDI type to the LVM metadata of the VDI, so that xapi can discover the VDI containing the SR-level RRDs when attaching an SR to a new pool. This means that SR-level RRDs are currently restricted to LVM SRs.
Because we will not write plugins for all SRs at once, and therefore do not need xapi to set up the VDI for all SRs, we will add an SR “capability” for the backends to be able to tell xapi whether it has the ability to record stats and will need storage for them. The capability name will be: SR_STATS
.
The SR-stats VDI will be attached/detached on PBD.plug
/unplug
on the SR master.
PBD.plug
on the SR master, if the SR has the stats capability, xapi:
xcp-rrdd
where to put them).xcp-rrdd
about the RRDs so that it will load the RRDs and add newly recorded data to them (needs a function like push_rrd_local
for VM-level RRDs).PBD.unplug
on the SR master, if the SR has the stats capability xapi:
xcp-rrdd
to archive the RRDs for the SR, which it will do to the local filesystem.Xapi’s periodic scheduler regularly triggers xcp-rrdd
to archive the host and VM RRDs. It will need to do this for the SR ones as well. Furthermore, xapi will need to attach the stats VDI and copy the RRD archives into it (as on PBD.unplug
).
There will be a new handler for downloading an SR RRD:
http://<server>/sr_rrd?session_id=<SESSION HANDLE>&uuid=<SR UUID>
RRD updates are handled via a single handler for the host, VM and SR UUIDs
RRD updates for the host, VMs and SRs are handled by a a single handler at
/rrd_updates
. Exactly what is returned will be determined by the parameters
passed to this handler.
Whether the host RRD updates are returned is governed by the presence of
host=true
in the parameters. host=<anything else>
or the absence of the
host
key will mean the host RRD is not returned.
Whether the VM RRD updates are returned is governed by the vm_uuid
key in the
URL parameters. vm_uuid=all
will return RRD updates for all VM RRDs.
vm_uuid=xxx
will return the RRD updates for the VM with uuid xxx
only.
If vm_uuid
is none
(or any other string which is not a valid VM UUID) then
the handler will return no VM RRD updates. If the vm_uuid
key is absent, RRD
updates for all VMs will be returned.
Whether the SR RRD updates are returned is governed by the sr_uuid
key in the
URL parameters. sr_uuid=all
will return RRD updates for all SR RRDs.
sr_uuid=xxx
will return the RRD updates for the SR with uuid xxx
only.
If sr_uuid
is none
(or any other string which is not a valid SR UUID) then
the handler will return no SR RRD updates. If the sr_uuid
key is absent, no
SR RRD updates will be returned.
It will be possible to mix and match these parameters; for example to return RRD updates for the host and all VMs, the URL to use would be:
http://<server>/rrd_updates?session_id=<SESSION HANDLE>&start=10258122541&host=true&vm_uuid=all&sr_uuid=none
Or, to return RRD updates for all SRs but nothing else, the URL to use would be:
http://<server>/rrd_updates?session_id=<SESSION HANDLE>&start=10258122541&host=false&vm_uuid=none&sr_uuid=all
While behaviour is defined if any of the keys host
, vm_uuid
and sr_uuid
is
missing, this is for backwards compatibility and it is recommended that clients
specify each parameter explicitly.
If the SR is presenting a data source called ‘physical_utilisation’, xapi will record this periodically in its database. In order to do this, xapi will fork a thread that, every n minutes (2 suggested, but open to suggestions here), will query the attached SRs, then query RRDD for the latest data source for these, and update the database.
The utilisation of VDIs will not be updated in this way until scalability worries for RRDs are addressed.
Xapi will cache whether it is SR master for every attached SR and only attempt to update if it is the SR master.
Get the filesystem location where sr rrds are archived: val sr_rrds_path : uid:string -> string
Archive the sr rrds to the filesystem: val archive_sr_rrd : sr_uuid:string -> unit
Load the sr rrds from the filesystem: val push_sr_rrd : sr_uuid:string -> unit