RRDD
The xcp-rrdd daemon (hereafter simply called “rrdd”) is a component in the xapi toolstack that is responsible for collecting metrics, storing them as “Round-Robin Databases” (RRDs) and exposing these to clients.
The code is in ocaml/xcp-rrdd.
Design document | |
---|---|
Revision | v1 |
Status | released (7.0) |
Current problems with rrdd:

- rrdd keeps track of whether it is running on a master or a slave. This determines the host to which rrdd will archive a VM’s rrd when the VM’s domain disappears: rrdd will always try to archive to the master. However, when a host joins a pool as a slave, rrdd is not restarted, so this knowledge is out of date. When a VM shuts down on the slave, rrdd will archive the rrd locally. When this VM is started again, the master xapi will attempt to push any locally-existing rrd to the host on which the VM is being started, but since no rrd archive exists on the master, the slave rrdd will end up creating a new rrd and the previous rrd will be lost.
- When a VM is rebooted, there is a chance that rrdd will attempt to update that VM’s rrd during the brief period when there is no domain for that VM. If this happens, rrdd will archive the VM’s rrd to the master, and then create a new rrd for the VM when it sees the new domain. If rrdd doesn’t attempt to update that VM’s rrd during this period, rrdd will continue to add data for the new domain to the old rrd.

To solve these problems, we will remove some of the intelligence from rrdd and make it into more of a slave process of xapi. This will entail removing all knowledge from rrdd of whether it is running on a master or a slave, and also modifying rrdd to only start monitoring a VM when it is told to, and to only archive an rrd (to a specified address) when it is told to. This matches the way xenopsd only manages domains which it has been told to manage.
For most VM lifecycle operations, xapi and rrdd processes (sometimes across more than one host) cooperate to start or stop recording a VM’s metrics and/or to restore or backup the VM’s archived metrics. Below we will describe, for each relevant VM operation, how the VM’s rrd is currently handled, and how we propose it will be handled after the redesign.
When a VM is destroyed, the master xapi makes a remove_rrd call to the local rrdd, which causes rrdd to delete the VM’s archived rrd from disk. This behaviour will remain unchanged.
When a VM is started, the master xapi makes a push_rrd call to the local rrdd, which causes rrdd to send any locally-archived rrd for the VM in question to the rrdd of the host on which the VM is starting. This behaviour will remain unchanged.
Every update cycle rrdd compares its list of registered VMs to the list of domains actually running on the host. Any registered VMs which do not have a corresponding domain have their rrds archived to the rrdd running on the host believed to be the master. We will change this behaviour by stopping rrdd from doing the archiving itself; instead we will expose a new function in rrdd’s interface:
```ocaml
val archive_rrd : vm_uuid:string -> remote_address:string -> unit
```
This will cause rrdd to remove the specified rrd from its table of registered VMs, and archive the rrd to the specified host. When a VM has finished shutting down or suspending, the xapi process on the host on which the VM was running will call archive_rrd to ask the local rrdd to archive back to the master rrdd.
Removing rrdd’s ability to automatically archive the rrds for disappeared domains will have the bonus effect of fixing how the rrds of rebooting VMs are handled, as we don’t want the rrds of rebooting VMs to be archived at all.
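The intended division of labour can be modelled in a few lines. This is a minimal Python sketch, not rrdd's OCaml code: Rrdd, register_vm and the send_archive callback are hypothetical names, and an rrd is reduced to a list of samples. It shows why a reboot no longer splits a VM's rrd: a registered VM with no domain is skipped rather than archived, and archiving happens only when xapi calls archive_rrd.

```python
class Rrdd:
    """Toy model of rrdd's per-VM bookkeeping after the redesign.

    Only archive_rrd corresponds to a call named in the design; register_vm
    and send_archive are hypothetical stand-ins.
    """

    def __init__(self, send_archive):
        self.registered = {}              # vm_uuid -> list of recorded samples
        self.send_archive = send_archive  # callable(vm_uuid, rrd, remote_address)

    def register_vm(self, vm_uuid, rrd=None):
        # rrdd only starts monitoring a VM when xapi tells it to.
        self.registered[vm_uuid] = rrd if rrd is not None else []

    def update_cycle(self, domains):
        """domains maps the UUID of each VM with a running domain to a new sample."""
        for vm_uuid, rrd in self.registered.items():
            if vm_uuid in domains:
                rrd.append(domains[vm_uuid])
            # A registered VM without a domain (for example mid-reboot) is
            # simply skipped: rrdd no longer archives it on its own.

    def archive_rrd(self, vm_uuid, remote_address):
        # Called by xapi once the VM has finished shutting down or suspending.
        rrd = self.registered.pop(vm_uuid)
        self.send_archive(vm_uuid, rrd, remote_address)
```

For a rebooting VM, the cycle that sees no domain leaves the rrd in place, and the next cycle keeps appending to the same rrd; nothing is sent until xapi explicitly asks for the archive.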
VM.checkpoint will be handled automatically, as internally it carries out a VM.suspend followed by a VM.resume.
When a VM is migrated, the source host’s xapi makes a migrate_rrd call to the local rrdd, with a destination address and an optional session ID. The session ID is only required for cross-pool migration. The local rrdd sends the rrd for that VM to the destination host’s rrdd as an HTTP PUT. This behaviour will remain unchanged.
Design document | |
---|---|
Revision | v1 |
Status | released (7.0) |
Revision history | |
v1 | Initial version |
rrdd plugins currently report datasources via a shared-memory file, using the following format:
```
DATASOURCES
000001e4
dba4bf7a84b6d11d565d19ef91f7906e
{
  "timestamp": 1339685573.245,
  "data_sources": {
    "cpu-temp-cpu0": {
      "description": "Temperature of CPU 0",
      "type": "absolute",
      "units": "degC",
      "value": "64.33",
      "value_type": "float"
    },
    "cpu-temp-cpu1": {
      "description": "Temperature of CPU 1",
      "type": "absolute",
      "units": "degC",
      "value": "62.14",
      "value_type": "float"
    }
  }
}
```
This format contains four main components:

- A constant header string, DATASOURCES. This should always be present.
- The length of the JSON data, encoded as hexadecimal (000001e4 in the example above).
- A checksum of the JSON data (dba4bf7a84b6d11d565d19ef91f7906e in the example above).
- The JSON data itself, which encodes the values and metadata of the reported datasources.

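For comparison with the V2 proposal, here is a Python sketch of a V1 writer and reader. It assumes that each field sits on its own line, that the length counts the bytes of the JSON data and that the 32-digit checksum is an MD5 hex digest; the function names are hypothetical. The reader illustrates the cost of this protocol: every update means verifying and parsing the entire JSON document.

```python
import hashlib
import json

HEADER = b"DATASOURCES\n"


def encode_v1(payload: dict) -> bytes:
    """Serialise a plugin update in the V1 layout (assumed newline-separated)."""
    data = json.dumps(payload).encode("utf-8")
    length = b"%08x\n" % len(data)  # 8 hex digits, as in 000001e4
    checksum = hashlib.md5(data).hexdigest().encode("ascii") + b"\n"  # assumed MD5
    return HEADER + length + checksum + data


def decode_v1(buf: bytes) -> dict:
    """Check and parse a V1 update; the whole JSON document is parsed every time."""
    header, length, checksum, rest = buf.split(b"\n", 3)
    if header + b"\n" != HEADER:
        raise ValueError("missing DATASOURCES header")
    data = rest[:int(length, 16)]
    if hashlib.md5(data).hexdigest().encode("ascii") != checksum:
        raise ValueError("checksum mismatch")
    return json.loads(data)
```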
The disadvantage of this protocol is that rrdd has to parse the entire JSON structure each tick, even though most of the time only the values will change.
For this reason a new protocol is proposed.
field | bits | format | notes |
---|---|---|---|
header string | (string length)*8 | string | “DATASOURCES” as in the V1 protocol |
data checksum | 32 | int32 | binary-encoded crc32 of the concatenation of the encoded timestamp and datasource values |
metadata checksum | 32 | int32 | binary-encoded crc32 of the metadata string (see below) |
number of datasources | 32 | int32 | only needed if the metadata has changed; otherwise RRDD can use a cached value |
timestamp | 64 | double | Unix epoch |
datasource values | n * 64 | int64 or double | n is the number of datasources exported by the plugin; the type of each value depends on its value_type setting in the metadata (int64 or float) |
metadata length | 32 | int32 | |
metadata | (string length)*8 | string | |
All integers and doubles are big-endian. The metadata will have the same JSON-based format as in the V1 protocol, minus the timestamp and the value key-value pair of each datasource. Each datasource in the metadata can have the following fields:
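To make the layout concrete, here is a Python sketch of a plugin-side writer. It assumes the header string has no terminator, that the order of the datasources in the metadata object defines the order of the values, and that crc32 means the standard zlib CRC-32; encode_v2 is a hypothetical name. The checksums are packed as unsigned 32-bit integers, which have the same bit pattern as the int32 in the table.

```python
import json
import struct
import zlib

HEADER = b"DATASOURCES"


def crc32(data: bytes) -> int:
    # zlib.crc32 is already unsigned in Python 3; the mask makes that explicit.
    return zlib.crc32(data) & 0xFFFFFFFF


def encode_v2(timestamp: float, values: list, metadata: dict) -> bytes:
    """Pack one update in the V2 layout; every field is big-endian."""
    datasources = list(metadata["datasources"].values())
    if len(datasources) != len(values):
        raise ValueError("exactly one value is needed per datasource")
    # Assumption: values appear in the same order as the datasources in the metadata.
    encoded_values = b"".join(
        struct.pack(">q" if ds["value_type"] == "int64" else ">d", value)
        for ds, value in zip(datasources, values))
    timestamp_and_values = struct.pack(">d", timestamp) + encoded_values
    metadata_bytes = json.dumps(metadata).encode("utf-8")
    return (HEADER
            # data checksum, metadata checksum and datasource count (32 bits each)
            + struct.pack(">III", crc32(timestamp_and_values),
                          crc32(metadata_bytes), len(values))
            + timestamp_and_values
            + struct.pack(">I", len(metadata_bytes))
            + metadata_bytes)
```

With two datasources, the header, checksums and count occupy bytes 0 to 22, the timestamp bytes 23 to 30 and the values bytes 31 to 46, so the metadata length starts at byte 47.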
field | values | notes | required |
---|---|---|---|
description | string | Description of the datasource | no |
owner | host, vm or sr | The object to which the data relates | no, default host |
value_type | int64 or float | The type of the datasource | yes |
type | absolute, derive or gauge | The type of measurement being sent: absolute for counters which are reset on reading; derive stores the derivative of the recorded values (useful for metrics which continually increase, such as the amount of data written since start); gauge for things like temperature | no, default absolute |
default | true or false | Whether the source is enabled by default | no, default false |
units | string | The units the data should be displayed in | no |
min | | The minimum value for the datasource | no, default -infinity |
max | | The maximum value for the datasource | no, default +infinity |
```json
{
  "datasources": {
    "memory_reclaimed": {
      "description":"Host memory reclaimed by squeezed",
      "owner":"host",
      "value_type":"int64",
      "type":"absolute",
      "default":"true",
      "units":"B",
      "min":"-inf",
      "max":"inf"
    },
    "memory_reclaimed_max": {
      "description":"Host memory that could be reclaimed by squeezed",
      "owner":"host",
      "value_type":"int64",
      "type":"absolute",
      "default":"true",
      "units":"B",
      "min":"-inf",
      "max":"inf"
    },
    "cpu-temp-cpu0": {
      "description": "Temperature of CPU 0",
      "owner":"host",
      "value_type": "float",
      "type": "absolute",
      "default":"true",
      "units": "degC",
      "min":"-inf",
      "max":"inf"
    },
    "cpu-temp-cpu1": {
      "description": "Temperature of CPU 1",
      "owner":"host",
      "value_type": "float",
      "type": "absolute",
      "default":"true",
      "units": "degC",
      "min":"-inf",
      "max":"inf"
    }
  }
}
```
The above formatting is not required, but added here for readability.
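The defaults in the field table can be applied in one place when RRDD builds datasources from the metadata (the create_datasources step of the reader). This Python sketch validates the two enumerated fields it checks (value_type and type) and fills in every documented default; parse_metadata is a hypothetical name, and the empty-string defaults for description and units are assumptions, since the table gives none.

```python
import json

VALUE_TYPES = ("int64", "float")
MEASUREMENT_TYPES = ("absolute", "derive", "gauge")


def parse_metadata(metadata_json: str) -> dict:
    """Validate V2 metadata and fill in the defaults from the field table."""
    datasources = {}
    for name, fields in json.loads(metadata_json)["datasources"].items():
        value_type = fields.get("value_type")
        if value_type not in VALUE_TYPES:  # the only required field
            raise ValueError(f"{name}: value_type must be int64 or float")
        kind = fields.get("type", "absolute")
        if kind not in MEASUREMENT_TYPES:
            raise ValueError(f"{name}: unknown type {kind!r}")
        datasources[name] = {
            "description": fields.get("description", ""),  # no default given; "" assumed
            "owner": fields.get("owner", "host"),
            "value_type": value_type,
            "type": kind,
            # The example sends "true" as a string; a JSON boolean is accepted too.
            "default": str(fields.get("default", "false")).lower() == "true",
            "units": fields.get("units", ""),  # no default given; "" assumed
            "min": float(fields.get("min", "-inf")),
            "max": float(fields.get("max", "inf")),
        }
    return datasources
```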
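On each tick, RRDD will process a plugin's update along these lines (create_datasources and update stand for RRDD's internal operations, and the metadata is only read when its checksum has changed):
```python
import zlib

EXPECTED_HEADER = b"DATASOURCES"
class InvalidHeader(Exception): pass
class InvalidChecksum(Exception): pass
class NoUpdate(Exception): pass

def crc32(data):
    return zlib.crc32(data) & 0xFFFFFFFF

# `state` caches the checksums and datasources seen on the previous tick.
def process_update(state, header, data_checksum, metadata_checksum,
                   encoded_timestamp_and_values, values, read_metadata,
                   create_datasources, update):
    if header != EXPECTED_HEADER:
        raise InvalidHeader()
    if data_checksum == state.get("last_data_checksum"):
        raise NoUpdate()
    if data_checksum != crc32(encoded_timestamp_and_values):
        raise InvalidChecksum()
    if metadata_checksum != state.get("last_metadata_checksum"):
        # Metadata changed: only now read, verify and parse it.
        metadata = read_metadata()
        if metadata_checksum != crc32(metadata):
            raise InvalidChecksum()
        state["cached_datasources"] = create_datasources(metadata)
        state["last_metadata_checksum"] = metadata_checksum
    state["last_data_checksum"] = data_checksum
    for datasource, value in zip(state["cached_datasources"], values):
        update(datasource, value)
```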
This means that for a normal update, RRDD will only have to read the header plus the first (4 + 4 + 4 + 8 + 8*n) bytes of data, where n is the number of datasources exported by the plugin. If the metadata changes, RRDD will have to read all the data (and parse the metadata).
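The saving can be seen at byte level in this Python reader. It is a sketch, not RRDD's implementation: it assumes the header string has no terminator, that values follow the order of the datasources in the metadata, and that crc32 is zlib's CRC-32; read_v2 and the cache keys are hypothetical, while NoUpdate and InvalidChecksum mirror the exceptions in the pseudocode. On the fast path it never looks past byte len(HEADER) + 12 + 8 + 8n, so a buffer with the metadata cut off still decodes once the metadata is cached.

```python
import json
import struct
import zlib

HEADER = b"DATASOURCES"
PREFIX = struct.Struct(">III")  # data checksum, metadata checksum, datasource count


class NoUpdate(Exception):
    pass


class InvalidChecksum(Exception):
    pass


def crc32(data: bytes) -> int:
    return zlib.crc32(data) & 0xFFFFFFFF


def read_v2(buf: bytes, cache: dict):
    """Decode one V2 update, touching the metadata only when its checksum changed."""
    if not buf.startswith(HEADER):
        raise ValueError("invalid header")
    data_crc, meta_crc, n = PREFIX.unpack_from(buf, len(HEADER))
    if data_crc == cache.get("data_crc"):
        raise NoUpdate()
    start = len(HEADER) + PREFIX.size
    end = start + 8 + 8 * n  # timestamp plus n 8-byte values
    timestamp_and_values = buf[start:end]
    if crc32(timestamp_and_values) != data_crc:
        raise InvalidChecksum()
    if meta_crc != cache.get("meta_crc"):
        # Slow path: read and parse the metadata that follows the values.
        (meta_len,) = struct.unpack_from(">I", buf, end)
        metadata = buf[end + 4:end + 4 + meta_len]
        if crc32(metadata) != meta_crc:
            raise InvalidChecksum()
        sources = json.loads(metadata)["datasources"]
        cache["sources"] = [(name, ds["value_type"]) for name, ds in sources.items()]
        cache["meta_crc"] = meta_crc
    cache["data_crc"] = data_crc
    (timestamp,) = struct.unpack_from(">d", timestamp_and_values, 0)
    values = []
    for i, (name, value_type) in enumerate(cache["sources"]):
        fmt = ">q" if value_type == "int64" else ">d"
        values.append((name, struct.unpack_from(fmt, timestamp_and_values, 8 + 8 * i)[0]))
    return timestamp, values
```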
Design document | |
---|---|
Revision | v11 |
Status | confirmed |
Review | #139 |
Revision history | |
v1 | Initial version |
v2 | Added details about the VDI's binary format and size, and the SR capability name. |
v3 | Tar was not needed after all! |
v4 | Add details about discovering the VDI using a new vdi_type. |
v5 | Add details about the http handlers and interaction with xapi's database |
v6 | Add details about the framing of the data within the VDI |
v7 | Redesign semantics of the rrd_updates handler |
v8 | Redesign semantics of the rrd_updates handler (again) |
v9 | Magic number change in framing format of vdi |
v10 | Add details of new APIs added to xapi and xcp-rrdd |
v11 | Remove unneeded API calls |
Xapi has RRDs to track VM- and host-level metrics. There is a desire to have SR-level RRDs as a new category, because SR stats are not specific to a certain VM or host. Examples are size and free space on the SR. While recording SR metrics is relatively straightforward within the current RRD system, the main question is where to archive them, which is what this design aims to address.
All SR types, including the existing ones, should be able to have RRDs defined for them. Some RRDs, such as a “free space” one, may make sense for multiple (if not all) SR types. However, the way to measure something like free space will be SR specific. Furthermore, it should be possible for each type of SR to have its own specialised RRDs.
It follows that each SR will need its own xcp-rrdd plugin, which runs on the SR master and defines and collects the stats. For the new thin-lvhd SR this could be xenvmd itself. The plugin registers itself with xcp-rrdd, so that the latter records the live stats from the plugin into RRDs.
SR-level RRDs will be archived in the SR itself, in a VDI, rather than in the local filesystem of the SR master. This way, we don’t need to worry about master failover.
The VDI will be 4MB in size. This is a little more space than we would need for the RRDs we have in mind at the moment, but will give us enough headroom for the foreseeable future. It will not have a filesystem on it for simplicity and performance. There will only be one RRD archive file for each SR (possibly containing data for multiple metrics), which is gzipped by xcp-rrdd, and can be copied onto the VDI.
There will be a simple framing format for the data on the VDI. This will be as follows:
Offset (bytes) | Type | Name | Comment |
---|---|---|---|
0 | 32-bit network-order int | magic | Magic number = 0x7ada7ada |
4 | 32-bit network-order int | version | 1 |
8 | 32-bit network-order int | length | length of payload |
12 | gzipped data | data | |
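The framing is simple enough to show in full. This Python sketch assumes the frame is written at offset 0 of the VDI and that the rest of the VDI is unused padding; frame and unframe are hypothetical names, and the payload is passed through untouched because xcp-rrdd has already gzipped it. Whether "4MB" means 4 MiB when choosing vdi_size is left to the caller.

```python
import struct

MAGIC = 0x7ADA7ADA
VERSION = 1
FRAME_HEADER = struct.Struct(">III")  # magic, version, length (network order)


def frame(gzipped_archive: bytes, vdi_size: int) -> bytes:
    """Lay out xcp-rrdd's already-gzipped archive for writing at offset 0 of the VDI."""
    framed = FRAME_HEADER.pack(MAGIC, VERSION, len(gzipped_archive)) + gzipped_archive
    if len(framed) > vdi_size:
        raise ValueError("archive does not fit in the stats VDI")
    return framed


def unframe(vdi_contents: bytes) -> bytes:
    """Return the gzipped payload from the raw contents of the stats VDI."""
    magic, version, length = FRAME_HEADER.unpack_from(vdi_contents, 0)
    if magic != MAGIC:
        raise ValueError("bad magic number: no RRD archive on this VDI")
    if version != VERSION:
        raise ValueError(f"unsupported framing version {version}")
    payload = vdi_contents[FRAME_HEADER.size:FRAME_HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    return payload
```

The explicit length field is what lets the reader ignore whatever padding follows the payload on the VDI.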
Xapi will be in charge of the lifecycle of this VDI, not the plugin or xcp-rrdd, which will make it a little easier to manage them. Only xapi will attach/detach and read from/write to this VDI. We will keep xcp-rrdd as simple as possible, and have it archive to its standard path in the local file system. Xapi will then copy the RRDs in and out of the VDI.
A new value "rrd" in the vdi_type enum of the datamodel will be defined, and the VDI.type of the VDI will be set to that value. The storage backend will write the VDI type to the LVM metadata of the VDI, so that xapi can discover the VDI containing the SR-level RRDs when attaching an SR to a new pool. This means that SR-level RRDs are currently restricted to LVM SRs.
Because we will not write plugins for all SRs at once, and therefore do not need xapi to set up the VDI for all SRs, we will add an SR “capability” that lets each backend tell xapi whether it has the ability to record stats and will need storage for them. The capability name will be SR_STATS.
The SR-stats VDI will be attached/detached on PBD.plug/unplug on the SR master.
On PBD.plug on the SR master, if the SR has the stats capability, xapi:

- copies the SR’s RRD archives out of the stats VDI to the local filesystem, asking xcp-rrdd where to put them;
- tells xcp-rrdd about the RRDs so that it will load them and add newly recorded data to them (this needs a function like push_rrd_local, which exists for VM-level RRDs).

On PBD.unplug on the SR master, if the SR has the stats capability, xapi:

- tells xcp-rrdd to archive the RRDs for the SR, which it will do to the local filesystem;
- copies the RRD archives into the stats VDI.

Xapi’s periodic scheduler regularly triggers xcp-rrdd to archive the host and VM RRDs. It will need to do this for the SR ones as well. Furthermore, xapi will need to attach the stats VDI and copy the RRD archives into it (as on PBD.unplug).
There will be a new handler for downloading an SR RRD:
http://<server>/sr_rrd?session_id=<SESSION HANDLE>&uuid=<SR UUID>
RRD updates for the host, VMs and SRs are handled by a single handler at /rrd_updates. Exactly what is returned will be determined by the parameters passed to this handler.
Whether the host RRD updates are returned is governed by the presence of host=true in the parameters. host=<anything else> or the absence of the host key will mean the host RRD is not returned.
Whether the VM RRD updates are returned is governed by the vm_uuid key in the URL parameters. vm_uuid=all will return RRD updates for all VM RRDs. vm_uuid=xxx will return the RRD updates for the VM with uuid xxx only. If vm_uuid is none (or any other string which is not a valid VM UUID), the handler will return no VM RRD updates. If the vm_uuid key is absent, RRD updates for all VMs will be returned.
Whether the SR RRD updates are returned is governed by the sr_uuid key in the URL parameters. sr_uuid=all will return RRD updates for all SR RRDs. sr_uuid=xxx will return the RRD updates for the SR with uuid xxx only. If sr_uuid is none (or any other string which is not a valid SR UUID), the handler will return no SR RRD updates. If the sr_uuid key is absent, no SR RRD updates will be returned.
It will be possible to mix and match these parameters; for example to return RRD updates for the host and all VMs, the URL to use would be:
http://<server>/rrd_updates?session_id=<SESSION HANDLE>&start=10258122541&host=true&vm_uuid=all&sr_uuid=none
Or, to return RRD updates for all SRs but nothing else, the URL to use would be:
http://<server>/rrd_updates?session_id=<SESSION HANDLE>&start=10258122541&host=false&vm_uuid=none&sr_uuid=all
While behaviour is defined if any of the keys host, vm_uuid and sr_uuid is missing, this is for backwards compatibility, and it is recommended that clients specify each parameter explicitly.
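The parameter rules, including the different defaults for a missing vm_uuid and a missing sr_uuid, fit in a short Python function. This is a sketch, not xapi's handler: select_rrd_updates is a hypothetical name, and treating a "valid" UUID as one belonging to an object the host knows about is an assumption. It returns whether to include the host, and the sets of VM and SR UUIDs whose updates to return.

```python
from urllib.parse import parse_qs, urlsplit


def select_rrd_updates(url, vm_uuids, sr_uuids):
    """Decide which RRD updates /rrd_updates returns for a request URL.

    vm_uuids and sr_uuids are the UUIDs this host knows about; any other
    value (including "none") selects nothing.
    """
    params = {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}

    def pick(key, known, default):
        value = params.get(key, default)
        return set(known) if value == "all" else {value} & set(known)

    include_host = params.get("host") == "true"  # anything else, or absent: no host
    vms = pick("vm_uuid", vm_uuids, "all")       # absent: all VMs
    srs = pick("sr_uuid", sr_uuids, "none")      # absent: no SRs
    return include_host, vms, srs
```

Applied to the two example URLs, it selects the host and all VMs for the first, and all SRs only for the second.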
If the SR is presenting a data source called ‘physical_utilisation’, xapi will record this periodically in its database. In order to do this, xapi will fork a thread that, every n minutes (2 suggested, but open to suggestions here), will query the attached SRs, then query RRDD for the latest data source for these, and update the database.
The utilisation of VDIs will not be updated in this way until scalability worries for RRDs are addressed.
Xapi will cache whether it is SR master for every attached SR and only attempt to update if it is the SR master.
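A sketch of the updater thread in Python, with xapi's database reduced to a dict, the cached SR-master flags to another dict, and the RRDD query passed in as a function; update_physical_utilisation, start_updater and latest_value are hypothetical names. Each pass skips SRs for which this host is not SR master and SRs that do not present physical_utilisation.

```python
import threading

UPDATE_INTERVAL = 2 * 60  # seconds: the suggested n = 2 minutes


def update_physical_utilisation(attached_srs, master_cache, latest_value, db):
    """One pass of the updater thread.

    master_cache maps SR UUID -> cached "this host is SR master" flag;
    latest_value(sr_uuid, name) returns the newest value RRDD holds, or None
    if the SR does not present that data source.
    """
    for sr_uuid in attached_srs:
        if not master_cache.get(sr_uuid, False):
            continue  # only the SR master updates the database
        value = latest_value(sr_uuid, "physical_utilisation")
        if value is not None:
            db[sr_uuid] = value


def start_updater(update_once, stop: threading.Event, interval=UPDATE_INTERVAL):
    """Run update_once every `interval` seconds until `stop` is set."""
    def loop():
        while not stop.wait(interval):  # wait() returns True once stop is set
            update_once()
    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread
```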
New functions in the xcp-rrdd interface:

- Get the filesystem location where SR RRDs are archived: val sr_rrds_path : uid:string -> string
- Archive the SR RRDs to the filesystem: val archive_sr_rrd : sr_uuid:string -> unit
- Load the SR RRDs from the filesystem: val push_sr_rrd : sr_uuid:string -> unit
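The PBD.plug and PBD.unplug steps can be sketched in Python against these three calls. The rrdd object stands in for an xcp-rrdd client whose keyword arguments are named as in the signatures; read_stats_vdi and write_stats_vdi are hypothetical helpers that attach the stats VDI, (un)frame the archive and detach again; and sr_rrds_path is assumed to return the path of the archive file itself rather than a directory. Whether push_sr_rrd should also be called when the VDI holds no archive yet is not specified; the sketch skips it.

```python
SR_STATS = "SR_STATS"


def on_pbd_plug(sr_uuid, capabilities, rrdd, read_stats_vdi):
    """xapi's part of PBD.plug on the SR master (sketch)."""
    if SR_STATS not in capabilities:
        return
    archive = read_stats_vdi()  # gzipped archive bytes, or None if there is none yet
    if archive is None:
        return  # nothing to restore
    with open(rrdd.sr_rrds_path(uid=sr_uuid), "wb") as f:  # ask xcp-rrdd where it goes
        f.write(archive)
    rrdd.push_sr_rrd(sr_uuid=sr_uuid)  # xcp-rrdd loads it and keeps recording into it


def on_pbd_unplug(sr_uuid, capabilities, rrdd, write_stats_vdi):
    """xapi's part of PBD.unplug on the SR master (sketch)."""
    if SR_STATS not in capabilities:
        return
    rrdd.archive_sr_rrd(sr_uuid=sr_uuid)  # xcp-rrdd archives to the local filesystem
    with open(rrdd.sr_rrds_path(uid=sr_uuid), "rb") as f:
        write_stats_vdi(f.read())  # xapi copies the archive into the stats VDI
```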