RRDD

The xcp-rrdd daemon (hereafter simply called “rrdd”) is a component in the xapi toolstack that is responsible for collecting metrics, storing them as “Round-Robin Databases” (RRDs) and exposing these to clients.

The code is in ocaml/xcp-rrdd.

Subsections of RRDD

Design document
Revision: v1
Status: released (7.0)

RRDD archival redesign

Introduction

Current problems with rrdd:

  • rrdd stores knowledge about whether it is running on a master or a slave

This determines the host to which rrdd will archive a VM’s rrd when the VM’s domain disappears: rrdd will always try to archive to the master. However, when a host joins a pool as a slave, rrdd is not restarted, so this knowledge is out of date. When a VM shuts down on the slave, rrdd will archive the rrd locally. When this VM is started again, the master xapi will attempt to push any locally-existing rrd to the host on which the VM is being started, but since no rrd archive exists on the master, the slave rrdd will end up creating a new rrd and the previous rrd will be lost.

  • rrdd handles rebooting VMs unpredictably

When rebooting a VM, there is a chance rrdd will attempt to update that VM’s rrd during the brief period when there is no domain for that VM. If this happens, rrdd will archive the VM’s rrd to the master, and then create a new rrd for the VM when it sees the new domain. If rrdd doesn’t attempt to update that VM’s rrd during this period, rrdd will continue to add data for the new domain to the old rrd.

Proposal

To solve these problems, we will remove some of the intelligence from rrdd and make it into more of a slave process of xapi. This will entail removing all knowledge from rrdd of whether it is running on a master or a slave, and also modifying rrdd to only start monitoring a VM when it is told to, and to only archive an rrd (to a specified address) when it is told to. This matches the way xenopsd only manages domains which it has been told to manage.

Design

For most VM lifecycle operations, xapi and rrdd processes (sometimes across more than one host) cooperate to start or stop recording a VM’s metrics and/or to restore or back up the VM’s archived metrics. Below we will describe, for each relevant VM operation, how the VM’s rrd is currently handled, and how we propose it will be handled after the redesign.

VM.destroy

The master xapi makes a remove_rrd call to the local rrdd, which causes rrdd to delete the VM’s archived rrd from disk. This behaviour will remain unchanged.

VM.start(_on) and VM.resume(_on)

The master xapi makes a push_rrd call to the local rrdd, which causes rrdd to send any locally-archived rrd for the VM in question to the rrdd of the host on which the VM is starting. This behaviour will remain unchanged.

VM.shutdown and VM.suspend

Every update cycle rrdd compares its list of registered VMs to the list of domains actually running on the host. Any registered VMs which do not have a corresponding domain have their rrds archived to the rrdd running on the host believed to be the master. We will change this behaviour by stopping rrdd from doing the archiving itself; instead we will expose a new function in rrdd’s interface:

val archive_rrd : vm_uuid:string -> remote_address:string -> unit

This will cause rrdd to remove the specified rrd from its table of registered VMs, and archive the rrd to the specified host. When a VM has finished shutting down or suspending, the xapi process on the host on which the VM was running will call archive_rrd to ask the local rrdd to archive back to the master rrdd.
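The intended behaviour of archive_rrd can be sketched as follows. This is a minimal Python model and not rrdd's actual implementation; the RrddModel class, its registered table and the send callback are hypothetical names introduced only for illustration.

```python
from typing import Callable, Dict


class RrddModel:
    """Toy model of rrdd's table of registered VMs (hypothetical)."""

    def __init__(self, send: Callable[[str, str, bytes], None]) -> None:
        # vm_uuid -> in-memory rrd contents (opaque bytes in this sketch)
        self.registered: Dict[str, bytes] = {}
        # send(remote_address, vm_uuid, rrd) stands in for the real transfer
        self.send = send

    def archive_rrd(self, vm_uuid: str, remote_address: str) -> None:
        # Stop monitoring the VM, then archive its rrd to the given host.
        # rrdd no longer decides where to archive: the caller (xapi) does.
        rrd = self.registered.pop(vm_uuid, None)
        if rrd is not None:
            self.send(remote_address, vm_uuid, rrd)


sent = []
rrdd = RrddModel(send=lambda addr, uuid, rrd: sent.append((addr, uuid, rrd)))
rrdd.registered["vm-1"] = b"rrd-data"
rrdd.archive_rrd(vm_uuid="vm-1", remote_address="master.example")
```

In this model a VM that is not registered is simply ignored, which is one possible choice; the design above does not say how rrdd should react in that case.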

VM.reboot

Removing rrdd’s ability to automatically archive the rrds for disappeared domains will have the bonus effect of fixing how the rrds of rebooting VMs are handled, as we don’t want the rrds of rebooting VMs to be archived at all.

VM.checkpoint

This will be handled automatically, as internally VM.checkpoint carries out a VM.suspend followed by a VM.resume.

VM.pool_migrate and VM.migrate_send

The source host’s xapi makes a migrate_rrd call to the local rrdd, with a destination address and an optional session ID. The session ID is only required for cross-pool migration. The local rrdd sends the rrd for that VM to the destination host’s rrdd as an HTTP PUT. This behaviour will remain unchanged.

Design document
Revision: v1
Status: released (7.0)
Revision history:
  v1: Initial version

RRDD plugin protocol v2

Motivation

rrdd plugins currently report datasources via a shared-memory file, using the following format:

DATASOURCES
000001e4
dba4bf7a84b6d11d565d19ef91f7906e
{
  "timestamp": 1339685573,
  "data_sources": {
    "cpu-temp-cpu0": {
      "description": "Temperature of CPU 0",
      "type": "absolute",
      "units": "degC",
      "value": "64.33",
      "value_type": "float"
    },
    "cpu-temp-cpu1": {
      "description": "Temperature of CPU 1",
      "type": "absolute",
      "units": "degC",
      "value": "62.14",
      "value_type": "float"
    }
  }
}

This format contains four main components:

  • A constant header string

DATASOURCES

This should always be present.

  • The JSON data length, encoded as hexadecimal

000001e4

  • The md5sum of the JSON data

dba4bf7a84b6d11d565d19ef91f7906e

  • The JSON data itself, encoding the values and metadata associated with the reported datasources.
{
  "timestamp": 1339685573,
  "data_sources": {
    "cpu-temp-cpu0": {
      "description": "Temperature of CPU 0",
      "type": "absolute",
      "units": "degC",
      "value": "64.33",
      "value_type": "float"
    },
    "cpu-temp-cpu1": {
      "description": "Temperature of CPU 1",
      "type": "absolute",
      "units": "degC",
      "value": "62.14",
      "value_type": "float"
    }
  }
}
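As an illustration of the four components, the following Python sketch writes and checks a payload in this format. It assumes the components are separated by newlines, as they are displayed above, and that the length counts the bytes of the JSON text; encode_v1 and decode_v1 are hypothetical helper names, not part of rrdd.

```python
import hashlib
import json

HEADER = "DATASOURCES"


def encode_v1(payload: dict) -> str:
    body = json.dumps(payload)
    raw = body.encode("utf-8")
    length = "%08x" % len(raw)              # JSON length as 8 hex digits
    checksum = hashlib.md5(raw).hexdigest()  # md5sum of the JSON data
    # Assumption: the four components are newline-separated.
    return "\n".join([HEADER, length, checksum, body])


def decode_v1(text: str) -> dict:
    header, length, checksum, body = text.split("\n", 3)
    if header != HEADER:
        raise ValueError("invalid header")
    raw = body.encode("utf-8")
    if int(length, 16) != len(raw):
        raise ValueError("length mismatch")
    if hashlib.md5(raw).hexdigest() != checksum:
        raise ValueError("checksum mismatch")
    # The whole JSON document is parsed on every read.
    return json.loads(body)
```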

The disadvantage of this protocol is that rrdd has to parse the entire JSON structure each tick, even though most of the time only the values will change.

For this reason a new protocol is proposed.

Protocol V2

value                  bits               format  notes
header string          (string length)*8  string  “DATASOURCES” as in the V1 protocol
data checksum          32                 int32   binary-encoded crc32 of the concatenation of the encoded timestamp and datasource values
metadata checksum      32                 int32   binary-encoded crc32 of the metadata string (see below)
number of datasources  32                 int32   only needed if the metadata has changed - otherwise RRDD can use a cached value
timestamp              64                 int64   Unix epoch
datasource values      n * 64             int64   n is the number of datasources exported by the plugin
metadata length        32                 int32
metadata               (string length)*8  string

All integers are big-endian. The metadata will have the same JSON-based format as in the V1 protocol, minus the timestamp and value key-value pair for each datasource, for example:

{
  "datasources": {
    "memory_reclaimed": {
      "description":"Host memory reclaimed by squeezed",
      "owner":"host",
      "value_type":"int64",
      "type":"absolute",
      "default":"true",
      "units":"B",
      "min":"-inf",
      "max":"inf"
    },
    "memory_reclaimed_max": {
      "description":"Host memory that could be reclaimed by squeezed",
      "owner":"host",
      "value_type":"int64",
      "type":"absolute",
      "default":"true",
      "units":"B",
      "min":"-inf",
      "max":"inf"
    }
  }
}

The above formatting is not required, but added here for readability.
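A plugin-side writer for this layout could look roughly like the sketch below. It assumes that the header is the same DATASOURCES string shown for V1, that the checksums are stored as unsigned 32-bit values, and that all datasource values are integers; encode_v2 is a hypothetical name.

```python
import json
import struct
import zlib

HEADER = b"DATASOURCES"


def encode_v2(timestamp: int, values: list, metadata: dict) -> bytes:
    meta = json.dumps(metadata).encode("utf-8")
    # Timestamp followed by the values, each a big-endian 64-bit integer.
    data = struct.pack(">q", timestamp) + b"".join(
        struct.pack(">q", v) for v in values
    )
    return b"".join([
        HEADER,
        struct.pack(">I", zlib.crc32(data)),  # data checksum
        struct.pack(">I", zlib.crc32(meta)),  # metadata checksum
        struct.pack(">i", len(values)),       # number of datasources
        data,                                 # timestamp + values
        struct.pack(">i", len(meta)),         # metadata length
        meta,
    ])
```

Because the checksums and values sit at fixed offsets ahead of the metadata, a reader can decide whether anything changed without touching the JSON at all.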

Reading algorithm

if header != expected_header:
    raise InvalidHeader()
if data_checksum == last_data_checksum:
    raise NoUpdate()
if data_checksum != crc32(encoded_timestamp_and_values):
    raise InvalidChecksum()
if metadata_checksum == last_metadata_checksum:
    for datasource, value in zip(cached_datasources, values):
        update(datasource, value)
else:
    if metadata_checksum != crc32(metadata):
        raise InvalidChecksum()
    cached_datasources = create_datasources(metadata)
    for datasource, value in zip(cached_datasources, values):
        update(datasource, value)

This means that for a normal update, RRDD will only have to read the header plus the first (4 + 4 + 4 + 8 + 8*n) bytes of data, where n is the number of datasources exported by the plugin. If the metadata changes, RRDD will have to read all the data (and parse the metadata).

n.b. the timestamp reported by plugins is not currently used by RRDD - it uses its own global timestamp.
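The reading algorithm can be made concrete with a small Python reader. This is a sketch under stated assumptions rather than RRDD's implementation: the header is taken to be DATASOURCES, checksums are read as unsigned 32-bit values, values are decoded as integers, and values are assumed to appear in the same order as the keys of the metadata's datasources object. The Reader class and NoUpdate exception are hypothetical names.

```python
import json
import struct
import zlib

HEADER = b"DATASOURCES"


class NoUpdate(Exception):
    """The data checksum is unchanged since the previous read."""


class Reader:
    def __init__(self) -> None:
        self.last_data_checksum = None
        self.last_metadata_checksum = None
        self.cached_names = []

    def read(self, payload: bytes) -> dict:
        if not payload.startswith(HEADER):
            raise ValueError("invalid header")
        off = len(HEADER)
        data_crc, meta_crc, count = struct.unpack_from(">IIi", payload, off)
        off += 12
        if data_crc == self.last_data_checksum:
            raise NoUpdate()
        # Timestamp (8 bytes) followed by count values (8 bytes each).
        data = payload[off:off + 8 + 8 * count]
        if zlib.crc32(data) != data_crc:
            raise ValueError("invalid data checksum")
        values = struct.unpack(">%dq" % count, data[8:])
        off += len(data)
        if meta_crc != self.last_metadata_checksum:
            # Only now is the metadata read and parsed.
            (meta_len,) = struct.unpack_from(">i", payload, off)
            meta = payload[off + 4:off + 4 + meta_len]
            if zlib.crc32(meta) != meta_crc:
                raise ValueError("invalid metadata checksum")
            self.cached_names = list(json.loads(meta)["datasources"])
            self.last_metadata_checksum = meta_crc
        self.last_data_checksum = data_crc
        return dict(zip(self.cached_names, values))
```

On a normal update the reader never looks past the datasource values, so the metadata bytes do not need to be read or parsed.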

Design document
Revision: v11
Status: confirmed
Review: #139
Revision history:
  v1: Initial version
  v2: Added details about the VDI's binary format and size, and the SR capability name.
  v3: Tar was not needed after all!
  v4: Add details about discovering the VDI using a new vdi_type.
  v5: Add details about the http handlers and interaction with xapi's database
  v6: Add details about the framing of the data within the VDI
  v7: Redesign semantics of the rrd_updates handler
  v8: Redesign semantics of the rrd_updates handler (again)
  v9: Magic number change in framing format of vdi
  v10: Add details of new APIs added to xapi and xcp-rrdd
  v11: Remove unneeded API calls

SR-Level RRDs

Introduction

Xapi has RRDs to track VM- and host-level metrics. There is a desire to have SR-level RRDs as a new category, because SR stats are not specific to a certain VM or host. Examples are size and free space on the SR. While recording SR metrics is relatively straightforward within the current RRD system, the main question is where to archive them, which is what this design aims to address.

Stats Collection

All SR types, including the existing ones, should be able to have RRDs defined for them. Some RRDs, such as a “free space” one, may make sense for multiple (if not all) SR types. However, the way to measure something like free space will be SR-specific. Furthermore, it should be possible for each type of SR to have its own specialised RRDs.

It follows that each SR will need its own xcp-rrdd plugin, which runs on the SR master and defines and collects the stats. For the new thin-lvhd SR this could be xenvmd itself. The plugin registers itself with xcp-rrdd, so that the latter records the live stats from the plugin into RRDs.

Archiving

SR-level RRDs will be archived in the SR itself, in a VDI, rather than in the local filesystem of the SR master. This way, we don’t need to worry about master failover.

The VDI will be 4MB in size. This is a little more space than we would need for the RRDs we have in mind at the moment, but will give us enough headroom for the foreseeable future. It will not have a filesystem on it for simplicity and performance. There will only be one RRD archive file for each SR (possibly containing data for multiple metrics), which is gzipped by xcp-rrdd, and can be copied onto the VDI.

There will be a simple framing format for the data on the VDI. This will be as follows:

Offset  Type                      Name     Comment
0       32 bit network-order int  magic    Magic number = 0x7ada7ada
4       32 bit network-order int  version  1
8       32 bit network-order int  length   length of payload
12      gzipped data              data
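The framing can be illustrated with a short Python sketch. It assumes the three header fields are unsigned, and for self-containment it gzips the archive itself, whereas in this design the gzipping is done by xcp-rrdd; frame and unframe are hypothetical helper names.

```python
import gzip
import struct

MAGIC = 0x7ADA7ADA
VERSION = 1


def frame(rrd_archive: bytes) -> bytes:
    payload = gzip.compress(rrd_archive)
    # Three 32-bit network-order (big-endian) integers, then the payload.
    return struct.pack(">III", MAGIC, VERSION, len(payload)) + payload


def unframe(vdi_contents: bytes) -> bytes:
    magic, version, length = struct.unpack_from(">III", vdi_contents, 0)
    if magic != MAGIC:
        raise ValueError("bad magic number")
    if version != VERSION:
        raise ValueError("unsupported version")
    # The length field tells us where the payload ends; the rest of the
    # VDI is ignored.
    return gzip.decompress(vdi_contents[12:12 + length])
```

The explicit length field is what makes a filesystem unnecessary: the reader knows how many of the VDI's bytes are meaningful.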

Xapi will be in charge of the lifecycle of this VDI, not the plugin or xcp-rrdd, which will make it a little easier to manage them. Only xapi will attach/detach and read from/write to this VDI. We will keep xcp-rrdd as simple as possible, and have it archive to its standard path in the local file system. Xapi will then copy the RRDs in and out of the VDI.

A new value "rrd" in the vdi_type enum of the datamodel will be defined, and the VDI.type of the VDI will be set to that value. The storage backend will write the VDI type to the LVM metadata of the VDI, so that xapi can discover the VDI containing the SR-level RRDs when attaching an SR to a new pool. This means that SR-level RRDs are currently restricted to LVM SRs.

Because we will not write plugins for all SRs at once, and therefore do not need xapi to set up the VDI for all SRs, we will add an SR “capability” for the backends to be able to tell xapi whether it has the ability to record stats and will need storage for them. The capability name will be: SR_STATS.

Management of the SR-stats VDI

The SR-stats VDI will be attached/detached on PBD.plug/unplug on the SR master.

  • On PBD.plug on the SR master, if the SR has the stats capability, xapi:

    • Creates a stats VDI if not already there (search for an existing one based on the VDI type).
    • Attaches the stats VDI if it did already exist, and copies the RRDs to the local file system (standard location in the filesystem; asks xcp-rrdd where to put them).
    • Informs xcp-rrdd about the RRDs so that it will load the RRDs and add newly recorded data to them (needs a function like push_rrd_local for VM-level RRDs).
    • Detaches stats VDI.
  • On PBD.unplug on the SR master, if the SR has the stats capability, xapi:

    • Tells xcp-rrdd to archive the RRDs for the SR, which it will do to the local filesystem.
    • Attaches the stats VDI, copies the RRDs into it, detaches VDI.

Periodic Archiving

Xapi’s periodic scheduler regularly triggers xcp-rrdd to archive the host and VM RRDs. It will need to do this for the SR ones as well. Furthermore, xapi will need to attach the stats VDI and copy the RRD archives into it (as on PBD.unplug).

Exporting

There will be a new handler for downloading an SR RRD:

http://<server>/sr_rrd?session_id=<SESSION HANDLE>&uuid=<SR UUID>

RRD updates for the host, VMs and SRs are handled by a single handler at /rrd_updates. Exactly what is returned will be determined by the parameters passed to this handler.

Whether the host RRD updates are returned is governed by the presence of host=true in the parameters. host=<anything else> or the absence of the host key will mean the host RRD is not returned.

Whether the VM RRD updates are returned is governed by the vm_uuid key in the URL parameters. vm_uuid=all will return RRD updates for all VM RRDs. vm_uuid=xxx will return the RRD updates for the VM with uuid xxx only. If vm_uuid is none (or any other string which is not a valid VM UUID) then the handler will return no VM RRD updates. If the vm_uuid key is absent, RRD updates for all VMs will be returned.

Whether the SR RRD updates are returned is governed by the sr_uuid key in the URL parameters. sr_uuid=all will return RRD updates for all SR RRDs. sr_uuid=xxx will return the RRD updates for the SR with uuid xxx only. If sr_uuid is none (or any other string which is not a valid SR UUID) then the handler will return no SR RRD updates. If the sr_uuid key is absent, no SR RRD updates will be returned.

It will be possible to mix and match these parameters; for example to return RRD updates for the host and all VMs, the URL to use would be:

http://<server>/rrd_updates?session_id=<SESSION HANDLE>&start=10258122541&host=true&vm_uuid=all&sr_uuid=none

Or, to return RRD updates for all SRs but nothing else, the URL to use would be:

http://<server>/rrd_updates?session_id=<SESSION HANDLE>&start=10258122541&host=false&vm_uuid=none&sr_uuid=all

While behaviour is defined if any of the keys host, vm_uuid and sr_uuid is missing, this is for backwards compatibility and it is recommended that clients specify each parameter explicitly.
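The parameter rules above, including the defaults for missing keys, can be summarised in a small Python sketch. The function names rrd_updates_url and selection are hypothetical, and selection only models the defaults described here, not the handler's actual code.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def rrd_updates_url(server, session_id, start,
                    host=False, vm_uuid="none", sr_uuid="none"):
    """Build an /rrd_updates URL with every key given explicitly."""
    query = urlencode({
        "session_id": session_id,
        "start": start,
        "host": "true" if host else "false",
        "vm_uuid": vm_uuid,
        "sr_uuid": sr_uuid,
    })
    return "http://%s/rrd_updates?%s" % (server, query)


def selection(url):
    """Interpret the host, vm_uuid and sr_uuid keys as described above."""
    params = {k: v[0] for k, v in parse_qs(urlsplit(url).query).items()}
    return {
        # Only host=true selects the host RRD.
        "host": params.get("host") == "true",
        # Absent vm_uuid means all VMs (backwards compatibility).
        "vm_uuid": params.get("vm_uuid", "all"),
        # Absent sr_uuid means no SRs.
        "sr_uuid": params.get("sr_uuid", "none"),
    }
```

A value that is neither all nor a valid UUID is passed through unchanged by selection; per the rules above, the handler would then return no updates for that category.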

Database updating

If the SR is presenting a data source called ‘physical_utilisation’, xapi will record this periodically in its database. In order to do this, xapi will fork a thread that, every n minutes (2 suggested, but open to suggestions here), will query the attached SRs, then query RRDD for the latest data source for these, and update the database.

The utilisation of VDIs will not be updated in this way until scalability worries for RRDs are addressed.

Xapi will cache whether it is SR master for every attached SR and only attempt to update if it is the SR master.

New APIs

xcp-rrdd:

  • Get the filesystem location where sr rrds are archived: val sr_rrds_path : uid:string -> string

  • Archive the sr rrds to the filesystem: val archive_sr_rrd : sr_uuid:string -> unit

  • Load the sr rrds from the filesystem: val push_sr_rrd : sr_uuid:string -> unit