Tasks

Some operations performed by Xenopsd are blocking, for example:

  • suspend/resume/migration
  • attaching disks (where the SMAPI VDI.attach/activate calls can perform network I/O)

We want to be able to

  • present the user with an idea of progress (perhaps via a “progress bar”)
  • allow the user to cancel a blocked operation that is taking too long
  • associate logging with the user/client-initiated actions that spawned them

Principles

  • all operations which may block (the vast majority) should be written in an asynchronous style, i.e. the operations should immediately return a Task id
  • all operations should guarantee to respond to a cancellation request in a bounded amount of time (30s)
  • when cancelled, the system should always be left in a valid state
  • clients are responsible for destroying Tasks when they are finished with the results

Types

A task has a state, which may be Pending, Completed or Failed:

	type async_result = unit

	type completion_t = {
		duration : float;
		result : async_result option
	}

	type state =
		| Pending of float
		| Completed of completion_t
		| Failed of Rpc.t

When a task is Failed, we associate it with a marshalled exception (a value of type Rpc.t). This exception must be one from the set defined in the Xenops_interface. To see how they are marshalled, see Xenops_server.

From the point of view of a client, a Task has the following immutable type (which can be queried with Task.stat):

	type t = {
		id: id;
		dbg: string;
		ctime: float;
		state: state;
		subtasks: (string * state) list;
		debug_info: (string * string) list;
	}

where

  • id is a unique (integer) id generated by Xenopsd. This is how a Task is represented to clients
  • dbg is a client-provided debug key which will be used in log lines, allowing lines from the same Task to be associated together
  • ctime is the creation time
  • state is the current state (Pending/Completed/Failed)
  • subtasks lists logical internal sub-operations for debugging
  • debug_info includes miscellaneous key/value pairs used for debugging
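As a rough illustration of how a client is expected to use this type, the sketch below submits an operation, polls the Task state until it leaves Pending (treating the Pending float as a progress fraction), and finally destroys the Task. The functions submit_operation, stat and destroy are placeholders rather than the exact Xenops_interface signatures:

	(* Sketch of the client-side pattern: submit, poll, destroy.
	   [submit_operation], [stat] and [destroy] stand in for the real RPC
	   calls; [state] is the type defined above. *)
	let run_and_wait ~(submit_operation : dbg:string -> string)
			~(stat : string -> state)
			~(destroy : string -> unit)
			~(dbg : string) =
		let task_id = submit_operation ~dbg in
		let rec poll () =
			match stat task_id with
			| Pending progress ->
				Printf.printf "%s: %.0f%% complete\n" dbg (progress *. 100.);
				Unix.sleepf 1.0;
				poll ()
			| Completed { duration; _ } ->
				Printf.printf "%s: completed in %.1fs\n" dbg duration
			| Failed _ ->
				Printf.printf "%s: failed\n" dbg
		in
		poll ();
		(* The client created this Task, so the client must destroy it. *)
		destroy task_id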

Internally, Xenopsd uses a mutable record type to track Task state. This is broadly similar to the interface type except

  • the state is mutable: this allows Tasks to complete
  • the task contains a “do this now” thunk
  • there is a “cancelling” boolean which is toggled to request a cancellation
  • there is a list of cancel callbacks
  • there are some fields related to “cancel points”
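A rough sketch of what this mutable record could look like, reusing the state type above (field names here are illustrative; the real definition lives in Xenops_task):

	type handle = {
		id: string;
		dbg: string;
		ctime: float;
		mutable state: state;                           (* mutable so the Task can complete *)
		f: handle -> unit;                              (* the "do this now" thunk *)
		mutable cancelling: bool;                       (* toggled to request a cancellation *)
		mutable cancel_callbacks: (unit -> unit) list;  (* run when a cancel is requested *)
		mutable cancel_points_seen: int;                (* used by the "cancel points" testing described below *)
	}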

Persistence

The Tasks are intended to represent activities associated with in-memory queues and threads. Therefore the active Tasks are kept in memory in a map, and will be lost over a process restart. This is desirable since we will also lose the queued items and the threads, so there is no need to resync on start.

Note that every operation must ensure that the state of the system is recoverable on restart by not leaving it in an invalid state. It is not necessary to either guarantee to complete or roll-back a Task. Tasks are not expected to be transactional.

Lifecycle of a Task

All Tasks returned by API functions are created as part of the enqueue functions: queue_operation_*. Even operations which are performed internally are normally wrapped in Tasks by the function immediate_operation.

A queued operation will be processed by one of the queue worker threads. It will

  • set the thread-local debug key to the Task.dbg
  • call task.Xenops_task.run, taking care to catch exceptions and update the task.Xenops_task.state
  • unset the thread-local debug key
  • generate an event on the Task to provoke clients to query the current state.
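A simplified sketch of this worker loop, building on the mutable record sketched above; set_thread_debug_key, send_task_event and marshal_exception are stand-ins for the real Xenopsd machinery:

	(* Stubs for helpers assumed by this sketch. *)
	let set_thread_debug_key (_ : string) = ()
	let unset_thread_debug_key () = ()
	let send_task_event (_ : string) = ()
	let marshal_exception e = Rpc.String (Printexc.to_string e)  (* simplified marshalling *)

	(* Process one queued Task: associate the debug key, run the thunk,
	   record the outcome, then signal clients. *)
	let process_task (task : handle) =
		set_thread_debug_key task.dbg;
		let start = Unix.gettimeofday () in
		(try
			task.f task;
			task.state <- Completed { duration = Unix.gettimeofday () -. start; result = None }
		with e ->
			task.state <- Failed (marshal_exception e));
		unset_thread_debug_key ();
		send_task_event task.id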

Task implementations must update their progress as they work. For the common case of a compound operation like VM_start, which is decomposed into multiple “micro-ops” (e.g. VM_create, VM_build), there is a useful helper function perform_atomics which divides the progress ‘bar’ into sections, where each “micro-op” can have a different size (weight). A progress callback function is passed into each Xenopsd backend function so that progress can be updated at fine granularity; for example, note the arguments to B.VM.save.
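The following sketch shows the idea behind the weighted sections; perform_atomics does something along these lines, but this code is illustrative rather than a copy of it:

	(* Each micro-op has a weight; as micro-ops complete, the Task's
	   Pending progress advances in proportion to the weight done. *)
	let run_weighted_ops (task : handle) (ops : (float * (handle -> unit)) list) =
		let total = List.fold_left (fun acc (weight, _) -> acc +. weight) 0. ops in
		let finished = ref 0. in
		List.iter
			(fun (weight, op) ->
				op task;
				finished := !finished +. weight;
				task.state <- Pending (!finished /. total))
			ops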

Clients are expected to destroy Tasks they are responsible for creating. Xenopsd cannot do this on their behalf because it does not know if they have successfully queried the Task status/result.

When Xenopsd is a client of itself, it will take care to destroy the Task properly, for example see immediate_operation.

Cancellation

The goal of cancellation is to unstick a blocked operation and to return the system to some valid state, not any valid state in particular. Xenopsd does not treat operations as transactions; when an operation is cancelled it may

  • fully complete (e.g. if it was about to do this anyway)
  • fully abort (e.g. if it had made no progress)
  • enter some other valid state (e.g. if it had gotten half way through)

Xenopsd will never leave the system in an invalid state after cancellation.

Every Xenopsd operation should unblock and return the system to a valid state within a reasonable amount of time after a cancel request. This should be as quick as possible, but up to 30s may be acceptable. Bear in mind that a human is probably impatiently watching a UI that says “please wait” and has no notion of progress itself. Keep it quick!

Cancellation is triggered by TASK.cancel which calls cancel. This

  • sets the cancelling boolean
  • calls all registered cancel callbacks
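In sketch form, using the mutable record from the Types section:

	let cancel (task : handle) =
		task.cancelling <- true;
		List.iter (fun callback -> callback ()) task.cancel_callbacks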

Implementations respond to cancellation either by periodically checking the cancelling boolean at safe points and raising a Cancelled exception if it is set, or, before making a blocking call, by registering a cancel callback which can unstick that call.
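Both patterns can be sketched as follows. check_cancelling is discussed under “Testing with cancel points” below; with_cancel_callback is an illustrative name for the register-then-block helper, not necessarily what the real code calls it:

	exception Cancelled of string

	(* Raise Cancelled if a cancel has been requested. This also acts as a
	   "cancel point" for the testing infrastructure described below. *)
	let check_cancelling (task : handle) =
		task.cancel_points_seen <- task.cancel_points_seen + 1;
		if task.cancelling then raise (Cancelled task.id)

	(* Register a callback that can unstick [f] if a cancel arrives while it
	   is blocked, and remove the callback again afterwards. *)
	let with_cancel_callback (task : handle) (callback : unit -> unit) (f : unit -> 'a) : 'a =
		task.cancel_callbacks <- callback :: task.cancel_callbacks;
		if task.cancelling then callback ();  (* don't miss a cancel that already happened *)
		let remove () =
			task.cancel_callbacks <- List.filter (fun cb -> cb != callback) task.cancel_callbacks
		in
		match f () with
		| result -> remove (); result
		| exception e -> remove (); raise e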

Xenopsd’s libxc backend can block in 2 different ways, and therefore has 2 different types of cancel callback:

  1. cancellable Xenstore watches
  2. cancellable subprocesses

Xenstore watches are used for device hotplug and unplug. Xenopsd has to wait for the backend or for a udev script to do something. If that blocks then we need a way to cancel the watch. The easiest way to cancel a watch is to watch an additional path (a “cancel path”) and delete it, see cancellable_watch. The “cancel paths” are placed within the VM’s Xenstore directory to ensure that cleanup code which does xenstore-rm will automatically “cancel” all outstanding watches. Note that we trigger a cancel by deleting the path rather than creating it, to avoid racing with that cleanup’s delete and leaving orphaned Xenstore entries.
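The shape of such a cancellable watch is roughly as follows, reusing with_cancel_callback from the sketch above; the Xenstore accessors here are placeholders, not the real client calls:

	let cancellable_watch (task : handle) ~(path : string) ~(cancel_path : string) =
		(* Placeholders for the real Xenstore client calls. *)
		let xenstore_rm (_path : string) = () in
		let wait_for_watch (paths : string list) : string = List.hd paths in
		with_cancel_callback task
			(fun () -> xenstore_rm cancel_path)  (* deleting the cancel path fires the watch *)
			(fun () ->
				let fired = wait_for_watch [ path; cancel_path ] in
				if fired = cancel_path then raise (Cancelled task.id))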

Subprocesses are used for suspend/resume/migrate. Xenopsd hands file descriptors to libxenguest by running a subprocess and passing the fds to it. Xenopsd therefore gets the process id and can send it a signal to cancel it. See Cancellable_subprocess.run.
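The subprocess case can be sketched with the standard Unix module; the real Cancellable_subprocess.run also handles fd passing and output collection, which is omitted here:

	(* Run a subprocess, registering a cancel callback that signals it. *)
	let cancellable_subprocess (task : handle) (cmd : string) (args : string array) =
		let pid = Unix.create_process cmd args Unix.stdin Unix.stdout Unix.stderr in
		with_cancel_callback task
			(fun () -> try Unix.kill pid Sys.sigkill with Unix.Unix_error _ -> ())
			(fun () ->
				match Unix.waitpid [] pid with
				| _, Unix.WEXITED 0 -> ()
				| _, _ ->
					if task.cancelling then raise (Cancelled task.id)
					else failwith "subprocess failed")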

Testing with cancel points

Cancellation is difficult to test, as it is completely asynchronous. Therefore Xenopsd has some built-in cancellation testing infrastructure known as “cancel points”. A “cancel point” is a point in the code where a Cancelled exception could be thrown, either by checking the cancelling boolean or as a side-effect of a cancel callback. The check_cancelling function increments a counter every time it passes one of these points, and this value is returned to clients in the Task.debug_info.

A test harness runs a series of operations. Each operation is first run all the way through to completion to discover the total number of cancel points. The operation is then re-run with a request to cancel at a particular point. The test then waits for the system to stabilise and verifies that it appears to be in a valid state.
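In outline, such a harness loop looks something like this; run, run_with_cancel_at, wait_for_stable and assert_valid_state are illustrative names for hooks the real test suite provides:

	(* Discover how many cancel points an operation passes, then re-run it
	   once per point, cancelling at that point and checking that the
	   system settles into a valid state each time. *)
	let test_cancellation
			~(run : unit -> int)                  (* returns the number of cancel points seen *)
			~(run_with_cancel_at : int -> unit)
			~(wait_for_stable : unit -> unit)
			~(assert_valid_state : unit -> unit) =
		let total_points = run () in
		for point = 1 to total_points do
			run_with_cancel_at point;
			wait_for_stable ();
			assert_valid_state ()
		done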

Preventing Tasks leaking

The client which creates a Task must destroy it once the Task has finished and the result has been processed. What if a client like xapi is restarted while a Task is running?

We assume that, if xapi is talking to a xenopsd, then xapi completely owns it. Therefore xapi should destroy any completed tasks that it doesn’t recognise.
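A sketch of that cleanup, assuming a way to list finished Task ids and a record of the ids xapi knows it created (both names are illustrative):

	(* Destroy any finished Task that this client does not recognise as
	   one of its own. *)
	let cleanup_unrecognised_tasks
			~(list_finished_task_ids : unit -> string list)
			~(destroy : string -> unit)
			~(known_ids : string list) =
		List.iter
			(fun id -> if not (List.mem id known_ids) then destroy id)
			(list_finished_task_ids ())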

If a user wishes to manage VMs with xenopsd in parallel with xapi, the user should run a separate xenopsd.