Design document | |
---|---|
Revision | v1 |
Status | released (6.0) |
Review | create new issue |
This document contains the software design for GPU pass-through. This code was originally included in the version of Xapi used in XenServer 6.0.
Rather than modelling GPU pass-through from a PCI perspective, and having the user manipulate PCI devices directly, we are taking a higher-level view by introducing a dedicated graphics model. The graphics model is similar to the networking and storage model, in which virtual and physical devices are linked through an intermediate abstraction layer (e.g. the “Network” class in the networking model).
The basic graphics model is as follows:

- A host owns a number of physical GPU devices (pGPUs), each of which can be passed through to a VM.
- A VM may have a virtual GPU device (vGPU), which means that it wants a pGPU passed through to it when it starts.
- Identical pGPUs are grouped across a resource pool into GPU groups. A vGPU is associated with a GPU group rather than with an individual pGPU.
- When the VM is started, an available pGPU from the vGPU's GPU group is passed through to the VM.

Currently, the following restrictions apply:

- Only one vGPU per VM is supported.
- Suspend/resume, checkpointing, and live migration are not supported for VMs with GPUs.
The design introduces a new generic class called PCI to capture state and information about relevant PCI devices in a host. By default, xapi would not create PCI objects for all PCI devices, but only for the ones that are managed and configured by xapi; currently only GPU devices.
The PCI class has no fields specific to the type of the PCI device (e.g. a graphics card or NIC). Instead, device specific objects will contain a link to their underlying PCI device’s object.
The new XenAPI classes and changes to existing classes are detailed below.
Fields:
Name | Type | Description |
---|---|---|
uuid | string | Unique identifier/object reference. |
class_id | string | PCI class ID (hidden field). |
class_name | string | PCI class name (GPU, NIC, …). |
vendor_id | string | Vendor ID (hidden field). |
vendor_name | string | Vendor name. |
device_id | string | Device ID (hidden field). |
device_name | string | Device name. |
host | host ref | The host that owns the PCI device. |
pci_id | string | BDF (Domain:Bus:Device.Function) identifier of the (physical) PCI function, e.g. “0000:00:1a.1”. The format is hhhh:hh:hh.h, where h is a hexadecimal digit. |
functions | int | Number of (physical + virtual) functions; currently fixed at 1 (hidden field). |
attached_VMs | VM ref set | List of VMs to which this PCI device is “currently attached”, i.e. plugged/passed through (hidden field). |
dependencies | PCI ref set | List of dependent PCI devices: all of these need to be passed through to the same VM (co-location). |
other_config | (string -> string) map | Additional optional configuration (as usual). |
Hidden fields are only for use by xapi internally, and not visible to XenAPI users.
Messages: none.
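For illustration, a client could list the PCI objects as in the following sketch. It assumes the XenAPI Python bindings and the usual auto-generated get_all/get_record accessors for the new class; the host URL and credentials are placeholders, and only the non-hidden fields are expected to be visible to API clients.

```python
import XenAPI

# Connect to the pool master; URL and credentials are placeholders.
session = XenAPI.Session("https://xenserver-host")
session.xenapi.login_with_password("root", "password")
try:
    for pci_ref in session.xenapi.PCI.get_all():
        rec = session.xenapi.PCI.get_record(pci_ref)
        # Only the non-hidden fields (e.g. pci_id, class_name, vendor_name,
        # device_name, host, dependencies) are expected to appear here.
        print("%s  %s  %s %s" % (rec["pci_id"], rec["class_name"],
                                 rec["vendor_name"], rec["device_name"]))
finally:
    session.xenapi.session.logout()
```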
The PGPU class represents a physical GPU device (pGPU).
Fields:
Name | Type | Description |
---|---|---|
uuid | string | Unique identifier/object reference. |
PCI | PCI ref | Link to the underlying PCI device. |
other_config | (string -> string) map | Additional optional configuration (as usual). |
host | host ref | The host that owns the GPU. |
GPU_group | GPU_group ref | GPU group the pGPU is contained in. Can be Null. |
Messages: none.
The GPU_group class represents a group of identical GPUs across hosts. A VM that is associated with a GPU group can use any of the GPUs in the group, and does not need new GPU drivers when moving from one GPU to another within the same group.
Fields:
Name | Type | Description |
---|---|---|
VGPUs | VGPU ref set | List of vGPUs in the group. |
uuid | string | Unique identifier/object reference. |
PGPUs | PGPU ref set | List of pGPUs in the group. |
other_config | (string -> string) map | Additional optional configuration (as usual). |
name_label | string | A human-readable name. |
name_description | string | A notes field containing human-readable description. |
GPU_types | string set | List of GPU types (vendor+device ID) that can be in this group (hidden field). |
Messages: none.
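For example, a client could walk from each GPU group to its pGPUs and their underlying PCI devices. This is an illustrative sketch using the XenAPI Python bindings and the auto-generated field accessors; the connection details are placeholders.

```python
import XenAPI

session = XenAPI.Session("https://xenserver-host")  # placeholder URL
session.xenapi.login_with_password("root", "password")
try:
    for group_ref in session.xenapi.GPU_group.get_all():
        group = session.xenapi.GPU_group.get_record(group_ref)
        print("GPU group: %s" % group["name_label"])
        for pgpu_ref in group["PGPUs"]:
            # Each pGPU links to its underlying PCI device.
            pci_ref = session.xenapi.PGPU.get_PCI(pgpu_ref)
            pci_id = session.xenapi.PCI.get_pci_id(pci_ref)
            host_ref = session.xenapi.PGPU.get_host(pgpu_ref)
            host_name = session.xenapi.host.get_name_label(host_ref)
            print("  pGPU %s on host %s" % (pci_id, host_name))
        print("  vGPUs assigned to this group: %d" % len(group["VGPUs"]))
finally:
    session.xenapi.session.logout()
```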
The VGPU class represents a virtual GPU device (vGPU).
Fields:
Name | Type | Description |
---|---|---|
uuid | string | Unique identifier/object reference. |
VM | VM ref | VM that owns the vGPU. |
GPU_group | GPU_group ref | GPU group the vGPU is contained in. |
currently_attached | bool | Reflects whether the virtual device is currently “connected” to a physical device. |
device | string | Order in which the devices are plugged into the VM. Restricted to “0” for now. |
other_config | (string -> string) map | Additional optional configuration (as usual). |
Messages:
Prototype | Description |
---|---|
VGPU ref create (GPU_group ref, string, VM ref) | Manually assign the vGPU device to the VM, given a device number, and link it to the given GPU group. |
void destroy (VGPU ref) | Remove the association between the GPU group and the VM. |
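A client would use these messages roughly as in the sketch below (XenAPI Python bindings assumed). The argument order follows the prototype above (GPU group, device number, VM); the bindings generated for an actual release may order or name the parameters differently, and the VM and group lookups are placeholders.

```python
import XenAPI

session = XenAPI.Session("https://xenserver-host")  # placeholder URL
session.xenapi.login_with_password("root", "password")
try:
    vm_ref = session.xenapi.VM.get_by_name_label("my-vm")[0]  # placeholder VM
    group_ref = session.xenapi.GPU_group.get_all()[0]         # pick some GPU group

    # Create a vGPU for the VM; "0" is the only device number allowed for now.
    vgpu_ref = session.xenapi.VGPU.create(group_ref, "0", vm_ref)

    # ... later, remove the association again.
    session.xenapi.VGPU.destroy(vgpu_ref)
finally:
    session.xenapi.session.logout()
```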
It is possible to assign more vGPUs to a group than the number of pGPUs in the group. When a VM is started, a pGPU must be available; if not, the VM will not start. Therefore, to guarantee that a VM has access to a pGPU at any time, one must manually ensure that the number of vGPUs in a GPU group does not exceed the number of pGPUs. XenCenter might display a warning, or simply refuse to assign a vGPU, if this constraint is violated. This is analogous to the handling of memory availability in a pool: a VM may not be able to start if no host has enough free memory.
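Such a check could be performed on the client side, for example as in the following sketch (XenAPI Python bindings assumed; the session is created elsewhere):

```python
def gpu_group_overcommitted(session, group_ref):
    """Return True if more vGPUs are assigned to the group than it has pGPUs.

    Illustrative client-side check only; it relies on the auto-generated
    GPU_group field accessors.
    """
    num_vgpus = len(session.xenapi.GPU_group.get_VGPUs(group_ref))
    num_pgpus = len(session.xenapi.GPU_group.get_PGPUs(group_ref))
    return num_vgpus > num_pgpus
```

XenCenter could then refuse the assignment, or display a warning, whenever creating another vGPU would make this function return true.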
Changes to the existing VM class. Fields:

- Deprecate the existing PCI_bus field.
- VGPU ref set VGPUs: List of vGPUs.
- PCI ref set attached_PCIs: List of PCI devices that are “currently attached” (plugged, passed-through) (hidden field).

Changes to the existing host class. Fields:

- PCI ref set PCIs: List of PCI devices.
- PGPU ref set PGPUs: List of physical GPU devices.
- (string -> string) map chipset_info, which contains at least the key iommu. The value of this key is "true" if the host has IOMMU/VT-d support built in and this functionality is enabled by Xen, and "false" otherwise.

IOMMU/VT-d support must be enabled in Xen by adding iommu=1 to the Xen command line. (This may not be needed in Xen 4.1. Confirm with Simon.)
Provide a command that does this:
/opt/xensource/libexec/xen-cmdline --set-xen iommu=1
Definitions:

- A PCI device is identified by its pci_id, vendor_id, and device_id.

First boot and any subsequent xapi start:
- Determine whether IOMMU/VT-d support is present and enabled in Xen, and set host.chipset_info:iommu accordingly.
- Set the dependencies field on all PCI objects.
- Sync VGPU.currently_attached on all VGPU objects.
- For any VMs that have VM.other_config:pci set to use a GPU, create an appropriate vGPU, and remove the other_config option.
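The outcome of these steps can be inspected through the API. For example, the following sketch (XenAPI Python bindings assumed; connection details are placeholders) reports the IOMMU status of each host:

```python
import XenAPI

session = XenAPI.Session("https://xenserver-host")  # placeholder URL
session.xenapi.login_with_password("root", "password")
try:
    for host_ref in session.xenapi.host.get_all():
        info = session.xenapi.host.get_chipset_info(host_ref)
        name = session.xenapi.host.get_name_label(host_ref)
        # The iommu key is "true" only if IOMMU/VT-d is present and enabled in Xen.
        print("%s: iommu=%s" % (name, info.get("iommu", "false")))
finally:
    session.xenapi.session.logout()
```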
A generic PCI interface is exposed to higher-level code, such as the networking and GPU management modules within Xapi. This functionality relies on Xenops.
The PCI module exposes the following functions:
- Check whether a PCI device is free, i.e. whether the number of VMs in PCI.attached_VMs is smaller than PCI.functions.
- Plug a PCI device into a running VM:
  - Set the currently_attached field on dependent VGPU objects etc.
  - Add the VM to PCI.attached_VMs.
- Unplug a PCI device from a running VM:
  - Unset the currently_attached field on dependent VGPU objects etc.
  - Remove the VM from PCI.attached_VMs.
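The intended semantics are summarised by the following illustrative sketch. It is plain Python pseudocode for the logic only; the actual module is internal to xapi, and the dictionary field names simply mirror the PCI and VGPU fields above.

```python
def pci_is_free(pci_record):
    # A PCI device can be passed through to another VM as long as fewer VMs
    # are attached than the device has functions (currently always 1).
    return len(pci_record["attached_VMs"]) < pci_record["functions"]

def plug(pci_record, vm_ref, dependent_vgpus):
    # Pass the device through to the VM and update the bookkeeping fields.
    assert pci_is_free(pci_record)
    pci_record["attached_VMs"].append(vm_ref)
    for vgpu in dependent_vgpus:
        vgpu["currently_attached"] = True

def unplug(pci_record, vm_ref, dependent_vgpus):
    # Reverse of plug: detach the device and clear the bookkeeping fields.
    pci_record["attached_VMs"].remove(vm_ref)
    for vgpu in dependent_vgpus:
        vgpu["currently_attached"] = False
```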
The behaviour of the relevant XenAPI calls with respect to the new classes is as follows.

VGPU.create:

- Create a VGPU object in the DB.
- Initialise VGPU.currently_attached = false.

VGPU.destroy:

- Fail if VGPU.currently_attached = true and the VM is running.
- Otherwise, destroy the VGPU object.

VM.start(_on):

- If the VM has a vGPU and host.chipset_info:iommu = "false", raise VM_REQUIRES_IOMMU (a client-side handling example is sketched below).
- Set VGPU.currently_attached to true. As a side-effect, any dependent PCI devices would be plugged.

VM.shutdown:

- Set VGPU.currently_attached to false for all the VM’s VGPUs.

VM.suspend, VM.resume(_on):

- Fail for VMs that have VGPU objects, as suspend/resume for VMs with GPUs is currently not supported.

VM.pool_migrate:

- Fail for VMs that have VGPU objects, as live migration for VMs with GPUs is currently not supported.

VM.clone, VM.copy, VM.snapshot:

- Copy VGPU objects along with the VM.

VM.import, VM.export:

- Include VGPU and GPU_group objects in the VM export format.

VM.checkpoint:

- Fail for VMs that have VGPU objects, as checkpointing for VMs with GPUs is currently not supported.

Pool join:

1. For each PGPU of the joining host: add it to a GPU_group of identical PGPUs, or a new one.
2. Each VGPU moves to the pool together with the VM that owns it, and is added to the GPU group containing the same PGPU as before the join.

Step 1 is done automatically by the xapi startup code, and step 2 is handled by the VM export/import code. Hence, no work needed.

Pool eject:

- VGPU objects will be automatically GC’ed when the VMs are removed.
- Likewise for the ejected host’s PGPU and GPU_group objects.

Hence, no work needed.
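For example, a client starting a VM that has a vGPU may want to handle the VM_REQUIRES_IOMMU error explicitly. The sketch below assumes the XenAPI Python bindings and an existing session and VM reference.

```python
import XenAPI

def start_vm_with_gpu(session, vm_ref):
    try:
        # VM.start(vm, start_paused, force)
        session.xenapi.VM.start(vm_ref, False, False)
    except XenAPI.Failure as failure:
        if failure.details and failure.details[0] == "VM_REQUIRES_IOMMU":
            # The chosen host has no (enabled) IOMMU/VT-d support;
            # see host.chipset_info:iommu above.
            print("Cannot start VM: the host does not have IOMMU enabled.")
        else:
            raise
```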
Xapi needs a way to obtain a list of all PCI devices present on a host. For each device, xapi needs to know:
- The PCI ID (BDF).
- The PCI class ID and name.
- The vendor ID and name (the name as listed in /usr/share/hwdata/pci.ids).
- The device ID and name (the name as listed in /usr/share/hwdata/pci.ids).
- Dependencies on other PCI devices.
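One possible way to obtain this information on a Linux host is sketched below: the numeric IDs are read from sysfs, and the human-readable names would then be looked up in /usr/share/hwdata/pci.ids. This is only an illustration, not necessarily the mechanism used by xapi.

```python
import os

PCI_SYSFS = "/sys/bus/pci/devices"

def list_pci_devices():
    """Yield (pci_id, vendor_id, device_id, class_id) for each PCI device."""
    for pci_id in sorted(os.listdir(PCI_SYSFS)):        # e.g. "0000:00:1a.1"
        def read(attr):
            with open(os.path.join(PCI_SYSFS, pci_id, attr)) as f:
                return f.read().strip()                 # e.g. "0x10de"
        yield pci_id, read("vendor"), read("device"), read("class")

# The vendor and device names would then be resolved against the numeric IDs
# using the PCI ID database in /usr/share/hwdata/pci.ids.
```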