Linux – KVM graphics card (or other device) pass-through

The CPU used here is an i9-9900K, the graphics card is an RTX 2080 Ti, and both the host and the virtual machine run Debian GNU/Linux 12.

First, make sure the motherboard supports VT-d and enable it in the BIOS.

Enable IOMMU

IOMMU is the common name for Intel VT-d and AMD-Vi.

Enabling IOMMU on the host is as simple as adding one extra boot parameter to the Linux kernel.

Just edit the GRUB_CMDLINE_LINUX_DEFAULT entry in /etc/default/grub directly.

GRUB_CMDLINE_LINUX_DEFAULT="quiet"

Add the IOMMU parameter at the end.

GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on"

Then update GRUB to make the configuration take effect.

$ sudo update-grub

Then reboot the computer. If you see output similar to the following, IOMMU has been enabled successfully.

$ sudo dmesg | grep -i iommu
...
[    0.018093] DMAR: IOMMU enabled
...
[    0.467648] pci 0000:00:00.0: Adding to iommu group 0
[    0.467660] pci 0000:00:01.0: Adding to iommu group 1
[    0.467674] pci 0000:00:14.0: Adding to iommu group 2
[    0.467682] pci 0000:00:14.2: Adding to iommu group 2
[    0.467690] pci 0000:00:14.3: Adding to iommu group 3
[    0.467701] pci 0000:00:16.0: Adding to iommu group 4
[    0.467716] pci 0000:00:1b.0: Adding to iommu group 5
[    0.467726] pci 0000:00:1b.4: Adding to iommu group 6
...
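Besides dmesg, you can also confirm that the parameter actually reached the running kernel. A quick check (the parameter name matches the one added to GRUB above):

```shell
# Check whether intel_iommu=on is present on the running kernel's command line.
if grep -qw 'intel_iommu=on' /proc/cmdline; then
    echo "intel_iommu=on is active"
else
    echo "intel_iommu=on is NOT active"
fi
```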

View IOMMU Groups

An IOMMU Group is a grouping determined by the hardware implementation.

An IOMMU Group is the smallest unit in which physical devices can be passed to a virtual machine. If several devices end up in the same IOMMU Group, all of them must be passed through together; otherwise IO bus conflicts may occur and can even crash the host.

Next we can query the IOMMU Group assignments from the command line. Only the devices we care about are kept below.

$ sudo dmesg | grep -i iommu
...
[    0.459200] pci 0000:00:01.0: Adding to iommu group 1
...
[    0.459359] pci 0000:01:00.0: Adding to iommu group 1
[    0.459363] pci 0000:01:00.1: Adding to iommu group 1
[    0.459367] pci 0000:01:00.2: Adding to iommu group 1
[    0.459371] pci 0000:01:00.3: Adding to iommu group 1
...
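The same grouping can also be read from sysfs at any time, without digging through the boot log. A minimal POSIX-sh sketch (list_iommu_groups is our own helper name; pipe the printed addresses through lspci -nns for readable device names):

```shell
# Walk /sys/kernel/iommu_groups and print each group with its devices.
list_iommu_groups() {
    base=${1:-/sys/kernel/iommu_groups}
    for g in "$base"/*; do
        [ -d "$g" ] || continue
        echo "IOMMU group ${g##*/}:"
        for d in "$g"/devices/*; do
            [ -e "$d" ] || continue
            echo "  ${d##*/}"   # PCI address, e.g. 0000:01:00.0
        done
    done
}
list_iommu_groups
```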

Then enumerate PCI devices with the following command.

$ lspci
...
00:01.0 PCI bridge: Intel Corporation 6th-10th Gen Core Processor PCIe Controller (x16) (rev 0a)
...
01:00.0 VGA compatible controller: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti] (rev a1)
01:00.1 Audio device: NVIDIA Corporation TU102 High Definition Audio Controller (rev a1)
01:00.2 USB controller: NVIDIA Corporation TU102 USB 3.1 Host Controller (rev a1)
01:00.3 Serial bus controller: NVIDIA Corporation TU102 USB Type-C UCSI Controller (rev a1)
...

Here 00:01.0 is the PCI root port, i.e. the PCIe controller provided by the CPU, and 01:00.0 ~ 01:00.3 are the graphics-card functions we want to pass through. In particular, the 2080 Ti also exposes a USB controller, a sound device, and so on, and we must pass all of them to the virtual machine.
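To check a single device's group directly, the iommu_group symlink in sysfs works too (group_of is a hypothetical helper name; the address is this machine's card, and the base-path parameter exists only to make the function testable):

```shell
# Resolve a PCI device's iommu_group symlink to its group number.
group_of() {
    dev=$1
    base=${2:-/sys/bus/pci/devices}
    basename "$(readlink -f "$base/$dev/iommu_group")"
}
group_of 0000:01:00.0
```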

Reserve the PCI Devices

We use the vfio-pci approach here. On an older system, pci-stub may be more convenient.

If the device to be used by the guest is plugged into a CPU-provided PCIe slot, as with the 2080 Ti above, the PCI root port 00:01.0 ends up in the same IOMMU Group. Do NOT bind that root port to vfio-pci.

Linux's driver model is that, at initialization, each driver claims whatever matching hardware has not yet been bound.

So we need vfio-pci to take over the devices before any other driver gets to them.

The first step is to check which driver currently owns each device, and note the device IDs.

$ lspci -nnv
...
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti] [10de:1e04] (rev a1) (prog-if 00 [VGA controller])
    Subsystem: GALAX TU102 [GeForce RTX 2080 Ti] [1b4c:12ae]
    Flags: bus master, fast devsel, latency 0, IRQ 187, IOMMU group 1
    Memory at a4000000 (32-bit, non-prefetchable) [size=16M]
    Memory at 90000000 (64-bit, prefetchable) [size=256M]
    Memory at a2000000 (64-bit, prefetchable) [size=32M]
    I/O ports at 4000 [size=128]
    Expansion ROM at 000c0000 [disabled] [size=128K]
    Capabilities: <access denied>
    Kernel driver in use: nouveau
    Kernel modules: nouveau

01:00.1 Audio device [0403]: NVIDIA Corporation TU102 High Definition Audio Controller [10de:10f7] (rev a1)
    Subsystem: GALAX TU102 High Definition Audio Controller [1b4c:12ae]
    Flags: bus master, fast devsel, latency 0, IRQ 17, IOMMU group 1
    Memory at a5080000 (32-bit, non-prefetchable) [size=16K]
    Capabilities: <access denied>
    Kernel driver in use: snd_hda_intel
    Kernel modules: snd_hda_intel

01:00.2 USB controller [0c03]: NVIDIA Corporation TU102 USB 3.1 Host Controller [10de:1ad6] (rev a1) (prog-if 30 [XHCI])
    Subsystem: GALAX TU102 USB 3.1 Host Controller [1b4c:12ae]
    Flags: fast devsel, IRQ 147, IOMMU group 1
    Memory at a0000000 (64-bit, prefetchable) [size=256K]
    Memory at a0040000 (64-bit, prefetchable) [size=64K]
    Capabilities: <access denied>
    Kernel driver in use: xhci_hcd
    Kernel modules: xhci_pci

01:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU102 USB Type-C UCSI Controller [10de:1ad7] (rev a1)
    Subsystem: GALAX TU102 USB Type-C UCSI Controller [1b4c:12ae]
    Flags: bus master, fast devsel, latency 0, IRQ 11, IOMMU group 1
    Memory at a5084000 (32-bit, non-prefetchable) [size=4K]
    Capabilities: <access denied>
...

The values in square brackets after each device name, such as [10de:1e04], are the vendor and device IDs.

The Kernel driver in use: line under each device shows the driver that currently owns it.

So our graphics card's functions are currently claimed by nouveau, snd_hda_intel, and xhci_hcd.

So, to let vfio-pci take over the devices, we first need to load the relevant modules.

Simply add the corresponding module names to /etc/initramfs-tools/modules.

# List of modules that you want to include in your initramfs.
# They will be loaded at boot time in the order below.
...
vfio_pci
vfio
vfio_iommu_type1

Then add a vfio-pci configuration file, vfio.conf, under /etc/modprobe.d.

# Bind GPU device to vfio-pci for virtual machine.
options vfio-pci ids=10de:1e04,10de:10f7,10de:1ad6,10de:1ad7 disable_vga=1

# Ensure that vfio-pci is initialized before GPU driver.
softdep nouveau       pre: vfio-pci
softdep snd_hda_intel pre: vfio-pci
softdep xhci_pci      pre: vfio-pci

The ids option lists the device IDs we want vfio-pci to claim; they can be obtained with the command shown earlier.

disable_vga forbids using this card as a VGA device on the host. You can check whether a card is currently the host's boot VGA device with the following commands, where 0 means no and 1 means yes.

$ cat /sys/bus/pci/devices/0000\:00\:02.0/boot_vga
1
$ cat /sys/bus/pci/devices/0000\:01\:00.0/boot_vga
0

The former is the integrated GPU; the latter is the card to be passed through.

Some BIOS defaults disable the integrated GPU when a discrete card is present, or prefer the high-performance card as the host's VGA device.

The three softdep lines at the bottom make the vfio-pci module load before the other modules that might claim the devices, ensuring vfio-pci grabs them first. The three modules listed correspond to the three drivers found earlier.
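One more aside about the ids option: if the host had two identical cards, ids= would claim both of them. In that case a single device can instead be bound by address at runtime through the kernel's sysfs driver_override interface. A hedged sketch (bind_vfio is our own helper name, it must run as root, and the base-path parameter exists only to make the function testable):

```shell
# Bind one PCI device, selected by address, to vfio-pci via driver_override.
bind_vfio() {
    dev=$1                       # e.g. 0000:01:00.0
    base=${2:-/sys/bus/pci}
    # Detach the current driver first, if any.
    if [ -e "$base/devices/$dev/driver" ]; then
        echo "$dev" > "$base/devices/$dev/driver/unbind"
    fi
    echo vfio-pci > "$base/devices/$dev/driver_override"
    echo "$dev" > "$base/drivers_probe"
}
```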

Finally, use the following command to make the above two configuration files take effect.

$ sudo update-initramfs -u
update-initramfs: Generating /boot/initrd.img-6.1.0-10-amd64
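To be sure the modules actually landed in the new image, you can list its contents (lsinitramfs ships with initramfs-tools on Debian):

```shell
# List the freshly generated initramfs and keep only vfio-related entries.
if command -v lsinitramfs >/dev/null 2>&1; then
    lsinitramfs "/boot/initrd.img-$(uname -r)" | grep vfio \
        || echo "warning: no vfio modules found in initramfs"
fi
```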

You can run the command again to check that all the devices are now properly held by vfio-pci.

$ lspci -nnv
...
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti] [10de:1e04] (rev a1) (prog-if 00 [VGA controller])
    Subsystem: GALAX TU102 [GeForce RTX 2080 Ti] [1b4c:12ae]
    Flags: fast devsel, IRQ 255, IOMMU group 2
    Memory at a4000000 (32-bit, non-prefetchable) [disabled] [size=16M]
    Memory at 90000000 (64-bit, prefetchable) [disabled] [size=256M]
    Memory at a0000000 (64-bit, prefetchable) [disabled] [size=32M]
    I/O ports at 4000 [disabled] [size=128]
    Expansion ROM at a5000000 [disabled] [size=512K]
    Capabilities: <access denied>
    Kernel driver in use: vfio-pci
    Kernel modules: nouveau

01:00.1 Audio device [0403]: NVIDIA Corporation TU102 High Definition Audio Controller [10de:10f7] (rev a1)
    Subsystem: GALAX TU102 High Definition Audio Controller [1b4c:12ae]
    Flags: fast devsel, IRQ 255, IOMMU group 2
    Memory at a5080000 (32-bit, non-prefetchable) [disabled] [size=16K]
    Capabilities: <access denied>
    Kernel driver in use: vfio-pci
    Kernel modules: snd_hda_intel

01:00.2 USB controller [0c03]: NVIDIA Corporation TU102 USB 3.1 Host Controller [10de:1ad6] (rev a1) (prog-if 30 [XHCI])
    Subsystem: GALAX TU102 USB 3.1 Host Controller [1b4c:12ae]
    Flags: bus master, fast devsel, latency 0, IRQ 18, IOMMU group 2
    Memory at a2000000 (64-bit, prefetchable) [size=256K]
    Memory at a2040000 (64-bit, prefetchable) [size=64K]
    Capabilities: <access denied>
    Kernel driver in use: vfio-pci
    Kernel modules: xhci_pci

01:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU102 USB Type-C UCSI Controller [10de:1ad7] (rev a1)
    Subsystem: GALAX TU102 USB Type-C UCSI Controller [1b4c:12ae]
    Flags: fast devsel, IRQ 255, IOMMU group 2
    Memory at a5084000 (32-bit, non-prefetchable) [disabled] [size=4K]
    Capabilities: <access denied>
    Kernel driver in use: vfio-pci
...

Make sure Kernel driver in use: reads vfio-pci for every device that needs to be passed through.
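Instead of scanning the full lspci -nnv output, the binding of each function can also be read straight from sysfs (the address 0000:01:00 is this machine's card; adjust for yours):

```shell
# Print the driver currently bound to each function of the card.
for f in 0 1 2 3; do
    link="/sys/bus/pci/devices/0000:01:00.$f/driver"
    if [ -e "$link" ]; then
        echo "01:00.$f -> $(basename "$(readlink -f "$link")")"
    else
        echo "01:00.$f -> (no driver bound)"
    fi
done
```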

If the result differs from what is expected, retry the steps above. Do not skip ahead to the next step regardless, or the hardware may be damaged.

Configure KVM

Here we create the virtual machine directly with virt-install.

$ sudo virt-install \
        --name=linux-vm \
        --os-variant=debian11 \
        --vcpu=4 \
        --ram=16384 \
        --disk path=/vol/kvm/linux-vm.img,size=128 \
        --graphics vnc,listen=0.0.0.0 \
        --cdrom /vol/kvm/debian-12.0.0-amd64-DVD-1.iso \
        --features kvm_hidden=on \
        --network bridge:virbr0 \
        --boot uefi

Starting install...
Allocating 'linux-vm.img' |  18 MB  00:00:05
Creating domain...        |    0 B  00:00:00
Running graphical console command: virt-viewer --connect qemu:///system --wait linux-vm

Domain is still running. Installation may be in progress.
You can reconnect to the console to complete the installation process.

Note the use of kvm_hidden=on to keep the driver from erroring out when it detects a virtual machine, and of UEFI so the device is identified correctly.

Install the system first, then shut down the virtual machine and edit its XML configuration with the following command.

$ sudo EDITOR=vim virsh edit linux-vm

Add the following configuration to map the pass-through device to the virtual machine.

The address inside source gives the device's physical location on the host side, i.e. the part before the device name above: 01:00.0, written here as bus='0x01' slot='0x00' function='0x0'. The outer address is the location inside the virtual machine; make sure its bus does not conflict with other devices. Because the graphics card consists of multiple functions, the first one needs multifunction='on'.

<domain type='kvm'>
  ...
  <devices>
    ...
    <hostdev mode='subsystem' type='pci' managed='no'>
      <source>
        <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0' multifunction='on'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='no'>
      <source>
        <address domain='0x0000' bus='0x01' slot='0x00' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x1'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='no'>
      <source>
        <address domain='0x0000' bus='0x01' slot='0x00' function='0x2'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x2'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='no'>
      <source>
        <address domain='0x0000' bus='0x01' slot='0x00' function='0x3'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x3'/>
    </hostdev>
  </devices>
</domain>
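Since the four hostdev entries follow a regular pattern, they can also be generated with a small loop rather than typed by hand (a sketch; gen_hostdev is our own helper name, and guest bus 0x05 matches the example above — pick any bus that is free in your VM):

```shell
# Emit a <hostdev> element for each function of the card at host 01:00.
gen_hostdev() {
    for f in 0 1 2 3; do
        multi=''
        # Only the first function carries multifunction='on'.
        [ "$f" -eq 0 ] && multi=" multifunction='on'"
        cat <<EOF
<hostdev mode='subsystem' type='pci' managed='no'>
  <source>
    <address domain='0x0000' bus='0x01' slot='0x00' function='0x$f'/>
  </source>
  <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x$f'$multi/>
</hostdev>
EOF
    done
}
gen_hostdev
```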

Then start the virtual machine and install the device driver. Note that the official NVIDIA driver may prompt you to install some packages first.

$ sudo apt-get install gcc
$ sudo apt-get install make
$ sudo apt-get install linux-headers-$(uname -r) build-essential

If, during installation, you are told that loading the kernel module failed because of Secure Boot:

ERROR: The kernel module failed to load. Secure boot is enabled on this system, so this is likely because it was not signed by a key that is trusted by the kernel. Please try installing the driver again, and sign the kernel module when prompted to do so.

Or, after the installation completes, nvidia-smi cannot find the driver:

$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

This can be solved by editing the virtual machine's XML configuration and turning off Secure Boot, i.e. adding the following items.

<domain type='kvm'>
  ...
  <os firmware='efi'>
    ...
    <firmware>
      <feature enabled='no' name='secure-boot'/>
    </firmware>
  </os>
</domain>

Note that after modifying BIOS-related configuration you need to reset the NVRAM, i.e. add the following flag when starting the VM for the first time.

$ sudo virsh start linux-vm --reset-nvram
Domain 'linux-vm' started

If you reset the NVRAM after the system is already installed, the boot entry may be lost, and you will need to enter the BIOS and boot manually via Boot From File.

Finally, nvidia-smi sees the device, and we are done.

$ nvidia-smi
Wed Aug 16 15:18:04 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.98                 Driver Version: 535.98       CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 2080 Ti     Off | 00000000:05:00.0 Off |                  N/A |
| 33%   57C    P0              65W / 250W |      0MiB / 11264MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
