The CPU used here is an i9-9900K, the graphics card is an RTX 2080 Ti, and both the host and the guest run Debian GNU/Linux 12.
First, make sure that the motherboard supports VT-d and enable it in the BIOS.
Enabling IOMMU
IOMMU is the umbrella term for Intel VT-d and AMD-Vi.
Enabling IOMMU on the host is as simple as passing the Linux kernel one additional boot parameter.
Edit /etc/default/grub and modify the GRUB_CMDLINE_LINUX_DEFAULT entry directly.
GRUB_CMDLINE_LINUX_DEFAULT="quiet"
Simply append the IOMMU parameter at the end.
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on"
Then update GRUB to make the configuration take effect.
$ sudo update-grub
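Before rebooting, you can optionally confirm that the flag made it into the regenerated configuration; a quick check (the path below is the Debian default):

```shell
# Check that the regenerated GRUB config now carries the IOMMU flag
# (Debian default path; prints the flag once if present).
grep -m1 -o 'intel_iommu=on' /boot/grub/grub.cfg
```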
Then reboot. If you see output similar to the following, IOMMU was enabled successfully.
$ sudo dmesg | grep -i iommu
...
[ 0.018093] DMAR: IOMMU enabled
...
[ 0.467648] pci 0000:00:00.0: Adding to iommu group 0
[ 0.467660] pci 0000:00:01.0: Adding to iommu group 1
[ 0.467674] pci 0000:00:14.0: Adding to iommu group 2
[ 0.467682] pci 0000:00:14.2: Adding to iommu group 2
[ 0.467690] pci 0000:00:14.3: Adding to iommu group 3
[ 0.467701] pci 0000:00:16.0: Adding to iommu group 4
[ 0.467716] pci 0000:00:1b.0: Adding to iommu group 5
[ 0.467726] pci 0000:00:1b.4: Adding to iommu group 6
...
Viewing IOMMU Groups
An IOMMU Group is a grouping determined by the hardware implementation, and it is the smallest unit that can be passed through to a virtual machine. If several devices share one IOMMU Group, all of them must be passed through together; otherwise I/O bus conflicts arise and may even crash the host.
Next, we can query the IOMMU Group assignments with the following command. Only the devices we care about are shown here.
$ sudo dmesg | grep -i iommu
...
[ 0.459200] pci 0000:00:01.0: Adding to iommu group 1
...
[ 0.459359] pci 0000:01:00.0: Adding to iommu group 1
[ 0.459363] pci 0000:01:00.1: Adding to iommu group 1
[ 0.459367] pci 0000:01:00.2: Adding to iommu group 1
[ 0.459371] pci 0000:01:00.3: Adding to iommu group 1
...
Then enumerate PCI devices with the following command.
$ lspci
...
00:01.0 PCI bridge: Intel Corporation 6th-10th Gen Core Processor PCIe Controller (x16) (rev 0a)
...
01:00.0 VGA compatible controller: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti] (rev a1)
01:00.1 Audio device: NVIDIA Corporation TU102 High Definition Audio Controller (rev a1)
01:00.2 USB controller: NVIDIA Corporation TU102 USB 3.1 Host Controller (rev a1)
01:00.3 Serial bus controller: NVIDIA Corporation TU102 USB Type-C UCSI Controller (rev a1)
...
Here 00:01.0 is the PCI root port, i.e. the PCIe controller provided by the CPU, while 01:00.0 through 01:00.3 are the functions of the graphics card we want to pass through. Note in particular that the 2080 Ti carries its own USB controller, audio device, and so on, and all of them must be passed through to the virtual machine.
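Besides grepping dmesg, the full grouping can also be read straight from sysfs; a minimal sketch:

```shell
# Walk /sys/kernel/iommu_groups and print each group with its member devices.
for g in /sys/kernel/iommu_groups/*; do
  echo "IOMMU group ${g##*/}:"          # group number is the directory name
  for d in "$g"/devices/*; do
    lspci -nns "${d##*/}" | sed 's/^/  /'  # device address is the entry name
  done
done
```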
Reserving PCI Devices
Here we take the vfio-pci approach. On older systems, pci-stub may be more convenient.
If the guest's device is plugged into a PCIe slot provided by the CPU, as with the 2080 Ti above, the PCI root port 00:01.0 belongs to the same IOMMU Group, but it must NOT be bound to vfio-pci.
Under Linux's driver model, each driver claims the not-yet-claimed hardware it supports as it initializes. We therefore need vfio-pci to take over the devices before any other driver gets to them.
First, use the following command to see which driver currently owns each device, and to read off the device IDs.
$ lspci -nnv
...
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti] [10de:1e04] (rev a1) (prog-if 00 [VGA controller])
Subsystem: GALAX TU102 [GeForce RTX 2080 Ti] [1b4c:12ae]
Flags: bus master, fast devsel, latency 0, IRQ 187, IOMMU group 1
Memory at a4000000 (32-bit, non-prefetchable) [size=16M]
Memory at 90000000 (64-bit, prefetchable) [size=256M]
Memory at a2000000 (64-bit, prefetchable) [size=32M]
I/O ports at 4000 [size=128]
Expansion ROM at 000c0000 [disabled] [size=128K]
Capabilities: <access denied>
Kernel driver in use: nouveau
Kernel modules: nouveau
01:00.1 Audio device [0403]: NVIDIA Corporation TU102 High Definition Audio Controller [10de:10f7] (rev a1)
Subsystem: GALAX TU102 High Definition Audio Controller [1b4c:12ae]
Flags: bus master, fast devsel, latency 0, IRQ 17, IOMMU group 1
Memory at a5080000 (32-bit, non-prefetchable) [size=16K]
Capabilities: <access denied>
Kernel driver in use: snd_hda_intel
Kernel modules: snd_hda_intel
01:00.2 USB controller [0c03]: NVIDIA Corporation TU102 USB 3.1 Host Controller [10de:1ad6] (rev a1) (prog-if 30 [XHCI])
Subsystem: GALAX TU102 USB 3.1 Host Controller [1b4c:12ae]
Flags: fast devsel, IRQ 147, IOMMU group 1
Memory at a0000000 (64-bit, prefetchable) [size=256K]
Memory at a0040000 (64-bit, prefetchable) [size=64K]
Capabilities: <access denied>
Kernel driver in use: xhci_hcd
Kernel modules: xhci_pci
01:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU102 USB Type-C UCSI Controller [10de:1ad7] (rev a1)
Subsystem: GALAX TU102 USB Type-C UCSI Controller [1b4c:12ae]
Flags: bus master, fast devsel, latency 0, IRQ 11, IOMMU group 1
Memory at a5084000 (32-bit, non-prefetchable) [size=4K]
Capabilities: <access denied>
...
The value in square brackets after each device name, e.g. [10de:1e04], is the device ID, and the Kernel driver in use: line under each device names the driver that currently owns it. Clearly, our graphics card's functions are currently held by nouveau, snd_hda_intel, and xhci_hcd.
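As a convenience, a small pipeline can collect the IDs in comma-separated form. This is only a sketch and assumes every function of the card sits at 01:00:

```shell
# Pull the [vendor:device] IDs of every function at 01:00 into one
# comma-separated list (class codes like [0300] contain no colon and are skipped).
lspci -nns 01:00 | grep -o '\[[0-9a-f]\{4\}:[0-9a-f]\{4\}\]' | tr -d '[]' | paste -sd,
```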
So, to let vfio-pci take over the devices, we first need the relevant modules loaded early. Simply add the module names to /etc/initramfs-tools/modules.
# List of modules that you want to include in your initramfs.
# They will be loaded at boot time in the order below.
...
vfio_pci
vfio
vfio_iommu_type1
Then add a vfio-pci configuration file vfio.conf under /etc/modprobe.d.
# Bind GPU device to vfio-pci for virtual machine.
options vfio-pci ids=10de:1e04,10de:10f7,10de:1ad6,10de:1ad7 disable_vga=1
# Ensure that vfio-pci is initialized before GPU driver.
softdep nouveau pre: vfio-pci
softdep snd_hda_intel pre: vfio-pci
softdep xhci_pci pre: vfio-pci
The ids option lists the device IDs we want to claim, which were obtained with the command above. disable_vga forbids the host from using this card as a VGA device. You can check whether a card is currently serving as the host's boot VGA device with the following commands, where 0 means no and 1 means yes.
$ cat /sys/bus/pci/devices/0000\:00\:02.0/boot_vga
1
$ cat /sys/bus/pci/devices/0000\:01\:00.0/boot_vga
0
The former is the integrated GPU; the latter is the discrete card we want to pass through. Note that some BIOS defaults disable the integrated GPU when a discrete card is present, or prefer the high-performance card as the host's VGA device.
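To check every GPU at once, a small loop over sysfs works; this is just a sketch, relying on the fact that boot_vga is only exposed by display-class devices:

```shell
# Print the boot_vga flag for every PCI device that exposes one
# (1 = this card is the host's boot display).
for f in /sys/bus/pci/devices/*/boot_vga; do
  [ -e "$f" ] || continue   # skip the literal glob when nothing matches
  echo "$f: $(cat "$f")"
done
```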
The last three softdep lines make the vfio-pci module load before the other modules that might claim the devices, ensuring it grabs them first and cannot lose the race. The three modules listed correspond to the three drivers found earlier.
Finally, use the following command to make the above two configuration files take effect.
$ sudo update-initramfs -u
update-initramfs: Generating /boot/initrd.img-6.1.0-10-amd64
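You can also confirm that the vfio modules were actually packed into the new image, e.g. with lsinitramfs from initramfs-tools (adjust the image path to your kernel version):

```shell
# List vfio-related files inside the freshly generated initramfs.
lsinitramfs /boot/initrd.img-$(uname -r) | grep vfio
```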
After rebooting, rerun the earlier command to check whether all the devices are now properly held by vfio-pci.
$ lspci -nnv
...
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti] [10de:1e04] (rev a1) (prog-if 00 [VGA controller])
Subsystem: GALAX TU102 [GeForce RTX 2080 Ti] [1b4c:12ae]
Flags: fast devsel, IRQ 255, IOMMU group 2
Memory at a4000000 (32-bit, non-prefetchable) [disabled] [size=16M]
Memory at 90000000 (64-bit, prefetchable) [disabled] [size=256M]
Memory at a0000000 (64-bit, prefetchable) [disabled] [size=32M]
I/O ports at 4000 [disabled] [size=128]
Expansion ROM at a5000000 [disabled] [size=512K]
Capabilities: <access denied>
Kernel driver in use: vfio-pci
Kernel modules: nouveau
01:00.1 Audio device [0403]: NVIDIA Corporation TU102 High Definition Audio Controller [10de:10f7] (rev a1)
Subsystem: GALAX TU102 High Definition Audio Controller [1b4c:12ae]
Flags: fast devsel, IRQ 255, IOMMU group 2
Memory at a5080000 (32-bit, non-prefetchable) [disabled] [size=16K]
Capabilities: <access denied>
Kernel driver in use: vfio-pci
Kernel modules: snd_hda_intel
01:00.2 USB controller [0c03]: NVIDIA Corporation TU102 USB 3.1 Host Controller [10de:1ad6] (rev a1) (prog-if 30 [XHCI])
Subsystem: GALAX TU102 USB 3.1 Host Controller [1b4c:12ae]
Flags: bus master, fast devsel, latency 0, IRQ 18, IOMMU group 2
Memory at a2000000 (64-bit, prefetchable) [size=256K]
Memory at a2040000 (64-bit, prefetchable) [size=64K]
Capabilities: <access denied>
Kernel driver in use: vfio-pci
Kernel modules: xhci_pci
01:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU102 USB Type-C UCSI Controller [10de:1ad7] (rev a1)
Subsystem: GALAX TU102 USB Type-C UCSI Controller [1b4c:12ae]
Flags: fast devsel, IRQ 255, IOMMU group 2
Memory at a5084000 (32-bit, non-prefetchable) [disabled] [size=4K]
Capabilities: <access denied>
Kernel driver in use: vfio-pci
...
Make sure that Kernel driver in use: reads vfio-pci for every device that needs to be passed through.
If anything differs from what you expect, redo the steps above; do not push on to the next step regardless, or the hardware may be damaged.
Configuring KVM
Here we create the virtual machine directly with virt-install.
$ sudo virt-install \
--name=linux-vm \
--os-variant=debian11 \
--vcpu=4 \
--ram=16384 \
--disk path=/vol/kvm/linux-vm.img,size=128 \
--graphics vnc,listen=0.0.0.0 \
--cdrom /vol/kvm/debian-12.0.0-amd64-DVD-1.iso \
--features kvm_hidden=on \
--network bridge:virbr0 \
--boot uefi
Starting install...
Allocating 'linux-vm.img' | 18 MB 00:00:05
Creating domain... | 0 B 00:00:00
Running graphical console command: virt-viewer --connect qemu:///system --wait linux-vm
Domain is still running. Installation may be in progress.
You can reconnect to the console to complete the installation process.
Note the use of kvm_hidden=on, which stops the driver from erroring out when it detects a virtual machine, and of UEFI boot, which is needed for the device to be recognized correctly.
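For reference, kvm_hidden=on corresponds to the following fragment in the domain XML that libvirt generates:

```xml
<domain type='kvm'>
  ...
  <features>
    ...
    <kvm>
      <hidden state='on'/>
    </kvm>
  </features>
</domain>
```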
Install the operating system first, then shut down the virtual machine and edit its XML configuration with the following command.
$ sudo EDITOR=vim virsh edit linux-vm
Add the following configuration to map the pass-through devices into the virtual machine.
The address inside source describes the device's physical location on the host side, i.e. the 01:00.0 prefix of the device names above, expressed here as bus='0x01' slot='0x00' function='0x0'. The outer address is the device's location on the guest side; take care that its bus does not conflict with any other device. Because the graphics card is made up of several functions, multifunction='on' is required.
<domain type='kvm'>
...
<devices>
...
<hostdev mode='subsystem' type='pci' managed='no'>
<source>
<address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
</source>
<address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0' multifunction='on'/>
</hostdev>
<hostdev mode='subsystem' type='pci' managed='no'>
<source>
<address domain='0x0000' bus='0x01' slot='0x00' function='0x1'/>
</source>
<address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x1'/>
</hostdev>
<hostdev mode='subsystem' type='pci' managed='no'>
<source>
<address domain='0x0000' bus='0x01' slot='0x00' function='0x2'/>
</source>
<address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x2'/>
</hostdev>
<hostdev mode='subsystem' type='pci' managed='no'>
<source>
<address domain='0x0000' bus='0x01' slot='0x00' function='0x3'/>
</source>
<address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x3'/>
</hostdev>
</devices>
</domain>
Then start the virtual machine and install the device driver. Note that the official NVIDIA driver installer may prompt you to install some packages first.
$ sudo apt-get install gcc
$ sudo apt-get install make
$ sudo apt-get install linux-headers-$(uname -r) build-essential
If the installer reports that loading the kernel module failed because of Secure Boot:
ERROR: The kernel module failed to load. Secure boot is enabled on this system, so this is likely because it was not signed by a key that is trusted by the kernel. Please try installing the driver again, and sign the kernel module when prompted to do so.
Or if, after installation completes, nvidia-smi cannot find the driver:
$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
This can be solved by editing the virtual machine's XML configuration and turning off Secure Boot, i.e. adding the following items.
<domain type='kvm'>
...
<os firmware='efi'>
...
<firmware>
<feature enabled='no' name='secure-boot'/>
</firmware>
</os>
</domain>
Note that after modifying firmware-related configuration, you need to reset the NVRAM, i.e. add the following flag when starting the virtual machine for the first time.
$ sudo virsh start linux-vm --reset-nvram
Domain 'linux-vm' started
If you reset the NVRAM after the system is already installed, the boot entry may be lost, and you will have to enter the firmware setup and boot manually via Boot From File.
Finally, if nvidia-smi can find the device, you are done.
$ nvidia-smi
Wed Aug 16 15:18:04 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.98 Driver Version: 535.98 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 2080 Ti Off | 00000000:05:00.0 Off | N/A |
| 33% 57C P0 65W / 250W | 0MiB / 11264MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+