GPU passthrough with libvirt qemu kvm

From Gentoo Wiki

GPU passthrough is a technology that allows the Linux kernel to directly present an internal PCI GPU to a virtual machine.

The device acts as if it were directly driven by the VM, and the VM detects the PCI device as if it were physically connected. GPU passthrough is often loosely called IOMMU, which is a misnomer: the IOMMU is the hardware feature that makes passthrough possible, and it also provides other functions, such as some protection against DMA attacks and letting devices limited to 32-bit addresses reach memory in a 64-bit address space.

The most common application for GPU passthrough is gaming: the VM gets direct access to the graphics card, so games run with nearly the same performance as on an operating system installed on bare metal.

QEMU (Quick EMUlator) is a generic, open source hardware emulator and virtualization suite.

Note
This article typically uses KVM as the accelerator of choice due to its GPL licensing and availability. Without KVM nearly all commands described here will still work (unless KVM specific).

Installation

BIOS and UEFI firmware

In order to utilize KVM, either VT-x or AMD-V must be supported by the processor. VT-x and AMD-V are Intel's and AMD's respective technologies that let multiple operating systems concurrently execute operations on the processors.

To inspect hardware for virtualization support issue the following command:

user $grep --color -E "vmx|svm" /proc/cpuinfo

For a period, manufacturers shipped systems with virtualization turned off by default in the system BIOS, so it may have to be enabled there first.
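As a small convenience, the grep check above can be wrapped in a function that names the detected technology. This is a minimal sketch for x86 hosts; the function name virt_flag is made up for illustration, and the optional file argument exists only so the function can be tried against a sample file.

```shell
#!/bin/bash
# Report which hardware virtualization flag the CPU advertises.
# The optional argument (default /proc/cpuinfo) exists only to allow testing.
virt_flag() {
    local cpuinfo="${1:-/proc/cpuinfo}"
    # Only look at the first "flags" line; all cores normally report the same set.
    if grep -m1 '^flags' "$cpuinfo" | grep -qw vmx; then
        echo "vmx (Intel VT-x)"
    elif grep -m1 '^flags' "$cpuinfo" | grep -qw svm; then
        echo "svm (AMD-V)"
    else
        echo "none"
        return 1
    fi
}
```

Calling `virt_flag` without arguments reads /proc/cpuinfo and prints the detected flag, or none if neither flag is listed.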

Hardware

  • A CPU that supports Intel VT-d or AMD-Vi. Check List of compatible Intel CPUs (Intel VT-x and Intel VT-d).
  • A motherboard that supports the aforementioned technologies. To find this out, check the motherboard's BIOS configuration for an option to enable IOMMU or something similar. Chances are that a motherboard from 2013 or newer supports it, but make sure to check: this is a niche technology, and because most users never use it, some manufacturers save costs by leaving it out or ship a defective implementation (such as Gigabyte's 2015-2016 series).
  • At least two GPUs: one for the physical OS, another for the VM. (In theory the host can run headless and be managed through SSH or a serial console, but this might not work and risks locking you out of the computer.)
  • Optional but recommended: Additional monitor, keyboard and mouse.

EFI configuration

Go into BIOS (EFI) settings and turn on VT-d and IOMMU support.

Note
On some firmware, VT-d is enabled by the same setting as general virtualization support.
Note
Some EFI firmware does not have a separate IOMMU setting.

IOMMU

IOMMU – or input–output memory management unit – is a memory management unit (MMU) that connects a direct-memory-access–capable (DMA-capable) I/O bus to the main memory. The IOMMU maps a device-visible virtual address (I/O virtual address, or IOVA) to a physical memory address. In other words, it translates the IOVA into a real physical address.
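The translation described above can be pictured with a toy model. The sketch below assumes 4 KiB pages and a single flat table; real IOMMUs walk multi-level page tables maintained by the kernel, and the mappings and the name translate_iova are invented purely for illustration.

```shell
#!/bin/bash
# Toy model of IOVA -> physical address translation with 4 KiB pages.
# The mappings below are made-up values for illustration only.
PAGE_SIZE=4096

# IOVA page number -> physical page number
declare -A iova_table=( [1]=128 [2]=77 )

translate_iova() {
    local iova=$(( $1 ))
    local page=$(( iova / PAGE_SIZE ))
    local offset=$(( iova % PAGE_SIZE ))
    # An unmapped page corresponds to an IOMMU fault: the access is rejected.
    if [[ -z "${iova_table[$page]:-}" ]]; then
        echo "fault: IOVA page $page is not mapped" >&2
        return 1
    fi
    # Physical address = physical page base + offset within the page.
    printf '0x%x\n' $(( ${iova_table[$page]} * PAGE_SIZE + offset ))
}
```

For example, `translate_iova 0x1010` falls into IOVA page 1 at offset 0x10, which this table maps to physical page 128, while an address in an unmapped page is rejected instead of reaching memory.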

In an ideal world, every device would have its own IOVA address space and no two devices would share the same IOVA. But in practice this is often not the case. Moreover, the PCI-Express (PCIe) specifications allow PCIe devices to communicate with each other directly, called peer-to-peer transactions, thereby escaping the IOMMU.

That is where PCI Access Control Services (ACS) come to the rescue. ACS is able to tell whether or not these peer-to-peer transactions are possible between any two or more devices, and can disable them. ACS features are implemented within the CPU and the chipset.

Unfortunately, the implementation of ACS varies greatly between different CPU and chipset models.

IOMMU kernel configuration

To enable IOMMU support in the kernel:

KERNEL
Device Drivers --->
  [*] IOMMU Hardware Support --->
            Generic IOMMU Pagetable Support ----
      [*]   AMD IOMMU support
      <*>     AMD IOMMU Version 2 driver
      [*]   Support for Intel IOMMU using DMA Remapping Devices
      [*]     Support for Shared Virtual Memory with Intel IOMMU
      [*]     Enable Intel DMA Remapping Devices by default
      [*]   Support for Interrupt Remapping

If the kernel has CONFIG_TRIM_UNUSED_KSYMS (Trim unused exported kernel symbols) enabled, some symbols must be whitelisted. Otherwise, error messages of the form Failed to add group <n> to KVM VFIO device: Invalid argument may occur. See the Gentoo forum thread kernel 4.7.0 breaks pci passthrough [SOLVED] and the kvm mailing list thread KVM/VFIO passthrough not working when TRIM_UNUSED_KSYMS is enabled (list of symbols to whitelist in the second post).

KERNEL
[*] Enable loadable module support --->
    [*]   Trim unused exported kernel symbols
    (/path/to/whitelist) Whitelist of symbols to keep in ksymtab
FILE /path/to/whitelist
vfio_group_get_external_user
vfio_external_group_match_file
vfio_group_put_external_user
vfio_group_set_kvm
vfio_external_check_extension
vfio_external_user_iommu_id
mdev_get_iommu_device
mdev_bus_type

Rebuild the kernel.

GRUB bootloader

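When using GRUB as the secondary bootloader, the IOMMU is enabled through the kernel's command-line parameters. Edit the /etc/default/grub file and add the following values to the GRUB_CMDLINE_LINUX variable. The intel_iommu=on parameter applies to Intel systems, and pcie_acs_override only has an effect on kernels built with the ACS override patch: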

FILE /etc/default/grub
GRUB_CMDLINE_LINUX="... iommu=pt intel_iommu=on pcie_acs_override=downstream,multifunction ..."
Note
If the system hangs after rebooting, check the BIOS and IOMMU settings.

Apply changes:

root #grub-mkconfig -o /boot/grub/grub.cfg

Verify IOMMU has been enabled and is operational:

user $dmesg | grep 'IOMMU enabled'
[    0.000000] DMAR: IOMMU enabled
Note
For CPU on XEN architecture, run:
user $lspci -vv | grep -i 'Access Control Services'
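Besides searching dmesg, it can help to confirm that the parameters added to GRUB_CMDLINE_LINUX actually reached the running kernel. The following sketch reads /proc/cmdline; the helper name cmdline_has is made up, and the optional file argument exists only for trying it against a sample file.

```shell
#!/bin/bash
# Succeed if the given parameter appears as a whole word on the kernel command line.
# The second argument (default /proc/cmdline) exists only to allow testing.
cmdline_has() {
    local param="$1" file="${2:-/proc/cmdline}"
    local -a words
    local word
    # Split the single command-line string into words without glob expansion.
    read -r -a words < "$file"
    for word in "${words[@]}"; do
        [[ "$word" == "$param" ]] && return 0
    done
    return 1
}
```

For example, `cmdline_has intel_iommu=on && echo "parameter is active"` prints the message only if the parameter was passed at boot.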

IOMMU groups

Passing through PCI or VGA devices requires passing through all devices within an IOMMU group. The exception to this rule is PCI root devices that reside in the same IOMMU group as the device(s) to be passed through. These root devices cannot be passed through, as they often perform important tasks for the host. A number of (Intel) CPUs, usually consumer-grade CPUs with integrated graphics (IGD), share a root device in the same IOMMU group as the first PCIe x16 slot. To list the IOMMU groups and the devices they contain, run:

user $for d in /sys/kernel/iommu_groups/*/devices/*; do n=${d#*/iommu_groups/*}; n=${n%%/*}; printf 'IOMMU Group %s ' "$n"; lspci -nns "${d##*/}"; done;
...

IOMMU Group 13 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP104 [GeForce GTX 1080] [10de:1b80] (rev a1)

IOMMU Group 15 02:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon Pro WX 7100] [1002:67c4]

IOMMU Group 16 02:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 580] [1002:aaf0]

...

Here the NVIDIA card is in IOMMU group 13 and the AMD card in IOMMU groups 15 and 16, so everything looks fine. If the platform's IOMMU support is buggy and all devices end up in a single IOMMU group, the hardware cannot guarantee proper device isolation. Unfortunately, this cannot be fixed. The only workaround is the ACS override patch, which ignores the IOMMU hardware check. See ACS override patch (Arch Wiki).
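Since every device in a group has to be passed through together, it is often handy to ask which devices share a group with one particular card. A minimal sketch, assuming the usual sysfs layout under /sys/kernel/iommu_groups; the function name iommu_group_members is hypothetical.

```shell
#!/bin/bash
# List the PCI addresses that share an IOMMU group with the given device
# (the device itself is included in the output).
# Usage: iommu_group_members 0000:01:00.0
# The second argument (default /sys/kernel/iommu_groups) exists only to allow testing.
iommu_group_members() {
    local addr="$1" root="${2:-/sys/kernel/iommu_groups}"
    local dev
    for dev in "$root"/*/devices/"$addr"; do
        # An unmatched glob stays literal and does not exist; skip it.
        [ -e "$dev" ] || continue
        ls "${dev%/*}"
    done
}
```

If `iommu_group_members 0000:01:00.0` prints only the GPU's own video and audio functions, the card is cleanly isolated.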

VFIO

Kernel drivers:

KERNEL
Device Drivers --->
  <M> VFIO Non-Privileged userspace driver framework --->
      [*]   VFIO No-IOMMU support ----
      <M>   VFIO support for PCI devices
      [*]     VFIO PCI support for VGA devices
      < >   Mediated device driver framework

Find the vendor and device IDs of the graphics card and its audio function. Run:

root #lspci -nn
...
04:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 XL/XT [Radeon RX Vega 56/64] [1002:687f] (rev c1)
04:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Device [1002:aaf8]
...


Add the PCI IDs of both functions to the vfio-pci module options:

FILE /etc/modprobe.d/vfio.conf
options vfio-pci ids=1002:687f,1002:aaf8
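The options line can also be derived from the lspci -nn output instead of being typed by hand. The sketch below assumes that the vendor:device pair is the last bracketed xxxx:xxxx token on each line, as in the listing above; vfio_ids_line is a made-up helper name.

```shell
#!/bin/bash
# Read `lspci -nn` output on stdin and print the modprobe options line for
# every function of the given slot (for example 04:00).
vfio_ids_line() {
    local slot="$1" line ids=""
    # The vendor:device pair is the last [xxxx:xxxx] token on the line.
    local re='\[([0-9a-f]{4}:[0-9a-f]{4})\][^[]*$'
    while IFS= read -r line; do
        [[ "$line" == "$slot".* ]] || continue
        if [[ "$line" =~ $re ]]; then
            ids+="${ids:+,}${BASH_REMATCH[1]}"
        fi
    done
    [ -n "$ids" ] && echo "options vfio-pci ids=$ids"
}
```

Running `lspci -nn | vfio_ids_line 04:00` prints a line that can be compared with, or redirected into, /etc/modprobe.d/vfio.conf.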

Libvirt

Windows

Create a Windows 10 VM as usual with virt-manager. Edit it, click Add Hardware and add both PCI devices of the card (in this example, AMD/ATI Vega 64 and AMD/ATI Device). Click Apply, then boot the VM.

AMD GPUs expose two functions on the PCIe bus, one for video and one for audio output. The Windows drivers only work if both of them are passed to the guest.

Sound

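To give the qemu user access to PulseAudio, create a home directory for it and copy the PulseAudio configuration of the desktop user into it:

root #mkdir /home/qemu
root #cp -r /home/<user>/.config/pulse /home/qemu
root #chown qemu:qemu -R /home/qemu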

Change the home directory for the qemu user:

root #usermod -d /home/qemu qemu

Input Devices

One of the easiest ways of dealing with mouse and keyboard issues when using passthrough is an evdev proxy. It allows switching the mouse and keyboard between the guest and host with a special key combination. First, identify the mouse and keyboard in /dev/input. The easiest way to do this is through the symlinks found in /dev/input/by-id/.

user $ls -l /dev/input/by-id/*-event-{k,m}*

This is a list of symlinks to event devices, limited to mouse and keyboard entries. In order to access these nodes, either add the user QEMU runs as to the input group or, if using libvirt, edit /etc/libvirt/qemu.conf and look for:

FILE /etc/libvirt/qemu.conf
cgroup_device_acl = [ 
...
]
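The entries for this list can be generated from the symlinks rather than typed by hand. A minimal sketch, assuming the devices follow the usual *-event-kbd and *-event-mouse naming under /dev/input/by-id; evdev_acl_entries is a hypothetical helper name.

```shell
#!/bin/bash
# Print the keyboard and mouse event symlinks as quoted, comma-terminated
# entries suitable for pasting into cgroup_device_acl.
# The argument (default /dev/input/by-id) exists only to allow testing.
evdev_acl_entries() {
    local dir="${1:-/dev/input/by-id}" dev
    for dev in "$dir"/*-event-kbd "$dir"/*-event-mouse; do
        # Skip patterns that matched nothing.
        [ -e "$dev" ] || continue
        printf '    "%s",\n' "$dev"
    done
}
```

The output of `evdev_acl_entries` is meant to be pasted between the brackets, next to the entries that are already listed there.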

Add the symlinks to this list and then restart libvirtd. Next, edit the XML libvirt uses for the domain, either through virsh or using virt-manager. With virt-manager, select the XML tab in the Overview option at the top of the device tree. With virsh, enter interactive mode:

user $virsh --connect qemu:///system
Welcome to virsh, the virtualization interactive terminal.

Type:  'help' for help with commands
       'quit' to quit
virsh #list --all
virsh #edit $DOMAIN

Within the XML tree under the <devices> node, add the following lines:

CODE
    <input type="evdev">
      <source dev="/dev/input/by-id/$YOURMOUSE-event-mouse"/>
    </input>
    <input type="evdev">
      <source dev="/dev/input/by-id/$YOURKEYBOARD-event-kbd" grab="all" repeat="on"/>
    </input>

By default, the key combination to switch input between host and guest is both Ctrl keys pressed together. If multiple GPUs have been passed through to multiple VMs, use the grabToggle attribute to choose a different combination for each VM from the fixed set of key combinations listed in the Libvirt documentation.
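When several domains need different toggles, the keyboard element can be generated per VM. The sketch below prints the XML for one device; the list of accepted grabToggle values reflects the libvirt documentation as recalled here and should be checked against the installed libvirt version, and evdev_input_xml is a made-up helper name.

```shell
#!/bin/bash
# Print an evdev <input> element for the given device path and grabToggle value.
# Usage: evdev_input_xml /dev/input/by-id/<name>-event-kbd alt-alt
evdev_input_xml() {
    local dev="$1" toggle="${2:-ctrl-ctrl}"
    # Assumed set of valid values; verify against the libvirt documentation.
    case "$toggle" in
        ctrl-ctrl|alt-alt|shift-shift|meta-meta|scrolllock|ctrl-scrolllock) ;;
        *) echo "unsupported grabToggle value: $toggle" >&2; return 1 ;;
    esac
    cat <<EOF
<input type="evdev">
  <source dev="$dev" grab="all" grabToggle="$toggle" repeat="on"/>
</input>
EOF
}
```

The printed element can be pasted under the <devices> node with virsh edit, using a different toggle for each domain.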

QEMU

In case someone wants to use QEMU directly, here are some configurations to get started. In general, as a typical QEMU call will usually require many command-line flags, it is typically advised to place the QEMU call in a bash script and to run it that way. Don't forget to make the script file executable!

Minimal

This minimal configuration will simply boot into the BIOS - there aren't any drives connected, so there is nothing else for QEMU to do. However, this allows us to verify that the GPU passthrough is actually working.

FILE MinimalPassthrough.sh
#!/bin/bash

virsh nodedev-detach pci_0000_09_00_0
virsh nodedev-detach pci_0000_09_00_1
qemu-system-x86_64 \
    -nodefaults \
    -enable-kvm \
    -cpu host,kvm=off \
    -m 8G \
    -name "BlankVM" \
    -smp cores=4 \
    -device pcie-root-port,id=pcie.1,bus=pcie.0,addr=1c.0,slot=1,chassis=1,multifunction=on \
    -device vfio-pci,host=09:00.0,bus=pcie.1,addr=00.0,x-vga=on,multifunction=on,romfile=GP107_patched.rom \
    -device vfio-pci,host=09:00.1,bus=pcie.1,addr=00.1 \
    -monitor stdio \
    -nographic \
    -vga none \
    "$@"

virsh nodedev-reattach pci_0000_09_00_0
virsh nodedev-reattach pci_0000_09_00_1

Here's an explanation of each option:

  1. -nodefaults stops QEMU from creating its default devices. In particular, QEMU creates a VGA device by default, which interferes with passing through the video card (on a host with multiple video cards this may not be an issue).
  2. -enable-kvm enables KVM acceleration.
  3. -cpu host,kvm=off makes the virtual CPU match the host CPU model. kvm=off hides the KVM signature from the guest.
  4. -m 8G gives the guest 8 gigabytes of RAM.
  5. -name "BlankVM" gives the virtual machine a name.
  6. -smp cores=4 sets how many cores the guest has.
  7. -device pcie-root-port,id=pcie.1... creates a dedicated root port other than pcie.0, which the Windows driver for AMD GPUs requires.
  8. -device vfio-pci,host=09:00.0... adds a device through the vfio-pci kernel module, taken from the host's address "09:00.0".
  9. ...addr=.. the video function must be on .0 and the audio function on .1, and both must sit on the same PCIe root port other than pcie.0.
  10. ...x-vga=on is a property of QEMU's vfio-pci device that enables legacy VGA access for the assigned card.
  11. ...multifunction=on is needed because the card provides both audio and video functions.
  12. ...romfile=GP107_patched.rom due to known issues on NVIDIA cards, it may be necessary to use a modified vBIOS. This option makes QEMU use that modified vBIOS.
  13. -device vfio-pci,host=09:00.1 is the same as above, for the audio device that is in the same IOMMU group as the video device.
  14. -monitor stdio attaches the QEMU monitor, an interactive command console, to the terminal once the VM launches, for example to inspect or stop the guest.
  15. -vga none is probably redundant, since -nodefaults already suppresses the default VGA device.
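The virsh nodedev-detach and nodedev-reattach calls in the script use libvirt's node device names, which follow directly from the PCI address. A small sketch of that conversion, assuming PCI domain 0000 when none is given; pci_to_nodedev is a hypothetical helper name.

```shell
#!/bin/bash
# Convert a PCI address such as 09:00.0 or 0000:09:00.0 into the
# libvirt node device name used by `virsh nodedev-detach`.
pci_to_nodedev() {
    local addr="$1"
    # Prepend the default PCI domain when only bus:device.function is given.
    [[ "$addr" == ????:??:??.? ]] || addr="0000:$addr"
    # Replace every ':' and '.' with '_'.
    addr="${addr//[:.]/_}"
    echo "pci_$addr"
}
```

With this helper, the detach step could be written as `virsh nodedev-detach "$(pci_to_nodedev 09:00.0)"`, so that only the PCI address has to be changed when adapting the script.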

As noted above, there are certain known issues with NVIDIA drivers. The vBIOS used here was first dumped in Windows 10 with the GPU-Z tool and then patched with a vBIOS patching tool.

Linux Guest

Here is a slightly more complicated QEMU call that actually boots a Gentoo VM.

FILE GentooPassthrough.sh
#!/bin/bash

exec qemu-system-x86_64 \
    -nodefaults \
    -enable-kvm \
    -cpu host,kvm=off,hv_vendor_id=1234567890ab \
    -m 8G \
    -name "Gentoo VM" \
    -smp cores=4 \
    -boot order=d \
    -drive file=Gentoo_VM.img,if=virtio \
    -monitor stdio \
    -serial none \
    -net nic \
    -net user,hostfwd=tcp::50000-:22,hostfwd=tcp::50001-:5900,hostname=gentoo_qemu \
    -nographic \
    -vga none \
    -device vfio-pci,host=09:00.0,x-vga=on,multifunction=on,romfile=GP107_patched.rom \
    -device vfio-pci,host=09:00.1 \
    -usb \
    -device usb-host,vendorid=0x1532,productid=0x0101,id=mouse \
    -device usb-host,vendorid=0x04f2,productid=0x0833,id=keyboard \
    "$@"

Here is an explanation of the new configuration options:

  1. ...hv_vendor_id=... despite the patched vBIOS, the NVIDIA driver still recognizes that it is running in a virtual machine and refuses to load. This option replaces the Hyper-V vendor ID string that the hypervisor reports to the guest, which tricks the driver.
  2. -boot order=d sets the boot device order. In QEMU's drive letters, d is the first CD-ROM and c is the first hard disk, so use order=c to try the hard disk first.
  3. -drive file=Gentoo_VM.img,if=virtio attaches an emulated drive to the VM. The "Gentoo_VM.img" file is a qcow QEMU-style virtual drive file.
  4. -serial none may no longer be required.
  5. -net nic creates an Ethernet interface in the guest.
  6. -net user,hostfwd... forwards host ports 50000 and 50001 to guest ports 22 and 5900. From the host, you can then ssh into the guest using `ssh -p 50000 myuser@127.0.0.1`, and if a VNC server is running in the guest on port 5900, it can be reached through port 50001 on the host.
  7. -nographic may not be needed if you have a dedicated graphics card for the guest.
  8. -usb enables the emulated USB controller in the guest.
  9. -device usb-host,... these two lines forward the keyboard and mouse from the host to the guest. The vendorid and productid can be found using lsusb on the host.
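The usb-host arguments can be assembled from lsusb output instead of copying the IDs manually. The sketch below assumes the usual lsusb line format, in which the IDs follow the word ID as vendor:product; usb_host_args is a made-up helper name.

```shell
#!/bin/bash
# Read `lsusb` output on stdin and print a usb-host device argument for
# each line whose text contains the given string (case-insensitive).
usb_host_args() {
    local match="${1,,}" line
    local re='ID ([0-9a-f]{4}):([0-9a-f]{4})'
    while IFS= read -r line; do
        [[ "${line,,}" == *"$match"* ]] || continue
        if [[ "$line" =~ $re ]]; then
            echo "-device usb-host,vendorid=0x${BASH_REMATCH[1]},productid=0x${BASH_REMATCH[2]}"
        fi
    done
}
```

For example, `lsusb | usb_host_args mouse` prints one argument per matching device, ready to be added to the QEMU command line.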

Please note that without the `hv_vendor_id` portion, a user can boot and use the console in the guest with the forwarded graphics card, but whenever X is launched, which initializes the proprietary NVIDIA driver, it will fail.


Here is a variation of the above QEMU script for a Gentoo host and a Gentoo guest. It reserves separate CPUs for the guest and uses no custom ROMs. It works on a notebook with a Ryzen CPU, where the second (NVIDIA) GPU is passed through to the guest and the guest runs the NVIDIA driver. The guest was installed according to the Gentoo installation guide using UEFI and a GPT partition table.

Note that this example does not work with the latest nvidia-drivers. The latest driver reported to work is NVIDIA-Linux-x86_64-470.63.01.run.

FILE gentooPassthrough.sh
#!/bin/bash

name=genpass
pid="${$}"
cpus="8-15"
ncpus=8
cgrouprootfs="/sys/fs/cgroup"
cgroupfs="${cgrouprootfs}/${name}"

echo "PID: ${pid}"

# using separate CPUs for VM
# cgroup usage see https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html
# 'lscpu -e' to see which cpus to use
echo "+cpuset" > ${cgrouprootfs}/cgroup.subtree_control
mkdir -p ${cgroupfs}
echo ${cpus} > ${cgroupfs}/cpuset.cpus
echo "root" > ${cgroupfs}/cpuset.cpus.partition
echo "${pid}" > ${cgroupfs}/cgroup.procs

# setting performance governor for QEMU CPUs
for i in `seq 8 15` ; do
  echo performance >/sys/devices/system/cpu/cpu${i}/cpufreq/scaling_governor
done

qemu-system-x86_64 \
    -M q35 \
    -monitor stdio \
    -bios /usr/share/edk2-ovmf/OVMF_CODE.fd \
    -accel kvm,kernel-irqchip=on \
    -cpu host,kvm=off \
    -smp ${ncpus} \
    -m 4G \
    -name "${name}" \
    -device vfio-pci,host=01:00.0,multifunction=on \
    -device vfio-pci,host=01:00.1 \
    -nographic \
    -vga none \
    -serial none \
    -parallel none \
    -hda hda.qcow2 \
    -usb \
    -device usb-host,vendorid=0x046D,productid=0xC52B \
    "$@"

# removing cgroup cpuset
echo "${pid}" > ${cgrouprootfs}/cgroup.procs
rmdir ${cgroupfs}

# setting schedutil governor for qemu cpus
for i in `seq 8 15` ; do
  echo schedutil >/sys/devices/system/cpu/cpu${i}/cpufreq/scaling_governor
done
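The script above states the CPU range twice, once as cpus="8-15" and once as seq 8 15 in the governor loops. One way to keep the two in sync is to expand the cpuset-style list into individual CPU numbers. A minimal sketch; expand_cpu_list is a hypothetical helper and handles only comma-separated numbers and simple ranges.

```shell
#!/bin/bash
# Expand a cpuset-style list such as "8-11,14" into individual CPU numbers.
expand_cpu_list() {
    local part first last
    local -a parts out=()
    IFS=',' read -r -a parts <<< "$1"
    for part in "${parts[@]}"; do
        # For a single number such as "14", first and last are identical.
        first="${part%-*}"
        last="${part#*-}"
        out+=( $(seq "$first" "$last") )
    done
    echo "${out[*]}"
}
```

The governor loops could then iterate over `$(expand_cpu_list "${cpus}")` instead of a hard-coded seq range.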

The kernel of the Gentoo host has been built with genkernel --virtio all. The NVIDIA GPU has been bound to vfio-pci with /etc/modprobe.d/local.conf on the host:

FILE /etc/modprobe.d/local.conf
alias pci:v000010DEd00001F95sv0000103Csd000087B2bc03sc00i00 vfio-pci
alias pci:v000010DEd000010FAsv0000103Csd000087B2bc04sc03i00 vfio-pci
options vfio-pci ids=10de:1f95,10de:10fa

This way the integrated graphics of the Ryzen processor show the host on the laptop display, while the Gentoo guest is displayed on the monitor connected to the HDMI port of the NVIDIA GPU. To get sound in the VM, the HDMI cable has to be replugged after the VM has booted. This issue may be related to the HDMI cable or the external monitor.

Using Multiple Monitors

An example setup is:

980ti -> Gentoo Host
1650 -> Kali VM
3090 -> Windows 10 VM

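This example uses six displays, and the user often wants to rotate them between guests. This works if the monitors automatically switch to the active input. For example, to turn off the main display on the Linux host so that the monitor switches to Windows, use the following command, where $OUTPUT is a placeholder for the output name reported by xrandr (not the X11 DISPLAY variable):

xrandr --output $OUTPUT --off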
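The output name to pass to xrandr --output can be read from xrandr --query. The sketch below filters that output for connected outputs, assuming the usual format in which each output line starts with the name followed by connected or disconnected; connected_outputs is a hypothetical helper name.

```shell
#!/bin/bash
# Read `xrandr --query` output on stdin and print the names of connected outputs.
connected_outputs() {
    # Output lines look like "HDMI-1 connected ..." or "DP-1 disconnected ...".
    awk '$2 == "connected" { print $1 }'
}
```

Running `xrandr --query | connected_outputs` lists the candidates for the --output argument.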

If using a window manager such as i3, bind the xrandr command above to a hotkey, for example $mod4+shift+k. On Windows, the presentation settings can then be used to switch back:

<windows-key> + p, set secondary monitor

See also

  • QEMU — a generic, open source hardware emulator and virtualization suite.
  • Arch Wiki PCI Passthrough via OVMF - Useful page to learn some of the more advanced configurations a VM can be set with.

External resources