User:Egberts/Drafts/QEMU

From Gentoo Wiki
Note
ROUGH DRAFT: seeded by the LXC page (its outline looks more comprehensive than the current QEMU page).

QEMU (Quick EMUlator) is a generic, open source hardware emulator and virtualization suite. It is often used together with an accelerator in the form of a type-1 hypervisor such as KVM (Kernel-based Virtual Machine) or Xen. If no accelerator is used, QEMU runs entirely in user space using its built-in binary translator, TCG (Tiny Code Generator). Running QEMU without an accelerator is considerably slower, because guest code must be translated in software instead of running directly on the host CPU.

Note
This article uses KVM as the accelerator of choice because of its GPL licensing and wide availability. Nearly all commands described here also work without KVM, except those that are KVM-specific.

Concepts

Virtualization concepts

Paravirtualization (guest modified to cooperate with the hypervisor)

Full virtualization (direct machine code execution)

QEMU/KVM provides hardware-assisted full virtualization.

Limitations of QEMU

Security concerns

QEMU components

Control groups

POSIX file capabilities

Host setup

This section details the Linux kernel setup for a QEMU host running Gentoo on an amd64 (x86_64) platform with either an Intel or an AMD CPU.

Note
If the host does not run Gentoo Linux, consult that distribution's documentation for setting up a QEMU host.

BIOS and UEFI firmware

In order to use KVM, the processor must support either VT-x (vmx) or AMD-V (svm). VT-x and AMD-V are Intel's and AMD's respective hardware virtualization extensions, which allow multiple operating systems to run concurrently on the same processors.

To inspect the hardware for virtualization support, issue the following command:

user $grep --color -E "vmx|svm" /proc/cpuinfo

For a period, manufacturers shipped systems with virtualization turned off by default in the BIOS. Changing this setting may require fully removing power from the system before it takes effect. If a restart does not work, shut the system down, unplug it, and press the power button while it is unplugged to discharge any residual energy from the power supply unit (PSU). Then reconnect power and verify that the change took effect.

If KVM support is available, a kvm character device appears at /dev/kvm once the system has booted a KVM-enabled kernel (if KVM is built as modules, once kvm_intel or kvm_amd has been loaded).
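The grep check above can be wrapped in a small helper that also names the matching vendor kernel option from the Kernel section. This is a sketch: the function name kvm_cpu_support is hypothetical, and it only inspects CPU flags, so it cannot confirm that virtualization is actually enabled in the firmware.

```shell
# Hypothetical helper: report which hardware virtualization extension the
# CPU advertises and which KVM kernel option matches it.
# Reads /proc/cpuinfo unless another file is given (handy for testing).
kvm_cpu_support() {
    cpuinfo="${1:-/proc/cpuinfo}"
    # -w matches whole words only, so e.g. "svm_lock" does not count as "svm"
    if grep -qw vmx "$cpuinfo"; then
        echo "vmx (Intel VT-x): CONFIG_KVM_INTEL"
    elif grep -qw svm "$cpuinfo"; then
        echo "svm (AMD-V): CONFIG_KVM_AMD"
    else
        echo "no vmx/svm flag in $cpuinfo" >&2
        return 1
    fi
}
```

user $kvm_cpu_support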

Kernel

Described below are the basic requirements for KVM kernel configuration for the host OS. A more complete and up-to-date list can be found at the KVM Tuning Kernel page.

Kernel options required for QEMU

Note
Different guest (virtualized) operating systems may require additional kernel options. These are covered in the corresponding #Usage section pages.
KERNEL Enable high resolution timer support (CONFIG_HIGH_RES_TIMERS)
General setup  --->
    Timers subsystem  --->
        <*>   High Resolution Timer Support
KERNEL Enable KVM Support (CONFIG_KVM)
[*] Virtualization  --->
    <*>   Kernel-based Virtual Machine (KVM) support
Note
This includes support for ARM64 processors.
Physical CPU Processor Support - Host

For the QEMU host, enable the option matching the vendor of its CPU:

KERNEL Enable KVM support for Intel processors (CONFIG_KVM_INTEL)
[*] Virtualization  --->
    <*>   KVM for Intel processors support

or

KERNEL Enable KVM support for AMD processors (CONFIG_KVM_AMD)
[*] Virtualization  --->
    <*>   KVM for AMD processors support
Warning
If both "KVM for Intel processors support" and "KVM for AMD processors support" are built into the kernel (*), an error message is printed (via printk) during early boot, because a system has only one type of processor (Intel or AMD) and the other vendor's driver fails to initialize. Building the option for the absent vendor, or both options, as a module (M) makes the error message disappear.
Virtual CPU Processor Support - Guest(s)

To choose which CPU architectures the guest(s) on this host can use, consult the QEMU USE flags and add the desired target(s) to QEMU_SOFTMMU_TARGETS (and, for user-mode emulation, QEMU_USER_TARGETS) in /etc/portage/make.conf, as described in the USE_EXPAND section.

General options
Scheduling options
Memory/swap accounting
CPU accounting
Networking options
KERNEL libvirt (CONFIG_BRIDGE_EBT_MARK, CONFIG_NETFILTER_ADVANCED, CONFIG_NETFILTER_XT_CONNMARK, CONFIG_NETFILTER_XT_TARGET_CHECKSUM, CONFIG_IP6_NF_NAT)
[*] Networking support
    Networking Options  --->
        [*] Network packet filtering framework (Netfilter)  --->
            [*] Advanced netfilter configuration
            Core Netfilter Configuration  --->
                <*> "conntrack" connection tracking match support
                <*> CHECKSUM target support
            IPv6: Netfilter Configuration  --->
                <*> ip6tables NAT support
                
            <*> Ethernet Bridge tables (ebtables) support  --->
                <*> ebt: nat table support
                <*> ebt: mark filter support
        [*] QoS and/or fair queueing  --->
            <*> Hierarchical Token Bucket (HTB)
            <*> Stochastic Fairness Queueing (SFQ)
            <*> Ingress/classifier-action Qdisc
            <*> Netfilter mark (FW)
            <*> Universal 32bit comparisons w/ hashing (U32)
            [*] Actions
            <*>    Traffic Policing

Handling Kernel Config at CLI

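To set these kernel configuration options from the command line, use the scripts/kconfig/merge_config.sh script from the kernel source tree. Its first argument is the base configuration, so pass the existing .config first; otherwise the fragment alone becomes the starting point and the rest of the current configuration is lost: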

Mandatory kernel configuration options to set:

FILE /usr/src/kernel-kconfig-qemu-host.config
CONFIG_VIRTUALIZATION=y
CONFIG_KVM=y
# Built as modules (=m) to avoid the boot-time error described in the warning above
CONFIG_KVM_INTEL=m
CONFIG_KVM_AMD=m
root #cd /usr/src/linux
root #scripts/kconfig/merge_config.sh .config /usr/src/kernel-kconfig-qemu-host.config

Optional but useful kernel configuration options:

FILE /usr/src/kernel-kconfig-qemu-host-optional.config
CONFIG_VHOST_NET=y
CONFIG_HIGH_RES_TIMERS=y
CONFIG_HPET=y
CONFIG_COMPACTION=y
CONFIG_MIGRATION=y
CONFIG_KSM=y
CONFIG_SYSFS=y
CONFIG_PROC_FS=y
CONFIG_TRANSPARENT_HUGEPAGE=y
CONFIG_CGROUPS=y
root #scripts/kconfig/merge_config.sh .config /usr/src/kernel-kconfig-qemu-host-optional.config
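merge_config.sh warns when a requested value does not survive into the final .config (typically because a dependency is not enabled). To re-check a fragment later, for example after running make oldconfig on a newer kernel, a helper like the hypothetical missing_kconfig below lists every CONFIG_ line of the fragment that is not present verbatim in .config. It is a sketch: it compares text only and skips comment lines in the fragment.

```shell
# Hypothetical helper: print each CONFIG_*=value line of a fragment that is
# not present verbatim in a kernel .config (default: /usr/src/linux/.config).
missing_kconfig() {
    fragment="$1"
    config="${2:-/usr/src/linux/.config}"
    grep -E '^CONFIG_[A-Za-z0-9_]+=' "$fragment" | while IFS= read -r line; do
        # -x: whole-line match, -F: fixed string (no regex interpretation)
        grep -qxF -- "$line" "$config" || printf '%s\n' "$line"
    done
}
```

user $missing_kconfig /usr/src/kernel-kconfig-qemu-host-optional.config

Empty output means every requested option was applied as written.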

Accelerated networking, required by the vhost-net USE flag (recommended):

KERNEL vhost-net kernel 5.7 and later (CONFIG_VHOST_NET)
Device Drivers  --->
    [*] VHOST drivers  --->
        <*>   Host kernel accelerator for virtio net
KERNEL vhost-net (before kernel 5.7)
[*] Virtualization --->
    <*>   Host kernel accelerator for virtio net
KERNEL Optional advanced networking support (CONFIG_NET_CORE, CONFIG_TUN)
Device Drivers  --->
    [*] Network device support  --->
        [*]   Network core driver support
        <*>   Universal TUN/TAP device driver support
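After booting the new kernel, the device nodes provided by these options can be checked: /dev/net/tun comes from CONFIG_TUN and /dev/vhost-net from CONFIG_VHOST_NET. When either is built as a module, the tun or vhost_net module may need to be loaded first. The function name and its optional root-directory argument are hypothetical; the argument exists only so the check can be run against a test tree.

```shell
# Hypothetical helper: confirm the character devices used for TAP networking
# (CONFIG_TUN) and vhost-net acceleration (CONFIG_VHOST_NET) exist.
# An optional directory prefix is prepended to each path (for testing).
check_net_devices() {
    root="${1:-}"
    status=0
    for dev in /dev/net/tun /dev/vhost-net; do
        if [ -c "$root$dev" ]; then
            echo "ok: $dev"
        else
            echo "missing: $dev"
            status=1
        fi
    done
    return "$status"
}
```

user $check_net_devices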

Needed for 802.1d Ethernet bridging:

KERNEL Enabling 802.1d Ethernet Bridging support (CONFIG_IPV6, CONFIG_BRIDGE)
[*] Networking support  --->
        Networking options  --->
            <*> The IPv6 protocol
            <*> 802.1d Ethernet Bridging


Intel GVT-g (integrated graphics adapter virtualization)

Mediated device passthrough for Intel GPUs (Broadwell through Comet Lake) [1].

KERNEL Intel GVT-g (CONFIG_VFIO_MDEV, CONFIG_DRM_I915_GVT, CONFIG_DRM_I915_GVT_KVMGT)
Device Drivers  --->
        <*> VFIO Non-Privileged userspace driver framework
            <*> Mediated device driver framework
        Graphics Support  --->
            <*> Intel 8xx/9xx/G3x/G4x/HD Graphics
                [*] Enable Intel GVT-g graphics virtualization host support
                <*>   Enable KVM/VFIO support for Intel GVT-g
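With GVT-g enabled (which typically also requires the i915.enable_gvt=1 kernel parameter), the GPU exposes its mediated device types in sysfs under mdev_supported_types. The sketch below lists each type with its available_instances count. The function name is hypothetical, and the PCI address 0000:00:02.0 is an assumption (the usual slot of an Intel integrated GPU); check lspci on the actual host.

```shell
# Hypothetical helper: list the mediated device (mdev) types a GPU offers
# and how many more instances of each can be created.
# Default device path assumes the Intel iGPU sits at PCI address 0000:00:02.0.
list_mdev_types() {
    dev="${1:-/sys/bus/pci/devices/0000:00:02.0}"
    for type in "$dev"/mdev_supported_types/*; do
        [ -d "$type" ] || continue   # glob did not match: no mdev support
        printf '%s %s\n' "${type##*/}" "$(cat "$type/available_instances")"
    done
}
```

A virtual GPU instance is created by writing a UUID to the create file of one of the listed types (i915-GVTg_V5_4 here is only an example name):

root #echo "$(uuidgen)" > /sys/bus/pci/devices/0000:00:02.0/mdev_supported_types/i915-GVTg_V5_4/create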

QEMU userspace utilities

Mounted cgroup filesystem

Network configuration

Important
If a QEMU front-end is to be used (instead of /usr/bin/virsh), disregard the rest of this Network configuration section and consult the QEMU front-ends wiki page for a network configuration to be maintained by this desired QEMU front-end.

Simple network configuration

Host configuration for VLANs inside the bridge which are connected to container's virtual Ethernet pair device

Host configuration with NAT networking (nftables)

Host configuration with NAT networking (iptables)

Guest configuration for a virtual Ethernet pair device connected by bridge

Adjusting guest config of the container after using template script

USE flags

Some packages are aware of the qemu USE flag.

Review the possible USE flags for QEMU:

USE flags for app-emulation/qemu QEMU + Kernel-based Virtual Machine userland tools

accessibility Adds support for braille displays using brltty
aio Enables support for Linux's Async IO
alsa Enable alsa output for sound emulation
bpf Enable eBPF support for RSS implementation.
bzip2 Use the bzlib compression library
caps Use Linux capabilities library to control privilege
capstone Enable disassembly support with dev-libs/capstone
curl Support ISOs / -cdrom directives via HTTP or HTTPS.
debug Enable extra debug codepaths, like asserts and extra output. If you want to get meaningful backtraces see https://wiki.gentoo.org/wiki/Project:Quality_Assurance/Backtraces
doc Add extra documentation (API, Javadoc, etc). It is recommended to enable per package instead of globally
fdt Enables firmware device tree support
filecaps Use Linux file capabilities to control privilege rather than set*id (this is orthogonal to USE=caps which uses capabilities at runtime e.g. libcap)
fuse Enables FUSE block device export
glusterfs Enables GlusterFS cluster filesystem via sys-cluster/glusterfs
gnutls Enable TLS support for the VNC console server. For 1.4 and newer this also enables WebSocket support. For 2.0 through 2.3 also enables disk quorum support.
gtk Add support for x11-libs/gtk+ (The GIMP Toolkit)
infiniband Enable Infiniband RDMA transport support
io-uring Enable efficient I/O via sys-libs/liburing.
iscsi Enable direct iSCSI support via net-libs/libiscsi instead of indirectly via the Linux block layer that sys-block/open-iscsi does.
jack Add support for the JACK Audio Connection Kit
jemalloc Enable jemalloc allocator support
jpeg Enable jpeg image support for the VNC console server
lzo Enable support for lzo compression
multipath Enable multipath persistent reservation passthrough via sys-fs/multipath-tools.
ncurses Enable the ncurses-based console
nfs Enable NFS support
nls Add Native Language Support (using gettext - GNU locale utilities)
numa Enable NUMA support
opengl Add support for OpenGL (3D graphics)
oss Add support for OSS (Open Sound System)
pam Add support for PAM (Pluggable Authentication Modules) - DANGEROUS to arbitrarily flip
pin-upstream-blobs Pin the versions of BIOS firmware to the version included in the upstream release. This is needed to sanely support migration/suspend/resume/snapshotting/etc... of instances. When the blobs are different, random corruption/bugs/crashes/etc... may be observed.
plugins Enable qemu plugin API via shared library loading.
png Enable png image support for the VNC console server
pulseaudio Enable pulseaudio output for sound emulation
python Add optional support/bindings for the Python language
rbd Enable rados block device backend support, see https://docs.ceph.com/en/mimic/rbd/qemu-rbd/
sasl Add support for the Simple Authentication and Security Layer
sdl Enable the SDL-based console
sdl-image SDL Image support for icons
seccomp Enable seccomp (secure computing mode) to perform system call filtering at runtime to increase security of programs
selinux !!internal use only!! Security Enhanced Linux support, this must be set by the selinux profile or breakage will occur
slirp Enable TCP/IP in hypervisor via net-libs/libslirp
smartcard Enable smartcard support
snappy Enable support for Snappy compression (as implemented in app-arch/snappy)
spice Enable Spice protocol support via app-emulation/spice
ssh Enable SSH based block device support via net-libs/libssh2
static Build the User and Software MMU (system) targets as well as tools as static binaries
static-user Build the User targets as static binaries
systemtap Enable SystemTAP/DTrace tracing
test Enable dependencies and/or preparations necessary to run tests (usually controlled by FEATURES=test but can be toggled independently)
udev Enable virtual/udev integration (device discovery, power and storage device support, etc)
usb Enable USB passthrough via dev-libs/libusb
usbredir Use sys-apps/usbredir to redirect USB devices to another machine over TCP
vde Enable VDE-based networking
vhost-net Enable accelerated networking using vhost-net, see https://www.linux-kvm.org/page/VhostNet
vhost-user-fs Enable shared file system access using the FUSE protocol carried over virtio.
virgl Enable experimental Virgil 3d (virtual software GPU)
virtfs Enable VirtFS via virtio-9p-pci / fsdev. See https://wiki.qemu.org/Documentation/9psetup
vnc Enable VNC (remote desktop viewer) support
vte Enable terminal support (x11-libs/vte) in the GTK+ interface
xattr Add support for getting and setting POSIX extended attributes, through sys-apps/attr. Requisite for the virtfs backend.
xen Enables support for Xen backends
zstd Enable support for ZSTD compression

Note
More than one USE flag (gtk, ncurses, sdl, or spice) can be enabled for graphical output. If graphics are desired, it is generally recommended to enable more than one of these flags.
Note
If virt-manager is going to be used, be sure to enable the usbredir and spice USE flags on the qemu package for correct operation.
USE_EXPAND

Additional ebuild configuration knobs are provided as the USE_EXPAND variables QEMU_USER_TARGETS and QEMU_SOFTMMU_TARGETS. See app-emulation/qemu for a list of all available targets. There are a great many of them, most of them very obscure and safe to ignore; leaving these variables at their default values disables almost all of them, which is probably fine for most users.

For each target specified, a QEMU executable is built. A softmmu target is the standard QEMU use case of emulating an entire system (like VirtualBox or VMware, but with optional support for emulating the CPU as well as the peripherals). User targets execute user-mode code only; their rather ambitious purpose is to let Linux ELF binaries built for a different architecture run transparently on the native system (similar in spirit to multilib, but without requiring a CPU that can execute the foreign code natively).

To set QEMU_USER_TARGETS and QEMU_SOFTMMU_TARGETS globally, edit the variables in /etc/portage/make.conf, e.g.:

FILE /etc/portage/make.conf
QEMU_SOFTMMU_TARGETS="arm x86_64 sparc"
QEMU_USER_TARGETS="x86_64"

Alternatively, modify the /etc/portage/package.use file(s). Two equivalent syntaxes are available. The traditional USE flag syntax:

FILE /etc/portage/package.use
app-emulation/qemu qemu_softmmu_targets_arm qemu_softmmu_targets_x86_64 qemu_softmmu_targets_sparc
app-emulation/qemu qemu_user_targets_x86_64

The other option is the newer, more compact USE_EXPAND-specific syntax:

FILE /etc/portage/package.use
app-emulation/qemu QEMU_SOFTMMU_TARGETS: arm x86_64 sparc QEMU_USER_TARGETS: x86_64
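The two package.use syntaxes are interchangeable because Portage derives each USE_EXPAND flag by lowercasing the variable name and joining it to the value with an underscore. The hypothetical helper below illustrates that mapping and can be used to produce the traditional form:

```shell
# Hypothetical helper: turn a USE_EXPAND variable and its values into the
# equivalent traditional USE flags, one flag per line, e.g.
# QEMU_USER_TARGETS x86_64 -> qemu_user_targets_x86_64
use_expand_flags() {
    prefix=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    shift
    for value in "$@"; do
        printf '%s_%s\n' "$prefix" "$value"
    done
}
```

user $echo "app-emulation/qemu $(use_expand_flags QEMU_SOFTMMU_TARGETS arm x86_64 sparc | paste -sd ' ' -)"

This prints the first line of the traditional package.use example.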


Install QEMU host

After reviewing and adding any desired USE flags, emerge app-emulation/qemu:

root #emerge --ask app-emulation/qemu
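Because QEMU falls back to TCG when no accelerator is available, a launcher can pick the accelerator at run time. The function below is a hypothetical sketch: -accel kvm, -accel tcg and -cpu host are standard QEMU options (-cpu host only works with KVM, hence the pairing), while the disk image and ISO names in the usage example are placeholders.

```shell
# Hypothetical helper: print QEMU accelerator options. Use KVM (with the
# host CPU model passed through) when the KVM device is a character device
# the current user can open read/write; otherwise fall back to TCG.
# The optional argument overrides /dev/kvm (for testing only).
qemu_accel_args() {
    kvm_dev="${1:-/dev/kvm}"
    if [ -c "$kvm_dev" ] && [ -r "$kvm_dev" ] && [ -w "$kvm_dev" ]; then
        echo "-accel kvm -cpu host"
    else
        echo "-accel tcg"
    fi
}
```

user $qemu-img create -f qcow2 guest.qcow2 20G
user $qemu-system-x86_64 $(qemu_accel_args) -m 2G -drive file=guest.qcow2,if=virtio -cdrom install.iso

The command substitution is left unquoted on purpose so that the shell splits it into separate arguments.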

Guest setup

Note
We could take this entire 'Guest setup' section and put it under a new QEMU/guest wiki group but a trend has already been established as QEMU/Linux guest, QEMU/Windows guest, ...

Kernel Configuration

To demonstrate a complete set of kernel configuration settings for the guest, the kernel configuration is first initialized with tinyconfig, which produces the smallest buildable Linux kernel (not necessarily a bootable or functional one).

Important
Back up /usr/src/linux/.config first, if needed.
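One way to take that backup is to copy .config to a timestamped file in the same directory before running make tinyconfig. The helper name is hypothetical; it refuses to continue when no .config exists and prints the path of the copy.

```shell
# Hypothetical helper: copy <kernel source dir>/.config to a timestamped
# file next to it (default dir: /usr/src/linux) and print the new path.
backup_kconfig() {
    src="${1:-/usr/src/linux}/.config"
    if [ ! -f "$src" ]; then
        echo "no .config found at $src" >&2
        return 1
    fi
    dest="$src.$(date +%Y%m%d-%H%M%S)"
    # -p preserves the mode and timestamps of the original
    cp -p -- "$src" "$dest" && printf '%s\n' "$dest"
}
```

root #backup_kconfig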

Then overwrite .config with the smallest possible default configuration:

root #cd /usr/src/linux
root #make tinyconfig

Template scripts

Gentoo

Automatic setup: QEMU standard Gentoo template script

Automatic setup: qemu-gentoo

Other distributions

Using the guest OS/VM

Manual use

Use from Gentoo init system

Use from Gentoo systemd

Accessing the guest

qemu-console

qemu-attach

Accessing the container with sshd

Filesystem layout

Unprivileged containers

Prerequisites

QEMU pre-built images

Configuring unprivileged QEMU

Create user namespace using systemd

Create user namespace manually (no systemd)

OpenRC configuration pre-check
Manage namespaces by libcgroup (cgroupv2)
Validate configuration

Create container example

Troubleshooting

Containers freeze on 'qemu stop' with OpenRC

newuidmap error

Could not set clone_children to 1 for cpuset hierarchy in parent cgroup

tee: /sys/fs/cgroup/memory/memory.use_hierarchy: Device or resource busy

qemu-console: no prompt / console (may be temporary as of 2019-10)

See also

External resources

References