Kernel/Optimization

Article status
This article has some todo items:
  • Cover LLVM PGO

This article describes various optimizations for the Linux kernel.

Prerequisites

Warning
Many of the optimization methods described here may break the kernel in unexpected ways, or even make it run slower.
Note
Some CONFIG options require other CONFIG options to be set or unset, and some are available only if the architecture or compiler supports them. An optimization method described here may come at the expense of another kind of optimization, e.g. enabling register zeroing on function exit increases hardening at the cost of performance.

This article assumes /usr/src/linux is a symbolic link to the current kernel sources. Change directory to /usr/src/linux before continuing:

user $cd /usr/src/linux

One way to optimize the kernel is to remove features that are not needed. For example, if KVM is not used, disable CONFIG_KVM:

KERNEL Disable KVM (CONFIG_KVM) support
Virtualization --->
  < >   Kernel-based Virtual Machine (KVM) support

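After changing the configuration, it can be useful to confirm what actually ended up in the .config file. The following is a minimal sketch, not part of the kernel tooling: config_state is a hypothetical helper name, and it assumes the usual .config layout where enabled options appear as CONFIG_NAME=value and disabled ones as a comment.

```shell
# Print the value of a CONFIG option from a kernel .config file,
# or "n" when the option is unset or absent.
# config_state is a hypothetical helper, not a kernel script.
config_state() {
    opt="$1"
    config="${2:-.config}"
    # Only lines of the form CONFIG_<opt>=<value> match; the
    # "# CONFIG_<opt> is not set" comment lines are ignored.
    value=$(sed -n "s/^CONFIG_${opt}=//p" "$config")
    echo "${value:-n}"
}

# Example: config_state KVM /usr/src/linux/.config
```

If the helper prints n for KVM, the option was removed as intended; y or m means it is still enabled.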
Kbuild

The kernel build system (Kbuild) can change how the kernel is built in more advanced ways than make *config, much like GNU Make. Kbuild also supports variables passed on the command line or through the environment, such as LLVM=1. For example, the following command builds the kernel with LLVM and aggressive optimization flags:

root #make LLVM=1 KCFLAGS="-O3 -march=native -pipe"

Clang/LLVM

Warning
Do not mix GNU binutils and LLVM binutils unless you know exactly what you are doing. For example, make CC=gcc LD=ld.lld AR=llvm-ar will not work because LLVM's ar and ld are not compatible with GCC.

Make sure the LLVM toolchain is installed before proceeding:

user $emerge --pretend --noreplace sys-devel/clang sys-devel/llvm sys-libs/compiler-rt sys-libs/llvm-libunwind sys-devel/lld

By default, the kernel is built with GCC and GNU binutils. The tools are selected through the following variables: CC, LD, AR, NM, STRIP, OBJCOPY, OBJDUMP, READELF, HOSTCC, HOSTCXX, HOSTAR, and HOSTLD. Alternatively, the kernel may be built with Clang and the LLVM binutils:

root #make LLVM=1 LLVM_IAS=1

The equivalent, more verbose command:

root #make CC=clang LD=ld.lld AR=llvm-ar NM=llvm-nm STRIP=llvm-strip OBJCOPY=llvm-objcopy OBJDUMP=llvm-objdump READELF=llvm-readelf HOSTCC=clang HOSTCXX=clang++ HOSTAR=llvm-ar HOSTLD=ld.lld LLVM_IAS=1
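Before starting a long build, it may help to confirm that every tool named in the verbose command is actually on PATH. The sketch below is only an illustration: missing_tools is a hypothetical helper name, and the tool list is taken from the command above.

```shell
# Print every tool from the argument list that is not found on PATH.
# missing_tools is a hypothetical helper, not part of Kbuild.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Tools selected by the verbose command above; nothing is printed
# when all of them are installed.
missing_tools clang clang++ ld.lld llvm-ar llvm-nm llvm-strip \
    llvm-objcopy llvm-objdump llvm-readelf
```

Any name printed here points to a package that still needs to be installed before running make LLVM=1.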

For more information, see this link.

*FLAGS

Note
Once Rust code lands in the Linux kernel, RUSTFLAGS should be mentioned here. For a fuller list of flags to experiment with, see the GCC manual and the Clang manual, or the man 1 gcc and man 1 clang pages.

By default, most of the kernel is built with -O2 (some code, such as random number generation, does not work with optimizations and sometimes checks for them with the C macro __OPTIMIZE__). This can be changed through Kbuild, as mentioned earlier. Before setting KCFLAGS or similar variables, check the kernel's Makefiles to see what they already set. For example, -fallow-store-data-races is disabled in this Makefile.
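One way to follow the advice about checking the Makefiles first is to search them for the flag in question. The helper below is a minimal sketch: flag_in_makefile is a hypothetical name, and it searches a single file for a literal string.

```shell
# Print the lines of a Makefile that mention a given compiler flag.
# Returns non-zero when the flag does not appear.
# flag_in_makefile is a hypothetical helper, not part of Kbuild.
flag_in_makefile() {
    flag="$1"
    makefile="${2:-Makefile}"
    # -F treats the flag as a literal string; -e keeps a leading dash
    # from being parsed as a grep option.
    grep -n -F -e "$flag" "$makefile"
}

# Example: flag_in_makefile allow-store-data-races /usr/src/linux/Makefile
```

Searching for the flag name without its -f or -fno- prefix, as in the example comment, catches both the enabling and the disabling form.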

This section will list what to do with some flags:

*FLAGS type | *FLAGS | Compiler | What to do
CFLAGS | -O3 | GCC/Clang | Usually safe on non-critical systems. There was an attempt to add it to the kernel officially, but Linus Torvalds rejected it because -O3 often produces worse code than -O2 and such a change "should have in-depth actual performance numbers for a real load". Phoronix ran a -O3 kernel benchmark and found no measurable benefit for nearly all tested programs.
CFLAGS | -Ofast | GCC/Clang | Do not use; it has no effect on the kernel. -Ofast mainly enables floating-point optimizations, and the kernel does not use floating point.
CFLAGS | -flto | GCC/Clang | LTO memory usage may exceed the available RAM on 32-bit systems, so it may need to be disabled there. Except with Clang's ThinLTO, the whole kernel is recompiled whenever at least one CONFIG option changes. See Clang LTO and GCC LTO for more information.

Performance

Performance means how fast the kernel runs.

GCC LTO

Enabling Link Time Optimization is not as simple as running make KCFLAGS="-flto". Andi Kleen and others maintain experimental patches for this, which are used here to apply GCC LTO. For more information, see the LWN article.

There are many ways to apply the LKML patches, but only one is covered here. First, download the following two patches:

Important
Read the entire patches before proceeding.

Then apply the patches with git apply and update the .config file:

root #git apply gcc-lto.patch
root #git apply gcc-lto-no-pie.patch
root #make oldconfig
Link Time Optimization (LTO)
> 1. None (LTO_NONE)
  2. gcc LTO (LTO_GCC) (NEW)
choice[1-2?]: 2
Allow aggressive cloning for function specialization (LTO_CP_CLONE) [N/y/?] (NEW) n

To remove the patch:

root #git apply gcc-lto.patch --reverse
root #git apply gcc-lto-no-pie.patch --reverse

Clang LTO

For Linux 5.12 and later, Clang's Link Time Optimization can be either Full LTO or ThinLTO:

KERNEL Enable Clang's LTO (CONFIG_LTO_CLANG_FULL and CONFIG_LTO_CLANG_THIN) support
General architecture-dependent options --->
  Link Time Optimization (LTO) (Clang ThinLTO (EXPERIMENTAL)) --->
    ( ) None
    ( ) Clang Full LTO (EXPERIMENTAL)
    (X) Clang ThinLTO (EXPERIMENTAL)

The difference between the two is that ThinLTO builds faster thanks to parallelization, at the cost of some runtime performance.

GCC PGO

Note
See these three references for more information: Yuan-ApSys-14, Yuan-APSys-15, and Yuan-ScienceChina-18.

To use Profile-Guided Optimization, enable debugfs and gcov support (see this for modern info):

KERNEL Enable debugfs (CONFIG_DEBUG_FS) and gcov (CONFIG_GCOV_KERNEL and CONFIG_GCOV_PROFILE_ALL) support
Kernel hacking --->
  Generic Kernel Debugging Instruments --->
    [*] Debug Filesystem
General architecture-dependent options --->
  GCOV-based kernel profiling --->
    [*] Enable gcov-based kernel profiling
    [*] Profile entire Kernel

The variable CFLAGS_GCOV, used when CONFIG_GCOV_KERNEL is enabled, defaults to -fprofile-arcs -ftest-coverage, but can be changed to -fprofile-generate -ftest-coverage or to similar options listed under Instrumentation Options in the GCC manual:

root #make CFLAGS_GCOV="-fprofile-generate -ftest-coverage"

Then build as usual, set up the kernel, and reboot the system:

root #reboot
Important
The instrumented kernel runs slower and is larger because it collects data such as how many times each line of code executes. This data is needed to build the PGO kernel.

After booting back into the system, exercise it with many programs: play sound, play a game, run Firefox, and so on. The longer the system runs and the more varied the programs, the more profile data is collected. When satisfied with the collected data, copy the *.gcda files from /sys/kernel/debug/gcov/usr/src/linux to /usr/src/linux:

root #cd /sys/kernel/debug/gcov/usr/src/linux
root #find . -name '*.gcda' -exec cp {} /usr/src/linux/{} \;
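The copy step can also be written as a small reusable helper. The sketch below is one possible variant: copy_gcda is a hypothetical name, it creates any missing subdirectories in the destination, and it reads each file with cat on the assumption that a plain read is the most reliable way to get data out of debugfs pseudo-files.

```shell
# Copy every *.gcda file below a source directory into the same
# relative location below a destination directory.
# copy_gcda is a hypothetical helper, not a kernel script.
copy_gcda() {
    src="$1"
    dst="$2"
    # List the files relative to the source directory, then recreate
    # each one under the destination.
    ( cd "$src" && find . -name '*.gcda' ) | while IFS= read -r f; do
        mkdir -p "$dst/$(dirname "$f")"
        cat "$src/$f" > "$dst/$f"
    done
}

# Example: copy_gcda /sys/kernel/debug/gcov/usr/src/linux /usr/src/linux
```

Other files in the gcov directory, such as *.gcno entries, are left alone because only names ending in .gcda are matched.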

Then disable CONFIG_GCOV_KERNEL and CONFIG_GCOV_PROFILE_ALL and pass the profile flags through KCFLAGS:

KERNEL Disable gcov (CONFIG_GCOV_KERNEL and CONFIG_GCOV_PROFILE_ALL) support
General architecture-dependent options --->
  GCOV-based kernel profiling --->
    [ ] Enable gcov-based kernel profiling
    [ ] Profile entire Kernel
root #make KCFLAGS="-fprofile-use -fprofile-correction -Wno-error=missing-profile -Wno-error=coverage-mismatch"

As before, set up the kernel and reboot. To remove the *.gcda files from /usr/src/linux, run:

root #cd /usr/src/linux
root #find . -name '*.gcda' -exec rm {} \;

Hardened

Hardening refers to reducing the potential for malware to damage the system.

Important
Removing module support (CONFIG_MODULES) prevents the kernel from loading code at runtime, but many drivers will not work without modules. The alternative is to allow only signed modules.
KERNEL Enable hardening
Processor type and features  --->
  [*]   Randomize the address of the kernel image (KASLR)
Power management and ACPI options  --->
  [ ] Hibernation (aka 'suspend to disk')
Memory Management options  --->
  [ ] Disable heap randomization
Security options --->
  Kernel hardening options --->
    Memory initialization --->
      Initialize kernel stack variables at function entry (zero-init everything (strongest and safest)) --->
        ( ) no automatic stack variable initialization (weakest)
        ( ) pattern-init everything (strongest)
        (X) zero-init everything (strongest and safest)
      [*] Poison kernel stack before returning from syscalls
      [*] Enable heap memory zeroing on allocation by default 
      [*] Enable heap memory zeroing on free by default
      [*] Enable register zeroing on function exit
Kernel hacking  --->
  Memory Debugging  --->
    [*] Debug VM translations

Pietinger, Kicksecure, and Clip OS list more hardening config options for the kernel.
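A quick way to audit an existing configuration against the menu entries above is to look for the corresponding symbols in .config. The following is a rough sketch: missing_hardening is a hypothetical helper, and the symbol names are assumptions mapped from the menu labels, so check them against the Kconfig help text of the kernel version in use.

```shell
# Print each listed hardening option that is not set to "y" in a .config.
# missing_hardening is a hypothetical helper; the symbol names are
# assumptions derived from the menu entries and may differ between
# kernel versions.
missing_hardening() {
    config="${1:-.config}"
    for opt in RANDOMIZE_BASE INIT_STACK_ALL_ZERO \
               INIT_ON_ALLOC_DEFAULT_ON INIT_ON_FREE_DEFAULT_ON \
               ZERO_CALL_USED_REGS; do
        grep -q "^CONFIG_${opt}=y\$" "$config" || echo "CONFIG_${opt}"
    done
}

# Example: missing_hardening /usr/src/linux/.config
```

An empty result means every option in the list is enabled; each printed name is an option that is still off or absent.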

Size

This section describes reducing kernel memory usage (useful for embedded systems).

Kernels 5.4 and later officially support the -Os flag:

KERNEL Enable -Os (CONFIG_CC_OPTIMIZE_FOR_SIZE)
General setup --->
  Compiler optimization level (Optimize for size (-Os)) --->
    ( ) Optimize for performance (-O2)
    (X) Optimize for size (-Os)

-Oz may be used instead to reduce size more aggressively than -Os:

root #make KCFLAGS="-Oz"
