Kernel/Optimization
This article describes various optimizations for the Linux kernel.
Prerequisites
Many of the optimization methods described here may break the kernel in unexpected ways, or even make it run slower.
Some CONFIG options depend on other CONFIG options being set or unset, and on whether the architecture and compiler support them.
This article assumes /usr/src/linux is the symbolic link to the current kernel. Change directory to /usr/src/linux before continuing:
user $
cd /usr/src/linux
One way to optimize the kernel is to remove what is not needed. For example, if KVM is not used, disable CONFIG_KVM:
Virtualization --->
    < > Kernel-based Virtual Machine (KVM) support
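Options can also be switched off without opening the menu. The kernel tree ships scripts/config for this purpose (for example, scripts/config --disable KVM). The following is only a minimal sketch of the same idea in plain shell. The function name disable_option is hypothetical, and the function rewrites a single CONFIG line without resolving dependencies, so make olddefconfig should be run afterwards:

```shell
#!/bin/sh
# disable_option FILE OPTION
# Hypothetical helper: mark CONFIG_<OPTION> as "not set" in a .config-style
# file. It does not resolve dependencies between options.
disable_option() {
    file=$1
    opt=$2
    tmp=$(mktemp) || return 1
    # The trailing "=" keeps CONFIG_KVM from also matching CONFIG_KVM_INTEL.
    if sed "s/^CONFIG_${opt}=.*/# CONFIG_${opt} is not set/" "$file" > "$tmp"; then
        cat "$tmp" > "$file"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return "$status"
}
```

For example, disable_option .config KVM followed by make olddefconfig leaves KVM disabled and lets Kbuild recompute the options that depended on it.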
Kbuild
The kernel build system, Kbuild, can change how the kernel is built in more advanced ways than make *config, and it is used much like GNU Make. Kbuild also supports environment variables such as LLVM=1. For example, the following command builds the kernel with LLVM and aggressive optimization flags:
root #
make LLVM=1 KCFLAGS="-O3 -march=native -pipe"
Clang/LLVM
Do not mix GNU binutils and LLVM binutils. For example, make CC=gcc LD=ld.lld AR=llvm-ar will not work because LLVM's ar and ld are not compatible with GCC.
Make sure the LLVM toolchain is installed before proceeding:
user $
emerge --pretend --noreplace sys-devel/clang sys-devel/llvm sys-libs/compiler-rt sys-libs/llvm-libunwind sys-devel/lld
By default, the kernel is built with GCC and GNU binutils. The tools are selected through the following variables: CC, LD, AR, NM, STRIP, OBJCOPY, OBJDUMP, READELF, HOSTCC, HOSTCXX, HOSTAR, and HOSTLD. Alternatively, the kernel may be built with the LLVM toolchain:
root #
make LLVM=1
The equivalent, more verbose command:
root #
make CC=clang LD=ld.lld AR=llvm-ar NM=llvm-nm STRIP=llvm-strip OBJCOPY=llvm-objcopy OBJDUMP=llvm-objdump READELF=llvm-readelf HOSTCC=clang HOSTCXX=clang++ HOSTAR=llvm-ar HOSTLD=ld.lld
For more information, see this link.
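Before starting a long build, it can help to confirm that every tool named in the verbose command is actually on PATH. The following is a minimal sketch. The function name missing_tools and the variable llvm_tools are hypothetical, and the tool list is simply copied from the command above:

```shell
#!/bin/sh
# missing_tools TOOL...
# Print every TOOL that is not found on PATH, one per line.
# Returns 0 when all tools are present, 1 otherwise.
missing_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            rc=1
        fi
    done
    return "$rc"
}

# Tool names taken from the verbose make command in this section.
llvm_tools="clang clang++ ld.lld llvm-ar llvm-nm llvm-strip llvm-objcopy llvm-objdump llvm-readelf"
```

Running missing_tools $llvm_tools prints nothing and returns success when the toolchain is complete. Any name it prints has to be installed before make LLVM=1 can work.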
CFLAGS
By default, most of the kernel is built with -O2. Some code, such as random number generation, does not work with optimizations and is sometimes guarded with the C macro __OPTIMIZE__.
Performance
GCC LTO
Enabling Link Time Optimization is not as simple as running make KCFLAGS="-flto=auto", although there is ongoing work from Andi Kleen (LWN article and LKML patch).
This section uses the GCC LTO patch from CachyOS. Download https://raw.githubusercontent.com/CachyOS/kernel-patches/master/6.2/misc/gcc-lto/0001-gcc-LTO-support-for-the-kernel.patch, replacing 6.2 with the version of the kernel being built, and save it as gcc-lto.patch, the file name used in the commands below:
Read the entire patch before proceeding.
Then, git apply the patch and update the .config file:
root #
git apply gcc-lto.patch
root #
make oldconfig
Link Time Optimization (LTO)
> 1. None (LTO_NONE)
  2. gcc LTO (LTO_GCC) (NEW)
choice[1-2?]: 2
Allow aggressive cloning for function specialization (LTO_CP_CLONE) [N/y/?] (NEW) n
To remove the patch:
root #
git apply gcc-lto.patch --reverse
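The download and apply steps above can be sketched as a small helper. The function name lto_patch_url is hypothetical. It assumes that other kernel versions follow the same URL layout as the 6.2 example, and a patch is not guaranteed to exist for every version. The commented usage relies on make kernelversion, which prints the version of the tree, and on git apply --check, which tests whether a patch applies without modifying any files:

```shell
#!/bin/sh
# lto_patch_url VERSION
# Hypothetical helper: print the CachyOS GCC LTO patch URL for a kernel
# version such as "6.2" or "6.2.1-gentoo". Only MAJOR.MINOR is used.
# Assumption: every version directory uses the same layout as 6.2.
lto_patch_url() {
    ver=$(printf '%s\n' "$1" | cut -d. -f1,2)
    printf 'https://raw.githubusercontent.com/CachyOS/kernel-patches/master/%s/misc/gcc-lto/0001-gcc-LTO-support-for-the-kernel.patch\n' "$ver"
}

# Example use inside /usr/src/linux (commented out so that sourcing this
# file has no side effects):
#   wget -O gcc-lto.patch "$(lto_patch_url "$(make -s kernelversion)")"
#   git apply --check gcc-lto.patch && git apply gcc-lto.patch
```

Checking the patch first means a version mismatch is reported before any source file is touched.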
Clang LTO
Clang's Link Time Optimization can be either FullLTO or ThinLTO for the Linux kernel:
General architecture-dependent options --->
    Link Time Optimization (LTO) (Clang ThinLTO (EXPERIMENTAL)) --->
        ( ) None
        ( ) Clang Full LTO (EXPERIMENTAL)
        (X) Clang ThinLTO (EXPERIMENTAL)
The difference between the two is that ThinLTO builds faster because it parallelizes better, at the cost of some runtime performance.
GCC PGO
To use Profile-Guided Optimization, enable debugfs and gcov support (see this for modern info):
Kernel hacking --->
    Generic Kernel Debugging Instruments --->
        [*] Debug Filesystem
General architecture-dependent options --->
    GCOV-based kernel profiling --->
        [*] Enable gcov-based kernel profiling
        [*]   Profile entire Kernel
Then build as usual, install the kernel, and reboot the system:
root #
reboot
The instrumented kernel will run slower and be larger because it collects data such as how many times each line of code executes. This data is needed to build the PGO kernel.
After booting into the instrumented kernel, /sys/kernel/debug/gcov/usr/src/linux should contain *.gcda files, which are needed later for -fprofile-use. Exercise the system with a variety of programs: play sound, run a game, run Firefox, and so on. The longer the system runs and the more varied the workload, the more profile data is collected. When satisfied with the collected data, run this bash script as root [1]:
collect_gcda.sh
Collect *.gcda files
#!/bin/bash
# Copy every gcov profile (*.gcda) from debugfs into the kernel source tree.
cd /sys/kernel/debug/gcov/usr/src/linux || exit 1
find . -name '*.gcda' -exec cp {} /usr/src/linux/{} \;
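The script above relies on the destination directories already existing in the build tree. A slightly more defensive variant is sketched below. The function name copy_gcda is hypothetical. It creates missing directories, reads each file as a stream with cat, and stops at the first error:

```shell
#!/bin/sh
# copy_gcda SRC DST
# Hypothetical helper: copy every *.gcda file below SRC into the same
# relative path below DST, creating directories as needed.
# DST must be an absolute path because the copy runs from inside SRC.
copy_gcda() {
    src=$1
    dst=$2
    (
        cd "$src" || exit 1
        find . -name '*.gcda' | while IFS= read -r f; do
            mkdir -p "$dst/$(dirname "$f")" || exit 1
            cat "$f" > "$dst/$f" || exit 1
        done
    )
}
```

With the paths used in this article, the call would be copy_gcda /sys/kernel/debug/gcov/usr/src/linux /usr/src/linux, run as root.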
Then disable CONFIG_GCOV_KERNEL and CONFIG_GCOV_PROFILE_ALL, and rebuild with the profile data passed through KCFLAGS:
General architecture-dependent options --->
    GCOV-based kernel profiling --->
        [ ] Enable gcov-based kernel profiling
        [ ]   Profile entire Kernel
root #
make KCFLAGS="-fprofile-use -fprofile-correction -Wno-error=missing-profile -Wno-error=coverage-mismatch"
As before, install the kernel and reboot.
Hardened
Hardening refers to reducing the potential for malware to damage the system.
Generally, the more hardening is enabled, the slower the kernel runs.
Most options can be found under Security options:
Memory Management options --->
    [ ] Disable heap randomization
Security options --->
    Kernel hardening options --->
        Memory initialization --->
            Initialize kernel stack variables at function entry (zero-init everything (strongest and safest)) --->
                ( ) no automatic stack variable initialization (weakest)
                ( ) pattern-init everything (strongest)
                (X) zero-init everything (strongest and safest)
            [*] Poison kernel stack before returning from syscalls
            [*] Enable heap memory zeroing on allocation by default
            [*] Enable heap memory zeroing on free by default
            [*] Enable register zeroing on function exit
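After configuring, the resulting .config can be checked for the options selected above. The following is a minimal sketch. The function name check_options and the variable hardening_opts are hypothetical, and the CONFIG symbol names are assumed mappings for the menu entries above that should be verified against the Kconfig help text of the kernel version in use:

```shell
#!/bin/sh
# check_options FILE OPTION...
# Print "OPTION=y" or "OPTION=n" for each OPTION, given without the
# CONFIG_ prefix. Anything other than "=y" is reported as n.
check_options() {
    file=$1
    shift
    for opt in "$@"; do
        if grep -q "^CONFIG_${opt}=y\$" "$file"; then
            printf '%s=y\n' "$opt"
        else
            printf '%s=n\n' "$opt"
        fi
    done
}

# Assumed symbol names for the menu entries in this section; verify them
# against the Kconfig help text before relying on the result.
hardening_opts="INIT_STACK_ALL_ZERO GCC_PLUGIN_STACKLEAK INIT_ON_ALLOC_DEFAULT_ON INIT_ON_FREE_DEFAULT_ON ZERO_CALL_USED_REGS"
```

Running check_options .config $hardening_opts should report y for each entry when the menu matches the selection above. Under the same assumption, COMPAT_BRK (Disable heap randomization) should report n.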
Peter B. (Pietinger) has more config options to harden the kernel.
Size
This section describes reducing kernel memory usage (useful for embedded systems).
The kernel officially supports the -Os flag:
-Os (CONFIG_CC_OPTIMIZE_FOR_SIZE)
General setup --->
    Compiler optimization level (Optimize for size (-Os)) --->
        ( ) Optimize for performance (-O2)
        (X) Optimize for size (-Os)
-Oz may be used instead to reduce size more aggressively than -Os:
root #
make KCFLAGS="-Oz"
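Whether -Oz actually produces a smaller image than -Os depends on the compiler and configuration, so it is worth measuring. The following is a minimal sketch that compares two saved kernel images by byte count. The function name size_saving and the image file names in the usage note are hypothetical:

```shell
#!/bin/sh
# size_saving FILE_A FILE_B
# Print how many bytes smaller FILE_B is than FILE_A.
# The result is negative if FILE_B is larger.
size_saving() {
    a=$(wc -c < "$1") || return 1
    b=$(wc -c < "$2") || return 1
    echo $((a - b))
}
```

For example, after keeping a copy of the image from each build, size_saving bzImage-Os bzImage-Oz prints the number of bytes saved by -Oz.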
External resources
- Original Linux GCC PGO script part 1
- Original Linux GCC PGO script part 2
- Original Linux GCC PGO script part 3