Kernel Crash Dumps
This article explains how to capture kernel crash dumps (also known as kdumps). Kdumps are produced when the kernel panics or locks up. For simplicity, a single kernel is used for both the ordinary system and recovery. The described method is almost distribution independent.
Activate the following kernel options:
Processor type and features --->
    [*] kexec system call
    [*] kernel crash dumps
    [*] Build a relocatable kernel
Kernel hacking --->
    [*] Kernel debugging
    Compile-time checks and compiler options --->
        [*] Compile the kernel with debug info
File systems --->
    Pseudo filesystems --->
        -*- /proc file system support
        [*] /proc/vmcore support
CONFIG_PHYSICAL_START might need to be set to a value greater than 2 MB (0x200000) on some motherboards, to offset the kernel's memory space far enough that the BIOS does not clobber it. Try setting 0x1000000 (16 MB) if the above kernel options are not working as expected.
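Before rebuilding, it can help to confirm that the options above are actually set in the kernel configuration. The following is a minimal sketch of a hypothetical helper, not part of any package. The mapping from the menu entries to the symbols CONFIG_KEXEC, CONFIG_CRASH_DUMP, CONFIG_RELOCATABLE, CONFIG_DEBUG_KERNEL, CONFIG_DEBUG_INFO and CONFIG_PROC_VMCORE is an assumption based on common kernel versions and may differ on yours.

```shell
#!/bin/bash
# Hypothetical helper: list required options that are not "=y" in a kernel .config.
# The symbol names below are assumed to correspond to the menu entries above.
missing_kdump_options() {
    local config_file="$1" opt
    for opt in CONFIG_KEXEC CONFIG_CRASH_DUMP CONFIG_RELOCATABLE \
               CONFIG_DEBUG_KERNEL CONFIG_DEBUG_INFO CONFIG_PROC_VMCORE; do
        # An option counts as enabled only if it appears as an exact "NAME=y" line.
        grep -q "^${opt}=y\$" "$config_file" || echo "$opt"
    done
}

# Example (the path is an assumption about where the kernel sources live):
#   missing_kdump_options /usr/src/linux/.config
```

An empty result means every listed option was found; any printed name still needs to be enabled.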
emerge --ask sys-apps/kexec-tools
Create /etc/local.d/kdump.start containing:
#!/bin/bash
kexec -p /[path-to-kernel] --append="root=[root-device] single irqpoll maxcpus=1 reset_devices"
Your system may require core headers in ELF32 or ELF64 format for the kernel to boot (see the --elf32-core-headers and --elf64-core-headers options). Check the kexec manpage for details.
When using an initramfs, a reference to it will need to be passed as a parameter. For example:
#!/bin/bash
kexec -p /boot/kernel-genkernel-x86_64-3.16.1-gentoo \
    --initrd=/boot/initramfs-genkernel-x86_64-3.16.1-gentoo \
    --append="root=/dev/mapper/lvm-slash single irqpoll maxcpus=1 reset_devices dolvm softlevel=kdump"
Now make this file executable:
chmod u+x /etc/local.d/kdump.start
Note the kernel has to be readable. A typical Gentoo configuration leaves /boot unmounted, so either remove noauto from the fstab file or place a copy of the kernel in a place that is mounted during a crash.
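Because an unreadable kernel image is easy to miss at boot, the start script could check for it before calling kexec. The following is only a sketch: load_crash_kernel is a hypothetical function name, and the kernel path and command line in the example are placeholders to be replaced with your own values.

```shell
#!/bin/bash
# Hypothetical helper: refuse to call kexec when the kernel image cannot be read,
# for example because /boot is not mounted.
load_crash_kernel() {
    local kernel="$1" cmdline="$2"
    if [ ! -r "$kernel" ]; then
        echo "kdump: cannot read $kernel (is /boot mounted?)" >&2
        return 1
    fi
    # Load the image as the crash (panic) kernel with the given command line.
    kexec -p "$kernel" --append="$cmdline"
}

# Example (both arguments are placeholders):
#   load_crash_kernel /boot/kernel-gentoo "root=/dev/sda3 single irqpoll maxcpus=1 reset_devices"
```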
Add the crashkernel=64M nokaslr arguments to the kernel command line via the bootloader (most likely GRUB2). A 64M reservation is suitable for systems with up to around 12 GB of RAM.
nokaslr disables the KASLR security feature. You can omit this option, but then you will have to manually load symbols from all kernel sections in gdb because the kernel location is randomized.
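After rebooting with the new bootloader entry, it is worth checking that the argument actually reached the kernel. The sketch below uses a hypothetical helper, crashkernel_value, that extracts the crashkernel= value from a command line string. Reading /proc/cmdline and /sys/kernel/kexec_crash_size in the example assumes a kernel that exposes both, which kexec-enabled kernels normally do.

```shell
#!/bin/bash
# Hypothetical helper: print the value of the first crashkernel= argument found
# in a kernel command line string; return non-zero if there is none.
crashkernel_value() {
    local arg
    # $1 is left unquoted on purpose so that it is split on whitespace.
    for arg in $1; do
        case "$arg" in
            crashkernel=*)
                echo "${arg#crashkernel=}"
                return 0
                ;;
        esac
    done
    return 1
}

# Example on a running system:
#   crashkernel_value "$(cat /proc/cmdline)" || echo "crashkernel= is not set"
#   cat /sys/kernel/kexec_crash_size    # reserved size in bytes, 0 if nothing was reserved
```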
First, run the above script (/etc/local.d/kdump.start). It loads the rescue kernel image, which is run after a kernel crash.
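Whether the rescue kernel was really loaded can be checked through sysfs. The following sketch assumes the kernel provides /sys/kernel/kexec_crash_loaded, which reads 1 when a crash kernel is loaded; crash_kernel_loaded is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the kernel reports a loaded crash kernel.
# The state file is a parameter; by default the sysfs attribute is used.
crash_kernel_loaded() {
    local state_file="${1:-/sys/kernel/kexec_crash_loaded}"
    [ -r "$state_file" ] && [ "$(cat "$state_file")" = "1" ]
}

# Example:
#   if crash_kernel_loaded; then echo "crash kernel is loaded"; else echo "not loaded"; fi
```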
Whenever a kernel panic or lockup (hard or soft, if the kernel is set to detect them) occurs, kexec runs the kernel in crash mode, relocated to a reserved area of memory. The rest of RAM is left untouched. Once the system has booted, log in and copy /proc/vmcore to a file; this is the crash dump. Then reboot the system to get back to a normal configuration. The system might not be stable and should not continue to operate in this state.
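The copy step can be scripted so that each dump gets its own file name. This is a minimal sketch: save_vmcore is a hypothetical helper, and the default destination /var/crash is an assumption, so pick any directory with enough free space for a file roughly the size of the used RAM.

```shell
#!/bin/bash
# Hypothetical helper: copy the crash dump to a timestamped file and print
# the destination path. Both paths can be overridden by arguments.
save_vmcore() {
    local src="${1:-/proc/vmcore}" dest_dir="${2:-/var/crash}" dest
    if [ ! -r "$src" ]; then
        echo "no dump at $src (not booted into the crash kernel?)" >&2
        return 1
    fi
    mkdir -p "$dest_dir" || return 1
    dest="$dest_dir/vmcore-$(date +%Y%m%d-%H%M%S)"
    # Flush the copy to disk before reporting success, since a reboot follows.
    cp "$src" "$dest" && sync && echo "$dest"
}

# Example, run from the crash kernel:
#   save_vmcore && reboot
```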
A kernel panic can be forced on demand by executing the following command (do not forget to save all data, log out other users, and leave the filesystems in a clean state by invoking the sync command before doing this):
echo c | tee /proc/sysrq-trigger
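A test panic without a loaded crash kernel simply hangs or reboots the machine and produces no dump. The following sketch wraps the command above in a guard. force_test_panic is a hypothetical function, and it assumes /sys/kernel/kexec_crash_loaded is available to tell whether a crash kernel is loaded.

```shell
#!/bin/bash
# Hypothetical wrapper: only force a panic when a crash kernel is loaded.
# Both paths are parameters; the defaults are the kernel's own interfaces.
force_test_panic() {
    local loaded="${1:-/sys/kernel/kexec_crash_loaded}"
    local trigger="${2:-/proc/sysrq-trigger}"
    if [ "$(cat "$loaded" 2>/dev/null)" != "1" ]; then
        echo "no crash kernel loaded; refusing to panic" >&2
        return 1
    fi
    sync                   # flush dirty data before the crash
    echo c > "$trigger"    # 'c' asks the kernel to crash immediately
}

# Example (as root, after saving all work):
#   force_test_panic
```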
Kernel is not loading
If the kernel is not loading when kexec is called, check whether kernel compression was set to xz (lzma) format.
If xz compression is used, the sys-apps/kexec-tools package will need to be re-emerged with the lzma USE flag enabled.
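The compression method can be read from the kernel configuration. The sketch below assumes the choice is recorded as a CONFIG_KERNEL_<METHOD>=y line in the .config file; kernel_compression is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: print the compression method selected in a kernel
# .config file (for example XZ or GZIP). Prints nothing if none is found.
kernel_compression() {
    sed -n -E 's/^CONFIG_KERNEL_(GZIP|BZIP2|LZMA|XZ|LZO|LZ4|ZSTD)=y$/\1/p' "$1"
}

# Example (the path is an assumption about where the kernel sources live):
#   kernel_compression /usr/src/linux/.config
```

If this prints XZ or LZMA, one way to rebuild the package with the flag for a single run is `USE="lzma" emerge --ask --oneshot sys-apps/kexec-tools`. Making the flag permanent depends on how your Portage configuration is laid out.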
VGA not resetting
After a crash kernel has been loaded with kexec and a kernel panic occurs, kexec does not appear to boot the crash kernel. The output on the display freezes.
This might be caused by the VGA port not being reset. The solution may be to tell kexec to reset the display output on the VGA port. Something like the following could work (the important options being --reset-vga and --console-vga):
kexec -p /boot/kernel-gentoo --initrd=/boot/initramfs-gentoo --reset-vga --console-vga --command-line="root=/dev/sda3 maxcpus=1 irqpoll"