Kernel Crash Dumps
This article explains how to capture kernel crash dumps (kdump). A crash dump is produced when the kernel panics or locks up. To keep things simple, a single kernel is used both for the normal system and for recovery. The method described here is largely distribution-independent. This article is based on KDump on Gentoo by rich0, and the first version of this article was posted by that author.
Installation
Kernel
You need to activate the following kernel options:
Processor type and features  --->
    [*] kexec system call
    [*] kernel crash dumps
    [*] Build a relocatable kernel
Kernel hacking  --->
    [*] Kernel debugging
    [*] Compile the kernel with debug info
File systems  --->
    Pseudo filesystems  --->
        -*- /proc file system support
        [*]   /proc/vmcore support
Software
Install sys-apps/kexec-tools:
| USE flag | Default | Recommended | Description |
|---|---|---|---|
| lzma | No | | Enables support for LZMA compressed kernel images |
| xen | No | | Enable extended xen support |
| zlib | Yes | | Adds support for zlib (de)compression |
root # emerge --ask kexec-tools
Configuration
local.d script
Create /etc/local.d/kdump.start containing:
#!/bin/bash
# Load the crash (panic) kernel; it will run from the memory reserved by crashkernel=
kexec -p /[path-to-kernel] --append="root=[root-device] single irqpoll maxcpus=1 reset_devices"
Now make this file executable:
root # chmod u+x /etc/local.d/kdump.start
Note that the kernel image must be readable when the script runs. (A typical Gentoo configuration leaves /boot unmounted, so you will either need to remove the noauto option from the /boot entry in /etc/fstab or place a copy of the kernel elsewhere.)
Bootloader
Add crashkernel=64M to the kernel boot parameters. This reserves 64 MB of memory for the crash kernel, which is enough for systems with up to around 12 GB of RAM.
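As one concrete case, assuming the system boots with GRUB 2 configured through /etc/default/grub (an assumption; the article does not name a bootloader), the parameter goes into GRUB_CMDLINE_LINUX. That file is sourced as shell by grub-mkconfig, so the fragment below is plain shell syntax. Other bootloaders take the same crashkernel=64M string in their own kernel-line syntax.

```shell
# /etc/default/grub is sourced as a shell fragment by grub-mkconfig.
# If GRUB_CMDLINE_LINUX already lists parameters, append crashkernel=64M
# to them instead of replacing them.
GRUB_CMDLINE_LINUX="crashkernel=64M"
```

Then regenerate the configuration with grub-mkconfig -o /boot/grub/grub.cfg and reboot; afterwards /proc/cmdline should contain crashkernel=64M.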
Usage
First, run the script above:
root # /etc/local.d/kdump.start
This loads the rescue kernel image, which is executed after a kernel crash.
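To confirm that both steps took effect, a minimal check can read the kernel's kexec sysfs files: /sys/kernel/kexec_crash_size reports the reserved crash memory in bytes, and /sys/kernel/kexec_crash_loaded reads 1 once a crash kernel is loaded. The function name kdump_status is hypothetical, and its optional directory argument exists only so the logic can be exercised against fake files.

```shell
#!/bin/bash
# Report whether crash memory is reserved and a crash kernel is loaded.
# $1: directory holding the kexec sysfs files (defaults to /sys/kernel).
kdump_status() {
    local sysfs="${1:-/sys/kernel}"
    local size loaded
    # Treat a missing file as "nothing reserved / nothing loaded".
    size=$(cat "${sysfs}/kexec_crash_size" 2>/dev/null || echo 0)
    loaded=$(cat "${sysfs}/kexec_crash_loaded" 2>/dev/null || echo 0)
    if [ "$size" -eq 0 ]; then
        echo "no crash memory reserved (is crashkernel= on the command line?)"
        return 1
    fi
    if [ "$loaded" -ne 1 ]; then
        echo "crash kernel not loaded (run /etc/local.d/kdump.start)"
        return 1
    fi
    echo "kdump ready: ${size} bytes reserved"
    return 0
}
```

Running kdump_status with no argument after boot checks the live system; with crashkernel=64M it should report 67108864 bytes reserved.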
Whenever a kernel panic or lockup occurs (hard or soft lockups, if the kernel is configured to detect them), the crash kernel loaded by kexec is started from the reserved area of memory; the rest of RAM is left untouched. Once the system is up, log in and copy /proc/vmcore to a file; this is your crash dump. Then reboot to return to a normal configuration; you should not keep operating the system in this state.
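The copy step can be wrapped in a small helper so each dump gets a unique, timestamped name. This is a sketch under assumptions: the function name save_vmcore and the default destination /var/crash are hypothetical. Because /proc/vmcore exposes the crashed kernel's memory, the copy is roughly as large as that system's RAM, so the destination must be a disk-backed filesystem with that much free space (a tmpfs would not survive the reboot anyway).

```shell
#!/bin/bash
# Copy the crash dump out of /proc/vmcore into a timestamped file.
# $1: dump source (defaults to /proc/vmcore)
# $2: destination directory (hypothetical default /var/crash)
save_vmcore() {
    local src="${1:-/proc/vmcore}"
    local dest_dir="${2:-/var/crash}"
    local dest
    if [ ! -r "$src" ]; then
        echo "no readable dump at ${src}; not running the crash kernel?" >&2
        return 1
    fi
    mkdir -p "$dest_dir" || return 1
    dest="${dest_dir}/vmcore-$(date +%Y%m%d-%H%M%S)"
    # The copy is roughly the size of the crashed system's RAM.
    cp "$src" "$dest" || return 1
    # Flush the dump to disk before the user reboots.
    sync
    # Print the path of the saved dump.
    echo "$dest"
}
```

In the crash kernel, running save_vmcore with no arguments copies the dump to /var/crash and prints the resulting path; reboot afterwards as described above.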