Project:Quality Assurance/Backtraces

This guide is meant to provide users with a simple explanation of why a default Gentoo installation does not provide meaningful backtraces and how to set it up to get them.

What are backtraces?
A backtrace (sometimes also called bt, trace, or stack trace) is a human-readable report of the calling stack of a program. It tells you at which point of the program you are and how you reached that point through all the calling functions, back to the program's entry point (at least in theory). Backtraces are usually analyzed with debuggers such as gdb (the GNU debugger) when error conditions such as segmentation faults or aborts are reached, to find the cause of the error.

A meaningful backtrace contains not only the shared objects where the call was generated, but also the name of the function, the filename and the line where it stopped. Unfortunately, on a system optimised for performance and disk space, backtraces are useless: they show only the pointers on the stack and a series of ?? instead of the functions' names and positions.

This guide will show how it's possible to get useful, meaningful backtraces in Gentoo, by using Portage features.

Compiler flags
By default, the compiler does not include debug information in the objects (libraries and programs) it builds, as that creates larger objects. Also, many optimisations interfere with how the debug information is saved. For these reasons, the first thing to pay attention to is that the CFLAGS are set to generate useful debug information.

The basic flag to add in this case is -g. It tells the compiler to include extra information in objects, such as filenames and line numbers. This is usually enough for basic backtraces, but the -ggdb flag adds more information. There is actually another flag, but its use is not recommended: it seems to break binary interfaces and might lead to extra crashes, and at least one package is known to break when built with it. If you want to provide as much information as possible, use the -ggdb flag.

Example of CFLAGS with debug information
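As a sketch of what such a setting might look like, assuming GCC and a current Gentoo system where the file is /etc/portage/make.conf (the exact values are only an illustration), debug information can be added to an otherwise ordinary set of flags:

```shell
# /etc/portage/make.conf (excerpt, illustrative values)
# -O2 keeps a moderate optimisation level; -ggdb adds rich debug information.
CFLAGS="-O2 -pipe -ggdb"
# Use the same flags for C++ code.
CXXFLAGS="${CFLAGS}"
```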

High optimisation levels, such as -O3, might make the backtrace less faithful, or even incorrect. Generally speaking, moderate optimisation levels can be used safely to get an approximate backtrace, down to the function called and the area of the source file where the crash happened. For more precise backtraces, use a lower optimisation level instead.

Note for x86 architecture users: x86 users frequently have -fomit-frame-pointer in their CFLAGS. The x86 architecture has a limited set of general-purpose registers, and this flag makes an extra register available, which improves performance. However, there is a cost: it makes it impossible for gdb to "walk the stack", in other words to generate a backtrace reliably. Remove this flag from CFLAGS to build something easier for gdb to understand. Most other platforms do not have to worry: either they generally don't set -fomit-frame-pointer anyway, or the code generated with it does not confuse gdb (in which case the flag is already enabled by the regular optimisation levels).
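If your flags are long, a small bash helper such as the hypothetical drop_omit_frame_pointer below can print them with -fomit-frame-pointer removed, ready to be pasted back into CFLAGS. It is only a sketch: it splits the string on whitespace and leaves every other flag untouched.

```shell
# Hypothetical helper: print a flag string with every -fomit-frame-pointer removed.
drop_omit_frame_pointer() {
  local -a flags kept=()
  local flag
  # Split the flag string on whitespace into an array.
  read -r -a flags <<< "$1"
  for flag in "${flags[@]}"; do
    # Keep every flag except -fomit-frame-pointer.
    [[ $flag == -fomit-frame-pointer ]] || kept+=("$flag")
  done
  # Print the remaining flags separated by single spaces.
  printf '%s\n' "${kept[*]}"
}
```

For example, drop_omit_frame_pointer "$(portageq envvar CFLAGS)" prints the CFLAGS that Portage currently uses, minus that flag.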

Hardened users have other things to worry about. The hardened FAQ provides the extra hints and tips you need to know.

Stripping
Just changing your CFLAGS and re-emerging world won't give you meaningful backtraces, though, because you also have to solve the stripping problem. By default, Portage strips binaries: it removes the sections that are not needed to run them, to reduce the size of the installed files. This is a good thing for an average user who doesn't need useful backtraces, but it removes all the debug information generated by the debug flags, as well as the symbol tables that are used to show even a basic backtrace in human-readable form.

There are two ways to stop stripping from interfering with debugging and useful backtraces. The first is to tell Portage not to strip binaries at all, by adding nostrip to FEATURES. This leaves the installed files exactly as the compiler and linker created them, with all the debug information and symbol tables, which increases the disk space occupied by executables and libraries. To avoid this problem, Portage version 2.0.54-r1 and the 2.1 series make it possible to use the splitdebug FEATURE instead.
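A minimal make.conf sketch for the splitdebug approach might look like the following. FEATURES is an incremental variable in Portage, so listing splitdebug here adds it to the features already enabled by the profile; the commented-out line shows the nostrip alternative.

```shell
# /etc/portage/make.conf (excerpt)
# Strip installed files, but first save their debug information to .debug files.
FEATURES="splitdebug"
# Alternative: keep every installed file unstripped (uses more disk space).
#FEATURES="nostrip"
```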

With splitdebug enabled, Portage still strips the binaries installed on the system. But before doing that, it copies all the useful debug information to a ".debug" file, which is then installed inside a separate debug directory (the complete name of the file is given by appending to that directory the path where the file is actually installed). The name of that file, together with a checksum, is then saved in the original file inside an ELF section called ".gnu_debuglink", so that gdb knows which file to load the symbols from.
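The same mechanism can be reproduced by hand with GNU binutils, which may help to see what splitdebug does. The split_debug_info function below is a hypothetical sketch, not Portage's actual implementation: it copies the debug sections into a separate file, strips the original and records the debug file's name (plus a CRC checksum) in .gnu_debuglink. gdb then looks for that file next to the binary, in a .debug subdirectory, and under its global debug directory (usually /usr/lib/debug).

```shell
# Hypothetical helper: split the debug information of an ELF file into DEBUGFILE.
split_debug_info() {
  if [ "$#" -ne 2 ]; then
    echo "usage: split_debug_info BINARY DEBUGFILE" >&2
    return 2
  fi
  local bin=$1 dbg=$2
  # Copy only the debug sections (and symbol table) into the separate file.
  objcopy --only-keep-debug "$bin" "$dbg" || return
  # Strip debug information and unneeded symbols from the original.
  strip --strip-unneeded "$bin" || return
  # Record the debug file's name and CRC in a .gnu_debuglink section.
  objcopy --add-gnu-debuglink="$dbg" "$bin"
}
```

For example, after building a test program with cc -g -o hello hello.c, running split_debug_info hello hello.debug leaves a stripped hello whose symbols gdb can still load from hello.debug in the same directory.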

Another advantage of splitdebug is that it doesn't require you to rebuild the package to get rid of the debug information. This is helpful when you build some packages with debugging enabled to get a backtrace of a single error: once it's fixed, you just need to delete the split debug files.

To be sure binaries are not stripped, you must also make sure you don't have a stripping flag (such as -s) set in your LDFLAGS: it tells the linker to strip the resulting binaries during the link phase. Using such a flag might also lead to further problems, as it doesn't respect the strip restrictions imposed by some packages that stop working when entirely stripped.

debug USE flag
Some ebuilds provide a debug USE flag. While some mistakenly use it to add debug information and to change compiler flags when it is enabled, that is not its purpose.

If you're trying to debug a reproducible crash, leave this USE flag alone, as enabling it builds different code from what you had before. It is more efficient to first get a backtrace without changing the code, by simply emitting symbol information, and only afterwards enable debug features to track the issue further down.

Debug features that are enabled by the USE flag include assertions, debug logs on screen, debug files, leak detection and extra-safe operations (such as scrubbing memory before use). Some of them might be taxing, especially for complex software or software where performance is an important issue.

For these reasons, please exercise caution when enabling the debug USE flag, and consider it only as a last resort.

Introducing gdb
Once your packages are built with debug information and are not stripped, you just need to get the backtrace. To do so, you need the gdb package, which contains the GNU debugger. After installing it, you can proceed with getting the backtrace. The simplest way to get one is to run the program from inside gdb: point gdb to the path of the program to run, give it the arguments it needs, and then run it:

Running ls through gdb

The message "Program exited normally" means that the program exited with code 0, so no errors were reached. Don't trust that too much, though, as some programs exit with status 0 even when they reach error conditions. Another common message is "Program exited with code nn", which simply tells you which non-zero status code the program returned; that might imply a handled or expected error condition. For segmentation faults and aborts, you instead get a "Program received signal SIGsomething" message.

A program can receive a signal for many different reasons. SIGSEGV and SIGABRT (segmentation fault and abort, respectively) usually mean the code is doing something wrong, like making an invalid system call or trying to access memory through a broken pointer. Other common signals are SIGTERM, SIGQUIT and SIGINT (the latter is received when CTRL-C is pressed, and is usually caught by gdb instead of reaching the program).

Finally, there is the series of "Real-Time events". They are named SIGnn, with nn being a number greater than 31. The pthread implementation usually uses them to synchronise the different threads of the program, so they don't represent error conditions of any sort. It's easy to provide meaningless backtraces by confusing the Real-Time signals with error conditions. To prevent this, you can tell gdb not to stop the program when they are received, and instead pass them directly to the program, as in the following example.

Running xine-ui through gdb, ignoring real-time signals.

The handle command tells gdb what it should do when the given signal is sent to the program; in this case the flags are nostop (don't stop the program and return control to the debugger), noprint (don't bother printing the reception of such a signal), noignore (don't ignore the signal; ignoring signals is dangerous, as it means discarding them without passing them to the program) and pass (pass the signal to the debugged program).
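A small bash wrapper can save typing the handle commands every time. The gdb_run function below is a hypothetical sketch: it assumes the real-time signals to let through are SIG32 and SIG33 (the ones commonly used by glibc's thread implementation; extend the list if your program uses others), and it leaves you at the gdb prompt once the program stops.

```shell
# Hypothetical wrapper: run PROGRAM [ARGS...] under gdb, passing the
# real-time signals SIG32 and SIG33 straight to the program.
gdb_run() {
  if [ "$#" -lt 1 ]; then
    echo "usage: gdb_run PROGRAM [ARGS...]" >&2
    return 2
  fi
  # nostop/noprint/noignore/pass: don't stop, don't report, deliver the signal.
  gdb -q \
    -ex 'handle SIG32 nostop noprint noignore pass' \
    -ex 'handle SIG33 nostop noprint noignore pass' \
    -ex run \
    --args "$@"
}
```

For example, gdb_run xine starts xine-ui under gdb; when it crashes, you can ask for the backtrace at the (gdb) prompt.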

Once gdb lets any Real-Time events through, try to reproduce the crash you want to report. If you can reproduce it systematically, this is quite easy. When gdb tells you that the program received the SIGSEGV or SIGABRT signal (or whatever other signal represents the error condition for the program), you have to actually ask for the backtrace, and possibly save it somewhere. The basic command to do that is bt, short for backtrace, which shows the backtrace of the current thread (if the program is single-threaded, there's only one thread).

An alternative command that gives a more detailed backtrace is bt full. It also shows the parameters and local variables of each function in the call chain (when they are available and have not been removed by optimisations). This makes the trace longer but also more useful when trying to find, for example, why a pointer is uninitialised.

Nowadays even simple programs are often written with multiple threads. This makes the output of a simple bt, albeit meaningful, quite useless, as it might show the state of a thread other than the one in which the signal was raised, or other than the one where the error condition manifested (in case another thread is responsible for raising signals). For this reason, you should instead get the trace with the longer command thread apply all bt full, which tells the debugger to print the full backtrace of all the threads currently running.

If the backtrace is short, it's easy to copy and paste it out of the terminal (unless the failure happens on a terminal without X), but sometimes it's too long to be copied easily because it spans multiple pages. To get the backtrace into a file that you can attach to a bug, you can use gdb's logging feature:

Using logging feature to save the backtrace to file
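One way to use it is to keep the logging commands in a small command file and load it with gdb's source command after the crash. The names backtrace.gdb and backtrace.log are arbitrary choices for this sketch. Recent gdb releases prefer the spelling set logging enabled on; the older set logging on used here still works but is deprecated there.

```shell
# Write a gdb command file that saves the full backtrace of every thread
# into backtrace.log (both file names are arbitrary).
cat > backtrace.gdb <<'EOF'
set logging file backtrace.log
set logging overwrite on
set logging on
thread apply all bt full
set logging off
EOF
```

After the crash, type source backtrace.gdb at the (gdb) prompt and attach the resulting backtrace.log to the bug.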

Now you can get the backtrace in the file, and just send it via email or attach that file to the related bug.

Core dumps
Sometimes crashes are difficult to reproduce, the program is heavily threaded, it's too slow to run in gdb, or it misbehaves when run through it (it shouldn't surprise anybody that some bugs appear only when running inside the debugger). In these cases, one tool comes in useful: the core dump.

A core dump is a file that contains the whole memory area of a program at the time it crashed. Using that file, it's possible to extract the stack backtrace even if the program crashed outside gdb, provided core dumps are enabled. By default core dumps are not enabled on Gentoo Linux (they are, however, enabled by default on Gentoo/FreeBSD), so you have to enable them.

Core dump files are generated directly by the kernel; for this reason, the kernel needs to have the feature enabled at build time. While all the default configurations enable core dump files, if you're running an embedded kernel, or you have otherwise configured the standard kernel features, you should verify the following options:

Kernel options to enable core dumps
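To check an existing configuration, the hypothetical check_coredump_config helper below looks for the two Kconfig symbols involved on recent kernels: CONFIG_COREDUMP (core dump support in general) and CONFIG_ELF_CORE (ELF core dumps). It is a sketch that only inspects a kernel .config file; it does not query the running kernel.

```shell
# Hypothetical helper: report whether a kernel .config (file argument or stdin)
# enables core dumps. Returns 0 if both options are set to y, 1 otherwise.
check_coredump_config() {
  local cfg opt rc=0
  # Read the whole configuration from the file argument, or stdin if none.
  cfg=$(cat -- "${1:-/dev/stdin}") || return 2
  for opt in CONFIG_COREDUMP CONFIG_ELF_CORE; do
    if grep -q "^${opt}=y\$" <<< "$cfg"; then
      echo "${opt}=y"
    else
      echo "${opt} is not set"
      rc=1
    fi
  done
  return "$rc"
}
```

For example, run check_coredump_config /usr/src/linux/.config, or zcat /proc/config.gz | check_coredump_config when the running kernel exposes its configuration.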

Core dumps can be enabled at the system level or at the shell session level. In the first case, everything in the system that crashes and does not already have a crash handler (see below for more notes about KDE's crash handler) will dump core. When enabled at the shell session level, only the programs started from that session will leave behind a dump.

To enable core dumps at the system level, you have to edit either the PAM limits configuration (if you're using PAM, as is the default) or the login limits file used on systems without PAM. In the first case, you must define a limit (whether hard or, most commonly, soft; for core files, that might be anywhere from 0 to no limit). In the latter case, you just need to set the variable C to the size limit of a core file (here there's no "unlimited").

Example of rule to get unlimited core files when using PAM
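A sketch of such a rule, assuming the default pam_limits configuration file /etc/security/limits.conf, could be appended as root like this:

```shell
# Run as root: let every user ("*") create core files of any size.
# limits.conf line format: <domain> <type> <item> <value>
cat >> /etc/security/limits.conf <<'EOF'
*               soft    core            unlimited
EOF
```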

Example of rule to get core files up to 20MB when not using PAM
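Without PAM, a sketch assuming the limits(5) format read by shadow's login, where the file is /etc/limits and the C limit is given in KB (20 MB = 20 × 1024 KB = 20480 KB), could look like this:

```shell
# Run as root, only on systems that do not use PAM: allow every user ("*")
# core files of up to 20 MB; the C limit is expressed in KB.
cat >> /etc/limits <<'EOF'
* C20480
EOF
```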

To enable core files in a single shell session, you can use the ulimit command with the -c option. 0 means disabled; any other positive number is the maximum size in KB of the generated core file, while unlimited simply removes the limit on core file size. From that point on, all the programs that exit because of a signal like SIGABRT or SIGSEGV will leave behind a core file that might be called either "core" or "core.pid" (where pid is replaced with the actual PID of the program that died).

Example of ulimit use
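A minimal sketch for the current bash session follows. Instead of asking for unlimited directly, it raises the soft limit up to the hard limit, which avoids an error where an administrator has set a lower hard limit; on most systems the hard limit is unlimited, so the effect is the same as ulimit -c unlimited. The last command shows the kernel's naming pattern for core files: a pattern beginning with | means cores are piped to a helper program rather than written as plain core files.

```shell
# Raise the soft core-file limit of this shell (inherited by every program
# started from it) as far as the hard limit allows.
ulimit -S -c "$(ulimit -H -c)"
# Print the limit now in effect: "unlimited", or a size in KB (0 disables cores).
ulimit -c
# Show how the kernel names (or where it sends) core files.
cat /proc/sys/kernel/core_pattern
```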

Once you have a core dump, you can run gdb on it, specifying both the path to the executable that generated the core dump (it has to be the exact same binary, so if you recompile, the core dump becomes useless) and the path to the core file. Once gdb has opened it, you can follow the same instructions given above, as if the program had just received the signal that killed it.

Starting gdb on a core file

Alternatively, you can use gdb's command-line capabilities to get the backtrace without entering interactive mode. This also makes it easier to save the backtrace to a file or to send it through a pipe. The trick lies in gdb's batch mode and in its option for running commands given on the command line. You can use the following bash function to print the full backtrace of a core dump (including all threads) to the standard output stream.

Function to get the whole backtrace out of a core dump
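A sketch of such a function follows; the name gdb_backtrace is hypothetical. It relies on gdb's --batch option (exit once the commands given on the command line have run) and -ex option (run one command), and passes the executable and the core file as positional arguments, the usual gdb EXECUTABLE CORE form.

```shell
# Hypothetical helper: print the full backtrace of every thread recorded in
# CORE. EXECUTABLE must be the exact binary that produced the core dump.
gdb_backtrace() {
  if [ "$#" -ne 2 ]; then
    echo "usage: gdb_backtrace EXECUTABLE CORE" >&2
    return 2
  fi
  local exe=$1 core=$2 f
  for f in "$exe" "$core"; do
    if [ ! -r "$f" ]; then
      echo "gdb_backtrace: cannot read $f" >&2
      return 1
    fi
  done
  # -q: no banner; --batch: exit after the -ex command has run.
  gdb -q --batch -ex 'thread apply all bt full' "$exe" "$core"
}
```

For example, gdb_backtrace /usr/bin/foo core.1234 > backtrace.txt saves the output for a bug report (both paths are illustrative).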

KDE crash handler's notes
KDE-based applications run by default with their own crash handler, which is presented to the user by means of "Dr. Konqi", if it's installed. This crash handler shows the user an informative dialog saying that the program has crashed. The dialog has a "Backtrace" tab that, when loaded, calls gdb to load the data and generate the full backtrace on the user's behalf, showing it in the main text box and allowing it to be saved directly to a file. That backtrace is usually good enough for reporting a problem.

When drkonqi is not installed, crashes won't generate a core dump either, and the user receives no information by default. To avoid this, KDE-based applications accept a command-line argument that disables the crash handler entirely and leaves the signals to be handled by the operating system as usual. This is useful to generate core files when drkonqi is not available or when you want to inspect stack frames by hand.

Acknowledgements
We would like to thank the following authors and editors for their contributions to this guide:


 * Diego E. Pettenò
 * Ned Ludd
 * Kevin Quinn
 * Donnie Berkholz