Project Talk:Quality Assurance/Backtraces

From Gentoo Wiki

Core dump naming and save path

Talk status
This discussion is still ongoing.

Written:

«…a core file that might be called either "core" or "core.pid"
(where pid is replaced with the actual pid of the program that died).»

It should read something like:

By default, the core dump file is written to the current working directory of the crashed process (often, but not always, the $HOME of the user who ran the program). The name of the core dump file is controlled by the "kernel.core_pattern" sysctl setting, whose default value is "core". Remember: to make a change permanent, put your value into /etc/sysctl.conf.

The second notable setting, "kernel.core_uses_pid", is the most obvious (but not the only) way to add a pid suffix. It is disabled by default, so if a process crashes several times in the same directory (see bug #504760 for an example), you will see only the last core dump.
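To see what a machine currently uses, the two settings can be read straight from /proc. The following is only a minimal sketch: show_core_settings is a hypothetical helper name, and its optional directory argument is an assumption added so the function can be pointed somewhere other than /proc/sys/kernel.

```shell
#!/bin/sh
# show_core_settings [DIR]
# Hypothetical helper: prints the two settings discussed above in the
# "name = value" form that sysctl uses. DIR defaults to /proc/sys/kernel.
show_core_settings() {
    dir="${1:-/proc/sys/kernel}"
    for name in core_pattern core_uses_pid; do
        if [ -r "$dir/$name" ]; then
            printf 'kernel.%s = %s\n' "$name" "$(cat "$dir/$name")"
        else
            printf 'kernel.%s = (not readable)\n' "$name"
        fi
    done
}

show_core_settings
```

Lines in the same "name = value" form are what goes into /etc/sysctl.conf, and running sysctl -p as root should load that file without a reboot.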

There is another, better way to customize the core dump file name.

I prefer the following value, which not only puts the name of the crashed program and the signal that killed it into the file name, but also writes cores into a dedicated directory. This saves searching for cores and also catches crashes that would otherwise be missed (again, see and maybe follow bug #504760):

kernel.core_pattern = /tmp/cores/core_%e-%s.%p
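To illustrate what such a pattern turns into, here is a rough sketch that imitates the substitution for the three specifiers used above (%e for the program name, %s for the signal number, %p for the pid). expand_core_pattern is a hypothetical helper written for this example; the kernel does the real expansion and supports more specifiers than these.

```shell
#!/bin/sh
# expand_core_pattern PATTERN EXE SIGNAL PID
# Hypothetical helper: imitates how %e, %s and %p are filled in.
# Only these three specifiers are handled, and values containing
# '|', '&' or '\' would confuse sed.
expand_core_pattern() {
    printf '%s\n' "$1" | sed -e "s|%e|$2|g" -e "s|%s|$3|g" -e "s|%p|$4|g"
}

# prints /tmp/cores/core_vim-6.1234
expand_core_pattern '/tmp/cores/core_%e-%s.%p' vim 6 1234
```

So a vim process with pid 1234 that is killed by signal 6 would be expected to leave /tmp/cores/core_vim-6.1234.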

See man 5 core for a detailed explanation and alternatives. The dedicated directory for cores must be fully accessible to everybody (if the user has no write access to the current directory, the core dump will probably not be written). To create it in /tmp I use the following script:

/etc/local.d/mk_core_dir.start 
#!/bin/sh
#
mkdir -m 0777 /tmp/cores

You may want to use a permanent directory instead, for example /var/tmp/cores.
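If the directory may already exist (as a permanent one would after the first boot), a plain mkdir fails. A possible variant that also repairs the mode is sketched below; make_core_dir is a hypothetical helper name, and the default path is simply the one used above.

```shell
#!/bin/sh
# make_core_dir [DIR]
# Hypothetical helper: create DIR with mode 0777, or reset the mode
# if DIR is already there. DIR defaults to /tmp/cores.
make_core_dir() {
    dir="${1:-/tmp/cores}"
    if [ -d "$dir" ]; then
        chmod 0777 "$dir"
    else
        mkdir -m 0777 "$dir"
    fi
}

# Example (as root): make_core_dir /var/tmp/cores
```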

Alternatively, you can manage the directory using the utilities from sys-apps/opentmpfiles.

After configuring, you will probably want to verify that it works. To do so, just run a program (for example, a text editor), terminate it with a signal that writes a core, for example 6 (see man 1 kill and man 7 signal for details and alternatives), and look at the resulting core file.
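The check can also be done with a throwaway process instead of an editor. This is a sketch under assumptions: trigger_test_abort is a hypothetical helper, sleep stands in for the program, and signal 6 is sent by its name ABRT. Whether a core is actually written still depends on the core size limit and on the core_pattern in effect.

```shell
#!/bin/sh
# trigger_test_abort
# Hypothetical helper: starts a throwaway process, kills it with
# SIGABRT (signal 6 on Linux) and prints the exit status the shell saw.
trigger_test_abort() {
    # Try to lift the core size limit for this shell; this may be refused.
    ulimit -c unlimited 2>/dev/null || true
    sleep 60 >/dev/null 2>&1 &
    pid=$!
    kill -s ABRT "$pid"
    status=0
    wait "$pid" || status=$?
    echo "$status"
}

trigger_test_abort
```

A status of 134 (128 + 6) means the process died from signal 6. With the pattern suggested above, ls -l /tmp/cores should then list a file named like core_sleep-6.<pid>.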

While performing this check, I was surprised to find that the dedicated core dump directory needs the execute bit (needed for cd?).

--Anarchist 2014

I have emailed the QA team. Hopefully they can look this over and (potentially) fix up the article... --Maffblaster (talk) 09:16, 5 March 2017 (UTC)

systemd/coredumpctl

Talk status
This discussion is still ongoing.

If anyone gets around to doing it: some instructions and details on how to configure and use systemd's coredump handling and coredumpctl would fit in nicely here.

--Eliasp (talk) 18:07, 29 December 2014 (UTC)

Getting useful backtraces for X errors [Please add this as a section]

Talk status
This discussion is still ongoing.

The problem with X errors is that the program using X likely just exits normally or with an exit code, so it is not obvious how to get a real backtrace in GDB. A number of additional steps have to be performed to achieve this.

Also, a table of X error codes and request codes might come in handy.

As usual, make sure your program contains debug symbols:
CODE
CFLAGS="-Og -march=$YOUR_ARCH -pipe -ggdb" CXXFLAGS="$CFLAGS" FEATURES="nostrip" emerge --oneshot --nodeps $YOUR_PACKAGE

(--nodeps is used to save time. It is assumed the package has been installed normally, and does not lack any dependencies.)

Run gdb as follows:

(Using hexchat as an example.)

CODE
someuser@somemachine ~ $ export GDK_SYNCHRONIZE=1
someuser@somemachine ~ $ gdb hexchat
[...]
(gdb) break _XError
(gdb) run --sync
Starting program: /usr/bin/hexchat --sync
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7fffe35d1700 (LWP 4563)]

(hexchat:4553): Gdk-WARNING **: GdkWindow 0x56000ed unexpectedly destroyed

Thread 1 "hexchat" hit Breakpoint 1, _XError (dpy=dpy@entry=0x73f080, rep=rep@entry=0x14e0b70)
    at /var/tmp/portage/x11-libs/libX11-1.6.3/work/libX11-1.6.3/src/XlibInt.c:1396
1396    /var/tmp/portage/x11-libs/libX11-1.6.3/work/libX11-1.6.3/src/XlibInt.c: File or directory not found.
(gdb) set logging file backtrace.log
(gdb) set logging on
Copying output to backtrace.log.
(gdb) thread apply all bt full

Thread 2 (Thread 0x7fffe35d1700 (LWP 5197)):
#0  0x00007ffff55ae80d in poll () from /lib64/libc.so.6
No symbol table info available.
#1  0x00007fffe8e2558b in ?? () from /usr/lib64/libpulse.so.0
No symbol table info available.
[...]

(gdb) set logging off
Done logging to backtrace.log.
(gdb) quit
A debugging session is active.

        Inferior 1 [process 32061] will be killed.

Quit anyway? (y or n) y
someuser@somemachine ~ $

Setting a breakpoint at _XError is the key!
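The same session could probably be scripted so that it does not have to be typed each time. The sketch below writes a gdb command file with the commands from the session above. write_xerror_gdb_script is a hypothetical helper name, and the "set breakpoint pending on" line is an addition (an assumption that gdb might otherwise ask for confirmation while libX11 is not loaded yet).

```shell
#!/bin/sh
# write_xerror_gdb_script FILE [LOGFILE]
# Hypothetical helper: writes a gdb command file that repeats the
# interactive session above. LOGFILE defaults to backtrace.log.
write_xerror_gdb_script() {
    cat > "$1" <<EOF
set breakpoint pending on
break _XError
run --sync
set logging file ${2:-backtrace.log}
set logging on
thread apply all bt full
set logging off
EOF
}

# Example:
#   write_xerror_gdb_script xerror.gdb
#   GDK_SYNCHRONIZE=1 gdb -batch -x xerror.gdb hexchat
```

In batch mode gdb exits once the command file is finished, so the closing "quit" and its confirmation are not needed.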

As you can see, the backtrace might not be particularly useful yet, because the libraries have not been compiled with debug information.

The simplest way to solve that is to find the packages those libraries belong to (e.g. for libpulse):

CODE
someuser@somemachine ~ $ equery belongs /lib64/libc.so.6 /usr/lib64/libpulse.so.0 [...]
sys-libs/glibc-2.23-r2 (/lib64/libc.so.6 -> libc-2.23.so)
sys-libs/glibc-2.23-r2 (/lib64/libc-2.23.so)
media-sound/pulseaudio-8.0 (/usr/lib64/libpulse.so.0.19.0)
media-sound/pulseaudio-8.0 (/usr/lib64/libpulse.so.0 -> libpulse.so.0.19.0)
[...]

and then re-emerging all those libraries with debug information too:

CODE
CFLAGS="-Og -march=$YOUR_ARCH -pipe -ggdb" CXXFLAGS="$CFLAGS" FEATURES="nostrip" emerge --oneshot --nodeps =sys-libs/glibc-2.23-r2 =media-sound/pulseaudio-8.0  [...]

Of course, splitdebug can be used too.

Now re-running the gdb session above will result in a useful stack trace in the backtrace.log file, which can be used in bug reports.
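The list of libraries to hand to equery belongs does not have to be collected by eye. As a sketch, assuming frames look like the ones in the log above ("... from /path/to/lib"), the distinct library paths can be pulled out of backtrace.log. libs_from_backtrace is a hypothetical helper name.

```shell
#!/bin/sh
# libs_from_backtrace LOGFILE
# Hypothetical helper: prints each distinct library path found in
# frames of the form "... from /path/to/lib.so".
libs_from_backtrace() {
    sed -n 's|.* from \(/[^ ]*\)$|\1|p' "$1" | sort -u
}

# Example: equery belongs $(libs_from_backtrace backtrace.log)
```

For the two frames shown earlier this would yield /lib64/libc.so.6 and /usr/lib64/libpulse.so.0.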