Project:Toolchain/SFrame

From Gentoo Wiki

Motivation

Users want to be able to trace and profile applications. Frame pointers are often advocated as the way to obtain backtraces because they don't require debug information, are fast to unwind with, and are (somewhat) reliable. But building with -fno-omit-frame-pointer costs the compiler a general-purpose register (GPR) and may cause more spills to the stack.

Distributions have come under pressure to go against the GCC and Clang default (-fomit-frame-pointer when optimizing) to facilitate profiling and, in some cases, debugging. Notably, Fedora and Ubuntu changed their defaults a few years ago.

Florian Weimer recently found that even with -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer, unwinding can be defeated by optimisations like shrink-wrapping (which GCC 16 trunk will do more often on x86-64).

Users are generally more willing to accept slightly increased disk usage than decreased runtime performance for a use case they may not care about.

SFrames provide the minimal information needed for fast unwinding, in a compact representation, without costing a GPR.

Patches

GNU Binutils

Modern versions of sys-devel/binutils already support SFrames, but sys-devel/binutils-2.45, to be released on 2025-07-27, will add support for relocatable links, which is useful for the kernel.

Indu Bhagat has additional patches currently being upstreamed on a branch (the aim is to get them all in for 2.45).

For multilib (32-bit x86 builds on amd64), it can be awkward to pass flags per ABI, so the following patch makes gas only warn (rather than error) when -Wa,--gsframe is used on a target without SFrame support:

FILE binutils-warn-on-no-sframe.patch
--- a/gas/dw2gencfi.c
+++ b/gas/dw2gencfi.c
@@ -2617,8 +2617,10 @@ cfi_finish (void)
 				    alignment);
 	  output_sframe (sframe_seg);
 	}
-      else
-	as_bad (_(".sframe not supported for target"));
+      else {
+	as_warn (_(".sframe not supported for target"));
+	return;
+      }
     }
 
   if ((all_cfi_sections & CFI_EMIT_debug_frame) != 0)

Bugs

Kernel

The patches are available combined at https://github.com/thesamesam/linux/tree/sframe-combined. A hacked-up sys-kernel/vanilla-kernel with the patches applied is available in sam's overlay.

perf

dev-util/perf will be one of the main consumers of SFrames.

The patches are available combined at https://github.com/thesamesam/linux/tree/sframe-combined. A hacked-up dev-util/perf with the patches applied is available in sam's overlay.

glibc

glibc needs a way to unwind for the purposes of backtrace() (and some other more complex uses).

sys-libs/glibc-2.42 will be released on 2025-08-01. The plan is to get the unwinder changes in for that.

Impact

TODO: disk space measurements

Testing

To check whether perf is asking the kernel for a deferred callchain (as suggested by perf maintainer Namhyung Kim):

user $ perf record -g -vv true |& grep defer
  defer_callchain                  1
  defer_callchain                  1
  defer_callchain                  1
  defer_callchain                  1
  defer_callchain                  1
  defer_callchain                  1
  defer_callchain                  1
  defer_callchain                  1

As long as "switching off deferred callchain support" doesn't appear, it should be fine.
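
The check above can also be scripted. The following is a minimal sketch, not part of perf itself: the function name deferred_callchain_ok is hypothetical, and the only strings it relies on are the "defer_callchain" attribute line and the "switching off deferred callchain support" message quoted above.

```shell
# Hypothetical helper (not part of perf): reads `perf record -g -vv` output
# on stdin. Succeeds only if perf requested a deferred callchain
# (defer_callchain set to 1) and never reported switching the feature off.
deferred_callchain_ok() {
    awk '
        /switching off deferred callchain support/ { off = 1 }
        $1 == "defer_callchain" && $2 == "1"        { asked = 1 }
        END { exit (asked && !off) ? 0 : 1 }
    '
}
```

It could be used as `perf record -g -vv true 2>&1 | deferred_callchain_ok && echo "deferred callchain requested"`, where `2>&1 |` is the POSIX spelling of bash's `|&`.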

To check whether the kernel is actually providing a deferred callchain:

user $ grep -A5 CALLCHAIN_DEFERRED
82795152792808 0x8a0 [0x38]: PERF_RECORD_CALLCHAIN_DEFERRED(IP, 0x2): 2554639/2554639: 0
... FP chain: nr:0
 ... thread: true:2554639
 ...... dso: <not found>

0x8d8@perf.data [0x48]: event: 9
[...]
82795153058898 0xe10 [0x50]: PERF_RECORD_CALLCHAIN_DEFERRED(IP, 0x2): 2554639/2554639: 0
... FP chain: nr:0
 ... thread: true:2554639
 ...... dso: /usr/lib64/ld-linux-x86-64.so.2

0xe60@perf.data [0x30]: event: 4
--
  CALLCHAIN_DEFERRED events:          8  (25.0%)
      FINISHED_ROUND events:          1  ( 3.1%)
            ID_INDEX events:          1  ( 3.1%)
          THREAD_MAP events:          1  ( 3.1%)
             CPU_MAP events:          1  ( 3.1%)
        EVENT_UPDATE events:          1  ( 3.1%)
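
For scripted checks, the event summary at the end of that output is the easiest part to parse. The sketch below assumes the same dump text that the grep above filters is piped in, and that the summary line has the "CALLCHAIN_DEFERRED events: N" shape shown above; the function name deferred_event_count is hypothetical.

```shell
# Hypothetical helper: reads the dump on stdin and prints the number of
# CALLCHAIN_DEFERRED events from the summary line, or 0 if there is none.
deferred_event_count() {
    awk '
        $1 == "CALLCHAIN_DEFERRED" && $2 == "events:" { n = $3 }
        END { print n + 0 }
    '
}
```

A non-zero count suggests the kernel is delivering deferred callchains; a count of 0 suggests it is not.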

Another example, with it working, looks like:

user $ perf record -g -- perf bench sched messaging
user $ perf report -s dso,sym -g none | grep -F -e Children -e '[.]' | head
Warning:
1630 out of order events recorded.
# Children      Self  Shared Object         Symbol
    12.36%    12.36%  libc.so.6             [.] __cxa_finalize
     2.71%     2.71%  ld-linux-x86-64.so.2  [.] do_lookup_x
     1.35%     1.35%  ld-linux-x86-64.so.2  [.] _dl_lookup_symbol_x
     1.00%     1.00%  ld-linux-x86-64.so.2  [.] _dl_relocate_object_no_relro
     0.64%     0.64%  perf                  [.] receiver
     0.62%     0.62%  perf                  [.] bench_sched_messaging
     0.58%     0.58%  libc.so.6             [.] __syscall_cancel
     0.53%     0.53%  libc.so.6             [.] __run_exit_handlers
     0.51%     0.51%  libc.so.6             [.] cfree@GLIBC_2.2.5

And with it broken, it looks like:

user $ perf record -g -- perf bench sched messaging
user $ perf report -s dso,sym -g none | grep -F -e Children -e '[.]' | head
# Children      Self  Shared Object         Symbol
    32.54%    29.70%  libc.so.6             [.] __cxa_finalize
    31.99%     0.00%  libstdc++.so.6.0.34   [.] __gxx_personality_v0
    25.81%     0.00%  libc.so.6             [.] __libc_start_call_main
    25.81%     0.00%  perf                  [.] 0x000055871450e0ec
    25.81%     0.00%  perf                  [.] 0x000055871459545e
    25.80%     0.00%  perf                  [.] 0x000055871459513e
    25.80%     0.00%  perf                  [.] 0x000055871460324c
    19.42%     0.00%  [unknown]             [.] 0000000000000000
    18.83%     0.53%  libc.so.6             [.] __syscall_cancel
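
One rough way to tell the two cases apart automatically is to count report rows whose symbol is a bare address, as in the broken output above. This is only a heuristic derived from the two samples on this page, not an official perf check, and the function name count_unresolved is hypothetical.

```shell
# Hypothetical heuristic: reads `perf report` output on stdin and prints how
# many user-space rows ("[.]") end in a bare hex address of at least 10
# characters instead of a symbol name.
count_unresolved() {
    awk '
        index($0, "[.]") && $NF ~ /^(0x)?[0-9a-f]+$/ && length($NF) >= 10 { n++ }
        END { print n + 0 }
    '
}
```

For example, `perf report -s dso,sym -g none | count_unresolved` prints 0 for the working output shown earlier, while the broken output yields several hits.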

External links