Stack smashing debugging guide
This is a step-by-step guide to debug stack smashing violations.
Symptoms
The stack corruption always looks the same:
user $ some-command
...
*** stack smashing detected ***: terminated
This message comes from the check that -fstack-protector inserts into the compiled code.
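Whether the protector was active for a given build can be checked from inside the code. The following is a minimal sketch, assuming GCC or Clang, which predefine the __SSP__, __SSP_STRONG__ and __SSP_ALL__ macros (GCC additionally has __SSP_EXPLICIT__); other compilers may define none of them. The function name ssp_mode is hypothetical and only used for illustration.

```c
/* Report which stack protector mode this translation unit was built with.
 * Assumption: the compiler predefines the GCC-style __SSP*__ macros.
 * A compiler that does not know them will always report "off". */
const char *ssp_mode(void) {
#if defined(__SSP_ALL__)
    return "all";       /* -fstack-protector-all */
#elif defined(__SSP_STRONG__)
    return "strong";    /* -fstack-protector-strong */
#elif defined(__SSP_EXPLICIT__)
    return "explicit";  /* -fstack-protector-explicit */
#elif defined(__SSP__)
    return "basic";     /* -fstack-protector */
#else
    return "off";       /* no protector, or macros not provided */
#endif
}
```

Printing the result of ssp_mode() from a test program built with the package's own flags shows whether a missing "stack smashing detected" message means "no bug" or merely "no protector".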
TL;DR:
- Enable debugging symbols.
- Rebuild the executable as non-PIE (not position independent) to make addresses reproducible:
  user $ LDFLAGS=-no-pie emerge -v1 foo-package
- Enable core dump generation with:
  user $ ulimit -c unlimited
- Identify the function whose stack is corrupted.
- Find where the stack canary is stored on the stack.
- Add a gdb watchpoint and find out where the canary is overwritten.
Practical example
To get some hands-on experience, this article now explores a simple runnable toy example:
#include <stdio.h>

// $ gcc a.c -o a
// $ ./a 1 2 3 4 5 6 7 8
// *** stack smashing detected ***: terminated
int main(int argc, char * argv[]) {
    volatile long v[8];
    v[argc] = 42;

    printf("Hello! Is my stack OK?\n");

    return v[argc+1];
}
user $ gcc a.c -o a
user $ ./a 1 2 3 4 5 6 7 8
Hello! Is my stack OK?
*** stack smashing detected ***: terminated
To make addresses stable across invocations, PIE should be disabled for testing purposes. While at it, debugging info should be enabled:
user $ gcc a.c -o a -no-pie -ggdb3
user $ ./a 1 2 3 4 5 6 7 8
Hello! Is my stack OK?
*** stack smashing detected ***: terminated
Next, enable core dumps to get a first look at where the crash happens:
user $ ulimit -c unlimited
user $ ./a 1 2 3 4 5 6 7 8
Hello! Is my stack OK?
*** stack smashing detected ***: terminated
Aborted (core dumped)
The failure itself is unchanged; the only new output is the core dump notice. The bug did not disappear. Good!
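ulimit -c unlimited only affects the current shell and the processes started from it. When the failing program is launched some other way, roughly the same effect can be had from inside the process. The following is a minimal sketch, assuming a POSIX system; enable_core_dumps is a hypothetical helper name. Unlike the shell command it raises the soft limit only up to the existing hard limit, and whether a core file actually appears still depends on system settings such as the kernel's core pattern.

```c
#include <sys/resource.h>

/* Raise the soft core file size limit to the hard limit for this process
 * and the children it starts afterwards.
 * Returns 0 on success, -1 on failure (errno is set by the failing call). */
int enable_core_dumps(void) {
    struct rlimit lim;

    if (getrlimit(RLIMIT_CORE, &lim) != 0)
        return -1;

    /* An unprivileged process may raise its soft limit up to the hard limit. */
    lim.rlim_cur = lim.rlim_max;
    return setrlimit(RLIMIT_CORE, &lim);
}
```

Calling enable_core_dumps() early in main, before the suspect code runs, is enough for the abort triggered by the stack protector to be eligible for a core dump.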
Next, a backtrace should be obtained to see which function the failure happened in:
user $ gdb --quiet ./a core.1117780
Reading symbols from ./a...
[New LWP 1117780]
Core was generated by `./a 1 2 3 4 5 6 7 8'.
Program terminated with signal SIGABRT, Aborted.
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50 return ret;
(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007fcbdeed6537 in __GI_abort () at abort.c:79
#2 0x00007fcbdef2f1d9 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7fcbdf036c2f "*** %s ***: terminated\n") at ../sysdeps/posix/libc_fatal.c:155
#3 0x00007fcbdefbd0a2 in __GI___fortify_fail (msg=msg@entry=0x7fcbdf036c17 "stack smashing detected") at fortify_fail.c:26
#4 0x00007fcbdefbd080 in __stack_chk_fail () at stack_chk_fail.c:24
#5 0x000000000040118f in main (argc=9, argv=0x7fff257458f8) at a.c:13
The interesting part here is the caller of __stack_chk_fail. In this case, it's main.
Now the hardest part: finding, in the assembly, where the canary value is stored to and loaded from the stack.
The canary is a value placed on the stack at function entry and checked by compiler-generated code before the function returns, to see whether anything has overwritten it. Don't panic. There isn't really much assembly knowledge needed to find the canary!
The compiler emits the canary when one of the -fstack-protector options is enabled. Gentoo builds gcc with the --enable-default-ssp configure option, which enables this by default. The canary handling is visible in the disassembly of main (frame 5 of the backtrace):
user $ gdb --quiet ./a core.1117780
(gdb) frame 5
#5 0x000000000040118f in main (argc=9, argv=0x7fff257458f8) at a.c:13
13 }
(gdb) disassemble
Dump of assembler code for function main:
0x0000000000401136 <+0>: push %rbp
0x0000000000401137 <+1>: mov %rsp,%rbp
0x000000000040113a <+4>: sub $0x60,%rsp
0x000000000040113e <+8>: mov %edi,-0x54(%rbp)
0x0000000000401141 <+11>: mov %rsi,-0x60(%rbp)
0x0000000000401145 <+15>: mov %fs:0x28,%rax
0x000000000040114e <+24>: mov %rax,-0x8(%rbp)
0x0000000000401152 <+28>: xor %eax,%eax
0x0000000000401154 <+30>: mov -0x54(%rbp),%eax
0x0000000000401157 <+33>: cltq
0x0000000000401159 <+35>: movq $0x2a,-0x50(%rbp,%rax,8)
0x0000000000401162 <+44>: lea 0xe9b(%rip),%rdi # 0x402004
0x0000000000401169 <+51>: callq 0x401030 <puts@plt>
0x000000000040116e <+56>: mov -0x54(%rbp),%eax
0x0000000000401171 <+59>: add $0x1,%eax
0x0000000000401174 <+62>: cltq
0x0000000000401176 <+64>: mov -0x50(%rbp,%rax,8),%rax
0x000000000040117b <+69>: mov -0x8(%rbp),%rdx
0x000000000040117f <+73>: sub %fs:0x28,%rdx
0x0000000000401188 <+82>: je 0x40118f <main+89>
0x000000000040118a <+84>: callq 0x401040 <__stack_chk_fail@plt>
=> 0x000000000040118f <+89>: leaveq
0x0000000000401190 <+90>: retq
End of assembler dump.
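On amd64, the canary's reference value is read from thread-local storage at %fs:0x28. The key step is to find where that value is stored on the stack; the store usually sits right next to the %fs:0x28 load. In this case the function prologue loads, stores, and clears the value in a sequence of 3 instructions, and the epilogue checks it again: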
0x0000000000401136 <+0>: push %rbp
0x0000000000401137 <+1>: mov %rsp,%rbp
0x000000000040113a <+4>: sub $0x60,%rsp
0x000000000040113e <+8>: mov %edi,-0x54(%rbp)
0x0000000000401141 <+11>: mov %rsi,-0x60(%rbp)
0x0000000000401145 <+15>: mov %fs:0x28,%rax ; read value from TLS: rax = %fs:0x28
0x000000000040114e <+24>: mov %rax,-0x8(%rbp) ; store canary on stack: [%rbp - 8] = rax
0x0000000000401152 <+28>: xor %eax,%eax ; erase canary from registers: rax = 0
...
0x000000000040117b <+69>: mov -0x8(%rbp),%rdx ; load value from stack
0x000000000040117f <+73>: sub %fs:0x28,%rdx ; compare value with TLS value
0x0000000000401188 <+82>: je 0x40118f <main+89> ; skip the failure call if the values match
0x000000000040118a <+84>: callq 0x401040 <__stack_chk_fail@plt>
=> 0x000000000040118f <+89>: leaveq
0x0000000000401190 <+90>: retq
Here, the important bit is the exact instruction where the canary is stored on the stack: 0x000000000040114e <+24>: mov %rax,-0x8(%rbp) and erased from the registers: 0x0000000000401152 <+28>: xor %eax,%eax.
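The prologue and epilogue above can be modelled in plain C to make the mechanism easier to follow. This is only a sketch under stated assumptions: the frame is represented as an array of 8-byte slots laid out like the listing (v at -0x50(%rbp), canary at -0x8(%rbp)), the guard constant is made up and stands in for the per-thread value at %fs:0x28, and the names frame_enter, frame_store and frame_check are hypothetical. Real protector code is emitted by the compiler, not written by hand.

```c
#include <stddef.h>

#define FRAME_SLOTS 10  /* 0x50 bytes / 8: v[0..7], one unused slot, canary */
#define CANARY_SLOT 9   /* models -0x8(%rbp) */

/* Made-up stand-in for the guard value the real code reads from %fs:0x28. */
static const unsigned long guard = 0xa4f1c200UL;

/* Prologue: copy the guard into the frame (mov %rax,-0x8(%rbp)). */
void frame_enter(unsigned long frame[FRAME_SLOTS]) {
    frame[CANARY_SLOT] = guard;
}

/* Function body: the store v[index] = value from the toy example.
 * The range check only keeps this model itself memory-safe. */
void frame_store(unsigned long frame[FRAME_SLOTS], size_t index,
                 unsigned long value) {
    if (index < FRAME_SLOTS)
        frame[index] = value;
}

/* Epilogue: returns 1 if the canary is intact, 0 where the real code
 * would call __stack_chk_fail. */
int frame_check(const unsigned long frame[FRAME_SLOTS]) {
    return frame[CANARY_SLOT] == guard;
}
```

In this model, a store to index 3 leaves frame_check() returning 1, while a store to index 9 makes it return 0, which corresponds to the path that reaches __stack_chk_fail in the listing.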
The task is to step just past the store instruction, set a watchpoint on the canary's stack slot, and wait until it gets overwritten.
Here is the full session from start to finish to find the corruption:
user $ gdb --quiet --args ./a 1 2 3 4 5 6 7 8
Reading symbols from ./a...
(gdb) start
Temporary breakpoint 1 at 0x401145: file a.c, line 6.
Starting program: /tmp/a 1 2 3 4 5 6 7 8
Temporary breakpoint 1, main (argc=9, argv=0x7fffffffd7f8) at a.c:6
6 int main(int argc, char * argv[]) {
(gdb) break *0x0000000000401152
Breakpoint 2 at 0x401152: file a.c, line 6.
(gdb) continue
Continuing.
Breakpoint 2, 0x0000000000401152 in main (argc=9, argv=0x7fffffffd7f8) at a.c:6
6 int main(int argc, char * argv[]) {
(gdb) watch *(long*)($rbp-8)
Watchpoint 3: *(long*)($rbp-8)
(gdb) continue
Continuing.
Watchpoint 3: *(long*)($rbp-8)
Old value = -6583947134921550848
New value = 42
main (argc=9, argv=0x7fffffffd7f8) at a.c:10
10 printf("Hello! Is my stack OK?\n");
(gdb) list
5 // *** stack smashing detected ***: terminated
6 int main(int argc, char * argv[]) {
7 volatile long v[8];
8 v[argc] = 42;
9
10 printf("Hello! Is my stack OK?\n");
11
12 return v[argc+1];
13 }
(gdb) disassemble /s
Dump of assembler code for function main:
a.c:
6 int main(int argc, char * argv[]) {
0x0000000000401136 <+0>: push %rbp
0x0000000000401137 <+1>: mov %rsp,%rbp
0x000000000040113a <+4>: sub $0x60,%rsp
0x000000000040113e <+8>: mov %edi,-0x54(%rbp)
0x0000000000401141 <+11>: mov %rsi,-0x60(%rbp)
0x0000000000401145 <+15>: mov %fs:0x28,%rax
0x000000000040114e <+24>: mov %rax,-0x8(%rbp)
0x0000000000401152 <+28>: xor %eax,%eax
7 volatile long v[8];
8 v[argc] = 42;
0x0000000000401154 <+30>: mov -0x54(%rbp),%eax
0x0000000000401157 <+33>: cltq
0x0000000000401159 <+35>: movq $0x2a,-0x50(%rbp,%rax,8)
9
10 printf("Hello! Is my stack OK?\n");
=> 0x0000000000401162 <+44>: lea 0xe9b(%rip),%rdi # 0x402004
0x0000000000401169 <+51>: callq 0x401030 <puts@plt>
11
12 return v[argc+1];
0x000000000040116e <+56>: mov -0x54(%rbp),%eax
0x0000000000401171 <+59>: add $0x1,%eax
0x0000000000401174 <+62>: cltq
0x0000000000401176 <+64>: mov -0x50(%rbp,%rax,8),%rax
13 }
0x000000000040117b <+69>: mov -0x8(%rbp),%rdx
0x000000000040117f <+73>: sub %fs:0x28,%rdx
0x0000000000401188 <+82>: je 0x40118f <main+89>
0x000000000040118a <+84>: callq 0x401040 <__stack_chk_fail@plt>
0x000000000040118f <+89>: leaveq
0x0000000000401190 <+90>: retq
End of assembler dump.
Sequence of gdb commands used, explained:
- start: run the program and pause at the beginning of the main function
- break *0x0000000000401152: set the next stop at the instruction right after the canary store
- continue: resume the program until it stops again; it should stop at address 0x0000000000401152
- watch *(long*)($rbp-8): watch the canary's stack slot for changes
- continue: resume until the watchpoint triggers on a write
- disassemble /s: get the disassembly interspersed with source code
The instruction preceding the current instruction (marked as =>) is the offender:
v[argc] = 42;
0x0000000000401159 <+35>: movq $0x2a,-0x50(%rbp,%rax,8)
=> 0x0000000000401162 <+44>: lea 0xe9b(%rip),%rdi # 0x402004
Thus v[argc] = 42; is the problematic source line (0x2a is 42).
Now, as an exercise, more debugging can be carried out to understand the nature of the overflow.
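As a starting point, the addressing mode of the offending store, -0x50(%rbp,%rax,8), already says which index reaches the canary at -0x8(%rbp). The arithmetic can be sketched as below; v_offset and canary_index are hypothetical helper names, and the constants are taken from this particular disassembly, so they may differ for another compiler or set of flags.

```c
/* Offset from %rbp of v[index], following -0x50(%rbp,%rax,8):
 * base -0x50, scaled by 8 bytes per long. */
long v_offset(long index) {
    return -0x50 + 8 * index;
}

/* Index whose slot coincides with the canary stored at -0x8(%rbp). */
long canary_index(void) {
    return (-0x8 - (-0x50)) / 8;
}
```

With these numbers, v[0] through v[7] occupy -0x50 to -0x18, v[8] falls on the slot at -0x10, and v[9] is exactly the canary slot. The run with eight arguments has argc equal to 9, which is why v[argc] = 42 overwrites the canary; a run with seven arguments would presumably hit only the slot at -0x10 and go unnoticed by the check.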
Links
- ARM64 debugging example: https://bugs.gentoo.org/721570#c7
- MIPS debugging example: https://trofi.github.io/posts/205-stack-protection-on-mips64.html