Stack smashing debugging guide

From Gentoo Wiki

This is a step-by-step guide to debugging stack smashing violations.

Symptoms

A stack corruption caught by the stack protector always produces the same message:

user $some-command
...
*** stack smashing detected ***: terminated

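This message is printed by glibc's __stack_chk_fail() when the stack canary inserted by -fstack-protector (or one of its variants) no longer holds its original value at function exit.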

TL;DR:

  1. Enable debugging symbols
  2. Rebuild the executable as non-PIE (not position independent) to make addresses reproducible:
    user $LDFLAGS=-no-pie emerge -v1 foo-package
  3. Enable core dump generation with:
    user $ulimit -c unlimited
  4. Identify the function where the stack is corrupted.
  5. Find where the stack canary is stored on the stack.
  6. Add a gdb watchpoint and find out where the canary is overwritten.

Practical example

To get some hands-on experience, this article now explores a simple runnable toy example:

CODE a.c
#include <stdio.h>

// $ gcc a.c -o a
// $ ./a 1 2 3 4 5 6 7 8
// *** stack smashing detected ***: terminated
int main(int argc, char * argv[]) {
    volatile long v[8];
    v[argc] = 42;

    printf("Hello! Is my stack OK?\n");

    return v[argc+1];
}
user $gcc a.c -o a
user $./a 1 2 3 4 5 6 7 8
Hello! Is my stack OK?
*** stack smashing detected ***: terminated

To make addresses stable across invocations, PIE should be disabled for testing purposes. While at it, debugging info should be enabled:

user $gcc a.c -o a -no-pie -ggdb3
user $./a 1 2 3 4 5 6 7 8
Hello! Is my stack OK?
*** stack smashing detected ***: terminated

Next, enable core dumps to get a first look at the approximate location of the crash:

user $ulimit -c unlimited
user $./a 1 2 3 4 5 6 7 8
Hello! Is my stack OK?
*** stack smashing detected ***: terminated
Aborted (core dumped)

The stack smashing message is still there, so the bug survived the rebuild. Good!

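Next, load the core file into gdb and obtain a backtrace to see which function the failure happened in. The core file name (core.1117780 here) depends on the PID and on the kernel.core_pattern setting, so it will differ on your system; systems using systemd-coredump store cores elsewhere, and coredumpctl debug opens the most recent one in gdb: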

user $gdb --quiet ./a core.1117780
Reading symbols from ./a...
[New LWP 1117780]
Core was generated by `./a 1 2 3 4 5 6 7 8'.
Program terminated with signal SIGABRT, Aborted.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50        return ret;
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007fcbdeed6537 in __GI_abort () at abort.c:79
#2  0x00007fcbdef2f1d9 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7fcbdf036c2f "*** %s ***: terminated\n")
    at ../sysdeps/posix/libc_fatal.c:155
#3  0x00007fcbdefbd0a2 in __GI___fortify_fail (msg=msg@entry=0x7fcbdf036c17 "stack smashing detected") at fortify_fail.c:26
#4  0x00007fcbdefbd080 in __stack_chk_fail () at stack_chk_fail.c:24
#5  0x000000000040118f in main (argc=9, argv=0x7fff257458f8) at a.c:13

The interesting part here is the caller of __stack_chk_fail. In this case, it's main.

Now comes the hardest part: finding in the assembly where the canary value is stored on the stack and where it is loaded back.

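The canary is a value that the function prologue places on the stack and that compiler-generated code in the epilogue checks before the function returns; if the value has changed, something overwrote that part of the stack. Don't panic: not much assembly knowledge is needed to find the canary!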

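The compiler emits the canary when one of the -fstack-protector, -fstack-protector-strong or -fstack-protector-all options is in effect. Gentoo builds gcc with the --enable-default-ssp configure option, which turns on -fstack-protector-strong by default, so even the plain gcc a.c -o a build above is protected. Disassembling main from the core file shows the canary code: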

user $gdb --quiet ./a core.1117780
(gdb) frame 5
#5  0x000000000040118f in main (argc=9, argv=0x7fff257458f8) at a.c:13
13      }
(gdb) disassemble
Dump of assembler code for function main:
   0x0000000000401136 <+0>:     push   %rbp
   0x0000000000401137 <+1>:     mov    %rsp,%rbp
   0x000000000040113a <+4>:     sub    $0x60,%rsp
   0x000000000040113e <+8>:     mov    %edi,-0x54(%rbp)
   0x0000000000401141 <+11>:    mov    %rsi,-0x60(%rbp)
   0x0000000000401145 <+15>:    mov    %fs:0x28,%rax
   0x000000000040114e <+24>:    mov    %rax,-0x8(%rbp)
   0x0000000000401152 <+28>:    xor    %eax,%eax
   0x0000000000401154 <+30>:    mov    -0x54(%rbp),%eax
   0x0000000000401157 <+33>:    cltq
   0x0000000000401159 <+35>:    movq   $0x2a,-0x50(%rbp,%rax,8)
   0x0000000000401162 <+44>:    lea    0xe9b(%rip),%rdi        # 0x402004
   0x0000000000401169 <+51>:    callq  0x401030 <puts@plt>
   0x000000000040116e <+56>:    mov    -0x54(%rbp),%eax
   0x0000000000401171 <+59>:    add    $0x1,%eax
   0x0000000000401174 <+62>:    cltq
   0x0000000000401176 <+64>:    mov    -0x50(%rbp,%rax,8),%rax
   0x000000000040117b <+69>:    mov    -0x8(%rbp),%rdx
   0x000000000040117f <+73>:    sub    %fs:0x28,%rdx
   0x0000000000401188 <+82>:    je     0x40118f <main+89>
   0x000000000040118a <+84>:    callq  0x401040 <__stack_chk_fail@plt>
=> 0x000000000040118f <+89>:    leaveq
   0x0000000000401190 <+90>:    retq
End of assembler dump.

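On amd64, the reference value lives in thread-local storage (TLS) at %fs:0x28. What matters is where the function stores its copy on the stack; the store instruction usually comes right after the %fs:0x28 load. In this case, the prologue uses a sequence of 3 instructions, and the epilogue later reloads the copy and compares it with %fs:0x28: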

CODE gdb.asm
   0x0000000000401136 <+0>:     push   %rbp
   0x0000000000401137 <+1>:     mov    %rsp,%rbp
   0x000000000040113a <+4>:     sub    $0x60,%rsp
   0x000000000040113e <+8>:     mov    %edi,-0x54(%rbp)
   0x0000000000401141 <+11>:    mov    %rsi,-0x60(%rbp)
   0x0000000000401145 <+15>:    mov    %fs:0x28,%rax          ; read value from TLS: rax = %fs:0x28
   0x000000000040114e <+24>:    mov    %rax,-0x8(%rbp)        ; store canary on stack: [%rbp - 8] = rax
   0x0000000000401152 <+28>:    xor    %eax,%eax              ; erase canary from the register: rax = 0
   ...
   0x000000000040117b <+69>:    mov    -0x8(%rbp),%rdx        ; load canary from stack: rdx = [%rbp - 8]
   0x000000000040117f <+73>:    sub    %fs:0x28,%rdx          ; compare with TLS value: rdx = 0 if unchanged
   0x0000000000401188 <+82>:    je     0x40118f <main+89>     ; values match: skip the failure call
   0x000000000040118a <+84>:    callq  0x401040 <__stack_chk_fail@plt>   ; values differ: abort
=> 0x000000000040118f <+89>:    leaveq
   0x0000000000401190 <+90>:    retq

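Here, the important bits are the exact instruction that stores the canary on the stack (0x000000000040114e <+24>: mov %rax,-0x8(%rbp)) and the one right after it that erases the canary from %rax (0x0000000000401152 <+28>: xor %eax,%eax). Once execution reaches the latter, the canary slot at $rbp-8 holds its final value.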

The task is to stop right after the store instruction, set a watchpoint on the canary's stack slot, and let the program run until something overwrites it.

Here is the full session from start to finish to find the corruption:

user $gdb --quiet --args ./a 1 2 3 4 5 6 7 8
Reading symbols from ./a...

(gdb) start
Temporary breakpoint 1 at 0x401145: file a.c, line 6.
Starting program: /tmp/a 1 2 3 4 5 6 7 8

Temporary breakpoint 1, main (argc=9, argv=0x7fffffffd7f8) at a.c:6
6       int main(int argc, char * argv[]) {

(gdb) break *0x0000000000401152
Breakpoint 2 at 0x401152: file a.c, line 6.
(gdb) continue
Continuing.

Breakpoint 2, 0x0000000000401152 in main (argc=9, argv=0x7fffffffd7f8) at a.c:6
6       int main(int argc, char * argv[]) {

(gdb) watch *(long*)($rbp-8)
Watchpoint 3: *(long*)($rbp-8)

(gdb) continue
Continuing.

Watchpoint 3: *(long*)($rbp-8)

Old value = -6583947134921550848
New value = 42
main (argc=9, argv=0x7fffffffd7f8) at a.c:10
10          printf("Hello! Is my stack OK?\n");

(gdb) list
5       // *** stack smashing detected ***: terminated
6       int main(int argc, char * argv[]) {
7           volatile long v[8];
8           v[argc] = 42;
9
10          printf("Hello! Is my stack OK?\n");
11
12          return v[argc+1];
13      }

(gdb) disassemble /s
Dump of assembler code for function main:
a.c:
6       int main(int argc, char * argv[]) {
   0x0000000000401136 <+0>:     push   %rbp
   0x0000000000401137 <+1>:     mov    %rsp,%rbp
   0x000000000040113a <+4>:     sub    $0x60,%rsp
   0x000000000040113e <+8>:     mov    %edi,-0x54(%rbp)
   0x0000000000401141 <+11>:    mov    %rsi,-0x60(%rbp)
   0x0000000000401145 <+15>:    mov    %fs:0x28,%rax
   0x000000000040114e <+24>:    mov    %rax,-0x8(%rbp)
   0x0000000000401152 <+28>:    xor    %eax,%eax

7           volatile long v[8];
8           v[argc] = 42;
   0x0000000000401154 <+30>:    mov    -0x54(%rbp),%eax
   0x0000000000401157 <+33>:    cltq
   0x0000000000401159 <+35>:    movq   $0x2a,-0x50(%rbp,%rax,8)

9
10          printf("Hello! Is my stack OK?\n");
=> 0x0000000000401162 <+44>:    lea    0xe9b(%rip),%rdi        # 0x402004
   0x0000000000401169 <+51>:    callq  0x401030 <puts@plt>

11
12          return v[argc+1];
   0x000000000040116e <+56>:    mov    -0x54(%rbp),%eax
   0x0000000000401171 <+59>:    add    $0x1,%eax
   0x0000000000401174 <+62>:    cltq
   0x0000000000401176 <+64>:    mov    -0x50(%rbp,%rax,8),%rax

13      }
   0x000000000040117b <+69>:    mov    -0x8(%rbp),%rdx
   0x000000000040117f <+73>:    sub    %fs:0x28,%rdx
   0x0000000000401188 <+82>:    je     0x40118f <main+89>
   0x000000000040118a <+84>:    callq  0x401040 <__stack_chk_fail@plt>
   0x000000000040118f <+89>:    leaveq
   0x0000000000401190 <+90>:    retq
End of assembler dump.

The gdb commands used, explained:

  1. start: start the program and pause at the beginning of the main function
  2. break: set the next stop at the instruction right after the canary store
  3. continue: resume the program until it stops again; it should stop at address 0x0000000000401152
  4. watch: stop whenever the value of the given expression changes (watch -l instead evaluates the expression once and watches the memory at the resulting address)
  5. continue: resume the program until the watchpoint triggers on a write
  6. list: show the source lines around the current position
  7. disassemble /s: show the disassembly interspersed with the source code

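A watchpoint triggers only after the write has completed, so the program stops on the instruction following the store. The instruction preceding the current one (marked with =>) is therefore the offender: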

CODE gdb.asm
8           v[argc] = 42;
   0x0000000000401159 <+35>:    movq   $0x2a,-0x50(%rbp,%rax,8)
10          printf("Hello! Is my stack OK?\n");
=> 0x0000000000401162 <+44>:    lea    0xe9b(%rip),%rdi        # 0x402004

Thus v[argc] = 42; is the problematic source line (0x2a is 42).

Now, as an exercise, more debugging can be carried out to understand the nature of the overflow.

Links