Stack smashing debugging guide

From Gentoo Wiki

This is a step-by-step guide to debugging stack smashing violations.

Symptoms

A stack smashing violation always produces the same message:

user $some-command
...
*** stack smashing detected ***: terminated

TL;DR:

  1. Enable debugging symbols in your make.conf:
    CODE /etc/portage/make.conf
    CFLAGS="... -ggdb"
    CXXFLAGS="... -ggdb"
    FEATURES="... splitdebug"
    
  2. Disable position-independent executables (PIE) to make addresses reproducible, and rebuild the problematic package:
    user $LDFLAGS=-no-pie emerge -v1 foo-package
  3. Enable core dump generation with
    user $ulimit -c unlimited
    and find the function in which the stack is corrupted.
  4. Find where the stack canary is stored on the stack.
  5. Add a gdb watchpoint and find where the canary is overwritten.

Practical example

To get some hands-on experience, let's explore a runnable toy example:

CODE a.c
#include <stdio.h>

// $ gcc a.c -o a
// $ ./a 1 2 3 4 5 6 7 8
// *** stack smashing detected ***: terminated
int main(int argc, char * argv[]) {
    volatile long v[8];
    v[argc] = 42;

    printf("Hello! Is my stack OK?\n");

    return v[argc+1];
}

user $gcc a.c -o a
user $./a 1 2 3 4 5 6 7 8
Hello! Is my stack OK?
*** stack smashing detected ***: terminated

To make addresses stable across invocations let's disable PIE. While at it let's also enable debugging info:

user $gcc a.c -o a -no-pie -ggdb3
user $./a 1 2 3 4 5 6 7 8
Hello! Is my stack OK?
*** stack smashing detected ***: terminated

Now let's enable core dumps to find the approximate location of the crash:

user $ulimit -c unlimited
user $./a 1 2 3 4 5 6 7 8
Hello! Is my stack OK?
*** stack smashing detected ***: terminated
Aborted (core dumped)

The bug still reproduces, and this time a core file is written. Good!

Now let's get a backtrace to see which function the failure happened in:

user $gdb --quiet ./a core.1117780
Reading symbols from ./a...
[New LWP 1117780]
Core was generated by `./a 1 2 3 4 5 6 7 8'.
Program terminated with signal SIGABRT, Aborted.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50        return ret;
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007fcbdeed6537 in __GI_abort () at abort.c:79
#2  0x00007fcbdef2f1d9 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7fcbdf036c2f "*** %s ***: terminated\n")
    at ../sysdeps/posix/libc_fatal.c:155
#3  0x00007fcbdefbd0a2 in __GI___fortify_fail (msg=msg@entry=0x7fcbdf036c17 "stack smashing detected") at fortify_fail.c:26
#4  0x00007fcbdefbd080 in __stack_chk_fail () at stack_chk_fail.c:24
#5  0x000000000040118f in main (argc=9, argv=0x7fff257458f8) at a.c:13

The interesting part here is the caller of __stack_chk_fail. In our case it's main.

Now the hardest part: we need to find in the assembly where the canary value is stored on the stack and where it is loaded back. A canary is a value the compiler places on the stack at function entry and checks before the function returns to see whether anything has corrupted it. Don't panic. You don't need to know much assembly to find the canary. The compiler emits the canary when one of the -fstack-protector options is enabled. Gentoo builds gcc with the --enable-default-ssp configure option, which turns this protection on by default.

user $gdb --quiet ./a core.1117780
(gdb) frame 5
#5  0x000000000040118f in main (argc=9, argv=0x7fff257458f8) at a.c:13
13      }
(gdb) disassemble
Dump of assembler code for function main:
   0x0000000000401136 <+0>:     push   %rbp
   0x0000000000401137 <+1>:     mov    %rsp,%rbp
   0x000000000040113a <+4>:     sub    $0x60,%rsp
   0x000000000040113e <+8>:     mov    %edi,-0x54(%rbp)
   0x0000000000401141 <+11>:    mov    %rsi,-0x60(%rbp)
   0x0000000000401145 <+15>:    mov    %fs:0x28,%rax
   0x000000000040114e <+24>:    mov    %rax,-0x8(%rbp)
   0x0000000000401152 <+28>:    xor    %eax,%eax
   0x0000000000401154 <+30>:    mov    -0x54(%rbp),%eax
   0x0000000000401157 <+33>:    cltq
   0x0000000000401159 <+35>:    movq   $0x2a,-0x50(%rbp,%rax,8)
   0x0000000000401162 <+44>:    lea    0xe9b(%rip),%rdi        # 0x402004
   0x0000000000401169 <+51>:    callq  0x401030 <puts@plt>
   0x000000000040116e <+56>:    mov    -0x54(%rbp),%eax
   0x0000000000401171 <+59>:    add    $0x1,%eax
   0x0000000000401174 <+62>:    cltq
   0x0000000000401176 <+64>:    mov    -0x50(%rbp,%rax,8),%rax
   0x000000000040117b <+69>:    mov    -0x8(%rbp),%rdx
   0x000000000040117f <+73>:    sub    %fs:0x28,%rdx
   0x0000000000401188 <+82>:    je     0x40118f <main+89>
   0x000000000040118a <+84>:    callq  0x401040 <__stack_chk_fail@plt>
=> 0x000000000040118f <+89>:    leaveq
   0x0000000000401190 <+90>:    retq
End of assembler dump.

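On amd64 the reference canary value is read from thread-local storage at %fs:0x28. We need to track where it is stored on the stack. The store is always very close to the instruction that reads %fs:0x28. In our case it is a sequence of 3 instructions at function entry, with the matching check right before the function returns: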

CODE gdb.asm
   0x0000000000401136 <+0>:     push   %rbp
   0x0000000000401137 <+1>:     mov    %rsp,%rbp
   0x000000000040113a <+4>:     sub    $0x60,%rsp
   0x000000000040113e <+8>:     mov    %edi,-0x54(%rbp)
   0x0000000000401141 <+11>:    mov    %rsi,-0x60(%rbp)
   0x0000000000401145 <+15>:    mov    %fs:0x28,%rax          ; read value from TLS: rax = %fs:0x28
   0x000000000040114e <+24>:    mov    %rax,-0x8(%rbp)        ; store canary on stack: [%rbp - 8] = rax
   0x0000000000401152 <+28>:    xor    %eax,%eax              ; erase canary from registers: rax = 0
   ...
   0x000000000040117b <+69>:    mov    -0x8(%rbp),%rdx        ; load value from stack
   0x000000000040117f <+73>:    sub    %fs:0x28,%rdx          ; compare value with TLS value
   0x0000000000401188 <+82>:    je     0x40118f <main+89>     ; skip the failure call if values match
   0x000000000040118a <+84>:    callq  0x401040 <__stack_chk_fail@plt>
=> 0x000000000040118f <+89>:    leaveq
   0x0000000000401190 <+90>:    retq

The important bits here are the exact instruction where the canary is stored on the stack, 0x000000000040114e <+24>: mov %rax,-0x8(%rbp), and the one where it is erased from registers, 0x0000000000401152 <+28>: xor %eax,%eax. Our task is to stop right after the store instruction, set a watchpoint on the canary and wait until it gets corrupted.

Here is the full session from start to finish to find our corruption:

user $gdb --quiet --args ./a 1 2 3 4 5 6 7 8
Reading symbols from ./a...

(gdb) start
Temporary breakpoint 1 at 0x401145: file a.c, line 6.
Starting program: /tmp/a 1 2 3 4 5 6 7 8

Temporary breakpoint 1, main (argc=9, argv=0x7fffffffd7f8) at a.c:6
6       int main(int argc, char * argv[]) {

(gdb) break *0x0000000000401152
Breakpoint 2 at 0x401152: file a.c, line 6.
(gdb) continue
Continuing.

Breakpoint 2, 0x0000000000401152 in main (argc=9, argv=0x7fffffffd7f8) at a.c:6
6       int main(int argc, char * argv[]) {

(gdb) watch *(long*)($rbp-8)
Watchpoint 3: *(long*)($rbp-8)

(gdb) continue
Continuing.

Watchpoint 3: *(long*)($rbp-8)

Old value = -6583947134921550848
New value = 42
main (argc=9, argv=0x7fffffffd7f8) at a.c:10
10          printf("Hello! Is my stack OK?\n");

(gdb) list
5       // *** stack smashing detected ***: terminated
6       int main(int argc, char * argv[]) {
7           volatile long v[8];
8           v[argc] = 42;
9
10          printf("Hello! Is my stack OK?\n");
11
12          return v[argc+1];
13      }

(gdb) disassemble /s
Dump of assembler code for function main:
a.c:
6       int main(int argc, char * argv[]) {
   0x0000000000401136 <+0>:     push   %rbp
   0x0000000000401137 <+1>:     mov    %rsp,%rbp
   0x000000000040113a <+4>:     sub    $0x60,%rsp
   0x000000000040113e <+8>:     mov    %edi,-0x54(%rbp)
   0x0000000000401141 <+11>:    mov    %rsi,-0x60(%rbp)
   0x0000000000401145 <+15>:    mov    %fs:0x28,%rax
   0x000000000040114e <+24>:    mov    %rax,-0x8(%rbp)
   0x0000000000401152 <+28>:    xor    %eax,%eax

7           volatile long v[8];
8           v[argc] = 42;
   0x0000000000401154 <+30>:    mov    -0x54(%rbp),%eax
   0x0000000000401157 <+33>:    cltq
   0x0000000000401159 <+35>:    movq   $0x2a,-0x50(%rbp,%rax,8)

9
10          printf("Hello! Is my stack OK?\n");
=> 0x0000000000401162 <+44>:    lea    0xe9b(%rip),%rdi        # 0x402004
   0x0000000000401169 <+51>:    callq  0x401030 <puts@plt>

11
12          return v[argc+1];
   0x000000000040116e <+56>:    mov    -0x54(%rbp),%eax
   0x0000000000401171 <+59>:    add    $0x1,%eax
   0x0000000000401174 <+62>:    cltq
   0x0000000000401176 <+64>:    mov    -0x50(%rbp,%rax,8),%rax

13      }
   0x000000000040117b <+69>:    mov    -0x8(%rbp),%rdx
   0x000000000040117f <+73>:    sub    %fs:0x28,%rdx
   0x0000000000401188 <+82>:    je     0x40118f <main+89>
   0x000000000040118a <+84>:    callq  0x401040 <__stack_chk_fail@plt>
   0x000000000040118f <+89>:    leaveq
   0x0000000000401190 <+90>:    retq
End of assembler dump.

The gdb commands used, in order:

  1. start: start the program and pause at the beginning of the main function.
  2. break: set the next stop at the instruction right after the canary store.
  3. continue: resume the program until it stops again. It should stop at our 0x0000000000401152 address.
  4. watch: watch for changes to the memory at the specified address.
  5. continue: resume until the watchpoint triggers on a write.
  6. list: show the source around the current line.
  7. disassemble /s: get the disassembly interspersed with source code.

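A watchpoint triggers after the write has completed, so gdb stops on the following instruction. The instruction preceding the current one (marked with =>) is therefore our offender: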

CODE gdb.asm
v[argc] = 42;
    0x0000000000401159 <+35>:    movq   $0x2a,-0x50(%rbp,%rax,8)
 => 0x0000000000401162 <+44>:    lea    0xe9b(%rip),%rdi        # 0x402004

Thus v[argc] = 42; is our problematic source line (0x2a is 42).

Now you can add more debugging to understand the nature of the overflow.

Links