GDB – Debugging stripped binaries

Posted by Félix on 2012-08-13

A few days ago, I had a discussion with a colleague about how to debug a stripped binary on Linux with GDB.
Yesterday, I also read an article by a former colleague from EPITA on debugging with the dmesg command.
I therefore decided to write my own article, in which I demonstrate how to use GDB on a stripped binary.

Test program

First of all, here is the small C program we will be working on:

#include <stdio.h>

__attribute__ ((noinline)) void fun(int test)
{
  printf("value: %d\n", test);
}

int main(void)
{
  int v = 21;
  fun(v);
  return 0;
}

Notice that we used a GCC attribute, noinline, to prevent the compiler from inlining fun. This keeps a real call to fun in main, even at -O3.

Symbols

GCC and symbols

When compiling a program, a compiler such as GCC adds symbols to the binary to help the developer during debugging. There are several types of symbols, but explaining them is beyond the scope of this article.

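Contrary to popular belief, GCC writes a symbol table into the binary even in release mode (here with the -O3 switch and without -g). These are not debugging symbols, as GDB points out below, but they are enough to set a breakpoint on a function by name. That's why, even with a release binary, you can do this: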

$ gcc -O3 -m32 test.c
$ gdb a.out
[...]
Reading symbols from /home/felix/test/a.out...(no debugging symbols found)...done.
(gdb) b main
Breakpoint 1 at 0x8048453

Listing symbols

We can use the nm command to list all symbols in our binary:

$ nm a.out
08049f28 d _DYNAMIC
08049ff4 d _GLOBAL_OFFSET_TABLE_
0804853c R _IO_stdin_used
         w _Jv_RegisterClasses
[...]
08048470 T __libc_csu_init
         U __libc_start_main@@GLIBC_2.0
         U __printf_chk@@GLIBC_2.3.4
[...]
080483e0 t frame_dummy
08048410 T fun
08048440 T main

Symbols and debugging

With this information, GDB knows the bounds of every function, so we can ask for the assembly code of any of them. For example, nm told us there is a function named fun, so we can disassemble it:

(gdb) disas fun
Dump of assembler code for function fun:
0x08048410 <+0>: push %ebp
0x08048411 <+1>: mov %esp,%ebp
0x08048413 <+3>: sub $0x18,%esp
0x08048416 <+6>: mov 0x8(%ebp),%eax
0x08048419 <+9>: movl $0x8048530,0x4(%esp)
0x08048421 <+17>: movl $0x1,(%esp)
0x08048428 <+24>: mov %eax,0x8(%esp)
0x0804842c <+28>: call 0x8048340 <__printf_chk@plt>
0x08048431 <+33>: leave
0x08048432 <+34>: ret
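To make "knowing the bounds" concrete: each function entry in the ELF symbol table records a start address (st_value) and a size (st_size), which is enough to resolve a name to an address (b main) or an address back to its enclosing function (disas). The sketch below is a simplified model under that assumption, not GDB's implementation; struct func_symbol, lookup_by_name and lookup_by_address are hypothetical names, and the sizes used afterwards are inferred from the disassembly above (fun runs from 0x08048410 through the ret at 0x08048432, i.e. 0x23 bytes).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified view of an ELF function symbol:
 * a name, a start address (st_value) and a size (st_size). */
struct func_symbol {
    const char *name;
    uint32_t start;
    uint32_t size;
};

/* Name -> symbol, roughly what "b main" needs. Returns NULL when the
 * name is unknown, e.g. once the binary has been stripped. */
const struct func_symbol *lookup_by_name(const struct func_symbol *syms,
                                         size_t count, const char *name)
{
    for (size_t i = 0; i < count; i++)
        if (strcmp(syms[i].name, name) == 0)
            return &syms[i];
    return NULL;
}

/* Address -> enclosing function, roughly what "disas" needs:
 * start <= pc < start + size (written to avoid overflow). */
const struct func_symbol *lookup_by_address(const struct func_symbol *syms,
                                            size_t count, uint32_t pc)
{
    for (size_t i = 0; i < count; i++)
        if (pc >= syms[i].start && pc - syms[i].start < syms[i].size)
            return &syms[i];
    return NULL;
}
```

With a table holding fun at 0x08048410 (size 0x23) and main at 0x08048440 (size 0x19), lookup_by_address(table, 2, 0x08048431) returns fun, while the entry point 0x08048350 matches nothing. With an empty table, which is all a stripped binary leaves us, every lookup returns NULL.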

Discarding symbols

Symbols can be removed from the binary using the strip command:

$ gcc -O3 -m32 test.c
$ strip -s a.out
$ nm a.out
nm: a.out: no symbols

Why strip, you may ask? The resulting file is smaller, which saves disk space and download size. Note that the symbol table is not loaded into memory when the program runs, so stripping does not make the program itself run noticeably faster. Most distributions strip the binaries they ship, which you can check yourself: run nm on /bin/* and you won't find any symbols.
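The symbol table that nm reads lives in its own ELF section of type SHT_SYMTAB, and strip -s deletes it. Assuming a Linux system providing <elf.h>, the following sketch checks for that section directly. elf_has_symtab is a hypothetical helper; it assumes the file is in the host byte order and ignores the rare extended section numbering case.

```c
#include <elf.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if the ELF file at `path` has a SHT_SYMTAB section (the symbol
 * table listed by nm and removed by strip -s), 0 if it has none, and -1 if
 * the file cannot be read or is not ELF. Assumes host byte order and
 * ignores extended section numbering (e_shnum == 0). */
int elf_has_symtab(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    unsigned char ident[EI_NIDENT];
    long shoff = 0;
    unsigned shnum = 0, shentsize = 0;
    int ok = fread(ident, 1, EI_NIDENT, f) == EI_NIDENT &&
             memcmp(ident, ELFMAG, SELFMAG) == 0 &&
             fseek(f, 0, SEEK_SET) == 0;

    /* Locate the section header table from the class-specific header. */
    if (ok && ident[EI_CLASS] == ELFCLASS64) {
        Elf64_Ehdr eh;
        ok = fread(&eh, sizeof eh, 1, f) == 1;
        if (ok) {
            shoff = (long)eh.e_shoff;
            shnum = eh.e_shnum;
            shentsize = eh.e_shentsize;
        }
    } else if (ok && ident[EI_CLASS] == ELFCLASS32) {
        Elf32_Ehdr eh;
        ok = fread(&eh, sizeof eh, 1, f) == 1;
        if (ok) {
            shoff = (long)eh.e_shoff;
            shnum = eh.e_shnum;
            shentsize = eh.e_shentsize;
        }
    } else {
        ok = 0;
    }

    int result = ok ? 0 : -1;
    /* sh_type sits at the same offset (4) in Elf32_Shdr and Elf64_Shdr. */
    for (unsigned i = 0; ok && i < shnum; i++) {
        Elf32_Word type;
        long pos = shoff + (long)i * (long)shentsize +
                   (long)offsetof(Elf32_Shdr, sh_type);
        if (fseek(f, pos, SEEK_SET) != 0 ||
            fread(&type, sizeof type, 1, f) != 1) {
            result = -1;
            break;
        }
        if (type == SHT_SYMTAB) {
            result = 1;
            break;
        }
    }
    fclose(f);
    return result;
}
```

Called from an unstripped program, elf_has_symtab("/proc/self/exe") should return 1; on the stripped a.out, or on most files in /bin, it should return 0, matching the "no symbols" message from nm.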

The problem

Okay, there are no more symbols. What does this change when using GDB?

$ gdb a.out
[...]
(gdb) b main
Function "main" not defined.
(gdb) b fun
Function "fun" not defined.

We can no longer set a breakpoint by name, not even on main.

The solution

Locating the entry point

Debugging is still possible, but it is more complicated. First, we need the memory address of the entry point:

(gdb) info file
Symbols from "a.out".
Local exec file:
`a.out', file type elf32-i386.
Entry point: 0x8048350
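The entry point shown by info file comes from the e_entry field of the ELF header. strip leaves that header in place (the program could not start without e_entry), which is why this address survives stripping. Here is a sketch that reads it directly, assuming <elf.h> and a file in the host byte order; elf_entry_point is a hypothetical name.

```c
#include <elf.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Reads e_entry from the ELF header of `path`, the value GDB prints as
 * "Entry point" in "info file". Returns 0 and stores the address in
 * *entry on success, -1 on failure. Assumes host byte order. */
int elf_entry_point(const char *path, uint64_t *entry)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    /* The identification bytes overlay the start of both header layouts. */
    union {
        unsigned char ident[EI_NIDENT];
        Elf32_Ehdr h32;
        Elf64_Ehdr h64;
    } hdr;
    size_t got = fread(&hdr, 1, sizeof hdr, f);
    fclose(f);

    if (got < EI_NIDENT || memcmp(hdr.ident, ELFMAG, SELFMAG) != 0)
        return -1;
    if (hdr.ident[EI_CLASS] == ELFCLASS32 && got >= sizeof hdr.h32) {
        *entry = hdr.h32.e_entry;
        return 0;
    }
    if (hdr.ident[EI_CLASS] == ELFCLASS64 && got >= sizeof hdr.h64) {
        *entry = hdr.h64.e_entry;
        return 0;
    }
    return -1;
}
```

On the article's 32-bit, non-PIE a.out this should yield 0x8048350. For a position-independent executable, the default on many recent distributions, e_entry is an offset from the load address, so the runtime address will differ.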

GDB can also set a breakpoint on a raw memory address, using the * prefix:

(gdb) b *0x8048350
Breakpoint 1 at 0x8048350
(gdb) run
Starting program: a.out

Breakpoint 1, 0x08048350 in ?? ()
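On x86, GDB usually implements such a software breakpoint by saving the original byte at the address and overwriting it with int3 (opcode 0xCC); executing it traps into the kernel, which stops the process and notifies the debugger. The sketch below only simulates the patch-and-restore step on a byte array standing in for the code at the entry point; a real debugger writes into the traced process's memory (with ptrace on Linux). struct breakpoint, insert_breakpoint and remove_breakpoint are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define INT3_OPCODE 0xCC /* x86 one-byte breakpoint instruction */

/* Hypothetical record of an inserted breakpoint. */
struct breakpoint {
    uint32_t address;   /* e.g. 0x8048350, the entry point */
    uint8_t saved_byte; /* original byte overwritten by int3 */
};

/* `code` simulates the `len` bytes mapped at address `base`.
 * Returns 0 on success, -1 if the address is outside that region. */
int insert_breakpoint(uint8_t *code, size_t len, uint32_t base,
                      struct breakpoint *bp)
{
    if (bp->address < base || bp->address - base >= len)
        return -1;
    size_t off = bp->address - base;
    bp->saved_byte = code[off];
    code[off] = INT3_OPCODE;
    return 0;
}

/* Restores the original byte, e.g. before resuming past the breakpoint. */
int remove_breakpoint(uint8_t *code, size_t len, uint32_t base,
                      const struct breakpoint *bp)
{
    if (bp->address < base || bp->address - base >= len)
        return -1;
    code[bp->address - base] = bp->saved_byte;
    return 0;
}
```

This is also why a breakpoint address must point to the first byte of an instruction: patching the middle of one would corrupt it.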

Disassembling code

We managed to set a breakpoint on the entry point of our binary and to reach it, but our favorite commands still give us trouble:

(gdb) disas
No function contains program counter for selected frame.
(gdb) step
Cannot find bounds of current function

Since GDB does not know where functions start and end, it cannot tell which address range to disassemble, nor where the current function ends.

Once again, we need a command that works at a lower level: examine (x).
We apply it to the address held in the program counter register ($pc) and ask for a dump of the next 14 assembly instructions:

(gdb) x/14i $pc
=> 0x8048350: xor %ebp,%ebp
0x8048352: pop %esi
0x8048353: mov %esp,%ecx
0x8048355: and $0xfffffff0,%esp
0x8048358: push %eax
0x8048359: push %esp
0x804835a: push %edx
0x804835b: push $0x80484e0
0x8048360: push $0x8048470
0x8048365: push %ecx
0x8048366: push %esi
0x8048367: push $0x8048440
0x804836c: call 0x8048330 <__libc_start_main@plt>
0x8048371: hlt

Libc initialization

Looking at this code, you might be asking yourself: "Where the hell are we?"
The C runtime has to do some initialization before calling our own main function; this is handled by the initialization routine __libc_start_main (check its prototype here).

Before calling this routine, _start pushes its arguments on the stack in reverse order, following the cdecl calling convention, so the last push before the call is the first argument. The first argument of __libc_start_main is a pointer to our main function, so we now have the memory address of our code: 0x8048440. This is the address nm reported for main earlier!
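The reasoning above can be automated for this particular _start, under an assumption that holds for the listing but not in general: the instruction right before the call to __libc_start_main is push $imm32 (opcode 0x68 followed by a 4-byte little-endian immediate). The bytes in example_start are hand-assembled from the listing using the standard i386 encodings (their lengths match the addresses shown), not dumped from the actual binary; main_from_start and example_start are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hand-assembled from the x/14i listing above (base 0x8048350). */
static const uint8_t example_start[] = {
    0x31, 0xed,                   /* xor  %ebp,%ebp */
    0x5e,                         /* pop  %esi */
    0x89, 0xe1,                   /* mov  %esp,%ecx */
    0x83, 0xe4, 0xf0,             /* and  $0xfffffff0,%esp */
    0x50, 0x54, 0x52,             /* push %eax; push %esp; push %edx */
    0x68, 0xe0, 0x84, 0x04, 0x08, /* push $0x80484e0 */
    0x68, 0x70, 0x84, 0x04, 0x08, /* push $0x8048470 */
    0x51, 0x56,                   /* push %ecx; push %esi */
    0x68, 0x40, 0x84, 0x04, 0x08, /* push $0x8048440 (main) */
    0xe8, 0xbf, 0xff, 0xff, 0xff, /* call 0x8048330 */
};

/* x86 immediates are stored little-endian. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Arguments are pushed right to left, so the first argument of
 * __libc_start_main (main's address) is the LAST push before the call.
 * Assumes that push is "push $imm32". Returns 0 on success, -1 if the
 * bytes at call_offset do not match the expected pattern. */
int main_from_start(const uint8_t *code, size_t len, size_t call_offset,
                    uint32_t *main_addr)
{
    if (call_offset < 5 || call_offset >= len ||
        code[call_offset] != 0xE8 || code[call_offset - 5] != 0x68)
        return -1;
    *main_addr = read_le32(&code[call_offset - 4]);
    return 0;
}
```

With the call at offset 0x804836c - 0x8048350 = 28, main_from_start(example_start, sizeof example_start, 28, &addr) should store 0x8048440 in addr. In GDB, x/33xb 0x8048350 shows the real bytes to compare against.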
Let's add a breakpoint on this address, continue and disassemble the code:

(gdb) b *0x8048440
Breakpoint 2 at 0x8048440
(gdb) c
Continuing.

Breakpoint 2, 0x08048440 in ?? ()
(gdb) x/10i $pc
=> 0x8048440: push %ebp
0x8048441: mov %esp,%ebp
0x8048443: and $0xfffffff0,%esp
0x8048446: sub $0x10,%esp
0x8048449: movl $0x15,(%esp)
0x8048450: call 0x8048410
0x8048455: xor %eax,%eax
0x8048457: leave
0x8048458: ret
0x8048459: nop

This looks like our main function: the value 21 (0x15) is placed on the stack and a function is called at 0x8048410, the address nm reported for fun.
Afterwards, the eax register, which holds the return value, is cleared because our main function returns 0.
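The disassembler prints the call target directly, but the instruction stores a displacement: a near call on i386 is encoded as 0xE8 followed by a signed 32-bit offset relative to the next instruction. For the call at 0x8048450, the next instruction is at 0x8048455, so the stored offset is 0x8048410 - 0x8048455 = -0x45, i.e. the bytes e8 bb ff ff ff. The hypothetical helper below performs the reverse computation.

```c
#include <stdint.h>

/* Decodes the target of an x86 near "call rel32" (opcode 0xE8) located at
 * `addr`. The 4-byte little-endian displacement is relative to the next
 * instruction, i.e. addr + 5. Returns 0 and stores the target on success,
 * -1 if the bytes are not a call rel32. */
int call_target(uint32_t addr, const uint8_t insn[5], uint32_t *target)
{
    if (insn[0] != 0xE8)
        return -1;
    uint32_t rel = (uint32_t)insn[1] | (uint32_t)insn[2] << 8 |
                   (uint32_t)insn[3] << 16 | (uint32_t)insn[4] << 24;
    /* Unsigned wrap-around performs the signed addition modulo 2^32. */
    *target = addr + 5u + rel;
    return 0;
}
```

In GDB, x/5xb 0x8048450 should show those five bytes, so you can check the computation against the running program.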

Additional commands

To step to the next assembly instruction, use the stepi command (nexti does the same but steps over calls).
You can also use print and set directly on registers:

(gdb) print $eax
$1 = 1
(gdb) set $eax = 0x2a
(gdb) print $eax
$2 = 42

You can also dump the value of all registers:

(gdb) info registers
eax 0x2a 42
ecx 0xffffd6e4 -10524
edx 0xffffd674 -10636
ebx 0xf7fb8ff4 -134508556
esp 0xffffd64c 0xffffd64c
ebp 0x0 0x0
esi 0x0 0
edi 0x0 0
eip 0x8048440 0x8048440
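info registers prints each register twice: the raw hexadecimal value and a "natural" rendering. For eax, ecx, edx, ebx, esi and edi, the natural column is the signed 32-bit interpretation, which is why ecx = 0xffffd6e4 appears as -10524; esp, ebp and eip are pointers and stay in hex. The helper below (a hypothetical name) reproduces that conversion with well-defined two's-complement arithmetic.

```c
#include <stdint.h>

/* Interprets a raw 32-bit register value as two's-complement signed,
 * like the second column of "info registers" for eax..edi. Avoids the
 * implementation-defined conversion of out-of-range unsigned values. */
int32_t as_signed32(uint32_t raw)
{
    if (raw <= (uint32_t)INT32_MAX)
        return (int32_t)raw;
    /* raw - 2^32, computed without overflowing int32_t */
    return (int32_t)(raw - 0x80000000u) - INT32_MAX - 1;
}
```

For instance, as_signed32(0xf7fb8ff4) gives -134508556, the value shown for ebx above.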

That's all for now!

Félix Abecassis
