In this post, I’d like to talk about the stack and how it works in assembly. We will also examine the stack with gdb. Understanding the stack is crucial for reverse engineering or writing certain types of exploits.
Before reading this, you should already have a basic idea of what processor registers are (at least you should know that you can store data there) and not be afraid of dealing with a few simple assembly instructions. If you want to follow along with the instructions, you should have a Linux system with gcc and gdb ready.
The stack is a simple last in, first out (LIFO) structure, meaning that the last value you put in is the first one you get out. You may visualize it like a stack of blocks in memory:
              ______
 ______      |______|      ______
|______|     |______|     |______|
|______|     |______|     |______|
 initial      *push*       *pop*
Values are placed onto the stack via push and read (and removed) via pop. So when you see push eax in assembly, it means that the value currently stored in register eax is pushed on top of the stack. When you see pop eax, it means that the value currently on top of the stack is loaded into eax and removed from the stack (so the next pop would read the next value from the stack, and so on).
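To make the LIFO behaviour concrete, here is a minimal sketch of an array-backed stack in C. The names IntStack, stack_push and stack_pop and the capacity of 16 are made up for this illustration; this is not how the CPU implements push and pop, only a model of the "last in, first out" order.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_CAPACITY 16   /* arbitrary size chosen for this sketch */

/* A tiny array-backed LIFO stack; count is the number of stored values. */
typedef struct {
    int    slots[STACK_CAPACITY];
    size_t count;
} IntStack;

/* push: place a value on top of the stack. */
void stack_push(IntStack *s, int value) {
    assert(s->count < STACK_CAPACITY);   /* refuse to overflow the array */
    s->slots[s->count++] = value;
}

/* pop: read the value on top of the stack and remove it. */
int stack_pop(IntStack *s) {
    assert(s->count > 0);                /* popping an empty stack is an error */
    return s->slots[--s->count];
}
```

Pushing 1, 2 and 3 and then popping three times hands the values back as 3, 2, 1.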
To keep track of the stack, the system uses the base pointer ebp and the stack pointer esp. esp points to the top of the stack and ebp points to its bottom (more precisely, to the bottom of the current function’s part of the stack). When you push to or pop from the stack, esp is adjusted accordingly: the memory address contained in esp is increased when you pop from the stack and decreased when you push to the stack. This may be surprising, as one would expect the stack to grow towards higher memory (and then the memory address in esp should increase when you put something on the stack and vice versa). But that is not the case.
A small but very important caveat is that the stack grows downwards in memory. So it starts at a high memory address and extends into lower memory regions. Therefore, ebp will usually be greater (in terms of memory addresses) than esp:
Stack top                          Lower Memory
              /\
              :  extends here
              :
            ______
esp --->   |______| 0x080483f0
           |______| 0x080483f4
ebp --->   |______| 0x080483f8
Stack bottom                       Higher Memory
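The following sketch models the downward growth in C. Everything in it is an assumption made for illustration: the region size, the made-up address 0x1000 as the initial value of esp, and the names SimStack, sim_init, sim_push and sim_pop. It only mimics how the address in esp moves; it is not real stack memory.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_HIGH  0x1000u                       /* hypothetical address just above the stack region */
#define SIM_WORDS 64                            /* number of 4-byte slots in the simulated region */
#define SIM_LOW   (SIM_HIGH - 4u * SIM_WORDS)   /* lowest simulated address */

typedef struct {
    uint32_t mem[SIM_WORDS];  /* mem[i] models the word at address SIM_LOW + 4*i */
    uint32_t esp;             /* simulated stack pointer (an address, not an index) */
} SimStack;

/* An empty stack: esp sits at the high end of the region. */
void sim_init(SimStack *s) {
    s->esp = SIM_HIGH;
}

/* push: esp moves DOWN by 4, then the value is stored where esp points. */
void sim_push(SimStack *s, uint32_t value) {
    assert(s->esp >= SIM_LOW + 4u);             /* there must be room for one more word */
    s->esp -= 4u;
    s->mem[(s->esp - SIM_LOW) / 4u] = value;
}

/* pop: the value is read from where esp points, then esp moves UP by 4. */
uint32_t sim_pop(SimStack *s) {
    assert(s->esp < SIM_HIGH);                  /* the stack must not be empty */
    uint32_t value = s->mem[(s->esp - SIM_LOW) / 4u];
    s->esp += 4u;
    return value;
}
```

After two pushes the simulated esp has dropped from 0x1000 to 0x0ff8, and two pops bring it back up again.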
We can observe the stack when we analyze our first small C program:
int main() {
    int a = 10;
    int b = 5;
    int c = 2;
    return 0;
}
This program does nothing more than declare three local variables, which we will soon observe in memory and on the stack. We save this program in a file called stack1.c and compile it with
gcc -o stack1 -m32 stack1.c
The -o option specifies the output file and -m32 produces a 32-bit binary, which is necessary when you work on a 64-bit system and want to observe typical 32-bit assembly. Some things are different in 64-bit but we might get to that later.
The compilation leaves us with a file called stack1. To disassemble it, you can use a disassembler of your choice, like objdump, radare2 or Hopper, or simply use the disassembly capabilities of gdb.
Run gdb stack1 to fire up gdb:
$ gdb stack1
GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from stack1...(no debugging symbols found)...done.
Then we switch the disassembly output to Intel syntax (the command below is an abbreviation of set disassembly-flavor intel):
(gdb) set disassembly intel
and show the gdb disassembly of the main function:
(gdb) disas main
Dump of assembler code for function main:
   0x080483ed <+0>:     push   ebp
   0x080483ee <+1>:     mov    ebp,esp
   0x080483f0 <+3>:     sub    esp,0x10
   0x080483f3 <+6>:     mov    DWORD PTR [ebp-0xc],0xa
   0x080483fa <+13>:    mov    DWORD PTR [ebp-0x8],0x5
   0x08048401 <+20>:    mov    DWORD PTR [ebp-0x4],0x2
   0x08048408 <+27>:    mov    eax,0x0
   0x0804840d <+32>:    leave
   0x0804840e <+33>:    ret
When you disassemble the whole file, e.g. with objdump, you will notice a lot of other stuff which has to do with the ELF file structure, but that is not important here.
In the left column you see a bunch of addresses, followed by their offset from the starting point of the function (<+X>), followed by assembly translations of the bytes stored at these addresses. Since the machine code that these bytes represent would be hard to read directly, a disassembler provides us with a human-readable translation of it.
If we look at lines <+0> to <+3>, we see a typical function prologue. It saves ebp by pushing it onto the stack (ebp is usually 0 here) and sets ebp to the value currently stored in esp with the mov instruction. Then it subtracts some value from esp to create a new stack frame for our function. (You can think of a stack frame as a space within memory that is reserved for our function’s stack.) So by the time this prologue has finished, we have pushed the old value of ebp on the stack and set ebp to the bottom of our new stack frame, while esp points to the top of our stack. We have also reserved some space for our local variables by subtracting from esp.
Let’s have a closer look at <+3>, the last line of the function prologue. We see that 16 bytes (0x10 in hex = 16 in decimal) are subtracted from the current stack pointer address, so our stack grows by 16 bytes to make room for our local variables. Why 16 bytes? An integer is, at least on a typical Intel 32-bit system, 4 bytes, so we need 12 bytes for our 3 local variables. So why does the compiler reserve 16 bytes? The other 4 bytes are usually reserved to align the stack to a multiple of 16 bytes, which is easier to process. So if we declared 4 local integer variables, the stack frame would still have a length of 16 bytes. But if we declared 5 integer variables, which means we needed 20 bytes, our stack frame would get a length of 32 (= 2*16) bytes. Try it out!
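The arithmetic above can be sketched in a few lines of C. This is only a model of the rule described in the paragraph (round the size of the locals up to the next multiple of 16, with 4 bytes per integer); the function names are made up, and a real compiler is free to reserve more or less than this depending on version and flags.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round a byte count up to the next multiple of 16. */
size_t round_up_to_16(size_t bytes) {
    return (bytes + 15u) & ~(size_t)15u;
}

/* Frame size for `count` local 4-byte integers under the simple
   rounding rule assumed in this sketch. */
size_t frame_bytes_for_ints(size_t count) {
    assert(count <= (SIZE_MAX - 15u) / 4u);   /* keep the arithmetic from overflowing */
    return round_up_to_16(count * 4u);
}
```

With this rule, 3 and 4 integers both give 16 bytes, while 5 integers give 32 bytes, matching the numbers in the text.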
In lines <+6> to <+20> the local variables are written onto the stack. This is done by moving them to locations that are determined by their offset from ebp, our base pointer. So we place 10 in the memory location at ebp - 0xc, 5 at ebp - 0x8 and 2 at ebp - 0x4.
In the end, our stack looks like this (imagine every block to have a height of 4 bytes):
[top]     _____________    <---esp
         |      ?      |   (ebp - 0x10)
         |_____________|
         |             |   (ebp - 0xC)
         |_____10______|
         |             |   (ebp - 0x8)
         |______5______|
         |             |   (ebp - 0x4)
         |______2______|
[bottom]                   <---ebp
As you notice, the variables on the stack are referenced by using the base register ebp. Why doesn’t the compiler use esp to reference the variables? Since esp always points to the top of the stack and might change when the stack grows, it is easier to use the fixed ebp, so you will always find your variables at the same relative offset from the bottom of the stack where ebp points to (instead of recalculating offsets from its top where esp points to). [However, when you start dealing with compiler optimizations, this might not always be true as there are such things as frame pointer omission. If you want to read more about esp and ebp, have a look here.]
Another thing to note is the order in which the variables are placed on the stack. We declared a = 10, b = 5 and c = 2. As you notice, c is now at the bottom of the stack, followed by b and a, although c was declared last. The variables are placed on the stack in reverse order. That makes sense, since if we started to take them from the stack via a pop instruction, we would get them in the right order (remember that the stack is a LIFO structure, so when you access it, you get the value on the top first).
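Finally, we put the return value 0 in the eax register (<+27>) and leave the function with the function epilogue in <+32> and <+33>: leave restores esp and ebp to the values they had before the prologue, and ret returns to the caller.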
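To tie the listing together, here is a small C model that replays the instructions of main on a simulated machine, stopping just before ret. It is a sketch under assumptions: the Machine struct, the vm_ helper names and the address range around 0x2000 are invented for this example, and leave is modeled as its two-step equivalent, mov esp,ebp followed by pop ebp.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_HIGH  0x2000u                       /* hypothetical address just above the region */
#define MEM_WORDS 32u                           /* number of simulated 4-byte slots */
#define MEM_LOW   (MEM_HIGH - 4u * MEM_WORDS)   /* lowest simulated address */

typedef struct {
    uint32_t mem[MEM_WORDS];   /* mem[i] models the word at MEM_LOW + 4*i */
    uint32_t esp, ebp, eax;    /* the three registers the listing touches */
} Machine;

/* Translate a simulated address into a pointer to its 4-byte slot. */
uint32_t *vm_slot(Machine *m, uint32_t addr) {
    assert(addr >= MEM_LOW && addr < MEM_HIGH && addr % 4u == 0u);
    return &m->mem[(addr - MEM_LOW) / 4u];
}

void vm_push(Machine *m, uint32_t value) {
    m->esp -= 4u;                       /* the stack grows towards lower addresses */
    *vm_slot(m, m->esp) = value;
}

uint32_t vm_pop(Machine *m) {
    uint32_t value = *vm_slot(m, m->esp);
    m->esp += 4u;
    return value;
}

/* Replays main from the disassembly, stopping just before ret. */
void run_main_until_ret(Machine *m) {
    vm_push(m, m->ebp);                  /* <+0>   push ebp                     */
    m->ebp = m->esp;                     /* <+1>   mov  ebp,esp                 */
    m->esp -= 0x10u;                     /* <+3>   sub  esp,0x10                */
    *vm_slot(m, m->ebp - 0xcu) = 10u;    /* <+6>   mov  DWORD PTR [ebp-0xc],0xa */
    *vm_slot(m, m->ebp - 0x8u) = 5u;     /* <+13>  mov  DWORD PTR [ebp-0x8],0x5 */
    *vm_slot(m, m->ebp - 0x4u) = 2u;     /* <+20>  mov  DWORD PTR [ebp-0x4],0x2 */
    m->eax = 0u;                         /* <+27>  mov  eax,0x0                 */
    m->esp = m->ebp;                     /* <+32>  leave, step 1: mov esp,ebp   */
    m->ebp = vm_pop(m);                  /*        leave, step 2: pop ebp       */
}
```

If you start the model with some esp and ebp of your choice, both registers hold their original values again after the run, while the three values are still sitting in the (now released) frame memory.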
So much for the theory. Let’s now see what all of this looks like when we use gdb to examine our program.
If you haven’t done so, start gdb with gdb stack1.
We start at the very beginning of the main function and set our breakpoint on its first instruction:
(gdb) break *main
Breakpoint 1 at 0x80483ed
Then we start the execution with the command r (short for run):
(gdb) r
Starting program: /home/michael/Entwicklung/Disassemble/stack1

Breakpoint 1, 0x080483ed in main ()
As you can see, gdb tells us the memory address in the instruction pointer so we know where we are in the program’s execution (have a look at the assembly above, you will find the address there). Note that gdb always stops before the instruction is executed, so *main+0 hasn’t been executed and we haven’t pushed ebp yet.
We can start to inspect registers and memory. Let’s see what’s currently in the registers:
(gdb) info registers
eax            0x1          1
ecx            0xc0c8ddda   -1060577830
edx            0xffffcd54   -12972
ebx            0xf7fac000   -134561792
esp            0xffffcd2c   0xffffcd2c
ebp            0x0          0x0
esi            0x0          0
edi            0x0          0
eip            0x80483ed    0x80483ed <main>
eflags         0x246        [ PF ZF IF ]
cs             0x23         35
ss             0x2b         43
ds             0x2b         43
es             0x2b         43
fs             0x0          0
gs             0x63         99
info registers shows us the content of all registers. We could also abbreviate it with i r or display the content of a specific register with i r [register].
So we see that ebp is 0. (This is due to the Application Binary Interface and Intel’s Binary Compatibility Standards, you can find more info here.) eip points to the instruction we are about to execute. esp contains the address 0xffffcd2c, an address that our application inherited from the OS. We carry on to the next instruction and inspect the registers again:
(gdb) nexti
0x080483ee in main ()
(gdb) i r
[...]
esp            0xffffcd28   0xffffcd28
ebp            0x0          0x0
[...]
eip            0x80483ee    0x80483ee <main+1>
[...]
We have now pushed ebp, which is 0, on the stack. This decreased esp by 4 bytes, as in hex ffffcd2c - ffffcd28 = 4. (When something is pushed on or popped off the stack, esp will always decrease or increase accordingly, so it always points at the top of the stack.)
To make sure, we would expect the memory content at 0xffffcd28 (reading 4 bytes towards higher memory) to be 0. Let’s verify:
(gdb) x/wx 0xffffcd28
0xffffcd28:     0x00000000
x/wx is equivalent to x/1wx and tells gdb to read 1 word (which is 4 bytes in gdb) from the memory location given as argument. The first x is for “examine”, examining the memory. The number determines the amount, the next character, w, says in what chunks we want the memory displayed (e.g. 4b would mean 4 bytes, 4w = 4 words etc.) and the last character, x, tells gdb that we want the memory displayed as hexadecimal values (we could choose other formats like ASCII characters or machine instructions here). Have a look in the gdb manual to get more familiar with this syntax.
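If you wonder how four individual bytes turn into one displayed word, the following C sketch may help. It assembles a word from bytes in little-endian order, which is how x86 stores it, and prints a line that loosely imitates the x/<count>wx display. The function names are made up, the spacing of real gdb output is not reproduced exactly, and the byte values in the usage note are simply the little-endian encoding of words that appear later in this post.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assemble one 4-byte word from little-endian bytes:
   the byte at the lowest address is the least significant one. */
uint32_t read_word_le(const uint8_t *p) {
    assert(p != NULL);
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Print `count` words starting at simulated address `addr`,
   loosely imitating what x/<count>wx shows. */
void examine_words(uint32_t addr, const uint8_t *bytes, unsigned count) {
    printf("0x%08x:", (unsigned)addr);
    for (unsigned i = 0; i < count; i++) {
        printf(" 0x%08x", (unsigned)read_word_le(bytes + 4u * i));
    }
    printf("\n");
}
```

For example, the bytes 0x1b 0x84 0x04 0x08 (in increasing address order) form the word 0x0804841b, so displaying memory byte by byte with b and word by word with w shows the same data in a different order.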
As expected, we find the memory at the location to which we pushed ebp to be 0. No surprise here. Let us continue.
(gdb) nexti
0x080483f0 in main ()
(gdb) i r
[...]
esp            0xffffcd28   0xffffcd28
ebp            0xffffcd28   0xffffcd28
[...]
The instruction mov ebp,esp copied the value of esp into ebp, so they now both contain the same value and therefore point to the same memory address. On execution of the next instruction, we create our function-specific stack frame:
(gdb) nexti
0x080483f3 in main ()
(gdb) i r
[...]
esp            0xffffcd18   0xffffcd18
ebp            0xffffcd28   0xffffcd28
[...]
We executed sub esp,0x10, which means we decreased esp by 16 bytes and thereby reserved 16 bytes for our stack frame. esp now points to the top of our stack and ebp to its bottom, and the top lies at a lower memory address than the bottom. So our stack just grew downwards in memory as explained before.
In the next three steps we can see how our variables are placed on the stack. First, let’s have a look at the initial memory values on the stack:
(gdb) x/4wx 0xffffcd18
0xffffcd18:     0x0804841b      0xf7fac000      0x08048410      0x00000000
Then we continue to place our variables there:
(gdb) nexti
0x080483fa in main ()
(gdb) x/4wx 0xffffcd18
0xffffcd18:     0x0804841b      0x0000000a      0x08048410      0x00000000
(gdb) nexti
0x08048401 in main ()
(gdb) x/4wx 0xffffcd18
0xffffcd18:     0x0804841b      0x0000000a      0x00000005      0x00000000
(gdb) nexti
0x08048408 in main ()
(gdb) x/4wx 0xffffcd18
0xffffcd18:     0x0804841b      0x0000000a      0x00000005      0x00000002
As you can see, the slots are filled in order of increasing address, starting just below the top of the stack and moving towards its bottom: First, we write the value 0xa, overwriting 0xf7fac000, then we write 0x5, overwriting 0x08048410, and finally we write 0x2, overwriting 0x00000000. So our local variables are now placed on the stack, and the leftover value 0x0804841b is still at its top, at the address esp points to. If we examine our registers again, we notice that neither esp nor ebp changed as we worked with the stack:
(gdb) i r
[...]
esp            0xffffcd18   0xffffcd18
ebp            0xffffcd28   0xffffcd28
[...]
This is the end of this small tutorial. We saw that our stack grew downwards and that the local variables were written to fixed offsets from ebp. We also observed how the stack frame was created in the beginning and examined the memory associated with it. I hope this little tutorial helped you to get a better grasp of this important memory concept.
Contact me on Twitter if you found a mistake or found this tutorial helpful.