What are registers?

You can think of registers as a small set of variables built into the CPU. The CPU can access them much faster than it can access memory. There are only a few registers, so where do the rest of the variables in our programs live? In memory: on the stack and the heap.

We can see the current state of our registers in GDB, using our simple hello world program with a breakpoint set at the main function:

info registers

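I am running on 32-bit x86, which is the architecture we're focused on. It's important to know that the general purpose registers can be broken down further. For example, AX is the lower 16 bits of EAX, and AX is itself split into AH (its high byte) and AL (its low byte):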

List of X86 Registers

General Purpose Registers:

  • EAX (Accumulator Register)

    • AX

      • AH, AL
  • EBX (Base Register)

    • BX

      • BH, BL
  • ECX (Counter Register)

    • CX

      • CH, CL
  • EDX (Data Register)

    • DX

      • DH, DL

Indexes

  • EDI (Destination Index Register)

    • DI
  • ESI (Source Index Register)

    • SI

Pointers

  • EBP (Base Pointer Register)

    • BP
  • ESP (Stack Pointer Register)

    • SP
  • EIP (Instruction Pointer Register)

    • IP

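Segment Registers - only 16-bit values. In 32-bit protected mode, each holds a segment selector that points to a descriptor table entry describing the segment, including its base address. Only in 16-bit real mode does the value map directly to an address (the segment value multiplied by 16).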

  • CS (Code Segment)

  • DS (Data Segment)

  • ES, FS, GS (Extra Segments)

  • SS (Stack Segment)

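EFLAGS register - holds status and control flags that describe the state of the processor. Comparison and arithmetic instructions (such as cmp) set the status flags, and conditional jumps and conditional loops read them to decide what to do next.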

  • CF (Carry Flag)

  • PF (Parity Flag)

  • AF (Auxiliary Carry Flag)

  • ZF (Zero Flag)

  • SF (Sign Flag)

  • TF (Trap Flag)

  • IF (Interrupt Enable Flag)

  • DF (Direction Flag)

  • OF (Overflow Flag)

  • IOPL (I/O Privilege Level, a 2-bit field)

  • NT (Nested Task Flag)

  • RF (Resume Flag)

  • VM (Virtual 8086 Mode Flag)

  • AC (Alignment Check Flag)

  • VIF (Virtual Interrupt Flag)

  • VIP (Virtual Interrupt Pending Flag)

  • ID (Identification Flag)

What are Opcodes?

disassemble /r main

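Opcodes are the numeric machine-language values the CPU actually executes. Assembly instructions are human-readable names for them, and the assembler translates each instruction into an opcode plus any operand bytes. The /r modifier tells GDB's disassemble command to print these raw bytes in hex next to each instruction.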

For example, 55 (hex) is the opcode for "push ebp".

Comparing C to Assembly

Our assembly starts by pushing the base pointer (ebp) onto the stack with push ebp, which saves the caller's base pointer.

Then it moves (copies) the stack pointer (esp) into the base pointer (ebp) with mov ebp, esp. If we look at the state of our registers, we can see that esp and ebp now hold the same value.

  • and esp, 0xfffffff0

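and is a bitwise AND operation: each bit of the result is 1 only if both of the bits being compared are 1. ANDing esp with 0xfffffff0 keeps the upper 28 bits and clears the lowest 4, rounding esp down to the nearest multiple of 16. This aligns the stack on a 16-byte boundary.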

Back in GDB, we can move forward in the program by a single machine instruction with the stepi command, or its short form:

si

If we disassemble main again, the => marker shows that we have moved to the next instruction, located at offset +6 from the start of main.

We can take a look at the state of our registers again:

info registers

We can see that esp has changed, because the and instruction cleared its lowest 4 bits. Now let's move to the next instruction:

si
  • sub esp, 0x10

Now we're subtracting 16 (0x10 in hex) from the stack pointer (esp) to reserve 16 bytes of stack space for local variables and outgoing function arguments. We subtract because the stack grows from higher memory addresses toward lower ones. The heap, by contrast, typically grows from lower memory addresses toward higher ones.

Let's take a look at the state of our esp register after subtracting:

info registers

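Then main calls the __main function. On GCC targets such as MinGW, this is a helper the compiler inserts at the start of main to run initialization code, such as global constructors, before our own code runs.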

  • mov DWORD PTR [esp], 0x405064 - here we copy the immediate value 0x405064 into the 4 bytes of memory that esp points to (the top of the stack). 0x405064 is itself an address, and esp does not change.

We can examine what is stored at address 0x405064, interpreted as a string:

x/s 0x405064

The value is the "Hello, World!" string.

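DWORD is a 32-bit (4-byte) value. The string "Hello, World!" is 14 bytes including its terminating null byte, so it cannot fit in a DWORD. What we store at the location esp points to is the string's 32-bit address (a pointer), and that is how the argument gets passed to printf.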

  • call printf - call pushes the return address onto the stack and jumps to printf, which then sets up its own stack frame

  • mov eax, 0x0 - moves 0 into eax. This is the 0 in return 0 in our C code: on 32-bit x86, a function's integer return value is passed back in eax

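  • leave - tears down the stack frame. It is equivalent to mov esp, ebp followed by pop ebp, which undoes the push ebp / mov ebp, esp at the start of main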

  • ret - returns from the function by popping the return address off the stack into eip

  • nop - no operation (do nothing); its opcode is 90

  • xchg ax,ax - opcode 66 90, which also does nothing. These bytes come after ret, so they are never executed. They are padding that makes the next function start on an aligned address, not an optimization of our code