Its been some time since I wrote one of these, so I might be a little rusty. Do bear with me.

I wasn’t able to play 0CTF 2022 in full while it was going on but I was helping one of my teammates (@zolutal) who was trying to solve this challenge. I had briefly tried to reverse engineer the binary but didn’t have enough time to analyze it completely.

Step 1: Reversing

The main function doesn’t do much except for that 256 byte long buffer which I thought might be useful in case we needed to fill some constraints for a one-gadget. But I did not end up using that approach.

The do_main function does more interesting stuff by comparison

During my hasty reverse engineering process, I ended up naming the allocation at line 24 as registers even though the program was basically telling me it was called memory. So for the rest of this blog, I’ll refer to this allocated region as memory even though my decompiled code says registers.

The comparison at line 17 stands out pretty well as a possible location for a bug, but I was more interested in reversing the other functions.

Init VM clears all of the data before executing the code. However, since it uses malloc, some of the data on the heap might still be present and available for reuse.

And now we get to the actual VM part.

There’s quite a lot of switch cases here. Each instruction is apparently one byte and the VM follows a stack based model. Which means that for an arithmetic operation, two values on the top of the stack are popped off and the result is pushed back on to the stack.

The program does make sure that the stack does not overflow or underflow.

Line 27 in the above screenshot looked a bit weird to me since it checks if operand1 is lesser than 4 and then adds 4 to it while dereferencing.

However, looking at the memory layout, it makes more sense.

Like I had mentioned earlier, the regs variable that I had named is actually supposed to be called memory.

The VM maintains its own 4 registers each of which is 8 bytes long. These registers are referred to in the decompiled code as internal_array. And so we have

&regs[4+x] == &internal_array[x]

This makes the reverse engineering much more understandable.

The VM even supports jump instructions. Although it does make sure that the program counter does not go beyond the bounds of the code section.

Step 2: The Bug

I wasn’t able to find the bug here, but @zolutal found it.

There are two bugs here that combine together to provide an arbitrary 8-byte write.

In line 156, the variable op2_ is an 8 byte value that we can control. So if it is large enough, it can index memory outside the chunk allocated in the regs variable.

However, in order to do that, it must first pass the check at line 154.

This is where the second bug comes in.

These lines inside the do_main function allow us to set the regs_size variable in the run_vm function to any value up to 0x2000000000000000 (or 1 « 61 )

However, when the allocation happens at the malloc, this value is multiplied with 8.

In [6]: hex(((0x2000000000000000 + 0x10) * 8) & (2**64 - 1))
Out[6]: '0x80'

Therefore, when we request an allocation of size 0x2000000000000000 + 0x10, the multiplication overflows into a value 0x80 which is the size of the buffer that malloc allocates.

Now, in run_vm, the regs_size is 0x2000000000000000 + 0x10 which allows us to write to memory out of the bounds of the regs buffer.

However, in the switch case number 22 in run_vm, we cannot use this idea to get an arbitrary read. This is because the comparison in line 162 will first multiply the regs_size by 8 and then divide the result by 8.

In [10]: hex((((0x2000000000000000 + 0x10) * 8) & (2**64 - 1))// 8)
Out[10]: '0x10'

This means that we cannot use this functionality to get an arbitrary read.

Step 3: The Exploit

Up until now, the only primitives we had was an arbitrary write. Moreover, we can only trigger this arbitrary overwrite in a single VM after which we cannot reuse it again.

With some heap massaging, we can free tcache and smallbin chunks which leave some useful pointers on the heap. But since the VM does not write any content out, we were not able to figure out a way to leak these pointers.

One of the ideas that I had was to perform some heap fengshui to position the code buffer after the memory buffer. If we could do that, we could use the arbitrary overwrite to overwrite some values in the code with the pointers that were on the heap. This would’ve been a crazy solution, but it eventually did not work out.

And we actually did not solve this challenge during the CTF.

Step 4: Yansquad

I forgot about this challenge after some time. But then it showed up in one of the classes that I’m taking here at ASU.

CSE 598 Topic: Emerging Cybersecurity Techniques taken by none other than @zardus involved us playing CTF’s and solving challenges together in class.

In this class, every week, one student is responsible for picking a challenge from a CTF, deconstructing to its most basic concept and creating a simplified version of the challenge. This one week, it was @clasm’s turn to present a challenge and he chose this one.

The simplified version of the challenge provided all the primitives you could ask for in a heap exploitation challenge. And the objective was to learn how to overwrite the tls_dtors_list to get PC control.

I’ll put up the files for this challenge as soon as @clasm lets me know its okay to. The files for this simplified challenge are available here.

But once I had solved the simplified one, I wanted to give another go at the original challenge.

Step 5: The Leak

This time I noticed something that I had missed. A sidechannel for leaking pointers.

This switch case (which I called the nop case) would print a string “what??” when hit.

Now, if I can write the code for the VM to perform some checks on the pointers and jump to a nop instruction if true, I will be able to leak pointers.

Step 6: The Assembly

I really did not want to write bytecode and deal with jump offsets (although maybe it would’ve been faster).

So I ended up writing an assembler for this VM that supports labels and loops. It would replace labels with a nop instruction.

from assembler import *


def run_vm(prog, mem_count, msg):
    assembler = Assembler(prog)
    code = assembler.assemble(debug=False) 
    p.sendlineafter(b"Please input your code size:\n", b"%d" % len(code))
    p.sendlineafter(b"Please input your memory count:\n", b"%d" % mem_count)
    p.sendafter(b"Please input your code:\n", code)
    out = p.recvuntil(b"finish!\n")
    p.sendlineafter(b"continue?\n", msg)
    return out

I then wrote assembly code that increments a counter from zero in a loop until it becomes equal to one byte of a pointer. In each loop, this program will hit a nop which will make the program print the string “what??”. And upon counting the number of these strings printed out, I can get the value of this one byte.

def leak(p, mem_size, num_bytes):
    leak_byte = """
lq r0 0
push r0
li r3 {byte_idx}
push r3
shr
pop r3
push r3
li r1 0xff
push r1
and
pop r1
li r2 0x1
.loop
push r1
push r2
sub
pop r1
push r1
jnz .loop
hlt
    """
    leaked_bytes = []
    for x in range(num_bytes):
        prog = leak_byte.format(byte_idx=x*8)
        output = run_vm(prog, mem_size, b"B"*10).decode("utf-8").strip()
        leaked_byte = output.split("\n").count("what???")
        leaked_bytes = [leaked_byte] + leaked_bytes

    leaked_addr = 0
    for x in leaked_bytes:
        leaked_addr = (leaked_addr << 8) + x

    log.success(f"Leaked value : {hex(leaked_addr)}")
    return leaked_addr

Step 7: The Target

In the version of libc that the challenge uses (libc-2.35), the usual suspects for PC control such as __free_hook and __malloc_hook have been removed.

But then there’s some code that always gets invoked when the program exits. Namely the __run_exit_handlers()

This function calls __call_tls_dtors() which walks a linked list called tls_dtor_list and executes the functions specified by each object’s function pointer.

Since we have an arbitrary overwrite, we can overwrite one entry in the linked list to point to an area we control. And with that, we can force the __call_tls_dtors() function to call a function pointer that we control.

So then my plan for exploit was as follows

  • Heap fengshui to get a heap pointer in the memory buffer
  • Leak pointer to get heap leak
  • Heap fengshui to get a libc pointer in the memory buffer
  • Leak pointer to get libc leak
  • Create fake tls_dtor_list object on heap
  • Overwrite tls_dtor_list with the fake object
  • ????
  • Profit

We do have to take care of a pesky PTR_DEMANGLE macro which is basically a right shift followed by an xor. However, an easy way to fix it is to overwrite the xor key with 0.

And putting it all together:

The final exploit is available here: Exploit

And the assembler is available here: Assembler