The art of exploiting heap overflow, part 3

Linux Heap

As discussed in the previous part, we know where the heap sits in a process's memory address space, and every process has roughly the same layout.

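Keep in mind that every address we access in the virtual memory space must be mapped first; otherwise the access triggers a segmentation fault. The stack is taken care of for us: the kernel's x86 page-fault handler grows it on demand, as this excerpt shows: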

    vma = find_vma(mm, address);
    if (unlikely(!vma)) {
        bad_area(regs, error_code, address);
        return;
    }
    if (likely(vma->vm_start <= address))
        goto good_area;
    if (unlikely(!(vma->vm_flags & VM_GROWSDOWN))) {
        bad_area(regs, error_code, address);
        return;
    }
    if (error_code & PF_USER) {
        /*
         * Accessing the stack below %sp is always a bug.
         * The large cushion allows instructions like enter
         * and pusha to work. ("enter $65535, $31" pushes
         * 32 pointers and then decrements %sp by 65535.)
         */
        if (unlikely(address + 65536 + 32 * sizeof(unsigned long) < regs->sp)) {
            bad_area(regs, error_code, address);
            return;
        }
    }
    if (unlikely(expand_stack(vma, address))) {
        bad_area(regs, error_code, address);
        return;
    }

So the only part we have to map ourselves is the heap. The Linux/Unix kernel provides two sets of system calls for this: brk()/sbrk() and mmap()/munmap().

Of course, mmap() can also map a file into the memory space, which is how dynamically linked libraries are typically loaded. These libraries are usually mapped after the traditional heap (the one right after program data). As a result, the memory area between program data and the stack is fragmented by file mappings, and it is hard to draw a line between the libraries and the heap area: both grow at run time and can end up mixed together across the whole address space.

Since these are sufficient to manage the memory address space, why do we need the C standard functions malloc()/free()?

So malloc()/free() provide the simplest interface for managing the heap, without even having to think about virtual memory. More importantly, you don't even need to remember the size of the memory you allocated!

Fortunately, all of the above is usually managed by glibc. Dynamically linked libraries are loaded by the dynamic linker, ld.so; the rest of the dynamic memory is managed via the standard libc functions malloc() and free(), which indirectly call the above system calls.

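Yes, you can still bypass glibc by calling the mmap() system call directly without messing up the memory space. The same is unlikely to be true of sbrk(), because the program break is a single per-process value that glibc's malloc() also moves.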

Now we have to revise the definition of the heap: it is the private anonymous memory between program data and the stack that is managed by glibc. All of it comes from mmap(), except the (optional) traditional heap, which comes from sbrk().

This is exactly how glibc intends to distinguish them:

In the next part, we will see how glibc manages these arenas.