Julius Plenz – Blog

Details on CVE-2012-5468

In mid-2010 I found a heap corruption in Bogofilter which lead to the Security Advisory 2010-01, CVE-2010-2494 and a new release. – Some weeks ago I found another similar bug, so there’s a new Bogofilter release since yesterday, thanks to the maintainers. (Neither of the bugs have much potential for exploitation, for different reasons.)

I want to shed some light on the details about the new CVE-2012-5468 here: It’s a very subtle bug that rises from the error handling of the character set conversion library iconv.

The Bogofilter Security Advisory 2012-01 contains no real information about the source of the heap corruption. The full description in the advisory is this:

Julius Plenz figured out that bogofilter's/bogolexer's base64 could overwrite heap memory in the character set conversion in certain pathological cases of invalid base64 code that decodes to incomplete multibyte characters.

The problematic code doesn’t look problematic on first glance. Neither on second glance. Take a look yourself. The version here is redacted for brevity: Convert from inbuf to outbuf, handling possible iconv-failures.

count = iconv(xd, (ICONV_CONST char **)&inbuf, &inbytesleft, &outbuf, &outbytesleft);

if (count == (size_t)(-1)) {
    int err = errno;
    switch (err) {
    case EILSEQ: /* invalid multibyte sequence */
    case EINVAL: /* incomplete multibyte sequence */
        if (!replace_nonascii_characters)
            *outbuf = *inbuf;
        else
            *outbuf = '?';

        /* update counts and pointers */
        inbytesleft -= 1;
        outbytesleft -= 1;
        inbuf += 1;
        outbuf += 1;
        break;

    case E2BIG: /* output buffer has no more room */
                /* TODO: Provide proper handling of E2BIG */
        done = true;
        break;

    default:
        break;
    }
}

The iconv API is simple and straightforward: You pass a handle (which among other things contains the source and destination character set; it is called xd here), and two buffers and modifiable integers for the input and output, respectively. (Usually, when transcoding, the function reads one symbol from the source, converts it to another character set, and then “drains” the input buffer by decreasing inbytesleft by the number of bytes that made up the source symbol. Then, the output lenght is checked, and if the target symbol fits, it is appended and the outbytesleft integer is decreased by how much space the symbol used.)

The API function returns -1 in case of an error. The Bogofilter code contains a copy&paste of the error cases from the iconv(3) man page. If you read the libiconv source carefully, you’ll find that …

/* Case 2: not enough bytes available to detect anything */
errno = EINVAL;

comes before

/* Case 4: k bytes read, making up a wide character */
if (outleft == 0) {
    cd->istate = last_istate;
    errno = E2BIG;
    ...
}

So the “certain pathological cases” the SA talks about are met if a substantially large chunk of data makes iconv return -1, because this chunk just happens to end in an invalid multibyte sequence.

But at that point you have no guarantee from the library that your output buffer can take any more bytes. Appending that character or a ? sign causes an out-ouf-bounds write. (This is really subtle. I don’t blame anyone for not noticing this, although sanity checks – if need be via assert(outbytesleft > 0) – are always in order when you do complicated modify-string-on-copy stuff.) Additionally, outbytesleft will be decreased to -1 and thus even an outbytesleft == 0 will return false.

Once you know this, the fix is trivial. And if you dig deep enough in their SVN, there’s my original test to reproduce this.

How do you find bugs like this? – Not without an example message that makes Bogofilter crash reproducibly. In this case it was real mail with a big PDF file attachment sent via my university's mail server. Because Bogofilter would repeatedly crash trying to parse the message, at some point a Nagios check alerted us that one mail in the queue was delayed for more than an hour. So we made a copy of it to examine the bug more closely. A little Valgrinding later, and you know where to start your search for the out-of-bounds write.

posted 2012-12-05 tagged linux, c, security, spam and university