In mid-2010 I found a heap corruption in Bogofilter which led to the Security Advisory 2010-01, CVE-2010-2494 and a new release. – Some weeks ago I found another similar bug, so there’s a new Bogofilter release since yesterday, thanks to the maintainers. (Neither of the bugs has much potential for exploitation, for different reasons.)
I want to shed some light on the details of the new CVE-2012-5468 here: It’s a very subtle bug that arises from the error handling of the character set conversion library iconv.
The Bogofilter Security Advisory 2012-01 contains no real information about the source of the heap corruption. The full description in the advisory is this:
Julius Plenz figured out that bogofilter's/bogolexer's base64 could overwrite heap memory in the character set conversion in certain pathological cases of invalid base64 code that decodes to incomplete multibyte characters.
The problematic code doesn’t look problematic at first glance. Nor at second glance. Take a look yourself.
The version here is shortened for brevity: convert from inbuf to outbuf, handling possible iconv failures.
    count = iconv(xd, (ICONV_CONST char **)&inbuf, &inbytesleft, &outbuf, &outbytesleft);
    if (count == (size_t)(-1)) {
        int err = errno;
        switch (err) {
        case EILSEQ: /* invalid multibyte sequence */
        case EINVAL: /* incomplete multibyte sequence */
            if (!replace_nonascii_characters)
                *outbuf = *inbuf;
            else
                *outbuf = '?';
            /* update counts and pointers */
            inbytesleft -= 1;
            outbytesleft -= 1;
            inbuf += 1;
            outbuf += 1;
            break;
        case E2BIG: /* output buffer has no more room */
            /* TODO: Provide proper handling of E2BIG */
            done = true;
            break;
        default:
            break;
        }
    }
The iconv API is simple and straightforward: You pass a handle (which among other things contains the source and destination character set; it is called xd here), and two buffers and modifiable integers for the input and output, respectively. (Usually, when transcoding, the function reads one symbol from the source, converts it to another character set, and then “drains” the input buffer by decreasing inbytesleft by the number of bytes that made up the source symbol. Then, the output length is checked, and if the target symbol fits, it is appended and the outbytesleft integer is decreased by how much space the symbol used.)
The API function returns -1 (cast to size_t) and sets errno in case of an error.
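To make the calling convention concrete, here is a minimal sketch of a single iconv call. The helper name latin1_to_utf8 and the choice of character sets are my own assumptions for illustration; this is not code from Bogofilter. It assumes a platform where iconv takes a plain char ** for the input buffer (on others you need a const cast, which is what ICONV_CONST is for).

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper (not from Bogofilter): convert ISO-8859-1 bytes to
 * UTF-8. Returns the number of bytes written to out, or (size_t)-1 if
 * iconv_open() or iconv() reports an error. */
size_t latin1_to_utf8(const char *in, size_t inlen, char *out, size_t outcap)
{
    assert(in != NULL && out != NULL);

    /* Note the argument order: destination charset first, then source. */
    iconv_t xd = iconv_open("UTF-8", "ISO-8859-1");
    if (xd == (iconv_t)-1)
        return (size_t)-1;

    char *inbuf = (char *)in;      /* iconv advances this pointer */
    char *outbuf = out;            /* ... and this one */
    size_t inbytesleft = inlen;    /* decreased as input is consumed */
    size_t outbytesleft = outcap;  /* decreased as output is produced */

    size_t count = iconv(xd, &inbuf, &inbytesleft, &outbuf, &outbytesleft);
    iconv_close(xd);

    if (count == (size_t)-1)
        return (size_t)-1;         /* errno was EILSEQ, EINVAL or E2BIG */

    return outcap - outbytesleft;  /* bytes actually written */
}
```

Called with the four bytes "caf\xE9" and a 16-byte output buffer, this should write the five-byte UTF-8 encoding of "café". With an output buffer of only four bytes the last character no longer fits and iconv fails with E2BIG.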
The Bogofilter code contains a copy&paste of the error cases from the iconv(3) man page. If you read the libiconv source carefully, you’ll find that …
    /* Case 2: not enough bytes available to detect anything */
    errno = EINVAL;
comes before
    /* Case 4: k bytes read, making up a wide character */
    if (outleft == 0) {
        cd->istate = last_istate;
        errno = E2BIG;
        ...
    }
So the “certain pathological cases” the SA talks about are met if a sufficiently large chunk of data makes iconv return -1 because the chunk just happens to end in an incomplete (and thus invalid) multibyte sequence right when the output buffer is full. At that point you have no guarantee from the library that your output buffer can take any more bytes: EINVAL is reported before the remaining output space is ever checked. Appending that character or a ? sign causes an out-of-bounds write. (This is really subtle. I don’t blame anyone for not noticing this, although sanity checks, if need be via assert(outbytesleft > 0), are always in order when you do complicated modify-string-on-copy stuff.) Additionally, outbytesleft will be decreased below zero. Assuming it is declared as a size_t, as the iconv prototype expects, it does not become -1 but wraps around to a huge value, and thus even a later outbytesleft == 0 check will return false.
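The kind of guard that prevents this can be sketched as follows. This is my own illustration under stated assumptions, not the patch that went into Bogofilter: the helper name skip_bad_byte and its signature are hypothetical, and it only covers the EILSEQ/EINVAL branch of the code shown earlier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not Bogofilter's actual fix): copy or replace one
 * unconvertible input byte, but only if the output buffer still has room.
 * Returns false without touching anything if there is no input left or
 * no output space left; the caller should then stop converting. */
bool skip_bad_byte(char **inbuf, size_t *inbytesleft,
                   char **outbuf, size_t *outbytesleft,
                   bool replace_nonascii_characters)
{
    assert(inbuf && inbytesleft && outbuf && outbytesleft);

    /* The check the original error handling lacked: EINVAL or EILSEQ
     * says nothing about how much room is left in the output buffer. */
    if (*inbytesleft == 0 || *outbytesleft == 0)
        return false;

    **outbuf = replace_nonascii_characters ? '?' : **inbuf;

    /* update counts and pointers; neither counter can wrap around here */
    *inbytesleft -= 1;
    *outbytesleft -= 1;
    *inbuf += 1;
    *outbuf += 1;
    return true;
}
```

In the switch statement, the EILSEQ/EINVAL case would call this helper and treat a false return like E2BIG, that is, stop converting instead of writing past the end of the buffer.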
Once you know this, the fix is trivial. And if you dig deep enough in their SVN, there’s my original test to reproduce this.
How do you find bugs like this? – Not without an example message that makes Bogofilter crash reproducibly. In this case it was real mail with a big PDF file attachment sent via my university's mail server. Because Bogofilter would repeatedly crash trying to parse the message, at some point a Nagios check alerted us that one mail in the queue was delayed for more than an hour. So we made a copy of it to examine the bug more closely. A little Valgrinding later, and you know where to start your search for the out-of-bounds write.