Julius Plenz – Blog

Details on CVE-2012-5468

In mid-2010 I found a heap corruption in Bogofilter which lead to the Security Advisory 2010-01, CVE-2010-2494 and a new release. – Some weeks ago I found another similar bug, so there’s a new Bogofilter release since yesterday, thanks to the maintainers. (Neither of the bugs have much potential for exploitation, for different reasons.)

I want to shed some light on the details about the new CVE-2012-5468 here: It’s a very subtle bug that rises from the error handling of the character set conversion library iconv.

The Bogofilter Security Advisory 2012-01 contains no real information about the source of the heap corruption. The full description in the advisory is this:

Julius Plenz figured out that bogofilter's/bogolexer's base64 could overwrite heap memory in the character set conversion in certain pathological cases of invalid base64 code that decodes to incomplete multibyte characters.

The problematic code doesn’t look problematic on first glance. Neither on second glance. Take a look yourself. The version here is redacted for brevity: Convert from inbuf to outbuf, handling possible iconv-failures.

count = iconv(xd, (ICONV_CONST char **)&inbuf, &inbytesleft, &outbuf, &outbytesleft);

if (count == (size_t)(-1)) {
    int err = errno;
    switch (err) {
    case EILSEQ: /* invalid multibyte sequence */
    case EINVAL: /* incomplete multibyte sequence */
        if (!replace_nonascii_characters)
            *outbuf = *inbuf;
            *outbuf = '?';

        /* update counts and pointers */
        inbytesleft -= 1;
        outbytesleft -= 1;
        inbuf += 1;
        outbuf += 1;

    case E2BIG: /* output buffer has no more room */
                /* TODO: Provide proper handling of E2BIG */
        done = true;


The iconv API is simple and straightforward: You pass a handle (which among other things contains the source and destination character set; it is called xd here), and two buffers and modifiable integers for the input and output, respectively. (Usually, when transcoding, the function reads one symbol from the source, converts it to another character set, and then “drains” the input buffer by decreasing inbytesleft by the number of bytes that made up the source symbol. Then, the output lenght is checked, and if the target symbol fits, it is appended and the outbytesleft integer is decreased by how much space the symbol used.)

The API function returns -1 in case of an error. The Bogofilter code contains a copy&paste of the error cases from the iconv(3) man page. If you read the libiconv source carefully, you’ll find that …

/* Case 2: not enough bytes available to detect anything */
errno = EINVAL;

comes before

/* Case 4: k bytes read, making up a wide character */
if (outleft == 0) {
    cd->istate = last_istate;
    errno = E2BIG;

So the “certain pathological cases” the SA talks about are met if a substantially large chunk of data makes iconv return -1, because this chunk just happens to end in an invalid multibyte sequence.

But at that point you have no guarantee from the library that your output buffer can take any more bytes. Appending that character or a ? sign causes an out-ouf-bounds write. (This is really subtle. I don’t blame anyone for not noticing this, although sanity checks – if need be via assert(outbytesleft > 0) – are always in order when you do complicated modify-string-on-copy stuff.) Additionally, outbytesleft will be decreased to -1 and thus even an outbytesleft == 0 will return false.

Once you know this, the fix is trivial. And if you dig deep enough in their SVN, there’s my original test to reproduce this.

How do you find bugs like this? – Not without an example message that makes Bogofilter crash reproducibly. In this case it was real mail with a big PDF file attachment sent via my university's mail server. Because Bogofilter would repeatedly crash trying to parse the message, at some point a Nagios check alerted us that one mail in the queue was delayed for more than an hour. So we made a copy of it to examine the bug more closely. A little Valgrinding later, and you know where to start your search for the out-of-bounds write.

posted 2012-12-05 tagged linux, c, security, spam and university

Find the Spammer

A week ago our server was listed as sending out spam by the CBL, which is part of the XBL which in turn is part of the widely-used Spamhaus ZEN block list. As a practical result, we couldn't send out mail to GMX or Hotmail any more:

<someone@gmx.de>: host mx0.gmx.net[] said:
550-5.7.1 {mx048} The IP address of the server you are using to connect to GMX is listed in
550-5.7.1 the XBL Blocking List (CBL + NJABL). 550-5.7.1 For additional information, please visit
550-5.7.1 http://www.spamhaus.org/query/bl?ip= and
550 5.7.1 ( http://portal.gmx.net/serverrules ) (in reply to RCPT TO command)

The first source we identified was a postfix alias forwarding to a virtual alias domain; however, I had deleted the user in the latter table, such that postfix would return a "user unknown in virtual alias table" error to the sender. But because the sender was localhost, postfix would create a bounce mail. (This is known as Backscatter.)

But one day later, our IP was listed in CBL again. So I started digging deeper. How do you identify who is sending out spam? There are some obvious points to start:

To get a clearer image of what was really happening, I did two things. First, I implemented a very simple "who is doing SMTP" log mechanism using iptables. It went like this:

$ cut -d: -f1 /etc/passwd | while read user; do
    echo iptables -A POSTROUTING -p tcp --dport 25 -m owner --uid-owner $user -j LOG --log-prefix \"$user tried SMTP: \" --log-level 6;
iptables -A POSTROUTING -p tcp --dport 25 -m owner --uid-owner root -j LOG --log-prefix "root tried SMTP: " --log-level 6
iptables -A POSTROUTING -p tcp --dport 25 -m owner --uid-owner feh -j LOG --log-prefix "feh tried SMTP: " --log-level 6

(To be honest I used a Vim macro to make the list of rules, but that's hard to write down in a blog post.)

Second, I NAT'ed all users except for postfix to a different IP address:

$ iptables -A POSTROUTING -p tcp --dport 25 -m owner ! --uid-owner
    postfix -j SNAT --to-source

Then, I dumped the SMTP-related TCP flows for that IP address:

$ tcpflow -c 'host and (dst port 25 or src port 25)'

I waited for a short time, and soon another wave of spam was sent out. Now I could clearly identify the user:

Jul 19 16:48:35 noam kernel: [5590933.619960] pete tried SMTP: IN= OUT=eth0 SRC= DST= ...
Jul 19 16:48:38 noam kernel: [5590936.616860] pete tried SMTP: IN= OUT=eth0 SRC= DST= ...
Jul 19 16:48:44 noam kernel: [5590942.615608] pete tried SMTP: IN= OUT=eth0 SRC= DST= ...

But instead of finding an infected web app, I found that the user was logged in via SSH and was executing sleep 3600 commands. When I killed the SSH session, the spamming stopped immediately.

Since this was not a user I know personally, I don't know what happened. My best guess is an infected Windows computer and an SSH SOCKS forwarding setup that allowed the (romanian) spammer to tunnel its connections.

One question remains: Are modern spam-drones able to steal WinSCP/PuTTY login credentials from the Registry and use them to silently set up SSH tunnels? Or was this just a case of bad luck?

posted 2012-07-21 tagged linux, iptables and spam

spam ratio

How much spam do you get? – I have a feeling I get more than I deserver. So I checked my spm ratio, i.e. the number of (recognized) spam mails divided my the number of total mails (as far as the procmail log spans, which is some 270,000 mails).

$ echo $(fgrep -c 'Subject: *****SPAM*****' Mail/from) / \
    $(fgrep -c 'Subject: ' Mail/from) | bc

Now this is gross. Only less than five percent of the mail I receive might be of interest to me. :-(

posted 2011-10-13 tagged spam