Julius Plenz – Blog

minimizing Linux filesystem cache effects

Last weekend I toyed around a bit and tried to write a shared object library that can be used via LD_PRELOAD to minimize the effect a program has on the Linux filesystem cache.

Basically the use case is that you have a production system running, and you don't want your backup script to fill the filesystem cache with mostly useless information at night (files that were cached should stay cached). I haven't tested yet whether this brings measurable improvements.
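One way such a library could keep a program from polluting the cache is posix_fadvise() with POSIX_FADV_DONTNEED. The following is a minimal sketch, not the code of the library itself: the helper name read_without_caching is made up for illustration. Unlike the stated goal, it does not check whether the file was already cached beforehand.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper (an assumption, not part of the original library):
 * read a file to the end, then ask the kernel to drop the pages it
 * cached for it. Returns 0 on success, -1 on error. */
int read_without_caching(const char *path)
{
    char buf[65536];
    ssize_t n;
    int fd, rc;

    assert(path != NULL);
    if ((fd = open(path, O_RDONLY)) == -1)
        return -1;

    while ((n = read(fd, buf, sizeof buf)) > 0)
        ;   /* a backup tool would process the data here */

    /* Offset 0 and length 0 mean "the whole file". This is only a
     * hint; the kernel is free to ignore it. */
    rc = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    close(fd);
    return (n == -1 || rc != 0) ? -1 : 0;
}
```

An LD_PRELOAD wrapper would presumably issue the same hint from its own close() replacement, so that unmodified programs benefit.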

The coding was really fun and gave me yet another insight into how great the simple concept of file descriptors in UNIX is. (GNU software is tough, though: I got stuck once, and found help on Stack Overflow, which I had never used before.)

posted 2012-02-09 tagged linux, c and hack

shredding

I'm currently shredding my old X41's hard drive, because I want to sell it (if you are interested, contact me). I'm overwriting it with ten passes of random data, followed by a final pass of zeros:

$ shred -vfz -n 10 /dev/sda

Luckily, the disk was fully encrypted all the time. So it's just a precaution.

posted 2012-01-30 tagged x41 and linux

Ten Years of Vim

About ten years ago, I began using Vim. Since about eight years ago, I have been using Vim for every email, every piece of code, literally every text I write. Today, I want to write a short text about how I came to use Vim and what I like about it.

I don't really remember when I first used Vim. It must have been around the time when I was programming PHP a lot. I had access to a "real" computer at home – running Windows XP – in 2002 for the first time; before that, I could only use older Macintoshes. It's typical for first-time Vi users to stumble into believing – by hearsay, I guess – that it is indeed a really superior editor, until they try it out the first time and can't even save, because they don't know how to. Those were probably my first experiences, too.

Anyhow, at some point in time I ditched PHP Zend Studio for SciTE. Later, I got to know Vim (i.e., by reading a tutorial about it and actually understanding it) and was instantly hooked. Probably, the guys over at #html.de talked me into it. Ironically, I used Vim before I ever used a UNIX-like operating system.

In my Vim learning curve, I identify seven important advances:

  1. Understanding the Modes Concept. – This, of course, is something everybody needs to grok. It's fairly straight-forward, once you think about it.
  2. Understand the Visual Mode and Yank/Paste. – Line-wise selection already gives you more power than a regular editor when moving code.
  3. Understand Mappings and Macros. – Even today I am amazed how few people automate things. If it's one line, do it manually. If it's three lines, carefully think about the task while recording a macro for it!
  4. Understanding Windows. – Multiple files and stuff.
  5. Consistently using [h], [j], [k], [l]. – This actually was a much bigger step than you might think. I went to great lengths to achieve this: I configured the arrow mapping to :echoerr a message. Today, I configure all programs to use Vim key bindings, especially for horizontal and vertical navigation. It's the first thing to do. I only use the arrow keys for Mplayer seeking.
  6. Using Text Objects. – See :help text-objects, if you don't know about them.
  7. Switching to a US keyboard layout. – Once you do this, all the Vim commands begin to make sense. (I used a German layout before.)

Steps 1–5 happened in the first two years. Text objects only came with more recent Vim development, and I'm not quite sure when I adopted them. Learning the US layout was around 2006, maybe.

When I switched to using Debian in 2004, using Vim for all tasks already felt natural. Of course, at that point I finally came to understand Vim not merely as a text editor, but as a philosophy. And that is what fascinates me to this day: The Vi way of editing text is much more than a set of clever key bindings. It's a language.

In a way, I'm really professional at using Vim. If I think of the tasks I do, I suspect there are very few superfluous keys I press during editing. I have acquired a really good intuition of how to skip to a particular line, to a particular function parameter or a certain word in a sentence. (I use [H], [M], [L] for global on-screen navigation a lot, and I heavily use the [f], [t], [F] and [T] jump commands.) Just as you don't actually think about the letters you type when you become a good typist, I don't think about what command keys I press in Normal mode. I just press them, and the cursor magically moves around to where my eyes rest. This is good.

On the other hand, I am just using core Vim features, most of which are already found in original Vi implementations. My really conservative .vimrc change history shows that I have pretty much settled my editing habits. – But: I have never used a third-party plugin before. Strange as it may sound, I never felt the urge to do so. Command-T certainly looks like it could be of use; however, I usually start a new Vim instance and go with the Z Shell completion, which I suspect to be superior in more than one way, to find the file(s). – Thus I must acknowledge that there might be vast possibilities yet to discover. (Oh, and while confessing, there's another big one: I have never used Emacs. All I know about it is hearsay.)

For keyboard enthusiasts, there are two quirks with Vim: It mainly relies on Escape for mode switches, and the keys for many combinations are aligned for QWERTY layouts. There's just no way around it: while [c] and [d] are mnemonic for change and delete, [h], [j], [k], [l] simply aren't. There's no way to justify their use when switching to Dvorak, and that's why I didn't (switch). I also once tried mapping [j][j] to Escape, or using the Caps Lock key as an Escape replacement; I couldn't really stick to either. (I also stick to calling vim on the command line instead of a shorter alias. It is the fourth most frequent command I type, after sudo, git, and man.)

For me, text editing is equal to using Vim. I feel like a four-year-old moving a mouse when I'm forced to use another editor on other people's computers. And because text editing is really clumsy with regular text editors, I no longer wonder why people don't really bother to correct errors: the effort is just not worth it.

If I had to sum up the difference between Vim and other editors in one sentence, it is this: While other editors are great for creating text, Vim is also great at manipulating text. And text manipulation, for most programmers and authors, is what it's all about.

:wq

posted 2012-01-30 tagged linux and vim

vlock and suspend to ram

I've had weird race conditions when using vlock together with s2ram. It appears suspend to ram wants to switch VTs, while vlock hooks into the switch requests and explicitly disables them. So some of the time, the machine would not suspend, while at other times, vlock wouldn't be able to acquire the VT.

To solve this, I wrote a simple vlock plugin, which simply clears the lock mechanism, writes mem to /sys/power/state and later reinstates the locking mechanism. This plugin is called after all and new. Thus, the screen will be locked properly before suspending.

Here's my suspend.c:

#include <stdio.h>
#include <unistd.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

/* Include this header file to make sure the types of the dependencies
 * and hooks are correct. */
#include "vlock_plugin.h"
#include "../src/console_switch.h"

const char *succeeds[] = { "all", "new", NULL };
const char *depends[] =  { "all", "new", NULL };

bool vlock_start(void __attribute__ ((__unused__)) **ctx_ptr)
{
    int fd;

    unlock_console_switch();

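    if((fd = open("/sys/power/state", O_WRONLY)) != -1) {
        /* Writing "mem" suspends to RAM; the call blocks until resume. */
        if(write(fd, "mem", 3) == -1)
            perror("suspend: write");
        close(fd);
    } else {
        perror("suspend: open");
    }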

    lock_console_switch();

    return true;
}

Simply paste it into the vlock modules folder, make suspend.so and copy it to /usr/lib/vlock/modules. I now invoke it like this:

env VLOCK_PLUGINS="all new suspend" vlock

posted 2012-01-20 tagged linux and c

Xorg, really?!

Are you fucking kidding me? You reintroduce broken behaviour that possibly has devastating security consequences and make it the default?! Yeah I agree the "usual" X server locking approach is not the best way to do it – but to knowingly smash the security of people's computers on a grand scale... that's priceless.

(My locking solution is env USER=feh vlock -a -n, again.)

Update: Why it happened

posted 2012-01-20 tagged linux and rant

zsh: complete words from tmux pane

Today I wrote a rather cool Z-Shell completion function: It will present all words that are found in the current tmux pane in a zsh completion menu. That means you can actually complete words from the output of commands that you just executed. (In a way it's a little bit like the keeper function, without the overhead of remembering to call keeper in the first place.)

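The code below defines two keybindings: ^Xt completes words from the pane that start with what you have typed, and ^X^X completes words that contain it anywhere (ignoring case).

Here's the code: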

_tmux_pane_words() {
  local expl
  local -a w
  if [[ -z "$TMUX_PANE" ]]; then
    _message "not running inside tmux!"
    return 1
  fi
  w=( ${(u)=$(tmux capture-pane \; show-buffer \; delete-buffer)} )
  _wanted values expl 'words from current tmux pane' compadd -a w
}

zle -C tmux-pane-words-prefix   complete-word _generic
zle -C tmux-pane-words-anywhere complete-word _generic
bindkey '^Xt' tmux-pane-words-prefix
bindkey '^X^X' tmux-pane-words-anywhere
zstyle ':completion:tmux-pane-words-(prefix|anywhere):*' completer _tmux_pane_words
zstyle ':completion:tmux-pane-words-(prefix|anywhere):*' ignore-line current
zstyle ':completion:tmux-pane-words-anywhere:*' matcher-list 'b:=* m:{A-Za-z}={a-zA-Z}'

How does it work? _tmux_pane_words will just capture the current pane's contents (capture-pane), print out the buffer that contains it (show-buffer) and then delete it again (delete-buffer). – The rest of the magic happens via Zsh's excellent completion mechanisms.

See it in action (after typing spm^X^X):

posted 2012-01-19 tagged zsh, tmux and linux

trying pthreads

Today I played around with POSIX threads a little. In an assignment, we have to implement a very, very simple webserver that does asynchronous I/O. Since it should perform well, I thought I'd not only serialize I/O, but also parallelize it.

So there's a boss that just accepts new inbound connections and appends the fds to a queue:

clientfd = accept(sockfd, (struct sockaddr *) &client, &client_len);
if(clientfd == -1)
    error("accept");
new_request(clientfd);

The new_request function in turn appends it to a queue (of size TODOS = 64), and emits a cond_new signal for possibly waiting workers:

pthread_mutex_lock(&mutex);
while((todo_end + 1) % TODOS == todo_begin) {
    fprintf(stderr, "[master] Queue is completely filled; waiting\n");
    pthread_cond_wait(&cond_ready, &mutex);
}
fprintf(stderr, "[master] adding socket %d at position %d (begin=%d)\n",
    clientfd, todo_end, todo_begin);
todo[todo_end] = clientfd;
todo_end = (todo_end + 1) % TODOS;
pthread_cond_signal(&cond_new);
pthread_mutex_unlock(&mutex);

The workers (there being 8) will just emit a cond_ready, possibly wait until a cond_new is signalled, and then extract the first client fd from the queue. After that, a simple function involving some reads and writes will handle the communication on that fd.

pthread_mutex_lock(&mutex);
pthread_cond_signal(&cond_ready);
while(todo_end == todo_begin)
    pthread_cond_wait(&cond_new, &mutex);
clientfd = todo[todo_begin];
todo_begin = (todo_begin + 1) % TODOS;
pthread_mutex_unlock(&mutex);

// handle communication on clientfd

(Full source is here: webserver.c.)

Now this works pretty well and is fairly easy. I'm not very experienced with threads, though, and I run into problems when I issue massively parallel requests.

If I run ab, the Apache Benchmark tool, with 10,000 requests, 1,000 of them concurrent, against the webserver, it'll go up to 9000-something requests and then lock up.

$ ab -n 10000 -c 1000 http://localhost:8080/index.html
...
Completed 8000 requests
Completed 9000 requests
apr_poll: The timeout specified has expired (70007)
Total of 9808 requests completed

The webserver is blocked; its last line of output reads like this:

[master] Queue is completely filled; waiting

If I attach strace while in this blocking state, I get this:

$ strace -fp `pidof ./webserver`
Process 21090 attached with 9 threads - interrupt to quit
[pid 21099] recvfrom(32,  <unfinished ...>
[pid 21098] recvfrom(23,  <unfinished ...>
[pid 21097] recvfrom(31,  <unfinished ...>
[pid 21095] recvfrom(35,  <unfinished ...>
[pid 21094] recvfrom(34,  <unfinished ...>
[pid 21093] recvfrom(33,  <unfinished ...>
[pid 21092] recvfrom(26,  <unfinished ...>
[pid 21091] recvfrom(24,  <unfinished ...>
[pid 21090] futex(0x6024e4, FUTEX_WAIT_PRIVATE, 55883, NULL

So the children seem to be starving on unfinished recv calls, while the master thread waits for any children to work away the queue. (With a queue size of 1024 and 200 workers I couldn't reproduce the situation.)

How can one counteract this? Specify a timeout? Spawn workers on demand? Set the listen() backlog argument to a low value? – or is it all Apache Benchmark's fault? *confused*
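The first option, a timeout, can be sketched with the standard SO_RCVTIMEO socket option. This is only an illustration under assumptions: the helper name set_recv_timeout is hypothetical, and whether a timeout actually cures the lock-up described above is untested.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/time.h>

/* Hypothetical helper (an assumption, not from webserver.c): make
 * blocking reads on a client socket give up after `seconds`. A recv()
 * that times out returns -1 with errno set to EAGAIN or EWOULDBLOCK,
 * so the worker can close the fd and fetch the next one.
 * Note that a value of 0 means "no timeout". */
int set_recv_timeout(int clientfd, int seconds)
{
    struct timeval tv;

    assert(seconds >= 0);
    tv.tv_sec = seconds;
    tv.tv_usec = 0;
    return setsockopt(clientfd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);
}
```

A worker would call this right after taking clientfd from the queue and before its first read.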

posted 2012-01-17 tagged linux and c

mutt sidebar patch improvements

It is generally accepted as an almost universal truth that mutt sucks, but is the MUA that sucks less than all others. While people use either Vim or Emacs and fight about it, I hardly see any people fight about whether mutt is good or bad. There is, to my knowledge, no alternative worth mentioning.

Mutt dates back well into the mid-nineties. As you might imagine, with lots of contributors over the course of almost two decades, the code quality is rather messy.

When development had stalled for quite a while in the mid-2000s, a fork was attempted. While mutt-ng was quite popular for a while, most changes were incorporated back into mainline mutt at some point. (Ironically, the latest article in the mutt-ng development blog is from October 2006 and is titled "mutt-ng isn't dead!"). The development of main mutt gained some momentum again, triggered in large part by the contributions of the late Rocco Rutte.

I remember two big features that the original mutt authors just wouldn't integrate into mainline: The headercache patch and the sidebar patch. About the former I can't say anything, but lately I've been fixing the Sidebar patch in various places. (We use mutt at work and rely heavily on e-mail communication, so we'd like a bug-free user agent, naturally.)

When all the mutt forking went on about five years ago, I didn't know much about it. In retrospect, I see the people did a hell of a job. Long before mutt-ng was forked, Sven told me he and Mika met in Graz for several weeks to sift and sort through the available patches, intending to do a "super patch".

Mutt's code quality is arguably rather messy.

On top of that, the Sidebar patch tries to make it even worse. Imagine this: mutt draws a mail from position (line=x,char=0) to the end of the line. Now the sidebar patch will introduce a left "margin", such that the sidebar can be drawn there. Thus, all code parts where a line is started from the leftmost character have to be rewritten to check if the sidebar is active and possibly start drawing at (line=x,char=20).

The sidebar code quality is a fringe case of bad code. Really, it sucks. However, there's no real way to "do it right", since original mutt never planned for a sidebar.

Who maintains the sidebar patch? – Not sure. There's a version at thomer.com, but he says:

July 20, 2006 I quit. Sadly, there seems to be no desire to absorb the sidebar patch into the main source tree.

The most up-to-date version is found at Lunar Linux. Last update is from mid-2009.

Debian offers a mutt-patched package that includes the sidebar patch, albeit in a different version than usually found 'round the net. In short, this patch is a mess, too.

But since I made all the fixes, I decided to contact the package's maintainer, Antonio Radici. He promptly responded and said he'd happily fix all the issues, so I started by opening two bug reports. Nothing has happened since.

The patches run quite stably for my colleagues, so I think it's best to release them. Maybe someone else can use them. Please note that I have absolutely no interest in taking over any Sidebar patch maintenance. ;-)

For some of the patches I provide annotations. They all feature quite descriptive commit messages, and apply cleanly on top of the Debian mutt repository's master branch.

The first four patches are not by me, they are just the corresponding patches from the debian/patches/ directory applied to have a starting point.

The first few patches fix rather trivial bugs.

Now come the performance critical patches. They are the real reason I was assigned the task to repair the sidebar:

This patch fixes a huge speed penalty. Previously, the sidebar would count the mails (and thus read through the whole mbox) every time that mtime > atime! This is just an incredible oversight by the developer and must have burned hundreds of millions of CPU cycles.

This introduces a member `sb_last_checked' to the BUFFY struct. It
will be set by `mh_buffy_update', `buffy_maildir_update' and
`buffy_mbox_update' when they count all the mails.

Mboxes only: `buffy_mbox_update' will not be run unless the
condition "sb_last_checked > mtime of the file" holds. This solves
a huge performance penalty you obtain with big mailboxes. The
`mx_open_mailbox' call with the M_PEEK flag will *reset* mtime and
atime to the values from before. Thus, you cannot rely on "mtime >
atime" to check whether or not to count new mail.

Also, don't count mail if the sidebar is not active:

Then, I removed a lot of cruft and simply stupid design. Just consider one of the functions I removed:

-static int quick_log10(int n)
-{
-        char string[32];
-        sprintf(string, "%d", n);
-        return strlen(string);
-}

That is just insane.
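For comparison, the digit count that quick_log10 computes can be had without formatting a string at all. This is a hypothetical sketch for non-negative input, not necessarily what the patch uses in its place; the name decimal_digits is made up.

```c
#include <assert.h>

/* Hypothetical replacement sketch (an assumption, not taken from the
 * patch): number of decimal digits of a non-negative integer. */
int decimal_digits(int n)
{
    int digits = 1;

    assert(n >= 0);
    while (n >= 10) {   /* strip one digit per iteration */
        n /= 10;
        digits++;
    }
    return digits;
}
```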

Now, customizing the sidebar format is simple, straight-forward and mutt-like:

sidebar_format

    Format string for the sidebar. The sequences `%N', `%F' and
    `%S' will be replaced by the number of new or flagged messages
    or the total size of the mailbox. `%B' will be replaced with
    the name of the mailbox. The `%!' sequence will be expanded to
    `!' if there is one flagged message; to `!!' if there are two
    flagged messages; and to `n!' for n flagged messages, n>2.

While investigating mutt's performance, one thing struck me: To decode a mail (e.g. from Base64), mutt will create a temporary file and print the contents into it, later reading them back. This also happens for evaluating filters that determine coloring. For example,

color   index  black green  '~b Julius'

will highlight mail containing my name in the body in bright green (this is tremendously useful). However, for displaying a message in the index, it will be decoded to a temporary file and later read back. This is just insane, and clearly a sign that the mutt authors wouldn't bother with dynamic memory allocation.

By chance I found a glibc-only function, fmemopen(); the man page summarizes it as "fmemopen, open_memstream, open_wmemstream - open memory as stream".

From the commit message:

When searching the header or body for strings and the
`thorough_search' option is set, a temp file was created, parsed,
and then unlinked again. This is now done in memory using glibc's
open_memstream() and fmemopen() if they are available.

This makes mutt respond much more rapidly.
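To illustrate how these two calls replace a temporary file, here is a minimal stand-alone sketch. It is not the code from the patch; the helper name roundtrip_in_memory and its signature are assumptions for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the mutt patch): write `text` to a
 * memory-backed stream and read its first line back, where a temp
 * file would otherwise be used. Returns 0 on success, -1 on error. */
int roundtrip_in_memory(const char *text, char *line, size_t linesz)
{
    char *buf = NULL;
    size_t len = 0;
    FILE *out, *in;
    int ok;

    assert(text != NULL && line != NULL && linesz > 0);

    /* Writing side: the stream grows a malloc()ed buffer as needed. */
    if ((out = open_memstream(&buf, &len)) == NULL)
        return -1;
    fputs(text, out);
    if (fclose(out) != 0) {     /* fclose() finalizes buf and len */
        free(buf);
        return -1;
    }

    /* Reading side: treat the buffer as a read-only FILE. */
    if (len == 0 || (in = fmemopen(buf, len, "r")) == NULL) {
        free(buf);
        return -1;
    }
    ok = fgets(line, (int)linesz, in) != NULL;
    fclose(in);
    free(buf);
    return ok ? 0 : -1;
}
```

Code that expects a FILE * keeps working unchanged, which is presumably why this approach fits an old code base so well.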

Finally, there are some patches that fix various other issues, see commit message for details.

There you go. I appreciate any comments or further improvements.

Update 1: The original author contacted me. He told me he's written most of the code in a single sitting late at night. ;-)

Update 2: The 16th patch will make mutt crash when you compile it with -D_FORTIFY_SOURCE=2. There's a fix: 0020-use-PATH_MAX-instead-of-_POSIX_PATH_MAX-when-realpat.patch (thanks, Jakob!)

posted 2012-01-08 tagged mutt, linux and c

X220's UMTS card

I've been toying around with the UMTS module in my X220 lately. I got a pre-paid SIM from blau.de, who offer 24h UMTS flatrates for 2,40 EUR. (This is probably my use case: Being somewhere without internet access for a day or two. This only happens so often, so I don't want a "real" flat.)

My UMTS card is manufactured by Sony Ericsson and connected via internal USB:

$ lsusb -v -s 004:003
    ...
    idVendor           0x0bdb Ericsson Business Mobile Networks BV
    idProduct          0x1911

The installation is easy: Just insert the SIM card behind the battery as shown here. Add yourself to the dialout group, log in again, and you're set.

You can first connect to your device using chat or picocom (which can be terminated via C-a C-x). To ask if you can use the SIM without a PIN, send the AT+CPIN? command:

$ picocom /dev/ttyACM0
...
AT+CPIN?
+CPIN: READY

If you're not ready to go, I would disable the PIN request using a regular phone. (I did.)

Dialling out is easy. I set up two profiles in the /etc/wvdial.conf that allow me to switch between "pay per megabyte" and "dayflat":

[Dialer blau]
Modem = /dev/ttyACM0
Init1 = AT+CGDCONT=1,"IP","internet.eplus.de"
Stupid mode = 1
phone= *99#
Username = blau
Password = blau

[Dialer tagesflat]
Modem = /dev/ttyACM0
Init1 = AT+CGDCONT=1,"IP","tagesflat.eplus.de"
Stupid mode = 1
phone= *99#
Username = blau
Password = blau

The rest happens automatically, once you invoke wvdial blau or wvdial tagesflat. (Note you have to execute these with root privileges because they want to modify pppd-related config files.) Most probably you want the follow-up command route add default dev ppp0 to route all traffic via the ppp0 interface.

In a test run I got a downstream speed of 190KB/s (city perimeter). Working over SSH is not painful at all.

I also played around with gammu a little bit.

$ gammu --identify
Device               : /dev/ttyACM0
Manufacturer         : Lenovo
Model                : unknown (F5521gw)
Firmware             : R2A07

The Wammu interface is nice, it can even receive SMS. But sending SMSes failed so far:

$ echo "Das ist ein Test" | gammu --debug textall --debug-file /tmp/gammu \
    sendsms TEXT +491785542342
...
1 "AT+CMGS=28"
2 "> 079194710716000011000C919471584532240000FF10C4F01C949ED341E5B41B442DCFE9^Z"
3 "+CMS ERROR: 500"

... which is somewhat of a "generic error". Maybe sending SMS is not supported at all. I'll look into that later.

Also, I'll have a look whether my card supports GPS information retrieval. Thinkwiki claims a similar model does this. Interesting.

Update: Actually, I forgot one thing. I keep the following two entries in my /etc/wvdial.conf:

[Dialer on]
Modem = /dev/ttyACM0
Init1 = AT+CFUN=1

[Dialer off]
Modem = /dev/ttyACM0
Init1 = AT+CFUN=4

The actual sequence is now: wvdial on && wvdial blau. The AT+CFUN=1 will activate the radio equipment, which is necessary. And, suddenly, SMS delivery works, too! :-)

posted 2011-12-22 tagged x220, umts and linux

New X220

I got a brand new Thinkpad X220 on Thursday. I'm not much into hardware, I think it should mainly work. I have a model with 4 GB of RAM, an i7 at 2.7 GHz, UMTS preinstalled, an SSD instead of an HDD and an IPS panel. It's a really nifty thing.

Paying the extra money for the SSD is totally worth it. Everything happens instantaneously. The bootup process is down to five seconds. The IPS panel is really worth it, too. ThinkPads have long been criticized for their bad displays – with the new panel at full brightness, my regular screen looks really dim and grey...

The Debian netinstall works smoothly. I haven't gotten around to testing all the stuff like the DisplayPort connectors, Bluetooth, UMTS, USB 3.0. But the usual stuff works out of the box.

However, there are major problems with the power management of both the graphics card and the whole system, the latter one being a regression in the recent 3.0 and 3.1 kernel series regarding ASPM. Currently I'm using the 3.1.0-1-amd64 kernel with the pcie_aspm=force boot parameter. I cannot really see a difference in power consumption when varying this parameter, though.

A major thing, however, is re-enabling the RC6 mode of the graphics chip. This alone saves more than 4W when the computer is in an idle state. My /etc/modprobe.d/i915-kms.conf looks like this now:

options i915 modeset=1
options i915 i915_enable_rc6=1
options i915 i915_enable_fbc=1
options i915 lvds_downclock=1

Suspend/resume works fine, no flickering effects. I use the following command to find out the current power consumption:

while sleep 1; do
    awk '{printf"%.2f\n",$1/-1000}' < /sys/devices/platform/smapi/BAT0/power_now;
done

This requires the tp_smapi kernel module to be loaded. With full brightness (0) and while writing this blog article, the consumption is at ~12W; with medium brightness (8) it's ~8.5W; at the lowest brightness (15) it's ~8W; with the display completely turned off, it's ~6.5W. There are people who claim a power consumption of only ~5.4W. If you have any other hints on this or if you own an X220 yourself, I'd be interested in the details.

posted 2011-12-10 tagged x220 and linux

GUI simplicity vs. UNIX simplicity

I ranted about the new Unity interface some weeks ago. On several occasions thereafter, I had to help people solve problems they had using some sort of graphical user interface.

UNIX is simple. It really is. There is a reasonable and easy-to-follow philosophy behind it. But UNIX requires the user to know what he wants to do, and read error messages. UNIX simplicity is not the same as iPhone simplicity.

Eric S. Raymond wrote this set of rules that should guide UNIX program design. In this context, two important rules stick out (emphasis mine):

Rule of Silence: When a program has nothing surprising to say, it should say nothing.

Rule of Repair: When you must fail, fail noisily and as soon as possible.

Although this is of course mostly aimed at text user interface programs, you can get an important point here. Most GUIs adhere to the Rule of Silence quite well – in fact so well that they seldom say anything at all!

Since many UNIX GUIs invoke text-interface programs under the hood, it should be a necessity to be able to view how those programs failed. Luckily, most TUI programs provide descriptive error messages. If they are hidden in the GUI there are two effects:

I don't use GUI programs at all, except for a browser (Vimperator/Firefox), a PDF viewer (Zathura) and The GIMP. Mostly, this is because of usability considerations. But also, I'm afraid to use a computer where I cannot see what is happening. And that's exactly the case with GUIs that do stuff that can fail: I don't know what they are doing and why they are failing!

In the end I always go the extra mile and read up on the PPP daemon, for example. This wouldn't be necessary if GUIs had a switch to do some really verbose logging. That would help tremendously. Plus a button to display that log. Should be easy, shouldn't it?

posted 2011-11-24 tagged unix and linux

dead code easter egg

I was just researching how the file format of the xt_recent module works. That's where I found this nice easter egg: instead of writing down the size of an IPv6 address plus one, they simply used a dummy string "+b335:1d35:1e55:dead:c0de:1715:5afe:c0de", reading "beesides less dead code it is safe code". Hehe.
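The trick can be reproduced in a few lines. This is a sketch of the idea only, not a copy of the kernel source; the names dummy and entry_buffer_size are made up.

```c
#include <assert.h>
#include <stddef.h>

/* The dummy string from the post: a '+' followed by a full-length
 * IPv6 address in text form (8 groups of 4 hex digits, 7 colons). */
static const char dummy[] = "+b335:1d35:1e55:dead:c0de:1715:5afe:c0de";

/* Hypothetical helper: sizeof on the array counts the 40 visible
 * characters plus the terminating NUL byte, so the compiler derives
 * the buffer size and no magic number has to be written down. */
size_t entry_buffer_size(void)
{
    return sizeof dummy;
}
```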

posted 2011-10-02 tagged linux

statically linking dwm against X11 and XCB

Today, virtually all binaries used on Linux systems are dynamically linked to several libraries. While it is commonly accepted that statically linking applications is bad – most notably for security reasons: with dynamic linking, fixing a library's bug means you won't have to recompile all applications that use that library, since they'll simply load the version available at run-time – there are in fact good reasons to use static linking. (And for those who claim statically linked binaries occupy much disk space: yeah, sure. As if a few megs compared to a few hundred kilobytes make that much of a difference today, plus you don't have the overhead of looking up and loading the libs in the first place.)

As I mentioned in my post about tmux already, there's a huge advantage to static linking: you can compile bleeding edge software with bleeding edge library functions and still use them on reasonably outdated systems (think: Debian stable).

One division of rapidly evolving software I could never successfully link statically was window managers like dwm or awesome. However, especially considering the XCB development and adoption over the past few years, to me it makes perfect sense. I'll just distribute a copy of the window manager I use to different systems and have a guarantee it'll work there, no matter the libxcb version (or if it's available at all).

Usually, however, it's not possible to just pass a -static or -Wl,-Bstatic flag to the compiler (in my case, gcc). It'll fail to find several symbols that are located in libraries that don't have to be explicitly linked in. Such an error message might look like this:

/usr/lib/libXinerama.a(Xinerama.o): In function `find_display':
(.text+0x89): undefined reference to `XextCreateExtension'
/usr/lib/libXinerama.a(Xinerama.o): In function `XineramaQueryScreens':
(.text+0x255): undefined reference to `XMissingExtension'

To find the appropriate library, you may try to use pkg-config. I use a different approach, however. I have a shell function defined called findsym (beware, Z-Shell specialties apply):

findsym () {
  [[ -z $1 ]] && return 1
  SYMBOL=$1
  LIBDIR=${2:-/usr/lib}
  for lib in $LIBDIR/*.a
  do
    nm $lib 2> /dev/null | grep -q $SYMBOL && \
      print "symbol found in $lib\n -L$LIBDIR -l${${lib:t:r}#lib}"
  done
}

Thus, I can simply go looking for the missing XMissingExtension symbol like this:

$ findsym XMissingExtension
symbol found in /usr/lib/libXext.a
 -L/usr/lib -lXext
symbol found in /usr/lib/libXi.a
 -L/usr/lib -lXi
symbol found in /usr/lib/libXinerama.a
 -L/usr/lib -lXinerama
symbol found in /usr/lib/libXrandr.a
 -L/usr/lib -lXrandr

Now, I use the readme file, some common sense or simple trial and error to find out which library I'd best link in, too. In this case, it's adding a simple -lXext to the LDFLAGS part.

Thus, I come up with the following diff to dwm's config.mk:

--- a/config.mk
+++ b/config.mk
@@ -16,7 +16,7 @@ XINERAMAFLAGS = -DXINERAMA

 # includes and libs
 INCS = -I. -I/usr/include -I${X11INC}
-LIBS = -L/usr/lib -lc -L${X11LIB} -lX11 ${XINERAMALIBS}
+LIBS = -L/usr/lib -L${X11LIB} -static -lX11 ${XINERAMALIBS} -lxcb -lXau -lXext -lXdmcp -lpthread -ldl

 # flags
 CPPFLAGS = -DVERSION=\"${VERSION}\" ${XINERAMAFLAGS}

There's one important point here: libX11 will (to me, it seems, inevitably) load another library, not sure why or which one. Thus, it is vitally important to statically link in libdl, the library that dynamically loads another library. Otherwise, the following error messages appear:

/usr/lib/libX11.a(CrGlCur.o): In function `open_library':
(.text+0x3b): undefined reference to `dlopen'
/usr/lib/libX11.a(CrGlCur.o): In function `fetch_symbol':
(.text+0x6b): undefined reference to `dlsym'
/usr/lib/libX11.a(CrGlCur.o): In function `fetch_symbol':
(.text+0x88): undefined reference to `dlsym'

With the above modification to config.mk, dwm will compile and link just fine:

$ file dwm
dwm: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV),
statically linked, for GNU/Linux 2.6.18, not stripped

You can reduce the binary's size by a few hundred kilobytes by manually calling strip(1).

The binary works very well for me. I'll try to use it on different systems over the next few weeks and see what happens. If that works out well, I'll also try to get lucky with awesome and zathura, as these (and the libraries needed) are not installed on many systems, either.

posted 2011-08-05 tagged dwm, linux and static-linking

Tuning old hardware with slow hard drive

My main work machine is a pretty old X41 with a 40GB hard disk and 512MB of RAM. It is more than five years old and is not without problems. (In recent months, I have had to try several times before switching it on successfully – in most cases, it just beeps twice and displays "Keyboard error, <F1> to configure" and the keyboard doesn't work.)

However, there's a thing which annoys me a lot: bad performance. I use a resource-friendly window manager with some urxvts running. Apart from the memory-hog Firefox, I very seldom use any graphical application (i.e. any program using the GTK or Qt libraries).

For some weeks now I've been trying this cgroups hack, with mixed results. In some cases, the performance is better, sometimes it's not.

How bad could the overall performance be, then? – Unfortunately, very bad. Which has, in part, to do with my slow hard disk. It does uncached reading with 18MB/s in theory:

$ sudo hdparm -t /dev/sda
/dev/sda:
 Timing buffered disk reads:   56 MB in  3.08 seconds =  18.21 MB/sec

In reality, it's rather some 16.5MB/s:

$ dd if=/dev/zero of=./zero bs=1048576 count=256
256+0 records in
256+0 records out
268435456 bytes (268 MB) copied, 10.0246 s, 26.8 MB/s

$ time cat zero > /dev/null
cat zero > /dev/null  0.01s user 0.25s system 1% cpu 16.246 total

Now, I have to live with this (SSDs are still very expensive!). The main problem here is that the kernel swaps out data that wasn't accessed for a while (although, from a naïve perspective, there's no ultimate need to do so since there's still free memory left).

I actually notice that with two programs regularly:

Now I always thought this was the Linux Kernel being stupid. However I discovered a switch today. From the sysctl.vm documentation:

swappiness

This control is used to define how aggressive the kernel will swap
memory pages.  Higher values will increase agressiveness, lower values
decrease the amount of swap.

The default value is 60.

Debian (like all other distros) seems to keep this default value. After reading up on some articles I set vm.swappiness=0 in /etc/sysctl.conf. (You can also do this interactively with sysctl -w vm.swappiness=0. Interestingly, Ubuntu recommends a value of 10 for desktop systems.)

For the past day or so, I have been monitoring the output of vmstat 1 every now and then (especially the swap in/out parameters si and so). But even after the first hour one thing is evident: the interactive system performance is much, much better. It feels like a machine upgrade.

Terminals open instantly (because the initialization parts of their binary don't get swapped out, for example). Switching to Firefox is instant. Switching tabs is fast. The system feels a lot more responsive.

Where's the drawback, then? If you could magically tune your system's performance, why wouldn't you want to do that?

A case where this setup will give you a headache is when you actually do run out of memory. I easily accomplished that by opening Gimp on a huge (blank) file. Working with Gimp is easy now; switching to Firefox takes ages (heavy swapping). So a not-so-aggressive swapping policy would be better if you switch between several memory-hogging applications a lot.

(Side note: When there's a lot of free memory left – for example after closing Gimp – the kernel step by step swaps in certain blocks again, a few every second so as to not disturb system performance. I saw this going on for several minutes on an otherwise completely idle system.)

Conclusion: For the usage pattern I'm accustomed to, setting vm.swappiness=0 actually is a huge performance improvement. But your mileage may vary.

posted 2011-01-05 tagged x41, linux and performance