Today I played around with POSIX threads a little. For an assignment, we have to implement a very, very simple webserver that does asynchronous I/O. Since it should perform well, I thought I'd not only serialize the I/O, but also parallelize it across a pool of worker threads.
So there's a boss thread that just accepts new inbound connections and appends the fds to a queue:
clientfd = accept(sockfd, (struct sockaddr *) &client, &client_len);
if (clientfd == -1)
    error("accept");
new_request(clientfd);
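For context, the accept call assumes the usual listening-socket setup done earlier on; roughly like this (reconstructed, not copied from webserver.c, with port 8080 to match the ab invocation below and an arbitrary backlog of 128):
/* needs <sys/socket.h>, <netinet/in.h>, <string.h> */
int sockfd = socket(AF_INET, SOCK_STREAM, 0);
if (sockfd == -1)
    error("socket");

struct sockaddr_in server;
memset(&server, 0, sizeof(server));
server.sin_family = AF_INET;
server.sin_addr.s_addr = htonl(INADDR_ANY);
server.sin_port = htons(8080);    /* assumed port, matching the ab call below */

if (bind(sockfd, (struct sockaddr *) &server, sizeof(server)) == -1)
    error("bind");
if (listen(sockfd, 128) == -1)    /* backlog of 128 is an assumption */
    error("listen");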
The new_request function in turn appends it to a queue (of size TODOS = 64) and signals cond_new for any workers that might be waiting:
pthread_mutex_lock(&mutex);
while ((todo_end + 1) % TODOS == todo_begin) {
    fprintf(stderr, "[master] Queue is completely filled; waiting\n");
    pthread_cond_wait(&cond_ready, &mutex);
}
fprintf(stderr, "[master] adding socket %d at position %d (begin=%d)\n",
        clientfd, todo_end, todo_begin);
todo[todo_end] = clientfd;
todo_end = (todo_end + 1) % TODOS;
pthread_cond_signal(&cond_new);
pthread_mutex_unlock(&mutex);
The workers (there are 8 of them) signal cond_ready, wait until cond_new is signalled if the queue is empty, and then take the first client fd off the queue. After that, a simple function involving some reads and writes handles the communication on that fd.
pthread_mutex_lock(&mutex);
pthread_cond_signal(&cond_ready);
while (todo_end == todo_begin)
    pthread_cond_wait(&cond_new, &mutex);
clientfd = todo[todo_begin];
todo_begin = (todo_begin + 1) % TODOS;
pthread_mutex_unlock(&mutex);
// handle communication on clientfd
(Full source is here: webserver.c.)
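The handler itself isn't shown above; as a rough sketch (simplified for this post, not the actual code from webserver.c), it reads the request and writes back a canned response before closing the fd:
/* hypothetical handler sketch; needs <unistd.h>, <string.h> */
static void handle_request(int clientfd)
{
    char buf[4096];

    /* read (part of) the HTTP request; a real server would parse it */
    ssize_t n = read(clientfd, buf, sizeof(buf));
    if (n > 0) {
        const char *resp =
            "HTTP/1.0 200 OK\r\n"
            "Content-Type: text/html\r\n"
            "Content-Length: 6\r\n"
            "\r\n"
            "hello\n";
        write(clientfd, resp, strlen(resp));
    }
    close(clientfd);
}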
Now this works pretty well and is fairly easy. I'm not very experienced with threads, though, and I run into problems when I throw massive numbers of parallel requests at it.
If I run ab, the Apache benchmark tool, against the webserver with 10,000 requests at a concurrency of 1,000, it gets up to 9,000-something requests and then locks up.
$ ab -n 10000 -c 1000 http://localhost:8080/index.html
...
Completed 8000 requests
Completed 9000 requests
apr_poll: The timeout specified has expired (70007)
Total of 9808 requests completed
The webserver is blocked; its last line of output reads like this:
[master] Queue is completely filled; waiting
If I attach strace while it's in this blocked state, I get this:
$ strace -fp `pidof ./webserver`
Process 21090 attached with 9 threads - interrupt to quit
[pid 21099] recvfrom(32, <unfinished ...>
[pid 21098] recvfrom(23, <unfinished ...>
[pid 21097] recvfrom(31, <unfinished ...>
[pid 21095] recvfrom(35, <unfinished ...>
[pid 21094] recvfrom(34, <unfinished ...>
[pid 21093] recvfrom(33, <unfinished ...>
[pid 21092] recvfrom(26, <unfinished ...>
[pid 21091] recvfrom(24, <unfinished ...>
[pid 21090] futex(0x6024e4, FUTEX_WAIT_PRIVATE, 55883, NULL
So the children seem to be stuck in unfinished recvfrom calls, while the master thread waits for one of them to work through the queue. (With a queue size of 1024 and 200 workers, I couldn't reproduce the situation.)
How can one counteract this? Specify a timeout? Spawn workers on demand? Set the listen() backlog argument to a low value? Or is it all Apache Benchmark's fault? *confused*
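If it's the timeout route, the simplest thing I can think of is a receive timeout on each accepted socket, so a worker can't hang in recv forever on a client that never sends anything. A sketch of that idea (untested; the 5 seconds are an arbitrary pick):
/* sketch only: set a receive timeout on the accepted fd (e.g. right after
 * accept), so a blocking read()/recv() fails with EAGAIN/EWOULDBLOCK
 * instead of hanging forever; needs <sys/socket.h>, <sys/time.h> */
struct timeval tv = { .tv_sec = 5, .tv_usec = 0 };
if (setsockopt(clientfd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) == -1)
    error("setsockopt");
/* the worker would then treat a timed-out read as a dead client:
 * close(clientfd) and go back to waiting on the queue */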