Julius Plenz - Blog - An on demand Debugging Technique for long-running Processes

An on demand Debugging Technique for long-running Processes

Debbuging long-running processes or server software is usually an “either–or”: Either you activate debugging and have huge files that you rarely if ever look at, and they take up a considerable amount of disk space – or you did not activate the debugging mode and thus cannot get to the debugging output to figure out what the program is doing right now.

There is a really nice quick and dirty Non-invasive printf debugging technique that just does a printf on a non-existent file descriptor, so that you can view the messages by strace-ing the process and grepping for EBADF.

I want to share here a few Perl code snippets for an approach that is a little neater IMO, yet a little bit more invasive. Consider a simple “server” doing some work, occasionally printing out a debug statement:

#!/usr/bin/perl

use strict;
use warnings;

sub Debug { }; # empty for now

while(1) {
    Debug("Here I am!");
    select undef, undef, undef, 0.1;
}

The idea is now to on demand create a UNIX domain socket where the process can write debug information to, so that (possibly a few) other processes can read and print out the debug info received on that socket.

We introduce a “global” structure $DEBUG, and a function to initialize and destroy the socket, which is named debug-<pid-of-process> and placed in /tmp.

my $DEBUG = {
    socket => undef,
    conn => [],
    last_check => 0,
};

sub Debug_Init {
    use IO::Socket;
    use Readonly;
    my Readonly $SOCKET = "/tmp/debug-$$";

    return if $DEBUG->{socket};

    unlink $SOCKET;
    my $s = IO::Socket::UNIX->new(
        Type => IO::Socket::SOCK_STREAM,
        Local => $SOCKET,
        Listen => 1,
    ) or die $!;
    $s->blocking(0);
    $DEBUG->{socket} = $s;
}

sub Debug_Cleanup {
    return unless $DEBUG->{socket};
    my $path = $DEBUG->{socket}->hostpath;
    undef $DEBUG->{socket};
    unlink $path;
}

When the process receives a SIGUSR1, we call Debug_Init, and to be sure we’ll clean up the socket in case of normal exit:

$SIG{USR1} = \&Debug_Init;
END { Debug_Cleanup; }

The socket is in non-blocking mode, so trying to accept() new connections will not block. Now, whenever we want to print out a debugging statement, we check if anyone has requested the debugging socket via SIGUSR1. After the first connection is accepted, we’ll only check once every second for new connections. For every accepted connection, we send the debugging message to that peer. (Note that UNIX domain sockets with Datagram type sadly do not support broadcast messaging – otherwise this would probably be easier.)

In case sending the message fails (probably because the peer disconnected), we’ll remove that connection from the list. If the last connection goes, we’ll unlink the socket.

sub Debug {
    return unless $DEBUG->{socket};
    my $s = $DEBUG->{socket};
    my $conn = $DEBUG->{conn};
    my $msg = shift or return;
    $msg .= "\n" unless $msg =~ /\n$/;

    if(time > $DEBUG->{last_check}) {
        while(my $c = $s->accept) {
            $c->shutdown(IO::Socket::SHUT_RD);
            push @$conn => $c;
        }
        $DEBUG->{last_check} = time if @$conn;
    }
    return unless @$conn;

    for(@$conn) {
        $_->send($msg, IO::Socket::MSG_NOSIGNAL) or undef $_;
    }
    @$conn = grep { defined } @$conn;

    unless(@$conn) {
        Debug_Cleanup();
    }
}

Here’s a simple script to display the debugging info for a given PID, assuming it uses the setup described above:

#!/usr/bin/perl

use strict;
use warnings;
use IO::Socket;

my $pid = shift;
if(not defined $pid) {
    print "usage: $0 <pid>\n";
    exit(1);
}

kill USR1 => $pid or die $!;

my $path = "/tmp/debug-$pid";
select undef, undef, undef, 0.01 until -e $path;

my $s = IO::Socket::UNIX->new(
    Type => IO::Socket::SOCK_STREAM,
    Peer => $path,
) or die $!;
$s->shutdown(IO::Socket::SHUT_WR);

$| = 1;
while($s->recv(my $m, 4096)) {
    print $m;
}

We can now start the server; no debugging happens. But as soon as we send a SIGUSR1 and attach to the (now present) debug socket, we can see the debug information:

$ perl server & ; sleep 10
[1] 19731

$ perl debug-process 19731
Here I am!
Here I am!
Here I am!
^C

When we hit Ctrl-C, the debug socket vanishes again.

In my opinion this is a really neat way to have a debugging infrastructure in place “just in case”.

posted 2013-07-13 ∴ tagged linux and perl

Julius Plenz – Blog

An on demand Debugging Technique for long-running Processes

Tagcloud