Debbuging long-running processes or server software is usually an “either–or”: Either you activate debugging and have huge files that you rarely if ever look at, and they take up a considerable amount of disk space – or you did not activate the debugging mode and thus cannot get to the debugging output to figure out what the program is doing right now.
There is a really nice quick and dirty Non-invasive printf
debugging technique that just does a printf
on a
non-existent file descriptor, so that you can view the messages by strace
-ing
the process and grepping for EBADF
.
I want to share here a few Perl code snippets for an approach that is a little neater IMO, yet a little bit more invasive. Consider a simple “server” doing some work, occasionally printing out a debug statement:
#!/usr/bin/perl
use strict;
use warnings;
sub Debug { }; # empty for now
while(1) {
Debug("Here I am!");
select undef, undef, undef, 0.1;
}
The idea is now to on demand create a UNIX domain socket where the process can write debug information to, so that (possibly a few) other processes can read and print out the debug info received on that socket.
We introduce a “global” structure $DEBUG
, and a function to initialize and
destroy the socket, which is named debug-<pid-of-process>
and placed in /tmp
.
my $DEBUG = {
socket => undef,
conn => [],
last_check => 0,
};
sub Debug_Init {
use IO::Socket;
use Readonly;
my Readonly $SOCKET = "/tmp/debug-$$";
return if $DEBUG->{socket};
unlink $SOCKET;
my $s = IO::Socket::UNIX->new(
Type => IO::Socket::SOCK_STREAM,
Local => $SOCKET,
Listen => 1,
) or die $!;
$s->blocking(0);
$DEBUG->{socket} = $s;
}
sub Debug_Cleanup {
return unless $DEBUG->{socket};
my $path = $DEBUG->{socket}->hostpath;
undef $DEBUG->{socket};
unlink $path;
}
When the process receives a SIGUSR1
, we call Debug_Init
, and to be sure
we’ll clean up the socket in case of normal exit:
$SIG{USR1} = \&Debug_Init;
END { Debug_Cleanup; }
The socket is in non-blocking mode, so trying to accept()
new connections
will not block. Now, whenever we want to print out a debugging
statement, we check if anyone has requested the debugging socket
via SIGUSR1
. After the first connection is accepted, we’ll only
check once every second for new connections. For every accepted
connection, we send the debugging message to that peer. (Note that
UNIX domain sockets with Datagram type sadly do not support broadcast
messaging – otherwise this would probably be easier.)
In case sending the message fails (probably because the peer disconnected), we’ll remove that connection from the list. If the last connection goes, we’ll unlink the socket.
sub Debug {
return unless $DEBUG->{socket};
my $s = $DEBUG->{socket};
my $conn = $DEBUG->{conn};
my $msg = shift or return;
$msg .= "\n" unless $msg =~ /\n$/;
if(time > $DEBUG->{last_check}) {
while(my $c = $s->accept) {
$c->shutdown(IO::Socket::SHUT_RD);
push @$conn => $c;
}
$DEBUG->{last_check} = time if @$conn;
}
return unless @$conn;
for(@$conn) {
$_->send($msg, IO::Socket::MSG_NOSIGNAL) or undef $_;
}
@$conn = grep { defined } @$conn;
unless(@$conn) {
Debug_Cleanup();
}
}
Here’s a simple script to display the debugging info for a given PID, assuming it uses the setup described above:
#!/usr/bin/perl
use strict;
use warnings;
use IO::Socket;
my $pid = shift;
if(not defined $pid) {
print "usage: $0 <pid>\n";
exit(1);
}
kill USR1 => $pid or die $!;
my $path = "/tmp/debug-$pid";
select undef, undef, undef, 0.01 until -e $path;
my $s = IO::Socket::UNIX->new(
Type => IO::Socket::SOCK_STREAM,
Peer => $path,
) or die $!;
$s->shutdown(IO::Socket::SHUT_WR);
$| = 1;
while($s->recv(my $m, 4096)) {
print $m;
}
We can now start the server; no debugging happens. But as soon as we
send a SIGUSR1
and attach to the (now present) debug socket, we can
see the debug information:
$ perl server & ; sleep 10
[1] 19731
$ perl debug-process 19731
Here I am!
Here I am!
Here I am!
^C
When we hit Ctrl-C, the debug socket vanishes again.
In my opinion this is a really neat way to have a debugging infrastructure in place “just in case”.
The standard book Perl Best Practices advises in chapter 4.5 that
one should use the Readonly
Perl module instead of the
constant
standard module for various reasons.
An example might look like this:
package Myprogram;
use Exporter;
use Readonly;
our @EXPORT = qw(conffile);
Readonly our $BASEPATH => "$ENV{HOME}/.myprogram";
sub conffile { "$BASEPATH/config.ini" }
If you want to unit test your program now, you cannot just mess around
and replace a potentially existing config file with a bogous
one. You have to create a temporary directory and use that as the base
path. That is, you have to modify your Readonly
declared variable.
I’ve not seen this documented, so I guess this might help others:
Internally the method Readonly::Scalar::STORE
is called when you do
an assignment (see man perltie
for details). In Readonly.pm
, this is
redefined to
*STORE = *UNTIE = sub {Readonly::croak $Readonly::MODIFY};
which dies with an error message. So you only have to circumvent this.
The method STORE
gets a reference of the location as first argument,
and the value as second argument. So a quick-and-dirty workaround is
just setting
*Readonly::Scalar::STORE = sub { ${$_[0]} = $_[1]; };
prior to assigning to the Readonly variable. If you want to do it properly, you should only change this locally in the block where you re-assign the value, so that subsequent attempts will again produce the usual error message. Such a test might look like this:
use strict;
use warnings;
use Test::More 'no_plan';
use Myprogram;
{
no warnings 'redefine';
local *Readonly::Scalar::STORE = sub { ${$_[0]} = $_[1]; };
$Myprogram::BASEPATH = "/tmp";
}
is(Myprogram::conffile, "/tmp/config.ini", "get config filename");
For non-scalar values, this will probably work similar. (Read the source if in doubt.)
Verschiedene Texte, die ich heute las und für sehr lesenswert halte:
Der Artikel Global Warming's Terrifying New Math von Bill McKibben verknüpft neue Zahlen zum Thema Klimawandel mit der Einsicht, dass wir es aller Voraussicht nach nicht schaffen werden, das Problem unter Kontrolle zu bekommen – es ist einfach zu lukrativ, jetzt weiter Öl zu verbrennen:
The Third Number: 2,795 Gigatons. This number is the scariest of all – one that, for the first time, meshes the political and scientific dimensions of our dilemma. ... The number describes the amount of carbon already contained in the proven coal and oil and gas reserves of the fossil-fuel companies, and the countries (think Venezuela or Kuwait) that act like fossil-fuel companies. In short, it's the fossil fuel we're currently planning to burn. And the key point is that this new number – 2,795 – is higher than 565 [the maximum number of gigatons the climate can abosrb without rising beyond 2 degress celsius]. Five times higher.
The top 10 tricks of Perl one-liners.
awk
braucht man schließlich nicht wirklich.
Kategorientheorie in Scala und in Haskell.
There's a Perl module threads::shared
. However, it won't show up as a
completion match to the perldoc
command. Why is that?, a colleague
asked me. Now, that obviously is a quirk, so let's debug it.
We start at /usr/share/zsh/functions/Completion/Unix/_perldoc
, where
we find this:
_alternative \
'modules:module: _perl_modules -tP' \
'pods:base pod: _perl_basepods' \
'files:module or .pod file:_files -g "*.(pod|pm)(-.)"' &&
ret=0
So, go on to _perl_modules
: It searches for paths where Perl would
store modules and their documentation (line 85):
inc=( $( $perl -e 'print "@INC"' ) )
Then, they do some complicated globbing stuff on the directories (line
104), where a loop iterates over $inc
and stores each element in
$libdir
:
new_pms=( $libdir/{[A-Z]*/***/,}*${~sufpat}~*blib* )
There's the catch: This globbing expression will only recurse into
subdirectories that start with an uppercase letter. Lowercase
modules on the highest level are okay, though. That's why threads
appears (from threads.pm
), but threads::shared
doesn't (from
threads/shared.pm
).
The search for all modules that are missed this way will translate to the following Z-Shell command:
$ print -l ${^=$(perl -e 'print "@INC"')}/[a-z]*/***/*.pm(.N)
/usr/lib/perl/5.10/threads/shared.pm
/usr/share/perl/5.10/encoding/warnings.pm
/usr/share/perl/5.10/warnings/register.pm
With it's 5.10 version, Perl learned some pretty interesting new features. Some of them modify the syntactic interpretation of Perl code, so to stay backwards-compatible, you have to manually enable them. To do this, add a line like
use feature qw(:5.10);
to your other use
declarations. (For one-liners, add the -E
command line switch.) The documentation is available via perldoc
feature
or online.
There's one pretty trivial change that nonetheless has bothered me for
a long time already. Why is there no function in Perl to print out
some strings, and then finish with a newline? I mean, almost every
sane programming language has this, and it helps writing clean code a
lot (no fiddling with \n
and so on). – Now, they backported
the say
function from Perl 6, which does just that: add an omplicit
newline.
The new concept of state variables is nothing I need for now; however,
the feature they strangely call switch
(although the actual keywords
are given
/when
) is pretty nice: It's pretty much like what you'd
expect with a typical C construct like this:
switch(var) {
case 1:
/* do something */
break;
...
default:
/* do default thing */
break;
}
In Perl that's now:
given($var) {
when(condition1) {
# do something
}
do_other_thing when condition2; # as one-liner
default {
# fallthrough case
}
}
Update: The statement when condition;
one-liner doesn't work.
Don't know where I got that from.
You can use simple strings (maps to $_ eq "string"
), regular
expressions (maps to $_ =~ /regex/
) or function references like
\&myfunc
(maps to myfunc($_)
).
The best improvement, however, are so-called named captures.
(You don't have to enable those specifically.)
It allows regexes to contain pairs of parentheses that you can assign
a name to later extract the match. That means: no stupid and
mind-boggling re-numbering of $2
to $3
etc., just because you
added a set of brackets around something you want to extract from a
string. Consider this example regex:
my $re = qr/
(?<user>[^\@\s]+)
\@
(?<domain>[^\s">]+)
/x;
You can now access the user-part of the e-mail address via $+{user}
,
not necessarily via $1
. The immediate gain is obvious: if you add
parentheses around the whole expression (to capture the whole e-mail
address) the user part is still available via $+{user}
. The numbered
variable would be $2
now, though.
Another side-effect is that you can test several regexes in a row now, and
later extract matching parts from all of these – of course only
if you use distinct capture names. Hash entries in %+
will only be
updated upon a match (and won't be cleared AFAIK, at least in the
lexical scope).