An Interesting pid File Race

ISC's dhcpd uses this code to check for an already-running daemon:

/* Read previous pid file. */
if ((i = open (path_dhcpd_pid, O_RDONLY)) >= 0) {
    status = read (i, pbuf, (sizeof pbuf) - 1);
    close (i);
    if (status > 0) {
        pbuf [status] = 0;
        pid = atoi (pbuf);

        /* If the previous server process is not still running,
           write a new pid file immediately. */
        if (pid && (pid == getpid() || kill (pid, 0) < 0)) {
            unlink (path_dhcpd_pid);
            if ((i = open (path_dhcpd_pid,
                           O_WRONLY | O_CREAT, 0644)) >= 0) {
                sprintf (pbuf, "%d\n", (int)getpid ());
                write (i, pbuf, strlen (pbuf));
                close (i);
                pidfilewritten = 1;
            }
        } else
            log_fatal ("There's already a DHCP server running.");
    }
}

The problem with this strategy is that if the box dies, a stale pid file is left behind in /var/run/dhcpd.pid. That alone wouldn't be so bad -- the code above uses kill(pid, 0) to check whether a process with that pid is still running. But kill(pid, 0) only tells you that some process has that pid, not that the process is dhcpd. When the box restarts, the same set of daemons starts in roughly the same order, so they land on nearly the same pids every time. On one boot, you might see dhcpd with a pid of 1001 and ntpd with a pid of 1002. If the box dies violently (e.g. a power cut), the dhcpd pid file still contains 1001. On the next boot, suppose ntpd starts first and gets pid 1001, and dhcpd gets 1002. Now kill(1001, 0) succeeds, making it appear that dhcpd is already running, and dhcpd exits.
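
To make that concrete, here is a minimal sketch of what a kill(pid, 0) probe can and cannot tell you. The helper name pid_in_use is mine, not dhcpd's, and the EPERM handling is my addition; the dhcpd code above treats any kill() failure as "not running".

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if some process currently has this pid,
   0 otherwise. It says nothing about which program that process is. */
static int pid_in_use(pid_t pid)
{
    /* Signal 0 delivers nothing; kill() only performs its checks. */
    if (kill(pid, 0) == 0)
        return 1;

    /* EPERM means the process exists but we are not allowed to signal it. */
    return errno == EPERM;
}
```

Fed the stale 1001 from the example above, a check like this reports the pid as in use on the second boot, because ntpd now owns it.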

How to fix this?

  1. Explicitly put the pid file under /tmp. Getting this right is fussy -- make sure you avoid the race conditions associated with creating temp files. Use dhcpd's "-pf" flag to tell it where to put its pid file. This avoids spurious "already running" messages, because dhcpd never finds an old pid file to read a pid from. [You could also just remove /var/run/dhcpd.pid before starting dhcpd, but I'd rather explicitly provide the path in my startup script in case some dim bulb decides to change the compiled-in default.]
  2. Be careful in your restart code to kill any existing dhcpd (assuming you really want a new dhcpd), or avoid trying to start a new one (assuming you want to use an already running dhcpd). pgrep(1) and pkill(1) will be useful here.
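
For option 1, one way to sidestep the temp-file races is to create a private directory with mkdtemp(3) and put the pid file inside it. This is only a sketch: the function name, the /tmp/dhcpd.XXXXXX template, and the dhcpd.pid file name are my own choices for illustration, not anything dhcpd mandates.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: create a fresh, private directory under /tmp and
   store the path of a pid file inside it in `out`.
   Returns 0 on success, -1 on failure. */
static int make_private_pidfile_path(char *out, size_t outlen)
{
    static const char suffix[] = "/dhcpd.pid";  /* file name is an assumption */
    char dir[] = "/tmp/dhcpd.XXXXXX";           /* template is an assumption */

    /* Check the buffer size first so we never create a directory
       that we then cannot report back to the caller. */
    if (strlen(dir) + strlen(suffix) + 1 > outlen)
        return -1;

    /* mkdtemp() creates a uniquely named directory with mode 0700 in
       one step, so nobody else can create it first or reach inside it. */
    if (mkdtemp(dir) == NULL)
        return -1;

    snprintf(out, outlen, "%s%s", dir, suffix);
    return 0;
}
```

The resulting path would then be handed to dhcpd with -pf. Since the directory is brand new, there is no old pid file in it for dhcpd to trip over.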

In researching this, I came across this bit of wisdom from Henning Brauer: "pid files are useless."

I heartily agree...

Posted on 2010-05-26 by brian in linux.