May 26, 2010
An Interesting pid File Race
ISC’s dhcpd uses this code to check for an already-running daemon:
    /* Read previous pid file. */
    if ((i = open (path_dhcpd_pid, O_RDONLY)) >= 0) {
        status = read (i, pbuf, (sizeof pbuf) - 1);
        close (i);
        if (status > 0) {
            pbuf [status] = 0;
            pid = atoi (pbuf);

            /* If the previous server process is not still running,
               write a new pid file immediately. */
            if (pid && (pid == getpid() || kill (pid, 0) < 0)) {
                unlink (path_dhcpd_pid);
                if ((i = open (path_dhcpd_pid,
                               O_WRONLY | O_CREAT, 0644)) >= 0) {
                    sprintf (pbuf, "%d\n", (int)getpid ());
                    write (i, pbuf, strlen (pbuf));
                    close (i);
                    pidfilewritten = 1;
                }
            } else
                log_fatal ("There's already a DHCP server running.");
        }
    }
The problem with this strategy is that, if the box dies, a stale pid file is left behind in /var/run/dhcpd.pid. That alone wouldn’t be so bad, since the code above checks [using kill(pid, 0)] whether there’s a process running with that pid. But kill(pid, 0) only tells you that some process has that pid, not that the process is dhcpd. And when the box is restarting, there will be a bunch of processes all starting in a similar sequence each time, so a pid recorded on the previous boot may well belong to something else now.
So on one boot, you might see dhcpd with a pid of 1001 and ntpd with a pid of 1002. If the box dies violently (e.g. power cut), the dhcpd pid file will still contain 1001. On the second boot, assume ntpd starts first and gets pid 1001, and dhcpd gets 1002. Now kill(1001, 0) succeeds, making it appear that dhcpd is already running, and dhcpd exits.
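To make the weakness concrete, here is a minimal sketch of the same liveness test pulled out into a helper. The name pid_in_use is mine, not dhcpd's, and the EPERM handling is an addition that the original check does not have. The point is that the answer is the same for any live process, whatever program it happens to be:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if some process currently has this pid,
   0 otherwise. It cannot tell which program that process is. */
int pid_in_use(pid_t pid)
{
    /* pid <= 0 addresses process groups in kill(), so reject it here. */
    if (pid <= 0)
        return 0;

    /* Signal 0 performs the existence and permission checks only;
       no signal is delivered. */
    if (kill(pid, 0) == 0)
        return 1;

    /* EPERM means the process exists but we may not signal it. */
    return errno == EPERM;
}
```

A stale pid file containing 1001 gets a "yes" from this check as soon as ntpd, or anything else, is running as pid 1001.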
How to fix this?
- Explicitly put the pid file under /tmp. Getting this right is fussy: make sure you avoid the race conditions associated with creating temp files. Use dhcpd’s “-pf” flag to tell it where to put the pid file. This avoids spurious “already running” messages, because a freshly created file has no old pid in it for dhcpd to read. [You could also just remove the /var/run/dhcpd.pid file, but I'd rather explicitly provide the path in my startup script in case some dim bulb decides to change the compiled-in default.]
- Be careful in your restart code to kill any existing dhcpd (assuming you really want a new dhcpd), or avoid trying to start a new one (assuming you want to use an already running dhcpd).
pgrep(1) and pkill(1) will be useful here.
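For the first fix, the fussy part is creating the temp file without a race. A rough sketch of how a startup wrapper might do it with mkstemp(3) follows. The function name make_fresh_pidfile and the /tmp/dhcpd.pid.XXXXXX pattern are my own inventions for illustration, not anything dhcpd ships with:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: creates a fresh, empty pid file under /tmp and
   copies its path into buf. Returns 0 on success, -1 on failure. */
int make_fresh_pidfile(char *buf, size_t buflen)
{
    /* Assumed naming pattern; mkstemp() fills in the X's. */
    static const char pattern[] = "/tmp/dhcpd.pid.XXXXXX";
    int fd;

    assert(buf != NULL);
    if (buflen < sizeof pattern)
        return -1;
    memcpy(buf, pattern, sizeof pattern);

    /* mkstemp() picks a unique name and creates the file exclusively,
       so a file or symlink planted in advance is never reused. */
    fd = mkstemp(buf);
    if (fd < 0)
        return -1;

    close(fd);
    return 0;
}
```

The wrapper would then start dhcpd with "-pf" followed by the path left in buf. Since the file is brand new and empty, the read() in the code above returns 0 and the stale-pid check never runs.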
In researching this, I saw this bit of wisdom from Henning Brauer: “pid files are useless.”
I heartily agree…