[Beowulf] delayed savings time crashes
David Kewley kewley at gps.caltech.edu
Wed Apr 12 10:42:31 PDT 2006
David,

The reboots were due to a City of Pasadena power glitch at 9:17 that morning. :)

It was raining, and a 34kV city feeder line that runs between the generating plant at the entrance of the 110 and a substation at Del Mar & Los Robles faulted. The responsible breaker took 13 cycles to break, during which time the single-phase voltage seen at Caltech dropped to about 75V. This info comes from the responsible EE at Caltech.

As for its effects, believe me, I know about it the hard way, as it took down 2/3 of our compute nodes, 1/3 of our disk shelves, and 3/4 of our fileservers. Our UPS has been on bypass for the past 6+ months while we wait for our UPS vendor to install a fix so that the UPS can handle the tendency of our computer power supplies' internal Power Factor Correction feedback circuitry to lock up & induce massive 12Hz oscillations on the room's power lines.

As for the time glitch, that is probably caused by the fact that Daylight Saving Time changes only take place on the "system" clock, and in a standard Red Hat system those changes only get synced to the hardware clock upon a clean shutdown. So if your machine crashes after a DST change, then upon bootup syslogd gets its time from the hardware clock, which is still on the old time. The system clock is only corrected later in the bootup sequence, when ntpd starts.

The best solution is probably to set the hardware clock to UTC rather than local time. UTC doesn't undergo step changes like most timezones in the U.S. do, so the compensation for DST happens dynamically in software, rather than requiring a hardware clock change.

David

On Wednesday 12 April 2006 09:05, David Mathog wrote:
> This is an odd one. I just realized that 9 of 20 nodes
> rebooted on Apr 4. (Since they all rebooted successfully everything
> was working and there was no reason to think that this had
> taken place.) This appears to be related to the daylight
> saving time change two days before. The reason I think that is
> that the nodes that rebooted have /var/log/messages files like:
>
> Apr 4 08:01:00 nodename CROND ... /cron/hourly
> Apr 4 09:01:00 nodename CROND ... /cron/hourly
> Apr 4 08:24:33 nodename syslogd 1.4.1; restart
>
> Notice the time shift backwards between the last normal
> record and the first reboot record.
>
> As if it finally caught on that the clock had changed and that
> somehow triggered a reboot. Unfortunately none of the log files
> contain a message that indicated exactly what it was that ordered
> the reboot.
>
> Unclear to me what piece of software could have triggered this.
> Presumably something that had its own clock stuck one hour off
> on the previous time standard and also has the ability to restart
> the system. ntpd? Ganglia? They were both running.
>
> Regards,
>
> David Mathog
> mathog at caltech.edu
> Manager, Sequence Analysis Facility, Biology Division, Caltech
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
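
To make the hardware-clock explanation concrete, here is a minimal Python sketch of the two cases. It is only an illustration under stated assumptions, not what Red Hat's init scripts actually run: the fixed PST/PDT offsets, the helper names, and the use of the 9:17 power glitch as the crash time are all choices made for this example.

```python
from datetime import datetime, timedelta, timezone

# Fixed-offset zones (assumption: no tz database needed for this sketch).
PST = timezone(timedelta(hours=-8), "PST")
PDT = timezone(timedelta(hours=-7), "PDT")

# True time of the crash: the power glitch at 09:17 local (PDT) on 4 April 2006.
crash = datetime(2006, 4, 4, 9, 17, tzinfo=PDT)


def rtc_in_stale_localtime(true_time):
    """Hypothetical hardware clock kept in local time and never rewritten
    after the DST switch: it still ticks in PST wall time."""
    return true_time.astimezone(PST).replace(tzinfo=None)


def rtc_in_utc(true_time):
    """Hypothetical hardware clock kept in UTC: nothing to rewrite at a DST change."""
    return true_time.astimezone(timezone.utc).replace(tzinfo=None)


def local_time_from_utc_rtc(rtc_value, local_zone):
    """At boot, software applies the current offset to the UTC reading."""
    return rtc_value.replace(tzinfo=timezone.utc).astimezone(local_zone)


# Case 1: local-time hardware clock, crash before any clean shutdown.
stale_rtc = rtc_in_stale_localtime(crash)

# Case 2: UTC hardware clock, same crash.
recovered = local_time_from_utc_rtc(rtc_in_utc(crash), PDT)

print("local-time RTC read at boot:", stale_rtc)
print("UTC RTC converted at boot:  ", recovered)
```

In the first case the clock read at boot is one hour behind (08:17 rather than 09:17), which is consistent with the 08:24:33 syslogd restart line appearing after a 09:01:00 entry. In the second case the boot-time reading already agrees with the true local time.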
