[Beowulf] RE: S2466 systems won't reboot after linux poweroff
David Mathog
mathog at mendel.bio.caltech.edu
Wed Dec 15 10:01:11 PST 2004
> Problem: each node in a 20 node beowulf typically will
> not reboot following a linux poweroff command. Power comes
> back on, but it never even shows the BIOS screens.
>
> Hardware:
>
> S2466 MPX mobo
Two of these nodes are flakey and aren't in the compute pool.
These were both upgraded to BIOS v4.06. This DID resolve
the problem with a "poweroff" followed by "turning power switch
on" not rebooting. In other words, they now boot as they should
following a poweroff/power switch on cycle. The oddball
message cited in the first post, which comes out the serial line
at the end of "poweroff", still appears.
Tests:
"poweroff" followed by "power switch on": worked 5/5 times
"reboot": worked 5/5 times
However, the new BIOS didn't make these two nodes any more
stable - they still crash at about the same rate.
Conclusion: it might be worth the effort to upgrade the BIOS
if your cluster is down for some reason anyway.
WARNING1: All my nodes seemed to "forget" how to read floppy disks.
If the nodes had been up for a while and then were rebooted,
and a known good floppy placed in the drive,
they would NOT boot from it. If, however, while the node was up,
the same floppy was put into the drive and explicitly mounted,
listed, and unmounted a couple of times, THEN on the subsequent
reboot the system could read from the floppy. I've never seen
this on any other system (Tyans are just full of surprises :-( ).
Subsequent to the v4.06 upgrade these nodes seem to recognize
the floppy better and so far have not had any problems
rebooting directly from a floppy without the kludge
described above. However, if you are at BIOS v4.03
(which is what they were at, not v4.01 as I had previously
posted) you may have the same problems booting from floppy
in order to do the BIOS upgrades. So either flash from the
net (I have no idea how) or verify that your floppy drives work
before rebooting the nodes to be upgraded.
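The mount/list/unmount kludge described above can be sketched as a small shell function. This is only an illustration, not the exact commands used on these nodes: the device /dev/fd0, the mount point /mnt/floppy, the default of two passes, and the names run, exercise_floppy, and DRY_RUN are all assumptions. It needs root to actually mount anything.

```shell
#!/bin/sh
# Sketch of the floppy "warm-up" workaround: mount, list, and unmount
# the floppy a couple of times while the node is still up.
# ASSUMPTIONS (not from the original post): the floppy is /dev/fd0 and
# /mnt/floppy exists as a mount point. Override via the environment.
FLOPPY_DEV="${FLOPPY_DEV:-/dev/fd0}"
FLOPPY_MNT="${FLOPPY_MNT:-/mnt/floppy}"

# Run a command, or just print it when DRY_RUN=1 (hypothetical helper).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Mount, list, and unmount the floppy N times (default 2).
# Returns non-zero as soon as any step fails, which also makes it a
# crude check that the drive can read the disk at all.
exercise_floppy() {
    passes="${1:-2}"
    i=1
    while [ "$i" -le "$passes" ]; do
        run mount "$FLOPPY_DEV" "$FLOPPY_MNT" || return 1
        run ls "$FLOPPY_MNT" || { run umount "$FLOPPY_MNT"; return 1; }
        run umount "$FLOPPY_MNT" || return 1
        i=$((i + 1))
    done
}
```

After sourcing this, setting DRY_RUN=1 and calling "exercise_floppy 2" only prints the commands it would run. Without DRY_RUN, a non-zero return suggests the drive should be sorted out before rebooting the node for the upgrade.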
WARNING2: Updating with:
>phlash16 244v406.rom
left the BIOS settings as they were. But:
>flash
which ran flash.bat, WIPED the BIOS settings.
WARNING3: These BIOS settings seem to be equivalent:
             v4.03     v4.06
quickboot    enabled   disabled
diagnostic   disabled  disabled
summary      disabled  disabled
If quickboot is enabled in v4.06 it appears to skip the
BIOS memory test entirely. It boots MUCH faster, but you
may have a hard time pressing F2 in time to get back
into the BIOS setup.
Regards,
David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech