[Beowulf] Anyone having IPMI problems on Intel S3200 series
Perry E. Metzger
perry at piermont.com
Wed Apr 15 13:51:57 PDT 2009
In our brand new cluster, we're using Intel S3210SH motherboards.
The boards are going to be managed by a purely hands-off system I've
built. IPMI is used for tasks like monitoring and telling the boards to
PXE boot so they can be re-installed by a purely automated system when
software upgrades happen.
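As a rough illustration of the PXE step described above (a sketch under stated assumptions, not the actual tooling used on this cluster): it assumes ipmitool is installed on the management host and that the BMC accepts IPMI-over-LAN sessions. The host name, user name, password file path, the default interface, and the helper names `ipmi_base`, `pxe_reinstall_commands` and `run_all` are all hypothetical.

```python
import subprocess

def ipmi_base(host, user, password_file, interface="lanplus"):
    """Build the common ipmitool prefix for talking to a remote BMC.

    The interface is an assumption; use "lan" for BMCs that only
    speak IPMI 1.5. The password is read from a file via -f.
    """
    return ["ipmitool", "-I", interface, "-H", host,
            "-U", user, "-f", password_file]

def pxe_reinstall_commands(host, user, password_file, interface="lanplus"):
    """Return the two commands that force a network boot on next start."""
    base = ipmi_base(host, user, password_file, interface)
    return [
        base + ["chassis", "bootdev", "pxe"],   # next boot goes to PXE
        base + ["chassis", "power", "cycle"],   # restart so it takes effect
    ]

def run_all(commands, runner=subprocess.run):
    """Run each command in order; with the default runner, raise on failure."""
    for cmd in commands:
        runner(cmd, check=True)

if __name__ == "__main__":
    # Hypothetical node name and credentials file.
    for cmd in pxe_reinstall_commands("node01-bmc", "admin", "/etc/ipmi.pw"):
        print(" ".join(cmd))
```

Building the argument lists separately from running them makes it easy to log or dry-run what would be sent to each board before letting the automation loose.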
Unfortunately, every once in a while, the IPMI BMCs on my test systems
simply stop talking to the network. This isn't overly tragic since I can
have a process go over to such a board when it detects that pings have
stopped working and use a local IPMI command to cold reset the BMC, but
it is still really Not The Right Thing. Also, I suspect every once in a
great while I'll get a simultaneous OS and IPMI BMC failure and shoe
leather will be needed to reset the box, which I don't like.
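The workaround described above could be sketched along these lines. This is a minimal illustration under assumptions, not the real monitoring process: it assumes Linux-style `ping` options, passwordless ssh to the node's OS, and ipmitool with a working in-band driver on the node. The function names and the addresses are hypothetical.

```python
import subprocess

def bmc_responds(bmc_addr, runner=subprocess.run):
    """Return True if a single ping to the BMC address succeeds."""
    # -c 1: one probe; -W 2: wait up to 2 seconds (Linux iputils syntax).
    result = runner(["ping", "-c", "1", "-W", "2", bmc_addr],
                    stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0

def cold_reset_command(os_host):
    """Command that logs in to the node's OS and resets its BMC in-band."""
    return ["ssh", os_host, "ipmitool", "mc", "reset", "cold"]

def check_and_recover(bmc_addr, os_host, runner=subprocess.run):
    """Ping the BMC; if it is silent, ask the node to cold reset it.

    Returns "ok", "reset-issued", or "needs-hands". The last case is
    the one where the OS is unreachable too and someone has to walk
    over to the box.
    """
    if bmc_responds(bmc_addr, runner):
        return "ok"
    result = runner(cold_reset_command(os_host))
    return "reset-issued" if result.returncode == 0 else "needs-hands"

if __name__ == "__main__":
    # Hypothetical BMC address and node host name.
    print(check_and_recover("10.0.1.17", "node17"))
```

A real watcher would probably want several failed pings in a row before resetting, and a pause after the reset while the BMC comes back up.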
I was wondering if anyone had experienced problems like this with the
S3200 series motherboards before, or with IPMI on any Intel
motherboard. If so, did you find any sort of resolution? Yes, I've
upgraded the BIOS to the latest available from Intel.
(As a pure aside, we're running caseless using a design I came up with
that I've not heard of before -- vertically mounted boards attached with
zip ties above and below to properly spaced metro shelving. I may post
about that some time. It is remarkably easy and painless to set this up,
and it requires no special parts of any sort.)