scyld on an ASUS A7V266-C
Jorge M. Pacheco
pacheco at cii.fc.ul.pt
Thu Jun 27 16:04:06 PDT 2002
Dear ALL,
Just to let you know the happy ending of a cluster upgrade story.
As I stated before, I had a small scyld beowulf cluster with AMD's
XP-1600+ & SDRAM PC133 running flawlessly for 6 months, 24h/day.
In view of this fantastic performance, we decided to buy some extra
nodes, and we got a good deal with brand new XP 2000+ & DDR PC2100.
We expected our task to involve the trivial upgrade for a scyld beowulf,
namely, that all we needed to do was to floppy-boot each node (no
cdroms, please) and beoboot-install the operating system on the HD's...
WRONG.
We started to have quite a few problems. After some tweaking we could
add the new nodes, but they turned out to be quite resistant to running
whatever program you would submit to them - needless to say, MPI
programs would simply collapse.
Furthermore, slave node behaviour would depend sensitively on memory
timings & other bios setup parameters...
This would happen at the same time that all sorts of pings to the new
nodes from the main node would invariably give 0% packet loss.
Strange, eh? Also, if you take into account that the only odd thing
at boot time was the complaint "neighbour table overflow"... this would
lead you to suspect a network problem...
Well, the truth is that, as an act of desperation, I decided to install
THE SAME scyld beowulf software in one of the new machines, and
transform this new machine into the main node.
Installation was perfect & smooth; at the end, all new nodes could be
added without a single complaint. Moreover, programs would now run
nicely (serial & parallel), so everything went back to normal.
And what about the old nodes ?
Very well, I tried and... they did work fine. No complaints whatsoever.
So... if you decide to expand your nice & stable scyld beowulf
machine, and if you start getting strange complaints, try setting the
fastest & most up-to-date hardware as main node, and all the rest as
daughter-nodes.
If it works for you the same way it worked for us, you're bound to
become a happy human again.
Cheers, J. M. Pacheco