[Beowulf] High quality hardware

Daniel Fernandez daniel at labtie.mmt.upc.es
Mon May 24 12:52:07 PDT 2004

Hi you all,

We've recently suffered some unexpected random failures during runs of
our CFD cases.

Our subsequent MPI tests also showed random results. At first there were
continuous "lamd" daemon hangups, but the next day all nodes ran almost
fine with identical test suites. Moreover, we now see only a few
checksum errors or "lamd" hangups, and only on a small subset of the
nodes, which is really weird.

This headache makes us think that the cause could be:

	-Running commodity hardware continuously, 24h a day, when it is
	 not designed for that.
	-Defective mainboard/memory
	-External interference/noise

The last possibility has caught our attention. In that case, what would
be the best case material for shielding?

Also, which mainboard manufacturers do you trust most? I'm referring
mainly to the Socket A platform. We're currently using Asus and MSI,
but Tyan also seems a good option.

Daniel Fernandez <daniel at labtie.mmt.upc.es>
Heat And Mass Transfer Center - CTTC
c/ Colom nº11
UPC Campus Industrials Terrassa , Edifici TR4
