[Beowulf] High quality hardware

Daniel Fernandez daniel at labtie.mmt.upc.es
Mon May 24 12:52:07 PDT 2004


Hi all,

We've recently suffered some unexpected random failures during runs of
our CFD cases.

Our subsequent MPI tests also gave random results. At first we saw
continuous "lamd" daemon hangups, but the next day all nodes ran
almost fine with identical test suites. Now we get only a few checksum
errors or "lamd" hangups, and only on a small subset of the nodes,
which is really weird.

This headache makes us think the cause could be:

	-Commodity hardware not designed to run continuously, 24h a day
	-Defective mainboard/memory
	-External interference/noise
	
The last possibility has caught our attention. In that case, what would
be the best case material for shielding?

Also, which mainboard manufacturers do you trust most? I'm referring
mainly to the Socket A platform. We're currently using Asus and MSI,
but Tyan also seems a good option.

-- 
Daniel Fernandez <daniel at labtie.mmt.upc.es>
Heat And Mass Transfer Center - CTTC
www.cttc.upc.edu
c/ Colom nº11
UPC Campus Industrials Terrassa , Edifici TR4




