[Beowulf] High quality hardware

Daniel Fernandez daniel at labtie.mmt.upc.es
Tue May 25 12:40:25 PDT 2004


On Tue, 2004-05-25 at 18:14, Mark Hahn wrote:
> > continuous "lamd" daemon hangups. But the next day all nodes ran
> > almost fine with identical test suites. Moreover, now it's giving very
> 
> but do you have monitoring of, for instance, temperature?  lack of
> repeatability will make your life very difficult.

I'm running continuous tests and saving the results. Only one node is
out of range, with a CPU temperature of 60 ºC (its power supply fan
failed), but it is *not* in the group of nodes that causes most of the
problems.

> > 	-Continuous 24h-a-day running of common hardware not designed for it.
> 
> which generally equates to temperature.
> 
> > 	-Defective mainboard/memory 
> 
> but what you describe is a degradation, no?  that is, it used to work fine,
> but now, sometimes intermittently fails?
> 
> > 	-External interference/noise
> 
> are these systems based on bare boards?  or multiple boards per chassis?
> 
> > The last possibility caught our attention; in that case, what would
> > be the best case material for shielding?
> 
> mu-metal, I suppose.  but that's rather extreme!  are you proposing some 
> kind of EMF interference *through* the case?  or some kind of exotic noise
> 
> > Also, what kind of mainboard manufacturers do you trust most ? I'm
> 
> I go with "A-list" vendors: recognizable vendors, preferably not entirely
> focused on either low-end or gamer markets.  asus, tyan, supermicro, msi,
> celestica, hp, ibm, apple, dell, etc.
> 
> > referring mainly to Socket A platform,
> 
> ah.  I wonder if that's your problem, then.  socket A has always had a rep
> for being somewhat fiddly to run stably, and to keep cool.  the latter is 
> presumably just because the chips dissipate a fair amount of power, and need
> rather good contact with a rather good heatsink.  unlike intel or recent AMD
> systems, which have builtin heatspreaders.  still, if you have properly
> mounted, fan-working, copper heatsinks with good through-case airflow,
> I'd think you could expect stable behavior.

Our CPUs (XP 2600+ Thoroughbred/Barton) run at 44-48 ºC, and our
ambient temperature is about 21-23 ºC. I don't consider that too high,
but maybe I am wrong and this temperature range doesn't guarantee
100% failure-free 24x365 operation.


> > we're currently using Asus and
> > MSI but also Tyan seems a good option.
> 
> those work for me.  oddly, I'm getting feedback from OEM channels that Tyan
> is having trouble stocking/delivering products.  that's kind of worrisome,
> since I tend to like their products...
> 
> regards, mark hahn.
> 
> 

Cheers.

-- 
Daniel Fernandez <daniel at labtie.mmt.upc.es>
Heat And Mass Transfer Center - CTTC
www.cttc.upc.edu
c/ Colom nº11
UPC Campus Industrials Terrassa , Edifici TR4



