[Beowulf] High quality hardware
Daniel Fernandez
daniel at labtie.mmt.upc.es
Tue May 25 12:40:25 PDT 2004
On Tue, 2004-05-25 at 18:14, Mark Hahn wrote:
> > continuous "lamd" daemon hangups. But the next day all nodes ran
> > almost fine with identical test suites. Moreover, now it's giving very
>
> but do you have monitoring of, for instance, temperature? lack of
> repeatability will make your life very difficult.
I'm running continuous tests and saving the results. Only one node is out
of range, running at a CPU temperature of 60 ºC (its power-supply fan has
failed), but it is *not* in the group of nodes that causes most of the problems.
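Since one node stands out by temperature while the failures cluster elsewhere, the saved results can be screened mechanically. The following is a minimal Python sketch, assuming each node appends lines of the hypothetical form `<node> <cpu_temp_c>` to a log; the node names, the 55 ºC threshold and the set of failing nodes are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical log format: one "<node> <cpu_temp_c>" reading per line.
SAMPLE_LOG = """\
node01 45.0
node02 47.5
node03 60.0
node01 46.0
node03 59.5
"""

# Assumed alarm threshold in degrees C (illustrative only).
THRESHOLD_C = 55.0

# Hypothetical set of nodes that showed hangups or wrong results.
FAILING = {"node01", "node02"}


def max_temps(log_text):
    """Return the highest reading logged for each node."""
    peaks = defaultdict(lambda: float("-inf"))
    for line in log_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        node, temp = line.split()
        peaks[node] = max(peaks[node], float(temp))
    return dict(peaks)


def out_of_range(peaks, threshold=THRESHOLD_C):
    """Nodes whose peak reading exceeds the threshold, sorted by name."""
    return sorted(node for node, t in peaks.items() if t > threshold)


def hot_and_failing(peaks, failing, threshold=THRESHOLD_C):
    """Nodes that are both over the threshold and in the failing set."""
    return sorted(set(out_of_range(peaks, threshold)) & set(failing))


if __name__ == "__main__":
    peaks = max_temps(SAMPLE_LOG)
    for node in sorted(peaks):
        print(f"{node}: peak {peaks[node]:.1f} C")
    print("out of range:", out_of_range(peaks))
    print("hot and failing:", hot_and_failing(peaks, FAILING))
```

With the sample data the "hot and failing" list is empty, which mirrors the situation described: the one hot node is not among the problem nodes, so its temperature alone would not explain the failures.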
> > - Continuous 24h-a-day operation of commodity hardware not designed for it.
>
> which generally equates to temperature.
>
> > - Defective mainboard/memory
>
> but what you describe is a degradation, no? that is, it used to work fine,
> but now, sometimes intermittently fails?
>
> > - External interference/noise
>
> are these systems based on bare boards? or multiple boards per chassis?
>
> > The last possibility caught our attention. In that case, what would
> > be the best case material for shielding?
>
> mu-metal, I suppose. but that's rather extreme! are you proposing some
> kind of EMF interference *through* the case? or some kind of exotic noise
>
> > Also, which mainboard manufacturers do you trust most? I'm
>
> I go with "A-list" vendors: recognizable vendors, preferably not entirely
> focused on either low-end or gamer markets. asus, tyan, supermicro, msi,
> celestica, hp, ibm, apple, dell, etc.
>
> > referring mainly to the Socket A platform;
>
> ah. I wonder if that's your problem, then. socket A has always had a rep
> for being somewhat fiddly to run stably, and to keep cool. the latter is
> presumably just because the chips dissipate a fair amount of power, and need
> rather good contact with a rather good heatsink. unlike intel or recent AMD
> systems, which have built-in heatspreaders. still, if you have properly
> mounted, fan-working, copper heatsinks with good through-case airflow,
> I'd think you could expect stable behavior.
Our CPUs (XP 2600+ Thoroughbred/Barton) run at 44-48 ºC with an ambient
temperature of about 21-23 ºC. I don't consider that too high, but maybe I
am wrong and this temperature range does not guarantee a 100% failure-free
24x365 run.
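The figures above can also be read as a temperature rise over ambient, which is what the heatsink and airflow actually determine. A rough Python sketch, assuming the rise stays roughly constant when the room warms up (same load, same airflow) and using a hypothetical 85 ºC limit purely as a placeholder, not a datasheet value:

```python
# Figures quoted in the message above, in degrees C.
CPU_RANGE = (44.0, 48.0)
AMBIENT_RANGE = (21.0, 23.0)

# Hypothetical maximum die temperature, for illustration only;
# the real limit must come from the CPU datasheet.
ASSUMED_LIMIT_C = 85.0


def rise_over_ambient(cpu, ambient):
    """Bounds on the CPU temperature rise above ambient."""
    low = cpu[0] - ambient[1]   # coolest CPU reading, warmest room
    high = cpu[1] - ambient[0]  # hottest CPU reading, coolest room
    return low, high


def projected_cpu_temp(new_ambient, cpu=CPU_RANGE, ambient=AMBIENT_RANGE):
    """Worst-case CPU temperature at a new ambient, assuming a constant rise."""
    _, worst_rise = rise_over_ambient(cpu, ambient)
    return new_ambient + worst_rise


if __name__ == "__main__":
    print("rise over ambient:", rise_over_ambient(CPU_RANGE, AMBIENT_RANGE))
    hot_room = projected_cpu_temp(30.0)
    print(f"at 30 C ambient: about {hot_room:.0f} C, "
          f"margin {ASSUMED_LIMIT_C - hot_room:.0f} C to the assumed limit")
```

With these figures the CPUs sit 21-27 ºC above ambient, so a room warming to 30 ºC would push the worst case to about 57 ºC under the constant-rise assumption.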
> > we're currently using Asus and
> > MSI, but Tyan also seems a good option.
>
> those work for me. oddly, I'm getting feedback from OEM channels that Tyan
> is having trouble stocking/delivering products. that's kind of worrisome,
> since I tend to like their products...
>
> regards, mark hahn.
Cheers.
--
Daniel Fernandez <daniel at labtie.mmt.upc.es>
Heat And Mass Transfer Center - CTTC
www.cttc.upc.edu
c/ Colom nº11
UPC Campus Industrials Terrassa, Edifici TR4