Dual Athlon MP 1U units

Velocet math at velocet.ca
Sat Jan 26 22:34:29 PST 2002

On Sat, Jan 26, 2002 at 06:20:10PM -0500, Robert G. Brown wrote:
> On Sat, 26 Jan 2002, W Bauske wrote:
> That's basically why I worry about 1U duals.  In principle they'll work
> -- keep the outside air cool, pull as much cold air through the cases as
> you can possibly arrange, keep the air clean (so the fans don't clog),
> monitor thermal sensors and kill if they start getting too hot.  You can
> see, though, that they are a design that taunts Murphy's Law.  Not too
> robust.  A little thing like an AC blower motor that blows a circuit
> breaker at 3 am can reduce your $65K rack of hardware to a pile of junk
> in the thirty minutes it takes you to find out and do something about
> it, if you don't have fully automated (and functioning) shutdown setup.

This sounds like you shouldn't have closed boxes at all - why not use much more
open cases instead, so that if some big critical fan somewhere does shut
down, you aren't risking a meltdown of your entire cluster? If a case is
at least slightly open to the air of the room it's in, hopefully regular
convection or other air currents would be enough to keep things cool.

This makes a case (*ahem*) for a thermal power switch placed inside the rack -
if it's 50C (or whatever) in the rack, it's time to cut the power. I'm sure
these things exist and shouldn't be too expensive. Anyone using them?
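A software version of the same cut-off could sit next to (not instead of) the
hardware switch. A minimal sketch, assuming a sensor that reports rack
temperature in Celsius - read_rack_temp() is a placeholder for however you get
the reading (lm_sensors output, a serial thermistor, etc.), and the 50C
threshold is just the number from above:

```python
#!/usr/bin/env python
# Thermal cut-off sketch: decide when a rack is hot enough to warrant an
# emergency shutdown. The decision logic is kept separate from the sensor
# read so it can be tested on its own.

THRESHOLD_C = 50.0  # arbitrary; pick for your hardware

def should_cut_power(temp_c, threshold_c=THRESHOLD_C):
    """Return True once the rack temperature reaches the threshold."""
    return temp_c >= threshold_c

def read_rack_temp():
    """Placeholder: read the rack sensor here (lm_sensors, serial probe...)."""
    raise NotImplementedError

if __name__ == "__main__":
    # A cron job or small daemon would loop on this and call the system's
    # shutdown command when should_cut_power() returns True.
    temp = read_rack_temp()
    if should_cut_power(temp):
        print("rack at %.1fC - cutting power" % temp)
```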
> Not that a stack of 2U duals is MUCH better.  It's still hot -- we have
> 1800 XP's and probably will have more like 150-160W/box.  If we only put
> 12 per rack, though, we can leave gaps between the cases and get some
> cooling from the surfaces of the cases and in any event the cases have

In case the fans in the case fail, you mean...?

> much larger air volumes, more room for air to flow through, and more
> room for bigger fans.  With luck we'll have SOME time to react (or for
> our automated sentries to react) if the room AC fails and the power
> doesn't.

Why not custom-mount a large number of boards in a common space with
a similar number of fans? Then if 1 or 2 (or half) of the fans fail,
there aren't 1 or 2 or more boards risking burnout from zero cooling -
instead, all the boards in that enclosure share half
the cooling. Half is better than none, and half is great when you
put in 3 times the airflow that was actually required. I'm sure people
have thought of this before, and there's a reason why it's not more
popular. Just wondering what all your experience is out there.
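The arithmetic behind that claim - a quick sketch, assuming airflow adds
linearly across identical fans (the fan counts and the 3x overprovision
factor are just the numbers from the paragraph above, not measured values):

```python
# Shared-plenum cooling margin: with total airflow overprovisioned by some
# factor, what multiple of the *required* airflow remains as fans fail?
# Assumes each fan contributes equally and contributions add linearly.

def airflow_margin(total_fans, failed_fans, overprovision=3.0):
    """Remaining airflow as a multiple of the required airflow (1.0 = bare minimum)."""
    working_fraction = (total_fans - failed_fans) / float(total_fans)
    return overprovision * working_fraction
```

With 3x overprovision, losing half of 6 fans still leaves 1.5x the required
airflow, and you only drop to the bare minimum after 2/3 of them die - versus
a sealed 1U box, where one dead fan can mean zero cooling for that board.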

> relatively low density.  In three years, we may need to start repacking
> or replacing with more tightly packed nodes as we grow, but in the
> meantime we'll enjoy slightly reduced risk and greater robustness of
> design.

In 3 years we'll hopefully have CPUs that burn 10W at 5GHz instead! :)


>    rgb
> -- 
> Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
> Duke University Dept. of Physics, Box 90305
> Durham, N.C. 27708-0305
> Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

Ken Chase, math at velocet.ca  *  Velocet Communications Inc.  *  Toronto, CANADA 
