[Beowulf] Reasonable upper limit in kW per rack for air cooling?

Robert G. Brown rgb at phy.duke.edu
Mon Feb 14 03:18:27 PST 2005

On Sun, 13 Feb 2005, Jim Lux wrote:

> > I think you're within a factor of 2 or so of the SANE threshold at 10KW.
> > A rack full of 220 W Opterons is there already (~40 1U enclosures).  I'd
> > "believe" that you could double that with a clever rack design, e.g.
> > Rackable's, but somewhere in this ballpark...it stops being sane.
> >
> > > If you were designing a computer room today (which I am) what would
> > > you allow for the maximum power dissipation per rack _to_be_handled_
> > > by_the_room_A/C.  The assumption being that in 8 years if somebody
> > > buys a 40kW (heaven forbid) rack it will dump its heat through
> > > a separate water cooling system.
> >
> > This is a tough one.  For a standard rack, ballpark of 10 KW is
> > accessible today.  For a Rackable rack, I think that they can not quite
> > double this (but this is strictly from memory -- something like 4 CPUs
> > per U, but they use a custom power distribution which cuts power and a
> > specially designed airflow which avoids recycling used cooling air).  I
> > don't know what bladed racks achieve in power density -- the earlier
> > blades I looked at had throttled back CPUs but I imagine that they've
> > cranked them up at this point (and cranked up the heat along with them).
> >
> > Ya pays your money and ya takes your choice.  An absolute limit of 25
> > (or even 30) KW/rack seems more than reasonable to me, but then, I'd
> > "just say no" to rack/serverroom designs that pack more power than I
> > think can sanely be dissipated in any given volume. Note that I consider
> > water cooled systems to be insane a priori for all but a small fraction
> > of server room or cluster operations, "space" generally being cheaper
> > than the expense associated with achieving the highest possible spatial
> > density of heat dissipating CPUs.  I mean, why stop at water?  Liquid
> > Nitrogen.  Liquid Helium.  If money is no option, why not?  OTOH, when
> > money matters, at some point it (usually) gets to be cheaper to just

Keyword:                             ^^^^^^^

> > build another cluster/server room, right?
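
As a rough check on the numbers quoted above, here is a minimal sketch of the per-rack arithmetic. The 220 W per node and ~40 1U enclosures per rack come from the quoted text; the function name and the simple "times two" stand-in for a denser rack design are assumptions made only for illustration.

```python
# Back-of-the-envelope rack power, using the figures quoted in the thread.
# rack_power_kw is a hypothetical helper, not anything from a real tool.

def rack_power_kw(nodes_per_rack: int, watts_per_node: float) -> float:
    """Total electrical (and hence heat) load of one rack, in kW."""
    return nodes_per_rack * watts_per_node / 1000.0

standard = rack_power_kw(40, 220.0)   # ~40 1U Opteron enclosures
doubled = 2 * standard                # crude stand-in for a denser design

print(f"standard rack:   {standard:.1f} kW")
print(f"doubled density: {doubled:.1f} kW")
```

Both figures land under the 25-30 KW/rack ceiling suggested in the quoted text.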

Sure, I agree with everything below, for bleeding edge work.  Or if
you're building a cluster in your Manhattan office, where for whatever
reason you have to work with a space the size of a broom closet (but
where you miraculously have access to a stream of chilled water, or
liquid nitrogen, or liquid helium).

This just (IMO) pushes you over some sort of magic threshold that (while
arbitrary and existing perhaps only in my fevered imagination) separates
"COTS clusters" from a "big iron supercomputer".  I have a hard time
seeing liquid cooled clusters as being a beowulf in the sense I have
grown to know and love.  COTS clusters have always been about being ABLE
to DIY, and while I can (if my life depends on it) do plumbing, it just
seems like there would be some highly nonlinear cost and hassle
thresholds in there.  

Also, I just cannot see COTS systems being built with copper pipes and
coupling valves where you hook them into your household or office
chilled water supply at your desk.  I suspect that COTS desktops and
even server mobos will continue to be engineered to be air cooled in the
foreseeable future.

Now your observation that racks themselves may start coming with a pair
of copper pipes and couplings for a built-in blower and heat exchanger
-- so the rack itself is in some sense "liquid cooled", while the actual
nodes within are still COTS mobos cooled by air -- I don't know what the
cost and volume trade-offs are of this solution. Cooling the air in the
rack bases (more likely at the top of each rack and ducting the cold air
down to the base) vs cooling the air in a big liebert and piping the
cool air around to the bases in a raised floor -- hmmm.

One thing to remember (that I think was brought up one of the last times
this issue was raised on list): I know from bitter experience that
water couplings are a PITA to reliably get, and keep, tight under
pressure.  When they leak ("when" because of Murphy), they're going to
make God's Own mess and potentially ruin many tens of thousands of
dollars worth of hardware.  Heat exchangers at the tops of racks also
increase the probability that humidity will be a problem -- I also know
from bitter experience that overhead cold air ducting has a tendency to
sweat unless carefully insulated, and the sweat in a humid climate like
NC will inevitably drip into whatever is below.  Heat exchangers at the
bottom make it harder to move the warm air exhausted at the rack tops
back to the bottom for recooling as you're working against an air
pressure/density convective flow differential and not with it.

Finally, there are likely to be Human Resources and state regulatory
issues with liquid cooled electronics -- systems and network engineers
somehow are viewed as being competent to manage end-stage electronics
from the plug point on even by the unions in all but the most rabid of
union shops (although I have heard of places where you have to call a
union employee in to do any major plugging or unplugging of certain
kinds of hardware).  That simply won't be the case with liquid cooled
hardware.  I may be able to work on my household plumbing (and wiring),
but if I set my hand to plumbing at Duke the HR Gods and the State would
get Angry, and if anything went wrong (like a leak causing a short and a
fire) I would be Held Liable. This adds another project-staffing human
notch to the TCO -- likely a fairly significant one as the heat
exchanger/blowers in EACH rack might well need servicing and inspection
1-2x a year (as the room unit does now).

None of these things are insurmountable difficulties, and as you note
there are certain big, expensive pieces of hot hardware (big lasers,
giant magnets, automobile engines) that one DOES plug right into a
chilled water loop.  With the exception of car engines they tend to be
components with 6-8 figure price tags, though, where tacking on a full
or part time FTE for managing the plumbing etc is a small fraction of
the total marginal cost of operation.  I'd expect this to make sense
only for clusters in this same category -- really large, already
expensive clusters shooting for bleeding edge performance (top 10 of top
500) at very high density someplace where a) physical space is very
"expensive" (justifying the trade off economically); or b) speed of
light and/or interconnect lengths are indeed an issue.

Note that fixing the latter will likely rely as much on moving out
of the COTS arena for the cluster interconnect as it does on cooling
alone.  High end cluster interconnects are again almost by definition
engineered on the assumption of air-cooled node densities and internode
latencies that are specified by worst-case assumptions and protocol, not
speed of light in the sense that interconnect length is an important
parameter in the overall latency.  As in 1 usec is pretty good latency
for a modern interconnect IIRC, and a light-usecond is 3x10^8 m/s x 10^-6 s =
300 meters.  I'd guess that very little of the internode latency over
fiber is due to speed of light delays per se and nearly all of it is in
the interconnects themselves, the switches, and the node bus interface.
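
The light-usecond estimate above can be turned into a small sketch that compares cable propagation delay with a 1 usec interconnect latency. The rounded 3x10^8 m/s and the 1 usec figure are from the text; the 0.67 velocity factor (signal speed in fiber or copper as a fraction of c) and the cable lengths are assumptions chosen for illustration.

```python
# Propagation delay over a cable versus a ~1 usec interconnect latency.

C = 3.0e8            # m/s, speed of light, rounded as in the text
LATENCY_NS = 1000.0  # 1 usec, the "pretty good" latency from the text

def propagation_delay_ns(length_m: float, velocity_factor: float = 1.0) -> float:
    """One-way delay in nanoseconds for a signal travelling length_m."""
    return length_m / (C * velocity_factor) * 1e9

# Assumed cable lengths, from one rack to a large machine room.
for length_m in (1.0, 10.0, 30.0, 300.0):
    delay = propagation_delay_ns(length_m, velocity_factor=0.67)
    share = 100.0 * delay / LATENCY_NS
    print(f"{length_m:6.0f} m: {delay:7.1f} ns ({share:5.1f}% of 1 usec)")
```

Under these assumptions, cable runs of a few meters account for only a few percent of a 1 usec latency, which is consistent with the guess that most of the delay sits in the switches and interfaces.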

> The speed of light starts to set another limit for the physical size, if you
> want real speed.  There's a reason why the old Crays are compact and liquid
> cooled.  It's that several nanoseconds per foot propagation delay.  Once you

There's also a reason why old Crays are currently used primarily as
lobby art, wherever they haven't been disassembled and bathed in
mercury to recover all that gold.  Several reasons, actually, but liquid
cooling and the hassle and expense it entailed are a big one.  Many a
Cray was finally decommissioned when one could build and operate a true
COTS cluster with as much or more raw horsepower for what it cost for
just the infrastructure support for the Cray it supplanted.  Like it or
not, Moore's Law biases cost-benefit solutions heavily towards the COTS
and disposable, and wet-cooling requires a significant and sustained
investment in a particular technology that is likely to remain
non-mainstream, human-resource intensive, and hence nonlinearly costly
in a TCO CBA.  One needs significant benefit in order to make it

> get past a certain threshold, you're actually better off going to very dense
> form factors and liquid cooling, in many areas.  I think that most clusters
> haven't reached the performance point where it's worth liquid cooling the
> processors, but it's probably pretty close to the threshold. Adding machine
> room space is expensive for other reasons.  You've already got to have the
> water chillers for any sort of major sized cluster (to cool the air), so the
> incremental cost to providing an appropriate interface to the racks and
> starting to build racks in liquid cooled configurations can't be far away.
> Liquid cooling is MUCH more efficient than air cooling: better heat
> transfer, better life (more even temperatures), less real estate required,
> etc.  The hangup now is that nobody makes liquid cooled PCs as a commodity,
> mass production item.  What you'll find is liquid cooling retrofits that
> don't take advantage of what liquid cooling can get you. If you look at high
> performance radar or sonar processors and such that use liquid cooling, the
> layout and physical configuration is MUCH different (partly driven by the
> fact that the viscosity of liquid is higher than air).
> Wouldn't YOU like to have, say, 1000 processors in one rack, with a  2-3"
> flexible pipe to somewhere else?  Especially if it was perfectly quiet? And
> could sit next to your desk?  (1000 processors*100W each is 100kW).
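
To put rough numbers on the 100kW rack and the "2-3 inch pipe" above, here is a minimal sketch of the coolant flow needed to carry that heat away, using the steady-state relation Q = m_dot * c_p * dT. The 1000 x 100 W load is from the quoted text; the fluid properties are rounded room-temperature values, and the 10 K coolant temperature rise is an assumption picked only for illustration.

```python
# Steady-state coolant flow needed to carry away a heat load:
#   Q = m_dot * c_p * dT   =>   m_dot = Q / (c_p * dT)
# Property values are rounded; delta_t_k = 10 K is an assumed figure.

WATER = {"cp": 4186.0, "rho": 1000.0}   # J/(kg K), kg/m^3
AIR = {"cp": 1005.0, "rho": 1.2}        # J/(kg K), kg/m^3

def volume_flow_m3_per_s(heat_w: float, fluid: dict, delta_t_k: float = 10.0) -> float:
    """Volume flow of coolant that absorbs heat_w with a delta_t_k rise."""
    mass_flow = heat_w / (fluid["cp"] * delta_t_k)   # kg/s
    return mass_flow / fluid["rho"]                  # m^3/s

heat_w = 1000 * 100.0   # 1000 processors at 100 W each, as quoted
water = volume_flow_m3_per_s(heat_w, WATER)
air = volume_flow_m3_per_s(heat_w, AIR)

print(f"water: {water * 1000:.2f} L/s")
print(f"air:   {air:.2f} m^3/s")
print(f"air needs about {air / water:.0f}x the volume flow of water")
```

Under these assumptions water needs a few liters per second where air needs several cubic meters per second. That covers only the heat-transfer side of the argument; it says nothing about the cost or the plumbing.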

If somebody else paid for and fed the whole thing, you could multiply
the capacity by an order of magnitude and use liquid nitrogen for
cooling instead of water and I'd simply love it.  And as Austin Powers
might add, I'd like a gold-plated potty as well -- but I'm not going to
get it...;-)

Alas, in the real world it isn't about what I'd "like", it is about what
I can afford, about what I can convince a grant agency to pay for.  High
infrastructure costs come out of node count, and node count matters --
in many projects, it is the PRIMARY thing that matters.  High density
increases infrastructure costs, often nonlinearly, and hence decreases
node count at any fixed budget.  In order for liquid cooling to ever
make sense for COTS clusters, it would have to BECOME COTS -- basically,
to become cheap in both hardware and human terms.  Might happen, might
happen, but I'm not holding my breath...
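
The trade-off between infrastructure cost and node count can be sketched as simple budget arithmetic. Every dollar figure below is hypothetical, invented purely to show the shape of the calculation, and the function name is likewise made up for this sketch.

```python
# Node count at a fixed budget, with and without extra infrastructure cost.
# All dollar amounts are HYPOTHETICAL placeholders, not real prices.

def nodes_affordable(budget: float, node_cost: float,
                     infra_per_node: float, infra_fixed: float = 0.0) -> int:
    """Whole nodes that fit in budget after paying for infrastructure."""
    remaining = budget - infra_fixed
    if remaining <= 0:
        return 0
    return int(remaining // (node_cost + infra_per_node))

BUDGET = 500_000.0     # hypothetical grant
NODE_COST = 2_500.0    # hypothetical COTS node

# Hypothetical air-cooled case: modest per-node share of room A/C.
air_nodes = nodes_affordable(BUDGET, NODE_COST, infra_per_node=500.0)

# Hypothetical liquid-cooled case: higher per-node cost plus fixed plumbing.
liquid_nodes = nodes_affordable(BUDGET, NODE_COST,
                                infra_per_node=1_500.0, infra_fixed=50_000.0)

print(f"air cooled:    {air_nodes} nodes")
print(f"liquid cooled: {liquid_nodes} nodes")
```

With these made-up inputs the denser, costlier infrastructure buys fewer nodes for the same money. The real ratio depends entirely on actual prices, which this sketch does not claim to know.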


Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
