[Beowulf] let's standardize liquid cooling
hahn at mcmaster.ca
Fri Sep 28 10:58:35 PDT 2012
I have a modest proposal:
standardize the location of liquid-cooled cold plates in each 19" rack.
nodes would have internal heatpipes running from the heat sources (presumably
mostly CPUs) to plates along their sides, which would mate with plates on the rack.
I have an aging machine room with ~50 racks, the compute racks each
dissipating something like 11 kW air-cooled; total dissipation is a little
under 300 kW. for an upgrade to be viable, the room would need to handle about
1 MW, and it should be smart about cooling (free cooling,
or pre-heating the building's air intake during winter). our current
setup is reasonably tuned (well-partitioned air, PUE of 1.3-1.4).
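A quick check of the figures above, as a sketch. Everything comes from the post except one assumption, which is mine and not the author's: that the 1 MW would be spread evenly over all ~50 rack positions. It shows the non-IT overhead implied by a PUE of 1.3-1.4 at today's load and at 1 MW, and that 1 MW in the same footprint means roughly 20 kW per rack, about 1.8x today's air-cooled compute racks.

```python
# Back-of-envelope power arithmetic for the machine room described above.
# From the post: ~50 racks, ~11 kW per air-cooled compute rack, a little
# under 300 kW today, a ~1 MW target, PUE of 1.3-1.4.
# Assumption (not from the post): the 1 MW is spread evenly over all 50 racks.

RACKS = 50
AIR_COOLED_RACK_KW = 11.0
CURRENT_IT_KW = 300.0  # upper bound: the post says "a little under"
TARGET_IT_KW = 1000.0


def facility_kw(it_kw: float, pue: float) -> float:
    """Total facility power implied by an IT load and a PUE."""
    return it_kw * pue


def overhead_kw(it_kw: float, pue: float) -> float:
    """Non-IT power (cooling, distribution losses) implied by a PUE."""
    return it_kw * (pue - 1.0)


def per_rack_kw(it_kw: float, racks: int) -> float:
    """Average IT load per rack if the load is spread evenly."""
    return it_kw / racks


for it_kw in (CURRENT_IT_KW, TARGET_IT_KW):
    for pue in (1.3, 1.4):
        print(f"IT {it_kw:5.0f} kW, PUE {pue}: "
              f"facility {facility_kw(it_kw, pue):5.0f} kW, "
              f"overhead {overhead_kw(it_kw, pue):4.0f} kW")

density = per_rack_kw(TARGET_IT_KW, RACKS)
print(f"1 MW over {RACKS} racks: {density:.0f} kW/rack, "
      f"{density / AIR_COOLED_RACK_KW:.1f}x today's air-cooled racks")
```

At the same PUE, the overhead alone at 1 MW (300-400 kW) would exceed the room's entire current load, which is why smarter cooling matters as much as capacity.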
we happen to be located in a southerly part of Canada, and would
definitely need active cooling during summer. but the real point of
taking liquid cooling right to the CPU is that it would reduce the
overall thermal resistance and permit higher outgoing temperatures,
which then stand more of a chance of being useful as recovered heat.
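The thermal-resistance argument can be made concrete with a series-resistance model: if the CPU case must stay at or below T_case_max while dissipating P, the coolant at the last heat-transfer surface can be no warmer than T_case_max - P * sum(R). Every number below (power, temperature limit and each resistance) is a hypothetical placeholder chosen only to show the arithmetic, not a measured or vendor figure. The liquid path follows the chain proposed in this post: thermal interface, internal heatpipe, node-plate-to-rack-plate contact, rack plate to liquid.

```python
# Series thermal-resistance sketch: the total resistance between the CPU
# case and the coolant caps how warm the coolant may be.
# All values are hypothetical placeholders, not measured or vendor data.


def max_coolant_temp_c(t_case_max_c: float, power_w: float,
                       resistances_k_per_w: list[float]) -> float:
    """Warmest coolant (liquid or air) that keeps the CPU case at or below
    t_case_max_c, with the given thermal resistances in series."""
    return t_case_max_c - power_w * sum(resistances_k_per_w)


T_CASE_MAX_C = 75.0  # hypothetical case-temperature limit
CPU_POWER_W = 130.0  # hypothetical per-socket dissipation

# Air path: thermal interface material, then finned heatsink to air.
AIR_PATH = [0.05, 0.20]
# Proposed path: TIM, internal heatpipe, node-plate-to-rack-plate contact
# (the term clamping pressure would reduce), rack cold plate to liquid.
LIQUID_PATH = [0.05, 0.04, 0.05, 0.02]

for name, path in (("air", AIR_PATH), ("liquid", LIQUID_PATH)):
    t_max = max_coolant_temp_c(T_CASE_MAX_C, CPU_POWER_W, path)
    print(f"{name:6s}: total {sum(path):.2f} K/W, coolant at most {t_max:.1f} C")
```

Lowering any term in the sum raises the ceiling on coolant temperature; whether a heatpipe plus a dry plate-to-plate contact actually beats a good air heatsink depends on real measured values, especially for the contact term.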
it seems to me this would be doable: vendors currently each
have a slightly different node-rail combination, but I don't think this
is really perceived as a competitive advantage. they would presumably
have to accept a standard design (perhaps just a literal rail, with no
ball-bearing widgets to position the node vertically). I personally
find little or no value in being able to pull out a node and have it
hang in place.
I don't know whether there would have to be some kind of clamping mechanism
to press the node against the rack, improving plate-to-plate contact.
I guess that would need to be per-U, which is a bit of a pain.
internally, a node could use heatpipes or possibly small pumps.
how the rack-mounted plates are cooled would be left open for innovation:
in-rack DX (direct-expansion) cooling might be attractive, though for bigger
installations some center-wide circulation of glycol or similar would presumably make sense.
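For a center-wide loop, the required coolant flow follows from the heat balance Q = m_dot * c_p * dT, so the volumetric flow is Q / (rho * c_p * dT). This sketch sizes the flow for one 11 kW rack and for the 1 MW target. The 10 K supply/return difference is an assumption, and the glycol row uses rough placeholder properties for something like a 30% glycol/water mix; a real design would take the chosen fluid's datasheet values.

```python
# Coolant flow needed to carry a heat load at a given temperature rise:
#   Q = m_dot * c_p * dT   =>   V_dot = Q / (rho * c_p * dT)
# Water properties are approximate values near room temperature; the
# glycol row is an assumed placeholder, not datasheet data.

FLUIDS = {
    # name: (density kg/m^3, specific heat J/(kg*K))
    "water": (998.0, 4186.0),
    "glycol mix (assumed)": (1040.0, 3700.0),
}

DELTA_T_K = 10.0  # assumed supply/return temperature difference


def flow_l_per_min(heat_w: float, delta_t_k: float,
                   density: float, cp: float) -> float:
    """Volumetric flow in litres/minute to remove heat_w with a
    coolant temperature rise of delta_t_k."""
    m3_per_s = heat_w / (density * cp * delta_t_k)
    return m3_per_s * 1000.0 * 60.0


for name, (rho, cp) in FLUIDS.items():
    rack = flow_l_per_min(11e3, DELTA_T_K, rho, cp)  # one 11 kW rack
    room = flow_l_per_min(1e6, DELTA_T_K, rho, cp)   # 1 MW target
    print(f"{name:22s} rack: {rack:5.1f} L/min   1 MW: {room:6.0f} L/min")
```

Glycol's lower specific heat means somewhat more flow than plain water for the same load, and because flow scales as 1/dT, doubling the supply/return difference halves the flow.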
there is some vendor-specific activity along these lines, and a fairly
long history of per-rack heat exchangers. obviously, avoiding vendor lock-in
is hugely attractive, especially to a Beowulf mindset. vendors, of course,
love lock-in, but have accepted standards in various ways (Intel's PSU
standards, JEDEC RAM, etc.).
it also seems so obviously a win to avoid air as a means of transferring heat
from the CPU to the rack-mounted DX coil mere inches away...
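The point about air can be quantified by comparing volumetric heat capacity (density times specific heat), i.e. how much heat a cubic metre of fluid absorbs per kelvin of temperature rise. The property values are approximate figures near 25 C at sea-level pressure, and the 10 K air temperature rise across an 11 kW rack is an assumption for illustration.

```python
# Why air is a poor heat carrier: compare volumetric heat capacity
# (rho * c_p), i.e. heat carried per cubic metre per kelvin of rise.
# Property values are approximate, near 25 C and sea-level pressure.

AIR_RHO, AIR_CP = 1.18, 1005.0       # kg/m^3, J/(kg*K)
WATER_RHO, WATER_CP = 998.0, 4186.0  # kg/m^3, J/(kg*K)
M3S_TO_CFM = 2118.88                 # cubic feet per minute per m^3/s


def volumetric_heat_capacity(rho: float, cp: float) -> float:
    """J/(m^3*K): heat absorbed per m^3 of fluid per kelvin."""
    return rho * cp


air = volumetric_heat_capacity(AIR_RHO, AIR_CP)
water = volumetric_heat_capacity(WATER_RHO, WATER_CP)
print(f"water carries ~{water / air:.0f}x more heat per unit volume than air")

# Airflow for one 11 kW rack at an assumed 10 K air temperature rise.
rack_w, delta_t = 11e3, 10.0
airflow_m3s = rack_w / (air * delta_t)
print(f"11 kW rack at 10 K rise: {airflow_m3s:.2f} m^3/s "
      f"(~{airflow_m3s * M3S_TO_CFM:.0f} CFM)")
```

A ratio in the thousands is what makes moving the heat a few inches through air, rather than through a heatpipe and a plate, look so wasteful.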