[Beowulf] let's standardize liquid cooling

Mark Hahn hahn at mcmaster.ca
Fri Sep 28 10:58:35 PDT 2012


I have a modest proposal: 
standardize the location of liquid-cooled cold plates in each 19" rack. 
nodes would have internal heatpipes from heat sources (presumably CPUs
mostly) to plates along the sides to mate/contact with the rack.

I have an aging machine room with ~50 racks, with the compute racks 
dissipating something like 11 kW each, air-cooled.  total dissipation is
a little under 300 kW.  contemplating upgrades: the room would need to go
to about 1 MW to be viable, and should be smart about cooling (free cooling,
or pre-heating the building's air intake during winter.)  our current
setup is reasonably tuned (well-partitioned air, PUE of 1.3-1.4.)
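
to put rough numbers on the overhead, here is a minimal sketch of the PUE
arithmetic.  it assumes the ~300 kW and 1 MW figures are IT load (the post
doesn't say), and takes PUE as total facility power divided by IT power;
the function names are made up for illustration.

```python
def facility_power_kw(it_load_kw, pue):
    """Total facility draw implied by an IT load and a PUE (PUE = total / IT)."""
    return it_load_kw * pue


def overhead_kw(it_load_kw, pue):
    """Power spent on cooling, distribution, etc.: everything that is not IT load."""
    return it_load_kw * (pue - 1.0)


if __name__ == "__main__":
    # Assumed IT loads: roughly today's room and the contemplated upgrade.
    for it_load in (300.0, 1000.0):
        for pue in (1.3, 1.4):
            print(f"IT {it_load:6.0f} kW, PUE {pue}: "
                  f"total {facility_power_kw(it_load, pue):6.0f} kW, "
                  f"overhead {overhead_kw(it_load, pue):5.0f} kW")
```

under those assumptions the overhead is roughly 90-120 kW today, and would
be roughly 300-400 kW at 1 MW if the PUE stayed where it is.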

we happen to be located in a southerly part of Canada, and would 
definitely need active cooling during summer.  but the real point of 
using liquid cooling right to the CPU is that it would reduce the 
overall thermal resistance, and permit higher outgoing temperatures,
which then stand more of a chance of being interesting, utility-wise.
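
a back-of-envelope sketch of that argument, using the usual steady-state
model T_case = T_coolant + P * R: for a fixed case-temperature limit, a
lower thermal resistance lets the coolant run hotter.  every number here
(100 W CPU, 70 C case limit, 0.3 and 0.1 K/W) is an illustrative
assumption, not a measurement or a datasheet value.

```python
def max_coolant_temp_c(case_limit_c, cpu_power_w, resistance_k_per_w):
    """Hottest coolant (or air) that still keeps the CPU case at its limit.

    Rearranges T_case = T_coolant + P * R for T_coolant.
    """
    return case_limit_c - cpu_power_w * resistance_k_per_w


# All values below are assumptions chosen only to illustrate the trend.
CASE_LIMIT_C = 70.0   # assumed maximum CPU case temperature
CPU_POWER_W = 100.0   # assumed per-socket dissipation
R_AIR = 0.3           # assumed case-to-air resistance, K/W
R_LIQUID = 0.1        # assumed case-to-cold-plate resistance, K/W

if __name__ == "__main__":
    for label, r in (("air", R_AIR), ("liquid", R_LIQUID)):
        t = max_coolant_temp_c(CASE_LIMIT_C, CPU_POWER_W, r)
        print(f"{label}: coolant can be up to about {t:.0f} C")
```

with these made-up numbers the air path tops out around 40 C while the
plate path tolerates around 60 C; the real values would depend on the
heatpipes and on the plate-to-plate contact.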

it seems to me like this would be doable - vendors currently each 
have a slightly different node-rail combination, but I don't think this
is really perceived as a competitive advantage.  they would presumably
have to accept a standard design (perhaps just a literal rail with no
ball-bearing widgets to vertically position the node.  I personally
find little or no value in the ability to pull out a node and have it 
hang in place.)

I don't know whether there would have to be some kind of clamping mechanism 
to put pressure on the node, improving the plate-to-plate contact.
I guess that would need to be per-U, which is a bit of a pain.

internally, a node could use heatpipes or possibly small pumps.
how the rack-mounted plates are cooled would be available for innovation
(in-rack DX cooling might be attractive, though for bigger installations,
presumably some center-wide circulation of glycol/etc would make sense.)

there is some vendor-specific activity along these lines, and a fairly 
long history of per-rack heat exchangers.  obviously, avoiding vendor lock-in
is hugely attractive, especially to a Beowulf mindset.  vendors, of course,
love lock-in, but have accepted standards in various ways (Intel's PSU 
standards, JEDEC RAM, etc.)

it also seems so obviously a win to avoid air as a means of transferring heat 
from the CPU to the rack-mounted DX coil mere inches away...

comments?

