[Beowulf] followup on 1000-node Caltech cluster

Stuart Midgley sdm900 at gmail.com
Mon Jun 20 08:02:35 PDT 2005

> how much airflow (CFM) do you see from tiles in the front of your
> racks?  for 8kW, I'd expect maybe 8-900 CFM, or around 2
> modestly-performing perforated tiles.  I'm reasonably happy with the
> tiles in my new machineroom: about 600 CFM apiece, and placed 2/rack.
> the rule-of-thumb is 1 ton (3.5 KW) per tile, but that assumes
> somewhat older, lower-flow tiles, I think.

I'm not sure how much air is coming through the tiles.  The tiles
have air ducts cut into them (really old machine room), which
effectively leaves 1/3 of each tile completely open (with a slight
grille over it).  There are 2 tiles' worth of "slots" in front of each
rack of compute nodes.  There is about a 5 degree Celsius difference
between the top and bottom nodes.
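
As a rough sanity check on the airflow figures quoted above, here is a
minimal sketch using the standard sensible-heat approximation for air
(BTU/hr = 1.08 x CFM x temperature rise in degrees F).  The function
names and the 30 F front-to-back rise are assumptions for illustration,
not measurements from either machine room.

```python
# Sensible-heat airflow estimate, standard-air approximation (sea level).
# All names here are hypothetical helpers, not from any library.

WATTS_TO_BTU_PER_HR = 3.412   # 1 W = 3.412 BTU/hr
SENSIBLE_HEAT_FACTOR = 1.08   # BTU/hr per (CFM * degF) for standard air


def required_cfm(load_watts, rise_f):
    """Airflow (CFM) needed to carry load_watts at a rise of rise_f degF."""
    return load_watts * WATTS_TO_BTU_PER_HR / (SENSIBLE_HEAT_FACTOR * rise_f)


def temperature_rise_f(load_watts, cfm):
    """Air temperature rise (degF) across a rack for a given airflow."""
    return load_watts * WATTS_TO_BTU_PER_HR / (SENSIBLE_HEAT_FACTOR * cfm)


def f_delta_to_c_delta(rise_f):
    """Convert a temperature difference from degF to degC."""
    return rise_f * 5.0 / 9.0


if __name__ == "__main__":
    # Assumed 30 degF rise across an 8 kW rack.
    print("CFM for 8 kW at 30 F rise:", round(required_cfm(8000, 30)))
    # Two tiles at 600 CFM apiece, as in the quoted setup.
    rise = temperature_rise_f(8000, 2 * 600)
    print("Rise at 1200 CFM (C):", round(f_delta_to_c_delta(rise), 1))
```

With the assumed 30 F rise, an 8kW rack lands inside the quoted
8-900 CFM range; a smaller allowed rise needs proportionally more air.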

> how long are your rows?

3 rows of 11 racks (well, give or take a rack on each row).

> obviously the hot air will want to rise, but I suppose enough velocity
> will make it go where you want.  my machineroom is "half-ducted" as
> well: downflow chillers, 16" raised floor acting as a cold air plenum,
> but open space above for hot/return air.  it's a bit of a risk - it
> would offer a lot more control to have a suspended ceiling close to
> the top of the racks, with the supra-ceiling space acting as a return
> plenum.

About a 60cm raised floor and about the same amount of head room.

The new system (SGI Altix Bx2 - large Beowulf-style cluster) has
chilled-water radiators(?) in the back of each rack, which prevent
the hot air from the compute nodes hitting the room.  It works very
well.  The air coming out the back of the racks is actually colder
than the intake air.

> that sounds surprisingly slow.  our older machineroom had only about
> 30KW in it, but it was fairly small.  when cooling was lost, we went
> up >15 C in <5 minutes.

The machine room is very large.

> interestingly, there's no real point to keeping up compute nodes via
> UPS unless you also have the chillers+blowers on UPS or automatic
> generator.  in fact, all of our new machines (~6K cpus across 4 large
> clusters) have UPS-less compute nodes.

Agreed.  Only servers and disks on UPS.

Having chillers on UPS can make real sense.  The new system
generates about 400kW of heat (along with about 100kW from other
equipment)... lose the chilled doors at the back of the racks and you
could be in serious trouble.  You would need to shut those nodes down
in a real hurry.
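
To put a rough number on "a real hurry", here is a back-of-envelope
sketch of how fast room air heats up once cooling is lost.  It assumes
all of the heat goes into the room air and ignores the thermal mass of
equipment, floor and walls, so it underestimates the real time
available.  The room volume in the example is purely hypothetical; I
have not measured ours.

```python
# Lumped estimate of air temperature rise after loss of cooling.
# Assumes every watt heats only the room air (no thermal mass elsewhere).

AIR_DENSITY = 1.2   # kg/m^3, roughly room temperature at sea level
AIR_CP = 1005.0     # J/(kg*K), specific heat of air at constant pressure


def seconds_to_rise(load_watts, room_volume_m3, rise_c):
    """Seconds for room air to warm by rise_c degC under load_watts."""
    air_mass_kg = AIR_DENSITY * room_volume_m3
    return air_mass_kg * AIR_CP * rise_c / load_watts


def implied_air_volume_m3(load_watts, rise_c, seconds):
    """Air volume that would explain an observed rise over a given time."""
    return load_watts * seconds / (AIR_DENSITY * AIR_CP * rise_c)


if __name__ == "__main__":
    # The quoted case: 30 kW, 15 C rise in 5 minutes.
    print("Implied volume (m^3):",
          round(implied_air_volume_m3(30_000, 15, 5 * 60)))
    # Hypothetical 5000 m^3 room with the full 500 kW load.
    print("Seconds to +15 C:",
          round(seconds_to_rise(500_000, 5000, 15)))
```

Even with a generous assumed volume, the time to a 15 degree rise at
500kW comes out in minutes, which is why only a fast, automated
shutdown helps here.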

I think power/air conditioning is a major concern for even small
Linux clusters.  I've seen some real disasters, even just from putting
a small 20-node cluster into an under-specced machine room.  They
powered the nodes on and the power circuits were happy... until the
nodes started generating heat (on boot) and the air conditioner kicked
in... which tripped the whole machine room.  UPSes started squealing,
servers started crashing... RAID trays started going down.  It took
the group several weeks to fully recover.

Dr Stuart Midgley
Industry Uptake Program Leader - iVEC
26 Dick Perry Avenue, Technology Park
Kensington WA 6151

Phone: +61 8 6436 8545
Fax: +61 8 6436 8555
Email: stuart.midgley at ivec.org
WWW:  http://www.ivec.org
