[Beowulf] Power draw of cluster nodes under heavy load

Joe Landman landman at scalableinformatics.com
Mon Jul 28 12:07:27 PDT 2014

On 7/28/14, 2:55 PM, Prentice Bisbal wrote:
> On 07/28/2014 01:29 PM, Jeff White wrote:
>> Power draw will vary greatly depending on many factors.  Where I am 
>> at we currently have 16 racks of HPC equipment (compute nodes, 
>> storage, network gear, etc.) using about 140kVA but can use up to 160 
>> kVA.  A single rack with 26 compute nodes each with 64 cores worth of 
>> AMD 6276 (Supermicro boxes) is using about 18 kW across the PDUs, 3 
>> phase at 240 volts, with most of the nodes at 100% CPU usage.
> Agreed, there's a lot of variability. Since I don't know exactly what's 
> going in my new space yet, I'm looking for everyone's input to come up 
> with an average, or ballpark amount. The 5-10 kW one vendor 
> specified seems waaaay too low for a rack of high-density HPC nodes 
> running at or near 100% utilization.

Seriously, don't design for average, shoot for the worst case scenario. 
Nothing sucks so much as having too low a power or cooling budget and 
a big new shiny that can't be fully turned on thanks to that.

I can't speak to what other vendors say/do in this regard, but I can say 
that we try to make sure we never use more than 50% of the capacity of 
any particular PDU, and that the PDUs have enough head room to be able 
to handle sudden loads (say one of the PDUs falling over).
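
A minimal sketch of that headroom check, in Python. It assumes identical 
PDUs and that load spreads evenly across whichever PDUs survive; the 
function name and the numbers are hypothetical, for illustration only.

```python
def utilization_after_failures(total_load_kw, pdu_capacity_kw, n_pdus, n_failed=0):
    """Fraction of rated capacity used on each surviving PDU.

    Assumption: all PDUs are identical and the load redistributes
    evenly across the survivors when one or more PDUs trip.
    """
    survivors = n_pdus - n_failed
    if survivors <= 0:
        raise ValueError("no PDUs left to carry the load")
    return total_load_kw / (survivors * pdu_capacity_kw)


# Illustrative numbers only: two 10 kW PDUs feeding a 10 kW rack.
normal = utilization_after_failures(10.0, 10.0, n_pdus=2)
degraded = utilization_after_failures(10.0, 10.0, n_pdus=2, n_failed=1)
print(f"normal: {normal:.0%}, one PDU down: {degraded:.0%}")
```

With two PDUs, staying at or under 50% each is what lets the survivor 
carry the whole rack without exceeding its rating.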

We've had a situation (years ago) where we were pressed not to 
"over-spec" the power, and despite our protests, this is what was 
installed.  First time a PDU tripped a breaker (did I mention that they 
overloaded our original design? No? Well ...), all the load hit the 
second PDU, full force.  This was not pretty.

The cost to "over spec" is in the noise relative to the opportunity cost 
for under spec'ing, not to mention the "additional" cost of more power 
(and cooling ... don't forget the cooling!).

You can set the maximum boundary on power pretty easily with maximum 
draw per node and basic math.  This ignores inrush current and power, 
but let's assume you do a phased power on (1-3 second intervals between 
nodes).  If you want to hit all the power buttons at once, just make 
sure you have enough headroom for that inrush.
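
The "basic math" can be sketched in Python as below. The 26-node count 
comes from the rack quoted above; the 800 W per-node maximum, the 2 second 
interval, and all function names are assumptions for illustration, not 
measured or vendor figures. Inrush is ignored, as above.

```python
def rack_upper_bound_w(n_nodes, max_node_w, overhead_w=0):
    """Worst-case steady-state rack draw in watts.

    overhead_w is a placeholder for switches and other non-node gear.
    """
    return n_nodes * max_node_w + overhead_w


def required_pdu_capacity_w(rack_w, target_utilization):
    """Total PDU capacity needed to keep utilization at or below the target."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    return rack_w / target_utilization


def power_on_offsets(n_nodes, interval_s):
    """Seconds after t=0 at which each node is powered on (phased start)."""
    return [i * interval_s for i in range(n_nodes)]


# Hypothetical inputs: 26 nodes, assumed 800 W maximum draw per node.
rack_w = rack_upper_bound_w(26, 800)
capacity_w = required_pdu_capacity_w(rack_w, 0.5)  # 50% normal-load target
offsets = power_on_offsets(26, 2)                  # 2 s between nodes

print(f"upper bound: {rack_w} W, PDU capacity needed: {capacity_w:.0f} W")
print(f"last node powers on at t = {offsets[-1]} s")
```

Swap in the nameplate or measured maximum for your own nodes; the point 
is that the upper bound is a multiplication, not a guess.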

It's not a dark art per se, but be quite aggressive in what you think 
your power draws are going to be.  Use that to set your upper bound, 
and assume you don't want to run your PDUs at 75% of capacity normally 
(though under extreme load with half of your PDUs offline, this 
isn't a bad target).

Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
twtr : @scalableinfo
phone: +1 734 786 8423 x121
cell : +1 734 612 4615
