[Beowulf] evaluating FLOPS capacity of our cluster

Gus Correa gus at ldeo.columbia.edu
Mon May 11 14:56:43 PDT 2009

Hi Tom, Greg, Rahul, list

Tom Elken wrote:
>> On Behalf Of Rahul Nabar
>> Rmax/Rpeak= 0.83 seems a good guess based on one very similar system
>> on the Top500.
>> Thus I come up with a number of around 1.34 TeraFLOPS for my cluster
>> of 24 servers.  Does the value seem reasonable ballpark? Nothing too
>> accurate but I do not want to be an order of magnitude off. [maybe  a
>> decimal mistake in math! ]
> You're in the right ballpark.  
> I recently got 0.245 Tflops on HPL on a 4-node version of what you have 
> (with Goto BLAS), so 6x that # is in the same ballpark as your 
> 1.34 TF/s estimate.  
> My CPUs were 2.3 GHz Opteron 2356 instead of your 2.2 GHz.  
> Greg is also right on the memory size being a factor allowing larger N 
> to be used for HPL.  
> I used a pretty small N on this HPL run since we were running it 
> as part of a HPC Challenge suite run,
> and a smaller N can be better for PTRANS if you are interested 
> in the non-HPL parts of HPCC (as I was).

I have 16GB/node; the maximum possible for this motherboard is 128GB.

I have tried only two problem sizes: N=50,000, and N=196,000,
which is approximately the maximum the cluster can run without
memory swap.
(HPL suggests aiming at 80% of memory, as a rule of thumb).
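
As a rough check of that rule of thumb, here is a minimal sketch that
estimates the largest N fitting in a given fraction of total memory,
using the fact that HPL works on a dense N x N matrix of 8-byte
doubles. The node count of 24, decimal GB, and the function name are
assumptions for illustration only. With 16GB/node the result lands
close to N=196,000.

```python
import math

def hpl_problem_size(nodes, mem_per_node_gb, mem_fraction=0.8,
                     bytes_per_element=8):
    """Largest N such that an N x N matrix of doubles fits in
    mem_fraction of the cluster's total memory (decimal GB assumed)."""
    total_bytes = nodes * mem_per_node_gb * 1e9
    return int(math.sqrt(mem_fraction * total_bytes / bytes_per_element))

# Assumed configuration: 24 nodes (hypothetical); 16GB/node is from the text.
n_16gb = hpl_problem_size(nodes=24, mem_per_node_gb=16)
print(n_16gb)
```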

It is true that performance at large N (1.49Tflops, Rmax/Rpeak=83.6%)
is much better than at small N (1.23Tflops, Rmax/Rpeak=70%).
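
A quick consistency check on these figures: dividing Rmax by the
quoted efficiency gives the implied Rpeak, which should be the same
for both runs since the hardware did not change. The names below are
illustrative only.

```python
# Rmax (Tflops) and Rmax/Rpeak efficiency as quoted above.
runs = {
    "N=196,000": (1.49, 0.836),
    "N=50,000": (1.23, 0.70),
}

def implied_rpeak(rmax_tflops, efficiency):
    """Rpeak implied by a measured Rmax and a quoted Rmax/Rpeak ratio."""
    return rmax_tflops / efficiency

for label, (rmax, eff) in runs.items():
    print(f"{label}: implied Rpeak = {implied_rpeak(rmax, eff):.2f} Tflops")
```

Both runs come out near 1.8 Tflops, so the two quoted efficiencies
agree with each other to within rounding.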

However, here is somebody who did an experiment with increasing
values of N. His results suggest that performance increases
logarithmically with problem size (N), not linearly,
saturating as N approaches the maximum that fits in your
current memory size.


Of course, your memory size is whatever you have installed, but it
could be as large as your motherboard (and your budget) allows.

Questions for the HPL experts:

Would I get a significant increase in performance if the nodes were
outfitted with the maximum of 128GB of RAM each,
instead of the current 16GB?
Would I get, say, Rmax/Rpeak=90% or better?
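
To put the question in numbers, here is a sketch comparing the
problem size that would fit at 16GB and at 128GB per node, and how
the leading-order LU operation count, (2/3)N^3, grows with it. It
assumes 24 nodes, the 80% rule, 8-byte matrix elements, and decimal
GB; all names are hypothetical. It says nothing about the efficiency
that would actually be reached, which is the open question.

```python
import math

def max_n(nodes, mem_per_node_gb, mem_fraction=0.8):
    """Largest N whose N x N matrix of doubles fits in mem_fraction
    of total memory (decimal GB assumed)."""
    total_bytes = nodes * mem_per_node_gb * 1e9
    return int(math.sqrt(mem_fraction * total_bytes / 8))

def lu_flops(n):
    """Leading-order operation count of an LU solve: (2/3) N^3."""
    return (2.0 / 3.0) * n ** 3

NODES = 24  # assumption, not stated for this cluster

n_small = max_n(NODES, 16)
n_large = max_n(NODES, 128)

print("N at 16GB/node: ", n_small)
print("N at 128GB/node:", n_large)
print("N ratio:   ", n_large / n_small)
print("work ratio:", lu_flops(n_large) / lu_flops(n_small))
```

Memory grows 8x, so N grows by about sqrt(8), roughly 2.8x, while the
work grows by about 8^1.5, roughly 23x. A full-memory run would take
correspondingly longer at similar sustained performance.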

>> All 64 bit machines with a dual channel
>> bonded Gigabit ethernet interconnect. AMD Quad-Core AMD Opteron(tm)
>> Processor 2354.
> As others have said, 50% is a more likely HPL efficiency for a large GigE 
> cluster, but with your smallish cluster (24 nodes) and bonded channels,
> you would probably get closer to 80% than 50%.

Thank you.
That clarifies things a bit.
Are "bonded channels" what you get in a single switch?
So, it is "small is better", right?  :)
How about Infiniband? Would the same principle apply there,
with a small cluster on a single switch being more efficient than a
large one with stacked switches?

Thank you,
Gus Correa
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA

> -Tom
>> PS.  The Athelon was my typo, earlier sorry!
>> --
>> Rahul
