[Beowulf] hpl size problems
Geoff Cowles
gcowles at umassd.edu
Thu Sep 22 10:13:21 PDT 2005
We have a 128-node cluster running Red Hat/Rocks, built from Dell
1850s with dual 3.4 GHz Xeons and 2 GB of memory each. The interconnect
is a 4x InfiniBand non-blocking fabric. HPL was built with Intel's
mplpk MPP distribution, which links the Topspin MPICH against the
Intel-optimized EM64T BLAS routines. When running HPL we get decent
but not great performance, and we seem to be limited by problem size.
We can reach about 1.1 TFlops with a full 256-processor run, but the
problem size at which swap space begins to be used is quite small,
around N=80K. With N=120K we exhaust all memory (real + virtual =
4 GB) and the program crashes. Theoretically, with 256 GB of aggregate
memory we should be able to use a problem size of around N=150K,
assuming the OS takes about 1/4 of the RAM. Similar clusters on the
Top500 list reach closer to 1.3 TFlops with an NMAX of around 150K.
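For reference, here is the back-of-the-envelope sizing I am working
from (a rough sketch; the 0.75 usable-memory fraction is just an
assumption, and the script is only illustrative):

    # Rough HPL sizing: an N x N double-precision matrix needs
    # 8*N^2 bytes spread across the whole cluster.
    import math

    nodes = 128
    mem_per_node_bytes = 2 * 2**30   # 2 GB of RAM per node
    usable_fraction = 0.75           # leave ~1/4 of RAM for the OS, MPI, etc.

    usable_bytes = nodes * mem_per_node_bytes * usable_fraction
    n_max = int(math.sqrt(usable_bytes / 8))
    print("theoretical NMAX ~ %d" % n_max)   # ~160K, close to the ~150K above

By the same arithmetic, the matrix at N=120K should only take about
115 GB across the whole cluster, so the early swapping is surprising.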
Any ideas?
Thanks
-Geoff
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Geoffrey W. Cowles, Research Scientist
School for Marine Science and Technology   phone: (508) 910-6397
University of Massachusetts, Dartmouth     fax:   (508) 910-6371
706 Rodney French Blvd.                    email: gcowles at umassd.edu
New Bedford, MA 02744-1221                 http://codfish.smast.umassd.edu/
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~