Clarification: [Beowulf] hpl - large problems fail
Paul Johnson
redboots at ufl.edu
Thu Mar 10 14:56:16 PST 2005
Guy Coates wrote:
>On Thu, 10 Mar 2005, Paul Johnson wrote:
>
>
>
>>All:
>>
>>I have a 4-node cluster (don't snicker :) )
>>
>>
>
>Everyone starts off small.
>
>>and I'm trying to do some
>
>
>>benchmarking with HPL. I want to test 2 of the nodes with 1 GB of
>>RAM each. I calculated the maximum problem size that can fit in 2 GB
>>and still leave memory for the operating system. That came out to
>>be around 14500x14500. When I run a test of that size it always fails.
>>The largest problem that I can test without it failing is
>>12500x12500.
>>What is the reason behind this? I'm confused about what is going on here.
>>Thanks for any help.
>>
>>
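For reference, the sizing arithmetic described above can be sketched roughly
as follows. This is only a sketch under stated assumptions: the 80% memory
fraction is a common rule of thumb rather than the figure used in the
original calculation, and the helper names max_hpl_n and matrix_bytes are
made up for illustration. The only hard fact used is that HPL factors a
dense N x N matrix of 8-byte doubles.

```python
import math

BYTES_PER_DOUBLE = 8  # HPL factors a dense N x N matrix of doubles


def matrix_bytes(n):
    """Bytes needed to hold the N x N coefficient matrix alone."""
    return n * n * BYTES_PER_DOUBLE


def max_hpl_n(total_bytes, fraction=0.8, nb=64):
    """Largest N, rounded down to a multiple of the block size NB, whose
    matrix fits in `fraction` of `total_bytes`.

    The 0.8 default is an assumed rule of thumb, not a value from HPL.
    """
    usable_doubles = int(total_bytes * fraction) // BYTES_PER_DOUBLE
    n = math.isqrt(usable_doubles)
    return (n // nb) * nb


total = 2 * 1024**3  # two nodes with 1 GiB each (assumed binary gigabytes)
print("suggested N:", max_hpl_n(total))
print("share of RAM at N=14500:", matrix_bytes(14500) / total)
print("share of RAM at N=12500:", matrix_bytes(12500) / total)
```

Under these assumptions the suggested N lands in the same range as 14500,
and a 14500x14500 matrix takes roughly 1.68e9 bytes, which fits in 2 GB on
paper.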
>
>
>Do you know what actually caused the failure?
>
>If your problem size was too big, and you are really out of memory, you
>should see some messages in the system log saying the out-of-memory-killer
>was activated and HPL was zapped.
>
>If you know your machines were not actually out of memory, then you have
>broken hardware on one of your nodes. Run memtest86+ or memtest86 on your
>nodes (possibly the world's most useful pieces of diagnostic software).
>
>http://www.memtest86.com
>http://www.memtest.org
>
>
>If you haven't seen it, IBM have a redpaper on tuning HPL, which gives
>some good starting parameters, problem-sizing tips and an overview of
>different BLAS libraries you can compile against to get that extra few
>Gflops of performance.
>
>Cheers,
>
>Guy
>
>
>
I should have been clearer in my description. The run doesn't crash at
the command prompt. It fails when HPL checks the solution of the linear
system: the scaled residuals are too large, so the check reports FAILED.
This is part of the data from my HPL.out file:
============================================================================
T/V                N    NB     P     Q               Time             Gflops
----------------------------------------------------------------------------
WC12R2L4       14500    64     1     2             388.43          5.233e+00
----------------------------------------------------------------------------
||Ax-b||_oo / ( eps * ||A||_1 * N ) = 284363.4669186 ...... FAILED
||Ax-b||_oo / ( eps * ||A||_1 * ||x||_1 ) = 210262.3627204 ...... FAILED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 41377.6398965 ...... FAILED
||Ax-b||_oo . . . . . . . . . . . . . . . . . = 0.001692
||A||_oo . . . . . . . . . . . . . . . . . . . = 3708.772315
||A||_1 . . . . . . . . . . . . . . . . . . . = 3695.221759
||x||_oo . . . . . . . . . . . . . . . . . . . = 6.847285
||x||_1 . . . . . . . . . . . . . . . . . . . = 19610.120504
============================================================================
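As a sanity check, the three scaled residuals can be recomputed from the
norms printed above. The sketch below assumes eps is the double-precision
unit roundoff 2^-53, and that the third check carries an extra factor of N
in its denominator even though the printed label does not show one; that
factor is inferred from the numbers, since the third value only reproduces
with it. The 16.0 threshold is the value usually shipped in HPL.dat and is
also an assumption here, as my HPL.dat is not shown.

```python
EPS = 2.0 ** -53      # assumed: double-precision unit roundoff
THRESHOLD = 16.0      # assumed: usual default residual threshold in HPL.dat

# Values copied from the HPL.out excerpt above.
N = 14500
RESID_INF = 0.001692      # ||Ax-b||_oo
A_INF = 3708.772315       # ||A||_oo
A_ONE = 3695.221759       # ||A||_1
X_INF = 6.847285          # ||x||_oo
X_ONE = 19610.120504      # ||x||_1


def scaled_residuals(resid, a_one, a_inf, x_one, x_inf, n, eps=EPS):
    """Return the three scaled residuals in the order HPL prints them."""
    r1 = resid / (eps * a_one * n)
    r2 = resid / (eps * a_one * x_one)
    # Extra factor of n inferred from the printed value (see note above).
    r3 = resid / (eps * a_inf * x_inf * n)
    return r1, r2, r3


r1, r2, r3 = scaled_residuals(RESID_INF, A_ONE, A_INF, X_ONE, X_INF, N)
for value in (r1, r2, r3):
    status = "PASSED" if value < THRESHOLD else "FAILED"
    print(f"{value:16.4f} ...... {status}")
```

The recomputed values agree with the printed ones to within the rounding of
the six-digit residual, and all three are several orders of magnitude above
the assumed threshold, so this is not a borderline failure.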
Sorry for the confusion,
Paul
--
Paul Johnson
Graduate Student - Mechanical Engineering
University of Florida - Gainesville, Fl
http://plaza.ufl.edu/redboots