[Beowulf] HPCC MPIRandomAccess Error

James Evans iamjamesevans at gmail.com
Tue Oct 17 14:19:58 PDT 2006


I am testing a cluster with HPCC, which includes HPLinpack:

HPLinpack 1.0a  --  High-Performance Linpack benchmark  --   January 20, 2004
Written by A. Petitet and R. Clint Whaley,  Innovative Computing Labs.,  UTK

and every so often, the test will stop at MPIRandomAccess.

Most recently, this happened after 70 hours of running; however, it
has also happened after 10 hours, and on rare occasions after only
10 minutes.

The log file 'hpccoutf.txt' simply ends with the line "Begin of
MPIRandomAccess section."
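
Because the only symptom in the log is that it ends with this marker, a stalled run could be flagged by periodically checking the last line of the output file. The following is a minimal C sketch. The helper name `last_line_is` is hypothetical, and the sketch assumes the output file is plain text with lines shorter than 1 KB.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns true if the last non-empty line of the
 * file at `path` equals `marker`, ignoring the line ending.
 * An unreadable file always returns false. Assumes lines < 1024 chars;
 * longer lines are read in pieces and may not compare as expected. */
static bool last_line_is(const char *path, const char *marker)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return false;

    char line[1024];
    char last[1024] = "";
    while (fgets(line, sizeof line, fp) != NULL) {
        line[strcspn(line, "\r\n")] = '\0'; /* strip trailing newline */
        if (line[0] != '\0')
            strcpy(last, line);             /* remember last non-empty line */
    }
    fclose(fp);
    return strcmp(last, marker) == 0;
}
```

A caller might run this check every few minutes. If "Begin of MPIRandomAccess section." stays the last line for much longer than MPIRandomAccess normally takes, the run can be treated as stalled.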

If I look at the nodes, HPCC is still shown as running, but it is not
using any CPU. I have noticed that MPIRandomAccess runs different tests
depending on whether the number of CPUs is a power of two, but this
makes no difference to whether it completes successfully.
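
Whether a given process count falls on the power-of-two branch can be checked with a single bit test. This is a minimal C sketch. The helper name is hypothetical and does not come from the HPCC source, and the sketch assumes the branch depends only on the total number of MPI processes. For example, a run using all 16 processes of the 4 x 4 grid in the input file below would take the power-of-two path.

```c
#include <stdbool.h>

/* Hypothetical helper: true if n is a positive power of two (1, 2, 4, ...).
 * A power of two has exactly one bit set, so clearing its lowest set bit
 * with n & (n - 1) leaves zero. */
static bool is_power_of_two(long n)
{
    return n > 0 && (n & (n - 1)) == 0;
}
```

For example, `is_power_of_two(4 * 4)` is true, while `is_power_of_two(12)` is false.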

Has anyone seen this before? Any ideas on how to debug the problem?

Thanks!

PS. Here is my hpccinf.txt:

HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
8            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
10000      Ns
1            # of NBs
184          NBs
0            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
4      Ps
4      Qs
16.0         threshold
1            # of panel fact
2            PFACTs (0=left, 1=Crout, 2=Right)
1            # of recursive stopping criterium
4            NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
1            # of recursive panel fact.
1            RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
1            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
1            DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
64           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U  in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)
##### This line (no. 32) is ignored (it serves as a separator). ######
0                      		Number of additional problem sizes for PTRANS
1200 10000 30000        	values of N
0                       	number of additional blocking sizes for PTRANS
40 9 8 13 13 20 16 32 64       	values of NB