[Beowulf] HPL and problem size issue

Thu Jun 29 07:05:50 PDT 2006

Dear All

I have a problem running the HPL benchmark at large problem sizes.
The total memory on our cluster (15 nodes, each 2x Opteron 275, i.e. 60
cores with 1 GB of memory per core) is 60 GB. This should allow for
problem sizes of up to about 80,000, based on 80% of available memory.
HPL, both compiled standalone and as part of HPCC, crashes when the
problem size is such that the memory required exceeds 8 GB (which is,
by "coincidence", the total memory available on the master (submit)
node). The error messages are:

running /home/shel/Benchmarks/RUN/./xhpl on 60 LINUX ch_p4 processors
Created /home/shel/Benchmarks/RUN/PI11051
p20_28603: (1845.558594) net_send: could not write to fd=15, errno = 110
p12_29208:  p4_error: interrupt SIGx: 13
p40_26441:  p4_error: interrupt SIGx: 13
p48_28494:  p4_error: interrupt SIGx: 13
P4 procgroup file is /home/shel/Benchmarks/RUN/PI11051.
p20_28603:  p4_error: net_send write: -1
p23_28690: (1842.156250) net_send: could not write to fd=16, errno = 104
    p4_error: latest msg from perror: Connection timed out
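
For readers decoding the numbers in the log above: on Linux, errno 110 is ETIMEDOUT, errno 104 is ECONNRESET, and signal 13 is SIGPIPE. The lookup below is only an illustrative sketch. The table and function names are hypothetical, and the numeric values are the Linux ones (other systems number them differently).

```python
# Linux numbering only; these tables are hand-written for illustration
# and cover just the codes that appear in the log above.
LINUX_ERRNO = {
    104: ("ECONNRESET", "Connection reset by peer"),
    110: ("ETIMEDOUT", "Connection timed out"),
}
LINUX_SIGNALS = {
    13: "SIGPIPE",
}


def describe_errno(code):
    """Return 'NAME: message' for a known Linux errno, else a placeholder."""
    if code in LINUX_ERRNO:
        name, message = LINUX_ERRNO[code]
        return f"{name}: {message}"
    return f"errno {code}: not in this table"


def describe_signal(number):
    """Return the signal name for a known Linux signal number."""
    return LINUX_SIGNALS.get(number, f"signal {number}: not in this table")


for code in (110, 104):
    print(code, describe_errno(code))
print(13, describe_signal(13))
```

All three codes point at broken or stalled TCP connections between processes rather than at a failed memory allocation, which matches the "Connection timed out" message from perror.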

It does not seem to be an actual memory problem, since HPL crashes at a
size of 40000, for example, where each process uses approximately 345 MB.
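
The memory figures in this message can be sanity-checked with a rough estimate. The sketch below assumes the only significant allocation is the N x N matrix of 8-byte doubles, spread evenly over the processes, and that "GB" means 2**30 bytes. The function names are hypothetical, and the real per-process footprint (such as the 345 MB quoted above) will be higher because HPL and MPI allocate additional buffers.

```python
import math

BYTES_PER_DOUBLE = 8
GIB = 2**30


def matrix_bytes(n):
    """Bytes needed for an n x n matrix of doubles (matrix only)."""
    return n * n * BYTES_PER_DOUBLE


def per_process_mib(n, nprocs):
    """Even share of the matrix per process, in MiB."""
    return matrix_bytes(n) / nprocs / 2**20


def max_problem_size(total_bytes, percent=80):
    """Largest n whose matrix fits in `percent` of total_bytes."""
    usable = total_bytes * percent // 100
    return math.isqrt(usable // BYTES_PER_DOUBLE)


print(max_problem_size(60 * GIB))    # upper bound on N for 60 GB at 80%
print(per_process_mib(40000, 60))    # matrix share per process at N = 40000
print(matrix_bytes(32768) // GIB)    # matrix size in GiB at N = 32768
```

Under these assumptions the 80% rule gives an upper bound of roughly 80,000, the matrix share at N = 40000 is a little over 200 MiB per process, and the matrix alone reaches 8 GiB at N = 32768.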

P4_GLOBMEMSIZE is set to 134217728 bytes (128 MB)

Thanks for your help!

Evgeniy

Configuration is as follows:

mpich 1.2.7p1 - compiled with  gcc 3.4.5
hpl 1.0a -  compiled with  gcc 3.4.5
ATLAS  - compiled with gcc 3.4.5

HPL.dat:
HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
8            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
40000        Ns
1            # of NBs
60           NBs
0            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
10           Ps
6            Qs
16.0         threshold
3            # of panel fact
0 1 2        PFACTs (0=left, 1=Crout, 2=Right)
2            # of recursive stopping criterium
2 4          NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
3            # of recursive panel fact.
0 1 2        RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
0            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
0            DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
64           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U  in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)
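
HPL runs one test for every combination of the parameter lists in HPL.dat, so this file requests more than a single solve. The sketch below only counts those combinations for the single N, NB and P x Q grid given above. The variable and function names are illustrative and not part of HPL.

```python
from itertools import product

# Values copied from the HPL.dat above.
PFACTS = ["Left", "Crout", "Right"]
NBMINS = [2, 4]
NDIVS = [2]
RFACTS = ["Left", "Crout", "Right"]
BCASTS = ["1ring"]
DEPTHS = [0]


def count_runs(*param_lists):
    """Number of parameter combinations, i.e. separate HPL solves."""
    return sum(1 for _ in product(*param_lists))


total = count_runs(PFACTS, NBMINS, NDIVS, RFACTS, BCASTS, DEPTHS)
print(total)
```

With three PFACTs, two NBMINs and three RFACTs this comes to 18 solves at N = 40000, so a crash may occur well after the first result lines have been written to HPL.out.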

HPL.out:
============================================================================
HPLinpack 1.0a  --  High-Performance Linpack benchmark  --   January 20, 2004
Written by A. Petitet and R. Clint Whaley,  Innovative Computing Labs.,  UTK
============================================================================

An explanation of the input/output parameters follows:
T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N      :   40000
NB     :      60
PMAP   : Row-major process mapping
P      :      10
Q      :       6
PFACT  :    Left    Crout    Right
NBMIN  :       2        4
NDIV   :       2
RFACT  :    Left    Crout    Right
BCAST  :   1ring
DEPTH  :       0
SWAP   : Mix (threshold = 64)
L1     : transposed form
U      : transposed form
EQUIL  : yes
ALIGN  : 8 double precision words

----------------------------------------------------------------------------

- The matrix A is randomly generated for each test.
- The following scaled residual checks will be computed:
   1) ||Ax-b||_oo / ( eps * ||A||_1  * N        )
   2) ||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  )
   3) ||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo )
- The relative machine precision (eps) is taken to be          1.110223e-16
- Computational tests pass if scaled residuals are less than           16.0

============================================================================
T/V                N    NB     P     Q               Time             Gflops
----------------------------------------------------------------------------
WR00L2L2       40000    60    10     6             456.11          9.355e+01
----------------------------------------------------------------------------
||Ax-b||_oo / ( eps * ||A||_1  * N        ) =        0.0137911 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_1  * ||x||_1  ) =        0.0126099 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) =        0.0024762 ...... PASSED
============================================================================
T/V                N    NB     P     Q               Time             Gflops
----------------------------------------------------------------------------
WR00L2L4       40000    60    10     6             451.57          9.449e+01