[Beowulf] HPL and problem size issue
Evgeniy Shapiro
shellinux at gmail.com
Thu Jun 29 07:05:50 PDT 2006
Dear All
I have a problem running the HPL benchmark at large problem sizes. Our
cluster has 15 nodes (2x Opteron 275 each, 60 cores in total) with 1 GB of
memory per core, so 60 GB overall. Based on using 80% of available memory,
this should allow problem sizes of up to about 80,000. However, HPL, both
compiled standalone and as part of HPCC, crashes whenever the problem size
requires more than 8 GB of memory (which, by "coincidence", is the total
memory available on the master (submit) node). The error messages are:
running /home/shel/Benchmarks/RUN/./xhpl on 60 LINUX ch_p4 processors
Created /home/shel/Benchmarks/RUN/PI11051
p20_28603: (1845.558594) net_send: could not write to fd=15, errno = 110
p12_29208: p4_error: interrupt SIGx: 13
p40_26441: p4_error: interrupt SIGx: 13
p48_28494: p4_error: interrupt SIGx: 13
P4 procgroup file is /home/shel/Benchmarks/RUN/PI11051.
p20_28603: p4_error: net_send write: -1
p23_28690: (1842.156250) net_send: could not write to fd=16, errno = 104
p4_error: latest msg from perror: Connection timed out
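For readers decoding the log above, here is a minimal lookup sketch. It assumes the nodes run Linux, where these numeric values apply; the table is hard-coded for illustration and the helper names are hypothetical, not part of MPICH or p4.

```python
# Linux meanings of the numeric codes in the p4 log (assumption: Linux nodes).
LINUX_ERRNO = {
    104: ("ECONNRESET", "Connection reset by peer"),
    110: ("ETIMEDOUT", "Connection timed out"),
}
LINUX_SIGNALS = {
    13: "SIGPIPE",  # write to a pipe or socket whose peer has gone away
}


def describe_errno(code):
    """Return a readable description of an errno value from the log."""
    name, text = LINUX_ERRNO.get(code, ("UNKNOWN", "not in this table"))
    return f"errno {code} = {name} ({text})"


def describe_signal(number):
    """Return the signal name behind a 'p4_error: interrupt SIGx: N' line."""
    return f"signal {number} = {LINUX_SIGNALS.get(number, 'UNKNOWN')}"


if __name__ == "__main__":
    for code in (110, 104):
        print(describe_errno(code))
    print(describe_signal(13))
```

Read this way, the log shows socket writes timing out or being reset and the remaining processes receiving SIGPIPE, rather than an allocation failure.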
It does not seem to be an actual memory problem, since HPL crashes at a
problem size of 40000, for example, where each process uses only approx. 345 MB.
P4_GLOBMEMSIZE is set to 134217728 (128 MB).
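The sizing arithmetic above can be checked with a short sketch. It assumes 8-byte double-precision matrix entries, treats 1 GB as 2**30 bytes, and counts only the N x N coefficient matrix; workspace, MPI buffers and anything else a process maps are not modelled, so the per-process figure it gives is a lower bound rather than what top would report. The function names are hypothetical.

```python
import math

BYTES_PER_DOUBLE = 8
GIB = 2**30  # assumption: "1 GB" in the post is taken as 2**30 bytes


def max_problem_size(total_bytes, fraction=0.8):
    """Largest N whose N x N double matrix fits in `fraction` of memory."""
    usable_doubles = int(fraction * total_bytes) // BYTES_PER_DOUBLE
    return math.isqrt(usable_doubles)


def matrix_bytes(n):
    """Bytes needed to store the N x N coefficient matrix."""
    return n * n * BYTES_PER_DOUBLE


cluster_bytes = 60 * GIB  # 60 cores x 1 GB each
n_max = max_problem_size(cluster_bytes)                 # 80% rule
n_master = max_problem_size(8 * GIB, fraction=1.0)      # N whose matrix is 8 GB
per_process_mib = matrix_bytes(40000) / 60 / 2**20      # matrix share at N=40000

if __name__ == "__main__":
    print("max N at 80% of cluster memory:", n_max)
    print("N at which the matrix alone reaches 8 GB:", n_master)
    print("matrix share per process at N=40000 (MiB):", round(per_process_mib, 1))
```

Under these assumptions the matrix for N = 40000 is larger than 8 GB in total but only a few hundred MB per process, which is consistent with the observation that no single node should be running out of memory.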
Thanks for your help!
Evgeniy
Configuration is as follows:
mpich 1.2.7p1 - compiled with gcc 3.4.5
hpl 1.0a - compiled with gcc 3.4.5
ATLAS - compiled with gcc 3.4.5
HPL.dat:
HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out output file name (if any)
8 device out (6=stdout,7=stderr,file)
1 # of problems sizes (N)
40000 Ns
1 # of NBs
60 NBs
0 PMAP process mapping (0=Row-,1=Column-major)
1 # of process grids (P x Q)
10 Ps
6 Qs
16.0 threshold
3 # of panel fact
0 1 2 PFACTs (0=left, 1=Crout, 2=Right)
2 # of recursive stopping criterium
2 4 NBMINs (>= 1)
1 # of panels in recursion
2 NDIVs
3 # of recursive panel fact.
0 1 2 RFACTs (0=left, 1=Crout, 2=Right)
1 # of broadcast
0 BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1 # of lookahead depth
0 DEPTHs (>=0)
2 SWAP (0=bin-exch,1=long,2=mix)
64 swapping threshold
0 L1 in (0=transposed,1=no-transposed) form
0 U in (0=transposed,1=no-transposed) form
1 Equilibration (0=no,1=yes)
8 memory alignment in double (> 0)
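As a rough illustration of what this input file asks for: HPL runs one test per combination of the listed parameter values, so the number of runs is the product of the list lengths. The sketch below copies the values from the HPL.dat above and simply enumerates them; the dictionary layout is an illustrative choice, and the order in which HPL itself iterates is not modelled.

```python
from itertools import product

# Parameter lists copied from the HPL.dat above.
params = {
    "N": [40000],
    "NB": [60],
    "grid": [(10, 6)],           # P x Q
    "PFACT": ["Left", "Crout", "Right"],
    "NBMIN": [2, 4],
    "NDIV": [2],
    "RFACT": ["Left", "Crout", "Right"],
    "BCAST": [0],
    "DEPTH": [0],
}

# One dict per test that HPL would be asked to run.
runs = [dict(zip(params, combo)) for combo in product(*params.values())]

if __name__ == "__main__":
    print("number of runs:", len(runs))
    print("first run:", runs[0])
```

With three PFACTs, two NBMINs and three RFACTs, a single N therefore produces a whole series of factorizations, so a failure may appear well after the first result lines have been written.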
HPL.out:
============================================================================
HPLinpack 1.0a -- High-Performance Linpack benchmark -- January 20, 2004
Written by A. Petitet and R. Clint Whaley, Innovative Computing Labs., UTK
============================================================================
An explanation of the input/output parameters follows:
T/V : Wall time / encoded variant.
N : The order of the coefficient matrix A.
NB : The partitioning blocking factor.
P : The number of process rows.
Q : The number of process columns.
Time : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.
The following parameter values will be used:
N : 40000
NB : 60
PMAP : Row-major process mapping
P : 10
Q : 6
PFACT : Left Crout Right
NBMIN : 2 4
NDIV : 2
RFACT : Left Crout Right
BCAST : 1ring
DEPTH : 0
SWAP : Mix (threshold = 64)
L1 : transposed form
U : transposed form
EQUIL : yes
ALIGN : 8 double precision words
----------------------------------------------------------------------------
- The matrix A is randomly generated for each test.
- The following scaled residual checks will be computed:
1) ||Ax-b||_oo / ( eps * ||A||_1 * N )
2) ||Ax-b||_oo / ( eps * ||A||_1 * ||x||_1 )
3) ||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo )
- The relative machine precision (eps) is taken to be 1.110223e-16
- Computational tests pass if scaled residuals are less than 16.0
============================================================================
T/V N NB P Q Time Gflops
----------------------------------------------------------------------------
WR00L2L2 40000 60 10 6 456.11 9.355e+01
----------------------------------------------------------------------------
||Ax-b||_oo / ( eps * ||A||_1 * N ) = 0.0137911 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_1 * ||x||_1 ) = 0.0126099 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 0.0024762 ...... PASSED
============================================================================
T/V N NB P Q Time Gflops
----------------------------------------------------------------------------
WR00L2L4 40000 60 10 6 451.57 9.449e+01