[Beowulf] hang-up of HPC Challenge
Mikhail Kuzminsky
kus at free.net
Wed Aug 20 10:52:51 PDT 2008
In message from Greg Lindahl <lindahl at pbm.com> (Tue, 19 Aug 2008
19:39:38 -0700):
>On Wed, Aug 20, 2008 at 03:45:43AM +0400, Mikhail Kuzminsky wrote:
>> To localize the possible cause of the problem, I ran the pure HPL
>> test instead of HPCC. HPL writes its output directly to the screen
>> instead of to a file.
>>
>> Using MPICH w/np=8 I obtained a normal HPL result for N=35000,
>> including 3 "PASSED" strings for the ||Ax-b|| calculations. BUT!
>> Linux hangs immediately after these strings are output.
>
>Well, what did your configuration file tell HPL to do? Does it have
>another test, perhaps a bigger one, or is it supposed to exit? We
>aren't mind-readers.
Sorry: I have now performed 2 HPL runs with the same N=10000:
(1st) - a "single" HPL run, i.e. ONE N=10000, ONE blocksize value, and
ONE value for every other HPL.dat parameter.
(2nd) - a "multiple" HPL run w/the same (one) N=10000 and blocksize=100,
but with sets of values for PFACT etc. (see the output below).
The 1st run finished successfully; the 2nd led to a Linux hang.
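
To make the difference between the two cases concrete, here is a minimal sketch that counts how many solves each HPL.dat asks for. It assumes HPL runs one solve per combination of the listed parameter values, which is consistent with "Finished 1 tests" in the single case. The dictionaries and the name planned_tests are made up for illustration; the values are copied from the two parameter listings in the output below.

```python
from math import prod

# Parameter values copied from the "parameter values will be used" listings.
# P and Q are kept together as one process grid.
single = {
    "N": [10000], "NB": [100], "grid": [(2, 4)],
    "PFACT": ["Right"], "NBMIN": [4], "NDIV": [2],
    "RFACT": ["Crout"], "BCAST": ["1ringM"], "DEPTH": [1],
}
multiple = {
    "N": [10000], "NB": [100], "grid": [(2, 4)],
    "PFACT": ["Left", "Crout", "Right"], "NBMIN": [2, 4], "NDIV": [2],
    "RFACT": ["Left", "Crout", "Right"], "BCAST": ["1ring"], "DEPTH": [0],
}

def planned_tests(params):
    """Number of solves, assuming one per combination of listed values."""
    return prod(len(values) for values in params.values())

print("single:  ", planned_tests(single))
print("multiple:", planned_tests(multiple))
```

Under that assumption the second case requests 3 x 2 x 3 = 18 solves, so the hang shown below occurs early in the sweep, not at the end of it.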
Yours
Mikhail
"single" HPL run:
HPLinpack 1.0a -- High-Performance Linpack benchmark -- January 20, 2004
Written by A. Petitet and R. Clint Whaley, Innovative Computing Labs., UTK
============================================================================
An explanation of the input/output parameters follows:
T/V : Wall time / encoded variant.
N : The order of the coefficient matrix A.
NB : The partitioning blocking factor.
P : The number of process rows.
Q : The number of process columns.
Time : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.
The following parameter values will be used:
N : 10000
NB : 100
PMAP : Row-major process mapping
P : 2
Q : 4
PFACT : Right
NBMIN : 4
NDIV : 2
RFACT : Crout
BCAST : 1ringM
DEPTH : 1
SWAP : Mix (threshold = 64)
L1 : transposed form
U : transposed form
EQUIL : yes
ALIGN : 16 double precision words
----------------------------------------------------------------------------
- The matrix A is randomly generated for each test.
- The following scaled residual checks will be computed:
1) ||Ax-b||_oo / ( eps * ||A||_1 * N )
2) ||Ax-b||_oo / ( eps * ||A||_1 * ||x||_1 )
3) ||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo )
- The relative machine precision (eps) is taken to be 1.110223e-16
- Computational tests pass if scaled residuals are less than 16.0
============================================================================
T/V                N    NB     P     Q               Time             Gflops
----------------------------------------------------------------------------
WR11C2R4       10000   100     2     4              23.32          2.859e+01
----------------------------------------------------------------------------
||Ax-b||_oo / ( eps * ||A||_1 * N ) = 0.0767386 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_1 * ||x||_1 ) = 0.0181586 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 0.0040588 ...... PASSED
============================================================================
Finished 1 tests with the following results:
1 tests completed and passed residual checks,
0 tests completed and failed residual checks,
0 tests skipped because of illegal input values.
----------------------------------------------------------------------------
End of Tests.
============================================================================
[1]+ Done mpirun -np 8 xhpl
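
As a sanity check on the reported rate, the Gflops figure can be recomputed from N and Time. This is a sketch that assumes the usual LU operation count of 2/3 N^3 + 3/2 N^2; the function name hpl_gflops is made up here. Because the printed time is rounded to two decimals, the last digit of the recomputed rate may differ from what HPL prints.

```python
def hpl_gflops(n, seconds):
    """Rate in Gflop/s, assuming 2/3*n^3 + 3/2*n^2 floating-point operations."""
    flops = (2.0 / 3.0) * n**3 + 1.5 * n**2
    return flops / seconds / 1.0e9

# N and Time taken from the WR11C2R4 line of the single run above.
rate = hpl_gflops(10000, 23.32)
print(f"{rate:.3e} Gflops")
```

The result agrees with the printed value of 2.859e+01 to the precision shown, so the successful run is reporting a plausible rate for this N.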
"multiple" HPL run:
HPLinpack 1.0a -- High-Performance Linpack benchmark -- January 20, 2004
Written by A. Petitet and R. Clint Whaley, Innovative Computing Labs., UTK
============================================================================
An explanation of the input/output parameters follows:
T/V : Wall time / encoded variant.
N : The order of the coefficient matrix A.
NB : The partitioning blocking factor.
P : The number of process rows.
Q : The number of process columns.
Time : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.
The following parameter values will be used:
N : 10000
NB : 100
PMAP : Row-major process mapping
P : 2
Q : 4
PFACT : Left Crout Right
NBMIN : 2 4
NDIV : 2
RFACT : Left Crout Right
BCAST : 1ring
DEPTH : 0
SWAP : Mix (threshold = 64)
L1 : transposed form
U : transposed form
EQUIL : yes
ALIGN : 16 double precision words
----------------------------------------------------------------------------
- The matrix A is randomly generated for each test.
- The following scaled residual checks will be computed:
1) ||Ax-b||_oo / ( eps * ||A||_1 * N )
2) ||Ax-b||_oo / ( eps * ||A||_1 * ||x||_1 )
3) ||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo )
- The relative machine precision (eps) is taken to be 1.110223e-16
- Computational tests pass if scaled residuals are less than 16.0
============================================================================
T/V                N    NB     P     Q               Time             Gflops
----------------------------------------------------------------------------
WR00L2L2       10000   100     2     4              23.02          2.897e+01
----------------------------------------------------------------------------
||Ax-b||_oo / ( eps * ||A||_1 * N ) = 0.0980967 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_1 * ||x||_1 ) = 0.0232126 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 0.0051885 ...... PASSED
============================================================================
T/V                N    NB     P     Q               Time             Gflops
----------------------------------------------------------------------------
WR00L2L4       10000   100     2     4              22.97          2.903e+01
----------------------------------------------------------------------------
||Ax-b||_oo / ( eps * ||A||_1 * N ) = 0.0832258 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_1 * ||x||_1 ) = 0.0196937 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 0.0044019 ...... PASSED
============================================================================
T/V                N    NB     P     Q               Time             Gflops
----------------------------------------------------------------------------
WR00L2C2       10000   100     2     4              22.95          2.905e+01
----------------------------------------------------------------------------
||Ax-b||_oo / ( eps * ||A||_1 * N ) = 0.0980967 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_1 * ||x||_1 ) = 0.0232126 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 0.0051885 ...... PASSED
... and here Linux hangs ...
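
To see which parameter combinations had completed before the hang, the T/V labels can be decoded. The sketch below is an assumption, not HPL documentation: the field layout is inferred only from matching the labels in this post against their parameter listings (for example WR11C2R4 against DEPTH=1, BCAST=1ringM, RFACT=Crout, NDIV=2, PFACT=Right, NBMIN=4). The name decode_tv and the lookup tables are made up for illustration and cover only the codes seen here.

```python
# Assumed layout, inferred from this post only:
#   [0] timer ('W' = wall time)  [1] process mapping  [2] DEPTH  [3] BCAST code
#   [4] RFACT  [5] NDIV  [6] PFACT  [7] NBMIN
FACT = {"L": "Left", "C": "Crout", "R": "Right"}
BCAST = {"0": "1ring", "1": "1ringM"}   # only the two codes observed here
PMAP = {"R": "Row-major"}               # only 'R' is observed here

def decode_tv(label):
    """Split an 8-character T/V label into the parameters it appears to encode."""
    if len(label) != 8:
        raise ValueError(f"expected 8 characters, got {label!r}")
    return {
        "timer": label[0],
        "pmap": PMAP.get(label[1], label[1]),
        "depth": int(label[2]),
        "bcast": BCAST.get(label[3], label[3]),
        "rfact": FACT[label[4]],
        "ndiv": int(label[5]),
        "pfact": FACT[label[6]],
        "nbmin": int(label[7]),
    }

# The three solves that completed in the "multiple" run.
for label in ["WR00L2L2", "WR00L2L4", "WR00L2C2"]:
    d = decode_tv(label)
    print(label, "RFACT =", d["rfact"], "PFACT =", d["pfact"], "NBMIN =", d["nbmin"])
```

Read this way, the completed solves all used RFACT=Left and differ only in PFACT and NBMIN. If the sweep continues in the same pattern, the solve running at the time of the hang would be RFACT=Left, PFACT=Crout, NBMIN=4, but that is an inference from the labels and is not shown in the output.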
>
>-- greg