[Beowulf] p4_error: interrupt SIGSEGV: 11 Killed by signal 2.
cjoung at tpg.com.au
Thu Oct 21 23:02:17 PDT 2004
Firstly, thank you to Mark H, Anthony S and in particular
Henry G for your helpful comments on my MPI/ScaLAPACK
problem. After reading your comments, I removed MPICH2,
installed MPICH 1.2.6, and this did indeed fix the
"Null Comm Pointer" error!
> Date: Tue, 19 Oct 2004 14:21:31 -0700
> From: "Gabb, Henry" <henry.gabb at intel.com>
> Subject: Re: [Beowulf] MPI & ScaLAPACK:
> error in MPI_Comm_size: Invalid communicator
> The "Null Comm pointer" error that you're seeing is almost always due to
> a header mismatch. I don't think Intel Cluster MKL 7.0 supports MPICH2 yet.
>
> > Error message:
> > aborting job: Fatal error in MPI_Comm_size: Invalid communicator, error stack:
> > MPI_Comm_size(82): MPI_Comm_size(comm=0x5b, size=0x80d807c) failed
> > MPI_Comm_size(66): Null Comm pointer
**********************************************************
Current problem:
I have another problem, also related to MPI/ScaLAPACK.
I tried to compile and run the example1.f program from the
netlib ScaLAPACK website (I did NOT modify it). It uses the
ScaLAPACK routine PDGESV to solve the equation [A]x = b.
(See the end of this email for a copy of example1.f.)
It seems to compile OK, but on execution it gives this
error message:
****************************************
[tony at carmine clint]$ mpif77 -o ex1 example1.f \
      -L/opt/intel/mkl70cluster/lib/32 \
      -lmkl_scalapack \
      -lmkl_blacsF77init \
      -lmkl_blacs \
      -lmkl_blacsF77init \
      -lmkl_lapack \
      -lmkl_ia32 \
      -lguide \
      -lpthread \
      -static-libcxa
[tony at carmine clint]$ mpirun -np 6 ./ex1
p0_24505: p4_error: interrupt SIGSEGV: 11
Killed by signal 2.
Killed by signal 2.
Killed by signal 2.
Killed by signal 2.
Killed by signal 2.
[tony at carmine clint]$
****************************************
(There are also messages about "broken pipes", but I didn't
include that part above.)
As far as I can discover on the web, SIGSEGV indicates a
"segmentation fault"; beyond this, I don't know what to do
for a fix.
Some people have suggested fixes such as increasing the
MPICH shared-memory allocation, i.e. typing:

export P4_GLOBMEMSIZE=536870912

(= 512 MB), but this didn't do anything for me.
Generally, I have not seen any other suggested solutions
that got a positive response.
A problem like this is normally attributed to a bug in the
source code, but since the netlib ScaLAPACK developers give
this program out as the beginner's basic 'hello world'
example, I doubt it carries a bug!
I would have to guess that there's something ELSE wrong.
(In any case, I tried another program, from a book, that
does basically the same thing - same problem.)
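To help isolate things, here is a minimal BLACS-only test
(my own sketch - it uses the same BLACS calls as example1.f's
SL_INIT routine, and the same link line as above). It just
builds the 2 x 3 process grid and has every process print its
coordinates; if something like this also dies with SIGSEGV
under 'mpirun -np 6', the fault would be in the BLACS/MPICH
layer rather than in PDGESV itself:
****************************************
      PROGRAM GRIDTEST
*
*     Minimal BLACS-only sanity check (a sketch, not from netlib):
*     build the same 2 x 3 process grid that example1.f uses and
*     have every process report its coordinates.  No ScaLAPACK
*     solver is called at all.
*
      INTEGER IAM, ICTXT, MYCOL, MYROW, NPCOL, NPROCS, NPROW
      EXTERNAL BLACS_EXIT, BLACS_GET, BLACS_GRIDEXIT,
     $         BLACS_GRIDINFO, BLACS_GRIDINIT, BLACS_PINFO
*
      NPROW = 2
      NPCOL = 3
*
*     Get the default system context and create the process grid
*
      CALL BLACS_PINFO( IAM, NPROCS )
      CALL BLACS_GET( -1, 0, ICTXT )
      CALL BLACS_GRIDINIT( ICTXT, 'Row-major', NPROW, NPCOL )
      CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL )
*
*     Processes outside the grid get MYROW = -1 and skip ahead
*
      IF( MYROW.NE.-1 ) THEN
         WRITE( 6, * ) 'process ', IAM, ' of ', NPROCS,
     $      ' is at grid position (', MYROW, ',', MYCOL, ')'
         CALL BLACS_GRIDEXIT( ICTXT )
      END IF
      CALL BLACS_EXIT( 0 )
      STOP
      END
****************************************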
I was hoping a reader of this forum has seen this problem
before and knows of a solution.
Any suggestions, even speculative ones, would be most appreciated.
(Please use simple language - I am quite a novice at all of this...)
with many thanks,
Clint Joung
Postdoctoral Research Associate
Department of Chemical Engineering
University of Sydney, NSW 2006
Australia
PS: My system details, followed by the source code 'example1.f':
OS: Linux Red Hat Fedora Core 2
FC: Intel Fortran Compiler v8.0 (ifort)
    (I've also tried building the MPI libraries using GNU g77 - same problem)
CC: GNU gcc
    (for some reason, the MPI libraries don't 'make' properly using Intel icpc)
ScaLAPACK et al.: Intel Cluster Math Kernel Library for Linux v7.0
MPI: MPICH 1.2.6 (I've also tried MPICH 1.2.5.2 - same problem)
The example1.f program is below. It runs OK as far as the
actual call to PDGESV, then it falls over.
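(One way to confirm where it dies is to temporarily bracket the
PDGESV call with prints along these lines - the listing below is
the unmodified original:)

*     Temporary diagnostic only, not part of the netlib listing:
*     each process announces itself before and after the solve,
*     so the last message printed shows where the SIGSEGV occurs.
      WRITE( NOUT, * ) 'process (', MYROW, ',', MYCOL,
     $   ') entering PDGESV'
      CALL PDGESV( N, NRHS, A, IA, JA, DESCA, IPIV, B, IB, JB, DESCB,
     $             INFO )
      WRITE( NOUT, * ) 'process (', MYROW, ',', MYCOL,
     $   ') left PDGESV, INFO = ', INFO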
**EXAMPLE1.F***************************************
PROGRAM EXAMPLE1
*
* Example Program solving Ax=b via ScaLAPACK routine PDGESV
*
* .. Parameters ..
INTEGER DLEN_, IA, JA, IB, JB, M, N, MB, NB, RSRC,
$ CSRC, MXLLDA, MXLLDB, NRHS, NBRHS, NOUT,
$ MXLOCR, MXLOCC, MXRHSC
PARAMETER ( DLEN_ = 9, IA = 1, JA = 1, IB = 1, JB = 1,
$ M = 9, N = 9, MB = 2, NB = 2, RSRC = 0,
$ CSRC = 0, MXLLDA = 5, MXLLDB = 5, NRHS = 1,
$ NBRHS = 1, NOUT = 6, MXLOCR = 5, MXLOCC = 4,
$ MXRHSC = 1 )
DOUBLE PRECISION ONE
PARAMETER ( ONE = 1.0D+0 )
* ..
* .. Local Scalars ..
INTEGER ICTXT, INFO, MYCOL, MYROW, NPCOL, NPROW
DOUBLE PRECISION ANORM, BNORM, EPS, RESID, XNORM
* ..
* .. Local Arrays ..
INTEGER DESCA( DLEN_ ), DESCB( DLEN_ ),
$ IPIV( MXLOCR+NB )
DOUBLE PRECISION A( MXLLDA, MXLOCC ), A0( MXLLDA, MXLOCC ),
$ B( MXLLDB, MXRHSC ), B0( MXLLDB, MXRHSC ),
$ WORK( MXLOCR )
* ..
* .. External Functions ..
DOUBLE PRECISION PDLAMCH, PDLANGE
EXTERNAL PDLAMCH, PDLANGE
* ..
* .. External Subroutines ..
EXTERNAL BLACS_EXIT, BLACS_GRIDEXIT, BLACS_GRIDINFO,
$ DESCINIT, MATINIT, PDGEMM, PDGESV, PDLACPY,
$ SL_INIT
* ..
* .. Intrinsic Functions ..
INTRINSIC DBLE
* ..
* .. Data statements ..
DATA NPROW / 2 / , NPCOL / 3 /
* ..
* .. Executable Statements ..
*
* INITIALIZE THE PROCESS GRID
*
CALL SL_INIT( ICTXT, NPROW, NPCOL )
CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL )
*
* If I'm not in the process grid, go to the end of the program
*
IF( MYROW.EQ.-1 )
$ GO TO 10
*
* DISTRIBUTE THE MATRIX ON THE PROCESS GRID
* Initialize the array descriptors for the matrices A and B
*
CALL DESCINIT( DESCA, M, N, MB, NB, RSRC, CSRC, ICTXT, MXLLDA,
$ INFO )
CALL DESCINIT( DESCB, N, NRHS, NB, NBRHS, RSRC, CSRC, ICTXT,
$ MXLLDB, INFO )
*
* Generate matrices A and B and distribute to the process grid
*
CALL MATINIT( A, DESCA, B, DESCB )
*
* Make a copy of A and B for checking purposes
*
CALL PDLACPY( 'All', N, N, A, 1, 1, DESCA, A0, 1, 1, DESCA )
CALL PDLACPY( 'All', N, NRHS, B, 1, 1, DESCB, B0, 1, 1, DESCB )
*
* CALL THE SCALAPACK ROUTINE
* Solve the linear system A * X = B
*
CALL PDGESV( N, NRHS, A, IA, JA, DESCA, IPIV, B, IB, JB, DESCB,
$ INFO )
*
IF( MYROW.EQ.0 .AND. MYCOL.EQ.0 ) THEN
WRITE( NOUT, FMT = 9999 )
WRITE( NOUT, FMT = 9998 )M, N, NB
WRITE( NOUT, FMT = 9997 )NPROW*NPCOL, NPROW, NPCOL
WRITE( NOUT, FMT = 9996 )INFO
END IF
*
* Compute residual ||A * X - B|| / ( ||X|| * ||A|| * eps * N )
*
EPS = PDLAMCH( ICTXT, 'Epsilon' )
ANORM = PDLANGE( 'I', N, N, A, 1, 1, DESCA, WORK )
BNORM = PDLANGE( 'I', N, NRHS, B, 1, 1, DESCB, WORK )
CALL PDGEMM( 'N', 'N', N, NRHS, N, ONE, A0, 1, 1, DESCA, B, 1, 1,
$ DESCB, -ONE, B0, 1, 1, DESCB )
XNORM = PDLANGE( 'I', N, NRHS, B0, 1, 1, DESCB, WORK )
RESID = XNORM / ( ANORM*BNORM*EPS*DBLE( N ) )
*
IF( MYROW.EQ.0 .AND. MYCOL.EQ.0 ) THEN
IF( RESID.LT.10.0D+0 ) THEN
WRITE( NOUT, FMT = 9995 )
WRITE( NOUT, FMT = 9993 )RESID
ELSE
WRITE( NOUT, FMT = 9994 )
WRITE( NOUT, FMT = 9993 )RESID
END IF
END IF
*
* RELEASE THE PROCESS GRID
* Free the BLACS context
*
CALL BLACS_GRIDEXIT( ICTXT )
10 CONTINUE
*
* Exit the BLACS
*
CALL BLACS_EXIT( 0 )
*
9999 FORMAT( / 'ScaLAPACK Example Program #1 -- May 1, 1997' )
9998 FORMAT( / 'Solving Ax=b where A is a ', I3, ' by ', I3,
$ ' matrix with a block size of ', I3 )
9997 FORMAT( 'Running on ', I3, ' processes, where the process grid',
$ ' is ', I3, ' by ', I3 )
9996 FORMAT( / 'INFO code returned by PDGESV = ', I3 )
9995 FORMAT( /
$ 'According to the normalized residual the solution is correct.'
$ )
9994 FORMAT( /
$ 'According to the normalized residual the solution is incorrect.'
$ )
9993 FORMAT( / '||A*x - b|| / ( ||x||*||A||*eps*N ) = ', 1P, E16.8 )
STOP
END
SUBROUTINE MATINIT( AA, DESCA, B, DESCB )
*
* MATINIT generates and distributes matrices A and B (depicted in
* Figures 2.5 and 2.6) to a 2 x 3 process grid
*
* .. Array Arguments ..
INTEGER DESCA( * ), DESCB( * )
DOUBLE PRECISION AA( * ), B( * )
* ..
* .. Parameters ..
INTEGER CTXT_, LLD_
PARAMETER ( CTXT_ = 2, LLD_ = 9 )
* ..
* .. Local Scalars ..
INTEGER ICTXT, MXLLDA, MYCOL, MYROW, NPCOL, NPROW
DOUBLE PRECISION A, C, K, L, P, S
* ..
* .. External Subroutines ..
EXTERNAL BLACS_GRIDINFO
* ..
* .. Executable Statements ..
*
ICTXT = DESCA( CTXT_ )
CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL )
*
S = 19.0D0
C = 3.0D0
A = 1.0D0
L = 12.0D0
P = 16.0D0
K = 11.0D0
*
MXLLDA = DESCA( LLD_ )
*
IF( MYROW.EQ.0 .AND. MYCOL.EQ.0 ) THEN
AA( 1 ) = S
AA( 2 ) = -S
AA( 3 ) = -S
AA( 4 ) = -S
AA( 5 ) = -S
AA( 1+MXLLDA ) = C
AA( 2+MXLLDA ) = C
AA( 3+MXLLDA ) = -C
AA( 4+MXLLDA ) = -C
AA( 5+MXLLDA ) = -C
AA( 1+2*MXLLDA ) = A
AA( 2+2*MXLLDA ) = A
AA( 3+2*MXLLDA ) = A
AA( 4+2*MXLLDA ) = A
AA( 5+2*MXLLDA ) = -A
AA( 1+3*MXLLDA ) = C
AA( 2+3*MXLLDA ) = C
AA( 3+3*MXLLDA ) = C
AA( 4+3*MXLLDA ) = C
AA( 5+3*MXLLDA ) = -C
B( 1 ) = 0.0D0
B( 2 ) = 0.0D0
B( 3 ) = 0.0D0
B( 4 ) = 0.0D0
B( 5 ) = 0.0D0
ELSE IF( MYROW.EQ.0 .AND. MYCOL.EQ.1 ) THEN
AA( 1 ) = A
AA( 2 ) = A
AA( 3 ) = -A
AA( 4 ) = -A
AA( 5 ) = -A
AA( 1+MXLLDA ) = L
AA( 2+MXLLDA ) = L
AA( 3+MXLLDA ) = -L
AA( 4+MXLLDA ) = -L
AA( 5+MXLLDA ) = -L
AA( 1+2*MXLLDA ) = K
AA( 2+2*MXLLDA ) = K
AA( 3+2*MXLLDA ) = K
AA( 4+2*MXLLDA ) = K
AA( 5+2*MXLLDA ) = K
ELSE IF( MYROW.EQ.0 .AND. MYCOL.EQ.2 ) THEN
AA( 1 ) = A
AA( 2 ) = A
AA( 3 ) = A
AA( 4 ) = -A
AA( 5 ) = -A
AA( 1+MXLLDA ) = P
AA( 2+MXLLDA ) = P
AA( 3+MXLLDA ) = P
AA( 4+MXLLDA ) = P
AA( 5+MXLLDA ) = -P
ELSE IF( MYROW.EQ.1 .AND. MYCOL.EQ.0 ) THEN
AA( 1 ) = -S
AA( 2 ) = -S
AA( 3 ) = -S
AA( 4 ) = -S
AA( 1+MXLLDA ) = -C
AA( 2+MXLLDA ) = -C
AA( 3+MXLLDA ) = -C
AA( 4+MXLLDA ) = C
AA( 1+2*MXLLDA ) = A
AA( 2+2*MXLLDA ) = A
AA( 3+2*MXLLDA ) = A
AA( 4+2*MXLLDA ) = -A
AA( 1+3*MXLLDA ) = C
AA( 2+3*MXLLDA ) = C
AA( 3+3*MXLLDA ) = C
AA( 4+3*MXLLDA ) = C
B( 1 ) = 1.0D0
B( 2 ) = 0.0D0
B( 3 ) = 0.0D0
B( 4 ) = 0.0D0
ELSE IF( MYROW.EQ.1 .AND. MYCOL.EQ.1 ) THEN
AA( 1 ) = A
AA( 2 ) = -A
AA( 3 ) = -A
AA( 4 ) = -A
AA( 1+MXLLDA ) = L
AA( 2+MXLLDA ) = L
AA( 3+MXLLDA ) = -L
AA( 4+MXLLDA ) = -L
AA( 1+2*MXLLDA ) = K
AA( 2+2*MXLLDA ) = K
AA( 3+2*MXLLDA ) = K
AA( 4+2*MXLLDA ) = K
ELSE IF( MYROW.EQ.1 .AND. MYCOL.EQ.2 ) THEN
AA( 1 ) = A
AA( 2 ) = A
AA( 3 ) = -A
AA( 4 ) = -A
AA( 1+MXLLDA ) = P
AA( 2+MXLLDA ) = P
AA( 3+MXLLDA ) = -P
AA( 4+MXLLDA ) = -P
END IF
RETURN
END
SUBROUTINE SL_INIT( ICTXT, NPROW, NPCOL )
*
* .. Scalar Arguments ..
INTEGER ICTXT, NPCOL, NPROW
* ..
*
* Purpose
* =======
*
* SL_INIT initializes an NPROW x NPCOL process grid using a row-major
* ordering of the processes. This routine retrieves a default system
* context which will include all available processes. In addition it
* spawns the processes if needed.
*
* Arguments
* =========
*
* ICTXT (global output) INTEGER
* ICTXT specifies the BLACS context handle identifying the
* created process grid. The context itself is global.
*
* NPROW (global input) INTEGER
* NPROW specifies the number of process rows in the grid
* to be created.
*
* NPCOL (global input) INTEGER
* NPCOL specifies the number of process columns in the grid
* to be created.
*
* =====================================================================
*
* .. Local Scalars ..
INTEGER IAM, NPROCS
* ..
* .. External Subroutines ..
EXTERNAL BLACS_GET, BLACS_GRIDINIT, BLACS_PINFO,
$ BLACS_SETUP
* ..
* .. Executable Statements ..
*
* Get starting information
*
CALL BLACS_PINFO( IAM, NPROCS )
*
* If machine needs additional set up, do it now
*
IF( NPROCS.LT.1 ) THEN
IF( IAM.EQ.0 )
$ NPROCS = NPROW*NPCOL
CALL BLACS_SETUP( IAM, NPROCS )
END IF
*
* Define process grid
*
CALL BLACS_GET( -1, 0, ICTXT )
CALL BLACS_GRIDINIT( ICTXT, 'Row-major', NPROW, NPCOL )
*
RETURN
*
* End of SL_INIT
*
END
***************************************************