[Beowulf] MPI & ScaLAPACK: error in MPI_Comm_size: Invalid communicator
cjoung at tpg.com.au
Mon Oct 18 19:14:53 PDT 2004
Hi, I was hoping someone could help me with an F77, MPI & ScaLAPACK
problem. Basically, I can't get the ScaLAPACK libraries to
work in my program.
Programs with MPI-only calls work fine, e.g. the "pi.f" MPI
program that comes with the MPI installation runs correctly
(the one that estimates pi),
as do other examples I've taken from books and simple ones I've written
myself. But whenever I try an example with ScaLAPACK and BLACS calls, it falls
over with the same error message (which I can't decipher).
If you can help, there is a more detailed account of what's
going on below.
Any advice would be most gratefully appreciated,
Clint Joung
Postdoctoral Research Associate,
Department of Chemical Engineering
University of Sydney, NSW 2006
Australia
**************************************************************
I'm just learning parallel programming. The netlib ScaLAPACK website
has an example program called 'example1.f'.
It uses the ScaLAPACK subroutine PSGESV to solve the standard
matrix equation [A]x = b and return the answer, vector x.
It seemed to compile OK, but on running it I got some error
messages.
So I systematically stripped 'example1.f' down in stages,
recompiling and running each time, trying to eliminate potential
bugs and reach a working program I could rebuild from.
Eventually I got down to the following emaciated F77 program
(see below).
All it does now is initialize a 2x3 process grid,
then release it - that's all.
****example2.f*******************************************
program example2
integer ictxt,mycol,myrow,npcol,nprow
nprow=2
npcol=3
call SL_INIT(ictxt,nprow,npcol)
call BLACS_EXIT(0)
STOP
END
*********************************************************
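For context, here is an editor's paraphrase of the SL_INIT that ships with the ScaLAPACK examples on netlib (a sketch from the example sources - double-check against your copy): SL_INIT is only a thin wrapper over BLACS calls, and its very first call, BLACS_PINFO, is what drives MPI start-up under the MPI BLACS. That is why even this stripped-down program still reaches MPI_Comm_size.

```fortran
*     Paraphrased from SL_INIT in the ScaLAPACK example sources
*     (netlib); treat as a sketch, not the exact text.
      SUBROUTINE SL_INIT( ICTXT, NPROW, NPCOL )
      INTEGER            ICTXT, NPROW, NPCOL
      INTEGER            IAM, NPROCS
*     First BLACS call -- under the MPI BLACS this initializes MPI,
*     so a bad BLACS/MPI pairing can fail right here.
      CALL BLACS_PINFO( IAM, NPROCS )
*     If MPI has not supplied processes, ask the BLACS to set them up.
      IF( NPROCS.LT.1 ) THEN
         NPROCS = NPROW*NPCOL
         CALL BLACS_SETUP( IAM, NPROCS )
      END IF
*     Get the default system context and lay out a row-major grid.
      CALL BLACS_GET( -1, 0, ICTXT )
      CALL BLACS_GRIDINIT( ICTXT, 'Row-major', NPROW, NPCOL )
      RETURN
      END
```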
Yet it still doesn't work! The following is the output
when I try to compile and run it:
*********************************************************
[tony at carmine clint]$ mpif77 -o example2 example2.f \
    -L/opt/intel/mkl70cluster/lib/32 \
    -lmkl_scalapack \
    -lmkl_blacsF77init \
    -lmkl_blacs \
    -lmkl_blacsF77init \
    -lmkl_lapack \
    -lmkl_ia32 \
    -lguide \
    -lpthread \
    -static-libcxa
[tony at carmine clint]$ mpirun -n 6 ./example2
aborting job: Fatal error in MPI_Comm_size: Invalid communicator, error stack:
MPI_Comm_size(82): MPI_Comm_size(comm=0x5b, size=0x80d807c) failed
MPI_Comm_size(66): Null Comm pointer
aborting job:
Fatal error in MPI_Comm_size: Invalid communicator, error stack:
MPI_Comm_size(82): MPI_Comm_size(comm=0x5b, size=0x80d807c) failed
MPI_Comm_size(66): Null Comm pointer
rank 5 in job 17 carmine.soprano.org_32782 caused collective abort of all ranks
exit status of rank 5: return code 13
aborting job:
Fatal error in MPI_Comm_size: Invalid communicator, error stack:
MPI_Comm_size(82): MPI_Comm_size(comm=0x5b, size=0x80d807c) failed
MPI_Comm_size(66): Null Comm pointer
rank 1 in job 17 carmine.soprano.org_32782 caused collective abort of all ranks
exit status of rank 1: return code 13
rank 0 in job 17 carmine.soprano.org_32782 caused collective abort of all ranks
exit status of rank 0: return code 13
[tony at carmine clint]$
*********************************************************
...so apparently something's wrong with MPI_Comm_size, but
beyond that I can't figure it out.
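One way to narrow it down (an editor's sketch in the same F77 style; the program and its name are illustrative, not from the original post): a program that calls MPI_COMM_SIZE directly, with no BLACS or ScaLAPACK involved. If this runs cleanly under the same mpif77/mpirun, the Fortran MPI binding itself is healthy, and the fault lies in the pairing between the MKL BLACS libraries and the MPICH2 they are run against.

```fortran
      PROGRAM MPICHK
*     Minimal direct-MPI check: no BLACS, no ScaLAPACK.
      IMPLICIT NONE
      INCLUDE 'mpif.h'
      INTEGER IERR, MYRANK, NPROCS
      CALL MPI_INIT( IERR )
      CALL MPI_COMM_RANK( MPI_COMM_WORLD, MYRANK, IERR )
      CALL MPI_COMM_SIZE( MPI_COMM_WORLD, NPROCS, IERR )
      WRITE(*,*) 'rank ', MYRANK, ' of ', NPROCS
      CALL MPI_FINALIZE( IERR )
      STOP
      END
```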
My system details:
* I am running this on a '1 node' cluster - i.e. my notebook.
(just to prototype before I run on a proper cluster)
* O/S: Redhat Fedora Core 1, Kernel 2.4.22
* Compiler: Intel Fortran Compiler for linux 8.0
* MPI: MPICH2 ver 0.971 (compiled with the ifort compiler,
so it should work OK with ifort)
* The ScaLAPACK, BLACS, BLAS and LAPACK come from the
Intel Cluster Math Kernel Library for Linux 7.0
If you know how to fix this problem, I'd appreciate
hearing from you.
Please consider me a NOVICE with all three
- Linux, MPI and ScaLAPACK.
The simpler the explanation, the better!
with thanks,
clint joung