[Beowulf] MPI & ScaLAPACK: error in MPI_Comm_size: Invalid communicator

cjoung at tpg.com.au cjoung at tpg.com.au
Mon Oct 18 19:14:53 PDT 2004


Hi, I was hoping someone could help me with an F77, MPI & ScaLAPACK
problem. Basically, I can't make the ScaLAPACK
libraries work in my program.
Programs with MPI-only calls work fine, e.g. the "pi.f" MPI
program that comes with the MPI installation works fine
(the one that estimates pi),
as do other examples I've taken from books and simple ones I've written
myself. But whenever I try an example with ScaLAPACK and BLACS calls, it falls
over with the same error message, which I can't decipher.

If you can help, I have a more detailed account of what's
going on below.
Any advice would be gratefully received,
Clint Joung
Postdoctoral Research Associate,
Department of Chemical Engineering
University of Sydney, NSW 2006
Australia
**************************************************************

I'm just learning parallel programming. The netlib ScaLAPACK website
has an example program called 'example1.f'.
It uses the ScaLAPACK subroutine PSGESV to solve the standard
matrix equation [A]x = b and return the answer, the vector x.

It seemed to compile OK, but on running I got some error
messages.
So I systematically stripped 'example1.f' down in stages,
recompiling and running each time, trying to reach a working
program with the potential bugs eliminated, from which I could
rebuild.

Eventually I got down to the following emaciated F77 program
(see below).
All it does now is initialize a 2x3 process grid,
then release it - that's all.
****example2.f*******************************************
      program example2
      integer ictxt,mycol,myrow,npcol,nprow

      nprow=2
      npcol=3

      call SL_INIT(ictxt,nprow,npcol)

      call BLACS_EXIT(0)
      STOP
      END
*********************************************************
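For reference, the SL_INIT routine in the netlib examples is just a
small wrapper around three BLACS calls. An equivalent inline version
looks roughly like the sketch below (based on the netlib example code;
the exact body of your SL_INIT may differ, so treat the details as an
assumption and check your own copy):

```fortran
*     Sketch of SL_INIT expanded into its underlying BLACS calls
*     (assumption: follows the netlib ScaLAPACK example wrapper)
      PROGRAM EXPAND
      INTEGER IAM, NPROCS, ICTXT, NPROW, NPCOL
      NPROW = 2
      NPCOL = 3
*     Find this process's number and the total process count
      CALL BLACS_PINFO( IAM, NPROCS )
*     Get the default system context
      CALL BLACS_GET( -1, 0, ICTXT )
*     Map the processes onto a NPROW x NPCOL grid, row-major
      CALL BLACS_GRIDINIT( ICTXT, 'Row-major', NPROW, NPCOL )
*     Release the grid, then shut BLACS down
      CALL BLACS_GRIDEXIT( ICTXT )
      CALL BLACS_EXIT( 0 )
      STOP
      END
```

Replacing the SL_INIT call with these calls directly would at least
show which BLACS routine triggers the error.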
Yet it still doesn't work! The following is the output
when I compile and run it:
*********************************************************
[tony at carmine clint]$ mpif77 -o example2 example2.f \
                             -L/opt/intel/mkl70cluster/lib/32 \
                             -lmkl_scalapack \
                             -lmkl_blacsF77init \
                             -lmkl_blacs \
                             -lmkl_blacsF77init \
                             -lmkl_lapack \
                             -lmkl_ia32 \
                             -lguide \
                             -lpthread \
                             -static-libcxa
[tony at carmine clint]$ mpirun -n 6 ./example2 
aborting job: Fatal error in MPI_Comm_size: Invalid communicator, error stack:
MPI_Comm_size(82): MPI_Comm_size(comm=0x5b, size=0x80d807c) failed
MPI_Comm_size(66): Null Comm pointer
aborting job:
Fatal error in MPI_Comm_size: Invalid communicator, error stack:
MPI_Comm_size(82): MPI_Comm_size(comm=0x5b, size=0x80d807c) failed
MPI_Comm_size(66): Null Comm pointer
rank 5 in job 17  carmine.soprano.org_32782   caused collective abort of all ranks
  exit status of rank 5: return code 13
aborting job:
Fatal error in MPI_Comm_size: Invalid communicator, error stack:
MPI_Comm_size(82): MPI_Comm_size(comm=0x5b, size=0x80d807c) failed
MPI_Comm_size(66): Null Comm pointer
rank 1 in job 17  carmine.soprano.org_32782   caused collective abort of all ranks
  exit status of rank 1: return code 13
rank 0 in job 17  carmine.soprano.org_32782   caused collective abort of all ranks
  exit status of rank 0: return code 13
[tony at carmine clint]$
*********************************************************
...so apparently something's wrong with MPI_Comm_size, but
beyond that I can't figure it out.
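One way to narrow this down (my suggestion, not something from the
original example) is to call MPI_Comm_size directly from an MPI-only
program. If the sketch below runs cleanly under mpirun while the BLACS
program fails, the MPI library itself is fine and the fault lies in
the BLACS/MPI combination, e.g. a BLACS built against a different MPI
implementation:

```fortran
*     Minimal MPI-only check of MPI_Comm_size (diagnostic sketch;
*     compile with mpif77 and run with mpirun, no ScaLAPACK needed)
      PROGRAM CHKSIZ
      INCLUDE 'mpif.h'
      INTEGER IERR, NPROCS, MYRANK
      CALL MPI_INIT( IERR )
*     The same call that fails inside BLACS, made directly on
*     the real MPI_COMM_WORLD of this MPI installation
      CALL MPI_COMM_SIZE( MPI_COMM_WORLD, NPROCS, IERR )
      CALL MPI_COMM_RANK( MPI_COMM_WORLD, MYRANK, IERR )
      PRINT *, 'rank', MYRANK, 'of', NPROCS
      CALL MPI_FINALIZE( IERR )
      STOP
      END
```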

My system details:
* I am running this on a 'one node' cluster, i.e. my notebook
(just to prototype before I run on a proper cluster)
* O/S: Redhat Fedora Core 1, Kernel 2.4.22 
* Compiler: Intel Fortran Compiler for linux 8.0
* MPI: MPICH2 ver 0.971 (was compiled with the ifort compiler,
  so it should work ok with the ifort compiler)
* The ScaLAPACK, BLACS, BLAS and LAPACK come from the
 Intel Cluster Math Kernel Library for Linux 7.0

If you know how to fix this problem, I'd be glad to
hear from you.
Please consider me a NOVICE with all three
  -  Linux, MPI and ScaLAPACK.
The simpler the explanation, the better!

with thanks,
clint joung




More information about the Beowulf mailing list