AGAIN: mpi-prog from lam -> scyld beompi DIES

Sat Dec 8 12:36:25 PST 2001

Some time ago I asked about a problem with my MPI program on a Scyld
Beowulf cluster and got no real response.
- Has nobody ever ported a LAM-MPI program to a Scyld Beowulf cluster?
- Did I miss the right keywords, or what information is missing?

Any hints? I am adding my post again.
Peter

On Wed, 28 Nov 2001, Peter Beerli wrote:

> Hi,
> I have a program developed using MPI-1 under LAM.
> It runs fine on several LAM-MPI clusters with different architectures.
> A user wants to run it on a Scyld Beowulf cluster, and there it fails.
> I did a few tests myself, and it seems
> that the program stalls if run on more than 3 nodes but seems to work for
> 2-3 nodes. The program has a master-slave architecture where the master
> is mostly doing nothing. Any node may send reports to stdout
> (but this seems to work in beompi the same way as in LAM).
> There are several things unclear to me
> because I have no clue about the beompi system, beowulf and scyld in
> particular.
> 
> (1) if I run "top" why do I see 6 processes running when I start
>     with mpirun -np 3 migrate-n ? 
Here I received a useful response, but it does not solve my problem.
This part is solved: it is just the way MPICH handles run and I/O.
But do these different processes have different MPI IDs (ranks)? If so, that
would be a problem.
> 
> (2) The data-phase stalls on the slave nodes.
>     The master node is reading the data from a file and then broadcasts
>     a large char buffer to the slaves. Is this wrong? Is there a better way
>     to do that? [I do not know how big the data is, and it is a complex mix
>     of strings, numbers, etc.]
> 
> void
> broadcast_data_master (data_fmt * data, option_fmt * options)
> {
>   long bufsize;
>   char *buffer;
>   buffer = (char *) calloc (1, sizeof (char));
>   bufsize = pack_databuffer (&buffer, data, options);
>   MPI_Bcast (&bufsize, 1, MPI_LONG, MASTER, comm_world);
>   MPI_Bcast (buffer, bufsize, MPI_CHAR, MASTER, comm_world);
>   free (buffer);
> }
In case you wonder about the size of the buffer: it gets expanded
in pack_databuffer().
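
For illustration only, here is a minimal sketch of how a buffer that starts
at one byte can grow while data is packed into it. The real pack_databuffer()
is not shown in this post, so this is not its implementation: append_bytes()
and pack_strings() are hypothetical names, and packing plain strings stands in
for the real mix of strings and numbers. The returned value plays the role of
bufsize in the broadcast above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not the real pack_databuffer): append len bytes to
   *buffer, doubling the allocation until the new data fits.
   Returns the new number of used bytes, or -1 if realloc fails. */
long
append_bytes (char **buffer, long used, long *capacity,
              const char *src, long len)
{
  assert (buffer != NULL && *buffer != NULL);
  assert (used >= 0 && len >= 0 && *capacity > 0);
  if (used + len > *capacity)
    {
      long newcap = *capacity;
      char *tmp;
      while (newcap < used + len)
        newcap *= 2;
      tmp = (char *) realloc (*buffer, (size_t) newcap);
      if (tmp == NULL)
        return -1;              /* the old buffer is still valid */
      *buffer = tmp;
      *capacity = newcap;
    }
  memcpy (*buffer + used, src, (size_t) len);
  return used + len;
}

/* Hypothetical stand-in for packing: copy n strings, each with its
   terminating '\0', into *buffer, which must have been allocated with
   a size of exactly 1 byte. Returns the packed size in bytes or -1. */
long
pack_strings (char **buffer, const char **items, int n)
{
  long used = 0;
  long capacity = 1;            /* matches calloc (1, sizeof (char)) */
  int i;
  for (i = 0; i < n; i++)
    {
      used = append_bytes (buffer, used, &capacity, items[i],
                           (long) strlen (items[i]) + 1);
      if (used < 0)
        return -1;
    }
  return used;
}
```

Because realloc may move the block, the caller has to pass the address of
its pointer, which is why the master calls pack_databuffer (&buffer, ...).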
> 
> void
> broadcast_data_worker (data_fmt * data, option_fmt * options)
> {
>   long bufsize;
>   char *buffer;
>   MPI_Bcast (&bufsize, 1, MPI_LONG, MASTER, comm_world);
>   buffer = (char *) calloc (bufsize, sizeof (char));
>   MPI_Bcast (buffer, bufsize, MPI_CHAR, MASTER, comm_world);
>   unpack_databuffer (buffer, data, options);
>   free (buffer);
> }
> 
>   The master and the first node seem to read the data fine,
>   but the others either do not and wait, or silently die.
>    
> (3) What is the easiest way to debug this? With LAM I just attached gdb to
>     the pids on the different nodes, but here the nodes are transparent to me
>     [but as I said, I have never used a Beowulf cluster before].
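
One generic way to make the attach-by-pid approach easier is to let each
process announce its pid and host and then pause until a debugger sets a
flag. The sketch below is only that, a sketch: wait_for_debugger() and
debugger_attached are hypothetical names, it uses plain POSIX calls and no
MPI, and whether gdb can attach to a process on a Scyld node at all is an
open question it does not answer.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Set this to 1 from the debugger (in gdb: set var debugger_attached = 1)
   to let the process continue. */
volatile int debugger_attached = 0;

/* Hypothetical helper: print the pid and host name of this process to
   stderr, then wait up to max_seconds for debugger_attached to be set.
   Returns 1 if the flag was set, 0 if the wait timed out. */
int
wait_for_debugger (int max_seconds)
{
  char host[256];
  time_t start = time (NULL);

  assert (max_seconds >= 0);
  if (gethostname (host, sizeof (host)) != 0)
    snprintf (host, sizeof (host), "unknown");
  host[sizeof (host) - 1] = '\0';       /* gethostname may not terminate */

  fprintf (stderr, "pid %ld on %s is waiting for a debugger\n",
           (long) getpid (), host);
  fflush (stderr);

  while (!debugger_attached && difftime (time (NULL), start) < max_seconds)
    sleep (1);
  return debugger_attached ? 1 : 0;
}
```

Called near the start of the worker code, for example wait_for_debugger (60),
this gives a minute to attach with gdb to the printed pid and set the flag;
after the timeout the process simply continues.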
> 
> 
> Can you give pointers or hints?
> 
> thanks
> Peter
> 

-- 
Peter Beerli,  Genome Sciences, Box #357730, University of Washington,
Seattle WA 98195-7730 USA, Ph:2065438751, Fax:2065430754
http://evolution.genetics.washington.edu/PBhtmls/beerli.html