[Beowulf] using an MPI library through a shared library
Mathieu Gontier
mg.mailing-list at laposte.net
Wed Dec 5 00:28:05 PST 2007
Yep, I use ldd every day. But here the problem comes from a structure that
gets corrupted between MorphMPI and MPI:
typedef struct {
    int   MorphMPI_SOURCE;
    int   MorphMPI_TAG;
    int   MorphMPI_ERROR;
    void* mpi_status;
} MorphMPI_Status;
where the member mpi_status points to a real MPI_Status. In MPICH:
typedef struct {
    int MPI_SOURCE;
    int MPI_TAG;
    int MPI_ERROR;
    int count;
} MPI_Status;
Then, when my MorphMPI_Status is passed to MorphMPI_Get_count(), the
pointer MorphMPI_Status::mpi_status itself is not corrupted, but the count
field of the MPI_Status it points to is: the value should be 4, not
random garbage.
I tried manipulating the MorphMPI_Status structure (adding another integer
to pad it to 64-bit alignment, keeping only the void*, ...) without success.
As a reminder, this problem appears only when MPI is used through a
dynamically linked MorphMPI library.
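One cheap check for that kind of layout mismatch is to print
sizeof(MPI_Status) from inside the shared MorphMPI library and from a
small test compiled directly against the same MPI, and compare (a
diagnostic sketch; the probe function name is mine):

/* probe.c: compile this into the MorphMPI shared library. */
#include <stdio.h>
#include <mpi.h>

void morphmpi_report_status_size(void)
{
    printf("MorphMPI .so: sizeof(MPI_Status) = %lu\n",
           (unsigned long) sizeof(MPI_Status));
}

/* check.c: a standalone test built directly against the same MPI. */
#include <stdio.h>
#include <mpi.h>

int main(void)
{
    printf("direct build: sizeof(MPI_Status) = %lu\n",
           (unsigned long) sizeof(MPI_Status));
    return 0;
}

If the two numbers disagree, the dynamic build is mixing two different
mpi.h headers, and the count field is simply read at the wrong offset.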
Does anyone have an idea?
Mathieu Gontier
Core Development Engineer
Read the attached v-card for telephone, fax, address
Look at our web-site http://www.fft.be
Joe Landman wrote:
> Greetings Mathieu:
>
> Mathieu Gontier wrote:
>
> [...]
>
>> So, I am hitting a little problem whatever MPI library is used (I
>> tried with MPICH-1.2.5.2, MPICHGM and IntelMPI).
>> When MorphMPI is linked statically with my parallel application,
>> everything is OK; but when MorphMPI is linked dynamically with my
>> parallel application, MPI_Get_count returns a wrong value.
>>
>> I concluded it is difficult to use an MPI library through a shared
>> library. I wonder if someone has more information about it (in this
>
> Not likely. I would suggest ldd. It is your friend.
>
> For example:
>
> joe at pegasus-i:~/workspace/source-mpi$ ldd matmul_mpi_3.exe
> libm.so.6 => /lib/libm.so.6 (0x00002b5409d17000)
> libmpi.so.0 => not found
> libopen-rte.so.0 => not found
> libopen-pal.so.0 => not found
> librt.so.1 => /lib/librt.so.1 (0x00002b5409f99000)
> libdl.so.2 => /lib/libdl.so.2 (0x00002b540a1a2000)
> libnsl.so.1 => /lib/libnsl.so.1 (0x00002b540a3a6000)
> libutil.so.1 => /lib/libutil.so.1 (0x00002b540a5c0000)
> libpthread.so.0 => /lib/libpthread.so.0 (0x00002b540a7c3000)
> libc.so.6 => /lib/libc.so.6 (0x00002b540a9de000)
> /lib64/ld-linux-x86-64.so.2 (0x00002b5409af9000)
>
> Notice that libmpi.so.0 is not found, so I can't run this by hand
> unless I force the issue using LD_LIBRARY_PATH:
>
> joe at pegasus-i:~/workspace/source-mpi$ export
> LD_LIBRARY_PATH="/home/joe/local/lib64/:/home/joe/local/lib/"
> joe at pegasus-i:~/workspace/source-mpi$ ldd matmul_mpi_3.exe
> libm.so.6 => /lib/libm.so.6 (0x00002ae35ca50000)
> libmpi.so.0 => /home/joe/local/lib/libmpi.so.0
> (0x00002ae35ccd1000)
> libopen-rte.so.0 => /home/joe/local/lib/libopen-rte.so.0
> (0x00002ae35cfe8000)
> libopen-pal.so.0 => /home/joe/local/lib/libopen-pal.so.0
> (0x00002ae35d2b3000)
> librt.so.1 => /lib/librt.so.1 (0x00002ae35d514000)
> libdl.so.2 => /lib/libdl.so.2 (0x00002ae35d71d000)
> libnsl.so.1 => /lib/libnsl.so.1 (0x00002ae35d921000)
> libutil.so.1 => /lib/libutil.so.1 (0x00002ae35db3b000)
> libpthread.so.0 => /lib/libpthread.so.0 (0x00002ae35dd3e000)
> libc.so.6 => /lib/libc.so.6 (0x00002ae35df59000)
> /lib64/ld-linux-x86-64.so.2 (0x00002ae35c832000)
>
> and it might even run ...
>
> joe at pegasus-i:~/workspace/source-mpi$ ./matmul_mpi_3.exe
> D[tid=0]: running on machine = pegasus-i
> D: checking arguments: N_args=1
> D: arg[0] = ./matmul_mpi_3.exe
> Allocating memory ...
> array size in MB = 7.629 MB
> (remember, you have 2 of these)
> normalization a: 0.05510, b: 0.00173
> 0 : loop_min = 0, loop_max = 1000
> ...
>
> Do you have some sort of LD_LIBRARY_PATH set up? Or something set in
> /etc/ld.so.conf that points to where these things are? Remember,
> mpirun/mpiexec's alternative purpose in life is to set up the correct
> run-time environment for you, so you might want to see what is going
> on with the environment in your equivalent command.
>
>
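On that last point: a quick way to see which environment the launcher
actually hands to each process is to print it from inside the program
itself (a minimal sketch):

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char** argv)
{
    int rank;
    const char* path;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Show the library search path each rank really runs with;
     * mpirun/mpiexec may set or propagate it behind your back. */
    path = getenv("LD_LIBRARY_PATH");
    printf("rank %d: LD_LIBRARY_PATH=%s\n", rank, path ? path : "(unset)");

    MPI_Finalize();
    return 0;
}

Comparing that output between the statically and dynamically linked
builds should show whether the two runs resolve their MPI libraries from
the same place.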