[Beowulf] fftw2, mpi, from 32 bit to 64 and fortran
Ricardo Reis
rreis at aero.ist.utl.pt
Sat Aug 2 04:25:27 PDT 2008
Hi
Thanks for replying. Answering all the questions:
- This is a Debian box, x86_64 native, so everything compiled on it is
naturally 64-bit;
- I've compiled fftw 2.1.5 myself because fftw3 has only
experimental MPI support, without Fortran bindings. I've asked whether the
project has stopped, because the last release (fftw 3.2 alpha) is dated
Nov. 13, 2007;
- I'm using Open MPI from the Debian package. I've also compiled Open MPI
by hand and the same problem happens. I've compiled the latest LAM
(although I had to explicitly pick the 4.1 version of the gcc suite, because
I found a problem with 4.3: it says g++ isn't boolean capable). I can run
other MPI codes on this machine (a pseudo-spectral DNS code I've
parallelized myself) with this Open MPI installation;
- Using LAM it works for 1 processor and blows up for 2 or more. I can
run my DNS code with LAM without problems;
- The only 64-bit caveat in the fftw notes relates to the declaration of
the plan variables, which should be integer(8). I've carefully done that. I
even went to the extreme of adding -fdefault-integer-8 to the compilation
flags of this code;
- I can run this code serially or threaded without problems;
- The 32-bit test was on my laptop, a 32-bit machine; the 64-bit test on
the 64-bit machine. No libraries are transported (svn co, make, and so on...);
- Yes, I've managed to run the tests (but they are C programs, alas!);
- The program only blows up when it goes to do the r2c fft (my first
transform). Before that it is able to execute other MPI calls;
- Gus, Ode Triunfal by Álvaro de Campos is one of my favourite poems. The
early 20th-century machine fever, the emotion of electricity. The furious
hunger to be alive and eat the world whole :)
- I've tried it on another debian box, X86_64, with openmpi from
debian and the same problem happens...
- If I compile with -fdefault-integer-8, this is the error message:
5068.0 $ mpirun -np 2 ~/bin/spec2.mpi
Launching MPI program with 2 proc.
[tenorio:21099] *** Process received signal ***
[tenorio:21100] *** Process received signal ***
[tenorio:21099] Signal: Segmentation fault (11)
[tenorio:21099] Signal code: (128)
[tenorio:21099] Failing at address: (nil)
[tenorio:21099] [ 0] /lib/libpthread.so.0 [0x7f13ca893a90]
[tenorio:21099] [ 1] /usr/lib/libopen-pal.so.0(_int_malloc+0x962) [0x7f13cb3057c2]
[tenorio:21099] [ 2] /usr/lib/libopen-pal.so.0(malloc+0x8f) [0x7f13cb3068ef]
[tenorio:21099] [ 3] /home/rreis/bin/spec2.mpi(MAIN__+0x79a) [0x40eb0a]
[tenorio:21099] [ 4] /home/rreis/bin/spec2.mpi(main+0x2c) [0x46d3cc]
[tenorio:21099] [ 5] /lib/libc.so.6(__libc_start_main+0xe6) [0x7f13ca5501a6]
[tenorio:21099] [ 6] /home/rreis/bin/spec2.mpi [0x407d59]
[tenorio:21099] *** End of error message ***
[tenorio:21100] Signal: Segmentation fault (11)
[tenorio:21100] Signal code: (128)
[tenorio:21100] Failing at address: (nil)
[tenorio:21100] [ 0] /lib/libpthread.so.0 [0x7f858af35a90]
[tenorio:21100] [ 1] /usr/lib/libopen-pal.so.0(_int_malloc+0x962) [0x7f858b9a77c2]
[tenorio:21100] [ 2] /usr/lib/libopen-pal.so.0(malloc+0x8f) [0x7f858b9a88ef]
[tenorio:21100] [ 3] /home/rreis/bin/spec2.mpi(MAIN__+0x79a) [0x40eb0a]
[tenorio:21100] [ 4] /home/rreis/bin/spec2.mpi(main+0x2c) [0x46d3cc]
[tenorio:21100] [ 5] /lib/libc.so.6(__libc_start_main+0xe6) [0x7f858abf21a6]
[tenorio:21100] [ 6] /home/rreis/bin/spec2.mpi [0x407d59]
[tenorio:21100] *** End of error message ***
mpirun noticed that job rank 0 with PID 21099 on node tenorio exited on
signal 11 (Segmentation fault).
1 additional process aborted (not shown)
- If I take the flag out:
5070.0 $ mpirun -np 2 ~/bin/spec2.mpi
Launching MPI program with 2 proc.
Read field (DONE)
[tenorio:21234] *** Process received signal ***
[tenorio:21234] Signal: Segmentation fault (11)
[tenorio:21234] Signal code: Address not mapped (1)
[tenorio:21234] Failing at address: 0x4840
[tenorio:21234] [ 0] /lib/libpthread.so.0 [0x7fd57da65a90]
[tenorio:21234] [ 1] /home/rreis/bin/spec2.mpi(rfftwnd_f77_mpi_+0x16) [0x40f676]
[tenorio:21234] [ 2] /home/rreis/bin/spec2.mpi(MAIN__+0xb69) [0x40f1fe]
[tenorio:21234] [ 3] /home/rreis/bin/spec2.mpi(main+0x2c) [0x46d6bc]
[tenorio:21234] [ 4] /lib/libc.so.6(__libc_start_main+0xe6) [0x7fd57d7221a6]
[tenorio:21234] [ 5] /home/rreis/bin/spec2.mpi [0x407d59]
[tenorio:21234] *** End of error message ***
mpirun noticed that job rank 0 with PID 21234 on node tenorio exited on
signal 11 (Segmentation fault).
1 additional process aborted (not shown)
Maybe I should try MPICH, or compile Open MPI with all the bells and
whistles, and give it another run...
greets,
Ricardo Reis
'Non Serviam'
PhD student @ Lasef
Computational Fluid Dynamics, High Performance Computing, Turbulence
http://www.lasef.ist.utl.pt
&
Cultural Instigator @ Rádio Zero
http://www.radiozero.pt
http://www.flickr.com/photos/rreis/