[Beowulf] fftw2, mpi, from 32 bit to 64 and fortran

Ricardo Reis rreis at aero.ist.utl.pt
Sat Aug 2 04:25:27 PDT 2008


Hi

Thanks for replying. Answering all the questions:

  - This is a Debian box, x86_64 native, so everything compiled on it
is naturally 64-bit;

  - I compiled fftw 2.1.5 myself because fftw3 has only experimental
MPI support, without Fortran bindings. I've asked whether the project
has stalled, since the last release (fftw 3.2 alpha) is dated
Nov. 13, 2007;

  - I'm using Open MPI from the Debian package. I've also compiled
Open MPI by hand and the same problem happens. I've compiled the
latest LAM as well (although I had to pin the gcc suite at version
4.1, because with 4.3 the configure step complains that g++ is not
bool-capable; see the sketch after this item). I can run other MPI
codes on this machine (a pseudo-spectral DNS code I parallelized
myself) with this Open MPI installation;
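
Pinning the older suite is just the standard autoconf compiler
overrides, something like this (paths and version suffixes are
illustrative, from my Debian setup):

    # illustrative: point LAM's configure at the gcc 4.1 suite
    ./configure CC=gcc-4.1 CXX=g++-4.1 F77=gfortran-4.1 FC=gfortran-4.1 \
        --prefix=$HOME/lam
    make
    make install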

  - Using LAM it works on 1 processor but blows up on 2 or more. I can
run my DNS code with LAM without problems;

  - The only 64-bit caveat in the fftw notes concerns the declaration
of the plan variables, which should be integer(8); I've done that
carefully (see the sketch after this item). I even went to the extreme
of putting -fdefault-integer-8 in the compilation flags of this code;
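
In concrete terms the caveat only asks for this (a minimal sketch,
with illustrative variable names, not my actual source):

      integer*8 plan     ! fftw2 plans hold a C pointer: 8 bytes here
      integer nx, ny     ! everything else can stay default integer

Note that -fdefault-integer-8 goes much further than the caveat: it
widens every default integer, including the ones handed to the MPI
Fortran bindings, which are presumably built with the usual 4-byte
integers, so that flag may hurt more than it helps.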

  - I can run this code serially or threaded without problems;

  - The 32-bit test was on my laptop, a 32-bit machine; the 64-bit
test on the 64-bit machine. No libraries are carried between them
(svn co, make, and so on...);

  - Yes, I've managed to run the tests (but they are C programs,
alas!);

  - The program only blows up when it goes to do the r2c FFT (my first
transform). Before that it manages other MPI calls fine; a sketch of
the sequence follows this item;
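
Roughly, the sequence around the failing call looks like this. This is
a sketch following the fftw 2.1.5 manual's Fortran MPI wrappers; the
argument lists, the constants from fftw_f77.i, and the 2D shape are
from my reading of the manual, not pasted from my source:

      include 'mpif.h'
      include 'fftw_f77.i'

      integer*8 plan                  ! 8 bytes to hold a C pointer
      integer nx, ny, ierr
      integer local_nx, local_x_start
      integer local_ny_t, local_y_start_t  ! sizes after transpose
      integer total_local_size
      double precision, allocatable :: data(:), work(:)

      call MPI_INIT(ierr)
      nx = 64
      ny = 64

!     real-to-complex 2D plan over MPI_COMM_WORLD
      call rfftw2d_f77_mpi_create_plan(plan, MPI_COMM_WORLD,
     &     nx, ny, FFTW_REAL_TO_COMPLEX, FFTW_ESTIMATE)

!     ask fftw how much local storage this rank needs
      call rfftwnd_f77_mpi_local_sizes(plan, local_nx, local_x_start,
     &     local_ny_t, local_y_start_t, total_local_size)
      allocate(data(total_local_size), work(total_local_size))

!     ... fill data with this rank's slab of the field ...

!     the transform itself; this is where spec2 dies
!     (n_fields = 1, use_work = 1 so the work array is used)
      call rfftwnd_f77_mpi(plan, 1, data, work, 1,
     &     FFTW_NORMAL_ORDER)

      call rfftwnd_f77_mpi_destroy_plan(plan)
      call MPI_FINALIZE(ierr)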

  - Gus, Ode Triunfal by Alvaro de Campos is one of my favourite
poems: the early twentieth-century machine fever, the emotion of
electricity, the furious hunger to be alive and to eat the world
whole :)

  - I've tried it on another Debian box, x86_64, with Open MPI from
Debian, and the same problem happens...


  - If I compile with -fdefault-integer-8, this is the error message:

5068.0 $ mpirun -np 2 ~/bin/spec2.mpi
Launching MPI program with     2 proc.
[tenorio:21099] *** Process received signal ***
[tenorio:21100] *** Process received signal ***
[tenorio:21099] Signal: Segmentation fault (11)
[tenorio:21099] Signal code:  (128)
[tenorio:21099] Failing at address: (nil)
[tenorio:21099] [ 0] /lib/libpthread.so.0 [0x7f13ca893a90]
[tenorio:21099] [ 1] /usr/lib/libopen-pal.so.0(_int_malloc+0x962) [0x7f13cb3057c2]
[tenorio:21099] [ 2] /usr/lib/libopen-pal.so.0(malloc+0x8f) [0x7f13cb3068ef]
[tenorio:21099] [ 3] /home/rreis/bin/spec2.mpi(MAIN__+0x79a) [0x40eb0a]
[tenorio:21099] [ 4] /home/rreis/bin/spec2.mpi(main+0x2c) [0x46d3cc]
[tenorio:21099] [ 5] /lib/libc.so.6(__libc_start_main+0xe6) [0x7f13ca5501a6]
[tenorio:21099] [ 6] /home/rreis/bin/spec2.mpi [0x407d59]
[tenorio:21099] *** End of error message ***
[tenorio:21100] Signal: Segmentation fault (11)
[tenorio:21100] Signal code:  (128)
[tenorio:21100] Failing at address: (nil)
[tenorio:21100] [ 0] /lib/libpthread.so.0 [0x7f858af35a90]
[tenorio:21100] [ 1] /usr/lib/libopen-pal.so.0(_int_malloc+0x962) [0x7f858b9a77c2]
[tenorio:21100] [ 2] /usr/lib/libopen-pal.so.0(malloc+0x8f) [0x7f858b9a88ef]
[tenorio:21100] [ 3] /home/rreis/bin/spec2.mpi(MAIN__+0x79a) [0x40eb0a]
[tenorio:21100] [ 4] /home/rreis/bin/spec2.mpi(main+0x2c) [0x46d3cc]
[tenorio:21100] [ 5] /lib/libc.so.6(__libc_start_main+0xe6) [0x7f858abf21a6]
[tenorio:21100] [ 6] /home/rreis/bin/spec2.mpi [0x407d59]
[tenorio:21100] *** End of error message ***
mpirun noticed that job rank 0 with PID 21099 on node tenorio exited on 
signal 11 (Segmentation fault).
1 additional process aborted (not shown)

  - If I take the flag out:

5070.0 $ mpirun -np 2 ~/bin/spec2.mpi
Launching MPI program with     2 proc.
Read field (DONE)
[tenorio:21234] *** Process received signal ***
[tenorio:21234] Signal: Segmentation fault (11)
[tenorio:21234] Signal code: Address not mapped (1)
[tenorio:21234] Failing at address: 0x4840
[tenorio:21234] [ 0] /lib/libpthread.so.0 [0x7fd57da65a90]
[tenorio:21234] [ 1] /home/rreis/bin/spec2.mpi(rfftwnd_f77_mpi_+0x16) [0x40f676]
[tenorio:21234] [ 2] /home/rreis/bin/spec2.mpi(MAIN__+0xb69) [0x40f1fe]
[tenorio:21234] [ 3] /home/rreis/bin/spec2.mpi(main+0x2c) [0x46d6bc]
[tenorio:21234] [ 4] /lib/libc.so.6(__libc_start_main+0xe6) [0x7fd57d7221a6]
[tenorio:21234] [ 5] /home/rreis/bin/spec2.mpi [0x407d59]
[tenorio:21234] *** End of error message ***
mpirun noticed that job rank 0 with PID 21234 on node tenorio exited on 
signal 11 (Segmentation fault).
1 additional process aborted (not shown)


  Maybe I should try MPICH, or compile Open MPI myself with all the
bells and whistles (along the lines sketched below), and give it
another run...
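
Illustratively, something like this; the configure variables and
--enable-debug are standard Open MPI options, and the link line
follows the fftw 2.1.5 docs:

    # illustrative rebuild under $HOME, away from the Debian copy
    ./configure --prefix=$HOME/openmpi F77=gfortran FC=gfortran \
        --enable-debug
    make all install

    # rebuild the code against it and rerun
    $HOME/openmpi/bin/mpif77 -o ~/bin/spec2.mpi spec2.f \
        -lrfftw_mpi -lfftw_mpi -lrfftw -lfftw -lm
    $HOME/openmpi/bin/mpirun -np 2 ~/bin/spec2.mpi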

  greets,

  Ricardo Reis

  'Non Serviam'

  PhD student @ Lasef
  Computational Fluid Dynamics, High Performance Computing, Turbulence
  http://www.lasef.ist.utl.pt

  &

  Cultural Instigator @ Rádio Zero
  http://www.radiozero.pt

  http://www.flickr.com/photos/rreis/

