Bproc or BeoMPI

Dan Smith dcs at iastate.edu
Wed Mar 14 12:39:02 PST 2001


I wasn't sure where to send this question, as it might be a BProc thing or
maybe just an MPI thing, so I apologize to those who will see it twice.

I am playing with the MPI_Gather procedure under BeoMPI and I am getting
error messages that look like the following: 

p1_25945:  p4_error: interrupt SIGSEGV: 11
p0_25943:  p4_error: interrupt SIGSEGV: 11

This run used just two processors.  I get one error message per process
running the program.  The program still does pretty much what it is
supposed to do, though.  Are these BProc errors or MPI errors, and where
can I find info on these error codes?

A couple of notes:  

1) This only occurs when the value being passed to the root process is
   (1+i)*2, where i is the process number (i = 0..#_of_processors-1).  If I
   use (1+i)*3 or something other than multiplying by 2, then I do not get the
   errors.  

2) Process number 6 also fails to pass the correct value to the root process
   every time, even when I don't get the error messages. It always passes
   the number 7.

3) I shut down each slave node one at a time and ran the program each time.
   I still get the error messages and the wrong value passed by process 6.
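
For reference, the values described in the notes above can be tabulated
without MPI at all.  The following is only a minimal sketch in plain C,
assuming each process contributes a single int; the names expected_value,
expected_gather and first_mismatch are hypothetical helpers, not part of
BeoMPI or of the original program.  It computes what the root's receive
buffer should hold after the gather and reports the first process whose
value differs.

```c
/* Value that process `rank` contributes, per note 1: (1 + rank) * factor.
 * factor is 2 in the failing case and 3 in the case that runs cleanly. */
int expected_value(int rank, int factor)
{
    return (1 + rank) * factor;
}

/* Fill `out` (room for nprocs ints) with what the root should hold
 * after gathering one int from each process, in rank order. */
void expected_gather(int nprocs, int factor, int *out)
{
    for (int rank = 0; rank < nprocs; rank++)
        out[rank] = expected_value(rank, factor);
}

/* Compare a gathered buffer against the expected values.
 * Returns the first rank whose value differs, or -1 if all match. */
int first_mismatch(const int *gathered, int nprocs, int factor)
{
    for (int rank = 0; rank < nprocs; rank++)
        if (gathered[rank] != expected_value(rank, factor))
            return rank;
    return -1;
}
```

With a factor of 2, process 6 should contribute (1+6)*2 = 14, so a root
buffer holding 7 in that slot would be flagged by first_mismatch as rank 6.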

This is really baffling.

Any help is greatly appreciated. 

Dan

---
Daniel C. Smith                  |    Iowa State University
Graduate Assistant               |    Department of Physics and Astronomy
dcs at iastate.edu                  |    Ames, IA 50011



