[Beowulf] An annoying MPI problem
Joe Landman
landman at scalableinformatics.com
Tue Jul 8 19:01:48 PDT 2008
Hi folks
Dealing with an MPI problem that has me scratching my head. Quite
beowulfish, as that's where this code runs.
Short version: The code starts and runs. Reads in its data. Starts
its iterations. And then somewhere after this, it hangs. But not
always at the same place. It doesn't write state data back out to the
disk, just logs. Rerunning it gets it to a different point, sometimes
hanging sooner, sometimes later. Seems to be the case on multiple
different machines, with different OSes. Working on comparing MPI
distributions, and it hangs with IB as well as with shared memory and
tcp sockets.
Right now we are using OpenMPI 1.2.6, and this code does use
allreduce. When it hangs, an strace of the master process shows lots of
polling:
c1-1:~ # strace -p 8548
Process 8548 attached - interrupt to quit
rt_sigprocmask(SIG_BLOCK, [CHLD], NULL, 8) = 0
rt_sigaction(SIGCHLD, {0x2b061f65c9b2, [CHLD], SA_RESTORER|SA_RESTART,
0x2b062049b130}, NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [CHLD], NULL, 8) = 0
poll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=6,
events=POLLIN}, {fd=8, events=POLLIN}, {fd=9, events=POLLIN}, {fd=10,
events=POLLIN}], 6, 0) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], NULL, 8) = 0
rt_sigaction(SIGCHLD, {0x2b061f65c9b2, [CHLD], SA_RESTORER|SA_RESTART,
0x2b062049b130}, NULL, 8) = 0
[spin forever]
...
So it looks like the process is waiting for the appropriate posting on
the internal scoreboard, and just hanging in a tight loop until this
actually happens.
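To make that concrete, here is roughly the shape of the thing (a minimal
sketch, not our actual code; the loop structure and the exit test are
made up). If any one rank skips the allreduce, or takes an exit branch
the others don't, everyone else sits in the collective and you see
exactly the poll() spin above:

/* minimal sketch, not the real code: iteration loop around MPI_Allreduce */
#include <mpi.h>

int main(int argc, char **argv)
{
    int    rank, iter;
    double local = 1.0, global = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (iter = 0; iter < 1000; iter++) {
        /* every rank has to reach this call the same number of times */
        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM,
                      MPI_COMM_WORLD);

        /* a rank-local exit test like this is the classic way to get
         * the hang: if rank 0 ever bails out here while the others
         * loop around and call MPI_Allreduce again, those ranks block
         * in the collective and the progress engine polls forever */
        if (rank == 0 && global < 0.0)
            break;
    }

    MPI_Finalize();
    return 0;
}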
But hangs caused by a logic error usually happen at the same place each
time, and these don't.
This is what I have seen in the past from other MPI codes where the
sends and receives all match up, but every rank posts its send before
its receive ... ordering is important of course.
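The canonical version of that failure, for reference (illustrative only:
two ranks, made-up buffer size). Both ranks post a blocking send first.
It "works" as long as the message fits under the library's eager limit,
and deadlocks the moment the library switches to a rendezvous protocol,
which is exactly the sort of thing that changes between MPI stacks and
transports:

#include <mpi.h>

#define N (1 << 20)   /* big enough to blow past a typical eager limit */

int main(int argc, char **argv)
{
    static double sbuf[N], rbuf[N];
    int           rank, other;
    MPI_Status    status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    other = 1 - rank;                /* assumes exactly two ranks */

    /* both ranks block here once the message is too large for the
     * eager path; neither ever reaches its receive: deadlock */
    MPI_Send(sbuf, N, MPI_DOUBLE, other, 0, MPI_COMM_WORLD);
    MPI_Recv(rbuf, N, MPI_DOUBLE, other, 0, MPI_COMM_WORLD, &status);

    /* the fix is to order the calls by rank, use MPI_Sendrecv, or
     * post nonblocking receives first */
    MPI_Finalize();
    return 0;
}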
But the odd thing about this code is that it worked fine 12 - 18 months
ago, and we haven't touched it since (nor has it changed). What has
changed is that we are now using OpenMPI 1.2.6.
So the code hasn't changed, and the OS on which it runs hasn't changed,
but the MPI stack has. Yeah, that's a clue.
Turning off openib and tcp doesn't make much of a difference. This is
also a clue.
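(For completeness, the transport switching is being done with the usual
OpenMPI MCA parameters, roughly along these lines; "./our_code" is just
a stand-in:

mpirun -np 8 --mca btl self,sm ./our_code      # shared memory only
mpirun -np 8 --mca btl self,tcp ./our_code     # tcp sockets only
mpirun -np 8 --mca btl ^openib ./our_code      # everything except IB
)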
I am looking now at trying mvapich2 and seeing how that goes. We're
using the Intel and gfortran compilers (Fortran/C mixed code).
Anyone see strange things like this with their MPI stacks? OpenMPI?
Mvapich2? I should try the Intel MPI as well (a rebuilt mvapich2, as I
remember).
I'll try all the usual things (reduce the optimization level, etc.).
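(E.g. something along the lines of rebuilding the objects with the MPI
compiler wrappers at -O0 -g; exact wrappers and flags depend on the
stack, and the file names here are just placeholders:

mpif90 -O0 -g -c foo.f90
mpicc  -O0 -g -c bar.c
)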
Sage words of advice (and clue sticks) welcome.
Joe
--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web : http://www.scalableinformatics.com
http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423
fax : +1 866 888 3112
cell : +1 734 612 4615