[Beowulf] Update on mpi problem

Joe Landman landman at scalableinformatics.com
Wed Jul 9 20:58:32 PDT 2008


Ok ... thought this would be interesting for some folks.  As a reminder, 
I am using Open-MPI 1.2.6 for a customer code and seeing different behavior 
than in the past.  Still scratching my head over it (it is seemingly 
non-deterministic).

I tried using '--mca btl ^sm' (turn off shared memory usage) on the 
non-infiniband machine, and ... it runs.  Repeatedly.  To completion.

Ok, over to the Infiniband machine.  I tried using '--mca btl ^sm'.  No 
dice (the tcp and openib BTLs are still available).

Next I tried turning off tcp (ethernet) as well:

	--mca btl ^sm,tcp

Nope.  Still doesn't work right.  Hmmm....  One left.  Turn off openib 
(infiniband) instead, leaving tcp available:

	--mca btl ^sm,openib

Yup.  It works.  Repeatedly.  To completion.
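
For anyone who wants to repeat this elimination on their own cluster, here 
is a minimal sketch that builds the three mpirun command lines tried above. 
The '--mca btl ^...' exclusion syntax is taken from this post; the binary 
name ./customer_app, the process count of 4, and the helper names are 
placeholders of mine, not from the actual runs. It only prints the 
commands, it does not launch anything.

```python
import shlex

# BTL exclusion lists tried in this post, in the order they were tried.
EXCLUSIONS = ["sm", "sm,tcp", "sm,openib"]


def mpirun_command(excluded, app="./customer_app", nprocs=4):
    """Build an mpirun argument list that excludes the given BTL components.

    `app` and `nprocs` are hypothetical placeholders; the post does not
    name the customer binary or the process count.
    """
    return ["mpirun", "-np", str(nprocs), "--mca", "btl", "^" + excluded, app]


def commands():
    """Return one mpirun argument list per exclusion set."""
    return [mpirun_command(excluded) for excluded in EXCLUSIONS]


if __name__ == "__main__":
    for cmd in commands():
        # shlex.quote protects the '^' from shells that treat it specially.
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

Run each printed command several times before drawing conclusions, since 
the failure is intermittent.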

It looks like this is an MPI stack issue of some sort, since the same code 
runs to completion once the right transports are excluded.  I'll ping the 
Open-MPI list and see what they think.

Thanks to all for the suggestions and comments.

FWIW, I also pulled down the DDT tool from Allinea, with the thought of 
testing it, and seeing if I could figure out where the problem was with 
the code.

Joe

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
        http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 866 888 3112
cell : +1 734 612 4615


