[Beowulf] Please help to setup Beowulf

Joe Landman landman at scalableinformatics.com
Fri Feb 20 10:00:01 PST 2009

Bogdan Costescu wrote:
> On Fri, 20 Feb 2009, Prentice Bisbal wrote:
>> You need to take a fresh look at SGE and Open MPI.
> Well, I'm subscribed to the devel lists of both these projects so I 
> don't really think that I need to take a fresh look at them :-)
>> Open MPI seems to be the new de facto standard MPI library
> While OpenMPI would be my MPI library of choice:
> - Myricom offers and supports MPICH over MX

... and OpenMPI does build atop MX as well ...

> - all IB clusters that I've had access to use MVAPICH as MPI lib of 
> choice; OpenMPI is there installed only as an afterthought or because 
> users ask for it; AFAIK there is no preference for OpenMPI in OFED

... no, but it is in there, and is generally better integrated into SGE 
than M[VA]PICH ...

> - there is no support for Quadrics cards in OpenMPI (unless they are 
> used as 10GE cards)

Not sure.  Best to ask on that list.

> So maybe you'd like to explain your choice of words regarding OpenMPI's 
> usage...
>> and you can compile it to be fully integrated with both SGE and Torque
> Full integration means different things for the 2 batch systems...
> LAM/MPI 7.x works with SGE mainly because of my efforts at that time; 
> after that point, one could also claim that LAM/MPI had full integration 
> with SGE. This still meant however that a rsh (SGE's own rsh based on 
> NetBSD one) was used to start processes on remote nodes.
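The "tight" vs. rsh-based integration the quote describes is visible in the SGE parallel environment definition itself. A minimal sketch of how to check it (the PE name "ompi" is hypothetical, not from this thread):

```shell
# With control_slaves TRUE, SGE launches remote ranks through its own
# qrsh and can account for and kill them; with FALSE, the MPI library
# rsh's out on its own and SGE loses track of the slave processes.
qconf -sp ompi | grep -E 'control_slaves|start_proc_args'
```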

We had significant problems with LAM/SGE integration at customer sites. 
Eventually we wound up writing some back-end monitoring Perl magic that, 
in the event of a qdel, would do the right thing, as LAM didn't really 
cooperate all that well with SGE, even with the patches/scripts at the 
SGE sunsource site.

OpenMPI integration is *much* better.  Qdel just works.  Which is how it 
should be.
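For what it's worth, a sketch of the build step that enables that behavior (prefix and job sizes are illustrative, assuming a 1.3-era Open MPI):

```shell
# --with-sge builds in gridengine support, so mpirun detects the SGE
# environment and launches ranks via qrsh -- which is why qdel can
# clean up the remote processes.
./configure --prefix=/opt/openmpi --with-sge
make -j4 && make install

# Under SGE, mpirun then picks up the PE-allocated hosts automatically:
#   qsub -pe ompi 16 job.sh    (job.sh runs: mpirun -np 16 ./a.out)
```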

OpenMPI isn't perfect ... the last benchmark I ran with it, Overflow 
CFD, simply refused to work.  Had to resort to a particular version 
of MVAPICH2.

We've had similar issues with other codes (padcirc, ...) that we have 
been helping customers with.

None of the stacks are perfect, though my experience with the OpenMPI 
stack suggests that the developers are focused on making it work well 
with schedulers, compilers, and networking stacks.

That said, we tend to suggest our customers look for OpenMPI 
compatibility first.  HP MPI works pretty well also, though it (and 
other binary-only stacks) tends to be hard linked against particular 
(older) InfiniBand stacks ... which makes support ... well ... challenging.
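A quick way to see what a binary-only stack is actually linked against (the library path here is illustrative, not a real HP MPI install path):

```shell
# List the InfiniBand-related shared libraries the MPI library pulls
# in; a hard dependency on an old libibverbs/mlx library shows up here.
ldd /opt/hpmpi/lib/libmpi.so | grep -i -E 'ibverbs|mlx|rdma'
```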

Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615
