[Beowulf] first cluster [was [OMPI users] trouble using openmpi under slurm]

Douglas Guptill douglas.guptill at dal.ca
Fri Jul 9 09:43:13 PDT 2010


On Thu, Jul 08, 2010 at 09:43:48AM -0400, Gus Correa wrote:
> Douglas Guptill wrote:
>> On Wed, Jul 07, 2010 at 12:37:54PM -0600, Ralph Castain wrote:
>>
>>> No....afraid not. Things work pretty well, but there are places
>>> where things just don't mesh. Sub-node allocation in particular is
>>> an issue as it implies binding, and slurm and ompi have conflicting
>>> methods.
>>>
>>> It all can get worked out, but we have limited time and nobody cares
>>> enough to put in the effort. Slurm just isn't used enough to make it
>>> worthwhile (too small an audience).
>>
>> I am about to get my first HPC cluster (128 nodes), and was
>> considering slurm.  We do use MPI.
>>
>> Should I be looking at Torque instead for a queue manager?
>>
> Hi Douglas
>
> Yes, works like a charm along with OpenMPI.
> I also have MVAPICH2 and MPICH2, no integration w/ Torque,
> but no conflicts either.

Thanks, Gus.

After some lurking and reading, I plan this:
  Debian (lenny)
  + fai                   - for compute-node operating system install
  + Torque                - job scheduler/manager
  + MPI (Intel MPI)       - for the application
  + MPI (OpenMPI)         - alternative MPI

Does anyone see holes in this plan?

Thanks,
Douglas
-- 
  Douglas Guptill                       voice: 902-461-9749
  Research Assistant, LSC 4640          email: douglas.guptill at dal.ca
  Oceanography Department               fax:   902-494-3877
  Dalhousie University
  Halifax, NS, B3H 4J1, Canada



