[Beowulf] IB problem/using IB diagnostics
Prentice Bisbal
prentice at ias.edu
Fri Jun 19 09:48:37 PDT 2009
Gus Correa wrote:
> Prentice Bisbal wrote:
>> John Hearns wrote:
>>>
>>> 2009/6/18 Prentice Bisbal <prentice at ias.edu>
>>>
>>> John Hearns wrote:
>>> > Can you log into node36 and run ibstat or ibstatus?
>>> >
>>>
>>> Looks good to me!
>>> Links are up and it sees a subnet manager. As Greg says, looks like
>>> something wonky in the script which is reporting
>>> the node status??
>>
>> It's actually an MPI job (HPL using OpenMPI) which is reporting the
>> problem.
>>
>> The head scratching continues...
>>
>
> Hi Prentice, list
>
> Just in case you haven't seen this ...
> Are you using OpenMPI 1.3.0 or 1.3.1?
> Those versions have a memory leak bug when using IB.
> The solution for the memory leak is to upgrade to 1.3.2.
> A workaround is to run with -mca mpi_leave_pinned 0.
> See:
>
> http://www.open-mpi.org/community/lists/announce/2009/04/0030.php
> https://svn.open-mpi.org/trac/ompi/ticket/1853
>
> My HPL with OpenMPI 1.3.1 crashed when using lots of memory.
> I upgraded to 1.3.2, which fixed the problem,
> and I haven't looked at the error messages,
> so your problem may be different.
> However, memory leaks can produce weird errors, hard to diagnose.
>
I'm using OpenMPI 1.2.8.
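
For anyone checking their own install against the versions mentioned above, here is a minimal sketch in Python. The function names are hypothetical helpers, not part of OpenMPI, and the set of affected releases is taken only from the quoted message (1.3.0 and 1.3.1), so treat it as an assumption rather than a complete list. It parses a version string and adds the mpi_leave_pinned workaround to an mpirun command line only when the version is one of those two.

```python
import re

# Releases named in the quoted message as having the IB memory leak.
# Assumption: no other releases are affected.
AFFECTED = {(1, 3, 0), (1, 3, 1)}


def parse_version(text):
    """Return (major, minor, patch) from the first X.Y or X.Y.Z in text."""
    m = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if m is None:
        raise ValueError(f"no version number found in {text!r}")
    major, minor, patch = m.groups()
    # A missing patch level (e.g. "1.3") is treated as patch 0.
    return (int(major), int(minor), int(patch or 0))


def has_leave_pinned_leak(version_text):
    """True if the version is one of the releases listed in AFFECTED."""
    return parse_version(version_text) in AFFECTED


def mpirun_args(version_text, nprocs, program):
    """Build an mpirun argument list, adding the workaround if needed."""
    args = ["mpirun", "-np", str(nprocs)]
    if has_leave_pinned_leak(version_text):
        # Workaround from the quoted message: disable leave-pinned.
        args += ["-mca", "mpi_leave_pinned", "0"]
    args.append(program)
    return args


if __name__ == "__main__":
    # "./xhpl" is a placeholder for the HPL binary.
    for v in ("1.2.8", "1.3.1", "1.3.2"):
        print(v, " ".join(mpirun_args(v, 4, "./xhpl")))
```

With 1.2.8 the helper leaves the command line untouched, which matches my situation: the workaround is only added for 1.3.0 and 1.3.1.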
--
Prentice