[Beowulf] IB problem/using IB diagnostics

Prentice Bisbal prentice at ias.edu
Fri Jun 19 09:48:37 PDT 2009


Gus Correa wrote:
> Prentice Bisbal wrote:
>> John Hearns wrote:
>>>
>>> 2009/6/18 Prentice Bisbal <prentice at ias.edu>
>>>
>>>     John Hearns wrote:
>>>     > Can you log into node36 and run ibstat or ibstatus?
>>>     >
>>>
>>> Looks good to me!
>>> Links are up and it sees a subnet manager. As Greg says, looks like
>>> something wonky in the script which is reporting
>>> the node status??
>>
>> It's actually an MPI job (HPL using OpenMPI) which is reporting the
>> problem.
>>
>> The head scratching continues...
>>
> 
> Hi Prentice, list
> 
> Just in case you haven't seen this ...
> Are you using OpenMPI 1.3.0 or 1.3.1?
> Those versions have a memory leak bug when using IB.
> The solution for the memory leak is to upgrade to 1.3.2.
> A workaround is to run mpirun with -mca mpi_leave_pinned 0.
> See:
> 
> http://www.open-mpi.org/community/lists/announce/2009/04/0030.php
> https://svn.open-mpi.org/trac/ompi/ticket/1853
> 
> My HPL with OpenMPI 1.3.1 crashed when using lots of memory.
> I upgraded to 1.3.2, which fixed the problem.
> I haven't looked at your error messages, though,
> so your problem may be different.
> However, memory leaks can produce weird errors that are hard to diagnose.
> 

I'm using OpenMPI 1.2.8
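The version check Gus describes (1.3.0 and 1.3.1 affected, 1.3.2 fixed, workaround via mpi_leave_pinned) can be sketched as a small Python helper for sites checking several installs. This is a minimal, hypothetical sketch, not part of Open MPI: the names AFFECTED, parse_version and workaround_args are made up here, and it assumes plain dotted version strings (no rc or other suffixes), treating "1.3" as 1.3.0.

```python
# Hypothetical helper (not part of Open MPI): flag the releases named in
# this thread (1.3.0 and 1.3.1, fixed in 1.3.2) as affected by the IB
# memory leak, and return the mpirun arguments for the workaround.
import sys

# Releases reported in the thread as having the leak.
AFFECTED = {(1, 3, 0), (1, 3, 1)}


def parse_version(text: str) -> tuple[int, int, int]:
    """Turn a dotted version such as '1.2.8' or '1.3' into (major, minor, patch).

    Assumption: a two-part version like '1.3' means patch level 0.
    Only plain numeric parts are handled; '1.3.1rc2' raises ValueError.
    """
    parts = [int(p) for p in text.strip().split(".")]
    while len(parts) < 3:
        parts.append(0)  # pad '1.3' -> [1, 3, 0]
    return (parts[0], parts[1], parts[2])


def workaround_args(version: str) -> list[str]:
    """Extra mpirun arguments that disable leave_pinned, or [] if not needed."""
    if parse_version(version) in AFFECTED:
        return ["-mca", "mpi_leave_pinned", "0"]
    return []


if __name__ == "__main__":
    # Usage: python check_ompi.py 1.3.1 1.2.8
    for v in sys.argv[1:] or ["1.2.8", "1.3.1", "1.3.2"]:
        print(v, workaround_args(v))
```

Run with no arguments, it checks 1.2.8, 1.3.1 and 1.3.2; only 1.3.1 gets the extra arguments, which is consistent with Prentice's 1.2.8 falling outside the versions Gus lists.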


-- 
Prentice

