[Beowulf] IB problem/using IB diagnostics

Gus Correa gus at ldeo.columbia.edu
Fri Jun 19 09:14:52 PDT 2009

Prentice Bisbal wrote:
> John Hearns wrote:
>> 2009/6/18 Prentice Bisbal <prentice at ias.edu>
>>     John Hearns wrote:
>>     > Can you log into node36 and run ibstat or ibstatus?
>>     >
>> Looks good to me!
>> Links are up and it sees a subnet manager. As Greg says, looks like
>> something wonky in the script which is reporting
>> the node status??
> It's actually an MPI job (HPL using OpenMPI) which is reporting the
> problem.
> The head scratching continues...

Hi Prentice, list

Just in case you haven't seen this ...
Are you using OpenMPI 1.3.0 or 1.3.1?
Those versions have a memory leak bug when using IB.
The solution for the memory leak is to upgrade to 1.3.2.
A workaround is to pass -mca mpi_leave_pinned 0 to mpirun.
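
For what it's worth, here is a minimal sketch of the two usual ways
to set that parameter. The process count and the ./xhpl path are
placeholders of mine, not taken from your setup, so adjust them to
your job.

```shell
# Way 1: environment variable, picked up by any later mpirun in this shell.
export OMPI_MCA_mpi_leave_pinned=0

# Way 2: on the mpirun command line (key and value separated by a space).
# NP and HPL_BIN are placeholders; adjust them to your job.
NP=16
HPL_BIN=./xhpl
HPL_CMD="mpirun -mca mpi_leave_pinned 0 -np $NP $HPL_BIN"

# Print the command instead of launching it, so it can be checked first.
echo "$HPL_CMD"
```

Either way alone should be enough. Running ompi_info should also
tell you which OpenMPI version is actually installed on the nodes.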


My HPL with OpenMPI 1.3.1 crashed when using lots of memory.
I upgraded to 1.3.2, which fixed the problem,
and I haven't looked at the error messages,
so your problem may be different.
However, memory leaks can produce weird errors that are hard to diagnose.

My $0.02.

Gus Correa
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
