[Beowulf] IB problem/using IB diagnostics

Paulo Afonso Lopes pal at di.fct.unl.pt
Fri Jun 19 10:28:20 PDT 2009


>
> It's actually an MPI job (HPL using OpenMPI) which is reporting the
> problem.
>
> The head scratching continues...
>

From the ongoing discussion, it seems that you do not have a hardware
problem but an (Open) MPI one. I have seen Open MPI fail because a
user-level (or kernel; in my case it was user-level) verbs/etc. library
was missing.

Suggestions:

1) check whether the job runs with, say, -mca btl ^udapl (which excludes
the uDAPL BTL), or with the transports listed explicitly, e.g.
-mca btl openib,tcp,sm,self

or

2) more tediously, check that all libraries present on a non-failing node
are available on the failing one... There is a "Getting Started with
InfiniBand" page which has the names of the libraries/products that you
should have loaded to have a fully functioning IB stack - it solved my
problem :-)
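
For suggestion 2, the comparison can be scripted rather than done by eye.
What follows is only a rough sketch, assuming you have saved the output of
ldd for the same binary or plugin on a working node and on the failing
node. The function names and the sample listings are made up for
illustration; they are not taken from your cluster.

```python
def parse_ldd(output):
    """Map each library named in ldd output to its resolved path.

    A library reported as "not found" maps to None.
    """
    libs = {}
    for line in output.splitlines():
        line = line.strip()
        if "=>" not in line:
            continue  # e.g. the dynamic loader line, which has no "=>"
        name, _, target = line.partition("=>")
        name, target = name.strip(), target.strip()
        if target.startswith("("):
            continue  # virtual objects such as linux-vdso have no path
        if target.startswith("not found"):
            libs[name] = None
        else:
            # drop the trailing load address, e.g. " (0x00007f...)"
            libs[name] = target.split(" (")[0]
    return libs


def missing_on_bad_node(good_output, bad_output):
    """Libraries that resolve on the good node but not on the bad one."""
    good = parse_ldd(good_output)
    bad = parse_ldd(bad_output)
    return sorted(
        name for name, path in good.items()
        if path is not None and bad.get(name) is None
    )


# Hypothetical listings, for illustration only.
GOOD = """\
	libibverbs.so.1 => /usr/lib64/libibverbs.so.1 (0x00007f0000000000)
	libc.so.6 => /lib64/libc.so.6 (0x00007f0000100000)
"""
BAD = """\
	libibverbs.so.1 => not found
	libc.so.6 => /lib64/libc.so.6 (0x00007f0000100000)
"""

if __name__ == "__main__":
    for lib in missing_on_bad_node(GOOD, BAD):
        print("missing on failing node:", lib)
```

One way to get the input is to run ldd on each node against Open MPI's
openib BTL plugin (usually named mca_btl_openib.so; where it lives depends
on how Open MPI was installed) and paste the two outputs in place of GOOD
and BAD.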

HTH

paulo

-- 
Paulo Afonso Lopes                        | Tel: +351- 21 294 8536
Departamento de Informática               | 294 8300 ext.10702
Faculdade de Ciências e Tecnologia        | Fax: +351- 21 294 8541
Universidade Nova de Lisboa               | e-mail: pal at di.fct.unl.pt
2829-516 Caparica, PORTUGAL





