[Beowulf] IB problem/using IB diagnostics
Paulo Afonso Lopes
pal at di.fct.unl.pt
Fri Jun 19 10:28:20 PDT 2009
>
> It's actually an MPI job (HPL using OpenMPI) which is reporting the
> problem.
>
> The head scratching continues...
>
It seems, from the ongoing discussion, that you do not have a hw problem
but an Open MPI one; I have seen Open MPI fail because some user-level
(or kernel; in my case it was user-level) verbs/etc. library was missing.
Suggestions:
1) check whether the job runs with, say, -mca btl ^udapl (which excludes
UDAPL), or with an explicit list, e.g., -mca btl openib,tcp,sm,self
or
2) more tediously, check that all libraries present on a non-failing node
are also available on the failing one... There is a "Getting Started with
InfiniBand" page that lists the libraries/products you should have
loaded to have a fully functioning IB stack - it solved my problem :-)
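
A rough sketch of both checks in Python, in case it helps. It assumes you
have saved the output of "ldconfig -p" from a good node and from a failing
node; the function names, the ./xhpl binary name and the process count are
placeholders I made up for illustration, not anything from your setup.

```python
def parse_ldconfig(text):
    """Collect library names from the text printed by `ldconfig -p`."""
    names = set()
    for line in text.splitlines():
        # Library entries look like "libfoo.so.1 (libc6,x86-64) => /path";
        # the "N libs found in cache" header and blank lines have no "=>".
        if "=>" not in line:
            continue
        names.add(line.split()[0])
    return names


def missing_libraries(good_text, bad_text):
    """Libraries listed on the working node but absent on the failing one."""
    return sorted(parse_ldconfig(good_text) - parse_ldconfig(bad_text))


def btl_trials(program="./xhpl", nprocs=4):
    """Build mpirun command lines for the two BTL selections in item 1.

    `program` and `nprocs` are hypothetical defaults; adjust to your job.
    """
    selections = ["^udapl", "openib,tcp,sm,self"]
    return [
        ["mpirun", "-np", str(nprocs), "-mca", "btl", sel, program]
        for sel in selections
    ]
```

Feed missing_libraries() the contents of the two saved listings (good node
first) and look for verbs/IB-related names in the result; the lists from
btl_trials() can be passed to subprocess.run() one at a time.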
HTH
paulo
--
Paulo Afonso Lopes | Tel: +351- 21 294 8536
Departamento de Informática | 294 8300 ext.10702
Faculdade de Ciências e Tecnologia | Fax: +351- 21 294 8541
Universidade Nova de Lisboa | e-mail: pal at di.fct.unl.pt
2829-516 Caparica, PORTUGAL