[Beowulf] Cannot run offload codes with Intel Xeon Phi cards deployed via xCAT

Christopher Samuel samuel at unimelb.edu.au
Tue Aug 20 20:51:10 PDT 2013


Hi xCAT and Beowulf folks,

Has anyone successfully run an offload code (say Intel's
xhpl_offload_intel64) using Xeon Phis that have been deployed via xCAT?

We've got a user trying to use a prerelease version of NAMD that does
offload, but it fails, saying it can't find the cards.

Setting OFFLOAD_REPORT=2 shows these errors:

Error: Engine_connect
Error: Engine_connect
Error: Engine_connect
Error: Engine_connect

and strace reveals that an ioctl() on the file descriptor returned by
opening /dev/mic/scif fails with ECONNREFUSED:

5672  open("/dev/mic/scif", O_RDWR)     = 3
5672  ioctl(3, 0xc0087303, 0x7fff1c780f20) = -1 ECONNREFUSED (Connection refused)

From reading the Intel "mic" kernel driver source code, ECONNREFUSED is
only returned at one point in the driver, and the include file explains
it as:

 * - The destination was not listening for connections or refused the
 *   connection request.

So it sounds like the uOS that's getting deployed on the MICs is
incomplete, or something isn't getting loaded on boot.
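For anyone wanting to check which request is failing, the ioctl number in the strace output can be split into its fields. This is only a rough sketch, assuming the generic Linux _IOC bit layout used on x86 (8-bit command number, 8-bit type character, 14-bit argument size, 2-bit direction); the function name decode_ioctl is my own and it does not map the result to any particular SCIF command name:

```python
# Sketch: split a Linux ioctl request number into its _IOC fields.
# Assumes the generic encoding from asm-generic/ioctl.h (as used on x86):
#   bits 0-7   command number
#   bits 8-15  type ("magic") character
#   bits 16-29 size of the argument structure in bytes
#   bits 30-31 direction (1 = write, 2 = read, 3 = both)

DIRECTIONS = {0: "none", 1: "write", 2: "read", 3: "read/write"}


def decode_ioctl(request):
    """Return the direction, type, number and size encoded in request."""
    return {
        "direction": DIRECTIONS[(request >> 30) & 0x3],
        "type": chr((request >> 8) & 0xFF),
        "nr": request & 0xFF,
        "size": (request >> 16) & 0x3FFF,
    }


if __name__ == "__main__":
    # Request number taken from the strace output above.
    print(decode_ioctl(0xC0087303))
```

For 0xc0087303 this gives a read/write request of type 's', command number 3, with an 8 byte argument, which can then be looked up against the ioctl definitions in the driver's include files.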

I've replicated the same error with xhpl_offload_intel64 (which used
to work before we changed to xCAT deploying the Phis) so it's not the
user's code.

Any ideas?

Can I ask someone with a Xeon Phi working for offload purposes to send
me the output of "lsmod" and "ps -aef" please? That way I can diff
it with what I have to see what (if anything) I'm missing.
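The comparison I have in mind for the lsmod output is roughly the following. It is a minimal sketch, assuming the usual lsmod layout of a "Module  Size  Used by" header followed by one module per line; the function names and the sample module lists are made up for illustration and are not real output from either system:

```python
# Sketch: compare the module names in two saved "lsmod" outputs.
# Assumes the standard lsmod layout: one header line, then one line per
# module with the module name in the first column.


def module_names(lsmod_text):
    """Return the set of module names found in lsmod output."""
    names = set()
    for line in lsmod_text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "Module":
            continue  # skip blank lines and the header
        names.add(fields[0])
    return names


def diff_modules(working_text, broken_text):
    """Return (missing, extra) module names, relative to the working host."""
    working = module_names(working_text)
    broken = module_names(broken_text)
    return sorted(working - broken), sorted(broken - working)


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    working = "Module Size Used by\nmic 100 0\nnfs 200 1\next4 300 2\n"
    broken = "Module Size Used by\nnfs 200 1\next4 300 2\nfuse 400 0\n"
    missing, extra = diff_modules(working, broken)
    print("only on working host:", missing)
    print("only on broken host:", extra)
```

The same approach would work for the "ps -aef" output by comparing the command column instead of the module name.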

-- 
 Christopher Samuel        Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/      http://twitter.com/vlsci
