[Beowulf] Cannot run offload codes with Intel Xeon Phi cards deployed via xCAT
Christopher Samuel
samuel at unimelb.edu.au
Tue Aug 20 20:51:10 PDT 2013
Hi xCAT and Beowulf folks,
Has anyone successfully run an offload code (say Intel's
xhpl_offload_intel64) using Xeon Phis that have been deployed via xCAT?
We've got a user trying to use a prerelease version of NAMD that does
offload, but it fails, saying it can't find the cards.
Setting OFFLOAD_REPORT=2 shows errors of:
[SOURCE][0x9377bc80][1834028774450][engine.cpp:186][COILOG_LEVEL_ERROR][ConnectToDaemon]:
Error: Engine_connect
[SOURCE][0x9377bc80][2055063906528][engine.cpp:186][COILOG_LEVEL_ERROR][ConnectToDaemon]:
Error: Engine_connect
[SOURCE][0x9377bc80][2276654460069][engine.cpp:186][COILOG_LEVEL_ERROR][ConnectToDaemon]:
Error: Engine_connect
[SOURCE][0x9377bc80][2497819045011][engine.cpp:186][COILOG_LEVEL_ERROR][ConnectToDaemon]:
Error: Engine_connect
and strace reveals that the ioctl() on the file descriptor returned by
open() on /dev/mic/scif fails with ECONNREFUSED:
5672 open("/dev/mic/scif", O_RDWR) = 3
[...]
5672 ioctl(3, 0xc0087303, 0x7fff1c780f20) = -1 ECONNREFUSED (Connection refused)
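For anyone wanting to check their own trace for the same failure, here is a minimal Python sketch that pulls the failing system calls out of strace output. It assumes the "pid syscall(args) = -1 ERRNO (message)" line layout shown above; the function name failed_syscalls and the regular expression are my own illustration, not part of strace or the Intel tools.

```python
import re

# Assumed layout (as in the trace above):
#   5672 ioctl(3, 0xc0087303, 0x7fff1c780f20) = -1 ECONNREFUSED (Connection refused)
# The leading pid is optional, since strace only prints it with -f.
FAILED_CALL = re.compile(
    r"^\s*(?:(?P<pid>\d+)\s+)?(?P<syscall>\w+)\((?P<args>.*)\)"
    r"\s+=\s+-1\s+(?P<errno>E[A-Z0-9]+)"
)


def failed_syscalls(lines):
    """Return (pid, syscall, args, errno) for every call that returned -1."""
    found = []
    for line in lines:
        match = FAILED_CALL.match(line)
        if match:
            found.append(
                (
                    match.group("pid"),
                    match.group("syscall"),
                    match.group("args"),
                    match.group("errno"),
                )
            )
    return found


# The two lines quoted from the trace above.
sample = [
    '5672 open("/dev/mic/scif", O_RDWR) = 3',
    "5672 ioctl(3, 0xc0087303, 0x7fff1c780f20) = -1 ECONNREFUSED (Connection refused)",
]

for pid, syscall, args, errno_name in failed_syscalls(sample):
    print(pid, syscall, errno_name)
```

Only the ioctl() line is reported, since the open() succeeded. Lines that strace splits with "<unfinished ...>" markers are not handled by this sketch.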
Reading the Intel "mic" kernel driver source code, that error is only
returned at one point in the driver, and the include file explains it as:
*- ECONNREFUSED
* - The destination was not listening for connections or refused the
* connection request.
So it sounds like the uOS that's getting deployed on the MICs is
not complete, or something isn't getting loaded on boot.
I've replicated the same error with xhpl_offload_intel64 (which used
to work before we changed to xCAT deploying the Phis), so it's not the
user's code.
Any ideas?
Can I ask someone with a Xeon Phi working for offload purposes to send
me the output of "lsmod" and "ps -aef" please? That way I can diff
them against what I have to see what (if anything) I'm missing.
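As a rough idea of the comparison I have in mind, here is a minimal Python sketch that compares the module names from two "lsmod" outputs. It assumes the usual lsmod layout (a "Module  Size  Used by" header, then one module per line with the name in the first column). The helper names, the sizes, and the "example_helper" module are made up for illustration; only "mic" is the driver mentioned above.

```python
def module_names(lsmod_text):
    """Return the set of module names (first column) from lsmod output."""
    names = set()
    for line in lsmod_text.splitlines():
        fields = line.split()
        # Skip blank lines and the "Module  Size  Used by" header.
        if not fields or fields[0] == "Module":
            continue
        names.add(fields[0])
    return names


def compare_modules(working_text, broken_text):
    """Return (missing_here, extra_here) relative to the working host."""
    working = module_names(working_text)
    broken = module_names(broken_text)
    return sorted(working - broken), sorted(broken - working)


# Placeholder data: sizes and "example_helper" are hypothetical.
working_host = """Module                  Size  Used by
mic                   100000  0
example_helper         20000  1 mic
"""

broken_host = """Module                  Size  Used by
mic                   100000  0
"""

missing, extra = compare_modules(working_host, broken_host)
print("loaded on the working host only:", missing)
print("loaded on the broken host only:", extra)
```

The same set difference works for the command column of "ps -aef", though process names would need the pid and time columns stripped first.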
cheers!
Chris
--
Christopher Samuel Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.org.au/ http://twitter.com/vlsci