[Beowulf] HPL Runtime error

Craig Tierney ctierney at hypermall.net
Thu Apr 20 08:41:12 PDT 2006


Langford, Lester wrote:
> Hello all,
> 
> I am fairly new to cluster operation, but have built 4 small clusters
> (16–48 nodes) for other researchers.
> Now, I’m trying to run HPL on our 48-node dual Opteron 246 cluster.  I got
> the xhpl binary to build, but when I try a test run I get the following error:
> 
> a48:/local-io/linux_bench/hpl/bin/Linux_Op_246 # mpirun -np 4 xhpl
> /local-io/linux_bench/hpl/bin/Linux_Op_246/xhpl: Command not found.
> p0_4305:  p4_error: Child process exited while making connection to remote process on ath64: 0
> p0_4305: (46.070312) net_send: could not write to fd=4, errno = 32

Is the /local-io filesystem shared among all nodes, or is it the
local disk on each node?  The executable needs to be on a shared
filesystem (or copied to the same path on every node) so that every
node taking part in the run can see it.

This is probably why you got the "Command not found" error.
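
One quick way to test this is to ask each node whether it can see the
binary.  The following is only a rough sketch, not a definitive recipe.
It assumes passwordless ssh (or rsh) to every node and a machine file
with one host per line, optionally in "host:ncpus" form.  The function
name check_hosts, the REMOTE_SHELL variable and the file name
machines.txt are made up for illustration.

```shell
#!/bin/sh
# Sketch: report which hosts in a machine file can see an executable.
# ASSUMPTIONS (not from the original post): passwordless ssh/rsh works,
# and the machine file lists one host per line, optionally "host:ncpus".

# Remote command runner; override with rsh if that is what the cluster uses.
REMOTE_SHELL=${REMOTE_SHELL:-ssh}

check_hosts() {
    ch_machinefile=$1   # file listing the hosts mpirun will use
    ch_binary=$2        # absolute path to the executable, e.g. xhpl
    ch_missing=0

    # Drop comments and any ":ncpus" suffix, skip blank lines, de-duplicate.
    for ch_host in $(sed -e 's/#.*//' -e 's/:.*//' "$ch_machinefile" \
                     | awk 'NF {print $1}' | sort -u); do
        # "test -x" succeeds only if the file exists and is executable there.
        if $REMOTE_SHELL "$ch_host" test -x "$ch_binary" </dev/null; then
            echo "$ch_host: ok"
        else
            echo "$ch_host: MISSING $ch_binary"
            ch_missing=$((ch_missing + 1))
        fi
    done

    # Return success only if every host could see the binary.
    [ "$ch_missing" -eq 0 ]
}
```

After sourcing the file, it could be run as, for example,
"check_hosts machines.txt /local-io/linux_bench/hpl/bin/Linux_Op_246/xhpl".
Any host reported as MISSING either does not mount /local-io or has no
copy of xhpl at that path.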

Craig


> 
> ath64 is a workstation I am not using for this task.
> 
> Any and all help on this matter would be greatly appreciated.
> Thanks,
> 
> Les Langford
> 
> 
> Lester Langford
> Technology Development & Transfer
> NASA Test Operations Group
> Jacobs Sverdrup/ERC
> Bldg 8306
> Stennis Space Center, MS 39529
> 
> Email   Lester.Langford at ssc.nasa.gov
> Phone   (228) 688-7221
> Fax     (228) 688-1106