[Beowulf] p4_error: net_recv read: probable EOF on socket: 1

Mark Hahn hahn at physics.mcmaster.ca
Mon May 8 11:04:34 PDT 2006


> p4_error:interrupt SIGSEGV: 11

well, some process in the job tried to access memory it isn't allowed
to touch (SIGSEGV is signal 11).  usually that's a bug in the code, but
note that this _can_ also be due to hardware problems (overheating,
bad memory, etc).

> p4_error: net_recv read:  probable EOF on socket: 1

afaik, this is from a different node, and just means that it noticed
its socket to the peer that SEGV'ed had closed.  it's a symptom of the
crash above, not a second problem.
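
to illustrate the mechanism (this is only a sketch, not MPICH/p4 code:
the function name peer_crash_demo is made up, and a local socketpair
stands in for the TCP connection between two nodes), here one process
dies of SIGSEGV and the survivor's read simply returns zero bytes,
i.e. EOF, which is presumably what p4 is reporting as "probable EOF on
socket":

```python
import os
import resource
import signal
import socket


def peer_crash_demo():
    """Fork a peer that dies of SIGSEGV; return what the survivor sees.

    Returns (data, sig): the bytes the survivor read from the socket and
    the signal number that killed the peer (None if it exited normally).
    """
    survivor, peer = socket.socketpair()
    pid = os.fork()
    if pid == 0:
        # Child: plays the rank that crashes.
        survivor.close()
        # Avoid leaving a core file behind from this demo.
        resource.setrlimit(resource.RLIMIT_CORE, (0, 0))
        # Make sure the default action (terminate) applies, then
        # deliver SIGSEGV to ourselves to simulate the crash.
        signal.signal(signal.SIGSEGV, signal.SIG_DFL)
        os.kill(os.getpid(), signal.SIGSEGV)
        os._exit(1)  # not reached

    # Parent: plays the healthy rank waiting on its peer.
    peer.close()
    data = survivor.recv(1024)  # returns b"" once the peer's end is closed
    _, status = os.waitpid(pid, 0)
    sig = os.WTERMSIG(status) if os.WIFSIGNALED(status) else None
    survivor.close()
    return data, sig


if __name__ == "__main__":
    data, sig = peer_crash_demo()
    print("survivor read:", data, "peer killed by signal:", sig)
```

the point is that the EOF message says nothing about why the peer
went away; the SIGSEGV on the other node is the thing to chase.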

> This error occurs after running the code for several hours using all
> processors in my cluster.  I have seen several postings similar to this
> on the web, however, I have not seen any posted solutions.  My

for a good reason - the problem is probably particular to the cluster,
not general to the software...

> Mpich_1.2.1 compiled w/ Portland compilers

that said, it seems unwise to be running such an old version.
wow, 1.2.1 actually dates from 09/05/2000, at least according to the
timestamps on the mpich ftp server...



