[Beowulf] TCP connect error: ECONNREFUSED.
Jörg Saßmannshausen
jorg.sassmannshausen at strath.ac.uk
Mon Mar 30 06:14:50 PDT 2009
Dear all,
I am having this rather annoying problem with the parallel execution of
one of the programs (GAMESS US version) on our cluster. The error
message is:
TCP connect error: ECONNREFUSED.
TCP: Connect failed. comp10 -> comp02.chem.strath.ac.uk:42208.
A fatal error occurred on DDI Process 0.
TCP connect error: ECONNREFUSED.
TCP: Connect failed. comp10 -> comp02.chem.strath.ac.uk:42208.
A fatal error occurred on DDI Process 60.
TCP connect error: ECONNREFUSED.
TCP: Connect failed. comp10 -> comp02.chem.strath.ac.uk:42208.
A fatal error occurred on DDI Process 2.
TCP connect error: ECONNREFUSED.
[ ... ]
Eventually, ddikick tips over and the whole thing crashes. The
program is using rsh (yes, I know, security, I did not install the
cluster!) and I can rsh comp10 -> comp02 and there is no firewall
installed between the nodes (at least, not that I am aware of). Trying
to run the same job with the same number of nodes will fail X times and
at X+1 suddenly work. I could not work out a pattern for that (other
than that I get exponentially annoyed). Right now, there is only one gigabit
network connecting the cluster, so nfs, mpi etc. are all running over one
interface (again, I did not set up the cluster).
I have run out of ideas of where to look. I checked (as quickly as
possible) on some nodes with netstat, and the ddikick program is actually
running. Changing to ssh did not solve the problem.
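A minimal sketch of how the intermittent refusals could be probed independently of GAMESS, assuming Python is available on the compute nodes. The helper name probe and its parameters are hypothetical and not part of GAMESS or DDI; it only attempts a plain TCP connection several times and records whether each attempt succeeded or was refused.

```python
import errno
import socket
import time


def probe(host, port, attempts=5, delay=0.5, timeout=2.0):
    """Try to open a TCP connection to host:port several times.

    Returns a list with one entry per attempt: "ok" if the connection
    was accepted, "ECONNREFUSED" if it was refused, or the errno name
    (or exception class name) for any other socket error.
    """
    results = []
    for _ in range(attempts):
        try:
            # create_connection resolves the name and connects in one step
            with socket.create_connection((host, port), timeout=timeout):
                results.append("ok")
        except ConnectionRefusedError:
            # Nothing is listening on that port (or a reject rule is active)
            results.append("ECONNREFUSED")
        except OSError as exc:
            # Timeouts, name resolution failures, unreachable hosts, etc.
            results.append(errno.errorcode.get(exc.errno, type(exc).__name__))
        time.sleep(delay)
    return results
```

For example, probe("comp02.chem.strath.ac.uk", 42208) could be run from comp10 while a job is starting. The port number is taken from the error message above and presumably changes from run to run, so it would have to be read from the current failure. A mix of "ok" and "ECONNREFUSED" within a short window would suggest the listener on the remote node is not yet up (or has already gone) when the connection is attempted, rather than a permanent network or firewall problem.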
I would appreciate any feedback, as it is highly annoying to wait Y days
for the job to start running, only for it to crash.
All the best from Glasgow!
Jörg
--
*************************************************************
Jörg Saßmannshausen
Research Fellow
University of Strathclyde
Department of Pure and Applied Chemistry
295 Cathedral St.
Glasgow
G1 1XL
email: jorg.sassmannshausen at strath.ac.uk
web: http://sassy.formativ.net
Please avoid sending me Word or PowerPoint attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html