[Beowulf] Clic 2.0 lockup problems

Timo Mechler mechti01 at luther.edu
Tue Oct 26 22:09:48 PDT 2004

Hi all,

I just finished installing Clic 2.0 on a cluster of 1 server and 12 nodes.
After running the setup_auto_cluster script, I had everything installed.
I created a "cluster user" and proceeded to test some of the included
MPI sample code.  This ran fine.  I next tried to start this code remotely
(through SSH), but when I did this, the server locked up and had to be
rebooted.  It actually locked up while connecting via SSH, not while
executing the sample MPI code.  Any idea what might cause this?  The
server has 3 network interfaces:

eth0 - administration
eth1 - outside (internet)
eth2 - message passing (computing)

(The nodes each have 2 interfaces, one for administration and one for
message passing)
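
In case it helps with diagnosis, below is a minimal Python sketch (not part
of Clic) for checking which interface carries the default route on the
server.  It assumes a Linux kernel that exposes the routing table as
/proc/net/route; the function name default_routes is made up for
illustration, and whether routing has anything to do with the lockups is
only a guess on my part.

```python
import socket
import struct


def default_routes(route_table_text):
    """Return (interface, gateway) pairs for default routes.

    Expects text in the layout of Linux's /proc/net/route: one header
    line, then whitespace-separated rows whose Destination, Gateway and
    Mask columns are little-endian hexadecimal IPv4 addresses.
    """
    routes = []
    lines = route_table_text.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 8:
            continue  # ignore malformed or truncated rows
        iface, dest, gateway, mask = fields[0], fields[1], fields[2], fields[7]
        # A default route has destination 0.0.0.0 and netmask 0.0.0.0.
        if dest == "00000000" and mask == "00000000":
            # Convert the little-endian hex value to dotted-quad form.
            gw = socket.inet_ntoa(struct.pack("<L", int(gateway, 16)))
            routes.append((iface, gw))
    return routes
```

On the server this could be called with the contents of /proc/net/route,
for example default_routes(open("/proc/net/route").read()).  With the
layout above I would expect a single entry, on eth1; more than one entry,
or an entry on eth0 or eth2, would be worth reporting.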

It also seems that when I log on as the "cluster user" or root and try to
access an external website (e.g. Google), the server locks up and needs
to be rebooted again.

Any idea why I'm experiencing these lockups?  Is something configured
incorrectly?  Is it a faulty network card?  I was able to access outside
websites fine before I ran the setup scripts.

Thanks in advance for your help.


-Timo Mechler

Timo R. Mechler
mechti01 at luther.edu

