[Beowulf] Clic 2.0 lockup problems

Reuti reuti at staff.uni-marburg.de
Wed Oct 27 11:28:11 PDT 2004


Hi,
 
> I just finished installing Clic 2.0 on a cluster of 1 server and 12 nodes.
>  After running the setup_auto_cluster script I got everything installed. 
> I created a "cluster user" and proceeded to test out some of the included
> mpi sample code.  This ran fine.  I next tried to start this code remotely
> (through SSH), but when I did this, the server locked up and had to be
> rebooted.  It actually locked up while connecting via SSH, not when
> executing the sample mpi code.  Any idea what might cause this?  The
> server has 3 network interfaces:
> 
> eth0 - administration
> eth1 - outside (internet)
> eth2 - message passing (computing)
> 
> (The nodes each have 2 interfaces, one for administration and one for
> message passing)
> 
> It also seems that when I log on as the "cluster user" or root, and try to
> access an external website (e.g. google), the server will lock up and need
> to be rebooted again.
> 
> Any idea why I'm experiencing these lockups?  Is something configured
> incorrectly?  Is it a faulty network card?  I was able to access outside
> websites fine before I ran the setup scripts.

Have you tried logging in to the server on each of the three interfaces, to
check whether it is really completely down and not just one interface? - Reuti
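
One way to run that check from a second machine is sketched below. It is only
a minimal sketch: the three addresses are made-up placeholders for eth0, eth1
and eth2 (substitute the real ones), and port 22 assumes sshd listens on the
default port on every interface. A successful TCP connect only shows that the
network stack on that interface still answers, not that a login would succeed,
so it is worth checking the console as well.

```python
import socket

# Hypothetical addresses for the server's three interfaces.
# These are placeholders, not values from the original post.
INTERFACES = {
    "eth0 (administration)": "192.168.0.1",
    "eth1 (outside)": "203.0.113.10",
    "eth2 (message passing)": "10.0.0.1",
}


def port_open(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False


def check_interfaces(interfaces, port=22, timeout=3.0):
    """Probe every address and return a {name: reachable} mapping."""
    return {name: port_open(addr, port, timeout)
            for name, addr in interfaces.items()}


if __name__ == "__main__":
    for name, ok in check_interfaces(INTERFACES).items():
        print(f"{name}: {'reachable' if ok else 'no response'}")
```

If only one interface stops answering while the others still respond, the
machine is probably not hung as a whole, and the routing or firewall setup on
that interface is the next place to look.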


