[Beowulf] Clic 2.0 lockup problems
reuti at staff.uni-marburg.de
Wed Oct 27 11:28:11 PDT 2004
> I just finished installing Clic 2.0 on a cluster of 1 server and 12 nodes.
> After running the setup_auto_cluster script I got everything installed.
> I created a "cluster user" and proceeded to test out some of the included
> mpi sample code. This ran fine. I next tried to start this code remotely
> (through SSH), but when I did this, the server locked up and had to be
> rebooted. It actually locked up while connecting via SSH, not when
> executing the sample mpi code. Any idea what might cause this? The
> server has 3 network interfaces:
> eth0 - administration
> eth1 - outside (internet)
> eth2 - message passing (computing)
> (The nodes each have 2 interfaces, one for administration and one for
> message passing)
> It also seems that when I log on as the "cluster user" or root, and try to
> access an external website (e.g. google), the server will lock up and need
> to be rebooted again.
> Any idea why I'm experiencing these lockups? Is something configured
> incorrectly? Is it a faulty network card? I was able to access outside
> websites fine before I ran the setup scripts.
Have you tried to log in to the server on each of the three interfaces, to
check whether it's really completely down and not just one interface? - Reuti
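
One way to run that check without logging in by hand each time is to probe the SSH port on each of the server's addresses. The following is only a minimal sketch: the three IP addresses are placeholders (the original post does not give them), and the names SERVER_ADDRESSES, port_open and check_interfaces are made up for illustration. It assumes sshd listens on the default port 22 on every interface.

```python
import socket

# Hypothetical addresses: replace each with the server's real IP on that network.
SERVER_ADDRESSES = {
    "eth0 (administration)": "192.168.1.1",
    "eth1 (outside)": "203.0.113.10",
    "eth2 (message passing)": "192.168.2.1",
}


def port_open(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False


def check_interfaces(addresses, port=22, timeout=3.0):
    """Probe every address and return a {interface name: reachable} mapping."""
    return {name: port_open(host, port, timeout) for name, host in addresses.items()}
```

Calling check_interfaces(SERVER_ADDRESSES) right after a "lockup" and printing the result shows which addresses still answer. The administration and message-passing networks are probably only reachable from a node, so the probe would have to be run from a machine attached to each network. If one interface still responds, the server is not completely down and the problem is more likely in the network configuration than in the hardware.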