[Beowulf] compute node reboots with bproc/beo tools
V D vipuld at gmail.com
Sun Aug 29 22:11:49 PDT 2004
Hi folks,

I have a 5-node Beowulf cluster with 4 identical "compute" nodes (IDE disk, VIA processor, etc.) and 1 "master" node (more RAM, more powerful VIA processor), connected by an unmanaged Ethernet switch.

When I run either Scyld 28cz7 (bproc version 3.1.9) or ClusterMatic 4 (bproc version 4.0.0pre3) with the associated beoserv/beoboot tools on the master node, only 2 of the 4 identical compute nodes come up and stay up in the cluster. The other 2 nodes reboot every 2-6 minutes, either during node_up (apparently during an insmod/bpsh of some module or library) or after they have come up.

These 2 nodes stay up fine if I boot them from the on-disk Linux image with networking enabled. As soon as I use the beo tools to control booting from the master node, however, they show this strange reboot behavior, and the master notices the lost connection soon after.

The hardware is relatively new (I guess in this case only the CPU, RAM and NIC really matter), and the BIOS RAM tests succeed every time. The OS images are downloaded via PXE/beoboot and the phase 2 image boots fine. The strange thing is that it is always the same 2 physical compute nodes that fail in this way, under both software systems. I have stripped the config and fstab scripts for the compute nodes down to the bare minimum.

Has anyone seen such behavior before? Any hints on how to debug this problem? Any help in turning my current 3-node cluster into the full 5-node cluster would be greatly appreciated!

Thanks,
V
