[Beowulf] first cluster

madskaddie at gmail.com madskaddie at gmail.com
Sat Jul 24 11:18:15 PDT 2010

I manage a small cluster with a central image for the execution hosts
(fully decoupled from the master/login nodes). To deal with direct
access to nodes:
 - Every user has an "*" in the password field of the /etc/shadow file
in the execution host images
 - ssh access to the exec hosts is enabled to work only with
passwords (no certificate files)
 - Direct access to nodes goes through Grid Engine's (GE) qrsh
 - MPI runs via GE parallel environments
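
As a rough illustration of the first point, here is a small Python
sketch that checks which entries in shadow-format text have "*" as the
password field. The function name and the sample entries are made up
for the example; in practice the text would come from the /etc/shadow
inside the execution host image, wherever that lives on your setup.

```python
def locked_status(shadow_text):
    """Map each user name to True if its password field is exactly "*"."""
    status = {}
    for line in shadow_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # shadow format: name:password:lastchange:min:max:warn:inactive:expire:
        fields = line.split(":")
        if len(fields) < 2:
            continue  # not a shadow-style entry, ignore it
        status[fields[0]] = (fields[1] == "*")
    return status


# Invented sample entries, only for illustration.
SAMPLE = """\
alice:*:14814:0:99999:7:::
bob:$6$salt$hash:14814:0:99999:7:::
"""

if __name__ == "__main__":
    for user, locked in sorted(locked_status(SAMPLE).items()):
        print(user, "ok" if locked else "password field is not '*'")
```

Running it over the image's shadow file after each image rebuild would
flag any account that slipped through with a usable password field.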

Things to be solved:
 - Monitoring of resource usage: right now it is only possible to
query by using GE qhost or by looking at Ganglia, but the latency is
quite high :/ (anything above instantaneous is high latency)
 - Administration can be boring sometimes because I need to type the
password. I'll study a bit of PAM rules to bypass it, or learn the Tcl
Expect tool (or equivalent libs in other languages)
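
For the monitoring point, a minimal Python sketch of scripting the
qhost query. It assumes the usual tabular qhost output (a header line
with HOSTNAME and LOAD columns, a dashed separator, one row per host);
the sample text and the helper names are invented for the example, and
the column set may differ between GE versions, which is why the header
is read rather than hard-coded. This only makes the query scriptable;
the numbers are still only as fresh as GE reports them.

```python
import subprocess


def parse_qhost(text):
    """Turn tabular qhost output into a list of {column: value} dicts."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0].split()
    hosts = []
    for ln in lines[1:]:
        if set(ln.strip()) == {"-"}:
            continue  # dashed separator line under the header
        values = ln.split()
        if len(values) != len(header):
            continue  # unexpected row shape, skip rather than guess
        row = dict(zip(header, values))
        if row.get("HOSTNAME") == "global":
            continue  # pseudo-host row, not a real node
        hosts.append(row)
    return hosts


def load_of(row):
    """Return the LOAD column as a float, or None if it is missing or '-'."""
    try:
        return float(row["LOAD"])
    except (KeyError, ValueError):
        return None


def run_qhost():
    """Call the real qhost binary (must be on PATH) and return its output."""
    return subprocess.run(["qhost"], capture_output=True, text=True,
                          check=True).stdout


# Invented sample output, only for illustration.
SAMPLE = """\
HOSTNAME   ARCH        NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-----------------------------------------------------------------
global     -              -     -       -       -       -       -
node01     lx24-amd64     8  0.52   15.7G    2.1G    2.0G     0.0
node02     lx24-amd64     8     -   15.7G       -    2.0G       -
"""

if __name__ == "__main__":
    for host in parse_qhost(SAMPLE):
        print(host["HOSTNAME"], load_of(host))
```

On the master node, parse_qhost(run_qhost()) would replace the sample
text.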

Gil Brandão

