[Beowulf] number of admins

David Kewley kewley at gps.caltech.edu
Mon Jun 6 15:23:19 PDT 2005

Hi all,

We expect to get a large new cluster here, and I'd like to draw on the 
expertise on this list to educate management about the personnel needs.

The cluster is expected to be:

~1000 Dell PE1850 dual CPU compute nodes
master & other auxiliary nodes on similar hardware
1024-port Myrinet
Nortel stacked-switches-based GigE network
many-TB SAN built on Data Direct & Ibrix
Platform Rocks
Platform LSF HPC Rocks roll
Moab added later, quite possibly
tape library backup (software TBD)
NFS service to public workstations
9 man-weeks of Dell installation support
10 man-days of Ibrix installation support

The users will be something like:

~10 local academic groups, perhaps 60 users total
several different locally-written or -customized codebases
at least one near-real-time application with public exposure

We have some experience already with a 160-node Dell cluster that has 
some of the basic elements listed above, but several of the pieces will 
be totally new, and some of the pieces we already have will need 
greater care.

My questions to you are:

* How many sysadmins should we plan to have once the cluster is stable?
* Is there indeed any such thing as a "stable" cluster of this sort, and 
if so, should we get additional help during the initial phase of the 
project, when things are less stable (help beyond the vendor 
installation support listed above)?
* If we need more help in the initial phases, how might we go about 
finding people?  Contract workers?  Commercial or private 
* Should we look for any specific non-obvious skillset, or would skilled 
sysadmins be adequate?

And finally:

* If we only have one sysadmin, someone who is bright and capable, but 
is learning as they go, is that too small a support staff?
* If one such sysadmin is not enough, what would you expect the 
impact on the users to be?

I have been giving my opinion to management, but I'd really like to get 
(relatively unbiased) professional opinions from outside as well.  I 
thank you for any comments you can make!


