[Beowulf] number of admins
Brian D. Ropers-Huilman bropers at cct.lsu.edu
Wed Jun 8 10:23:19 PDT 2005
- Previous message: [Beowulf] number of admins
- Next message: [Beowulf] number of admins
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

A 1,024 node cluster is sizable. However, given that you are running Rocks, it is likely that you can get by with one sysadmin, so long as they know their way around the HPC world. But I wouldn't stop there. I would augment that sysadmin with what I term a Scientific Computing support person, who could help with administration but is mostly responsible for the software stack and for optimizing communications and storage for the users.

I have a 512 node Linux cluster, a 128 node Linux cluster, a 32 node Apple cluster, a 16 node Linux cluster, and a new 32 processor Altix/Prism, which is on the way. I have a total of 3 sysadmins and currently only 1 Sci. Comp. support person, though I still have an opening there. I also have an opening for a true Help Desk support person to handle the more mundane aspects of serving users' needs on these systems. This staff also works with ~5 undergraduate student workers, though that number could be smaller or further augmented by a couple of good graduate students.

Your 1,024 node system will likely never run a 2,048 processor job, other than your initial HPL (High-Performance Linpack) run if you pursue that. I say this because there _will_ be hardware issues: with that many nodes, some will usually be down at any given time. I do not have experience with Dell's HPC systems, but I know George Jones is doing a heck of a job getting them out there, so I have to believe they work well. In terms of the Myrinet and other software, yes, the system should be quite stable given today's software stacks.

You ask about non-obvious skill sets. I would bring in someone who's good at scripting, which is not necessarily something a sysadmin will have. Any sysadmin will be able to do some level of scripting, but you'll want someone who is quite skilled in this area. This person can help you automate processes on the system such as name space management, additional usage reports, disk scrubbers, automatic documentation of the installed software, and the like.
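As one illustration of the kind of automation mentioned above, a disk scrubber might look like the minimal sketch below. It is not the script used at CCT (which, per this message, relies on PHP and LDAP tooling); it is a hypothetical Python example that assumes a shared scratch area and a retention policy in days, and treats a file as stale when both its access and modification times are older than the cutoff. The function names `find_stale_files` and `scrub` are invented for illustration.

```python
import os
import time


def find_stale_files(root, max_age_days, now=None):
    """Return sorted paths of files under `root` whose access and
    modification times are both older than `max_age_days`."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day
    stale = []
    # os.walk does not descend into symlinked directories by default
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat: judge a symlink by itself, never follow it
                st = os.lstat(path)
            except FileNotFoundError:
                continue  # a user deleted it while we were scanning
            if max(st.st_atime, st.st_mtime) < cutoff:
                stale.append(path)
    return sorted(stale)


def scrub(root, max_age_days, dry_run=True):
    """Report (dry run) or delete stale files; returns the stale list."""
    stale = find_stale_files(root, max_age_days)
    for path in stale:
        if dry_run:
            print(f"would remove {path}")
        else:
            try:
                os.remove(path)
            except FileNotFoundError:
                pass  # already removed since the scan
    return stale
```

A cautious way to deploy something like this is to run `scrub("/scratch", 30)` from cron in dry-run mode first (the `/scratch` path and 30-day window are hypothetical), review the report, and only then pass `dry_run=False`. Note that on a filesystem mounted with `noatime`, reads do not update access times, so a file that is only ever read would still look stale to this check.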
We do everything via LDAP and have a series of command-line PHP scripts for managing user space and other things. I'd be willing to talk more off-line too if you're interested.

David Kewley said the following on 2005.06.06 17:23:
> Hi all,
>
> We expect to get a large new cluster here, and I'd like to draw on the
> expertise on this list to educate management about the personnel
> needed.
>
> The cluster is expected to be:
>
>   ~1000 Dell PE1850 dual CPU compute nodes
>   master & other auxiliary nodes on similar hardware
>   1024-port Myrinet
>   Nortel stacked-switches-based GigE network
>   many-TB SAN built on Data Direct & Ibrix
>   Platform Rocks
>   Platform LSF HPC Rocks roll
>   Moab added later, quite possibly
>   tape library backup (software TBD)
>   NFS service to public workstations
>   nine man-weeks of Dell installation support
>   10 man-days of Ibrix installation support
>
> The users will be something like:
>
>   ~10 local academic groups, perhaps 60 users total
>   several different locally-written or -customized codebases
>   at least one near-real-time application with public exposure
>
> We have some experience already with a 160-node Dell cluster that has
> some of the basic elements listed above, but several of the pieces will
> be totally new, and some of the pieces we already have will need
> greater care.
>
> My questions to you are:
>
> * How many sysadmins should we plan to have once the cluster is stable?
> * Is there indeed any such thing as a "stable" cluster of this sort, and
>   if so, should we get additional help during the initial phase of the
>   project, when things are less stable (help beyond the vendor
>   installation support listed above)?
> * If we need more help in the initial phases, how might we go about
>   finding people? Contract workers? Commercial or private
>   consultancies?
> * Should we look for any specific non-obvious skillset, or would skilled
>   sysadmins be adequate?
>
> And finally:
>
> * If we only have one sysadmin, someone who is bright and capable, but
>   is learning as they go, is that too small a support staff?
> * If one such sysadmin is too little, then what would you expect the
>   impact on the users to be?
>
> I have been giving my opinion to management, but I'd really like to get
> (relatively unbiased) professional opinions from outside as well. I
> thank you for any comments you can make!
>
> David
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

- -- 
Brian D. Ropers-Huilman .:. Asst. Director .:. HPC and Computation
Center for Computation & Technology (CCT)     bropers at cct.lsu.edu
Johnston Hall, Rm. 350                        +1 225.578.3272 (V)
Louisiana State University                    +1 225.578.5362 (F)
Baton Rouge, LA 70803-1900  USA               http://www.cct.lsu.edu/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (Darwin)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFCpymGwRr6eFHB5lgRAs+DAKCUyh4nBq5AecBpqlQLNu/cEsn2RACg+Vwq
aXqIFfr70DqO/40lOyQl93E=
=8lgx
-----END PGP SIGNATURE-----
