[Beowulf] Why Do Clusters Suck?

Stuart Midgley stuart.midgley at anu.edu.au
Tue Mar 22 15:34:08 PST 2005

On 22/03/2005, at 7:36, Douglas Eadline - ClusterWorld Magazine wrote:
> So why do clusters suck?

From my position, this issue is really complex.  In the Australian
scene, the main reason "clusters suck" has nothing to do with distros, 
hardware or associated software.  It is more an issue with support 
staff.  It is easy to buy hardware and software and to download a distro.
However, it is very difficult to get good support staff.

Clusters, by their nature and design, are not simple beasts.  When 
everything is running well, you can manage them with almost no staff.  
However, when something goes wrong the diagnostic/resolution cycle can 
be long and very complex.

An error in an MPI program could come from the actual user code, the MPI
layer, the system software, the interconnect, a hardware failure,
or any combination of these.  Getting good staff to understand and
handle all these layers is difficult.  Spending $100k will get you a 
reasonable sized cluster on the floor within a few weeks, which will 
last, say, 3 years.  Yet on the staffing side, $100k doesn't even get a
good system administrator for a single year.  And a system
administrator is not always what is required: they may not have a good
understanding of MPI, the applications, etc.

How to make clusters less sucky?  Well, for the users and system
administrators of a large cluster, decent training would be a good start:
training which takes people through the process of building,
installing, breaking and fixing a cluster.  Of course, then there is
the MPI/application side of things, which would be another course.  Try
to wrap 10 years' worth of system/computational experience up into a
5-day course ;)


   Dr Stuart Midgley                   |  stuart.midgley at anu.edu.au
   Supercomputer Facility              |  smidgley at netspace.net.au
   Leonard Huxley Building 56          |  +61 (0)2 6125 5988   Work
   Australian National University      |  +61 (0)2 6125 8199   Fax
   CANBERRA   ACT   0200               |  +61 (0)4 1125 2488   Mob
