cluster frustrations

Peter Lindgren Peter.Lindgren at
Wed Jan 16 07:07:26 PST 2002

Last year I began trying to port some of my applications to a cluster. Those applications are very partitionable, so it seemed like a great opportunity. I did a lot of Internet research looking for easy solutions for building clusters. Not having found any compendium sites yet, I did it the hard way, slowly building up a list of candidate packages. Scyld was pretty easy to find and also easy to get from LinuxCentral or Cheapbytes. I discovered Oscar and Rocks and SCE and IBM's CSK, and downloaded them all. I studied an MPI primer and adapted my code for a cluster. I wanted to install different cluster systems, try my particular application on each, and really see which one worked best for me.

I got 10 PCs diverted to the attempt. Our networking people furnished a switch. A guy from our help desk hooked them all up. Then I was on my own...

BUT, I'm not a Unix/Linux administrator. Even though I've installed and played with Linux a number of times on workstations both at home and at work, I've had a lot of trouble getting the cluster working.  I've learned a lot and had some minor successes, but it still just seems too hard.

I found the mailing lists for some of these packages. I've followed them with interest and have sometimes gotten perfect to-the-point help (and sometimes no response at all). I got Scyld Beowulf running (with occasional help from a couple of Unix admins as well as from some guys on the Beowulf list), enough to show that my application could work on a cluster. Each time I tried to install another system or a later version, however, I had more problems. Right now, things aren't stable and my application often bombs before finishing, although it worked before.

I've tried to install Rocks a number of times. I got through (once) to where the compute nodes were up, but I haven't been able to get the latest version to work yet. In fairness, I haven't tried contacting their list even though they seem willing to help - I'm just too discouraged or shy, I guess.

It sure doesn't look like I will get multiple systems installed to do my comparisons. I haven't been able to find any published reviews or comparisons either.

So here are my pleas: 

USER community: has anyone independently tested these systems, paying particular attention to their ease of installation and configuration by those who aren't Unix experts? I'm sure the various groups are TRYING to make installation/configuration as simple as possible, but how far have they gotten?

DEVELOPER community: There are potential users out there who would benefit from cluster computing, but who aren't Unix experts themselves, and don't have such an available expert on staff. I'm not saying a completely non-technical user should be able to do this, but how about a reasonably intelligent engineer/scientist/programmer?

EVERYONE: should I:
just stop expecting to be able to do this myself as a non-admin?
stop expecting such systems will just work when you put in the CD?
get used to banging my head on the wall for a few days or weeks?
get over my reluctance to keep asking for help on the lists?
get an admin devoted to my project?

A reference showing how many OTHER people can manage to install clusters:
proving I must be the village idiot.

Some references with links to multiple systems:

Papers by individual project teams that discuss other projects:

P.S. The best analogy I've made to how this feels is attempting to fix your own car. It seems promising; you've maybe done a few things in the past that worked. Now you've gotten into big trouble. You really ought to just take it in to your mechanic and hope they can fix it (but it's so embarrassing!). And really, you're so mad you just want to take it out on the car - maybe push it off a nearby cliff...
