cluster frustrations

Patrick Geoffray patrick at
Wed Jan 16 16:35:43 PST 2002


Joachim Worringen wrote:
> But they don't get it to run reliably with
> the current Linux/GM/MPICH versions which of course should run faster,
> better, nicer. I don't blame Linux or Myrinet for these problems -

Obviously, you do. Inciting another flame war ?

I have searched in the log of the Myricom support and I found one 
help ticket from Ulrich Detert (help ticket #7197, Wed Jun 13 15:07:59 
2001) with the configuration that you describe and a piece of MPI code 
supposed to trigger a malfunction. This code was run the same day with 
recent GM and MPICH-GM releases and showed no problem whatsoever. I 
tried it a few minutes ago with the current software, and again no 
problem. The help ticket was closed Fri Sep 14 10:13:28 2001 with no 
reply from the customer.
So if you really experienced problems with this machine, please 
contact help at, this is the first step toward happiness.

> just want to show that even people capable of running Crays, SP-2s,
> Paragon, any kind of workstatons etc. have a hard time setting up and
> maintaining a Linux cluster. And the next update is usually the next

You cannot compare Crays/SP2 with do-it-yourself Linux clusters. When 
you buy a Cray or an SP(2,3), you get a machine that experts build for you, 
you get software that experts install for you, and you often get someone 
on-site to hold your hand for the first month or even for the life of the 
machine. The only problem is that you pay a lot for that.

Linux clusters are not easy to install; it's wrong to believe they are. 
Having access to the Myricom support archive, I can tell you that a 
large number of problems are related to customers trying the 
do-it-yourself way, with no cluster experience, only Windows background, 
who do not know exactly what a kernel is or how to compile one, who 
believe that Redhat is pure Linux and have never heard about the MPI 
Not surprisingly, customers using a third party, either a big vendor 
like IBM or a small one like many people on this list, where people 
know what they are doing, usually have a much smoother experience. 
But you still have to pay a little for that.

So the do-it-yourself way ? Why not, but if it fails, call the mechanics.


|   Patrick Geoffray, Ph.D.      patrick at 
|   Myricom, Inc.      
|   Cell:  865-389-8852          685 Emory Valley Rd (B)
|   Phone: 865-425-0978          Oak Ridge, TN 37830
