cluster frustrations
Joachim Worringen joachim at lfbs.RWTH-Aachen.DE
Thu Jan 17 00:12:24 PST 2002
Patrick Geoffray wrote:
>
> Joachim,
>
> Joachim Worringen wrote:
> > But they don't get it to run reliably with
> > the current Linux/GM/MPICH versions which of course should run faster,
> > better, nicer. I don't blame Linux or Myrinet for these problems -
>
> Obviously, you do. Inciting another flame war ?

No, I never intend to incite flame wars, only discussions. I can tell you a lot of stories about malfunctioning self-made SCI clusters, but I have no hands-on experience with such a cluster being operated in a similar (production) environment, because such customers usually choose Scali-made systems. And I prefer to talk about hands-on experience, not second-hand stories.

The Scali-equipped systems I know of run well now, although this hasn't always been the case (mostly due to bugs/strange features in the last generation of hardware, LC2). But Scali systems, to stick with these, are well-defined platforms running qualified kernels etc.; not using such kernels is one source of problems.

[...]

> So if you really experienced problems with this machine, please
> contact help at myri.com, this is the first step toward happiness.

I had reproducible application aborts when running PMB with 32 processes. I informed Ulrich Detert about this, and he confirmed the problems. Up to now, they have stuck with 2.2 (which runs stably, but not as fast as it could). This does *not* mean that such a system wouldn't work with 2.4 and current GM; it's only that these guys tried to find that "golden configuration" during their update (or by chance hit the one dirty configuration) and didn't succeed.

Once again: I don't doubt that there are Myrinet systems which run perfectly. There may just be a lot of opportunities (with self-made clusters in general) to make mistakes that hinder stable operation.

> You cannot compare Crays/SP2 with do-it-yourself Linux clusters.

Exactly. Paying less money means investing more time. Which may be equivalent to money.
  Joachim

-- 
Joachim Worringen
Lehrstuhl fuer Betriebssysteme, RWTH Aachen
http://www.lfbs.rwth-aachen.de/~joachim
fon: ++49-241-80.27609 fax: ++49-241-80.22339
More information about the Beowulf mailing list
