[Beowulf] Repeated Dell SC1435 crash / hang. How to get the vendor to resolve the issue when 20% of the servers fail in first year?

Rahul Nabar rpnabar at gmail.com
Mon Apr 6 09:48:03 PDT 2009

On Mon, Apr 6, 2009 at 7:46 AM, Chris Samuel <csamuel at vpac.org> wrote:
> Even though we could reproduce it on 64-bit Debian
> and 32-bit CentOS they wouldn't escalate the issue
> until we could reproduce it on RHEL5 - which we did
> today.

Thanks for sharing the anecdote, Chris. I wonder if there is any clause
in the contracts restricting us to running certain OSes.

So long as we are using a "reputable", well-tested OS, I find it unfair
that the vendors engage in so much arm-twisting. Is there any
scientific evidence that the core kernels of Debian or Fedora or CentOS
(that *is* essentially RHEL, isn't it?) are any less reliable than
RHEL's? What are the distros of choice in the Beowulf community?
Just getting a feel.

I have tried out cutting-edge distros meant for scientific
applications, like Scientific Linux or Compute Node Linux, but I've found
it more practical to stick to a larger, well-used distro. No doubt I
might take some performance cuts on the benchmarks, but the simple
reality of a larger user community out there makes it easier to debug
stuff and get well-tested apps and code that will run on my distro
out of the box.


