[Beowulf] Repeated Dell SC1435 crash / hang. How to get the vendor to resolve the issue when 20% of the servers fail in first year?

Gerry Creager gerry.creager at tamu.edu
Mon Apr 6 12:51:41 PDT 2009

Dell does have experience... and contracts... with RH for RHEL.  They do 
not recognize CentOS officially.

Rahul Nabar wrote:
> On Mon, Apr 6, 2009 at 7:46 AM, Chris Samuel <csamuel at vpac.org> wrote:
>> Even though we could reproduce it on 64-bit Debian
>> and 32-bit CentOS they wouldn't escalate the issue
>> until we could reproduce it on RHEL5 - which we did
>> today.
> Thanks for sharing the anecdote Chris. I wonder if there is any clause
> in the contracts restricting us to running certain OSes.

Dell have told us they won't support us or accept reports of problems 
running anything but RHEL on our cluster.  We therefore have one admin 
who has been brainwashed into believing RHEL is actually spelled 
"CentOS".  He calls in hardware problems and can tell them we're running 
what he believes is RHEL...

> So long as we are using a "reputable" well-tested OS I find it unfair
> that the vendors engage in so much arm-twisting. Is there any
> scientific evidence that the core kernels of Debian or Fedora or CentOS
> (that *is* essentially RHEL, isn't it?) are any less reliable than
> RHEL? What is / are the distros of choice in the Beowulf community?
> Just getting a feel.

I tend to run CentOS.  I also tend to upgrade packages on my cluster no 
more frequently than annually. But that's often too frequent, so once 
every two years or so would seem closer to accurate.  After I get it 
working, I want the compute nodes stable, not necessarily current.

> I have tried out cutting edge distros meant for scientific
> applications like ScientificLinux or ComputeNodeLinux but I've found
> it more practical to stick to a larger, well-used distro. No doubt I
> might take some performance cuts on the benchmarks but the simple
> reality of a larger user community out there makes it easier to debug
> stuff and get well-tested apps and code that will run on my distro
> out of the box.

SL isn't much different from CentOS save a couple of extra packages that 
make its core users' lives a little less interesting after initial 
install by putting things in place they're most likely to need. 
Otherwise, they look a lot like CentOS or any other stable Enterprise 

