[Beowulf] Building new cluster - estimate

Bill Broadley bill at cse.ucdavis.edu
Tue Jul 29 10:14:07 PDT 2008

Bogdan Costescu wrote:
> On Tue, 29 Jul 2008, Chris Samuel wrote:
>> 1) Use a mainline kernel, we've found benefit of that
>> over stock CentOS kernels.
> Care to comment on this statement ?

2.6.18 (RHEL-5.2) is currently almost 2 years old.  One improvement since then 
that I use heavily is ECC scrubbing; I don't like to have RAID arrays without 
it, because silent errors can accumulate otherwise.  The age of the kernel has 
also created an ugly nest of backports inside and outside of Red Hat.  So 
things like sky2 gigE adapters are ugly to support (they don't have a driver 
disk), and are especially hard to fix when you have to modify the installer 
(CD or PXE) to make them work.  I've seen similar problems with Intel e1000s 
(which are always changing), InfiniPath, Areca cards, etc.
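
As a rough illustration of the scrubbing mentioned above: if the mechanism in 
question is the consistency check of Linux software RAID (md), which is an 
assumption since the post does not say which mechanism is used, a check can be 
requested through the md sysfs files.  The sketch below is minimal; the 
function names and the sysfs_root parameter are made up for illustration, and 
only the sync_action and mismatch_cnt files belong to the md interface.

```python
from pathlib import Path


def start_md_check(array: str, sysfs_root: str = "/sys/block") -> str:
    """Request a consistency check ("scrub") of an md array such as "md0".

    Returns the action that was reported before the request, so the caller
    can tell whether a check was actually started ("idle") or whether the
    array was already busy (for example "resync" or "check").
    """
    action_file = Path(sysfs_root) / array / "md" / "sync_action"
    previous = action_file.read_text().strip()
    if previous == "idle":
        # Writing "check" asks md to read every stripe and count mismatches
        # without rewriting any data.
        action_file.write_text("check\n")
    return previous


def mismatch_count(array: str, sysfs_root: str = "/sys/block") -> int:
    """Return the mismatch counter md reports for the array."""
    counter_file = Path(sysfs_root) / array / "md" / "mismatch_cnt"
    return int(counter_file.read_text().strip())
```

Run as root, start_md_check("md0") would kick off a check of /dev/md0, and 
mismatch_count("md0") can be read once the array reports idle again.  Whether 
these files are present depends on the kernel and on the array being an md 
device; hardware RAID cards such as the Areca ones use their own tools.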

There have also been tweaks for NUMA, quad-core CPUs, and related hardware.  
I'm guessing that's why, er, one of the largest new clusters went with Fedora 
(TAC?).

In general I'd say that running a new kernel works much better on modern 
hardware than the ugly alternative of downloading a random RPM or waiting for 
official support.  Quite a few companies (ATI, 3ware, Areca, Intel, AMD, and 
many others I'm sure) seem to be trying hard to improve the mainline kernel 
drivers.

I understand why RHEL doesn't change the kernel (stability, testing, etc.), 
but I'm not sure it's the best fit for HPC-type applications, especially with 
the pace of hardware changes these days.

