[Beowulf] RAID for home beowulf

Greg Kurtzer gmkurtzer at gmail.com
Mon Oct 12 11:50:10 PDT 2009


On Sun, Oct 11, 2009 at 1:29 PM, Nifty Tom Mitchell
<niftyompi at niftyegg.com> wrote:
> On Mon, Oct 05, 2009 at 12:02:11PM +0200, Tomislav Maric wrote:
>> Nifty Tom Mitchell wrote:
>> > On Sun, Oct 04, 2009 at 01:08:27PM +0200, Tomislav Maric wrote:
>> >> Mark Hahn wrote:
>> >>>> I've seen Centos mentioned a lot in connection to HPC, am I making a
>> >>>> mistake with Ubuntu??
>> >>> distros differ mainly in their desktop decoration.  for actually
>> >>> getting cluster-type work done, the distro is as close to irrelevant
>> >>> as imaginable.  a matter of taste, really.  it's not as if the distros
>> >>> provide the critical components - they merely repackage the kernel,
>> >>> libraries, middleware, utilities.  wiring yourself to a distro does
>> >>> affect when you can or have to upgrade your system, though.
>> >>>

I seem to not have the original sources to this thread, but this is
something that I thought I should chime in on.

The underlying components that make up a distribution are in fact an
important part of an HPC system as a whole. There are many
reasons for this, but I will focus on just a few that I hope don't
strike too much of a religious chord with people while at the same
time letting me rant a bit. ;-)



1) HPC people are quite familiar with building their scientific apps
with optimized compilers and libraries. If an application is linking
against any OS libraries (yes, including the C library) it would
probably make sense to make sure those have been compiled with an
optimal build environment. Most distributions do not do this, because on a
single standalone system the results may or may not even be noticeable. I
have been part of large benchmark projects to evaluate the differences
of the distributions. In a nutshell, differences become more obvious
at scale.
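
To see which OS libraries a given binary actually pulls in, something
like the following sketch can help. It assumes only the standard ldd
and awk tools; the function name resolved_libs and the APP variable are
made up for illustration, and /bin/ls is just a stand-in for a real
application binary.

```shell
#!/bin/sh
# Sketch: list the shared libraries a binary resolves at run time.
# resolved_libs and APP are hypothetical names used for illustration.

# Reads ldd output on stdin and prints only the resolved library paths.
# ldd lines look like: "libc.so.6 => /lib/.../libc.so.6 (0x...)"
resolved_libs() {
    awk '$2 == "=>" && $3 ~ /^\// { print $3 }'
}

# /bin/ls is a stand-in; point APP at your own application binary.
APP="${APP:-/bin/ls}"

# Only run ldd where it exists (it is not available on every platform).
if command -v ldd >/dev/null 2>&1; then
    ldd "$APP" | resolved_libs
fi
```

Paths that land under the system library directories most likely come
from the distribution's own build rather than from your optimized
toolchain.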

2) Distributions focused on non-HPC targets may not include tools,
libraries or even functions that would be beneficial for HPC. And
when they are included, it may not be in a way that makes them very
usable. Just because a distribution contains a package does not mean
that is what people should use. For example, using a
distribution-supplied version of Open MPI would be an injustice to the
majority of cluster users, but many distributions consider themselves
HPC ready because they have some HPC-capable libs.

It is more important to have a solution for creating a suitable HPC
environment. For example, in Caos NSA the core OS is RPM based but we
also utilize a source based "ports-like" tree for building scientific
packages, and then we integrate with Environment Modules to make them
available for the users. So one can do:

# cd /usr/src/cports/packages/openmpi/1.3.3
# make install COMPILERS=intel
# make clean
# make install COMPILERS=gcc
# su - user
$ module load openmpi/1.3.3-intel
$ mpicc -show
icc ........
$ module unload openmpi
$ module load openmpi/1.3.3-gcc
$ mpicc -show
gcc ........
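
The check at the end of that session can be scripted. This is only a
rough sketch: wrapper_compiler is a made-up helper name, and it assumes
that the first word printed by "mpicc -show" is the underlying compiler,
as in the output above.

```shell
#!/bin/sh
# Sketch: report which compiler an MPI wrapper actually invokes.
# wrapper_compiler is a hypothetical helper name, not part of Open MPI
# or Environment Modules.

# Reads `mpicc -show` output on stdin and prints its first word,
# which is the underlying compiler command.
wrapper_compiler() {
    awk 'NR == 1 { print $1 }'
}

# Only query the wrapper if one is on the PATH, for example after a
# `module load openmpi/1.3.3-gcc` as in the session above.
if command -v mpicc >/dev/null 2>&1; then
    mpicc -show | wrapper_compiler
fi
```

Running it once per loaded module is a quick way to confirm that each
build really points at the compiler its name claims.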


3) HPC distributions should be focused on being lightweight and
efficient. Bloat-free, stateless environments are important for keeping
node operating systems quiet and supportive of HPC code. Lightweight
and bloat-free does not mean an ancient and featureless core
environment either.

4) Even the kernel for most distributions is tuned for desktop use,
which means it tries to give the fairest share of CPU time to all
processes (obviously not HPC supportive).

5) Clusters are not worth BETA quality code. Unstable environments
with no long-term plans for upstream support are a ridiculous
solution for anybody trying to build a production environment. The
number of unsuitable solutions that we have had to "rescue" because
they were running Fedora and were totally unmaintainable by the people
that integrated them is just silly.


It really is a shame that religion plays such a big part in what OS
someone would use, because each OS (even non-Linux) is good for
certain things. I can understand wanting to leverage an economy of
scale with a homogeneous environment, but there is a point
where the economy of scale is no longer justified when shoe-horning a
non-suitable solution onto a cluster. Where that line is really
depends on the admins, users, the size of the system, and the
baseline they use to benchmark success.

Just my $0.02 for what it's worth.

Greg


-- 
Greg M. Kurtzer
Chief Technology Officer
HPC Systems Architect
Infiscale, Inc. - http://www.infiscale.com



