[Beowulf] cluster softwares supporting parallel CFD computing

Mark Hahn hahn at physics.mcmaster.ca
Mon Sep 4 11:11:30 PDT 2006


> through 10/100/1000baseT Gigabit Ethernet. Initially we have decided
> to setup cluster of 1 master and 3 compute nodes. Thus totaling 16

I think you should consider making the head node separate - in this case,
I'd go with fewer cores, more disk and perhaps less memory.  you _can_
get away without a distinct head (ie, also run compute jobs on the login
node), but this is likely to be a problem unless logins are very rare.

fundamentally, I think of cluster nodes as having several distinct roles:

1. compute nodes.  of course!  these should get the best cpu and memory
resources.  there's no reason they can't be tuned quite differently as well - 
for instance, 100 Hz scheduler ticks might make sense; certainly fewer 
daemons, tweaked sysctls, etc (see the sketch after this list).

2. login node(s).  login workloads are quite different from compute
workloads, and interfere with them badly.  you don't really want to trash
your CFD performance just because some of the worker processes happen to be 
competing with someone running gcc or top.

3. admin node(s).  if you can, it's quite nice to have separate admin nodes,
which are not available for user login.  this can probably lead to some 
security and/or stability benefits, as well.  functionally, it could be wise
to split up further into fileservice, DB-service, monitoring and scheduling,
though not all of these want or need distinct nodes.
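
to make the compute-node tuning concrete, here's the flavor I mean - just a
sketch, assuming a Red Hat-style distro; the service names and sysctl values
are examples to adapt, not a recipe:

    # slim down a compute node: turn off daemons it doesn't need
    chkconfig cups off         # no printing on compute nodes
    chkconfig sendmail off     # no local mail daemon
    # fewer timer interrupts is a kernel build option, not a sysctl:
    #   CONFIG_HZ_100=y   (in the kernel .config)
    # runtime sysctls, e.g. larger socket buffers for MPI over GigE:
    sysctl -w net.core.rmem_max=8388608
    sysctl -w net.core.wmem_max=8388608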

> monitoring. I have studied a bit about "scyld".
> Could any of you tell me about that? I am not very convinced by "scyld". I

Scyld is a commercial product based on a pretty elegant concept:
processes can be migrated among compute nodes, and in most ways appear
as if they're running on the head node.  this is pretty unusual for a 
cluster, where your only contact with jobs is normally through the 
queueing system.
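
as a taste of what that looks like in practice (from memory of the bproc
tools Scyld ships - check their docs for exact usage):

    # run a command on compute node 2; output comes back as if it ran locally
    bpsh 2 uptime
    # show the state of all nodes, from the head node
    bpstat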

> I wonder why I see so many terms related to "scyld". Are they all the
> same/integrated in one package, or different ones?

Scyld is a single commercial product which represents a particular kind of
beowulf cluster.  (I use "beowulf" here, not capitalized, to refer to the 
generic concept of off-the-shelf, linux-based compute cluster.)

> SHOULD we use the master node for performing computational work along
> with cluster-handling?

depends.  cluster-handling itself should have trivial overhead, but if you
let users login, compile, etc, the interference can be significant.  it's
often extremely damaging to parallel performance if workers don't get CPU
the moment they need it: in a tightly-coupled code, every timestep runs at
the speed of the slowest worker, so all the others wind up waiting on the
one whose CPU happens to be stolen by a user's gcc/etc.
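
one cheap way to protect the workers if you do share the head node: keep it
out of the MPI machine file, so jobs never land there.  a sketch in classic
MPICH style - the node names and solver binary are made up:

    # machines file (compute nodes only - the head node is deliberately absent):
    #   node1
    #   node2
    #   node3
    mpirun -np 12 -machinefile machines ./cfd_solver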

> Could any of you please suggest a full software list, starting from the
> OS through clustering software, message-passing libraries, Fortran 95
> compilers and mathematical libraries, to fully establish our Beowulf
> cluster for parallel CFD computing. We would be interested in buying
> the software.

if you really, really want to pay, you should pay for someone to build 
and integrate the whole thing for you, not just pay for software.  none of 
these software components need cost anything, though.
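
as an example, one perfectly workable all-free stack (names are common
choices, not endorsements; check current versions):

    # OS:        any mainstream linux distro (CentOS, Debian, SuSE, ...)
    # MPI:       Open MPI, MPICH2 or LAM/MPI
    # fortran:   gfortran or g95 (both speak Fortran 95)
    # math libs: BLAS/LAPACK (ATLAS-tuned), FFTW
    # e.g., building Open MPI from source:
    ./configure --prefix=/opt/openmpi && make all install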


