[Beowulf] recommendations for a good ethernet switch for connecting ~300 compute nodes

Thu Sep 3 13:16:48 PDT 2009

On Wed, Sep 2, 2009 at 11:18 PM, Mark Hahn<hahn at mcmaster.ca> wrote:
>> That brings me to another important question. Any hints on speccing
>> the head-node?
>
> I think you imply a single, central admin/master/head node.  this is a very
> bad idea.  first, it's generally a bad idea to have users on a fileserver.
>  next, it's best to keep cluster-infrastructure
> (monitoring, management, pxe, scheduling) on a dedicated admin machine.
> for 300 compute nodes, it might be a good idea to provide more than one
> login node (for editing, compilation, etc).

To expand on Mark's comment...

I would spec >=2 systems for head/masters and either spread the load
of the required services across them (e.g. management, monitoring and
other sysadmin tasks on one, scheduling on the other) OR put all of
the services on a single master and then run a shadow master for
redundancy. I would not put users on either of these systems.

If you were using Perceus.....

I would either create an interactive VNFS capsule (including compilers,
additional libs, etc.) or make a larger, more bloated compute VNFS
capsule and use that on all of the nodes.

In this scenario, all nodes could run stateless *and* diskful so if
you need to change the number of interactive nodes you can do it with
a simple command sequence:

# perceus vnfs import /path/to/interactive.vnfs
# perceus node set vnfs interactive n000[0-4]

and/or

# perceus vnfs import /path/to/compute.vnfs
# perceus node set vnfs compute n0[004-299]

Have your cake and eat it too. :)
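
If you want to script that split, here is a rough dry-run sketch. It
only prints the "perceus node set vnfs" commands for a given number of
interactive nodes; it does not run them. The function name split_nodes
is made up for illustration, and it assumes nodes are named
n0000..n0299 and that a zero-padded range like n0[000-004] is accepted
the same way as the n0[004-299] form above. Check that against your
own Perceus install before piping the output to a shell.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) the perceus commands that
# split a cluster into interactive and compute nodes.
# Assumptions (not from the original post): node names are n0NNN and
# the bracket range n0[AAA-BBB] works like n0[004-299].

split_nodes() {
    total=$1          # total number of nodes, e.g. 300
    interactive=$2    # how many of them should be interactive

    # Require at least one interactive node and at least one compute node.
    if [ "$interactive" -lt 1 ] || [ "$interactive" -ge "$total" ]; then
        echo "interactive count must be between 1 and total-1" >&2
        return 1
    fi

    last_int=$((interactive - 1))   # index of the last interactive node
    last=$((total - 1))             # index of the last node overall

    # The two ranges are adjacent and never overlap.
    printf 'perceus node set vnfs interactive n0[%03d-%03d]\n' 0 "$last_int"
    printf 'perceus node set vnfs compute n0[%03d-%03d]\n' "$interactive" "$last"
}

# Example: 300 nodes, the first 5 interactive.
split_nodes 300 5
```

Changing the second argument and re-running is all it takes to resize
the interactive pool, and since the output is just text you can review
it before running anything.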

The file system needs to be built to handle the load of the apps. At
300 nodes you can go anywhere from the low end (Linux RAID and NFS) to
a higher end NFS solution, up to a parallel file system, or maybe even
one of each (NFS and parallel), as they solve somewhat different
requirements.

-- 
Greg Kurtzer
http://www.infiscale.com/
http://www.perceus.org/
http://www.caoslinux.org/