[Beowulf] recommendations for a good ethernet switch for connecting ~300 compute nodes

Mark Hahn hahn at mcmaster.ca
Wed Sep 2 23:18:20 PDT 2009


> That brings me to another important question. Any hints on speccing
> the head-node?

I think you imply a single, central admin/master/head node.  this is 
a very bad idea.  first, it's generally a bad idea to have users on 
a fileserver.  next, it's best to keep cluster-infrastructure
(monitoring, management, pxe, scheduling) on a dedicated admin machine.
for 300 compute nodes, it might be a good idea to provide more than 
one login node (for editing, compilation, etc).

> Especially the kind of storage I put in on the head
> node. I need around 1 Terabyte of storage. In the past I've used
> RAID5+SAS in the server.

1 TB is, I assume you know, half a disk these days (ie, trivial).
for a 300-node cluster, I'd configure at least 10x and probably 
100x that much.  (my user community is pretty diverse, though,
with a wide range of IO habits.)

> Mostly for running jobs that access their I/O
> via files stored centrally.

it would be wise to get some sort of estimate of the actual numbers - 
even the total size of all files accessed by a job and its average 
runtime would let you figure an average data rate.
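to make that concrete, here's a minimal sketch of the arithmetic - the 20 GB working set and 4-hour runtime below are made-up placeholders, not measurements from any real workload:

```python
# back-of-envelope average data rate a job imposes on the fileserver;
# divide total bytes touched by wall-clock runtime
def avg_rate_mb_s(total_file_bytes, runtime_s):
    """average data rate in MB/s (decimal megabytes)."""
    return total_file_bytes / runtime_s / 1e6

# hypothetical job: 20 GB of files read+written over a 4-hour run
per_job = avg_rate_mb_s(20e9, 4 * 3600)   # ~1.4 MB/s per job
# if all 300 nodes ran one such job concurrently:
aggregate = 300 * per_job                 # ~417 MB/s at the server
print(round(per_job, 2), round(aggregate, 1))
```

even crude numbers like these tell you whether a single gigabit NFS server is laughable or plenty.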

> For muscle I was thinking of a Nehalem E5520 with 16 GB RAM. Should I

I don't think I'd use such a nice machine for any of fileserver, admin or
login nodes.  for admin, it's not needed.  for login it'll be unused a lot of
the time.  for fileservers, you want to sweat the IO system, not the CPU 
or memory.

> boost the RAM up? Or any other comments. It is tricky to spec the
> central node.

spec'ing a single one may be tricky, but a single one is a bad idea anyway...

> Or is it more advisable to go for storage-box external to the server
> for NFS-stores and then figure out a fast way of connecting it to the
> server. Fiber perhaps?

10G (Cu or SiO2, doesn't matter) is the right choice 
for an otherwise-gigabit cluster.
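the oversubscription arithmetic is simple enough to sketch - assuming 300 nodes each on gigabit and one 10G server link (the numbers are from this thread, the fair-share framing is mine):

```python
# rough fair-share arithmetic for a 10G fileserver uplink
# feeding a gigabit cluster
nodes = 300
node_link_gbps = 1.0
server_link_gbps = 10.0

# if every node hit the server at once, each would get roughly:
fair_share_gbps = server_link_gbps / nodes              # ~0.033 Gb/s
# oversubscription ratio of the server link:
oversub = nodes * node_link_gbps / server_link_gbps     # 30:1
print(round(fair_share_gbps, 4), oversub)
```

30:1 sounds bad on paper, but real jobs rarely all do IO simultaneously, which is why a single 10G uplink is usually the right fit here.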


