[Beowulf] recommendations for a good ethernet switch for connecting ~300 compute nodes
hahn at mcmaster.ca
Wed Sep 2 23:18:20 PDT 2009
> That brings me to another important question. Any hints on speccing
> the head-node?
I think you imply a single, central admin/master/head node. this is
a very bad idea. first, it's generally a bad idea to have users on
a fileserver. next, it's best to keep cluster-infrastructure
(monitoring, management, pxe, scheduling) on a dedicated admin machine.
for 300 compute nodes, it might be a good idea to provide more than
one login node (for editing, compilation, etc).
> Especially the kind of storage I put in on the head
> node. I need around 1 Terabyte of storage. In the past I've used
> RAID5+SAS in the server.
1 TB is, I assume you know, half a disk these days (ie, trivial).
for a 300-node cluster, I'd configure at least 10x and probably
100x that much. (my user community is pretty diverse, though,
with a wide range of IO habits.)
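a rough way to sanity-check that 10x-100x scaling is to size by node count; the per-node allowance below is my own illustrative guess, not a standard figure:

```python
# Rough storage sizing by node count (illustrative rule of thumb,
# not from the original post; per_node_tb is a guessed parameter).
nodes = 300
per_node_tb = 0.1            # ~100 GB of shared storage per node, a guess
total_tb = nodes * per_node_tb

print(f"suggested shared storage: {total_tb:.0f} TB")  # lands inside the 10x-100x range
```

different user communities will push that per-node number up or down by an order of magnitude either way.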
> Mostly for running jobs that access their I/O
> via files stored centrally.
it would be wise to get some sort of estimate of the actual numbers -
even the total size of all files accessed by a job, together with its
average runtime, would let you figure an average data rate.
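the back-of-envelope arithmetic looks something like this (the file sizes, runtime, and job count are made-up example numbers, just to show the method):

```python
# Estimate average IO rate from total file footprint and job runtime.
# All inputs are hypothetical examples, not measured values.
total_file_bytes = 50e9      # total size of files one job reads/writes: 50 GB
avg_runtime_s = 4 * 3600     # average job runtime: 4 hours

per_job_rate = total_file_bytes / avg_runtime_s      # bytes/s per job

concurrent_jobs = 300        # worst case: one IO-doing job per node
aggregate_rate = per_job_rate * concurrent_jobs      # demand on the fileserver

print(f"per-job rate:   {per_job_rate / 1e6:.1f} MB/s")
print(f"aggregate rate: {aggregate_rate / 1e6:.1f} MB/s")
```

the aggregate number is what tells you whether a single NFS server with a 10G uplink is enough, or whether you need something parallel.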
> For muscle I was thinking of a Nehalem E5520 with 16 GB RAM. Should I
I don't think I'd use such a nice machine for any of the fileserver, admin or
login nodes. for admin, it's not needed. for login, it'll be idle a lot of
the time. for fileservers, you want to sweat the IO system, not the CPU.
> boost the RAM up? Or any other comments. It is tricky to spec the
> central node.
spec'ing a single one may be tricky, but a single central node is a bad idea
in the first place...
> Or is it more advisable to go for storage-box external to the server
> for NFS-stores and then figure out a fast way of connecting it to the
> server. Fiber perhaps?
10G (Cu or SiO2, doesn't matter) is the right choice
for an otherwise-gigabit cluster.
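the oversubscription arithmetic behind that choice is simple (a sketch, assuming one GigE link per compute node and a single 10G link on the fileserver):

```python
# Oversubscription of a 10G fileserver uplink feeding gigabit nodes.
# Link counts/speeds are the assumed topology, not a measured setup.
nodes = 300
node_link_gbps = 1.0         # GigE to each compute node
server_link_gbps = 10.0      # single 10G link (Cu or fiber) to the fileserver

oversubscription = nodes * node_link_gbps / server_link_gbps
fair_share_mbps = server_link_gbps * 1000 / nodes   # per node if all pull at once

print(f"oversubscription: {oversubscription:.0f}:1")
print(f"fair share per node: {fair_share_mbps:.1f} Mb/s")
```

30:1 sounds bad, but in practice nowhere near all 300 nodes hit the fileserver simultaneously, which is why a 10G uplink is usually adequate for a gigabit cluster of this size.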