[Beowulf] recommendations for a good ethernet switch for connecting ~300 compute nodes

Rahul Nabar rpnabar at gmail.com
Thu Sep 3 03:59:37 PDT 2009


On Thu, Sep 3, 2009 at 1:18 AM, Mark Hahn <hahn at mcmaster.ca> wrote:
Thanks a lot for all the great comments, guys!

> I think you imply a single, central admin/master/head node.  this is a very
> bad idea.  first, it's generally a bad idea to have users on a fileserver.
>  next, it's best to keep cluster-infrastructure
> (monitoring, management, pxe, scheduling) on a dedicated admin machine.
> for 300 compute nodes, it might be a good idea to provide more than one
> login node (for editing, compilation, etc).

Absolutely.  I ought to have used the term "head node(s)".  What I want
to spec out is how many central machines are warranted and how each
should be configured differently.
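To make that concrete, a hypothetical split of the central roles might
look something like the sketch below (host names and counts are my own
placeholders, not anything Mark prescribed):

    # Hypothetical central-node role layout for a ~300-node cluster.
    # Follows the advice above: users stay off the fileserver, and
    # infrastructure (monitoring, PXE, scheduling) gets a dedicated box.
    roles = {
        "admin":   ["admin01"],             # monitoring, management, PXE, scheduler
        "login":   ["login01", "login02"],  # editing, compilation; >1 for 300 nodes
        "storage": ["storage01"],           # scratch fileserver, no interactive users
    }

    for role, hosts in roles.items():
        print(f"{role:>7}: {', '.join(hosts)}")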

>
> 1 TB is, I assume you know, half a disk these days (ie, trivial).
> for a 300-node cluster, I'd configure at least 10x and probably 100x that
> much.  (my user community is pretty diverse, though,
> and with a wide range of IO habits.)

We have a separate long-term store, so this machine only holds data for
running, staging, and other active jobs.  Users are warned that this
data is not backed up and is subject to periodic flushing.

Still, you are right: I was being overly stingy, and that was a bad
estimate.  I have a similar, smaller cluster and just double-checked
usage on it.  Scaling that up to 300 nodes, I should probably be
shooting for 4.5 to 5 terabytes of storage (rough arithmetic below).
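For concreteness, this is the back-of-the-envelope scaling I mean.  The
per-node figure is a placeholder, not the actual number measured on my
smaller cluster:

    # Back-of-the-envelope scratch sizing (per-node usage is a
    # hypothetical placeholder, not a measured value).
    per_node_usage_gb = 16      # assumed scratch usage per compute node
    nodes = 300

    total_tb = per_node_usage_gb * nodes / 1000
    print(f"~{total_tb:.1f} TB of scratch for {nodes} nodes")  # ~4.8 TB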

>
> I don't think I'd use such a nice machine for any of fileserver, admin or
> login nodes.  for admin, it's not needed.  for login it'll be unused a lot
> of
> the time.  for fileservers, you want to sweat the IO system, not the CPU or
> memory.

Yes, I specced it that way only for lack of knowledge of a more
suitable but puny candidate.  Any suggestions for a more modest
machine?  Besides, overspeccing the processor on a central node doesn't
change my cost much relative to the entire cluster (rough numbers
below).
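A quick sanity check on that cost claim (all prices below are made-up
placeholders, not quotes):

    # Rough cost-fraction check (prices are hypothetical).
    compute_nodes = 300
    cost_per_node = 2500    # assumed $ per compute node
    head_overspec = 3000    # assumed extra $ spent beefing up one head node

    fraction = head_overspec / (compute_nodes * cost_per_node)
    print(f"overspec adds {fraction:.2%} to the cluster cost")  # 0.40%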

>
> 10G (Cu or SiO2, doesn't matter) is the right choice for an
> otherwise-gigabit cluster.
>

10G from the storage node to the switch, and alternatively 10G from the
storage box to the switch, correct?  (Quick arithmetic on why that
uplink matters is below.)
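The per-node share of the fileserver uplink makes the case, using
idealized line rates and ignoring protocol overhead:

    # Per-node share of the fileserver uplink if all 300 nodes hit
    # storage at once (idealized line rates, no protocol overhead).
    nodes = 300
    gige_mb_s = 125       # 1 Gb/s ~= 125 MB/s
    tengig_mb_s = 1250    # 10 Gb/s ~= 1250 MB/s

    print(f"1G uplink:  ~{gige_mb_s / nodes:.2f} MB/s per node")   # ~0.42
    print(f"10G uplink: ~{tengig_mb_s / nodes:.2f} MB/s per node") # ~4.17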

-- 
Rahul



